-
-
Notifications
You must be signed in to change notification settings - Fork 240
Option for custom compare function #241
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
The new compare function takes two items of an iterable and should return True it matching, False if no match, and raise CannotCompare if unable to compare the two items. The default behavior is the same as before which is comparing each item in order. If the compare function raises CannotCompare then behavior reverts back to the default in order. This also introduces a new report key which is `iterable_item_moved` to track if iterable items have moved.
Hi @dtorres-sf |
tests/test_ignore_order.py
Outdated
@@ -779,3 +779,166 @@ def test_ignore_order_and_group_by(self): | |||
diff2 = DeepDiff(t1, t2, group_by='id', ignore_order=True) | |||
expected2 = {'iterable_item_added': {"root['BB']['ate'][1]": 'Brownies'}} | |||
assert expected2 == diff2 | |||
|
|||
def test_compare_func(self): | |||
t1 = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since the test objects are not small, could you please move them as fixtures to the tests/fixtures
directory?
tests/test_ignore_order.py
Outdated
replay_diff = DeepDiff(recreated_t2, t2) | ||
assert replay_diff.to_dict() == {} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It can be replaced with assert t2 == recreated_t2
tests/test_ignore_order.py
Outdated
@@ -779,3 +779,166 @@ def test_ignore_order_and_group_by(self): | |||
diff2 = DeepDiff(t1, t2, group_by='id', ignore_order=True) | |||
expected2 = {'iterable_item_added': {"root['BB']['ate'][1]": 'Brownies'}} | |||
assert expected2 == diff2 | |||
|
|||
def test_compare_func(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you please also add smaller unit tests with much smaller test objects? It will make it easier to debug in the future. The current test will be a better fit for test_delta.py
. Perhaps we could add a new test class there something like class TestDeltaCompareFunc
and add the tests that include delta there.
This will compare in sequence order. | ||
""" | ||
|
||
return [((i, i), (x, y)) for i, (x, y) in enumerate( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
if(id(y) not in y_matched): | ||
matches.append(((-1, j), (ListItemRemovedOrAdded, y))) | ||
return matches | ||
except CannotCompare: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we log to the user that the CannotCompare
is raised or let the user's iterable_compare_func
log if they want it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it will be common that CannotCompare
is raised, since the most basic case of a simple list of numbers or strings will most likely not want to use a compare function, and would revert to comparing in order. Since it will be so common that this will be raised, I would say probably let the creator of the compare function determine when logging is necessary.
deepdiff/diff.py
Outdated
for j, y in enumerate(level.t2): | ||
|
||
if(self.iterable_compare_func(x, y)): | ||
y_matched.add(id(y)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using the ID is cool but there could be issues when 2 things have different IDs but they are equal. Hence we usually hash the objects using DeepHash and use the hash instead. The goal is that even if the IDs are different, the hashes of 2 equal objects should be the same. I used to be concerned about the hashing being a bottleneck but so far I have not seen any bottlenecks from it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have updated to use hashes, can you please take a look and make sure I am hashing these objects correctly?
Hi @dtorres-sf |
- Use DeepHash instead of id when tracking elements - Add and cleanup unit tests - Pass in level object to compare in order to get path if desired
@seperman I have made the changes and also added Regarding report_repetition case, I was going through that and did not figure out exactly how to translate that to iterable item moved. It looks like it recreates the old iterable in order to try and get values, is that right? Do you think it is possible to have the new |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
@dtorres-sf |
Actually scratch that. I will leave it up to the user to raise |
@seperman Yeah, I thought about that, I think it could go either way. As the application developer I think I would want to know if my compare function crashed rather than an implicit assumption that it should revert back to a native compare function (essentially bypassing the user compare function). If a user wants to they could always wrap their compare function body in a try/except and raise |
This is a PR to address comments in #185. There are a few changes this introduces:
iterable_compare_func
that takes a function pointer to compare two items. It function takes two parameters and should returnTrue
if it is a match,False
if it is not a match or raiseCannotCompare
if it is unable to compare the two. IfCannotCompare
is raised then it will revert back to comparing in order. Ifiterable_compare_func
is not provided or set to None the behavior defaults to comparing items in order.iterable_item_moved
this will only ever be added if there is a custom compare function.Because of # 2 above there was a need to track both the
t2
andt1
path which is why some additional options are needed inmodel.py
functions.For # 2 above we need to also save the
new_value
as part of this report item. This is becausevalues_changed
is applied before items are moved in the Delta. And to move items we remove it and add thenew_value
back in the new position.