Skip to content

Fixing == __eq__ operator for MultiIndex ... closes #9785 #9872

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from

Conversation

pykler
Copy link
Contributor

@pykler pykler commented Apr 13, 2015

closes #9785

@cpcloud cpcloud added this to the 0.16.1 milestone Apr 13, 2015
@cpcloud cpcloud added the Bug label Apr 13, 2015
@jreback jreback added Error Reporting Incorrect or improved errors from pandas Compat pandas objects compatability with Numpy or Python functions Bug and removed Bug labels Apr 13, 2015
@pykler pykler changed the title Fixing == __eq__ operator for MultiIndex #9875 Fixing == __eq__ operator for MultiIndex ... closes #9785 Apr 13, 2015
@@ -2439,6 +2439,8 @@ def _get_dtype_type(arr_or_dtype):
return np.dtype(arr_or_dtype).type
elif isinstance(arr_or_dtype, CategoricalDtype):
return CategoricalDtypeType
elif arr_or_dtype is NotImplemented:
return type(NotImplemented)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems strange to me. Why would we ever want to use NotImplementedType?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function just returns the type of an object ... NotImplemented is an object. So not sure what the question is?

if the question is why is this feature important? Pandas has code that checks the type of return values and acts accordingly. Occasionally Numpy will return this type and this code breaks with an attribute error because NotImplementedType has no attribute dtype.

PS. Without this the tests below fail. This may not be clear from the small diff but the bug in this case is visible in _indexOp pandas/core/index.py:58

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, thanks for clarifying. My sense is that this method probably should raise if it gets NotImplemented, because this is not really a valid numpy dtype.

Instead, I would make the wrapper functions in pandas.core.index handle NotImplemented explicitly, e.g., add something like

if result is NotImplemented:
    return result

Operations that wrap arithmetic should know how to handle NotImplemented -- that's a saner place to put it than here (where it could eat some strange errors)..

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I get what you are saying about numpy types, but it shouldn't eat up any errors, since this function just says that the type of the input is so and so ... right?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Based purely on the name, function expects a numpy/pandas array or dtype object, and NotImplemented is certainly not one of those :). Returning arbitrary output instead of raising an error feels like an anti-pattern to me.

In contrast, for something like is_bool_dtype it does make sense to return False when given NotImplemented.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense, I'll make this change in is_bool_dtype instead ... ccing @jreback for reference

@@ -2510,6 +2510,8 @@ def is_floating_dtype(arr_or_dtype):


def is_bool_dtype(arr_or_dtype):
if arr_or_dtype is NotImplemented:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

K, this seems more reasonable, though from my perspective I would probably still be happier explicitly handling NotImplemented inside the wrapper functions.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So conceptually though, the code in the wrapper got a result, its trying to check if this result is a bool type ... the check is crashing on this NULL type of exception. So this is where its gets tricky conceptually, is NULL part of this function's domain or not?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agreed... might be better to use try/except logic, e.g.,

try:
    tipo = _get_dtype_type(arr_or_dtype)
except Exception:
    return False

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sounds good, one last thing. I tested the following and it also crapped out.

_get_dtype_type(False)

so is this also supposed to crap out? or is it supposed to return False?

is_bool_dtype(False)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, this is an internal function, so it's not extensively tested (and not worth worrying about testing too much). It expects the argument to already be a numpy dtype or numpy array.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I caught ValueError and made sure that's raised ... does this look better?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes :)

@jreback
Copy link
Contributor

jreback commented Apr 14, 2015

merged via 07257a0

thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Compat pandas objects compatability with Numpy or Python functions Error Reporting Incorrect or improved errors from pandas
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Error when comparing equality of indexes
4 participants