get_loc() returns integer or slice or KeyError nondeterministic in multiindex data frame #6501
Comments
This is defined behavior and depends heavily on the index type and what you are doing with it. For all intents and purposes this is an internal method. What are you using get_loc for?
I use But the problem here is |
OK, you have to deal with the variable output then. I am not sure what you are trying to accomplish; what are you doing after you index?
so that |
Read this: http://pandas.pydata.org/pandas-docs/stable/indexing.html#fallback-indexing
You should not use ix as you have an integer index; use loc instead, which will raise a KeyError. I wrote the answer to the SO question. You still haven't shown what you are actually going to do. Trying to speed up indexing is not the right thing to do; instead you should groupby or iterate, depending on what you are actually trying to accomplish.
See #6507, which is the only bug here.
Presumably the reason it's 25 to 50 is the probability of having multiple rows (a,b,c) with the same values. Examples with random numbers in them make for tricky reproducing (usually best to keep the seed), though they are good for fuzz testing. @colinfang Since you're looking deep into the codebase, I recommend going a little further and exploring/tweaking the source while you do it. (If you fix this or something else, PRs are very welcome!) :D
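To illustrate the point about duplicates and seeds, here is a small sketch using only the standard library. The original example's value range is not shown in this thread, so the `low`/`high` bounds and the helper names `random_keys` and `has_duplicates` are assumptions made for illustration. It estimates how often n random (a,b,c) keys contain at least one duplicate, with a fixed seed per trial so every run is reproducible.

```python
import random


def random_keys(n, seed, low=0, high=10):
    """Return n random (a, b, c) tuples; the same seed always gives the same keys.

    low/high are hypothetical bounds, not taken from the original example.
    """
    rng = random.Random(seed)
    return [
        (rng.randrange(low, high), rng.randrange(low, high), rng.randrange(low, high))
        for _ in range(n)
    ]


def has_duplicates(keys):
    """True if at least one key occurs more than once."""
    return len(set(keys)) < len(keys)


# Fraction of seeded trials in which n random keys contain a duplicate.
trials = 200
for n in (10, 25, 50, 100):
    hits = sum(has_duplicates(random_keys(n, seed)) for seed in range(trials))
    print(n, hits / trials)
```

Because each trial is seeded, a case that does or does not contain duplicates can be reproduced exactly by reusing its seed.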
See example: if `n` is big, `get_loc` returns a `slice`; otherwise it returns an integer. The boundary at which `n` counts as big changes from time to time (but is frequently 25 or 50).

http://stackoverflow.com/questions/22067205/when-does-pandas-xs-drop-dimensions-and-how-can-i-force-it-to-not-to

The direct consequence is that `xs` now returns a Series or a DataFrame (even if there is only 1 match) nondeterministically, depending on whether an integer or a slice is returned from `get_loc`.

What's more, if the key is not in the index, `get_loc` sometimes throws a `KeyError` exception and sometimes returns `slice(0, 0, None)`.

Try `df.index.get_loc((-2,-1,-1))` several times and you will see. I suspect it depends on whether there are duplicate values in the multiindex.
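If code has to cope with the variable return type, one option is to normalize whatever `get_loc` returns into a list of integer positions. The following is a minimal sketch, not part of pandas: `loc_to_positions` is a hypothetical helper, and it assumes the result is an integer, a `slice`, or a boolean mask (any iterable of truthy/falsy flags). It is pure Python so it can be tried without a DataFrame.

```python
from numbers import Integral


def loc_to_positions(loc, length):
    """Normalize a get_loc-style result to a list of integer positions.

    loc    : an integer, a slice, or an iterable of booleans (assumed shapes)
    length : length of the index the result refers to
    """
    if isinstance(loc, Integral):
        # Single match: wrap it so callers always get a list.
        return [int(loc)]
    if isinstance(loc, slice):
        # slice.indices resolves None bounds against the index length.
        return list(range(*loc.indices(length)))
    # Otherwise treat it as a boolean mask over the index.
    return [i for i, flag in enumerate(loc) if flag]


print(loc_to_positions(3, 10))
print(loc_to_positions(slice(2, 5, None), 10))
print(loc_to_positions(slice(0, 0, None), 10))  # empty slice: no matches
print(loc_to_positions([False, True, True], 3))
```

With a real index, the call would look like `loc_to_positions(df.index.get_loc(key), len(df.index))`, wrapped in a `try`/`except KeyError` for missing keys. An empty list then covers the `slice(0, 0, None)` case described above.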