Incorrect sequence handling with MultiIndex.from_tuples #14794
Comments
is what we do with a list-of-tuples (though), first making a list of this beforehand doesn't change the result. I am not sure what zip does with unbalanced tuples. So in this case we should be raising ( So if you can figure out a nice performant way to zip these (w/o losing things), all ears.
Zip terminates at the shortest iterable. As I mentioned in my first post, I see that list(zip(*tuples)) seems to do a transpose.
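To make the truncation concrete, here is a minimal sketch in plain Python (no pandas involved). The input `tuples` is a made-up unbalanced example, not data from this issue; it only illustrates how `zip` and `itertools.zip_longest` differ when transposing.

```python
from itertools import zip_longest

# Hypothetical unbalanced input: the second tuple is shorter than the others.
tuples = (("a", 1), ("b",), ("c", 3))

# zip stops at the shortest tuple, so the second level is dropped entirely.
truncated = list(zip(*tuples))

# zip_longest pads short tuples with fillvalue (None by default).
padded = list(zip_longest(*tuples))

print(truncated)  # [('a', 'b', 'c')]
print(padded)     # [('a', 'b', 'c'), (1, None, 3)]
```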
I'd propose the following for the if statement:

    elif isinstance(tuples, Sequence):  # Sequence is an abstract class from collections.abc
        arrays = list(lib.to_object_array_tuples(list(tuples)).T)
    else:
        arrays = list(zip_longest(*tuples))

But if the elif is checking that tuples is a Sequence, then I think the else clause is largely unnecessary, because if tuples is a sequence, then lib.to_object_array_tuples should be able to handle it. In fact, the call to

Maybe the else clause should raise an error if tuples isn't a sequence. This avoids the need to zip anything, unless I'm misunderstanding how lib.to_object_array_tuples can fail.

    elif isinstance(tuples, Sequence):
        arrays = list(zip_longest(*tuples))
    else:
        raise ValueError("tuples must be a sequence")
    return MultiIndex.from_arrays(arrays, ...)

Upon further investigation, to_object_array_tuples appears to mostly reimplement zip_longest. It seems that for small iterables, zip_longest can be much faster, though for longer iterables to_object_array_tuples starts to get an edge (timings done with Python 3.5). Should we optimize index creation separately for large and small indexes?
If you prefer to discuss code details in a PR, I can open one tonight.
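The proposal above can be sketched outside pandas as a standalone function. This is only an illustration under stated assumptions: `tuples_to_arrays` is a hypothetical helper name, it returns plain lists rather than building a MultiIndex, and it is not the code in pandas itself.

```python
from collections.abc import Sequence
from itertools import zip_longest


def tuples_to_arrays(tuples):
    """Hypothetical helper: transpose a sequence of tuple-likes into per-level lists."""
    if not isinstance(tuples, Sequence):
        raise ValueError("tuples must be a sequence")
    # zip_longest pads short tuples with None instead of truncating.
    return [list(level) for level in zip_longest(*tuples)]


# A list of tuples and a tuple of tuples give the same per-level arrays.
as_list = tuples_to_arrays([("a", 1), ("b", 2)])
as_tuple = tuples_to_arrays((("a", 1), ("b", 2)))
print(as_list == as_tuple)  # True
```

With this shape, lists and tuples take the same path, and anything that is not a Sequence (a set or a generator, for example) is rejected up front instead of being silently truncated.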
Not sure of the original rationale for having all of the clauses in the if, so feel free to see what happens if you add your examples as a test (and make changes as you suggest).
Problem description
The docstring leaves some ambiguity as to whether a tuple or list can be passed in. It does say a list of tuples at the top, but in the argument specification, it hints that tuples can be a list/sequence of tuple-likes. A tuple of tuples is a sequence of tuple-likes. I find it strange that my tuple of tuples was truncated to a single level MultiIndex, while my list of tuples was not.
Changing https://github.com/groutr/pandas/blob/master/pandas/indexes/multi.py#L983 to call itertools.zip_longest would fix the truncation issue. I also don't understand why the arrays are transposed in some cases and not in others:
https://github.com/groutr/pandas/blob/master/pandas/indexes/multi.py#L980-L983
Expected Output
I would expect lists and tuples and other sequences of tuple-likes (sets, for example) to be treated the same. Do lists have a special meaning here?
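As a side note on terminology, a quick check with the abstract base classes in `collections.abc` shows which of these containers count as a Sequence. The sample data below is made up for illustration; the point is that lists and tuples are Sequences, while a set is only an Iterable (and has no defined order).

```python
from collections.abc import Iterable, Sequence

# Hypothetical sample containers holding the same tuple-likes.
candidates = {
    "list": [("a", 1), ("b", 2)],
    "tuple": (("a", 1), ("b", 2)),
    "set": {("a", 1), ("b", 2)},
}

# For each container, record (is it a Sequence?, is it an Iterable?).
checks = {
    name: (isinstance(obj, Sequence), isinstance(obj, Iterable))
    for name, obj in candidates.items()
}

for name, (is_seq, is_iter) in checks.items():
    print(name, is_seq, is_iter)
# list True True
# tuple True True
# set False True
```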
This is in the latest master (588e29d) and pandas 0.19.1.
I'd be happy to submit a PR if this is unintended behavior.