Unable to read MultiIndex columns from CSV if empty levels #13054
Comments
I suppose this could be fixed, very odd case though. Using empty levels is really confusing / prob should be banned. |
From a discussion with @mralgos this morning, there seems to be a specific check in the implementation about this: Lines 1124 to 1132 in 9d10b76
So it seems this is done on purpose. The question is also what the expected output would be if this should not error.
While @jluttine expects a second index level with NaNs? |
Yep, I would have expected a second index level with NaNs. |
cc @gfyoung @chris-b1 Any opinion on whether we should raise an error, fill the empty level with 'Unnamed ...' (see #13054 (comment)), or fill the empty level with NaNs? So the case is:
If we don't raise an error, filling with 'Unnamed ..' seems more in line with other cases where a column label is missing. |
I would agree that |
I don't like this "Unnamed..." thing in the multilevel columns, when reading from CSV. Some levels of the column are filled only partially in my case. The empty levels of some columns should probably be imported as NaNs. The reason is that it looks less ugly and can be easily replaced with empty string or whatever the user wishes. |
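Whichever placeholder ends up in the columns, it can be swapped out after reading. A minimal sketch of that post-processing step, assuming placeholder labels start with `Unnamed` (the exact label text, like the helper name `blank_placeholder_labels`, is a hypothetical chosen for illustration and is not settled in this thread):

```python
import pandas as pd


def blank_placeholder_labels(frame, level, prefix="Unnamed"):
    """Return a copy of `frame` whose placeholder labels on one column level become ''."""

    def clean(label):
        # Only string labels that start with the assumed prefix are replaced.
        if isinstance(label, str) and label.startswith(prefix):
            return ""
        return label

    return frame.rename(columns=clean, level=level)


# Hypothetical columns, as they might look after reading a partially filled header.
columns = pd.MultiIndex.from_tuples([("a", "Unnamed: 1_level_1"), ("b", "x")])
frame = pd.DataFrame([[1, 2], [3, 4]], columns=columns)

cleaned = blank_placeholder_labels(frame, level=1)
print(cleaned.columns.tolist())
```

The same helper would work for any other marker by changing `prefix`; replacing NaN labels instead would need a check such as `pd.isna(label)` in `clean`.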
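@denfromufa : I agree that it is not aesthetically as pleasing, but IMO we should also keep in mind that clarity is of great importance. Unnamed: ... sticks out very clearly in the result to indicate that something was missing in the header provided. In addition, the naming system protects against duplicate levels, which can make indexing more difficult. So while I do agree about the aesthetics, I would not be in favor of using NaN as placeholders. While I am not against banning the example provided above, I am inclined to stick with the Unnamed paradigm (which we use in single-level columns) for the time being. Perhaps it is best that we first settle (or regather consensus on) the initial matter: is the example in the issue allowed, or should we ban it? If we decide to ban it, your point (and all other points about the naming) are somewhat moot. However, if we largely agree that the example should be allowed, then we can perhaps open the discussion up about the naming scheme. Personally, I am okay with allowing this example through. It's unusual, but I see no reason to ban it. Thoughts, anyone? |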
I ended up not defining the header for columns while reading the CSV, and later
combining the rows representing the levels into one string or tuple, with
"" or None used for NaNs respectively. This is not a general-purpose solution.
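The workaround described above can be sketched roughly as follows. This is an illustrative reconstruction, not the commenter's actual code: the helper name `read_with_header_rows` is hypothetical, and it assumes the first CSV column holds the row index, all data cells are numeric, and empty header cells should become `""`.

```python
import io

import pandas as pd


def read_with_header_rows(csv_text, n_levels):
    """Read a CSV without a header, then build MultiIndex columns from the first rows."""
    # Read everything as text and keep empty cells as '' instead of NaN.
    raw = pd.read_csv(
        io.StringIO(csv_text), header=None, dtype=str, keep_default_na=False
    )

    # The first n_levels rows hold the column levels; column 0 holds the row index.
    header_rows = raw.iloc[:n_levels, 1:]
    body = raw.iloc[n_levels:, 1:]
    index_values = raw.iloc[n_levels:, 0]

    # Convert the text cells back to numbers (assumes purely numeric data).
    data = body.apply(pd.to_numeric)
    data.columns = pd.MultiIndex.from_arrays(header_rows.to_numpy().tolist())
    data.index = pd.Index(pd.to_numeric(index_values.to_numpy()))
    return data


# Second header row is entirely empty, as in the case discussed in this issue.
csv_text = ",a,b\n,,\n0,1,2\n1,3,4\n"
frame = read_with_header_rows(csv_text, n_levels=2)
print(frame)
```

Because the header rows are handled by hand, the sketch bypasses the `header=[...]` parsing path entirely, which is also why it does not generalize to files with index-name rows or mixed column types.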
On Mon, Jan 16, 2017, 12:44 AM gfyoung wrote:
@denfromufa <https://github.com/denfromufa> : I agree that it is not
aesthetically as pleasing, but IMO we should also keep in mind that clarity
is of great importance. Unnamed: ... sticks out very clearly in the
result to indicate that something was missing in the header provided. In
addition, the naming system protects against duplicate levels, which can
make indexing more difficult.
So while I do agree about the aesthetics, I would not be in favor of using
NaN as placeholders. While I am not against banning the example provided
above, I am inclined to stick with the Unnamed paradigm (which we use in
single-level columns) for the time being.
Perhaps it is best that we first settle (or regather consensus on) the
initial matter: is the example in the issue allowed, or should we ban it?
If we decide to ban it, your point (and all other points about the naming)
are somewhat moot. However, if we largely agree that the example should be
allowed, then we can perhaps open the discussion up about the naming scheme.
Personally, I am okay with allowing this example through. It's unusual,
but I see no reason to ban it. Thoughts, anyone?
|
@denfromufa : I hate to break it to you, but there is no such thing as a general-purpose solution. 😄 That's why I'm asking for consensus before we do anything. |
I'm new at contributing to pandas, and I would like to know if this issue is still relevant to solve. Also, I agree with @gfyoung that |
Is it possible for me to pick this ticket up? I haven't really worked with pandas before and would like to do it to get familiar. |
Has this been implemented yet? If so, in which pandas version? @fpunny |
I'll just throw an extra opinion into here, where I'm very much in favour of the "Unnamed" approach here rather than error'ing out. My use case is that we have a standard format for some Excel files we want to let staff import, which includes three rows of headers (label, symbol, and units). However, there are cases where, for example, none of the columns should have a unit attached to them. We've told our staff that they should just always put a random space in the heading rows of their spreadsheet if they get a weird error message. While functional, adding a random space is a really silly solution. |
If I use MultiIndex columns and if a level happens to have empty values for all columns, the saved CSV file cannot be read. I expected to recover the dataframe from the saved CSV perfectly.
I believe #6618 might be related, because this is somehow related to how Pandas uses an empty data row to separate column names and actual data when using MultiIndex columns.
Code Sample, a copy-pastable example if possible
This works as expected:
However, if a level is empty (i.e., all columns are '' on that level), it doesn't work:
Expected Output
Expected that the empty columns are read correctly because I had explicitly specified the rows to use as column index:
Output of pd.show_versions()
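For reference, the round trip described in this report can be sketched like this. It is an illustrative reconstruction, not the reporter's original sample; the column labels and values are made up. It only reports what the installed pandas does (an error or a set of column labels) and does not assume either outcome.

```python
import io

import pandas as pd

# Two-level columns where the second level is '' for every column.
columns = pd.MultiIndex.from_tuples([("a", ""), ("b", "")])
frame = pd.DataFrame([[1, 2], [3, 4]], columns=columns)

csv_text = frame.to_csv()
print(csv_text)

try:
    # Explicitly name the two header rows and the index column.
    restored = pd.read_csv(io.StringIO(csv_text), header=[0, 1], index_col=0)
    print(restored.columns.tolist())
except Exception as exc:  # behavior differs between pandas versions
    restored = None
    print(f"read_csv raised {type(exc).__name__}: {exc}")
```

The second line of the written CSV consists only of delimiters, which is the header row the parser has to interpret.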