-
-
Notifications
You must be signed in to change notification settings - Fork 18.5k
BUG: fix the bad error raised by HDFStore.put() #38919
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Hello @1MLightyears! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found: There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻 Comment last updated at 2021-01-05 00:15:24 UTC |
The interesting thing is that in the failed tests (like # test append with invalid input to get good error messages
# list in column
df = tm.makeDataFrame()
df["invalid"] = [["a"]] * len(df)
assert df.dtypes["invalid"] == np.object_
msg = re.escape("object of type 'int' has no len()")
with pytest.raises(TypeError, match=msg):
store.append("df", df) line with ensure_clean_store(setup_path) as store:
# this fails because we have a date in the object block......
msg = "object of type 'int' has no len()"
with pytest.raises(TypeError, match=msg):
store.append("df_unimplemented", df) should I change them too? |
yes these were wrong in the first place, your change should fix them. pls also add a whatsnew note (1.3 bug fixes I/O) |
* Remove bad error raised * Add column label information when raising error for debugging * Update test cases and comments
Now it's using a new approach to show the error column label, meanwhile keep the interface part as-is. The test cases is also updated. And ... as a new comer, actually I don't exactly know what a whatsnew note is 😅 I copied a commit format from About the two failed test cases, looks like it's involved with HTTP server, and should not be business of the HDF5 I/O part I commit. No idea about it so far. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
whatsnew note goes here: https://github.com/pandas-dev/pandas/blob/master/doc/source/whatsnew/v1.3.0.rst in bug fixes I/O section
pandas/io/pytables.py
Outdated
nan_rep, | ||
encoding, | ||
errors, | ||
block_columns: List[str] = [], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove the default arg, the annotation is enough
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the default []
is used to keep the interface consistency.
but since the function is only called once I think it's okay to remove, maybe
will remove it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes and never should you use mutable arguments
pandas/io/pytables.py
Outdated
): | ||
# block_columns(list[str]): the label of columns for debug info use. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add a doc-string template e.g. Parameter / Returns (just list the arg names, and document the new one)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
got it.
but giving a private function-which is only internally called once-a docstring looks a bit deluxe...
will add docstring.
pandas/io/pytables.py
Outdated
): | ||
""" | ||
Ensure all elements in the given block are "string". |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok this is more confusing, let's just remove this doc-string entirely
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sorry english is not my first language and I don't totally understand why non-strings aren't okay to be written in HDF5.
it's just literally do this thing, Ensure all elements in the given block are "string".
It is now removed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if the elements are not actual strings but python objects we cann't serialize to this column type.
doc/source/whatsnew/v1.3.0.rst
Outdated
@@ -271,6 +271,7 @@ I/O | |||
- Allow custom error values for parse_dates argument of :func:`read_sql`, :func:`read_sql_query` and :func:`read_sql_table` (:issue:`35185`) | |||
- Bug in :func:`to_hdf` raising ``KeyError`` when trying to apply | |||
for subclasses of ``DataFrame`` or ``Series`` (:issue:`33748`). | |||
- Bug in :func:`put` raising a wrong ``TypeError`` when saving a DataFrame with non-string dtype (:issue:`34274`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this needs to be :meth:`~HDFStore.put`
say this is an improved error message
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh thank you for that
It's morning in my timezone here and I just wake up 😅
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
heheh, no prob this PR looks good. just a small change.
thanks @1MLightyears |
black pandas
git diff upstream/master -u -- "*.py" | flake8 --diff
P.S. something was wrong with
git diff upstream/master
so I directly ranflake8 ./pandas/io/pytables.py
as it's the only file changedI was running into the same problem as #34274 and found where the error is. But (as I've just started using pandas for several days:sweet_smile:) I have few knowledge about pandas, so maybe I didn't make the full use of pandas' components.
As seen in #34274, if a
DataFrame
contains non-string elements and is about to be written into an HDF5 file byHDFStore.put()
, aTypeError: object of type 'int' has no len()
error is raised.But it's not the right "error" expected.
HDFStore.put()
can't serialize some types of element,so report an exception on a column by column basis
is actually needed.This commit fixes this, now it raises
TypeError: Cannot serialize the column [{column_No}] because its data contents are not string but [{non_string_type}] object dtype
as expected.