BUG/ENH: consistent gzip compression arguments #35645
lgtm
might we need to update the docstring or do you think it's good as is?
updating the doc string is a good idea, will do that! I assume that this will affect multiple |
You could maybe add the more explicit explanation to |
The PR adding arguments for bz2/gzip #33398 mentioned that it affects I could make sure that all three |
looks good. slightly OT, we want to add typing for the compression arg (I think we have an issue for this), similar to StorageOptions
whereby we define it in pandas._typing.py
cc @gfyoung @WillAyd @TomAugspurger if comments. |
I will look into that, I assume it is going to be:

    class CompressionArgs(TypedDict, total=False):
        method: str
        compresslevel: Optional[int]
        mtime: Optional[int]
        compression: int
        allowZip64: bool
        strict_timestamps: bool

Technically, there are a few more, but users should not pass them (filename, fileobj, buffer (deprecated since python 3.0), mode).
Do you have opinions about that? compression does not only affect |
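For illustration only, here is a self-contained sketch of how a typed dict like the one proposed above could be declared and consumed. It keeps just the gzip-related keys, and `split_compression` is a hypothetical helper invented for this example, not an existing pandas function.

```python
from typing import Any, Dict, Optional, Tuple, TypedDict


class CompressionArgs(TypedDict, total=False):
    # total=False makes every key optional
    method: str
    compresslevel: Optional[int]
    mtime: Optional[int]


def split_compression(
    compression: CompressionArgs,
) -> Tuple[Optional[str], Dict[str, Any]]:
    """Hypothetical helper: separate the method from the remaining keyword arguments."""
    args = dict(compression)  # copy so the caller's dict is not mutated
    method = args.pop("method", None)
    return method, args


method, kwargs = split_compression({"method": "gzip", "compresslevel": 1, "mtime": 1})
```

The remaining `kwargs` could then be forwarded to whichever compression backend `method` selects.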
lgtm - nice PR
Hello @twoertwein! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found: There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻 Comment last updated at 2020-08-13 02:56:33 UTC |
oh, I didn't know that |
@@ -816,6 +827,8 @@ def close(self):
                self.open_stream.close()
            except (IOError, AttributeError):
                pass
        for file_handle in self.file_handles:
            file_handle.close()
probably unrelated to the recent CI issues, but we should definitely close those handles.
hmm, is there a ResourceWarning?
I haven't seen any when reading/writing json files
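A minimal sketch of the handle-closing pattern discussed in this thread. It assumes an object that records every handle it opens in a `file_handles` list; `HandleTracker` and `open_buffer` are hypothetical names for this example, not the pandas reader class.

```python
import io


class HandleTracker:
    """Hypothetical object that records every handle it opens."""

    def __init__(self):
        self.file_handles = []

    def open_buffer(self):
        # In real code this might be a compressed file or a text wrapper
        handle = io.BytesIO()
        self.file_handles.append(handle)
        return handle

    def close(self):
        # Close every tracked handle, not only the outermost stream
        for file_handle in self.file_handles:
            file_handle.close()
        self.file_handles = []


tracker = HandleTracker()
first = tracker.open_buffer()
second = tracker.open_buffer()
tracker.close()
```

With real file objects, CPython can emit a ResourceWarning when an unclosed file is garbage collected, which is why closing every tracked handle explicitly is the safer choice.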
thanks @twoertwein very nice! |
black pandas
git diff upstream/master -u -- "*.py" | flake8 --diff
to_csv lets the user set all keyword arguments for gzip. Depending on whether the user provides a filename or a file object, different keyword arguments can be set (gzip.open vs gzip.GzipFile). This PR always uses gzip.GzipFile. The additional keyword arguments valid for gzip.open but not valid for gzip.GzipFile (encoding, errors, and newline) are still accessible:
pandas/pandas/io/common.py
Line 512 in aefae55
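A sketch of one way the text-mode arguments can stay available while always constructing `gzip.GzipFile`. It assumes the binary handle is wrapped in `io.TextIOWrapper`; this illustrates the idea and is not a copy of the referenced line.

```python
import gzip
import io

buffer = io.BytesIO()
# gzip.GzipFile is binary-only and does not accept encoding/errors/newline
binary_handle = gzip.GzipFile(fileobj=buffer, mode="wb", mtime=0)
# Wrapping it restores the text-mode arguments that gzip.open would have taken
text_handle = io.TextIOWrapper(
    binary_handle, encoding="utf-8", errors="strict", newline=""
)
text_handle.write("a,b\n1,é\n")
# Closing the wrapper flushes the text layer and closes the GzipFile underneath;
# the BytesIO stays open because it was passed in via fileobj
text_handle.close()

round_trip = gzip.decompress(buffer.getvalue()).decode("utf-8")
```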
Using gzip.GzipFile also allows us to set mtime to create reproducible gzip archives.
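To illustrate the reproducibility point with the standard library alone: the sketch below compresses the same payload twice with a fixed `mtime` and compares the results. `gzip_bytes` is a hypothetical helper written for this example.

```python
import gzip
import io


def gzip_bytes(payload: bytes, mtime=None) -> bytes:
    """Compress payload in memory; mtime is written into the gzip header."""
    buffer = io.BytesIO()
    # filename="" keeps a file name out of the header as well
    with gzip.GzipFile(filename="", fileobj=buffer, mode="wb", mtime=mtime) as handle:
        handle.write(payload)
    return buffer.getvalue()


data = b"a,b\n1,2\n"
first = gzip_bytes(data, mtime=1)
second = gzip_bytes(data, mtime=1)
# With a fixed mtime the two archives are byte-for-byte identical
identical = first == second
# Bytes 4-7 of the gzip header hold the timestamp (little-endian)
header_mtime = int.from_bytes(first[4:8], "little")
```

If `mtime` is left as `None`, the current time is written instead, so archives created at different seconds differ even when the payload is the same.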