-
-
Notifications
You must be signed in to change notification settings - Fork 32k
bpo-44439: _ZipWriteFile.write() handle buffer protocol correctly #29468
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
No longer use len() to get the length of the input data. For some buffer protocol objects, the length obtained by using len() is wrong. Co-authored-by: Marco Ribeiro <[email protected]>
Getting `self->unconsumed_tail` before acquiring the thread lock may mix up decompress state.
if isinstance(data, (bytes, bytearray)): | ||
nbytes = len(data) | ||
else: | ||
data = memoryview(data) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's recommended to manually release a memoryview
. Implementations other than CPython may use a garbage collector that doesn't immediately finalize unreferenced objects. Maybe it's simpler to always use a memoryview
:
# Accept any data that supports the buffer protocol
with memoryview(data) as data:
nbytes = data.nbytes
self._file_size += nbytes
self._crc = crc32(data, self._crc)
if self._compressor:
data = self._compressor.compress(data)
self._compress_size += len(data)
self._fileobj.write(data)
return nbytes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point.
There are some sites in stdlib have this problem, maybe they can be solved in another issue together.
In addition, I suspect if other Python implementations don't release the underlying buffer of memoryview
, the program will run abnormally.
Let @serhiy-storchaka decide.
update: I ran a benchmark, the performances are no significant different.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@eryksun is right. But it may be simpler to merge the current solution and fix issues with releasing memoryviews in all places uniformly later.
Fix this in another issue.
This PR is stale because it has been open for 30 days with no activity. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ran 258 tests in 27.542s
OK (skipped=1)
== Tests result: SUCCESS ==
Look ok to me.
Lib/test/test_zipfile.py
Outdated
with zipfile.ZipFile(io.BytesIO(), 'w') as zip: | ||
with zip.open('data', 'w') as data: | ||
self.assertEqual(data.write(q), LENGTH) | ||
self.assertEqual(data._file_size, LENGTH) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
_file_size
is a private attribute. It is better to use a public API in tests. For example read the content of the file.
Lib/test/test_zipfile.py
Outdated
@@ -1718,6 +1719,14 @@ def test_non_existent_file_raises_OSError(self): | |||
# quickly. | |||
self.assertRaises(OSError, zipfile.ZipFile, TESTFN) | |||
|
|||
def test_issue44439(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be better to move this test to AbstractWriterTests and test with different compressions.
if isinstance(data, (bytes, bytearray)): | ||
nbytes = len(data) | ||
else: | ||
data = memoryview(data) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@eryksun is right. But it may be simpler to merge the current solution and fix issues with releasing memoryviews in all places uniformly later.
@@ -0,0 +1,2 @@ | |||
Fix in `_ZipWriteFile.write()` method, when the input data is an object that |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
_ZipWriteFile
is a private class. It would be better to reword the NEWS entry in terms of the public API.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. LGTM.
Thanks for your review. Close and reopen to trigger tests. |
@serhiy-storchaka |
Thanks @animalize for the PR, and @serhiy-storchaka for merging it 🌮🎉.. I'm working now to backport this PR to: 3.9. |
Thanks @animalize for the PR, and @serhiy-storchaka for merging it 🌮🎉.. I'm working now to backport this PR to: 3.10. |
GH-31755 is a backport of this pull request to the 3.9 branch. |
…thonGH-29468) Co-authored-by: Marco Ribeiro <[email protected]> (cherry picked from commit 36dd739) Co-authored-by: Ma Lin <[email protected]>
GH-31756 is a backport of this pull request to the 3.10 branch. |
…thonGH-29468) Co-authored-by: Marco Ribeiro <[email protected]> (cherry picked from commit 36dd739) Co-authored-by: Ma Lin <[email protected]>
…-29468) Co-authored-by: Marco Ribeiro <[email protected]> (cherry picked from commit 36dd739) Co-authored-by: Ma Lin <[email protected]>
…-29468) Co-authored-by: Marco Ribeiro <[email protected]> (cherry picked from commit 36dd739) Co-authored-by: Ma Lin <[email protected]>
…thonGH-29468) Co-authored-by: Marco Ribeiro <[email protected]> (cherry picked from commit 36dd739) Co-authored-by: Ma Lin <[email protected]>
https://bugs.python.org/issue44439