Skip to content

bpo-44439: _ZipWriteFile.write() handle buffer protocol correctly #29468

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 6 commits into from Mar 8, 2022
Merged

bpo-44439: _ZipWriteFile.write() handle buffer protocol correctly #29468

merged 6 commits into from Mar 8, 2022

Conversation

ghost
Copy link

@ghost ghost commented Nov 8, 2021

wjssz and others added 2 commits November 8, 2021 20:42
No longer use len() to get the length of the input data. For some buffer protocol objects,
the length obtained by using len() is wrong.

Co-authored-by: Marco Ribeiro <[email protected]>
Getting `self->unconsumed_tail` before acquiring the thread lock may mix up decompress state.
if isinstance(data, (bytes, bytearray)):
nbytes = len(data)
else:
data = memoryview(data)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's recommended to manually release a memoryview. Implementations other than CPython may use a garbage collector that doesn't immediately finalize unreferenced objects. Maybe it's simpler to always use a memoryview:

        # Accept any data that supports the buffer protocol
        with memoryview(data) as data:
            nbytes = data.nbytes
            self._file_size += nbytes
            self._crc = crc32(data, self._crc)
            if self._compressor:
                data = self._compressor.compress(data)
                self._compress_size += len(data)
            self._fileobj.write(data)
            return nbytes

Copy link
Author

@ghost ghost Nov 8, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point.
There are some sites in stdlib have this problem, maybe they can be solved in another issue together.
In addition, I suspect if other Python implementations don't release the underlying buffer of memoryview, the program will run abnormally.

Let @serhiy-storchaka decide.

update: I ran a benchmark, the performances are no significant different.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@eryksun is right. But it may be simpler to merge the current solution and fix issues with releasing memoryviews in all places uniformly later.

@github-actions
Copy link

This PR is stale because it has been open for 30 days with no activity.

@github-actions github-actions bot added the stale Stale PR or inactive for long period of time. label Dec 15, 2021
Copy link
Contributor

@MaxwellDupre MaxwellDupre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ran 258 tests in 27.542s

OK (skipped=1)

== Tests result: SUCCESS ==
Look ok to me.

with zipfile.ZipFile(io.BytesIO(), 'w') as zip:
with zip.open('data', 'w') as data:
self.assertEqual(data.write(q), LENGTH)
self.assertEqual(data._file_size, LENGTH)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_file_size is a private attribute. It is better to use a public API in tests. For example read the content of the file.

@@ -1718,6 +1719,14 @@ def test_non_existent_file_raises_OSError(self):
# quickly.
self.assertRaises(OSError, zipfile.ZipFile, TESTFN)

def test_issue44439(self):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be better to move this test to AbstractWriterTests and test with different compressions.

if isinstance(data, (bytes, bytearray)):
nbytes = len(data)
else:
data = memoryview(data)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@eryksun is right. But it may be simpler to merge the current solution and fix issues with releasing memoryviews in all places uniformly later.

@@ -0,0 +1,2 @@
Fix in `_ZipWriteFile.write()` method, when the input data is an object that
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_ZipWriteFile is a private class. It would be better to reword the NEWS entry in terms of the public API.

@serhiy-storchaka serhiy-storchaka removed the stale Stale PR or inactive for long period of time. label Mar 7, 2022
Copy link
Member

@serhiy-storchaka serhiy-storchaka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. LGTM.

@ghost
Copy link
Author

ghost commented Mar 7, 2022

Thanks for your review.

Close and reopen to trigger tests.

@ghost ghost closed this Mar 7, 2022
@ghost ghost reopened this Mar 7, 2022
@ghost
Copy link
Author

ghost commented Mar 7, 2022

@serhiy-storchaka
I modified the NEWS file slightly, the tests are all green now.

@serhiy-storchaka serhiy-storchaka merged commit 36dd739 into python:main Mar 8, 2022
@serhiy-storchaka serhiy-storchaka added needs backport to 3.9 only security fixes needs backport to 3.10 only security fixes type-bug An unexpected behavior, bug, or error labels Mar 8, 2022
@miss-islington
Copy link
Contributor

Thanks @animalize for the PR, and @serhiy-storchaka for merging it 🌮🎉.. I'm working now to backport this PR to: 3.9.
🐍🍒⛏🤖

@miss-islington
Copy link
Contributor

Thanks @animalize for the PR, and @serhiy-storchaka for merging it 🌮🎉.. I'm working now to backport this PR to: 3.10.
🐍🍒⛏🤖

@bedevere-bot
Copy link

GH-31755 is a backport of this pull request to the 3.9 branch.

@bedevere-bot bedevere-bot removed the needs backport to 3.9 only security fixes label Mar 8, 2022
miss-islington pushed a commit to miss-islington/cpython that referenced this pull request Mar 8, 2022
…thonGH-29468)

Co-authored-by: Marco Ribeiro <[email protected]>
(cherry picked from commit 36dd739)

Co-authored-by: Ma Lin <[email protected]>
@bedevere-bot bedevere-bot removed the needs backport to 3.10 only security fixes label Mar 8, 2022
@bedevere-bot
Copy link

GH-31756 is a backport of this pull request to the 3.10 branch.

miss-islington pushed a commit to miss-islington/cpython that referenced this pull request Mar 8, 2022
…thonGH-29468)

Co-authored-by: Marco Ribeiro <[email protected]>
(cherry picked from commit 36dd739)

Co-authored-by: Ma Lin <[email protected]>
miss-islington added a commit that referenced this pull request Mar 8, 2022
…-29468)

Co-authored-by: Marco Ribeiro <[email protected]>
(cherry picked from commit 36dd739)

Co-authored-by: Ma Lin <[email protected]>
miss-islington added a commit that referenced this pull request Mar 8, 2022
…-29468)

Co-authored-by: Marco Ribeiro <[email protected]>
(cherry picked from commit 36dd739)

Co-authored-by: Ma Lin <[email protected]>
@ghost ghost deleted the fix_compression branch March 8, 2022 12:11
hello-adam pushed a commit to hello-adam/cpython that referenced this pull request Jun 2, 2022
…thonGH-29468)

Co-authored-by: Marco Ribeiro <[email protected]>
(cherry picked from commit 36dd739)

Co-authored-by: Ma Lin <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type-bug An unexpected behavior, bug, or error
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants