bpo-44439: _ZipWriteFile.write() handle buffer protocol correctly #29468

ghost · 2021-11-08T12:49:37Z

https://bugs.python.org/issue44439

No longer use len() to get the length of the input data. For some buffer protocol objects, the length obtained by using len() is wrong. Co-authored-by: Marco Ribeiro <[email protected]>
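To illustrate the mismatch described above, here is a minimal sketch. It assumes `array.array` as the input object (chosen only for illustration; other buffer-protocol objects whose items are wider than one byte should behave similarly). `len()` counts items, while `memoryview(...).nbytes` reports the size of the underlying buffer in bytes:

```python
import array

# Five unsigned 64-bit integers; all names here are illustrative only.
q = array.array('Q', [1, 2, 3, 4, 5])

item_count = len(q)  # number of items, not bytes

# nbytes is the byte size of the exported buffer.
with memoryview(q) as view:
    byte_count = view.nbytes

print(item_count, byte_count)
```

Code that uses `len(data)` as a byte count would under-report the size of such an object by a factor of `q.itemsize`.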

Getting `self->unconsumed_tail` before acquiring the thread lock may mix up decompress state.

eryksun · 2021-11-08T13:43:19Z

Lib/zipfile.py

+        if isinstance(data, (bytes, bytearray)):
+            nbytes = len(data)
+        else:
+            data = memoryview(data)


I think it's recommended to manually release a memoryview. Implementations other than CPython may use a garbage collector that doesn't immediately finalize unreferenced objects. Maybe it's simpler to always use a memoryview:

        # Accept any data that supports the buffer protocol
        with memoryview(data) as data:
            nbytes = data.nbytes
            self._file_size += nbytes
            self._crc = crc32(data, self._crc)
            if self._compressor:
                data = self._compressor.compress(data)
                self._compress_size += len(data)
            self._fileobj.write(data)
        return nbytes
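As a rough illustration of why releasing matters, the sketch below (observed CPython behavior; the variable names are made up for this example) shows that a `bytearray` cannot be resized while a memoryview still exports its buffer, and that the `with` form releases the view deterministically instead of relying on the garbage collector:

```python
data = bytearray(b"spam")

# While a view exports the buffer, resizing the bytearray is refused.
view = memoryview(data)
try:
    data.extend(b"!")
    resized_while_exported = True
except BufferError:
    resized_while_exported = False

# After an explicit release, the resize succeeds.
view.release()
data.extend(b"!")

# The context manager releases the view on exit.
with memoryview(data) as scoped_view:
    nbytes = scoped_view.nbytes

try:
    scoped_view.nbytes
    usable_after_with = True
except ValueError:
    # A released memoryview rejects further operations.
    usable_after_with = False
```

On an implementation without reference counting, an unreleased view could keep the export alive for an unpredictable time, which is the concern raised here.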

Good point.
Several places in the stdlib have this problem; maybe they can be fixed together in a separate issue.
In addition, I suspect that if another Python implementation doesn't release the underlying buffer of a memoryview, the program will behave abnormally.

Let @serhiy-storchaka decide.

Update: I ran a benchmark; there is no significant difference in performance.

@eryksun is right. But it may be simpler to merge the current solution and fix issues with releasing memoryviews in all places uniformly later.

Fix this in another issue.

github-actions · 2021-12-15T00:06:24Z

This PR is stale because it has been open for 30 days with no activity.

MaxwellDupre

Ran 258 tests in 27.542s

OK (skipped=1)

== Tests result: SUCCESS ==
Looks OK to me.

serhiy-storchaka · 2022-03-07T10:36:17Z

Lib/test/test_zipfile.py

+        with zipfile.ZipFile(io.BytesIO(), 'w') as zip:
+            with zip.open('data', 'w') as data:
+                self.assertEqual(data.write(q), LENGTH)
+                self.assertEqual(data._file_size, LENGTH)


_file_size is a private attribute. It is better to use a public API in tests. For example read the content of the file.

serhiy-storchaka · 2022-03-07T10:36:21Z

Lib/test/test_zipfile.py

@@ -1718,6 +1719,14 @@ def test_non_existent_file_raises_OSError(self):
        # quickly.
        self.assertRaises(OSError, zipfile.ZipFile, TESTFN)

+    def test_issue44439(self):


It would be better to move this test to AbstractWriterTests and test with different compressions.

serhiy-storchaka · 2022-03-07T10:39:31Z

Lib/zipfile.py

+        if isinstance(data, (bytes, bytearray)):
+            nbytes = len(data)
+        else:
+            data = memoryview(data)


@eryksun is right. But it may be simpler to merge the current solution and fix issues with releasing memoryviews in all places uniformly later.

serhiy-storchaka · 2022-03-07T10:40:56Z

Misc/NEWS.d/next/Library/2021-11-08-20-27-41.bpo-44439.I_8qro.rst

@@ -0,0 +1,2 @@
+Fix in `_ZipWriteFile.write()` method, when the input data is an object that


_ZipWriteFile is a private class. It would be better to reword the NEWS entry in terms of the public API.

serhiy-storchaka

Thanks. LGTM.

ghost · 2022-03-07T12:08:13Z

Thanks for your review.

Close and reopen to trigger tests.

ghost · 2022-03-07T13:02:21Z

@serhiy-storchaka
I modified the NEWS file slightly, the tests are all green now.

miss-islington · 2022-03-08T09:34:43Z

Thanks @animalize for the PR, and @serhiy-storchaka for merging it 🌮🎉.. I'm working now to backport this PR to: 3.9.
🐍🍒⛏🤖

miss-islington · 2022-03-08T09:34:43Z

Thanks @animalize for the PR, and @serhiy-storchaka for merging it 🌮🎉.. I'm working now to backport this PR to: 3.10.
🐍🍒⛏🤖

bedevere-bot · 2022-03-08T09:35:02Z

GH-31755 is a backport of this pull request to the 3.9 branch.

…thonGH-29468) Co-authored-by: Marco Ribeiro <[email protected]> (cherry picked from commit 36dd739) Co-authored-by: Ma Lin <[email protected]>

bedevere-bot · 2022-03-08T09:35:07Z

GH-31756 is a backport of this pull request to the 3.10 branch.

…thonGH-29468) Co-authored-by: Marco Ribeiro <[email protected]> (cherry picked from commit 36dd739) Co-authored-by: Ma Lin <[email protected]>

wjssz and others added 2 commits November 8, 2021 20:42

1. _ZipWriteFile.write() handle buffer protocol correctly

f53ae11

No longer use len() to get the length of the input data. For some buffer protocol objects, the length obtained by using len() is wrong. Co-authored-by: Marco Ribeiro <[email protected]>

2. zlib: fix thread lock may go wrong in rare cases

d103eb0

Getting `self->unconsumed_tail` before acquiring the thread lock may mix up decompress state.

the-knights-who-say-ni added the CLA signed label Nov 8, 2021

bedevere-bot added the awaiting review label Nov 8, 2021

eryksun reviewed Nov 8, 2021

Revert "2. zlib: fix thread lock may go wrong in rare cases"

d6d5548

Fix this in another issue.

github-actions bot added the stale label Dec 15, 2021

MaxwellDupre approved these changes Mar 3, 2022

bedevere-bot added awaiting core review and removed awaiting review labels Mar 3, 2022

serhiy-storchaka reviewed Mar 7, 2022

serhiy-storchaka removed the stale label Mar 7, 2022

address comments

6ba267a

serhiy-storchaka approved these changes Mar 7, 2022

bedevere-bot added awaiting merge and removed awaiting core review labels Mar 7, 2022

add a newline to NEWS

bdb8487

ghost closed this Mar 7, 2022

ghost reopened this Mar 7, 2022

fix NEWS

451e24f

serhiy-storchaka merged commit 36dd739 into python:main Mar 8, 2022

bedevere-bot removed the awaiting merge label Mar 8, 2022

serhiy-storchaka added the needs backport to 3.9, needs backport to 3.10, and type-bug labels Mar 8, 2022

bedevere-bot removed the needs backport to 3.9 label Mar 8, 2022

bedevere-bot removed the needs backport to 3.10 label Mar 8, 2022

ghost deleted the fix_compression branch March 8, 2022 12:11

animalize mannequin mentioned this pull request Apr 12, 2022

stdlib wrongly uses len() for bytes-like object #88605

Closed
