
expose the offset of a zipfile from the start of the file as a public API #84481

Closed
massimosala mannequin opened this issue Apr 16, 2020 · 30 comments
Labels
3.14 bugs and security fixes stdlib Python modules in the Lib dir type-feature A feature request or enhancement

Comments

@massimosala
Mannequin

massimosala mannequin commented Apr 16, 2020

BPO 40301
Nosy @Yhg1s, @stevendaprano, @serhiy-storchaka, @danifus, @massimosala
Files
  • py27_zipfile.patch: patch for Python 2.7.17
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:

    assignee = None
    closed_at = None
    created_at = <Date 2020-04-16.13:59:27.010>
    labels = ['type-feature', 'library', '3.9']
    title = 'zipfile module: new feature (two lines of code), useful for test, security and forensics'
    updated_at = <Date 2020-04-18.14:35:44.170>
    user = 'https://github.com/massimosala'

    bugs.python.org fields:

    activity = <Date 2020-04-18.14:35:44.170>
    actor = 'massimosala'
    assignee = 'none'
    closed = False
    closed_date = None
    closer = None
    components = ['Library (Lib)']
    creation = <Date 2020-04-16.13:59:27.010>
    creator = 'massimosala'
    dependencies = []
    files = ['49067']
    hgrepos = []
    issue_num = 40301
    keywords = ['patch']
    message_count = 13.0
    messages = ['366597', '366601', '366685', '366689', '366690', '366695', '366709', '366711', '366712', '366713', '366714', '366717', '366720']
    nosy_count = 6.0
    nosy_names = ['twouters', 'alanmcintyre', 'steven.daprano', 'serhiy.storchaka', 'dhillier', 'massimosala']
    pr_nums = []
    priority = 'normal'
    resolution = None
    stage = None
    status = 'open'
    superseder = None
    type = 'enhancement'
    url = 'https://bugs.python.org/issue40301'
    versions = ['Python 3.9']


    @massimosala
    Mannequin Author

    massimosala mannequin commented Apr 16, 2020

    module zipfile

    Tag "Components": I am not sure "Library (Lib)" is the correct one. If it isn't, please fix.

    I use Python to check ZIP files for malware.
    In these files there are binary blobs outside the ZIP archive.
    The malware payload isn't inside the ZIP file structure.
    Example: a file "openme.zip" with this content:
    [blob from offset 0 to offset 5678]
    [ZIP archive from offset 5679 to end of file]

    zipfile already handles this, finding the ZIP structure inside the file.

    My change is just to add a new public property, to expose an internal variable: the file offset of the ZIP structure.
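A minimal sketch of the scenario above, assuming Python 3. It builds a small archive in memory, prepends a 5679-byte blob (standing in for the blob at offsets 0 to 5678), and shows that zipfile still opens it. To read the offset it uses ZipFile.data_offset, the attribute that was eventually added for Python 3.14 in this thread, when it exists; on older versions it falls back to the smallest ZipInfo.header_offset, which zipfile already shifts by the length of any data found before the archive. The helper names make_prefixed_zip and zip_offset are hypothetical.

```python
import io
import zipfile


def make_prefixed_zip(prefix: bytes) -> bytes:
    """Build a one-member ZIP archive in memory and prepend `prefix` to it."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("hello.txt", b"hello world")
    return prefix + buf.getvalue()


def zip_offset(data: bytes):
    """Return the byte offset at which the ZIP structure starts in `data`."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        # Python 3.14+ exposes the offset directly (None when unavailable).
        offset = getattr(zf, "data_offset", None)
        if offset is None:
            # Older versions: local header offsets are already adjusted by the
            # length of the prepended data, so the first header marks the start
            # (assuming the first entry sits at the beginning of the ZIP data).
            offset = min((zi.header_offset for zi in zf.infolist()), default=None)
        return offset


blob = b"\x00" * 5679  # stands in for the non-ZIP blob at offsets 0..5678
data = make_prefixed_zip(blob)
with zipfile.ZipFile(io.BytesIO(data)) as zf:
    print(zf.read("hello.txt"))  # b'hello world'
print(zip_offset(data))  # 5679
```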

    I know I am past the code freeze for Python 2.7.18.
    But the change is really trivial; see the diff.
    I hope you can approve this patch for all Python versions, including 2.7, for consistency. For 2.7 this is the last call.

    @massimosala massimosala mannequin added 3.7 (EOL) end of life 3.8 (EOL) end of life 3.9 only security fixes stdlib Python modules in the Lib dir type-feature A feature request or enhancement labels Apr 16, 2020
    @massimosala massimosala mannequin changed the title zipfile module: new feature (two lines of code) zipfile module: new feature (two lines of code), useful for test, security and forensics Apr 16, 2020
    @stevendaprano
    Member

    This is a new feature and cannot be added to older versions which are in feature-freeze.

    Adding the feature to (say) Python 2.7.18 would be inconsistent, because it wouldn't exist in 2.7.0 through .17. Likewise for all the other versions before 3.9.

    Personally, this sounds like a nice feature to have, and your use-case sounds convincing to me.

    @stevendaprano stevendaprano removed 3.7 (EOL) end of life 3.8 (EOL) end of life labels Apr 16, 2020
    @massimosala
    Mannequin Author

    massimosala mannequin commented Apr 17, 2020

    Hi Steven

    Every software "ecosystem" has its own guidelines, and I am a newbie at
    Python development.

    I see your concerns. I agree with your removal of all Python 3 versions
    before the latest, 3.9.

    About Python 2, I note these facts:

    • there are a lot of forensics tools still written for Python 2;
    • Python 2.7.18 will forever be the last Python 2, and I think it is fine for
      end users to have zipfile with this feature in both Python 2.7 and 3.9;
    • in the code there isn't any new routine to test: the change just
      exposes one internal variable.

    I agree my request is an exception, but I think you have to agree this
    situation is exceptional.
    IMHO rules must exist to help us, and I think this request doesn't carry any
    burden.

    I ask you, please,

    • to reconsider my request;
    • in any case, to put me in contact with the zipfile maintainers; I don't know how to
      reach them, but I want to hear from them about this.

    Many thanks, Massimo

    @danifus
    Mannequin

    danifus mannequin commented Apr 18, 2020

    Could something similar be achieved by looking for the earliest file header offset?

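    def find_earliest_header_offset(zf):
        """Return the smallest local-header offset among the entries of the
        zipfile.ZipFile `zf` (measured from the start of the file), or None
        if the archive has no entries."""
        earliest_offset = None
        for zinfo in zf.infolist():
            if earliest_offset is None:
                earliest_offset = zinfo.header_offset
            else:
                earliest_offset = min(zinfo.header_offset, earliest_offset)
        return earliest_offset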

    You could also adapt this using

    zinfo.compress_size + len(zinfo.FileHeader())
    

    to see if there were any sections inside the archive which were not referenced from the central directory. Not sure if zip files with arbitrary bytes inside the archive would be valid everywhere, but I think they are when using zipfile.

    You can also have zipped content inside an archive which has a valid file header but no reference from the central directory. Those entries are discoverable by implementations which process content serially from the start of the file, but not by implementations which rely on the central directory.
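A sketch of the adaptation suggested above, assuming each entry occupies len(zinfo.FileHeader()) + zinfo.compress_size bytes starting at zinfo.header_offset. It reports byte ranges before or between entries that no central-directory entry accounts for, which is where a prepended blob or an unreferenced local entry would show up. This is a heuristic: FileHeader() rebuilds the local header from central-directory metadata, so the estimate can be off when the stored local extra field differs or a data descriptor follows the data. The names find_gaps and gaps_of are hypothetical, and the region between the last entry and the central directory is not inspected.

```python
import io
import zipfile


def find_gaps(zf):
    """Return (start, end) byte ranges before or between local entries that no
    central-directory entry accounts for (heuristic, see the lead-in)."""
    gaps = []
    pos = 0  # end of the last byte range known to belong to an entry
    for zinfo in sorted(zf.infolist(), key=lambda zi: zi.header_offset):
        if zinfo.header_offset > pos:
            gaps.append((pos, zinfo.header_offset))
        entry_end = zinfo.header_offset + len(zinfo.FileHeader()) + zinfo.compress_size
        pos = max(pos, entry_end)
    return gaps


def gaps_of(data: bytes):
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        return find_gaps(zf)


# Demo: a two-member archive, with and without 40 junk bytes prepended.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as out:
    out.writestr("a.txt", b"alpha")
    out.writestr("b.txt", b"beta")
clean = buf.getvalue()
junk = b"JUNK" * 10

print(gaps_of(clean))         # []
print(gaps_of(junk + clean))  # [(0, 40)]
```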

    @stevendaprano
    Member

    Sorry Massimo, there are no new features being added to 2.7, not even
    critical security fixes. That's not my decision.

    https://www.python.org/doc/sunset-python-2/

    Python 2 is effectively now a dead project from the point of view of us
    here at CPython. The very final bug fix release, 2.7.18, is due out any
    time soon, but it only includes fixes up to 1st of January.

    You could try submitting your feature request to third-party bundlers of
    2.7, such as Red Hat, but I expect they will say no to new features.

    For what it is worth, I don't agree that this situation is exceptional.
    Even if 2.7 wasn't obsolete, this is still a new feature. If we made an
    exception for you, then people using Python 2.7 still couldn't use this
    feature: myzipfile.offset would fail on code using Python 2.7, 2.7.1,
    2.7.2, 2.7.3, ... 2.7.17 and only work with 2.7.18. Nobody could use it
    unless their application required 2.7.18.

    If you want this in 2.7 for your own personal use, wait for the 2.7.18
    final release, then add it into your personal copy. It is open source
    and you are completely permitted!

    @serhiy-storchaka
    Member

    I am not sure it would help you. There are legitimate files which contain a payload followed by the ZIP archive (self-extracting archives, programs with embedded ZIP archives). And the malware can make the offset of the ZIP archive be zero.

    If you want to check whether the file looks like an executable, analyze the first few bytes of the file. All executable files should start with one of the well-known signatures, otherwise the OS would not know how to execute them and they would not be malware.
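A minimal sketch of that check, assuming the first four bytes are enough to decide. The signature table is illustrative and deliberately incomplete (PE/DOS, ELF, 64-bit little-endian Mach-O and shebang scripts only); EXECUTABLE_MAGIC, executable_kind and file_executable_kind are hypothetical names.

```python
# Illustrative, incomplete table of leading-byte signatures.
EXECUTABLE_MAGIC = (
    (b"MZ", "DOS/Windows (PE) executable"),
    (b"\x7fELF", "ELF executable"),
    (b"\xcf\xfa\xed\xfe", "64-bit Mach-O executable"),
    (b"#!", "script with a shebang line"),
)


def executable_kind(head: bytes):
    """Describe the executable format `head` (a file's first bytes) matches,
    or return None if no known signature matches."""
    for magic, kind in EXECUTABLE_MAGIC:
        if head.startswith(magic):
            return kind
    return None


def file_executable_kind(path):
    """Read only the first four bytes of the file at `path` and classify them."""
    with open(path, "rb") as f:
        return executable_kind(f.read(4))
```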

    @massimosala
    Mannequin Author

    massimosala mannequin commented Apr 18, 2020

    On Sat, 18 Apr 2020 at 04:37, Steven D'Aprano [email protected]
    wrote:
    If we made an exception for you, then people using Python 2.7 still
    couldn't use this feature:
    myzipfile.offset would fail on code using Python 2.7, 2.7.1, 2.7.2,
    2.7.3, ... 2.7.17 and only work with 2.7.18.
    Nobody could use it unless their application required 2.7.18.

    Yes, it seems obvious to me that it would work only with Python 2.7.18, and I see
    no problem with that.
    If you need new features, you always have to update (to a new MINOR version
    or, like you said, a new MAJOR version).

    I am used to other software where some features are backported to older
    versions, and IMHO it is very useful.
    Sometimes you just need a specific feature and it isn't possible to update
    to a new MAJOR version.
    You have to consider that there is a lot of legacy software, also in business, and
    a version leap means a lot of work and tests.

    Speaking in general, not only about Python: if the maintainers backport that
    specific feature, bingo! You only have to update to a new MINOR version
    within the same MAJOR version. And this is good for the user base; there isn't "one size
    fits all".
    I shot my bullet, but I cannot change python.org's way of life.

    Steven, many thanks for your answers and your patience in explaining.
    BTW, yes, I will patch the Python 2.7 sources and compile them... also on legacy
    intranet CentOS 5 servers that we cannot update :-)

    @massimosala
    Mannequin Author

    massimosala mannequin commented Apr 18, 2020

    Hi Serhiy

    Thanks for the suggestion, but I don't need to analyse different
    self-extraction payloads (and I think it is always unreliable; there are
    too many self-extractors in the wild).

    A few words about my work.

    I analyze ZIP archives because they are also the container format of Microsoft
    OOXML and OpenOffice OASIS ODF documents.

    I always find that files of these kinds with a non-zero offset aren't strictly
    compliant documents (according to their respective file format specifications).
    Sometimes there is a self-extractor, sometimes there are pieces of malware
    blobs (outside the ZIP structure or inside it, in the compressed files),
    sometimes other errors.

    For us, checking the offset is very effective: we discard "bad" documents at
    maximum speed before any other checks, and it is more reliable than
    antivirus (which checks against specific blob signatures that change all the time).
    With just a single test we have a 100% go/no-go result. Every colleague
    grasps this check; there are no hard-to-read, hard-to-maintain routines.
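The go/no-go rule described above could be sketched as follows, assuming the offset is taken as the smallest local header offset (which zipfile reports relative to the start of the file) and that anything zipfile cannot parse counts as "no-go". zipfile.BadZipFile is the exception zipfile raises for data it does not recognize as an archive; badly damaged input might surface other exceptions as well, so a production check may need a broader except clause. accept_document is a hypothetical name.

```python
import io
import zipfile


def accept_document(file) -> bool:
    """Go/no-go: accept only ZIP containers whose structure starts at byte 0.

    `file` may be a path or a binary file object. Unreadable, malformed and
    empty archives are rejected instead of raising.
    """
    try:
        with zipfile.ZipFile(file) as zf:
            offsets = [zi.header_offset for zi in zf.infolist()]
    except (zipfile.BadZipFile, OSError):
        return False
    return bool(offsets) and min(offsets) == 0


# Demo with in-memory data.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("mimetype", "application/vnd.oasis.opendocument.text")
doc = buf.getvalue()

print(accept_document(io.BytesIO(doc)))            # True
print(accept_document(io.BytesIO(b"blob" + doc)))  # False
print(accept_document(io.BytesIO(b"not a zip")))   # False
```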

    Massimo

    @massimosala
    Mannequin Author

    massimosala mannequin commented Apr 18, 2020

    Hi Daniel

    Could you please elaborate on the advantages of your loop over my two lines
    of code?
    I don't get it...

    Thanks, Massimo

    @danifus
    Mannequin

    danifus mannequin commented Apr 18, 2020

    Hi Massimo,

    Unless I'm missing something about your requirements, the advantage is that
    it already works in python 2.7 so there is no need to patch Python. Just
    bundle the above function with your analysis tool and you're good to go.

    Cheers,
    Dan

    @serhiy-storchaka
    Member

    Just check the first 4 bytes of the file. In a "normal" ZIP archive they are b'PK\3\4' (or b'PK\5\6' if it is empty). It is as reliable as checking the offset, and more efficient. It is even more reliable, because malware can have a zero ZIP archive offset, but it cannot start with b'PK\3\4'.
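The first-bytes check might look like this. Note that b'PK\3\4' (octal escapes) and b'PK\x03\x04' are the same bytes. starts_like_plain_zip and archive_bytes are hypothetical names; on a real file the check reads only the first four bytes, e.g. starts_like_plain_zip(open(path, "rb").read(4)).

```python
import io
import zipfile

LOCAL_FILE_HEADER = b"PK\x03\x04"   # same bytes as b'PK\3\4'
END_OF_CENTRAL_DIR = b"PK\x05\x06"  # what an empty archive starts with


def starts_like_plain_zip(head: bytes) -> bool:
    """True if the first four bytes look like an archive with no prefix."""
    return head[:4] in (LOCAL_FILE_HEADER, END_OF_CENTRAL_DIR)


def archive_bytes(names):
    """Build an in-memory archive containing empty members called `names`."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name in names:
            zf.writestr(name, b"")
    return buf.getvalue()


print(starts_like_plain_zip(archive_bytes(["a.txt"])))          # True
print(starts_like_plain_zip(archive_bytes([])))                 # True (empty archive)
print(starts_like_plain_zip(b"MZ" + archive_bytes(["a.txt"])))  # False
```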

    @massimosala
    Mannequin Author

    massimosala mannequin commented Apr 18, 2020

    Sorry, I forgot to mention one specific case.
    We have valid archives with a leading "blob": digitally signed ZIP files,
    whose filename extension is ".zip.p7m".

    I agree your tip can be useful to other readers.
    Best regards, Sala

    @massimosala
    Mannequin Author

    massimosala mannequin commented Apr 18, 2020

    I chose to use the internal variable *concat* because

    • if I remember correctly, it is computed before the subsequent routines run;
    • I didn't think of your solution (!); there was a very nice computed variable right in
      front of my eyes.
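For reference, zipfile derives concat from the end-of-central-directory (EOCD) record: the record's position in the file, minus the central directory size, minus the central directory offset stored in the record (which is relative to the start of the ZIP data). The sketch below redoes that arithmetic by hand. It is simplified: it assumes a non-Zip64 archive (for Zip64, zipfile additionally subtracts the sizes of the Zip64 end record and its locator) and takes the last EOCD signature in the searched window without validating the comment. zip_prefix_length is a hypothetical name.

```python
import io
import struct
import zipfile

EOCD_SIGNATURE = b"PK\x05\x06"
EOCD_SIZE = 22        # fixed-size part of the EOCD record, signature included
MAX_COMMENT = 0xFFFF  # the archive comment length is a 16-bit field


def zip_prefix_length(data: bytes) -> int:
    """Number of bytes preceding the ZIP data (what zipfile calls `concat`).

    Simplified sketch: non-Zip64 archives only, no comment validation.
    """
    search_from = max(0, len(data) - EOCD_SIZE - MAX_COMMENT)
    eocd_pos = data.rfind(EOCD_SIGNATURE, search_from)
    if eocd_pos < 0:
        raise ValueError("no end-of-central-directory record found")
    # After the 4-byte signature: disk number, disk with central directory,
    # entries on this disk, total entries (4 x uint16), central directory
    # size and offset (2 x uint32), comment length (uint16).
    fields = struct.unpack("<HHHHIIH", data[eocd_pos + 4:eocd_pos + EOCD_SIZE])
    cd_size, cd_offset = fields[4], fields[5]
    return eocd_pos - cd_size - cd_offset


buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("hello.txt", b"hello world")
archive = buf.getvalue()

print(zip_prefix_length(archive))               # 0
print(zip_prefix_length(b"x" * 100 + archive))  # 100
```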

    Mmh

    1. Reliability
      I cannot be sure this always runs on malformed files:
      for zinfo in zf.infolist():

    We can use try / except, but then we lose the computation.
    If *concat* is already computed (except for completely damaged files), IMHO my
    solution is better.

    2. Performance
      What is the performance for big files?
      Are there file seeks due to traversing zf.infolist()?

    Daniel wrote:
    the advantage is that it already works in python 2.7 so there is no need
    to patch Python

    Yes, indeed.

    If I am right about the pros of my patch, I stand for it.

    Many thanks for your attention.

    @gpshead
    Member

    gpshead commented May 4, 2023

    Skimming the issue, I think what was being asked for here is a way to expose the offset of the zipfile from the start of the file as a documented public API? Is that accurate? Does anyone still want this feature?

    A PR against main would be useful.

    @gpshead gpshead changed the title zipfile module: new feature (two lines of code), useful for test, security and forensics expose the offset of a zipfile from the start of the file as a public API Dec 21, 2023
    @gpshead gpshead removed the 3.9 only security fixes label Dec 21, 2023
    @emmatyping
    Member

    Does anyone still want this feature?

    I think it is useful to be able to interact with the prefix portion of a file that has a zip file suffix.

    I've opened #132165 with a patch to fix this.

    gpshead pushed a commit that referenced this issue Apr 6, 2025
    * Add ZipFile.data_offset attribute
    
    This attribute provides the offset to zip data from the start of the file, when available.
    
    * Add blurb-it
    
    * Try fixing class ref in NEWS
    @gpshead gpshead added the 3.14 bugs and security fixes label Apr 6, 2025
    @gpshead
    Member

    gpshead commented Apr 6, 2025

    thanks, merged!

    @picnixz
    Member

    picnixz commented Apr 6, 2025

    I'm re-opening this because of that: https://github.com/python/cpython/pull/132165/files#r2030278958.

    @emmatyping Can you take care of either amending the docs or making sure that the attribute is correctly defined (whether it's None or not) independently of the opening mode? TiA

    @befeleme
    Contributor

    befeleme commented Apr 9, 2025

    @emmatyping, how is the offset calculated?

    I'm trying to build Python 3.14.0a7 - I run the tests twice - once, during the build, where the tests pass, and then using the installed Python. In the second case the tests testing the offset fail:

    In the CI run, this is the outcome:

    FAIL: test_data_offset_with_exe_prepended (test.test_zipfile.test_core.TestDataOffsetPrependedZip.test_data_offset_with_exe_prepended)
    ----------------------------------------------------------------------
    Traceback (most recent call last):
      File "/usr/lib64/python3.14/test/test_zipfile/test_core.py", line 3431, in test_data_offset_with_exe_prepended
        self._test_data_offset(self.exe_zip)
        ~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^
      File "/usr/lib64/python3.14/test/test_zipfile/test_core.py", line 3428, in _test_data_offset
        self.assertEqual(zipfp.data_offset, 713)
        ~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^
    AssertionError: 717 != 713
    
    ======================================================================
    FAIL: test_data_offset_with_exe_prepended_zip64 (test.test_zipfile.test_core.TestDataOffsetPrependedZip.test_data_offset_with_exe_prepended_zip64)
    ----------------------------------------------------------------------
    Traceback (most recent call last):
      File "/usr/lib64/python3.14/test/test_zipfile/test_core.py", line 3434, in test_data_offset_with_exe_prepended_zip64
        self._test_data_offset(self.exe_zip64)
        ~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^
      File "/usr/lib64/python3.14/test/test_zipfile/test_core.py", line 3428, in _test_data_offset
        self.assertEqual(zipfp.data_offset, 713)
        ~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^
    AssertionError: 717 != 713

    Do you have any pointers as how to approach this?

    @emmatyping
    Member

    emmatyping commented Apr 9, 2025

    @befeleme can you provide more info on your platform, and the steps you took to run the tests on the installed Python?

    When I run make altinstall on my Python checkout, the tests still pass.

    You can also set self.debug=3 in ZipFile.__init__ and re-run the tests to get more information.

    Also, the code is here: https://github.com/python/cpython/blob/main/Lib/zipfile/__init__.py#L1486-L1494

    @befeleme
    Contributor

    One way to reproduce the issue is to install the built Python packages to a Fedora container.
    I grabbed all *.x86_64.rpm packages from the test build to $PWD and installed them to a fresh Fedora container.

    $ podman run --rm -ti --security-opt label=disable -v $PWD:/src -w /src fedora:rawhide /usr/bin/bash
    # dnf install -y python3.14*
    # python3.14 -m test test_zipfile -vvv
    

    This ends up with the above AssertionError.

    @befeleme
    Contributor

    Output with self.debug = 3 added:

    test_data_offset_with_exe_prepended (test.test_zipfile.test_core.TestDataOffsetPrependedZip.test_data_offset_with_exe_prepended) ... [b'PK\x05\x06', 0, 0, 1, 1, 99, 156, 0, b'', 972]
    given, inferred, offset 156 873 717
    (b'PK\x01\x02', 30, 3, 10, 0, 0, 0, 23573, 20266, 1396662089, 69, 69, 29, 24, 0, 0, 0, 2175008768, 0)
    total 99
    FAIL
    test_data_offset_with_exe_prepended_zip64 (test.test_zipfile.test_core.TestDataOffsetPrependedZip.test_data_offset_with_exe_prepended_zip64) ... [b'PK\x06\x06', 0, 0, 1, 1, 47, 120, 0, b'', 960]
    given, inferred, offset 120 837 717
    (b'PK\x01\x02', 30, 3, 45, 0, 0, 0, 23573, 20266, 1396662089, 69, 69, 1, 0, 0, 0, 0, 2175008768, 0)
    total 47
    FAIL
    

    @emmatyping
    Member

    emmatyping commented Apr 11, 2025

    I'm afraid the tests pass for me. @befeleme What CPU is your host? I am on an x86_64 machine.

    @emmatyping
    Member

    I've tested this on a few more systems I have access to (all x86_64) and I cannot reproduce the failure.

    @befeleme
    Contributor

    I also have an x86_64 machine. I took the latest Fedora Rawhide container today (coming from quay.io), installed Python 3.14.0a7 available in the repositories, and reran test_zipfile, with the same result of two failing tests.

    $ podman run --rm -it -p 8080:8080 fedora:rawhide
    # dnf install -y python3.14 python3.14-test
    # python3.14 -m test test_zipfile -vvv
    

    I see the same failure on my machine, on Fedora CI, and running the container on Debian, x86_64 machine.

    @vstinner
    Member

    The problem is that the Fedora Python specfile changes the shebang of two files:

    • Lib/test/archivetestdata/exe_with_zip
    • Lib/test/archivetestdata/exe_with_z64

    It replaces #!/bin/bash with #!/usr/bin/bash.

    This issue is unrelated to Python itself; it is specific to the Fedora specfile (RPM).
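This also accounts for the exact numbers in the failure: "#!/usr/bin/bash" is four bytes longer than "#!/bin/bash", and 713 + 4 = 717, which is also the third value zipfile printed in the debug output (873 - 156). The sketch below reproduces the effect with a hypothetical shell stub (not the real contents of exe_with_zip): any edit to the bytes in front of the archive moves its offset by the same amount, while zipfile still opens the file because it locates the archive from the end. zip_start is a hypothetical helper name.

```python
import io
import zipfile


def zip_start(data: bytes) -> int:
    """Smallest local header offset, i.e. where the archive begins in `data`."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        return min(zi.header_offset for zi in zf.infolist())


buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("__main__.py", "print(5)\n")
archive = buf.getvalue()

# Hypothetical stub; the real test files contain a longer script.
original = b"#!/bin/bash\necho stub\n" + archive
rewritten = original.replace(b"#!/bin/bash", b"#!/usr/bin/bash", 1)

print(zip_start(original), zip_start(rewritten))  # 22 26
```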

    @picnixz
    Member

    picnixz commented Apr 14, 2025

    I'm closing this issue as completed in this case. Thanks for the help Victor!

    @picnixz picnixz closed this as completed Apr 14, 2025
    @hroncok
    Contributor

    hroncok commented Apr 14, 2025

    Do the test files need to be executable?

    @vstinner
    Member

    Python and the Python test suite don't need the scripts to be executable, but you can run these two test scripts directly.

    Example:

    $ Lib/test/archivetestdata/exe_with_zip $PWD/python
    Opening Lib/test/archivetestdata/exe_with_zip as a zipfile.
    Favorite number in executable: 5
    

    @befeleme
    Contributor

    Would you accept a PR removing the executable bit? (I don't know if Fedora is the only distribution mangling the shebangs, but I imagine this can unexpectedly hit other downstream packagers).

    @hroncok
    Contributor

    hroncok commented Apr 14, 2025

    Python and Python test suite don't need the script to be executable...

    It turns out that test_execute_zip64 and other similar tests do need that.

    seehwan pushed a commit to seehwan/cpython that referenced this issue Apr 16, 2025