Check the hash of the downloaded MuPDF tarball #3463

@apyrgio

Description

Is your feature request related to a problem? Please describe.

When building PyMuPDF from source, the default behavior is to download the MuPDF source tarball from the Internet:

location = 'https://mupdf.com/downloads/archive/mupdf-1.24.2-source.tar.gz'

However, this tarball is not verified against a signature or a hash. If the tarball is modified, whether maliciously or unintentionally, PyMuPDF builds become non-reproducible at best and unsafe at worst.

Describe the solution you'd like

It would be a nice improvement to verify the download against the SHA-1 hashes published on the MuPDF downloads page. This would make builds reproducible and add protection against supply chain attacks.

We can improve on this further by using SHA-256 hashes (since SHA-1 is no longer considered collision-resistant) or PGP signatures.
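As a rough illustration of the check proposed above, here is a minimal sketch, not PyMuPDF's actual build code. The names `MUPDF_SHA256`, `file_digest` and `verify_tarball` are hypothetical, and the digest value is a placeholder that would be recorded in the PyMuPDF source tree whenever the MuPDF version is bumped.

```python
import hashlib
import hmac

# URL taken from the issue text; the digest below is a placeholder, not a
# real hash. Pinning it in the repository (an assumption of this sketch)
# means a compromised download site cannot change both tarball and hash.
MUPDF_URL = "https://mupdf.com/downloads/archive/mupdf-1.24.2-source.tar.gz"
MUPDF_SHA256 = "<expected hex digest recorded at version-bump time>"


def file_digest(path, algorithm="sha256", chunk_size=1 << 20):
    """Return the hex digest of a file, reading it in chunks."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_tarball(path, expected_hex, algorithm="sha256"):
    """Raise ValueError if the file's digest differs from the pinned one."""
    actual = file_digest(path, algorithm)
    if not hmac.compare_digest(actual, expected_hex.lower()):
        raise ValueError(
            f"{algorithm} mismatch for {path}: "
            f"expected {expected_hex}, got {actual}"
        )
```

A build script could call `verify_tarball(downloaded_path, MUPDF_SHA256)` right after the download and abort on `ValueError`, before anything is unpacked.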

Describe alternatives you've considered

Users can:

  1. Download the MuPDF source locally.
  2. Check it against the SHA-1 hash on the website.
  3. Build the PyMuPDF source using the PYMUPDF_SETUP_MUPDF_TGZ envvar.
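The manual steps above could be scripted roughly as follows. This is only a sketch: the helper name `env_for_verified_tarball` is hypothetical, the SHA-1 value is whatever the user copied from the downloads page, and passing a local file path as the value of PYMUPDF_SETUP_MUPDF_TGZ is an assumption that should be checked against the setup script of the PyMuPDF version being built.

```python
import hashlib
import os


def env_for_verified_tarball(tarball_path, published_sha1):
    """Return a copy of the environment with PYMUPDF_SETUP_MUPDF_TGZ set,
    but only if the local tarball matches the published SHA-1."""
    h = hashlib.sha1()
    with open(tarball_path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    if h.hexdigest() != published_sha1.strip().lower():
        raise ValueError(f"SHA-1 mismatch for {tarball_path}")

    env = dict(os.environ)
    # Assumption: the envvar accepts a local path in this form.
    env["PYMUPDF_SETUP_MUPDF_TGZ"] = os.path.abspath(tarball_path)
    return env
```

The returned mapping can be passed as `env=` to `subprocess.run` together with whatever build command is in use.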

This approach has several drawbacks though:

  1. Relying on an environment variable undermines reproducibility. A stale envvar means that PyMuPDF will build against an older MuPDF source, and users will most likely not notice.
  2. Checking the SHA-1 hash in a browser before building a package is a weak defense against a compromised site. If the contents of the tarball can change, so can the SHA-1 advertised on the same page.
  3. It interrupts the common poetry lock -> poetry install (or equivalent) flow that is part of modern Python development.
