Description
Is your feature request related to a problem? Please describe.
When building PyMuPDF from source, the default behavior is to download the MuPDF source tarball from the Internet:
Line 389 in e6e1daa
location = 'https://mupdf.com/downloads/archive/mupdf-1.24.2-source.tar.gz'
However, this tarball is not verified against a signature or a hash. If the MuPDF tarball is ever modified, whether maliciously or unintentionally, the result will be non-reproducible PyMuPDF builds, or downright unsafe ones.
Describe the solution you'd like
It would be a nice improvement to take advantage of the SHA-1 hashes on the MuPDF downloads page. Pinning the expected hash in the build script would ensure reproducibility and protect against supply chain attacks.
We could improve on this further by using SHA-256 hashes (since SHA-1 is considered unsafe) or PGP signatures.
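As a rough illustration of the idea, here is a minimal sketch of a hash check that the build script could run after downloading the tarball and before unpacking it. The function names (sha256_of_file, verify_tarball) are hypothetical and not part of PyMuPDF, and the expected digest would have to be pinned in the repository next to the download URL; no real digest is shown here.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read fixed-size chunks so large tarballs are not loaded into memory at once.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_tarball(path, expected_sha256):
    """Raise ValueError if the file at `path` does not match the pinned digest.

    `expected_sha256` is assumed to be a hex string committed to the
    repository alongside the download URL (hypothetical arrangement).
    """
    actual = sha256_of_file(path)
    if actual != expected_sha256.lower():
        raise ValueError(
            f"SHA-256 mismatch for {path}: expected {expected_sha256}, got {actual}"
        )
```

Because the expected digest lives in the repository rather than on the download page, a change to the tarball on the server would make the build fail instead of silently producing a different package.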
Describe alternatives you've considered
Users can:
- Download the MuPDF source locally.
- Check it against the SHA-1 hash on the website.
- Build the PyMuPDF source using the PYMUPDF_SETUP_MUPDF_TGZ envvar.
This approach has several drawbacks though:
- Environment flags defeat the purpose of reproducibility. A stale envvar means that PyMuPDF will build against an older MuPDF source, and users will most likely not notice it.
- Checking the SHA-1 hash from their browser before building a package is a weak defense mechanism in the case of a compromised site. If the contents of the tarball can change, so can the advertised SHA-1 on the same page.
- It interrupts the common poetry lock -> poetry install (or equivalent) flow that is part of modern Python development.