Feature request: "view source" tool for inspecting package contents #5118

Open
simonw opened this issue Nov 27, 2018 · 11 comments

Comments

@simonw

simonw commented Nov 27, 2018

The recent event-stream problem on npm highlighted an issue that is also relevant to PyPI: even if a package links to a GitHub repository, there is no guarantee that the code in the uploaded package matches the code in the repo.

One way this could be helped is for PyPI to provide a "view package contents" link next to each downloadable archive that opens a web interface for browsing the files in that package.

This could make it easier to spot deliberate exploits, but would also be a useful general feature for people who want to quickly understand more about the details of a package before they install it.

@simonw
Author

simonw commented Nov 27, 2018

There are quite a few challenges in building such a feature.

For smaller packages, writing code which pulls a .tar.gz and turns it into a file listing / visible source code in response to an incoming HTTP request would be feasible (and highly cacheable via Varnish), but this probably won't work for larger files: pulling a 100MB .tar.gz and decompressing it on demand may not be feasible. Can we get numbers on the average size of packages and how many outliers there are?

For those larger packages, maybe this will require extra processing on upload. This could be expensive in terms of both CPU and storage, and could open up zip bomb exploits if not implemented carefully.

@simonw
Author

simonw commented Nov 27, 2018

One problem that is more specific to PyPI is that some packages can be uploaded in multiple formats - different wheels for example. Malicious code could potentially be hidden in just one of the wheel variants.

A really great implementation of this feature would also highlight differences between the contents of those different packages. This becomes not just a more complex implementation challenge but a UI design challenge as well.

@simonw
Author

simonw commented Nov 27, 2018

... and while I'm throwing around crazy ideas: a really neat implementation of this would include a way to render diffs between different versions.

Now we are re-implementing a non-trivial portion of GitHub!
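For a single file, the diffing part is fairly cheap to prototype with the standard library's difflib. The sketch below assumes the two versions of the file have already been pulled out of their archives as text; `diff_versions` and its parameters are hypothetical names for illustration.

```python
import difflib


def diff_versions(old_text, new_text, path, old_version, new_version):
    """Return a unified diff of one file as it appears in two releases."""
    old_lines = old_text.splitlines(keepends=True)
    new_lines = new_text.splitlines(keepends=True)
    return "".join(
        difflib.unified_diff(
            old_lines,
            new_lines,
            fromfile=f"{old_version}/{path}",
            tofile=f"{new_version}/{path}",
        )
    )


if __name__ == "__main__":
    print(diff_versions("a = 1\nb = 2\n", "a = 1\nb = 3\n", "pkg/mod.py", "1.0", "1.1"))
```

The hard parts (matching renamed files, handling binaries, and rendering the result nicely) are where this starts to look like the GitHub re-implementation mentioned above.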

@simonw
Author

simonw commented Nov 27, 2018

npm COO Laurie Voss says about this suggestion:

One issue you don't mention is that it creates a simply enormous vector for spam and a distribution mechanism for illegal content (such as various illegal forms of pornography). All the solutions I'm aware of for this problem involve expensive teams doing unpleasant jobs.

@di
Member

di commented May 19, 2020

Merging duplicate issue #7877 originally posted by @uranusjr

@uranusjr wrote:

What's the problem this feature will solve?
@pfmoore, @pradyunsg and I were talking about dependency conflict debugging, and it came up as a topic that a significant number of users opt to “read setup.py” when they are looking for dependency information.

This is, however, currently quite awkward to do. The user either needs to find the project’s repository (e.g. GitHub) and hunt for the correct tag/commit, or download the distribution file and extract it manually. It is also difficult for pip to implement a feature to help with the process.

Describe the solution you'd like
A view that allows the user to view the contents of a given distribution (wheel or sdist), and read the content of a given file in the archive. The viewer can be a bare minimal text/plain page, but some basic features like line numbers would be very nice to have.

Each list view and file view should have a unique URL, e.g. I can paste a URL to the browser and read setup.py in Django-3.0.tar.gz, or the METADATA file of a wheel directly. This would be valuable for sharing package information and help with user support.

Additional context
N/A

@di wrote:

Is the goal here to inspect the metadata for a release, or to view arbitrary files in a distribution?

If it's the former, this would be relatively simple to implement. We already have a similar view in the Admin UI:

[Screenshot: release metadata view in the Admin UI]

If it's the latter, it's going to be quite a bit more challenging, as PyPI does not actually extract any files from the distribution archives or do any introspection of them.

@uranusjr wrote:

Metadata extraction would be very useful for wheels, but source inspection is the only way to inspect an sdist. I think both are valuable features, but this issue is more about the latter.
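For the wheel case, metadata extraction can be sketched with the standard library alone, since a wheel is a zip archive and its METADATA file uses an email-header style format. The function name `read_wheel_metadata` and the returned dictionary shape are assumptions for illustration, and only three fields are pulled out here.

```python
import io
import zipfile
from email.parser import BytesParser


def read_wheel_metadata(wheel_bytes):
    """Return name, version and dependencies from a wheel's METADATA file."""
    with zipfile.ZipFile(io.BytesIO(wheel_bytes)) as archive:
        candidates = [
            name for name in archive.namelist()
            if name.endswith(".dist-info/METADATA")
        ]
        if not candidates:
            raise ValueError("no .dist-info/METADATA file found")
        message = BytesParser().parsebytes(archive.read(candidates[0]))
    return {
        "name": message["Name"],
        "version": message["Version"],
        # Requires-Dist may appear many times, or not at all.
        "requires_dist": message.get_all("Requires-Dist") or [],
    }
```

An sdist has no equivalent guarantee, because its dependencies may only be known after running setup.py, which is why reading the source is the fallback there.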

@di
Member

di commented Jun 27, 2022

We've got something like this now: https://inspector.pypi.io/

This isn't anything close to production-grade so I wouldn't recommend pointing a lot of traffic at it, but it provides a way to introspect packages on PyPI, without exposing PyPI to the need to introspect packages.

This isn't integrated into PyPI in any way except for the admin interface, but once it is a little more developed that could be possible.

@simonw
Author

simonw commented Nov 19, 2024

We've got something like this now: https://inspector.pypi.io/

This is really neat, I love it! Exactly the kind of thing I was hoping for here.

If you're worried about traffic load on it, one alternative could be to implement the same thing entirely client-side. PyPI serves wheels etc with open CORS headers, so it's possible for JavaScript in a browser to fetch those packages, decode them and display them.

I built a very basic demo of that here: https://tools.simonwillison.net/zip-wheel-explorer

@di
Member

di commented Nov 20, 2024

I think we're less worried about the traffic and more worried about the risk of extracting or displaying user-submitted content on the pypi.org domain. For example, your demo has an XSS vulnerability: try exploring this wheel and click on the __init__.py file.
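To illustrate the class of problem, here is a minimal sketch of escaping untrusted file contents before placing them in a page, using Python's html.escape. The `render_source` function is hypothetical and is not taken from the demo or from inspector. Escaping is only one mitigation and does not by itself settle the question of which domain such content is served from.

```python
import html


def render_source(filename, text):
    """Return an HTML fragment showing `text` as inert, escaped source code."""
    return (
        f"<h2>{html.escape(filename)}</h2>\n"
        f"<pre><code>{html.escape(text)}</code></pre>"
    )


if __name__ == "__main__":
    print(render_source("__init__.py", "<script>alert(1)</script>"))
```

Both the file name and the file body come from the uploaded archive, so both are escaped here.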

@merwok
Contributor

merwok commented Nov 21, 2024

With trusted publishing, would it be easy to have a link from a PyPI release to the GitHub commit tree?
(I know PyPI has info about the CI run, but I don't know how easily the git info can be obtained for that run.)

@di
Member

di commented Nov 21, 2024

Yep, see #17122 (comment)

@woodruffw
Member

Triaging: I think this is complete per both #5118 (comment) and also the new UI view we have for attestation contents!


4 participants