Skip to content

[pypi] Support pulling dependencies using direct url specifiers without using an index #2363

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
mering opened this issue Oct 31, 2024 · 6 comments · Fixed by #2871
Closed
Assignees

Comments

@mering
Copy link
Contributor

mering commented Oct 31, 2024

🐞 bug report

Affected Rule

Module extension:

pip = use_extension("@rules_python//python/extensions:pip.bzl", "pip")
pip.parse(...)

Is this a regression?

Not sure.

Description

The repository rule for pip dependencies doesn't use repository_ctx.download(). This breaks usages of --experimental_downloader_config for example to use a transparent dependency mirror.

🔬 Minimal Reproduction

https://github.com/bazelbuild/rules_python/blob/b6fc2a080d4884510dea02cc77b4e0b8fd3a0ccd/python/private/pypi/whl_library.bzl#L245-L256

https://github.com/bazelbuild/rules_python/blob/b6fc2a080d4884510dea02cc77b4e0b8fd3a0ccd/python/private/pypi/whl_installer/wheel_installer.py#L168-L190

🔥 Exception or Error

Stack trace prints to the code linked above.

🌍 Your Environment

Operating System:

Linux

Output of bazel version:

7.0.2

Rules_python version:

0.37.2

Anything else relevant?

@aignas
Copy link
Collaborator

aignas commented Nov 3, 2024

Please see the documentation here: https://rules-python.readthedocs.io/en/latest/pypi-dependencies.html#bazel-downloader-and-multi-platform-wheel-hub-repository.

This currently has a limitation that it needs hashes for all of the files in the requirements.txt. The stabilization of this feature is tracked by #260.

@aignas aignas closed this as not planned Won't fix, can't repro, duplicate, stale Nov 3, 2024
@mering
Copy link
Contributor Author

mering commented Nov 4, 2024

@aignas How can I use experimental_index_url in combination with dependencies without an index like the following:

proto-breaking-change-detector @ https://github.com/googleapis/proto-breaking-change-detector/archive/refs/tags/v2.4.0.tar.gz
tensorflow_graphics @ git+https://github.com/tensorflow/graphics@ca63cf788978db0bde2db4ce488e102c455da2e7

@aignas
Copy link
Collaborator

aignas commented Nov 14, 2024

The support for direct references has not been added yet, I'll reopen this issue and rename it to scope it to supporting it, but I am not sure when I can work on it.

proto-breaking-change-detector @ https://github.com/googleapis/proto-breaking-change-detector/archive/refs/tags/v2.4.0.tar.gz could be easy enough. Outline of the changes:

For the second one it may be harder - I am not sure how to handle git repos yet, however, with bazel 7.4 the extension can pass a labels within the extension more easily, so one could look into how to do that.

For now the workaround would be to build the wheels from sdists outside bazel and host them in some PyPI compatible artifact mirror and then use experimental_index_url flag to enable the usage of the bazel-downloader.

@aignas aignas reopened this Nov 14, 2024
@aignas aignas changed the title pip rules not using rctx.download() [pypi] Support pulling dependencies without using an index Nov 14, 2024
@aignas aignas changed the title [pypi] Support pulling dependencies without using an index [pypi] Support pulling dependencies using direct url specifiers without using an index Nov 14, 2024
github-merge-queue bot pushed a commit that referenced this issue Mar 12, 2025
This PR adds support for installing wheels via direct urls in the
requirements lock file:
```
foo==0.0.1 @ https://someurl.org/package.whl
bar==0.0.1 @ https://someurl.org/package.tar.gz
```
This is to improve parity between bazel downloader and pip behavior.
Before this change, direct urls used fallback to pip install.

Partially addresses #2363 as it does not add support for git urls.
github-merge-queue bot pushed a commit that referenced this issue Apr 2, 2025
…ied (#2695)

This pull request modifies the SimpleAPI HTML parsing to add a new
field where we can get the `sha256` values by package version. This
allows us to very easily fallback to all packages of a particular
version when using `experimental_index_url` if the hashes are not
specified.

The code deciding which packages to query the SimpleAPI for has been
also modified to only omit queries for packages that are included via
direct URL references.

If we fail to get the data from the SimpleAPI, we will fallback to
`pip` and try to install it via the legacy behaviour.

Fixes #2023
Work towards #260
Work towards #1357
Work towards #2363

---------

Co-authored-by: Ignas Anikevicius <[email protected]>
@aignas aignas self-assigned this Apr 3, 2025
aignas added a commit to aignas/rules_python that referenced this issue Apr 3, 2025
Whilst integrating bazel-contrib#2695 I introduced a regression and here I add a test
for that and fix it. The code that was getting the filename from the URL
was too eager and would break if there was a git ref as noted in the
test.

Before this commit and bazel-contrib#2695 the code was not handling all of the cases
that are tested now either, so I think now we are in a good place. I am
not sure how we should handle the `git_repository` URLs. Maybe having
`http_archive` and `git_repository` usage would be nice, but I am not sure
how we can introduce it at the moment.

Work towards bazel-contrib#2363
github-merge-queue bot pushed a commit that referenced this issue Apr 5, 2025
Whilst integrating #2695 I introduced a regression and here I add a test
for that and fix it. The code that was getting the filename from the URL
was too eager and would break if there was a git ref as noted in the
test.

Before this commit and #2695 the code was not handling all of the cases
that are tested now either, so I think now we are in a good place. I am
not sure how we should handle the `git_repository` URLs. Maybe having
`http_archive` and `git_repository` usage would be nice, but I am not
sure
how we can introduce it at the moment.

Work towards #2363
@aignas
Copy link
Collaborator

aignas commented Apr 8, 2025

As @mering suggests the git_repository archive is downloading things by shelling out to the git executable source. I am not entirely sure that this is something that we can do easily because duplicating calling git should be out of scope of rules_python.

However, we should support git_repository when pulling sources for sdist building, which we want to officially support via build actions (#2410). Ideally we could just use the http_archive or git_repository to get the sources.

I'd like to mark this as done and add a note to the sdist building ticket for git sources support. Does that sound good to you, @mering?

@mering
Copy link
Contributor Author

mering commented Apr 8, 2025

@aignas thanks for fixing the direct download case 1. When we applied this to our code base, we noted that it still tries to reach out to an index-like URL. Could this be skipped in that case?

I am fine with continuing the conversation about the git repository in the other issue.

aignas added a commit to aignas/rules_python that referenced this issue Apr 29, 2025
Summary:
- Better handle git references for sdists.
- Better handle direct whl references.
- Add an extra test that turned out to be not needed in the end, but I
  left it to increase the code coverage.

Work towards bazel-contrib#2363
Fixes bazel-contrib#2828
github-merge-queue bot pushed a commit that referenced this issue Apr 29, 2025
Summary:
- Better handle git references for sdists.
- Better handle direct whl references.
- Add an extra test that turned out to be not needed in the end, but I
  left it to increase the code coverage.

Work towards #2363
Fixes #2828
aignas added a commit that referenced this issue Apr 29, 2025
Summary:
- Better handle git references for sdists.
- Better handle direct whl references.
- Add an extra test that turned out to be not needed in the end, but I
  left it to increase the code coverage.

Work towards #2363
Fixes #2828

(cherry picked from commit a79bbfa)
aignas added a commit to aignas/rules_python that referenced this issue May 11, 2025
Summary:
- Make the requirement line the same as the one that is used in whls. It
  only contains extras and the version if it is present.
- Add debug log statements if we fail to get the version from a direct
  URL reference.
- Move some tests from `parse_requirements_tests` to
  `index_sources_tests` to improve test
  maintenance.
- Replace the URL encoded `+` to a regular `+` in the filename.
- Correctly handle the case when the `=sha256:` is used in the URL.

I cannot think of anything else that we can do for this as of now, so
will mark the associated issue as resolved.

Fixes bazel-contrib#2363
@aignas
Copy link
Collaborator

aignas commented May 11, 2025

Spent more time on this, could you please check if the linked PR fixes the remaining issues?

github-merge-queue bot pushed a commit that referenced this issue May 12, 2025
…ust (#2871)

Summary:
- Make the requirement line the same as the one that is used in whls. It
  only contains extras and the version if it is present.
- Add debug log statements if we fail to get the version from a direct
  URL reference.
- Move some tests from `parse_requirements_tests` to
  `index_sources_tests` to improve test
  maintenance.
- Replace the URL encoded `+` to a regular `+` in the filename.
- Correctly handle the case when the `=sha256:` is used in the URL.

Once this is merged I plan to tackle #2648 by changing the
`parse_requirements` code to de-duplicate entries returned
by the `parse_requirements` function.

I cannot think of anything else that we can do for this as of now, so
will mark the associated issue as resolved.

Fixes #2363
Work towards #2648
github-merge-queue bot pushed a commit that referenced this issue May 14, 2025
When given .whl file URLs and no file name, `whl_library` writes the
wheel it downloads to a file with the same file name as the first URL's.
But then at extraction time, it always consults ctx.attr.filename
for that file name, leading to failure when that attribute is None.
This patch should fix that.

Related #2363
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants