generated from bazel-contrib/rules-template
fix(py_venv): Repair external repository imports #635
Open: arrdem wants to merge 6 commits into `main` from `arrdem/fix-610`
79e298b to 038e6bc
arrdem added a commit that referenced this pull request on Sep 25, 2025:
As reported by customers, the naive but correct strategy of using copies in `py_venv_*` can lead to laughable disk usage. Some clients are reporting slowdowns on the order of 10 minutes and on the order of 100 GiB of disk wasted copying inputs into binaries. We need a more scalable strategy such as symlinking.

Thankfully we can generate symlinks from tools driven by Bazel into a TreeArtifact so long as the symlinks aren't dangling. By carefully crafting relative symlinks we're able to produce a tree of links which is valid both at and after action time. When relocating a `.runfiles` tree containing such links (for instance into an OCI layer tar) these links must be dereferenced, but that Just Works.

While I'm at it, refactor the venv machinery to operate in terms of strategies and combinators on strategies, so that it's simpler to talk about the production-grade behavior we want, which is:

* `site-packages` trees in 1stparty code get relocated/linked into the venv
* `bin` sibling trees in 1stparty code get relocated/patched into the venv
* General trees in 1stparty code are referred to by `.pth` file entries
* General trees in 3rdparty code get relocated/linked into the venv
* `bin` sibling trees in 3rdparty code get relocated/patched into the venv

This makes the venv builder significantly more flexible, allows for better error reporting and opens the door to more flexible error handling.

Incorporates an implementation of #606, but testing is required. Should include an implementation of #635, but testing is required.

### Changes are visible to end-users: yes

- Searched for relevant documentation and updated as needed: yes
- Breaking change (forces users to change their own code or config): no
- Suggested release notes appear below: yes

`py_venv_*` now use symlinks rather than hard file copies, which radically reduces disk usage while improving venv building performance.

### Test plan

- Covered by existing test cases
- New test cases added
- Manual testing; please provide instructions so we can reproduce: TODO.

### Remaining work

- [x] Strip debug prints
- [x] Improve collision handling
- [x] Rework the command interpreter to implement the last-wins semantics
- [x] Mitigate spooky dangling symlink issues
- [x] Fix a regression which can cause a `site-packages/__init__.py` file to be linked
- [x] Add sha256-sum based collision ignoring
- [ ] Add a test covering that a `site-packages/__init__.py` file will not be linked
- [ ] Add a test covering bin shebang patching
- [ ] Integrate the test case from #635
- [ ] Manually test that linked venvs still work; should just be fine

---------

Co-authored-by: Alexander Payne <[email protected]>
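The relative-symlink idea in the commit message above can be illustrated with a small sketch. This is not the venv builder's code: `link_relative`, `demo`, and the directory layout are hypothetical names chosen for illustration, and the sketch assumes a POSIX filesystem. It only shows why a link whose target is written relative to its own location stays valid when the whole tree is moved.

```python
import os
import tempfile


def link_relative(src: str, dest: str) -> str:
    """Create a symlink at `dest` whose target is `src` expressed relative to dest's directory."""
    dest_dir = os.path.dirname(dest)
    os.makedirs(dest_dir, exist_ok=True)
    # The target is relative to the directory holding the link, so it does
    # not embed the absolute location of the enclosing tree.
    target = os.path.relpath(src, start=dest_dir)
    os.symlink(target, dest)
    return target


def demo():
    """Link a file into a fake venv, then move the whole tree and re-check the link."""
    with tempfile.TemporaryDirectory() as root:
        root = os.path.realpath(root)
        tree = os.path.join(root, "tree")

        # Hypothetical layout: a source file next to a venv inside one tree.
        src = os.path.join(tree, "external", "pkg", "mod.py")
        os.makedirs(os.path.dirname(src))
        with open(src, "w") as f:
            f.write("VALUE = 1\n")

        dest = os.path.join(tree, "venv", "lib", "site-packages", "mod.py")
        target = link_relative(src, dest)
        valid_before = os.path.exists(dest)  # exists() follows the link

        # Relocate the whole tree; the link moves together with its target.
        moved = os.path.join(root, "moved")
        os.rename(tree, moved)
        moved_dest = os.path.join(moved, "venv", "lib", "site-packages", "mod.py")
        valid_after = os.path.exists(moved_dest)

        return target, valid_before, valid_after


if __name__ == "__main__":
    print(demo())
```

An absolute target would have pointed back at the old `tree` location and dangled after the move; the relative target keeps resolving because the link and its target are relocated as a unit.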
Builds on #629 and fixes #610.
A brief summary of the bug is that previously the static venv machinery assumed that external imports were used only for `site-packages/` style repositories coming from a pip implementation. This assumption is false in some cases, one of which is the use of local sub-repositories as sources of Python code.

Because of the false assumption that all external imports were being converted into copies, the existing implementation assumed that the repository name path segment could be dropped when inserting paths for non-relocated imports into the `_aspect.pth` file within a venv.

Unfortunately, repairing this oversight in a portable and sound way is tricky. Simply using relative paths in a `.pth` file can incur undesirable canonicalization and resolution of symlinks, which produces broken runtime paths depending on the specifics of the sandboxing strategy. The Linux sandboxing implementation seems especially vulnerable to this.

Ultimately the only really sound way to achieve this would be to use a runfiles library. That has the further problem that the runfiles manifest structure operates at the level of individual files and not file trees, so we'd have to make potentially unsound assumptions about scanning runfiles manifest entries for path prefixes, and also take on a dependency on a "real" implementation of interacting with runfiles.
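To make the canonicalization hazard described above concrete, here is a minimal sketch. It does not reproduce Bazel's sandbox or Python's `site` machinery: `resolve_entry`, `demo`, and the `app.runfiles`/`store` layout are hypothetical, and the sketch assumes a POSIX filesystem. It shows how the same relative entry lands in different places depending on whether the directory containing it is canonicalized through symlinks first.

```python
import os
import tempfile


def resolve_entry(site_dir: str, entry: str, canonicalize: bool) -> str:
    """Resolve a relative path entry against `site_dir`.

    With canonicalize=True, symlinks in `site_dir` are followed before joining.
    With canonicalize=False, the join is purely lexical: normpath collapses
    `..` textually and never consults the filesystem.
    """
    base = os.path.realpath(site_dir) if canonicalize else os.path.abspath(site_dir)
    return os.path.normpath(os.path.join(base, entry))


def demo():
    with tempfile.TemporaryDirectory() as root:
        root = os.path.realpath(root)

        # Hypothetical stand-in for where a sandbox really stores the venv.
        store_venv = os.path.join(root, "store", "venv")
        os.makedirs(store_venv)

        # Hypothetical runfiles tree: the venv is a symlink, the repo is real.
        runfiles = os.path.join(root, "app.runfiles")
        os.makedirs(os.path.join(runfiles, "my_repo", "src"))
        os.symlink(store_venv, os.path.join(runfiles, "venv"))

        site_dir = os.path.join(runfiles, "venv")
        entry = os.path.join("..", "my_repo", "src")

        lexical = resolve_entry(site_dir, entry, canonicalize=False)
        canonical = resolve_entry(site_dir, entry, canonicalize=True)

        return (
            os.path.relpath(lexical, root), os.path.isdir(lexical),
            os.path.relpath(canonical, root), os.path.isdir(canonical),
        )


if __name__ == "__main__":
    print(demo())
```

In this layout the lexical result stays inside `app.runfiles` and exists, while the canonicalized result points next to the real storage location, outside the runfiles tree, where nothing exists. Which of the two behaviors a given interpreter and sandbox combination exhibits is exactly the kind of detail the description calls out as strategy-dependent.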
This PR explores a workaround where we use a customized `.pth` flow which allows us to be really careful about how entries are added to the path, to avoid implicit symlink resolution operations that can produce `.runfiles` sandbox escapes.

### Changes are visible to end-users: yes

Fixed a bug which caused imports from external modules to fail under the new `py_[static_]venv_binary` machinery.

### Test plan