Skip to content

Conversation

arrdem
Copy link
Collaborator

@arrdem arrdem commented Sep 2, 2025

Builds on #629 and fixes #610.

A brief summary of the bug is that previously the static venv machinery assumed that external imports were used only for site-packages/ style repositories coming from a pip implementation. This is in some cases false -- one of which is the use of local sub-repositories as sources of Python code.

Because of the false assumption that all external imports were being converted into copies, the existing implementation assumed that the repository name path segment could be dropped when inserting paths for non-relocated imports into the _aspect.pth file within a venv.

Unfortunately repairing this oversight in a portable and sound way is tricky. Simply using relative paths in .pth file can incur undesirable canonicalization and resolution of symlinks which produce broken runtime paths depending on the specifics of the sandboxing strategy. The Linux sandboxing implementation seems especially vulnerable to this.

Ultimately the only really sound way to achieve this would be to use a runfiles library, which has the further problem that the runfiles manifest structure operates at the level of individual files and not file trees so we'd have to make potentially unsound assumptions about scanning runfiles manifest entries for path prefixes and also take on a dependency on a "real" implementation of interacting with runfiles.

This PR explores a workaround where we use a customized .pth flow which allows us to be really careful about how entries are added to the path to avoid implicit symlink resolution operations that can produce .runfiles sandbox escapes.

Changes are visible to end-users: yes

  • Searched for relevant documentation and updated as needed: yes
  • Breaking change (forces users to change their own code or config): no
  • Suggested release notes appear below: yes

Fixed a bug which caused imports from external modules to fail under the new py_[static_]venv_binary machinery.

Test plan

  • New test cases added based on the repro report.

@CLAassistant
Copy link

CLAassistant commented Sep 2, 2025

CLA assistant check
All committers have signed the CLA.

@arrdem arrdem changed the base branch from main to myrrlyn/interpreter-sans-activate-avec-couture September 2, 2025 22:22
@arrdem arrdem changed the title Arrdem/fix 610 fix(py_venv): Repair external repository imports Sep 2, 2025
Copy link

aspect-workflows bot commented Sep 2, 2025

Test

26 test targets passed

Targets
//examples/multi_version:py_version_default_test [k8-fastbuild]                           1s
//examples/multi_version:py_version_test [k8-fastbuild-ST-494921797612]                   1s
//examples/pytest:pytest_test [k8-fastbuild]                                              2s
//examples/pytest:sharded/test [k8-fastbuild]                                             3s
//examples/virtual_deps:pytest_test [k8-fastbuild]                                        1s
//py/private/py_venv:test_link [k8-fastbuild]                                             252ms
//py/tests/cc-deps:test_smoke [k8-fastbuild]                                              485ms
//py/tests/external-deps:test_can_import_runfiles_helper [k8-fastbuild]                   800ms
//py/tests/internal-deps:assert [k8-fastbuild]                                            445ms
//py/tests/py-binary:runfiles_from_pip_test [k8-fastbuild]                                953ms
//py/tests/py-external-venv:test [k8-fastbuild]                                           194ms
//py/tests/py-internal-venv:test [k8-fastbuild]                                           150ms
//py/tests/py-test:test_env_vars [k8-fastbuild]                                           658ms
//py/tests/py-venv-disable-systemsite:test [k8-fastbuild]                                 148ms
//py/tests/py-venv-disable-usersite:test [k8-fastbuild]                                   169ms
//py/tests/py-venv-enable-site:test [k8-fastbuild]                                        152ms
//py/tests/py-venv-standalone-interpreter:test [k8-fastbuild]                             171ms
//py/tests/py_image_layer:py_image_test [k8-fastbuild]                                    10s
//py/tests/py_venv_conflict:validate_import_roots [k8-fastbuild]                          225ms
//py/tests/py_venv_image_layer:my_app_amd64_layers_test [k8-fastbuild]                    56ms
//py/tests/py_venv_image_layer:my_app_arm64_layers_test [k8-fastbuild]                    72ms
//py/tests/py_venv_image_layer:py_amd64_image_command_test [k8-fastbuild]                 3s
//py/tests/py_venv_image_layer:py_amd64_image_content_test [k8-fastbuild]                 235ms
//py/tests/py_venv_image_layer:py_arm64_image_content_test [k8-fastbuild]                 5s
//py/tests/repo_relative_imports/test:test [k8-fastbuild]                                 491ms
//py/tests/rpy610:test [k8-fastbuild]                                                     150ms

Total test execution time was 32s. 20 tests (43.5%) were fully cached saving 1m 1s.

Copy link

github-actions bot commented Sep 2, 2025

e2e/use_release folder: LCOV of commit 7e899c1 during CI #1955

Summary coverage rate:
  lines......: 100.0% (2 of 2 lines)
  functions..: 100.0% (1 of 1 function)
  branches...: no data found

Files changed coverage rate: n/a

@arrdem arrdem requested a review from myrrlyn September 2, 2025 22:48
@arrdem arrdem changed the base branch from myrrlyn/interpreter-sans-activate-avec-couture to main September 3, 2025 02:57
arrdem added a commit that referenced this pull request Sep 25, 2025
As reported by customers, the naive but correct strategy of using copies
in `py_venv_*` can lead to laughable disk usage. Some clients are
reporting order 10min slowdowns and order 100GiB disk usage wasted
copying inputs into binaries. We need a more scalable strategy such as
symlinking.

Thankfully we can generate symlinks from tools driven by Bazel into a
TreeArtifact so long as the symlinks aren't dangling. By carefully
crafting relative symlinks we're able to produce a tree of links which
is valid both at and after action time. When relocating a `.runfiles`
tree containing such links (for instance into a OCI later tar) these
links must be dereferenced but that Just Works.

While I'm at it, refactor the venv machinery to operate in terms of
strategies and combinators on strategies so that it's simpler to talk
about the production-grade behavior we want which is:

* `site-packages` trees in 1stparty code get relocated/linked into the
venv
* `bin` sibling trees in 1stparty code get relocated/patched into the
venv
* General trees in 1stparty code are referred to by `.pth` file entries
* General trees in 3rdparty code get relocated/linked into the venv
* `bin` sibling trees in 3rdparty code get relocated/patched into the
venv

This makes the venv builder significantly more flexible, allows for
better error reporting and opens the door to more flexible error
handling.

Incorporates an implementation of #606, but testing is required.
Should include an implementation of #635, but testing is required.

### Changes are visible to end-users: yes

- Searched for relevant documentation and updated as needed: yes
- Breaking change (forces users to change their own code or config): no
- Suggested release notes appear below: yes

`py_venv_*` now use symlinks rather than hard file copies which
radically reduce disk usage while improving venv building performance.

### Test plan

- Covered by existing test cases
- New test cases added
- Manual testing; please provide instructions so we can reproduce:
TODO.

### Remaining work
- [x] Strip debug prints
- [x] Improve collision handling
- [x] Rework the command interpreter to implement the last-wins
semantics
- [x] Mitigate spooky dangling symlink issues 
- [x] Fix a regression which can cause a `site-packages/__init__.py`
file to be linked
- [x] Add sha256-sum based collision ignoring
- [ ] Add a test covering that a `site-packages/__init__.py` file will
not be linked
- [ ] Add a test covering bin shebang patching
- [ ] Integrate the test case from #635
- [ ] Manually test that linked venvs still work; should just be fine

---------

Co-authored-by: Alexander Payne <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Bug]: imports attribute not propagating from external repos through py_venv
2 participants