refactor: defer zip manifest building to execution phase to improve analysis phase performance #3381

tobyh-canva · 2025-10-27T03:20:14Z

When py_binary/py_test targets were built, they flattened the runfiles
depsets at analysis time in order to create the zip file mapping manifest for
their implicit zipapp outputs. This flattening was necessary because they had
to filter the original main executable, which doesn't belong in the zipapp,
out of the runfiles. This flattening is expensive for large builds, in some
cases adding over 400 seconds of time and significant memory overhead.

To fix, have the zip file manifest use the runfiles_without_exe object, which is
the runfiles, but pre-filtered to exclude the files zip building doesn't want. This
then allows passing the depsets directly to Args.add_all and using map_each
to transform them.
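
The difference between flattening at analysis time and deferring the mapping can be modeled in plain Python. This is only a rough sketch, not rules_python code: `NestedSet`, `DeferredArgs`, and `map_zip_entry` are hypothetical stand-ins for a depset, an `Args` object, and a `map_each` callback, and the paths and manifest line format are made up for illustration.

```python
class NestedSet:
    """Hypothetical stand-in for a depset: direct items plus child sets."""

    def __init__(self, direct=(), transitive=()):
        self.direct = tuple(direct)
        self.transitive = tuple(transitive)
        self.flatten_count = 0  # how often this set has been expanded

    def to_list(self):
        """Flatten the graph, children first, dropping duplicate items."""
        self.flatten_count += 1
        seen, out = set(), []

        def visit(node):
            for child in node.transitive:
                visit(child)
            for item in node.direct:
                if item not in seen:
                    seen.add(item)
                    out.append(item)

        visit(self)
        return out


class DeferredArgs:
    """Hypothetical stand-in for Args: records work now, expands it later."""

    def __init__(self):
        self._pending = []

    def add_all(self, values, map_each):
        # Only a reference to the set and the callback is stored here.
        self._pending.append((values, map_each))

    def expand(self):
        # Models the execution phase, where the flattening finally happens.
        lines = []
        for values, map_each in self._pending:
            lines.extend(map_each(v) for v in values.to_list())
        return lines


def map_zip_entry(path):
    # Made-up manifest format: "<path inside zip>=<path on disk>".
    return "runfiles/" + path + "=" + path


leaf = NestedSet(direct=["pkg/lib.py"])
runfiles = NestedSet(direct=["pkg/main.py"], transitive=[leaf])

args = DeferredArgs()
args.add_all(runfiles, map_each=map_zip_entry)
flattened_during_analysis = runfiles.flatten_count  # nothing expanded yet
manifest_lines = args.expand()
```

In this model, `add_all` costs the same regardless of how large the set is; the traversal is paid once, inside `expand`.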

Additionally, pass runfiles.empty_filenames using a lambda. Accessing that
attribute implicitly flattens the runfiles.
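
Why wrapping the attribute access in a lambda helps can be shown with a small Python model. `FakeRunfiles` is a hypothetical class, and the rule it uses to compute the empty file names is invented for the example; the only behavior taken from the description above is that reading the attribute triggers a flatten.

```python
import posixpath


class FakeRunfiles:
    """Hypothetical model: reading empty_filenames walks every file."""

    def __init__(self, files):
        self._files = list(files)
        self.flatten_count = 0

    @property
    def empty_filenames(self):
        self.flatten_count += 1  # the implicit, expensive flatten
        dirs = sorted({posixpath.dirname(f) for f in self._files})
        # Invented rule: one __init__.py per directory that holds a file.
        return [d + "/__init__.py" for d in dirs if d]


runfiles = FakeRunfiles(["pkg/main.py", "pkg/sub/lib.py"])

# Wrapping the access in a lambda postpones the attribute read.
deferred = lambda: runfiles.empty_filenames
count_before_call = runfiles.flatten_count

names = deferred()  # the flatten happens only here
count_after_call = runfiles.flatten_count
```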

Finally, because the original profiles indicated str.format() was a non-trivial
amount of time (about 23 seconds, half of the 46 seconds spent in
map_zip_runfiles), switch to using + instead.
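
For reference, both ways of building a manifest line yield the same string, so the switch is behavior-preserving. A short Python sketch (Starlark shares this syntax, but the timings in this thread come from the Starlark profile, not from this snippet); the function names and sample paths are illustrative assumptions.

```python
def entry_with_format(zip_path, file_path):
    # Style used before the change: parse a template, then substitute.
    return "{}={}".format(zip_path, file_path)


def entry_with_concat(zip_path, file_path):
    # Style used after the change: plain concatenation.
    return zip_path + "=" + file_path


sample = ("runfiles/_main/pkg/lib.py", "bazel-out/bin/pkg/lib.py")
formatted = entry_with_format(*sample)
concatenated = entry_with_concat(*sample)
```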

This is a more incremental alternative to #3380 which achieves most of the
same optimization with only Starlark changes, as opposed to introducing an
external script written in C++.

Starlark CPU profile of a large build. It shows an overall build time of
305 seconds, of which 46 seconds (15%) are spent in map_zip_runfiles: half
in str.startswith() and the other half in str.format().

rickeylev · 2025-11-09T22:11:42Z

As mentioned in the other PR comment: we can definitely accept this PR while the questions about using an external tool are being worked out.

Overall, LGTM. Because of the profile results in the other thread showing format() (as called by map_zip_runfiles) was a significant chunk of the time, changing to % or + will probably save some cycles. format_each might also help. I'm going to poke it a bit.


rickeylev · 2025-11-10T00:14:35Z

For posterity: Before this change, about 462 seconds is spent in py_binary related to zip file manifest building.

With this change, the profile shows about 46 seconds spent in py_binary related to zip file manifest building.

That profile was taken with this change applied, but before switching from format() to +.

rickeylev · 2025-11-10T00:27:23Z

Ok, cleaned this up, switched it to use + instead of format().

@tobyh-canva If you have the opportunity, could you run another profile? I'm interested to see how much of the 23 seconds of format() overhead in building the path strings is gone.

tobyh-canva · 2025-11-10T00:50:17Z

Running a profile right now, thanks heaps!

perf: improve analysis performance of py_binary and py_test

a982e40

tobyh-canva mentioned this pull request Oct 27, 2025

perf: improve analysis performance by 95% for py_binary and py_test rules #3380

Open

rickeylev marked this pull request as ready for review November 9, 2025 22:09

rickeylev requested review from aignas and rickeylev as code owners November 9, 2025 22:09

tobyh-canva changed the title ~~zip file analysis optimisation: alternate starlark-only approach~~ perf: improve analysis performance by for py_binary and py_test rules Nov 9, 2025

use + instead of format(); pass empty_filenames lambda

cf4ddb9

rickeylev changed the title ~~perf: improve analysis performance by for py_binary and py_test rules~~ refactor: defer zip manifest building to execution phase to improve analysis phase performance Nov 9, 2025

rickeylev added 2 commits November 9, 2025 16:09

Merge branch 'main' of https://github.com/bazel-contrib/rules_python …

1de91b1

…into pr-3381

format code

6ec96b7

update changelog

76f1ee8

rickeylev enabled auto-merge November 10, 2025 00:27

rickeylev approved these changes Nov 10, 2025


rickeylev added this pull request to the merge queue Nov 10, 2025

Merged via the queue into bazel-contrib:main with commit 4fb634e Nov 10, 2025
3 checks passed
