Skip to content

Parallel signing #1468

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 11 commits into from
Jul 23, 2025
Merged

Parallel signing #1468

merged 11 commits into from
Jul 23, 2025

Conversation

jku
Copy link
Member

@jku jku commented Jul 15, 2025

This uses threads to sign multiple artifacts in parallel. The issue that is being solved here is that rekor v2 only responds to the entry request after the entry has been included in the log and the inclusion proof is available. When entry requests are done in parallel, they all get responses in the next "inclusion batch" (instead of one response per batch), speeding up multi-artifact signing by a lot. A practical example of the use case could be cpython releases where they sign ~10 artifacts.

There are multiple changes:

  • Refactor signing so the "threaded method" is as small as possible -- the threadpool code is from ramon
  • Avoid non-thread-safe print() in threaded method
  • Avoid requests Session use from multiple threads. It's not 100% certain this is required as the thread safety of session is unclear -- it used to be unsafe but the situation has improved -- so I tried to be conservative and use single-use Sessions since doing so does not seem like a major issue for the sigstore client. With Refactor: replace requests with urllib3 #1040 we can likely get rid of this.
  • refactor CLI output so that it remains in logical order even with threads
  • Add test that signs multiple artifacts

Signer.sign_artifact() and Signer.sign_dsse() are now expected to be thread safe.

ramonpetgrave64 and others added 5 commits July 10, 2025 12:07
This is (especially) useful with rekor 2 where the service only
responds after log inclusion: We would prefer to get all signatures
in the same inclusion batch.

The change still affects both rekor v1 and rekor v2.

This commit is a rebase of a bunch of Ramons commits.

Signed-off-by: Ramon Petgrave <[email protected]>
Signed-off-by: Jussi Kukkonen <[email protected]>
* Session thread safety is ambiguous: it may now be safe but situation
  is unclear
* Use a single Session per create_entry() call: This may have a little
  more overhead but seems like the safest option
* Note that RekorLog (v1) other methods may not be thread safe: I only
  reviewed the create_entry() flow

Signed-off-by: Jussi Kukkonen <[email protected]>
It's a little unclear if Session is now thread safe or not: avoid reuse
just in case

Signed-off-by: Jussi Kukkonen <[email protected]>
* Separate the thread method so it's easier to see potential safety issues
* Using print() with the same file object is generally not thread safe: Avoid
  it from the threaded method

The output remains effectively the same except:
* The b64 encoded signature is no longer printed to terminal
* Some print()s are now logger.info(): e.g.
    Transparency log entry created at index: 5562
* Other print()s happen in a batch now, after he signing has finished

Signed-off-by: Jussi Kukkonen <[email protected]>
@jku jku force-pushed the parallel-signing branch from 74c94d4 to 9d6455e Compare July 15, 2025 12:56
@jku
Copy link
Member Author

jku commented Jul 16, 2025

The included test only signs a few artifacts at a time but I have run this on more pathological cases:

touch a b c d e f g h i j k l m n o p q r s t u v x y z
time sigstore sign --overwrite a b c d e f g h i j k l m n o p q r s t u v x y z

Runtime on current main branch

  • rekor1: 20 seconds
  • rekor2: 63 seconds

Runtime on parallel-sign branch

  • rekor1: 7 seconds (reduction of 65%)
  • rekor2: 12 seconds (reduction of 80%)

The time differences here are not that big but I think that's mostly because staging rekor 2 is currently configured with small and quick batches: With longer batching times the rekor2 difference will only grow

@jku jku marked this pull request as ready for review July 16, 2025 11:24
Signed-off-by: Jussi Kukkonen <[email protected]>
@woodruffw woodruffw added component:cli CLI components component:signing Core signing functionality labels Jul 17, 2025
Make every RekorLog have a Session of their own by default.
This means RekorClient no longer needs to manage that.

Signed-off-by: Jussi Kukkonen <[email protected]>
@jku
Copy link
Member Author

jku commented Jul 21, 2025

The time differences here are not that big but I think that's mostly because staging rekor 2 is currently configured with small and quick batches: With longer batching times the rekor2 difference will only grow

Colleen confirmed this for me: tessera checkpoint interval is currently configured at 2 seconds in staging, this might get bumped up to 10 seconds

The effect of this with the 25 artifact rekor2 test (with the current 10 thread limit) is :

  • with current staging tessera configuration (2 second checkpoint interval)
    • sigstore-python main waits for 25 checkpoints: ~49 secs
    • parallel-signing branch waits for 3 checkpoints: ~5 secs
  • with possible staging tessera configuration (10 second checkpoint interval)
    • sigstore-python main waits for 25 checkpoints: ~ 240 secs
    • sigstore-python main waits for 3 checkpoints: ~ 25 secs

The timings for this branch depend on the max number of threads: current 10 threads means 25 artifacts splits to 3 checkpoints.

The takeaways to me are:

  • we can/should increase the max number of threads as far as is reasonable
  • if there are more than max_threads artifacts to sign, we could warn the user that the signing might take a while -- basically signing max_threads artifacts takes ~0.5 checkpoint intervals on average but even one more artifact takes ~1.5 checkpoint intervals on average

woodruffw
woodruffw previously approved these changes Jul 22, 2025
Copy link
Member

@woodruffw woodruffw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @jku! I left two small comments/nitpicks 🙂

This number does affect the number of concurrent rekor POST requests
we have in flight, but we are unlikely to hit rate limits as
they are defined in "requests from same host per minute".

Signed-off-by: Jussi Kukkonen <[email protected]>
@jku jku requested a review from woodruffw July 23, 2025 13:23
Copy link
Member

@woodruffw woodruffw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, great work @jku!

One thing I was thinking about it whether the increased latency with Rekor v2 means that it might make sense to show a progress bar while signing (i.e. when we can detect an interactive user terminal). Do you have thoughts on that? Not a blocker here either way.

@jku
Copy link
Member Author

jku commented Jul 23, 2025

One thing I was thinking about it whether the increased latency with Rekor v2 means that it might make sense to show a progress bar while signing (i.e. when we can detect an interactive user terminal). Do you have thoughts on that? Not a blocker here either way.

I'd like it... but I don't think it's possible to implement one that is useful: for the typical case (signing less than max_workers artifacts) the runtime can be anything from X seconds to X+Y seconds where X is the "current baseline" and Y is the checkpoint interval that rekor-tiles uses... So there's problems:

  • currently X and Y are similar (a couple of seconds) but Y may be increased to up to 10 secs so it could dominate the progress bar
  • we don't know for sure what Y is (since it's a knob that rekor-tiles admins can tweak)
  • even if we did, our runtime depends on when during the interval we happen to make our request -- if we are lucky we get a response immediately

@jku jku merged commit 3a26b7d into sigstore:main Jul 23, 2025
23 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
component:cli CLI components component:signing Core signing functionality
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants