Skip to content

Create "Trusted Publishing" database tables #11062

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

Turbo87
Copy link
Member

@Turbo87 Turbo87 commented Apr 25, 2025

This PR creates the three necessary database tables to implement Trusted Publishing on crates.io:

  • The trustpub_configs_github table is used to store the Trusted Publishing configurations for GitHub Actions. Users will be able to use the web interface to create new configurations, or remove existing ones. The repository_owner_id column will be looked up from the GitHub API when the configuration is created to avoid account resurrection attacks.
  • The trustpub_tokens table contains temporary access tokens that can be used to publish corresponding crates. They can be created by exchanging a GitHub Actions OIDC token for a temporary access token, if the OIDC token corresponds to an existing Trusted Publising configuration.
  • The trustpub_used_jtis table is used to ensure that OIDC tokens can only once be exchanged for temporary access tokens to avoid potential replay attacks.

Related:

@Turbo87 Turbo87 added C-enhancement ✨ Category: Adding new behavior or a change to the way an existing feature works A-backend ⚙️ labels Apr 25, 2025
@@ -0,0 +1,64 @@
create table github_oidc_configs
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does it make sense to have a table per OIDC provider? or would it be better to save them all in the same table?

since they all use different claims I lean towards table per provider, but I'm open to the opposite if there are good reasons for it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I lean towards starting with a single table for two reasons: 1. It's the simplest way to begin. 2. After a quick glance at the given columns, I don't anticipate significant variations between different providers, and it seems like a good starting point for all providers already.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't anticipate significant variations between different providers

PyPI started to support Google as an OIDC provider for Trusted Publishing, which only uses two fields: email address (required) and subject (optional).

in other words, this is significantly different from what GitHub/GitLab support.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, good to know! I was only considering the repo holder previously 😅. Then this indeed makes more sense to me, as it might vary for each provider!

I haven't investigated this yet, but if those columns won't be used for filtering and mainly serve as data, or if we only operate on them within the app, another possible solution could be to save those configurations as JSONB to accommodate different providers. Though the flow chart doesn't include this, I see an index clause (github_oidc_configs (repository_owner_id, repository_name)) in the schema, which means we might need these as filtering columns rather than just relying on either id or crate_id, so this might not be suitable?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

which means we might need these as filtering columns

the way it works is that we receive a JWT with various "claims". the iss claim (issuer) tells us where the JWT is coming from (e.g. GitHub Actions), which lets us select the correct table to search for corresponding Trusted Publishing configurations. the rest of the provider-specific claims are then used to find one or more matching configurations (multiple crates could be in the same repository). from these configurations we then have a list of crate IDs, that the newly generated temporary access token is valid for.

so yes, these columns are mainly used for filtering to find matching configurations for a JWT.

@@ -0,0 +1,64 @@
create table github_oidc_configs
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should this be names oidc_github_configs instead so that all the related tables use a oidc_ prefix?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this would only be used for a trusted publishing service, should we add trusted publishing (or its abbreviation/alias) to the table name? This is considering we might have other services that also have their own OIDC-related configurations. However, we could also rename it later when we actually need to, though.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've avoided trusted_publishing_ so far because it seems quite long, and for tp_ it isn't really obvious what it stands for, but I'm open to suggestions :)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe we could add _publisher after the provider (e.g., github_publisher), similar to how PyPI has the _publishers suffix.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should this be names oidc_github_configs instead so that all the related tables use a oidc_ prefix?

I think that makes sense.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've avoided trusted_publishing_ so far because it seems quite long, and for tp_ it isn't really obvious what it stands for, but I'm open to suggestions :)

I just came up with trustpub_. Short and somewhat self-describing with enough context. I think I prefer that to oidc_.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed the names to trustpub_configs_github, trustpub_tokens and trustpub_used_jtis

Comment on lines 40 to 41
comment on column oidc_tokens.crate_id is 'Unique identifier of the crate that can be published using this token';
comment on column oidc_tokens.hashed_token is 'SHA256 hash of the token that can be used to publish the crate';
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a JWT can generally be used to publish multiple crates, if they all have a similar Trusted Publishing configuration for the same repository. during the exchange request the API server would create a dedicated oidc_tokens row for each crate with the same hashed_token.

would it be better to use a crate_ids array instead of row-per-crate? I'm not sure if an array would be able to support the references crates above.

on the other hand, the current design does not prevent hash collisions since a uniqueness constraint on only the hashed_token column is not possible.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I forget how arrays work when used as foreign keys. Seems simpler to leave it the way you have it here with multiple rows and no constraint on the hashed_token.

Copy link
Member Author

@Turbo87 Turbo87 Apr 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I forget how arrays work when used as foreign keys

according to the internet they are not implemented at all, but since the tokens are short-lived anyway it's probably not a huge deal to have IDs in the array that no longer correspond to an existing crate. since the crate IDs are unique and won't get reassigned we shouldn't be vulnerable to resurrection attacks or anything like that.

I've now adjusted the PR and the prototype to use the one-row-per-token approach with a crate_ids: int[] column, which then gives us the extra safety regarding hash collisions.

@Turbo87 Turbo87 requested review from a team, LawnGnome and eth3lbert April 25, 2025 07:48
Copy link
Member

@jtgeibel jtgeibel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me overall, just a few minor comments/questions.

Comment on lines 40 to 41
comment on column oidc_tokens.crate_id is 'Unique identifier of the crate that can be published using this token';
comment on column oidc_tokens.hashed_token is 'SHA256 hash of the token that can be used to publish the crate';
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I forget how arrays work when used as foreign keys. Seems simpler to leave it the way you have it here with multiple rows and no constraint on the hashed_token.

@@ -0,0 +1,64 @@
create table github_oidc_configs
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should this be names oidc_github_configs instead so that all the related tables use a oidc_ prefix?

I think that makes sense.

Comment on lines 571 to 574
/// GitHub user or organization that owns the repository
repository_owner -> Varchar,
/// Unique identifier of the user or organization that owns the repository
repository_owner_id -> Int4,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason we need to store both string and ID versions of this? Also, since this isn't a foreign key, I assume it is the GH ID and not any of our own internal IDs. If so, maybe we should tweak the comment to clarify that.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since this isn't a foreign key, I assume it is the GH ID and not any of our own internal IDs. If so, maybe we should tweak the comment to clarify that.

good catch. I actually updated the comments in the migration script already to reflect that, but forgot to regenerate the schema file 😅

Is there a reason we need to store both string and ID versions of this?

storing the string is useful to be able to show it in the list of active trusted publishing configs for a crate. but also, if you rename a repository then the ID will stay the same, but the name will be different. in that case we probably want the publish to fail and require a reconfiguration?

@Turbo87 Turbo87 force-pushed the trusted-publishing-tables branch from 7378247 to 19c4ba0 Compare April 28, 2025 08:47
@Turbo87 Turbo87 requested a review from Copilot April 28, 2025 13:42
Copilot

This comment was marked as resolved.

@Turbo87 Turbo87 force-pushed the trusted-publishing-tables branch from 19c4ba0 to fea3aa1 Compare April 28, 2025 13:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-backend ⚙️ C-enhancement ✨ Category: Adding new behavior or a change to the way an existing feature works
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

3 participants