Create "Trusted Publishing" database tables #11062

Turbo87 · 2025-04-25T07:38:23Z

This PR creates the three necessary database tables to implement Trusted Publishing on crates.io:

The trustpub_configs_github table is used to store the Trusted Publishing configurations for GitHub Actions. Users will be able to use the web interface to create new configurations, or remove existing ones. The repository_owner_id column will be looked up from the GitHub API when the configuration is created to avoid account resurrection attacks.
The trustpub_tokens table contains temporary access tokens that can be used to publish the corresponding crates. They can be created by exchanging a GitHub Actions OIDC token for a temporary access token, if the OIDC token corresponds to an existing Trusted Publishing configuration.
The trustpub_used_jtis table is used to ensure that each OIDC token can be exchanged for a temporary access token only once, which avoids potential replay attacks.

Turbo87 · 2025-04-25T07:41:40Z

migrations/2025-04-25-090000_trusted-publishing/up.sql

@@ -0,0 +1,64 @@
+create table github_oidc_configs


does it make sense to have a table per OIDC provider? or would it be better to save them all in the same table?

since they all use different claims I lean towards table per provider, but I'm open to the opposite if there are good reasons for it.

I lean towards starting with a single table for two reasons: 1. It's the simplest way to begin. 2. After a quick glance at the given columns, I don't anticipate significant variations between different providers, and it seems like a good starting point for all providers already.

> I don't anticipate significant variations between different providers

PyPI started to support Google as an OIDC provider for Trusted Publishing, which only uses two fields: email address (required) and subject (optional).

in other words, this is significantly different from what GitHub/GitLab support.

Ah, good to know! I was only considering the repo holder previously 😅. Then this indeed makes more sense to me, as it might vary for each provider!

I haven't investigated this yet, but if those columns won't be used for filtering and mainly serve as data, or if we only operate on them within the app, another possible solution could be to save those configurations as JSONB to accommodate different providers. Though the flow chart doesn't include this, I see an index clause (github_oidc_configs (repository_owner_id, repository_name)) in the schema, which means we might need these as filtering columns rather than just relying on either id or crate_id, so this might not be suitable?

> which means we might need these as filtering columns

the way it works is that we receive a JWT with various "claims". the iss claim (issuer) tells us where the JWT is coming from (e.g. GitHub Actions), which lets us select the correct table to search for corresponding Trusted Publishing configurations. the rest of the provider-specific claims are then used to find one or more matching configurations (multiple crates could be in the same repository). from these configurations we then have a list of crate IDs, that the newly generated temporary access token is valid for.

so yes, these columns are mainly used for filtering to find matching configurations for a JWT.
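
To make the lookup described above concrete, here is a minimal in-memory sketch, not the actual crates.io implementation. `GitHubConfig`, `Claims` and `matching_crate_ids` are hypothetical names, the claims are assumed to be already extracted from a verified JWT, and the claim shapes are simplified (integer owner ID, bare repository name). The slice stands in for the GitHub config table.

```rust
// Hypothetical stand-in for one row of the GitHub config table.
#[derive(Debug, Clone)]
struct GitHubConfig {
    crate_id: i32,
    repository_owner_id: i32,
    repository_name: String,
}

// Simplified claims, assumed to come from an already verified JWT.
struct Claims<'a> {
    iss: &'a str,
    repository_owner_id: i32,
    repository_name: &'a str,
}

// Issuer URL of GitHub Actions OIDC tokens.
const GITHUB_ISSUER: &str = "https://token.actions.githubusercontent.com";

// Returns the IDs of all crates whose configuration matches the claims.
fn matching_crate_ids(claims: &Claims<'_>, github_configs: &[GitHubConfig]) -> Vec<i32> {
    // The issuer decides which provider-specific table is searched.
    if claims.iss != GITHUB_ISSUER {
        return Vec::new();
    }

    // The provider-specific claims act as filter columns. Several crates
    // can live in the same repository, so more than one row may match.
    github_configs
        .iter()
        .filter(|config| {
            config.repository_owner_id == claims.repository_owner_id
                && config.repository_name == claims.repository_name
        })
        .map(|config| config.crate_id)
        .collect()
}
```

In this sketch an unknown issuer simply yields no crate IDs; a real endpoint would presumably reject such a token with an error instead.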

My initial reaction here was the same as @eth3lbert's, but I think I'm convinced that provider-specific tables are the way to go here.

Turbo87 · 2025-04-25T07:42:27Z

migrations/2025-04-25-090000_trusted-publishing/up.sql

@@ -0,0 +1,64 @@
+create table github_oidc_configs


should this be named oidc_github_configs instead so that all the related tables use a oidc_ prefix?

Since this would only be used for the Trusted Publishing service, should we add "trusted publishing" (or an abbreviation/alias) to the table name? We might have other services that also have their own OIDC-related configurations. That said, we could also rename it later when we actually need to.

I've avoided trusted_publishing_ so far because it seems quite long, and for tp_ it isn't really obvious what it stands for, but I'm open to suggestions :)

maybe we could add _publisher after the provider (e.g., github_publisher), similar to how PyPI has the _publishers suffix.

> should this be named oidc_github_configs instead so that all the related tables use a oidc_ prefix?

I think that makes sense.

> I've avoided trusted_publishing_ so far because it seems quite long, and for tp_ it isn't really obvious what it stands for, but I'm open to suggestions :)

I just came up with trustpub_. Short and somewhat self-describing with enough context. I think I prefer that to oidc_.

I changed the names to trustpub_configs_github, trustpub_tokens and trustpub_used_jtis

Turbo87 · 2025-04-25T07:48:07Z

migrations/2025-04-25-090000_trusted-publishing/up.sql

+comment on column oidc_tokens.crate_id is 'Unique identifier of the crate that can be published using this token';
+comment on column oidc_tokens.hashed_token is 'SHA256 hash of the token that can be used to publish the crate';


a JWT can generally be used to publish multiple crates, if they all have a similar Trusted Publishing configuration for the same repository. during the exchange request the API server would create a dedicated oidc_tokens row for each crate with the same hashed_token.

would it be better to use a crate_ids array instead of row-per-crate? I'm not sure if an array would be able to support the "references crates" constraint used above.

on the other hand, the current design does not prevent hash collisions since a uniqueness constraint on only the hashed_token column is not possible.

Yeah, I forget how arrays work when used as foreign keys. Seems simpler to leave it the way you have it here with multiple rows and no constraint on the hashed_token.

> Yeah, I forget how arrays work when used as foreign keys

according to the internet they are not implemented at all, but since the tokens are short-lived anyway it's probably not a huge deal to have IDs in the array that no longer correspond to an existing crate. since the crate IDs are unique and won't get reassigned we shouldn't be vulnerable to resurrection attacks or anything like that.

I've now adjusted the PR and the prototype to use the one-row-per-token approach with a crate_ids: int[] column, which then gives us the extra safety regarding hash collisions.
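
The one-row-per-token design can be sketched in memory as follows. This is only an illustration under assumptions: `TokenStore`, `insert` and `may_publish` are hypothetical names, and the `HashMap` key plays the role that a uniqueness constraint on `hashed_token` would play in the database.

```rust
use std::collections::HashMap;

// Hypothetical stand-in for the token table: one entry per token.
#[derive(Default)]
struct TokenStore {
    // Keyed by the hashed token, which models a unique `hashed_token` column.
    // The value models the `crate_ids: int[]` column.
    crate_ids_by_hash: HashMap<Vec<u8>, Vec<i32>>,
}

impl TokenStore {
    // Stores a new token. Returns false and stores nothing if the hash
    // already exists, which is how a hash collision would surface.
    fn insert(&mut self, hashed_token: Vec<u8>, crate_ids: Vec<i32>) -> bool {
        if self.crate_ids_by_hash.contains_key(&hashed_token) {
            return false;
        }
        self.crate_ids_by_hash.insert(hashed_token, crate_ids);
        true
    }

    // A token may publish a crate only if the crate ID is in its array.
    fn may_publish(&self, hashed_token: &[u8], crate_id: i32) -> bool {
        self.crate_ids_by_hash
            .get(hashed_token)
            .map_or(false, |crate_ids| crate_ids.contains(&crate_id))
    }
}
```

With one row per crate the same hash would legitimately appear several times, so a collision between two different tokens could not be told apart from a normal multi-crate token. With one row per token a duplicate hash is always an error.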

migrations/2025-04-25-090000_trusted-publishing/up.sql

jtgeibel

Looks good to me overall, just a few minor comments/questions.

jtgeibel · 2025-04-27T16:11:04Z

migrations/2025-04-25-090000_trusted-publishing/up.sql

+comment on column oidc_tokens.crate_id is 'Unique identifier of the crate that can be published using this token';
+comment on column oidc_tokens.hashed_token is 'SHA256 hash of the token that can be used to publish the crate';


Yeah, I forget how arrays work when used as foreign keys. Seems simpler to leave it the way you have it here with multiple rows and no constraint on the hashed_token.

jtgeibel · 2025-04-27T16:14:24Z

migrations/2025-04-25-090000_trusted-publishing/up.sql

@@ -0,0 +1,64 @@
+create table github_oidc_configs


> should this be named oidc_github_configs instead so that all the related tables use a oidc_ prefix?

I think that makes sense.

jtgeibel · 2025-04-27T16:20:40Z

crates/crates_io_database/src/schema.rs

+        /// GitHub user or organization that owns the repository
+        repository_owner -> Varchar,
+        /// Unique identifier of the user or organization that owns the repository
+        repository_owner_id -> Int4,


Is there a reason we need to store both string and ID versions of this? Also, since this isn't a foreign key, I assume it is the GH ID and not any of our own internal IDs. If so, maybe we should tweak the comment to clarify that.

> since this isn't a foreign key, I assume it is the GH ID and not any of our own internal IDs. If so, maybe we should tweak the comment to clarify that.

good catch. I actually updated the comments in the migration script already to reflect that, but forgot to regenerate the schema file 😅

> Is there a reason we need to store both string and ID versions of this?

storing the string is useful to be able to show it in the list of active trusted publishing configs for a crate. but also, if you rename a repository then the ID will stay the same, but the name will be different. in that case we probably want the publish to fail and require a reconfiguration?
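
As a rough illustration of why both values are kept, the sketch below compares a stored owner name and GitHub ID against the values from an incoming token. `OwnerCheck` and `check_owner` are hypothetical names, the exact string comparison is a simplification, and the assumption (following the comment above) is that only a full match would allow publishing.

```rust
#[derive(Debug, PartialEq)]
enum OwnerCheck {
    // Name and GitHub ID both match the stored configuration.
    Match,
    // Same GitHub ID, different name: the owner was renamed.
    Renamed,
    // Same name, different GitHub ID: the name now belongs to another
    // account, which is the resurrection scenario.
    DifferentAccount,
    // Neither value matches.
    Unrelated,
}

// Compares the stored owner name and ID with the values from the token.
fn check_owner(stored_name: &str, stored_id: i32, claim_name: &str, claim_id: i32) -> OwnerCheck {
    match (stored_name == claim_name, stored_id == claim_id) {
        (true, true) => OwnerCheck::Match,
        (false, true) => OwnerCheck::Renamed,
        (true, false) => OwnerCheck::DifferentAccount,
        (false, false) => OwnerCheck::Unrelated,
    }
}
```

Storing only the name could not detect the `DifferentAccount` case, and storing only the ID could not detect `Renamed` or show a readable owner in the configuration list.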

LawnGnome · 2025-04-30T21:53:54Z

migrations/2025-04-25-090000_trusted-publishing/up.sql

+    crate_id int not null references crates on delete cascade,
+    repository_owner varchar not null,
+    repository_owner_id int not null,
+    repository_name varchar not null,


Should we also be tracking repository ID?

we use the owner ID to protect against resurrection attacks, but I'm not sure if the same would apply for the repository ID. if the user recreates the repository with the same name, do we really want to break the publishing workflow because of it?

aside from that, we might not be able to get the repository ID for private repositories, unless we use the auth token of a repository owner. that might work right now with GitHub being our only auth provider, but would cause some issues if that were to change eventually.

Just chiming in from PyPI's side: the reasons @Turbo87 mentioned are the same ones we don't track the repository ID either -- in the best case it's no stronger than the resurrection guarantee of the owner ID, but it's hard/annoying to obtain in the private repo case 🙂

LawnGnome · 2025-04-30T21:55:10Z

migrations/2025-04-25-090000_trusted-publishing/up.sql

+(
+    id serial primary key,
+    created_at timestamptz not null default now(),
+    crate_id int not null references crates on delete cascade,


This is fine on a database level, but we will have to ensure that there's a reasonable path for multi-crate workspaces to be configured without necessarily having to do each one with many clicks in the UI.

from what I can tell, PyPI requires you to set it up for each individual project too. let's start simple, this is what was agreed upon in the RFC.

we could open up the configuration creation for API tokens in the future and not just cookie auth, then those that want to automate it can do so without us having to make the web UI more complex for the simple use cases.

LawnGnome · 2025-04-30T21:55:38Z

migrations/2025-04-25-090000_trusted-publishing/up.sql

+    id bigserial primary key,
+    jti varchar not null,
+    used_at timestamptz not null default now(),
+    expires_at timestamptz not null


Do we need to track this?

otherwise we wouldn't be able to clean up this table and it would grow forever. I was planning on having a background job that regularly deletes expired tokens and jtis.
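
A minimal in-memory sketch of how `expires_at` enables that cleanup, assuming timestamps are plain seconds since the Unix epoch. `UsedJtis`, `mark_used` and `delete_expired` are hypothetical names; the real table and background job may look quite different.

```rust
use std::collections::HashMap;

// Hypothetical stand-in for the used-JTI table.
#[derive(Default)]
struct UsedJtis {
    // Maps each used `jti` to its `expires_at` (seconds since the Unix epoch).
    expires_at_by_jti: HashMap<String, u64>,
}

impl UsedJtis {
    // Records a `jti`. Returns false if it was already used, which means
    // the OIDC token is being replayed and the exchange should be refused.
    fn mark_used(&mut self, jti: &str, expires_at: u64) -> bool {
        if self.expires_at_by_jti.contains_key(jti) {
            return false;
        }
        self.expires_at_by_jti.insert(jti.to_string(), expires_at);
        true
    }

    // What a periodic cleanup job could do: drop every entry that has
    // expired at `now` and report how many were removed.
    fn delete_expired(&mut self, now: u64) -> usize {
        let before = self.expires_at_by_jti.len();
        self.expires_at_by_jti.retain(|_, expires_at| *expires_at > now);
        before - self.expires_at_by_jti.len()
    }
}
```

Dropping an expired entry should be safe under the assumption that an expired OIDC token is rejected before the `jti` check is ever reached.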

LawnGnome · 2025-04-30T21:56:07Z

migrations/2025-04-25-090000_trusted-publishing/up.sql

@@ -0,0 +1,64 @@
+create table github_oidc_configs


My initial reaction here was the same as @eth3lbert's, but I think I'm convinced that provider-specific tables are the way to go here.

Turbo87 · 2025-05-02T07:57:50Z

thanks for all the reviews. unless there are any last minute objections I plan on merging this tomorrow and then continue implementing the first API endpoints.

woodruffw

This is really exciting!

Turbo87 added the C-enhancement ✨ and A-backend ⚙️ labels Apr 25, 2025

Turbo87 requested review from a team, LawnGnome and eth3lbert April 25, 2025 07:48

eth3lbert reviewed Apr 25, 2025

Turbo87 added this to crates.io team meetings Apr 25, 2025

Turbo87 moved this to For next meeting in crates.io team meetings Apr 25, 2025

jtgeibel approved these changes Apr 27, 2025

Turbo87 force-pushed the trusted-publishing-tables branch from 7378247 to 19c4ba0 Compare April 28, 2025 08:47

Turbo87 requested a review from Copilot April 28, 2025 13:42

Create "Trusted Publishing" database tables

fea3aa1

Turbo87 force-pushed the trusted-publishing-tables branch from 19c4ba0 to fea3aa1 Compare April 28, 2025 13:45

Turbo87 mentioned this pull request Apr 30, 2025

Tracking Issue for "Trusted Publishing Support" #10247

Open

19 tasks

Turbo87 removed this from crates.io team meetings Apr 30, 2025

LawnGnome approved these changes Apr 30, 2025

woodruffw approved these changes May 2, 2025

Turbo87 merged commit 486988a into rust-lang:main May 3, 2025
9 checks passed

Turbo87 deleted the trusted-publishing-tables branch May 3, 2025 13:16

This was referenced May 5, 2025

Add PUT /api/v1/trusted_publishing/github_configs API endpoint #11113

Merged

Add PUT /api/v1/trusted_publishing/tokens API endpoint #11131

Merged

This was referenced May 20, 2025

Add DELETE /api/v1/trusted_publishing/github_configs/{id} API endpoint #11209

Merged

Add GET /api/v1/trusted_publishing/github_configs API endpoint #11230

Merged

Add DELETE /api/v1/trusted_publishing/tokens API endpoint #11234

Merged

Turbo87 mentioned this pull request Jun 4, 2025

controllers/krate/publish: Add support for Trusted Publishing access tokens #11294

Merged
