-
Notifications
You must be signed in to change notification settings - Fork 642
Create "Trusted Publishing" database tables #11062
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
@@ -0,0 +1,64 @@ | |||
create table github_oidc_configs |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does it make sense to have a table per OIDC provider? or would it be better to save them all in the same table?
since they all use different claims I lean towards table per provider, but I'm open to the opposite if there are good reasons for it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I lean towards starting with a single table for two reasons: 1. It's the simplest way to begin. 2. After a quick glance at the given columns, I don't anticipate significant variations between different providers, and it seems like a good starting point for all providers already.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't anticipate significant variations between different providers
PyPI started to support Google as an OIDC provider for Trusted Publishing, which only uses two fields: email address (required) and subject (optional).
in other words, this is significantly different from what GitHub/GitLab support.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, good to know! I was only considering the repo holder previously 😅. Then this indeed makes more sense to me, as it might vary for each provider!
I haven't investigated this yet, but if those columns won't be used for filtering and mainly serve as data, or if we only operate on them within the app, another possible solution could be to save those configurations as JSONB to accommodate different providers. Though the flow chart doesn't include this, I see an index clause (github_oidc_configs (repository_owner_id, repository_name)
) in the schema, which means we might need these as filtering columns rather than just relying on either id
or crate_id
, so this might not be suitable?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
which means we might need these as filtering columns
the way it works is that we receive a JWT with various "claims". the iss
claim (issuer) tells us where the JWT is coming from (e.g. GitHub Actions), which lets us select the correct table to search for corresponding Trusted Publishing configurations. the rest of the provider-specific claims are then used to find one or more matching configurations (multiple crates could be in the same repository). from these configurations we then have a list of crate IDs, that the newly generated temporary access token is valid for.
so yes, these columns are mainly used for filtering to find matching configurations for a JWT.
@@ -0,0 +1,64 @@ | |||
create table github_oidc_configs |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should this be names oidc_github_configs
instead so that all the related tables use a oidc_
prefix?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since this would only be used for a trusted publishing service, should we add trusted publishing (or its abbreviation/alias) to the table name? This is considering we might have other services that also have their own OIDC-related configurations. However, we could also rename it later when we actually need to, though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've avoided trusted_publishing_
so far because it seems quite long, and for tp_
it isn't really obvious what it stands for, but I'm open to suggestions :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe we could add _publisher
after the provider (e.g., github_publisher
), similar to how PyPI has the _publishers
suffix.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should this be names
oidc_github_configs
instead so that all the related tables use aoidc_
prefix?
I think that makes sense.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've avoided
trusted_publishing_
so far because it seems quite long, and fortp_
it isn't really obvious what it stands for, but I'm open to suggestions :)
I just came up with trustpub_
. Short and somewhat self-describing with enough context. I think I prefer that to oidc_
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I changed the names to trustpub_configs_github
, trustpub_tokens
and trustpub_used_jtis
comment on column oidc_tokens.crate_id is 'Unique identifier of the crate that can be published using this token'; | ||
comment on column oidc_tokens.hashed_token is 'SHA256 hash of the token that can be used to publish the crate'; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
a JWT can generally be used to publish multiple crates, if they all have a similar Trusted Publishing configuration for the same repository. during the exchange request the API server would create a dedicated oidc_tokens
row for each crate with the same hashed_token
.
would it be better to use a crate_ids
array instead of row-per-crate? I'm not sure if an array would be able to support the references crates
above.
on the other hand, the current design does not prevent hash collisions since a uniqueness constraint on only the hashed_token
column is not possible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I forget how arrays work when used as foreign keys. Seems simpler to leave it the way you have it here with multiple rows and no constraint on the hashed_token
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I forget how arrays work when used as foreign keys
according to the internet they are not implemented at all, but since the tokens are short-lived anyway it's probably not a huge deal to have IDs in the array that no longer correspond to an existing crate. since the crate IDs are unique and won't get reassigned we shouldn't be vulnerable to resurrection attacks or anything like that.
I've now adjusted the PR and the prototype to use the one-row-per-token approach with a crate_ids: int[]
column, which then gives us the extra safety regarding hash collisions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me overall, just a few minor comments/questions.
comment on column oidc_tokens.crate_id is 'Unique identifier of the crate that can be published using this token'; | ||
comment on column oidc_tokens.hashed_token is 'SHA256 hash of the token that can be used to publish the crate'; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I forget how arrays work when used as foreign keys. Seems simpler to leave it the way you have it here with multiple rows and no constraint on the hashed_token
.
@@ -0,0 +1,64 @@ | |||
create table github_oidc_configs |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should this be names
oidc_github_configs
instead so that all the related tables use aoidc_
prefix?
I think that makes sense.
/// GitHub user or organization that owns the repository | ||
repository_owner -> Varchar, | ||
/// Unique identifier of the user or organization that owns the repository | ||
repository_owner_id -> Int4, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason we need to store both string and ID versions of this? Also, since this isn't a foreign key, I assume it is the GH ID and not any of our own internal IDs. If so, maybe we should tweak the comment to clarify that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
since this isn't a foreign key, I assume it is the GH ID and not any of our own internal IDs. If so, maybe we should tweak the comment to clarify that.
good catch. I actually updated the comments in the migration script already to reflect that, but forgot to regenerate the schema file 😅
Is there a reason we need to store both string and ID versions of this?
storing the string is useful to be able to show it in the list of active trusted publishing configs for a crate. but also, if you rename a repository then the ID will stay the same, but the name will be different. in that case we probably want the publish to fail and require a reconfiguration?
7378247
to
19c4ba0
Compare
19c4ba0
to
fea3aa1
Compare
This PR creates the three necessary database tables to implement Trusted Publishing on crates.io:
trustpub_configs_github
table is used to store the Trusted Publishing configurations for GitHub Actions. Users will be able to use the web interface to create new configurations, or remove existing ones. Therepository_owner_id
column will be looked up from the GitHub API when the configuration is created to avoid account resurrection attacks.trustpub_tokens
table contains temporary access tokens that can be used to publish corresponding crates. They can be created by exchanging a GitHub Actions OIDC token for a temporary access token, if the OIDC token corresponds to an existing Trusted Publising configuration.trustpub_used_jtis
table is used to ensure that OIDC tokens can only once be exchanged for temporary access tokens to avoid potential replay attacks.Related: