
✨ Add support for a chunked release storage driver #350


Merged

Conversation

joelanford
Member

This commit adds a custom helm release storage driver that overcomes limitations in the size of a single value that can be stored in etcd.

In order to remain backward-compatible and also make this storage driver available, this commit also refactors the ActionConfigGetter options so that a custom function can be provided to the ActionConfigGetter that can create the desired storage driver.

This commit also separates the rest config mapping into two separate options. One for interactions with the storage backend, and the other for managing release content.

@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jul 16, 2024
@joelanford joelanford force-pushed the chunked-release-secret-driver branch from eeef665 to 3fdf9f9 Compare July 16, 2024 19:37
@itroyano

Hi @joelanford some questions:

  1. Will this mean the Helm CLI won't be able to see/parse the "chunked" release secret we made with Create()?
  2. Some funcs have deprecation warnings. Is it possible to implement the newer ones?
  3. Does using base32 in the "chunk" reduce the size of the resulting secret?
  4. I don't see an explicit call to get a release. Will the helm.sh client be calling it when we use it in the code?

@joelanford
Member Author

@itroyano

Will this mean the Helm CLI won't be able to see/parse the "chunked" release secret we made with Create()?

Yes, and that is desirable because:

  1. Our use of helm libraries is an implementation detail that helm CLI users should not know about.
  2. The Helm CLI would barf on them anyway.

Some funcs have deprecation warnings. Is it possible to implement the newer ones?

Yes, the newer ones are already implemented in this PR, and the deprecation comments on the exported functions tell users what to switch to. Technically, because helm-operator-plugins is pre-1.0, we could remove them instead of deprecating, but I figure we can let them stay for at least a few more releases.

Does using base32 in the "chunk" reduce the size of the resulting secret?

Base32 is just used to encode a hash for the chunk as a string that is short enough and valid as both a metadata.name and a label value. It is not used for the chunk data itself, which is written directly as gzipped binary byte slices.

I don't see an explicit call to get a release. Will the helm.sh client be calling it when we use it in the code?

The storage driver is set up in the helm library's action.Configuration.Releases field. Other parts of the helm library use the action.Configuration struct to interact with the cluster.

@joelanford joelanford force-pushed the chunked-release-secret-driver branch from 3fdf9f9 to 637a156 Compare July 17, 2024 17:05
@joelanford joelanford force-pushed the chunked-release-secret-driver branch from 637a156 to fb76722 Compare July 17, 2024 17:11
@codecov-commenter

codecov-commenter commented Jul 17, 2024

Codecov Report

Attention: Patch coverage is 80.00000% with 69 lines in your changes missing coverage. Please review.

Project coverage is 51.85%. Comparing base (08ab7fb) to head (14906f8).
Report is 27 commits behind head on main.

Files Patch % Lines
pkg/storage/chunked.go 80.08% 29 Missing and 18 partials ⚠️
pkg/client/actionconfig.go 67.92% 12 Missing and 5 partials ⚠️
pkg/client/ownerrefclient.go 80.76% 3 Missing and 2 partials ⚠️

❗ There is a different number of reports uploaded between BASE (08ab7fb) and HEAD (14906f8). Click for more details.

HEAD has 1 upload less than BASE (BASE (08ab7fb): 2 uploads, HEAD (14906f8): 1 upload).
Additional details and impacted files
@@             Coverage Diff             @@
##             main     #350       +/-   ##
===========================================
- Coverage   85.06%   51.85%   -33.22%     
===========================================
  Files          19       65       +46     
  Lines        1346     3238     +1892     
===========================================
+ Hits         1145     1679      +534     
- Misses        125     1456     +1331     
- Partials       76      103       +27     


@joelanford joelanford marked this pull request as ready for review July 17, 2024 18:02
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jul 17, 2024
@joelanford joelanford force-pushed the chunked-release-secret-driver branch from fb76722 to 6c3032e Compare July 17, 2024 18:07
@everettraven
Contributor

everettraven commented Jul 17, 2024

It's cool to see a custom helm storage driver!

I do have a more fundamental question - Is chunking/partitioning across multiple secrets a scalable approach? I don't necessarily think it needs to be optimized immediately, but considering this is likely going to be used by OLMv1, do we know:

  • The load on the API server for a typical install via the ClusterExtension API (maybe a better question: how many secrets are created on average when installing a package from the community catalog with this change?)
  • The load on the API server when getting the release information from the chunked secrets using a typical install via ClusterExtension (IIRC we fetch the release information to determine state on every reconcile)?
  • From OLMv0, the average number of operators installed on a cluster?

While I do like the concept, I'm not confident that a "chunked" secrets driver is going to be the best approach here just out of scalability and API server load concerns. Additionally, this would require OLMv1 as a user of this API to have access to secrets (which isn't any worse than it is now), is this something we are OK with?

Could a potential alternative to utilizing secrets be some other in-memory or file-based storage mechanism? Maybe something like https://github.com/dgraph-io/badger or BoltDB would be a good embeddable key-value store option?

EDIT: For clarification, if data shows that the concerns I expressed in this comment are not realistically a concern then I am happy to look at this PR again with a new frame on the changes

@joelanford
Member Author

I do have a more fundamental question - Is chunking/partitioning across multiple secrets a scalable approach?

I think it can be. We could impose a maximum number of chunks to prevent unchecked release size. We could also use a controller-runtime cache to read secrets instead of the default live client.

I don't necessarily think it needs to be optimized immediately, but considering this is likely going to be used by OLMv1, do we know:

  • The load on the API server for a typical install via the ClusterExtension API (maybe a better question: how many secrets are created on average when installing a package from the community catalog with this change?)

Anecdotally, I tried reproducing operator-framework/operator-controller#923, and it only created one secret because this storage driver doesn't do an extra/unnecessary base64-encode (that the helm secrets driver does). I'm speculating, but I would guess this driver would create 1 secret on average from the community catalog, so no more than are already created.

  • The load on the API server when getting the release information from the chunked secrets using a typical install via ClusterExtension (IIRC we fetch the release information to determine state on every reconcile)?

It would only be increased for the releases that require chunking. One thing we could do relatively easily in the future is implement a SecretInterface that actually reads from the controller-runtime cache.

  • From OLMv0, the average number of operators installed on a cluster?

I think I've seen metrics of ~10 on average.

While I do like the concept, I'm not confident that a "chunked" secrets driver is going to be the best approach here just out of scalability and API server load concerns. Additionally, this would require OLMv1 as a user of this API to have access to secrets (which isn't any worse than it is now), is this something we are OK with?

This is actually part of the intent of this PR. OLM should be the thing that has access to secrets. We already have the ability to store all of these secrets in OLMv1's system namespace. This PR will also make it possible to use a different service account to manage the release secrets.

We would give OLMv1's service account read/write access to secrets just in its namespace AND we would not require users to provide service accounts with r/w secret permissions, which is good because requiring that would leak an implementation detail that users should not be concerned with.

Could a potential alternative to utilizing secrets be some other in-memory or file-based storage mechanism? Maybe something like dgraph-io/badger or BoltDB are some good embeddable key-value store options?

This release data needs to persist beyond the lifetime of an individual operator-controller pod. As we've discovered in other areas, the only reliably available persistence mechanism is Kubernetes API + etcd.

@joelanford joelanford force-pushed the chunked-release-secret-driver branch 4 times, most recently from 41f17ea to 07de0c3 Compare July 19, 2024 14:39
This commit adds a custom helm release storage driver that overcomes
limitations in the size of a single value that can be stored in etcd.

In order to remain backward-compatible and also make this storage
driver available, this commit also refactors the ActionConfigGetter
options so that a custom function can be provided to the
ActionConfigGetter that can create the desired storage driver.

This commit also separates the rest config mapping into two separate
options. One for interactions with the storage backend, and the other
for managing release content.

Signed-off-by: Joe Lanford <[email protected]>
@joelanford joelanford force-pushed the chunked-release-secret-driver branch from 07de0c3 to 14906f8 Compare July 19, 2024 20:30
labels["status"] = rls.Info.Status.String()
labels["version"] = strconv.Itoa(rls.Version)
labels["key"] = key
labels["type"] = "index"

Should this be a constant?

labels := map[string]string{}
labels["owner"] = owner
labels["key"] = key
labels["type"] = "chunk"

Should this be a constant?


@everettraven everettraven left a comment


Couple of minor comments, but non-blocking IMO.

@everettraven everettraven added this pull request to the merge queue Jul 22, 2024
Merged via the queue into operator-framework:main with commit 2e18c5b Jul 22, 2024
6 checks passed
@joelanford joelanford deleted the chunked-release-secret-driver branch July 24, 2024 13:46