
✨ Add support for a chunked release storage driver #350


Merged

Conversation

joelanford
Member

This commit adds a custom helm release storage driver that overcomes limitations in the size of a single value that can be stored in etcd.

In order to remain backward-compatible and also make this storage driver available, this commit also refactors the ActionConfigGetter options so that a custom function can be provided to the ActionConfigGetter that can create the desired storage driver.

This commit also separates the rest config mapping into two separate options. One for interactions with the storage backend, and the other for managing release content.

@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jul 16, 2024
@joelanford joelanford force-pushed the chunked-release-secret-driver branch from eeef665 to 3fdf9f9 Compare July 16, 2024 19:37
@itroyano

Hi @joelanford some questions:

  1. Will this mean the Helm CLI won't be able to see/parse the "chunked" release secret we made with Create()?
  2. Some funcs have deprecation warnings. Is it possible to implement the newer ones?
  3. Does using base32 in the "chunk" reduce the size of the resulting secret?
  4. I don't see an explicit call to get a release. Will the helm.sh client be calling it when we use it in the code?

@joelanford
Member Author

@itroyano

Will this mean the Helm CLI won't be able to see/parse the "chunked" release secret we made with Create()?

Yes, and that is desirable because:

  1. Our use of helm libraries is an implementation detail that helm CLI users should not know about.
  2. The Helm CLI would barf on them anyway.

Some funcs have deprecation warnings. Is it possible to implement the newer ones?

Yes, the newer ones are already implemented in this PR, and the deprecation comments on the exported functions tell users what to switch to. Technically, because helm-operator-plugins is pre-1.0, we could remove them instead of deprecating, but I figure we can let them stay for at least a few more releases.

Does using base32 in the "chunk" reduce the size of the resulting secret?

Base32 is just used to encode a hash for the chunk as a string that is short enough and valid as both a metadata.name and a label value. It is not used for the chunk data itself, which is written directly as gzipped binary byte slices.

I don't see an explicit call to get a release. Will the helm.sh client be calling it when we use it in the code?

The storage driver is set up in the helm library's action.Configuration.Releases field. Other parts of the helm library use the action.Configuration struct to interact with the cluster.

@joelanford joelanford force-pushed the chunked-release-secret-driver branch from 3fdf9f9 to 637a156 Compare July 17, 2024 17:05
@joelanford joelanford force-pushed the chunked-release-secret-driver branch from 637a156 to fb76722 Compare July 17, 2024 17:11
@codecov-commenter

codecov-commenter commented Jul 17, 2024

Codecov Report

Attention: Patch coverage is 80.00000% with 69 lines in your changes missing coverage. Please review.

Project coverage is 51.85%. Comparing base (08ab7fb) to head (14906f8).
Report is 27 commits behind head on main.

Files Patch % Lines
pkg/storage/chunked.go 80.08% 29 Missing and 18 partials ⚠️
pkg/client/actionconfig.go 67.92% 12 Missing and 5 partials ⚠️
pkg/client/ownerrefclient.go 80.76% 3 Missing and 2 partials ⚠️

❗ There is a different number of reports uploaded between BASE (08ab7fb) and HEAD (14906f8). Click for more details.

HEAD has 1 upload less than BASE (BASE (08ab7fb): 2 uploads, HEAD (14906f8): 1 upload).
Additional details and impacted files
@@             Coverage Diff             @@
##             main     #350       +/-   ##
===========================================
- Coverage   85.06%   51.85%   -33.22%     
===========================================
  Files          19       65       +46     
  Lines        1346     3238     +1892     
===========================================
+ Hits         1145     1679      +534     
- Misses        125     1456     +1331     
- Partials       76      103       +27     


@joelanford joelanford marked this pull request as ready for review July 17, 2024 18:02
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jul 17, 2024
@joelanford joelanford force-pushed the chunked-release-secret-driver branch from fb76722 to 6c3032e Compare July 17, 2024 18:07
@everettraven
Contributor

everettraven commented Jul 17, 2024

It's cool to see a custom helm storage driver!

I do have a more fundamental question - Is chunking/partitioning across multiple secrets a scalable approach? I don't necessarily think it needs to be optimized immediately, but considering this is likely going to be used by OLMv1, do we know:

  • The load on the API server for a typical install via the ClusterExtension API (maybe a better question: how many secrets are created on average when installing a package from the community catalog with this change?)
  • The load on the API server when getting the release information from the chunked secrets using a typical install via ClusterExtension (IIRC we fetch the release information to determine state on every reconcile)?
  • From OLMv0, the average number of operators installed on a cluster?

While I do like the concept, I'm not confident that a "chunked" secrets driver is going to be the best approach here just out of scalability and API server load concerns. Additionally, this would require OLMv1 as a user of this API to have access to secrets (which isn't any worse than it is now), is this something we are OK with?

Could a potential alternative to utilizing secrets be some other in-memory or file-based storage mechanism? Maybe something like https://github.com/dgraph-io/badger or BoltDB would be a good embeddable key-value store option?

EDIT: For clarification, if data shows that the concerns I expressed in this comment are not realistically a concern then I am happy to look at this PR again with a new frame on the changes

@joelanford
Member Author

I do have a more fundamental question - Is chunking/partitioning across multiple secrets a scalable approach?

I think it can be. We could impose a maximum number of chunks to prevent unchecked release size. We could also use a controller-runtime cache to read secrets instead of the default live client.

I don't necessarily think it needs to be optimized immediately, but considering this is likely going to be used by OLMv1, do we know:

  • The load on the API server for a typical install via the ClusterExtension API (maybe a better question: how many secrets are created on average when installing a package from the community catalog with this change?)

Anecdotally, I tried reproducing operator-framework/operator-controller#923, and it only created one secret because this storage driver doesn't do an extra/unnecessary base64-encode (that the helm secrets driver does). I'm speculating, but I would guess this driver would create 1 secret on average from the community catalog, so no more than are already created.

  • The load on the API server when getting the release information from the chunked secrets using a typical install via ClusterExtension (IIRC we fetch the release information to determine state on every reconcile)?

It would only be increased for the releases that require chunking. One thing we could do relatively easily in the future is implement a SecretInterface that actually reads from the controller-runtime cache.

  • From OLMv0, the average number of operators installed on a cluster?

I think I've seen metrics of ~10 on average.

While I do like the concept, I'm not confident that a "chunked" secrets driver is going to be the best approach here just out of scalability and API server load concerns. Additionally, this would require OLMv1 as a user of this API to have access to secrets (which isn't any worse than it is now), is this something we are OK with?

This is actually part of the intent of this PR. OLM should be the thing that has access to secrets. We already have the ability to store all of these secrets in OLMv1's system namespace. This PR will also make it possible to use a different service account to manage the release secrets.

We would give OLMv1's service account read/write access to secrets just in its namespace AND we would not require users to provide service accounts with r/w secret permissions, which is good because requiring that would leak an implementation detail that users should not be concerned with.

Could a potential alternative to utilizing secrets be some other in-memory or file-based storage mechanism? Maybe something like dgraph-io/badger or BoltDB are some good embeddable key-value store options?

This release data needs to persist beyond the lifetime of an individual operator-controller pod. As we've discovered in other areas, the only reliably available persistence mechanism is Kubernetes API + etcd.

@joelanford joelanford force-pushed the chunked-release-secret-driver branch 4 times, most recently from 41f17ea to 07de0c3 Compare July 19, 2024 14:39
This commit adds a custom helm release storage driver that overcomes
limitations in the size of a single value that can be stored in etcd.

In order to remain backward-compatible and also make this storage
driver available, this commit also refactors the ActionConfigGetter
options so that a custom function can be provided to the
ActionConfigGetter that can create the desired storage driver.

This commit also separates the rest config mapping into two separate
options. One for interactions with the storage backend, and the other
for managing release content.

Signed-off-by: Joe Lanford <[email protected]>
@joelanford joelanford force-pushed the chunked-release-secret-driver branch from 07de0c3 to 14906f8 Compare July 19, 2024 20:30
labels["status"] = rls.Info.Status.String()
labels["version"] = strconv.Itoa(rls.Version)
labels["key"] = key
labels["type"] = "index"

Should this be a constant?

labels := map[string]string{}
labels["owner"] = owner
labels["key"] = key
labels["type"] = "chunk"

Should this be a constant?


@everettraven everettraven left a comment


Couple of minor comments, but non-blocking IMO.

@everettraven everettraven added this pull request to the merge queue Jul 22, 2024
Merged via the queue into operator-framework:main with commit 2e18c5b Jul 22, 2024
6 checks passed
@joelanford joelanford deleted the chunked-release-secret-driver branch July 24, 2024 13:46