✨ Add an internal package that implements a dynamic caching layer for ClusterExtension managed content #1001


Merged

Conversation

everettraven
Contributor

@everettraven everettraven commented Jul 1, 2024

Description

  • Adds a new internal package internal/contentmanager that has an implementation of a dynamic caching layer for managing watches on content managed by a ClusterExtension

Fixes #974

A huge shoutout to @OchiengEd for collaborating with me on this PR; his contributions were lost when the commits were squashed. @OchiengEd drove while we pair programmed and wrote most of the logic here (I started the branch with a skeleton, which is why I am creating the PR).

Reviewer Checklist

  • API Go Documentation
  • Tests: Unit Tests (and E2E Tests, if appropriate)
  • Comprehensive Commit Messages
  • Links to related GitHub Issue(s)

@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jul 1, 2024

netlify bot commented Jul 1, 2024

Deploy Preview for olmv1 ready!

Name Link
🔨 Latest commit 0b03e11
🔍 Latest deploy log https://app.netlify.com/sites/olmv1/deploys/668ed555e2f28500080899ca
😎 Deploy Preview https://deploy-preview-1001--olmv1.netlify.app


codecov bot commented Jul 8, 2024

Codecov Report

Attention: Patch coverage is 68.53933% with 28 lines in your changes missing coverage. Please review.

Project coverage is 77.00%. Comparing base (10ebfdc) to head (0b03e11).
Report is 4 commits behind head on main.

Files Patch % Lines
internal/contentmanager/contentmanager.go 68.53% 20 Missing and 8 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1001      +/-   ##
==========================================
- Coverage   78.05%   77.00%   -1.05%     
==========================================
  Files          18       19       +1     
  Lines        1253     1357     +104     
==========================================
+ Hits          978     1045      +67     
- Misses        193      222      +29     
- Partials       82       90       +8     
Flag Coverage Δ
e2e 56.44% <ø> (-0.21%) ⬇️
unit 53.86% <68.53%> (+0.39%) ⬆️


@everettraven everettraven force-pushed the feature/managed-content-cache branch from d202e6d to 09a8f23 Compare July 9, 2024 14:44
@openshift-merge-robot openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jul 9, 2024
@everettraven everettraven force-pushed the feature/managed-content-cache branch from 09a8f23 to ed56a32 Compare July 9, 2024 14:49
@openshift-merge-robot openshift-merge-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jul 9, 2024
@everettraven everettraven marked this pull request as ready for review July 9, 2024 14:49
@everettraven everettraven requested a review from a team as a code owner July 9, 2024 14:49
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jul 9, 2024
@everettraven everettraven changed the title ✨ WIP: Create a Go pkg with a managed content caching layer ✨ Add an internal package that implements a dynamic caching layer for ClusterExtension managed content Jul 9, 2024
oclabels "github.com/operator-framework/operator-controller/internal/labels"
)

type ContentManager interface {
Contributor Author

Typically I wouldn't define the interface where it is implemented, but we wanted to have a good starting point without having to wire everything together (scoped for #975) and created this interface based on what we think would be needed by the ClusterExtensionReconciler.

This interface and code will likely get moved around a bit when we wire everything up in #975

Contributor

I was about to note that it hasn't been wired up.

Member

Typically I wouldn't define the interface where it is implemented

I know this was on advice from Andy, but I haven't gotten used to this pattern at all and I think we may be taking it to an extreme. Looking at the Go standard library, the locations of interfaces are fairly intuitive: io.Reader lives in the io package, where there are lots of functions that accept or return it, and json.Marshaler lives in encoding/json, where the JSON marshaling/unmarshaling implementations exist.

To me, it is super-unintuitive for our controllers package to be the place where the InstalledBundleGetter and Preflight interfaces are defined.

type InstalledBundleGetter interface {
	GetInstalledBundle(ctx context.Context, ext *ocv1alpha1.ClusterExtension) (*ocv1alpha1.BundleMetadata, error)
}

// Preflight is a check that should be run before making any changes to the cluster
type Preflight interface {
	// Install runs checks that should be successful prior
	// to installing the Helm release. It is provided
	// a Helm release and returns an error if the
	// check is unsuccessful
	Install(context.Context, *release.Release) error

	// Upgrade runs checks that should be successful prior
	// to upgrading the Helm release. It is provided
	// a Helm release and returns an error if the
	// check is unsuccessful
	Upgrade(context.Context, *release.Release) error
}

Contributor Author

I think the idea is more that the interfaces are defined where they are used. That being said, this isn't a hill I'm willing to die on, but it is a pattern I've come to appreciate.
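
For illustration, a minimal sketch of that pattern, with the consuming package declaring only the behavior it depends on. The package and type names below are purely illustrative, not the actual code in this PR:

// Hypothetical consumer package: it defines the narrow interface it needs
// and holds that interface rather than a concrete implementation type.
package controllers

import "context"

// ContentWatcher is the behavior this reconciler requires; any other package
// (e.g. an internal contentmanager) can satisfy it without importing this one.
type ContentWatcher interface {
	Watch(ctx context.Context, objs ...any) error
}

type Reconciler struct {
	Watcher ContentWatcher
}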

Contributor

I don't have a problem with it. It's not C/C++ where you want to hide the implementation behind a header file. The way golang is compiled, it really doesn't matter.

Comment on lines +171 to +173
// TODO: Instead of stopping the existing cache and replacing it every time
// we should stop the informers that are no longer required
// and create any new ones as necessary. To keep the initial pass
// simple, we are going to keep this as is and optimize in a follow-up.
// Doing this in a follow-up gives us the opportunity to verify that this functions
// as expected when wired up in the ClusterExtension reconciler before going too deep
// in optimizations.
Contributor Author

This will be done in a follow-up
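
For illustration, a rough sketch of what that follow-up optimization could look like: diff the desired set of GVKs against the informers already running and only stop/start what changed, instead of tearing the whole cache down. All names below are hypothetical and not part of this PR:

// Hypothetical sketch: track running informers by GVK and reconcile them
// against the desired set rather than cancelling the whole cache.
package contentmanager

import (
	"context"

	"k8s.io/apimachinery/pkg/runtime/schema"
)

type informerSet struct {
	// cancel funcs for the informers currently running, keyed by GVK
	running map[schema.GroupVersionKind]context.CancelFunc
	// start launches an informer for a GVK and returns a func that stops it
	start func(ctx context.Context, gvk schema.GroupVersionKind) context.CancelFunc
}

// sync stops informers that are no longer desired and starts any new ones.
func (s *informerSet) sync(ctx context.Context, desired map[schema.GroupVersionKind]struct{}) {
	for gvk, stop := range s.running {
		if _, ok := desired[gvk]; !ok {
			stop()
			delete(s.running, gvk)
		}
	}
	for gvk := range desired {
		if _, ok := s.running[gvk]; !ok {
			s.running[gvk] = s.start(ctx, gvk)
		}
	}
}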

Member

If we hooked this up as is, what would end up triggering a cache teardown and re-creation? Would it be just when we call actionclient.Upgrade? Or would we do it even after actionclient.Reconcile?

I think if it is just for an upgrade, we can probably get by because that would only happen when the resolved bundle changes or (eventually when we support it) the user-supplied configuration changes.

Contributor Author

Hooked up as is, I would assume we only call this whenever the identified state is one of Install or Upgrade

Contributor Author
@everettraven everettraven Jul 10, 2024

To clarify, by identified state I mean the same state that is used in a few other places in the reconcile function of the ClusterExtensionReconciler, so I imagine something like what we have for preflights here:

switch state {
case stateNeedsInstall:
	err := preflight.Install(ctx, desiredRel)
	if err != nil {
		setInstalledStatusConditionFailed(ext, fmt.Sprintf("%s:%v", ocv1alpha1.ReasonInstallationFailed, err))
		return ctrl.Result{}, err
	}
case stateNeedsUpgrade:
	err := preflight.Upgrade(ctx, desiredRel)
	if err != nil {
		setInstalledStatusConditionFailed(ext, fmt.Sprintf("%s:%v", ocv1alpha1.ReasonInstallationFailed, err))
		return ctrl.Result{}, err
	}
}
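
As a rough sketch of how the content manager might slot into that same switch once wired up, the contentManager.Watch call and relObjects below are hypothetical names used only for illustration, not the API introduced in this PR:

switch state {
case stateNeedsInstall, stateNeedsUpgrade:
	// Hypothetical: establish watches on the objects managed by this release.
	if err := contentManager.Watch(ctx, ext, relObjects...); err != nil {
		setInstalledStatusConditionFailed(ext, fmt.Sprintf("%s:%v", ocv1alpha1.ReasonInstallationFailed, err))
		return ctrl.Result{}, err
	}
}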

Contributor
@tmshort tmshort left a comment

Just a few comments.

}

ctx, cancel := context.WithCancel(ctx)
go c.Start(ctx) //nolint:errcheck
Member

No error check? What happens when Start() returns an error and we don't check it?

Contributor Author

I was scratching my head on how to resolve this problem and couldn't come up with anything because the Start() method is blocking. The only thing I could think of is maybe something like:

errChan := make(chan error)
go func(){
  err := c.Start(ctx)
  if err != nil {
    errChan <- err
  }
}()
// handle errors down here

But I'm not sure what would happen if the function that created the channel had already exited.

I decided to go the route of not doing an error check here, but instead waiting for the cache to sync as the next step. If the cache could not sync, we cancel the context and return an error. Otherwise, the cache has started and we should be good to go.

I'm open to suggestions for better ways to handle this, so if you have some ideas here I'm all ears!
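
For reference, a minimal sketch of that start-then-wait-for-sync approach against the controller-runtime cache interface; the surrounding variable names (c, ce) are assumptions for illustration:

// Start the cache in a goroutine; Start blocks until the context is cancelled.
ctx, cancel := context.WithCancel(ctx)
go c.Start(ctx) //nolint:errcheck

// Block until the informers have synced; if they cannot, tear down and error out.
if !c.WaitForCacheSync(ctx) {
	cancel()
	return fmt.Errorf("unable to sync cache for ClusterExtension %q", ce.Name)
}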

Contributor

You'll end up blocking on the channel, because you don't know exactly when the error will come in, if at all. You should have the error checking, but it should call something.Cancel() on error. This may mean you need to move the Start() call below the addition to i.extensionCaches[]. The Cancel() function will remove the item from the caches. This way, Cancel() is called regardless.

Contributor
@tmshort tmshort Jul 10, 2024

Suggested change
go c.Start(ctx) //nolint:errcheck
i.extensionCaches[ce.Name] = extensionCacheData{
	Cache:  c,
	Cancel: cancel,
}
go func() {
	err := c.Start(ctx)
	if err != nil {
		cancel()
	}
}()

Contributor Author

I may need a synchronous chat to better understand how this should be handled. I haven't had a scenario like this before, and my brain is failing me on understanding the approach you are suggesting, @tmshort.

Contributor Author

Should be addressed in the latest push. Thanks for the code example @tmshort !

@everettraven everettraven force-pushed the feature/managed-content-cache branch from a6df505 to ef5558b Compare July 10, 2024 14:29
@everettraven everettraven force-pushed the feature/managed-content-cache branch from 10516e7 to 0b03e11 Compare July 10, 2024 18:39
Contributor
@tmshort tmshort left a comment

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Jul 11, 2024
@joelanford joelanford added this pull request to the merge queue Jul 11, 2024
Merged via the queue into operator-framework:main with commit 30c61b5 Jul 11, 2024
15 of 17 checks passed
perdasilva pushed a commit to LalatenduMohanty/operator-controller that referenced this pull request Aug 13, 2024
@skattoju skattoju mentioned this pull request Sep 25, 2024
Labels
lgtm Indicates that a PR is ready to be merged.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Implement a dynamic caching layer that establishes watches utilizing a ServiceAccount
4 participants