Skip to content

Ansible Operator playbook / role getting executed twice after initial operator installation, without any modification in resources. #6250

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
nataraj24 opened this issue Jan 17, 2023 · 11 comments
Assignees
Labels
language/ansible Issue is related to an Ansible operator project triage/support Indicates an issue that is a support question.

Comments

@nataraj24
Copy link

nataraj24 commented Jan 17, 2023

Type of question

General operator-related help

Question

What did you do?

After installing sample memcached operator as explained in operator-sdk documentation. the sample playbook/role is executed twice even after successful completion of the playbook. Even for first time even if there is no change in any of resources.

What did you expect to see?

Playbook has to be executed only once.

What did you see instead? Under which circumstances?

Below are status of play book execution after first time installation. Full log is attached. memcached.log
kubectl logs deploy/memcached-operator-controller-manager -n memcached-operator-system -c manager -f | grep rescued
localhost : ok=3 changed=3 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0
localhost : ok=3 changed=0 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0

Environment

Operator type:

/language ansible

Kubernetes cluster type:

Native K8s cluster

$ operator-sdk version

operator-sdk version: "v1.25.0", commit: "3d4eb4b2de4b68519c8828f2289c2014979ccf2a", kubernetes version: "1.25.0", go version: "go1.19.2", GOOS: "linux", GOARCH: "amd64"

$ go version (if language is Go)

$ kubectl version

Client Version: version.Info{Major:"1", Minor:"19", GitVersion:"v1.19.3", GitCommit:"1e11e4a2108024935ecfcb2912226cedeafd99df", GitTreeState:"clean", BuildDate:"2020-10-14T12:50:19Z", GoVersion:"go1.15.2", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"19", GitVersion:"v1.19.3", GitCommit:"1e11e4a2108024935ecfcb2912226cedeafd99df", GitTreeState:"clean", BuildDate:"2020-10-14T12:41:49Z", GoVersion:"go1.15.2", Compiler:"gc", Platform:"linux/amd64"}

Additional context

@everettraven
Copy link
Contributor

@nataraj24 Thanks for raising this issue! I did some brief digging on this and have a question - Could you share the link to the docs that you followed?

Aside from that, looking at the logs from the first run:

--------------------------- Ansible Task StdOut -------------------------------

TASK [start memcached] *********************************************************
task path: /opt/ansible/roles/memcached/tasks/main.yml:3

-------------------------------------------------------------------------------
{"level":"info","ts":1673948838.1551049,"logger":"proxy","msg":"Cache miss: apps/v1, Kind=Deployment, default/memcached-sample-memcached"}
{"level":"info","ts":1673948838.161831,"logger":"proxy","msg":"Injecting owner reference"}
{"level":"info","ts":1673948838.162155,"logger":"proxy","msg":"Watching child resource","kind":"apps/v1, Kind=Deployment","enqueue_kind":"cache.example.com/v1alpha1, Kind=Memcached"}
{"level":"info","ts":1673948838.1621983,"msg":"Starting EventSource","controller":"memcached-controller","source":"kind source: *unstructured.Unstructured"}

--------------------------- Ansible Task StdOut -------------------------------

TASK [memcached : Create Secret] ***********************************************
task path: /opt/ansible/roles/memcached/tasks/main.yml:33

-------------------------------------------------------------------------------
{"level":"info","ts":1673948838.4395416,"logger":"logging_event_handler","msg":"[playbook task start]","name":"memcached-sample","namespace":"default","gvk":"cache.example.com/v1alpha1, Kind=Memcached","event_type":"playbook_on_task_start","job":"7635955712780123854","EventData.Name":"memcached : Create Secret"}
{"level":"info","ts":1673948840.2461214,"logger":"proxy","msg":"Skipping, because gvk is blacklisted","GVK":"/v1, Kind=Secret"}
{"level":"info","ts":1673948840.2505968,"logger":"proxy","msg":"Injecting owner reference"}
{"level":"info","ts":1673948840.2508073,"logger":"proxy","msg":"Resource will not be watched/cached.","GVK":"/v1, Kind=Secret"}
{"level":"info","ts":1673948840.4567962,"logger":"logging_event_handler","msg":"[playbook task start]","name":"memcached-sample","namespace":"default","gvk":"cache.example.com/v1alpha1, Kind=Memcached","event_type":"playbook_on_task_start","job":"7635955712780123854","EventData.Name":"memcached : Create configmap"}

--------------------------- Ansible Task StdOut -------------------------------

TASK [memcached : Create configmap] ********************************************
task path: /opt/ansible/roles/memcached/tasks/main.yml:45

-------------------------------------------------------------------------------
{"level":"info","ts":1673948842.2737124,"logger":"proxy","msg":"Cache miss: /v1, Kind=ConfigMap, default/memcache-configmap"}
{"level":"info","ts":1673948842.2800245,"logger":"proxy","msg":"Injecting owner reference"}
{"level":"info","ts":1673948842.280217,"logger":"proxy","msg":"Watching child resource","kind":"/v1, Kind=ConfigMap","enqueue_kind":"cache.example.com/v1alpha1, Kind=Memcached"}
{"level":"info","ts":1673948842.2802489,"msg":"Starting EventSource","controller":"memcached-controller","source":"kind source: *unstructured.Unstructured"}
{"level":"info","ts":1673948842.9402254,"logger":"runner","msg":"Ansible-runner exited successfully","job":"7635955712780123854","name":"memcached-sample","namespace":"default"}

----- Ansible Task Status Event StdOut (cache.example.com/v1alpha1, Kind=Memcached, memcached-sample/default) -----


PLAY RECAP *********************************************************************
localhost                  : ok=3    changed=3    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0   


----------

It looks like it creates the child resources and establishes watches for these child resources.

Looking at the second run logs:

--------------------------- Ansible Task StdOut -------------------------------

TASK [start memcached] *********************************************************
task path: /opt/ansible/roles/memcached/tasks/main.yml:3

-------------------------------------------------------------------------------
{"level":"info","ts":1673948851.1720097,"logger":"proxy","msg":"Read object from cache","resource":{"IsResourceRequest":true,"Path":"/apis/apps/v1/namespaces/default/deployments/memcached-sample-memcached","Verb":"get","APIPrefix":"apis","APIGroup":"apps","APIVersion":"v1","Namespace":"default","Resource":"deployments","Subresource":"","Name":"memcached-sample-memcached","Parts":["deployments","memcached-sample-memcached"]}}
{"level":"info","ts":1673948851.4520702,"logger":"logging_event_handler","msg":"[playbook task start]","name":"memcached-sample","namespace":"default","gvk":"cache.example.com/v1alpha1, Kind=Memcached","event_type":"playbook_on_task_start","job":"1728165814478689862","EventData.Name":"memcached : Create Secret"}

--------------------------- Ansible Task StdOut -------------------------------

TASK [memcached : Create Secret] ***********************************************
task path: /opt/ansible/roles/memcached/tasks/main.yml:33

-------------------------------------------------------------------------------
{"level":"info","ts":1673948853.2526982,"logger":"proxy","msg":"Skipping, because gvk is blacklisted","GVK":"/v1, Kind=Secret"}

--------------------------- Ansible Task StdOut -------------------------------

TASK [memcached : Create configmap] ********************************************
task path: /opt/ansible/roles/memcached/tasks/main.yml:45

-------------------------------------------------------------------------------
{"level":"info","ts":1673948853.4712036,"logger":"logging_event_handler","msg":"[playbook task start]","name":"memcached-sample","namespace":"default","gvk":"cache.example.com/v1alpha1, Kind=Memcached","event_type":"playbook_on_task_start","job":"1728165814478689862","EventData.Name":"memcached : Create configmap"}
{"level":"info","ts":1673948855.2703552,"logger":"proxy","msg":"Read object from cache","resource":{"IsResourceRequest":true,"Path":"/api/v1/namespaces/default/configmaps/memcache-configmap","Verb":"get","APIPrefix":"api","APIGroup":"","APIVersion":"v1","Namespace":"default","Resource":"configmaps","Subresource":"","Name":"memcache-configmap","Parts":["configmaps","memcache-configmap"]}}
{"level":"info","ts":1673948855.9475477,"logger":"runner","msg":"Ansible-runner exited successfully","job":"1728165814478689862","name":"memcached-sample","namespace":"default"}

----- Ansible Task Status Event StdOut (cache.example.com/v1alpha1, Kind=Memcached, memcached-sample/default) -----


PLAY RECAP *********************************************************************
localhost                  : ok=3    changed=0    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0   


----------

It looks like it notices that the child resources are already created and just continues along. This makes me think that the reconciliation loop was triggered again by some change to one of the children resources (potentially even a status update) and as such the Ansible playbook was run again. This is from my understanding completely normal behavior due to the way that the watches are established.

@nataraj24
Copy link
Author

Thank you @everettraven for the revert. Following is the link which i had followed for memcached ansible operator https://sdk.operatorframework.io/docs/building-operators/ansible/quickstart/

if you notice timestamp between difference first run playbook completion and second playbook initiation its in milliseconds. so there is no change in the objects / resources.

First Playbook Run Completion: Tue Jan 17 15:17:22 IST 2023
Second Playbook Run Initiation: Tue Jan 17 15:17:28 IST 2023

Below are the full log statements.

{"level":"info","ts":1673948842.9402254,"logger":"runner","msg":"Ansible-runner exited successfully","job":"7635955712780123854","name":"memcached-sample","namespace":"default"}

----- Ansible Task Status Event StdOut (cache.example.com/v1alpha1, Kind=Memcached, memcached-sample/default) -----

PLAY RECAP *********************************************************************
localhost : ok=3 changed=3 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0


{"level":"info","ts":1673948848.1544585,"logger":"logging_event_handler","msg":"[playbook task start]","name":"memcached-sample","namespace":"default","gvk":"cache.example.com/v1alpha1, Kind=Memcached","event_type":"playbook_on_task_start","job":"1728165814478689862","EventData.Name":"start memcached"}

@jberkhahn
Copy link
Contributor

So, I would say this is normal behavior - especially when something is first installed, it's common for things to happen in different orders, so things like cache misses can happen that cause several reconciliations to fire in a row. I wouldn't really be worried about this unless it was stuck in a loop, or causing performance issues.

Is there a specific concern you have about this behavior?

@jberkhahn jberkhahn self-assigned this Jan 23, 2023
@jberkhahn jberkhahn added this to the Backlog milestone Jan 23, 2023
@jberkhahn jberkhahn added the triage/support Indicates an issue that is a support question. label Jan 23, 2023
@nataraj24
Copy link
Author

thank you @jberkhahn for revert. The behavior is not consistent as in our application ansible operator its running thrice. We are using blacklisting functionality of the operator for secrets. If an secret is rotated after installation because of this multiple runs the changes are getting over written. Logically also once the playbook is completed it shouldn't be executing it again.

@nataraj24
Copy link
Author

any resolution to overcome the above problem would be helpful

@everettraven
Copy link
Contributor

We are using blacklisting functionality of the operator for secrets. If an secret is rotated after installation because of this multiple runs the changes are getting over written.

So to make sure I am following, the problem here is that Secret changes are being reverted despite it being blacklisted?

Logically also once the playbook is completed it shouldn't be executing it again.

As far as I am aware this isn't true in terms of how the ansible-operator works. It works by executing the playbook once for every reconciliation that is performed. The fact that this playbook is running more than a single time is not really an issue as like @jberkhahn mentioned, there are plenty of reasons that multiple reconciliations may occur after creating child resources, creating watches, etc.

@nataraj24
Copy link
Author

Hi Bryce Palmer,

For query #1 - That's correct the secrets are getting recreated even though they are blacklisted.

For query #2 - For first initial operator deployment without any changes the playbook is getting invoked twice instead of single time.

watches.txt

@nataraj24
Copy link
Author

Hi Team,

Request your help on the above issues

@nataraj24
Copy link
Author

Hi Team

Kindly provide help on the issue

@jberkhahn jberkhahn removed their assignment Feb 18, 2023
@jberkhahn jberkhahn removed this from the Backlog milestone Feb 18, 2023
@everettraven
Copy link
Contributor

/assign

@everettraven
Copy link
Contributor

@nataraj24 We spoke about this issue in the ansible operator community meeting and determined that there isn't an issue here.

For query 1 - That's correct the secrets are getting recreated even though they are blacklisted.

During the community meeting we discussed that just because a resource is blacklisted doesn't mean the resource can't be recreated during a normal reconciliation process, just that any changes to that resource type won't cause a reconciliation loop. This means your operator is likely recreating the Secret during a reconciliation process triggered by some other change, therefore overwriting any changes you've previously made to that Secret resource. To fix this, you would need to implement some kind of logic that checks to see if the Secret already exists and if it does then you don't overwrite it.

For query 2 - For first initial operator deployment without any changes the playbook is getting invoked twice instead of single time.

Again, I don't think this is an issue. The playbook is run any time reconciliation is triggered. There are numerous reasons why a reconciliation may be triggered resulting in multiple runs of the playbook.


For the reasons mentioned above and that multiple people have mentioned that this is a non-issue in multiple community calls I am going to close this issue. If you encounter any other issues please feel free to open a new issue or reopen this issue if you feel it was closed in error.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
language/ansible Issue is related to an Ansible operator project triage/support Indicates an issue that is a support question.
Projects
None yet
Development

No branches or pull requests

3 participants