Release lock when ansible operator pod is deleted #2242

Closed
flickerfly opened this issue Nov 19, 2019 · 2 comments
Labels
kind/feature Categorizes issue or PR as related to a new feature. language/ansible Issue is related to an Ansible operator project

Comments

@flickerfly
Contributor

Feature Request

Is your feature request related to a problem? Please describe.
When I delete a pod from my Ansible operator, the new pod comes up and finds a lock still held by the old pod. It seems to wait for an expiration timer, or perhaps detects that the old pod is gone, before it picks up the lock and moves on with its tasks. This causes a small delay in processing.

Describe the solution you'd like
When a pod is deleted, it would be nice if releasing the lock were part of its shutdown. That would shrink the window during the transition in which the operator's responsibilities go unperformed. The new pod could simply look for the lock, find none, grab it, and go.
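To make the idea concrete, here's a hand-wavy sketch of the kind of shutdown hook I'm imagining. The lock name my-operator-lock, the POD_NAMESPACE variable, and the recent client-go call signatures are assumptions for illustration only, not how the SDK actually behaves:

```go
package main

import (
	"context"
	"log"
	"os"
	"os/signal"
	"syscall"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

func main() {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		log.Fatal(err)
	}
	client, err := kubernetes.NewForConfig(cfg)
	if err != nil {
		log.Fatal(err)
	}

	// ... normal operator startup: become leader, start reconcile loops ...

	// On SIGTERM, delete the lock so the replacement pod doesn't have to wait.
	sigCh := make(chan os.Signal, 1)
	signal.Notify(sigCh, syscall.SIGTERM)
	<-sigCh

	// "POD_NAMESPACE" and the lock name "my-operator-lock" are made up for
	// this example; substitute whatever the deployment actually uses.
	ns := os.Getenv("POD_NAMESPACE")
	if err := client.CoreV1().ConfigMaps(ns).Delete(
		context.Background(), "my-operator-lock", metav1.DeleteOptions{}); err != nil {
		log.Printf("failed to release lock: %v", err)
	}
}
```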

Here are the logs that show how this happens. At the time these lines were logged, pod my-operator-6bb9c45f77-wd2sb was already in a Terminated state.

{"level":"info","ts":1574199519.2566407,"logger":"leader","msg":"Trying to become the leader."}
{"level":"info","ts":1574199520.6176782,"logger":"leader","msg":"Found existing lock","LockOwner":"my-operator-6bb9c45f77-wd2sb"}
{"level":"info","ts":1574199520.621468,"logger":"leader","msg":"Not the leader. Waiting."}
{"level":"info","ts":1574199521.746846,"logger":"leader","msg":"Not the leader. Waiting."}
{"level":"info","ts":1574199524.127411,"logger":"leader","msg":"Not the leader. Waiting."}
{"level":"info","ts":1574199528.6667314,"logger":"leader","msg":"Not the leader. Waiting."}
{"level":"info","ts":1574199537.5454018,"logger":"leader","msg":"Not the leader. Waiting."}
{"level":"info","ts":1574199554.9148295,"logger":"leader","msg":"Not the leader. Waiting."}
{"level":"info","ts":1574199573.1443484,"logger":"leader","msg":"Became the leader."}
@joelanford added the language/ansible and kind/feature labels Nov 20, 2019
@joelanford
Member

@flickerfly The leader-for-life approach to leader election used by the Ansible operator makes use of owner references and Kubernetes garbage collection to guarantee that the lock is deleted only after the leader pod has been deleted. This approach prevents any possibility of split brain, where two pods act as leader simultaneously.
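Roughly, the lock is just a ConfigMap that lists the leader pod as its owner, so garbage collection can't remove it until the pod itself is gone. This is a simplified sketch rather than the actual SDK code, and it assumes a recent client-go and a POD_NAME injected via the downward API:

```go
package leader

import (
	"context"
	"os"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// tryBecomeLeader attempts to create the lock ConfigMap. While another pod's
// lock still exists, Create fails with AlreadyExists and the caller has to
// wait and retry.
func tryBecomeLeader(ctx context.Context, client kubernetes.Interface, lockName, namespace string) error {
	// The pod name is assumed to be injected via the downward API.
	pod, err := client.CoreV1().Pods(namespace).Get(ctx, os.Getenv("POD_NAME"), metav1.GetOptions{})
	if err != nil {
		return err
	}

	lock := &corev1.ConfigMap{
		ObjectMeta: metav1.ObjectMeta{
			Name:      lockName,
			Namespace: namespace,
			// The owner reference is what makes this "leader for life":
			// garbage collection deletes the lock only after the owning
			// pod has actually been deleted, so two pods can never hold
			// it at the same time.
			OwnerReferences: []metav1.OwnerReference{{
				APIVersion: "v1",
				Kind:       "Pod",
				Name:       pod.Name,
				UID:        pod.UID,
			}},
		},
	}

	_, err = client.CoreV1().ConfigMaps(namespace).Create(ctx, lock, metav1.CreateOptions{})
	return err
}
```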

It seems like there are two things at play that determine how long it takes for another pod to be elected leader:

  1. The time between issuing the delete pod command and the pod actually being deleted.
  2. The time that the non-leader pod sleeps between checks to see if it can become the leader. The max wait time is 16-19.2 seconds, due to the use of random jitter (see the sketch below).
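For 2), the wait loop is approximately this (again a simplified sketch, not the exact code): the delay between attempts doubles up to a 16 second cap, with up to ~20% random jitter added on top, which is where the 16-19.2 second maximum comes from.

```go
package leader

import (
	"context"
	"math/rand"
	"time"
)

// waitToBecomeLeader keeps retrying tryLock until it succeeds, sleeping with
// exponential backoff (capped at 16s) plus up to 20% jitter between attempts.
func waitToBecomeLeader(ctx context.Context, tryLock func(context.Context) error) error {
	const maxBackoff = 16 * time.Second
	backoff := time.Second

	for {
		if err := tryLock(ctx); err == nil {
			return nil // we are now the leader
		}

		// Worst case: 16s backoff + 20% jitter = 19.2s between attempts.
		jitter := time.Duration(rand.Int63n(int64(backoff / 5)))
		select {
		case <-time.After(backoff + jitter):
		case <-ctx.Done():
			return ctx.Err()
		}

		backoff *= 2
		if backoff > maxBackoff {
			backoff = maxBackoff
		}
	}
}
```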

Based on your description, it sounds like 1) is taking a while. Are you able to tell what is causing the deleted pod to stay in the Terminated state?

@fabianvf Does this ring any bells?

@flickerfly
Contributor Author

Thanks for that breakdown and teaching me a few things. (Is this documented somewhere I missed? Maybe I can write that up.) I'll look into what #1 looks like. Perhaps my expectations are just too high and the delay is appropriate to avoid a split brain.
