[SPARK-41419][K8S] Decrement PVC_COUNTER when the pod deletion happens #38948
|
@dongjoon-hyun I am trying to figure out how to add a test. |
dongjoon-hyun
left a comment
Thank you for making a PR but we need to remove PVC_COUNTER.decrementAndGet() only.
|
In other words, please revert all changes and remove one line, |
|
That's not the right way :-) See #38943 (comment) |
|
Okay. Since we don't agree, I will make my PR too. We can compare side-by-side, @tedyu . :) |
|
BTW, we need to add a test case to validate the ideas. I'll try to add one to my PR. You may be able to reuse it. |
|
If exception happens before we reach the following line: the counter shouldn't be decremented. the counter should be decremented. We need to make this code future proof. |
|
Please make a valid test case for your claim. |
|
This is handled properly by removing
|
|
My point is: when exception happens, the exception may not come from this call: |
|
@dongjoon-hyun |
|
e.g. |
|
It's totally fine because
|
|
If possible, can you elaborate a bit? If exception happens at |
|
@tedyu . It seems that you forgot
|
In short, this is wrong because pod deletion doesn't change the total number of PVCs.
I updated the logic.
Can you take another look?
PVC_COUNTER is only used by SPARK-41410. !reuse is misleading and useless code, isn't it?
Line 413 in 30957a9
| if (reusablePVCs.isEmpty && podAllocOnPVC && maxPVCs <= PVC_COUNTER.get()) { |
The variable podAllocOnPVC
private val podAllocOnPVC = conf.get(KUBERNETES_DRIVER_OWN_PVC) &&
conf.get(KUBERNETES_DRIVER_REUSE_PVC) && conf.get(KUBERNETES_DRIVER_WAIT_TO_REUSE_PVC)
includes one more condition for KUBERNETES_DRIVER_WAIT_TO_REUSE_PVC.
That's why I think it doesn't hurt to check again.
If you insist on using !reuse in order to focus on an unused condition's PVC_COUNTER variable,
this PR becomes more and more irrelevant to the original SPARK-41410. It doesn't look like a follow-up of SPARK-41410.
@tedyu .
- I'd recommend taking a look at my PR.
- Or, use a different JIRA id for your PR.
|
@dongjoon-hyun @viirya Please take another look. |
|
In ExecutorPodsAllocatorSuite.scala, the pair of configs always have the following values: If one of them is false, @dongjoon-hyun |
|
We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable. |
What changes were proposed in this pull request?
This is a follow-up to commit cc55de3, where PVC_COUNTER was introduced to track the outstanding number of PVCs. PVC_COUNTER should only be decremented when the pod deletion happens (in response to an error).
Why are the changes needed?
If the PVC isn't created successfully (where PVC_COUNTER isn't incremented), possibly due to execution not reaching the resource(pvc).create() call, we shouldn't decrement the counter.
success tracks the progress of PVC creation:
value 0 means PVC is not created.
value 1 means PVC has been created.
value 2 means PVC has been created but due to subsequent error, the pod is deleted.
The counter is decremented when either KUBERNETES_DRIVER_OWN_PVC or KUBERNETES_DRIVER_REUSE_PVC has a false value, since the PVC is not owned by the Driver pod.
Does this PR introduce any user-facing change?
No
No
How was this patch tested?
Existing tests.
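For readers following the discussion, the three success states in the description above can be illustrated with a small standalone sketch. This is not the code from this PR or from Spark's ExecutorPodsAllocator: the object name, the allocate method, its boolean parameters, and the state constant names are all hypothetical, and the increment merely stands in for a successful resource(pvc).create() call. It only models the behavior the description proposes, which the reviewer disputed in the conversation.

```scala
import java.util.concurrent.atomic.AtomicInteger

// Hypothetical sketch of the state tracking proposed in the PR description.
object PvcCounterSketch {
  // Stand-in for PVC_COUNTER: the number of outstanding PVCs.
  val PVC_COUNTER = new AtomicInteger(0)

  // Illustrative names for the three values of `success`.
  val NotCreated = 0            // PVC is not created
  val Created = 1               // PVC has been created
  val CreatedThenPodDeleted = 2 // PVC created, later error, pod deleted

  // Simulates one allocation attempt and returns the final `success` value.
  // ownPvc / reusePvc stand in for KUBERNETES_DRIVER_OWN_PVC and
  // KUBERNETES_DRIVER_REUSE_PVC (assumed to be plain booleans here).
  def allocate(
      failBeforeCreate: Boolean,
      failAfterCreate: Boolean,
      ownPvc: Boolean,
      reusePvc: Boolean): Int = {
    var success = NotCreated
    try {
      if (failBeforeCreate) throw new RuntimeException("failed before PVC creation")
      PVC_COUNTER.incrementAndGet() // stands in for a successful PVC creation
      success = Created
      if (failAfterCreate) throw new RuntimeException("failed after PVC creation")
    } catch {
      case _: RuntimeException =>
        // Only touch the counter if the PVC was actually created.
        if (success == Created) {
          success = CreatedThenPodDeleted // the pod would be deleted here
          if (!ownPvc || !reusePvc) PVC_COUNTER.decrementAndGet()
        }
    }
    success
  }
}
```

In this sketch, a failure before creation leaves the counter untouched, while a failure after creation decrements it only when one of the two flags is false.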