[SPARK-41410][K8S] Support PVC-oriented executor pod allocation #38943
Conversation
dongjoon-hyun
left a comment
Could you review this, @viirya ?
// Check reusable PVCs for this executor allocation batch
val reusablePVCs = getReusablePVCs(applicationId, pvcsInUse)
for ( _ <- 0 until numExecutorsToAllocate) {
  if (reusablePVCs.isEmpty && reusePVC && maxPVCs <= PVC_COUNTER.get()) {
Is reusablePVCs always less than or equal to maxPVCs?
Thank you for the review.
Theoretically, reusablePVCs are all driver-owned PVCs whose age is greater than podAllocationDelay. So, the count can be bigger than maxPVCs if there is other PVC creation logic (for example, a Spark driver plugin).
Lines 364 to 382 in 89b2ee2
private def getReusablePVCs(applicationId: String, pvcsInUse: Seq[String]) = {
  if (conf.get(KUBERNETES_DRIVER_OWN_PVC) && conf.get(KUBERNETES_DRIVER_REUSE_PVC) &&
      driverPod.nonEmpty) {
    try {
      val createdPVCs = kubernetesClient
        .persistentVolumeClaims
        .inNamespace(namespace)
        .withLabel("spark-app-selector", applicationId)
        .list()
        .getItems
        .asScala
      val now = Instant.now().toEpochMilli
      val reusablePVCs = createdPVCs
        .filterNot(pvc => pvcsInUse.contains(pvc.getMetadata.getName))
        .filter(pvc => now - Instant.parse(pvc.getMetadata.getCreationTimestamp).toEpochMilli
          > podAllocationDelay)
      logInfo(s"Found ${reusablePVCs.size} reusable PVCs from ${createdPVCs.size} PVCs")
      reusablePVCs
Also, previously, Spark created new pods and PVCs when some executors died. In that case, a few more PVCs could be created.
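To make the driver-plugin point concrete, here is a hedged, purely hypothetical sketch (the class name and structure are assumptions for illustration, not code from this PR): any driver-side component that creates extra PVCs labeled with `spark-app-selector` would be picked up by `getReusablePVCs`, so the reusable count can exceed `maxPVCs`.

```scala
import java.util.{Collections, Map => JMap}

import org.apache.spark.SparkContext
import org.apache.spark.api.plugin.{DriverPlugin, ExecutorPlugin, PluginContext, SparkPlugin}

// Hypothetical plugin skeleton: if it created PVCs labeled with
// spark-app-selector=<applicationId>, those PVCs would also be listed
// by getReusablePVCs once they are older than podAllocationDelay.
class ExtraPvcPlugin extends SparkPlugin {
  override def driverPlugin(): DriverPlugin = new DriverPlugin {
    override def init(sc: SparkContext, ctx: PluginContext): JMap[String, String] = {
      // A real plugin could create additional PVCs here, e.g. via the
      // Kubernetes client, labeled with the application id.
      Collections.emptyMap()
    }
  }
  override def executorPlugin(): ExecutorPlugin = null
}
```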
So even if maxPVCs <= PVC_COUNTER.get(), if reusablePVCs is not empty, the driver will continue executor pod allocation?
Yes, correct! When we have reusablePVCs, PVC-oriented executor pod allocation doesn't need to be blocked. We halt executor allocation only when there are no available PVCs and PVC_COUNTER is greater than or equal to the maximum.
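A minimal sketch of the gating rule described above, using simplified, hypothetical names (not the actual `ExecutorPodsAllocator` code): allocation is held back only when the PVC-oriented mode is enabled, no reusable PVCs are available, and the counter has already reached the maximum.

```scala
// Hedged sketch of the blocking predicate discussed in this thread.
// `podAllocOnPVC`, `pvcCounter`, and `maxPVCs` are stand-ins for the real fields.
def shouldHoldAllocation(
    podAllocOnPVC: Boolean,
    reusablePVCs: Seq[String],
    pvcCounter: Int,
    maxPVCs: Int): Boolean = {
  podAllocOnPVC && reusablePVCs.isEmpty && pvcCounter >= maxPVCs
}

// Examples matching the discussion:
// shouldHoldAllocation(true, Seq("pvc-1"), 10, 10)        == false  (a reusable PVC is available)
// shouldHoldAllocation(true, Seq.empty[String], 10, 10)   == true   (no reusable PVC, limit reached)
```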
ConfigBuilder("spark.kubernetes.driver.waitToReusePersistentVolumeClaims")
  .doc("If true, driver pod counts the number of created on-demand persistent volume claims " +
    s"and wait if the number is greater than or equal to the maximum which is " +
    s"${EXECUTOR_INSTANCES.key} or ${DYN_ALLOCATION_MAX_EXECUTORS.key}. " +
    s"This config requires both ${KUBERNETES_DRIVER_OWN_PVC.key}=true and " +
    s"${KUBERNETES_DRIVER_REUSE_PVC.key}=true.")
Why not mention PVC-oriented executor pod allocation in the config description? I think it would make clearer what this feature is.
Yes, initially I tried to use it as the config name, but PVC-oriented executor pod allocation is achieved by the combination of three configurations:
- spark.kubernetes.driver.waitToReusePersistentVolumeClaims
- spark.kubernetes.driver.ownPersistentVolumeClaims
- spark.kubernetes.driver.reusePersistentVolumeClaims
I'll add a K8s document section with that name.
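For illustration, a hedged sketch of enabling the three configurations listed above together (the app name is made up; in practice these settings are usually passed via spark-submit `--conf` rather than set in code):

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical example only: turn on PVC-oriented executor pod allocation
// by enabling all three configurations together.
val spark = SparkSession.builder()
  .appName("pvc-oriented-allocation-example")
  .config("spark.kubernetes.driver.ownPersistentVolumeClaims", "true")
  .config("spark.kubernetes.driver.reusePersistentVolumeClaims", "true")
  .config("spark.kubernetes.driver.waitToReusePersistentVolumeClaims", "true")
  .getOrCreate()
```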
private val reusePVC = conf.get(KUBERNETES_DRIVER_OWN_PVC) &&
  conf.get(KUBERNETES_DRIVER_REUSE_PVC) && conf.get(KUBERNETES_DRIVER_WAIT_TO_REUSE_PVC)
Hmm, if we don't use KUBERNETES_DRIVER_WAIT_TO_REUSE_PVC, can't we still reuse PVCs?
I think the name reusePVC is a bit confusing. Isn't it the combination of the three configs you mentioned (https://github.com/apache/spark/pull/38943/files#r1041347366)?
Maybe podAllocOnPVC?
You are right. Let me rename~
viirya
left a comment
Thanks for clarifying. Understood the proposed logic. Looks good to me. The remaining comment is about variable naming (https://github.com/apache/spark/pull/38943/files#r1041414732).
We also need to update the K8s documentation (https://github.com/apache/spark/pull/38943/files#r1041347366), which can be done in this PR or a follow-up.
Thank you, @viirya . All tests passed. Merged to master for Apache Spark 3.4.0.
I think the … If the PVC isn't created successfully, it seems the counter shouldn't be decremented.
Thank you for the review, @tedyu .
Do you mean that … ?
Yeah - the …
In case of creation failure: Lines 447 to 448 in e58f12d.
In case of deletion failure: Lines 454 to 458 in e58f12d.
So, we had better remove …
The catch block handles errors beyond PVC creation failure. Execution may not reach the … The …
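For context, a hedged sketch of the counter invariant being debated here (simplified names, not the PR's code): the counter should change only when a PVC is actually created or deleted, so decrementing inside a broad catch block could undercount if the failure happened before the PVC ever existed.

```scala
import java.util.concurrent.atomic.AtomicInteger

// Hypothetical illustration only: keep the counter in lockstep with
// successful creations and deletions, so a failure before creation
// never triggers a decrement.
object PvcCounterSketch {
  val PVC_COUNTER = new AtomicInteger(0)

  def createPVC(create: () => Unit): Unit = {
    create()                       // may throw before the PVC exists
    PVC_COUNTER.incrementAndGet()  // counted only after successful creation
  }

  def deletePVC(delete: () => Unit): Unit = {
    delete()
    PVC_COUNTER.decrementAndGet()  // counted only after successful deletion
  }
}
```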
I commented on your PR.
Here is a PR including a test case to address the comment.
What changes were proposed in this pull request?
This PR aims to support `PVC-oriented executor pod allocation`, which means the Spark driver will create a fixed number of PVCs (= `spark.executor.instances` or `spark.dynamicAllocation.maxExecutors`) and hold off new executor pod creation if the number of created PVCs has reached the limit.

Why are the changes needed?
This will allow Spark to hand over the existing PVCs from dead executors to new executors. Previously, Spark created new executors without waiting for the dead executors to release their PVCs.

Does this PR introduce any user-facing change?
No, this is a new feature which is disabled by default.

How was this patch tested?
Pass the CIs with the newly added test case.

Closes apache#38943 from dongjoon-hyun/SPARK-41410.
Authored-by: Dongjoon Hyun <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>