-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK-41525][K8S] Improve onNewSnapshots to use unique lists of known executor IDs and PVC names
#39070
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…s in onNewSnapshots
onNewSnapshots
onNewSnapshotsonNewSnapshots to use unique list of known executor IDs and PVC names
|
cc @viirya and @attilapiros |
onNewSnapshots to use unique list of known executor IDs and PVC namesonNewSnapshots to use unique list of known executor IDs and PVC names
onNewSnapshots to use unique list of known executor IDs and PVC namesonNewSnapshots to use unique lists of known executor IDs and PVC names
| schedulerBackend: KubernetesClusterSchedulerBackend, | ||
| snapshots: Seq[ExecutorPodsSnapshot]): Unit = { | ||
| val k8sKnownExecIds = snapshots.flatMap(_.executorPods.keys) | ||
| val k8sKnownExecIds = snapshots.flatMap(_.executorPods.keys).distinct |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The original code is in Spark 3.1.2 and Spark 3.2.0.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So snapshots may contain duplicates too?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
snapshots is defined as a sequence of snapshot history originally. Logically, there can be a few snapshot which the same contents, but in the most cases, they will be different by object metadata or status.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for clarifying it.
| pod.getSpec.getVolumes.asScala | ||
| .flatMap { v => Option(v.getPersistentVolumeClaim).map(_.getClaimName) } | ||
| } | ||
| }.distinct |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is since Spark 3.2.0.
|
Thank you, @viirya ! |
|
All tests passed except the master branch dependency test failure. Merged to master. |
What changes were proposed in this pull request?
This PR improve
ExecutorPodsAllocator.onNewSnapshotsby removing duplications atk8sKnownExecIdsandk8sKnownPVCNames. In the large cluster, this causes inefficiency.Why are the changes needed?
The existing variables have lots of duplications because
snapshotsisSeq[ExecutorPodsSnapshot].For example, if we print out the values, it looks like the following.
Does this PR introduce any user-facing change?
No.
How was this patch tested?
Manual review because this is an improvement on the local variable computation.