[LIVY-435] Allow floating point numbers to be used for the driver and executor core counts #74
Conversation
AFAIK, Spark itself also doesn't support non-integer cores. How do you know that Spark supports non-integer cores, and what's the meaning of 0.1 cores? CC @ajbozarth
Spark does not support non-Integer values. Those values are passed into Spark as Strings and converted to Integers, but even then, passing it a double would fail the spark-submit. I'm not sure where you got the misconception, but I looked in the Livy and Spark code, and the values you changed here definitely have to be Integers for the spark-submit to succeed.
They are Ints in the YARN implementations, but Doubles for Mesos and Kubernetes. In k8s, the cores values get translated into Kubernetes Pod resource requests, where fractional cores are a thing that makes sense :) This isn't something you would generally do in real jobs (unless the job wasn't CPU bound) but it matters for local development.
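To make that concrete, here's a minimal sketch (plain Scala, not the actual Spark k8s code) of how a fractional core count could be rendered as a Kubernetes CPU quantity for a Pod resource request; Kubernetes accepts either decimal cores ("0.5") or millicores ("500m"):

```scala
// Sketch only: turning a Double cores value into a k8s CPU quantity string.
object CpuQuantitySketch {
  def toCpuQuantity(cores: Double): String = {
    val millis = math.round(cores * 1000) // 0.5 cores -> 500 millicores
    s"${millis}m"
  }

  def main(args: Array[String]): Unit = {
    println(toCpuQuantity(0.5)) // "500m"
    println(toCpuQuantity(2.0)) // "2000m" (equivalent to "2")
  }
}
```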
And I can 100% promise you this works with the k8s backend, it's the only way I can start a job in my 2 CPU dev VM (Kubernetes doesn't allow overcommit on requests, it will refuse to schedule things instead).
OK, I see, let me first check the code of Spark on Mesos and k8s.
https://github.com/apache-spark-on-k8s/spark/blob/branch-2.2-kubernetes/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/ExecutorPodFactory.scala#L109 is the relevant code there :) And I think https://github.com/apache/spark/blob/master/resource-managers/mesos/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosFineGrainedSchedulerBackend.scala#L62 is the relevant bit in Mesos land, but I don't know that side very well.
Thanks for those links. Given those usages are in forks of Spark and not the main repo I'm a little hesitant, but I know we have talked about formally extending support for Mesos in the future, so I'd be OK with this change. I think we should confirm how Livy would handle Spark throwing a spark-submit error for a non-Int when using YARN/Standalone, though. If the error is handled cleanly then I'm fine with this; otherwise we would need to address that in this PR.
Related to my above statement, if we're adding tests we should also add a negative test to check that passing a string or some other non-numeric value would not work.
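Something along these lines, perhaps (a hedged sketch; the real Livy request class and suite names differ, and `Req` here is a stand-in):

```scala
import com.fasterxml.jackson.databind.{JsonMappingException, ObjectMapper}
import com.fasterxml.jackson.module.scala.DefaultScalaModule
import org.scalatest.FunSuite

// Stand-in for the real Livy request class.
case class Req(driverCores: Double)

class CoreCountNegativeSuite extends FunSuite {
  private val mapper = new ObjectMapper().registerModule(DefaultScalaModule)

  test("a non-numerical core count should fail to decode") {
    // "lots" cannot be coerced to a number, so decoding should throw.
    intercept[JsonMappingException] {
      mapper.readValue("""{"driverCores": "lots"}""", classOf[Req])
    }
  }
}
```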
Heh, the k8s branch/fork is getting upstreamed; that subsystem is actually already in: https://github.com/apache/spark/blob/master/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/ExecutorPodFactory.scala#L94
Wow, no idea how I missed that in my search results; on second look it's definitely there. We should support this then, but I stand by my comments about testing. We should check how Livy handles YARN and Standalone failing on double input, as well as add a test that makes sure Livy won't accept non-numerical input.
I may need some pointers on where to add tests and how :) The few that I added were just tweaks to some existing test code; I am still very much a n00b at this codebase.
So for the string unit test, I figured you'd write it just like the ones you added. As for the other testing, you should try out your code change using Livy with YARN and Standalone, give it a Double or a String, and see how it handles the error. In the case of a String, that error is what you're checking for in the above unit test. In the case of a Double, you should make sure that the error is handled safely and doesn't cause anything to crash or hang; if it does, you'll need to address that in this PR. Does that help you with next steps?
…rigger an exception).
Okay, I added another pair of unit tests for the string case. I'm kind of surprised that Jackson is even willing to decode that kind of invalid value; I could make it more explicitly fail at decoding like is already done for the …

As for how this will impact other modes, I did some digging through the Spark code and I think the answer is "not much". On the submit side, all CLI parsing into the SparkConf is handled purely as strings. The only type checks come when retrieving values from the conf, so as long as you don't actually use a fractional core value with YARN, it should be fine (technically, if you tried to assign more than 2^53 cores it would cause a problem, because that can't be represented in a double with exact precision, but I think we're probably okay on that one). Local mode actually ignores the values entirely and will happily accept …
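To make the surprising parts concrete, a small sketch (same stand-in `Req` class as above, not Livy's actual request type):

```scala
import com.fasterxml.jackson.databind.ObjectMapper
import com.fasterxml.jackson.module.scala.DefaultScalaModule

case class Req(driverCores: Double) // stand-in for the real request class

object CoercionSketch {
  def main(args: Array[String]): Unit = {
    val mapper = new ObjectMapper().registerModule(DefaultScalaModule)

    // Jackson coerces the quoted string "0.5" into a number by default,
    // so this decodes instead of failing.
    println(mapper.readValue("""{"driverCores": "0.5"}""", classOf[Req])) // Req(0.5)

    // Spark holds conf values as Strings; a YARN-side read amounts to
    // something like conf.get("spark.executor.cores").toInt, so a
    // fractional value only blows up at retrieval time:
    // "0.5".toInt // throws NumberFormatException

    // And the precision caveat: doubles represent integers exactly only
    // up to 2^53.
    val limit = math.pow(2, 53)
    println(limit + 1 == limit) // true: 2^53 + 1 is not representable
  }
}
```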
Thanks for the updates. As a follow-up, though: even if it's "not much", what exactly would happen if you give a fractional core value to Livy when using YARN with your changes?
For batch requests, …
Ack, didn't mean to close, sorry.
So it will cause the initial submit to error and the Livy session would then be ended as FAILED?
Yep, definitely for batch sessions, probably (99%, unless I missed something in the code) for interactive.
Ok then I believe this LGTM, @jerryshao want to take one last look?
I will review again later today.
Spark itself has issues using fractional core numbers, see apache/spark#20460. Let's wait until Spark itself fixes such issues.
Nice catch @jerryshao, I agree we should wait until Spark has that fix in.
From the issues in apache/spark#20460 I think we should close this out and pursue a different tack :) Probably leaving the main driver/executor cores as ints and using secondary properties to allow more fine-grained control over k8s resource requests.
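Roughly this shape, say (the property names here are hypothetical, just to illustrate the split):

```scala
import org.apache.spark.SparkConf

// Sketch of the proposal: integer core counts stay as-is, while
// hypothetical k8s-only properties carry the fractional requests.
object SecondaryPropsSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf(false)
      .set("spark.executor.cores", "1")                      // stays an int
      .set("spark.kubernetes.executor.request.cores", "0.5") // hypothetical fractional request
    println(conf.get("spark.kubernetes.executor.request.cores")) // "0.5"
  }
}
```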
https://issues.apache.org/jira/browse/LIVY-435 is the ticket.
I added unit tests for the improved struct parsing, but SparkProcessBuilder has no tests currently, so I wasn't sure how to add them (if needed; the changes there are relatively minor).