
Conversation

@coderanger

https://issues.apache.org/jira/browse/LIVY-435 is the ticket.

I added unit tests for the improved struct parsing but SparkProcessBuilder has no tests currently so I wasn't sure how to add them (if needed, the changes there are relatively minor).

@jerryshao
Contributor

AFAIK, Spark itself also doesn't support non-integer cores. How do you know that Spark supports non-integer cores? Also, what's the meaning of 0.1 cores?

CC @ajbozarth

@ajbozarth
Member

Spark does not support non-Integer values. Those values are passed into Spark as Strings and converted to Integers, but even then, passing it a double would fail the spark-submit. I'm not sure where you got the misconception, but I looked through the Livy and Spark code, and the values you changed here definitely have to be Integers for the spark-submit to succeed.

@coderanger
Author

They are Ints in the YARN implementations, but Doubles for Mesos and Kubernetes. In k8s, the cores values get translated into Kubernetes Pod resource requests, where fractional cores are a thing that makes sense :) This isn't something you would generally do in real jobs (unless the job isn't CPU bound), but it matters for local development.
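For context on why fractional cores make sense here: Kubernetes expresses sub-core CPU requests as millicore quantities like `100m`. A rough sketch of that mapping (illustrative only, not Livy's or Spark's actual code):

```python
def to_k8s_cpu(cores: float) -> str:
    """Convert a core count to a Kubernetes CPU quantity string.

    Whole values stay as plain integers ("2"); fractional values
    become millicores ("0.1" -> "100m"), matching how Kubernetes
    expresses sub-core CPU requests.
    """
    if cores == int(cores):
        return str(int(cores))
    return f"{round(cores * 1000)}m"

print(to_k8s_cpu(0.1))  # 100m
print(to_k8s_cpu(2.0))  # 2
```

So a request of 0.1 cores asks the scheduler for a tenth of a CPU rather than a whole one, which is what lets jobs fit on a small dev machine.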

@coderanger
Author

And I can 100% promise you this works with the k8s backend; it's the only way I can start a job in my 2-CPU dev VM (Kubernetes doesn't allow overcommit on requests, it will refuse to schedule things instead).

@jerryshao
Contributor

OK, I see, let me first check the code of Spark on Mesos and k8s.

@coderanger
Author

coderanger commented Jan 25, 2018

@ajbozarth
Member

Thanks for those links. Given those usages are in forks of Spark and not the main repo, I'm a little hesitant, but I know we have talked about formally extending support for Mesos in the future, so I'd be OK with this change. I think we should confirm how Livy would handle Spark throwing a spark-submit error for a non-Int when using YARN/Standalone, though. If the error is handled cleanly then I'm fine with this; otherwise we would need to address that in this PR.

@ajbozarth
Member

Related to my above statement, if we're adding tests we should add a negative test as well to check that passing a string or some other value would not work.

@coderanger
Author

@ajbozarth
Member

Wow, no idea how I missed that in my search results; on second look it's definitely there. We should support this then, but I stand by my comments about testing. We should check how Livy handles YARN and Standalone failing on double input, as well as add a test that makes sure Livy won't accept non-numerical input.
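The negative test being discussed here boils down to checking that only numeric core values survive parsing. A minimal sketch of such a check (a hypothetical helper, not Livy's actual API):

```python
def is_valid_cores(value: str) -> bool:
    """Accept only positive numeric core values like "4" or "0.1"."""
    try:
        return float(value) > 0
    except ValueError:
        # Non-numeric input such as "banana" should be rejected
        # before it ever reaches spark-submit.
        return False

print(is_valid_cores("0.1"))     # True
print(is_valid_cores("4"))       # True
print(is_valid_cores("banana"))  # False
```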

@coderanger
Author

I may need some pointers on where to add tests and how :) The few that I added were just tweaks to some existing test code; I am still very much a n00b at this codebase.

@ajbozarth
Member

So for the string unit test, I figured you'd write it just like the ones you added. As for the other testing, you should try out your code change using Livy with YARN and Standalone, give it a Double or a String, and see how it handles the error. In the case of a String, that error is what you're checking for in the above unit test. In the case of a Double, you should make sure that the error is handled safely and doesn't cause anything to crash or hang; if it does, you'll need to address that in this PR.

Does that help you with next steps?

@coderanger
Author

Okay, I added another pair of unit tests for the string case. I'm kind of surprised that Jackson is even willing to decode that kind of invalid value. I could make it fail more explicitly at decoding, as is already done for the kind key, if you are interested in that :)

As for how this will impact other modes, I did some digging through the Spark code and I think the answer is "not much". On the submit side, all CLI parsing into the SparkConf is handled purely as strings. The only type checks come when retrieving values from the conf, so as long as you don't actually use a fractional core value with YARN, it should be fine (technically, if you tried to assign more than 2^53 cores it would cause a problem, because that can't be represented in a double with exact precision, but I think we're probably okay on that one). Local mode actually ignores the values entirely and will happily accept `--executor-cores banana`.
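The deferred type check described above can be sketched as follows: the conf stores raw strings, and coercion (and therefore failure) only happens at read time. This toy class is an illustrative stand-in mirroring the behavior discussed, not Spark's real `SparkConf`:

```python
class ConfSketch:
    """Toy stand-in for a string-keyed conf with read-time coercion."""

    def __init__(self):
        self._settings = {}

    def set(self, key: str, value: str) -> None:
        # Values are stored as-is; no validation happens at submit time.
        self._settings[key] = value

    def get_int(self, key: str, default: int) -> int:
        # int("0.1") raises ValueError, so a fractional value only
        # blows up when a component actually reads it as an integer.
        return int(self._settings.get(key, default))

conf = ConfSketch()
conf.set("spark.driver.cores", "0.1")
try:
    conf.get_int("spark.driver.cores", 1)
except ValueError:
    print("type coercion error at read time")
```

This is why a fractional value sails through submission untouched and only fails on backends that read the key as an integer.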

@ajbozarth
Member

Thanks for the updates. As a follow-up, though: even if it's "not much", what exactly would happen if you gave a fractional core value to Livy when using YARN with your changes?

@coderanger
Author

For batch requests, spark-submit will error out with a type coercion error when it calls sparkConf.getInt("spark.driver.cores"). I'm less clear on session/interactive requests, and I don't have the environment to test it, but I think it should be the same.

@coderanger coderanger closed this Jan 30, 2018
@coderanger coderanger reopened this Jan 30, 2018
@coderanger
Author

Ack, didn't mean to close, sorry.

@ajbozarth
Member

So it will cause the initial submit to error and the Livy session would then be ended as FAILED?

@coderanger
Author

Yep, definitely for batch sessions, probably (99%, unless I missed something in the code) for interactive.

@ajbozarth
Member

Ok then, I believe this LGTM. @jerryshao want to take one last look?

@jerryshao
Contributor

I will review again later today.

@jerryshao
Contributor

Spark itself has issues using fractional core numbers; see apache/spark#20460. Let's wait until Spark itself fixes those issues.

@ajbozarth
Member

Nice catch @jerryshao, I agree we should wait until Spark has that fix in.

@coderanger
Author

From the issues in apache/spark#20460, I think we should close this out and pursue a different tack :) Probably leaving the main driver/executor cores as ints and using secondary properties to allow more fine-grained control over k8s resource requests.

@coderanger coderanger closed this Feb 7, 2018