Conversation

jerryshao
Contributor

What changes were proposed in this pull request?

In current Spark, when submitting an application on YARN with a remote resource, e.g. ./bin/spark-shell --jars http://central.maven.org/maven2/com/github/swagger-akka-http/swagger-akka-http_2.11/0.10.1/swagger-akka-http_2.11-0.10.1.jar --master yarn-client -v, Spark fails with:

java.io.IOException: No FileSystem for scheme: http
	at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2586)
	at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2593)
	at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:91)
	at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2632)
	at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2614)
	at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:370)
	at org.apache.hadoop.fs.Path.getFileSystem(Path.java:296)
	at org.apache.spark.deploy.yarn.Client.copyFileToRemote(Client.scala:354)
	at org.apache.spark.deploy.yarn.Client.org$apache$spark$deploy$yarn$Client$$distribute$1(Client.scala:478)
	at org.apache.spark.deploy.yarn.Client$$anonfun$prepareLocalResources$11$$anonfun$apply$6.apply(Client.scala:600)
	at org.apache.spark.deploy.yarn.Client$$anonfun$prepareLocalResources$11$$anonfun$apply$6.apply(Client.scala:599)
	at scala.collection.mutable.ArraySeq.foreach(ArraySeq.scala:74)
	at org.apache.spark.deploy.yarn.Client$$anonfun$prepareLocalResources$11.apply(Client.scala:599)
	at org.apache.spark.deploy.yarn.Client$$anonfun$prepareLocalResources$11.apply(Client.scala:598)
	at scala.collection.immutable.List.foreach(List.scala:381)
	at org.apache.spark.deploy.yarn.Client.prepareLocalResources(Client.scala:598)
	at org.apache.spark.deploy.yarn.Client.createContainerLaunchContext(Client.scala:848)
	at org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:173)

This is because YARN's Client assumes resources are on a Hadoop-compatible FS. To fix this problem, this PR proposes downloading remote http(s) resources to the local disk and adding the downloaded local copies to the distributed cache. This solution has one downside: remote resources are downloaded and then uploaded again. However, it is restricted to remote http(s) resources only, and the overhead is not large. The advantage of this solution is that it is simple and the code changes are restricted to SparkSubmit.
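
To make the idea concrete, here is a simplified, hypothetical sketch (not the exact patch) of how SparkSubmit could localize such resources before the YARN client sees them; shouldDownload and downloadFile stand in for the real helpers:

    import java.io.File
    import java.net.URI

    // Sketch only: for each resource whose scheme Hadoop cannot serve (e.g. http/https),
    // download it to a local directory and substitute the local path, so the YARN client
    // can later ship the local copy through the distributed cache.
    def localizeIfNeeded(
        resources: Seq[String],
        targetDir: File,
        shouldDownload: String => Boolean,
        downloadFile: (URI, File) => String): Seq[String] = {
      resources.map { res =>
        val uri = new URI(res)
        if (shouldDownload(uri.getScheme)) downloadFile(uri, targetDir) else res
      }
    }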

How was this patch tested?

Unit tests added; also verified in a local cluster.

@jerryshao jerryshao changed the title [SPARK-21917][CORE][YARN] Supporting Download http(s) resources in yarn mode [SPARK-21917][CORE][YARN] Supporting adding http(s) resources in yarn mode Sep 5, 2017
@SparkQA

SparkQA commented Sep 5, 2017

Test build #81402 has finished for PR 19130 at commit 42a79ab.

  • This patch fails SparkR unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@jerryshao
Contributor Author

Jenkins, retest this please.

@SparkQA

SparkQA commented Sep 5, 2017

Test build #81410 has finished for PR 19130 at commit 42a79ab.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Sep 6, 2017

Test build #81447 has finished for PR 19130 at commit 047578e.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@jerryshao
Contributor Author

@vanzin @tgravescs , would you please help to review, thanks!

@tgravescs
Contributor

Sorry, probably won't get to this today; will look tomorrow.

Contributor

There is a FileSystem.getFileSystemClass function we could use here instead of calling a dummy URI.
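
A minimal sketch of that suggestion, assuming a Hadoop Configuration is in scope; it probes whether Hadoop knows the scheme without constructing a dummy URI:

    import scala.util.Try
    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.fs.FileSystem

    // Returns true if Hadoop has a FileSystem implementation registered for the scheme.
    def isHadoopSupported(scheme: String, hadoopConf: Configuration): Boolean =
      Try { FileSystem.getFileSystemClass(scheme, hadoopConf) }.isSuccess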

Contributor

Do we want to make this configurable per scheme somehow? Right now it's basically http/https; in the future we might want to handle other filesystems that Hadoop doesn't support. Making this a settable config would make that easier.

@jerryshao
Contributor Author

@tgravescs, thanks for your comments. Can you review again to see if this is what you expected?

@SparkQA

SparkQA commented Sep 12, 2017

Test build #81658 has finished for PR 19130 at commit 4bbc09d.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@jerryshao
Contributor Author

Jenkins, retest this please.

@SparkQA

SparkQA commented Sep 12, 2017

Test build #81665 has finished for PR 19130 at commit 4bbc09d.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Contributor

"When running in YARN cluster manager, ?"

Contributor Author

Sorry for the broken comment, my bad, I will fix it.

Contributor

Try { ... }.isSuccess? You could also avoid this call if the scheme is in the blacklist.

Contributor

Better wording:

Comma-separated list of schemes for which files will be downloaded to the local disk prior to being added to YARN's distributed cache. For use in cases where the YARN service does not support schemes that are supported by Spark.
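
For illustration, the config entry could be declared with Spark's internal ConfigBuilder roughly like this (the constant name is illustrative, not necessarily what the patch uses):

    import org.apache.spark.internal.config.ConfigBuilder

    // Sketch of a Seq[String] config carrying the doc text suggested above.
    private[spark] val FORCE_DOWNLOAD_SCHEMES =
      ConfigBuilder("spark.yarn.dist.forceDownloadSchemes")
        .doc("Comma-separated list of schemes for which files will be downloaded to the " +
          "local disk prior to being added to YARN's distributed cache. For use in cases " +
          "where the YARN service does not support schemes that are supported by Spark.")
        .stringConf
        .toSequence
        .createWithDefault(Nil)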

Contributor

I was going to say this is missing spark.yarn.dist.files and .jars, but later those properties seem to be set based on args.files and args.jars.

Which kinda raises the question of what happens when the user sets both. From the documentation it sounds like that should work (both sets of files get added), but from the code it seems --files and --jars would overwrite the spark.yarn.* configs...

In any case, that's not the fault of your change.

Contributor Author

From the code, --files and --jars have overwritten spark.yarn.* for a long time, AFAIK. I think we should make spark.yarn.* internal configurations to reduce the discrepancy.

Contributor

It would be better if tests could avoid this... you could start a local http server, but that feels like a lot of work. Is there some way to mock the behavior instead?

Contributor Author

Yes, that's my concern too; let me think of another way to handle this.

Contributor

...still are...

Also I'm not sure I understand the comment.

Contributor

It seems you have 3 different tests in this block (at least), could you break them into separate tests?

Contributor Author

@vanzin, it is a little difficult to mock the download behavior, so here I check whether "spark.testing" is configured and return a dummy local path if it is. What do you think of this approach?
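
For example, a rough sketch of that approach (names are illustrative; doDownload stands in for the real download helper):

    import java.io.File
    import java.net.URI
    import org.apache.spark.SparkConf

    // Under "spark.testing", skip the real HTTP fetch and return a dummy local path so
    // unit tests do not need a live HTTP server; otherwise perform the actual download.
    def resolveResource(
        uri: URI,
        targetDir: File,
        conf: SparkConf,
        doDownload: (URI, File) => String): String = {
      if (conf.getBoolean("spark.testing", false)) {
        new File(targetDir, new File(uri.getPath).getName).toURI.toString
      } else {
        doDownload(uri, targetDir)
      }
    }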

Contributor

Sounds good.

@SparkQA

SparkQA commented Sep 14, 2017

Test build #81775 has finished for PR 19130 at commit d479ff0.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Sep 14, 2017

Test build #81776 has finished for PR 19130 at commit cc69bc8.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Contributor

@vanzin vanzin left a comment

A few minor things to address.

Contributor

Code like this (breaking a comma-separated string into a list) is copy & pasted in so many places that it probably deserves a method in Utils.

There's one in ConfigHelpers.stringToSeq but that class is private to its package.
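
A minimal sketch of the kind of helper this suggests adding to Utils (mirroring ConfigHelpers.stringToSeq; the exact name and placement in the patch may differ):

    // Splits a comma-separated string into trimmed, non-empty entries.
    def stringToSeq(str: String): Seq[String] =
      str.split(",").map(_.trim).filter(_.nonEmpty)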

Contributor Author

I added a helper method in Utils and changed the SparkSubmit-related code. There are still some other places that would need to change, but I will not touch them in this PR.

Contributor

break multi-line args one per line.

Contributor

Better: "force download from blacklisted schemes"

Contributor

This should already be set by the build scripts, was it not working for you?

Contributor Author

I haven't tried; I saw some UTs also set this configuration. Let me check whether it is explicitly required or not.

@SparkQA

SparkQA commented Sep 15, 2017

Test build #81804 has finished for PR 19130 at commit 1c5487c.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Sep 15, 2017

Test build #81805 has finished for PR 19130 at commit fc2eb2b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Contributor

@jiangxb1987 jiangxb1987 left a comment

LGTM

Contributor

Is it a problem only for YARN? Do standalone and Mesos have this problem?

Contributor Author

This is currently a problem only for YARN, because YARN uses the distributed cache to distribute resources to the cluster, and the distributed cache requires a supported Hadoop FS to copy resources. If a resource's scheme is http, it will try to find an http FS to handle that resource, which fails since no http FS is supported in current Hadoop.

In standalone and Mesos clusters, we use Spark's internal logic to handle http resources, and that logic handles http(s) resources well, so there should be no issue for standalone and Mesos mode.

Contributor

why make it a function? Can't we just inline it?

Contributor Author

It can be, let me change the code.

Contributor

shall we explicitly list "http" | "https" | "ftp"?

Contributor Author

No, it is not required, because the shouldDownload logic will handle this. If 1) the resource scheme is blacklisted, or 2) it is not supported by Hadoop, then Spark will handle it through the downloadFile method. Since "http" | "https" | "ftp" are not supported by Hadoop before 2.9, resources with such schemes will be handled by Spark itself.
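
Roughly, the decision described above could look like this (a sketch under those assumptions; parameter names are illustrative):

    import scala.util.Try
    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.fs.FileSystem

    // Download via Spark itself when the scheme is in the force-download (blacklist)
    // config, or when Hadoop has no FileSystem implementation registered for it.
    def shouldDownload(
        scheme: String,
        forceDownloadSchemes: Seq[String],
        hadoopConf: Configuration): Boolean = {
      forceDownloadSchemes.contains(scheme) ||
        Try { FileSystem.getFileSystemClass(scheme, hadoopConf) }.isFailure
    }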

Contributor

can you give an example of these schemes?

Contributor Author

OK, sure.

@SparkQA

SparkQA commented Sep 19, 2017

Test build #81905 has finished for PR 19130 at commit 580d587.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Contributor

update here too

Contributor

shall we make these 3 the default value of this config?

Contributor Author

@jerryshao jerryshao Sep 19, 2017

It is not necessary; we still want to leverage Hadoop's http(s) FS to distribute resources by default when running on Hadoop 2.9+ (https://issues.apache.org/jira/browse/HADOOP-14383).

Contributor

ah got it

@SparkQA

SparkQA commented Sep 19, 2017

Test build #81912 has finished for PR 19130 at commit 9a2c8c7.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Sep 19, 2017

Test build #81914 has finished for PR 19130 at commit 0fb7943.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Contributor

LGTM, just one question: why do we need spark.yarn.dist.forceDownloadSchemes?

@jerryshao
Contributor Author

Hi @cloud-fan, the main purpose of spark.yarn.dist.forceDownloadSchemes is to explicitly use Spark's own logic to handle remote resources instead of relying on Hadoop. For example, if spark.yarn.dist.forceDownloadSchemes is set to http,https, then these two kinds of resources will be downloaded by Spark prior to being added to the distributed cache, even if they're supported by an http FS in Hadoop 2.9+. For now, on Hadoop 2.9- Hadoop doesn't support an http FS, so we always leverage Spark's own logic to download such resources and it is not necessary to configure this parameter.
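
As an illustration (reusing the command from the description above; only the added --conf is new), forcing Spark to download http/https resources itself on Hadoop 2.9+ might look like:

    # Force http/https resources to be downloaded by Spark before being added to the
    # distributed cache, even when Hadoop itself could serve those schemes.
    ./bin/spark-shell \
      --master yarn-client \
      --conf spark.yarn.dist.forceDownloadSchemes=http,https \
      --jars http://central.maven.org/maven2/com/github/swagger-akka-http/swagger-akka-http_2.11/0.10.1/swagger-akka-http_2.11-0.10.1.jar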

@asfgit asfgit closed this in 8319432 Sep 19, 2017
@cloud-fan
Contributor

thanks, merging to master!
