
Conversation

@tgravescs
Contributor

@tgravescs tgravescs commented Jun 27, 2019

What changes were proposed in this pull request?

We are adding generic resource support into Spark, where the configs have a suffix for the amount of the resource so that we can support other configs later.

Spark on YARN already added configs to request resources via spark.yarn.{executor/driver/am}.resource=<some amount>, where <some amount> is the value and unit together. We should change those configs to have a `.amount` suffix on them to match the Spark configs and to allow future configs to be added more easily. YARN itself already supports tags and attributes, so if we want the user to be able to pass those from Spark at some point, having a suffix makes sense: it would allow for a spark.yarn.{executor/driver/am}.resource.{resource}.tag= type config.
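As a config-fragment illustration of the rename (the GPU resource name and the amount are just examples, and the `.tag` key is hypothetical, not added by this PR):

```properties
# Before this change: amount (value + unit) packed directly into the value
spark.yarn.executor.resource.yarn.io/gpu=2

# After this change: an explicit .amount suffix
spark.yarn.executor.resource.yarn.io/gpu.amount=2

# The suffix leaves room for possible future sub-configs, e.g. a
# hypothetical tag config:
# spark.yarn.executor.resource.yarn.io/gpu.tag=...
```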

How was this patch tested?

Tested via unit tests and manually on a yarn 3.x cluster with GPU resources configured on.

@tgravescs
Contributor Author

cc @szilard-nemeth I think you originally added these configs

@SparkQA

SparkQA commented Jun 27, 2019

Test build #106977 has finished for PR 24989 at commit 428f63a.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jun 28, 2019

Test build #107023 has finished for PR 24989 at commit 6af9988.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@tgravescs
Contributor Author

@vanzin I think you reviewed the first change for the resources; are you ok with this approach?

Contributor

@vanzin vanzin left a comment


Looks ok.

I was thinking a little about what would happen if a new resource is supported by the scheduler (the user would have to set it in both configs, as your comment suggests) and whether that could be simplified somehow, but that sounds like a separate problem.

@tgravescs
Contributor Author

Yeah, it's not ideal to have it hardcoded and then have to specify 2 configs for those. I thought about coming up with some sort of mapping, but figured for now we can do this and come up with something else later.

@SparkQA

SparkQA commented Jul 16, 2019

Test build #107738 has finished for PR 24989 at commit 6e86ab9.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun
Member

Shall we test this with the hadoop-2.7 profile, too?

@vanzin vanzin changed the title [SPARK-27959][YARN][test-hadoop3.2] Change YARN resource configs to use .amount [SPARK-27959][YARN] Change YARN resource configs to use .amount Jul 16, 2019
@vanzin
Contributor

vanzin commented Jul 16, 2019

retest this please

@SparkQA

SparkQA commented Jul 16, 2019

Test build #107745 has finished for PR 24989 at commit 6e86ab9.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@vanzin
Contributor

vanzin commented Jul 16, 2019

Merging to master.

@vanzin vanzin closed this in 43d68cd Jul 16, 2019
Contributor

@attilapiros attilapiros left a comment


Mostly nits, otherwise LGTM.

private val AMOUNT_AND_UNIT_REGEX = "([0-9]+)([A-Za-z]*)".r
private val RESOURCE_INFO_CLASS = "org.apache.hadoop.yarn.api.records.ResourceInformation"
val YARN_GPU_RESOURCE_CONFIG = "yarn.io/gpu"
val YARN_FPGA_RESOURCE_CONFIG = "yarn.io/fpga"
Contributor


Does it make sense to reuse ResourceUtils.GPU and ResourceUtils.FPGA here?

Like:

  val YARN_GPU_RESOURCE_CONFIG = s"yarn.io/$GPU"
  val YARN_FPGA_RESOURCE_CONFIG = s"yarn.io/$FPGA"

This would keep the mapping between the Spark config keys and the YARN resource config keys as close as possible (now and in the future).

I know your PR does not touch this part:

(ResourceID(SPARK_EXECUTOR_PREFIX, "fpga").amountConf,
  s"${YARN_EXECUTOR_RESOURCE_TYPES_PREFIX}${YARN_FPGA_RESOURCE_CONFIG}"),
(ResourceID(SPARK_DRIVER_PREFIX, "fpga").amountConf,
  s"${YARN_DRIVER_RESOURCE_TYPES_PREFIX}${YARN_FPGA_RESOURCE_CONFIG}"),
(ResourceID(SPARK_EXECUTOR_PREFIX, "gpu").amountConf,
  s"${YARN_EXECUTOR_RESOURCE_TYPES_PREFIX}${YARN_GPU_RESOURCE_CONFIG}"),
(ResourceID(SPARK_DRIVER_PREFIX, "gpu").amountConf,
  s"${YARN_DRIVER_RESOURCE_TYPES_PREFIX}${YARN_GPU_RESOURCE_CONFIG}"))

But at that place, using the right constant ($GPU or $FPGA) instead of the string literal would be reasonable.


private[yarn] def getYarnResourcesAndAmounts(
    sparkConf: SparkConf,
    componentName: String): Map[String, String] = {
Contributor


Nit: in getYarnResourcesFromSparkResources the second parameter is the SparkConf; what about moving componentName (the prefix) to the first place here too?
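For context, the quoted method could be sketched roughly as below. This is an illustrative standalone sketch, not the PR's actual implementation: it substitutes a plain `Map[String, String]` for `SparkConf`, and the object and helper names are made up.

```scala
// Hypothetical sketch of collecting YARN resource amounts from config
// entries of the form spark.yarn.<component>.resource.<name>.amount.
object YarnResourceConfigSketch {
  private val AmountSuffix = ".amount"

  // componentName is e.g. "driver", "executor", or "am".
  // Returns resourceName -> amount for matching keys.
  def yarnResourcesAndAmounts(
      conf: Map[String, String],
      componentName: String): Map[String, String] = {
    val prefix = s"spark.yarn.$componentName.resource."
    conf.collect {
      case (key, value) if key.startsWith(prefix) && key.endsWith(AmountSuffix) =>
        // Strip the prefix and the .amount suffix to recover the
        // YARN resource name, e.g. "yarn.io/gpu".
        key.stripPrefix(prefix).stripSuffix(AmountSuffix) -> value
    }
  }
}
```

With this shape, a key like `spark.yarn.executor.resource.yarn.io/gpu.amount=2` yields the pair `"yarn.io/gpu" -> "2"` for the executor component.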

@attilapiros
Contributor

It is already closed. Anyway, if you think my comments are useful, I can do these small changes in a follow-up or in a minor PR.

vinodkc pushed a commit to vinodkc/spark that referenced this pull request Jul 18, 2019

Closes apache#24989 from tgravescs/SPARK-27959-yarn-resourceconfigs.

Authored-by: Thomas Graves <[email protected]>
Signed-off-by: Marcelo Vanzin <[email protected]>
