[SPARK-35266][TESTS] Fix error in BenchmarkBase.scala that occurs when creating benchmark files in non-existent directory #32394

byungsoo-oh · 2021-04-29T07:15:30Z

What changes were proposed in this pull request?

This PR fixes an error in BenchmarkBase.scala that occurs when creating a benchmark file in a non-existent directory.

Why are the changes needed?

When submitting a benchmark job using the org.apache.spark.benchmark.Benchmarks class with SPARK_GENERATE_BENCHMARK_FILES=1 set, an exception is raised if the directory in which the benchmark result file should be generated does not exist.
For more information, please refer to SPARK-35266.
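
The failure mode described above can be reproduced outside Spark. The following is a minimal sketch, not the code in BenchmarkBase.scala: it assumes the result file is opened with java.io.FileOutputStream, and the object name MissingDirSketch, the helper names, and the file name Example-results.txt are hypothetical.

```scala
import java.io.{File, FileNotFoundException, FileOutputStream}
import java.nio.file.Files

// Hypothetical stand-alone sketch; none of these names come from Spark.
object MissingDirSketch {
  /** Tries to open `file` for writing; returns false if the JVM reports it cannot be created. */
  def canOpen(file: File): Boolean =
    try {
      new FileOutputStream(file).close()
      true
    } catch {
      // Thrown when, for example, the parent directory does not exist.
      case _: FileNotFoundException => false
    }

  /** Creates the missing parent directory first, then tries to open the file. */
  def openAfterMkdirs(file: File): Boolean = {
    val dir = file.getParentFile
    if (!dir.exists()) {
      dir.mkdirs()
    }
    canOpen(file)
  }

  def main(args: Array[String]): Unit = {
    // Work in a fresh temporary directory so that "benchmarks/" is guaranteed to be absent.
    val root = Files.createTempDirectory("bench-sketch").toFile
    val file = new File(root, "benchmarks/Example-results.txt")
    println(s"open without directory: ${canOpen(file)}")
    println(s"open after mkdirs:      ${openAfterMkdirs(file)}")
  }
}
```

The first attempt should fail because the benchmarks directory is missing, and the second should succeed once the directory has been created.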

Does this PR introduce any user-facing change?

No

How was this patch tested?

After building Spark, manually tested with the following command:

SPARK_GENERATE_BENCHMARK_FILES=1 bin/spark-submit --class \
    org.apache.spark.benchmark.Benchmarks --jars \
    "`find . -name '*-SNAPSHOT-tests.jar' -o -name '*avro*-SNAPSHOT.jar' | paste -sd ',' -`" \
    "`find . -name 'spark-core*-SNAPSHOT-tests.jar'`" \
    "org.apache.spark.ml.linalg.BLASBenchmark"

It successfully generated the benchmark result files.

Why it is sufficient:
As illustrated in the comments in Benchmarks.scala, the command below runs all benchmarks and generates the results:

SPARK_GENERATE_BENCHMARK_FILES=1 bin/spark-submit --class \
    org.apache.spark.benchmark.Benchmarks --jars \
    "`find . -name '*-SNAPSHOT-tests.jar' -o -name '*avro*-SNAPSHOT.jar' | paste -sd ',' -`" \
    "`find . -name 'spark-core*-SNAPSHOT-tests.jar'`" \
    "*"

Of the 55 benchmarks in total, only BLASBenchmark fails because of this issue with the current code on the master branch. Testing BLASBenchmark is therefore sufficient to validate this change.

HyukjinKwon · 2021-05-03T05:28:23Z

core/src/test/scala/org/apache/spark/benchmark/BenchmarkBase.scala

+      if (!dir.exists()) {
+        dir.mkdirs()
+      }
+      val file = new File(s"${dir}$resultFileName")


Ah, okay. The new benchmarks were added in SPARK-33882 and SPARK-35150, which came after #32015 and #32044.

I got the point. Thank you!

HyukjinKwon · 2021-05-03T05:39:14Z

core/src/test/scala/org/apache/spark/benchmark/BenchmarkBase.scala

-      val file = new File(s"${prefix}benchmarks/$resultFileName")
+      val dir = new File(s"${prefix}benchmarks/")
+      if (!dir.exists()) {
+        dir.mkdirs()


Can you add a println saying that the directory is going to be created? For example:

      // scalastyle:off println
      println(s"Creating ${dir.getAbsolutePath} for benchmark results.")
      // scalastyle:on println

My concern is that the benchmark directory is derived from the jar paths, which are flaky, so it might be better to show it explicitly.

Thanks for the comment :) I added println as suggested.
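
Putting the two review hunks and the suggested message together, the directory handling might look like the sketch below. It is an illustration under assumptions, not the merged code: the object name ResultDirSketch and the method resultFile are hypothetical, and prefix and resultFileName are passed in as plain parameters. The result file is built with the two-argument File constructor because java.io.File normalizes its path and drops a trailing separator, so concatenating the directory's string form with the file name would lose the "/".

```scala
import java.io.File
import java.nio.file.Files

// Hypothetical stand-alone sketch; in Spark this logic lives inside BenchmarkBase.
object ResultDirSketch {
  /** Returns the result file under `<prefix>benchmarks/`, creating the directory if needed. */
  def resultFile(prefix: String, resultFileName: String): File = {
    val dir = new File(s"${prefix}benchmarks/")
    if (!dir.exists()) {
      // scalastyle:off println
      println(s"Creating ${dir.getAbsolutePath} for benchmark results.")
      // scalastyle:on println
      dir.mkdirs()
    }
    // Join directory and file name via the constructor instead of string concatenation.
    new File(dir, resultFileName)
  }

  def main(args: Array[String]): Unit = {
    // A temporary prefix stands in for the jar-derived prefix used by the real benchmarks.
    val prefix = Files.createTempDirectory("bench-sketch").toString + File.separator
    val file = resultFile(prefix, "Example-results.txt")
    println(file.getAbsolutePath)
  }
}
```

Printing the absolute path before creating the directory makes it visible where the results will land when the prefix is not what the user expects.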

HyukjinKwon · 2021-05-03T05:41:00Z

@byungsoo-oh:

Would you mind checking https://github.com/apache/spark/pull/32394/checks?check_run_id=2464430892 and enabling GitHub Actions in your forked repository?
It would be great if you're interested in submitting another PR to generate and update the results for the benchmarks added in SPARK-33882 and SPARK-35150 (after this PR is merged). It's pretty straightforward to generate them: https://spark.apache.org/developer-tools.html#github-workflow-benchmarks

HyukjinKwon · 2021-05-03T05:41:21Z

ok to test

HyukjinKwon

Looks good otherwise.

HyukjinKwon · 2021-05-03T05:48:04Z

@srowen and @zhengruifeng FYI from 9244066 and 5b77ebb. I think it was perfectly fine to include only the code without the benchmark results, because it was a bit odd to upload results produced on machines with different specs.

Now that there have been some recent changes in #32015 and #32044, PR authors can very easily run the benchmarks on machines with similar specifications (https://spark.apache.org/developer-tools.html#github-workflow-benchmarks), so it makes more sense to include benchmark results in a PR :).

HyukjinKwon · 2021-05-03T05:50:06Z

ok to test

byungsoo-oh · 2021-05-03T06:44:29Z

@HyukjinKwon
Thank you for the comments :)

I enabled it in my forked repository.
I would be delighted if I could submit another PR to add the benchmark results.

HyukjinKwon · 2021-05-03T09:05:58Z

Merged to master.

HyukjinKwon · 2021-05-03T09:07:04Z

Thanks for your first contribution and congrats on becoming a contributor!

Fix error in BenchmarkBase.scala that occurs when creating benchmark files in non-existent directory

6646e47

github-actions bot added the CORE label Apr 29, 2021

HyukjinKwon reviewed May 3, 2021

HyukjinKwon approved these changes May 3, 2021

Add message when creating directory

fcf9fe2

HyukjinKwon closed this in be6ecb6 May 3, 2021
