[SPARK-34821][INFRA] Set up a workflow for developers to run benchmark in their fork #32015

HyukjinKwon · 2021-03-31T13:23:00Z

What changes were proposed in this pull request?

This PR proposes to add a workflow that allows developers to run benchmarks and download the result files. After this PR, developers can run benchmarks in GitHub Actions in their own fork.

Why are the changes needed?

Very easy to use.
We can use (almost) the same environment to run the benchmarks. Based on my few experiments and observations, the CPU, cores, and memory are the same.
Does not burden the ASF's resources on GitHub Actions.

Does this PR introduce any user-facing change?

No, dev-only.

How was this patch tested?

Manually tested in HyukjinKwon#31.

The full set of benchmarks is being run as below:

How do developers use it in their fork?

Go to Actions in your fork, and click "Run benchmarks".
Run the benchmarks with JDK 8 or 11, specifying the benchmark classes to run. Glob patterns are supported, just like testOnly in SBT.
After the jobs finish, the benchmark results are available at the top of the corresponding workflow run:

After downloading the results, unzip and untar them in the Spark git root directory:

cd .../spark
mv ~/Downloads/benchmark-results-8.zip .
unzip benchmark-results-8.zip
tar -xvf benchmark-results-8.tar

Check the results:

git status

...
    modified:   core/benchmarks/MapStatusesSerDeserBenchmark-results.txt
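
The glob matching mentioned in the steps above can be illustrated with a minimal shell sketch. It assumes that shell-style `*` wildcards are close enough to SBT's testOnly filters for illustration (the exact SBT semantics may differ). `matches_glob` is a hypothetical helper and is not part of the workflow.

```shell
#!/bin/sh
# Hypothetical helper: succeed when a class name ($2) matches a glob pattern ($1).
# An unquoted expansion used as a `case` pattern is interpreted as a glob.
matches_glob() {
  case "$2" in
    $1) return 0 ;;
    *)  return 1 ;;
  esac
}

# Example: print the candidate class names selected by a pattern.
# The first name is derived from the JsonBenchmark.scala path in this PR;
# the second is the simple class name seen in the results file above.
pattern='*JsonBenchmark'
for name in \
  org.apache.spark.sql.execution.datasources.json.JsonBenchmark \
  MapStatusesSerDeserBenchmark
do
  if matches_glob "$pattern" "$name"; then
    echo "$name"
  fi
done
```

A pattern of `*` selects every class, which is how a full run of all benchmarks would be requested under this assumption.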

HyukjinKwon · 2021-03-31T13:23:51Z

Note that I tested a subset of the benchmarks and verified that it works; I am now waiting for the final results of running all benchmarks:

.github/workflows/benchmark.yml

HyukjinKwon · 2021-03-31T14:20:47Z

BTW, I will document this in https://spark.apache.org/developer-tools.html, and add the link to our docs (and probably to the GitHub PR template?)

wangyum · 2021-03-31T14:37:06Z

It seems we can't run TPCDSQueryBenchmark. But we can support it in the future.

HyukjinKwon · 2021-03-31T14:44:35Z

Yeah, good point. I will make it separate for now.

maropu · 2021-03-31T14:59:48Z

Cool, it looks useful.

HyukjinKwon · 2021-03-31T15:42:51Z

GA is unstable right now (https://www.githubstatus.com/). I will retrigger the full benchmark run tomorrow.

core/src/test/scala/org/apache/spark/benchmark/Benchmarks.scala

HyukjinKwon · 2021-04-02T11:43:56Z

.github/workflows/benchmark.yml

+      num-splits:
+        description: 'Number of job splits'
+        required: true
+        default: '1'


I had to add this parameter because GitHub Actions limits each job to 6 hours (the workflow limit is 72 hours), and running the benchmarks sequentially takes up to 50 hours. This way, the benchmarks run in parallel, so I think it's okay, although it might expose too many parameters to control.

For example, I am now running all benchmarks in 20 splits (with JDK 11) here:

which results in 20 jobs that run benchmarks in parallel (benchmarks are distributed across the 20 jobs by hash)
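
As a rough illustration of the splitting described above, here is a minimal shell sketch. It is not the code in Benchmarks.scala: `cksum` stands in for whatever hash the workflow actually uses, and `min_splits`, `split_of`, and `classes_in_split` are hypothetical helper names.

```shell
#!/bin/sh
# Hypothetical helper: smallest number of splits so that each job stays within
# the per-job limit, assuming the work divides evenly (hashing does not guarantee this).
# $1 = total sequential hours, $2 = per-job limit in hours
min_splits() {
  echo $(( ($1 + $2 - 1) / $2 ))
}

# Hypothetical helper: map a benchmark class name to a split index in [0, num_splits).
# $1 = class name, $2 = number of splits
split_of() {
  crc=$(printf '%s' "$1" | cksum | cut -d' ' -f1)   # CRC of the name (stand-in hash)
  echo $(( $crc % $2 ))
}

# Hypothetical helper: read class names from stdin, print those assigned to one split.
# $1 = split index, $2 = number of splits
classes_in_split() {
  while IFS= read -r name; do
    if [ "$(split_of "$name" "$2")" -eq "$1" ]; then
      echo "$name"
    fi
  done
}
```

With the numbers from the comment above, `min_splits 50 6` gives 9, so 20 splits leaves headroom for an uneven distribution. Each job would then run only the names printed by something like `classes_in_split "$index" 20`, and every class lands in exactly one split because the hash of a name is deterministic.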

HyukjinKwon · 2021-04-02T14:17:57Z

sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/json/JsonBenchmark.scala

-      jsonInDS(50 * 1000 * 1000, numIters)
-      jsonInFile(50 * 1000 * 1000, numIters)
-      datetimeBenchmark(rowsNum = 10 * 1000 * 1000, numIters)
+      schemaInferring(5 * 1000 * 1000, numIters)


@MaxGekk I had to reduce the size here. Otherwise the GA job dies, complaining about no disk space.

With this, all benchmarks should pass now .. I will wait for the results before merging it in.

SparkQA · 2021-04-02T15:47:29Z

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/41437/

SparkQA · 2021-04-02T15:47:30Z

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/41437/

SparkQA · 2021-04-02T17:07:59Z

Test build #136859 has finished for PR 32015 at commit e6beeb5.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

HyukjinKwon · 2021-04-03T11:55:09Z

Everything passed:

I am merging this to master.

Thank you all for reviews and approvals

HyukjinKwon · 2021-04-03T11:55:53Z

Merged to master.

…GitHub Actions machines

### What changes were proposed in this pull request?

#32015 added a way to run benchmarks much more easily in the same GitHub Actions build. This PR updates the benchmark results by using the way.

**NOTE** that looks like GitHub Actions use four types of CPU given my observations:

- Intel(R) Xeon(R) Platinum 8171M CPU 2.60GHz
- Intel(R) Xeon(R) CPU E5-2673 v4 2.30GHz
- Intel(R) Xeon(R) CPU E5-2673 v3 2.40GHz
- Intel(R) Xeon(R) Platinum 8272CL CPU 2.60GHz

Given my quick research, seems like they perform roughly similarly:

![Screen Shot 2021-04-03 at 9 31 23 PM](https://user-images.githubusercontent.com/6477701/113478478-f4b57b80-94c3-11eb-9047-f81ca8c59672.png)

I couldn't find enough information about Intel(R) Xeon(R) Platinum 8272CL CPU 2.60GHz but the performance seems roughly similar given the numbers.

So shouldn't be a big deal especially given that this way is much easier, encourages contributors to run more and guarantee the same number of cores and same memory with the same softwares.

### Why are the changes needed?

To have a base line of the benchmarks accordingly.

### Does this PR introduce _any_ user-facing change?

No, dev-only.

### How was this patch tested?

It was generated from:

- [Run benchmarks: * (JDK 11)](https://github.com/HyukjinKwon/spark/actions/runs/713575465)
- [Run benchmarks: * (JDK 8)](https://github.com/HyukjinKwon/spark/actions/runs/713154337)

Closes #32044 from HyukjinKwon/SPARK-34950.

Authored-by: HyukjinKwon <[email protected]>
Signed-off-by: Max Gekk <[email protected]>

HyukjinKwon marked this pull request as draft March 31, 2021 13:23

HyukjinKwon requested review from MaxGekk, dongjoon-hyun, gengliangwang, maropu, viirya and wangyum March 31, 2021 13:24

HyukjinKwon force-pushed the SPARK-34821-pr branch from 716e52b to 588d8c7 Compare March 31, 2021 13:35

HyukjinKwon requested a review from srowen March 31, 2021 13:59

srowen approved these changes Mar 31, 2021

gengliangwang reviewed Mar 31, 2021

.github/workflows/benchmark.yml (outdated, resolved)

HyukjinKwon changed the title ~~[SPARK-34821][INFRA] Set up a workflow for developers to run benchmark in their fork~~ [WIP][SPARK-34821][INFRA] Set up a workflow for developers to run benchmark in their fork Mar 31, 2021

HyukjinKwon force-pushed the SPARK-34821-pr branch from 588d8c7 to 0e2f439 Compare March 31, 2021 14:32

HyukjinKwon force-pushed the SPARK-34821-pr branch from 0e2f439 to e8a996f Compare March 31, 2021 15:33

HyukjinKwon changed the title ~~[WIP][SPARK-34821][INFRA] Set up a workflow for developers to run benchmark in their fork~~ [SPARK-34821][INFRA] Set up a workflow for developers to run benchmark in their fork Mar 31, 2021

viirya reviewed Mar 31, 2021

core/src/test/scala/org/apache/spark/benchmark/Benchmarks.scala (outdated, resolved)

HyukjinKwon added 2 commits April 2, 2021 15:46

Fix benchmark tests and add failfast mode

ecf7a39

Remove changes in ExtractBenchmark

60d3f0e

HyukjinKwon force-pushed the SPARK-34821-pr branch from d9f9aae to 60d3f0e Compare April 2, 2021 06:48

Support to split jobs

5eefec9

HyukjinKwon commented Apr 2, 2021

Add matrix in the output artifact name

e6beeb5

HyukjinKwon force-pushed the SPARK-34821-pr branch from 60a349e to e6beeb5 Compare April 2, 2021 14:13

HyukjinKwon commented Apr 2, 2021

HyukjinKwon marked this pull request as ready for review April 3, 2021 11:54

HyukjinKwon closed this in 71effba Apr 3, 2021

HyukjinKwon mentioned this pull request Apr 3, 2021

[SPARK-34950][TESTS] Update benchmark results to the ones created by GitHub Actions machines #32044

Closed

HyukjinKwon mentioned this pull request May 3, 2021

[SPARK-35266][TESTS] Fix error in BenchmarkBase.scala that occurs when creating benchmark files in non-existent directory #32394

Closed

HyukjinKwon deleted the SPARK-34821-pr branch January 4, 2022 00:54

MaxGekk mentioned this pull request Aug 28, 2024

[SPARK-49410][SQL][TESTS] Update collation benchmarks #47893

Closed

9 participants
