
@HyukjinKwon
Member

@HyukjinKwon HyukjinKwon commented Apr 3, 2021

What changes were proposed in this pull request?

#32015 added a way to run benchmarks much more easily in the same GitHub Actions build. This PR updates the benchmark results using that approach.

NOTE: based on my observations, GitHub Actions appears to use four types of CPU:

  • Intel(R) Xeon(R) Platinum 8171M CPU @ 2.60GHz
  • Intel(R) Xeon(R) CPU E5-2673 v4 @ 2.30GHz
  • Intel(R) Xeon(R) CPU E5-2673 v3 @ 2.40GHz
  • Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz

From some quick research, they seem to perform roughly similarly:

[Screenshot (2021-04-03): performance comparison of the CPU models listed above]

I couldn't find enough information about Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz, but its performance seems roughly similar given the numbers.

So this shouldn't be a big deal, especially given that this approach is much easier, encourages contributors to run benchmarks more often, and guarantees the same number of cores, the same memory, and the same software.
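One way to check which of these CPU models a given Linux runner was assigned is to read the "model name" entries from /proc/cpuinfo. The sketch below is a minimal illustration under that assumption; the helper cpu_models is hypothetical and is not part of Spark or GitHub Actions, and the sample input is illustrative text in the /proc/cpuinfo format.

```python
import os


def cpu_models(cpuinfo_text):
    """Return the distinct 'model name' values in /proc/cpuinfo-style text.

    Hypothetical helper for illustration; keeps the order of first appearance.
    """
    models = []
    for line in cpuinfo_text.splitlines():
        # Each line looks like "key<TAB>: value"; split on the first colon.
        key, sep, value = line.partition(":")
        if sep and key.strip() == "model name":
            model = value.strip()
            if model not in models:
                models.append(model)
    return models


# Illustrative input in the /proc/cpuinfo format (two logical cores).
sample = (
    "processor\t: 0\n"
    "model name\t: Intel(R) Xeon(R) CPU E5-2673 v4 @ 2.30GHz\n"
    "processor\t: 1\n"
    "model name\t: Intel(R) Xeon(R) CPU E5-2673 v4 @ 2.30GHz\n"
)
print(cpu_models(sample))  # ['Intel(R) Xeon(R) CPU E5-2673 v4 @ 2.30GHz']

# On an actual Linux runner, read the real file if it exists.
if os.path.exists("/proc/cpuinfo"):
    with open("/proc/cpuinfo") as f:
        print(cpu_models(f.read()))
```

Running this inside each benchmark job and logging the result would make it easy to tell afterwards which CPU produced which numbers.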

Why are the changes needed?

To have a baseline for the benchmarks generated this way.

Does this PR introduce any user-facing change?

No, dev-only.

How was this patch tested?

It was generated from:

@SparkQA

SparkQA commented Apr 3, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/41459/

@SparkQA

SparkQA commented Apr 3, 2021

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/41459/

@SparkQA

SparkQA commented Apr 3, 2021

Test build #136883 has finished for PR 32044 at commit 33f2ebe.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Case                       Best Time(ms)  Avg Time(ms)  Stdev(ms)  Rate(M/s)  Per Row(ns)  Relative
2500 select expressions              211           214          4        0.0  210927791.0      0.0X
1 select expressions                   1             2          0        0.0    1296117.0      1.0X
100 select expressions                 9            11          1        0.0    8808690.0      0.1X
2500 select expressions              422           426          5        0.0  421632363.0      0.0X
Member

Regression by 2x?
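The reviewer's question can be checked by dividing the two best times for the 2500-expression case. The sketch below assumes the usual Spark benchmark column order (case name, Best Time(ms), Avg Time(ms), Stdev(ms), Rate(M/s), Per Row(ns), Relative) and assumes that the 211 ms row is the previous result and the 422 ms row is the new one; parse_row is a hypothetical helper, not a Spark API.

```python
def parse_row(line):
    """Parse one row of benchmark output into a dict (hypothetical helper).

    The last six whitespace-separated fields are the numeric columns;
    everything before them is the case name.
    """
    parts = line.split()
    name = " ".join(parts[:-6])
    best, avg, stdev, rate, per_row, relative = parts[-6:]
    return {
        "name": name,
        "best_ms": float(best),
        "avg_ms": float(avg),
        "stdev_ms": float(stdev),  # float("NaN") also parses
        "rate_m_per_s": float(rate),
        "per_row_ns": float(per_row),
        "relative": float(relative.rstrip("X")),
    }


# Assumption: the first row is the old result, the second the new one.
old = parse_row("2500 select expressions 211 214 4 0.0 210927791.0 0.0X")
new = parse_row("2500 select expressions 422 426 5 0.0 421632363.0 0.0X")

ratio = new["best_ms"] / old["best_ms"]
print(f"{old['name']}: {ratio:.2f}x slower")  # 2500 select expressions: 2.00x slower
```

Comparing best times rather than averages keeps the check less sensitive to one-off noise on a shared runner.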

Case                  Best Time(ms)  Avg Time(ms)  Stdev(ms)  Rate(M/s)  Per Row(ns)  Relative
Select 1000 columns           96330          99161        NaN        0.0      96329.7      1.0X
Select 100 columns            41414          42672       1556        0.0      41414.1      2.3X
Select one column             35365          36113        662        0.0      35365.4      2.7X
count()                       18845          18867         26        0.1      18845.0      5.1X
Member

Regression by 2x.
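Note that the Relative column in the Select-columns table compares each case with the first case of the same run, not with an earlier run, so a regression only shows up when comparing against the previous result file. Assuming Relative is the first case's best time divided by each case's best time (which matches the numbers shown), it can be reproduced as below; the row values are copied from that table.

```python
# (case name, Best Time(ms)) copied from the Select-columns table.
rows = [
    ("Select 1000 columns", 96330),
    ("Select 100 columns", 41414),
    ("Select one column", 35365),
    ("count()", 18845),
]

# Assumption: Relative = best time of the first case / best time of this case.
baseline_ms = rows[0][1]
relative = {name: round(baseline_ms / best_ms, 1) for name, best_ms in rows}

for name, value in relative.items():
    print(f"{name:<20} {value:.1f}X")
```

This prints 1.0X, 2.3X, 2.7X and 5.1X, matching the Relative column.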

@MaxGekk
Member

MaxGekk commented Apr 3, 2021

+1, LGTM. The PR updates only benchmark results. The failed GitHub Actions (GA) jobs are not related to this PR. Merging to master.
Thank you @HyukjinKwon, and @wangyum and @dongjoon-hyun for your reviews.

@LuciferYang
Contributor

@HyukjinKwon Can we use this approach to generate the benchmark results with Java 17?

Also, I found that some benchmarks, such as UpdateFieldsBenchmark and CharVarcharBenchmark, do not have corresponding Java 11 result files. Is this expected?

@LuciferYang
Contributor

LuciferYang commented Oct 27, 2021

> @HyukjinKwon Can we use this way to generate the benchmarks results with Java 17?

Let me study #32015 first. Should all new benchmark results be generated this way?

@HyukjinKwon
Member Author

Yes, they all should generate the files for JDK 11. If they don't, it's a bug.

Yes, we should have a separate set of these benchmark result files for JDK 17.
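A quick way to find benchmarks that lack a JDK-specific result file is to compare file names. The sketch below assumes the naming convention <Name>-results.txt for the default result and <Name>-jdk11-results.txt or <Name>-jdk17-results.txt for the JDK-specific ones; that convention, the helper missing_jdk_results, and ExampleBenchmark are assumptions for illustration, and the file list is not the actual contents of the Spark repository.

```python
import re
from pathlib import PurePosixPath

# Matches JDK-specific result files such as "Foo-jdk11-results.txt".
JDK_SUFFIX = re.compile(r"-jdk\d+-results\.txt$")


def missing_jdk_results(paths, jdk="jdk11"):
    """Return benchmark names that have a default result file but no
    '<Name>-<jdk>-results.txt' counterpart (assumed naming convention).

    Compares base names only, so directories are ignored.
    """
    names = {PurePosixPath(p).name for p in paths}
    missing = []
    for name in sorted(names):
        if name.endswith("-results.txt") and not JDK_SUFFIX.search(name):
            bench = name[: -len("-results.txt")]
            if f"{bench}-{jdk}-results.txt" not in names:
                missing.append(bench)
    return missing


# Illustrative file list only.
files = [
    "benchmarks/UpdateFieldsBenchmark-results.txt",
    "benchmarks/CharVarcharBenchmark-results.txt",
    "benchmarks/ExampleBenchmark-results.txt",
    "benchmarks/ExampleBenchmark-jdk11-results.txt",
]
print(missing_jdk_results(files))           # ['CharVarcharBenchmark', 'UpdateFieldsBenchmark']
print(missing_jdk_results(files, "jdk17"))  # ['CharVarcharBenchmark', 'ExampleBenchmark', 'UpdateFieldsBenchmark']
```

The same check with jdk="jdk17" lists every benchmark that would still need a JDK 17 result file.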

@LuciferYang
Contributor

Thank you for your explanation.

@HyukjinKwon HyukjinKwon deleted the SPARK-34950 branch January 4, 2022 00:54