Revert "[SPARK-48628][CORE] Add task peak on/off heap memory metrics" #47747

dongjoon-hyun · 2024-08-14T02:12:57Z

What changes were proposed in this pull request?

This reverts commit 717a6da.

Why are the changes needed?

To fix a performance regression.

During the regular performance audit ([SPARK-49224][TESTS] Regenerate benchmark results #47743), ExternalAppendOnlyUnsafeRowArrayBenchmark detected a performance regression caused by SPARK-48628:

[SPARK-48628][CORE] Add task peak on/off heap memory metrics #47192

Does this PR introduce any user-facing change?

No. This is not released yet.

How was this patch tested?

Pass the CIs.

Was this patch authored or co-authored using generative AI tooling?

No.

dongjoon-hyun · 2024-08-14T02:19:22Z

cc @liuzqt , @JoshRosen, @cloud-fan , @jiangxb1987 , @Ngone51 , @mridulm from #47192

dongjoon-hyun · 2024-08-14T02:21:54Z

Also, cc @yaooqinn from #47743 , too

dongjoon-hyun · 2024-08-14T03:38:55Z

Thank you, @cloud-fan and @LuciferYang .

dongjoon-hyun · 2024-08-14T03:49:57Z

All relevant tests passed.
Since this is a revert to the original code, let me merge this.

### What changes were proposed in this pull request?

This PR revives #47192, which was [reverted](#47747) due to a regression in `ExternalAppendOnlyUnsafeRowArrayBenchmark`.

**Root cause**

We eventually decided to aggregate peak memory usage from all consumers on each `acquireExecutionMemory` invocation (see [this discussion](#47192 (comment))), which is O(n) where `n` is the number of consumers.

`ExternalAppendOnlyUnsafeRowArrayBenchmark` runs all of its iterations in a single task context, so the number of consumers explodes. Note that `TaskMemoryManager.consumers` is never cleaned up over its whole lifecycle, and `TaskMemoryManager.acquireExecutionMemory` is called very frequently, so doing an operation that is linear in the number of consumers there might not be a good choice.

This benchmark might be a corner case, but a large query plan can still have a large number of consumers.

I fell back to the previous implementation: maintain the current execution memory with an extra lock.

cc Ngone51

#### Benchmark result

[ExternalAppendOnlyUnsafeRowArrayBenchmark-results](https://github.com/liuzqt/spark/actions/runs/10415213026)

[ExternalAppendOnlyUnsafeRowArrayBenchmark-jdk21-results](https://github.com/liuzqt/spark/actions/runs/10414246805)

### Why are the changes needed?

### Does this PR introduce _any_ user-facing change?

NO

### How was this patch tested?

New unit tests.

### Was this patch authored or co-authored using generative AI tooling?

NO

Closes #47776 from liuzqt/SPARK-48628.

Authored-by: Ziqi Liu <[email protected]>
Signed-off-by: Josh Rosen <[email protected]>
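To make the root cause concrete, here is a minimal sketch of the two tracking strategies described above. It is not Spark's actual `TaskMemoryManager` code: the class names `PeakMemoryTrackingSketch`, `Consumer`, `ScanningTracker`, and `CounterTracker` are hypothetical, and `synchronized` stands in for whatever lock the real implementation uses. The first tracker rescans every registered consumer on each acquire (linear in the number of consumers, which never shrinks); the second keeps a running total, so each acquire costs constant time.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical, simplified model of peak-memory tracking; not Spark's real classes. */
public class PeakMemoryTrackingSketch {

    /** Stand-in for a memory consumer that remembers how many bytes it holds. */
    static final class Consumer {
        long used;
    }

    /** Strategy A: sum over all consumers on every acquire, O(n) per call. */
    static final class ScanningTracker {
        // Like the consumers set described above, this list is never pruned.
        private final List<Consumer> consumers = new ArrayList<>();
        private long peak;

        synchronized Consumer register() {
            Consumer c = new Consumer();
            consumers.add(c);
            return c;
        }

        synchronized void acquire(Consumer c, long bytes) {
            c.used += bytes;
            long total = 0;
            for (Consumer each : consumers) { // linear scan on a hot path
                total += each.used;
            }
            peak = Math.max(peak, total);
        }

        synchronized void release(Consumer c, long bytes) {
            c.used -= bytes;
        }

        synchronized long peak() {
            return peak;
        }
    }

    /** Strategy B: maintain the current total under a lock, O(1) per call. */
    static final class CounterTracker {
        private long current;
        private long peak;

        synchronized void acquire(long bytes) {
            current += bytes;
            peak = Math.max(peak, current);
        }

        synchronized void release(long bytes) {
            current -= bytes;
        }

        synchronized long peak() {
            return peak;
        }
    }

    public static void main(String[] args) {
        ScanningTracker scanning = new ScanningTracker();
        CounterTracker counter = new CounterTracker();
        Consumer a = scanning.register();
        Consumer b = scanning.register();

        // Same sequence of events fed to both trackers.
        scanning.acquire(a, 100); counter.acquire(100);
        scanning.acquire(b, 50);  counter.acquire(50);
        scanning.release(a, 100); counter.release(100);
        scanning.acquire(b, 30);  counter.acquire(30);

        // Both strategies report the same peak; only the per-call cost differs.
        System.out.println(scanning.peak() + " " + counter.peak());
    }
}
```

Both trackers observe the same peak for the same sequence of events. The difference is only in cost: with many long-lived consumers, as in the benchmark's single task context, the scan in strategy A grows with every registration while strategy B stays constant.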

Revert "[SPARK-48628][CORE] Add task peak on/off heap memory metrics"

61ccd25

This reverts commit 717a6da.

github-actions bot added the CORE label Aug 14, 2024

cloud-fan approved these changes Aug 14, 2024

LuciferYang approved these changes Aug 14, 2024

dongjoon-hyun closed this in 3fcf041 Aug 14, 2024

dongjoon-hyun deleted the SPARK-48628 branch August 14, 2024 03:50

liuzqt mentioned this pull request Aug 15, 2024

[SPARK-48628][CORE] Add task peak on/off heap memory metrics #47776

Closed
