Skip to content

Conversation

@dongjoon-hyun
Copy link
Member

@dongjoon-hyun dongjoon-hyun commented Aug 14, 2024

What changes were proposed in this pull request?

This reverts commit 717a6da.

Why are the changes needed?

To fix a performance regression.

During the regular performance audit,

ExternalAppendOnlyUnsafeRowArrayBenchmark detected a performance regression caused by SPARK-48626.

Does this PR introduce any user-facing change?

No. This is not released yet.

How was this patch tested?

Pass the CIs.

Was this patch authored or co-authored using generative AI tooling?

No.

@github-actions github-actions bot added the CORE label Aug 14, 2024
@dongjoon-hyun
Copy link
Member Author

dongjoon-hyun commented Aug 14, 2024

@dongjoon-hyun
Copy link
Member Author

Also, cc @yaooqinn from #47743 , too

@dongjoon-hyun
Copy link
Member Author

Thank you, @cloud-fan and @LuciferYang .

@dongjoon-hyun
Copy link
Member Author

All relevant tests passed.
Since this is a revert to the original code, let me merge this.

@dongjoon-hyun dongjoon-hyun deleted the SPARK-48628 branch August 14, 2024 03:50
JoshRosen pushed a commit that referenced this pull request Aug 21, 2024
### What changes were proposed in this pull request?

This PR is trying to revive #47192, which was [reverted](#47747) due to regression in `ExternalAppendOnlyUnsafeRowArrayBenchmark`.

**Root cause**
We eventually decided to aggregate peak memory usage from all consumers on each `acquireExecutionMemory` invocation. (see [this discussion](#47192 (comment))), which is O(n) complexity where `n` is the number of consumers.

`ExternalAppendOnlyUnsafeRowArrayBenchmark` is implemented in a way that all iterations are run in a single task context, therefore the number of consumers is exploding.

Notice that `TaskMemoryManager.consumers` is never cleaned up the whole lifecycle, and `TaskMemoryManager.acquireExecutionMemory` is a very frequent operation, doing a linear complexity(in terms of number of consumers) operation here might not be a good choice. This benchmark might be a corner case, but it's still possible to have a large number of consumers in a large query plan.

I fallback to the previous implementation: maintain current execution memory with an extra lock. cc Ngone51

#### Benchmark result
[ExternalAppendOnlyUnsafeRowArrayBenchmark-results](https://github.com/liuzqt/spark/actions/runs/10415213026)
[ExternalAppendOnlyUnsafeRowArrayBenchmark-jdk21-results](https://github.com/liuzqt/spark/actions/runs/10414246805)

### Why are the changes needed?

### Does this PR introduce _any_ user-facing change?
NO

### How was this patch tested?
New unit tests.

### Was this patch authored or co-authored using generative AI tooling?
NO

Closes #47776 from liuzqt/SPARK-48628.

Authored-by: Ziqi Liu <[email protected]>
Signed-off-by: Josh Rosen <[email protected]>
a0x8o added a commit to a0x8o/spark that referenced this pull request Aug 21, 2024
### What changes were proposed in this pull request?

This PR is trying to revive apache/spark#47192, which was [reverted](apache/spark#47747) due to regression in `ExternalAppendOnlyUnsafeRowArrayBenchmark`.

**Root cause**
We eventually decided to aggregate peak memory usage from all consumers on each `acquireExecutionMemory` invocation. (see [this discussion](apache/spark#47192 (comment))), which is O(n) complexity where `n` is the number of consumers.

`ExternalAppendOnlyUnsafeRowArrayBenchmark` is implemented in a way that all iterations are run in a single task context, therefore the number of consumers is exploding.

Notice that `TaskMemoryManager.consumers` is never cleaned up the whole lifecycle, and `TaskMemoryManager.acquireExecutionMemory` is a very frequent operation, doing a linear complexity(in terms of number of consumers) operation here might not be a good choice. This benchmark might be a corner case, but it's still possible to have a large number of consumers in a large query plan.

I fallback to the previous implementation: maintain current execution memory with an extra lock. cc Ngone51

#### Benchmark result
[ExternalAppendOnlyUnsafeRowArrayBenchmark-results](https://github.com/liuzqt/spark/actions/runs/10415213026)
[ExternalAppendOnlyUnsafeRowArrayBenchmark-jdk21-results](https://github.com/liuzqt/spark/actions/runs/10414246805)

### Why are the changes needed?

### Does this PR introduce _any_ user-facing change?
NO

### How was this patch tested?
New unit tests.

### Was this patch authored or co-authored using generative AI tooling?
NO

Closes #47776 from liuzqt/SPARK-48628.

Authored-by: Ziqi Liu <[email protected]>
Signed-off-by: Josh Rosen <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants