
Conversation

Madhukar525722 commented Sep 29, 2024

…all partitions from metastore

What changes were proposed in this pull request?

When getPartitionsByFilter is called without a usable predicate and falls back to fetching all partitions, the request is broken into smaller chunks (see the sketch after this list):

  1. Retrieve the names of all partitions using getPartitionNames.
  2. Divide the partition-name list into smaller batches.
  3. Fetch the partitions by name using getPartitionsByNames.
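As a rough illustration of the three steps above, here is a minimal, hypothetical sketch (not the PR's actual implementation) of batched partition fetching with per-batch retries. It assumes a Hive IMetaStoreClient handle plus batchSize and maxRetries values coming from the configurations described further down; the helper name fetchAllPartitionsBatched is made up.

```scala
// Hypothetical sketch of the batching + retry idea; not the code in this PR.
import scala.collection.JavaConverters._
import org.apache.hadoop.hive.metastore.IMetaStoreClient
import org.apache.hadoop.hive.metastore.api.Partition

def fetchAllPartitionsBatched(
    client: IMetaStoreClient,
    db: String,
    table: String,
    batchSize: Int,
    maxRetries: Int): Seq[Partition] = {
  // Step 1: fetch only the partition names (a tiny payload compared to full metadata).
  val names = client.listPartitionNames(db, table, (-1).toShort).asScala.toSeq
  // Steps 2 and 3: split the names into batches and fetch each batch by name,
  // retrying a failed batch up to maxRetries times before giving up.
  names.grouped(batchSize).toSeq.flatMap { batch =>
    var attempt = 0
    var partitions: Seq[Partition] = null
    while (partitions == null) {
      try {
        partitions = client.getPartitionsByNames(db, table, batch.asJava).asScala.toSeq
      } catch {
        case _: Exception if attempt < maxRetries =>
          attempt += 1 // transient failure: retry this batch
      }
    }
    partitions
  }
}
```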

Why are the changes needed?

The change addresses heavy load on the HMS: when a table has a huge number of partitions (~600,000), the metadata size exceeds the 2 GB limit on the Thrift server buffer, so the client hits a socket timeout and the HMS can also crash with an OOM. This replicates the same behaviour as HIVE-27505.

Does this PR introduce any user-facing change?

Yes. To enable batching, users should set the following parameters (batching is disabled by default):

spark.sql.hive.metastore.batchSize = 1000
spark.sql.metastore.partition.batch.retry.count = 3
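A hedged usage sketch: these settings could be supplied when the SparkSession is built, since spark.sql.hive.metastore.* options generally need to be in place before the Hive client is created. The config names are taken from this description; the values and the app name are only illustrative.

```scala
// Illustrative only: enable the proposed batching via session configs.
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("partition-batch-fetch-demo")                           // placeholder name
  .config("spark.sql.hive.metastore.batchSize", "1000")            // proposed: enable batching
  .config("spark.sql.metastore.partition.batch.retry.count", "3")  // proposed: retries per batch
  .enableHiveSupport()
  .getOrCreate()
```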

How was this patch tested?

Tested in a local environment with the following performance:
With batch size = 1
24/09/28 18:11:21 INFO Shim_v2_3: Fetching all partitions completed in 718 ms

With batch size = -1
24/09/28 18:14:16 INFO Shim_v2_3: Fetching all partitions completed in 51 ms.

With batch size = 10
24/09/28 18:16:20 INFO Shim_v2_3: Fetching all partitions completed in 127 ms.

Was this patch authored or co-authored using generative AI tooling?

No

github-actions bot added the SQL label Sep 29, 2024
Madhukar525722 changed the title [SPARK-49827][CORE] Adding batches with retry mechanism for fetching … [SPARK-49827][SQL] Adding batches with retry mechanism for fetching … Sep 30, 2024
Madhukar525722 (Author) commented Sep 30, 2024

Please review @pan3793 @cloud-fan @HyukjinKwon

pan3793 (Member) commented Oct 1, 2024

The idea makes sense to me; we also have cases of accessing tables with millions of partitions, which puts high pressure on HMS.

Given this is a new feature, please open the PR targeting the master branch, and the new configurations' version should be 4.0.0.

HyukjinKwon (Member) commented
Yeah, let's target the master branch.

Madhukar525722 (Author) commented Oct 4, 2024

Hi @pan3793 @HyukjinKwon, I have raised the request for master in #48337. Please review.
In master, the get-all-partitions request has been migrated to getAllPartitionsOf, which already includes the implementation from HIVE-27505. Once Hive is upgraded from 2.x to 3, this change will no longer be required. Therefore, I believe this fix is more relevant to the lower versions of Spark as well.
Thank you

github-actions bot commented

We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!

github-actions bot added the Stale label Jan 27, 2025
github-actions bot closed this Jan 28, 2025