@steven-aerts
Contributor

For larger Spark runs, the JSON parsed by the history server can contain strings that exceed the default Jackson string-length limit introduced in Jackson 2.15.
This patch disables that limit so that JsonProtocol stays backwards compatible with previous versions.

What changes were proposed in this pull request?

Remove the Jackson limit on string length introduced in Jackson 2.15, which prevents us from reading the history of complex or larger jobs in Spark 3.5.3.

Why are the changes needed?

The history server crashes; see SPARK-49872 for the stack trace.

Does this PR introduce any user-facing change?

No

How was this patch tested?

Validated locally.

Was this patch authored or co-authored using generative AI tooling?

No

@dongjoon-hyun dongjoon-hyun changed the title from "[SPARK-49872][HISTORYSERVER] allow unlimited json size again" to "[SPARK-49872][CORE] allow unlimited json size again" on Jan 18, 2025
@steven-aerts steven-aerts force-pushed the SPARK-49872-historyserver-string-overflow branch 2 times, most recently from 91220e8 to 4a1a306 Compare January 20, 2025 15:57
@dongjoon-hyun
Member

Thank you for updating. Could you make CI happy, @steven-aerts ?

@steven-aerts steven-aerts force-pushed the SPARK-49872-historyserver-string-overflow branch 2 times, most recently from cbaecfc to c997d8c Compare January 28, 2025 11:43
@steven-aerts steven-aerts force-pushed the SPARK-49872-historyserver-string-overflow branch 2 times, most recently from 671f68b to 4c980e2 Compare January 28, 2025 14:13
@dongjoon-hyun
Member

According to the CI results, this PR seems to introduce a binary compatibility issue.

[info] spark-examples: mimaPreviousArtifacts not set, not analyzing binary compatibility
[error] java.lang.RuntimeException: Failed binary compatibility check against org.apache.spark:spark-core_2.13:3.5.0! Found 6 potential problems (filtered 4083)

FYI, if the change is a valid one, you can add the broken parts explicitly here, @steven-aerts .
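
As a rough illustration of what "adding the broken parts explicitly" looks like: exclusions in project/MimaExcludes.scala are written with MiMa's `ProblemFilters.exclude`. The sketch below only shows the shape of such entries. The object name and both member names are hypothetical placeholders; the real ones would come from the six problems listed in the CI report.

```scala
import com.typesafe.tools.mima.core._

object MimaExcludeSketch {
  // Hypothetical entries: replace the member names with the ones MiMa reports.
  val jsonProtocolExcludes = Seq(
    // A method that existed in the previous artifact but is gone now.
    ProblemFilters.exclude[DirectMissingMethodProblem](
      "org.apache.spark.util.JsonProtocol.someRemovedMethod"),
    // A method whose return type changed between the two versions.
    ProblemFilters.exclude[IncompatibleResultTypeProblem](
      "org.apache.spark.util.JsonProtocol.someChangedMethod")
  )
}
```

Each entry silences exactly one reported problem, so an exclusion list should stay as narrow as the change requires.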

@steven-aerts steven-aerts force-pushed the SPARK-49872-historyserver-string-overflow branch from 4c980e2 to 701e34f Compare January 30, 2025 14:37
@steven-aerts
Contributor Author

> According to the CI results, this PR seems to introduce a binary compatibility issue.
>
> [info] spark-examples: mimaPreviousArtifacts not set, not analyzing binary compatibility
> [error] java.lang.RuntimeException: Failed binary compatibility check against org.apache.spark:spark-core_2.13:3.5.0! Found 6 potential problems (filtered 4083)
>
> FYI, if the change is a valid one, you can add the broken parts explicitly here, @steven-aerts .
>
> * https://github.com/apache/spark/blob/master/project/MimaExcludes.scala

@dongjoon-hyun by keeping and migrating some members of the JsonProtocol object, annotated as @deprecated, I was able to make CI happy. Thanks for the suggestion.
Some other tests are still running, but last time they succeeded. 🤞

@steven-aerts steven-aerts force-pushed the SPARK-49872-historyserver-string-overflow branch from 0269071 to 903cb64 Compare February 21, 2025 07:15
@arunaru-te

@steven-aerts thanks for this PR. Do we have any ETA for the merge? :)

@allekai

allekai commented Mar 28, 2025

This PR would be highly appreciated :)

@steven-aerts steven-aerts requested a review from pjfanning March 28, 2025 15:22
@steven-aerts
Contributor Author

@dongjoon-hyun @LuciferYang I think this PR is ready to be merged.

@arunaru-te

@steven-aerts any ETA on this PR?

Releasing this would help me move away from the custom solution I built. Thanks.

@steven-aerts
Contributor Author

@arunaru-te for me this change is ready for submission. That can only be done by someone with commit rights, like @dongjoon-hyun or @LuciferYang .

@coderRPN

coderRPN commented May 8, 2025

> @steven-aerts any ETA on this PR?
>
> Releasing this would help me move away from the custom solution I built. Thanks.

Would you be able to share that custom solution? I can't upgrade my Spark yet, even if this PR gets merged.

@LuciferYang
Contributor

@steven-aerts Could you retrigger the failed test? I'll merge this pr once the GA passes.

@steven-aerts steven-aerts force-pushed the SPARK-49872-historyserver-string-overflow branch from 6e3fe8b to cbb4f6b Compare May 14, 2025 07:10
@steven-aerts
Contributor Author

@LuciferYang the YARN tests seem to fail consistently.
I have now rebased this PR on the latest version of master in the hope that it gets through.

When I run the failing tests locally in sbt, they all pass.
In the GitHub Actions run, however, those YARN tests fail with exit code 13, which suggests that YARN was somehow unable to start the application. With older versions of this patch the YARN tests passed without problems.

I am still working on it, trying to reproduce the failure and find out more. I will keep you up to date.

For bigger spark runs the json parsed by the history server can contain
string sizes which are too big for the default jackson string limit
introduced in jackson 2.15.
This patch just disables this filter by default, to make sure
JsonProtocol stays backwards compatible with previous versions.

The internal configuration option spark.eventLog.readerMaxStringLength
can be used to re-introduce an arbitrary limit.
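
To make the commit message above concrete, here is a minimal sketch of how an option such as spark.eventLog.readerMaxStringLength could drive the Jackson limit. It is not the code from this patch: a plain `Map` stands in for SparkConf, the object and helper names are hypothetical, and it assumes Jackson 2.15 or later on the classpath.

```scala
import com.fasterxml.jackson.core.StreamReadConstraints
import com.fasterxml.jackson.databind.ObjectMapper
import scala.util.Try

object ConfigurableLimitSketch {
  // Option name taken from the commit message; everything else is illustrative.
  val LimitKey = "spark.eventLog.readerMaxStringLength"

  // Hypothetical wiring: no limit unless the option is set explicitly.
  def mapperFor(conf: Map[String, String]): ObjectMapper = {
    val limit = conf.get(LimitKey).map(_.toInt).getOrElse(Int.MaxValue)
    val mapper = new ObjectMapper()
    mapper.getFactory.setStreamReadConstraints(
      StreamReadConstraints.builder().maxStringLength(limit).build())
    mapper
  }

  def main(args: Array[String]): Unit = {
    val json = """{"Event":"0123456789abcdef"}"""
    // Without the option, the 16-character value is read normally.
    println(mapperFor(Map.empty).readTree(json).get("Event").asText())
    // With a limit of 8 characters, reading the same value is expected to fail.
    val limited = mapperFor(Map(LimitKey -> "8"))
    println(Try(limited.readTree(json)).isFailure)
  }
}
```

Defaulting to `Int.MaxValue` mirrors the "disabled by default" behaviour described above, while still letting an operator opt back in to a bound.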
@steven-aerts steven-aerts force-pushed the SPARK-49872-historyserver-string-overflow branch from cbb4f6b to 34cb2bc Compare May 15, 2025 05:34
@steven-aerts
Contributor Author

I rebased my patch on #50891 in the hope of getting better visibility into what goes wrong here 🙏 .

@LuciferYang
Contributor

> I rebased my patch on #50891 in the hope of getting better visibility into what goes wrong here 🙏 .

thank you @steven-aerts

@steven-aerts
Contributor Author

@LuciferYang and now I cannot reproduce it in GitHub Actions anymore either. After rebasing onto #50891, the YARN tests pass on GitHub; I ran them 4 times today.
I assume this was a transient issue. With #50891 we will be able to debug it if it happens again.

For me this PR can now be submitted.

Thanks a lot for all your wonderful support.

@LuciferYang
Contributor

Merged into master.
Thanks for your work @steven-aerts
Thanks for your review @dongjoon-hyun @pjfanning @roczei

yhuang-db pushed a commit to yhuang-db/spark that referenced this pull request Jun 9, 2025
Closes apache#49163 from steven-aerts/SPARK-49872-historyserver-string-overflow.

Authored-by: Steven Aerts <[email protected]>
Signed-off-by: yangjie01 <[email protected]>
@kraj007

kraj007 commented Jun 18, 2025

Hello @steven-aerts, is this also available in Spark 3.5.0?

@steven-aerts
Contributor Author

@kraj007 this is only available on the 4.0 branch today. But it should be possible to cherry pick it.

@dongjoon-hyun
Member

Just a correction to the above comment: for the record, this (SPARK-49872) is only on the master branch, for Apache Spark 4.1.0, for now. We have no plan to backport it to Apache Spark 4.0.1 or 3.5.6 yet.

> this is only available on the 4.0 branch today. But it should be possible to cherry pick it.

@amonteagudomoreno

Has anyone been able to cherry-pick this in a PySpark environment? We are specifically using Spark 3.5.3.

Any help is appreciated

@pjfanning
Member

@dongjoon-hyun @steven-aerts Would the Spark team consider supporting a simpler change for branch-4.0 and maybe branch-3.5, where the Jackson max string size is set to Integer.MAX_VALUE in the JsonProtocol object (as it still is in those branches)?

  // SPARK-49872 remove limit on string lengths (made configurable in Spark 4.1.0)
  mapper.getFactory.setStreamReadConstraints(
    StreamReadConstraints.builder().maxStringLength(Int.MaxValue).build()
  )
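
One way to try the snippet above outside Spark is sketched below, assuming Jackson 2.15 or later on the classpath. A bare `ObjectMapper` stands in for the one held by JsonProtocol, and the object and helper names are hypothetical. The sketch builds a string one character longer than Jackson's default limit and checks that it is read back intact once the limit is lifted.

```scala
import com.fasterxml.jackson.core.StreamReadConstraints
import com.fasterxml.jackson.databind.ObjectMapper

object UnlimitedStringLengthSketch {
  // Hypothetical helper: an ObjectMapper whose parser accepts strings of any length.
  def unlimitedMapper(): ObjectMapper = {
    val mapper = new ObjectMapper()
    mapper.getFactory.setStreamReadConstraints(
      StreamReadConstraints.builder().maxStringLength(Int.MaxValue).build()
    )
    mapper
  }

  def main(args: Array[String]): Unit = {
    // Ask Jackson for its default limit instead of hard-coding a number.
    val defaultLimit = StreamReadConstraints.defaults().getMaxStringLength
    // One string value just over the default limit, wrapped in a tiny JSON object.
    val big = "a" * (defaultLimit + 1)
    val json = "{\"Event\":\"" + big + "\"}"
    val node = unlimitedMapper().readTree(json)
    // Compare the parsed length with the length we generated.
    println(node.get("Event").asText().length == big.length)
  }
}
```

Because the constraint is set on the mapper's factory, it applies to every parser the mapper creates afterwards, which is what makes the change small enough to backport.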

@cloud-fan
Contributor

Hi @steven-aerts and @dongjoon-hyun, while I agree that it would be more ideal to pass a SparkConf parameter to JsonProtocol, I feel it's a bit too late to make this change now, as object JsonProtocol is already widely used. Moreover, we can make a surgical fix to mitigate the Jackson regression by calling setStreamReadConstraints. The invasive changes here are mostly for a new feature: making the JSON max size configurable. It's arguable how valuable it is to make it configurable, but users can still configure it via the conf file, and SparkConf instantiation will load it.

My proposal here is to revert this change and make a surgical fix for master, 4.0, and 3.5. What do you think?

@dongjoon-hyun
Member

Since this (SPARK-49872) is part of 4.1.0, which is not yet released, I agree with @cloud-fan's surgical change suggestion for all live branches (master/4.0/3.5). If the author and other committers agree with the new approach, I believe @cloud-fan can proceed in that way.

@dongjoon-hyun
Member

Thank you for bringing up the alternative, @cloud-fan .

dongjoon-hyun pushed a commit that referenced this pull request Aug 15, 2025
### What changes were proposed in this pull request?

A clean revert of #49163 . We will make a surgical fix that can be backported to older branches.

### Why are the changes needed?

this fix is too large to backport.

### Does this PR introduce _any_ user-facing change?

no, it's not released yet.

### How was this patch tested?

existing tests

### Was this patch authored or co-authored using generative AI tooling?

no

Closes #52036 from cloud-fan/json.

Lead-authored-by: Wenchen Fan <[email protected]>
Co-authored-by: Wenchen Fan <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
cloud-fan added a commit that referenced this pull request Aug 19, 2025
### What changes were proposed in this pull request?

This is a surgical fix extracted from #49163

The default jackson string limit introduced in jackson 2.15 can be too small for certain workloads, and this PR removes this limitation to avoid any regression.

### Why are the changes needed?

fix regression

### Does this PR introduce _any_ user-facing change?

Yes, users won't hit this size limitation anymore.

### How was this patch tested?

#49163 tested it. We won't add a test in this PR as generating a super large JSON will make the CI unstable.

### Was this patch authored or co-authored using generative AI tooling?

no

Closes #52049 from cloud-fan/json.

Lead-authored-by: Wenchen Fan <[email protected]>
Co-authored-by: Wenchen Fan <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
cloud-fan added a commit that referenced this pull request Aug 19, 2025
(cherry picked from commit 076618a)
Signed-off-by: Wenchen Fan <[email protected]>
cloud-fan added a commit that referenced this pull request Aug 19, 2025
(cherry picked from commit 076618a)
Signed-off-by: Wenchen Fan <[email protected]>
turboFei pushed a commit to turboFei/spark that referenced this pull request Nov 6, 2025
…tps://github.corp.ebay.com/carmel/ebay-spark/pull/863 (apache#22)

Lead-authored-by: Wenchen Fan <[email protected]>


(cherry picked from commit 076618a)

Signed-off-by: Wenchen Fan <[email protected]>
Co-authored-by: Wenchen Fan <[email protected]>
Co-authored-by: Wenchen Fan <[email protected]>
zifeif2 pushed a commit to zifeif2/spark that referenced this pull request Nov 14, 2025
(cherry picked from commit a76c1a4)
Signed-off-by: Wenchen Fan <[email protected]>