-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK-49872][CORE] allow unlimited json size again #49163
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-49872][CORE] allow unlimited json size again #49163
Conversation
91220e8 to
4a1a306
Compare
|
Thank you for updating. Could you make CI happy, @steven-aerts ? |
cbaecfc to
c997d8c
Compare
671f68b to
4c980e2
Compare
|
According to the CI results, this PR seems to introduce a binary compatibility issue. FYI, if the change is a valid one, you can add the broken parts explicitly here, @steven-aerts . |
4c980e2 to
701e34f
Compare
@dongjoon-hyun by keeping/migrating some stuff in |
0269071 to
903cb64
Compare
|
@steven-aerts thanks for this PR, do we have any ETA for merge :) |
|
This PR would be highly appreciated :) |
|
@dongjoon-hyun @LuciferYang I think this PR is ready to be merged. |
|
@steven-aerts any ETA on this PR..? releasing this would help me to move away from custom solution i built, thanks |
|
@arunaru-te for me this change is ready for submission, this can only done by someone with commit rights like @dongjoon-hyun or @LuciferYang . |
Would you be able to share that custom solution coz I can't upgrade my spark yet, even if this pr gets merged. |
|
@steven-aerts Could you retrigger the failed test? I'll merge this pr once the GA passes. |
6e3fe8b to
cbb4f6b
Compare
|
@LuciferYang the yarn tests seem to fail consistently. When I run the failing tests locally in I am still on it trying to reproduce, find out more, will keep you up to date. |
For bigger spark runs the json parsed by the history server can contain string sizes which are too big for the default jackson string limit introduced in jackson 2.15. This patch just disables this filter by default, to make sure JsonProtocol stays backwards compatible with previous versions. The internal configuration option spark.eventLog.readerMaxStringLength can be used to re-introduce an arbitrary limit.
cbb4f6b to
34cb2bc
Compare
|
I rebased my patch on #50891 in the hope to get better visibility on what goes wrong here 🙏 . |
thank you @steven-aerts |
|
@LuciferYang and now I cannot reproduce it anymore in the github actions either. So after the rebase #50891 the For me this PR can now be submitted. Thanks a lot for all your wonderful support. |
|
Merged into master. |
For bigger spark runs the json parsed by the history server can contain string sizes which are too big for the default jackson string limit introduced in jackson 2.15. This patch just disables this filter, to make sure JsonProtocol stays backwards compatible with previous versions. ### What changes were proposed in this pull request? Remove the Jackson limit on stringlength introduced in Jackson 2.15, preventing us to read the history of complex/bigger jobs in spark 3.5.3. ### Why are the changes needed? History server crash see SPARK-49872 for stacktrace. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Validated locally. ### Was this patch authored or co-authored using generative AI tooling? No Closes apache#49163 from steven-aerts/SPARK-49872-historyserver-string-overflow. Authored-by: Steven Aerts <[email protected]> Signed-off-by: yangjie01 <[email protected]>
|
Hello @steven-aerts Is this available in spark 3.5.0 version also ? |
|
@kraj007 this is only available on the 4.0 branch today. But it should be possible to cherry pick it. |
|
Just a correction to the above comment: For the record, this(SPARK-49872) is only at
|
|
Is there someone that has been able to cherry pick this in a PySpark environment, we are specifically using Spark 3.5.3 Any help is appreciated |
|
@dongjoon-hyun @steven-aerts Would the Spark team consider supporting a simpler change for branch-4.0 and maybe branch-3.5, where the Jackson max string size is set to Integer.MAX_VALUE in the JsonProtocol object (as it still is in those branches)? |
|
Hi @steven-aerts and @dongjoon-hyun , while I agree that it's more ideal to pass a My proposal here is to revert this change and make a surgical fix for both master, 4.0 and 3.5. What do you think? |
|
Since this (SPARK-49872) is a part of 4.1.0 which is unreleased yet, I agree with @cloud-fan 's surgical change suggestion for all live branches (master/4.0/3.5). If the author and other committers agree with the new approach, I believe @cloud-fan can proceed in that way. |
|
Thank you for bringing up the alternative, @cloud-fan . |
### What changes were proposed in this pull request? A clean revert of #49163 . We will make a surgical fix that can be backported to older branches. ### Why are the changes needed? this fix is too large to backport. ### Does this PR introduce _any_ user-facing change? no, it's not released yet. ### How was this patch tested? existing tests ### Was this patch authored or co-authored using generative AI tooling? no Closes #52036 from cloud-fan/json. Lead-authored-by: Wenchen Fan <[email protected]> Co-authored-by: Wenchen Fan <[email protected]> Signed-off-by: Dongjoon Hyun <[email protected]>
### What changes were proposed in this pull request? This is a surgical fix extracted from #49163 The default jackson string limit introduced in jackson 2.15 can be too small for certain workloads, and this PR removes this limitation to avoid any regression. ### Why are the changes needed? fix regression ### Does this PR introduce _any_ user-facing change? Yes, users won't hit this size limitation anymore. ### How was this patch tested? #49163 tested it. We won't add a test in this PR as generating a super large JSON will make the CI unstable. ### Was this patch authored or co-authored using generative AI tooling? no Closes #52049 from cloud-fan/json. Lead-authored-by: Wenchen Fan <[email protected]> Co-authored-by: Wenchen Fan <[email protected]> Signed-off-by: Wenchen Fan <[email protected]>
### What changes were proposed in this pull request? This is a surgical fix extracted from #49163 The default jackson string limit introduced in jackson 2.15 can be too small for certain workloads, and this PR removes this limitation to avoid any regression. ### Why are the changes needed? fix regression ### Does this PR introduce _any_ user-facing change? Yes, users won't hit this size limitation anymore. ### How was this patch tested? #49163 tested it. We won't add a test in this PR as generating a super large JSON will make the CI unstable. ### Was this patch authored or co-authored using generative AI tooling? no Closes #52049 from cloud-fan/json. Lead-authored-by: Wenchen Fan <[email protected]> Co-authored-by: Wenchen Fan <[email protected]> Signed-off-by: Wenchen Fan <[email protected]> (cherry picked from commit 076618a) Signed-off-by: Wenchen Fan <[email protected]>
This is a surgical fix extracted from #49163 The default jackson string limit introduced in jackson 2.15 can be too small for certain workloads, and this PR removes this limitation to avoid any regression. fix regression Yes, users won't hit this size limitation anymore. #49163 tested it. We won't add a test in this PR as generating a super large JSON will make the CI unstable. no Closes #52049 from cloud-fan/json. Lead-authored-by: Wenchen Fan <[email protected]> Co-authored-by: Wenchen Fan <[email protected]> Signed-off-by: Wenchen Fan <[email protected]> (cherry picked from commit 076618a) Signed-off-by: Wenchen Fan <[email protected]>
…tps://github.corp.ebay.com/carmel/ebay-spark/pull/863 (apache#22) This is a surgical fix extracted from apache#49163 The default jackson string limit introduced in jackson 2.15 can be too small for certain workloads, and this PR removes this limitation to avoid any regression. fix regression Yes, users won't hit this size limitation anymore. apache#49163 tested it. We won't add a test in this PR as generating a super large JSON will make the CI unstable. no Closes apache#52049 from cloud-fan/json. Lead-authored-by: Wenchen Fan <[email protected]> (cherry picked from commit 076618a) Signed-off-by: Wenchen Fan <[email protected]> Co-authored-by: Wenchen Fan <[email protected]> Co-authored-by: Wenchen Fan <[email protected]>
### What changes were proposed in this pull request? This is a surgical fix extracted from apache#49163 The default jackson string limit introduced in jackson 2.15 can be too small for certain workloads, and this PR removes this limitation to avoid any regression. ### Why are the changes needed? fix regression ### Does this PR introduce _any_ user-facing change? Yes, users won't hit this size limitation anymore. ### How was this patch tested? apache#49163 tested it. We won't add a test in this PR as generating a super large JSON will make the CI unstable. ### Was this patch authored or co-authored using generative AI tooling? no Closes apache#52049 from cloud-fan/json. Lead-authored-by: Wenchen Fan <[email protected]> Co-authored-by: Wenchen Fan <[email protected]> Signed-off-by: Wenchen Fan <[email protected]> (cherry picked from commit a76c1a4) Signed-off-by: Wenchen Fan <[email protected]>
For bigger spark runs the json parsed by the history server can contain string sizes which are too big for the default jackson string limit introduced in jackson 2.15.
This patch just disables this filter, to make sure JsonProtocol stays backwards compatible with previous versions.
What changes were proposed in this pull request?
Remove the Jackson limit on stringlength introduced in Jackson 2.15, preventing us to read the history of complex/bigger jobs in spark 3.5.3.
Why are the changes needed?
History server crash see SPARK-49872 for stacktrace.
Does this PR introduce any user-facing change?
No
How was this patch tested?
Validated locally.
Was this patch authored or co-authored using generative AI tooling?
No