[SPARK-28355][CORE][PYTHON] Use Spark conf for threshold at which command is compressed by broadcast #25123
ok to test
```scala
private[spark] val BROADCAST_UDF_THRESHOLD = ConfigBuilder("spark.broadcast.UDFThreshold")
  .doc("The threshold at which a serialized command is compressed by broadcast, in " +
    "bytes unless otherwise specified")
  .bytesConf(ByteUnit.BYTE)
```
Add `checkValue`?
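The diff declares the threshold with `bytesConf(ByteUnit.BYTE)`, and its doc says the value is in bytes unless otherwise specified; the review asks whether to add a `checkValue`. As a rough illustration of that parse-then-validate step, here is a hypothetical Python sketch. The helpers `parse_byte_size` and `read_threshold` are invented for this example, the case-insensitive, 1024-based unit suffixes are an assumption, and the non-negative rule is only an example of the kind of constraint a `checkValue` could enforce. Spark's actual parser may differ in details.

```python
import re

# Hypothetical unit table (assumption): case-insensitive, 1024-based suffixes.
# A bare number with no suffix is read as bytes.
_UNITS = {
    "b": 1,
    "k": 1 << 10, "kb": 1 << 10,
    "m": 1 << 20, "mb": 1 << 20,
    "g": 1 << 30, "gb": 1 << 30,
    "t": 1 << 40, "tb": 1 << 40,
    "p": 1 << 50, "pb": 1 << 50,
}
# An optional sign is accepted here only so the range check below has work to do.
_PATTERN = re.compile(r"^(-?\d+)\s*([a-z]*)$")


def parse_byte_size(value: str) -> int:
    """Parse strings such as '1048576', '1024k' or '1m' into a byte count."""
    match = _PATTERN.match(value.strip().lower())
    if match is None:
        raise ValueError(f"Invalid byte size: {value!r}")
    number, suffix = match.groups()
    if suffix and suffix not in _UNITS:
        raise ValueError(f"Unknown size unit: {suffix!r}")
    return int(number) * _UNITS.get(suffix, 1)


def read_threshold(value: str) -> int:
    """Parse a threshold, then apply the kind of check a checkValue could add."""
    threshold = parse_byte_size(value)
    # Assumed constraint: a negative byte threshold is meaningless, so reject it.
    if threshold < 0:
        raise ValueError("The threshold should be non-negative.")
    return threshold
```

With these assumptions, `read_threshold("1m")`, `read_threshold("1024k")` and `read_threshold("1048576")` all return 1048576, while `read_threshold("-1")` raises `ValueError` instead of silently accepting a value that would make every command exceed the threshold.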
```scala
    "mechanisms to guarantee data won't be corrupted during broadcast")
  .booleanConf.createWithDefault(true)

private[spark] val BROADCAST_UDF_THRESHOLD = ConfigBuilder("spark.broadcast.UDFThreshold")
```
The variable name looks confusing. `COMMAND_COMPRESSION_THRESHOLD`?
cc @HyukjinKwon
Test build #107560 has finished for PR 25123 at commit
```scala
private[spark] val BROADCAST_FOR_UDF_COMPRESSION_THRESHOLD =
  ConfigBuilder("spark.broadcast.UDFCompressionThreshold")
    .doc("The threshold at which a user-defined function (UDF) is compressed by broadcast, " +
```
I think this also applies to RDD APIs. We could say, for instance, "The threshold at which Python commands for RDD APIs and user-defined functions (UDFs) are serialized by broadcast ...". Feel free to change the wording.
I updated the description to include this.
Test build #107564 has finished for PR 25123 at commit
Test build #107603 has finished for PR 25123 at commit
Test build #107623 has finished for PR 25123 at commit
LGTM. Thanks! Merged to master.
LGTM too
## What changes were proposed in this pull request?

The `_prepare_for_python_RDD` method currently broadcasts a pickled command if its length is greater than the hardcoded value `1 << 20` (1 MiB). This change makes that threshold configurable through a Spark conf instead.

## How was this patch tested?

Unit tests, manual tests.

Closes apache#25123 from jessecai/SPARK-28355.

Authored-by: Jesse Cai
Signed-off-by: gatorsmile