[SPARK-31878][SQL] Create date formatter only once in HiveResult
#28687
@cloud-fan @HyukjinKwon @juliuszsompolski Please review this PR.
```scala
// Here, `dateFormatter` is used only for formatting, so we can initialize it with
// any time zone once, for instance, the current session time zone. And we can
// reuse it even if the session time zone changes later.
private val dateFormatter = DateFormatter(zoneId)
```
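To make the `def` vs `val` difference concrete, here is a minimal sketch. It does not use Spark: `CountingDateFormatter`, `FormatterStats`, `PerCallResult` and `SharedResult` are hypothetical stand-ins, and the wrapped `java.time.format.DateTimeFormatter` plays the role of Spark's `DateFormatter`. A `def` re-evaluates its body on every access, so each formatted date pays for a new formatter and a fresh compilation of `yyyy-MM-dd`. A `val` is evaluated once and then reused.

```scala
import java.time.{LocalDate, ZoneId}
import java.time.format.DateTimeFormatter
import java.util.concurrent.atomic.AtomicInteger

// Hypothetical counter of formatter constructions (not part of Spark).
object FormatterStats {
  val created = new AtomicInteger(0)
}

// Hypothetical stand-in for Spark's DateFormatter: every construction
// compiles the default pattern and bumps the counter.
final class CountingDateFormatter(zoneId: ZoneId) {
  FormatterStats.created.incrementAndGet()
  private val formatter = DateTimeFormatter.ofPattern("yyyy-MM-dd").withZone(zoneId)
  def format(date: LocalDate): String = formatter.format(date)
}

// Before the change: `def` re-evaluates its body, so each call builds a new formatter.
final class PerCallResult(zoneId: ZoneId) {
  private def dateFormatter = new CountingDateFormatter(zoneId)
  def toHiveString(date: LocalDate): String = dateFormatter.format(date)
}

// After the change: `val` is evaluated once and the same formatter is reused.
final class SharedResult(zoneId: ZoneId) {
  private val dateFormatter = new CountingDateFormatter(zoneId)
  def toHiveString(date: LocalDate): String = dateFormatter.format(date)
}
```

Formatting 1,000 dates through `PerCallResult` constructs 1,000 formatters. Going through `SharedResult` constructs one, and the output strings are identical.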
Just in case, can hiveResultString() or toHiveString() be called from multiple threads in parallel? If so, we cannot use SimpleDateFormat here because it is not thread-safe. In that case, dateFormatter must use FastDateFormat because it is thread-safe.
Yes, it can be called concurrently (e.g. by concurrent thriftserver queries).
OK. I use `FastDateFormat` for `java.sql.Date`, which is thread-safe, and `DateTimeFormatter` for `java.time.LocalDate`. The latter is thread-safe as well. Should be fine.
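The `java.time.LocalDate` half of that answer can be exercised with a small sketch: one shared, immutable `DateTimeFormatter` is used from a thread pool, and each result is compared with `LocalDate.toString`, which is also `yyyy-MM-dd` for four-digit years. `SharedFormatterCheck` and its parameters are hypothetical. `FastDateFormat` (Apache Commons Lang) is left out to keep the sketch JDK-only. A clean run does not prove thread safety. That guarantee comes from the `DateTimeFormatter` documentation, which describes the class as immutable and thread-safe, unlike `SimpleDateFormat`.

```scala
import java.time.LocalDate
import java.time.format.DateTimeFormatter
import java.util.concurrent.{Callable, Executors, TimeUnit}

object SharedFormatterCheck {
  // One immutable formatter shared by every task (never rebuilt).
  private val shared: DateTimeFormatter = DateTimeFormatter.ofPattern("yyyy-MM-dd")

  // Formats `count` consecutive dates starting at `start` on `threads` threads
  // and returns how many results disagree with LocalDate.toString.
  def mismatches(threads: Int, count: Int, start: LocalDate): Int = {
    val pool = Executors.newFixedThreadPool(threads)
    try {
      val futures = (0 until count).map { i =>
        val date = start.plusDays(i.toLong)
        pool.submit(new Callable[Boolean] {
          override def call(): Boolean = shared.format(date) == date.toString
        })
      }
      futures.count(f => !f.get())
    } finally {
      pool.shutdown()
      pool.awaitTermination(10, TimeUnit.SECONDS)
    }
  }
}
```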
Test build #123350 has finished for PR 28687 at commit
Test build #123351 has finished for PR 28687 at commit
+1, LGTM pending tests.
Spark 2.4 also creates the date formatter only once, so we need to backport this to fix the perf regression. Since the last commit only updates a comment, we don't need to wait for Jenkins. Thanks, merging to master/3.0!
Test build #123484 has finished for PR 28687 at commit
### What changes were proposed in this pull request?

1. Replace `def dateFormatter` with `val dateFormatter`.
2. Modify the `date formatting in hive result` test in `HiveResultSuite` to check the modified code in various time zones.

### Why are the changes needed?

To avoid creating a `DateFormatter` for every incoming date in `HiveResult.toHiveString`. This should eliminate unnecessary creation of `SimpleDateFormat` instances and repeated compilation of the default pattern `yyyy-MM-dd`. The changes can speed up processing of legacy date values of the `java.sql.Date` type, which is collected by default.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Modified a test in `HiveResultSuite`.

Closes #28687 from MaxGekk/HiveResult-val-dateFormatter.

Authored-by: Max Gekk <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
(cherry picked from commit 125a89c)
Signed-off-by: Wenchen Fan <[email protected]>
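The idea behind checking "various time zones" can be sketched without Spark: build the formatter once, pinned to an arbitrary zone, then keep reusing it while the time zone changes underneath. This is not the actual `HiveResultSuite` test. `ZoneIndependenceCheck` and its zone list are hypothetical, and the JVM default time zone (`java.util.TimeZone`) stands in for the Spark session time zone. A legacy `java.sql.Date` is built and converted to `LocalDate` in each zone, and the shared formatter should yield the same text every time. Note that `TimeZone.setDefault` changes global JVM state, so the sketch restores the previous default afterwards.

```scala
import java.time.ZoneId
import java.time.format.DateTimeFormatter
import java.util.TimeZone

object ZoneIndependenceCheck {
  // Hypothetical set of zones to exercise; any valid zone IDs would do.
  val zones: Seq[String] =
    Seq("UTC", "America/Los_Angeles", "Europe/Moscow", "Asia/Kolkata", "Pacific/Kiritimati")

  // Created once, pinned to an arbitrary zone, and never rebuilt.
  private val formatter =
    DateTimeFormatter.ofPattern("yyyy-MM-dd").withZone(ZoneId.of(zones.head))

  // For each zone: make it the JVM default, build a legacy java.sql.Date,
  // convert it to a LocalDate in that zone, and format it with the shared formatter.
  def formatUnderEachZone(text: String): Seq[String] = {
    val saved = TimeZone.getDefault
    try {
      zones.map { id =>
        TimeZone.setDefault(TimeZone.getTimeZone(id))
        val legacy = java.sql.Date.valueOf(text)
        formatter.format(legacy.toLocalDate)
      }
    } finally TimeZone.setDefault(saved)
  }
}
```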
It has a logical conflict with #28706; I'm fixing it.
Fixed in 349015d, as it's urgent. EDIT: I should've opened a PR, though...