@MaxGekk
Member

@MaxGekk MaxGekk commented May 31, 2020

What changes were proposed in this pull request?

  1. Replace `def dateFormatter` with `val dateFormatter`.
  2. Modify the `date formatting in hive result` test in `HiveResultSuite` to check the modified code in various time zones.

Why are the changes needed?

To avoid creating a `DateFormatter` for every incoming date in `HiveResult.toHiveString`. This should eliminate the unnecessary creation of `SimpleDateFormat` instances and the repeated compilation of the default pattern `yyyy-MM-dd`. The change can speed up processing of legacy date values of the `java.sql.Date` type, which is the type collected by default.
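
The difference between `def` and `val` described above can be sketched in plain Scala. This is an illustration only, not the code from the PR: it uses `java.time.format.DateTimeFormatter` in place of Spark's `DateFormatter`, and the object name `FormatterCreation`, the counter `created`, and the helper methods are all hypothetical.

```scala
import java.time.LocalDate
import java.time.format.DateTimeFormatter

object FormatterCreation {
  // Counts how many formatter instances have been built (for illustration only).
  var created = 0

  private def newFormatter(): DateTimeFormatter = {
    created += 1
    DateTimeFormatter.ofPattern("yyyy-MM-dd")
  }

  // `def`: the body runs on every access, so each formatted date builds a new formatter.
  def perCallFormatter: DateTimeFormatter = newFormatter()

  // `val`: the body runs once, when the enclosing object is initialized.
  val sharedFormatter: DateTimeFormatter = newFormatter()

  def formatWithDef(d: LocalDate): String = perCallFormatter.format(d)
  def formatWithVal(d: LocalDate): String = sharedFormatter.format(d)
}
```

Formatting many dates through `formatWithVal` leaves the counter unchanged, while `formatWithDef` increments it once per date, which is the per-date overhead this change is meant to remove.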

Does this PR introduce any user-facing change?

No

How was this patch tested?

Modified a test in HiveResultSuite.

@MaxGekk
Member Author

MaxGekk commented May 31, 2020

@cloud-fan @HyukjinKwon @juliuszsompolski Please, review this PR.

// Here, `dateFormatter` is used only for formatting, so we can initialize it once
// with any time zone, for instance the current session time zone, and
// reuse it even if the session time zone changes later.
private val dateFormatter = DateFormatter(zoneId)
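
For `java.time.LocalDate` values, the reasoning in the code comment can be checked with the JDK alone. The sketch below is an assumption-laden illustration rather than Spark's `DateFormatter`: `ZoneIndependentDateFormat`, `formatterFor`, and `formatInZones` are hypothetical names, and the zone binding is modeled with `DateTimeFormatter.withZone`. It does not cover the legacy `java.sql.Date` path.

```scala
import java.time.{LocalDate, ZoneId}
import java.time.format.DateTimeFormatter

object ZoneIndependentDateFormat {
  // Hypothetical stand-in for a formatter bound to a session time zone.
  def formatterFor(zone: ZoneId): DateTimeFormatter =
    DateTimeFormatter.ofPattern("yyyy-MM-dd").withZone(zone)

  // Formats the same local date with formatters bound to different zones.
  def formatInZones(date: LocalDate, zoneIds: Seq[String]): Seq[String] =
    zoneIds.map(id => formatterFor(ZoneId.of(id)).format(date))
}
```

Because a `LocalDate` carries no instant, the zone attached to the formatter does not shift the date, so every element returned by `formatInZones` is the same string.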

Member Author

@MaxGekk MaxGekk May 31, 2020

Just in case, can hiveResultString() or toHiveString() be called from multiple threads in parallel? If so, we cannot use SimpleDateFormat here because it is not thread-safe. In that case, dateFormatter must use FastDateFormat because it is thread-safe.

Contributor

Yes, it can be called concurrently (e.g. by concurrent thriftserver queries).

Member Author

OK. I use FastDateFormat, which is thread-safe, for java.sql.Date, and DateTimeFormatter for java.time.LocalDate. The latter is thread-safe as well, so this should be fine.
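
Sharing a single thread-safe formatter across concurrent callers can be sketched as follows. This is a minimal illustration using only the JDK: `java.time.format.DateTimeFormatter` is documented as immutable and thread-safe, while `FastDateFormat` (from Apache Commons Lang) is left out to keep the example self-contained. `SharedFormatterDemo` and `formatConcurrently` are hypothetical names, not part of Spark.

```scala
import java.time.LocalDate
import java.time.format.DateTimeFormatter
import java.util.concurrent.{Callable, Executors}

object SharedFormatterDemo {
  // A single immutable formatter instance shared by all worker threads.
  val formatter: DateTimeFormatter = DateTimeFormatter.ofPattern("yyyy-MM-dd")

  // Formats `dates` on a pool of `threads` workers, all using the shared formatter.
  // Results are returned in the same order as the input.
  def formatConcurrently(dates: Seq[LocalDate], threads: Int): Seq[String] = {
    val pool = Executors.newFixedThreadPool(threads)
    try {
      val futures = dates.map { d =>
        pool.submit(new Callable[String] { def call(): String = formatter.format(d) })
      }
      futures.map(_.get())
    } finally {
      pool.shutdown()
    }
  }
}
```

Doing the same with one shared `SimpleDateFormat` would not be safe, because that class keeps mutable state during formatting; this is why the thread-safety question matters once the formatter becomes a `val`.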

@MaxGekk MaxGekk changed the title [SPARK-31878][SQL] Create date formatter only once in HiveResult [WIP][SPARK-31878][SQL] Create date formatter only once in HiveResult May 31, 2020
@SparkQA

SparkQA commented Jun 1, 2020

Test build #123350 has finished for PR 28687 at commit 9dffb04.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jun 1, 2020

Test build #123351 has finished for PR 28687 at commit b97a84b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@juliuszsompolski
Contributor

+1, LGTM pending tests.

@MaxGekk MaxGekk changed the title [WIP][SPARK-31878][SQL] Create date formatter only once in HiveResult [SPARK-31878][SQL] Create date formatter only once in HiveResult Jun 3, 2020
@cloud-fan
Contributor

Spark 2.4 also creates the date formatter only once, so we need to backport this to fix the perf regression.

Since the last commit only updates a comment, we don't need to wait for Jenkins. Thanks, merging to master/3.0!

@SparkQA

SparkQA commented Jun 3, 2020

Test build #123484 has finished for PR 28687 at commit 6984544.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.

cloud-fan pushed a commit that referenced this pull request Jun 3, 2020
### What changes were proposed in this pull request?
1. Replace `def dateFormatter` with `val dateFormatter`.
2. Modify the `date formatting in hive result` test in `HiveResultSuite` to check the modified code in various time zones.

### Why are the changes needed?
To avoid creating a `DateFormatter` for every incoming date in `HiveResult.toHiveString`. This should eliminate the unnecessary creation of `SimpleDateFormat` instances and the repeated compilation of the default pattern `yyyy-MM-dd`. The change can speed up processing of legacy date values of the `java.sql.Date` type, which is the type collected by default.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Modified a test in `HiveResultSuite`.

Closes #28687 from MaxGekk/HiveResult-val-dateFormatter.

Authored-by: Max Gekk <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
(cherry picked from commit 125a89c)
Signed-off-by: Wenchen Fan <[email protected]>
@cloud-fan cloud-fan closed this in 125a89c Jun 3, 2020
@cloud-fan
Contributor

It has a logical conflict with #28706; I'm fixing it.

@cloud-fan
Contributor

cloud-fan commented Jun 3, 2020

fixed in 349015d as it's urgent.

EDIT: I should've opened a PR though...
