Respect execution timezone in to_timestamp and related functions #18025
Implement timezone-aware handling for to_timestamp functions by adding the ConfiguredTimeZone utilities. Refactor shared helpers to ensure naïve strings are interpreted using the configured execution zone. Extend unit tests to cover naïve and formatted inputs respecting non-UTC execution timezones.
Deferring timezone parsing ensures that the configured timezone string is only interpreted when the input is UTF-8. As a result, numeric arguments are unaffected by an invalid session timezone value.
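The deferred parsing described above can be sketched as follows, assuming toy argument types. The zone string is parsed at most once, and only when the first UTF-8 value is encountered, so an invalid setting cannot fail a call whose arguments are all numeric. `Arg`, `parse_offset_secs`, and `to_utc_secs` are hypothetical names, and the zone setting is simplified to an offset in seconds; this is not the PR's code.

```rust
/// Toy stand-ins for Arrow argument types (hypothetical, for illustration only).
enum Arg {
    Utf8(String),
    Int64(i64),
}

/// Hypothetical, simplified "time zone" parser: the setting is a UTC offset in
/// seconds (e.g. "18000" for +05:00). Real code would accept names and ±HH:MM.
fn parse_offset_secs(tz: &str) -> Result<i64, String> {
    tz.trim()
        .parse::<i64>()
        .map_err(|e| format!("invalid time zone {tz:?}: {e}"))
}

/// Converts arguments to UTC seconds. Numeric inputs pass through untouched;
/// the zone setting is parsed lazily, at most once, on the first Utf8 input.
fn to_utc_secs(args: &[Arg], tz: &str) -> Result<Vec<i64>, String> {
    let mut offset: Option<Result<i64, String>> = None;
    let mut out = Vec::with_capacity(args.len());
    for arg in args {
        match arg {
            Arg::Int64(v) => out.push(*v),
            Arg::Utf8(s) => {
                // The first Utf8 value triggers parsing; the result (or error) is cached.
                let off = offset.get_or_insert_with(|| parse_offset_secs(tz)).clone()?;
                // Toy input format: local wall-clock seconds since the epoch.
                let local: i64 = s
                    .trim()
                    .parse()
                    .map_err(|e| format!("bad input {s:?}: {e}"))?;
                out.push(local - off);
            }
        }
    }
    Ok(out)
}
```

With only numeric inputs, a bogus setting such as `"not-a-zone"` is never parsed, so the call succeeds.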
I ran a simple test, but it appears this PR doesn't resolve the issue:

```
> SET TIMEZONE="+05";
0 row(s) fetched.
Elapsed 0.001 seconds.

> SELECT arrow_typeof(to_timestamp('2023-01-31T09:26:56.123456789'));
+-------------------------------------------------------------------+
| arrow_typeof(to_timestamp(Utf8("2023-01-31T09:26:56.123456789"))) |
+-------------------------------------------------------------------+
| Timestamp(Nanosecond, None)                                       |
+-------------------------------------------------------------------+
1 row(s) fetched.
Elapsed 0.006 seconds.
```

The timezone is still
Pausing this; I'll continue after #18017 to see how we approach modifying `return_field_from_args`. Modifying
Thank you. I'm leaning towards handling the
```rust
fn parse_fixed_offset(tz: &str) -> Option<FixedOffset> {
    let tz = tz.trim();
    if tz.eq_ignore_ascii_case("utc") || tz.eq_ignore_ascii_case("z") {
```
`Tz::from_str(tz)` doesn't account for these?
I double-checked to confirm that it does not handle lowercase:

```rust
#[test]
fn parse_fixed_offset_accepts_lowercase_and_z() -> Result<()> {
    use std::str::FromStr;
    // arrow's `Tz::from_str` rejects lowercase "utc"...
    assert!(Tz::from_str("utc").is_err());
    // ...while ConfiguredTimeZone::parse accepts it via the parse_fixed_offset fallback.
    ConfiguredTimeZone::parse("utc")?;
    ConfiguredTimeZone::parse("Z")?;
    Ok(())
}
```
Interesting. While I expect it's to spec, I can't see a reason why that impl shouldn't be case-insensitive. I'll file a ticket for that as well.
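The case-insensitive behavior discussed in this thread can be sketched with the standard library alone. `parse_fixed_offset_secs` is a hypothetical stand-in for the PR's `parse_fixed_offset`: it returns seconds east of UTC rather than a chrono `FixedOffset`, and the accepted formats (`±HH:MM`, `±HHMM`, `±HH`) are assumptions, not taken from the PR or from arrow-rs.

```rust
/// Hypothetical std-only stand-in for the PR's `parse_fixed_offset`: returns the
/// offset in seconds east of UTC instead of a chrono `FixedOffset`.
fn parse_fixed_offset_secs(tz: &str) -> Option<i32> {
    let tz = tz.trim();
    // "UTC", "utc", "Z" and "z" all mean a zero offset.
    if tz.eq_ignore_ascii_case("utc") || tz.eq_ignore_ascii_case("z") {
        return Some(0);
    }
    // Only ASCII input can be a valid offset; this also keeps the slicing below safe.
    if !tz.is_ascii() {
        return None;
    }
    let sign = match tz.as_bytes().first().copied()? {
        b'+' => 1,
        b'-' => -1,
        _ => return None,
    };
    let rest = &tz[1..];
    // Accept "HH:MM", "HHMM" or "HH".
    let (h, m) = match rest.len() {
        5 if rest.as_bytes()[2] == b':' => (&rest[..2], &rest[3..]),
        4 => (&rest[..2], &rest[2..]),
        2 => (rest, "00"),
        _ => return None,
    };
    // Reject embedded signs such as "++1:00".
    if !h.bytes().chain(m.bytes()).all(|b| b.is_ascii_digit()) {
        return None;
    }
    let (h, m): (i32, i32) = (h.parse().ok()?, m.parse().ok()?);
    if h > 23 || m > 59 {
        return None;
    }
    Some(sign * (h * 3600 + m * 60))
}
```

Names like `"Asia/Tokyo"` return `None` here, leaving them to a named-zone parser such as arrow's `Tz`.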
```rust
    }
}

fn parse_fixed_offset(tz: &str) -> Option<FixedOffset> {
```
I'm unsure this is needed, as I think it already exists in https://github.com/apache/arrow-rs/blob/751b0822a7f0b2647c1c662131131b35f268bfef/arrow-array/src/timezone.rs#L25. Also, I think `Tz::from_str` may already handle both named zones and offsets: https://github.com/apache/arrow-rs/blob/751b0822a7f0b2647c1c662131131b35f268bfef/arrow-array/src/timezone.rs#L91
I believe it is a private fn:

```
error[E0603]: function `parse_fixed_offset` is private
  --> datafusion/functions/src/datetime/common.rs:18:29
   |
18 | use arrow::array::timezone::parse_fixed_offset;
   |                             ^^^^^^^^^^^^^^^^^^ private function
```
Thanks, I'll file a ticket to make that pub.
```rust
pub(crate) fn parse(tz: &str) -> Result<Self> {
    if tz.trim().is_empty() {
        return Ok(Self::utc());
```
Note that we may want to change this to allow for `None`; see the start of the discussion in #17993 (comment) and #18017 (comment).
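One way to allow for `None` is to return `Result<Option<_>>`, so that an empty setting stays distinguishable from an explicit UTC. This is a hypothetical, std-only sketch (`ZoneSetting` and `parse_zone_setting` are illustrative names). The PR's `ConfiguredTimeZone` wraps arrow's `Tz` and chrono's `FixedOffset` instead, and named zones are left unvalidated here because that requires a tz database.

```rust
/// Hypothetical std-only model of a configured zone setting.
#[derive(Debug, PartialEq)]
enum ZoneSetting {
    /// Fixed offset east of UTC, in seconds.
    Fixed(i32),
    /// Named (IANA) zone; validating it needs a tz database, so it is kept as-is.
    Named(String),
}

/// Returns `Ok(None)` for an empty setting instead of defaulting to UTC.
fn parse_zone_setting(tz: &str) -> Result<Option<ZoneSetting>, String> {
    let tz = tz.trim();
    if tz.is_empty() {
        // No zone configured: let the caller decide how to treat naive input.
        return Ok(None);
    }
    if tz.eq_ignore_ascii_case("utc") || tz.eq_ignore_ascii_case("z") {
        return Ok(Some(ZoneSetting::Fixed(0)));
    }
    let sign = match tz.as_bytes()[0] {
        b'+' => 1,
        b'-' => -1,
        // Anything without a leading sign is treated as a zone name.
        _ => return Ok(Some(ZoneSetting::Named(tz.to_string()))),
    };
    let bad = || format!("invalid offset {tz:?}, expected ±HH:MM or ±HH");
    let body = &tz[1..];
    // Accept "HH:MM" or a bare "HH".
    let (h, m) = body.split_once(':').unwrap_or((body, "00"));
    if h.len() != 2 || m.len() != 2 || !h.bytes().chain(m.bytes()).all(|b| b.is_ascii_digit()) {
        return Err(bad());
    }
    let h: i32 = h.parse().map_err(|_| bad())?;
    let m: i32 = m.parse().map_err(|_| bad())?;
    if h > 23 || m > 59 {
        return Err(bad());
    }
    Ok(Some(ZoneSetting::Fixed(sign * (h * 3600 + m * 60))))
}
```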
Which issue does this PR close?
Closes #17998
Rationale for this change
Currently, `to_timestamp()` and its precision variants (`to_timestamp_seconds`, `to_timestamp_millis`, `to_timestamp_micros`, `to_timestamp_nanos`) ignore the execution timezone provided in `ScalarFunctionArgs.config_options`. As a result, timestamp strings without explicit timezone information are always interpreted as UTC, regardless of the session's `datafusion.execution.time_zone` setting.

This behavior is incorrect and can lead to inconsistent or unexpected timestamp conversions when users rely on a configured execution timezone.
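To make the expected behavior concrete: with an execution timezone of `+05:00`, the naïve string `2023-01-31T09:26:56` denotes 04:26:56 UTC, not 09:26:56 UTC. The std-only sketch below does that arithmetic for a fixed offset. `days_from_civil` is Howard Hinnant's well-known civil-date algorithm, and `naive_to_utc_secs` is a hypothetical helper; the PR itself delegates parsing to helpers such as `string_to_timestamp_nanos_with_timezone` and works in nanoseconds rather than seconds.

```rust
/// Days since 1970-01-01 for a proleptic Gregorian date (Howard Hinnant's
/// `days_from_civil` algorithm).
fn days_from_civil(y: i64, m: i64, d: i64) -> i64 {
    // Treat January and February as months 10 and 11 of the previous year.
    let y = if m <= 2 { y - 1 } else { y };
    let era = y.div_euclid(400);
    let yoe = y.rem_euclid(400); // year of era, [0, 399]
    let mp = (m + 9) % 12; // March = 0, ..., February = 11
    let doy = (153 * mp + 2) / 5 + d - 1; // day of year, [0, 365]
    let doe = yoe * 365 + yoe / 4 - yoe / 100 + doy; // day of era, [0, 146096]
    era * 146_097 + doe - 719_468
}

/// Hypothetical helper: epoch seconds (UTC) for a naive wall-clock time read in a
/// fixed offset `offset_secs` east of UTC (e.g. +05:00 = 18_000).
fn naive_to_utc_secs(y: i64, mo: i64, d: i64, h: i64, mi: i64, s: i64, offset_secs: i64) -> i64 {
    let local = days_from_civil(y, mo, d) * 86_400 + h * 3_600 + mi * 60 + s;
    // Local time = UTC + offset, so UTC = local time - offset.
    local - offset_secs
}
```

Reading the string as UTC gives 1_675_157_216; reading it in `+05:00` gives 1_675_139_216, exactly 18_000 seconds earlier.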
This PR ensures that all `to_timestamp*` functions respect and apply the configured execution timezone to naïve (timezone-free) datetime strings.

What changes are included in this PR?
- Introduced a new utility type, `ConfiguredTimeZone`, encapsulating both named timezones (via `arrow::array::timezone::Tz`) and fixed offsets (`FixedOffset`).
- Added timezone parsing helpers:
  - `ConfiguredTimeZone::parse()` to resolve IANA names or `±HH:MM` offsets.
  - `parse_fixed_offset()` to safely interpret offset strings.
- Updated all `to_timestamp*` implementations to use `config_options.execution.time_zone`, calling `string_to_timestamp_nanos_with_timezone()` or `string_to_timestamp_nanos_formatted_with_timezone()` when parsing naïve datetime strings.
- Added conversion helpers for naïve vs. localized datetimes: `timestamp_to_naive`, `datetime_to_timestamp`, and `local_datetime_to_timestamp` (which handles ambiguous/invalid local times).
- Added test coverage:
  - `to_timestamp_respects_execution_timezone` verifies that configured offsets shift timestamps as expected.
  - `to_timestamp_formats_respect_timezone` ensures that format-based parsing respects named zones (e.g., `"Asia/Tokyo"`).

Are these changes tested?
Yes. Two new unit tests were added:
- `to_timestamp_respects_execution_timezone`
- `to_timestamp_formats_respect_timezone`

Existing `to_timestamp_*` tests also pass with the new timezone logic.

Are there any user-facing changes?
Yes, an intended behavioral improvement: naïve datetime strings passed to `to_timestamp()` and its variants will now be interpreted relative to the configured execution timezone (from `datafusion.execution.time_zone`) rather than defaulting to UTC. There are no breaking API changes, only corrected behavior that aligns with user expectations.
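With a named zone, a naïve string can fall into a DST gap (no matching instant) or an overlap (two instants), which is what handling "ambiguous/invalid times" in `local_datetime_to_timestamp` refers to. A common technique, sketched here under simplifying assumptions, is to try each offset the zone can have and keep only the candidates whose UTC instant actually uses that offset. `OneTransitionZone` is a hypothetical zone with a single transition, and `LocalResolution` is loosely modeled on chrono's `LocalResult`. The PR description does not say which policy (earliest, latest, or error) it applies, so the sketch only classifies.

```rust
/// How a local wall-clock time maps to UTC (loosely modeled on chrono's `LocalResult`).
#[derive(Debug, PartialEq)]
enum LocalResolution {
    /// Exactly one UTC instant.
    Single(i64),
    /// Two instants (clocks fell back): earliest, latest.
    Ambiguous(i64, i64),
    /// No instant (clocks sprang forward over this time).
    Nonexistent,
}

/// Hypothetical zone with one transition: `before` (offset seconds) applies to
/// UTC instants earlier than `switch_utc`; `after` applies from then on.
struct OneTransitionZone {
    before: i64,
    after: i64,
    switch_utc: i64,
}

impl OneTransitionZone {
    fn offset_at(&self, utc: i64) -> i64 {
        if utc < self.switch_utc { self.before } else { self.after }
    }

    /// Tries every offset the zone can have and keeps the candidates whose UTC
    /// instant really uses that offset.
    fn resolve(&self, local: i64) -> LocalResolution {
        let mut hits: Vec<i64> = Vec::new();
        for off in [self.before, self.after] {
            let utc = local - off;
            if self.offset_at(utc) == off && !hits.contains(&utc) {
                hits.push(utc);
            }
        }
        hits.sort_unstable();
        match hits.as_slice() {
            [] => LocalResolution::Nonexistent,
            [t] => LocalResolution::Single(*t),
            [a, b] => LocalResolution::Ambiguous(*a, *b),
            _ => unreachable!("at most two candidate offsets"),
        }
    }
}
```

A caller then picks a policy, for example taking the earliest instant of an `Ambiguous` result and returning an error for `Nonexistent`.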