-
-
Notifications
You must be signed in to change notification settings - Fork 18.5k
BUG: Fix Timestamp constructor changes value on ambiguous DST #30995
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
BUG: Fix Timestamp constructor changes value on ambiguous DST #30995
Conversation
This took some digging, so please bear with me. Short version
Notes PS
|
b4bec5f
to
2abc37b
Compare
Wasn't able to reproduce the test fail on Windows. Will need to set up a dev environment in Linux and try to reproduce it. Ideas on why this might be happening would be appreciated. |
pandas/_libs/tslibs/conversion.pyx
Outdated
# Two check in if necessary because class | ||
# differs on Windows and Linux | ||
if (isinstance(ts.tzinfo, du_tzfile1) or | ||
isinstance(ts.tzinfo, du_tzfile2)): |
This comment was marked as outdated.
This comment was marked as outdated.
Sorry, something went wrong.
isinstance(ts.tzinfo, du_tzfile2)): | ||
if ts.tzinfo.is_ambiguous(ts): | ||
dst_offset = ts.tzinfo.dst(ts) | ||
obj.value += int(dst_offset.total_seconds() * 1e9) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because pydatetime_to_dt64
doesn't take DST into account but get_utcoffset
does, we need to add the DST timedelta to the result of pydatetime_to_dt64
. This is necessary only for dateutil
, because for ambiguous dates pytz
immediately switches off DST and reduces UTC offset, so no correction is necessary for pytz
timezones.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add a comment to this effect here
assert times[-1] == Timestamp("2013-10-27 01:00:00+0100", tz=tz, freq="H") | ||
else: | ||
assert times[-1] == Timestamp("2013-10-27 01:00:00+0000", tz=tz, freq="H") | ||
assert times[-1] == Timestamp("2013-10-27 01:00:00+0000", tz=tz, freq="H") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now no matter whether we use dateutil
or pytz
timezones, we get the same date range, so separate testing conditions are no longer necessary.
pandas/_libs/tslibs/conversion.pyx
Outdated
@@ -5,6 +5,8 @@ cimport numpy as cnp | |||
from numpy cimport int64_t, int32_t, intp_t, ndarray | |||
cnp.import_array() | |||
|
|||
from dateutil.tz import tzfile as _dateutil_tzfile |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is used in timezones.pyx
when we need to check for dateutil
timezone and make sure it works both on Windows and Linux. Seems to work.
pandas/_libs/tslibs/conversion.pyx
Outdated
# GH 24329 Take DST offset into account | ||
# use dateutil.tz.tzfile to check type | ||
if ts.tzinfo is not None: | ||
if isinstance(ts.tzinfo, _dateutil_tzfile): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Think you can usetreat_tz_as_dateutil
pandas/pandas/_libs/tslibs/timezones.pyx
Line 42 in d78c061
cdef inline bint treat_tz_as_dateutil(object tz): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And can also remove the if ts.tzinfo is not None
check if you use treat_tz_as_dateutil
directly.
Does this also fix the case for nonexistent times near DST? |
@mroeschke Unfortunately, no. Shifts it back by an hour. I would have to look into why this happens. |
@mroeschke Okay, looked into it. You'll be happy to know that, according to
This is also fine:
Meanwhile, this breaks representation and gets us nonexistent times:
And right on the cusp, the value breaks too:
So the bug is there, but it turns out to be unrelated and matters only for 128 nanoseconds before the switch to summer time. |
@@ -382,6 +382,11 @@ cdef _TSObject convert_datetime_to_tsobject(datetime ts, object tz, | |||
obj.tzinfo = tz | |||
else: | |||
obj.value = pydatetime_to_dt64(ts, &obj.dts) | |||
# GH 24329 Take DST offset into account | |||
if treat_tz_as_dateutil(ts.tzinfo): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mroeschke Switched to treat_tz_as_dateutil
. Much cleaner now, thank you.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice investigative work! LGTM.
Yes, please make a separate issue for the nonexistent cases. Your examples are really good.
doc/source/whatsnew/v1.0.0.rst
Outdated
@@ -968,6 +968,7 @@ Datetimelike | |||
- Bug in :func:`date_range` with custom business hours as ``freq`` and given number of ``periods`` (:issue:`30593`) | |||
- Bug in :class:`PeriodIndex` comparisons with incorrectly casting integers to :class:`Period` objects, inconsistent with the :class:`Period` comparison behavior (:issue:`30722`) | |||
- Bug in :meth:`DatetimeIndex.insert` raising a ``ValueError`` instead of a ``TypeError`` when trying to insert a timezone-aware :class:`Timestamp` into a timezone-naive :class:`DatetimeIndex`, or vice-versa (:issue:`30806`) | |||
- Bug in :class:`Timestamp` where constructing :class:`Timestamp` from ambiguous epoch time and calling constructor again changed :meth:`Timestamp.value` property (:issue:`24329`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry one more thing, can you move to whatsnew version 1.1.0?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minor comments, otherwise lgtm (ex moving the whatsnew to 1.1). @mroeschke pls merge when satisfied.
isinstance(ts.tzinfo, du_tzfile2)): | ||
if ts.tzinfo.is_ambiguous(ts): | ||
dst_offset = ts.tzinfo.dst(ts) | ||
obj.value += int(dst_offset.total_seconds() * 1e9) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add a comment to this effect here
2a14b57
to
322178c
Compare
d980836
to
85767af
Compare
# GH 24329 When datetime is ambiguous, | ||
# pydatetime_to_dt64 doesn't take DST into account | ||
# but with dateutil timezone, get_utcoffset does | ||
# so we need to correct for it |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a comment explaining the reasoning for this solution. Tried to make it as brief as possible.
@@ -59,6 +59,7 @@ Categorical | |||
|
|||
Datetimelike | |||
^^^^^^^^^^^^ | |||
- Bug in :class:`Timestamp` where constructing :class:`Timestamp` from ambiguous epoch time and calling constructor again changed :meth:`Timestamp.value` property (:issue:`24329`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moved to whatsnew for v1.1.0
@mroeschke Made the required changes. Also performed the final cleanup: rebased to updated master and squashed tiny commits. |
Thanks @AlexKirko! If you'd like to continue investigating #31043, it would be much appreciated. |
test_dti_construction_nonexistent_endpoint had an expected fail because of unresolved 24329. This fix in conjunction with pandas-dev#30995 means it now returns the expected value and does not fail
black pandas
git diff upstream/master -u -- "*.py" | flake8 --diff