Skip to content

BUG: catch overflow in timestamp + offset operations #15126

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions doc/source/whatsnew/v0.20.0.txt
Original file line number Diff line number Diff line change
Expand Up @@ -356,6 +356,7 @@ Bug Fixes
- Bug in ``DataFrame.groupby().describe()`` when grouping on ``Index`` containing tuples (:issue:`14848`)
- Bug in creating a ``MultiIndex`` with tuples and not passing a list of names; this will now raise ``ValueError`` (:issue:`15110`)

- Bug in catching an overflow in ``Timestamp`` + ``Timedelta/Offset`` operations (:issue:`15126`)



Expand Down
21 changes: 18 additions & 3 deletions pandas/tseries/offsets.py
Original file line number Diff line number Diff line change
Expand Up @@ -2710,6 +2710,9 @@ def __add__(self, other):
return self.apply(other)
except ApplyTypeError:
return NotImplemented
except OverflowError:
raise OverflowError("the add operation between {} and {} "
"will overflow".format(self, other))

def __eq__(self, other):
if isinstance(other, compat.string_types):
Expand Down Expand Up @@ -2748,14 +2751,26 @@ def nanos(self):

def apply(self, other):
# Timestamp can handle tz and nano sec, thus no need to use apply_wraps
if isinstance(other, (datetime, np.datetime64, date)):
if isinstance(other, Timestamp):

# in order to avoid a recursive
# call of __add__ and __radd__, when we
# call using the + operator, we directly
# call the known metho, as this by definition is
# a Timestamp
result = other.__add__(self)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you really need to call __add__ here rather than using +?

Generally calling double-underscore methods directly is an anti-pattern, so if so this definitely deserves a comment to explain.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added a comment. The + causes it to segfault. I think its because of a fast-path in the interpreter, or maybe a cython bug. If there is an exception during the the operation. Then we compound this by trying the reversed ops in .apply and the cycle repeats. But I think its too hard (maybe impossible) to change this.

if result == NotImplemented:
raise OverflowError
return result
elif isinstance(other, (datetime, np.datetime64, date)):
return as_timestamp(other) + self

if isinstance(other, timedelta):
return other + self.delta
elif isinstance(other, type(self)):
return type(self)(self.n + other.n)
else:
raise ApplyTypeError('Unhandled type: %s' % type(other).__name__)

raise ApplyTypeError('Unhandled type: %s' % type(other).__name__)

_prefix = 'undefined'

Expand Down
5 changes: 5 additions & 0 deletions pandas/tseries/tests/test_timedeltas.py
Original file line number Diff line number Diff line change
Expand Up @@ -971,6 +971,11 @@ def test_overflow(self):
s2 = s[0:1000]
result = (s2 - s2.min()).sum()

def test_overflow_on_construction(self):
# xref https://github.com/statsmodels/statsmodels/issues/3374
value = pd.Timedelta('1day').value * 20169940
self.assertRaises(OverflowError, pd.Timedelta, value)

def test_timedelta_ops_scalar(self):
# GH 6808
base = pd.to_datetime('20130101 09:01:12.123456')
Expand Down
24 changes: 18 additions & 6 deletions pandas/tseries/tests/test_timeseries.py
Original file line number Diff line number Diff line change
Expand Up @@ -3312,13 +3312,25 @@ def test_datetime64_with_DateOffset(self):
assert_func(klass([x + op for x in s]), s + op)
assert_func(klass([x - op for x in s]), s - op)
assert_func(klass([op + x for x in s]), op + s)
# def test_add_timedelta64(self):
# rng = date_range('1/1/2000', periods=5)
# delta = rng.values[3] - rng.values[1]

# result = rng + delta
# expected = rng + timedelta(2)
# self.assertTrue(result.equals(expected))
def test_overflow_offset(self):
# xref https://github.com/statsmodels/statsmodels/issues/3374
# ends up multiplying really large numbers which overflow

t = Timestamp('2017-01-13 00:00:00', freq='D')
offset = 20169940 * pd.offsets.Day(1)

def f():
t + offset
self.assertRaises(OverflowError, f)

def f():
offset + t
self.assertRaises(OverflowError, f)

def f():
t - offset
self.assertRaises(OverflowError, f)

def test_get_duplicates(self):
idx = DatetimeIndex(['2000-01-01', '2000-01-02', '2000-01-02',
Expand Down
2 changes: 1 addition & 1 deletion pandas/tslib.pyx
Original file line number Diff line number Diff line change
Expand Up @@ -3125,7 +3125,7 @@ class Timedelta(_Timedelta):
if not (is_integer_object(other) or is_float_object(other)):
return NotImplemented

return Timedelta(other *self.value, unit='ns')
return Timedelta(other * self.value, unit='ns')

__rmul__ = __mul__

Expand Down