
datetimeindex.to_period not behaving as expected. #23253


Open
bmoscon opened this issue Oct 20, 2018 · 5 comments
Labels
Bug, Period (Period data type)


@bmoscon
Contributor

bmoscon commented Oct 20, 2018

Code Sample

import pandas as pd
from datetime import datetime as dt

# 14 timestamps, 30 seconds apart, from 00:00:00 to 00:06:30
dates = [
    dt(2000, 1, 1, 0, 0),
    dt(2000, 1, 1, 0, 0, 30),
    dt(2000, 1, 1, 0, 1),
    dt(2000, 1, 1, 0, 1, 30),
    dt(2000, 1, 1, 0, 2),
    dt(2000, 1, 1, 0, 2, 30),
    dt(2000, 1, 1, 0, 3),
    dt(2000, 1, 1, 0, 3, 30),
    dt(2000, 1, 1, 0, 4),
    dt(2000, 1, 1, 0, 4, 30),
    dt(2000, 1, 1, 0, 5),
    dt(2000, 1, 1, 0, 5, 30),
    dt(2000, 1, 1, 0, 6),
    dt(2000, 1, 1, 0, 6, 30)]

idx = pd.DatetimeIndex(dates)
# '5T' = 5-minute frequency (newer pandas versions spell this '5min')
print(idx.to_period('5T'))

Problem description

By changing the period to 5 minutes, I expected the output to be:

PeriodIndex(['2000-01-01 00:00', '2000-01-01 00:00', '2000-01-01 00:00',
             '2000-01-01 00:00', '2000-01-01 00:00', '2000-01-01 00:00',
             '2000-01-01 00:00', '2000-01-01 00:00', '2000-01-01 00:00',
             '2000-01-01 00:00', '2000-01-01 00:05', '2000-01-01 00:05',
             '2000-01-01 00:05', '2000-01-01 00:05'],
            dtype='period[5T]', freq='5T')

but the actual output is:

PeriodIndex(['2000-01-01 00:00', '2000-01-01 00:00', '2000-01-01 00:01',
             '2000-01-01 00:01', '2000-01-01 00:02', '2000-01-01 00:02',
             '2000-01-01 00:03', '2000-01-01 00:03', '2000-01-01 00:04',
             '2000-01-01 00:04', '2000-01-01 00:05', '2000-01-01 00:05',
             '2000-01-01 00:06', '2000-01-01 00:06'],
            dtype='period[5T]', freq='5T')

This makes it look as if the actual period is one minute ('T'), since that is what the output corresponds to, even though freq correctly shows 5T.

Have I misunderstood what to_period should return, or is this a bug?

I am using the latest version of pandas (0.23.4).
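The bucketing expected above can be checked without pandas. The following is a minimal stdlib sketch, assuming buckets are aligned to midnight; `floor_to_minutes` is a hypothetical helper, not a pandas API. Flooring to 5 minutes reproduces the expected labels, while flooring to 1 minute reproduces the labels the report actually got.

```python
from datetime import datetime, timedelta


def floor_to_minutes(ts: datetime, n: int) -> datetime:
    """Return the start of the n-minute bucket containing ts.

    Hypothetical helper: buckets are assumed to be aligned to midnight.
    """
    midnight = ts.replace(hour=0, minute=0, second=0, microsecond=0)
    # Number of whole n-minute buckets elapsed since midnight.
    buckets = (ts - midnight) // timedelta(minutes=n)
    return midnight + buckets * timedelta(minutes=n)


# The same 14 timestamps as the report: every 30 seconds from 00:00 to 00:06:30.
dates = [datetime(2000, 1, 1, 0, m, s) for m in range(7) for s in (0, 30)]

expected = [floor_to_minutes(d, 5).strftime("%Y-%m-%d %H:%M") for d in dates]
observed = [floor_to_minutes(d, 1).strftime("%Y-%m-%d %H:%M") for d in dates]

print(expected)  # ten '2000-01-01 00:00' labels, then four '2000-01-01 00:05'
print(observed)  # two labels per minute, 00:00 through 00:06
```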

@TomAugspurger
Contributor

cc @jbrockmendel. Not sure if this is expected, but it does look strange.

@jbrockmendel
Member

That is weird. Possibly related to #17666?

@sinhrks
Member

sinhrks commented Oct 23, 2018

Looks the same as #14070.

I think there was an issue regarding the Period definition. The current docs say a Period is a span, but it is not clear whether pd.Period('2000-01-01 00:01', freq='2T') is invalid or a 2T period starting at '2000-01-01 00:01'.
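The two readings of pd.Period('2000-01-01 00:01', freq='2T') differ in which two minutes the period covers. A minimal stdlib sketch of both, with hypothetical helper names and the assumption that aligned spans are counted from midnight: under the "starting at" reading the period covers 00:01 to 00:03, while under a strict-alignment reading 00:01 is not a valid 2-minute boundary and the aligned span containing it starts at 00:00.

```python
from datetime import datetime, timedelta


def span_starting_at(start: datetime, minutes: int):
    # Reading 1: the period begins exactly at `start` and lasts `minutes`.
    return start, start + timedelta(minutes=minutes)


def is_aligned_start(ts: datetime, minutes: int) -> bool:
    # Reading 2: valid starts are multiples of `minutes` from midnight (assumed).
    midnight = ts.replace(hour=0, minute=0, second=0, microsecond=0)
    return (ts - midnight) % timedelta(minutes=minutes) == timedelta(0)


def aligned_span_containing(ts: datetime, minutes: int):
    # Reading 2, continued: map ts to the aligned span that contains it.
    width = timedelta(minutes=minutes)
    midnight = ts.replace(hour=0, minute=0, second=0, microsecond=0)
    begin = midnight + ((ts - midnight) // width) * width
    return begin, begin + width


ts = datetime(2000, 1, 1, 0, 1)
print(span_starting_at(ts, 2))         # covers 00:01 up to (not including) 00:03
print(is_aligned_start(ts, 2))         # False: 00:01 is not a 2-minute boundary
print(aligned_span_containing(ts, 2))  # covers 00:00 up to (not including) 00:02
```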

@sinhrks added the Period (Period data type) label on Oct 23, 2018
@bmoscon
Contributor Author

bmoscon commented Oct 23, 2018

I was going by this part of the docs:

http://pandas.pydata.org/pandas-docs/stable/timeseries.html#combining-aliases

The combined offset '2H20min' behaves as I'd expect: it produces values spaced 2 hours and 20 minutes apart.
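A combined alias such as '2H20min' amounts to a single fixed 140-minute step. A stdlib sketch of that spacing (the start time and number of points are arbitrary choices for illustration, not taken from the docs):

```python
from datetime import datetime, timedelta

# '2H20min' combines 2 hours and 20 minutes into one fixed step.
step = timedelta(hours=2, minutes=20)
start = datetime(2000, 1, 1)
points = [start + k * step for k in range(4)]

for p in points:
    # prints 2000-01-01 00:00, 02:20, 04:40, 07:00 (one full timestamp per line)
    print(p.strftime("%Y-%m-%d %H:%M"))
```

By the same reading, '5T' should act as a 5-minute step, which is why the minute-by-minute labels in the report look wrong.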

@bmoscon
Contributor Author

bmoscon commented Oct 23, 2018

As best I can tell, the issue is in indexes/period.py:

def dt64arr_to_periodarr(data, freq, tz):
    if data.dtype != np.dtype('M8[ns]'):
        raise ValueError('Wrong dtype: %s' % data.dtype)

    freq = Period._maybe_convert_freq(freq)
    base, mult = _gfc(freq)  # for '5T': base=8000, mult=5
    # only `base` is passed on below; `mult` is never used
    return period.dt64arr_to_periodarr(data.view('i8'), base, tz)

_gfc returns the correct frequency code (8000) and multiplier (5), but the multiplier is never passed on when the period array is constructed in _libs/tslibs/period.pyx. The resulting period array therefore buckets the original datetimes into one-minute buckets instead of five-minute ones.
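To make the diagnosis concrete, the following plain-Python sketch illustrates what honoring `mult` could look like: snap each per-minute ordinal down to a multiple of `mult`. This is a hypothetical illustration of the idea, not pandas' implementation. `minute_ordinal` and `to_period_ordinal` are made-up names, and counting minutes from the Unix epoch is an assumption.

```python
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)
MINUTE = timedelta(minutes=1)


def minute_ordinal(ts: datetime) -> int:
    # Whole minutes since the Unix epoch: one ordinal per minute, which is
    # what converting with the base frequency alone produces.
    return (ts - EPOCH) // MINUTE


def to_period_ordinal(ts: datetime, mult: int) -> int:
    # Hypothetical fix: use the multiplier by flooring the base ordinal
    # to the start of its `mult`-minute block.
    base = minute_ordinal(ts)
    return base - base % mult


dates = [datetime(2000, 1, 1, 0, m, s) for m in range(7) for s in (0, 30)]
origin = minute_ordinal(datetime(2000, 1, 1))

# Offsets in minutes from 2000-01-01 00:00, without and with the multiplier.
print([minute_ordinal(d) - origin for d in dates])        # 0, 0, 1, 1, ..., 6, 6
print([to_period_ordinal(d, 5) - origin for d in dates])  # ten 0s, then four 5s
```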

@mroeschke added the Bug label on May 11, 2020

5 participants