
QST: How to prepare my data to avoid being unable to infer frequency #53713

Closed
2 tasks done
jambudipa opened this issue Jun 18, 2023 · 1 comment
Labels
Needs Triage Issue that has not been reviewed by a pandas team member Usage Question

Comments

@jambudipa
jambudipa commented Jun 18, 2023

Research

  • I have searched the [pandas] tag on StackOverflow for similar questions.

  • I have asked my usage related question on StackOverflow.

Link to question on StackOverflow

https://stackoverflow.com/questions/76500849/how-to-prepare-my-data-to-avoid-being-unable-to-infer-frequency

Question about pandas

The following code from pandas/tseries/frequencies.py is causing my code to fall over:

if not self.is_monotonic or not self.index._is_unique:
    return None

delta = self.deltas[0]
ppd = periods_per_day(self._creso)
if delta and _is_multiple(delta, ppd):
    return self._infer_daily_rule()

# Business hourly, maybe. 17: one day / 65: one weekend
if self.hour_deltas in ([1, 17], [1, 65], [1, 17, 65]):
    return "BH"

# Possibly intraday frequency.  Here we use the
# original .asi8 values as the modified values
# will not work around DST transitions.  See #8772
if not self.is_unique_asi8:
    return None

The first check, self.index._is_unique, passes; the second, self.is_unique_asi8, is False, so the function returns None.
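Despite its name, is_unique_asi8 is not about duplicate timestamps. In pandas' _FrequencyInferer it is true only when there is exactly one distinct gap between consecutive timestamps, i.e. when the index is evenly spaced. The sketch below approximates that idea with NumPy (the helper single_spacing is hypothetical, not a pandas API) and compares it with pd.infer_freq on a regular and an irregular index, both of which are unique.

```python
import numpy as np
import pandas as pd


def single_spacing(index: pd.DatetimeIndex) -> bool:
    """Hypothetical helper: True when consecutive timestamps share exactly one gap.

    This approximates the idea behind pandas' is_unique_asi8 check; it is not pandas code.
    """
    return np.unique(np.diff(index.asi8)).size == 1


# Evenly spaced: one timestamp per second.
regular = pd.to_datetime([0, 1, 2, 3, 4], unit="s", utc=True)
# Unique but unevenly spaced: gaps of 1 s, 2 s and 3 s.
irregular = pd.to_datetime([0, 1, 3, 4, 7], unit="s", utc=True)

for name, idx in [("regular", regular), ("irregular", irregular)]:
    # is_unique, single spacing, inferred frequency (None when it cannot be inferred)
    print(name, idx.is_unique, single_spacing(idx), pd.infer_freq(idx))
```

Both indexes pass the uniqueness check, but only the evenly spaced one gets a frequency; the exact alias string returned for the regular index depends on the pandas version.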

I have looked at this issue and the corresponding PR, but 🤷🏻

My code, in its current form, looks like this:

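import pandas as pd
from gluonts.dataset.pandas import PandasDataset  # assumed: PandasDataset comes from GluonTS

db = Database()  # the asker's own database wrapper (not shown)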
df, last_trade_time = db.fetch_trades()

# Convert the time column to a datetime object with the unit of seconds
df['time'] = pd.to_datetime(df['time'], unit='s')

# Localize the timestamps to UTC
df['time'] = df['time'].dt.tz_localize('UTC')

# Ensure uniqueness by adding the index as nanoseconds
df['time'] = df['time'] + pd.to_timedelta(df.index, unit='ns')

# Set DataFrame index
df.set_index('time', inplace=True)

dataset = PandasDataset(df, target="price")

These times are Unix timestamps in seconds, with fractional-second precision (from Kraken).
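The nanosecond offset in the code above does remove duplicate timestamps, but it cannot make the spacing regular: two trades in the same second end up 1 ns apart, while the gap to the next second stays close to 1 s. A small sketch with made-up timestamps (the values in raw are hypothetical) shows the index becoming unique while still having two distinct gaps, so pd.infer_freq returns None.

```python
import numpy as np
import pandas as pd

# Hypothetical raw trade times in whole seconds; the first two trades share a second.
raw = pd.Series([0, 0, 1, 2], dtype="int64")
times = pd.to_datetime(raw, unit="s").dt.tz_localize("UTC")

# The question's trick: add each row's position as a nanosecond offset.
shifted = times + pd.to_timedelta(np.arange(len(times)), unit="ns")
idx = pd.DatetimeIndex(shifted)

print(idx.is_unique)                 # duplicates are gone
print(np.unique(np.diff(idx.asi8)))  # but there are two distinct gaps
print(pd.infer_freq(idx))            # so no frequency can be inferred
```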

How can I prepare my data? Only a month or so of Python experience here...🤣
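One common way to get an index pandas can infer a frequency from is to resample the irregular trades onto a fixed time grid. This is a minimal sketch, assuming 1-minute buckets and "last trade price, forward-filled into empty minutes" are acceptable for the model; the trade data is made up.

```python
import pandas as pd

# Hypothetical trades: Unix times in fractional seconds, including a duplicate second.
trades = pd.DataFrame(
    {
        "time": [1687046400.0, 1687046400.0, 1687046425.3, 1687046512.8, 1687046590.1],
        "price": [26500.0, 26501.5, 26499.0, 26510.2, 26505.7],
    }
)
trades["time"] = pd.to_datetime(trades["time"], unit="s", utc=True)
# A stable sort keeps trades that share a timestamp in their original order.
trades = trades.set_index("time").sort_index(kind="stable")

# Put the trades on a fixed 1-minute grid: keep the last price in each minute,
# then carry the previous price forward into minutes with no trades.
bars = trades["price"].resample("1min").last().ffill().to_frame()

print(bars)
print(pd.infer_freq(bars.index))  # a minute alias; the exact string depends on the pandas version
```

In the question's code, a step like this would take the place of the nanosecond offset, and the resampled frame would be what is passed to PandasDataset. Whether the last price, a mean, or OHLC bars is the right aggregation depends on the model, and the 1-minute bucket size is only an example.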

@jambudipa jambudipa added Needs Triage Issue that has not been reviewed by a pandas team member Usage Question labels Jun 18, 2023
@mroeschke
Member

Thanks for the issue, but the Stack Overflow question you linked is the appropriate place for this type of question, so closing. If you suspect there's a bug, please reopen with a minimal example.
