[ENH] Improve performance of `TimeSeriesDataSet.__getitem__` #806

denix56 · 2021-12-21T18:39:10Z

Description

Indexing a pandas DataFrame is considerably slower than indexing a NumPy array because of the additional checks pandas performs.
By replacing the DataFrame of indices with an `np.recarray`, I was able to improve performance by 5-10%.
A recarray keeps the convenient attribute access we have in pandas while being faster.
Raw NumPy arrays are slightly faster than a recarray, but that gap is smaller than the one between pandas and recarray.
I tested on Demand Forecasting with gpu=1, 0 workers, and pin_memory=True.
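
For readers unfamiliar with record arrays, the swap described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the column names (`index_start`, `index_end`, `sequence_length`) and both helper functions are hypothetical stand-ins for whatever the dataset's index table actually holds.

```python
import numpy as np
import pandas as pd

# Hypothetical index table; the column names are illustrative only.
index_df = pd.DataFrame(
    {
        "index_start": [0, 5, 10],
        "index_end": [4, 9, 14],
        "sequence_length": [5, 5, 5],
    }
)

# DataFrame.to_records returns an np.recarray; index=False drops the pandas index.
index_rec = index_df.to_records(index=False)


def row_from_dataframe(idx):
    # Positional row lookup through pandas (builds a Series for the row).
    row = index_df.iloc[idx]
    return int(row["index_start"]), int(row["index_end"])


def row_from_recarray(idx):
    # Positional row lookup through NumPy; fields stay available as attributes.
    row = index_rec[idx]
    return int(row.index_start), int(row.index_end)


print(row_from_dataframe(1), row_from_recarray(1))
```

Both helpers return the same values; the recarray version skips constructing a pandas Series on every lookup, which is presumably where the saving comes from.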

codecov-commenter · 2021-12-28T17:35:41Z

Codecov Report

Merging #806 (eb706f9) into master (0b5892a) will increase coverage by 0.00%.
The diff coverage is 100.00%.

@@           Coverage Diff           @@
##           master     #806   +/-   ##
=======================================
  Coverage   89.05%   89.06%           
=======================================
  Files          24       24           
  Lines        3829     3832    +3     
=======================================
+ Hits         3410     3413    +3     
  Misses        419      419

| Flag | Coverage Δ |
|---|---|
| cpu | `89.06% <100.00%> (+<0.01%)` ⬆️ |
| pytest | `89.06% <100.00%> (+<0.01%)` ⬆️ |

Flags with carried forward coverage won't be shown.

| Impacted Files | Coverage Δ |
|---|---|
| pytorch_forecasting/data/timeseries.py | `93.12% <100.00%> (+0.02%)` ⬆️ |

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 0b5892a...eb706f9.

jdb78 · 2022-02-20T00:04:34Z

I am tempted to merge this. I think we should also run the example notebooks, because things might change there, even if only visually.

jobs-git · 2025-06-07T20:08:26Z

any news on this?

fkiraly

I merged this manually, as I had already edited the same location in __getitem__, and the file has moved.

How would we know this is an actual performance improvement? Have you tested it, @jobs-git?

Let's see if the tests pass.
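
One way to approach the performance question raised above is a small micro-benchmark of the row lookup alone. The sketch below is an assumption-laden illustration: the table layout, the column names, and the functions `read_dataframe` and `read_recarray` are hypothetical, and it times only the lookup, not the full `__getitem__` or an actual training run, so it cannot confirm the 5-10% figure reported in the description.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical index table with illustrative column names.
n_rows = 10_000
index_df = pd.DataFrame(
    {
        "index_start": np.arange(n_rows),
        "index_end": np.arange(n_rows) + 5,
    }
)
index_rec = index_df.to_records(index=False)

# Fixed seed so both variants read the same rows.
positions = np.random.default_rng(0).integers(0, n_rows, size=1_000)


def read_dataframe():
    # Row lookup through pandas.
    for i in positions:
        row = index_df.iloc[i]
        _ = row["index_start"], row["index_end"]


def read_recarray():
    # Row lookup through the record array.
    for i in positions:
        row = index_rec[i]
        _ = row.index_start, row.index_end


# Take the best of several repeats to reduce noise from other processes.
t_df = min(timeit.repeat(read_dataframe, number=5, repeat=3))
t_rec = min(timeit.repeat(read_recarray, number=5, repeat=3))
print(f"DataFrame: {t_df:.4f}s, recarray: {t_rec:.4f}s")
```

An end-to-end comparison, for example timing a full pass over the dataloader before and after the change, would be needed to tell whether the lookup cost matters in practice.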

fkiraly

It appears the changes in this PR break internal API assumptions in other methods, e.g., get_groups - so it cannot be merged in its current state.

Still worth keeping open while we are reworking for v2.

denix56 added 3 commits December 21, 2021 19:33

Replace DataFrame of indices with np.recarray

39a28ba

Enable index field (to avoid changes in other files)

46646c0

Fix conversion to numpy, when we have numpy already

08cb636

Remove whitespaces

eb706f9

jobs-git mentioned this pull request Jun 7, 2025

[ENH] Precompute data to accelerate training in GPU #1850

Open

8 tasks

fkiraly added 2 commits June 8, 2025 19:05

Merge branch 'main' into pr/806

957b9c3

manual merge

8a8f57b

fkiraly requested review from benHeid, fkiraly, fnhirwa, jdb78 and yarnabrina as code owners June 8, 2025 17:09

fkiraly changed the title ~~Improve performance of __getitem__ of TimeSeriesDataSet~~ [ENH] Improve performance of TimeSeriesDataSet.__getitem__ Jun 8, 2025

fkiraly requested changes Jun 8, 2025

Update _timeseries.py

c0788fa

fkiraly requested changes Jun 8, 2025
