
Conversation

@phoeenniixx (Member) commented Oct 18, 2025

This PR adds the predict function and related functionality to v2.
Fixes #1883

  • Add PredictCallback
  • Add Base_pkg class and predict wrapper
  • Add tests for predict
  • Add functionality to return the predictions in a readable format (csv or pandas)
  • Add tests for checkpointing
  • Add support for reading cfgs from YAML files (similar to pytorch-tabular); see the sketch below
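For the last bullet: the cfg arguments accept either a dict or a path to a YAML file (see the datamodule_cfg docstring quoted further below), so the reading step presumably amounts to something like the following sketch (hypothetical helper name and file contents, not the PR's actual loader):

```python
from pathlib import Path

import yaml  # PyYAML


def load_cfg(cfg):
    """Return a config dict, reading a YAML file if a path is given.

    Hypothetical helper for illustration only; the PR's loader may differ.
    """
    if isinstance(cfg, (str, Path)):
        with open(cfg) as f:
            return yaml.safe_load(f) or {}
    return dict(cfg)


# e.g. a trainer.yml containing
#   max_epochs: 3
#   accelerator: cpu
# could then be passed as trainer_cfg = load_cfg("trainer.yml")
```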

Summary of changes:

  • Added Base_pkg, which acts as a base class for the model_pkg classes; these wrap the fit and predict methods of the model layer and also act as "fixture containers" for test coverage.

    • It internally creates the trainer, model object, datamodules, and dataloaders (when fit or predict is called).
    • It gets TimeSeries classes from data_scenarios, which are used by test fixtures to create datamodules for different data scenarios (in the get_test_dataset_from method).
  • Added PredictCallback, which is used by BaseModel's predict to perform predictions based on different modes (raw, predict, quantile).

  • Updated BaseModel to have a predict function, which performs predictions using PredictCallback.

  • Updated the pkg (or model_pkg) classes to use Base_pkg.

    • Removed the _get_test_datamodule_from method from the classes, as the datamodules are now created by Base_pkg's predict and fit functions.
    • Added get_datamodule_cls, which returns the datamodule class a model is compatible with (tslib or EncoderDecoder).
  • Updated the _integration function to use the pkg class's predict and fit functions and to drop the creation of a trainer object, since that now happens inside the Base_pkg class. (A sketch of the wrapper pattern follows below.)
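To illustrate the wrapper pattern summarised above, here is a small self-contained sketch of a config-driven pkg base class that builds the datamodule, model, and trainer only when fit or predict is called. It is not the PR's actual code; the class name, signatures, and defaults are simplified assumptions.

```python
from lightning.pytorch import Trainer


class BasePkgSketch:
    """Simplified, illustrative stand-in for the Base_pkg idea (not the PR's code)."""

    model_cls = None       # set by concrete pkg classes
    datamodule_cls = None  # e.g. the class returned by get_datamodule_cls

    def __init__(self, model_cfg=None, trainer_cfg=None, datamodule_cfg=None):
        self.model_cfg = model_cfg or {}
        self.trainer_cfg = trainer_cfg or {}
        self.datamodule_cfg = datamodule_cfg or {}

    def fit(self, data, **trainer_kwargs):
        # trainer, model and datamodule are only created here, not in __init__
        self.datamodule = self.datamodule_cls(data, **self.datamodule_cfg)
        self.model = self.model_cls(**self.model_cfg)
        self.trainer = Trainer(**{**self.trainer_cfg, **trainer_kwargs})
        self.trainer.fit(self.model, datamodule=self.datamodule)
        return self

    def predict(self, data=None):
        # reuse the fitted datamodule unless new data is supplied;
        # the PR additionally routes predict kwargs through a PredictCallback,
        # which is omitted in this sketch
        dm = (
            self.datamodule
            if data is None
            else self.datamodule_cls(data, **self.datamodule_cfg)
        )
        return self.trainer.predict(self.model, datamodule=dm)
```

The point of the pattern is that users and test helpers only pass configs; the trainer, model, and datamodule objects are assembled inside the wrapper.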

@fkiraly added the "enhancement (New feature or request)" and "module:models" labels on Oct 18, 2025
@codecov codecov bot commented Oct 18, 2025

Codecov Report

❌ Patch coverage is 79.33579% with 56 lines in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (main@c78b273). Learn more about missing BASE report.

Files with missing lines                              | Patch % | Missing lines
pytorch_forecasting/base/_base_pkg.py                 | 76.23%  | 24 ⚠️
pytorch_forecasting/callbacks/predict.py              | 64.28%  | 20 ⚠️
pytorch_forecasting/models/base/_base_model_v2.py     | 60.71%  | 11 ⚠️
...ytorch_forecasting/tests/test_all_estimators_v2.py | 96.55%  | 1 ⚠️
Additional details and impacted files
@@           Coverage Diff           @@
##             main    #1984   +/-   ##
=======================================
  Coverage        ?   86.31%           
=======================================
  Files           ?      162           
  Lines           ?     9594           
  Branches        ?        0           
=======================================
  Hits            ?     8281           
  Misses          ?     1313           
  Partials        ?        0           
Flag Coverage Δ
cpu 86.31% <79.33%> (?)
pytest 86.31% <79.33%> (?)


Configs to initialise ``lightning.Trainer``. Defaults to {}.
datamodule_cfg : Union[dict, str, Path], optional
Configs to initialise a ``LightningDataModule``.
- If dict, the keys and values are used as configuration parameters.
Collaborator

minor formatting issue: please have newlines around bullet point lists

@phoeenniixx (Member Author)

Thanks for pointing it out. I will make the changes to the PR soon.

@fkiraly (Collaborator) left a comment

Extremely neat!

I like the design, and I think this is brilliantly done!

Some questions and requests:

  • could you write a summary for the PR?
  • could you add a vignette somewhere in a notebook or an "examples" section? I think this is crucial to have a full review.
  • can you explain how or why the test configs can be replaced by a single test vignette?
  • design questions: why are fit and predict kwargs there and not in __init__? Have you thought about this?

@agobbifbk

Hello there! I like the prediction callback, it is easy to read and fits into the whole architecture! I have a few comments:
1- Not sure if we want to put the scaling part at the model level (like here); probably this logic should happen at the base level, taking into account the quantile loss (rescaling all the quantiles). But I see it is a WIP, so it does not bother me so much.
2- I see the return of the predict callback is a tensor; is there any additional interface for having the prediction in a more readable format? Or is that the next piece of the puzzle we need to clarify?
3- Given 2, we probably need to insert the temporal information in the output, not as a tensor but probably as a numpy array. I think we can have access to the time array using batch_idx and dataloader_idx in the on_predict_batch_end function, does that seem reasonable?

THX!
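For readers who have not used the hook referenced above: a generic Lightning callback that collects per-batch outputs during trainer.predict() could look roughly like this (a sketch only, assuming tensor outputs and the lightning.pytorch namespace; it is not the PredictCallback implemented in this PR):

```python
import torch
from lightning.pytorch.callbacks import Callback


class CollectPredictions(Callback):
    """Generic sketch: gather per-batch outputs during trainer.predict()."""

    def __init__(self):
        self._outputs = []

    def on_predict_start(self, trainer, pl_module):
        self._outputs = []

    def on_predict_batch_end(
        self, trainer, pl_module, outputs, batch, batch_idx, dataloader_idx=0
    ):
        # `outputs` is whatever predict_step returned for this batch
        # (assumed here to be a tensor); batch_idx / dataloader_idx are
        # available in this hook, which is what makes aligning predictions
        # with the dataloader's time index conceivable.
        self._outputs.append(outputs.detach().cpu())

    @property
    def predictions(self):
        return torch.cat(self._outputs, dim=0) if self._outputs else None
```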

@phoeenniixx (Member Author)

  • can you explain how or why the test configs can be replaced by a single test vignette?

Sorry, I did not understand the question, can you please rephrase it?
Right now, the test configs are taken from get_test_train_params and are tested using the "loop over" method, similar to v1.

  • design questions: why are fit and predict kwargs there and not in __init__? Have you thought about this?

Well, the kwargs in fit are mainly for the trainer, to add any further customization to the trainer's initialization and fit. For predict, right now, they are mainly sent to PredictCallback, which passes them on to to_prediction and to_quantiles - something I took from v1 (see here).

But the usage of kwargs is still not mature; mainly, they are added to provide a little more flexibility in the usage of predict and fit. Right now, I am not completely sure what further params these functions may require, so I was thinking that once more people use this we might converge on a concrete design. That is why I added these kwargs to both functions.
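To make the intended routing concrete, the mode handling described above amounts to roughly the following forwarding pattern (hypothetical class and method layout, not the PR's actual signatures):

```python
class PredictKwargsRoutingSketch:
    """Sketch of how predict kwargs could be routed to to_prediction / to_quantiles."""

    def __init__(self, mode="prediction", **mode_kwargs):
        # `mode_kwargs` are the extra kwargs the user passed to predict()
        self.mode = mode
        self.mode_kwargs = mode_kwargs

    def postprocess(self, model, raw_output):
        # forward the user's kwargs to the model's conversion helpers
        if self.mode == "quantiles":
            return model.to_quantiles(raw_output, **self.mode_kwargs)
        if self.mode == "prediction":
            return model.to_prediction(raw_output, **self.mode_kwargs)
        return raw_output  # mode == "raw": no conversion
```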

@phoeenniixx (Member Author)

Not sure if we want to put the scaling part at the model level (like here); probably this logic should happen at the base level, taking into account the quantile loss (rescaling all the quantiles). But I see it is a WIP, so it does not bother me so much.

I totally agree! I think this was added because the scaling was not present at the D2 level when this model was implemented and we needed scaling here for some reason? (@PranavBhatP please correct me if I am wrong here.)

Anyways, I think #1983 will solve this issue to some extent.

I see the return of the predict callback is a tensor; is there any additional interface for having the prediction in a more readable format? Or is that the next piece of the puzzle we need to clarify?

Well, a lot of the work is done by the BaseModel's to_prediction, which creates the tensors of final predictions (according to my understanding). I was thinking of having a method at the pkg layer to create the final human-readable format, but the design is not yet concrete in my mind. I need some time to figure it out. Maybe we will need some discussion about it in the near future to guide me in the right direction.

Given 2, we probably need to insert the temporal information in the output, not as a tensor but probably as a numpy array. I think we can have access to the time array using batch_idx and dataloader_idx in the on_predict_batch_end function, does that seem reasonable?

I am not quite clear about this right now; can you please elaborate on your idea here? I agree we need temporal info in the output, but as numpy... I am not sure about that (how and why should we have numpy here?). I need some time to bring all the thoughts together!

@phoeenniixx (Member Author)

I will add a summary of the complete PR and an example vignette to the notebooks as well. I just need some time; sorry, I got busy with some other things :)

@fkiraly (Collaborator) commented Oct 28, 2025

  • can you explain how or why the test configs can be replaced by a single test vignette?

Sorry, I did not understand the question, can you please rephrase it? Right now, the test configs are taken from get_test_train_params and are tested using the "loop over" method, similar to v1.

The _integration method seems to change substantially, and _get_test_datamodule_from gets deleted everywhere. Why is that ok? I would like to understand the details, as I worry about loss of test coverage.

  • design questions: why are fit and predict kwargs there and not in __init__? Have you thought about this?

Well, the kwargs in fit are mainly for the trainer, to add any further customization to the trainer's initialization and fit. For predict, right now, they are mainly sent to PredictCallback, which passes them on to to_prediction and to_quantiles - something I took from v1 (see here).

But the usage of kwargs is still not mature; mainly, they are added to provide a little more flexibility in the usage of predict and fit. Right now, I am not completely sure what further params these functions may require, so I was thinking that once more people use this we might converge on a concrete design. That is why I added these kwargs to both functions.

This is consequential though - also think about composite models where fit etc gets called not by the user but internally by the pipeline. Does it also make sense to have them there in such a scenario?

@agobbifbk

Given 2, we probably need to insert the temporal information in the output, not as a tensor but probably as a numpy array. I think we can have access to the time array using batch_idx and dataloader_idx in the on_predict_batch_end function, does that seem reasonable?

I am not quite clear about this right now; can you please elaborate on your idea here? I agree we need temporal info in the output, but as numpy... I am not sure about that (how and why should we have numpy here?). I need some time to bring all the thoughts together!

At some point we need to link the tensor of the prediction to the correct timestamp. I think we have it in D1 as an np.array (since it cannot be cast as a tensor), and then we propagate it to the D2 dataset (I suppose). What I mean is that when we produce the prediction we need to associate the correct timestamp with it. We need to find the correct spot in which we have both pieces of information aligned (prediction and timestamp). The temporal information is in the dataset (or in dataloader.dataset), but I think we need to have a meeting, it is hard to describe here :-)
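To make the alignment question concrete, one possible "readable" output shape is a long-format DataFrame that joins the forecast horizon's timestamps with the predicted values, e.g. (toy data and column names only, not the PR's interface):

```python
import pandas as pd
import torch

# Toy example: 2 series, prediction horizon of 3 steps (made-up data).
predictions = torch.rand(2, 3)  # stand-in for the tensor returned by predict
horizon = pd.date_range("2025-01-01", periods=3, freq="D")  # stand-in time index

readable = pd.concat(
    [
        pd.DataFrame(
            {
                "series": series_id,
                "timestamp": horizon,
                "prediction": predictions[series_id].numpy(),
            }
        )
        for series_id in range(predictions.shape[0])
    ],
    ignore_index=True,
)
# e.g. readable.to_csv("predictions.csv", index=False)
```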

@phoeenniixx (Member Author)

The _integration method seems to change substantially, and _get_test_datamodule_from gets deleted everywhere. Why is that ok? I would like to understand the details, as I worry about loss of test coverage.

The main change to _integration is because of the wrapper. Now we don't need to define the trainer beforehand, as this is handled by the pkg class; we just need the trainer_cfg and the rest happens in the pkg class. For the same reason we don't need the dataloaders, just the cfgs, and the rest happens inside the wrapper.

About _get_test_datamodule_from, we don't need it anymore, as it was creating the compatible datamodule for the model (an EncoderDecoder or tslib datamodule, based on the type of model). Again, due to the wrapper, we just need the compatible datamodule class (which we get using get_datamodule_cls) and the rest happens under the hood.
I tried to keep the flow the same as much as I could; the main thing I removed was early stopping, which used val_loss. Right now, the train-test split is done based on groups and there is no temporal split, so when there are very few groups (like in the current test scenario) we do not get val_dataloaders, and this leads to issues.
On the main branch, we are passing the same dataset as both the train and val datasets; I did not feel good about that, so I have removed this "hack". If you think it is important, I can add it back.

This is consequential though - also think about composite models where fit etc gets called not by the user but internally by the pipeline. Does it also make sense to have them there in such a scenario?

True, I did not think about this issue :)
When using a pipeline, this code could work, but we would have to be very careful. I agree that in that scenario we could face a problem. I will look into it, thanks!

@phoeenniixx (Member Author)

At some point we need to link the tensor of the prediction to the correct timestamp. I think we have it in D1 as an np.array (since it cannot be cast as a tensor), and then we propagate it to the D2 dataset (I suppose). What I mean is that when we produce the prediction we need to associate the correct timestamp with it. We need to find the correct spot in which we have both pieces of information aligned (prediction and timestamp). The temporal information is in the dataset (or in dataloader.dataset), but I think we need to have a meeting, it is hard to describe here :-)

Yes, I also think we should have a meeting someday and discuss some of these things!

@fkiraly (Collaborator) commented Oct 30, 2025

The main change to _integration is because of the wrapper. Now we don't need to define the trainer beforehand, as this is handled by the pkg class; we just need the trainer_cfg and the rest happens in the pkg class. For the same reason we don't need the dataloaders, just the cfgs, and the rest happens inside the wrapper.

Ok, that is good news - if you vouch for this being 1:1 (except for val_loader), this is a really great sign - it means the design goal is successfully realized, in reducing the complexity of usage substantially!

When using a pipeline, this code could work, but we would have to be very careful. I agree that in that scenario we could face a problem. I will look into it, thanks!

I think that is the main reason for not having fit kwargs in sktime - it does not compose properly.



Development

Successfully merging this pull request may close these issues.

[API] v2 interface design - output (predict, backtesting etc) interface
