[BUG] utils.concat_sequences concatenation should be on dim=0 ? #1823

@cngmid

Describe the bug
The utility function concat_sequences should concatenate a list of tensors along their sample dimension. For tensors "of which first index are samples and second are timesteps", that means concatenating on dim=0.
The current code uses dim=1.

This utility function is only used in the BaseModel.predict() method, where it causes the following issue.

To Reproduce

import pandas as pd
from pytorch_forecasting.data.examples import generate_ar_data
from pytorch_forecasting import Baseline, TimeSeriesDataSet
from pytorch_forecasting.data import NaNLabelEncoder

data = generate_ar_data(seasonality=10.0, timesteps=100, n_series=2, seed=42)
data["static"] = 2
data["date"] = pd.Timestamp("2020-01-01") + pd.to_timedelta(data.time_idx, "D")
data.head()

# create dataset and dataloaders
max_encoder_length = 20
max_prediction_length = 5

training_cutoff = data["time_idx"].max() - max_prediction_length

context_length = max_encoder_length
prediction_length = max_prediction_length

training = TimeSeriesDataSet(
    data[lambda x: x.time_idx <= training_cutoff],
    time_idx="time_idx",
    target="value",
    categorical_encoders={"series": NaNLabelEncoder().fit(data.series)},
    group_ids=["series"],
    # only unknown variable is "value" - and N-Beats can also not take any additional variables
    time_varying_unknown_reals=["value"],
    max_encoder_length=context_length,
    max_prediction_length=prediction_length,
)

batch_size = 71
train_dataloader = training.to_dataloader(
    train=True, batch_size=batch_size, num_workers=0)

baseline_model = Baseline()
predictions = baseline_model.predict(
    train_dataloader, return_x=True, return_y=True,
    trainer_kwargs=dict(logger=None, accelerator="cpu")
)
predictions.output.size(), predictions.y[0].size()

Output:

(torch.Size([142, 5]), torch.Size([71, 10]))

Expected behavior
The predicted output and the target should have the same size:

(torch.Size([142, 5]), torch.Size([142, 5]))
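The mismatch can be illustrated in isolation, without the library. This is a minimal sketch that assumes two collected batches of 71 samples and 5 prediction timesteps each (the sizes reported above); the variable names are only illustrative.

```python
import torch

# Two full batches as a prediction loop might collect them:
# 71 samples each, 5 prediction timesteps (sizes taken from the report above).
batches = [torch.zeros(71, 5), torch.zeros(71, 5)]

along_time = torch.cat(batches, dim=1)     # joins along the timestep axis
along_samples = torch.cat(batches, dim=0)  # joins along the sample axis

print(along_time.shape)     # torch.Size([71, 10])
print(along_samples.shape)  # torch.Size([142, 5])
```

Concatenating on dim=1 yields the (71, 10) shape seen for predictions.y[0], while dim=0 yields the expected (142, 5).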

In fact, if batch_size is set to 70 so that the last batch is smaller than the others, the code above raises the following error:

File C:\ProgramData\miniconda3\envs\sq3\lib\site-packages\pytorch_forecasting\utils\_utils.py:275, in concat_sequences(sequences)
    273     return rnn.pack_sequence(sequences, enforce_sorted=False)
    274 elif isinstance(sequences[0], torch.Tensor):
--> 275     return torch.cat(sequences, dim=1)
    276 elif isinstance(sequences[0], (tuple, list)):
    277     return tuple(
    278         concat_sequences([sequences[ii][i] for ii in range(len(sequences))])
    279         for i in range(len(sequences[0]))
    280     )

RuntimeError: Sizes of tensors must match except in dimension 1. Expected size 70 but got size 2 for tensor number 2 in the list.
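The same failure can be sketched with plain tensors, assuming the 142 samples are split into batches of 70, 70 and 2 as batch_size=70 would produce. This only mirrors the shapes involved and is not the library's code path.

```python
import torch

# 142 samples with batch_size=70 give batches of 70, 70 and 2 samples.
batches = [torch.zeros(70, 5), torch.zeros(70, 5), torch.zeros(2, 5)]

try:
    # dim=1 requires every other dimension (here dim 0) to match: 70 vs 2 fails.
    torch.cat(batches, dim=1)
    failed = False
except RuntimeError as err:
    failed = True
    print(err)

# dim=0 only requires the timestep dimension (5) to match, so it succeeds.
stacked = torch.cat(batches, dim=0)
print(failed, stacked.shape)
```

With dim=0 the ragged last batch is not a problem, because only the non-concatenated dimensions have to agree.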

Additional context
I think utils.concat_sequences should be modified:

Line 275, current:
    return torch.cat(sequences, dim=1)
Line 275, proposed:
    return torch.cat(sequences, dim=0)
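A rough sketch of how the proposed change would behave, using a hypothetical stand-in named concat_batches rather than the library function itself. It reproduces only the tensor and tuple/list branches visible in the traceback and omits the packed-sequence branch; the example inputs are invented for illustration.

```python
import torch


def concat_batches(sequences):
    """Hypothetical stand-in for concat_sequences (tensor and tuple/list branches only)."""
    first = sequences[0]
    if isinstance(first, torch.Tensor):
        # Proposed behaviour: join batches along the sample dimension.
        return torch.cat(sequences, dim=0)
    if isinstance(first, (tuple, list)):
        # Recurse element-wise, as the branch shown in the traceback does.
        return tuple(
            concat_batches([seq[i] for seq in sequences])
            for i in range(len(first))
        )
    raise TypeError(f"unsupported element type: {type(first)!r}")


# Illustrative input: per-batch tuples of two tensors, with a smaller last batch.
per_batch = [
    (torch.zeros(70, 5), torch.ones(70, 1)),
    (torch.zeros(2, 5), torch.ones(2, 1)),
]
merged = concat_batches(per_batch)
print([tuple(t.shape) for t in merged])
```

Under this assumption, each element of the tuple ends up with one row per sample regardless of how the samples were split into batches.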

Versions
1.3.0

Metadata

Labels: bug (Something isn't working)
Status: Fixed/resolved