16kHz configs result in shape mismatch #42

@many-hats

Using the 16khz.yml configs on a different dataset, I get:

Traceback (most recent call last):
File "/mnt/archive2/dac/descript-audio-codec/scripts/train.py", line 441, in <module>
train(args, accel)
File "/home/g/miniconda3/envs/dac/lib/python3.9/site-packages/argbind/argbind.py", line 159, in cmd_func
return func(*cmd_args, **kwargs)
File "/mnt/archive2/dac/descript-audio-codec/scripts/train.py", line 416, in train
train_loop(state, batch, accel, lambdas)
File "/home/g/miniconda3/envs/dac/lib/python3.9/site-packages/audiotools/ml/decorators.py", line 375, in decorated
output = fn(*args, **kwargs)
File "/home/g/miniconda3/envs/dac/lib/python3.9/site-packages/audiotools/ml/decorators.py", line 321, in decorated
output = fn(*args, **kwargs)
File "/home/g/miniconda3/envs/dac/lib/python3.9/site-packages/audiotools/ml/decorators.py", line 107, in decorated
output = fn(*args, **kwargs)
File "/mnt/archive2/dac/descript-audio-codec/scripts/train.py", line 259, in train_loop
output["mel/loss"] = state.mel_loss(recons, signal)
File "/home/g/miniconda3/envs/dac/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/mnt/archive2/dac/descript-audio-codec/dac/nn/loss.py", line 322, in forward
loss += self.log_weight * self.loss_fn(
File "/home/g/miniconda3/envs/dac/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/g/miniconda3/envs/dac/lib/python3.9/site-packages/torch/nn/modules/loss.py", line 101, in forward
return F.l1_loss(input, target, reduction=self.reduction)
File "/home/g/miniconda3/envs/dac/lib/python3.9/site-packages/torch/nn/functional.py", line 3263, in l1_loss
expanded_input, expanded_target = torch.broadcast_tensors(input, target)
File "/home/g/miniconda3/envs/dac/lib/python3.9/site-packages/torch/functional.py", line 74, in broadcast_tensors
return _VF.broadcast_tensors(tensors) # type: ignore[attr-defined]

RuntimeError: The size of tensor a (760) must match the size of tensor b (761) at non-singleton dimension 3
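The two sizes in the error differ by one along dimension 3, which for a batched spectrogram is usually the frame axis. Here is a minimal sketch of how that can happen, assuming a centered STFT (frame count `1 + num_samples // hop_length`, as with `torch.stft(center=True)`) and a hop length of 8 samples. The hop value is hypothetical and chosen only for illustration; the issue does not state which window or hop settings the mel loss was using when it failed.

```python
def centered_stft_frames(num_samples: int, hop_length: int) -> int:
    """Frame count of a centered STFT: 1 + num_samples // hop_length."""
    return 1 + num_samples // hop_length


HOP = 8  # hypothetical hop length, for illustration only

# Frame count for a 6080-sample signal (the length reported in this issue).
target_frames = centered_stft_frames(6080, HOP)

# Every length in a nearby range that would give exactly 760 frames.
short_lengths = [n for n in range(6000, 6081)
                 if centered_stft_frames(n, HOP) == 760]

print(target_frames)                         # 761
print(short_lengths[0], short_lengths[-1])   # 6072 6079
```

Under these assumptions, a reconstruction that is between 1 and 8 samples shorter than the 6080-sample input would give 760 frames against 761, so comparing `recons` and `signal` lengths just before the loss call may be a useful first check.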

The only changes I've made to the configs are to num_workers, seed, iters, and valid_freq. batch["signal"] shows a signal of shape [24, 1, 6080] before transforms. I didn't have any issues with the baseline 44kHz model. The only config differences between 16kHz and base are DAC.sample_rate, DAC.encoder_rates, DAC.decoder_rates, n_codebooks, DAC.quantizer_dropout, Discriminator_sample_rate, and num_iters.
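Since the encoder and decoder rates are among the settings that differ, one thing that may be worth checking is whether the strided layers return the input length exactly. The sketch below uses the standard PyTorch output-length formulas for `Conv1d` and `ConvTranspose1d` (dilation 1, no output padding). Everything else is an assumption, not something confirmed in this issue: that each strided layer uses `kernel_size = 2 * stride` and `padding = ceil(stride / 2)`, that the 16kHz rates are `[2, 4, 5, 8]` and the 44kHz rates are `[2, 4, 8, 8]`, that the decoder rates are the encoder rates reversed, and that all other layers preserve length.

```python
import math


def conv1d_len(n: int, stride: int) -> int:
    """Output length of Conv1d with assumed kernel=2*stride, pad=ceil(stride/2)."""
    kernel, pad = 2 * stride, math.ceil(stride / 2)
    return (n + 2 * pad - kernel) // stride + 1


def conv_transpose1d_len(n: int, stride: int) -> int:
    """Output length of ConvTranspose1d with the same assumed kernel and padding."""
    kernel, pad = 2 * stride, math.ceil(stride / 2)
    return (n - 1) * stride - 2 * pad + kernel


def round_trip(n: int, encoder_rates: list) -> int:
    """Length after downsampling by each rate, then upsampling in reverse order."""
    for stride in encoder_rates:
        n = conv1d_len(n, stride)
    for stride in reversed(encoder_rates):
        n = conv_transpose1d_len(n, stride)
    return n


# Assumed rate lists; check them against your own config files.
print(round_trip(6080, [2, 4, 5, 8]))  # 6072
print(round_trip(9728, [2, 4, 8, 8]))  # 9728
```

With these assumed settings, an odd stride makes the transposed convolution come up one sample short, and the later upsampling stages multiply that deficit, while all-even strides round-trip exactly. That would be consistent with the 44kHz baseline working and the 16kHz config failing, but it is only a hypothesis until the actual layer parameters and tensor lengths are checked.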
