🐛 [Bug] RuntimeError when attempting to compile Encoder model in Sockeye #833

Closed
@blchu

Bug Description

When compiling the Sockeye transformer encoder for inference, Torch-TensorRT fails with a RuntimeError (Expected ivalue->isInt() to be true but got false; full stack trace below).
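
For reference, the failure comes from the lazy torch_tensorrt.compile call on the traced encoder in sockeye/model_pt.py (line 200 in the stack trace below). Here is a minimal sketch of that compile pattern with a stand-in module; the toy encoder, its input signature, and the Input shapes/dtypes are assumptions for illustration, not the exact Sockeye code:

import torch
import torch.nn as nn
import torch_tensorrt

class ToyEncoder(nn.Module):
    """Stand-in for Sockeye's transformer encoder (assumption, toy sizes)."""
    def __init__(self, vocab_size=3000, dim=32):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, dim)

    def forward(self, tokens, lengths):
        # The real encoder returns (source_encoded, source_encoded_lengths).
        return self.emb(tokens), lengths

encoder = ToyEncoder().eval().cuda()
tokens = torch.randint(0, 3000, (64, 32), dtype=torch.int32, device="cuda")
lengths = torch.full((64,), 32, dtype=torch.int32, device="cuda")

# Sockeye traces the encoder once, then hands the traced module to
# torch_tensorrt.compile on the first translate call (model_pt.py:200 below).
traced = torch.jit.trace(encoder, (tokens, lengths))
trt_encoder = torch_tensorrt.compile(
    traced,
    inputs=[
        torch_tensorrt.Input(shape=(64, 32), dtype=torch.int32),  # token ids
        torch_tensorrt.Input(shape=(64,), dtype=torch.int32),     # valid lengths
    ],
    enabled_precisions={torch.half},  # matches --dtype float16
)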

To Reproduce

Steps to reproduce the behavior:

  1. Start a Docker container: docker run --gpus all --rm -it nvcr.io/nvidia/pytorch:21.11-py3
  2. Run the following to download and preprocess the data and train a basic model:
git clone https://github.com/blchu/sockeye.git -b tensorrt_blchu
tail -n 4 sockeye/requirements/requirements.txt > requirements.txt.tmp \
    && mv requirements.txt.tmp sockeye/requirements/requirements.txt
pip install -e ./sockeye
git clone https://github.com/rsennrich/subword-nmt.git
export PYTHONPATH=$(pwd)/subword-nmt:$PYTHONPATH

wget http://data.statmt.org/wmt17/translation-task/preprocessed/de-en/corpus.tc.de.gz
wget http://data.statmt.org/wmt17/translation-task/preprocessed/de-en/corpus.tc.en.gz
gunzip corpus.tc.de.gz
gunzip corpus.tc.en.gz
curl https://data.statmt.org/wmt17/translation-task/preprocessed/de-en/dev.tgz | tar xvzf -

head -n 32768 corpus.tc.de > corpus.tc.de.tmp && mv corpus.tc.de.tmp corpus.tc.de
head -n 32768 corpus.tc.en > corpus.tc.en.tmp && mv corpus.tc.en.tmp corpus.tc.en

python -m learn_joint_bpe_and_vocab --input corpus.tc.de corpus.tc.en \
                                    -s 3000 \
                                    -o bpe.codes \
                                    --write-vocabulary bpe.vocab.de bpe.vocab.en

python -m apply_bpe -c bpe.codes --vocabulary bpe.vocab.de --vocabulary-threshold 50 < corpus.tc.de > corpus.tc.BPE.de
python -m apply_bpe -c bpe.codes --vocabulary bpe.vocab.en --vocabulary-threshold 50 < corpus.tc.en > corpus.tc.BPE.en

python -m apply_bpe -c bpe.codes --vocabulary bpe.vocab.de --vocabulary-threshold 50 < newstest2016.tc.de > newstest2016.tc.BPE.de
python -m apply_bpe -c bpe.codes --vocabulary bpe.vocab.en --vocabulary-threshold 50 < newstest2016.tc.en > newstest2016.tc.BPE.en

python -m sockeye.prepare_data_pt \
                        -s corpus.tc.BPE.de \
                        -t corpus.tc.BPE.en \
                        -o train_data \
                        --shared-vocab

torchrun --no_python --nproc_per_node 1 sockeye-train \
         --prepared-data train_data \
         --validation-source newstest2016.tc.BPE.de \
         --validation-target newstest2016.tc.BPE.en \
         --output model \
         --batch-size 2048 \
         --update-interval 1 \
         --checkpoint-interval 1 \
         --max-updates 1 \
         --decoder ssru_transformer \
         --shared-vocab \
         --seed 1 \
         --quiet-secondary-workers
  3. Run the translate command to attempt to compile with Torch-TensorRT; this is where the error occurs:
sockeye-translate \
    --input newstest2016.tc.BPE.de \
    --output out \
    --model model \
    --dtype float16 \
    --beam-size 5 \
    --batch-size 64 \
    --output-type benchmark

Stack trace and logs:

WARNING: [Torch-TensorRT] - Input type for doing shape analysis could not be determined, defaulting to F32
WARNING: [Torch-TensorRT] - There may be undefined behavior using dynamic shape and aten::size
WARNING: [Torch-TensorRT] - Truncating weight (constant in the graph) from Float64 to Float32
WARNING: [Torch-TensorRT TorchScript Conversion Context] - Detected invalid timing cache, setup a local cache instead
WARNING: [Torch-TensorRT TorchScript Conversion Context] - The logger passed into createInferBuilder differs from one already provided for an existing builder, runtime, or refitter. TensorRT maintains only a single logger pointer at any given time, so the existing value, which can be retrieved with getLogger(), will be used instead. In order to use a new logger, first destroy all existing builder, runner or refitter objects.

WARNING: [Torch-TensorRT] - There may be undefined behavior using dynamic shape and aten::size
[ERROR:root] Uncaught exception
Traceback (most recent call last):
  File "/opt/conda/bin/sockeye-translate", line 33, in <module>
    sys.exit(load_entry_point('sockeye', 'console_scripts', 'sockeye-translate')())
  File "/workspace/temp/sockeye/sockeye/translate_pt.py", line 43, in main
    run_translate(args)
  File "/workspace/temp/sockeye/sockeye/translate_pt.py", line 147, in run_translate
    read_and_translate(translator=translator,
  File "/workspace/temp/sockeye/sockeye/translate_pt.py", line 234, in read_and_translate
    chunk_time = translate(output_handler, chunk, translator)
  File "/workspace/temp/sockeye/sockeye/translate_pt.py", line 257, in translate
    trans_outputs = translator.translate(trans_inputs)
  File "/workspace/temp/sockeye/sockeye/inference_pt.py", line 807, in translate
    batch_translations = self._translate_np(*self._get_inference_input(translator_inputs))  # type: ignore
  File "/workspace/temp/sockeye/sockeye/inference_pt.py", line 995, in _translate_np
    return self._get_best_translations(self._search(source,
  File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/workspace/temp/sockeye/sockeye/beam_search_pt.py", line 778, in forward
    model_states, estimated_reference_lengths = self._inference.encode_and_initialize(source, source_length)
  File "/workspace/temp/sockeye/sockeye/beam_search_pt.py", line 70, in encode_and_initialize
    states, predicted_output_length = self._model.encode_and_initialize(inputs, valid_length, self._const_lr)
  File "/workspace/temp/sockeye/sockeye/model_pt.py", line 234, in encode_and_initialize
    source_encoded, source_encoded_lengths = self.encode(inputs, valid_length=valid_length)
  File "/workspace/temp/sockeye/sockeye/model_pt.py", line 200, in encode
    self.traced_encoder = torch_tensorrt.compile(self.traced_encoder,
  File "/opt/conda/lib/python3.8/site-packages/torch_tensorrt/_compile.py", line 97, in compile
    return torch_tensorrt.ts.compile(ts_mod, inputs=inputs, enabled_precisions=enabled_precisions, **kwargs)
  File "/opt/conda/lib/python3.8/site-packages/torch_tensorrt/ts/_compiler.py", line 119, in compile
    compiled_cpp_mod = _C.compile_graph(module._c, _parse_compile_spec(spec))
RuntimeError: [Error thrown at ./core/conversion/var/Var_inl.h:38] Expected ivalue->isInt() to be true but got false
Requested unwrapping of arg IValue assuming it was l however type is NoneType
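
For context on the error: the l in the message is the GCC type code for long (int64_t), so a converter asked Torch-TensorRT to unwrap an IValue as an int but the IValue was None. In other words, an Optional[int] that is None somewhere in the traced encoder graph reached a converter that unwraps it unconditionally. A minimal sketch of that pattern, assuming an op with an int? argument left at None; this is illustrative only, not a confirmed reproduction, since whether a given op triggers the unwrap depends on the converter set in this build:

import torch
import torch_tensorrt

class OptionalIntArg(torch.nn.Module):
    def forward(self, x):
        # aten::argmax has schema argmax(Tensor self, int? dim=None, bool keepdim=False);
        # leaving dim=None puts a NoneType where a converter may expect an int,
        # the same IValue-unwrap shape as the error above.
        return torch.argmax(x)

mod = torch.jit.script(OptionalIntArg().eval())
trt_mod = torch_tensorrt.compile(
    mod,
    inputs=[torch_tensorrt.Input(shape=(2, 8), dtype=torch.float32)],
)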

Expected behavior

The model should compile without error and translate the input sentences.

Environment

Build information about Torch-TensorRT can be found by turning on debug messages

  • Torch-TensorRT Version (e.g. 1.0.0): 1.0.0a0
  • PyTorch Version (e.g. 1.0): 1.11.0a0+b6df043
  • CPU Architecture: x86_64 (Intel Xeon Platinum 8259CL)
  • OS (e.g., Linux): Ubuntu 20.04
  • How you installed PyTorch (conda, pip, libtorch, source): NGC Container
  • Python version: 3.8.12
  • CUDA version: 11.5
  • GPU models and configuration: Tesla T4

Labels

bug (Something isn't working)
