
How to correctly run transformer? #1059


Description

@sfraczek

Hi,

I have encountered a number of problems with the fluid/neural_machine_translation/transformer model. Am I doing something wrong? How do I run it correctly?

Steps I have taken

Following the instructions in https://github.com/PaddlePaddle/models/blob/develop/fluid/neural_machine_translation/transformer/README_cn.md, I downloaded WMT'16 EN-DE from https://github.com/google/seq2seq/blob/master/docs/data.md by clicking the download link.

Next I extracted it to the wmt16_en_de directory.

Next I ran paste -d '\t' train.tok.clean.bpe.32000.en train.tok.clean.bpe.32000.de > train.tok.clean.bpe.32000.en-de to join the source and target files with a tab.

Then I ran sed -i '1i\<s>\n<e>\n<unk>' vocab.bpe.32000 to prepend the special tokens to the vocabulary.
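In Python terms, I believe these two preprocessing steps amount to the following (just a sketch of my understanding of the data format, not the commands I actually ran):

# Join the tokenized source and target sentences line by line with a tab,
# and prepend the special tokens to the shared BPE vocabulary.
with open("train.tok.clean.bpe.32000.en") as src, \
     open("train.tok.clean.bpe.32000.de") as trg, \
     open("train.tok.clean.bpe.32000.en-de", "w") as out:
    for en, de in zip(src, trg):
        out.write(en.rstrip("\n") + "\t" + de.rstrip("\n") + "\n")

with open("vocab.bpe.32000") as f:
    vocab = f.read()
with open("vocab.bpe.32000", "w") as f:
    f.write("<s>\n<e>\n<unk>\n" + vocab)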

In config.py I changed use_gpu = True to False.
In train.py I added import multiprocessing and changed dev_count = fluid.core.get_cuda_device_count() to dev_count = fluid.core.get_cuda_device_count() if TrainTaskConfig.use_gpu else multiprocessing.cpu_count().
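For reference, the CPU fallback I added to train.py looks roughly like this (the from config import line is just how I read the script, and using multiprocessing.cpu_count() as the CPU device count is my own choice):

import multiprocessing

import paddle.fluid as fluid
from config import TrainTaskConfig

# Use the GPU count when running on GPU, otherwise treat each CPU core
# as one device so the rest of the script still gets a sensible dev_count.
dev_count = fluid.core.get_cuda_device_count() \
    if TrainTaskConfig.use_gpu else multiprocessing.cpu_count()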

Training

I launched training by running python -u train.py --src_vocab_fpath wmt16_en_de/vocab.bpe.32000 --trg_vocab_fpath wmt16_en_de/vocab.bpe.32000 --special_token '<s>' '<e>' '<unk>' --train_file_pattern wmt16_en_de/train.tok.clean.bpe.32000.en-de --use_token_batch True --batch_size 3200 --sort_type pool --pool_size 200000

but I got

E0719 14:26:29.439303 55138 graph.cc:43] softmax_with_cross_entropy_grad input var not in all_var list: softmax_with_cross_entropy_0.tmp_0@GRAD
epoch: 0, consumed 0.000161s
Traceback (most recent call last):
  File "train.py", line 428, in <module>
    train(args)
  File "train.py", line 419, in train
    "pass_" + str(pass_id) + ".checkpoint"))
  File "/home/sfraczek/Paddle/build/python/paddle/fluid/io.py", line 288, in save_persistables
    filename=filename)
  File "/home/sfraczek/Paddle/build/python/paddle/fluid/io.py", line 166, in save_vars
    filename=filename)
  File "/home/sfraczek/Paddle/build/python/paddle/fluid/io.py", line 197, in save_vars
    executor.run(save_program)
  File "/home/sfraczek/Paddle/build/python/paddle/fluid/executor.py", line 449, in run
    self.executor.run(program.desc, scope, 0, True, True)
paddle.fluid.core.EnforceNotMet: holder_ should not be null
Tensor not initialized yet when Tensor::type() is called. at [/home/sfraczek/Paddle/paddle/fluid/framework/tensor.h:139]
PaddlePaddle Call Stacks:
0       0x7f060e948f1cp paddle::platform::EnforceNotMet::EnforceNotMet(std::__exception_ptr::exception_ptr, char const*, int) + 572
1       0x7f060e94b901p paddle::framework::Tensor::type() const + 209
2       0x7f060f617bf6p paddle::operators::SaveOp::SaveLodTensor(boost::variant<paddle::platform::CUDAPlace, paddle::platform::CPUPlace, paddle::platform::CUDAPinnedPlace, boost::detail::variant::void_, boost::detail::variant::void_,
boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::va
riant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_> con
st&, paddle::framework::Variable*) const + 614
3       0x7f060f618472p paddle::operators::SaveOp::RunImpl(paddle::framework::Scope const&, boost::variant<paddle::platform::CUDAPlace, paddle::platform::CPUPlace, paddle::platform::CUDAPinnedPlace, boost::detail::variant::void_, boos
t::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::varian
t::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::
detail::variant::void_> const&) const + 210

So I commented out the checkpoint saving:

#fluid.io.save_persistables(
#    exe,
#    os.path.join(TrainTaskConfig.ckpt_dir,
#                 "pass_" + str(pass_id) + ".checkpoint"))

and it worked.
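As a less drastic workaround than removing the checkpointing entirely, I also considered just skipping a failed save and continuing training (only a sketch; I don't know whether silently skipping the checkpoint is acceptable here):

# Same call as before, but a failed save no longer aborts the whole pass.
try:
    fluid.io.save_persistables(
        exe,
        os.path.join(TrainTaskConfig.ckpt_dir,
                     "pass_" + str(pass_id) + ".checkpoint"))
except Exception as e:
    print("skipping checkpoint for pass %d: %s" % (pass_id, e))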

Inference

Next I tried to run inference.
I found that the file wmt16_en_de/newstest2013.tok.bpe.32000.en-de doesn't exist, but based on the README I guessed that I should run
paste -d '\t' newstest2013.tok.bpe.32000.en newstest2013.tok.bpe.32000.de > newstest2013.tok.bpe.32000.en-de. Is this correct?

Then I ran python -u infer.py --src_vocab_fpath wmt16_en_de/vocab.bpe.32000 --trg_vocab_fpath wmt16_en_de/vocab.bpe.32000 --special_token '<s>' '<e>' '<unk>' --test_file_pattern wmt16_en_de/newstest2013.tok.bpe.32000.en-de --batch_size 4 model_path trained_models/pass_20.infer.model beam_size 5, but there was no output from the script. It also ended without any error.

I tried giving it other files, but it doesn't output anything for those either.
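To rule out an empty or badly pasted test file, a quick sanity check could look like this (throwaway snippet, assuming the file path used above):

import glob

# Confirm the test file pattern matches something and has tab-separated pairs.
for path in glob.glob("wmt16_en_de/newstest2013.tok.bpe.32000.en-de"):
    with open(path) as f:
        lines = f.readlines()
    print(path, len(lines), "lines")
    if lines:
        print("first line:", repr(lines[0][:120]))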

I added profiling by importing paddle.fluid.profiler as profiler and adding

+    parser.add_argument(
+        "--profile",
+        type=bool,
+        default=False,
+        help="Enables/disables profiling.")

and

+    if args.profile:
+        with profiler.profiler("CPU", sorted_key='total') as cpuprof:
+            infer(args)
+    else:
+        infer(args)

But there is no output from the profiler either.
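One thing I noticed while re-reading my patch: argparse with type=bool turns any non-empty string into True, so the flag is easy to get wrong; an action="store_true" flag might be more robust (a sketch, not what I originally added):

parser.add_argument(
    "--profile",
    action="store_true",
    help="Enable profiling (disabled by default).")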

Please help.
