Description
I'd like to report an issue with FastSeq where GPU memory is not released after a crash when running with a large batch size (BS).
Impact:
I can reproduce it every time. Because the memory is not released after the crash, I am concerned that if this package is released to users, they may hit the same issue and find it hard to recover from.
How to reproduce:
I tested on the gpu0 machine.
Below are the detailed steps to reproduce this issue:
- Run the docker image:
sudo docker run --gpus all --privileged --name fastseq_dev_py3_tiy -it adsbrainwestus2.azurecr.io/fastseq:dev-py3 /bin/bash
Inside the container:
- Create an RSA key and add it to the GitHub account (just to make it easy to download the code)
- mkdir tiy && cd tiy
- Install the latest fastseq:
git clone git@github.com:microsoft/fastseq.git
cd fastseq
pip install --editable ./
- cd benchmarks
- Set LOOP in utils.sh to 1
- Run nvidia-smi the first time: no memory occupation, which is expected.
- Run ./benchmark.sh fairseq+fastseq bart.large.cnn cnn_dm/len-1024.bin valid 256
It failed with a Bus error:
Processing Loop=1/1 Util=fairseq_v0.9.0+fastseq_v0.0.3 Model=bart.large.cnn Task=cnn_dm/len-1024.bin Split=valid BS=256
benchmark_seq.sh: line 55: 533 Bus error (core dumped) $util $data_dir --path $model_path --fp16 --task translation --batch-size $bs --gen-subset $split --truncate-source --bpe gpt2 --beam 4 --num-workers 4 --min-len 55 --max-len-b 140 --no-repeat-ngram-size 3 --lenpen 2.0#--print-alignment#--print-step # KeyError: steps--skip-invalid-size-inputs-valid-test $* > $STDOUT_FILE 2> $STDERR_FILE
Failed at benchmark_seq.sh (line 80): $util $data_dir --path $model_path --fp16 --task translation --batch-size $bs --gen-subset $split --truncate-source --bpe gpt2 --beam 4 --num-workers 4 --min-len 55 --max-len-b 140 --no-repeat-ngram-size 3 --lenpen 2.0#--print-alignment#--print-step # KeyError: steps--skip-invalid-size-inputs-valid-test $* > $STDOUT_FILE 2> $STDERR_FILE
- Run nvidia-smi the second time: there is memory occupation on GPU0.
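In case it helps others who hit this, a possible way to free the leaked memory is to find and kill the processes that still hold it. This is only a sketch based on my assumption that the memory is held by orphaned worker processes left behind after the crash (the actual PIDs will differ):
# list processes still holding GPU memory
nvidia-smi --query-compute-apps=pid,used_memory --format=csv
# alternatively, list everything that has the NVIDIA device files open
fuser -v /dev/nvidia*
# kill the leftover python processes from the crashed run (replace <pid> with the actual PID)
kill -9 <pid>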
Other information:
I reran it 5 times to check whether there was any information in fastseq.stderr. Most of the time, there was no error message at all.
- 4 times, there was no error message in fastseq.stderr:
root@6e86574394fb:/workspace/tiy/fastseq/benchmarks# cat /tmp/fastseq.stderr
/opt/conda/lib/python3.6/site-packages/torch/distributed/distributed_c10d.py:102: UserWarning: torch.distributed.reduce_op is deprecated, please use torch.distributed.ReduceOp instead
warnings.warn("torch.distributed.reduce_op is deprecated, please use "
/opt/conda/lib/python3.6/site-packages/torch/distributed/distributed_c10d.py:102: UserWarning: torch.distributed.reduce_op is deprecated, please use torch.distributed.ReduceOp instead
warnings.warn("torch.distributed.reduce_op is deprecated, please use "
- 1 time, an EOFError was recorded in fastseq.stderr:
root@6e86574394fb:/workspace/tiy/fastseq/benchmarks# cat /tmp/fastseq.stderr
/opt/conda/lib/python3.6/site-packages/torch/distributed/distributed_c10d.py:102: UserWarning: torch.distributed.reduce_op is deprecated, please use torch.distributed.ReduceOp instead
warnings.warn("torch.distributed.reduce_op is deprecated, please use "
/opt/conda/lib/python3.6/site-packages/torch/distributed/distributed_c10d.py:102: UserWarning: torch.distributed.reduce_op is deprecated, please use torch.distributed.ReduceOp instead
warnings.warn("torch.distributed.reduce_op is deprecated, please use "
Traceback (most recent call last):
File "/opt/conda/lib/python3.6/multiprocessing/resource_sharer.py", line 142, in _serve
with self._listener.accept() as conn:
File "/opt/conda/lib/python3.6/multiprocessing/connection.py", line 456, in accept
answer_challenge(c, self._authkey)
File "/opt/conda/lib/python3.6/multiprocessing/connection.py", line 732, in answer_challenge
message = connection.recv_bytes(256) # reject large message
File "/opt/conda/lib/python3.6/multiprocessing/connection.py", line 216, in recv_bytes
buf = self._recv_bytes(maxlength)
File "/opt/conda/lib/python3.6/multiprocessing/connection.py", line 407, in _recv_bytes
buf = self._recv(4)
File "/opt/conda/lib/python3.6/multiprocessing/connection.py", line 383, in recv
raise EOFError
EOFError
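For what it is worth, the EOFError comes from the multiprocessing resource sharer, which together with the Bus error looks consistent with dataloader worker processes dying, possibly because the container runs out of shared memory. This is only my guess, not a confirmed root cause. Two things that could be tried (both use standard docker/fairseq options, but I have not verified that either one fixes the issue):
# give the container a larger /dev/shm (or use --ipc=host) when starting it
sudo docker run --gpus all --privileged --shm-size=8g --name fastseq_dev_py3_tiy -it adsbrainwestus2.azurecr.io/fastseq:dev-py3 /bin/bash
# and/or edit benchmark_seq.sh to pass --num-workers 0 so no worker processes are spawned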

