Merged
6 changes: 3 additions & 3 deletions README.md
@@ -11,9 +11,9 @@ Below shows the generation speed gain by using FastSeq.
|------------------|:--------------------------:|:-------------------------:|:-----:|
| [ProphetNet](examples/prophetnet/README.md) | 2.8 | 11.3 | 4.0x |
| [Bart (`fs`)](examples/bart/README.md) | 2.4 | 19.7 | 8.2x |
-| [Bart (`hf`)](examples/bart/README.md#speedup-bart-huggingface-transformers-version-by-using-fastseq) | 3.5 | 11.4 | 3.3x |
-| [DistilBart (`hf`)](examples/distilbart/README.md) | 4.3 | 13.8 | 3.2x |
-| [T5 (`hf`)](examples/t5/README.md) | 5.0 | 11.5 | 2.3x |
+| [Bart (`hf`)](examples/bart/README.md#speedup-bart-huggingface-transformers-version-by-using-fastseq) | 3.5 | 12.4 | 3.5x |
+| [DistilBart (`hf`)](examples/distilbart/README.md) | 4.3 | 18.3 | 4.3x |
+| [T5 (`hf`)](examples/t5/README.md) | 5.0 | 23.4 | 4.7x |
| [WMT16 En-De (`fs`)](examples/wmt/README.md) | 96.0 | 417.0 | 4.3x |

- All benchmarking experiments run on NVIDIA-V100-16GB with [docker](docker/Dockerfile). Highest speed recorded for each model by tuning batch size. For parameter setting details, click link of corresponding model.
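
The updated speedup column can be sanity-checked with a few lines of Python. This is only a sketch: it assumes the last column is the FastSeq throughput divided by the baseline throughput, rounded to one decimal. The helper name `speedup` is made up for illustration, and the numbers are copied from the three changed rows.

```python
# Throughput pairs (baseline, with FastSeq) in samples/s, copied from the
# three updated README rows.
rows = {
    "Bart (hf)": (3.5, 12.4),
    "DistilBart (hf)": (4.3, 18.3),
    "T5 (hf)": (5.0, 23.4),
}


def speedup(before: float, after: float) -> str:
    """Format after/before with one decimal, in the style of the table."""
    return f"{after / before:.1f}x"


for model, (before, after) in rows.items():
    print(f"{model}: {speedup(before, after)}")
```

Under that assumption the computed ratios agree with the new table entries.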
2 changes: 1 addition & 1 deletion benchmarks/models/fs_wmt.sh
@@ -10,7 +10,7 @@ source utils.sh

# MODEL - wmt16
./benchmark.sh fairseq wmt16.en.de.32k wmt16_en_de_bpe32k/bin valid 256
-./benchmark.sh fairseq+fastseq wmt16.en.de.32k wmt16_en_de_bpe32k/bin valid 256/512/1024 --post-process-workers 5
+./benchmark.sh fairseq+fastseq wmt16.en.de.32k wmt16_en_de_bpe32k/bin valid 256/512/1024 --postprocess-workers 5
# Accuracy
grep " wmt16.en.de.32k wmt16_en_de_bpe32k/bin valid " perf | awk '{if($8!="NA"){c+=1;s+=$8}}END{print s/c}' | bash range.sh 0.05 0.07
# Speed on V100 16GB 250W
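
For readers less familiar with awk, the accuracy check above averages field 8 over the matching `perf` rows, skips rows whose value is `NA`, and pipes the mean into `range.sh`. The Python sketch below mirrors that logic. The `perf` rows are made up, and `in_range` is a hypothetical stand-in for `range.sh`, whose source is not part of this diff; it is assumed here to test an inclusive lower and upper bound.

```python
def mean_of_field(lines, field, skip="NA"):
    """Average a 1-based whitespace-separated field, ignoring `skip` values.

    Rows with fewer than `field` columns are skipped here; awk would instead
    count them as zero.
    """
    values = []
    for line in lines:
        parts = line.split()
        if len(parts) >= field and parts[field - 1] != skip:
            values.append(float(parts[field - 1]))
    return sum(values) / len(values)


def in_range(value, low, high):
    """Hypothetical stand-in for range.sh: True when low <= value <= high."""
    return low <= value <= high


# Made-up perf rows; only the position of field 8 matters for this sketch.
perf = [
    "fairseq_vX wmt16.en.de.32k wmt16_en_de_bpe32k/bin valid 256 f6 f7 0.06 f9",
    "fairseq_vX wmt16.en.de.32k wmt16_en_de_bpe32k/bin valid 512 f6 f7 NA f9",
    "fairseq_vX wmt16.en.de.32k wmt16_en_de_bpe32k/bin valid 1024 f6 f7 0.06 f9",
]

score = mean_of_field(perf, 8)
print(score, in_range(score, 0.05, 0.07))
```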
34 changes: 17 additions & 17 deletions benchmarks/models/hf_bart.sh
@@ -10,25 +10,25 @@ source utils.sh

# MODEL - bart large cnn from transformer
# TASK - cnn dm val 1k set
-./benchmark.sh transformers facebook/bart-large-cnn cnn_dm.1k/raw val 32 --task summarization # each loop 5 minutes
-./benchmark.sh transformers+fastseq facebook/bart-large-cnn cnn_dm.1k/raw val 32/64/128 --task summarization # each loop 8 minutes
+#./benchmark.sh transformers facebook/bart-large-cnn cnn_dm.1k/raw val 32 --task summarization # each loop 5 minutes
+#./benchmark.sh transformers+fastseq facebook/bart-large-cnn cnn_dm.1k/raw val 32/64/128 --task summarization # each loop 8 minutes
## TASK - cnn dm val full set
-#./benchmark.sh transformers facebook/bart-large-cnn cnn_dm/raw val 32 --task summarization # each loop 2 hours
-#./benchmark.sh transformers+fastseq facebook/bart-large-cnn cnn_dm/raw val 32/64/128 --task summarization # each loop 2 hours
+./benchmark.sh transformers facebook/bart-large-cnn cnn_dm/raw val 32 --task summarization # each loop 2 hours
+./benchmark.sh transformers+fastseq facebook/bart-large-cnn cnn_dm/raw val 32/64/128 --task summarization # each loop 2 hours

-# Accuracy
-grep "facebook/bart-large-cnn cnn_dm.1k/raw val " perf | awk '{print $9}' | awk -F'|' '{if($1!="NA"){c+=1;s+=$1}}END{print s/c}' | bash range.sh 34.8 35
-# Speed on V100 16GB 250W
-grep -E "transformers_v3.0.2 facebook/bart-large-cnn cnn_dm.1k/raw val 32 " perf | awk '{s+=$13}END{print s/NR}' | bash range.sh 3.3 3.7
-grep -E "transformers_v3.0.2\+fastseq_v.* facebook/bart-large-cnn cnn_dm.1k/raw val 32 " perf | awk '{s+=$13}END{print s/NR}' | bash range.sh 7.3 100
-grep -E "transformers_v3.0.2\+fastseq_v.* facebook/bart-large-cnn cnn_dm.1k/raw val 64 " perf | awk '{s+=$13}END{print s/NR}' | bash range.sh 9.6 100
-grep -E "transformers_v3.0.2\+fastseq_v.* facebook/bart-large-cnn cnn_dm.1k/raw val 128 " perf | awk '{s+=$13}END{print s/NR}' | bash range.sh 9.9 100
-
## Accuracy
-#grep "facebook/bart-large-cnn cnn_dm/raw val " perf | awk '{print $9}' | awk -F'|' '{if($1!="NA"){c+=1;s+=$1}}END{print s/c}' | bash range.sh 44.78 44.82
+#grep "facebook/bart-large-cnn cnn_dm.1k/raw val " perf | awk '{print $9}' | awk -F'|' '{if($1!="NA"){c+=1;s+=$1}}END{print s/c}' | bash range.sh 34.8 35
## Speed on V100 16GB 250W
-#grep -E "transformers_v3.0.2 facebook/bart-large-cnn cnn_dm/raw val 32 " perf | awk '{s+=$13}END{print s/NR}' | bash range.sh 2.2 2.4
-#grep -E "transformers_v3.0.2\+fastseq_v.* facebook/bart-large-cnn cnn_dm/raw val 32 " perf | awk '{s+=$13}END{print s/NR}' | bash range.sh 3.9 100
-#grep -E "transformers_v3.0.2\+fastseq_v.* facebook/bart-large-cnn cnn_dm/raw val 64 " perf | awk '{s+=$13}END{print s/NR}' | bash range.sh 4.5 100
-#grep -E "transformers_v3.0.2\+fastseq_v.* facebook/bart-large-cnn cnn_dm/raw val 128 " perf | awk '{s+=$13}END{print s/NR}' | bash range.sh 4.9 100
+#grep -E "transformers_v3.0.2 facebook/bart-large-cnn cnn_dm.1k/raw val 32 " perf | awk '{s+=$13}END{print s/NR}' | bash range.sh 3.3 3.7
+#grep -E "transformers_v3.0.2\+fastseq_v.* facebook/bart-large-cnn cnn_dm/raw val 32 " perf | awk '{s+=$13}END{print s/NR}' | bash range.sh 7.6 100
+#grep -E "transformers_v3.0.2\+fastseq_v.* facebook/bart-large-cnn cnn_dm/raw val 64 " perf | awk '{s+=$13}END{print s/NR}' | bash range.sh 11.3 100
+#grep -E "transformers_v3.0.2\+fastseq_v.* facebook/bart-large-cnn cnn_dm/raw val 128 " perf | awk '{s+=$13}END{print s/NR}' | bash range.sh 12.4 100
+
+# Accuracy
+grep "facebook/bart-large-cnn cnn_dm/raw val " perf | awk '{print $9}' | awk -F'|' '{if($1!="NA"){c+=1;s+=$1}}END{print s/c}' | bash range.sh 44.78 44.82
+# Speed on V100 16GB 250W
+grep -E "transformers_v3.0.2 facebook/bart-large-cnn cnn_dm/raw val 32 " perf | awk '{s+=$13}END{print s/NR}' | bash range.sh 2.2 2.4
+grep -E "transformers_v3.0.2\+fastseq_v.* facebook/bart-large-cnn cnn_dm/raw val 32 " perf | awk '{s+=$13}END{print s/NR}' | bash range.sh 7.6 100
+grep -E "transformers_v3.0.2\+fastseq_v.* facebook/bart-large-cnn cnn_dm/raw val 64 " perf | awk '{s+=$13}END{print s/NR}' | bash range.sh 11.3 100
+grep -E "transformers_v3.0.2\+fastseq_v.* facebook/bart-large-cnn cnn_dm/raw val 128 " perf | awk '{s+=$13}END{print s/NR}' | bash range.sh 12.4 100
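
The speed checks rely on two details of the `grep -E` patterns. The trailing space after `transformers_v3.0.2` keeps the baseline pattern from matching the `+fastseq` rows, and `\+` matches a literal plus sign. The sketch below reproduces both with Python's `re` module and then averages field 13 the way `awk '{s+=$13}END{print s/NR}'` does. The `perf` rows, the placeholder fields `f6` to `f12`, the throughput values, and the `fastseq_vX.Y.Z` version string are all invented for illustration; only the field positions come from the commands above.

```python
import re

# The same patterns as in the grep -E calls for batch size 32 on the full set.
BASELINE = re.compile(
    r"transformers_v3.0.2 facebook/bart-large-cnn cnn_dm/raw val 32 ")
FASTSEQ = re.compile(
    r"transformers_v3.0.2\+fastseq_v.* facebook/bart-large-cnn cnn_dm/raw val 32 ")


def mean_throughput(lines, pattern, field=13):
    """Mean of a 1-based field over the rows that match `pattern`."""
    rows = [line.split() for line in lines if pattern.search(line)]
    return sum(float(row[field - 1]) for row in rows) / len(rows)


# Made-up perf rows: field 9 holds '|'-separated scores, field 13 samples/s.
perf = [
    "transformers_v3.0.2 facebook/bart-large-cnn cnn_dm/raw val 32 "
    "f6 f7 NA 44.8|21.6|41.5 f10 f11 f12 2.3",
    "transformers_v3.0.2+fastseq_vX.Y.Z facebook/bart-large-cnn cnn_dm/raw val 32 "
    "f6 f7 NA 44.8|21.6|41.5 f10 f11 f12 7.7",
    "transformers_v3.0.2+fastseq_vX.Y.Z facebook/bart-large-cnn cnn_dm/raw val 32 "
    "f6 f7 NA 44.8|21.6|41.5 f10 f11 f12 7.9",
]

print("baseline:", mean_throughput(perf, BASELINE))
print("fastseq: ", mean_throughput(perf, FASTSEQ))
```

Each pattern selects only its own rows, so the baseline and FastSeq means are never mixed.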

32 changes: 16 additions & 16 deletions benchmarks/models/hf_distibart.sh
@@ -10,24 +10,24 @@ source utils.sh

# MODEL - distibart cnn
# TASK - cnn dm val 1k set
-./benchmark.sh transformers hf.sshleifer.distilbart-cnn-12-6.tar.gz cnn_dm.1k/raw val 64 --task summarization # each loop takes 7 minutes
-./benchmark.sh transformers+fastseq hf.sshleifer.distilbart-cnn-12-6.tar.gz cnn_dm.1k/raw val 64/128 --task summarization # each loop takes 7 minutes
+#./benchmark.sh transformers hf.sshleifer.distilbart-cnn-12-6.tar.gz cnn_dm.1k/raw val 64 --task summarization # each loop takes 7 minutes
+#./benchmark.sh transformers+fastseq hf.sshleifer.distilbart-cnn-12-6.tar.gz cnn_dm.1k/raw val 64/128 --task summarization # each loop takes 7 minutes
## TASK - cnn dm val full set
-#./benchmark.sh transformers hf.sshleifer.distilbart-cnn-12-6.tar.gz cnn_dm/raw val 64 --task summarization # each loop takes 2.5 hours
-#./benchmark.sh transformers+fastseq hf.sshleifer.distilbart-cnn-12-6.tar.gz cnn_dm/raw val 64/128 --task summarization # each loop takes 2.5 hours
+./benchmark.sh transformers hf.sshleifer.distilbart-cnn-12-6.tar.gz cnn_dm/raw val 64 --task summarization # each loop takes 2.5 hours
+./benchmark.sh transformers+fastseq hf.sshleifer.distilbart-cnn-12-6.tar.gz cnn_dm/raw val 64/128 --task summarization # each loop takes 2.5 hours

-# Accuracy
-grep "hf.sshleifer.distilbart-cnn-12-6.tar.gz cnn_dm.1k/raw val " perf | awk '{print $9}' | awk -F'|' '{if($1!="NA"){c+=1;s+=$1}}END{print s/c}' | bash range.sh 35.1 35.3
-# Speed on V100 16GB 250W
-grep -E "transformers_v3.0.2 hf.sshleifer.distilbart-cnn-12-6.tar.gz cnn_dm.1k/raw val 64 " perf | awk '{s+=$13}END{print s/NR}' | bash range.sh 4.0 6.0
-grep -E "transformers_v3.0.2\+fastseq_v.* hf.sshleifer.distilbart-cnn-12-6.tar.gz cnn_dm.1k/raw val 64 " perf | awk '{s+=$13}END{print s/NR}' | bash range.sh 13 100
-# todo: bigger bs doesn't increase speed
-grep -E "transformers_v3.0.2\+fastseq_v.* hf.sshleifer.distilbart-cnn-12-6.tar.gz cnn_dm.1k/raw val 128 " perf | awk '{s+=$13}END{print s/NR}' | bash range.sh 13.5 100
-
## Accuracy
-#grep "hf.sshleifer.distilbart-cnn-12-6.tar.gz cnn_dm/raw val " perf | awk '{print $9}' | awk -F'|' '{if($1!="NA"){c+=1;s+=$1}}END{print s/c}' | bash range.sh 45 45.1
+#grep "hf.sshleifer.distilbart-cnn-12-6.tar.gz cnn_dm.1k/raw val " perf | awk '{print $9}' | awk -F'|' '{if($1!="NA"){c+=1;s+=$1}}END{print s/c}' | bash range.sh 35.1 35.3
## Speed on V100 16GB 250W
-#grep -E "transformers_v3.0.2 hf.sshleifer.distilbart-cnn-12-6.tar.gz cnn_dm/raw val 64 " perf | awk '{s+=$13}END{print s/NR}' | bash range.sh 2.95 3.05
-#grep -E "transformers_v3.0.2\+fastseq_v.* hf.sshleifer.distilbart-cnn-12-6.tar.gz cnn_dm/raw val 64 " perf | awk '{s+=$13}END{print s/NR}' | bash range.sh 5.2 100
+#grep -E "transformers_v3.0.2 hf.sshleifer.distilbart-cnn-12-6.tar.gz cnn_dm.1k/raw val 64 " perf | awk '{s+=$13}END{print s/NR}' | bash range.sh 4.0 6.0
+#grep -E "transformers_v3.0.2\+fastseq_v.* hf.sshleifer.distilbart-cnn-12-6.tar.gz cnn_dm.1k/raw val 64 " perf | awk '{s+=$13}END{print s/NR}' | bash range.sh 16.4 100
## todo: bigger bs doesn't increase speed
-#grep -E "transformers_v3.0.2\+fastseq_v.* hf.sshleifer.distilbart-cnn-12-6.tar.gz cnn_dm/raw val 128 " perf | awk '{s+=$13}END{print s/NR}' | bash range.sh 5.2 100
+#grep -E "transformers_v3.0.2\+fastseq_v.* hf.sshleifer.distilbart-cnn-12-6.tar.gz cnn_dm.1k/raw val 128 " perf | awk '{s+=$13}END{print s/NR}' | bash range.sh 18.4 100
+
+# Accuracy
+grep "hf.sshleifer.distilbart-cnn-12-6.tar.gz cnn_dm/raw val " perf | awk '{print $9}' | awk -F'|' '{if($1!="NA"){c+=1;s+=$1}}END{print s/c}' | bash range.sh 45 45.1
+# Speed on V100 16GB 250W
+grep -E "transformers_v3.0.2 hf.sshleifer.distilbart-cnn-12-6.tar.gz cnn_dm/raw val 64 " perf | awk '{s+=$13}END{print s/NR}' | bash range.sh 2.95 3.05
+grep -E "transformers_v3.0.2\+fastseq_v.* hf.sshleifer.distilbart-cnn-12-6.tar.gz cnn_dm/raw val 64 " perf | awk '{s+=$13}END{print s/NR}' | bash range.sh 16.5 100
+# todo: bigger bs doesn't increase speed
+grep -E "transformers_v3.0.2\+fastseq_v.* hf.sshleifer.distilbart-cnn-12-6.tar.gz cnn_dm/raw val 128 " perf | awk '{s+=$13}END{print s/NR}' | bash range.sh 18.3 100
2 changes: 1 addition & 1 deletion benchmarks/models/hf_mbart.sh
@@ -15,4 +15,4 @@ source utils.sh
grep "facebook/mbart-large-en-ro wmt_en_ro/raw val " perf | awk '{if($8!="NA"){c+=1;s+=$8}}END{print s/c}' | bash range.sh 27.79 27.95
# Speed on V100 16GB 250W
grep -E "transformers_v3.0.2 facebook/mbart-large-en-ro wmt_en_ro/raw val 64 " perf | awk '{s+=$13}END{print s/NR}' | bash range.sh 6.0 100
-grep -E "transformers_v3.0.2\+fastseq_v.* facebook/mbart-large-en-ro wmt_en_ro/raw val 64 " perf | awk '{s+=$13}END{print s/NR}' | bash range.sh 7.2 100
+grep -E "transformers_v3.0.2\+fastseq_v.* facebook/mbart-large-en-ro wmt_en_ro/raw val 64 " perf | awk '{s+=$13}END{print s/NR}' | bash range.sh 9.3 100
4 changes: 2 additions & 2 deletions benchmarks/models/hf_t5.sh
@@ -14,5 +14,5 @@ source utils.sh
grep "t5-base wmt_en_ro/raw val " perf | awk '{if($8!="NA"){c+=1;s+=$8}}END{print s/c}' | bash range.sh 27.42 27.44
# Speed on V100 16GB 250W
grep -E "transformers_v3.0.2 t5-base wmt_en_ro/raw val 64 " perf | awk '{s+=$13}END{print s/NR}' | bash range.sh 4.6 5.2
-grep -E "transformers_v3.0.2\+fastseq_v.* t5-base wmt_en_ro/raw val 64 " perf | awk '{s+=$13}END{print s/NR}' | bash range.sh 9.3 100
-grep -E "transformers_v3.0.2\+fastseq_v.* t5-base wmt_en_ro/raw val 128 " perf | awk '{s+=$13}END{print s/NR}' | bash range.sh 11 100
+grep -E "transformers_v3.0.2\+fastseq_v.* t5-base wmt_en_ro/raw val 64 " perf | awk '{s+=$13}END{print s/NR}' | bash range.sh 19.3 100
+grep -E "transformers_v3.0.2\+fastseq_v.* t5-base wmt_en_ro/raw val 128 " perf | awk '{s+=$13}END{print s/NR}' | bash range.sh 23.4 100
2 changes: 1 addition & 1 deletion examples/bart/README.md
@@ -63,7 +63,7 @@ Refer to [file](../../tests/optimizer/fairseq/test_fairseq_optimizer.py).
| BatchSize | 32 | 64 | 128 |
|:-------------------:|:-------------:|:--------------:|:--------------:|
| transformers-3.0.2 | 3.5 samples/s | OOM | OOM |
-| above + fastseq | 7.9 samples/s | 10.7 samples/s | 11.4 samples/s |
+| above + fastseq | 7.6 samples/s | 11.3 samples/s | 12.4 samples/s |


### Model
2 changes: 1 addition & 1 deletion examples/distilbart/README.md
@@ -11,7 +11,7 @@ More info can be found [here](https://github.com/huggingface/transformers/blob/m
| BatchSize | 64 | 128 |
|:-------------------:|:--------------:|:--------------:|
| transformers-3.0.2 | 4.3 samples/s | OOM |
-| above + fastseq | 13.3 samples/s | 13.8 samples/s |
+| above + fastseq | 16.5 samples/s | 18.3 samples/s |


### Model
2 changes: 1 addition & 1 deletion examples/t5/README.md
@@ -10,7 +10,7 @@ The T5 model was presented in [Exploring the Limits of Transfer Learning with a
| BatchSize | 64 | 128 |
|:--------------------:|:---------------:|:--------------:|
| transformers_v3.0.2 | 5.0 samples/s | OOM |
-| above + fastseq | 9.6 samples/s | 11.5 samples/s |
+| above + fastseq | 19.3 samples/s | 23.4 samples/s |


### Model
4 changes: 2 additions & 2 deletions examples/wmt/README.md
@@ -89,6 +89,6 @@ $ fastseq-generate-for-fairseq \
--lenpen 0.6 \
--remove-bpe \
--gen-subset test \
---post-process-workers 5
+--postprocess-workers 5
```
-To get baseline speed number which doesn't use FastSeq optimizations, replace `fastseq-generate-for-fairseq` by `fairseq-generate` and remove argument `--post-process-workers 5` since it is only provided by fastseq.
+To get baseline speed number which doesn't use FastSeq optimizations, replace `fastseq-generate-for-fairseq` by `fairseq-generate` and remove argument `--postprocess-workers 5` since it is only provided by fastseq.
4 changes: 2 additions & 2 deletions fastseq/optimizer/fairseq/generate.py
@@ -272,7 +272,7 @@ def add_generation_args_v1(parser):
group = original_add_generation_args(parser)
# fmt: off
group.add_argument(
-'--post-process-workers',
+'--postprocess-workers',
default=1,
type=int,
choices=range(1, 128, 1),
@@ -354,7 +354,7 @@ def main_v1(args):
message_queue = JoinableQueue()

p_list = []
-for i in range(args.post_process_workers):
+for i in range(args.postprocess_workers):
p = PostProcess(args, task, data_queue, message_queue)
p_list.append(p)
p.start()
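
The flag rename and the attribute rename in `main_v1` have to change together because `argparse` derives the attribute name from the long option by stripping the leading `--` and replacing the remaining hyphens with underscores. The standalone sketch below shows this with a bare `ArgumentParser` rather than fairseq's generation argument group; the option name, default, type, and choices are copied from the diff, and everything else is illustrative.

```python
import argparse

# A plain parser stands in for the fairseq generation argument group.
parser = argparse.ArgumentParser()
parser.add_argument(
    "--postprocess-workers",
    default=1,
    type=int,
    choices=range(1, 128, 1),
)

args = parser.parse_args(["--postprocess-workers", "5"])

# argparse stores the value under the option name with '-' replaced by '_'.
print(args.postprocess_workers)

# The attribute derived from the old flag name no longer exists.
print(hasattr(args, "post_process_workers"))
```

Code that still reads `args.post_process_workers` after the rename would raise `AttributeError`, which is why the loop in `main_v1` is updated in the same change.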