
Conversation

@NickNickGo
Contributor

Moving #37 here.

@NickNickGo
Contributor Author

| Util | Model | Task | Split | BatchSize | Samples | Tokens | Bleu | Rouge | Loss | Perplexity | Runtime(seconds) | Throughput(samples/s) | Throughput(tokens/s) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| transformers_v3.0.2+fastseq_v0.0.4 | facebook/bart-large-cnn | cnn_dm.1k/raw | val | 32 | 1024 | NA | NA | 34.89\|14.96\|25.30 | NA | NA | 123 | 8.3 | NA |
| transformers_v3.0.2+fastseq_v0.0.4 | facebook/bart-large-cnn | cnn_dm.1k/raw | val | 64 | 1024 | NA | NA | 34.92\|14.95\|25.25 | NA | NA | 87 | 11.8 | NA |
| transformers_v3.0.2+fastseq_v0.0.4 | facebook/bart-large-cnn | cnn_dm.1k/raw | val | 128 | 1024 | NA | NA | 34.96\|14.98\|25.28 | NA | NA | 82 | 12.5 | NA |
| transformers_v3.0.2+fastseq_v0.0.4 | facebook/bart-large-cnn | cnn_dm.1k/raw | val | 32 | 1024 | NA | NA | 34.90\|14.95\|25.30 | NA | NA | 121 | 8.5 | NA |
| transformers_v3.0.2+fastseq_v0.0.4 | facebook/bart-large-cnn | cnn_dm.1k/raw | val | 64 | 1024 | NA | NA | 34.93\|14.95\|25.26 | NA | NA | 87 | 11.8 | NA |
| transformers_v3.0.2+fastseq_v0.0.4 | facebook/bart-large-cnn | cnn_dm.1k/raw | val | 128 | 1024 | NA | NA | 34.97\|14.96\|25.27 | NA | NA | 81 | 12.6 | NA |
| transformers_v3.0.2+fastseq_v0.0.4 | facebook/bart-large-cnn | cnn_dm.1k/raw | val | 32 | 1024 | NA | NA | 34.91\|14.94\|25.25 | NA | NA | 122 | 8.4 | NA |
| transformers_v3.0.2+fastseq_v0.0.4 | facebook/bart-large-cnn | cnn_dm.1k/raw | val | 64 | 1024 | NA | NA | 34.93\|14.97\|25.25 | NA | NA | 87 | 11.8 | NA |
| transformers_v3.0.2+fastseq_v0.0.4 | facebook/bart-large-cnn | cnn_dm.1k/raw | val | 128 | 1024 | NA | NA | 34.98\|14.96\|25.26 | NA | NA | 81 | 12.6 | NA |

| Util | Model | Task | Split | BatchSize | Samples | Tokens | Bleu | Rouge | Loss | Perplexity | Runtime(seconds) | Throughput(samples/s) | Throughput(tokens/s) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| transformers_v3.0.2+fastseq_v0.0.4 | facebook/mbart-large-en-ro | wmt_en_ro/raw | val | 64 | 1984 | NA | 27.89 | NA\|NA\|NA | NA | NA | 274 | 7.2 | NA |
| transformers_v3.0.2+fastseq_v0.0.4 | facebook/mbart-large-en-ro | wmt_en_ro/raw | val | 64 | 1984 | NA | 27.89 | NA\|NA\|NA | NA | NA | 251 | 7.9 | NA |
| transformers_v3.0.2+fastseq_v0.0.4 | facebook/mbart-large-en-ro | wmt_en_ro/raw | val | 64 | 1984 | NA | 27.89 | NA\|NA\|NA | NA | NA | 254 | 7.8 | NA |

| Util | Model | Task | Split | BatchSize | Samples | Tokens | Bleu | Rouge | Loss | Perplexity | Runtime(seconds) | Throughput(samples/s) | Throughput(tokens/s) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| transformers_v3.0.2+fastseq_v0.0.4 | t5-base | wmt_en_ro/raw | val | 64 | 1984 | NA | 27.44 | NA\|NA\|NA | NA | NA | 162 | 11.2 | NA |
| transformers_v3.0.2+fastseq_v0.0.4 | t5-base | wmt_en_ro/raw | val | 128 | 1984 | NA | 27.38 | NA\|NA\|NA | NA | NA | 107 | 17.1 | NA |
| transformers_v3.0.2+fastseq_v0.0.4 | t5-base | wmt_en_ro/raw | val | 64 | 1984 | NA | 27.44 | NA\|NA\|NA | NA | NA | 134 | 13.8 | NA |
| transformers_v3.0.2+fastseq_v0.0.4 | t5-base | wmt_en_ro/raw | val | 128 | 1984 | NA | 27.38 | NA\|NA\|NA | NA | NA | 106 | 17.3 | NA |
| transformers_v3.0.2+fastseq_v0.0.4 | t5-base | wmt_en_ro/raw | val | 64 | 1984 | NA | 27.44 | NA\|NA\|NA | NA | NA | 129 | 13.8 | NA |
| transformers_v3.0.2+fastseq_v0.0.4 | t5-base | wmt_en_ro/raw | val | 128 | 1984 | NA | 27.38 | NA\|NA\|NA | NA | NA | 107 | 17.1 | NA |

| Util | Model | Task | Split | BatchSize | Samples | Tokens | Bleu | Rouge | Loss | Perplexity | Runtime(seconds) | Throughput(samples/s) | Throughput(tokens/s) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| transformers_v3.0.2+fastseq_v0.0.4 | hf.sshleifer.distilbart-cnn-12-6.tar.gz | cnn_dm.1k/raw | val | 64 | 1024 | NA | NA | 35.18\|15.03\|25.03 | NA | NA | 68 | 15.1 | NA |
| transformers_v3.0.2+fastseq_v0.0.4 | hf.sshleifer.distilbart-cnn-12-6.tar.gz | cnn_dm.1k/raw | val | 128 | 1024 | NA | NA | 35.20\|15.14\|25.07 | NA | NA | 64 | 15.9 | NA |
| transformers_v3.0.2+fastseq_v0.0.4 | hf.sshleifer.distilbart-cnn-12-6.tar.gz | cnn_dm.1k/raw | val | 64 | 1024 | NA | NA | 35.21\|15.07\|25.00 | NA | NA | 68 | 15.1 | NA |
| transformers_v3.0.2+fastseq_v0.0.4 | hf.sshleifer.distilbart-cnn-12-6.tar.gz | cnn_dm.1k/raw | val | 128 | 1024 | NA | NA | 35.19\|15.10\|25.08 | NA | NA | 65 | 15.8 | NA |
| transformers_v3.0.2+fastseq_v0.0.4 | hf.sshleifer.distilbart-cnn-12-6.tar.gz | cnn_dm.1k/raw | val | 64 | 1024 | NA | NA | 35.21\|15.04\|25.04 | NA | NA | 67 | 15.3 | NA |
| transformers_v3.0.2+fastseq_v0.0.4 | hf.sshleifer.distilbart-cnn-12-6.tar.gz | cnn_dm.1k/raw | val | 128 | 1024 | NA | NA | 35.18\|15.10\|25.06 | NA | NA | 65 | 15.8 | NA |

@JiushengChen
Contributor

The CNN 1k dataset is too small, so the results are not reliable. Please use the full validation set.

@NickNickGo reopened this on Dec 10, 2020
@NickNickGo
Contributor Author

| Util | Model | Task | Split | BatchSize | Samples | Tokens | Bleu | Rouge | Loss | Perplexity | Runtime(seconds) | Throughput(samples/s) | Throughput(tokens/s) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| transformers_v3.0.2+fastseq_v0.0.4 | facebook/bart-large-cnn | cnn_dm/raw | val | 32 | 13344 | NA | NA | 44.80\|21.64\|31.17 | NA | NA | 1763 | 7.6 | NA |
| transformers_v3.0.2+fastseq_v0.0.4 | facebook/bart-large-cnn | cnn_dm/raw | val | 64 | 13312 | NA | NA | 44.79\|21.66\|31.18 | NA | NA | 1174 | 11.3 | NA |
| transformers_v3.0.2+fastseq_v0.0.4 | facebook/bart-large-cnn | cnn_dm/raw | val | 128 | 13312 | NA | NA | 44.78\|21.64\|31.16 | NA | NA | 1075 | 12.4 | NA |

| Util | Model | Task | Split | BatchSize | Samples | Tokens | Bleu | Rouge | Loss | Perplexity | Runtime(seconds) | Throughput(samples/s) | Throughput(tokens/s) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| transformers_v3.0.2+fastseq_v0.0.4 | hf.sshleifer.distilbart-cnn-12-6.tar.gz | cnn_dm/raw | val | 64 | 13312 | NA | NA | 45.06\|21.81\|30.91 | NA | NA | 812 | 16.4 | NA |
| transformers_v3.0.2+fastseq_v0.0.4 | hf.sshleifer.distilbart-cnn-12-6.tar.gz | cnn_dm/raw | val | 128 | 13312 | NA | NA | 45.05\|21.79\|30.90 | NA | NA | 725 | 18.4 | NA |

@feihugis
Contributor

For bart-large-cnn, why are the numbers of input examples different for different batch_sizes? Could you also paste the result for the baseline?

@NickNickGo
Contributor Author

> For bart-large-cnn, why are the numbers of input examples different for different batch_sizes? Could you also paste the result for the baseline?

This is because the PyTorch DataLoader API only supports a number of samples that is a multiple of the batch size, which is why the last batch is dropped.
https://pytorch.org/docs/stable/data.html
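
For illustration, here is a minimal, self-contained sketch (not code from this PR) of how PyTorch's `drop_last` flag changes the number of evaluated samples; the sizes mirror the full cnn_dm validation split (13,368 examples) discussed below:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Toy dataset with the same size as the full cnn_dm validation split.
dataset = TensorDataset(torch.arange(13368))

# drop_last=True discards the final partial batch: 208 full batches * 64 = 13,312 samples.
loader_drop = DataLoader(dataset, batch_size=64, drop_last=True)
print(sum(batch[0].numel() for batch in loader_drop))  # 13312

# drop_last=False keeps the final partial batch, so all 13,368 samples are evaluated.
loader_keep = DataLoader(dataset, batch_size=64, drop_last=False)
print(sum(batch[0].numel() for batch in loader_keep))  # 13368
```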

@NickNickGo
Contributor Author

@JiushengChen For Mbart and T5, do we have a larger dataset? I couldn't find it in the benchmark scripts.

@feihugis
Contributor

> > For bart-large-cnn, why are the numbers of input examples different for different batch_sizes? Could you also paste the result for the baseline?
>
> This is because the PyTorch DataLoader API only supports a number of samples that is a multiple of the batch size, which is why the last batch is dropped.
> https://pytorch.org/docs/stable/data.html

The root cause may be here. Could you try changing it to drop_last=False?

@JiushengChen
Contributor

> @JiushengChen For Mbart and T5, do we have a larger dataset? I couldn't find it in the benchmark scripts.

Yes, I have larger data locally. Please leave these two out; I will update them today.
BTW, it looks like the CI test failed, please take a look.

@NickNickGo
Contributor Author

NickNickGo commented Dec 10, 2020

> > > For bart-large-cnn, why are the numbers of input examples different for different batch_sizes? Could you also paste the result for the baseline?
> >
> > This is because the PyTorch DataLoader API only supports a number of samples that is a multiple of the batch size, which is why the last batch is dropped.
> > https://pytorch.org/docs/stable/data.html
>
> The root cause may be here. Could you try changing it to drop_last=False?

I already did.

@feihugis
Contributor

> > > > For bart-large-cnn, why are the numbers of input examples different for different batch_sizes? Could you also paste the result for the baseline?
> > >
> > > This is because the PyTorch DataLoader API only supports a number of samples that is a multiple of the batch size, which is why the last batch is dropped.
> > > https://pytorch.org/docs/stable/data.html
> >
> > The root cause may be here. Could you try changing it to drop_last=False?
>
> I already did.

Could you please explain more? I saw that your code here uses drop_last=True; I guess that's why the last batch was dropped. Do you mean you have tried drop_last=False but the last batch was still dropped?

Comment on lines 23 to 25
grep -E "transformers_v3.0.2\+fastseq_v.* hf.sshleifer.distilbart-cnn-12-6.tar.gz cnn_dm.1k/raw val 64 " perf | awk '{s+=$13}END{print s/NR}' | bash range.sh 13 100
grep -E "transformers_v3.0.2\+fastseq_v.* hf.sshleifer.distilbart-cnn-12-6.tar.gz cnn_dm.1k/raw val 64 " perf | awk '{s+=$13}END{print s/NR}' | bash range.sh 15.2 100
# todo: bigger bs doesn't increase speed
grep -E "transformers_v3.0.2\+fastseq_v.* hf.sshleifer.distilbart-cnn-12-6.tar.gz cnn_dm.1k/raw val 128 " perf | awk '{s+=$13}END{print s/NR}' | bash range.sh 13.5 100
grep -E "transformers_v3.0.2\+fastseq_v.* hf.sshleifer.distilbart-cnn-12-6.tar.gz cnn_dm.1k/raw val 128 " perf | awk '{s+=$13}END{print s/NR}' | bash range.sh 15.9 100
Contributor

same here
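
Each of the checks under review averages the samples/s column (field 13) of the matching perf log lines across repeated runs and pipes the mean to range.sh, which appears to assert that the value lies between the given lower and upper bounds. A rough sketch of that kind of threshold check in Python (the helper name and bounds are illustrative, not the repository's range.sh):

```python
import sys

def check_throughput(values, lower, upper):
    """Average throughput over repeated runs and fail if the mean leaves [lower, upper]."""
    mean = sum(values) / len(values)
    if not lower <= mean <= upper:
        sys.exit(f"throughput {mean:.1f} samples/s outside expected range [{lower}, {upper}]")
    return mean

# e.g. the three distilbart batch-size-64 runs reported above (15.1, 15.1, 15.3 samples/s)
print(check_throughput([15.1, 15.1, 15.3], 15.0, 100))
```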

@NickNickGo
Contributor Author

Benchmarks on the larger datasets:

| Util | Model | Task | Split | BatchSize | Samples | Tokens | Bleu | Rouge | Loss | Perplexity | Runtime(seconds) | Throughput(samples/s) | Throughput(tokens/s) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| transformers_v3.0.2+fastseq_v0.0.4 | facebook/bart-large-cnn | cnn_dm/raw | val | 32 | 13368 | NA | NA | 44.80\|21.65\|31.19 | NA | NA | 1889 | 7.1 | NA |
| transformers_v3.0.2+fastseq_v0.0.4 | facebook/bart-large-cnn | cnn_dm/raw | val | 64 | 13368 | NA | NA | 44.80\|21.66\|31.19 | NA | NA | 1188 | 11.3 | NA |
| transformers_v3.0.2+fastseq_v0.0.4 | facebook/bart-large-cnn | cnn_dm/raw | val | 128 | 13368 | NA | NA | 44.78\|21.64\|31.18 | NA | NA | 1082 | 12.4 | NA |

| Util | Model | Task | Split | BatchSize | Samples | Tokens | Bleu | Rouge | Loss | Perplexity | Runtime(seconds) | Throughput(samples/s) | Throughput(tokens/s) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| transformers_v3.0.2+fastseq_v0.0.4 | hf.sshleifer.distilbart-cnn-12-6.tar.gz | cnn_dm/raw | val | 64 | 13368 | NA | NA | 45.07\|21.81\|30.91 | NA | NA | 810 | 16.5 | NA |
| transformers_v3.0.2+fastseq_v0.0.4 | hf.sshleifer.distilbart-cnn-12-6.tar.gz | cnn_dm/raw | val | 128 | 13368 | NA | NA | 45.05\|21.80\|30.90 | NA | NA | 729 | 18.3 | NA |

| Util | Model | Task | Split | BatchSize | Samples | Tokens | Bleu | Rouge | Loss | Perplexity | Runtime(seconds) | Throughput(samples/s) | Throughput(tokens/s) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| transformers_v3.0.2+fastseq_v0.0.4 | facebook/mbart-large-en-ro | wmt_en_ro/raw | val | 64 | 8191 | NA | 56.19 | NA\|NA\|NA | NA | NA | 897 | 9.1 | NA |
| transformers_v3.0.2+fastseq_v0.0.4 | facebook/mbart-large-en-ro | wmt_en_ro/raw | val | 64 | 8191 | NA | 56.19 | NA\|NA\|NA | NA | NA | 884 | 9.3 | NA |
| transformers_v3.0.2+fastseq_v0.0.4 | facebook/mbart-large-en-ro | wmt_en_ro/raw | val | 64 | 8191 | NA | 56.19 | NA\|NA\|NA | NA | NA | 885 | 9.3 | NA |

| Util | Model | Task | Split | BatchSize | Samples | Tokens | Bleu | Rouge | Loss | Perplexity | Runtime(seconds) | Throughput(samples/s) | Throughput(tokens/s) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| transformers_v3.0.2+fastseq_v0.0.4 | t5-base | wmt_en_ro/raw | val | 64 | 8191 | NA | 56.93 | NA\|NA\|NA | NA | NA | 425 | 19.3 | NA |
| transformers_v3.0.2+fastseq_v0.0.4 | t5-base | wmt_en_ro/raw | val | 128 | 8191 | NA | 56.92 | NA\|NA\|NA | NA | NA | 350 | 23.4 | NA |
| transformers_v3.0.2+fastseq_v0.0.4 | t5-base | wmt_en_ro/raw | val | 64 | 8191 | NA | 56.93 | NA\|NA\|NA | NA | NA | 436 | 18.8 | NA |
| transformers_v3.0.2+fastseq_v0.0.4 | t5-base | wmt_en_ro/raw | val | 128 | 8191 | NA | 56.92 | NA\|NA\|NA | NA | NA | 362 | 22.6 | NA |
| transformers_v3.0.2+fastseq_v0.0.4 | t5-base | wmt_en_ro/raw | val | 64 | 8191 | NA | 56.93 | NA\|NA\|NA | NA | NA | 442 | 18.5 | NA |
| transformers_v3.0.2+fastseq_v0.0.4 | t5-base | wmt_en_ro/raw | val | 128 | 8191 | NA | 56.92 | NA\|NA\|NA | NA | NA | 340 | 24.1 | NA |

@NickNickGo
Contributor Author

NickNickGo commented Dec 11, 2020

Before/After

| Model | Before | After |
|---|---|---|
| DistilBart | 13.8 | 18.3 |
| T5 | 13.8 | 23.4 |
| BART | 11.4 | 12.4 |
| Mbart | 8.9 | 9.3 |

@NickNickGo
Contributor Author

NickNickGo commented Dec 15, 2020

> > > > > For bart-large-cnn, why are the numbers of input examples different for different batch_sizes? Could you also paste the result for the baseline?
> > > >
> > > > This is because the PyTorch DataLoader API only supports a number of samples that is a multiple of the batch size, which is why the last batch is dropped.
> > > > https://pytorch.org/docs/stable/data.html
> > >
> > > The root cause may be here. Could you try changing it to drop_last=False?
> >
> > I already did.
>
> Could you please explain more? I saw that your code here uses drop_last=True; I guess that's why the last batch was dropped. Do you mean you have tried drop_last=False but the last batch was still dropped?

Synced offline. The last batch is now included.

@NickNickGo merged commit 8d217ee into microsoft:main on Dec 15, 2020