Merged
71 commits
5e7df83
starting attn refactor for encoder decoder models via bart (eager + s…
vasqu May 12, 2025
faf7914
flash attention works, remove unnecessary code
vasqu May 12, 2025
90a90d3
flex attention support for bart!, gotta check if the renaming is not …
vasqu May 12, 2025
259258d
some comments
vasqu May 12, 2025
afb2934
skip flex grad test for standalone as done with the other test
vasqu May 12, 2025
25db34a
revert flex attn rename (for now), sdpa simplify, and todos
vasqu May 13, 2025
131de1b
more todos
vasqu May 13, 2025
c8b8ed6
refactor mask creation for reuse
vasqu May 13, 2025
c0d83b6
modular attempt at biogpt
vasqu May 13, 2025
59cf07d
first batch of other models
vasqu May 13, 2025
146b02b
fix attn dropout
vasqu May 14, 2025
b7f0a2b
fix autoformer copies
vasqu May 14, 2025
00c27df
hubert
vasqu May 14, 2025
fc41dc2
another batch of models
vasqu May 14, 2025
1e2b4f0
copies/style + last round of bart models --> whisper next?
vasqu May 14, 2025
dccabeb
remove unnecessary _reshape function and remove copy to whisper
vasqu May 14, 2025
cecd0a4
add skip for decoder-only models out of enc-dec (same as in bart)
vasqu May 14, 2025
ac61dd7
bring back licences
vasqu May 14, 2025
a6e848d
remove comment, added to pr read instead
vasqu May 14, 2025
ddfc515
mostly docs
vasqu May 15, 2025
3e5da38
disable sew flex attn as it's unclear attn mask for now
vasqu May 15, 2025
9a9b140
oops
vasqu May 15, 2025
aecd5e2
test fixes for enc-dec
vasqu May 15, 2025
7bdb692
torch fx fixes + try at flex attn
vasqu May 15, 2025
f8260e6
skip on mbart
vasqu May 15, 2025
598a566
some more fixes
vasqu May 15, 2025
61b648f
musicgen skip / delete old attn class logic + sdpa compose compile skip
vasqu May 15, 2025
4316991
disable flex attn for musicgen, not worth the effort
vasqu May 15, 2025
6937106
more fixes and style
vasqu May 15, 2025
4f12347
flex attention test for dropout and encoder decoder that dont have ma…
vasqu May 15, 2025
05e38b1
informer fixes
vasqu May 15, 2025
9a8d4e4
the weirdest thing I've encountered yet...
vasqu May 15, 2025
2055759
style
vasqu May 15, 2025
adc808d
remove empty tensor attempt, found core root in previous commits
vasqu May 15, 2025
3d23455
disable time series due to tests being very text centric on inputs
vasqu May 16, 2025
8f9de86
add speech to text to be ignoring the other attns, also due to tests
vasqu May 16, 2025
b94c966
update docs
vasqu May 16, 2025
6f813cd
remaining issues resolved ?
vasqu May 16, 2025
3be2a9d
update docs for current state --> nllb moe and pegasus x sdpa is ques…
vasqu May 16, 2025
6dbd77a
some models have not set the is_causal flag...
vasqu May 16, 2025
dd3d307
change dtype in softmax tol old behaviour + some modular fixes
vasqu May 16, 2025
d77ea86
I hate it but it is what it is
vasqu May 16, 2025
71f7f1b
fixes from main for bart
vasqu May 16, 2025
8a43566
forgot this one
vasqu May 16, 2025
270c42a
some model fixes
vasqu May 16, 2025
cc6cae0
style
vasqu May 16, 2025
613d7ea
Merge branch 'main' into vas-enc-dec-attn-refactor
vasqu May 19, 2025
6369055
current status
vasqu May 19, 2025
66c93c1
marian works now
vasqu May 19, 2025
f8368cf
fixing some copies
vasqu May 19, 2025
f34d11d
some copy fixes + time series x informer
vasqu May 19, 2025
b2a2987
last models possibly and fixes on style/copies
vasqu May 20, 2025
dc180b2
Merge branch 'main' into vas-enc-dec-attn-refactor
vasqu May 20, 2025
917d3c9
some post merge fixes
vasqu May 20, 2025
776e3ca
more fixes
vasqu May 20, 2025
a27bfb9
make attention interface callable and move warnings there
vasqu May 20, 2025
a066c85
style lol
vasqu May 20, 2025
ece5b09
add comment to "unsupported"
vasqu May 20, 2025
ffdc566
remove callable interface and change interface warnings + some copies
vasqu May 21, 2025
63e38fa
fix
vasqu May 21, 2025
c8e10d1
ternary is ugly af, make it simpler
vasqu May 21, 2025
4742583
how did that happen
vasqu May 21, 2025
b43f3fd
fix flex attn test
vasqu May 21, 2025
e8a9139
failing the test
vasqu May 21, 2025
ab0754f
no more fallback! fixing copies next
vasqu May 22, 2025
c7c1499
style + attn fixed
vasqu May 22, 2025
e62a8ac
fixing copies and mask creation
vasqu May 22, 2025
cd39964
wrong copy
vasqu May 22, 2025
c450a3d
fixup tests and disable flex attn for now
vasqu May 22, 2025
aca05b7
Merge branch 'main' into vas-enc-dec-attn-refactor
vasqu May 22, 2025
22114cd
fixup last tests?
vasqu May 22, 2025
11 changes: 6 additions & 5 deletions docs/source/en/model_doc/biogpt.md
@@ -18,6 +18,7 @@ rendered properly in your Markdown viewer.

<div class="flex flex-wrap space-x-1">
<img alt="PyTorch" src="https://img.shields.io/badge/PyTorch-DE3412?style=flat&logo=pytorch&logoColor=white">
<img alt="FlashAttention" src="https://img.shields.io/badge/%E2%9A%A1%EF%B8%8E%20FlashAttention-eae0c8?style=flat">
<img alt="SDPA" src="https://img.shields.io/badge/SDPA-DE3412?style=flat&logo=pytorch&logoColor=white">
</div>

@@ -40,13 +41,13 @@ This model was contributed by [kamalkraj](https://huggingface.co/kamalkraj). The

### Using Scaled Dot Product Attention (SDPA)

PyTorch includes a native scaled dot-product attention (SDPA) operator as part of `torch.nn.functional`. This function
encompasses several implementations that can be applied depending on the inputs and the hardware in use. See the
[official documentation](https://pytorch.org/docs/stable/generated/torch.nn.functional.scaled_dot_product_attention.html)
or the [GPU Inference](https://huggingface.co/docs/transformers/main/en/perf_infer_gpu_one#pytorch-scaled-dot-product-attention)
page for more information.

SDPA is used by default for `torch>=2.1.1` when an implementation is available, but you may also set
`attn_implementation="sdpa"` in `from_pretrained()` to explicitly request SDPA to be used.

```
@@ -109,7 +110,7 @@ we saw the following speedups during inference.
[[autodoc]] BioGptForCausalLM
- forward


## BioGptForTokenClassification

[[autodoc]] BioGptForTokenClassification
4 changes: 3 additions & 1 deletion docs/source/en/model_doc/blenderbot-small.md
@@ -21,6 +21,8 @@ rendered properly in your Markdown viewer.
<img alt="TensorFlow" src="https://img.shields.io/badge/TensorFlow-FF6F00?style=flat&logo=tensorflow&logoColor=white">
<img alt="Flax" src="https://img.shields.io/badge/Flax-29a79b.svg?style=flat&logo=
">
<img alt="FlashAttention" src="https://img.shields.io/badge/%E2%9A%A1%EF%B8%8E%20FlashAttention-eae0c8?style=flat">
<img alt="SDPA" src="https://img.shields.io/badge/SDPA-DE3412?style=flat&logo=pytorch&logoColor=white">
</div>

Note that [`BlenderbotSmallModel`] and
@@ -52,7 +54,7 @@ found [here](https://github.com/facebookresearch/ParlAI).

## Usage tips

Blenderbot Small is a model with absolute position embeddings so it's usually advised to pad the inputs on the right rather than
the left.


6 changes: 4 additions & 2 deletions docs/source/en/model_doc/blenderbot.md
@@ -21,6 +21,8 @@ rendered properly in your Markdown viewer.
<img alt="TensorFlow" src="https://img.shields.io/badge/TensorFlow-FF6F00?style=flat&logo=tensorflow&logoColor=white">
<img alt="Flax" src="https://img.shields.io/badge/Flax-29a79b.svg?style=flat&logo=
">
<img alt="FlashAttention" src="https://img.shields.io/badge/%E2%9A%A1%EF%B8%8E%20FlashAttention-eae0c8?style=flat">
<img alt="SDPA" src="https://img.shields.io/badge/SDPA-DE3412?style=flat&logo=pytorch&logoColor=white">
</div>

## Overview
@@ -45,7 +47,7 @@ This model was contributed by [sshleifer](https://huggingface.co/sshleifer). The

## Usage tips and example

Blenderbot is a model with absolute position embeddings so it's usually advised to pad the inputs on the right
rather than the left.

An example:
@@ -71,7 +73,7 @@ An example:
`facebook/blenderbot_small_90M`, have a different architecture and consequently should be used with
[BlenderbotSmall](blenderbot-small).


## Resources

- [Causal language modeling task guide](../tasks/language_modeling)
4 changes: 3 additions & 1 deletion docs/source/en/model_doc/marian.md
@@ -21,6 +21,8 @@ rendered properly in your Markdown viewer.
<img alt="TensorFlow" src="https://img.shields.io/badge/TensorFlow-FF6F00?style=flat&logo=tensorflow&logoColor=white">
<img alt="Flax" src="https://img.shields.io/badge/Flax-29a79b.svg?style=flat&logo=
">
<img alt="FlashAttention" src="https://img.shields.io/badge/%E2%9A%A1%EF%B8%8E%20FlashAttention-eae0c8?style=flat">
<img alt="SDPA" src="https://img.shields.io/badge/SDPA-DE3412?style=flat&logo=pytorch&logoColor=white">
</div>

## Overview
@@ -155,7 +157,7 @@ Example of translating english to many romance languages, using old-style 2 char
>>> model = MarianMTModel.from_pretrained(model_name)
>>> translated = model.generate(**tokenizer(src_text, return_tensors="pt", padding=True))
>>> tgt_text = [tokenizer.decode(t, skip_special_tokens=True) for t in translated]
["c'est une phrase en anglais que nous voulons traduire en français",
'Isto deve ir para o português.',
'Y esto al español']
```
8 changes: 4 additions & 4 deletions docs/source/en/model_doc/nllb-moe.md
@@ -51,10 +51,10 @@ The original code can be found [here](https://github.com/facebookresearch/fairse

## Implementation differences with SwitchTransformers

The biggest difference is the way the tokens are routed. NLLB-MoE uses a `top-2-gate` which means that for each input, only the top two experts are selected based on the
highest predicted probabilities from the gating network, and the remaining experts are ignored. In `SwitchTransformers`, only the top-1 probabilities are computed,
which means that tokens have less probability of being forwarded. Moreover, if a token is not routed to any expert, `SwitchTransformers` still adds its unmodified hidden
states (kind of like a residual connection) while they are masked in `NLLB`'s top-2 routing mechanism.
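
The routing difference described above can be sketched in plain Python. This is only an illustration under stated assumptions, not the code of either model: the helper names `top_k_gate` and `moe_output` and the `pass_through_unrouted` flag are hypothetical, and real implementations work on batched tensors rather than Python lists.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_gate(router_logits, k):
    """Hypothetical helper: map the k most probable experts to their gate probability."""
    probs = softmax(router_logits)
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return {i: probs[i] for i in ranked[:k]}


def moe_output(hidden_state, expert_output, was_routed, pass_through_unrouted):
    """Hypothetical helper: output for one token.

    pass_through_unrouted=True mimics the residual-like behaviour described for
    SwitchTransformers; False mimics the masking described for NLLB-MoE.
    """
    if was_routed:
        return expert_output
    if pass_through_unrouted:
        return list(hidden_state)  # unmodified hidden state is kept
    return [0.0] * len(hidden_state)  # unrouted token is masked out


router_logits = [2.0, 0.5, 1.0, -1.0]  # made-up scores for four experts
top2 = top_k_gate(router_logits, k=2)  # NLLB-MoE style: two experts per token
top1 = top_k_gate(router_logits, k=1)  # SwitchTransformers style: one expert per token
```

With these made-up logits, `top2` selects experts 0 and 2 while `top1` selects only expert 0; an unrouted token keeps its hidden state when `pass_through_unrouted=True` and becomes zeros otherwise.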

## Generating with NLLB-MoE

2 changes: 2 additions & 0 deletions docs/source/en/model_doc/pegasus.md
@@ -21,6 +21,8 @@ rendered properly in your Markdown viewer.
<img alt="TensorFlow" src="https://img.shields.io/badge/TensorFlow-FF6F00?style=flat&logo=tensorflow&logoColor=white">
<img alt="Flax" src="https://img.shields.io/badge/Flax-29a79b.svg?style=flat&logo=
">
<img alt="FlashAttention" src="https://img.shields.io/badge/%E2%9A%A1%EF%B8%8E%20FlashAttention-eae0c8?style=flat">
<img alt="SDPA" src="https://img.shields.io/badge/SDPA-DE3412?style=flat&logo=pytorch&logoColor=white">
</div>

## Overview
1 change: 1 addition & 0 deletions docs/source/en/model_doc/pegasus_x.md
@@ -18,6 +18,7 @@ rendered properly in your Markdown viewer.

<div class="flex flex-wrap space-x-1">
<img alt="PyTorch" src="https://img.shields.io/badge/PyTorch-DE3412?style=flat&logo=pytorch&logoColor=white">
<img alt="FlashAttention" src="https://img.shields.io/badge/%E2%9A%A1%EF%B8%8E%20FlashAttention-eae0c8?style=flat">
</div>

## Overview