# `generate` 🤜 🤛 `torch.compile`
Part of the PyTorch 2024 H2 roadmap.
This issue is a tracker of the compatibility between `.generate` and `torch.compile` (intro docs by PyTorch). The goal is to enable `fullgraph=True` compilation on the main `generate` use cases.

`generate` use case not covered by this tracker? Check if it was requested below and upvote it if it was. Otherwise, add a comment. We will consider expanding the selection below on widely requested use cases 🤗
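For context, here is a minimal sketch of the per-forward compilation path that the model-level work below builds on. This is not part of the tracker's deliverables; the model id and settings are illustrative, and `cache_implementation="static"` plus compiling `model.forward` is the currently documented recipe:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-2-7b-hf"  # illustrative; any compile-compatible model below works
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16).to("cuda")

# "static" preallocates the KV cache at a fixed size, avoiding the dynamic
# shapes that would otherwise force graph breaks or recompilation.
model.generation_config.cache_implementation = "static"
model.forward = torch.compile(model.forward, mode="reduce-overhead", fullgraph=True)

inputs = tokenizer("Hello, my name is", return_tensors="pt").to("cuda")
out = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```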
## Decoding Strategies (end-to-end compilation)
- `greedy_search`/`sample` are compatible (Generate: end-to-end compilation #30788) (see the sketch after this list)
- `beam_search`/`beam_sample` are compatible, depends on the step above
- `assisted_decoding` (aka speculative decoding) is compatible, depends on the steps above
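A hedged sketch of what "end-to-end compilation" means here: wrapping `generate` itself in `torch.compile`, rather than just `model.forward`. Whether `fullgraph=True` holds for a given decoding strategy is exactly what this section tracks, so expect this to fail on strategies not yet marked compatible:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "gpt2"  # illustrative
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id).to("cuda")
model.generation_config.cache_implementation = "static"

# Compile the whole decoding loop, not just one forward pass. fullgraph=True
# will raise on any graph break inside generate.
compiled_generate = torch.compile(model.generate, fullgraph=True)

inputs = tokenizer("The meaning of life is", return_tensors="pt").to("cuda")
out = compiled_generate(**inputs, do_sample=False, max_new_tokens=16)  # greedy path
```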
## Generate Flags and Options
- all `LogitsProcessor` classes were checked for compatibility (and the appropriate exceptions are raised when not compatible) (an illustrative sketch follows this list)
- all `StoppingCriteria` classes were checked for compatibility (and the appropriate exceptions are raised when not compatible)
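To illustrate what "compatible" means for a processor, here is a sketch with two custom classes (not from the library): pure tensor ops trace cleanly under `fullgraph=True`, while Python branches on tensor values force a graph break:

```python
import torch
from transformers import LogitsProcessor

class CompileFriendlyTemperature(LogitsProcessor):
    """Pure tensor arithmetic: traceable with fullgraph=True."""
    def __init__(self, temperature: float):
        self.temperature = temperature

    def __call__(self, input_ids: torch.LongTensor, scores: torch.FloatTensor) -> torch.FloatTensor:
        return scores / self.temperature

class CompileUnfriendlyProcessor(LogitsProcessor):
    """Data-dependent Python control flow: breaks the graph."""
    def __call__(self, input_ids: torch.LongTensor, scores: torch.FloatTensor) -> torch.FloatTensor:
        if scores.max().item() > 10.0:  # .item() reads a tensor value into Python -> graph break
            scores = scores - scores.max()
        return scores
```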
## Models
Notes:
- models tagged as "important models" in our CI + popular models
- language models released starting from v4.42 should ALL support compile
Decoder-only:
- GPT-J is compatible (Cache: new Cache format in decoder-only models #31421)
- GPT2 is compatible
- Llama is compatible ([Core generation] Adds support for static KV cache #27931)
- Gemma is compatible ([gemma] Adds support for Gemma 💎 #29167)
- Llava is compatible (VLMs: major clean up 🧼 #34502 removed dynamic control flow)
- Llava-Next is compatible (LLaVA torch.compile implementation #29891)
- Mistral is compatible (Add torch.compile for Mistral #30642)
- Mixtral is compatible (Add torch compile for mixtral #30793)
- Phi is compatible (Compile compatibilty for decoder-only models #32617)
- Phi3 is compatible (Phi: static cache & compile compatibility #30688)
- BLOOM is compatible (note: this one might be tricky due to the cache format) (Compile compatibilty for decoder-only models #32617)
- Mamba is compatible (requested here) (PR: Add torch.compile Support For Mamba #31247)
- Persimmon is compatible (Compile compatibilty for decoder-only models #32617)
- Qwen2 is compatible (Compile compatibilty for decoder-only models #32617)
- Qwen2-VL is compatible (Compile compatibilty for decoder-only models #32617)
- Falcon is compatible (Compile compatibilty for decoder-only models #32617)
- GPTNeoX is compatible (Compile compatibilty for decoder-only models #32617)
- Starcoder2 is compatible (Compile compatibilty for decoder-only models #32617)
- StableLM is compatible (Compile compatibilty for decoder-only models #32617)
Encoder-decoder:
- BART is compatible (Bart: new cache format #35314)
- T5 is compatible (T5 compile compatibilty #34089)
- Whisper is compatible ([whisper] static kv cache #31166)
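The common mechanism behind the per-model PRs above is a preallocated `StaticCache` plus an explicit `cache_position`, so every decoding step sees the same tensor shapes. A hedged sketch follows; the model id is illustrative and `StaticCache` argument names have varied across library versions:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, StaticCache

model_id = "meta-llama/Llama-2-7b-hf"  # illustrative
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16).to("cuda")

inputs = tokenizer("Compile me", return_tensors="pt").to("cuda")

# Fixed-size cache: keys/values are allocated once at max_cache_len, so
# shapes never change between decoding steps. Argument names may differ
# across transformers versions.
cache = StaticCache(
    config=model.config, max_batch_size=1, max_cache_len=128,
    device=model.device, dtype=model.dtype,
)
# cache_position tells the model which slots of the static cache to write.
cache_position = torch.arange(inputs.input_ids.shape[1], device="cuda")
with torch.no_grad():
    out = model(**inputs, past_key_values=cache, cache_position=cache_position)
next_token = out.logits[:, -1].argmax(dim=-1)
```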
## Quantization
- BNB support (a sketch follows this list)
- GPTQ support
- AWQ support
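For the BNB item, a hedged sketch of what "support" would mean in practice: load a 4-bit quantized model and compile its forward pass. Whether this runs without graph breaks is precisely what these items track, so failures are expected until they are done:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.float16)
model = AutoModelForCausalLM.from_pretrained(
    "gpt2",  # illustrative
    quantization_config=bnb_config,
    device_map="auto",
)
model.generation_config.cache_implementation = "static"
# The quantized matmul kernels may graph-break today; that is the open item.
model.forward = torch.compile(model.forward, fullgraph=True)
```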
## Others
- We have a benchmark script to quickly compare the impact of PRs (a minimal timing sketch follows this list)
- Add a section to the existing docs on the topic
- Confirm that pipelines work after compiling `generate`
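This is not the repo's benchmark script, just a minimal hand-rolled sketch of the comparison it automates: time eager `generate` against compiled `generate`, keeping in mind that the first compiled call pays the compilation cost:

```python
import time
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "gpt2"  # illustrative
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id).to("cuda")
model.generation_config.cache_implementation = "static"
inputs = tokenizer("Benchmark prompt", return_tensors="pt").to("cuda")

def timed_generate(label: str) -> None:
    # Synchronize around the call so we measure GPU work, not just launch time.
    torch.cuda.synchronize()
    start = time.perf_counter()
    model.generate(**inputs, max_new_tokens=64, do_sample=False)
    torch.cuda.synchronize()
    print(f"{label}: {time.perf_counter() - start:.2f}s")

timed_generate("eager")
model.forward = torch.compile(model.forward, mode="reduce-overhead", fullgraph=True)
timed_generate("compiled (warmup, includes compilation)")
timed_generate("compiled (steady state)")
```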