# `generate` 🤜 🤛 `torch.compile`
Part of the PyTorch 2024 H2 roadmap.
This issue is a tracker of the compatibility between `.generate` and `torch.compile` (intro docs by PyTorch). The goal is to enable `fullgraph=True` compilation on the main `generate` use cases.

`generate` use case not covered by this tracker? Check if it was requested below and upvote it if it was. Otherwise, add a comment. We will consider expanding the selection below on widely requested use cases 🤗
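For context, here is a minimal sketch of the per-forward compilation path that the model-level work below builds on. This is not part of the tracker's deliverables; the model id and settings are illustrative, and `cache_implementation="static"` plus compiling `model.forward` is the currently documented recipe:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-2-7b-hf"  # illustrative; any compile-compatible model below works
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16).to("cuda")

# "static" preallocates the KV cache at a fixed size, avoiding the dynamic
# shapes that would otherwise force graph breaks or recompilation.
model.generation_config.cache_implementation = "static"
model.forward = torch.compile(model.forward, mode="reduce-overhead", fullgraph=True)

inputs = tokenizer("Hello, my name is", return_tensors="pt").to("cuda")
out = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```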
## Decoding Strategies (end-to-end compilation)
- `greedy_search`/`sample` are compatible (Generate: end-to-end compilation #30788) (see the sketch after this list)
- `beam_search`/`beam_sample` are compatible, depends on the step above
- `assisted_decoding` (aka speculative decoding) is compatible, depends on the steps above
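A hedged sketch of what "end-to-end compilation" means here: wrapping `generate` itself in `torch.compile`, rather than just `model.forward`. Whether `fullgraph=True` holds for a given decoding strategy is exactly what this section tracks, so expect this to fail on strategies not yet marked compatible:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "gpt2"  # illustrative
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id).to("cuda")
model.generation_config.cache_implementation = "static"

# Compile the whole decoding loop, not just one forward pass. fullgraph=True
# will raise on any graph break inside generate.
compiled_generate = torch.compile(model.generate, fullgraph=True)

inputs = tokenizer("The meaning of life is", return_tensors="pt").to("cuda")
out = compiled_generate(**inputs, do_sample=False, max_new_tokens=16)  # greedy path
```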
## Generate Flags and Options
- all `LogitsProcessor` classes were checked for compatibility (and the appropriate exceptions are raised when not compatible) (an illustrative sketch follows this list)
- all `StoppingCriteria` classes were checked for compatibility (and the appropriate exceptions are raised when not compatible)
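To illustrate what "compatible" means for a processor, here is a sketch with two custom classes (not from the library): pure tensor ops trace cleanly under `fullgraph=True`, while Python branches on tensor values force a graph break:

```python
import torch
from transformers import LogitsProcessor

class CompileFriendlyTemperature(LogitsProcessor):
    """Pure tensor arithmetic: traceable with fullgraph=True."""
    def __init__(self, temperature: float):
        self.temperature = temperature

    def __call__(self, input_ids: torch.LongTensor, scores: torch.FloatTensor) -> torch.FloatTensor:
        return scores / self.temperature

class CompileUnfriendlyProcessor(LogitsProcessor):
    """Data-dependent Python control flow: breaks the graph."""
    def __call__(self, input_ids: torch.LongTensor, scores: torch.FloatTensor) -> torch.FloatTensor:
        if scores.max().item() > 10.0:  # .item() reads a tensor value into Python -> graph break
            scores = scores - scores.max()
        return scores
```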
## Models
Notes:
- models tagged as "important models" in our CI + popular models
- language models released starting from v4.42 should ALL support compile
Decoder-only:
- GPT-J is compatible (Cache: new Cache format in decoder-only models #31421)
- GPT2 is compatible
- Llama is compatible ([Core generation] Adds support for static KV cache #27931)
- Gemma is compatible ([gemma] Adds support for Gemma 💎 #29167)
- Llava is compatible (VLMs: major clean up 🧼 #34502 removed dynamic control flow)
- Llava-Next is compatible (LLaVA torch.compile implementation #29891)
- Mistral is compatible (Add torch.compile for Mistral #30642)
- Mixtral is compatible (Add torch compile for mixtral #30793)
- Phi is compatible (Compile compatibilty for decoder-only models #32617)
- Phi3 is compatible (Phi: static cache & compile compatibility #30688)
- BLOOM is compatible (note: this one might be tricky due to the cache format) (Compile compatibilty for decoder-only models #32617)
- Mamba is compatible (requested here) (PR: Add torch.compile Support For Mamba #31247)
- Persimmon is compatible (Compile compatibilty for decoder-only models #32617)
- Qwen2 is compatible (Compile compatibilty for decoder-only models #32617)
- Qwen2-VL is compatible (Compile compatibilty for decoder-only models #32617)
- Falcon is compatible (Compile compatibilty for decoder-only models #32617)
- GPTNeoX is compatible (Compile compatibilty for decoder-only models #32617)
- Starcoder2 is compatible (Compile compatibilty for decoder-only models #32617)
- StableLM is compatible (Compile compatibilty for decoder-only models #32617)
Encoder-decoder:
- BART is compatible (Bart: new cache format #35314)
- T5 is compatible (T5 compile compatibilty #34089)
- Whisper is compatible ([whisper] static kv cache #31166)
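The common mechanism behind the per-model PRs above is a preallocated `StaticCache` plus an explicit `cache_position`, so every decoding step sees the same tensor shapes. A hedged sketch follows; the model id is illustrative and `StaticCache` argument names have varied across library versions:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, StaticCache

model_id = "meta-llama/Llama-2-7b-hf"  # illustrative
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16).to("cuda")

inputs = tokenizer("Compile me", return_tensors="pt").to("cuda")

# Fixed-size cache: keys/values are allocated once at max_cache_len, so
# shapes never change between decoding steps. Argument names may differ
# across transformers versions.
cache = StaticCache(
    config=model.config, max_batch_size=1, max_cache_len=128,
    device=model.device, dtype=model.dtype,
)
# cache_position tells the model which slots of the static cache to write.
cache_position = torch.arange(inputs.input_ids.shape[1], device="cuda")
with torch.no_grad():
    out = model(**inputs, past_key_values=cache, cache_position=cache_position)
next_token = out.logits[:, -1].argmax(dim=-1)
```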
## Quantization
- BNB support (a sketch follows this list)
- GPTQ support
- AWQ support
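For the BNB item, a hedged sketch of what "support" would mean in practice: load a 4-bit quantized model and compile its forward pass. Whether this runs without graph breaks is precisely what these items track, so failures are expected until they are done:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.float16)
model = AutoModelForCausalLM.from_pretrained(
    "gpt2",  # illustrative
    quantization_config=bnb_config,
    device_map="auto",
)
model.generation_config.cache_implementation = "static"
# The quantized matmul kernels may graph-break today; that is the open item.
model.forward = torch.compile(model.forward, fullgraph=True)
```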
## Others
- We have a benchmark script to quickly compare the impact of PRs (a minimal timing sketch follows this list)
- Add a section to the existing docs on the topic
- Confirm that pipelines work after compiling `generate`
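This is not the repo's benchmark script, just a minimal hand-rolled sketch of the comparison it automates: time eager `generate` against compiled `generate`, keeping in mind that the first compiled call pays the compilation cost:

```python
import time
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "gpt2"  # illustrative
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id).to("cuda")
model.generation_config.cache_implementation = "static"
inputs = tokenizer("Benchmark prompt", return_tensors="pt").to("cuda")

def timed_generate(label: str) -> None:
    # Synchronize around the call so we measure GPU work, not just launch time.
    torch.cuda.synchronize()
    start = time.perf_counter()
    model.generate(**inputs, max_new_tokens=64, do_sample=False)
    torch.cuda.synchronize()
    print(f"{label}: {time.perf_counter() - start:.2f}s")

timed_generate("eager")
model.forward = torch.compile(model.forward, mode="reduce-overhead", fullgraph=True)
timed_generate("compiled (warmup, includes compilation)")
timed_generate("compiled (steady state)")
```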