
tracker: generate compatibility with torch.compile #28981

@gante

Description

generate 🤜 🤛 torch.compile

Part of the PyTorch 2024 H2 roadmap.

This issue tracks the compatibility between .generate and torch.compile (intro docs by PyTorch). The goal is to enable fullgraph=True compilation of the main generate use cases.
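For concreteness, here is a minimal sketch of the workflow this tracker aims to support: compile the model's forward pass with fullgraph=True and let generate drive the compiled decoding step with a static KV cache. The model name and generation settings below are illustrative only, not part of the tracker.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-2-7b-hf"  # illustrative: any decoder-only LM with static cache support
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16, device_map="auto")

# Pre-allocate the KV cache so its shape stays fixed across decoding steps,
# which is what makes fullgraph=True compilation of the decoding step possible.
model.generation_config.cache_implementation = "static"

# Compile only the forward pass; generate() then calls the compiled step in its loop.
model.forward = torch.compile(model.forward, mode="reduce-overhead", fullgraph=True)

inputs = tokenizer("The theory of special relativity states", return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=32, do_sample=False)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```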

⚠️ Is your generate use case not covered by this tracker? Check whether it was requested below and upvote it if it was; otherwise, add a comment. We will consider expanding the selection below with widely requested use cases 🤗

Decoding Strategies (end-to-end compilation)

  • greedy_search / sample are compatible (Generate: end-to-end compilation #30788); see the sketch after this list
  • beam_search / beam_sample are compatible (depends on the step above)
  • assisted_decoding (aka speculative decoding) is compatible (depends on the steps above)
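As a rough illustration of what "end-to-end compilation" means here (compiling generate itself rather than only the forward pass), the sketch below compiles the whole decoding loop. It assumes the greedy/sample support tracked in #30788 and a static cache; the exact requirements may differ across transformers versions.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-2-7b-hf"  # illustrative
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16).to("cuda")

# fullgraph=True turns any graph break inside the decoding loop into a hard error,
# which is exactly the property this section of the tracker is working towards.
compiled_generate = torch.compile(model.generate, fullgraph=True, mode="reduce-overhead")

inputs = tokenizer("Compiling generate end to end", return_tensors="pt").to("cuda")
output = compiled_generate(**inputs, max_new_tokens=32, do_sample=False, cache_implementation="static")
print(tokenizer.decode(output[0], skip_special_tokens=True))
```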

Generate Flags and Options

  • all LogitsProcessor classes were checked for compatibility (and the appropriate exceptions are raised when not compatible); see the compile-friendly sketch after this list
  • all StoppingCriteria classes were checked for compatibility (and the appropriate exceptions are raised when not compatible)
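To illustrate what "compatible" means for these classes, here is a hypothetical custom LogitsProcessor written in a compile-friendly way: it uses only tensor operations, with no .item() calls and no Python branching on tensor values, so torch.compile can trace it without graph breaks. It is an example, not one of the built-in processors audited above.

```python
import torch
from transformers import LogitsProcessor

class TemperatureOnlyProcessor(LogitsProcessor):
    """Hypothetical processor: rescales logits by a fixed temperature using pure tensor ops."""

    def __init__(self, temperature: float):
        self.temperature = temperature

    def __call__(self, input_ids: torch.LongTensor, scores: torch.FloatTensor) -> torch.FloatTensor:
        # No data-dependent control flow: the same traced graph is valid at every decoding step.
        return scores / self.temperature
```

It would be passed to generate like any other processor, e.g. via logits_processor=LogitsProcessorList([TemperatureOnlyProcessor(0.7)]).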

Models

Notes:

  1. The selection covers models tagged as "important models" in our CI, plus other popular models.
  2. Language models released from v4.42 onwards should ALL support compile.

Decoder-only:

Encoder-decoder:

Quantization

  • bitsandbytes (BNB) support; see the quantized-compile sketch after this list
  • GPTQ support
  • AWQ support
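A hedged sketch of the quantized path tracked here: load a model with bitsandbytes 4-bit quantization and compile its forward pass. Whether this runs without graph breaks depends on the bitsandbytes/transformers versions, so treat it as the target workflow rather than a guarantee. The model name is illustrative.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "meta-llama/Llama-2-7b-hf"  # illustrative
quant_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.float16)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, quantization_config=quant_config, device_map="auto")

# Same recipe as the unquantized case: static cache + compiled forward pass.
model.generation_config.cache_implementation = "static"
model.forward = torch.compile(model.forward, mode="reduce-overhead", fullgraph=True)

inputs = tokenizer("Quantized and compiled:", return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=16)
```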

Others

  • We have a benchmark script to quickly compare the impact of PRs (an illustrative timing sketch follows this list)
  • Add section to existing docs on the topic
  • Confirm that pipelines work after compiling generate
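This is not the benchmark script referenced above, just a minimal timing sketch one could use to eyeball the impact of a PR. It assumes model, tokenizer and inputs are set up as in the earlier snippets, and it excludes the first call so compilation time is not counted.

```python
import time
import torch

def time_generate(model, inputs, n_iters: int = 5, **gen_kwargs) -> float:
    """Mean latency (seconds) of model.generate over n_iters runs, after one warmup call."""
    model.generate(**inputs, **gen_kwargs)  # warmup: pays the compilation cost
    if torch.cuda.is_available():
        torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(n_iters):
        model.generate(**inputs, **gen_kwargs)
    if torch.cuda.is_available():
        torch.cuda.synchronize()
    return (time.perf_counter() - start) / n_iters

# Example comparison (eager_model / compiled_model are placeholders):
# eager = time_generate(eager_model, inputs, max_new_tokens=64, do_sample=False)
# compiled = time_generate(compiled_model, inputs, max_new_tokens=64, do_sample=False)
# print(f"eager: {eager:.3f}s/call  compiled: {compiled:.3f}s/call")
```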
