
What's going on with T5 x torch.compile? #33221

@kanpuriyanawab

Description


System Info

Hi Team,
First of all huge thanks for all the great work you are doing.

Recently, I was benchmarking inference for the T5 model on AWS EC2 (a G6e instance with an L40 GPU) with batch sizes of 1, 2, and 4.

I have heard a lot about torch.compile and wanted to try it out to see if it reduces inference time. Surprisingly, it had the opposite effect: on average, inference time increased by ~1 second across a sample of 50 inputs, each between 2,200 and 3,000 characters long (about 2,550 characters on average).

I had a chat with a friend about this, who told me that T5 is not yet a very suitable architecture for compilation and that it hits lots of graph breaks. On his advice, I decided to open an issue here.
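For what it's worth, here is a rough sketch of how I understand graph breaks can be inspected with torch._dynamo.explain (the checkpoint and inputs below are placeholders, and the exact API may differ slightly between PyTorch versions):

```python
import torch
import torch._dynamo as dynamo
from transformers import AutoTokenizer, T5ForConditionalGeneration

# Placeholder checkpoint and input; any T5 checkpoint should show the same behavior.
tokenizer = AutoTokenizer.from_pretrained("t5-base")
model = T5ForConditionalGeneration.from_pretrained("t5-base")
model.eval()

inputs = tokenizer("translate English to German: Hello world", return_tensors="pt")
decoder_input_ids = torch.tensor([[model.config.decoder_start_token_id]])

# Trace a single forward pass and report graph count, graph breaks, and break reasons.
explanation = dynamo.explain(model)(
    input_ids=inputs["input_ids"],
    attention_mask=inputs["attention_mask"],
    decoder_input_ids=decoder_input_ids,
)
print(explanation)
```

Running with the environment variable TORCH_LOGS="graph_breaks" should give similar information during a normal compiled run.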

In my experience, T5 is still a very strong model, and I would like to see it work seamlessly with torch.compile. If the opportunity arises, I am happy to put in my own time and contribute. Let me know what you think.

Who can help?

No response

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

AWS EC2 (a G6e instance with an L40 GPU), batch sizes of 1, 2, and 4.
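For completeness, here is a minimal sketch of the kind of benchmark I ran; the checkpoint, input text, and generation settings below are placeholders rather than my exact setup:

```python
import torch
from transformers import AutoTokenizer, T5ForConditionalGeneration

device = "cuda"

# Placeholder checkpoint; my actual benchmark used my own data and config,
# but the overall flow looked like this.
tokenizer = AutoTokenizer.from_pretrained("t5-base")
model = T5ForConditionalGeneration.from_pretrained("t5-base", torch_dtype=torch.float16).to(device)
model.eval()

# Compile the whole model (default mode).
model = torch.compile(model)

# A single input of roughly the same length as my samples (~2,500 characters).
text = "summarize: " + "the quick brown fox jumps over the lazy dog. " * 55
inputs = tokenizer(text, return_tensors="pt").to(device)

# Warm-up call so compilation time is excluded from the measurement.
with torch.no_grad():
    model.generate(**inputs, max_new_tokens=128)

# Timed run using CUDA events.
start = torch.cuda.Event(enable_timing=True)
end = torch.cuda.Event(enable_timing=True)
torch.cuda.synchronize()
start.record()
with torch.no_grad():
    model.generate(**inputs, max_new_tokens=128)
end.record()
torch.cuda.synchronize()
print(f"Inference time: {start.elapsed_time(end):.1f} ms")
```

Comparing this against the same script without the torch.compile line is how I measured the ~1 second slowdown.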

Expected behavior

Inference time should decrease after compilation.


Labels

Compilation, WIP, bug
