System Info
Hi Team,
First of all, huge thanks for all the great work you are doing.
Recently, I was benchmarking inference for the T5 model on AWS EC2 (a G6E machine with an L40 GPU) for batch sizes of 1, 2, and 4.
I have heard a lot about torch.compile and wanted to try it out to see whether it reduces inference time. Surprisingly, it had the opposite effect: on average, I saw an increase of ~1 second in inference time over a sample of 50 inputs, each between 2200 and 3000 characters long (around 2550 characters on average).
I discussed this with a friend, who told me that T5 is not yet a very suitable architecture for compilation and that it hits a lot of graph breaks. On his advice, I decided to open an issue here.
In my experience, T5 is still a very good model, and I would like to see it work seamlessly with torch.compile. If given the chance, I am happy to put in my own time and contribute to the cause. Let me know what you think.
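For reference, a rough sketch of how the graph breaks could be inspected with `torch._dynamo.explain` (this is not the exact code I ran; the model name `t5-base` and the input text are placeholders):

```python
import torch
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-base")
model = T5ForConditionalGeneration.from_pretrained("t5-base").eval().cuda()

inputs = tokenizer(
    "translate English to German: Hello world", return_tensors="pt"
).to("cuda")
decoder_input_ids = torch.tensor(
    [[model.config.decoder_start_token_id]], device="cuda"
)

# torch._dynamo.explain traces the forward pass and reports where Dynamo
# falls back to eager execution (i.e. where the graph breaks happen).
explanation = torch._dynamo.explain(model)(
    input_ids=inputs.input_ids,
    attention_mask=inputs.attention_mask,
    decoder_input_ids=decoder_input_ids,
)
print(f"{explanation.graph_break_count} graph breaks")
for reason in explanation.break_reasons:
    print(reason)
```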
Who can help?
No response
Information
- The official example scripts
- My own modified scripts
Tasks
- An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
- My own task or dataset (give details below)
Reproduction
Run T5 inference on AWS EC2 (G6E machine with an L40 GPU) with batch sizes of 1, 2, and 4, with and without torch.compile, and compare latencies.
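A minimal benchmark sketch along these lines (not the exact script I used; the model name, placeholder text, and generation settings are assumptions):

```python
import time
import torch
from transformers import T5ForConditionalGeneration, T5Tokenizer

device = "cuda"
tokenizer = T5Tokenizer.from_pretrained("t5-base")
model = T5ForConditionalGeneration.from_pretrained("t5-base").eval().to(device)

# Batch of 4 inputs, roughly in the 2200-3000 character range (placeholder text)
texts = ["summarize: " + "some long input text " * 130] * 4
inputs = tokenizer(texts, return_tensors="pt", padding=True, truncation=True).to(device)

@torch.no_grad()
def bench(m, n_iters=10):
    # Warmup runs (these also trigger compilation for the compiled model)
    for _ in range(3):
        m.generate(**inputs, max_new_tokens=64)
    torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(n_iters):
        m.generate(**inputs, max_new_tokens=64)
    torch.cuda.synchronize()
    return (time.perf_counter() - start) / n_iters

eager_time = bench(model)

# One common way to apply torch.compile to generation: compile only the
# forward pass, while generate() itself stays in Python
model.forward = torch.compile(model.forward)
compiled_time = bench(model)

print(f"eager: {eager_time:.3f}s  compiled: {compiled_time:.3f}s per batch")
```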
Expected behavior
Inference time should decrease after compilation, not increase.