Hi Team,
Thanks a lot for this.
A few questions:
- Is the speedup only for GPU, or is CPU inference also boosted?
- Could an inference example with T5/BART summarization from Hugging Face be provided in a Colab notebook or similar, e.g. starting from a plain pipeline like the sketch below? That would make it easier to adopt.
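For reference, I mean a baseline along these lines (the checkpoint name is just an illustration, assuming the standard `transformers` summarization pipeline), with your speedup then plugged in on top:

```python
# Baseline Hugging Face summarization, before any speedup is applied.
# "facebook/bart-large-cnn" is only an illustrative checkpoint; T5 works the same way.
from transformers import pipeline

summarizer = pipeline("summarization", model="facebook/bart-large-cnn")

text = (
    "The tower is 324 metres tall, about the same height as an 81-storey "
    "building, and was the tallest man-made structure in the world for 41 years."
)

# do_sample=False keeps the output deterministic, which makes before/after comparisons easier.
summary = summarizer(text, max_length=60, min_length=20, do_sample=False)
print(summary[0]["summary_text"])
```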
Sorry if it is a bit of a stretch to request this. Appreciate you reading this.