How to use TGI with a fine-tuned LoRA adapter of Llama-3.3-70B trained with Unsloth? #3278

InderjeetVishnoi started this conversation in General
Hi,
I've fine-tuned a LoRA adapter on Llama-3.3-70B with Unsloth for my task and am exploring TGI for optimized inference. From what I understand, TGI doesn't natively support loading adapters directly for inference; is there a workaround for this?
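I did come across TGI's multi-LoRA serving (the `--lora-adapters` launcher flag / `LORA_ADAPTERS` env var, available in recent releases, v2.1.1 or later I believe), but I'm not sure whether it covers a 70B Unsloth-trained adapter. For concreteness, here's a minimal sketch of what I've been attempting; the endpoint, model ID, and adapter ID below are placeholders for my setup:

```python
# Minimal sketch, assuming TGI was launched with multi-LoRA enabled, e.g.:
#   text-generation-launcher --model-id meta-llama/Llama-3.3-70B-Instruct \
#       --lora-adapters myorg/my-llama33-lora
# (or LORA_ADAPTERS=myorg/my-llama33-lora with the Docker image)
import requests

resp = requests.post(
    "http://localhost:8080/generate",  # placeholder TGI endpoint
    json={
        "inputs": "Summarize the following ticket: ...",
        "parameters": {
            "max_new_tokens": 128,
            # adapter_id selects which loaded LoRA to apply for this request
            "adapter_id": "myorg/my-llama33-lora",
        },
    },
    timeout=120,
)
print(resp.json()["generated_text"])
```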
Do I need to merge the adapter with the base model before serving it via TGI? I’ve been trying to avoid that due to compute constraints.
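If merging turns out to be the only option, this is the PEFT-based fallback I'd sketch (model ID and adapter path are placeholders; I believe Unsloth also ships a merged-save helper, `save_pretrained_merged`, for adapters trained on a quantized base):

```python
# Fallback sketch: fold the LoRA deltas into the base weights with PEFT.
# BASE and ADAPTER are placeholders for my actual checkpoints.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

BASE = "meta-llama/Llama-3.3-70B-Instruct"  # base the adapter was trained on
ADAPTER = "path/to/my-lora-adapter"         # local adapter directory

# device_map="cpu" avoids needing GPU VRAM for the merge, but the full
# 70B weights still need on the order of 140 GB of system RAM in bf16.
base = AutoModelForCausalLM.from_pretrained(
    BASE, torch_dtype=torch.bfloat16, device_map="cpu"
)
model = PeftModel.from_pretrained(base, ADAPTER)

# merge_and_unload() bakes the adapter into the weights and returns a
# plain transformers model that TGI can serve directly.
merged = model.merge_and_unload()
merged.save_pretrained("llama-3.3-70b-merged")
AutoTokenizer.from_pretrained(BASE).save_pretrained("llama-3.3-70b-merged")
```

Even so, I'd rather avoid this path given the disk and RAM it needs, hence the question.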
Any guidance or suggestions would be greatly appreciated.