How to use TGI with a fine-tuned LoRA adapter of Llama-3.3-70B trained with Unsloth? #3278

InderjeetVishnoi started this conversation in General
Hi,
I've fine-tuned a LoRA adapter on Llama-3.3-70B with Unsloth for my task and am exploring TGI for optimized inference. From what I understand, TGI doesn't natively support loading adapters directly for inference; is there a workaround for this?
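I did come across TGI's multi-LoRA serving (the `--lora-adapters` launcher flag / `LORA_ADAPTERS` env var, available in recent releases, v2.1.1 or later I believe), but I'm not sure whether it covers a 70B Unsloth-trained adapter. For concreteness, here's a minimal sketch of what I've been attempting; the endpoint, model ID, and adapter ID below are placeholders for my setup:

```python
# Minimal sketch, assuming TGI was launched with multi-LoRA enabled, e.g.:
#   text-generation-launcher --model-id meta-llama/Llama-3.3-70B-Instruct \
#       --lora-adapters myorg/my-llama33-lora
# (or LORA_ADAPTERS=myorg/my-llama33-lora with the Docker image)
import requests

resp = requests.post(
    "http://localhost:8080/generate",  # placeholder TGI endpoint
    json={
        "inputs": "Summarize the following ticket: ...",
        "parameters": {
            "max_new_tokens": 128,
            # adapter_id selects which loaded LoRA to apply for this request
            "adapter_id": "myorg/my-llama33-lora",
        },
    },
    timeout=120,
)
print(resp.json()["generated_text"])
```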
Do I need to merge the adapter with the base model before serving it via TGI? I’ve been trying to avoid that due to compute constraints.
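If merging turns out to be the only option, this is the PEFT-based fallback I'd sketch (model ID and adapter path are placeholders; I believe Unsloth also ships a merged-save helper, `save_pretrained_merged`, for adapters trained on a quantized base):

```python
# Fallback sketch: fold the LoRA deltas into the base weights with PEFT.
# BASE and ADAPTER are placeholders for my actual checkpoints.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

BASE = "meta-llama/Llama-3.3-70B-Instruct"  # base the adapter was trained on
ADAPTER = "path/to/my-lora-adapter"         # local adapter directory

# device_map="cpu" avoids needing GPU VRAM for the merge, but the full
# 70B weights still need on the order of 140 GB of system RAM in bf16.
base = AutoModelForCausalLM.from_pretrained(
    BASE, torch_dtype=torch.bfloat16, device_map="cpu"
)
model = PeftModel.from_pretrained(base, ADAPTER)

# merge_and_unload() bakes the adapter into the weights and returns a
# plain transformers model that TGI can serve directly.
merged = model.merge_and_unload()
merged.save_pretrained("llama-3.3-70b-merged")
AutoTokenizer.from_pretrained(BASE).save_pretrained("llama-3.3-70b-merged")
```

Even so, I'd rather avoid this path given the disk and RAM it needs, hence the question.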
Any guidance or suggestions would be greatly appreciated.