Fix support for hardware accelerated embedding generation via ollama #2008
Conversation
After getting this set up locally with both ollama and llama.cpp, I get the same error regardless of where the model is running:
All this giant stack trace is really saying is that it got a 404 back from the LLM server, which the server is indeed returning:
I assume this message is coming back because embeddings have not yet been created and it's trying to query them?
I finally got this working; the 404 above was a "problem between chair and keyboard" issue where I was on the wrong branch. It did give me the opportunity to learn how to use llama.cpp and how to run models with GPU acceleration, though.
What are the relevant tickets?
Closes https://github.com/mitodl/hq/issues/6649
Description (What does it do?)
This PR re-enables support for using ollama as an embedding provider. Initial support for this was broken during a recent update of litellm (see the linked ticket). Someone posted a workaround of using the OpenAI-compatible endpoint, which seems to work.
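For reference, here is a minimal sketch of what that workaround looks like in litellm: routing the embedding call through ollama's OpenAI-compatible endpoint instead of the native ollama provider. The model name, base URL, and dummy key below are assumptions for illustration, not the exact settings used in this repo.

```python
import litellm

# Sketch of the workaround described above: instead of litellm's native
# ollama provider (broken by the litellm update), call ollama through its
# OpenAI-compatible endpoint. Model name and base URL are assumptions.
response = litellm.embedding(
    model="openai/nomic-embed-text",       # "openai/" prefix routes through the OpenAI-compatible API
    api_base="http://localhost:11434/v1",  # ollama's OpenAI-compatible endpoint
    api_key="ollama",                      # any non-empty string; ollama ignores it
    input=["example text to embed"],
)
print(len(response.data[0]["embedding"]))  # dimensionality of the returned vector
```

In practice the model name and api_base would presumably come from the project's settings or environment variables rather than being hard-coded.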
How can this be tested?