Fix support for hardware accelerated embedding generation via ollama #2008
Conversation
After getting this set up locally with both ollama and llama.cpp, I get the same error regardless of where the model is running:
All this giant stack trace is really saying is that it got a 404 back from the LLM server, which the server is indeed returning:
I assume this message is coming back because embeddings have not yet been created and it's trying to query them?
I finally got this working; the 404 above was a "problem between chair and keyboard" issue where I was on the wrong branch. It did give me the opportunity to learn how to use llama.cpp and how to run models with GPU acceleration, though.
What are the relevant tickets?
Closes https://github.com/mitodl/hq/issues/6649
Description (What does it do?)
This PR re-enables support for using ollama as an embedding provider. Initial support for this was broken during a recent update of litellm (see the linked ticket). Someone posted a workaround of using the OpenAI-compatible endpoint, which seems to work.
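For reference, here is a minimal sketch of what that workaround looks like in litellm: routing the embedding call through ollama's OpenAI-compatible endpoint instead of the native ollama provider. The model name, base URL, and dummy key below are assumptions for illustration, not the exact settings used in this repo.

```python
import litellm

# Sketch of the workaround described above: instead of litellm's native
# ollama provider (broken by the litellm update), call ollama through its
# OpenAI-compatible endpoint. Model name and base URL are assumptions.
response = litellm.embedding(
    model="openai/nomic-embed-text",       # "openai/" prefix routes through the OpenAI-compatible API
    api_base="http://localhost:11434/v1",  # ollama's OpenAI-compatible endpoint
    api_key="ollama",                      # any non-empty string; ollama ignores it
    input=["example text to embed"],
)
print(len(response.data[0]["embedding"]))  # dimensionality of the returned vector
```

In practice the model name and api_base would presumably come from the project's settings or environment variables rather than being hard-coded.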
How can this be tested?