
Fix support for hardware accelerated embedding generation via ollama #2008


Merged
merged 3 commits into main from shanbady/ollama-workaround on Feb 4, 2025

Conversation

Contributor

@shanbady shanbady commented Jan 31, 2025

What are the relevant tickets?

Closes https://github.com/mitodl/hq/issues/6649

Description (What does it do?)

This PR re-enables support for using ollama as an embedding provider. Initial support for this was broken during a recent update of litellm (see the ticket). Someone posted a workaround of using the OpenAI-compatible endpoint, which seems to work.
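
For context, the workaround amounts to routing the embedding call through ollama's OpenAI-compatible /v1 endpoint instead of litellm's native ollama provider. A minimal sketch, assuming litellm's "openai/" provider prefix; the model name, host, and key below are placeholders rather than values from this PR:

# Sketch of the OpenAI-compatible workaround (assumed values, not taken from this PR).
from litellm import embedding

response = embedding(
    model="openai/all-minilm",             # "openai/" prefix routes through the OpenAI-compatible handler
    api_base="http://localhost:11434/v1",  # ollama's OpenAI-compatible endpoint
    api_key="ollama",                      # ollama ignores the key, but litellm expects one to be set
    input=["a document to embed"],
)
print(len(response["data"][0]["embedding"]))

Presumably the docs on this branch wire the equivalent model string and API base through environment settings rather than hard-coding them like this.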

How can this be tested?

  1. Checkout this branch
  2. Follow the updated instructions in the docs on this branch

@shanbady shanbady added the Needs Review An open Pull Request that is ready for review label Jan 31, 2025
@shanbady shanbady marked this pull request as ready for review January 31, 2025 19:23
@gumaerc gumaerc self-assigned this Feb 3, 2025
Contributor

gumaerc commented Feb 4, 2025

After setting this up locally with both ollama and llama.cpp, I get the same error regardless of where the model is running:

[2025-02-04 00:21:46] WARNING 1341 [root] litellm.py:25 - [c6a22163e13a] - Model all-minilm-l6-v2-q4_k_m not found in tiktoken. defaulting to None
[2025-02-04 00:21:46] INFO 1341 [httpx] _client.py:1786 - [c6a22163e13a] - HTTP Request: POST http://192.168.1.50:11434/api/embed "HTTP/1.1 404 Not Found"

Give Feedback / Get Help: https://github.com/BerriAI/litellm/issues/new
LiteLLM.Info: If you need to debug this error, use `litellm.set_verbose=True'.


Provider List: https://docs.litellm.ai/docs/providers

Traceback (most recent call last):
  File "/opt/venv/lib/python3.12/site-packages/litellm/main.py", line 3608, in embedding
    response = ollama_embeddings_fn(  # type: ignore
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.12/site-packages/litellm/llms/ollama/completion/handler.py", line 92, in ollama_embeddings
    return asyncio.run(
           ^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/asyncio/runners.py", line 194, in run
    return runner.run(main)
           ^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/asyncio/runners.py", line 118, in run
    return self._loop.run_until_complete(task)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/asyncio/base_events.py", line 687, in run_until_complete
    return future.result()
           ^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.12/site-packages/litellm/llms/ollama/completion/handler.py", line 53, in ollama_aembeddings
    response = await litellm.module_level_aclient.post(url=url, json=data)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.12/site-packages/litellm/litellm_core_utils/logging_utils.py", line 131, in async_wrapper
    result = await func(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.12/site-packages/litellm/llms/custom_httpx/http_handler.py", line 232, in post
    raise e
  File "/opt/venv/lib/python3.12/site-packages/litellm/llms/custom_httpx/http_handler.py", line 190, in post
    response.raise_for_status()
  File "/opt/venv/lib/python3.12/site-packages/httpx/_models.py", line 763, in raise_for_status
    raise HTTPStatusError(message, request=request, response=self)
httpx.HTTPStatusError: Client error '404 Not Found' for url 'http://192.168.1.50:11434/api/embed'
For more information check: https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/404

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/src/manage.py", line 29, in <module>
    execute_from_command_line(sys.argv)
  File "/opt/venv/lib/python3.12/site-packages/django/core/management/__init__.py", line 442, in execute_from_command_line
    utility.execute()
  File "/opt/venv/lib/python3.12/site-packages/django/core/management/__init__.py", line 436, in execute
    self.fetch_command(subcommand).run_from_argv(self.argv)
  File "/opt/venv/lib/python3.12/site-packages/django/core/management/base.py", line 412, in run_from_argv
    self.execute(*args, **cmd_options)
  File "/opt/venv/lib/python3.12/site-packages/django/core/management/base.py", line 458, in execute
    output = self.handle(*args, **options)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/src/vector_search/management/commands/generate_embeddings.py", line 74, in handle
    create_qdrand_collections(force_recreate=True)
  File "/src/vector_search/utils.py", line 89, in create_qdrand_collections
    size=encoder.dim(), distance=models.Distance.COSINE
         ^^^^^^^^^^^^^
  File "/src/vector_search/encoders/base.py", line 33, in dim
    return len(self.embed("test"))
               ^^^^^^^^^^^^^^^^^^
  File "/src/vector_search/encoders/base.py", line 27, in embed
    return next(iter(self.embed_documents([text])))
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/src/vector_search/encoders/litellm.py", line 28, in embed_documents
    return [result["embedding"] for result in self.get_embedding(documents)["data"]]
                                              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/src/vector_search/encoders/litellm.py", line 32, in get_embedding
    return embedding(
           ^^^^^^^^^^
  File "/opt/venv/lib/python3.12/site-packages/litellm/utils.py", line 1100, in wrapper
    raise e
  File "/opt/venv/lib/python3.12/site-packages/litellm/utils.py", line 978, in wrapper
    result = original_function(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.12/site-packages/litellm/main.py", line 3767, in embedding
    raise exception_type(
          ^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.12/site-packages/litellm/litellm_core_utils/exception_mapping_utils.py", line 2190, in exception_type
    raise e
  File "/opt/venv/lib/python3.12/site-packages/litellm/litellm_core_utils/exception_mapping_utils.py", line 2159, in exception_type
    raise APIConnectionError(
litellm.exceptions.APIConnectionError: litellm.APIConnectionError: OllamaException - Client error '404 Not Found' for url 'http://192.168.1.50:11434/api/embed'
For more information check: https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/404

All this giant stack trace is really saying is that the request got a 404 back from the LLM server, which the server is indeed returning:

{"error":{"code":404,"message":"File Not Found","type":"not_found_error"}}

I assume that this message is coming back because embeddings have not yet been created and it's trying to query them?
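
One way to narrow down a 404 like this (host and model name assumed from the log above) is to POST to the native embed route and the OpenAI-compatible route directly and compare status codes:

# Diagnostic sketch only: reproduce the failing POST from the traceback.
# The host and model name come from the log above and may differ locally.
import httpx

base = "http://192.168.1.50:11434"
payload = {"model": "all-minilm-l6-v2-q4_k_m", "input": "test"}
for path in ("/api/embed", "/v1/embeddings"):
    r = httpx.post(f"{base}{path}", json=payload)
    print(path, r.status_code)

If both routes return 404, the model (or the server itself) isn't being served at that address, which points away from anything litellm-specific.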

Contributor

@gumaerc gumaerc left a comment


👍

I finally got this working, and the 404 above was a "problem between chair and keyboard" where I was on the wrong branch. It did give me the opportunity to learn how to use llama.cpp and how to run models with GPU acceleration, though.

@shanbady shanbady merged commit 0b336b9 into main Feb 4, 2025
11 checks passed
@shanbady shanbady deleted the shanbady/ollama-workaround branch February 4, 2025 22:37
@odlbot odlbot mentioned this pull request Feb 6, 2025