[install-help]: Embedding server issue #178


Open
hulkito-nol opened this issue Apr 29, 2025 · 12 comments
Labels
help wanted (Extra attention is needed)

Comments

@hulkito-nol

Describe the issue
Hello,
Before asking, I have read, I think, all the threads about this Nextcloud application, here and everywhere…
I use the latest version of Nextcloud AIO on a QNAP TS-464 NAS.
For the AI integration, I use the OpenAI connector application with a paid Mistral AI account.
Because my current installation of the Context Chat Backend does not seem to work ("Failed request (500): Embedding Request Error: Error: the embedding server is not responding"), I wonder: what is this embedding server? I have seen the related configuration in config.yaml, but do I need an external application (server) to use it? Is it an Ollama or LocalAI instance, or a server internal to the application itself?
Thanks in advance for your answers.

Setup Details (please complete the following information):

  • Nextcloud Version: 31.0.2
  • AppAPI Version: 5.0.2
  • Context Chat PHP Version: php8.3
  • Context Chat Backend Version: 4.2.0
  • Nextcloud deployment method: Docker AIO
  • Context Chat Backend deployment method: one-click
hulkito-nol added the help wanted (Extra attention is needed) label on Apr 29, 2025
hulkito-nol changed the title from [install-help]: <short description> to [install-help]: Embedding server issue on Apr 29, 2025
@hulkito-nol
Author

Nobody to help me? Thanks in advance.

@kyteinsky
Contributor

kyteinsky commented May 3, 2025

Hello,
Thanks for looking up the available resources well in advance. The docs will be updated very soon to document the embedding server.
We use a technique called retrieval-augmented generation (RAG), where we generate embeddings (vectors of numbers) from the text of the Nextcloud documents to help find them easily. The embedding server included inside the Docker container in the default install does this for us.

The error message shown will be improved too, but it means that the internal embedding server could not start for some reason. The embedder's log files in the Docker container can give us a clue.
Running these commands should get those files:

docker exec -it nc_app_context_chat_backend bash
tail /nc_app_context_chat_backend_data/embedding_server_*
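As an additional quick check, you can probe whether anything is listening on the embedder's port from inside the container. This is a sketch: the actual host and port come from the embedding section of config.yaml, and port 5000 here is an assumption based on the default install:

docker exec nc_app_context_chat_backend python3 -c "import socket; socket.create_connection(('localhost', 5000), timeout=2); print('embedding server is listening')"

A ConnectionRefusedError here means the server never came up, in which case the log files above are the place to look.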

@hulkito-nol
Author

Hi, thank you for your help.
The only logs I have are these:
2025-05-04T10:49:13+0000: [ERROR|utils]: original traceback: Traceback (most recent call last):
  File "/app/context_chat_backend/utils.py", line 74, in exception_wrap
    resconn.send({ 'value': fun(*args, **kwargs), 'error': None })
    ^^^^^^^^^^^^^^^^^^^^
  File "/app/context_chat_backend/chain/one_shot.py", line 65, in process_context_query
    db = vectordb_loader.load()
         ^^^^^^^^^^^^^^^^^^^^^^
  File "/app/context_chat_backend/dyn_loader.py", line 113, in load
    self.em_loader.load()
  File "/app/context_chat_backend/dyn_loader.py", line 92, in load
    raise EmbeddingException('Error: the embedding server is not responding')
context_chat_backend.types.EmbeddingException: Error: the embedding server is not responding

2025-05-04T10:49:13+0000: [ERROR|controller]: Error occurred in an embedding request: /query: Traceback (most recent call last):
  File "/usr/local/lib/python3.11/dist-packages/starlette/_exception_handler.py", line 42, in wrapped_app
    await app(scope, receive, sender)
  File "/usr/local/lib/python3.11/dist-packages/starlette/routing.py", line 73, in app
    response = await f(request)
               ^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/fastapi/routing.py", line 301, in app
    raw_response = await run_endpoint_function(
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/fastapi/routing.py", line 214, in run_endpoint_function
    return await run_in_threadpool(dependant.call, **values)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/starlette/concurrency.py", line 37, in run_in_threadpool
    return await anyio.to_thread.run_sync(func)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/anyio/to_thread.py", line 56, in run_sync
    return await get_async_backend().run_sync_in_worker_thread(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/anyio/_backends/_asyncio.py", line 2461, in run_sync_in_worker_thread
    return await future
           ^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/anyio/_backends/_asyncio.py", line 962, in run
    result = context.run(func, *args)
             ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/context_chat_backend/controller.py", line 183, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/app/context_chat_backend/controller.py", line 466, in _
    return execute_query(query)
           ^^^^^^^^^^^^^^^^^^^^
  File "/app/context_chat_backend/controller.py", line 455, in execute_query
    return exec_in_proc(target=target, args=args)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/context_chat_backend/utils.py", line 97, in exec_in_proc
    raise result['error']
context_chat_backend.types.EmbeddingException: Error: the embedding server is not responding

2025-05-04T10:49:13+0000: [ERROR|utils]: Failed request (500): Embedding Request Error: Error: the embedding server is not responding
INFO: 172.29.172.20:56048 - "POST /query HTTP/1.1" 500 Internal Server Error

The log files related to the embedding server are all empty...

@kyteinsky
Contributor

Ah, I'm sorry, the path for the logs is incorrect; /logs/ is missing there.
The correct command would be:

docker exec nc_app_context_chat_backend cat /nc_app_context_chat_backend_data/logs/embedding_server_*
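If that glob matches nothing, listing the directory first shows which log files actually exist (a sketch using the same path as above):

docker exec nc_app_context_chat_backend ls -la /nc_app_context_chat_backend_data/logs/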

@hulkito-nol
Author

Hi, as I said, I have these logs, but all are empty...

[image: screenshot of the embedding_server log files, all empty]

@kyteinsky
Contributor

kyteinsky commented May 8, 2025

Ah okay.
Empty logs could mean it did not start at all. Would you mind posting your config?

docker exec nc_app_context_chat_backend cat /nc_app_context_chat_backend_data/config.yaml

And does your system have AVX support? grep avx /proc/cpuinfo
Also, is anything running on port 5000? ss -tulpn | grep LISTEN | grep 5000
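For reference, narrowing the grep to just the flag names makes the answer obvious (a sketch; on AVX-capable CPUs this prints lines like avx and avx2, while empty output means the CPU has no AVX at all):

grep -o 'avx[^ ]*' /proc/cpuinfo | sort -u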

@hulkito-nol
Author

hulkito-nol commented May 8, 2025

I have a QNAP TS-464 with no AVX support... so grep avx /proc/cpuinfo returns nothing.
ss -tulpn | grep LISTEN | grep 5000:
tcp 0 0 :::5000 :::* LISTEN

I have tried to change the port in the config.yaml, but with no success.

@hulkito-nol
Author

My config:

# SPDX-FileCopyrightText: 2024 Nextcloud GmbH and Nextcloud contributors
# SPDX-License-Identifier: AGPL-3.0-or-later
debug: true
uvicorn_log_level: debug
disable_aaa: false
httpx_verify_ssl: false
use_colors: true
uvicorn_workers: 1
embedding_chunk_size: 2000
doc_parser_worker_limit: 10


vectordb:
  pgvector:
    # all options: https://python.langchain.com/api_reference/postgres/vectorstores/langchain_postgres.vectorstores.PGVector.html
    # 'connection' overrides the env var 'CCB_DB_URL'

embedding:
  protocol: http
  host: 192.168.1.179
  port: 6787
  workers: 1
  offload_after_mins: 15 # in minutes
  request_timeout: 1800 # in seconds
  llama:
    # all options: https://python.langchain.com/api_reference/community/embeddings/langchain_community.embeddings.llamacpp.LlamaCppEmbeddings.html
    # 'model_alias' is reserved
    # 'embedding' is always set to True
    model: multilingual-e5-large-instruct-q6_k.gguf
    n_batch: 16
    n_ctx: 8192

llm:
  nc_texttotext:

  llama:
    # all options: https://python.langchain.com/api_reference/community/llms/langchain_community.llms.llamacpp.LlamaCpp.html
    model_path: dolphin-2.2.1-mistral-7b.Q5_K_M.gguf
    n_batch: 512
    n_ctx: 8192
    max_tokens: 4096
    template: "<|im_start|> system \nYou're an AI assistant named Nextcloud Assistant, good at finding relevant context from documents to answer questions provided by the user. <|im_end|>\n<|im_start|> user\nUse the following documents as context to answer the question at the end. REMEMBER to excersice source critisicm as the documents are returned by a search provider that can return unrelated documents.\n\nSTART OF CONTEXT:>
    no_ctx_template: "<|im_start|> system \nYou're an AI assistant named Nextcloud Assistant.<|im_end|>\n<|im_start|> user\n{question}<|im_end|>\n<|im_start|> assistant\n"
    end_separator: "<|im_end|>"

  ctransformer:
    # all options: https://python.langchain.com/api_reference/community/llms/langchain_community.llms.ctransformers.CTransformers.html
    model: dolphin-2.2.1-mistral-7b.Q5_K_M.gguf
    template: "<|im_start|> system \nYou're an AI assistant named Nextcloud Assistant, good at finding relevant context from documents to answer questions provided by the user. <|im_end|>\n<|im_start|> user\nUse the following documents as context to answer the question at the end. REMEMBER to excersice source critisicm as the documents are returned by a search provider that can return unrelated documents.\n\nSTART OF CONTEXT:>
    no_ctx_template: "<|im_start|> system \nYou're an AI assistant named Nextcloud Assistant.<|im_end|>\n<|im_start|> user\n{question}<|im_end|>\n<|im_start|> assistant\n"
    end_separator: "<|im_end|>"
    config:
      context_length: 8192
      max_new_tokens: 4096
      local_files_only: True

  hugging_face:
    # all options: https://python.langchain.com/api_reference/community/llms/langchain_community.llms.huggingface_pipeline.HuggingFacePipeline.html
    model_id: gpt2
    task: text-generation
    pipeline_kwargs:
      config:
        max_length: 200
    template: ""


@kyteinsky
Contributor

I have a QNAP TS-464 with no AVX support... so grep avx /proc/cpuinfo returns nothing.

Well, that's a bummer. We don't support systems without AVX.
If this were a manual setup, you could re-compile llama-cpp-python without AVX support with CMAKE_ARGS="-DLLAMA_AVX2=OFF" pip install llama-cpp-python (untested). See https://github.com/abetlen/llama-cpp-python?tab=readme-ov-file#installation-configuration and abetlen/llama-cpp-python#284 (comment)

host: 192.168.1.179

Which IP has been used here? Can you try with host: localhost?
The embedding server is meant to be accessed primarily by the Python CCB server, so this would be enough to get it started, at least.
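For reference, the embedding section of the config you posted above would then read (a sketch; only host changes, everything else stays as you have it):

embedding:
  protocol: http
  host: localhost
  port: 6787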

@hulkito-nol
Author

hulkito-nol commented May 8, 2025

It's the IP of my host, but I have tried localhost, 127.0.0.1, 0.0.0.0... same result.

This is not a manual setup. I have installed the Context Chat Backend directly from the Nextcloud application.

@hulkito-nol
Author

hulkito-nol commented May 8, 2025

Is this to be executed in the container, or do I need to rebuild a new image?

If this were a manual setup, you could re-compile llama-cpp-python without AVX support with CMAKE_ARGS="-DLLAMA_AVX2=OFF" pip install llama-cpp-python (untested).

@kyteinsky
Contributor

Can you confirm that nothing is running on port 6787? And whether something on your system, like SELinux or AppArmor, is preventing the binding of the port?

sudo lsof -i :6787
sudo netstat -ltnup  | grep 6787

Also, the logs should show the exception message if the embedding server does not start. Would you mind upgrading context_chat_backend to 4.3.0 and posting the logs?

is it to execute in the container or do I rebuild a new image ?

To execute in the container. Unfortunately, it has to be redone after every update. We might change things in the long run to automate this for people who want to customize the build process of llama-cpp-python.
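Roughly like this (a sketch, untested as noted above; -DLLAMA_AVX=OFF is an extra assumption here because the TS-464's CPU lacks AVX entirely, and the exact CMake flag names can differ between llama-cpp-python versions):

docker exec -it nc_app_context_chat_backend bash
# then, inside the container, rebuild the wheel from source:
CMAKE_ARGS="-DLLAMA_AVX=OFF -DLLAMA_AVX2=OFF" pip install --force-reinstall --no-cache-dir llama-cpp-python

--force-reinstall and --no-cache-dir make pip rebuild llama-cpp-python from source instead of reusing a cached wheel.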
