
Conversation

aniljava
Contributor

Addresses: #1083

This allows the endpoint to accept both Accept headers: application/json and text/event-stream.

The response model for the SSE response is left as str; I don't think OpenAPI currently has a mechanism to specify a model for each event, and using the list of chunk types might conflict with generated client code.

OpenAI allows Accept: text/event-stream, but does not use it as a flag for streaming. Streaming needs to be requested explicitly as a parameter in the POST body.
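To illustrate the behavior described above, here is a minimal sketch (not the actual llama-cpp-python server code; the handler and chunk contents are hypothetical) of an endpoint that accepts either Accept header but lets only the explicit `stream` flag in the POST body decide between a JSON response and SSE-framed chunks:

```python
from typing import Dict, Iterator, Union


def sse_format(chunk: str) -> str:
    # Server-Sent Events framing: each event is "data: <payload>\n\n".
    return f"data: {chunk}\n\n"


def handle_completion(body: Dict) -> Union[Dict, Iterator[str]]:
    # The explicit `stream` flag in the request body (as with OpenAI's
    # API) decides streaming; the Accept header may be either
    # application/json or text/event-stream and is not consulted here.
    chunks = ["Hel", "lo"]  # stand-in for incremental model output
    if body.get("stream"):
        # Streamed: yield SSE-framed events (Content-Type: text/event-stream).
        return (sse_format(c) for c in chunks)
    # Non-streamed: return one JSON body (Content-Type: application/json).
    return {"choices": [{"text": "".join(chunks)}]}
```

The design point is that content negotiation is permissive while the streaming decision stays in the request body, so generated clients don't need per-event response models.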

@aniljava mentioned this pull request Jan 15, 2024
@thiner

thiner commented Jan 16, 2024

I tried to build this PR into a Docker image, but when I ran the container it failed to start up with the error below:

Traceback (most recent call last):
  File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/llama_cpp/server/__main__.py", line 88, in <module>
    main()
  File "/llama_cpp/server/__main__.py", line 74, in main
    app = create_app(
  File "/llama_cpp/server/app.py", line 133, in create_app
    set_llama_proxy(model_settings=model_settings)
  File "/llama_cpp/server/app.py", line 70, in set_llama_proxy
    _llama_proxy = LlamaProxy(models=model_settings)
  File "/llama_cpp/server/model.py", line 27, in __init__
    self._current_model = self.load_llama_from_model_settings(
  File "/llama_cpp/server/model.py", line 92, in load_llama_from_model_settings
    _model = llama_cpp.Llama(
  File "/llama_cpp/llama.py", line 861, in __init__
    raise ValueError(
ValueError: Attempt to split tensors that exceed maximum supported devices. Current LLAMA_MAX_DEVICES=1

I was trying to load the model TheBloke/openbuddy-mixtral-7bx8-v16.3-32k.Q5_K_M.gguf, with the env var LLAMA_MAX_DEVICES set to 2 and tensor_split: 0.5 0.5.
The same settings work fine with v0.2.28.
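For context, the ValueError in the traceback suggests a guard of roughly this shape: the number of entries in `tensor_split` must not exceed the device limit the library reports. This is a simplified sketch, not the actual code in llama.py, and it assumes LLAMA_MAX_DEVICES is fixed by how the underlying llama.cpp library was built, in which case setting it as an environment variable at run time would not raise the limit:

```python
from typing import Optional, Sequence


def check_tensor_split(
    tensor_split: Optional[Sequence[float]], llama_max_devices: int
) -> None:
    # Hypothetical version of the guard that raised the error above:
    # a two-way split (e.g. [0.5, 0.5]) needs llama_max_devices >= 2.
    if tensor_split and len(tensor_split) > llama_max_devices:
        raise ValueError(
            "Attempt to split tensors that exceed maximum supported devices. "
            f"Current LLAMA_MAX_DEVICES={llama_max_devices}"
        )
```

Under that assumption, a build with LLAMA_MAX_DEVICES=1 rejects any multi-device `tensor_split` regardless of environment variables.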

@abetlen
Owner

abetlen commented Jan 16, 2024

@aniljava thanks for catching this; it looks good to me. Hopefully it fixes the issue in #1083.

@thiner I think that's a separate problem, do you mind opening a new issue?

@abetlen abetlen merged commit cfb7da9 into abetlen:main Jan 16, 2024