
Support Accept text/event-stream in chat and completion endpoints #1088


Merged: 2 commits merged into abetlen:main on Jan 16, 2024

Conversation

aniljava (Contributor)

Addresses: #1083

This allows the endpoint to accept both Accept headers: application/json and text/event-stream.

The response model for the SSE response is left as str. I don't think OpenAPI currently has a mechanism to specify a model for individual events, and using a list of the chunk type might conflict with generated client code.

OpenAI accepts Accept: text/event-stream but does not use it as a flag for streaming; streaming must be requested explicitly via the stream parameter in the POST body.
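
For illustration, a minimal client sketch of the behavior described above (the server address and prompt are assumptions, not part of this PR): with this change the request is accepted with either Accept header, and streaming is still enabled only by the explicit stream field in the request body.

    # Hypothetical sketch against a locally running llama-cpp-python server
    # (assumes the default http://localhost:8000 address; adjust as needed).
    import requests

    resp = requests.post(
        "http://localhost:8000/v1/chat/completions",
        headers={"Accept": "text/event-stream"},  # application/json is accepted too
        json={
            "messages": [{"role": "user", "content": "Hello"}],
            "stream": True,  # streaming is controlled by this field, not by the Accept header
        },
        stream=True,
    )
    for line in resp.iter_lines():
        if line:
            print(line.decode("utf-8"))  # raw SSE lines, e.g. 'data: {...}'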

@aniljava mentioned this pull request on Jan 15, 2024
thiner commented Jan 16, 2024

I tried to build this PR into a Docker image, but when I ran the container it failed to start up with the error below:

 Traceback (most recent call last):
   File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
     return _run_code(code, main_globals, None,
   File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
     exec(code, run_globals)
   File "/llama_cpp/server/__main__.py", line 88, in <module>
     main()
   File "/llama_cpp/server/__main__.py", line 74, in main
     app = create_app(
   File "/llama_cpp/server/app.py", line 133, in create_app
     set_llama_proxy(model_settings=model_settings)
   File "/llama_cpp/server/app.py", line 70, in set_llama_proxy
     _llama_proxy = LlamaProxy(models=model_settings)
   File "/llama_cpp/server/model.py", line 27, in __init__
     self._current_model = self.load_llama_from_model_settings(
   File "/llama_cpp/server/model.py", line 92, in load_llama_from_model_settings
     _model = llama_cpp.Llama(
   File "/llama_cpp/llama.py", line 861, in __init__
     raise ValueError(
 ValueError: Attempt to split tensors that exceed maximum supported devices. Current LLAMA_MAX_DEVICES=1

I was trying to load the model TheBloke/openbuddy-mixtral-7bx8-v16.3-32k.Q5_K_M.gguf, with the environment variable LLAMA_MAX_DEVICES set to 2 and tensor_split: 0.5 0.5.
The same settings work fine with v0.2.28.
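
For context, a hedged sketch of the load call that trips this check, using the tensor split reported above (the local file path and n_gpu_layers value are assumptions):

    # Hypothetical repro: constructing the Llama object directly with the
    # reported tensor split. This raises the ValueError above when the
    # installed build only supports a single device (LLAMA_MAX_DEVICES=1).
    from llama_cpp import Llama

    llm = Llama(
        model_path="./openbuddy-mixtral-7bx8-v16.3-32k.Q5_K_M.gguf",  # assumed local path
        tensor_split=[0.5, 0.5],  # requires a build that supports at least 2 devices
        n_gpu_layers=-1,          # assumption: offload all layers to GPU
    )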

abetlen (Owner) commented Jan 16, 2024

@aniljava thanks for catching this; it looks good to me. Hopefully it fixes the issue in #1083.

@thiner I think that's separate, do you mind opening a new issue?

abetlen merged commit cfb7da9 into abetlen:main on Jan 16, 2024