Description
The server returns "500 Internal Server Error\nvector::_M_default_append" for certain models when trying to use the model's chat template with the Docker CUDA image.
Steps to Reproduce
I'm calling the server through the OpenAI Python client from a Streamlit app:
import streamlit as st
from openai import OpenAI

# Client pointed at the llama.cpp server; base URL assumed from the compose file below.
openai_client = OpenAI(base_url="http://localhost:8080/v1", api_key="key")

def api_openai(placeholder, system_prompt, user_prompt, temperature, logit_bias):
    full_response = ""
    # Stream the chat completion and render partial output into the placeholder.
    for response in openai_client.chat.completions.create(
            model=st.session_state["openai_model"],
            messages=[{"role": "system", "content": system_prompt},
                      {"role": "user", "content": user_prompt}],
            stream=True, temperature=temperature,
            frequency_penalty=1, logit_bias=logit_bias):
        full_response += (response.choices[0].delta.content or "")
        placeholder.info(full_response + "▌")
    return full_response
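If the failure is server-side, it should reproduce without Streamlit or streaming as well. A minimal sketch reusing the client above (the model name is a placeholder; llama.cpp answers with whatever model it was started with):

import openai

try:
    completion = openai_client.chat.completions.create(
        model="alphamonarch-7b",  # placeholder; the server uses its loaded model
        messages=[{"role": "user", "content": "Hello"}],
        temperature=0.7,
    )
    print(completion.choices[0].message.content)
except openai.InternalServerError as exc:
    # For the affected models, the 500 body contains "vector::_M_default_append".
    print(exc)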
Actual Behavior
"500 Internal Server Error\nvector::_M_default_append"
Environment
- Operating System: Docker
- Docker compose:

  api-server:
    container_name: api-server
    image: ghcr.io/ggerganov/llama.cpp:server-cuda
    command: >
      -m models/alphamonarch-7b.Q5_K_M.gguf
      --ctx-size 8192
      --host 0.0.0.0
      --port 8080
      --n-gpu-layers 1000
      -np 1
      -cb
      --grp-attn-n 4
      --grp-attn-w 2048
      --api-key key
      --verbose
    ports:
      - "8080:8080"

- Models that failed: alphamonarch-7b.Q5_K_M.gguf (see the compose file above)
Additional Information
Models I've tried that work:
https://huggingface.co/TheBloke/Mistral-7B-Instruct-v0.2-GGUF
https://huggingface.co/brittlewis12/NeuralDaredevil-7B-GGUF
Related Issues
I used #5593.
Proposed Solution
I think the problem could be related to the extracted chat_template. Hugging Face uses "tokenizer.apply_chat_template" without problems, but I don't know whether the llama.cpp implementation works the same way.
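For illustration, a minimal sketch of the Hugging Face side; the model id is an assumption matching the alphamonarch-7b GGUF above, and the snippet only shows the templating call:

from transformers import AutoTokenizer

# Assumed model id matching the alphamonarch-7b GGUF above; for illustration only.
tokenizer = AutoTokenizer.from_pretrained("mlabonne/AlphaMonarch-7B")

messages = [{"role": "user", "content": "Hello"}]

# apply_chat_template renders the Jinja chat template shipped in the tokenizer
# config; if llama.cpp extracts or interprets that template differently, the
# request could break server-side.
prompt = tokenizer.apply_chat_template(messages, tokenize=False,
                                       add_generation_prompt=True)
print(prompt)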