Releases: LlamaEdge/rag-api-server
LlamaEdge-RAG 0.13.5
Major change:
- Upgrade `llama-core` dependency to `0.26.7`
LlamaEdge-RAG 0.13.4
Major changes:
- Improve the streaming workflow
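Streaming is driven by the OpenAI-compatible `stream` field on `/v1/chat/completions`. A minimal request sketch, assuming a server on the port used elsewhere in these notes; the model name is illustrative:

```bash
curl --location 'http://localhost:8080/v1/chat/completions' \
  --header 'Content-Type: application/json' \
  --data '{
    "messages": [{"role": "user", "content": "What is LlamaEdge?"}],
    "model": "Llama-3.2-3B-Instruct",
    "stream": true
  }'
```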
LlamaEdge-RAG 0.13.3
Major changes:
- Update prompt type check against tool-use models in stream mode
LlamaEdge-RAG 0.13.2
Major changes:
- Support tool use of second-state/Mistral-Small-24B-Instruct-2501-GGUF (see the request sketch below)
- Upgrade `chat-prompts` to `0.21.0`
- Upgrade `llama-core` to `0.26.4`
Known issue:
- The prompt type check blocks `Mistral-Small-24B-Instruct-2501` in stream mode
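A hedged sketch of a tool-use request against `/v1/chat/completions`; the `tools` schema follows the OpenAI-compatible format, and the function definition is illustrative. Given the known issue above, streaming is left disabled:

```bash
curl --location 'http://localhost:8080/v1/chat/completions' \
  --header 'Content-Type: application/json' \
  --data '{
    "model": "Mistral-Small-24B-Instruct-2501",
    "messages": [{"role": "user", "content": "What is the weather in Paris today?"}],
    "tools": [{
      "type": "function",
      "function": {
        "name": "get_current_weather",
        "description": "Get the current weather for a city",
        "parameters": {
          "type": "object",
          "properties": {
            "city": {"type": "string", "description": "The city name"}
          },
          "required": ["city"]
        }
      }
    }],
    "stream": false
  }'
```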
LlamaEdge-RAG 0.13.1
Major changes:
- Support Mistral-Small-24B-Instruct-2501-GGUF
- Upgrade deps:
  - `llama-core v0.26.3`
  - `chat-prompts v0.20.0`
LlamaEdge-RAG 0.13.0
Major changes:
- Support keyword search
  - (NEW) Add `--kw-search-url` CLI option for specifying the URL of the keyword search server (see the invocation sketch below)
  - (BREAKING) Change the type of the response body returned by the `/v1/create/rag` endpoint from `EmbeddingResponse` to `CreateRagResponse`
- Upgrade dependencies
  - `chat-prompts v0.19.1`
  - `endpoints v0.24.1`
  - `llama-core v0.26.2`
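A minimal sketch of pointing the server at a keyword search service via the new flag; the keyword search server address and the surrounding flags are illustrative, not documented defaults:

```bash
wasmedge --dir .:. \
  --nn-preload default:GGML:AUTO:Llama-3.2-3B-Instruct-Q5_K_M.gguf \
  --nn-preload embedding:GGML:AUTO:nomic-embed-text-v1.5.f16.gguf \
  rag-api-server.wasm \
  --kw-search-url http://localhost:9069 \
  ...
```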
LlamaEdge-RAG 0.12.1
Major changes:
- (NEW) Add the `--ubatch-size` CLI option
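A hedged invocation sketch; `512` mirrors the usual `llama.cpp` default for the physical batch size, and the surrounding flags are illustrative:

```bash
wasmedge --dir .:. \
  --nn-preload default:GGML:AUTO:Llama-3.2-3B-Instruct-Q5_K_M.gguf \
  --nn-preload embedding:GGML:AUTO:nomic-embed-text-v1.5.f16.gguf \
  rag-api-server.wasm \
  --ubatch-size 512 \
  ...
```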
LlamaEdge-RAG 0.12.0
Major changes:
- (NEW) Add the `--split-mode` CLI option
- (BREAKING) Update the `--n-predict` CLI option (see the sketch after this list)
  - Update the type to `i32`
  - Update the default value to `-1`, keeping it consistent with the `--n-predict` CLI option of `llama.cpp`
- Upgrade deps:
  - `llama-core v0.26.0`
  - `chat-prompts v0.19.0`
  - `endpoints v0.24.0`
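A hedged sketch combining the two updated options; `layer` is one of the split modes used by `llama.cpp`-based runtimes (an assumption for this server), and `-1` is the new default, meaning no generation limit:

```bash
wasmedge --dir .:. \
  --nn-preload default:GGML:AUTO:Llama-3.2-3B-Instruct-Q5_K_M.gguf \
  --nn-preload embedding:GGML:AUTO:nomic-embed-text-v1.5.f16.gguf \
  rag-api-server.wasm \
  --split-mode layer \
  --n-predict -1 \
  ...
```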
LlamaEdge-RAG 0.11.2
Major changes:
- Upgrade deps:
  - `llama-core v0.25.3`
  - `chat-prompts v0.18.6`
  - `endpoints v0.23.2`
LlamaEdge-RAG 0.11.1
Major changes:
- (NEW) Support API key
  - Use the `API_KEY` environment variable to set the api-key when starting the API server, for example:

    ```bash
    export LLAMA_API_KEY=12345-6789-abcdef
    wasmedge --dir .:. --env API_KEY=$LLAMA_API_KEY \
      --nn-preload default:GGML:AUTO:Llama-3.2-3B-Instruct-Q5_K_M.gguf \
      --nn-preload embedding:GGML:AUTO:nomic-embed-text-v1.5.f16.gguf \
      rag-api-server.wasm \
      ...
    ```
  - Send each request with the corresponding api-key, for example:

    ```bash
    curl --location 'http://localhost:8080/v1/chat/completions' \
      --header 'Authorization: Bearer 12345-6789-abcdef' \
      --header 'Content-Type: application/json' \
      --data '...'
    ```
- (NEW) Add the `--context-window` CLI option for specifying the maximum number of user messages used in context retrieval. Note that if the `context_window` field appears in the chat completion request, the setting of the CLI option is ignored.

  ```
  --context-window <CONTEXT_WINDOW>
          Maximum number of user messages used in the retrieval [default: 1]
  ```
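Per the note above, a `context_window` field in the request body takes precedence over the CLI option. A minimal sketch of such a request; the field value is illustrative:

```bash
curl --location 'http://localhost:8080/v1/chat/completions' \
  --header 'Content-Type: application/json' \
  --data '{
    "messages": [{"role": "user", "content": "Summarize the uploaded document."}],
    "context_window": 2
  }'
```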