Releases: LlamaEdge/rag-api-server

LlamaEdge-RAG 0.13.5

10 Feb 15:23

Major change:

  • Upgrade llama-core dependency to 0.26.7

LlamaEdge-RAG 0.13.4

08 Feb 16:49

Major changes:

  • Improve the streaming workflow

LlamaEdge-RAG 0.13.3

06 Feb 04:29

Major changes:

  • Update prompt type check against tool-use models in stream mode

LlamaEdge-RAG 0.13.2

05 Feb 08:22

Major changes:

Known issue:

  • Prompt type check blocked Mistral-Small-24B-Instruct-2501 in stream mode

LlamaEdge-RAG 0.13.1

01 Feb 09:15

Major changes:

LlamaEdge-RAG 0.13.0

23 Jan 09:31

Major changes:

  • Support keyword search

    • (NEW) Add the --kw-search-url CLI option for specifying the URL of the keyword search server (see the sketch after this list)
    • (BREAKING) Change the type of the response body returned by the /v1/create/rag endpoint from EmbeddingResponse to CreateRagResponse
  • Upgrade dependencies

    • chat-prompts v0.19.1
    • endpoints v0.24.1
    • llama-core v0.26.2
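
  For illustration, a minimal sketch of pointing the server at a keyword search instance; the server address below is an assumption, not a documented default, and the model files reuse the API-key example further down:

      # hypothetical keyword search server address; substitute your own
      wasmedge --dir .:. \
        --nn-preload default:GGML:AUTO:Llama-3.2-3B-Instruct-Q5_K_M.gguf \
        --nn-preload embedding:GGML:AUTO:nomic-embed-text-v1.5.f16.gguf \
        rag-api-server.wasm \
        --kw-search-url http://localhost:9069 \
        ...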

LlamaEdge-RAG 0.12.1

13 Jan 06:51

Major changes:

  • (NEW) Add the --ubatch-size CLI option
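
  For illustration, a minimal sketch of passing the new option; the value 512 is illustrative, not a documented default, and the model files reuse the API-key example further down:

      wasmedge --dir .:. \
        --nn-preload default:GGML:AUTO:Llama-3.2-3B-Instruct-Q5_K_M.gguf \
        --nn-preload embedding:GGML:AUTO:nomic-embed-text-v1.5.f16.gguf \
        rag-api-server.wasm \
        --ubatch-size 512 \
        ...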

LlamaEdge-RAG 0.12.0

09 Jan 14:30

Major changes:

  • (NEW) Add the --split-mode CLI option (see the sketch after this list)
  • (BREAKING) Update the --n-predict CLI option
    • Update the type to i32
    • Update the default value to -1, keeping it consistent with the --n-predict CLI option of llama.cpp
  • Upgrade dependencies:
    • llama-core v0.26.0
    • chat-prompts v0.19.0
    • endpoints v0.24.0
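
  For illustration, a minimal sketch combining the two updated options, assuming --split-mode accepts the same values as llama.cpp (none, layer, row); the model files reuse the API-key example further down:

      wasmedge --dir .:. \
        --nn-preload default:GGML:AUTO:Llama-3.2-3B-Instruct-Q5_K_M.gguf \
        --nn-preload embedding:GGML:AUTO:nomic-embed-text-v1.5.f16.gguf \
        rag-api-server.wasm \
        --split-mode layer \
        --n-predict -1 \
        ...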

LlamaEdge-RAG 0.11.2

06 Jan 18:15

Major changes:

  • Upgrade dependencies:
    • llama-core v0.25.3
    • chat-prompts v0.18.6
    • endpoints v0.23.2

LlamaEdge-RAG 0.11.1

21 Dec 15:58

Major changes:

  • (NEW) Support API key

    • Use the API_KEY environment variable to set the API key when starting the API server, for example:
      export LLAMA_API_KEY=12345-6789-abcdef
      wasmedge --dir .:. --env API_KEY=$LLAMA_API_KEY \
        --nn-preload default:GGML:AUTO:Llama-3.2-3B-Instruct-Q5_K_M.gguf \
        --nn-preload embedding:GGML:AUTO:nomic-embed-text-v1.5.f16.gguf \
        rag-api-server.wasm \
        ...
    • Send each request with the corresponding API key, for example:
      curl --location 'http://localhost:8080/v1/chat/completions' \
      --header 'Authorization: Bearer 12345-6789-abcdef' \
      --header 'Content-Type: application/json' \
      --data '...'
  • (NEW) Add the --context-window CLI option for specifying the maximum number of user messages used in context retrieval. Note that if the context_window field appears in a chat completion request, it overrides this CLI option. See the sketch below.

    --context-window <CONTEXT_WINDOW>
              Maximum number of user messages used in the retrieval [default: 1]
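
  For illustration, a minimal sketch of overriding the CLI setting on a per-request basis via the context_window field; the question text is illustrative, and the request body otherwise follows the usual OpenAI-style chat completion shape:

      curl --location 'http://localhost:8080/v1/chat/completions' \
      --header 'Authorization: Bearer 12345-6789-abcdef' \
      --header 'Content-Type: application/json' \
      --data '{
        "messages": [
          {"role": "user", "content": "What is LlamaEdge?"}
        ],
        "context_window": 2
      }'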