@@ -359,7 +359,7 @@ $.validator.addMethod( "creditcard", function( value, element ) {
}, "Please enter a valid credit card number." );

/* NOTICE: Modified version of Castle.Components.Validator.CreditCardValidator
* Redistributed under the the Apache License 2.0 at http://www.apache.org/licenses/LICENSE-2.0
* Redistributed under the Apache License 2.0 at http://www.apache.org/licenses/LICENSE-2.0
* Valid Types: mastercard, visa, amex, dinersclub, enroute, discover, jcb, unknown, all (overrides all other settings)
*/
$.validator.addMethod( "creditcardtypes", function( value, element, param ) {
2 changes: 1 addition & 1 deletion LLama/ChatSession.cs
@@ -637,7 +637,7 @@ public record SessionState
public IHistoryTransform HistoryTransform { get; set; } = new LLamaTransforms.DefaultHistoryTransform();

/// <summary>
/// The the chat history messages for this session.
/// The chat history messages for this session.
/// </summary>
public ChatHistory.Message[] History { get; set; } = [ ];

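For context on the `History` property touched above: it stores `ChatHistory.Message` entries for the session. Below is a minimal sketch of how such messages are typically produced, assuming the `ChatHistory.AddMessage(AuthorRole, string)` and `Messages` members of the public LLamaSharp API:

```csharp
using System;
using LLama.Common;

// Build a chat history; a ChatSession captures these messages into
// SessionState.History when the session state is saved.
var history = new ChatHistory();
history.AddMessage(AuthorRole.System, "You are a helpful assistant named Bob.");
history.AddMessage(AuthorRole.User, "Hello, Bob.");

foreach (var message in history.Messages)
{
    Console.WriteLine($"{message.AuthorRole}: {message.Content}");
}
```
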
2 changes: 1 addition & 1 deletion LLama/Native/SafeLlamaModelHandle.cs
@@ -702,7 +702,7 @@ public int Count
}

/// <summary>
/// Get the the type of this vocabulary
/// Get the type of this vocabulary
/// </summary>
public LLamaVocabType Type
{
2 changes: 1 addition & 1 deletion docs/Architecture.md
@@ -6,7 +6,7 @@ The figure below shows the core framework structure of LLamaSharp.

- **Native APIs**: LLamaSharp calls the exported C APIs to load and run the model. The APIs defined in LLamaSharp specially for calling C APIs are named `Native APIs`. We have made all the native APIs public under namespace `LLama.Native`. However, it's strongly recommended not to use them unless you know what you are doing.
- **LLamaWeights**: The holder of the model weight.
- **LLamaContext**: A context which directly interact with the native library and provide some basic APIs such as tokenization and embedding. It takes use of `LLamaWeights`.
- **LLamaContext**: A context which directly interacts with the native library and provides some basic APIs such as tokenization and embedding. It takes use of `LLamaWeights`.
- **LLamaExecutors**: Executors which define the way to run the LLama model. It provides text-to-text and image-to-text APIs to make it easy to use. Currently we provide four kinds of executors: `InteractiveExecutor`, `InstructExecutor`, `StatelessExecutor` and `BatchedExecutor`.
- **ChatSession**: A wrapping for `InteractiveExecutor` and `LLamaContext`, which supports interactive tasks and saving/re-loading sessions. It also provides a flexible way to customize the text process by `IHistoryTransform`, `ITextTransform` and `ITextStreamTransform`.
- **Integrations**: Integrations with other libraries to expand the application of LLamaSharp. For example, if you want to do RAG ([Retrieval Augmented Generation](https://en.wikipedia.org/wiki/Prompt_engineering#Retrieval-augmented_generation)), kernel-memory integration is a good option for you.
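
To make the relationship between the layers listed above concrete, here is a rough end-to-end sketch. It assumes the `LLamaWeights.LoadFromFile`, `CreateContext`, `InteractiveExecutor` and `ChatSession.ChatAsync` shapes of the public LLamaSharp API, and the model path is hypothetical:

```csharp
using System;
using System.Collections.Generic;
using LLama;
using LLama.Common;

// Hypothetical GGUF model path.
var parameters = new ModelParams("models/llama-2-7b-chat.Q4_K_M.gguf") { ContextSize = 2048 };

// LLamaWeights holds the model weights; LLamaContext wraps the native context.
using var weights = LLamaWeights.LoadFromFile(parameters);
using var context = weights.CreateContext(parameters);

// The executor defines how the model is run; ChatSession wraps it for interactive chat.
var executor = new InteractiveExecutor(context);
var session = new ChatSession(executor);

await foreach (var token in session.ChatAsync(
                   new ChatHistory.Message(AuthorRole.User, "Hello, Bob."),
                   new InferenceParams { AntiPrompts = new List<string> { "User:" } }))
{
    Console.Write(token);
}
```
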
6 changes: 3 additions & 3 deletions docs/FAQ.md
@@ -29,7 +29,7 @@ Generally, there are two possible cases for this problem:

Please set anti-prompt or max-length when executing the inference.

Anti-prompt can also be called as "Stop-keyword", which decides when to stop the response generation. Under interactive mode, the maximum tokens count is always not set, which makes the LLM generates responses infinitively. Therefore, setting anti-prompt correctly helps a lot to avoid the strange behaviours. For example, the prompt file `chat-with-bob.txt` has the following content:
Anti-prompt can also be called as "Stop-keyword", which decides when to stop the response generation. Under interactive mode, the maximum tokens count is always not set, which makes the LLM generate responses infinitively. Therefore, setting anti-prompt correctly helps a lot to avoid the strange behaviours. For example, the prompt file `chat-with-bob.txt` has the following content:

```
Transcript of a dialog, where the User interacts with an Assistant named Bob. Bob is helpful, kind, honest, good at writing, and never fails to answer the User's requests immediately and with precision.
@@ -43,7 +43,7 @@ User:

Therefore, the anti-prompt should be set as "User:". If the last line of the prompt is removed, LLM will automatically generate a question (user) and a response (bob) for one time when running the chat session. Therefore, the antiprompt is suggested to be appended to the prompt when starting a chat session.

What if an extra line is appended? The string "User:" in the prompt will be followed with a char "\n". Thus when running the model, the automatic generation of a pair of question and response may appear because the anti-prompt is "User:" but the last token is "User:\n". As for whether it will appear, it's an undefined behaviour, which depends on the implementation inside the `LLamaExecutor`. Anyway, since it may leads to unexpected behaviors, it's recommended to trim your prompt or carefully keep consistent with your anti-prompt.
What if an extra line is appended? The string "User:" in the prompt will be followed with a char "\n". Thus when running the model, the automatic generation of a pair of question and response may appear because the anti-prompt is "User:" but the last token is "User:\n". As for whether it will appear, it's an undefined behaviour, which depends on the implementation inside the `LLamaExecutor`. Anyway, since it may lead to unexpected behaviors, it's recommended to trim your prompt or carefully keep consistent with your anti-prompt.
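
As a concrete illustration of the advice above, a small sketch (assuming the `InferenceParams` type with `AntiPrompts` and `MaxTokens` from the public LLamaSharp API):

```csharp
using System.Collections.Generic;
using LLama.Common;

// Stop generating as soon as the model emits "User:", and cap the response
// length so interactive mode cannot keep generating indefinitely.
var inferenceParams = new InferenceParams
{
    AntiPrompts = new List<string> { "User:" },
    MaxTokens = 256
};
```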

## How to run LLM with non-English languages

@@ -59,6 +59,6 @@ $$ len(prompt) + len(response) < len(context) $$

In this inequality, `len(response)` refers to the expected tokens for LLM to generate.
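
A tiny sketch of that check, assuming the `Tokenize` and `ContextSize` members of `LLamaContext` from the public API (the expected-response budget is a number you pick yourself):

```csharp
using LLama;

// len(prompt) + len(response) < len(context)
static bool FitsInContext(LLamaContext context, string prompt, int expectedResponseTokens)
{
    var promptTokens = context.Tokenize(prompt).Length;
    return promptTokens + expectedResponseTokens < context.ContextSize;
}
```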

## Choose models weight depending on you task
## Choose models weight depending on your task

The differences between modes may lead to much different behaviours under the same task. For example, if you're building a chat bot with non-English, a fine-tuned model specially for the language you want to use will have huge effect on the performance.
2 changes: 1 addition & 1 deletion docs/QuickStart.md
@@ -24,7 +24,7 @@ PM> Install-Package LLamaSharp

## Model preparation

There are two popular format of model file of LLM now, which are PyTorch format (.pth) and Huggingface format (.bin). LLamaSharp uses `GGUF` format file, which could be converted from these two formats. To get `GGUF` file, there are two options:
There are two popular formats of model file of LLM now, which are PyTorch format (.pth) and Huggingface format (.bin). LLamaSharp uses `GGUF` format file, which could be converted from these two formats. To get `GGUF` file, there are two options:

1. Search model name + 'gguf' in [Huggingface](https://huggingface.co), you will find lots of model files that have already been converted to GGUF format. Please take care of the publishing time of them because some old ones could only work with old version of LLamaSharp.

4 changes: 2 additions & 2 deletions docs/index.md
@@ -17,7 +17,7 @@ If you are new to LLM, here're some tips for you to help you to get start with `

## Integrations

There are integarions for the following libraries, which help to expand the application of LLamaSharp. Integrations for semantic-kernel and kernel-memory are developed in LLamaSharp repository, while others are developed in their own repositories.
There are integrations for the following libraries, which help to expand the application of LLamaSharp. Integrations for semantic-kernel and kernel-memory are developed in LLamaSharp repository, while others are developed in their own repositories.

- [semantic-kernel](https://github.com/microsoft/semantic-kernel): an SDK that integrates LLM like OpenAI, Azure OpenAI, and Hugging Face.
- [kernel-memory](https://github.com/microsoft/kernel-memory): a multi-modal AI Service specialized in the efficient indexing of datasets through custom continuous data hybrid pipelines, with support for RAG ([Retrieval Augmented Generation](https://en.wikipedia.org/wiki/Prompt_engineering#Retrieval-augmented_generation)), synthetic memory, prompt engineering, and custom semantic memory processing.
@@ -32,7 +32,7 @@ There are integarions for the following libraries, which help to expand the appl
Community effort is always one of the most important things in open-source projects. Any contribution in any way is welcomed here. For example, the following things mean a lot for LLamaSharp:

1. Open an issue when you find something wrong.
2. Open an PR if you've fixed something. Even if just correcting a typo, it also makes great sense.
2. Open a PR if you've fixed something. Even if just correcting a typo, it also makes great sense.
3. Help to optimize the documentation.
4. Write an example or blog about how to integrate LLamaSharp with your APPs.
5. Ask for a missing feature and discuss with us.
2 changes: 1 addition & 1 deletion docs/xmldocs/llama.abstractions.metadataoverride.md
@@ -15,7 +15,7 @@ Implements [IEquatable&lt;MetadataOverride&gt;](https://docs.microsoft.com/en-us

### **Key**

Get the key being overriden by this override
Get the key being overridden by this override

```csharp
public string Key { get; }
4 changes: 2 additions & 2 deletions docs/xmldocs/llama.native.nativeapi.md
@@ -340,7 +340,7 @@ Number of threads
Binary image in jpeg format

`image_bytes_length` [Int32](https://docs.microsoft.com/en-us/dotnet/api/system.int32)<br>
Bytes lenght of the image
Bytes length of the image

#### Returns

@@ -671,7 +671,7 @@ public static Span<float> llama_get_embeddings(SafeLLamaContextHandle ctx)

Apply chat template. Inspired by hf apply_chat_template() on python.
Both "model" and "custom_template" are optional, but at least one is required. "custom_template" has higher precedence than "model"
NOTE: This function does not use a jinja parser. It only support a pre-defined list of template. See more: https://github.com/ggerganov/llama.cpp/wiki/Templates-supported-by-llama_chat_apply_template
NOTE: This function does not use a jinja parser. It only supports a pre-defined list of template. See more: https://github.com/ggerganov/llama.cpp/wiki/Templates-supported-by-llama_chat_apply_template

```csharp
public static int llama_chat_apply_template(SafeLlamaModelHandle model, Char* tmpl, LLamaChatMessage* chat, IntPtr n_msg, bool add_ass, Char* buf, int length)
2 changes: 1 addition & 1 deletion docs/xmldocs/llama.sessionstate.md
@@ -75,7 +75,7 @@ public IHistoryTransform HistoryTransform { get; set; }

### **History**

The the chat history messages for this session.
The chat history messages for this session.

```csharp
public Message[] History { get; set; }