[Roadmap](https://github.com/users/ggerganov/projects/7) / [Project status](https://github.com/ggerganov/llama.cpp/discussions/3471) / [Manifesto](https://github.com/ggerganov/llama.cpp/discussions/205) / [ggml](https://github.com/ggerganov/ggml)

Inference of Meta's [LLaMA](https://arxiv.org/abs/2302.13971) model (and others) in pure C/C++

### Hot topics

## Description

The main goal of `llama.cpp` is to enable LLM inference with minimal setup and state-of-the-art performance on a wide variety of hardware - locally and in the cloud.
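
To make "minimal setup" concrete, here is a sketch of the typical flow; the model path is a placeholder for any GGUF file you have downloaded, and exact build targets may differ between releases:

```bash
# clone and build the CPU-only tools (main, quantize, server, ...)
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
make

# run inference on a local GGUF model
./main -m models/llama-2-7b.Q4_K_M.gguf -p "Building a website can be done in 10 simple steps:" -n 128
```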

- Plain C/C++ implementation without any dependencies
- Apple silicon is a first-class citizen - optimized via ARM NEON, Accelerate and Metal frameworks
- AVX, AVX2 and AVX512 support for x86 architectures
- 2-bit, 3-bit, 4-bit, 5-bit, 6-bit, and 8-bit integer quantization for faster inference and reduced memory use
- Custom CUDA kernels for running LLMs on NVIDIA GPUs (support for AMD GPUs via HIP)
- Vulkan, SYCL, and (partial) OpenCL backend support
- CPU+GPU hybrid inference to partially accelerate models larger than the total VRAM capacity (see the sketch below)
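
As a sketch of how quantization and hybrid inference fit together - assuming a GPU-enabled build (e.g. with CUDA) and placeholder model paths:

```bash
# quantize an F16 GGUF model down to 4 bits (Q4_K_M) to cut memory use
./quantize models/llama-2-7b-f16.gguf models/llama-2-7b.Q4_K_M.gguf Q4_K_M

# offload as many layers as fit in VRAM (-ngl 35 here); the rest stay on the CPU
./main -m models/llama-2-7b.Q4_K_M.gguf -ngl 35 -p "Hello"
```

The right `-ngl` value depends on the model and the GPU: raise it until VRAM is exhausted, and any layers not offloaded are computed on the CPU.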

Since its [inception](https://github.com/ggerganov/llama.cpp/issues/33#issuecomment-1465108022), the project has improved significantly thanks to many contributions. It is the main playground for developing new features for the [ggml](https://github.com/ggerganov/ggml) library.

**Supported platforms:**

- [X] macOS
- [X] Linux
- [X] Windows (via CMake)
- [X] Docker
- [X] FreeBSD

**Supported models:**

- [X] LLaMA 🦙
- [x] LLaMA 2 🦙🦙
- [X] [Mistral AI v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1)
- [x] [Mixtral MoE](https://huggingface.co/models?search=mistral-ai/Mixtral)
- [X] Falcon
- [X] [Alpaca](https://github.com/ggerganov/llama.cpp#instruction-mode-with-alpaca)
- [X] [GPT4All](https://github.com/ggerganov/llama.cpp#using-gpt4all)
- [X] [Baichuan 1 & 2](https://huggingface.co/models?search=baichuan-inc/Baichuan) + [derivations](https://huggingface.co/hiyouga/baichuan-7b-sft)
- [X] [Aquila 1 & 2](https://huggingface.co/models?search=BAAI/Aquila)
- [X] [Starcoder models](https://github.com/ggerganov/llama.cpp/pull/3187)
- [X] [Refact](https://huggingface.co/smallcloudai/Refact-1_6B-fim)
- [X] [Persimmon 8B](https://github.com/ggerganov/llama.cpp/pull/3410)
- [X] [MPT](https://github.com/ggerganov/llama.cpp/pull/3417)
- [X] [StableLM-3b-4e1t](https://github.com/ggerganov/llama.cpp/pull/3586)
- [x] [Deepseek models](https://huggingface.co/models?search=deepseek-ai/deepseek)
- [x] [Qwen models](https://huggingface.co/models?search=Qwen/Qwen)
- [x] [PLaMo-13B](https://github.com/ggerganov/llama.cpp/pull/3557)
- [x] [GPT-2](https://huggingface.co/gpt2)
- [x] [CodeShell](https://github.com/WisdomShell/codeshell)

**Multimodal models:**

- [x] [LLaVA 1.5 models](https://huggingface.co/collections/liuhaotian/llava-15-653aac15d994e992e2677a7e)
- [x] [BakLLaVA](https://huggingface.co/models?search=SkunkworksAI/Bakllava)
- [x] [Obsidian](https://huggingface.co/NousResearch/Obsidian-3B-V0.5)
- [x] [ShareGPT4V](https://huggingface.co/models?search=Lin-Chen/ShareGPT4V)
- [x] [MobileVLM 1.7B/3B models](https://huggingface.co/models?search=mobileVLM)

**UI:**

Unless otherwise noted, these projects are open source with permissive licensing:

- [iohub/collama](https://github.com/iohub/coLLaMA)
- [janhq/jan](https://github.com/janhq/jan) (AGPL)
- [nat/openplayground](https://github.com/nat/openplayground)
- [LMStudio](https://lmstudio.ai/) (proprietary)
- [LostRuins/koboldcpp](https://github.com/LostRuins/koboldcpp) (AGPL)
- [Mozilla-Ocho/llamafile](https://github.com/Mozilla-Ocho/llamafile)
- [nomic-ai/gpt4all](https://github.com/nomic-ai/gpt4all)
- [ollama/ollama](https://github.com/ollama/ollama)
- [oobabooga/text-generation-webui](https://github.com/oobabooga/text-generation-webui) (AGPL)
- [psugihara/FreeChat](https://github.com/psugihara/FreeChat)
- [ptsochantaris/emeltal](https://github.com/ptsochantaris/emeltal)
- [pythops/tenere](https://github.com/pythops/tenere) (AGPL)
- [semperai/amica](https://github.com/semperai/amica)
- [withcatai/catai](https://github.com/withcatai/catai)

---