
Commit 6c089cd

Merge remote-tracking branch 'ggerganov/master' into fix_decoding
* ggerganov/master: (40 commits)
  revert : cmake : set MSVC to use UTF-8 on source files (ggml-org#2346)
  sync : ggml
  ggml: fix ggml_graph_cpy undefined behavior (ggml/943)
  cann : fix doxy (ggml/0)
  vulkan : fix build (llama/0)
  cuda : mark BF16 CONT as unsupported
  ggml : fix cont with transposed tensors when one dimension is 1 (ggml/934)
  cmake : set MSVC to use UTF-8 on source files (ggml-org#2346)
  readme : remove invalid flag from Python example (ggml-org#2396)
  readme : fix link (ggml-org#2394)
  go : add beamsize/entropythold/maxcontext to context interface (ggml-org#2350)
  talk-llama : sync llama.cpp
  whisper : update FA call
  sync : ggml
  sync : vulkan (skip) (llama/0)
  ggml : do not crash when quantizing q4_x_x with an imatrix (llama/9192)
  metal : separate scale and mask from QKT in FA kernel (llama/9189)
  ggml : add SSM Metal kernels (llama/8546)
  metal : gemma2 flash attention support (llama/9159)
  CPU/CUDA: Gemma 2 FlashAttention support (llama/8542)
  ...
2 parents b2f5a0a + 5236f02 · commit 6c089cd


68 files changed: +4777 −2447 lines

Makefile

Lines changed: 2 additions & 1 deletion
```diff
@@ -971,7 +971,8 @@ $(LIB_WHISPER): \
 	$(CXX) $(CXXFLAGS) -shared -fPIC -o $@ $^ $(LDFLAGS)
 
 $(LIB_WHISPER_S): \
-	$(OBJ_WHISPER)
+	$(OBJ_WHISPER) \
+	$(OBJ_GGML)
 	ar rcs $(LIB_WHISPER_S) $^
 
 # common
```

README.md

Lines changed: 6 additions & 5 deletions
```diff
@@ -21,7 +21,7 @@ High-performance inference of [OpenAI's Whisper](https://github.com/openai/whisp
 - Support for CPU-only inference
 - [Efficient GPU support for NVIDIA](https://github.com/ggerganov/whisper.cpp#nvidia-gpu-support-via-cublas)
 - [OpenVINO Support](https://github.com/ggerganov/whisper.cpp#openvino-support)
-- [C-style API](https://github.com/ggerganov/whisper.cpp/blob/master/whisper.h)
+- [C-style API](https://github.com/ggerganov/whisper.cpp/blob/master/include/whisper.h)
 
 Supported platforms:
 
@@ -33,7 +33,7 @@ Supported platforms:
 - [x] [WebAssembly](examples/whisper.wasm)
 - [x] Windows ([MSVC](https://github.com/ggerganov/whisper.cpp/blob/master/.github/workflows/build.yml#L117-L144) and [MinGW](https://github.com/ggerganov/whisper.cpp/issues/168)]
 - [x] [Raspberry Pi](https://github.com/ggerganov/whisper.cpp/discussions/166)
-- [x] [docker](https://github.com/ggerganov/whisper.cpp/pkgs/container/whisper.cpp)
+- [x] [Docker](https://github.com/ggerganov/whisper.cpp/pkgs/container/whisper.cpp)
 
 The entire high-level implementation of the model is contained in [whisper.h](include/whisper.h) and [whisper.cpp](src/whisper.cpp).
 The rest of the code is part of the [`ggml`](https://github.com/ggerganov/ggml) machine learning library.
@@ -55,8 +55,8 @@ Or you can even run it straight in the browser: [talk.wasm](examples/talk.wasm)
 
 ## Implementation details
 
-- The core tensor operations are implemented in C ([ggml.h](ggml.h) / [ggml.c](ggml.c))
-- The transformer model and the high-level C-style API are implemented in C++ ([whisper.h](whisper.h) / [whisper.cpp](whisper.cpp))
+- The core tensor operations are implemented in C ([ggml.h](ggml/include/ggml.h) / [ggml.c](ggml/src/ggml.c))
+- The transformer model and the high-level C-style API are implemented in C++ ([whisper.h](include/whisper.h) / [whisper.cpp](src/whisper.cpp))
 - Sample usage is demonstrated in [main.cpp](examples/main)
 - Sample real-time audio transcription from the microphone is demonstrated in [stream.cpp](examples/stream)
 - Various other examples are available in the [examples](examples) folder
@@ -751,7 +751,7 @@ took to execute it. The results are summarized in the following Github issue:
 
 [Benchmark results](https://github.com/ggerganov/whisper.cpp/issues/89)
 
-Additionally a script to run whisper.cpp with different models and audio files is provided [bench.py](bench.py).
+Additionally a script to run whisper.cpp with different models and audio files is provided [bench.py](scripts/bench.py).
 
 You can run it with the following command, by default it will run against any standard model in the models folder.
 
@@ -798,6 +798,7 @@ For more details, see the conversion script [models/convert-pt-to-ggml.py](model
 - [stlukey/whispercpp.py](https://github.com/stlukey/whispercpp.py) (Cython)
 - [AIWintermuteAI/whispercpp](https://github.com/AIWintermuteAI/whispercpp) (Updated fork of aarnphm/whispercpp)
 - [aarnphm/whispercpp](https://github.com/aarnphm/whispercpp) (Pybind11)
+- [abdeladim-s/pywhispercpp](https://github.com/abdeladim-s/pywhispercpp) (Pybind11)
 - [x] R: [bnosac/audio.whisper](https://github.com/bnosac/audio.whisper)
 - [x] Unity: [macoron/whisper.unity](https://github.com/Macoron/whisper.unity)
```

bindings/go/Makefile

Lines changed: 1 addition & 1 deletion
```diff
@@ -14,7 +14,7 @@ GGML_METAL_PATH_RESOURCES := $(abspath ../..)
 BUILD_DIR := build
 MODELS_DIR := models
 EXAMPLES_DIR := $(wildcard examples/*)
-INCLUDE_PATH := $(abspath ../..)
+INCLUDE_PATH := $(abspath ../../include):$(abspath ../../ggml/include)
 LIBRARY_PATH := $(abspath ../..)
 
 ifeq ($(UNAME_S),Darwin)
```

bindings/go/params.go

Lines changed: 14 additions & 0 deletions
```diff
@@ -115,6 +115,18 @@ func (p *Params) SetAudioCtx(n int) {
 	p.audio_ctx = C.int(n)
 }
 
+func (p *Params) SetMaxContext(n int) {
+	p.n_max_text_ctx = C.int(n)
+}
+
+func (p *Params) SetBeamSize(n int) {
+	p.beam_search.beam_size = C.int(n)
+}
+
+func (p *Params) SetEntropyThold(t float32) {
+	p.entropy_thold = C.float(t)
+}
+
 // Set initial prompt
 func (p *Params) SetInitialPrompt(prompt string) {
 	p.initial_prompt = C.CString(prompt)
@@ -145,6 +157,8 @@ func (p *Params) String() string {
 	str += fmt.Sprintf(" duration_ms=%d", p.duration_ms)
 	str += fmt.Sprintf(" audio_ctx=%d", p.audio_ctx)
 	str += fmt.Sprintf(" initial_prompt=%s", C.GoString(p.initial_prompt))
+	str += fmt.Sprintf(" entropy_thold=%f", p.entropy_thold)
+	str += fmt.Sprintf(" beam_size=%d", p.beam_search.beam_size)
 	if p.translate {
 		str += " translate"
 	}
```

bindings/go/pkg/whisper/context.go

Lines changed: 15 additions & 0 deletions
```diff
@@ -125,6 +125,21 @@ func (context *context) SetAudioCtx(n uint) {
 	context.params.SetAudioCtx(int(n))
 }
 
+// Set maximum number of text context tokens to store
+func (context *context) SetMaxContext(n int) {
+	context.params.SetMaxContext(n)
+}
+
+// Set Beam Size
+func (context *context) SetBeamSize(n int) {
+	context.params.SetBeamSize(n)
+}
+
+// Set Entropy threshold
+func (context *context) SetEntropyThold(t float32) {
+	context.params.SetEntropyThold(t)
+}
+
 // Set initial prompt
 func (context *context) SetInitialPrompt(prompt string) {
 	context.params.SetInitialPrompt(prompt)
```

bindings/go/pkg/whisper/interface.go

Lines changed: 3 additions & 0 deletions
```diff
@@ -48,6 +48,9 @@ type Context interface {
 	SetTokenTimestamps(bool)        // Set token timestamps flag
 	SetMaxTokensPerSegment(uint)    // Set max tokens per segment (0 = no limit)
 	SetAudioCtx(uint)               // Set audio encoder context
+	SetMaxContext(n int)            // Set maximum number of text context tokens to store
+	SetBeamSize(n int)              // Set Beam Size
+	SetEntropyThold(t float32)      // Set Entropy threshold
 	SetInitialPrompt(prompt string) // Set initial prompt
 
 	// Process mono audio data and return any errors.
```
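The three new setters map straight onto fields of `whisper_full_params` in the C API. As a point of reference, here is a minimal C++ sketch of the underlying fields the Go calls write to (field names are taken from the params.go diff above; the values and the surrounding setup are illustrative only):

```cpp
#include "whisper.h"

int main() {
    // Defaults for beam-search decoding.
    whisper_full_params params =
        whisper_full_default_params(WHISPER_SAMPLING_BEAM_SEARCH);

    params.beam_search.beam_size = 5;    // Go: context.SetBeamSize(5)
    params.entropy_thold         = 2.4f; // Go: context.SetEntropyThold(2.4)
    params.n_max_text_ctx        = 64;   // Go: context.SetMaxContext(64)

    // ... create a whisper_context and pass params to whisper_full() ...
    return 0;
}
```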

bindings/go/whisper.go

Lines changed: 1 addition & 1 deletion
```diff
@@ -9,7 +9,7 @@ import (
 // CGO
 
 /*
-#cgo LDFLAGS: -lwhisper -lm -lstdc++
+#cgo LDFLAGS: -lwhisper -lm -lstdc++ -fopenmp
 #cgo darwin LDFLAGS: -framework Accelerate -framework Metal -framework Foundation -framework CoreGraphics
 #include <whisper.h>
 #include <stdlib.h>
```
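A plausible reading of the new `-fopenmp` flag, inferred from the surrounding changes rather than stated in the commit: the static libwhisper.a produced by the Makefile change above now bundles the ggml objects, and ggml's CPU backend can be built with OpenMP threading, so the OpenMP runtime symbols must be resolved when the Go binding links against the archive.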

examples/python/whisper_processor.py

Lines changed: 1 addition & 1 deletion
```diff
@@ -21,7 +21,7 @@ def process_audio(wav_file, model_name="base.en"):
     if not os.path.exists(wav_file):
         raise FileNotFoundError(f"WAV file not found: {wav_file}")
 
-    full_command = f"./main -m {model} -f {wav_file} -np -nt"
+    full_command = f"./main -m {model} -f {wav_file} -nt"
 
     # Execute the command
     process = subprocess.Popen(full_command, shell=True, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
```

examples/talk-llama/llama-impl.h

Lines changed: 21 additions & 0 deletions
```diff
@@ -24,3 +24,24 @@ void llama_log_callback_default(ggml_log_level level, const char * text, void *
 #define LLAMA_LOG_INFO(...)  llama_log_internal(GGML_LOG_LEVEL_INFO , __VA_ARGS__)
 #define LLAMA_LOG_WARN(...)  llama_log_internal(GGML_LOG_LEVEL_WARN , __VA_ARGS__)
 #define LLAMA_LOG_ERROR(...) llama_log_internal(GGML_LOG_LEVEL_ERROR, __VA_ARGS__)
+
+//
+// helpers
+//
+
+static void replace_all(std::string & s, const std::string & search, const std::string & replace) {
+    if (search.empty()) {
+        return;
+    }
+    std::string builder;
+    builder.reserve(s.length());
+    size_t pos = 0;
+    size_t last_pos = 0;
+    while ((pos = s.find(search, last_pos)) != std::string::npos) {
+        builder.append(s, last_pos, pos - last_pos);
+        builder.append(replace);
+        last_pos = pos + search.length();
+    }
+    builder.append(s, last_pos, std::string::npos);
+    s = std::move(builder);
+}
```
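The helper is a rewrite of the version removed from llama-vocab.cpp below: it appends into a single reserved buffer rather than building the result from repeated `substr` concatenations, and the empty-search guard prevents the non-terminating scan the old loop would enter when `search` was empty. A minimal standalone sketch of its behavior (the `▁` word-boundary input is illustrative, not taken from this diff):

```cpp
#include <iostream>
#include <string>

// replace_all as added to llama-impl.h above.
static void replace_all(std::string & s, const std::string & search, const std::string & replace) {
    if (search.empty()) {
        return; // an empty needle would never advance the scan position
    }
    std::string builder;
    builder.reserve(s.length()); // one up-front allocation for the common case
    size_t pos = 0;
    size_t last_pos = 0;
    while ((pos = s.find(search, last_pos)) != std::string::npos) {
        builder.append(s, last_pos, pos - last_pos); // copy the unmatched run
        builder.append(replace);                     // then the replacement
        last_pos = pos + search.length();
    }
    builder.append(s, last_pos, std::string::npos);  // trailing tail
    s = std::move(builder);
}

int main() {
    std::string s = "▁the▁quick▁fox";
    replace_all(s, "▁", " ");
    std::cout << "[" << s << "]\n"; // prints: [ the quick fox]
}
```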

examples/talk-llama/llama-sampling.cpp

Lines changed: 2 additions & 2 deletions
```diff
@@ -85,14 +85,14 @@ void llama_sample_top_k_impl(struct llama_sampling * smpl, llama_token_data_arra
     constexpr float bucket_low = -10.0f;
     constexpr float bucket_high = 10.0f;
     constexpr float bucket_scale = nbuckets/(bucket_high - bucket_low);
-    constexpr float bucker_inter = -bucket_low * bucket_scale;
+    constexpr float bucket_inter = -bucket_low * bucket_scale;
 
     std::vector<int> bucket_idx(candidates->size);
     std::vector<int> histo(nbuckets, 0);
 
     for (int i = 0; i < (int)candidates->size; ++i) {
         const float val = candidates->data[i].logit;
-        int ib = int(bucket_scale * val + bucker_inter); //nbuckets * (val - bucket_low) / (bucket_high - bucket_low);
+        int ib = int(bucket_scale * val + bucket_inter); //nbuckets * (val - bucket_low) / (bucket_high - bucket_low);
         ib = std::max(0, std::min(nbuckets-1, ib));
         bucket_idx[i] = ib;
         ++histo[ib];
```
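The change here is only a spelling fix (bucker_inter → bucket_inter), but the precomputed affine form is worth a quick sanity check. The sketch below uses a hypothetical nbuckets = 128 (the real constant is declared outside the hunk shown) and confirms that `bucket_scale * val + bucket_inter` agrees with the commented-out direct expression:

```cpp
#include <algorithm>
#include <cstdio>

int main() {
    constexpr int   nbuckets     = 128; // hypothetical; not part of this hunk
    constexpr float bucket_low   = -10.0f;
    constexpr float bucket_high  =  10.0f;
    constexpr float bucket_scale = nbuckets/(bucket_high - bucket_low); // 6.4
    constexpr float bucket_inter = -bucket_low * bucket_scale;          // 64.0

    for (float val : {-12.0f, -10.0f, 0.0f, 7.25f, 11.0f}) {
        int ib = int(bucket_scale * val + bucket_inter);
        ib = std::max(0, std::min(nbuckets - 1, ib)); // clamp as the kernel does
        const float direct = nbuckets * (val - bucket_low) / (bucket_high - bucket_low);
        std::printf("logit %+6.2f -> bucket %3d (direct form %7.2f)\n", val, ib, direct);
    }
    return 0;
}
```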

examples/talk-llama/llama-vocab.cpp

Lines changed: 22 additions & 19 deletions
```diff
@@ -16,20 +16,6 @@
 // helpers
 //
 
-static void replace_all(std::string & s, const std::string & search, const std::string & replace) {
-    std::string result;
-    for (size_t pos = 0; ; pos += search.length()) {
-        auto new_pos = s.find(search, pos);
-        if (new_pos == std::string::npos) {
-            result += s.substr(pos, s.size() - pos);
-            break;
-        }
-        result += s.substr(pos, new_pos - pos) + replace;
-        pos = new_pos;
-    }
-    s = std::move(result);
-}
-
 LLAMA_ATTRIBUTE_FORMAT(1, 2)
 static std::string format(const char * fmt, ...) {
     va_list ap;
@@ -335,6 +321,21 @@ struct llm_tokenizer_spm {
 
 // TODO: there are a lot of common parts between spm and bpe tokenizers, should be refactored and reused
 
+template<typename T, typename Container = std::vector<T>, typename Compare = std::less<typename Container::value_type>>
+class llama_priority_queue : public std::priority_queue<T, Container, Compare> {
+public:
+    using std::priority_queue<T, Container, Compare>::priority_queue;
+
+    T pop_move() {
+        T item = std::move(this->c.front());
+        std::pop_heap(this->c.begin(), this->c.end(), this->comp);
+        this->c.pop_back();
+        return item;
+    }
+
+    void pop() = delete;
+};
+
 struct llm_bigram_bpe {
     struct comparator {
         bool operator()(const llm_bigram_bpe & l, const llm_bigram_bpe & r) const {
@@ -343,7 +344,7 @@ struct llm_bigram_bpe {
     };
 
     using queue_storage = std::vector<llm_bigram_bpe>;
-    using queue = std::priority_queue<llm_bigram_bpe, queue_storage, comparator>;
+    using queue = llama_priority_queue<llm_bigram_bpe, queue_storage, comparator>;
     llm_symbol::index left;
     llm_symbol::index right;
     std::string text;
@@ -402,6 +403,7 @@ struct llm_tokenizer_bpe {
         case LLAMA_VOCAB_PRE_TYPE_COMMAND_R:
         case LLAMA_VOCAB_PRE_TYPE_SMOLLM:
         case LLAMA_VOCAB_PRE_TYPE_CODESHELL:
+        case LLAMA_VOCAB_PRE_TYPE_EXAONE:
            regex_exprs = {
                "\\p{N}",
                "'s|'t|'re|'ve|'m|'ll|'d| ?\\p{L}+| ?\\p{N}+| ?[^\\s\\p{L}\\p{N}]+|\\s+(?!\\S)",
@@ -424,6 +426,8 @@ struct llm_tokenizer_bpe {
            };
            break;
        case LLAMA_VOCAB_PRE_TYPE_PORO:
+       case LLAMA_VOCAB_PRE_TYPE_BLOOM:
+       case LLAMA_VOCAB_PRE_TYPE_GPT3_FINNISH:
            regex_exprs = {
                " ?[^(\\s|.,!?…。,、।۔،)]+",
            };
@@ -531,8 +535,7 @@ struct llm_tokenizer_bpe {
 
         // build token(s)
         while (!work_queue.empty()) {
-            auto bigram = work_queue.top();
-            work_queue.pop();
+            auto bigram = work_queue.pop_move();
 
             auto & left_symbol = symbols[bigram.left];
             auto & right_symbol = symbols[bigram.right];
@@ -1480,11 +1483,11 @@ llama_token llama_token_pad_impl(const struct llama_vocab & vocab) {
     return vocab.special_pad_id;
 }
 
-int32_t llama_add_bos_token_impl(const struct llama_vocab & vocab) {
+bool llama_add_bos_token_impl(const struct llama_vocab & vocab) {
     return vocab.tokenizer_add_bos;
 }
 
-int32_t llama_add_eos_token_impl(const struct llama_vocab & vocab) {
+bool llama_add_eos_token_impl(const struct llama_vocab & vocab) {
     return vocab.tokenizer_add_eos;
 }
```
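llama_priority_queue is a small but real win for the BPE merge loop: `std::priority_queue::top()` returns a const reference, so the old top()-then-pop() pair copied each `llm_bigram_bpe`, `std::string` member included, only to discard the original. `pop_move()` instead moves the front element out of the protected container `c` before re-heapifying, and deleting `pop()` forces callers through the moving path. A self-contained sketch of the same pattern (the class name and string payload here are illustrative stand-ins):

```cpp
#include <algorithm>
#include <iostream>
#include <queue>
#include <string>
#include <vector>

// Same pattern as llama_priority_queue above: move the top element out of the
// underlying heap storage instead of copying it.
template<typename T, typename Container = std::vector<T>, typename Compare = std::less<typename Container::value_type>>
class moving_priority_queue : public std::priority_queue<T, Container, Compare> {
public:
    using std::priority_queue<T, Container, Compare>::priority_queue;

    T pop_move() {
        T item = std::move(this->c.front());                        // steal the top
        std::pop_heap(this->c.begin(), this->c.end(), this->comp);  // restore heap order
        this->c.pop_back();                                         // drop the moved-from slot
        return item;
    }

    void pop() = delete; // force callers through pop_move()
};

int main() {
    moving_priority_queue<std::string> q;
    q.push("bigram-a");
    q.push("bigram-c");
    q.push("bigram-b");
    while (!q.empty()) {
        std::cout << q.pop_move() << "\n"; // bigram-c, bigram-b, bigram-a
    }
}
```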

examples/talk-llama/llama-vocab.h

Lines changed: 2 additions & 2 deletions
```diff
@@ -95,8 +95,8 @@ llama_token llama_token_sep_impl(const struct llama_vocab & vocab);
 llama_token llama_token_nl_impl (const struct llama_vocab & vocab);
 llama_token llama_token_pad_impl(const struct llama_vocab & vocab);
 
-int32_t llama_add_bos_token_impl(const struct llama_vocab & vocab);
-int32_t llama_add_eos_token_impl(const struct llama_vocab & vocab);
+bool llama_add_bos_token_impl(const struct llama_vocab & vocab);
+bool llama_add_eos_token_impl(const struct llama_vocab & vocab);
 
 llama_token llama_token_prefix_impl(const struct llama_vocab & vocab);
 llama_token llama_token_middle_impl(const struct llama_vocab & vocab);
```
