Commit 0c7b359
Add cvector-generator example (#7514)
* add control-vector-generator
* calc diff
* add comments
* proof-of-concept stdlib implementation — implements PCA and file writing using mostly standard libraries. The output is recognized as a functional control vector, but outputs gibberish.
* param parsing, refactor, comments — added basic command-line parameters for outfile and one each positive/negative prompt. Refactored some messy code in PCA computation and GGUF exporting. Left a bunch of comments regarding further work needed.
* example template completions — implements an example template set built from the positive/negative prompts, like the control vector Python implementation.
* add multi prompts, multi-thread for PCA
* fix mem error
* add debugs
* fix matrix transpose multiplication — you have got to be kidding me
* preliminary template/multiprompt support — model is running out of context and that ought to be fixed (segfaulting), but other than that it looks goodish
* fix zero output & param parsing, functional templating — fixed a bug where the output file had no tensor data/was all zero; fixed a bug where single-hyphen flags were not being correctly parsed; implements creation of templated prompts from input (still need to adapt based on model)
* fix square_diff matmul index range and CRLF->LF line endings — fixed a logic error where square_diff would not multiply all rows; fixed a formatting error where the provided completions.txt had CRLF line endings
* add command-line args for num threads, num completions file lines, always reload model — refactored a few things and did what the commit message says on the tin
* code aestheticization
* fix compiler warnings
* in-series multithreading for prompt embedding? — added commented-out code to attempt to start implementing multithreading for embedding in main
* remove unnecessary multithreading
* interim fix memory leak
* translated everything but PCA (I think)
* tentatively translate the rest
* fix ggml errors and make new ones — at least it compiles and runs
* fix cb_eval
* temporary commit while I move dev environments — it finally outputs a functioning control vector: "functioning" in the sense that it can be loaded and it clearly has the right idea, but makes the model incoherent
* update debug statements
* pre-tokenize so we can allocate correct memory to ctx_diffs_wrapped
* update comments
* (wip) refactor
* clean up PCA ggml implementation
* fix shape of v_diff_original
* add n_batch for pca
* working version
* remember to copy back the last_eigenvector
* fix n_completions
* bring back n_completions
* default n_pca_batch to 20
* fix macos build
* add to makefile all targets
* use ggml_format_name
* add readme
* fix .editorconfig
* use ggml_backend_tensor_copy
* attempt to fix compile problem on mac
* fix compile warn
* reuse allocr
* move param parser to common
* better error handling
* clean up a bit
* add print_usage
* shorten help msg
* beautify help msg
* escape prompt by default
* change compile target to llama-cvector-generator
* typo
* disable GPU for PCA
* code style

---------

Co-authored-by: Christian Zhou-Zheng <[email protected]>
1 parent 7b2f4a7 commit 0c7b359

File tree

12 files changed: +1522 −0 lines changed

.editorconfig

Lines changed: 3 additions & 0 deletions

```diff
@@ -26,3 +26,6 @@ indent_size = 2
 
 [examples/llama.swiftui/llama.swiftui.xcodeproj/*]
 indent_style = tab
+
+[examples/cvector-generator/*.txt]
+insert_final_newline = unset
```

Makefile

Lines changed: 5 additions & 0 deletions

```diff
@@ -38,6 +38,7 @@ BUILD_TARGETS = \
 	llama-tokenize \
 	llama-train-text-from-scratch \
 	llama-vdot \
+	llama-cvector-generator \
 	tests/test-c.o
 
 # Binaries only useful for tests
@@ -922,6 +923,10 @@ llama-eval-callback: examples/eval-callback/eval-callback.cpp ggml.o llama.o $(C
 	$(CXX) $(CXXFLAGS) -c $< -o $(call GET_OBJ_FILE, $<)
 	$(CXX) $(CXXFLAGS) $(filter-out %.h $<,$^) $(call GET_OBJ_FILE, $<) -o $@ $(LDFLAGS)
 
+llama-cvector-generator: examples/cvector-generator/cvector-generator.cpp ggml.o llama.o $(COMMON_DEPS) $(OBJS)
+	$(CXX) $(CXXFLAGS) -c $< -o $(call GET_OBJ_FILE, $<)
+	$(CXX) $(CXXFLAGS) $(filter-out %.h $<,$^) $(call GET_OBJ_FILE, $<) -o $@ $(LDFLAGS)
+
 llama-train-text-from-scratch: examples/train-text-from-scratch/train-text-from-scratch.cpp ggml.o llama.o $(COMMON_DEPS) train.o $(OBJS)
 	$(CXX) $(CXXFLAGS) -c $< -o $(call GET_OBJ_FILE, $<)
 	$(CXX) $(CXXFLAGS) $(filter-out %.h $<,$^) $(call GET_OBJ_FILE, $<) -o $@ $(LDFLAGS)
```

common/common.cpp

Lines changed: 60 additions & 0 deletions

```diff
@@ -1576,6 +1576,7 @@ bool gpt_params_find_arg(int argc, char ** argv, const std::string & arg, gpt_pa
             return true;
         }
         params.out_file = argv[i];
+        params.cvector_outfile = argv[i];
         return true;
     }
     if (arg == "-ofreq" || arg == "--output-frequency") {
@@ -1610,6 +1611,55 @@ bool gpt_params_find_arg(int argc, char ** argv, const std::string & arg, gpt_pa
         params.i_chunk = std::stoi(argv[i]);
         return true;
     }
+    // cvector params
+    if (arg == "--completions-file") {
+        if (++i >= argc) {
+            invalid_param = true;
+            return true;
+        }
+        params.cvector_completions_file = argv[i];
+        return true;
+    }
+    if (arg == "--positive-file") {
+        if (++i >= argc) {
+            invalid_param = true;
+            return true;
+        }
+        params.cvector_positive_file = argv[i];
+        return true;
+    }
+    if (arg == "--negative-file") {
+        if (++i >= argc) {
+            invalid_param = true;
+            return true;
+        }
+        params.cvector_negative_file = argv[i];
+        return true;
+    }
+    if (arg == "--completions") {
+        if (++i >= argc) {
+            invalid_param = true;
+            return true;
+        }
+        params.n_completions = std::stoi(argv[i]);
+        return true;
+    }
+    if (arg == "--pca-batch") {
+        if (++i >= argc) {
+            invalid_param = true;
+            return true;
+        }
+        params.n_pca_batch = std::stoi(argv[i]);
+        return true;
+    }
+    if (arg == "--pca-iter") {
+        if (++i >= argc) {
+            invalid_param = true;
+            return true;
+        }
+        params.n_pca_iterations = std::stoi(argv[i]);
+        return true;
+    }
 #ifndef LOG_DISABLE_LOGS
     // Parse args for logging parameters
     if (log_param_single_parse(argv[i])) {
@@ -1931,6 +1981,16 @@ void gpt_params_print_usage(int /*argc*/, char ** argv, const gpt_params & param
     options.push_back({ "logging", "       --log-append", "Don't truncate the old log file." });
 #endif // LOG_DISABLE_LOGS
 
+    options.push_back({ "cvector" });
+    options.push_back({ "cvector", "-o, --output FNAME", "output file (default: '%s')", params.cvector_outfile.c_str() });
+    options.push_back({ "cvector", "       --positive-file FNAME", "positive prompts file, one prompt per line (default: '%s')", params.cvector_positive_file.c_str() });
+    options.push_back({ "cvector", "       --negative-file FNAME", "negative prompts file, one prompt per line (default: '%s')", params.cvector_negative_file.c_str() });
+    options.push_back({ "cvector", "       --completions-file FNAME",
+                                   "completions file (default: '%s')", params.cvector_completions_file.c_str() });
+    options.push_back({ "cvector", "       --completions N", "number of lines of completions file to use (default: %d)", params.n_completions });
+    options.push_back({ "cvector", "       --batch-pca N", "batch size used for PCA. Larger batch runs faster, but uses more memory (default: %d)", params.n_pca_batch });
+    options.push_back({ "cvector", "       --iter-pca N", "number of iterations used for PCA (default: %d)", params.n_pca_iterations });
+
     printf("usage: %s [options]\n", argv[0]);
 
     for (const auto & o : options) {
```

common/common.h

Lines changed: 9 additions & 0 deletions

```diff
@@ -232,6 +232,15 @@ struct gpt_params {
 
     bool process_output = false; // collect data for the output tensor
     bool compute_ppl    = true;  // whether to compute perplexity
+
+    // cvector-generator params
+    int n_completions     = 64;
+    int n_pca_batch       = 20;
+    int n_pca_iterations  = 1000;
+    std::string cvector_outfile          = "control_vector.gguf";
+    std::string cvector_completions_file = "examples/cvector-generator/completions.txt";
+    std::string cvector_positive_file    = "examples/cvector-generator/positive.txt";
+    std::string cvector_negative_file    = "examples/cvector-generator/negative.txt";
 };
 
 void gpt_params_handle_model_default(gpt_params & params);
```

examples/CMakeLists.txt

Lines changed: 1 addition & 0 deletions

```diff
@@ -12,6 +12,7 @@ include_directories(${CMAKE_CURRENT_SOURCE_DIR})
 
 if (EMSCRIPTEN)
 else()
+    add_subdirectory(cvector-generator)
     add_subdirectory(baby-llama)
     add_subdirectory(batched-bench)
    add_subdirectory(batched)
```
examples/cvector-generator/CMakeLists.txt

Lines changed: 5 additions & 0 deletions

```diff
@@ -0,0 +1,5 @@
+set(TARGET llama-cvector-generator)
+add_executable(${TARGET} cvector-generator.cpp pca.hpp)
+install(TARGETS ${TARGET} RUNTIME)
+target_link_libraries(${TARGET} PRIVATE common llama ${CMAKE_THREAD_LIBS_INIT})
+target_compile_features(${TARGET} PRIVATE cxx_std_11)
```

examples/cvector-generator/README.md

Lines changed: 34 additions & 0 deletions

````diff
@@ -0,0 +1,34 @@
+# cvector-generator
+
+This example demonstrates how to generate a control vector using gguf models.
+
+Related PRs:
+- [Add support for control vectors](https://github.com/ggerganov/llama.cpp/pull/5970)
+- (Issue) [Generate control vector using llama.cpp](https://github.com/ggerganov/llama.cpp/issues/6880)
+- [Add cvector-generator example](https://github.com/ggerganov/llama.cpp/pull/7514)
+
+## Examples
+
+```sh
+# CPU only
+./cvector-generator -m ./dolphin-2.0-mistral-7b.Q4_K_M.gguf
+
+# With GPU
+./cvector-generator -m ./dolphin-2.0-mistral-7b.Q4_K_M.gguf -ngl 99
+
+# With advanced options
+./cvector-generator -m ./dolphin-2.0-mistral-7b.Q4_K_M.gguf -ngl 99 --completions 128 --pca-iter 2000 --batch-pca 100
+
+# To see help message
+./cvector-generator -h
+# Then, have a look at "cvector" section
+```
+
+## Tips and tricks
+
+If you have multiple lines per prompt, you can escape the newline character (change it to `\n`). For example:
+
+```
+<|im_start|>system\nAct like a person who is extremely happy.<|im_end|>
+<|im_start|>system\nYou are in a very good mood today<|im_end|>
+```
````
