ggml : add IQ2 to test-backend-ops + refactoring #4990

ggerganov · 2024-01-16T21:11:03Z

Add imatrix-based quantization tests to test-backend-ops (using a dummy imatrix of 1.0f weights)
Lazy quantization init API
Workaround for Apple linker bug (Fix MacOS Sonoma model quantization #4052)
Fix bug in CUDA mul_mat_vec_q when blocks_per_row % blocks_per_warp != 0 (out-of-bounds access)
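
To illustrate the class of bug in the last item, here is a minimal CPU-side sketch. It is not the CUDA kernel or the fix from this PR; the function names and the "lane group" model are assumptions made for illustration. When a row is walked in steps of blocks_per_warp and the loop bound only checks the base index, the final partial step reads past the end of the row whenever blocks_per_row % blocks_per_warp != 0.

```cpp
#include <cassert>

// Hypothetical CPU model, NOT the CUDA kernel from this PR.
// A warp covers blocks_per_warp blocks per iteration; each "lane group"
// handles the block at offset `group` within that window.
// Returns how many block indices fall outside [0, blocks_per_row).
static int count_oob_unguarded(int blocks_per_row, int blocks_per_warp) {
    assert(blocks_per_row > 0 && blocks_per_warp > 0);
    int oob = 0;
    for (int group = 0; group < blocks_per_warp; group++) {
        // the loop bound only checks the window's base index i ...
        for (int i = 0; i < blocks_per_row; i += blocks_per_warp) {
            const int ib = i + group; // ... but the block actually read is i + group
            if (ib >= blocks_per_row) {
                oob++;
            }
        }
    }
    return oob;
}

// Same traversal, but each group starts at its own offset, so the loop
// bound checks the index that is actually read.
static int count_oob_guarded(int blocks_per_row, int blocks_per_warp) {
    assert(blocks_per_row > 0 && blocks_per_warp > 0);
    int oob = 0;
    for (int group = 0; group < blocks_per_warp; group++) {
        for (int ib = group; ib < blocks_per_row; ib += blocks_per_warp) {
            if (ib >= blocks_per_row) {
                oob++; // unreachable by construction
            }
        }
    }
    return oob;
}
```

With blocks_per_row = 10 and blocks_per_warp = 4, the unguarded variant touches indices 10 and 11, while the guarded one stays in range. When the row length is a multiple of blocks_per_warp, both variants stay in range, which is why this kind of bug can go unnoticed until a new tensor shape is tested.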

ggml-ci

cebtenzzre · 2024-01-16T21:18:54Z

tests/test-backend-ops.cpp

+            // when the imatrix is optional, we want to test both quantization with and without imatrix
+            std::random_device rd;
+            std::default_random_engine generator(rd());
+            std::uniform_int_distribution<int> distribution(0, 1);
+            if (distribution(generator)) {
+                im = nullptr;
+            }


I know this code isn't critical, but creating a new random device and then a new random generator every time you want a random boolean is definitely an anti-pattern. You should at least make generator static. (In fact, creating random generators too often can exhaust the system's entropy source, e.g. /dev/random.)

And do you think we should use a fixed seed by default, or is this designed to be a non-deterministic test?
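
A minimal sketch of what the suggestion could look like, assuming a single-threaded caller. The helper names test_rng and random_bool and the "seed 0 means non-deterministic" convention are hypothetical and are not part of test-backend-ops.

```cpp
#include <cassert>
#include <cstdint>
#include <random>

// Hypothetical helper (not from the PR): one engine for the whole process.
// The seed is only honored on the first call, because a function-local
// static is initialized exactly once. seed == 0 means "seed from
// std::random_device"; any other value gives a reproducible sequence.
static std::default_random_engine & test_rng(uint32_t seed = 0) {
    static std::default_random_engine generator(seed != 0 ? seed : std::random_device{}());
    return generator;
}

// Returns true with probability p_true, reusing the shared engine instead of
// constructing a new random_device and engine per call.
// Note: the engine itself is not synchronized, so this is only safe when
// called from one thread at a time.
static bool random_bool(double p_true = 0.5) {
    assert(p_true >= 0.0 && p_true <= 1.0);
    std::bernoulli_distribution distribution(p_true);
    return distribution(test_rng());
}
```

Calling test_rng(1234) once at startup would make every later random_bool() draw reproducible, which is one way to answer the fixed-seed question without changing the call sites.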

btw init_thread is doing the same thing.

For init_thread I'm not sure how to improve - would static thread_local achieve anything? Simply static is a data race AFAIK

Either way - it's OK as it is for this part of the code

If it were thread-local you would have to initialize it separately, which would be ugly. If you wanted to improve it you could initialize the random generators at the top of init_tensor_uniform in advance like this (static so it happens only once):

    static std::vector<std::default_random_engine> generators = [n_threads]() {
        std::random_device rd;
        std::vector<std::default_random_engine> vec;
        vec.reserve(n_threads);
        for (size_t i = 0; i < n_threads; i++) {
            vec.emplace_back(rd());
        }
        return vec;
    }();

And then pass them to init_thread by reference.
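
For context, a self-contained sketch of that pattern might look like the following. It works on a plain std::vector<float> rather than a ggml tensor, and init_range and init_uniform are hypothetical stand-ins for init_thread and init_tensor_uniform, so treat it as an illustration under those assumptions rather than the code that was merged.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <random>
#include <thread>
#include <vector>

// Hypothetical stand-in for the per-thread worker: fills data[start, end)
// with uniform values using an engine that it borrows by reference.
static void init_range(std::default_random_engine & generator, std::vector<float> & data,
                       size_t start, size_t end, float min, float max) {
    std::uniform_real_distribution<float> distribution(min, max);
    for (size_t i = start; i < end; i++) {
        data[i] = distribution(generator);
    }
}

// Hypothetical counterpart of the tensor init function, on a plain vector.
static void init_uniform(std::vector<float> & data, float min = -1.0f, float max = 1.0f) {
    const size_t n_threads = std::max(1u, std::thread::hardware_concurrency());

    // Built on the first call only; later calls reuse the same seeded engines.
    static std::vector<std::default_random_engine> generators = [n_threads]() {
        std::random_device rd;
        std::vector<std::default_random_engine> vec;
        vec.reserve(n_threads);
        for (size_t i = 0; i < n_threads; i++) {
            vec.emplace_back(rd());
        }
        return vec;
    }();
    assert(generators.size() == n_threads);

    std::vector<std::thread> threads;
    threads.reserve(n_threads);
    for (size_t i = 0; i < n_threads; i++) {
        const size_t start = i * data.size() / n_threads;
        const size_t end   = (i + 1) * data.size() / n_threads;
        // std::ref makes each thread use its own engine in place, not a copy
        threads.emplace_back(init_range, std::ref(generators[i]), std::ref(data),
                             start, end, min, max);
    }
    for (auto & t : threads) {
        t.join();
    }
}
```

Each thread owns exactly one engine for the duration of a call, so there is no data race between workers. Two concurrent calls to init_uniform would still share the static vector, so this sketch assumes the function is only called from one thread at a time.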

ggml-ci

* ggml : add IQ2 to test-backend-ops + refactoring

  ggml-ci

* cuda : update supports_op for IQ2

  ggml-ci

* ci : enable LLAMA_CUBLAS=1 for CUDA nodes

  ggml-ci

* cuda : fix out-of-bounds-access in `mul_mat_vec_q`

  ggml-ci

* tests : avoid creating RNGs for each Q tensor

  ggml-ci

* tests : avoid creating RNGs for each tensor

  ggml-ci

ggerganov added 4 commits January 16, 2024 21:52

ggml : add IQ2 to test-backend-ops + refactoring

bc0bb30

ggml-ci

cuda : update supports_op for IQ2

e9a5d54

ggml-ci

ci : enable LLAMA_CUBLAS=1 for CUDA nodes

36feaeb

ggml-ci

cuda : fix out-of-bounds-access in mul_mat_vec_q

b7ddc8b

ggml-ci

cebtenzzre reviewed Jan 16, 2024


ggerganov added 2 commits January 16, 2024 23:24

tests : avoid creating RNGs for each Q tensor

8eb8fd9

ggml-ci

tests : avoid creating RNGs for each tensor

49bafe0

ggml-ci

ggerganov added the "sync" label (Requires sync with the ggml repo after merging) Jan 17, 2024

ggerganov merged commit 3856668 into master Jan 17, 2024
