
lock instead of spinlock #3

Draft: bogdad wants to merge 2 commits into master from locking_for_threads

Conversation

bogdad (Owner) commented Mar 31, 2023

An attempt to replace the busy-wait spinlock with a mutex and condition variable, as described in ggml-org#633.
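For context, here is a minimal sketch of the general pattern (illustrative only; the `work_queue` struct and function names below are hypothetical, not the actual ggml code): instead of worker threads spinning on an atomic flag and burning a core while idle, they sleep on a condition variable until work is published.

```c
/* Sketch of the spinlock -> mutex/condvar replacement pattern.
 * Assumption: names and layout are illustrative, not the PR's code. */
#include <pthread.h>
#include <stdbool.h>

typedef struct {
    pthread_mutex_t mutex;
    pthread_cond_t  cond;
    bool            has_work;   /* guarded by mutex */
    bool            stop;       /* guarded by mutex */
} work_queue;

/* Before: a busy-wait spinlock keeps the core at 100% while idle:
 *   while (!atomic_load(&q->has_work)) { }   // spins
 *
 * After: block on the condition variable; the kernel wakes the thread. */
static void wait_for_work(work_queue *q) {
    pthread_mutex_lock(&q->mutex);
    while (!q->has_work && !q->stop) {
        /* releases the mutex while sleeping, reacquires it on wakeup */
        pthread_cond_wait(&q->cond, &q->mutex);
    }
    q->has_work = false;
    pthread_mutex_unlock(&q->mutex);
}

static void publish_work(work_queue *q) {
    pthread_mutex_lock(&q->mutex);
    q->has_work = true;
    pthread_mutex_unlock(&q->mutex);
    pthread_cond_broadcast(&q->cond);  /* wake all waiting workers */
}
```

The trade-off: the spinning version reacts faster when work arrives constantly, while the blocking version frees CPU time but pays for kernel wakeups, which is consistent with the timings below.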

vladimir@FT751F6N7D ~/w/llama.cpp (locking_for_threads) [SIGINT]> ./build/bin/main -m ./models/7B/ggml-model-q4_0.bin --color -f ./prompts/alpaca.txt -n 64 -t 6
main: seed = 1680291012
llama_model_load: loading model from './models/7B/ggml-model-q4_0.bin' - please wait ...
llama_model_load: n_vocab = 32000
llama_model_load: n_ctx   = 512
llama_model_load: n_embd  = 4096
llama_model_load: n_mult  = 256
llama_model_load: n_head  = 32
llama_model_load: n_layer = 32
llama_model_load: n_rot   = 128
llama_model_load: f16     = 2
llama_model_load: n_ff    = 11008
llama_model_load: n_parts = 1
llama_model_load: type    = 1
llama_model_load: ggml map size = 4017.70 MB
llama_model_load: ggml ctx size =  81.25 KB
llama_model_load: mem required  = 5809.78 MB (+ 1026.00 MB per state)
llama_model_load: loading tensors from './models/7B/ggml-model-q4_0.bin'
llama_model_load: model size =  4017.27 MB / num tensors = 291
llama_init_from_file: kv self size  =  256.00 MB

system_info: n_threads = 6 / 10 | AVX = 0 | AVX2 = 0 | AVX512 = 0 | FMA = 0 | NEON = 1 | ARM_FMA = 1 | F16C = 0 | FP16_VA = 1 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 0 | VSX = 0 |
sampling: temp = 0.800000, top_k = 40, top_p = 0.950000, repeat_last_n = 64, repeat_penalty = 1.100000
generate: n_ctx = 512, n_batch = 8, n_predict = 64, n_keep = 0


 Below is an instruction that describes a task. Write a response that appropriately completes the request.
Alice is planning on going to dinner with her boyfriend, and she wants to look particularly pretty for him. She asks you what she should wear so that he will think she looks great.
“I’ll probably be wearing black jeans, blue top, pink necklace and silver shoes
llama_print_timings:        load time =   726.89 ms
llama_print_timings:      sample time =    46.19 ms /    64 runs   (    0.72 ms per run)
llama_print_timings: prompt eval time =  1302.22 ms /    21 tokens (   62.01 ms per token)
llama_print_timings:        eval time =  8034.21 ms /    63 runs   (  127.53 ms per run)
llama_print_timings:       total time =  9589.29 ms

vs. master (master's eval is roughly twice as fast here: ~59.4 ms per run against ~127.5 ms with the mutex version):

vladimir@FT751F6N7D ~/w/llama.cpp (master)> ./build/bin/main -m ./models/7B/ggml-model-q4_0.bin --color -f ./prompts/alpaca.txt -n 64 -t 6        (base)
main: seed = 1680291260
llama_model_load: loading model from './models/7B/ggml-model-q4_0.bin' - please wait ...
llama_model_load: n_vocab = 32000
llama_model_load: n_ctx   = 512
llama_model_load: n_embd  = 4096
llama_model_load: n_mult  = 256
llama_model_load: n_head  = 32
llama_model_load: n_layer = 32
llama_model_load: n_rot   = 128
llama_model_load: f16     = 2
llama_model_load: n_ff    = 11008
llama_model_load: n_parts = 1
llama_model_load: type    = 1
llama_model_load: ggml map size = 4017.70 MB
llama_model_load: ggml ctx size =  81.25 KB
llama_model_load: mem required  = 5809.78 MB (+ 1026.00 MB per state)
llama_model_load: loading tensors from './models/7B/ggml-model-q4_0.bin'
llama_model_load: model size =  4017.27 MB / num tensors = 291
llama_init_from_file: kv self size  =  256.00 MB

system_info: n_threads = 6 / 10 | AVX = 0 | AVX2 = 0 | AVX512 = 0 | FMA = 0 | NEON = 1 | ARM_FMA = 1 | F16C = 0 | FP16_VA = 1 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 0 | VSX = 0 |
sampling: temp = 0.800000, top_k = 40, top_p = 0.950000, repeat_last_n = 64, repeat_penalty = 1.100000
generate: n_ctx = 512, n_batch = 8, n_predict = 64, n_keep = 0


 Below is an instruction that describes a task. Write a response that appropriately completes the request.
How to write a response: (1) Identify which type of writing you are expected to complete; and (2) describe your work and any additional steps necessary in order to complete the task at hand. After your initial draft, revise and edit as needed to ensure your work is clear and succinct.
llama_print_timings:        load time =   690.21 ms
llama_print_timings:      sample time =    46.70 ms /    64 runs   (    0.73 ms per run)
llama_print_timings: prompt eval time =  1121.04 ms /    21 tokens (   53.38 ms per token)
llama_print_timings:        eval time =  3738.95 ms /    63 runs   (   59.35 ms per run)
llama_print_timings:       total time =  5131.97 ms
vladimir@FT751F6N7D ~/w/llama.cpp (master)> ./build/bin/main -m ./models/7B/ggml-model-q4_0.bin --color -f ./prompts/alpaca.txt -n 64 -t 6        (base)
main: seed = 1680291274
llama_model_load: loading model from './models/7B/ggml-model-q4_0.bin' - please wait ...
llama_model_load: n_vocab = 32000
llama_model_load: n_ctx   = 512
llama_model_load: n_embd  = 4096
llama_model_load: n_mult  = 256
llama_model_load: n_head  = 32
llama_model_load: n_layer = 32
llama_model_load: n_rot   = 128
llama_model_load: f16     = 2
llama_model_load: n_ff    = 11008
llama_model_load: n_parts = 1
llama_model_load: type    = 1
llama_model_load: ggml map size = 4017.70 MB
llama_model_load: ggml ctx size =  81.25 KB
llama_model_load: mem required  = 5809.78 MB (+ 1026.00 MB per state)
llama_model_load: loading tensors from './models/7B/ggml-model-q4_0.bin'
llama_model_load: model size =  4017.27 MB / num tensors = 291
llama_init_from_file: kv self size  =  256.00 MB

system_info: n_threads = 6 / 10 | AVX = 0 | AVX2 = 0 | AVX512 = 0 | FMA = 0 | NEON = 1 | ARM_FMA = 1 | F16C = 0 | FP16_VA = 1 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 0 | VSX = 0 |
sampling: temp = 0.800000, top_k = 40, top_p = 0.950000, repeat_last_n = 64, repeat_penalty = 1.100000
generate: n_ctx = 512, n_batch = 8, n_predict = 64, n_keep = 0


 Below is an instruction that describes a task. Write a response that appropriately completes the request.
Sir, I have done my assignment, but i got some points incorrect and they are all 5 points. Can you please help me to identify where it went wrong so that i can rectify them?
In your essay on "Masculinity" why do you consider Masculinities a
llama_print_timings:        load time =   658.70 ms
llama_print_timings:      sample time =    46.60 ms /    64 runs   (    0.73 ms per run)
llama_print_timings: prompt eval time =  1114.95 ms /    21 tokens (   53.09 ms per token)
llama_print_timings:        eval time =  3744.40 ms /    63 runs   (   59.43 ms per run)
llama_print_timings:       total time =  5102.00 ms
vladimir@FT751F6N7D ~/w/llama.cpp (master)>                                                                                                       (base)

bogdad force-pushed the locking_for_threads branch 5 times, most recently from 32316c5 to 316d873 on March 31, 2023 at 19:51
typedef pthread_mutex_t ggml_mutex_t;
typedef pthread_cond_t ggml_cond_t;

#define ggml_mutex_init pthread_mutex_init
bogdad (Owner, Author) commented Mar 31, 2023:

This did not work on Windows; it has been fixed and should work now.
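A pthreads-only typedef like the one above fails to build on Windows, where pthreads are not available natively. A common way to fix this is a small portability shim; the sketch below is one plausible shape for it (assumption: the Win32 mapping to SRWLOCK/CONDITION_VARIABLE is illustrative and may differ from what the PR actually does):

```c
/* Hypothetical portability shim, not necessarily the PR's exact code. */
#if defined(_WIN32)
#include <windows.h>

typedef SRWLOCK             ggml_mutex_t;
typedef CONDITION_VARIABLE  ggml_cond_t;

#define ggml_mutex_init(m)     InitializeSRWLock(m)
#define ggml_mutex_lock(m)     AcquireSRWLockExclusive(m)
#define ggml_mutex_unlock(m)   ReleaseSRWLockExclusive(m)

#define ggml_cond_init(c)      InitializeConditionVariable(c)
#define ggml_cond_wait(c, m)   SleepConditionVariableSRW(c, m, INFINITE, 0)
#define ggml_cond_broadcast(c) WakeAllConditionVariable(c)
#else
#include <pthread.h>

typedef pthread_mutex_t ggml_mutex_t;
typedef pthread_cond_t  ggml_cond_t;

#define ggml_mutex_init(m)     pthread_mutex_init(m, NULL)
#define ggml_mutex_lock(m)     pthread_mutex_lock(m)
#define ggml_mutex_unlock(m)   pthread_mutex_unlock(m)

#define ggml_cond_init(c)      pthread_cond_init(c, NULL)
#define ggml_cond_wait(c, m)   pthread_cond_wait(c, m)
#define ggml_cond_broadcast(c) pthread_cond_broadcast(c)
#endif
```

Wrapping the primitives behind one macro layer keeps the scheduler code in ggml.c identical on both platforms.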

bogdad force-pushed the locking_for_threads branch 2 times, most recently from 67d52da to 1eb0553 on April 1, 2023 at 15:35