
lock instead of spinlock #3

Draft: bogdad wants to merge 2 commits into master from locking_for_threads

Conversation

bogdad (Owner) commented Mar 31, 2023

An attempt to replace the busy-wait spinlock with a mutex and condition variable, as described in ggml-org#633.
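For context, here is a minimal sketch of the general pattern (illustrative only; the `work_queue` struct and function names below are hypothetical, not the actual ggml code): instead of worker threads spinning on an atomic flag and burning a core while idle, they sleep on a condition variable until work is published.

```c
/* Sketch of the spinlock -> mutex/condvar replacement pattern.
 * Assumption: names and layout are illustrative, not the PR's code. */
#include <pthread.h>
#include <stdbool.h>

typedef struct {
    pthread_mutex_t mutex;
    pthread_cond_t  cond;
    bool            has_work;   /* guarded by mutex */
    bool            stop;       /* guarded by mutex */
} work_queue;

/* Before: a busy-wait spinlock keeps the core at 100% while idle:
 *   while (!atomic_load(&q->has_work)) { }   // spins
 *
 * After: block on the condition variable; the kernel wakes the thread. */
static void wait_for_work(work_queue *q) {
    pthread_mutex_lock(&q->mutex);
    while (!q->has_work && !q->stop) {
        /* releases the mutex while sleeping, reacquires it on wakeup */
        pthread_cond_wait(&q->cond, &q->mutex);
    }
    q->has_work = false;
    pthread_mutex_unlock(&q->mutex);
}

static void publish_work(work_queue *q) {
    pthread_mutex_lock(&q->mutex);
    q->has_work = true;
    pthread_mutex_unlock(&q->mutex);
    pthread_cond_broadcast(&q->cond);  /* wake all waiting workers */
}
```

The trade-off: the spinning version reacts faster when work arrives constantly, while the blocking version frees CPU time but pays for kernel wakeups, which is consistent with the timings below.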

vladimir@FT751F6N7D ~/w/llama.cpp (locking_for_threads) [SIGINT]> ./build/bin/main -m ./models/7B/ggml-model-q4_0.bin --color -f ./prompts/alpaca.txt -n 64 -t 6
main: seed = 1680291012
llama_model_load: loading model from './models/7B/ggml-model-q4_0.bin' - please wait ...
llama_model_load: n_vocab = 32000
llama_model_load: n_ctx   = 512
llama_model_load: n_embd  = 4096
llama_model_load: n_mult  = 256
llama_model_load: n_head  = 32
llama_model_load: n_layer = 32
llama_model_load: n_rot   = 128
llama_model_load: f16     = 2
llama_model_load: n_ff    = 11008
llama_model_load: n_parts = 1
llama_model_load: type    = 1
llama_model_load: ggml map size = 4017.70 MB
llama_model_load: ggml ctx size =  81.25 KB
llama_model_load: mem required  = 5809.78 MB (+ 1026.00 MB per state)
llama_model_load: loading tensors from './models/7B/ggml-model-q4_0.bin'
llama_model_load: model size =  4017.27 MB / num tensors = 291
llama_init_from_file: kv self size  =  256.00 MB

system_info: n_threads = 6 / 10 | AVX = 0 | AVX2 = 0 | AVX512 = 0 | FMA = 0 | NEON = 1 | ARM_FMA = 1 | F16C = 0 | FP16_VA = 1 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 0 | VSX = 0 |
sampling: temp = 0.800000, top_k = 40, top_p = 0.950000, repeat_last_n = 64, repeat_penalty = 1.100000
generate: n_ctx = 512, n_batch = 8, n_predict = 64, n_keep = 0


 Below is an instruction that describes a task. Write a response that appropriately completes the request.
Alice is planning on going to dinner with her boyfriend, and she wants to look particularly pretty for him. She asks you what she should wear so that he will think she looks great.
“I’ll probably be wearing black jeans, blue top, pink necklace and silver shoes
llama_print_timings:        load time =   726.89 ms
llama_print_timings:      sample time =    46.19 ms /    64 runs   (    0.72 ms per run)
llama_print_timings: prompt eval time =  1302.22 ms /    21 tokens (   62.01 ms per token)
llama_print_timings:        eval time =  8034.21 ms /    63 runs   (  127.53 ms per run)
llama_print_timings:       total time =  9589.29 ms

vs. master (master's eval is roughly twice as fast here: ~59.4 ms per run against ~127.5 ms with the mutex version):

vladimir@FT751F6N7D ~/w/llama.cpp (master)> ./build/bin/main -m ./models/7B/ggml-model-q4_0.bin --color -f ./prompts/alpaca.txt -n 64 -t 6        (base)
main: seed = 1680291260
llama_model_load: loading model from './models/7B/ggml-model-q4_0.bin' - please wait ...
llama_model_load: n_vocab = 32000
llama_model_load: n_ctx   = 512
llama_model_load: n_embd  = 4096
llama_model_load: n_mult  = 256
llama_model_load: n_head  = 32
llama_model_load: n_layer = 32
llama_model_load: n_rot   = 128
llama_model_load: f16     = 2
llama_model_load: n_ff    = 11008
llama_model_load: n_parts = 1
llama_model_load: type    = 1
llama_model_load: ggml map size = 4017.70 MB
llama_model_load: ggml ctx size =  81.25 KB
llama_model_load: mem required  = 5809.78 MB (+ 1026.00 MB per state)
llama_model_load: loading tensors from './models/7B/ggml-model-q4_0.bin'
llama_model_load: model size =  4017.27 MB / num tensors = 291
llama_init_from_file: kv self size  =  256.00 MB

system_info: n_threads = 6 / 10 | AVX = 0 | AVX2 = 0 | AVX512 = 0 | FMA = 0 | NEON = 1 | ARM_FMA = 1 | F16C = 0 | FP16_VA = 1 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 0 | VSX = 0 |
sampling: temp = 0.800000, top_k = 40, top_p = 0.950000, repeat_last_n = 64, repeat_penalty = 1.100000
generate: n_ctx = 512, n_batch = 8, n_predict = 64, n_keep = 0


 Below is an instruction that describes a task. Write a response that appropriately completes the request.
How to write a response: (1) Identify which type of writing you are expected to complete; and (2) describe your work and any additional steps necessary in order to complete the task at hand. After your initial draft, revise and edit as needed to ensure your work is clear and succinct.
llama_print_timings:        load time =   690.21 ms
llama_print_timings:      sample time =    46.70 ms /    64 runs   (    0.73 ms per run)
llama_print_timings: prompt eval time =  1121.04 ms /    21 tokens (   53.38 ms per token)
llama_print_timings:        eval time =  3738.95 ms /    63 runs   (   59.35 ms per run)
llama_print_timings:       total time =  5131.97 ms
vladimir@FT751F6N7D ~/w/llama.cpp (master)> ./build/bin/main -m ./models/7B/ggml-model-q4_0.bin --color -f ./prompts/alpaca.txt -n 64 -t 6        (base)
main: seed = 1680291274
llama_model_load: loading model from './models/7B/ggml-model-q4_0.bin' - please wait ...
llama_model_load: n_vocab = 32000
llama_model_load: n_ctx   = 512
llama_model_load: n_embd  = 4096
llama_model_load: n_mult  = 256
llama_model_load: n_head  = 32
llama_model_load: n_layer = 32
llama_model_load: n_rot   = 128
llama_model_load: f16     = 2
llama_model_load: n_ff    = 11008
llama_model_load: n_parts = 1
llama_model_load: type    = 1
llama_model_load: ggml map size = 4017.70 MB
llama_model_load: ggml ctx size =  81.25 KB
llama_model_load: mem required  = 5809.78 MB (+ 1026.00 MB per state)
llama_model_load: loading tensors from './models/7B/ggml-model-q4_0.bin'
llama_model_load: model size =  4017.27 MB / num tensors = 291
llama_init_from_file: kv self size  =  256.00 MB

system_info: n_threads = 6 / 10 | AVX = 0 | AVX2 = 0 | AVX512 = 0 | FMA = 0 | NEON = 1 | ARM_FMA = 1 | F16C = 0 | FP16_VA = 1 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 0 | VSX = 0 |
sampling: temp = 0.800000, top_k = 40, top_p = 0.950000, repeat_last_n = 64, repeat_penalty = 1.100000
generate: n_ctx = 512, n_batch = 8, n_predict = 64, n_keep = 0


 Below is an instruction that describes a task. Write a response that appropriately completes the request.
Sir, I have done my assignment, but i got some points incorrect and they are all 5 points. Can you please help me to identify where it went wrong so that i can rectify them?
In your essay on "Masculinity" why do you consider Masculinities a
llama_print_timings:        load time =   658.70 ms
llama_print_timings:      sample time =    46.60 ms /    64 runs   (    0.73 ms per run)
llama_print_timings: prompt eval time =  1114.95 ms /    21 tokens (   53.09 ms per token)
llama_print_timings:        eval time =  3744.40 ms /    63 runs   (   59.43 ms per run)
llama_print_timings:       total time =  5102.00 ms
vladimir@FT751F6N7D ~/w/llama.cpp (master)>                                                                                                       (base)

bogdad force-pushed the locking_for_threads branch 5 times, most recently from 32316c5 to 316d873 on March 31, 2023 at 19:51
typedef pthread_mutex_t ggml_mutex_t;
typedef pthread_cond_t ggml_cond_t;

#define ggml_mutex_init pthread_mutex_init
bogdad (Owner, Author) commented Mar 31, 2023:

This did not work on Windows; it has been fixed and should work now.
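A pthreads-only typedef like the one above fails to build on Windows, where pthreads are not available natively. A common way to fix this is a small portability shim; the sketch below is one plausible shape for it (assumption: the Win32 mapping to SRWLOCK/CONDITION_VARIABLE is illustrative and may differ from what the PR actually does):

```c
/* Hypothetical portability shim, not necessarily the PR's exact code. */
#if defined(_WIN32)
#include <windows.h>

typedef SRWLOCK             ggml_mutex_t;
typedef CONDITION_VARIABLE  ggml_cond_t;

#define ggml_mutex_init(m)     InitializeSRWLock(m)
#define ggml_mutex_lock(m)     AcquireSRWLockExclusive(m)
#define ggml_mutex_unlock(m)   ReleaseSRWLockExclusive(m)

#define ggml_cond_init(c)      InitializeConditionVariable(c)
#define ggml_cond_wait(c, m)   SleepConditionVariableSRW(c, m, INFINITE, 0)
#define ggml_cond_broadcast(c) WakeAllConditionVariable(c)
#else
#include <pthread.h>

typedef pthread_mutex_t ggml_mutex_t;
typedef pthread_cond_t  ggml_cond_t;

#define ggml_mutex_init(m)     pthread_mutex_init(m, NULL)
#define ggml_mutex_lock(m)     pthread_mutex_lock(m)
#define ggml_mutex_unlock(m)   pthread_mutex_unlock(m)

#define ggml_cond_init(c)      pthread_cond_init(c, NULL)
#define ggml_cond_wait(c, m)   pthread_cond_wait(c, m)
#define ggml_cond_broadcast(c) pthread_cond_broadcast(c)
#endif
```

Wrapping the primitives behind one macro layer keeps the scheduler code in ggml.c identical on both platforms.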

bogdad force-pushed the locking_for_threads branch 2 times, most recently from 67d52da to 1eb0553 on April 1, 2023 at 15:35