Roadmap (short-term) #457
Closed
ggerganov announced in Announcements
Replies: 3 comments 2 replies
- Are these sorted top-priority first?
- Can we update the README with usage of the new `perplexity` tool, since the `main --perplexity` way stopped working?
- What happened? Will the project become active again later? u_u
These will be the priorities for the next few days:

- Reduce inference memory usage via ggml scratch buffers, remove hardcoded memory buffer sizes, and support infinite interactive mode. I know how to fix this, and it is important since the GH issues are being flooded with complaints about seg faults and crashes
- Finalize SIMD-accelerated quantization and merge `ggml` back in the parent repo: `quantize_row_q4_0()`, `quantize_row_q4_1()`, `dequantize_row_q4_0()`, `dequantize_row_q4_1()`. I suspect this could improve performance for prompt batch processing
- Deprecate the `ggml_vec_mad_xxx()` routines and simplify `ggml_forward_mul_mat_xxx()`. This should lead to some significant code reduction in `ggml.c`
- Separate the perplexity computation from `main.cpp` into a standalone example program called `perplexity`
- Move `main.cpp` into a standalone example program and move `utils.h`/`utils.cpp` into `./examples` to be shared by all examples
- Add `llama_state` to allow parallel text generation sessions with a single model. I will do this in a similar way it is done in `whisper.cpp`
- Extend `llama_state` to support loading individual model tensors. Needed for LoRA personalities support
- Add 2-bit integer quantization

When the above things are ready, we will have a good foundation to start porting more models and create more example applications to demonstrate the usage of `ggml`.

New roadmap: #784