Add support for GritLM #5959
Conversation
llama.h
Outdated
// Set whether to use causal attention or not
// If set to true, the model will only attend to the past tokens
LLAMA_API void llama_set_embeddings(struct llama_context * ctx, bool embeddings);
This should be improved - the comment is about causal attention, but the function is called llama_set_embeddings. It's not clear enough.

I think we need to introduce bool causal_attn in llama_cparams. It will be initialized by default with llama_hparams.causal_attn, but it would be possible to override it via llama_set_causal_attn(struct llama_context * ctx, bool causal_attn);

The if in llama_set_inputs() should be checking cparams.causal_attn. It would make more sense, because the if is about whether we build a causal mask or not. But the proposed change if (!cparams.embeddings) { is confusing.
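A minimal sketch of the proposed scheme, using hypothetical simplified structs rather than the real llama.cpp definitions, just to show the default-then-override flow:

```cpp
#include <cassert>

// Hypothetical, simplified structs -- not the actual llama.cpp definitions.
struct llama_hparams { bool causal_attn; };
struct llama_cparams { bool causal_attn; };
struct llama_context {
    llama_hparams hparams;
    llama_cparams cparams;
};

// On context creation, cparams.causal_attn is seeded from the model's hparams...
static void llama_context_init(struct llama_context * ctx) {
    ctx->cparams.causal_attn = ctx->hparams.causal_attn;
}

// ...and can later be overridden at runtime, e.g. to disable the causal
// mask while computing embeddings and re-enable it for generation.
static void llama_set_causal_attn(struct llama_context * ctx, bool causal_attn) {
    ctx->cparams.causal_attn = causal_attn;
}
```

With this split, llama_set_inputs() can branch on cparams.causal_attn when deciding whether to build a causal mask, instead of overloading the embeddings flag.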
Agreed, that is confusing. Pushing proposed changes now.
* add gritlm example
* gritlm results match
* tabs to spaces
* comment out debug printing
* rebase to new embed
* gritlm embeddings are back babeee
* add to gitignore
* allow to toggle embedding mode
* Clean-up GritLM sample code.
* Fix types.
* Flush stdout and output ending newline if streaming.
* mostly style fixes; correct KQ_mask comment
* add causal_attn flag to llama_cparams
* gritml : minor
* llama : minor

Co-authored-by: Douglas Hanley <[email protected]>
Co-authored-by: Georgi Gerganov <[email protected]>
Closes #5783
This PR adds support for GritLM (thanks to @iamlemec). I also added sample code to reproduce their example from here. The sample demonstrates how to produce both embeddings and generated text with the same model.
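The embedding half of a demo like this typically scores documents by cosine similarity between a query embedding and each document embedding. A self-contained sketch of that scoring step (an illustrative helper, not code from the sample itself):

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Cosine similarity between two embedding vectors of equal length.
// Illustrative helper, not code from the GritLM sample itself.
static double cosine_sim(const std::vector<double> & a, const std::vector<double> & b) {
    double dot = 0.0, norm_a = 0.0, norm_b = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        dot    += a[i] * b[i];
        norm_a += a[i] * a[i];
        norm_b += b[i] * b[i];
    }
    return dot / (std::sqrt(norm_a) * std::sqrt(norm_b));
}
```

Identical vectors score 1.0 and orthogonal vectors score 0.0, so higher scores mean a closer query/document match.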
I uploaded some model files here.