Add support for GritLM #5959


Merged
merged 16 commits into ggml-org:master from gritlm-pr
Mar 10, 2024

Conversation

dranger003
Contributor

Closes #5783

This PR adds support for GritLM (thanks to @iamlemec). I also added sample code to reproduce their example from here. The sample demonstrates how to produce embeddings and generate text with the same model.
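The dual-mode usage the sample demonstrates can be sketched as follows. The `llama_context` struct and helper functions below are hypothetical stand-ins so the sketch compiles on its own; the actual PR uses the real `llama_context` from llama.h together with the `llama_set_embeddings` and `llama_set_causal_attn` setters.

```cpp
#include <cassert>

// Hypothetical stand-in so this sketch is self-contained; the real PR
// operates on struct llama_context from llama.h.
struct llama_context {
    bool embeddings  = false; // output embeddings instead of logits
    bool causal_attn = true;  // build a causal attention mask
};

static void llama_set_embeddings(llama_context * ctx, bool on)  { ctx->embeddings  = on; }
static void llama_set_causal_attn(llama_context * ctx, bool on) { ctx->causal_attn = on; }

// Embedding pass: bidirectional attention, embedding output enabled.
static void enter_embedding_mode(llama_context * ctx) {
    llama_set_embeddings(ctx, true);
    llama_set_causal_attn(ctx, false);
}

// Generation pass on the same context: back to causal decoding.
static void enter_generation_mode(llama_context * ctx) {
    llama_set_embeddings(ctx, false);
    llama_set_causal_attn(ctx, true);
}
```

The point of the design is that one loaded model serves both roles: flip the two flags before each batch rather than loading separate embedding and generation models.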

I uploaded some model files here.

llama.h Outdated
Comment on lines 644 to 646
// Set whether to use causal attention or not
// If set to true, the model will only attend to the past tokens
LLAMA_API void llama_set_embeddings(struct llama_context * ctx, bool embeddings);
Member

This should be improved: the comment is about causal attention, but the function is called llama_set_embeddings, which is not clear enough.

I think we need to introduce bool causal_attn in llama_cparams. It will be initialized by default with llama_hparams.causal_attn, but it would be possible to override it via llama_set_causal_attn(struct llama_context * ctx, bool causal_attn);

The if in llama_set_inputs() should check cparams.causal_attn. That would make more sense, because the if determines whether we build a causal mask or not. The proposed if (!cparams.embeddings) { check is confusing.
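The proposed split can be sketched like this. The `llama_hparams` and `llama_cparams` structs below are hypothetical single-field simplifications of the real llama.cpp structs, just to show the default-then-override flow described above.

```cpp
#include <cassert>

// Hypothetical single-field simplifications of the real structs.
struct llama_hparams { bool causal_attn = true; }; // model default (from the model file)
struct llama_cparams { bool causal_attn = true; }; // per-context, overridable at runtime

struct llama_context {
    llama_hparams hparams;
    llama_cparams cparams;
    // cparams.causal_attn is initialized by default from hparams.causal_attn
    llama_context() { cparams.causal_attn = hparams.causal_attn; }
};

// Runtime override, as proposed.
static void llama_set_causal_attn(llama_context * ctx, bool causal_attn) {
    ctx->cparams.causal_attn = causal_attn;
}

// The check in llama_set_inputs() would then read the context flag directly,
// making it explicit that it decides whether a causal mask is built.
static bool build_causal_mask(const llama_context & ctx) {
    return ctx.cparams.causal_attn;
}
```

With this shape, the embedding flag and the attention-mask choice stay independent, which is what the review comment asks for.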

Collaborator


Agreed, that is confusing. Pushing proposed changes now.

ggerganov merged commit bcebd7d into ggml-org:master on Mar 10, 2024
NeoZhangJianyu pushed a commit to NeoZhangJianyu/llama.cpp that referenced this pull request Mar 12, 2024
* add gritlm example

* gritlm results match

* tabs to spaces

* comment out debug printing

* rebase to new embed

* gritlm embeddings are back babeee

* add to gitignore

* allow to toggle embedding mode

* Clean-up GritLM sample code.

* Fix types.

* Flush stdout and output ending newline if streaming.

* mostly style fixes; correct KQ_mask comment

* add causal_attn flag to llama_cparams

* gritml : minor

* llama : minor

---------

Co-authored-by: Douglas Hanley <[email protected]>
Co-authored-by: Georgi Gerganov <[email protected]>
dranger003 deleted the gritlm-pr branch on March 21, 2024, 15:13
hodlen pushed a commit to hodlen/llama.cpp that referenced this pull request Apr 1, 2024

Successfully merging this pull request may close these issues.

Unexpected embeddings using GritLM
3 participants