llama : fix Gemma3 SWA KV cache shift #12373


Merged 2 commits into master on Mar 13, 2025

Conversation

ggerganov
Member

Fixes #12357

This should fix the KV cache shift for Gemma3 models. Testing:

make -j && ./bin/llama-cli -m ../models/gemma-3-4b/ggml-model-f16.gguf --top-k 1 -s 1 -p "I believe the meaning of life is" -c 256

Comment on lines +540 to +550
float freq_base_l = cparams.rope_freq_base;
float freq_scale_l = cparams.rope_freq_scale;

// TODO: improve
if (model.arch == LLM_ARCH_GEMMA3) {
const bool is_sliding = hparams.is_sliding(il);

freq_base_l = is_sliding ? 10000.0f : cparams.rope_freq_base;
freq_scale_l = is_sliding ? 1.0f : cparams.rope_freq_scale;
}

Member Author

Not sure how to avoid this special-casing here. It does not look great.

Collaborator

I think we can extend the llama_layer to hold this info in the near future.

Member Author

For now, I've pushed the following version, which should be a bit cleaner: #12374

Will see if there is a better way to do it with the upcoming model implementation refactoring.

@ggerganov merged commit 84d5475 into master on Mar 13, 2025. 1 check passed.
jpohhhh pushed a commit to Telosnex/llama.cpp that referenced this pull request Mar 14, 2025
* llama : fix Gemma3 SWA KV cache shift

ggml-ci

* hparams : add comment [no ci]
arthw pushed a commit to arthw/llama.cpp that referenced this pull request Mar 19, 2025
* llama : fix Gemma3 SWA KV cache shift

ggml-ci

* hparams : add comment [no ci]
Successfully merging this pull request may close these issues.

Eval bug: Gemma 3 Outputs Gibberish After Context Shift