
convert : bailingmoe : set yarn metadata if present #13312


Merged
CISC merged 1 commit into master from cisc/convert-bailingmoe-yarn on May 5, 2025

Conversation

CISC (Collaborator) commented on May 5, 2025

Set YaRN metadata if present since support was finally added and it looks to be bog standard. Tested with Ling-Coder-lite (manually configured).
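For context, the conversion-side change amounts to roughly the following (a paraphrased sketch of the convert_hf_to_gguf.py logic, pulled out into a standalone helper for illustration; the helper name is made up, the GGUFWriter methods are gguf-py's):

    import gguf

    def set_yarn_metadata(hparams: dict, writer: gguf.GGUFWriter) -> None:
        # Pick up YaRN parameters from config.json's rope_scaling section
        # and write the corresponding GGUF metadata keys, if present.
        rope_scaling = hparams.get("rope_scaling") or {}
        if rope_scaling.get("rope_type", rope_scaling.get("type")) == "yarn" and "factor" in rope_scaling:
            writer.add_rope_scaling_type(gguf.RopeScalingType.YARN)
            writer.add_rope_scaling_factor(rope_scaling["factor"])
            writer.add_rope_scaling_orig_ctx_len(rope_scaling["original_max_position_embeddings"])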

NOTE: For some reason they have not updated the config on any of their models, so you have to add/enable this yourself. This can be done in one of the following ways:

  • On the commandline (no changes to old GGUFs necessary):
    ./llama-cli -m Ling-Coder-lite.gguf -c 16384 --rope-scaling yarn --rope-scale 4
  • Change max_position_embeddings (multiply by 4) and rope_scaling in config.json and reconvert:
    {
        "factor": 4.0,
        "original_max_position_embeddings": 4096,
        "type": "yarn"
    }
  • Add/change metadata of an old GGUF with gguf_editor_gui.py (the sketch after this list shows the relevant keys)
  • Download a dynamically modified GGUF using gguf-editor
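For reference, here is a minimal sketch that dumps the YaRN-related metadata keys with gguf-py's GGUFReader, so you can check whether a GGUF already has them set before reaching for the editor. It assumes gguf-py is installed, reuses the file name from above, and assumes the architecture string is "bailingmoe":

    from gguf import GGUFReader, GGUFValueType

    reader = GGUFReader("Ling-Coder-lite.gguf")
    arch = "bailingmoe"  # assumed architecture string for this model family

    for key in (
        f"{arch}.context_length",
        f"{arch}.rope.scaling.type",
        f"{arch}.rope.scaling.factor",
        f"{arch}.rope.scaling.original_context_length",
    ):
        field = reader.fields.get(key)
        if field is None:
            print(f"{key}: <missing>")
        elif field.types[0] == GGUFValueType.STRING:
            # string payloads are stored as raw bytes in the last part
            print(f"{key} = {bytes(field.parts[-1]).decode('utf-8')}")
        else:
            print(f"{key} = {field.parts[-1][0]}")

With YaRN enabled you would expect type = yarn, factor = 4.0 and original_context_length = 4096, matching the config.json snippet above.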

github-actions bot added the python (python script changes) label on May 5, 2025
CISC requested a review from ngxson on May 5, 2025 10:30
CISC merged commit ae803bf into master on May 5, 2025
7 checks passed
CISC deleted the cisc/convert-bailingmoe-yarn branch on May 5, 2025 10:34
gabe-l-hart added a commit to gabe-l-hart/llama.cpp that referenced this pull request on May 6, 2025
* origin/master: (27 commits)
llama : fix build_ffn without gate (ggml-org#13336)
CUDA: fix bad asserts for partial offload (ggml-org#13337)
convert : qwen2/3moe : set yarn metadata if present (ggml-org#13331)
CUDA: fix --split-mode row for MMQ (ggml-org#13323)
gguf-py : avoid requiring pyside6 for other scripts (ggml-org#13036)
CUDA: fix logic for clearing padding with -ngl 0 (ggml-org#13320)
sampling : Integrate Top-nσ into main sampling chain (and add it to the server) (ggml-org#13264)
server : Webui - change setText command from parent window to also send the message. (ggml-org#13309)
mtmd : rename llava directory to mtmd (ggml-org#13311)
clip : fix confused naming ffn_up and ffn_down (ggml-org#13290)
convert : bailingmoe : set yarn metadata if present (ggml-org#13312)
SYCL: Disable mul_mat kernels for noncontiguous tensor b (ggml-org#13308)
mtmd : add C public API (ggml-org#13184)
rpc : use backend registry, support dl backends (ggml-org#13304)
ggml : activate s390x simd for Q3_K (ggml-org#13301)
llava/mtmd : fixes to fully support dl backends (ggml-org#13303)
llama : build windows releases with dl backends (ggml-org#13220)
CUDA: fix race condition in MMQ stream-k fixup (ggml-org#13299)
CUDA: fix race condition in MMQ ids_dst (ggml-org#13294)
vulkan: Additional type support for unary, binary, and copy (ggml-org#13266)
...