ggml : update mul_mat_id to use the same tensor for all the experts #6387
Merged
Commits
36 commits
0c7e21d  ggml : update mul_mat_id to use the same tensor for all the experts (slaren)
9c9fe60  update cuda (slaren)
2479900  minor (slaren)
93db37e  update metal (slaren)
325e5ef  update test-backend-ops (slaren)
26c09ad  fix cuda (slaren)
2abb6c7  Update ggml-metal.m (slaren)
6203d72  update convert.py (slaren)
4a5d50e  update convert-hf-to-gguf.py (slaren)
3b3298a  update convert.py for mixtral hf models (slaren)
8c2f7b8  Update convert-hf-to-gguf.py (slaren)
4531b02  cuda : support non-pow-2 number of experts (slaren)
6886fdb  allow quantize to work for split and merged experts models in the sam… (slaren)
deea200  cleanup + disable mmap automatically with split tensors models (slaren)
b4a6206  update imatrix (slaren)
8f84ca3  test-backend-ops : test qwen argsort (slaren)
5de4a5d  update grok model loading (slaren)
6875369  llama : add merged experts tensors to the grok tensor map (slaren)
6f33852  minor (slaren)
68d21de  gguf : bump version (slaren)
f27cbf3  fix quantizing of merged experts (slaren)
d08a1f4  convert-hf-to-gguf.py : update grok (untested) (slaren)
9530398  make linter happy (slaren)
f421b32  cuda/argsort : use shared memory instead of pool memory (slaren)
c704c77  convert : fix grok tensor names (ggerganov)
fe62909  metal : add support for non-pow-2 argsort (slaren)
31adc93  llama : more loader cleanup, better error checking (slaren)
86f3666  cuda : fix warning (slaren)
a1343ae  llama : still use mmap for loading old models, but copy the data to a… (slaren)
19dafaf  add review note (slaren)
3779b98  llama : remove ffn tensor counting + add sanity check (ggerganov)
e810899  convert : fix handling of n_experts == None (ggerganov)
fc719b6  imatrix : fix ncall counters (ggerganov)
822caa4  llama : produce error if imatrix size does not match (ggerganov)
a054283  quantize : terminate on errors + trace logs (ggerganov)
716e960  metal : pad shared memory to 16 bytes (ggerganov)