llama : Metal inference #1642
Merged
49 commits
f85020b mtl : export the LLaMA computation graph (ggerganov)
98c267f ci : disable temporary (ggerganov)
b23fe8c mtl : adapt the MNIST example as starter (ggerganov)
a792cbd mtl : no need for mtl-export tool, add cli arg for main instead (ggerganov)
897d6d8 mtl : export just a small part of the graph for now to make it easier (ggerganov)
248a8c3 mtl : move MSL code into separate file for easy editing (ggerganov)
a8fd9dc mtl : initial get_rows_q4_0 kernel (ggerganov)
794704e mtl : confirmed get_rows_q4_0 is working correctly (ggerganov)
72256eb mtl : add rms_norm kernel + confirm working (ggerganov)
64afc0b mtl : add mul kernel + confirm working (ggerganov)
2a24994 mtl : initial mul_mat Q4 kernel (wrong results) (ggerganov)
96d0052 mtl : mul_mat fixes (still wrong) (ggerganov)
29bec00 mtl : another mul_mat Q4 (still does not work) (ggerganov)
b2fd06c mtl : working mul_mat q4 (ggerganov)
6af6a05 ggml : fix handling of "view" ops in ggml_graph_import() (ggerganov)
1213af7 mtl : add rope kernel (ggerganov)
7ca81e9 mtl : add reshape and transpose handling (ggerganov)
94ea9e7 ggml : store offset as opt arg for ggml_view_xd() operators (ggerganov)
948fcfd mtl : add cpy kernel + handle view ops (ggerganov)
51efb59 mtl : confirm f16 x f32 attention mul mat (ggerganov)
0f1c580 mtl : add scale kernel (ggerganov)
17a7036 mtl : add diag_mask_inf kernel (ggerganov)
17930fb mtl : fix soft_max kernel (ggerganov)
f67c2d8 ggml : update ggml_nbytes() to handle non-contiguous tensors (ggerganov)
a266c26 mtl : verify V tensor contents (ggerganov)
a0cc3de mtl : add f32 -> f32 cpy kernel (ggerganov)
42dca40 mtl : add silu kernel (ggerganov)
fbd3f62 mtl : add non-broadcast mul kernel (ggerganov)
9665429 mtl : full GPU inference of the computation graph (ggerganov)
f0196a7 mtl : optimize rms_norm and soft_max kernels (ggerganov)
e55f7b0 mtl : add f16 mat x f32 vec multiplication kernel (ggerganov)
3367146 mtl : fix bug in f16 x f32 mul mat + speed-up computation (ggerganov)
847bbfe mtl : faster mul_mat_q4_0_f32 kernel (ggerganov)
70c3387 mtl : fix kernel signature + roll inner loop (ggerganov)
b088e14 mtl : more threads for rms_norm + better timing (ggerganov)
6276057 mtl : remove printfs from inner loop (ggerganov)
03c2d72 mtl : simplify implementation (ggerganov)
640a889 mtl : add save/load vocab to ggml file (ggerganov)
2f4e9d1 mtl : plug Metal inference into llama.cpp (very quick-n-dirty) (ggerganov)
4df2ef3 mtl : make it work with main example (ggerganov)
18e482a mtl : preparing for merge (ggerganov)
e4b5222 mtl : clean-up ggml mtl interface + suport scratch / inplace (ggerganov)
e26cd6b mtl : remove temp / debug code (ggerganov)
a7fb899 metal : final refactoring and simplification (ggerganov)
d8a7486 Revert "ci : disable temporary" (ggerganov)
b252acb metal : add comments (ggerganov)
db3db9e metal : clean-up stuff, fix typos (ggerganov)
e33002d readme : add Metal instructions (ggerganov)
324e823 readme : add example for main (ggerganov)
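The commit messages name several kernels (rms_norm, silu, soft_max) without showing what they compute. As a rough orientation only, here is a CPU reference sketch in C++ of the math those names conventionally refer to. This is not the Metal code from this PR: the function names, the `eps` default, and the single-row layout are all assumptions made for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// RMS normalization of one row: y[i] = x[i] / sqrt(mean(x^2) + eps).
// The eps default is a hypothetical value chosen for this sketch.
std::vector<float> rms_norm_ref(const std::vector<float> & x, float eps = 1e-6f) {
    assert(!x.empty());
    double sum_sq = 0.0;
    for (float v : x) sum_sq += (double) v * v;
    const float mean_sq = (float) (sum_sq / (double) x.size());
    const float scale   = 1.0f / std::sqrt(mean_sq + eps);
    std::vector<float> y(x.size());
    for (std::size_t i = 0; i < x.size(); ++i) y[i] = x[i] * scale;
    return y;
}

// SiLU activation: x * sigmoid(x), written as x / (1 + exp(-x)).
float silu_ref(float x) {
    return x / (1.0f + std::exp(-x));
}

// Softmax over one row, subtracting the row maximum first for numerical stability.
std::vector<float> soft_max_ref(const std::vector<float> & x) {
    assert(!x.empty());
    const float max_val = *std::max_element(x.begin(), x.end());
    std::vector<float> y(x.size());
    float sum = 0.0f;
    for (std::size_t i = 0; i < x.size(); ++i) {
        y[i] = std::exp(x[i] - max_val);
        sum += y[i];
    }
    for (float & v : y) v /= sum;
    return y;
}
```

A scalar reference like this is the kind of thing a GPU kernel's output can be compared against element by element, which is presumably what the "confirm working" commits were doing in some form.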
@@ -0,0 +1,3 @@
+set(TEST_TARGET metal)
+add_executable(${TEST_TARGET} metal.cpp)
+target_link_libraries(${TEST_TARGET} PRIVATE ggml)