llama_model_loader: support multiple split/shard GGUFs #6187
Merged
+411 −223
Commits (28)
7c64fef  split: support in llama_model_loader (phymbert)
b8feff4  Avoid copying the entire vector (phymbert)
18ff6ca  split: move llama_tensor_offset to llama_model_loader (phymbert)
60a87ae  Merge branch 'master' into hp/split/load-model (phymbert)
1892ae7  llama_model_loader: PR feedbacks: (phymbert)
00381b0  avoid copying the entire vector (phymbert)
c34a5de  Simplify this by making these optional, switch some layer creation te… (phymbert)
1c931f3  Handle optional tensors (phymbert)
d8b567d  llama_model_loader: fail if backend cannot allocate buffer (phymbert)
02020b0  fix mmap buffer management (slaren)
078a1ac  llama_model_loader: map file to backend buffer if the allocation succ… (phymbert)
69bdee9  llama_model_loader: only map tensors included in the context (phymbert)
6df9757  llama_model_loader: minor, use same variable name for consistency, fi… (phymbert)
f9a2973  llama_model_loader: fail if any of backend buffer cannot be allocated (phymbert)
0fd652e  spacing (phymbert)
1a179bf  fix loop over pointer (phymbert)
7cbe1ea  llama_model_loader: if n_tensors declared not equals to loaded tensor… (phymbert)
9940df4  llama_model_loader: ensure mappings vector has the expected size (phymbert)
ec372c6  llama_model_loader: use at instead of operator[] if this should neve… (phymbert)
a9e88c6  llama_model_loader: immediately add the backend buffer to the model b… (phymbert)
b19af36  llama_model_loader: be sure the model mappings has enough capacity be… (phymbert)
4c04400  llama_model_loader: fix map -> unordered map (phymbert)
e474e45  llama_split_prefix: use a clearer version, not pass split path len bu… (phymbert)
8326607  llama : minor (ggerganov)
dbc35ac  llama : introduce some typedef helpers (ggerganov)
f616b38  docs: add model shard in hot topic (phymbert)
1f38759  llama_model_loader: put mapping in a unique_ptr from the moment it is… (phymbert)
764c7af  fix llama_split_prefix (ngxson)
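
Two of the commits above (e474e45, 764c7af) refine llama_split_prefix, which recovers the common filename prefix from the first shard so the loader can locate the sibling shards. The sketch below illustrates the idea, assuming the `<prefix>-%05d-of-%05d.gguf` naming produced by gguf-split; `make_split_path` and `parse_split_prefix` are illustrative helpers, not the exact llama.cpp API.

```cpp
// Sketch of the split-GGUF naming scheme, assuming the
// "<prefix>-%05d-of-%05d.gguf" pattern (shard indices printed 1-based).
// make_split_path/parse_split_prefix are illustrative stand-ins, not the
// exact llama_split_path/llama_split_prefix signatures.
#include <cstdio>
#include <cstring>
#include <string>

// Build the path of shard `split_no` (0-based) out of `split_count` shards.
static std::string make_split_path(const std::string & prefix, int split_no, int split_count) {
    char buf[512];
    std::snprintf(buf, sizeof(buf), "%s-%05d-of-%05d.gguf", prefix.c_str(), split_no + 1, split_count);
    return buf;
}

// Recover the prefix from a shard path by checking that the path ends
// with the expected "-NNNNN-of-NNNNN.gguf" suffix for this shard.
static bool parse_split_prefix(std::string & prefix, const std::string & path, int split_no, int split_count) {
    char suffix[64];
    std::snprintf(suffix, sizeof(suffix), "-%05d-of-%05d.gguf", split_no + 1, split_count);
    const size_t n = std::strlen(suffix);
    if (path.size() <= n || path.compare(path.size() - n, n, suffix) != 0) {
        return false;
    }
    prefix = path.substr(0, path.size() - n);
    return true;
}

int main() {
    const std::string first = make_split_path("model-q4_0", 0, 3);
    std::printf("first shard: %s\n", first.c_str()); // model-q4_0-00001-of-00003.gguf

    std::string prefix;
    if (parse_split_prefix(prefix, first, 0, 3)) {
        // Once the prefix is known, every sibling shard path can be derived.
        for (int i = 1; i < 3; ++i) {
            std::printf("next shard:  %s\n", make_split_path(prefix, i, 3).c_str());
        }
    }
    return 0;
}
```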
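
Several other commits (9940df4, b19af36, ec372c6, 1f38759) harden how the loader owns its per-file mappings once one mapping per shard is needed. A minimal sketch of that pattern, with a hypothetical `mapping` type standing in for the loader's real mmap wrapper:

```cpp
// Sketch of the mapping-ownership pattern from the review feedback:
// size the vector up front, wrap each mapping in a unique_ptr the moment
// it is created, and use at() where an out-of-range index is a bug.
// `mapping` is a stand-in, not the loader's actual type.
#include <cstdio>
#include <memory>
#include <stdexcept>
#include <vector>

struct mapping { // stand-in for an mmap'd shard
    size_t size = 0;
};

int main() {
    const size_t n_files = 3;

    std::vector<std::unique_ptr<mapping>> mappings;
    mappings.reserve(n_files); // ensure enough capacity before filling

    for (size_t i = 0; i < n_files; ++i) {
        // Owned by a unique_ptr from the moment it is created, so a later
        // failure while loading cannot leak it.
        mappings.emplace_back(std::make_unique<mapping>());
    }

    // at() throws instead of invoking undefined behavior if an index that
    // should "never" be out of range turns out to be.
    try {
        std::printf("first mapping size: %zu\n", mappings.at(0)->size);
        std::printf("bad index size:     %zu\n", mappings.at(99)->size);
    } catch (const std::out_of_range & e) {
        std::fprintf(stderr, "mapping index out of range: %s\n", e.what());
    }
    return 0;
}
```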