@@ -390,7 +390,7 @@ class llama_model_kv_override(Structure):
 # //   LLAMA_SPLIT_LAYER: ignored
 #     int32_t main_gpu;
 
-#     // proportion of the model (layers or rows) to offload to each GPU, size: LLAMA_MAX_DEVICES
+#     // proportion of the model (layers or rows) to offload to each GPU, size: llama_max_devices()
 #     const float * tensor_split;
 
 #     // Called with a progress value between 0.0 and 1.0. Pass NULL to disable.
@@ -417,7 +417,7 @@ class llama_model_params(Structure):
 n_gpu_layers (int): number of layers to store in VRAM
 split_mode (int): how to split the model across multiple GPUs
 main_gpu (int): the GPU that is used for the entire model. main_gpu interpretation depends on split_mode: LLAMA_SPLIT_NONE: the GPU that is used for the entire model LLAMA_SPLIT_ROW: the GPU that is used for small tensors and intermediate results LLAMA_SPLIT_LAYER: ignored
-tensor_split (ctypes.Array[ctypes.c_float]): proportion of the model (layers or rows) to offload to each GPU, size: LLAMA_MAX_DEVICES
+tensor_split (ctypes.Array[ctypes.c_float]): proportion of the model (layers or rows) to offload to each GPU, size: llama_max_devices()
 progress_callback (llama_progress_callback): called with a progress value between 0.0 and 1.0. Pass NULL to disable. If the provided progress_callback returns true, model loading continues. If it returns false, model loading is immediately aborted.
 progress_callback_user_data (ctypes.c_void_p): context pointer passed to the progress callback
 kv_overrides (ctypes.Array[llama_model_kv_override]): override key-value pairs of the model meta data
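The diff only touches comments and docstrings, but it reflects that the device count is now queried at runtime via llama_max_devices() rather than taken from a compile-time LLAMA_MAX_DEVICES constant. Below is a minimal sketch (not part of this commit) of how a caller might size tensor_split accordingly; it assumes the low-level llama_cpp ctypes bindings expose llama_max_devices(), llama_model_default_params(), and the LLAMA_SPLIT_LAYER constant as in upstream llama.h.

# Sketch only: size tensor_split from the runtime device count instead of a
# compile-time constant. Assumes llama_cpp exposes llama_max_devices(),
# llama_model_default_params(), and LLAMA_SPLIT_LAYER as in llama.h.
import ctypes
import llama_cpp

n_devices = llama_cpp.llama_max_devices()        # runtime upper bound on devices
params = llama_cpp.llama_model_default_params()  # defaults mirroring llama.h

params.split_mode = llama_cpp.LLAMA_SPLIT_LAYER  # split whole layers across GPUs
params.n_gpu_layers = 32                         # number of layers to store in VRAM

# Proportion of the model to offload to each GPU; the array must hold
# llama_max_devices() entries. Keep a Python reference to the array so it is
# not garbage collected while the params struct is still in use.
tensor_split = (ctypes.c_float * n_devices)()    # zero-initialized
if n_devices >= 2:
    tensor_split[0] = 0.5                        # hypothetical 50/50 split
    tensor_split[1] = 0.5
params.tensor_split = ctypes.cast(tensor_split, ctypes.POINTER(ctypes.c_float))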