cc @junrushao
🚀 Feature
MLC-LLM can build two types of CUDA binaries:
- slim CUDA binaries, which only work on the CUDA GPU you built on (currently invoked by running `mlc-llm.build <your_build_options> -target cuda`)
- full CUDA binaries, which work on most CUDA GPU architectures (currently invoked by running `mlc-llm.build <your_build_options> -target cuda-multiarch`)

`cuda-multiarch` is not publicly listed in the MLC-LLM docs, and I only found out about it by talking to Junru directly.
I propose that we switch `-target cuda` to build the full CUDA binaries, have `-target auto` build full CUDA binaries if it detects a CUDA GPU, and have users explicitly pass something like `-target cuda-slim` if they want a binary for their current CUDA GPU only.
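To make the proposed behavior concrete, here is a minimal sketch of the target mapping. The function `resolve_target`, the constants, and the idea of keeping `cuda-multiarch` as an alias are hypothetical and used only for illustration. None of this is MLC-LLM's actual code or API.

```python
from typing import Optional

# Hypothetical labels for illustration only; not MLC-LLM's actual API.
FULL = "cuda-multiarch"  # binary that works on most CUDA GPU architectures
SLIM = "cuda-slim"       # binary that works only on the GPU it was built on


def resolve_target(requested: str, cuda_gpu_detected: bool) -> Optional[str]:
    """Map a requested -target value to the kind of CUDA build under the proposal.

    Returns None when the request does not resolve to a CUDA build
    (other targets are out of scope for this sketch).
    """
    if requested == "cuda":
        # Proposed change: plain `cuda` now means the full binary.
        return FULL
    if requested == "cuda-multiarch":
        # Assumption: the existing name is kept as an alias for the full binary.
        return FULL
    if requested == "cuda-slim":
        # Users opt in explicitly to a binary for the current GPU only.
        return SLIM
    if requested == "auto":
        # Proposed change: `auto` builds the full binary when a CUDA GPU is found.
        return FULL if cuda_gpu_detected else None
    return None
```

Under this sketch, only an explicit `cuda-slim` request yields a device-specific binary. Every other CUDA path produces the portable build.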
Motivation
Many users are interested in building the full CUDA binaries and might be surprised to learn that a `-target cuda` build only works on the current CUDA device. Building with `-target cuda-multiarch` does not take much additional time or disk space compared to `-target cuda`.
Alternatives
Keep things the way they are today, with the `cuda` target generating a slim CUDA binary and the `cuda-multiarch` target generating a full CUDA binary, but document the difference clearly so that users know the multiarch option exists.