[Feature Request] Make cuda-multiarch the default cuda target for mlc-llm.build #1020

@denise-k

cc @junrushao

🚀 Feature

MLC-LLM has the capability to build two types of CUDA binaries:

  • slim CUDA binaries, which only work on the CUDA GPU you built on
    (currently invoked by running mlc-llm.build <your_build_options> -target cuda)
  • full CUDA binaries, which work on most CUDA GPU architectures
    (currently invoked by running mlc-llm.build <your_build_options> -target cuda-multiarch)

cuda-multiarch is not publicly listed in the MLC-LLM docs, and I only found out about it by talking to Junru directly.

I propose that we change -target cuda to build the full CUDA binaries, have -target auto build full CUDA binaries when it detects a CUDA GPU, and require users to explicitly pass something like -target cuda-slim if they want a binary for their current CUDA GPU only.
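
To make the proposed mapping concrete, here is a minimal sketch of how the target names could resolve to a binary type. Everything in it is hypothetical: `resolve_cuda_binary` and its `cuda_gpu_detected` parameter are illustration-only names that do not exist in MLC-LLM, `cuda-slim` is only the name suggested above, and keeping `cuda-multiarch` as an alias for the full build is my assumption.

```python
from typing import Optional


def resolve_cuda_binary(target: str, cuda_gpu_detected: bool) -> Optional[str]:
    """Hypothetical helper: map a -target value to the proposed CUDA binary type.

    Returns "full" (multi-architecture), "slim" (current GPU only),
    or None when the target would not produce a CUDA build.
    """
    if target == "cuda-slim":
        # Proposed explicit opt-in for a binary tied to the build machine's GPU.
        return "slim"
    if target in ("cuda", "cuda-multiarch"):
        # Proposed: plain "cuda" builds the full binary.
        # Assumption: "cuda-multiarch" stays accepted as an alias.
        return "full"
    if target == "auto" and cuda_gpu_detected:
        # Proposed: auto-detection of a CUDA GPU also yields the full binary.
        return "full"
    return None
```

Under this sketch, `resolve_cuda_binary("cuda", cuda_gpu_detected=False)` returns `"full"`, while today's behavior for the same flag corresponds to `"slim"`.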

Motivation

Many users want to build the full CUDA binaries, and might be surprised to learn that a binary built with -target cuda only works on the CUDA device it was built on. Building with -target cuda-multiarch does not take much more time or disk space than building with -target cuda.

Alternatives

Keep things the way they are today, with the cuda target generating a slim CUDA binary and the cuda-multiarch target generating a full CUDA binary, but document the difference clearly so that users know the multiarch option exists.
