Commit ecb6cce

Add FA3 (#4166)
* add fa3
* separate hopper
* use pre-built FA3
* FA3 for cu128 only
* simplify
* remove hopper image, add dlslime
* FA3 for cu128, cu130
* skip Dive for cu13
1 parent 4abccaf commit ecb6cce

File tree

3 files changed (+15 −7 lines):

- .github/workflows/test_docker.yml
- docker/install.sh
- docker/prepare_wheel.sh

.github/workflows/test_docker.yml

Lines changed: 1 addition & 0 deletions

@@ -56,6 +56,7 @@ jobs:
           docker images
           docker run --rm lmdeploy:latest lmdeploy check_env
       - name: Dive
+        if: ${{ matrix.cuda_version == 'cu12' }}
         uses: MaxymVlasov/[email protected]
         with:
           image: lmdeploy:latest
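The added `if:` condition runs the Dive image-analysis step only when the matrix CUDA version is `cu12`, so cu13 builds skip it (the commit message's "skip Dive for cu13"). The GitHub Actions runner evaluates that expression itself; as an illustrative sketch only, the same predicate expressed in shell, with `MATRIX_CUDA_VERSION` a hypothetical stand-in for the workflow's `matrix.cuda_version`:

```shell
# Illustrative sketch: mirrors the workflow's `if:` gate in shell.
# MATRIX_CUDA_VERSION is a hypothetical stand-in for matrix.cuda_version.
should_run_dive() {
    [ "$1" = "cu12" ]    # Dive runs only for the cu12 matrix entry
}

if should_run_dive "${MATRIX_CUDA_VERSION:-cu12}"; then
    echo "run Dive image analysis"
else
    echo "skip Dive"
fi
```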

docker/install.sh

Lines changed: 14 additions & 6 deletions

@@ -25,11 +25,11 @@ popd >/dev/null
 if [[ "${CUDA_VERSION_SHORT}" = "cu118" ]]; then
     apt-get install -y --no-install-recommends cuda-minimal-build-11-8
 elif [[ "${CUDA_VERSION_SHORT}" = "cu124" ]]; then
-    apt-get install -y --no-install-recommends cuda-minimal-build-12-4 dkms
+    apt-get install -y --no-install-recommends cuda-minimal-build-12-4 numactl dkms
 elif [[ "${CUDA_VERSION_SHORT}" = "cu128" ]]; then
-    apt-get install -y --no-install-recommends cuda-minimal-build-12-8 dkms
+    apt-get install -y --no-install-recommends cuda-minimal-build-12-8 numactl dkms
 elif [[ "${CUDA_VERSION_SHORT}" = "cu130" ]]; then
-    apt-get install -y --no-install-recommends cuda-minimal-build-13-0 dkms
+    apt-get install -y --no-install-recommends cuda-minimal-build-13-0 numactl dkms
 fi

 apt-get clean -y

@@ -66,12 +66,20 @@ fi
 pip install torch${TORCH_VERSION} --extra-index-url https://download.pytorch.org/whl/${CUDA_VERSION_SHORT}
 pip install /wheels/*.whl

-
 if [[ "${CUDA_VERSION_SHORT}" != "cu118" ]] && [[ "${PYTHON_VERSION}" != "3.9" ]]; then
-    pip install cuda-python dlblas==0.0.6
+    pip install cuda-python dlblas==0.0.6 dlslime==0.0.1.post10
+fi
+
+# install pre-built flash attention 3 wheel
+if [[ "${CUDA_VERSION_SHORT}" = "cu128" ]]; then
+    FA3_WHEELS_URL="https://windreamer.github.io/flash-attention3-wheels/cu128_torch280"
+    pip install flash_attn_3 --find-links ${FA3_WHEELS_URL} --extra-index-url https://download.pytorch.org/whl/cu128
+elif [[ "${CUDA_VERSION_SHORT}" = "cu130" ]]; then
+    FA3_WHEELS_URL="https://windreamer.github.io/flash-attention3-wheels/cu130_torch290"
+    pip install flash_attn_3 --find-links ${FA3_WHEELS_URL} --extra-index-url https://download.pytorch.org/whl/cu130
 fi

-# install pre-compiled flash attention wheel
+# install pre-built flash attention wheel
 PLATFORM="linux_x86_64"
 PY_VERSION=$(python3 - <<'PY'
 import torch, sys
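The new FA3 block gates the install on `CUDA_VERSION_SHORT` and picks a matching wheel index: the `--find-links` page supplies the pre-built `flash_attn_3` wheels, while `--extra-index-url` lets pip resolve any dependencies from the matching PyTorch index. A minimal standalone sketch of that selection logic, with the pip command replaced by an echo so it can run without network access (the `select_fa3_wheels_url` helper is hypothetical, not part of the script):

```shell
# Sketch of the CUDA-version gating used by the FA3 install step.
# select_fa3_wheels_url is a hypothetical helper; CUDA_VERSION_SHORT
# is assumed to be set by the surrounding Docker build, as in install.sh.
select_fa3_wheels_url() {
    case "$1" in
        cu128) echo "https://windreamer.github.io/flash-attention3-wheels/cu128_torch280" ;;
        cu130) echo "https://windreamer.github.io/flash-attention3-wheels/cu130_torch290" ;;
        *)     echo "" ;;   # no pre-built FA3 wheel for other CUDA versions
    esac
}

FA3_WHEELS_URL=$(select_fa3_wheels_url "${CUDA_VERSION_SHORT:-cu128}")
if [[ -n "${FA3_WHEELS_URL}" ]]; then
    echo "would run: pip install flash_attn_3 --find-links ${FA3_WHEELS_URL}"
fi
```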

docker/prepare_wheel.sh

Lines changed: 0 additions & 1 deletion

@@ -23,7 +23,6 @@ if [[ ${PYTHON_VERSION} = "3.13" ]]; then
 fi

 if [[ "${CUDA_VERSION_SHORT}" != "cu118" ]]; then
-
     GDRCOPY_VERSION=2.5.1
     DEEP_EP_VERSION=9af0e0d # v1.2.1
     DEEP_GEMM_VERSION=c9f8b34 # v2.1.1.post3
