[Core] Support ARM architecture #4835
Dockerfile
# Install kubectl based on architecture
ARCH=$(uname -m) && \
if [ "$ARCH" = "x86_64" ]; then \
curl -LO "https://dl.k8s.io/release/v1.31.6/bin/linux/amd64/kubectl"; \
I'm wondering if this can be simplified to: curl -LO "https://dl.k8s.io/release/v1.31.6/bin/linux/${TARGETARCH}/kubectl";
Good point! I simplified the implementation and made it more general.
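For readers following the thread, here is a minimal sketch of the idea being discussed, not the code that was merged. Docker BuildKit exposes a `TARGETARCH` build argument (for example `amd64` or `arm64`) once the Dockerfile declares `ARG TARGETARCH`, and those values match the directory names in the `dl.k8s.io` path. Outside a Docker build, the same mapping has to be derived from `uname -m`. The helper names `map_arch` and `kubectl_url` are hypothetical and exist only for illustration.

```shell
#!/bin/sh
# Hypothetical helper: translate `uname -m` output into the architecture
# names used in dl.k8s.io download paths (the same names TARGETARCH uses).
map_arch() {
  case "$1" in
    x86_64) echo "amd64" ;;
    aarch64|arm64) echo "arm64" ;;
    *) echo "unsupported architecture: $1" >&2; return 1 ;;
  esac
}

# Hypothetical helper: build the kubectl download URL for a machine type.
# The version is the one pinned in the diff under review.
kubectl_url() {
  arch=$(map_arch "$1") || return 1
  echo "https://dl.k8s.io/release/v1.31.6/bin/linux/${arch}/kubectl"
}

# Print the URL for the current machine; pass the result to `curl -LO`.
kubectl_url "$(uname -m)"
```

Inside a Dockerfile, the `case` statement is unnecessary because `${TARGETARCH}` already carries the mapped value, which is what makes the one-line form suggested above possible.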
Awesome @Michaelvll!
/smoke-test --kubernetes
Thanks @Michaelvll! Super excited to make GH200s go brrrr
lgtm, thanks @Michaelvll !
Mitigates #4793
Fixes #4601
This enables SkyPilot clusters to run on the ARM architecture with the default image.
ARM support is becoming more important because NVIDIA's GH200 and GB200 (offered by Lambda Cloud and GCP) come with ARM CPUs by default, but our Docker image does not support ARM CPUs well.
The original k8s GPU image does not support the ARM architecture.
Error:
TODO (future PRs):
sky local up --ips works for remote ARM machines
sky launch -t c6g.large works if we specify --image-id with an ARM-based deep learning image, but does not work with our default image
TODO before merging:
latest tag.
Tested (run the relevant ones):
bash format.sh
cd sky/clouds/service_catalogs/images/; ./skypilot-k8s-image.sh -p -g
cd sky/clouds/service_catalogs/images/; ./skypilot-k8s-image.sh -p
sky launch --cloud kubernetes --cpus 1 --image-id docker:us-docker.pkg.dev/sky-dev-465/skypilotk8s/skypilot-gpu:20250227 echo hi
sky autostop sky-b694-ubuntu --down -i 0
sky launch --cloud kubernetes --cpus 1 --image-id docker:us-docker.pkg.dev/sky-dev-465/skypilotk8s/skypilot:20250227 echo hi
sky autostop sky-b694-ubuntu --down -i 0
docker build . on Mac and AMD64 Linux
sky launch --cloud lambda --gpus Gh200 --image-id docker:us-docker.pkg.dev/sky-dev-465/skypilotk8s/skypilot-gpu:20250227 nvidia-smi -c test-lambda-arm, then ssh into the machine and uname -m shows aarch64, i.e. ARM architecture
sky launch --gpus Gh200 --cloud lambda examples/using_file_mounts.yaml -c test-fm-arm --down
local up on the machine: sky local up --ips
sky launch --gpus gh200 --cloud kubernetes nvidia-smi
pytest tests/test_smoke.py
pytest tests/test_smoke.py --kubernetes with the new images
pytest tests/test_smoke.py::test_fill_in_the_name
conda deactivate; bash -i tests/backward_compatibility_tests.sh