-
Notifications
You must be signed in to change notification settings - Fork 125
RHAIENG-1512: fix(ROCm/TensorFlow) repository URL to install ROCm 6.3.4 as in AIPCC base image #2595
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
WalkthroughPins ROCm tooling and TensorFlow-ROCm to v6.3 / 2.17.0: base image tags and tensorflow-rocm wheel references changed, tensorboard and numpy constraints downgraded, Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
….4 as in AIPCC base image (opendatahub-io#2595)
…otherwise we can't align everything together (opendatahub-io#2595) ``` /Users/jdanek/IdeaProjects/notebooks/jupyter/rocm/tensorflow/ubi9-python-3.12 × No solution found when resolving dependencies: ╰─▶ Because only feast<=0.55.0 is available and feast==0.55.0 depends on numpy>=2.0.0, we can conclude that feast>=0.55.0 depends on numpy>=2.0.0. And because rocm-tensorflow-notebook depends on numpy>=1.26.4,<1.27.dev0 and feast>=0.55.0, we can conclude that your requirements are unsatisfiable. hint: `feast` was requested with a pre-release marker (e.g., feast>0.55.0,<0.56.dev0), but pre-releases weren't enabled (try: `--prerelease=allow`) ```
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
manifests/base/jupyter-rocm-tensorflow-notebook-imagestream.yaml (1)
28-46: Critical inconsistency: Feast removal incomplete and creates dependency mismatch.Feast has been removed from the
jupyter-rocm-tensorflow-notebook-imagestream.yamlmanifest, but this removal is inconsistent with the rest of the codebase:
- Still present in other imagestream manifests: Feast (v0.55) remains declared in
jupyter-rocm-pytorch-notebook-imagestream.yaml,jupyter-tensorflow-notebook-imagestream.yaml,jupyter-pytorch-llmcompressor-imagestream.yaml,jupyter-pytorch-notebook-imagestream.yaml, andjupyter-datascience-notebook-imagestream.yaml- Still in underlying dependencies: All
pyproject.tomlandpylock.tomlfiles across jupyter, runtimes, and codeserver still specifyfeast~=0.55.0, including rocm-tensorflow- Partial coverage: This creates a mismatch where the rocm-tensorflow imagestream manifest declares Feast removed, but the actual build dependencies still include it
Either remove Feast from all similar imagestreams and their dependency files, or retain it consistently across all. The current partial removal risks breaking workflows and creates confusion across similar notebook types.
🧹 Nitpick comments (2)
runtimes/rocm-tensorflow/ubi9-python-3.12/pyproject.toml (1)
11-12: Consider adding SHA256 checksum for security.Adding the SHA256 checksum to the wheel URL would improve supply chain security by ensuring the downloaded package hasn't been tampered with. The checksum is already identified in the TODO comment.
Apply this diff:
- # TODO(jdanek): consider adding #sha256=814a5e4842b0c92d63c7a0bb4df0baf51ff2db8615535d83fe8958204b840598 - "tensorflow-rocm @ https://repo.radeon.com/rocm/manylinux/rocm-rel-6.3/tensorflow_rocm-2.17.0-cp312-cp312-manylinux_2_28_x86_64.whl", + "tensorflow-rocm @ https://repo.radeon.com/rocm/manylinux/rocm-rel-6.3/tensorflow_rocm-2.17.0-cp312-cp312-manylinux_2_28_x86_64.whl#sha256=814a5e4842b0c92d63c7a0bb4df0baf51ff2db8615535d83fe8958204b840598",jupyter/rocm/tensorflow/ubi9-python-3.12/pyproject.toml (1)
11-12: Consider adding SHA256 checksum for security.Adding the SHA256 checksum to the wheel URL would improve supply chain security by ensuring the downloaded package hasn't been tampered with.
Apply this diff:
- # TODO(jdanek): consider adding #sha256=814a5e4842b0c92d63c7a0bb4df0baf51ff2db8615535d83fe8958204b840598 - "tensorflow-rocm @ https://repo.radeon.com/rocm/manylinux/rocm-rel-6.3/tensorflow_rocm-2.17.0-cp312-cp312-manylinux_2_28_x86_64.whl", + "tensorflow-rocm @ https://repo.radeon.com/rocm/manylinux/rocm-rel-6.3/tensorflow_rocm-2.17.0-cp312-cp312-manylinux_2_28_x86_64.whl#sha256=814a5e4842b0c92d63c7a0bb4df0baf51ff2db8615535d83fe8958204b840598",
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (6)
jupyter/rocm/tensorflow/ubi9-python-3.12/build-args/rocm.conf(1 hunks)jupyter/rocm/tensorflow/ubi9-python-3.12/pyproject.toml(1 hunks)manifests/base/jupyter-rocm-tensorflow-notebook-imagestream.yaml(1 hunks)runtimes/rocm-tensorflow/ubi9-python-3.12/build-args/rocm.conf(1 hunks)runtimes/rocm-tensorflow/ubi9-python-3.12/pyproject.toml(1 hunks)tests/test_main.py(2 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: validation-of-sw-versions-in-imagestreams
🔇 Additional comments (10)
runtimes/rocm-tensorflow/ubi9-python-3.12/build-args/rocm.conf (1)
1-1: LGTM: Base image updated to ROCm 6.3.The base image version change from v6.2 to v6.3 aligns with the PR objective to use ROCm 6.3.4.
jupyter/rocm/tensorflow/ubi9-python-3.12/build-args/rocm.conf (1)
1-1: LGTM: Base image updated to ROCm 6.3.The base image version change from v6.4 to v6.3 aligns with the PR objective.
manifests/base/jupyter-rocm-tensorflow-notebook-imagestream.yaml (1)
25-25: LGTM: TensorFlow, TensorBoard, and NumPy versions aligned with ROCm 6.3.The dependency versions have been correctly downgraded to match ROCm 6.3 compatibility:
- TensorFlow-ROCm: 2.17
- TensorBoard: 2.17
- NumPy: 1.26
Also applies to: 31-32, 35-35
runtimes/rocm-tensorflow/ubi9-python-3.12/pyproject.toml (2)
12-14: LGTM: TensorFlow, TensorBoard, and NumPy versions aligned with ROCm 6.3.The dependency downgrades are consistent with the ROCm 6.3 base image change.
Also applies to: 20-20
24-28: Verify necessity of new dependencies.Several new packages have been added (scipy, skl2onnx, onnxconverter-common, codeflare-sdk) that weren't in the previous version. Ensure these are required for ROCm 6.3 compatibility or documented use cases.
tests/test_main.py (3)
233-233: LGTM: Test expectations updated for TensorBoard 2.17.The addition of "2.17" to the accepted TensorBoard versions correctly reflects the dependency downgrade in the ROCm 6.3 images.
279-279: LGTM: Test expectations updated for TensorBoard specifier.The addition of "~=2.17.0" to the accepted TensorBoard specifiers aligns with the pyproject.toml changes.
286-293: LGTM: NumPy expectations updated.The removal of "~=2.0.2" from the NumPy exceptions list correctly reflects that all ROCm TensorFlow images now use NumPy ~=1.26.4.
jupyter/rocm/tensorflow/ubi9-python-3.12/pyproject.toml (2)
12-14: LGTM: TensorFlow, TensorBoard, and NumPy versions aligned with ROCm 6.3.The dependency versions are correctly aligned with the ROCm 6.3 base image.
Also applies to: 21-21
25-28: Verify necessity of new dependencies.Several new packages have been added (scipy, skl2onnx, onnxconverter-common, codeflare-sdk). Ensure these additions are required for the notebook image functionality or documented in the PR description.
manifests/base/jupyter-rocm-tensorflow-notebook-imagestream.yaml
Outdated
Show resolved
Hide resolved
….4 as in AIPCC base image (opendatahub-io#2595)
…otherwise we can't align everything together (opendatahub-io#2595) ``` /Users/jdanek/IdeaProjects/notebooks/jupyter/rocm/tensorflow/ubi9-python-3.12 × No solution found when resolving dependencies: ╰─▶ Because only feast<=0.55.0 is available and feast==0.55.0 depends on numpy>=2.0.0, we can conclude that feast>=0.55.0 depends on numpy>=2.0.0. And because rocm-tensorflow-notebook depends on numpy>=1.26.4,<1.27.dev0 and feast>=0.55.0, we can conclude that your requirements are unsatisfiable. hint: `feast` was requested with a pre-release marker (e.g., feast>0.55.0,<0.56.dev0), but pre-releases weren't enabled (try: `--prerelease=allow`) ```
atheo89
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
…uteError: module 'ml_dtypes' has no attribute 'float4_e2m1fn' ``` File /opt/app-root/lib64/python3.12/site-packages/onnx/_mapping.py:104 AttributeError: module 'ml_dtypes' has no attribute 'float4_e2m1fn' ``` * jax-ml/ml_dtypes#181 It seems that ml-types only added this in 0.5.0 but in our tf rocm images it resolves to 0.4.1. ``` /Users/jdanek/IdeaProjects/notebooks/runtimes/rocm-tensorflow/ubi9-python-3.12 × No solution found when resolving dependencies: ╰─▶ Because tensorflow-rocm==2.17.0 depends on ml-dtypes>=0.3.1,<0.5.0 and ml-dtypes>=0.5.0, we can conclude that tensorflow-rocm==2.17.0 cannot be used. And because only tensorflow-rocm==2.17.0 is available and rocm-tensorflow-ubi9-python-3-12 depends on tensorflow-rocm, we can conclude that your requirements are unsatisfiable. ``` That means we cannot upgrade ml-dtypes, therefore we have to downgrade onnx * onnx/onnx#7089
|
@jiridanek: The following test failed, say
Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
daniellutz
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
approved!
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: atheo89, daniellutz The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Description
Based on
Push builds for the bases (currently running)
The ONNX issue fixed in the last commit of the series is https://github.com/opendatahub-io/notebooks/actions/runs/18598682230/job/53031524655?pr=2595#step:36:630
How Has This Been Tested?
Self checklist (all need to be checked):
make test(gmakeon macOS) before asking for reviewDockerfile.konfluxfiles should be done inodh/notebooksand automatically synced torhds/notebooks. For Konflux-specific changes, modifyDockerfile.konfluxfiles directly inrhds/notebooksas these require special attention in the downstream repository and flow to the upcoming RHOAI release.Merge criteria:
Summary by CodeRabbit
Chores
Tests