@jiridanek
Member

@jiridanek jiridanek commented Oct 17, 2025

Description

Based on

Push builds for the bases (currently running)

The ONNX issue fixed in the last commit of the series is https://github.com/opendatahub-io/notebooks/actions/runs/18598682230/job/53031524655?pr=2595#step:36:630

How Has This Been Tested?

Self checklist (all need to be checked):

  • Ensure that you have run make test (gmake on macOS) before asking for review
  • Changes to everything except Dockerfile.konflux files should be done in odh/notebooks and automatically synced to rhds/notebooks. For Konflux-specific changes, modify Dockerfile.konflux files directly in rhds/notebooks as these require special attention in the downstream repository and flow to the upcoming RHOAI release.

Merge criteria:

  • The commits are squashed in a cohesive manner and have meaningful messages.
  • Testing instructions have been added in the PR body (for PRs involving changes that are not immediately obvious).
  • The developer has manually tested the changes and verified that the changes work

Summary by CodeRabbit

  • Chores

    • Updated ROCm base images to v6.3 across runtimes
    • Downgraded TensorFlow-ROCm to 2.17.0 and TensorBoard to 2.17.0
    • Relaxed NumPy constraint to 1.26.4
    • Removed Feast dependency
    • Added ONNX constraint (<1.19.0) and a placeholder note for the TF-ROCm wheel checksum
  • Tests

    • Aligned test expectations with the updated dependency versions

@openshift-ci openshift-ci bot requested review from atheo89 and daniellutz October 17, 2025 15:27
@github-actions github-actions bot added the review-requested GitHub Bot creates notification on #pr-review-ai-ide-team slack channel label Oct 17, 2025
@coderabbitai
Contributor

coderabbitai bot commented Oct 17, 2025

Walkthrough

Pins ROCm tooling and TensorFlow-ROCm to v6.3 / 2.17.0: base image tags and tensorflow-rocm wheel references changed, tensorboard and numpy constraints downgraded, feast removed, an onnx<1.19.0 constraint added, and manifests/tests updated to match the new versions.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| ROCm Base Image Configuration: jupyter/rocm/tensorflow/ubi9-python-3.12/build-args/rocm.conf, runtimes/rocm-tensorflow/ubi9-python-3.12/build-args/rocm.conf | Updated BASE_IMAGE tags: v6.4 → v6.3 (jupyter); v6.2 → v6.3 (runtimes). |
| Python Project Dependencies (pyproject.toml): jupyter/rocm/tensorflow/ubi9-python-3.12/pyproject.toml, runtimes/rocm-tensorflow/ubi9-python-3.12/pyproject.toml | tensorflow-rocm wheel reference changed from rocm-rel-6.4/2.18.1 → rocm-rel-6.3/2.17.0; tensorboard ~2.18.0 → ~2.17.0; numpy constraint changed to ~=1.26.4; removed feast~=0.55.0; added constraint-dependencies blocks with onnx < 1.19.0; added TODO/SHA256 placeholder comments. |
| Kubernetes Manifests / ImageStream: manifests/base/jupyter-rocm-tensorflow-notebook-imagestream.yaml | Updated notebook image stream entries: ROCm v6.4 → v6.3; TensorFlow-ROCm 2.18 → 2.17; TensorBoard 2.18 → 2.17; Numpy 2.0 → 1.26; removed feast from python dependencies. |
| Tests / Expectations: tests/test_main.py | Adjusted test vectors and expectations to reflect tensorboard ~2.17.0 and removal of explicit numpy-rocm 2.0.2 from accepted-version lists and related allowed-version/ignored-exception entries. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Description Check | ⚠️ Warning | The pull request description is incomplete against the provided template. While the author has provided issue references, upstream PR links, and Konflux pipeline URLs, critical required sections are missing or insufficient. The "How Has This Been Tested?" section is largely empty with only the template text remaining, and no explicit testing instructions are provided for the non-obvious changes (version downgrade from ROCm 6.4 to 6.3, dependency updates, and ONNX constraint additions). The "Description" section lacks detailed explanation of what was changed and why. Additionally, the merge criteria checklist indicates that testing instructions have not been added and manual verification is not marked as complete, both of which are requirements in the template. | The author should update the PR description to include: a detailed description of the changes and their rationale, explicit testing instructions in the "How Has This Been Tested?" section explaining how the ROCm 6.3.4 integration and dependency changes were validated, and complete the merge criteria checklist by either adding testing instructions and confirming manual verification or explaining why these items cannot be checked. The raw summary shows significant changes to configuration files and dependencies that warrant clear testing documentation in the PR body. |
✅ Passed checks (2 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Title Check | ✅ Passed | The pull request title "RHAIENG-1512: fix(ROCm/TensorFlow) repository URL to install ROCm 6.3.4 as in AIPCC base image" directly and specifically describes the primary objective of the changeset. The title clearly indicates the version change (ROCm 6.3.4) and the rationale (aligning with AIPCC base image), which aligns with the actual changes across configuration files and dependency declarations. The title is concise, descriptive, and avoids vague language. It provides sufficient context for a developer scanning the commit history to understand the main change. |
| Docstring Coverage | ✅ Passed | No functions found in the changes. Docstring coverage check skipped. |

@openshift-ci openshift-ci bot added size/xxl and removed size/xxl labels Oct 17, 2025
jiridanek pushed a commit to jiridanek/notebooks that referenced this pull request Oct 17, 2025
jiridanek added a commit to jiridanek/notebooks that referenced this pull request Oct 17, 2025
…otherwise we can't align everything together (opendatahub-io#2595)

```
/Users/jdanek/IdeaProjects/notebooks/jupyter/rocm/tensorflow/ubi9-python-3.12
  × No solution found when resolving dependencies:
  ╰─▶ Because only feast<=0.55.0 is available and feast==0.55.0 depends on numpy>=2.0.0, we can conclude that feast>=0.55.0 depends on numpy>=2.0.0.
      And because rocm-tensorflow-notebook depends on numpy>=1.26.4,<1.27.dev0 and feast>=0.55.0, we can conclude that your requirements are unsatisfiable.

      hint: `feast` was requested with a pre-release marker (e.g., feast>0.55.0,<0.56.dev0), but pre-releases weren't enabled (try: `--prerelease=allow`)
```
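The resolver output above boils down to two version ranges for numpy that do not intersect. A minimal sketch of that check in Python follows. It uses plain integer tuples instead of a real PEP 440 parser, the helper name `ranges_overlap` is hypothetical, and the `.dev0` upper bound is approximated as `<1.27`:

```python
from typing import Optional, Tuple

Version = Tuple[int, ...]


def ranges_overlap(
    lo_a: Version, hi_a: Optional[Version],
    lo_b: Version, hi_b: Optional[Version],
) -> bool:
    """Return True if half-open ranges [lo_a, hi_a) and [lo_b, hi_b) share a version.

    An upper bound of None means "unbounded above".
    """
    lo = max(lo_a, lo_b)
    uppers = [h for h in (hi_a, hi_b) if h is not None]
    if not uppers:
        return True
    return lo < min(uppers)


# feast==0.55.0 requires numpy>=2.0.0 (taken from the resolver output).
feast_numpy = ((2, 0, 0), None)
# The project requires numpy>=1.26.4,<1.27.dev0 (approximated here as <1.27).
project_numpy = ((1, 26, 4), (1, 27))

print(ranges_overlap(*feast_numpy, *project_numpy))  # prints False
```

Since the ranges are disjoint, keeping numpy at 1.26.x leaves no feast release that can be installed, which is consistent with the removal of feast in this PR.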
@openshift-ci openshift-ci bot added size/xxl and removed size/xxl labels Oct 17, 2025
@jiridanek jiridanek added the tide/merge-method-rebase Denotes a PR that should be rebased by tide when it merges. label Oct 17, 2025
@openshift-ci openshift-ci bot added size/xxl and removed size/xxl labels Oct 17, 2025
Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
manifests/base/jupyter-rocm-tensorflow-notebook-imagestream.yaml (1)

28-46: Critical inconsistency: Feast removal incomplete and creates dependency mismatch.

Feast has been removed from the jupyter-rocm-tensorflow-notebook-imagestream.yaml manifest, but this removal is inconsistent with the rest of the codebase:

  • Still present in other imagestream manifests: Feast (v0.55) remains declared in jupyter-rocm-pytorch-notebook-imagestream.yaml, jupyter-tensorflow-notebook-imagestream.yaml, jupyter-pytorch-llmcompressor-imagestream.yaml, jupyter-pytorch-notebook-imagestream.yaml, and jupyter-datascience-notebook-imagestream.yaml
  • Still in underlying dependencies: All pyproject.toml and pylock.toml files across jupyter, runtimes, and codeserver still specify feast~=0.55.0, including rocm-tensorflow
  • Partial coverage: This creates a mismatch where the rocm-tensorflow imagestream manifest declares Feast removed, but the actual build dependencies still include it

Either remove Feast from all similar imagestreams and their dependency files, or retain it consistently across all. The current partial removal risks breaking workflows and creates confusion across similar notebook types.

🧹 Nitpick comments (2)
runtimes/rocm-tensorflow/ubi9-python-3.12/pyproject.toml (1)

11-12: Consider adding SHA256 checksum for security.

Adding the SHA256 checksum to the wheel URL would improve supply chain security by ensuring the downloaded package hasn't been tampered with. The checksum is already identified in the TODO comment.

Apply this diff:

-    # TODO(jdanek): consider adding #sha256=814a5e4842b0c92d63c7a0bb4df0baf51ff2db8615535d83fe8958204b840598
-    "tensorflow-rocm @ https://repo.radeon.com/rocm/manylinux/rocm-rel-6.3/tensorflow_rocm-2.17.0-cp312-cp312-manylinux_2_28_x86_64.whl",
+    "tensorflow-rocm @ https://repo.radeon.com/rocm/manylinux/rocm-rel-6.3/tensorflow_rocm-2.17.0-cp312-cp312-manylinux_2_28_x86_64.whl#sha256=814a5e4842b0c92d63c7a0bb4df0baf51ff2db8615535d83fe8958204b840598",
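As an illustration of what the `#sha256=` fragment buys, here is a minimal Python sketch of checking a downloaded wheel against the digest carried in the URL. The helper names `split_expected_sha256` and `file_sha256` are hypothetical and are not part of this repository or of any installer; installers that honor the fragment perform an equivalent comparison internally.

```python
import hashlib
from urllib.parse import urldefrag


def split_expected_sha256(url: str):
    """Split 'https://...whl#sha256=<hex>' into (url_without_fragment, hex_or_None)."""
    base, fragment = urldefrag(url)
    if fragment.startswith("sha256="):
        return base, fragment[len("sha256="):]
    return base, None


def file_sha256(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so that large wheels need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

A caller would download the wheel from the fragment-free URL, then compare `file_sha256(local_path)` with the expected hex string and refuse to install on mismatch.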
jupyter/rocm/tensorflow/ubi9-python-3.12/pyproject.toml (1)

11-12: Consider adding SHA256 checksum for security.

Adding the SHA256 checksum to the wheel URL would improve supply chain security by ensuring the downloaded package hasn't been tampered with.

Apply this diff:

-    # TODO(jdanek): consider adding #sha256=814a5e4842b0c92d63c7a0bb4df0baf51ff2db8615535d83fe8958204b840598
-    "tensorflow-rocm @ https://repo.radeon.com/rocm/manylinux/rocm-rel-6.3/tensorflow_rocm-2.17.0-cp312-cp312-manylinux_2_28_x86_64.whl",
+    "tensorflow-rocm @ https://repo.radeon.com/rocm/manylinux/rocm-rel-6.3/tensorflow_rocm-2.17.0-cp312-cp312-manylinux_2_28_x86_64.whl#sha256=814a5e4842b0c92d63c7a0bb4df0baf51ff2db8615535d83fe8958204b840598",
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 475983e and 03a2f7e.

📒 Files selected for processing (6)
  • jupyter/rocm/tensorflow/ubi9-python-3.12/build-args/rocm.conf (1 hunks)
  • jupyter/rocm/tensorflow/ubi9-python-3.12/pyproject.toml (1 hunks)
  • manifests/base/jupyter-rocm-tensorflow-notebook-imagestream.yaml (1 hunks)
  • runtimes/rocm-tensorflow/ubi9-python-3.12/build-args/rocm.conf (1 hunks)
  • runtimes/rocm-tensorflow/ubi9-python-3.12/pyproject.toml (1 hunks)
  • tests/test_main.py (2 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: validation-of-sw-versions-in-imagestreams
🔇 Additional comments (10)
runtimes/rocm-tensorflow/ubi9-python-3.12/build-args/rocm.conf (1)

1-1: LGTM: Base image updated to ROCm 6.3.

The base image version change from v6.2 to v6.3 aligns with the PR objective to use ROCm 6.3.4.

jupyter/rocm/tensorflow/ubi9-python-3.12/build-args/rocm.conf (1)

1-1: LGTM: Base image updated to ROCm 6.3.

The base image version change from v6.4 to v6.3 aligns with the PR objective.

manifests/base/jupyter-rocm-tensorflow-notebook-imagestream.yaml (1)

25-25: LGTM: TensorFlow, TensorBoard, and NumPy versions aligned with ROCm 6.3.

The dependency versions have been correctly downgraded to match ROCm 6.3 compatibility:

  • TensorFlow-ROCm: 2.17
  • TensorBoard: 2.17
  • NumPy: 1.26

Also applies to: 31-32, 35-35

runtimes/rocm-tensorflow/ubi9-python-3.12/pyproject.toml (2)

12-14: LGTM: TensorFlow, TensorBoard, and NumPy versions aligned with ROCm 6.3.

The dependency downgrades are consistent with the ROCm 6.3 base image change.

Also applies to: 20-20


24-28: Verify necessity of new dependencies.

Several new packages have been added (scipy, skl2onnx, onnxconverter-common, codeflare-sdk) that weren't in the previous version. Ensure these are required for ROCm 6.3 compatibility or documented use cases.

tests/test_main.py (3)

233-233: LGTM: Test expectations updated for TensorBoard 2.17.

The addition of "2.17" to the accepted TensorBoard versions correctly reflects the dependency downgrade in the ROCm 6.3 images.


279-279: LGTM: Test expectations updated for TensorBoard specifier.

The addition of "~=2.17.0" to the accepted TensorBoard specifiers aligns with the pyproject.toml changes.


286-293: LGTM: NumPy expectations updated.

The removal of "~=2.0.2" from the NumPy exceptions list correctly reflects that all ROCm TensorFlow images now use NumPy ~=1.26.4.
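The `~=` specifiers mentioned in these comments are compatible-release clauses. A simplified sketch of how `~=2.17.0` or `~=1.26.4` translates into a version range is shown below. It assumes plain numeric releases only (no epochs, pre-releases, or post-releases), and the function names are hypothetical rather than taken from `tests/test_main.py`:

```python
from typing import Tuple

Release = Tuple[int, ...]


def compatible_release_bounds(version: str) -> Tuple[Release, Release]:
    """Expand '~=X.Y.Z' into (inclusive lower, exclusive upper) release tuples.

    Simplified reading of PEP 440: drop the last component and bump the one before it.
    """
    parts = tuple(int(p) for p in version.split("."))
    if len(parts) < 2:
        raise ValueError("~= needs at least two release components")
    upper = parts[:-2] + (parts[-2] + 1,)
    return parts, upper


def is_accepted(candidate: str, spec_version: str) -> bool:
    """Check whether a plain numeric candidate version satisfies '~=spec_version'."""
    lower, upper = compatible_release_bounds(spec_version)
    cand = tuple(int(p) for p in candidate.split("."))
    # Pad with zeros so that "2.17" compares equal to "2.17.0".
    cand += (0,) * (len(lower) - len(cand))
    return lower <= cand < upper


print(is_accepted("2.17.1", "2.17.0"))  # prints True
print(is_accepted("2.0.2", "1.26.4"))   # prints False
```

Under this reading, `~=1.26.4` corresponds to `>=1.26.4,<1.27`, which matches the `numpy>=1.26.4,<1.27.dev0` bound reported by the resolver earlier in the conversation.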

jupyter/rocm/tensorflow/ubi9-python-3.12/pyproject.toml (2)

12-14: LGTM: TensorFlow, TensorBoard, and NumPy versions aligned with ROCm 6.3.

The dependency versions are correctly aligned with the ROCm 6.3 base image.

Also applies to: 21-21


25-28: Verify necessity of new dependencies.

Several new packages have been added (scipy, skl2onnx, onnxconverter-common, codeflare-sdk). Ensure these additions are required for the notebook image functionality or documented in the PR description.

@openshift-ci openshift-ci bot added size/xxl and removed size/xxl labels Oct 17, 2025
daniellutz and others added 2 commits October 17, 2025 17:57
…otherwise we can't align everything together (opendatahub-io#2595)

```
/Users/jdanek/IdeaProjects/notebooks/jupyter/rocm/tensorflow/ubi9-python-3.12
  × No solution found when resolving dependencies:
  ╰─▶ Because only feast<=0.55.0 is available and feast==0.55.0 depends on numpy>=2.0.0, we can conclude that feast>=0.55.0 depends on numpy>=2.0.0.
      And because rocm-tensorflow-notebook depends on numpy>=1.26.4,<1.27.dev0 and feast>=0.55.0, we can conclude that your requirements are unsatisfiable.

      hint: `feast` was requested with a pre-release marker (e.g., feast>0.55.0,<0.56.dev0), but pre-releases weren't enabled (try: `--prerelease=allow`)
```
@openshift-ci openshift-ci bot added size/xxl and removed size/xxl labels Oct 17, 2025
Member

@atheo89 atheo89 left a comment

/lgtm

@openshift-ci-robot
/retest-required

Remaining retests: 0 against base HEAD 475983e and 2 for PR HEAD 03a2f7e in total

…uteError: module 'ml_dtypes' has no attribute 'float4_e2m1fn'

```
File /opt/app-root/lib64/python3.12/site-packages/onnx/_mapping.py:104

AttributeError: module 'ml_dtypes' has no attribute 'float4_e2m1fn'
```

* jax-ml/ml_dtypes#181

It seems that ml-dtypes only added this in 0.5.0, but in our TF ROCm images it resolves to 0.4.1.

```
/Users/jdanek/IdeaProjects/notebooks/runtimes/rocm-tensorflow/ubi9-python-3.12
  × No solution found when resolving dependencies:
  ╰─▶ Because tensorflow-rocm==2.17.0 depends on ml-dtypes>=0.3.1,<0.5.0 and ml-dtypes>=0.5.0, we can conclude that tensorflow-rocm==2.17.0 cannot be used.
      And because only tensorflow-rocm==2.17.0 is available and rocm-tensorflow-ubi9-python-3-12 depends on tensorflow-rocm, we can conclude that your requirements are unsatisfiable.
```

That means we cannot upgrade ml-dtypes, so we have to downgrade onnx instead.

* onnx/onnx#7089
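To confirm whether an image is affected, one option is to probe the installed `ml_dtypes` for the attribute named in the traceback. The following is a minimal sketch; the helper name `module_has_attr` is hypothetical, and it only reports what is importable in the current environment:

```python
import importlib
from typing import Optional


def module_has_attr(module_name: str, attr: str) -> Optional[bool]:
    """Return True/False if the module imports and has/lacks attr; None if it is not installed."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    return hasattr(module, attr)


# Per the traceback, onnx/_mapping.py expects this attribute on ml_dtypes.
print(module_has_attr("ml_dtypes", "float4_e2m1fn"))
```

A result of `False` would indicate an ml-dtypes release that lacks the attribute, in which case an onnx release that requires it fails at import time, as in the error above.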
@openshift-ci openshift-ci bot added size/xxl and removed size/xxl labels Oct 17, 2025
@openshift-ci
Contributor

openshift-ci bot commented Oct 17, 2025

@jiridanek: The following test failed. Say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/images | 9958ffb | link | true | /test images |

Contributor

@daniellutz daniellutz left a comment

approved!

@openshift-ci
Contributor

openshift-ci bot commented Oct 17, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: atheo89, daniellutz

@atheo89 atheo89 merged commit 9bfef1f into opendatahub-io:main Oct 17, 2025
14 of 18 checks passed
@jiridanek jiridanek deleted the jd_rocm_63 branch October 18, 2025 09:32
Labels

approved lgtm review-requested GitHub Bot creates notification on #pr-review-ai-ide-team slack channel size/xxl tide/merge-method-rebase Denotes a PR that should be rebased by tide when it merges.

4 participants