Skip to content

Updating training script in basic_interactive notebook #545

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

Fiona-Waters
Copy link
Contributor

@Fiona-Waters Fiona-Waters commented May 23, 2024

Issue link

RHOAIENG-6475

What changes have been made

When running GPU utilising workloads using the basic_interactive demo notebook I encountered a lot of errors while running the training script provided in the notebook. As it was added more than a year ago and may be out of date I have updated the notebook to use ray train using these examples: Text classification, Hugging Face Transformers + Ray Train

I have also updated the hf_interactive notebook which was using the same training script.

Verification steps

  • Install RHOAI with the CodeFlare components set to Managed
  • Ensure that your cluster has GPU nodes as required.
  • Within a data science project workbench clone this branch and install codeflare sdk
git clone https://github.com/Fiona-Waters/codeflare-sdk.git -b udpate-notebook-script
cd codeflare-sdk
pip install poetry
poetry install
  • Run the basic_interactive notebook and ensure that it works as expected.
  • Note that aws credentials are required and you will need to create and use an s3 bucket while running the training script. See this PR for instructions.

Checks

  • I've made sure the tests are passing.
  • Testing Strategy
    • Unit tests
    • Manual tests
    • Testing is not required for this change

@Fiona-Waters Fiona-Waters force-pushed the udpate-notebook-script branch from 0f6b8eb to 9f339d6 Compare May 23, 2024 16:22
@Fiona-Waters Fiona-Waters requested a review from Bobbins228 May 23, 2024 16:27
@Fiona-Waters Fiona-Waters force-pushed the udpate-notebook-script branch 2 times, most recently from d47295b to 4e7a49e Compare May 29, 2024 10:02
@openshift-merge-robot openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label May 31, 2024
@Fiona-Waters Fiona-Waters force-pushed the udpate-notebook-script branch 2 times, most recently from 6126cc3 to 431d308 Compare June 17, 2024 15:51
@openshift-merge-robot openshift-merge-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jun 17, 2024
@Fiona-Waters Fiona-Waters force-pushed the udpate-notebook-script branch 5 times, most recently from 95addca to a074625 Compare June 18, 2024 15:29
Copy link
Contributor

@Bobbins228 Bobbins228 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had to test with the latest-py39-cu118 image. It worked like a charm.
Happy to lgtm once the images are set back.
Awesome work Fiona 🚀

Copy link
Contributor

@Bobbins228 Bobbins228 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Jun 19, 2024
@astefanutti
Copy link
Contributor

/retest

@Bobbins228
Copy link
Contributor

This PR just needs a rebase and it should pass the e2e test

@Fiona-Waters Fiona-Waters force-pushed the udpate-notebook-script branch from cf2e140 to b07f9aa Compare June 19, 2024 14:02
@openshift-ci openshift-ci bot removed the lgtm Indicates that a PR is ready to be merged. label Jun 19, 2024
@Fiona-Waters Fiona-Waters force-pushed the udpate-notebook-script branch from b07f9aa to 5c80181 Compare June 19, 2024 14:56
@astefanutti
Copy link
Contributor

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Jun 19, 2024
@astefanutti
Copy link
Contributor

/approve

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 19, 2024
Copy link
Contributor

openshift-ci bot commented Jun 19, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: astefanutti, Bobbins228

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [Bobbins228,astefanutti]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-merge-bot openshift-merge-bot bot merged commit 1e80e62 into project-codeflare:main Jun 19, 2024
4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants