feat: add code samples for tuning with intermediate checkpoints #13366

Merged
glasnt merged 3 commits into GoogleCloudPlatform:main on May 14, 2025

Conversation

Contributor

@yishan-pu yishan-pu commented May 13, 2025

Description

Fixes #

Note: Before submitting a pull request, please open an issue for discussion if you are not associated with Google.

Checklist

@yishan-pu yishan-pu requested review from a team as code owners May 13, 2025 23:45
snippet-bot bot commented May 13, 2025

Here is the summary of changes.

You are about to add 5 region tags.

This comment is generated by snippet-bot.
If you find problems with this result, please file an issue at:
https://github.com/googleapis/repo-automation-bots/issues.
To update this comment, add snippet-bot:force-run label or use the checkbox below:

  • Refresh this comment

@product-auto-label product-auto-label bot added the samples Issues that are directly related to samples. label May 13, 2025
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Hello @yishan-pu, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

Summary of Changes

This pull request introduces code samples for tuning models with intermediate checkpoints using the Gen AI SDK (google-genai). It adds new files demonstrating how to create a tuning job that exports intermediate checkpoints, retrieve a tuned model with checkpoints, list checkpoints, set a default checkpoint, and test a model against different checkpoints. Additionally, it modifies an existing file to print checkpoint information.

Highlights

  • New Samples: Adds new code samples for creating tuning jobs with checkpoints, retrieving tuned models, listing checkpoints, setting default checkpoints, and testing models against checkpoints.
  • Checkpoint Handling: Demonstrates how to access and utilize intermediate checkpoints during model tuning.
  • API Usage: Showcases the usage of the Gen AI SDK (google-genai) for tuning jobs and model management.

Changelog

  • genai/tuning/tuning_job_create.py
    • Added code to print checkpoint information if checkpoints are available in the tuning job.
  • genai/tuning/tuning_with_checkpoints_create.py
    • Created a new sample demonstrating how to create a tuning job that exports intermediate checkpoints.
    • Includes code to monitor the tuning job's state and print model, endpoint, and experiment information.
    • Prints checkpoint details if available.
  • genai/tuning/tuning_with_checkpoints_get_model.py
    • Created a new sample demonstrating how to retrieve a tuned model with checkpoint information.
    • Prints the default checkpoint ID and details of all available checkpoints.
  • genai/tuning/tuning_with_checkpoints_list_checkpoints.py
    • Created a new sample demonstrating how to list the checkpoints associated with a tuning job.
    • Prints details of each checkpoint.
  • genai/tuning/tuning_with_checkpoints_set_default_checkpoint.py
    • Created a new sample demonstrating how to set the default checkpoint for a tuned model.
    • Updates the model configuration with the new default checkpoint ID.
  • genai/tuning/tuning_with_checkpoints_textgen_with_txt.py
    • Created a new sample demonstrating how to test a tuned model against different checkpoints.
    • Generates content using the default checkpoint and specific intermediate checkpoints.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point in your pull request by creating an issue comment (i.e., a comment on the pull request page) using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands.

| Feature | Command | Description |
| --- | --- | --- |
| Code Review | /gemini review | Performs a code review for the current pull request in its current state. |
| Pull Request Summary | /gemini summary | Provides a summary of the current pull request in its current state. |
| Comment | @gemini-code-assist | Responds in comments when explicitly tagged, both in issue comments and review comments. |
| Help | /gemini help | Displays a list of available commands. |

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist is currently in preview and may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments to provide feedback.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.


Checkpoints mark the way,
Through tuning's winding maze,
Models learn and grow.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

The pull request introduces code samples for tuning with intermediate checkpoints, which is a valuable addition to the documentation. The new files demonstrate how to create tuning jobs with checkpoints, retrieve tuned models, list checkpoints, set default checkpoints, and test checkpoints. Overall, the code is well-structured and easy to understand. However, there are a few areas that could be improved for clarity and efficiency.

Summary of Findings

  • Code Duplication: The code blocks for printing checkpoint information are duplicated in tuning_job_create.py and tuning_with_checkpoints_create.py. Consider refactoring this into a shared function or utility.
  • Error Handling: The input() calls in the if __name__ == "__main__" blocks lack error handling. Add error handling to gracefully handle invalid user input.
  • Hardcoded Indices: The test_checkpoint function uses hardcoded indices to access checkpoints. Consider iterating through the checkpoints dynamically to make the code more flexible.
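The duplication and hardcoded-index findings above could be handled by one shared helper. The following is a minimal sketch, not code from this pull request: the `Checkpoint` dataclass and the `format_checkpoints` function are hypothetical stand-ins, and the `checkpoint_id` and `endpoint` attribute names are assumptions based on the fields the samples print.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Checkpoint:
    """Hypothetical stand-in for a tuned-model checkpoint object."""

    checkpoint_id: str
    endpoint: str


def format_checkpoints(checkpoints: Iterable[Checkpoint]) -> List[str]:
    """Return one description line per checkpoint, however many there are."""
    return [
        f"Checkpoint {index}: id={cp.checkpoint_id} endpoint={cp.endpoint}"
        for index, cp in enumerate(checkpoints, start=1)
    ]
```

Because the helper iterates instead of indexing `[0]` and `[1]`, it works for any number of checkpoints, including none, and both samples could print its result with a simple loop.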

Merge Readiness

The pull request introduces valuable code samples for tuning with intermediate checkpoints. However, there are a few areas that could be improved for clarity, efficiency, and robustness. I recommend addressing the code duplication, error handling, and hardcoded indices issues before merging. I am unable to approve this pull request, and recommend that others review and approve this code before merging.

while tuning_job.state in running_states:
    print(tuning_job.state)
    tuning_job = client.tunings.get(name=tuning_job.name)
    time.sleep(60)
Contributor

medium

The time.sleep(60) call could be interrupted by a signal. Consider using a more robust approach to waiting for the tuning job to complete, such as polling with a timeout or using a dedicated event loop.
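One way to read "polling with a timeout" is sketched below. It is an illustration under assumptions, not the sample's code: `wait_until_done` and its parameters are hypothetical names, and the state is fetched through a caller-supplied callable so the helper does not depend on any client. For context, since Python 3.5 `time.sleep` resumes after a signal handler returns unless the handler raises, so the bigger practical risk in the original loop is that it has no upper bound on waiting.

```python
import time
from typing import Callable, Collection


def wait_until_done(
    get_state: Callable[[], str],
    running_states: Collection[str],
    timeout_seconds: float = 3600.0,
    poll_seconds: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Poll get_state() until it leaves running_states or the deadline passes."""
    deadline = clock() + timeout_seconds
    state = get_state()
    while state in running_states:
        remaining = deadline - clock()
        if remaining <= 0:
            raise TimeoutError(
                f"Job still in state {state!r} after {timeout_seconds} seconds"
            )
        # Never sleep past the deadline.
        sleep(min(poll_seconds, remaining))
        state = get_state()
    return state
```

With the sample's objects, `get_state` could be something like `lambda: client.tunings.get(name=tuning_job.name).state`. The `sleep` and `clock` parameters exist only so the loop can be exercised in tests without real waiting.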

Comment on lines +55 to +56
tuning_job_name = input("Tuning job name: ")
test_checkpoint(tuning_job_name)
Contributor

medium

Consider adding error handling to the input() calls. If the user enters invalid input, the program will crash. It would be more robust to handle potential exceptions and provide informative error messages.
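A possible shape for that validation is sketched here. The pattern is inferred from the example resource name shown in the sample's comment (`projects/.../locations/.../tuningJobs/...`), so treat it as an assumption and not an official format specification; `parse_tuning_job_name` is a hypothetical helper name.

```python
import re

# Assumed shape of a tuning job resource name, inferred from the sample's
# example comment; not an official specification.
_TUNING_JOB_NAME = re.compile(r"^projects/[^/]+/locations/[^/]+/tuningJobs/[^/]+$")


def parse_tuning_job_name(raw: str) -> str:
    """Strip whitespace and reject strings that do not look like a tuning job name."""
    candidate = raw.strip()
    if not _TUNING_JOB_NAME.match(candidate):
        raise ValueError(
            "Expected projects/<project>/locations/<location>/tuningJobs/<id>, "
            f"got {candidate!r}"
        )
    return candidate
```

In the `__main__` block, the sample could call `parse_tuning_job_name(input("Tuning job name: "))` inside a `try`/`except ValueError` and print the message instead of failing later inside the API call.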

@glasnt glasnt merged commit 286361a into GoogleCloudPlatform:main May 14, 2025
11 checks passed
training_dataset="gs://cloud-samples-data/ai-platform/generative_ai/gemini-2_0/text/sft_train_data.jsonl",
config=CreateTuningJobConfig(
tuned_model_display_name="Example tuning job",
# Set to True to disable tuning intermediate checkpoints. Default is False.
Member

3 issues

  1. The wording is a little confusing. Try something like:
# Set `export_last_checkpoint_only` to False to create intermediate checkpoints.
  2. Instead of export_last_checkpoint_only, add_intermediate_checkpoints could be a better name.

  3. The default value is None. https://github.com/googleapis/python-genai/blob/a3fc532594eff8f01749f6275c506f7516e8ab73/google/genai/types.py#L6890

Contributor Author

Hi Sampath, export_last_checkpoint_only is the variable name defined by the Gen AI SDK, and aligns with the API and the UI.
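The point in this thread about the default being None and not False can be illustrated with a small sketch. Everything here is hypothetical: `TuningConfigSketch` is a stand-in and not the SDK's `CreateTuningJobConfig`, and the idea that an unset (None) field is left out of the request so the service decides is an assumption made for illustration.

```python
from dataclasses import asdict, dataclass
from typing import Any, Dict, Optional


@dataclass
class TuningConfigSketch:
    """Hypothetical stand-in; not the SDK's CreateTuningJobConfig."""

    tuned_model_display_name: Optional[str] = None
    export_last_checkpoint_only: Optional[bool] = None


def to_request_fields(config: TuningConfigSketch) -> Dict[str, Any]:
    """Keep only fields the caller set; None is treated as unset (assumption)."""
    return {key: value for key, value in asdict(config).items() if value is not None}
```

Under that assumption, passing `export_last_checkpoint_only=False` sends an explicit choice, while omitting it sends nothing, which is why a comment claiming "Default is False" may describe the service's behavior but not the SDK field's default.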


tuning_job = client.tunings.tune(
base_model="gemini-2.0-flash-lite-001",
training_dataset="gs://cloud-samples-data/ai-platform/generative_ai/gemini-2_0/text/sft_train_data.jsonl",
Member

Do not use gemini-2_0 (model family version or model versions) in the file name.

You can use gemini_sft or gemini_flash_sft

Contributor

Thanks for the feedback! I'll update the filename to genai_flash_sft to better reflect the content.


# Get the tuning job and the tuned model.
# Eg. name = "projects/123456789012/locations/us-central1/tuningJobs/123456789012345"
tuning_job = client.tunings.get(name=name)
Member

Kindly do not use generic variable names like name; they are too difficult to understand.

Something like job_name or tuning_job_id gives an idea of what the name refers to.

# limitations under the License.


def set_default_checkpoint(name: str, checkpoint_id: str) -> str:
Member

Kindly do not use generic variable names like name; they are too difficult to understand.

Something like job_name or tuning_job_id gives an idea of what the name refers to.

# limitations under the License.


def test_checkpoint(name: str) -> str:
Member

  1. Kindly do not use generic variable names like name; they are too difficult to understand. Something like job_name or tuning_job_id gives an idea of what the name refers to.

  2. test is effectively a reserved prefix (pytest collects functions whose names start with test_). Do not use test as a prefix or suffix.

model=tuning_job.tuned_model.endpoint,
contents=contents,
)
print(response.text)
Member

Missing example response


contents = "Why is the sky blue?"

# Tests the default checkpoint
Member

How is this a test?

)
print(response.text)

# Tests Checkpoint 1
Member

How is this a test?

)
print(checkpoint1_response.text)

# Tests Checkpoint 2
Member

How is this a test?

model=tuning_job.tuned_model.checkpoints[0].endpoint,
contents=contents,
)
print(checkpoint1_response.text)
Member

Missing example response

model=tuning_job.tuned_model.checkpoints[1].endpoint,
contents=contents,
)
print(checkpoint2_response.text)
Member

Missing example response

@glasnt glasnt assigned msampathkumar and unassigned glasnt May 14, 2025
Contributor

glasnt commented May 14, 2025

Hi @msampathkumar, it looks like your review was on a merged PR. You may need to open a new PR with these suggested changes.

Labels
samples Issues that are directly related to samples.

3 participants