
Populate the benchmark metadata #5918


Merged
huydhn merged 15 commits into main from upload-benchmark-results-with-additional-information on Nov 15, 2024

Conversation

@huydhn (Contributor) commented on Nov 14, 2024

To ease the process of gathering the benchmark metadata before uploading to the database, I'm adding a script .github/scripts/benchmarks/gather_metadata.py to gather this information and pass it to the upload script. From #5839, the benchmark metadata includes the following required fields:

-- Metadata
`timestamp` UInt64,
`schema_version` String DEFAULT 'v3',
`name` String,
-- About the change
`repo` String DEFAULT 'pytorch/pytorch',
`head_branch` String,
`head_sha` String,
`workflow_id` UInt64,
`run_attempt` UInt32,
`job_id` UInt64,
-- The raw records on S3
`s3_path` String,

I'm going to test this out with the PT2 compiler instruction count benchmark at pytorch/pytorch#140493
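
For illustration only, here is a minimal sketch of how these fields could be assembled from the standard GitHub Actions environment variables; the function name, the defaults, and the job_id placeholder are assumptions, not the actual contents of gather_metadata.py:

import os
import time


def gather_metadata(name: str, s3_path: str) -> dict:
    # Hypothetical sketch; the real script may gather these differently
    return {
        "timestamp": int(time.time()),
        "schema_version": "v3",
        "name": name,
        "repo": os.environ.get("GITHUB_REPOSITORY", "pytorch/pytorch"),
        "head_branch": os.environ.get("GITHUB_REF_NAME", ""),
        "head_sha": os.environ.get("GITHUB_SHA", ""),
        "workflow_id": int(os.environ.get("GITHUB_RUN_ID", "0")),
        "run_attempt": int(os.environ.get("GITHUB_RUN_ATTEMPT", "1")),
        # The numeric job_id is not exposed as an environment variable and
        # would normally be resolved via the GitHub API (likely why a
        # github-token input is discussed below); 0 is a placeholder
        "job_id": 0,
        "s3_path": s3_path,
    }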

Testing

https://github.com/pytorch/test-infra/actions/runs/11831746632/job/32967412160?pr=5918#step:5:105 gathers the metadata and uploads the benchmark results correctly

Also, an actual upload at https://github.com/pytorch/pytorch/actions/runs/11831781500/job/33006545698#step:24:138

@huydhn requested review from kit1980, clee2000, and a team on November 14, 2024 05:30

for result in benchmark_results:
    # This is a required field
    if "metric" not in result:
        continue

Contributor:

Would it be better to error here?


huydhn (Contributor Author):

I think I could print a warning and dump the record. Although we have one metric per record in the database, there is nothing wrong with having a list of them in the same JSON file. So, I'm thinking the code should just skip invalid records in the list.
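
As an illustration of that warn-and-skip approach (a sketch extending the snippet above; the logging calls and message wording are assumptions):

import json
import logging

for result in benchmark_results:
    # "metric" is a required field on every record; warn and dump the
    # offending record instead of aborting the whole upload
    if "metric" not in result:
        logging.warning(
            "Skipping record without a metric field: %s", json.dumps(result)
        )
        continue
    # ... process the valid record ...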

info(
    "The result is without any information about the repo, workflow, or job id"
)
return ""

Contributor:

nit: if you're going to return Optional[str], might as well make this None

is there a chance of nothing being in the benchmark results? If yes, maybe declare repo, workflow_id, job_id, etc. outside of the loop
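
A small sketch of the None-returning variant (the helper name and signature are hypothetical; only the info() call and the missing-fields condition come from the snippet above):

from logging import info
from typing import Optional


def build_result_key(result: dict) -> Optional[str]:
    # Hypothetical helper; names and structure are illustrative only
    if not all(k in result for k in ("repo", "workflow_id", "job_id")):
        info(
            "The result is without any information about the repo, workflow, or job id"
        )
        # Returning None makes the "missing" case explicit for callers typed
        # as Optional[str], instead of an empty string
        return None
    return f"{result['repo']}/{result['workflow_id']}/{result['job_id']}"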

@@ -9,6 +9,8 @@ inputs:
  # TODO (huydhn): Use this to gate the migration to oss_ci_benchmark_v3 on S3
  schema-version:
    default: 'v2'
  github-token:
    default: ''

Contributor:

this is needed for v3, right? Maybe we can have a check that this is given if v3 is set?


huydhn (Contributor Author):
Sounds good. I'm wondering if I could leave the job id optional even for v3, but then it would complicate things like writing queries that join with workflow_job. It seems easier to make this mandatory for v3.
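
One possible shape for that check, sketched in Python (the parameter names mirror the action inputs, but this is an assumption, not code from this PR):

def validate_inputs(schema_version: str, github_token: str) -> None:
    # Hypothetical guard: v3 needs the token to resolve workflow/job
    # metadata through the GitHub API
    if schema_version == "v3" and not github_token:
        raise RuntimeError("github-token is required when schema-version is v3")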

@huydhn merged commit 5397347 into main on Nov 15, 2024
8 checks passed
@huydhn deleted the upload-benchmark-results-with-additional-information branch on November 15, 2024 19:30