-
Notifications
You must be signed in to change notification settings - Fork 98
Populate the benchmark metadata #5918
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Populate the benchmark metadata #5918
Conversation
The latest updates on your projects. Learn more about Vercel for Git ↗︎ 1 Skipped Deployment
|
for result in benchmark_results: | ||
# This is a required field | ||
if "metric" not in result: | ||
continue |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be better to error here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I could print a warning and dump the record. Although we have one metric per record in the database, there is nothing wrong with having a list of them in the same JSON file. So, I'm thinking the code just skip invalid records in the list
info( | ||
"The result is without any information about the repo, workflow, or job id" | ||
) | ||
return "" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: if you're going to return optional[str] might as well as make this none
is there a chance of nothing being in the benchmark results? if yes maybe declare repo, workflow_id, job_id etc outside of the loop
@@ -9,6 +9,8 @@ inputs: | |||
# TODO (huydhn): Use this to gate the migration to oss_ci_benchmark_v3 on S3 | |||
schema-version: | |||
default: 'v2' | |||
github-token: | |||
default: '' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is needed for v3 right, maybe we can have a check that this is given if v3 is set?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sound good. I'm wondering if I could leave the job id optional even for v3, but then it would complicate thing like writing query joining with workflow_job. It seems easier to make this mandatory for v3
To ease the process of gathering the benchmark metadata before uploading the the database, I'm adding a script
.github/scripts/benchmarks/gather_metadata.py
to gather this information and pass it to the upload script. From #5839, the benchmark metadata includes the following required fields:I'm going to test this out with PT2 compiler instruction count benchmark at pytorch/pytorch#140493
Testing
https://github.com/pytorch/test-infra/actions/runs/11831746632/job/32967412160?pr=5918#step:5:105 gathers the metadata and upload the benchmark results correctly
Also, an actual upload at https://github.com/pytorch/pytorch/actions/runs/11831781500/job/33006545698#step:24:138