
Populate the benchmark metadata #5918


Merged
huydhn merged 15 commits into main from upload-benchmark-results-with-additional-information on Nov 15, 2024

Conversation

@huydhn (Contributor) commented on Nov 14, 2024

To ease the process of gathering the benchmark metadata before uploading to the database, I'm adding a script .github/scripts/benchmarks/gather_metadata.py to gather this information and pass it to the upload script. From #5839, the benchmark metadata includes the following required fields:

-- Metadata
`timestamp` UInt64,
`schema_version` String DEFAULT 'v3',
`name` String,
-- About the change
`repo` String DEFAULT 'pytorch/pytorch',
`head_branch` String,
`head_sha` String,
`workflow_id` UInt64,
`run_attempt` UInt32,
`job_id` UInt64,
-- The raw records on S3
`s3_path` String,

I'm going to test this out with the PT2 compiler instruction count benchmark at pytorch/pytorch#140493
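
For illustration only, here is a minimal sketch of how these fields could be assembled from the standard GitHub Actions environment variables; the function name, the defaults, and the job_id placeholder are assumptions, not the actual contents of gather_metadata.py:

import os
import time


def gather_metadata(name: str, s3_path: str) -> dict:
    # Hypothetical sketch; the real script may gather these differently
    return {
        "timestamp": int(time.time()),
        "schema_version": "v3",
        "name": name,
        "repo": os.environ.get("GITHUB_REPOSITORY", "pytorch/pytorch"),
        "head_branch": os.environ.get("GITHUB_REF_NAME", ""),
        "head_sha": os.environ.get("GITHUB_SHA", ""),
        "workflow_id": int(os.environ.get("GITHUB_RUN_ID", "0")),
        "run_attempt": int(os.environ.get("GITHUB_RUN_ATTEMPT", "1")),
        # The numeric job_id is not exposed as an environment variable and
        # would normally be resolved via the GitHub API (likely why a
        # github-token input is discussed below); 0 is a placeholder
        "job_id": 0,
        "s3_path": s3_path,
    }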

Testing

https://github.com/pytorch/test-infra/actions/runs/11831746632/job/32967412160?pr=5918#step:5:105 gathers the metadata and uploads the benchmark results correctly

Also, an actual upload at https://github.com/pytorch/pytorch/actions/runs/11831781500/job/33006545698#step:24:138

@huydhn requested review from kit1980, clee2000, and a team on November 14, 2024 05:30

for result in benchmark_results:
    # This is a required field
    if "metric" not in result:
        continue

Contributor:

Would it be better to error here?


huydhn (Contributor Author):

I think I could print a warning and dump the record. Although we have one metric per record in the database, there is nothing wrong with having a list of them in the same JSON file. So, I'm thinking the code should just skip invalid records in the list.
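
As an illustration of that warn-and-skip approach (a sketch extending the snippet above; the logging calls and message wording are assumptions):

import json
import logging

for result in benchmark_results:
    # "metric" is a required field on every record; warn and dump the
    # offending record instead of aborting the whole upload
    if "metric" not in result:
        logging.warning(
            "Skipping record without a metric field: %s", json.dumps(result)
        )
        continue
    # ... process the valid record ...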

info(
    "The result is without any information about the repo, workflow, or job id"
)
return ""

Contributor:

nit: if you're going to return Optional[str], might as well make this None

is there a chance of nothing being in the benchmark results? If yes, maybe declare repo, workflow_id, job_id, etc. outside of the loop
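
A small sketch of the None-returning variant (the helper name and signature are hypothetical; only the info() call and the missing-fields condition come from the snippet above):

from logging import info
from typing import Optional


def build_result_key(result: dict) -> Optional[str]:
    # Hypothetical helper; names and structure are illustrative only
    if not all(k in result for k in ("repo", "workflow_id", "job_id")):
        info(
            "The result is without any information about the repo, workflow, or job id"
        )
        # Returning None makes the "missing" case explicit for callers typed
        # as Optional[str], instead of an empty string
        return None
    return f"{result['repo']}/{result['workflow_id']}/{result['job_id']}"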

@@ -9,6 +9,8 @@ inputs:
  # TODO (huydhn): Use this to gate the migration to oss_ci_benchmark_v3 on S3
  schema-version:
    default: 'v2'
  github-token:
    default: ''

Contributor:

this is needed for v3, right? Maybe we can have a check that this is given if v3 is set?


huydhn (Contributor Author):
Sounds good. I'm wondering if I could leave the job id optional even for v3, but then it would complicate things like writing queries that join with workflow_job. It seems easier to make this mandatory for v3.
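
One possible shape for that check, sketched in Python (the parameter names mirror the action inputs, but this is an assumption, not code from this PR):

def validate_inputs(schema_version: str, github_token: str) -> None:
    # Hypothetical guard: v3 needs the token to resolve workflow/job
    # metadata through the GitHub API
    if schema_version == "v3" and not github_token:
        raise RuntimeError("github-token is required when schema-version is v3")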

@huydhn merged commit 5397347 into main on Nov 15, 2024
8 checks passed
@huydhn deleted the upload-benchmark-results-with-additional-information branch on November 15, 2024 19:30