Introducing Seed Checker for #85 #87

Merged
merged 7 commits into mlcommons:master from shangw-nvidia/seed_checker
Mar 30, 2021

@shangw-nvidia shangw-nvidia commented Mar 9, 2021

Hi @xyhuang ,

Hope things are going well! Please review this PR, which implements the seed checker for #85.

In addition to the rules laid out in #85, the following requirements are added to root out corner cases:

  1. If a submission chooses to log seeds, all of them must be logged through mllog. Any seed logged via any other method will be discarded.
  2. All logged seeds must be valid integers (convertible via int()).
  3. If any run logs at least one seed, we expect all runs to log at least one seed.
    4. The set of seed(s) that one run logs must be completely different from the set of seed(s) any other run logs.
  4. If one run logs a seed on a certain line in a certain source file, no other run can log the same seed on the same line in the same file.
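
As a rough illustration of how the integer, all-runs, and same-file/same-line requirements above could be checked, here is a minimal sketch. It is not the code in this PR: the function name `check_seeds` and the `(source_file, lineno, value)` record layout are assumptions made for the example, and extracting those records from mllog output is left out entirely.

```python
from collections import defaultdict


def check_seeds(runs):
    """Return a list of violation messages for the given runs.

    `runs` maps a run name to a list of (source_file, lineno, value)
    records. This layout is a hypothetical stand-in for whatever the
    real checker extracts from the mllog lines of each result file.
    """
    violations = []

    # Every logged seed must be convertible via int().
    parsed = {}
    for run, records in runs.items():
        parsed[run] = []
        for source_file, lineno, value in records:
            try:
                parsed[run].append((source_file, lineno, int(value)))
            except (TypeError, ValueError):
                violations.append(
                    "{}: seed {!r} is not a valid integer".format(run, value))

    # If any run logs a seed, every run must log at least one.
    if any(runs.values()):
        for run, records in runs.items():
            if not records:
                violations.append("{}: no seed logged".format(run))

    # The same seed must not show up at the same file and line in two
    # different runs. A set is used so that one run logging the same
    # record twice is not counted as two runs.
    seen = defaultdict(set)
    for run, records in parsed.items():
        for key in records:
            seen[key].add(run)
    for (source_file, lineno, seed), owners in seen.items():
        if len(owners) > 1:
            violations.append("seed {} at {}:{} reused by {}".format(
                seed, source_file, lineno, ", ".join(sorted(owners))))

    return violations
```

Calling `check_seeds` with two runs that log different seeds at the same locations returns an empty list; two runs that log the same seed at the same file and line produce one "reused by" message.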

github-actions bot commented Mar 9, 2021

MLCommons CLA bot All contributors have signed the MLCommons CLA ✍️ ✅

@shangw-nvidia
Contributor Author

Error: committers of Pull Request number 87 have to sign the CLA

I don't know what this means.

if len(seeds) > 1:
    warnings.warn("Result file {} logs more than one "
                  "seeds {}!".format(result_file, seeds))
for seed in seeds:
Contributor

Here, if result files F1 and F2 each have seeds S1 and S2, and F1.S1 == F2.S2, this will be marked as a violation even though it is not one, right? This case is rare, but do we handle it?

Contributor Author

By S1 and S2, do you mean seeds logged on different lines in different result files?

Contributor

Yes, so each file has S1 and S2, and S1 and S2 are the same for different files. Not sure if it is a valid scenario.
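
One reading of the scenario in this exchange can be written down concretely. The sketch below assumes each result file's seeds are keyed by a hypothetical `(source_file, lineno)` location; the file names, line numbers, and seed values are made up. It contrasts a comparison on seed values alone with a comparison per location.

```python
# Hypothetical seed records, one dict per result file:
# (source_file, lineno) -> seed value.
f1 = {("train.py", 10): 1234, ("data.py", 42): 5678}
f2 = {("train.py", 10): 5678, ("data.py", 42): 1234}

# Comparing values only: both seeds look "shared" between the files.
shared_values = set(f1.values()) & set(f2.values())

# Comparing per location: no location carries the same seed in both files.
shared_locations = {
    loc for loc in f1.keys() & f2.keys() if f1[loc] == f2[loc]
}
```

Under these assumptions a value-only check would report both seeds as shared, while a per-location check would report nothing, which is the distinction the question is about.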

Contributor Author

@shangw-nvidia shangw-nvidia Mar 30, 2021

I pushed a few new changes to support this.

I think what you pointed out is a good idea in general. However, I don't think we currently have a way to make sure that what is being logged matches the source code, so asserting on seeds at file/lineno granularity might not be rigorous enough. For example, say I launch one run, then modify something minor in the code (minor in the sense that I subjectively think a re-run is not really needed), and then launch a second run; even if I use the same seed, the file/lineno is going to be different.

TL;DR: It's probably good for now, but some follow-up work is probably needed to make it more rigorous.
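
The limitation described above can be shown with a small sketch. The helper name `reused_at_same_location` and the record layout (`(source_file, lineno)` mapped to a seed) are assumptions for illustration only, not the checker's actual data structures.

```python
def reused_at_same_location(run_a, run_b):
    """Return the locations where both runs logged the same seed."""
    return sorted(
        loc for loc in run_a.keys() & run_b.keys() if run_a[loc] == run_b[loc]
    )


# First run: the seed is logged from line 120 of a hypothetical train.py.
first_run = {("train.py", 120): 42}

# Second run: one line was added above the logging call, so the same
# seed is now logged from line 121.
second_run = {("train.py", 121): 42}

same_location = reused_at_same_location(first_run, second_run)
same_value = set(first_run.values()) & set(second_run.values())
```

Here `same_location` is empty even though `same_value` contains the reused seed, so a check keyed on file and line number alone would not notice the reuse after a minor edit.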

@xyhuang xyhuang merged commit a75cd37 into mlcommons:master Mar 30, 2021
@github-actions github-actions bot locked and limited conversation to collaborators Mar 30, 2021
@shangw-nvidia shangw-nvidia deleted the shangw-nvidia/seed_checker branch April 26, 2021 21:53

3 participants