@jamesljlster
Contributor

@jamesljlster jamesljlster commented Jul 28, 2025

Dear MLCommons team,

I appreciate your work, which has helped me verify the training performance after applying hardware resource virtualization to our bare-metal server.

I want to validate model training performance within a reasonable time in both virtualized and non-virtualized environments. However, this is very challenging with the default Open Images dataset and the relatively small GPUs in our single node. I therefore made some changes to the performance evaluation script, which may help others with similar use cases.

This pull request modifies run_and_time.sh for SSD training and introduces the following changes:

@jamesljlster jamesljlster requested a review from a team as a code owner July 28, 2025 08:32
@github-actions

github-actions bot commented Jul 28, 2025

MLCommons CLA bot All contributors have signed the MLCommons CLA ✍️ ✅

@jamesljlster
Contributor Author

recheck

EVALBATCHSIZE=${EVALBATCHSIZE:-${BATCHSIZE}}
NUMEPOCHS=${NUMEPOCHS:-30}
LOG_INTERVAL=${LOG_INTERVAL:-20}
DATASET=${DATASET:-"openimages-mlperf"}
Contributor

MLPerf requires us to use the same dataset as the reference so that results are comparable across submissions, so we should not make it a configurable parameter.

Contributor Author

Understood. I've reverted the dataset customization commit.
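
The outcome of this thread can be illustrated with a minimal sketch, which is not the merged code: the dataset is pinned to the reference value, while other settings keep the `${VAR:-default}` override pattern shown in the snippet above. `NUM_GPUS` and `resolve_num_gpus` are hypothetical names introduced only for illustration, and the fallback of 1 GPU is an assumption; the actual variable names and defaults in run_and_time.sh may differ.

```shell
#!/usr/bin/env bash
# Illustrative sketch only; NUM_GPUS and resolve_num_gpus are hypothetical
# names and do not come from the merged run_and_time.sh.
set -euo pipefail

# The dataset is pinned to the reference value: any DATASET value inherited
# from the environment is overwritten, and later reassignment is blocked.
readonly DATASET="openimages-mlperf"

# Tunable settings keep the ${VAR:-default} pattern from the original script.
NUMEPOCHS=${NUMEPOCHS:-30}
LOG_INTERVAL=${LOG_INTERVAL:-20}

# Validate a requested GPU count; an empty or missing argument falls back to
# an assumed default of 1.
resolve_num_gpus() {
    local requested=${1:-1}
    if ! [[ "$requested" =~ ^[1-9][0-9]*$ ]]; then
        echo "NUM_GPUS must be a positive integer, got: $requested" >&2
        return 1
    fi
    echo "$requested"
}

NUM_GPUS=$(resolve_num_gpus "${NUM_GPUS:-}")
echo "dataset=${DATASET} epochs=${NUMEPOCHS} log_interval=${LOG_INTERVAL} gpus=${NUM_GPUS}"
```

Under these assumptions, a caller would run something like `NUM_GPUS=2 ./run_and_time.sh` to change the GPU count on a single node, while setting `DATASET` in the environment would have no effect.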

@jamesljlster jamesljlster changed the title from "[single_stage_detector] Updated run_and_time.sh for customizing datasets and number of GPUs" to "[single_stage_detector] Updated run_and_time.sh for customizing number of GPUs" Aug 9, 2025
@jamesljlster jamesljlster changed the title from "[single_stage_detector] Updated run_and_time.sh for customizing number of GPUs" to "[single_stage_detector] Updated run_and_time.sh for customizing number of GPUs on single node" Aug 9, 2025
@ShriyaRishab ShriyaRishab merged commit c7b2283 into mlcommons:master Aug 11, 2025
1 check passed
@github-actions github-actions bot locked and limited conversation to collaborators Aug 11, 2025