Commit c7b2283

[single_stage_detector] Updated run_and_time.sh for customizing number of GPUs on single node (#808)
* Added support for dataset customization
* Added support for setting GPUs on single node training
* Revert "Added support for dataset customization"

This reverts commit c37ae36.
1 parent 5a7e03a commit c7b2283

1 file changed: +2 -1 lines changed

single_stage_detector/ssd/run_and_time.sh

Lines changed: 2 additions & 1 deletion
@@ -38,6 +38,7 @@ NUMEPOCHS=${NUMEPOCHS:-30}
 LOG_INTERVAL=${LOG_INTERVAL:-20}
 DATASET_DIR=${DATASET_DIR:-"/datasets/open-images-v6-mlperf"}
 TORCH_HOME=${TORCH_HOME:-"$(pwd)/torch-model-cache"}
+DGXNGPU=${DGXNGPU:-1}
 
 # Handle MLCube parameters
 while [ $# -gt 0 ]; do
@@ -76,7 +77,7 @@ if [ -n "${SLURM_LOCALID-}" ]; then
 fi
 else
 # Mode 2: Single-node Docker; need to launch tasks with torchrun
-CMD=( "torchrun" "--standalone" "--nnodes=1" "--nproc_per_node=1" )
+CMD=( "torchrun" "--standalone" "--nnodes=1" "--nproc_per_node=${DGXNGPU}" )
 [ "$MEMBIND" = false ] && CMD+=( "--no_membind" )
 fi
 
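
The effect of the change can be sketched in isolation. The snippet below is a minimal illustration, not code from run_and_time.sh: `launcher_prefix` is a hypothetical helper that reuses the same `${DGXNGPU:-1}` default expansion as the patch and prints the torchrun prefix that the single-node branch would build. The positive-integer check is an added assumption; the commit itself does not validate the value.

```shell
# Hypothetical helper, not part of run_and_time.sh: prints the torchrun
# launcher prefix that the patched single-node branch would build.
launcher_prefix() (
    # Same default expansion as the patch: unset or empty DGXNGPU means 1.
    DGXNGPU=${DGXNGPU:-1}
    # Added assumption (not in the commit): reject non-numeric or zero values.
    case "$DGXNGPU" in
        *[!0-9]*|0)
            echo "DGXNGPU must be a positive integer" >&2
            exit 1
            ;;
    esac
    echo "torchrun --standalone --nnodes=1 --nproc_per_node=${DGXNGPU}"
)

unset DGXNGPU
launcher_prefix              # DGXNGPU unset: falls back to one process
DGXNGPU=8 launcher_prefix    # e.g. one process per GPU on an 8-GPU node
```

With the patch applied, the equivalent invocation would presumably be something like `DGXNGPU=8 ./run_and_time.sh`. Because the default is 1, existing single-GPU runs behave as before. The change sits in the Mode 2 (single-node Docker) branch only, so the SLURM path guarded by `SLURM_LOCALID` is unaffected by this line.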
