There appears to be a bug in IMB-IO in the CPU exploit step (IMB_cpu_exploit). On every rank except rank 0, target_reps
is equal to zero. To verify this, I added:
int rank;
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
printf("rank %d -> Nrep: %d, target rep: %d\n",rank, Nrep,target_reps);
to the end of IMB_cpu_exploit, and executed:
LD_PRELOAD=./some_lib.so mpirun -np 2 ./IMB-IO P_IWrite_Indv -iter 5 -npmin 2 -msglog 20:20 -iter_policy off -time 500
Here is the result:
#----------------------------------------------------------------
# Intel(R) MPI Benchmarks 2021.3, MPI-IO part
#----------------------------------------------------------------
# Date : Tue Sep 6 15:03:02 2022
# Machine : x86_64
# System : Linux
# Release : 5.15.0-47-generic
# Version : #51-Ubuntu SMP Thu Aug 11 07:51:15 UTC 2022
# MPI Version : 3.1
# MPI Thread Environment:
# Calling sequence was:
# ./IMB-IO P_IWrite_Indv -iter 5 -npmin 2 -msglog 20:20 -iter_policy off -time 500
# Minimum io portion in bytes: 0
# Maximum io portion in bytes: 1048576
#
#
#
# List of Benchmarks to run:
# P_IWrite_Indv
rank 0 -> Nrep: 1432890, target rep: 14328
# For nonblocking benchmarks:
# Function CPU_Exploit obtains an undisturbed
# performance of 286.58 MFlops
rank 1 -> Nrep: 1561411, target rep: 0
rank 1 -> Nrep: 1561411, target rep: 0
rank 1 -> Nrep: 1561411, target rep: 0
rank 1 -> Nrep: 1561411, target rep: 0
rank 1 -> Nrep: 1561411, target rep: 0
rank 0 -> Nrep: 1432890, target rep: 14328
rank 0 -> Nrep: 1432890, target rep: 14328
rank 0 -> Nrep: 1432890, target rep: 14328
rank 0 -> Nrep: 1432890, target rep: 14328
rank 0 -> Nrep: 1432890, target rep: 14328
rank 1 -> Nrep: 1561411, target rep: 0
rank 1 -> Nrep: 1561411, target rep: 0
rank 1 -> Nrep: 1561411, target rep: 0
rank 1 -> Nrep: 1561411, target rep: 0
rank 1 -> Nrep: 1561411, target rep: 0
rank 0 -> Nrep: 1432890, target rep: 14328
rank 0 -> Nrep: 1432890, target rep: 14328
rank 0 -> Nrep: 1432890, target rep: 14328
rank 0 -> Nrep: 1432890, target rep: 14328
rank 0 -> Nrep: 1432890, target rep: 14328
#-----------------------------------------------------------------------------
# Benchmarking P_IWrite_Indv
# #processes = 2
#-----------------------------------------------------------------------------
#
# MODE: AGGREGATE
#
#bytes #repetitions t_ovrl[usec] t_pure[usec] t_CPU[usec] overlap[%]
0 5 424323.39 74.06 845648.61 100.00
rank 1 -> Nrep: 1561411, target rep: 0
rank 1 -> Nrep: 1561411, target rep: 0
rank 1 -> Nrep: 1561411, target rep: 0
rank 1 -> Nrep: 1561411, target rep: 0
rank 1 -> Nrep: 1561411, target rep: 0
rank 0 -> Nrep: 1432890, target rep: 14328
rank 0 -> Nrep: 1432890, target rep: 14328
rank 0 -> Nrep: 1432890, target rep: 14328
rank 0 -> Nrep: 1432890, target rep: 14328
rank 0 -> Nrep: 1432890, target rep: 14328
rank 1 -> Nrep: 1561411, target rep: 0
rank 1 -> Nrep: 1561411, target rep: 0
rank 1 -> Nrep: 1561411, target rep: 0
rank 1 -> Nrep: 1561411, target rep: 0
rank 1 -> Nrep: 1561411, target rep: 0
rank 0 -> Nrep: 1432890, target rep: 14328
rank 0 -> Nrep: 1432890, target rep: 14328
rank 0 -> Nrep: 1432890, target rep: 14328
rank 0 -> Nrep: 1432890, target rep: 14328
rank 0 -> Nrep: 1432890, target rep: 14328
1048576 5 429989.80 13614.71 845648.61 100.00
# All processes entering MPI_Finalize
This bug can be fixed by adding the following to original_benchmark.h after line 197 (#ifdef MPIIO):
if(c_info.w_rank != 0 && do_nonblocking_)
IMB_cpu_exploit_reworked(TARGET_CPU_SECS, 1);
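To illustrate why the guard above helps, here is a minimal single-process sketch. It is not the IMB source: target_reps_of, init_cpu_exploit, simulate_startup and report are hypothetical names, and the model assumes that the initialising call normally runs only on rank 0, which is what the log suggests. The value 14328 is simply what rank 0 printed above.

```c
#include <assert.h>
#include <stdio.h>

#define MAX_RANKS 8

/* Hypothetical stand-in for the per-process static target_reps that the
   debug printf reports. Static storage starts at zero, as seen on rank 1. */
static int target_reps_of[MAX_RANKS];

/* Hypothetical stand-in for the initialising call
   IMB_cpu_exploit_reworked(TARGET_CPU_SECS, 1).
   14328 is just the value rank 0 printed in the log. */
static void init_cpu_exploit(int rank) {
    target_reps_of[rank] = 14328;
}

/* Simulate start-up on nranks processes. Assumption: rank 0 is always
   initialised; the other ranks are initialised only when the proposed
   guard is applied and a nonblocking benchmark is selected. */
static void simulate_startup(int nranks, int do_nonblocking, int apply_fix) {
    assert(nranks > 0 && nranks <= MAX_RANKS);
    for (int rank = 0; rank < nranks; rank++) {
        target_reps_of[rank] = 0;
        if (rank == 0) {
            init_cpu_exploit(rank);
        } else if (apply_fix && do_nonblocking) {
            /* Mirrors: if (c_info.w_rank != 0 && do_nonblocking_) ... */
            init_cpu_exploit(rank);
        }
    }
}

/* Print one line per rank, in the style of the debug output. */
static void report(int nranks) {
    for (int rank = 0; rank < nranks; rank++) {
        printf("rank %d -> target rep: %d\n", rank, target_reps_of[rank]);
    }
}
```

Calling simulate_startup(2, 1, 0) and then report(2) leaves rank 1 at zero, as in the log. With apply_fix set to 1, both ranks report a nonzero value.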
Since it is useful to track the progress of the nonblocking operation, I have also added an MPI_Testall
call to IMB_cpu_exploit.c.
If you want, I can create a pull request.