lgarithm

No description provided.

@lgarithm

test

@lgarithm

BGN ======================================== bench_allreduce local ========================================
bench_allreduce(np=4) took 0.0136s, total workload: 384000B, rate: 0.026GiB/s
bench_allreduce(np=4) took 0.0136s, total workload: 384000B, rate: 0.026GiB/s
bench_allreduce(np=4) took 0.0138s, total workload: 384000B, rate: 0.026GiB/s
bench_allreduce(np=4) took 0.0143s, total workload: 384000B, rate: 0.025GiB/s
bench_allreduce(np=4) took 0.0142s, total workload: 384000B, rate: 0.025GiB/s
bench_allreduce(np=4) took 0.0143s, total workload: 384000B, rate: 0.025GiB/s
bench_allreduce(np=4) took 0.0136s, total workload: 384000B, rate: 0.026GiB/s
bench_allreduce(np=4) took 0.0130s, total workload: 384000B, rate: 0.027GiB/s
bench_allreduce(np=4) took 0.0136s, total workload: 384000B, rate: 0.026GiB/s
bench_allreduce(np=4) took 0.0154s, total workload: 384000B, rate: 0.023GiB/s
bench_allreduce(np=4) took 0.3252s, total workload: 1.144GiB, rate: 3.517GiB/s
bench_allreduce(np=4) took 0.3012s, total workload: 1.144GiB, rate: 3.797GiB/s
bench_allreduce(np=4) took 0.3010s, total workload: 1.144GiB, rate: 3.799GiB/s
bench_allreduce(np=4) took 0.2990s, total workload: 1.144GiB, rate: 3.824GiB/s
bench_allreduce(np=4) took 0.3048s, total workload: 1.144GiB, rate: 3.752GiB/s
bench_allreduce(np=4) took 0.3038s, total workload: 1.144GiB, rate: 3.765GiB/s
bench_allreduce(np=4) took 0.3036s, total workload: 1.144GiB, rate: 3.767GiB/s
bench_allreduce(np=4) took 0.3024s, total workload: 1.144GiB, rate: 3.782GiB/s
bench_allreduce(np=4) took 0.3026s, total workload: 1.144GiB, rate: 3.779GiB/s
bench_allreduce(np=4) took 0.3025s, total workload: 1.144GiB, rate: 3.781GiB/s
END ======================================== bench_allreduce local ========================================
BGN ======================================== bench_allreduce remote ========================================
bench_allreduce(np=4) took 0.4133s, total workload: 384000B, rate: 0.001GiB/s
bench_allreduce(np=4) took 0.3736s, total workload: 384000B, rate: 0.001GiB/s
bench_allreduce(np=4) took 0.3564s, total workload: 384000B, rate: 0.001GiB/s
bench_allreduce(np=4) took 0.3705s, total workload: 384000B, rate: 0.001GiB/s
bench_allreduce(np=4) took 0.3595s, total workload: 384000B, rate: 0.001GiB/s
bench_allreduce(np=4) took 0.3422s, total workload: 384000B, rate: 0.001GiB/s
bench_allreduce(np=4) took 0.3583s, total workload: 384000B, rate: 0.001GiB/s
bench_allreduce(np=4) took 0.3532s, total workload: 384000B, rate: 0.001GiB/s
bench_allreduce(np=4) took 0.3533s, total workload: 384000B, rate: 0.001GiB/s
bench_allreduce(np=4) took 0.3522s, total workload: 384000B, rate: 0.001GiB/s
bench_allreduce(np=4) took 1.1201s, total workload: 1.144GiB, rate: 1.021GiB/s
bench_allreduce(np=4) took 1.1197s, total workload: 1.144GiB, rate: 1.021GiB/s
bench_allreduce(np=4) took 1.0404s, total workload: 1.144GiB, rate: 1.099GiB/s
bench_allreduce(np=4) took 1.0849s, total workload: 1.144GiB, rate: 1.054GiB/s
bench_allreduce(np=4) took 1.0356s, total workload: 1.144GiB, rate: 1.104GiB/s
bench_allreduce(np=4) took 1.0564s, total workload: 1.144GiB, rate: 1.083GiB/s
bench_allreduce(np=4) took 1.0920s, total workload: 1.144GiB, rate: 1.047GiB/s
bench_allreduce(np=4) took 1.0471s, total workload: 1.144GiB, rate: 1.092GiB/s
bench_allreduce(np=4) took 1.0921s, total workload: 1.144GiB, rate: 1.047GiB/s
bench_allreduce(np=4) took 1.0669s, total workload: 1.144GiB, rate: 1.072GiB/s
END ======================================== bench_allreduce remote ========================================
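The `rate` column looks like total workload divided by elapsed time, expressed in GiB/s. A minimal Python sketch that recomputes it from one log line, assuming the line format shown above; the `parse_line` and `recomputed_rate` helpers and the regex are hypothetical and not part of the benchmark itself:

```python
import re

# Pattern for one benchmark line, assuming the format printed in the logs above.
LINE_RE = re.compile(
    r"bench_allreduce\(np=(\d+)\) took ([\d.]+)s, "
    r"total workload: ([\d.]+)(B|GiB), rate: ([\d.]+)GiB/s"
)

GIB = 1 << 30  # bytes per GiB


def parse_line(line):
    """Return (np, seconds, workload_bytes, reported_rate_gib_s), or None if no match."""
    m = LINE_RE.search(line)
    if m is None:
        return None
    np_, secs, amount, unit, rate = m.groups()
    workload = float(amount) * (GIB if unit == "GiB" else 1)
    return int(np_), float(secs), workload, float(rate)


def recomputed_rate(seconds, workload_bytes):
    """Workload divided by elapsed time, in GiB/s."""
    return workload_bytes / seconds / GIB


sample = "bench_allreduce(np=4) took 0.0136s, total workload: 384000B, rate: 0.026GiB/s"
np_, secs, workload, reported = parse_line(sample)
print(round(recomputed_rate(secs, workload), 3), reported)  # 0.026 0.026
```

For the 384000B runs the elapsed time is dominated by fixed per-call cost, which is why the remote rate rounds to 0.001GiB/s even though the link sustains about 1GiB/s on the 1.144GiB workload.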

@lgarithm

7483943.log

BGN ======================================== bench_allreduce local ========================================
bench_allreduce(np=4) took 0.0144s, total workload: 384000B, rate: 0.025GiB/s
bench_allreduce(np=4) took 0.0138s, total workload: 384000B, rate: 0.026GiB/s
bench_allreduce(np=4) took 0.0139s, total workload: 384000B, rate: 0.026GiB/s
bench_allreduce(np=4) took 0.0139s, total workload: 384000B, rate: 0.026GiB/s
bench_allreduce(np=4) took 0.0148s, total workload: 384000B, rate: 0.024GiB/s
bench_allreduce(np=4) took 0.0152s, total workload: 384000B, rate: 0.023GiB/s
bench_allreduce(np=4) took 0.0142s, total workload: 384000B, rate: 0.025GiB/s
bench_allreduce(np=4) took 0.0141s, total workload: 384000B, rate: 0.025GiB/s
bench_allreduce(np=4) took 0.0143s, total workload: 384000B, rate: 0.025GiB/s
bench_allreduce(np=4) took 0.0142s, total workload: 384000B, rate: 0.025GiB/s
bench_allreduce(np=4) took 0.3064s, total workload: 1.144GiB, rate: 3.733GiB/s
bench_allreduce(np=4) took 0.2885s, total workload: 1.144GiB, rate: 3.965GiB/s
bench_allreduce(np=4) took 0.2835s, total workload: 1.144GiB, rate: 4.034GiB/s
bench_allreduce(np=4) took 0.2832s, total workload: 1.144GiB, rate: 4.038GiB/s
bench_allreduce(np=4) took 0.2804s, total workload: 1.144GiB, rate: 4.079GiB/s
bench_allreduce(np=4) took 0.2787s, total workload: 1.144GiB, rate: 4.103GiB/s
bench_allreduce(np=4) took 0.2824s, total workload: 1.144GiB, rate: 4.049GiB/s
bench_allreduce(np=4) took 0.2817s, total workload: 1.144GiB, rate: 4.060GiB/s
bench_allreduce(np=4) took 0.2839s, total workload: 1.144GiB, rate: 4.029GiB/s
bench_allreduce(np=4) took 0.2810s, total workload: 1.144GiB, rate: 4.070GiB/s
END ======================================== bench_allreduce local ========================================
BGN ======================================== bench_allreduce remote ========================================
bench_allreduce(np=4) took 0.4241s, total workload: 384000B, rate: 0.001GiB/s
bench_allreduce(np=4) took 0.3890s, total workload: 384000B, rate: 0.001GiB/s
bench_allreduce(np=4) took 0.3636s, total workload: 384000B, rate: 0.001GiB/s
bench_allreduce(np=4) took 0.3709s, total workload: 384000B, rate: 0.001GiB/s
bench_allreduce(np=4) took 0.3625s, total workload: 384000B, rate: 0.001GiB/s
bench_allreduce(np=4) took 0.3358s, total workload: 384000B, rate: 0.001GiB/s
bench_allreduce(np=4) took 0.3466s, total workload: 384000B, rate: 0.001GiB/s
bench_allreduce(np=4) took 0.3395s, total workload: 384000B, rate: 0.001GiB/s
bench_allreduce(np=4) took 0.3374s, total workload: 384000B, rate: 0.001GiB/s
bench_allreduce(np=4) took 0.3349s, total workload: 384000B, rate: 0.001GiB/s
bench_allreduce(np=4) took 1.0444s, total workload: 1.144GiB, rate: 1.095GiB/s
bench_allreduce(np=4) took 1.0518s, total workload: 1.144GiB, rate: 1.087GiB/s
bench_allreduce(np=4) took 0.9910s, total workload: 1.144GiB, rate: 1.154GiB/s
bench_allreduce(np=4) took 1.0018s, total workload: 1.144GiB, rate: 1.142GiB/s
bench_allreduce(np=4) took 1.0422s, total workload: 1.144GiB, rate: 1.097GiB/s
bench_allreduce(np=4) took 1.0015s, total workload: 1.144GiB, rate: 1.142GiB/s
bench_allreduce(np=4) took 1.0119s, total workload: 1.144GiB, rate: 1.130GiB/s
bench_allreduce(np=4) took 1.0095s, total workload: 1.144GiB, rate: 1.133GiB/s
bench_allreduce(np=4) took 1.0127s, total workload: 1.144GiB, rate: 1.129GiB/s
bench_allreduce(np=4) took 1.0237s, total workload: 1.144GiB, rate: 1.117GiB/s
END ======================================== bench_allreduce remote ========================================

@lgarithm

8d10fa2.log

BGN ======================================== bench_allreduce local ========================================
bench_allreduce(np=4) took 0.0132s, total workload: 384000B, rate: 0.027GiB/s
bench_allreduce(np=4) took 0.0124s, total workload: 384000B, rate: 0.029GiB/s
bench_allreduce(np=4) took 0.0127s, total workload: 384000B, rate: 0.028GiB/s
bench_allreduce(np=4) took 0.0129s, total workload: 384000B, rate: 0.028GiB/s
bench_allreduce(np=4) took 0.0136s, total workload: 384000B, rate: 0.026GiB/s
bench_allreduce(np=4) took 0.0158s, total workload: 384000B, rate: 0.023GiB/s
bench_allreduce(np=4) took 0.0158s, total workload: 384000B, rate: 0.023GiB/s
bench_allreduce(np=4) took 0.0163s, total workload: 384000B, rate: 0.022GiB/s
bench_allreduce(np=4) took 0.0162s, total workload: 384000B, rate: 0.022GiB/s
bench_allreduce(np=4) took 0.0160s, total workload: 384000B, rate: 0.022GiB/s
bench_allreduce(np=4) took 0.2806s, total workload: 1.144GiB, rate: 4.076GiB/s
bench_allreduce(np=4) took 0.2663s, total workload: 1.144GiB, rate: 4.294GiB/s
bench_allreduce(np=4) took 0.2694s, total workload: 1.144GiB, rate: 4.245GiB/s
bench_allreduce(np=4) took 0.2684s, total workload: 1.144GiB, rate: 4.260GiB/s
bench_allreduce(np=4) took 0.2707s, total workload: 1.144GiB, rate: 4.225GiB/s
bench_allreduce(np=4) took 0.2719s, total workload: 1.144GiB, rate: 4.207GiB/s
bench_allreduce(np=4) took 0.2709s, total workload: 1.144GiB, rate: 4.221GiB/s
bench_allreduce(np=4) took 0.2796s, total workload: 1.144GiB, rate: 4.090GiB/s
bench_allreduce(np=4) took 0.2706s, total workload: 1.144GiB, rate: 4.227GiB/s
bench_allreduce(np=4) took 0.2748s, total workload: 1.144GiB, rate: 4.162GiB/s
END ======================================== bench_allreduce local ========================================
BGN ======================================== bench_allreduce remote ========================================
bench_allreduce(np=4) took 0.3628s, total workload: 384000B, rate: 0.001GiB/s
bench_allreduce(np=4) took 0.3409s, total workload: 384000B, rate: 0.001GiB/s
bench_allreduce(np=4) took 0.3419s, total workload: 384000B, rate: 0.001GiB/s
bench_allreduce(np=4) took 0.3453s, total workload: 384000B, rate: 0.001GiB/s
bench_allreduce(np=4) took 0.3327s, total workload: 384000B, rate: 0.001GiB/s
bench_allreduce(np=4) took 0.3349s, total workload: 384000B, rate: 0.001GiB/s
bench_allreduce(np=4) took 0.3218s, total workload: 384000B, rate: 0.001GiB/s
bench_allreduce(np=4) took 0.3159s, total workload: 384000B, rate: 0.001GiB/s
bench_allreduce(np=4) took 0.3158s, total workload: 384000B, rate: 0.001GiB/s
bench_allreduce(np=4) took 0.3102s, total workload: 384000B, rate: 0.001GiB/s
bench_allreduce(np=4) took 1.4023s, total workload: 1.144GiB, rate: 0.816GiB/s
bench_allreduce(np=4) took 1.1479s, total workload: 1.144GiB, rate: 0.996GiB/s
bench_allreduce(np=4) took 1.0784s, total workload: 1.144GiB, rate: 1.061GiB/s
bench_allreduce(np=4) took 1.1027s, total workload: 1.144GiB, rate: 1.037GiB/s
bench_allreduce(np=4) took 1.0772s, total workload: 1.144GiB, rate: 1.062GiB/s
bench_allreduce(np=4) took 1.0918s, total workload: 1.144GiB, rate: 1.047GiB/s
bench_allreduce(np=4) took 1.0852s, total workload: 1.144GiB, rate: 1.054GiB/s
bench_allreduce(np=4) took 1.0531s, total workload: 1.144GiB, rate: 1.086GiB/s
bench_allreduce(np=4) took 0.9709s, total workload: 1.144GiB, rate: 1.178GiB/s
bench_allreduce(np=4) took 1.0547s, total workload: 1.144GiB, rate: 1.084GiB/s
END ======================================== bench_allreduce remote ========================================

@lgarithm

lgarithm commented Mar 1, 2024

7483943

BGN ======================================== bench_allreduce local ========================================
bench_allreduce(np=4) took 0.0151s, total workload: 384000B, rate: 0.024GiB/s
bench_allreduce(np=4) took 0.0138s, total workload: 384000B, rate: 0.026GiB/s
bench_allreduce(np=4) took 0.0130s, total workload: 384000B, rate: 0.027GiB/s
bench_allreduce(np=4) took 0.0129s, total workload: 384000B, rate: 0.028GiB/s
bench_allreduce(np=4) took 0.0129s, total workload: 384000B, rate: 0.028GiB/s
bench_allreduce(np=4) took 0.0125s, total workload: 384000B, rate: 0.029GiB/s
bench_allreduce(np=4) took 0.0125s, total workload: 384000B, rate: 0.029GiB/s
bench_allreduce(np=4) took 0.0125s, total workload: 384000B, rate: 0.029GiB/s
bench_allreduce(np=4) took 0.0126s, total workload: 384000B, rate: 0.028GiB/s
bench_allreduce(np=4) took 0.0125s, total workload: 384000B, rate: 0.029GiB/s
bench_allreduce(np=4) took 0.2667s, total workload: 1.144GiB, rate: 4.289GiB/s
bench_allreduce(np=4) took 0.2396s, total workload: 1.144GiB, rate: 4.773GiB/s
bench_allreduce(np=4) took 0.2357s, total workload: 1.144GiB, rate: 4.853GiB/s
bench_allreduce(np=4) took 0.2451s, total workload: 1.144GiB, rate: 4.667GiB/s
bench_allreduce(np=4) took 0.2535s, total workload: 1.144GiB, rate: 4.512GiB/s
bench_allreduce(np=4) took 0.2547s, total workload: 1.144GiB, rate: 4.490GiB/s
bench_allreduce(np=4) took 0.2548s, total workload: 1.144GiB, rate: 4.488GiB/s
bench_allreduce(np=4) took 0.2567s, total workload: 1.144GiB, rate: 4.455GiB/s
bench_allreduce(np=4) took 0.2589s, total workload: 1.144GiB, rate: 4.417GiB/s
bench_allreduce(np=4) took 0.2648s, total workload: 1.144GiB, rate: 4.319GiB/s
END ======================================== bench_allreduce local ========================================
BGN ======================================== bench_allreduce remote ========================================
bench_allreduce(np=4) took 0.3935s, total workload: 384000B, rate: 0.001GiB/s
bench_allreduce(np=4) took 0.3199s, total workload: 384000B, rate: 0.001GiB/s
bench_allreduce(np=4) took 0.3076s, total workload: 384000B, rate: 0.001GiB/s
bench_allreduce(np=4) took 0.3013s, total workload: 384000B, rate: 0.001GiB/s
bench_allreduce(np=4) took 0.3354s, total workload: 384000B, rate: 0.001GiB/s
bench_allreduce(np=4) took 0.3307s, total workload: 384000B, rate: 0.001GiB/s
bench_allreduce(np=4) took 0.3446s, total workload: 384000B, rate: 0.001GiB/s
bench_allreduce(np=4) took 0.3810s, total workload: 384000B, rate: 0.001GiB/s
bench_allreduce(np=4) took 0.3932s, total workload: 384000B, rate: 0.001GiB/s
bench_allreduce(np=4) took 0.3753s, total workload: 384000B, rate: 0.001GiB/s
bench_allreduce(np=4) took 1.4619s, total workload: 1.144GiB, rate: 0.782GiB/s
bench_allreduce(np=4) took 1.1683s, total workload: 1.144GiB, rate: 0.979GiB/s
bench_allreduce(np=4) took 0.9212s, total workload: 1.144GiB, rate: 1.242GiB/s
bench_allreduce(np=4) took 0.8812s, total workload: 1.144GiB, rate: 1.298GiB/s
bench_allreduce(np=4) took 0.8741s, total workload: 1.144GiB, rate: 1.308GiB/s
bench_allreduce(np=4) took 0.8586s, total workload: 1.144GiB, rate: 1.332GiB/s
bench_allreduce(np=4) took 0.8343s, total workload: 1.144GiB, rate: 1.371GiB/s
bench_allreduce(np=4) took 0.8592s, total workload: 1.144GiB, rate: 1.331GiB/s
bench_allreduce(np=4) took 0.8379s, total workload: 1.144GiB, rate: 1.365GiB/s
bench_allreduce(np=4) took 0.8672s, total workload: 1.144GiB, rate: 1.319GiB/s
END ======================================== bench_allreduce remote ========================================

@lgarithm

lgarithm commented Mar 1, 2024

8d10fa2

BGN ======================================== bench_allreduce local ========================================
bench_allreduce(np=4) took 0.0143s, total workload: 384000B, rate: 0.025GiB/s
bench_allreduce(np=4) took 0.0137s, total workload: 384000B, rate: 0.026GiB/s
bench_allreduce(np=4) took 0.0136s, total workload: 384000B, rate: 0.026GiB/s
bench_allreduce(np=4) took 0.0137s, total workload: 384000B, rate: 0.026GiB/s
bench_allreduce(np=4) took 0.0136s, total workload: 384000B, rate: 0.026GiB/s
bench_allreduce(np=4) took 0.0131s, total workload: 384000B, rate: 0.027GiB/s
bench_allreduce(np=4) took 0.0122s, total workload: 384000B, rate: 0.029GiB/s
bench_allreduce(np=4) took 0.0122s, total workload: 384000B, rate: 0.029GiB/s
bench_allreduce(np=4) took 0.0123s, total workload: 384000B, rate: 0.029GiB/s
bench_allreduce(np=4) took 0.0123s, total workload: 384000B, rate: 0.029GiB/s
bench_allreduce(np=4) took 0.2649s, total workload: 1.144GiB, rate: 4.317GiB/s
bench_allreduce(np=4) took 0.2585s, total workload: 1.144GiB, rate: 4.425GiB/s
bench_allreduce(np=4) took 0.2618s, total workload: 1.144GiB, rate: 4.369GiB/s
bench_allreduce(np=4) took 0.2572s, total workload: 1.144GiB, rate: 4.447GiB/s
bench_allreduce(np=4) took 0.2375s, total workload: 1.144GiB, rate: 4.816GiB/s
bench_allreduce(np=4) took 0.2565s, total workload: 1.144GiB, rate: 4.459GiB/s
bench_allreduce(np=4) took 0.2618s, total workload: 1.144GiB, rate: 4.368GiB/s
bench_allreduce(np=4) took 0.2624s, total workload: 1.144GiB, rate: 4.358GiB/s
bench_allreduce(np=4) took 0.2637s, total workload: 1.144GiB, rate: 4.337GiB/s
bench_allreduce(np=4) took 0.2654s, total workload: 1.144GiB, rate: 4.310GiB/s
END ======================================== bench_allreduce local ========================================
BGN ======================================== bench_allreduce remote ========================================
bench_allreduce(np=4) took 0.3693s, total workload: 384000B, rate: 0.001GiB/s
bench_allreduce(np=4) took 0.3492s, total workload: 384000B, rate: 0.001GiB/s
bench_allreduce(np=4) took 0.3510s, total workload: 384000B, rate: 0.001GiB/s
bench_allreduce(np=4) took 0.3492s, total workload: 384000B, rate: 0.001GiB/s
bench_allreduce(np=4) took 0.3472s, total workload: 384000B, rate: 0.001GiB/s
bench_allreduce(np=4) took 0.3508s, total workload: 384000B, rate: 0.001GiB/s
bench_allreduce(np=4) took 0.3212s, total workload: 384000B, rate: 0.001GiB/s
bench_allreduce(np=4) took 0.3187s, total workload: 384000B, rate: 0.001GiB/s
bench_allreduce(np=4) took 0.3108s, total workload: 384000B, rate: 0.001GiB/s
bench_allreduce(np=4) took 0.3327s, total workload: 384000B, rate: 0.001GiB/s
bench_allreduce(np=4) took 1.4654s, total workload: 1.144GiB, rate: 0.780GiB/s
bench_allreduce(np=4) took 1.3593s, total workload: 1.144GiB, rate: 0.841GiB/s
bench_allreduce(np=4) took 1.0765s, total workload: 1.144GiB, rate: 1.062GiB/s
bench_allreduce(np=4) took 0.9870s, total workload: 1.144GiB, rate: 1.159GiB/s
bench_allreduce(np=4) took 0.9946s, total workload: 1.144GiB, rate: 1.150GiB/s
bench_allreduce(np=4) took 0.9512s, total workload: 1.144GiB, rate: 1.202GiB/s
bench_allreduce(np=4) took 0.9667s, total workload: 1.144GiB, rate: 1.183GiB/s
bench_allreduce(np=4) took 0.9778s, total workload: 1.144GiB, rate: 1.170GiB/s
bench_allreduce(np=4) took 0.9656s, total workload: 1.144GiB, rate: 1.184GiB/s
bench_allreduce(np=4) took 0.9692s, total workload: 1.144GiB, rate: 1.180GiB/s
END ======================================== bench_allreduce remote ========================================

@lgarithm

lgarithm commented Mar 1, 2024

578c079

@lgarithm

lgarithm commented Mar 1, 2024

5c7afeb

@lgarithm

lgarithm commented Mar 1, 2024

3f0f4d7

@lgarithm

5b7667e

BGN ======================================== bench_allreduce local ========================================
bench_allreduce(np=4) took 0.0130s, total workload: 384000B, rate: 0.028GiB/s
bench_allreduce(np=4) took 0.0124s, total workload: 384000B, rate: 0.029GiB/s
bench_allreduce(np=4) took 0.0123s, total workload: 384000B, rate: 0.029GiB/s
bench_allreduce(np=4) took 0.0123s, total workload: 384000B, rate: 0.029GiB/s
bench_allreduce(np=4) took 0.0122s, total workload: 384000B, rate: 0.029GiB/s
bench_allreduce(np=4) took 0.0124s, total workload: 384000B, rate: 0.029GiB/s
bench_allreduce(np=4) took 0.0120s, total workload: 384000B, rate: 0.030GiB/s
bench_allreduce(np=4) took 0.0116s, total workload: 384000B, rate: 0.031GiB/s
bench_allreduce(np=4) took 0.0116s, total workload: 384000B, rate: 0.031GiB/s
bench_allreduce(np=4) took 0.0116s, total workload: 384000B, rate: 0.031GiB/s
bench_allreduce(np=4) took 0.2637s, total workload: 1.144GiB, rate: 4.337GiB/s
bench_allreduce(np=4) took 0.2543s, total workload: 1.144GiB, rate: 4.497GiB/s
bench_allreduce(np=4) took 0.2547s, total workload: 1.144GiB, rate: 4.491GiB/s
bench_allreduce(np=4) took 0.2571s, total workload: 1.144GiB, rate: 4.448GiB/s
bench_allreduce(np=4) took 0.2604s, total workload: 1.144GiB, rate: 4.391GiB/s
bench_allreduce(np=4) took 0.2504s, total workload: 1.144GiB, rate: 4.567GiB/s
bench_allreduce(np=4) took 0.2560s, total workload: 1.144GiB, rate: 4.468GiB/s
bench_allreduce(np=4) took 0.2545s, total workload: 1.144GiB, rate: 4.494GiB/s
bench_allreduce(np=4) took 0.2544s, total workload: 1.144GiB, rate: 4.495GiB/s
bench_allreduce(np=4) took 0.2533s, total workload: 1.144GiB, rate: 4.515GiB/s
END ======================================== bench_allreduce local ========================================
BGN ======================================== bench_allreduce remote ========================================
bench_allreduce(np=4) took 0.3831s, total workload: 384000B, rate: 0.001GiB/s
bench_allreduce(np=4) took 0.3537s, total workload: 384000B, rate: 0.001GiB/s
bench_allreduce(np=4) took 0.3587s, total workload: 384000B, rate: 0.001GiB/s
bench_allreduce(np=4) took 0.3512s, total workload: 384000B, rate: 0.001GiB/s
bench_allreduce(np=4) took 0.3410s, total workload: 384000B, rate: 0.001GiB/s
bench_allreduce(np=4) took 0.3503s, total workload: 384000B, rate: 0.001GiB/s
bench_allreduce(np=4) took 0.3591s, total workload: 384000B, rate: 0.001GiB/s
bench_allreduce(np=4) took 0.3488s, total workload: 384000B, rate: 0.001GiB/s
bench_allreduce(np=4) took 0.3470s, total workload: 384000B, rate: 0.001GiB/s
bench_allreduce(np=4) took 0.3470s, total workload: 384000B, rate: 0.001GiB/s
bench_allreduce(np=4) took 1.0902s, total workload: 1.144GiB, rate: 1.049GiB/s
bench_allreduce(np=4) took 1.0794s, total workload: 1.144GiB, rate: 1.060GiB/s
bench_allreduce(np=4) took 0.9957s, total workload: 1.144GiB, rate: 1.149GiB/s
bench_allreduce(np=4) took 0.8494s, total workload: 1.144GiB, rate: 1.346GiB/s
bench_allreduce(np=4) took 0.7802s, total workload: 1.144GiB, rate: 1.466GiB/s
bench_allreduce(np=4) took 0.7733s, total workload: 1.144GiB, rate: 1.479GiB/s
bench_allreduce(np=4) took 0.7723s, total workload: 1.144GiB, rate: 1.481GiB/s
bench_allreduce(np=4) took 0.7585s, total workload: 1.144GiB, rate: 1.508GiB/s
bench_allreduce(np=4) took 0.7731s, total workload: 1.144GiB, rate: 1.479GiB/s
bench_allreduce(np=4) took 0.7653s, total workload: 1.144GiB, rate: 1.494GiB/s
END ======================================== bench_allreduce remote ========================================

@lgarithm

ba0d691

BGN ======================================== bench_allreduce local ========================================
bench_allreduce(np=4) took 0.0124s, total workload: 384000B, rate: 0.029GiB/s
bench_allreduce(np=4) took 0.0120s, total workload: 384000B, rate: 0.030GiB/s
bench_allreduce(np=4) took 0.0120s, total workload: 384000B, rate: 0.030GiB/s
bench_allreduce(np=4) took 0.0119s, total workload: 384000B, rate: 0.030GiB/s
bench_allreduce(np=4) took 0.0119s, total workload: 384000B, rate: 0.030GiB/s
bench_allreduce(np=4) took 0.0115s, total workload: 384000B, rate: 0.031GiB/s
bench_allreduce(np=4) took 0.0114s, total workload: 384000B, rate: 0.031GiB/s
bench_allreduce(np=4) took 0.0114s, total workload: 384000B, rate: 0.031GiB/s
bench_allreduce(np=4) took 0.0114s, total workload: 384000B, rate: 0.031GiB/s
bench_allreduce(np=4) took 0.0115s, total workload: 384000B, rate: 0.031GiB/s
bench_allreduce(np=4) took 0.2637s, total workload: 1.144GiB, rate: 4.336GiB/s
bench_allreduce(np=4) took 0.2531s, total workload: 1.144GiB, rate: 4.519GiB/s
bench_allreduce(np=4) took 0.2512s, total workload: 1.144GiB, rate: 4.553GiB/s
bench_allreduce(np=4) took 0.2541s, total workload: 1.144GiB, rate: 4.500GiB/s
bench_allreduce(np=4) took 0.2541s, total workload: 1.144GiB, rate: 4.501GiB/s
bench_allreduce(np=4) took 0.2536s, total workload: 1.144GiB, rate: 4.510GiB/s
bench_allreduce(np=4) took 0.2544s, total workload: 1.144GiB, rate: 4.495GiB/s
bench_allreduce(np=4) took 0.2535s, total workload: 1.144GiB, rate: 4.512GiB/s
bench_allreduce(np=4) took 0.2543s, total workload: 1.144GiB, rate: 4.497GiB/s
bench_allreduce(np=4) took 0.2528s, total workload: 1.144GiB, rate: 4.524GiB/s
END ======================================== bench_allreduce local ========================================
BGN ======================================== bench_allreduce remote ========================================
bench_allreduce(np=4) took 0.3830s, total workload: 384000B, rate: 0.001GiB/s
bench_allreduce(np=4) took 0.3861s, total workload: 384000B, rate: 0.001GiB/s
bench_allreduce(np=4) took 0.3815s, total workload: 384000B, rate: 0.001GiB/s
bench_allreduce(np=4) took 0.3696s, total workload: 384000B, rate: 0.001GiB/s
bench_allreduce(np=4) took 0.3268s, total workload: 384000B, rate: 0.001GiB/s
bench_allreduce(np=4) took 0.3261s, total workload: 384000B, rate: 0.001GiB/s
bench_allreduce(np=4) took 0.3305s, total workload: 384000B, rate: 0.001GiB/s
bench_allreduce(np=4) took 0.3322s, total workload: 384000B, rate: 0.001GiB/s
bench_allreduce(np=4) took 0.3092s, total workload: 384000B, rate: 0.001GiB/s
bench_allreduce(np=4) took 0.3340s, total workload: 384000B, rate: 0.001GiB/s
bench_allreduce(np=4) took 1.1843s, total workload: 1.144GiB, rate: 0.966GiB/s
bench_allreduce(np=4) took 1.0431s, total workload: 1.144GiB, rate: 1.096GiB/s
bench_allreduce(np=4) took 1.0408s, total workload: 1.144GiB, rate: 1.099GiB/s
bench_allreduce(np=4) took 0.9793s, total workload: 1.144GiB, rate: 1.168GiB/s
bench_allreduce(np=4) took 0.9878s, total workload: 1.144GiB, rate: 1.158GiB/s
bench_allreduce(np=4) took 1.0487s, total workload: 1.144GiB, rate: 1.091GiB/s
bench_allreduce(np=4) took 0.9917s, total workload: 1.144GiB, rate: 1.153GiB/s
bench_allreduce(np=4) took 1.0071s, total workload: 1.144GiB, rate: 1.136GiB/s
bench_allreduce(np=4) took 0.9798s, total workload: 1.144GiB, rate: 1.167GiB/s
bench_allreduce(np=4) took 1.0379s, total workload: 1.144GiB, rate: 1.102GiB/s
END ======================================== bench_allreduce remote ========================================

@lgarithm
Owner Author

71d3f79

BGN ======================================== bench_allreduce local ========================================
bench_allreduce(np=4) took 0.0038s, total workload: 384000B, rate: 0.093GiB/s
bench_allreduce(np=4) took 0.0037s, total workload: 384000B, rate: 0.097GiB/s
bench_allreduce(np=4) took 0.0037s, total workload: 384000B, rate: 0.097GiB/s
bench_allreduce(np=4) took 0.0037s, total workload: 384000B, rate: 0.097GiB/s
bench_allreduce(np=4) took 0.0037s, total workload: 384000B, rate: 0.097GiB/s
bench_allreduce(np=4) took 0.0037s, total workload: 384000B, rate: 0.098GiB/s
bench_allreduce(np=4) took 0.0037s, total workload: 384000B, rate: 0.097GiB/s
bench_allreduce(np=4) took 0.0035s, total workload: 384000B, rate: 0.103GiB/s
bench_allreduce(np=4) took 0.0033s, total workload: 384000B, rate: 0.108GiB/s
bench_allreduce(np=4) took 0.0032s, total workload: 384000B, rate: 0.110GiB/s
bench_allreduce(np=4) took 0.2252s, total workload: 1.144GiB, rate: 5.079GiB/s
bench_allreduce(np=4) took 0.2040s, total workload: 1.144GiB, rate: 5.606GiB/s
bench_allreduce(np=4) took 0.2034s, total workload: 1.144GiB, rate: 5.622GiB/s
bench_allreduce(np=4) took 0.2023s, total workload: 1.144GiB, rate: 5.652GiB/s
bench_allreduce(np=4) took 0.2098s, total workload: 1.144GiB, rate: 5.450GiB/s
bench_allreduce(np=4) took 0.2046s, total workload: 1.144GiB, rate: 5.590GiB/s
bench_allreduce(np=4) took 0.2052s, total workload: 1.144GiB, rate: 5.573GiB/s
bench_allreduce(np=4) took 0.2044s, total workload: 1.144GiB, rate: 5.596GiB/s
bench_allreduce(np=4) took 0.2060s, total workload: 1.144GiB, rate: 5.551GiB/s
bench_allreduce(np=4) took 0.2059s, total workload: 1.144GiB, rate: 5.553GiB/s
END ======================================== bench_allreduce local ========================================
BGN ======================================== bench_allreduce remote ========================================
bench_allreduce(np=4) took 0.3308s, total workload: 384000B, rate: 0.001GiB/s
bench_allreduce(np=4) took 0.3140s, total workload: 384000B, rate: 0.001GiB/s
bench_allreduce(np=4) took 0.3002s, total workload: 384000B, rate: 0.001GiB/s
bench_allreduce(np=4) took 0.3112s, total workload: 384000B, rate: 0.001GiB/s
bench_allreduce(np=4) took 0.3261s, total workload: 384000B, rate: 0.001GiB/s
bench_allreduce(np=4) took 0.3326s, total workload: 384000B, rate: 0.001GiB/s
bench_allreduce(np=4) took 0.3107s, total workload: 384000B, rate: 0.001GiB/s
bench_allreduce(np=4) took 0.3072s, total workload: 384000B, rate: 0.001GiB/s
bench_allreduce(np=4) took 0.2969s, total workload: 384000B, rate: 0.001GiB/s
bench_allreduce(np=4) took 0.2862s, total workload: 384000B, rate: 0.001GiB/s
bench_allreduce(np=4) took 1.0909s, total workload: 1.144GiB, rate: 1.048GiB/s
bench_allreduce(np=4) took 0.9866s, total workload: 1.144GiB, rate: 1.159GiB/s
bench_allreduce(np=4) took 0.9789s, total workload: 1.144GiB, rate: 1.168GiB/s
bench_allreduce(np=4) took 0.9276s, total workload: 1.144GiB, rate: 1.233GiB/s
bench_allreduce(np=4) took 0.9301s, total workload: 1.144GiB, rate: 1.230GiB/s
bench_allreduce(np=4) took 0.9553s, total workload: 1.144GiB, rate: 1.197GiB/s
bench_allreduce(np=4) took 0.9490s, total workload: 1.144GiB, rate: 1.205GiB/s
bench_allreduce(np=4) took 0.9726s, total workload: 1.144GiB, rate: 1.176GiB/s
bench_allreduce(np=4) took 0.8959s, total workload: 1.144GiB, rate: 1.277GiB/s
bench_allreduce(np=4) took 0.9749s, total workload: 1.144GiB, rate: 1.173GiB/s
END ======================================== bench_allreduce remote ========================================
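
To compare runs such as the ones above, the log lines can be parsed and the reported rates cross-checked. The rate column appears to be the total workload divided by the elapsed time (for example, 384000 B / 0.0120 s ≈ 0.0298 GiB/s, which is printed as 0.030GiB/s). Below is a minimal sketch in Python, assuming the line format seen in these logs; the names `parse_line`, `recomputed_rate`, and `mean_seconds_by_workload` are hypothetical helpers and are not part of the benchmark itself.

```python
import re
from statistics import mean

# Pattern inferred from the log lines in this thread (an assumption, not a spec).
LINE_RE = re.compile(
    r"bench_allreduce\(np=(\d+)\) took ([\d.]+)s, "
    r"total workload: ([\d.]+)(B|GiB), rate: ([\d.]+)GiB/s"
)
GIB = 1 << 30  # bytes per GiB


def parse_line(line):
    """Return a dict for one benchmark line, or None for BGN/END/other lines."""
    m = LINE_RE.search(line)
    if m is None:
        return None
    np_, took, size, unit, rate = m.groups()
    workload_bytes = float(size) * (GIB if unit == "GiB" else 1)
    return {
        "np": int(np_),
        "seconds": float(took),
        "bytes": workload_bytes,
        "rate_gib_s": float(rate),
    }


def recomputed_rate(rec):
    """Workload divided by elapsed time, in GiB/s."""
    return rec["bytes"] / rec["seconds"] / GIB


def mean_seconds_by_workload(lines):
    """Average the elapsed time per distinct workload size (in bytes)."""
    groups = {}
    for line in lines:
        rec = parse_line(line)
        if rec is not None:
            groups.setdefault(rec["bytes"], []).append(rec["seconds"])
    return {size: mean(times) for size, times in groups.items()}
```

Feeding one BGN/END block at a time to `mean_seconds_by_workload` gives one average per workload size, which makes it easier to compare the local and remote runs across commits. Because the printed workload and time are rounded, the recomputed rate may differ from the printed one in the last digit.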
