Skip to content

Conversation

csegarragonz
Copy link
Collaborator

@csegarragonz csegarragonz commented Mar 26, 2024

eventually we could consider using more advanced features with hwloc

closes #385 (superseeds)

Copy link

codecov bot commented Mar 26, 2024

Codecov Report

Attention: Patch coverage is 79.50311% with 33 lines in your changes are missing coverage. Please review.

Project coverage is 81.89%. Comparing base (d94f61a) to head (8a5cd8b).

Files Patch % Lines
src/transport/tcp/RecvSocket.cpp 48.14% 14 Missing ⚠️
src/executor/Executor.cpp 14.28% 6 Missing ⚠️
src/mpi/MpiWorld.cpp 89.28% 6 Missing ⚠️
src/util/hwloc.cpp 89.18% 4 Missing ⚠️
src/transport/tcp/SocketOptions.cpp 90.32% 3 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #408      +/-   ##
==========================================
- Coverage   82.08%   81.89%   -0.20%     
==========================================
  Files         114      116       +2     
  Lines        7525     7560      +35     
==========================================
+ Hits         6177     6191      +14     
- Misses       1348     1369      +21     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@csegarragonz csegarragonz force-pushed the mpi-affinity branch 2 times, most recently from 678dcae to 26400a4 Compare April 5, 2024 17:05
@csegarragonz csegarragonz changed the title mpi(hwloc): bind receiver and mpi thread to the same cpu core mpi(hwloc): spin-lock receiver threads and pin to a core Apr 5, 2024
add_definitions(-DTRACE_ALL=1)
endif()

# We want to disable the usage of spinlocks (and CPU pinning) in GHA runners
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Annoyingly we cannot use spinlocks inside GHA as that would make tests (and particularly distributed tests) very flaky.

@lgarithm
Copy link
Contributor

lgarithm commented Apr 5, 2024


-----------------------------------------------------------------------------------------------------------------
local/S       local/L       remote/S      remote/L      commit                                                 
-----------------------------------------------------------------------------------------------------------------
0.294         8.855         0.020         8.665         mpi                                                    
0.024 (0.08)  4.411 (0.50)  0.001 (0.05)  1.107 (0.13)  d9810b0d5dec2ff36ee9fdbda015137bb145364d # main        
0.031 (0.11)  4.528 (0.51)  0.001 (0.05)  1.329 (0.15)  26568f78c368256eae5aa3cb74c94bc737bbd85a # mpi-struct  
0.027 (0.09)  4.530 (0.51)  0.001 (0.05)  1.046 (0.12)  8eb27e0985b9d87834d822676a104b74faafb9a1 # ptp-struct  
0.078 (0.27)  5.220 (0.59)  0.001 (0.05)  1.184 (0.14)  0b0f1b117ea03c2e8e0641ab56a9c8e4e97954a7 # spinlock    
0.028 (0.10)  4.470 (0.50)  0.001 (0.05)  1.124 (0.13)  7b17b30706166ee777d8c05379ef37ca74029eb3 # ptp-no-order
-----------------------------------------------------------------------------------------------------------------
0.028 (0.10)  4.498 (0.51)  0.002 (0.10)  1.664 (0.19)  c3dbe3ba090e58d9488f9b3da0562265725a2e0d # new-main    
0.038 (0.13)  4.839 (0.55)  0.002 (0.10)  1.771 (0.20)  63beeaffe0fbb7728628f616529d465f16614ccc # mpi-affinity
--------------------------------------------------------------------------------------------------------- (Mar 27)
0.162 (0.55)  5.443 (0.61)  0.000 (0.00)  0.183 (0.02)  e42395ac5c0fbad3c68cefaf93d8356163e9d4a9 # good 
--------------------------------------------------------------------------------------------------------- (April 4)
0.030 (0.10)  4.513 (0.51)  0.005 (0.25)  0.747 (0.09)  a7d01b60d0df85a0e87a826f30f94068c8ab247d # mpi-st
--------------------------------------------------------------------------------------------------------- (April 5)
0.026 (0.09)  4.463 (0.50)  0.004 (0.20)  0.757 (0.09)  3e34a053140050907d8a8b77b6c22b691933b7ee # main-before-sock-opt
0.028 (0.10)  4.478 (0.51)  0.005 (0.25)  3.655 (0.42)  75b58f98fe6788f159ff6cd008251b6b4cdcdc65 # sock-opt 
0.029 (0.10)  4.357 (0.49)  0.005 (0.25)  0.752 (0.09)  1d6731ba791073b8886d65dd93d3845e6806305a # mpi-affinity   

@csegarragonz
Copy link
Collaborator Author

hey @lgarithm this does not look right to me. could you make sure that spin locks are enabled at build time?

you should be able to see it in the cmake logs. it may require re-running inv dev.cmake with the --clean flag.

@csegarragonz csegarragonz force-pushed the mpi-affinity branch 3 times, most recently from fadb386 to daef454 Compare April 8, 2024 10:51
@lgarithm
Copy link
Contributor

lgarithm commented Apr 8, 2024

does this need a special build flag? I re-run with everything cleaned, but it's still the same:

-----------------------------------------------------------------------------------------------------------------
local/S       local/L       remote/S      remote/L      commit                                                 
-----------------------------------------------------------------------------------------------------------------
0.294         8.855         0.020         8.665         mpi                                                    
0.028 (0.10)  4.320 (0.49)  0.004 (0.20)  0.758 (0.09)  1da6eb8a8c7342b39423c658764407c6172cc04d # mpi-affinity

@csegarragonz csegarragonz force-pushed the mpi-affinity branch 2 times, most recently from ab8ab50 to 602eab2 Compare April 8, 2024 12:12
@csegarragonz
Copy link
Collaborator Author

@lgarithm can you paste the build log here? particularly at the very beginning of the CMake command you should see:

-- The C compiler identification is Clang 17.0.6
-- The CXX compiler identification is Clang 17.0.6
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /usr/bin/clang-17 - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/clang++-17 - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Faabric not optimised for specific CPU
-- Faabric: Activated spin-locks // <============== THIS LINE
-- Found Git: /usr/bin/git (found version "2.34.1") 
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Success
-- Found Threads: TRUE  

@csegarragonz csegarragonz force-pushed the mpi-affinity branch 3 times, most recently from 7561533 to e76c5c8 Compare April 8, 2024 12:46
@csegarragonz csegarragonz force-pushed the mpi-affinity branch 4 times, most recently from ff12d46 to 63d533b Compare April 8, 2024 16:41
@csegarragonz csegarragonz force-pushed the mpi-affinity branch 3 times, most recently from f1feea1 to 3e96fb8 Compare April 9, 2024 16:25
@csegarragonz csegarragonz merged commit b8e3934 into main Apr 10, 2024
@csegarragonz csegarragonz deleted the mpi-affinity branch April 10, 2024 10:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants