opal: enable load-linked, store-conditional atomics for AArch64 #8412
Merged: hjelmn merged 1 commit into open-mpi:master from hjelmn:enable_the_load_linked_store_conditional_lock_free_structures_when_using_c11_or_other_builtins on Feb 1, 2021
Force-pushed dd9c1da to 136c693
bosilca approved these changes (Jan 25, 2021):
This code reorganization looks good, but in some cases I don't see the point of creating new files (that will only exist for a single arch) or of moving code from one file into another. All of these are minor, though.
bwbarrett approved these changes (Jan 25, 2021)
This PR updates the opal atomic code to allow the use of the AArch64 LL/SC instructions even when C11 atomics are enabled. This should improve atomic lifo/fifo performance on these systems.

Performance on Apple Silicon (Late 2020 Mac Mini M1, 16GB):

LL/SC:

```
Mac-mini:class hjelmn$ ./opal_lifo -t 1
Single thread test. Time: 0 s 13621 us 13 nsec/poppush
Atomics thread finished. Time: 0 s 14375 us 14 nsec/poppush
Atomics thread finished. Time: 0 s 154525 us 154 nsec/poppush
Atomics thread finished. Time: 0 s 154661 us 154 nsec/poppush
Atomics thread finished. Time: 0 s 156505 us 156 nsec/poppush
Atomics thread finished. Time: 0 s 157013 us 157 nsec/poppush
Atomics thread finished. Time: 0 s 157493 us 157 nsec/poppush
Atomics thread finished. Time: 0 s 158275 us 158 nsec/poppush
Atomics thread finished. Time: 0 s 158647 us 158 nsec/poppush
Atomics thread finished. Time: 0 s 158973 us 158 nsec/poppush
All threads finished. Thread count: 8 Time: 0 s 159023 us 19 nsec/poppush
SUPPORT: OMPI Test Passed: opal_lifo_t: (7 tests)
```

```
Mac-mini:class hjelmn$ ./opal_fifo
Single thread test. Time: 0 s 7620 us 7 nsec/poppush
Atomics thread finished. Time: 0 s 7918 us 7 nsec/poppush
Atomics thread finished. Time: 0 s 76081 us 76 nsec/poppush
Atomics thread finished. Time: 0 s 79458 us 79 nsec/poppush
Atomics thread finished. Time: 0 s 84994 us 84 nsec/poppush
Atomics thread finished. Time: 0 s 90103 us 90 nsec/poppush
Atomics thread finished. Time: 0 s 90403 us 90 nsec/poppush
Atomics thread finished. Time: 0 s 91280 us 91 nsec/poppush
Atomics thread finished. Time: 0 s 92466 us 92 nsec/poppush
Atomics thread finished. Time: 0 s 93835 us 93 nsec/poppush
All threads finished. Thread count: 8 Time: 0 s 93916 us 11 nsec/poppush
Exhaustive atomics thread finished. Popped 821530 items. Time: 0 s 107912 us 131 nsec/poppush
Exhaustive atomics thread finished. Popped 810445 items. Time: 0 s 114695 us 141 nsec/poppush
Exhaustive atomics thread finished. Popped 806449 items. Time: 0 s 116241 us 144 nsec/poppush
Exhaustive atomics thread finished. Popped 813960 items. Time: 0 s 117182 us 143 nsec/poppush
Exhaustive atomics thread finished. Popped 825230 items. Time: 0 s 118810 us 143 nsec/poppush
Exhaustive atomics thread finished. Popped 826685 items. Time: 0 s 119486 us 144 nsec/poppush
Exhaustive atomics thread finished. Popped 828373 items. Time: 0 s 120327 us 145 nsec/poppush
Exhaustive atomics thread finished. Popped 830266 items. Time: 0 s 121114 us 145 nsec/poppush
All threads finished. Thread count: 8 Time: 0 s 121186 us 15 nsec/poppush
SUPPORT: OMPI Test Passed: opal_fifo_t: (8 tests)
```

CAS128:

```
Mac-mini:class hjelmn$ ./opal_lifo -t 1
Single thread test. Time: 0 s 25688 us 25 nsec/poppush
Atomics thread finished. Time: 0 s 29322 us 29 nsec/poppush
Atomics thread finished. Time: 4 s 57595 us 4057 nsec/poppush
Atomics thread finished. Time: 4 s 151568 us 4151 nsec/poppush
Atomics thread finished. Time: 4 s 162332 us 4162 nsec/poppush
Atomics thread finished. Time: 4 s 173651 us 4173 nsec/poppush
Atomics thread finished. Time: 4 s 176088 us 4176 nsec/poppush
Atomics thread finished. Time: 4 s 178025 us 4178 nsec/poppush
Atomics thread finished. Time: 4 s 178713 us 4178 nsec/poppush
Atomics thread finished. Time: 4 s 178760 us 4178 nsec/poppush
All threads finished. Thread count: 8 Time: 4 s 178830 us 522 nsec/poppush
SUPPORT: OMPI Test Passed: opal_lifo_t: (7 tests)
```

```
Mac-mini:class hjelmn$ ./opal_fifo
Single thread test. Time: 0 s 7611 us 7 nsec/poppush
Atomics thread finished. Time: 0 s 19256 us 19 nsec/poppush
Atomics thread finished. Time: 2 s 555095 us 2555 nsec/poppush
Atomics thread finished. Time: 2 s 562521 us 2562 nsec/poppush
Atomics thread finished. Time: 2 s 570284 us 2570 nsec/poppush
Atomics thread finished. Time: 2 s 570760 us 2570 nsec/poppush
Atomics thread finished. Time: 2 s 571438 us 2571 nsec/poppush
Atomics thread finished. Time: 2 s 573642 us 2573 nsec/poppush
Atomics thread finished. Time: 2 s 575019 us 2575 nsec/poppush
Atomics thread finished. Time: 2 s 575161 us 2575 nsec/poppush
All threads finished. Thread count: 8 Time: 2 s 575231 us 321 nsec/poppush
Exhaustive atomics thread finished. Popped 639525 items. Time: 1 s 828167 us 2858 nsec/poppush
Exhaustive atomics thread finished. Popped 642578 items. Time: 1 s 840312 us 2863 nsec/poppush
Exhaustive atomics thread finished. Popped 641617 items. Time: 1 s 846852 us 2878 nsec/poppush
Exhaustive atomics thread finished. Popped 639283 items. Time: 1 s 849705 us 2893 nsec/poppush
Exhaustive atomics thread finished. Popped 646423 items. Time: 1 s 851183 us 2863 nsec/poppush
Exhaustive atomics thread finished. Popped 645146 items. Time: 1 s 851750 us 2870 nsec/poppush
Exhaustive atomics thread finished. Popped 645428 items. Time: 1 s 852076 us 2869 nsec/poppush
Exhaustive atomics thread finished. Popped 648267 items. Time: 1 s 852240 us 2857 nsec/poppush
All threads finished. Thread count: 8 Time: 1 s 852359 us 231 nsec/poppush
SUPPORT: OMPI Test Passed: opal_fifo_t: (8 tests)
```

In these runs the multi-threaded lifo/fifo tests are roughly 15x to 30x faster with LL/SC than with CAS128 (8-thread aggregate: 19 vs. 522 nsec/poppush for the lifo, 11 vs. 321 nsec/poppush for the fifo, and 15 vs. 231 nsec/poppush for the exhaustive fifo test). These are artificial benchmarks, but they give a reasonable idea of how these structures perform under heavy contention.

Signed-off-by: Nathan Hjelm <[email protected]>
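Some background on why the two variants differ: a lock-free LIFO (Treiber stack) built on compare-and-swap has to defend against the ABA problem, where the head node is popped and re-pushed between one thread's load of the head and its CAS, so the CAS succeeds on a stale view. The CAS128 path roughly handles this by swapping the head pointer together with a counter in one double-width CAS. With LL/SC, a store-exclusive (STXR/STLXR) fails whenever the exclusive monitor set by the preceding load-exclusive (LDXR/LDAXR) has been cleared, for example by another core writing that location, so a racing pop-and-re-push makes the update fail even though the head holds the same value again, and no counter is needed. The sketch below illustrates only the counter technique, in portable C11. It assumes a hypothetical fixed-size node pool so that a 32-bit node index and a 32-bit tag fit in one 64-bit word instead of requiring a 128-bit CAS; all names (`pool`, `lifo_push`, `lifo_pop`, `NIL`) are illustrative and are not the opal_lifo API.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical fixed-size node pool (not opal code). Nodes are named by a
 * 32-bit index so that index + 32-bit ABA tag fit in one 64-bit atomic word;
 * a pointer-based version needs a 128-bit (double-width) CAS instead. */
#define POOL_SIZE 1024u
#define NIL UINT32_MAX /* "no node" marker */

typedef struct {
    _Atomic uint32_t next; /* index of the node below this one, or NIL */
    int value;             /* payload, owned by whoever holds the node */
} node_t;

static node_t pool[POOL_SIZE];

/* Head word: high 32 bits = tag, low 32 bits = index of the top node. */
static _Atomic uint64_t head = (uint64_t)NIL;

static inline uint64_t pack(uint32_t tag, uint32_t idx) {
    return ((uint64_t)tag << 32) | idx;
}
static inline uint32_t idx_of(uint64_t w) { return (uint32_t)w; }
static inline uint32_t tag_of(uint64_t w) { return (uint32_t)(w >> 32); }

/* Push a node the caller currently owns (idx < POOL_SIZE). */
static void lifo_push(uint32_t idx) {
    uint64_t old = atomic_load_explicit(&head, memory_order_relaxed);
    uint64_t desired;
    do {
        atomic_store_explicit(&pool[idx].next, idx_of(old), memory_order_relaxed);
        /* Bumping the tag on every update makes a recycled index look
         * different to any thread still holding the old head word. */
        desired = pack(tag_of(old) + 1u, idx);
    } while (!atomic_compare_exchange_weak_explicit(
                 &head, &old, desired,
                 memory_order_release, memory_order_relaxed));
}

/* Pop the top node; returns its index, or NIL if the stack is empty. */
static uint32_t lifo_pop(void) {
    uint64_t old = atomic_load_explicit(&head, memory_order_acquire);
    uint64_t desired;
    do {
        if (idx_of(old) == NIL)
            return NIL;
        uint32_t next = atomic_load_explicit(&pool[idx_of(old)].next,
                                             memory_order_relaxed);
        /* If another thread popped and re-pushed this node meanwhile, the
         * tag no longer matches and the CAS below fails (no ABA). */
        desired = pack(tag_of(old) + 1u, next);
    } while (!atomic_compare_exchange_weak_explicit(
                 &head, &old, desired,
                 memory_order_acquire, memory_order_acquire));
    return idx_of(old);
}
```

Single-threaded, pushing indices 0, 1 and 2 and then popping returns 2, 1, 0 and finally `NIL`. Under contention the tag is what tells a popped-and-re-pushed head apart from one that never changed; a 32-bit tag can in principle wrap after 2^32 updates, a cost that the LL/SC approach avoids entirely.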
Force-pushed 136c693 to 5e13f02