coll: reduce_scatter_block: add butterfly algorithm #5197
Can one of the admins verify this patch?
OK to test.
}

/*
 * ompi_mirror_perm: Retruns mirror permutation of nbits low-order bits
typo.
static int ompi_mirror_perm(unsigned int value, int nbits)
{
    int perm = 0;
    for (int i = 0; i < nbits; i++) {
Not that I really think that it is performance critical, but you can implement this much simpler by extracting the last bit from value and pushing it left on perm.
for( int i = 0; i < nbits; i++ ) {
    perm |= (value & 0x1);
    value >>= 1;
    perm <<= 1;
}
bot:ompi:retest
01d6d36 to d2ad0ac
@bosilca This algorithm produces an incorrect result. For example: I changed the algorithm and corrected a few typos. The new version of
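To make the reported problem concrete: in the loop suggested in the review, `perm <<= 1` also runs after the last bit has been appended, so the result ends up shifted one position too far (for `value = 1`, `nbits = 3` it yields 8 rather than 4). A minimal sketch of a corrected per-bit loop follows. The name `mirror_perm_loop` is hypothetical and this is not the code from the patch; it also assumes a 32-bit `unsigned int`.

```c
#include <assert.h>

/* Hypothetical helper, not the patch's code: reverse the nbits low-order
 * bits of value, one bit per iteration. Bits above nbits are ignored.
 * Assumes unsigned int is 32 bits wide. */
static unsigned int mirror_perm_loop(unsigned int value, int nbits)
{
    assert(nbits >= 0 && nbits <= 32);
    unsigned int perm = 0;
    for (int i = 0; i < nbits; i++) {
        perm <<= 1;            /* make room first ... */
        perm |= (value & 0x1); /* ... then append the lowest bit of value */
        value >>= 1;           /* move on to the next input bit */
    }
    return perm;
}
```

Shifting before the OR means exactly `nbits - 1` effective shifts are applied to the first extracted bit, which places it at position `nbits - 1` as a mirror permutation requires.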
@mkurnosov this is the bit reversal problem, and it is heavily used in signal processing. There are many fast solutions, some of them described here. I really like the one in constant operations:
Implements butterfly algorithm for MPI_Reduce_scatter_block. The algorithm can be used both by commutative and non-commutative operations, for power-of-two and non-power-of-two number of processes. Signed-off-by: Mikhail Kurnosov <[email protected]>
d2ad0ac to 28d5837
@bosilca Thanks for your comment. I am familiar with this problem. The new version is based on the log(n) algorithm (from "Bit Twiddling Hacks" and "Hacker's Delight").
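For readers unfamiliar with the log(n) approach mentioned here, the sketch below shows the usual parallel bit-reversal technique: swap adjacent bits, then pairs, nibbles, bytes, and half-words, and finally shift the reversed 32-bit word right so that only the `nbits` mirrored bits remain. The function name `mirror_perm_log` and the fixed 32-bit width are assumptions made for illustration; the code actually merged in the patch may differ.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sketch, not the patch's code: reverse the nbits low-order
 * bits of value using log2(32) = 5 swap steps on a 32-bit word. */
static uint32_t mirror_perm_log(uint32_t value, int nbits)
{
    assert(nbits >= 1 && nbits <= 32); /* nbits == 0 would shift by 32 */
    uint32_t v = value;
    v = ((v >> 1) & 0x55555555u) | ((v & 0x55555555u) << 1); /* swap adjacent bits */
    v = ((v >> 2) & 0x33333333u) | ((v & 0x33333333u) << 2); /* swap bit pairs */
    v = ((v >> 4) & 0x0F0F0F0Fu) | ((v & 0x0F0F0F0Fu) << 4); /* swap nibbles */
    v = ((v >> 8) & 0x00FF00FFu) | ((v & 0x00FF00FFu) << 8); /* swap bytes */
    v = (v >> 16) | (v << 16);                               /* swap half-words */
    /* The low nbits of value now sit in the top nbits of v. */
    return v >> (32 - nbits);
}
```

The number of steps depends only on the word width, not on `nbits`, which is why this form is attractive when the permutation is computed repeatedly.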