@mkurnosov
Contributor

Implements butterfly algorithm for MPI_Reduce_scatter_block.
The algorithm can be used both by commutative and non-commutative operations, for power-of-two and non-power-of-two number of processes.

Signed-off-by: Mikhail Kurnosov [email protected]

@ompiteam-bot

Can one of the admins verify this patch?

@bosilca
Member

bosilca commented May 25, 2018

OK to test.

}

/*
* ompi_mirror_perm: Retruns mirror permutation of nbits low-order bits
Member

typo.

static int ompi_mirror_perm(unsigned int value, int nbits)
{
int perm = 0;
for (int i = 0; i < nbits; i++) {
Member

Not that I really think this is performance critical, but you can implement it much more simply by extracting the last bit from value and pushing it left on perm.

for( int i = 0; i < nbits; i++ ) {
    perm |= (value & 0x1);
    value >>= 1;
    perm <<= 1;
 }

@jsquyres
Member

bot:ompi:retest

@mkurnosov mkurnosov force-pushed the reduce-scatter-block-butterfly branch from 01d6d36 to d2ad0ac Compare May 26, 2018 01:14
@mkurnosov
Contributor Author

for( int i = 0; i < nbits; i++ ) {
    perm |= (value & 0x1);
    value >>= 1;
    perm <<= 1;
 }

@bosilca This algorithm produces an incorrect result. For example, ompi_mirror_permutation(1, 2) returns 4, but the correct result is 2.

I changed the algorithm and corrected a few typos. The new version of ompi_mirror_permutation:

int perm = 0;
for (int i = nbits - 1; i >= 0; i--) {
    perm |= (value & 0x1) << i;
    value >>= 1;
}
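
To make the difference between the two loops concrete, here is a minimal sketch in C. The function names `mirror_shift_after`, `mirror_place`, and `mirror_shift_before` are hypothetical and are not taken from the patch. The first reproduces the loop suggested in review, the second mirrors the corrected loop above, and the third is an alternative fix (shift before OR-ing) that does not appear in this conversation. A 32-bit `unsigned int` is assumed.

```c
#include <assert.h>

/* Hypothetical name: the loop as suggested in review.
 * perm is shifted after every OR, including the last one,
 * so the result ends up shifted left by one bit too many. */
static unsigned int mirror_shift_after(unsigned int value, int nbits)
{
    unsigned int perm = 0;
    for (int i = 0; i < nbits; i++) {
        perm |= (value & 0x1u);
        value >>= 1;
        perm <<= 1;
    }
    return perm;
}

/* Hypothetical name: the corrected loop from the comment above.
 * The lowest remaining bit of value is placed directly at position i,
 * starting from the highest target position nbits - 1. */
static unsigned int mirror_place(unsigned int value, int nbits)
{
    assert(nbits >= 0 && nbits <= 32); /* assumes a 32-bit unsigned int */
    unsigned int perm = 0;
    for (int i = nbits - 1; i >= 0; i--) {
        perm |= (value & 0x1u) << i;
        value >>= 1;
    }
    return perm;
}

/* Hypothetical alternative, not from the thread: keep the
 * shift-and-OR structure but shift perm before adding the next bit. */
static unsigned int mirror_shift_before(unsigned int value, int nbits)
{
    unsigned int perm = 0;
    for (int i = 0; i < nbits; i++) {
        perm = (perm << 1) | (value & 0x1u);
        value >>= 1;
    }
    return perm;
}
```

With value 1 and nbits 2, the first function yields 4 while the other two yield 2, which matches the example given above.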

@bosilca
Member

bosilca commented May 26, 2018

@mkurnosov this is the bit reversal problem, which is heavily used in signal processing. There are many fast solutions, some of them are described here.

I really like the one that uses a constant number of operations:

// swap odd and even bits
value = ((value >> 1) & 0x55555555) | ((value & 0x55555555) << 1);
// swap consecutive pairs
value = ((value >> 2) & 0x33333333) | ((value & 0x33333333) << 2);
// swap nibbles ... 
value = ((value >> 4) & 0x0F0F0F0F) | ((value & 0x0F0F0F0F) << 4);
// swap bytes
value = ((value >> 8) & 0x00FF00FF) | ((value & 0x00FF00FF) << 8);
// swap 2-byte long pairs
value = (value >> 16) | (value << 16);
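
The five steps above reverse all 32 bits of the word, whereas the mirror permutation only concerns the nbits low-order bits. One way to bridge the two, sketched here under the assumption of a 32-bit value, is to reverse the whole word and then shift right by 32 - nbits. The names `reverse32` and `mirror_low_bits` are hypothetical and this is not necessarily how the merged patch does it.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: reverse all 32 bits using the swap steps above. */
static uint32_t reverse32(uint32_t value)
{
    /* swap odd and even bits */
    value = ((value >> 1) & 0x55555555u) | ((value & 0x55555555u) << 1);
    /* swap consecutive pairs */
    value = ((value >> 2) & 0x33333333u) | ((value & 0x33333333u) << 2);
    /* swap nibbles */
    value = ((value >> 4) & 0x0F0F0F0Fu) | ((value & 0x0F0F0F0Fu) << 4);
    /* swap bytes */
    value = ((value >> 8) & 0x00FF00FFu) | ((value & 0x00FF00FFu) << 8);
    /* swap 2-byte halves */
    value = (value >> 16) | (value << 16);
    return value;
}

/* Hypothetical helper: mirror only the nbits low-order bits.
 * After a full reversal the wanted bits sit at the top of the word,
 * so they are shifted back down; bits of value above nbits are discarded. */
static uint32_t mirror_low_bits(uint32_t value, int nbits)
{
    assert(nbits >= 0 && nbits <= 32);
    if (nbits == 0) {
        return 0; /* a shift by 32 would be undefined behavior */
    }
    return reverse32(value) >> (32 - nbits);
}
```

For example, `mirror_low_bits(1, 2)` gives 2, the same result as the corrected loop earlier in the thread.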

@mkurnosov mkurnosov force-pushed the reduce-scatter-block-butterfly branch from d2ad0ac to 28d5837 Compare May 27, 2018 07:18
@mkurnosov
Contributor Author

@bosilca Thanks for your comment. I am familiar with this problem. The new version is based on the log(n) algorithm (from "Bit Twiddling Hacks" and "Hacker's Delight").

@bosilca bosilca merged commit 9adc38c into open-mpi:master May 27, 2018
@mkurnosov mkurnosov deleted the reduce-scatter-block-butterfly branch May 28, 2018 00:28