Bring fuzzy matching support into master #5508

Merged 3 commits on Aug 6, 2018

hjelmn (Member) commented Aug 2, 2018

This PR brings the fuzzy matching support developed by @mdosanjh at Sandia into pml/ob1 on master.

The fuzzy matching code is disabled by default and can be enabled on the appropriate platforms by specifying the --with-pml-ob1-matching configure option.

Signed-off-by: Nathan Hjelm <[email protected]>
This commit updates the new custom matching code in pml/ob1 so it can
be enabled with a configure option. This commit also renames the
fuzzy-matching headers to avoid potential name conflicts and removes
the use of C reserved identifiers.

Signed-off-by: Nathan Hjelm <[email protected]>

typedef struct custom_match_prq_node
{
int32_t tags[PRQ_SIZE];

Member

These fields seem to always be accessed together, but laying out the structure this way guarantees 5 cache misses per access, which is rather expensive for such a time-critical operation.

Contributor

I agree that restructuring would be pertinent to allow PRQ_SIZE and UMQ_SIZE to be resized without hurting performance. The current matching engine structure in 'pml_ob1_custom_match_arrays.h' is explicitly sized to fit each prq/umq node into a single cache line.
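
The layout concern above can be made concrete with a small sketch. It assumes a hypothetical node with 16 entries, four `int32_t` key/mask arrays and one pointer array (names such as `tmask` and `smask` are invented for illustration and are not taken from the PR), a 64-byte cache line, and a node that starts on a cache-line boundary. It counts how many cache lines are touched when reading all fields of entry `i` in a struct-of-arrays layout versus an array-of-structs layout.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_SIZE 16       /* hypothetical entries per node */
#define SKETCH_CACHE_LINE 64 /* assumed cache line size in bytes */

/* Struct of arrays: each field lives in its own array. */
typedef struct sketch_soa_node {
    int32_t tags[SKETCH_SIZE];
    int32_t tmask[SKETCH_SIZE];
    int32_t srcs[SKETCH_SIZE];
    int32_t smask[SKETCH_SIZE];
    void *value[SKETCH_SIZE];
} sketch_soa_node_t;

/* Array of structs: all fields of one entry are adjacent. */
typedef struct sketch_aos_entry {
    int32_t tag, tmask, src, smask;
    void *value;
} sketch_aos_entry_t;

/* Index of the cache line containing a byte offset from the node start. */
static size_t sketch_line_of(size_t byte_offset)
{
    return byte_offset / SKETCH_CACHE_LINE;
}

/* Distinct cache lines touched when reading every field of entry i (SoA). */
size_t sketch_soa_lines_touched(size_t i)
{
    size_t lines[5] = {
        sketch_line_of(offsetof(sketch_soa_node_t, tags) + i * sizeof(int32_t)),
        sketch_line_of(offsetof(sketch_soa_node_t, tmask) + i * sizeof(int32_t)),
        sketch_line_of(offsetof(sketch_soa_node_t, srcs) + i * sizeof(int32_t)),
        sketch_line_of(offsetof(sketch_soa_node_t, smask) + i * sizeof(int32_t)),
        sketch_line_of(offsetof(sketch_soa_node_t, value) + i * sizeof(void *)),
    };
    size_t distinct = 0;
    for (size_t a = 0; a < 5; a++) {
        int seen = 0;
        for (size_t b = 0; b < a; b++) {
            if (lines[b] == lines[a]) {
                seen = 1;
            }
        }
        if (!seen) {
            distinct++;
        }
    }
    return distinct;
}

/* Cache lines spanned by entry i when entries are stored contiguously (AoS). */
size_t sketch_aos_lines_touched(size_t i)
{
    size_t first = sketch_line_of(i * sizeof(sketch_aos_entry_t));
    size_t last = sketch_line_of((i + 1) * sizeof(sketch_aos_entry_t) - 1);
    return last - first + 1;
}
```

With these assumed sizes, the struct-of-arrays layout touches five lines per entry, while the array-of-structs layout touches one or at most two. A node small enough to fit entirely in one cache line, as described for 'pml_ob1_custom_match_arrays.h', avoids the problem in either layout.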


typedef struct custom_match_umq_node
{
int32_t tags[UMQ_SIZE];

Member

Same comment as for the prq struct.

result = _mm512_cmpeq_epi8_mask(_mm512_and_epi32(elem->keys, elem->mask), _mm512_and_epi32(search, elem->mask));
if(result)
{
for(i = elem->start; i <= elem->end; i++)

Member

I would think that looping over the set bits in result would be faster, as it saves a few branches.
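
Looping over only the set bits, as suggested above, can be sketched as follows. This is a generic illustration rather than the PR's code: `sketch_collect_set_bits` is a hypothetical helper, the mask is treated as a plain `uint64_t`, `__builtin_ctzll` is a GCC/Clang builtin, and the `elem->start`/`elem->end` bounds of the original loop are ignored.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Write the indices of the set bits of mask into out, lowest bit first.
 * Returns the number of indices written (at most out_cap).
 */
size_t sketch_collect_set_bits(uint64_t mask, int *out, size_t out_cap)
{
    size_t n = 0;
    while (mask != 0 && n < out_cap) {
        /* Index of the lowest set bit (mask is non-zero here). */
        out[n++] = __builtin_ctzll(mask);
        /* Clear the lowest set bit so the next iteration finds the next one. */
        mask &= mask - 1;
    }
    return n;
}
```

The loop body runs once per set bit instead of once per slot, so a sparse mask needs far fewer iterations and no per-slot bit test.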

{
for(i = elem->start; i <= elem->end; i++)
{
if((0x1 << i & result) && elem->value[i])

Member

Is 0x1 really promoted to __mmask64?
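
On the promotion question: in C the literal `0x1` has type `int`, and the usual arithmetic conversions for `&` are applied only after `0x1 << i` has already been evaluated as an `int`. Shifting by 31 or more is therefore undefined for a 32-bit `int`, so bits 31 to 63 of a 64-bit mask cannot be tested reliably this way. A minimal sketch of a 64-bit-safe test, using hypothetical helper names and assuming `__mmask64` behaves like a 64-bit unsigned integer:

```c
#include <stddef.h>
#include <stdint.h>

/* Size of the type in which `0x1 << i` is evaluated: plain int. */
size_t sketch_int_shift_width(void)
{
    return sizeof(0x1 << 1);
}

/* Test bit i of a 64-bit mask; valid for 0 <= i < 64. */
int sketch_mask_bit_is_set(uint64_t mask, unsigned i)
{
    /* UINT64_C(1) makes the shift happen in a 64-bit unsigned type. */
    return (mask & (UINT64_C(1) << i)) != 0;
}
```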

Signed-off-by: Nathan Hjelm <[email protected]>
@hjelmn hjelmn merged commit c294bbc into open-mpi:master Aug 6, 2018

thananon (Member) commented Oct 5, 2018

I get an error when trying to compile with the new matching on an Intel Xeon processor with the AVX and AVX2 instruction sets, using GCC 7.1.

Is there something wrong with my configure options, or am I missing something?

I get the same error with --with-pml-ob1-matching=vector or fuzzy-*.

/sw/gcc/7.1.0/lib/gcc/x86_64-pc-linux-gnu/7.1.0/include/avx512fintrin.h:3573:1: error: inlining failed in call to always_inline '_mm512_set1_epi32': target specific option mismatch
 _mm512_set1_epi32 (int __A)
 ^~~~~~~~~~~~~~~~~
In file included from custommatch/pml_ob1_custom_match.h:49:0,
                 from pml_ob1_comm.h:38,
                 from pml_ob1.c:51:
custommatch/pml_ob1_custom_match_vectors.h:501:22: note: called from here
         elem->srcs = _mm512_set1_epi32(~0);
                      ^~~~~~~~~~~~~~~~~~~~~
In file included from /sw/gcc/7.1.0/lib/gcc/x86_64-pc-linux-gnu/7.1.0/include/immintrin.h:45:0,
                 from custommatch/pml_ob1_custom_match_vectors.h:17,
                 from custommatch/pml_ob1_custom_match.h:49,
                 from pml_ob1_comm.h:38,
                 from pml_ob1.c:51:
/sw/gcc/7.1.0/lib/gcc/x86_64-pc-linux-gnu/7.1.0/include/avx512fintrin.h:3573:1: error: inlining failed in call to always_inline '_mm512_set1_epi32': target specific option mismatch
 _mm512_set1_epi32 (int __A)

mdosanjh (Contributor) commented Oct 5, 2018

@thananon The vector code here is written for AVX-512 and is not currently compatible with AVX or AVX2 (or the limited AVX-512 implementation on Knights Corner).
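
The `target specific option mismatch` error is what GCC reports when an AVX-512 intrinsic is inlined into code compiled without AVX-512 enabled. One common way to avoid it, shown here as a sketch and not as what Open MPI's build system does, is to guard the vector path with the compiler-defined `__AVX512F__` macro and provide a scalar fallback. The helper name `sketch_fill_all_ones` is hypothetical.

```c
#include <stdint.h>

#if defined(__AVX512F__)
#include <immintrin.h>
#define SKETCH_HAVE_AVX512F 1
#else
#define SKETCH_HAVE_AVX512F 0
#endif

/*
 * Set 16 int32 lanes to all-ones, mirroring the effect of
 * `elem->srcs = _mm512_set1_epi32(~0);` from the error log.
 */
void sketch_fill_all_ones(int32_t out[16])
{
#if SKETCH_HAVE_AVX512F
    /* Vector path: only compiled when the compiler targets AVX-512F. */
    __m512i v = _mm512_set1_epi32(~0);
    _mm512_storeu_si512((void *)out, v);
#else
    /* Scalar fallback for AVX, AVX2, and other targets. */
    for (int i = 0; i < 16; i++) {
        out[i] = ~0;
    }
#endif
}
```

A compile-time guard like this only reflects the flags the file was built with; it does not check what the CPU running the binary supports.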

thananon (Member) commented Oct 5, 2018

Ah, I see. Thank you.

4 participants