-
Notifications
You must be signed in to change notification settings - Fork 901
Bring fuzzy matching support into master #5508
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Signed-off-by: Nathan Hjelm <[email protected]>
This commit updates the new custom matching code in pml/ob1 so it can not be enabled with a configure option. This commit also renames the fuzzy-matching headers to avoid potential name conflicts and removes the use of C reserved identifiers. Signed-off-by: Nathan Hjelm <[email protected]>
|
||
typedef struct custom_match_prq_node | ||
{ | ||
int32_t tags[PRQ_SIZE]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These fields seems to always be accessed together, but laying out the structure this way guarantee 5 cache misses per access which is rather expensive for such a time critical operation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that restructuring would be pertinent to allow for performant PRQ_SIZE and UMQ_SIZE resizing. The current implementation of 'pml_ob1_custom_match_arrays.h' matching engine structure is explicitly sized to fit each prq/umq node into a single cache line.
|
||
typedef struct custom_match_umq_node | ||
{ | ||
int32_t tags[UMQ_SIZE]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same comment as for the prq struct.
result = _mm512_cmpeq_epi8_mask(_mm512_and_epi32(elem->keys, elem->mask), _mm512_and_epi32(search, elem->mask)); | ||
if(result) | ||
{ | ||
for(i = elem->start; i <= elem->end; i++) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would think that looping around the set bits in result will be faster as it saves few branches.
{ | ||
for(i = elem->start; i <= elem->end; i++) | ||
{ | ||
if((0x1 << i & result) && elem->value[i]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is 0x1 really promoted to __mmask64 ?
Signed-off-by: Nathan Hjelm <[email protected]>
I have error trying to compile with new matching on Intel Xeon processor with AVX and AVX2 instruction set. GCC 7.1. Is there something wrong on the configure or am I missing something? Same error with --with-pml-ob1-matching=vector or fuzzy-*.
|
@thananon The vector code here is writen for AVX-512 and is not currently compatible with AVX or AVX2 (or the limited AVX-512 implementation on Knights Corner). |
Ah, I see. Thank you. |
This PR brings the fuzzy matching support developed by @mdosanjh at Sandia into pml/ob1 on master.
The fuzzy matching code is disabled by default and can be enabled on the appropriate platforms by specifying the --with-pml-ob1-matching configure option.