Description
This comes up regularly in functions that process or detect ASCII input. My current example is in Rust, in a `const fn`, which prevents me from using explicit SIMD intrinsics, but the code generated by clang is nearly the same.
```rust
// Safety: the caller must ensure input.len() >= 16.
pub unsafe fn ascii_prefix(input: &[u8]) -> usize {
    let mut mask = 0_u16;
    for i in 0..16 {
        mask |= ((*input.get_unchecked(i) < 128) as u16) << i;
    }
    mask.trailing_ones() as usize
}
```
Ideally this would map to four instructions: `compare`, `movmsk`, `not` and `tzcnt`. Currently, however, it compiles to about 50 instructions, a mix of scalar and vector code: https://godbolt.org/z/jaKb5TMrn
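For reference, here is a sketch of that intended sequence written by hand with SSE2 intrinsics. It is not a `const fn` (explicit intrinsics are exactly what the const context rules out), it assumes an x86_64 target, and `ascii_prefix_sse2` is a hypothetical name. Because `pmovmskb` already collects the top bit of each byte, and a byte is non-ASCII exactly when that bit is set, no separate compare is needed for this particular predicate.

```rust
use std::arch::x86_64::{__m128i, _mm_loadu_si128, _mm_movemask_epi8};

/// Hypothetical hand-written equivalent of `ascii_prefix` (x86_64 only, not a `const fn`).
/// Safety: the caller must ensure input.len() >= 16.
pub unsafe fn ascii_prefix_sse2(input: &[u8]) -> usize {
    debug_assert!(input.len() >= 16);
    // movdqu: unaligned 16-byte load.
    let v = _mm_loadu_si128(input.as_ptr() as *const __m128i);
    // pmovmskb: bit i is the top bit of byte i, i.e. set iff input[i] >= 128.
    // Only the low 16 bits can be set, so truncating to u16 loses nothing.
    let non_ascii = _mm_movemask_epi8(v) as u16;
    // not: bit i of `ascii` is set iff input[i] < 128.
    let ascii = !non_ascii;
    // tzcnt of `non_ascii`, expressed as trailing ones of the inverted mask.
    ascii.trailing_ones() as usize
}
```

An all-ASCII chunk yields 16, and a first non-ASCII byte at index `i` yields `i`, matching the loop version.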
A slightly simpler example that only tries to detect fully ASCII chunks leads to similarly complex output:
```rust
// Safety: the caller must ensure input.len() >= 16.
pub unsafe fn is_ascii_mask(input: &[u8]) -> bool {
    let mut is_ascii = [false; 16];
    for i in 0..16 {
        is_ascii[i] = *input.get_unchecked(i) <= 127;
    }
    let mut mask = 0_u16;
    for i in 0..16 {
        mask |= (is_ascii[i] as u16) << i;
    }
    mask == 0xFFFF
}
```
However, for detection only, I found one pattern that generates the desired instruction sequence, so LLVM does have some logic to recognize it:
```rust
// Safety: the caller must ensure input.len() >= 16.
pub unsafe fn is_ascii_sum(input: &[u8]) -> bool {
    let mut is_ascii = [false; 16];
    for i in 0..16 {
        is_ascii[i] = *input.get_unchecked(i) <= 127;
    }
    let mut count = 0_u8;
    for i in 0..16 {
        count += is_ascii[i] as u8;
    }
    count == 16
}
```
```asm
example::is_ascii_sum::h00bc3be7e76426eb:
        vmovdqu xmm0, xmmword ptr [rdi]
        vpmovmskb eax, xmm0
        test    eax, eax
        sete    al
        ret
```
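As a cross-check, the same four-instruction sequence can be spelled out with intrinsics. This is a sketch assuming an x86_64 target and at least 16 readable bytes; `is_ascii_sse2` is a hypothetical name, and because it uses explicit intrinsics it cannot be used inside a `const fn`.

```rust
use std::arch::x86_64::{__m128i, _mm_loadu_si128, _mm_movemask_epi8};

/// Hypothetical hand-written equivalent of `is_ascii_sum` (x86_64 only).
/// Safety: the caller must ensure input.len() >= 16.
pub unsafe fn is_ascii_sse2(input: &[u8]) -> bool {
    debug_assert!(input.len() >= 16);
    // vmovdqu: unaligned 16-byte load.
    let v = _mm_loadu_si128(input.as_ptr() as *const __m128i);
    // vpmovmskb: one bit per byte, set iff that byte is >= 128.
    // test + sete: the chunk is all-ASCII iff no bit is set.
    _mm_movemask_epi8(v) == 0
}
```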
Unfortunately, this `sum`-based pattern cannot be used for the initial example, which needs a mask as output.
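One possible workaround for the mask/prefix case inside a `const fn` is SWAR (SIMD within a register) on a `u128` instead of explicit SIMD. This is a swapped-in technique, not a fix for the codegen issue, and whether LLVM lowers it to anything like `movmsk` + `tzcnt` is not claimed here. `ascii_prefix_swar` and `HIGH_BITS` are hypothetical names, and the function takes `&[u8; 16]` rather than a slice so that no unchecked indexing is needed.

```rust
/// High bit of every byte in a 16-byte word (hypothetical helper constant).
const HIGH_BITS: u128 = u128::from_le_bytes([0x80; 16]);

/// Hypothetical const-compatible alternative to `ascii_prefix`:
/// index of the first non-ASCII byte, or 16 if all bytes are ASCII.
pub const fn ascii_prefix_swar(input: &[u8; 16]) -> usize {
    // Little-endian: byte i occupies bits 8*i ..= 8*i + 7.
    let word = u128::from_le_bytes(*input);
    // Bit 8*i + 7 survives iff input[i] >= 128.
    let non_ascii = word & HIGH_BITS;
    // The lowest surviving bit marks the first non-ASCII byte;
    // with no bits set, trailing_zeros() is 128 and 128 / 8 == 16.
    (non_ascii.trailing_zeros() / 8) as usize
}
```

Since it only uses `const fn` integer methods, it can also be evaluated at compile time, e.g. `const N: usize = ascii_prefix_swar(b"0123456789abcdef");`.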
#45398 seems related, but focuses more on the bit-counting aspect, where `movmsk` might only be beneficial if the `popcnt` instruction is also available.
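For the bit-counting side, here is a sketch of counting the ASCII bytes in a 16-byte chunk with `movmsk` plus a population count. It assumes an x86_64 target; `count_ascii_sse2` is a hypothetical name. `count_ones` compiles to `popcnt` only when that target feature is enabled, and otherwise to a longer bit-manipulation sequence, which is the trade-off that issue is concerned with.

```rust
use std::arch::x86_64::{__m128i, _mm_loadu_si128, _mm_movemask_epi8};

/// Hypothetical sketch: number of ASCII bytes in the first 16 bytes of `input`.
/// Safety: the caller must ensure input.len() >= 16.
pub unsafe fn count_ascii_sse2(input: &[u8]) -> u32 {
    debug_assert!(input.len() >= 16);
    let v = _mm_loadu_si128(input.as_ptr() as *const __m128i);
    // Bit i is set iff input[i] >= 128; only the low 16 bits can be set.
    let non_ascii = _mm_movemask_epi8(v) as u32;
    // popcnt (when available) counts the non-ASCII bytes.
    16 - non_ascii.count_ones()
}
```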