Skip to content

Conversation

@Dr-Emann
Copy link
Member

Fixes #207

As the comments say, unfortunately, this leads to pretty suboptimal code unless the std library is compiled with the required target feature, and the default x86_64 target (except for macos) is not compiled with the required target feature.

See godbolt showing assembly for: default x86_64 (bad assembly), aarch64 (good assembly), and x86_64 macos (good assembly, which should also be the assembly for -Zbuild-std)

We could use a little unsafe and do something like:

#[cfg(target_feature = "ssse3")]
{
    use core::arch::x86_64::{__m128i, _mm_shuffle_epi8};
    let val_m128 = __m128i::from(val_convert);
    let swizzle_m128 = __m128i::from(swizzle_idxs);
    let swizzled_m128 = unsafe { _mm_shuffle_epi8(val_m128, swizzle_m128) };
    return u16x8::from(swizzled_m128);
}

to allow optimal code generation with just a new enough target in the caller without requiring recompiling std.

@Dr-Emann Dr-Emann changed the title use a dynamic swizzle for swizzle_to_front use a dynamic bytes swizzle for swizzle_to_front Jul 10, 2025
@Kerollmops
Copy link
Member

Hey @Dr-Emann 👋

Thank you for your work. Very good explanation of the problem, as always.

I am wondering if the unsafe code you are showing is not the way to go? If the code is well documented we should go this way even if it's unsafe. What do you think? 🤔

@Dr-Emann Dr-Emann force-pushed the push-rzxqkrtntnpz branch 3 times, most recently from 31de4ec to 835bb11 Compare July 10, 2025 15:02
@Dr-Emann Dr-Emann force-pushed the push-rzxqkrtntnpz branch from 835bb11 to 186f3fd Compare July 10, 2025 15:03
@Dr-Emann
Copy link
Member Author

Sure. I added a version which also uses runtime detection if the std feature is enabled as well, so even if the user doesn't target a cpu with ssse3 directly, they still won't have to go through the non-simd version unless they run on an ancient processor.

Copy link
Member

@Kerollmops Kerollmops left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Amazing work! Thank you for this 👍

@Kerollmops Kerollmops added this pull request to the merge queue Jul 16, 2025
Merged via the queue into RoaringBitmap:main with commit 2688ab9 Jul 16, 2025
15 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Remove static jump table from swizzle_to_front

2 participants