Deny all warnings and fix errors #135

Conversation
*self
}

#[cfg(all(any(target_arch = "x86", target_arch = "x86_64"),
          target_feature = "sse"))]
---
Hm, so this is mostly just a random comment about these tests, but shouldn't this be using cfg_feature_enabled!?
---
This function is called in a loop, so no run-time checks are intended here.
One could try to write it as:
if cfg_feature_enabled!("sse") {
    use self::stdsimd::vendor;
    use simd::f32x4;
    let t = self.as_f32x2();
    let u = unsafe {
        vendor::_mm_rsqrt_ps(
            f32x4::new(t.extract(0), t.extract(1), 0., 0.)).as_f64x4()
    };
    f64x2::new(u.extract(0), u.extract(1))
} else if cfg_feature_enabled!("neon") {
    use self::stdsimd::vendor;
    unsafe { vendor::vrsqrte_f32(self.as_f32x2()).as_f64x2() }
} else {
    self.replace(0, 1. / self.extract(0).sqrt());
    self.replace(1, 1. / self.extract(1).sqrt());
    *self
}
but one of the first two branches will always fail to compile, because the contents of vendor differ between architectures.
To make the code portable across architectures one needs to use conditional compilation.
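As a compilable sketch of that cfg-based dispatch, using today's std::arch intrinsics as a stand-in for stdsimd's vendor module (the function name rsqrt2 is invented for this example):

```rust
// Sketch only: std::arch stands in for stdsimd's vendor module, and
// `rsqrt2` is a made-up name. Exactly one of the two cfg'd blocks
// survives compilation, so the function body stays portable.
fn rsqrt2(x: [f64; 2]) -> [f64; 2] {
    #[cfg(all(target_arch = "x86_64", target_feature = "sse"))]
    {
        use std::arch::x86_64::{_mm_rsqrt_ps, _mm_set_ps};
        // Approximate reciprocal square root on f32 lanes
        // (relative error up to ~3.7e-4).
        unsafe {
            let v = _mm_set_ps(0.0, 0.0, x[1] as f32, x[0] as f32);
            let lanes: [f32; 4] = std::mem::transmute(_mm_rsqrt_ps(v));
            [lanes[0] as f64, lanes[1] as f64]
        }
    }
    #[cfg(not(all(target_arch = "x86_64", target_feature = "sse")))]
    {
        // Scalar fallback: exact, but one lane at a time.
        [1.0 / x[0].sqrt(), 1.0 / x[1].sqrt()]
    }
}

fn main() {
    let r = rsqrt2([4.0, 16.0]);
    assert!((r[0] - 0.5).abs() < 1e-3);
    assert!((r[1] - 0.25).abs() < 1e-3);
    println!("ok");
}
```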
---
Right yeah it won't work for ARM, but presumably it'd work for x86?
---
Note that frsqrt is used in the tightest loop, so while we could do that, it would be equivalent to:
for n in 0..no_iters() {
    for i in 0..no_bodies() {
        for j in i + 1..no_bodies() {
            if cfg_feature_enabled!("sse") { // atomic load/store
                // intrinsic
            } else {
                // fallback
            }
        }
    }
}
I don't think that's how cfg_feature_enabled! should be used. If every "portable" intrinsic were implemented like that, the inner-most loop would be filled with those if conditions, each introducing atomic loads/stores.
IMO cfg_feature_enabled! should be used to switch algorithms at the outer-most point of use, not at every intrinsic. This is currently hard in Rust.
Depending on the result of cfg_feature_enabled! one typically wants to use a different vector type. Once we get stdsimd updated on crates.io I'd like to experiment with building a higher-level library, but I think that until we get for<T> Fn(T) -> T-style where bounds, this won't really be possible.
Ideally, I'd like to write:
slice.par_iter_mut().zip(other_slice.par_iter()) // rayon outer loop
.for_each_simd(|(a, b)| { // simd inner loop
// algorithm on abstract vector types
});
(and/or a .simd::<x4>() iterator that maps from T to x4 vector types)
And have the for_each_simd function monomorphize the lambda for different vector types (e.g. x2, x4, x8), chunking the iterator appropriately, handling the remainder, and selecting the best at run-time (note that here cfg_feature_enabled! would be used only once, outside the inner loop).
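A rough sketch of what that monomorphization could look like, with plain arrays standing in for real vector types (all names here, i.e. SimdVec, X1, X4, square_chunks, are invented for illustration):

```rust
// Invented sketch: a driver monomorphized per vector width, with the
// width chosen once, outside the hot loop. Arrays stand in for real
// SIMD types; nothing here is from stdsimd.

trait SimdVec: Copy {
    const LANES: usize;
    fn load(src: &[f32]) -> Self;
    fn store(&self, dst: &mut [f32]);
    fn mul(self, other: Self) -> Self;
}

#[derive(Copy, Clone)]
struct X1(f32); // scalar fallback

impl SimdVec for X1 {
    const LANES: usize = 1;
    fn load(src: &[f32]) -> Self { X1(src[0]) }
    fn store(&self, dst: &mut [f32]) { dst[0] = self.0; }
    fn mul(self, o: Self) -> Self { X1(self.0 * o.0) }
}

#[derive(Copy, Clone)]
struct X4([f32; 4]); // four lanes

impl SimdVec for X4 {
    const LANES: usize = 4;
    fn load(src: &[f32]) -> Self {
        X4([src[0], src[1], src[2], src[3]])
    }
    fn store(&self, dst: &mut [f32]) {
        dst[..4].copy_from_slice(&self.0);
    }
    fn mul(self, o: Self) -> Self {
        let mut r = self.0;
        for i in 0..4 { r[i] *= o.0[i]; }
        X4(r)
    }
}

// Squares every element; chunks by V::LANES and handles the remainder
// with scalar code. Monomorphized once per vector type.
fn square_chunks<V: SimdVec>(data: &mut [f32]) {
    let mut i = 0;
    while i + V::LANES <= data.len() {
        let v = V::load(&data[i..]);
        v.mul(v).store(&mut data[i..]);
        i += V::LANES;
    }
    while i < data.len() {
        data[i] *= data[i];
        i += 1;
    }
}

fn main() {
    let mut a: Vec<f32> = (0..7).map(|x| x as f32).collect();
    // The width would be picked once, e.g. via cfg_feature_enabled!;
    // here a plain boolean stands in for that one-time check.
    let wide = true;
    if wide {
        square_chunks::<X4>(&mut a);
    } else {
        square_chunks::<X1>(&mut a);
    }
    assert_eq!(a, vec![0.0, 1.0, 4.0, 9.0, 16.0, 25.0, 36.0]);
    println!("ok");
}
```

The feature check happens once, before the driver is called; the inner loop contains only the monomorphized vector operations.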
---
I guess what I'm getting at is that this example isn't what I figured idiomatic SIMD-in-Rust would look like, as on x86 at least it would use cfg_feature_enabled!. I don't really mind how it looks, but I figured we'd want it to be used somehow?
---
I agree 100%; this is why I want to start building an idiomatic SIMD wrapper over stdsimd already, so that we can see what will and won't be possible and correct stdsimd appropriately.
---
@alexcrichton I mean, this code isn't that bad:
pub trait Frsqrt {
    fn frsqrt(&self) -> Self;
}

impl Frsqrt for f64x2 {
    fn frsqrt(&self) -> Self {
        #[cfg(all(any(target_arch = "x86", target_arch = "x86_64"),
                  target_feature = "sse"))]
        {
            use self::stdsimd::vendor;
            use simd::f32x4;
            let t = self.as_f32x2();
            let u = unsafe {
                vendor::_mm_rsqrt_ps(
                    f32x4::new(t.extract(0), t.extract(1), 0., 0.)).as_f64x4()
            };
            f64x2::new(u.extract(0), u.extract(1))
        }
        #[cfg(all(any(target_arch = "arm", target_arch = "aarch64"),
                  target_feature = "neon"))]
        {
            use self::stdsimd::vendor;
            unsafe { vendor::vrsqrte_f32(self.as_f32x2()).as_f64x2() }
        }
        #[cfg(not(any(all(any(target_arch = "x86", target_arch = "x86_64"),
                          target_feature = "sse"),
                      all(any(target_arch = "arm", target_arch = "aarch64"),
                          target_feature = "neon"))))]
        {
            self.replace(0, 1. / self.extract(0).sqrt());
            self.replace(1, 1. / self.extract(1).sqrt());
            *self
        }
    }
}
When a user calls f64x2::frsqrt(...) they get the fastest implementation enabled at compile time.
The piece that is missing is for run-time feature detection to propagate features top-down, so that:
if cfg_feature_enabled_2000X!("avx") {
    f64x2::frsqrt(...) // AVX version called
} else {
    f64x2::frsqrt(...) // SSE version called
}
That way one could, in the most absurd case, do:
fn main() {
    if cfg_feature_enabled_2000X!("avx") {
        run()
    } else {
        run()
    }
}
---
But IMO it is important to be able to write "generic" code that is a zero-cost abstraction, and the current function, as provided, is one.
It just isn't able to choose the best implementation at run-time in a zero-cost way: neither for users who want run-time selection (if every intrinsic does the run-time check, feature detection is not zero-cost), nor for users who do not want it (they don't want the run-time check at all).
---
So IMO, if we want to be best in class at this, we need to figure out a way to do run-time feature detection once at binary initialization and essentially choose a binary optimized for that feature set, ideally with minimal code-size explosion.
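One way to approximate that today is to run the detection once and cache the choice in a function pointer. A sketch, where detect_fast and both rsqrt variants are invented stand-ins:

```rust
// Invented sketch: perform the feature check on the first call, cache
// the chosen implementation in an atomic, and dispatch through a
// function pointer afterwards. `detect_fast` and both rsqrt variants
// are stand-ins, not real stdsimd APIs.
use std::sync::atomic::{AtomicUsize, Ordering};

fn rsqrt_scalar(x: f64) -> f64 { 1.0 / x.sqrt() }

// Would be the SIMD-accelerated version on a real target.
fn rsqrt_fast(x: f64) -> f64 { 1.0 / x.sqrt() }

// Stand-in for real run-time feature detection (e.g. CPUID).
fn detect_fast() -> bool { cfg!(target_feature = "sse") }

static RSQRT_IMPL: AtomicUsize = AtomicUsize::new(0);

fn rsqrt(x: f64) -> f64 {
    let mut f = RSQRT_IMPL.load(Ordering::Relaxed);
    if f == 0 {
        // First call: detect once and cache the winner.
        let chosen: fn(f64) -> f64 =
            if detect_fast() { rsqrt_fast } else { rsqrt_scalar };
        f = chosen as usize;
        RSQRT_IMPL.store(f, Ordering::Relaxed);
    }
    // Safety: `f` was produced from a `fn(f64) -> f64` above.
    let fptr: fn(f64) -> f64 = unsafe { std::mem::transmute(f) };
    fptr(x)
}

fn main() {
    assert_eq!(rsqrt(4.0), 0.5);
    assert_eq!(rsqrt(16.0), 0.25);
    println!("ok");
}
```

The check is amortized to a single atomic load per call, rather than a branch on every intrinsic in the inner loop.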