Skip to content

Deny all warnings and fix errors #135

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Oct 22, 2017

Conversation

gnzlbg
Copy link
Contributor

@gnzlbg gnzlbg commented Oct 22, 2017

No description provided.

*self
}
#[cfg(all(any(target_arch = "x86", target_arch = "x86_64"),
target_feature = "sse"))]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm so mostly just a random comment about this tests, but shouldn't this be using cfg_feature_enabled!?

Copy link
Contributor Author

@gnzlbg gnzlbg Oct 22, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function is called in a loop, so here no run-time checks are intended.

One could try to write it as:

if cfg_feature_enabled!("sse") {
    use self::stdsimd::vendor;
    use simd::f32x4;
    let t = self.as_f32x2();
    let u = unsafe {
        vendor::_mm_rsqrt_ps(
            f32x4::new(t.extract(0), t.extract(1), 0., 0.)).as_f64x4()
    };
    f64x2::new(u.extract(0), u.extract(1))
} else if cfg_feature_enabled!("neon") {
    use self::stdsimd::vendor;
    unsafe { vendor::vrsqrte_f32(self.as_f32x2()).as_f64x2() }
} else {
    self.replace(0, 1. / self.extract(0).sqrt());
    self.replace(1, 1. / self.extract(1).sqrt());
    *self
}

but one of the first two branches will always fail to compile because the content of vendor differs (those are different architectures).

To make code portable across architectures one needs to use conditional compilation.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right yeah it won't work for ARM, but presumably it'd work for x86?

Copy link
Contributor Author

@gnzlbg gnzlbg Oct 23, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that frsqrt is used in the tightest loop, so while we could do that, that would be equivalent to:

for n in 0..no_iters() { for i in 0..no_bodies() { for j in i+1..no_bodies() {
    if cfg_feature_enabled!("sse") {  // atomic load-store
       // intrinsic
    } else {
       // fallback
    }
}}}

I don't think that's how cfg_feature_enabled! should be used. If every "portable" intrinsic would be implemented like that, the inner-most loop would be filled with those if... conditions introducing atomic load/stores each.

IMO cfg_feature_enabled! should be used to switch algorithms at the out-most point of use and not on every intrinsic. This is currently hard in Rust.

Depending on the result of cfg_feature_enabled! one typically wants to use a different vector type. Once we get stdsimd updated on crates.io I'd like to experiment building a higher-level library, but I think that until we get for<T> Fn(T) -> T-kind of where bounds, this won't be really possible.

Ideally, I'd like to write:

slice.par_iter_mut().zip(other_slice.par_iter())  // rayon outer loop
     .for_each_simd(|(a, b)| {  // simd inner loop
       // algorithm on abstract vector types
});

(and/or a .simd::<x4>(). iterator that maps from T to x4 vector types)

And have the for_each_simd function monomorphize the lambda for different vector types (e.g. x2, x4, x8), chunking the iterator appropriately, handling remainder, and selecting the best at run-time (but note that here cfg_feature_enabled! would be used only once and out of the inner loop.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess what I'm getting at is that this example isn't what I figured we wanted idiomatic SIMD-in-Rust to look like as on x86 at least it'd use cfg_feature_enabled!. I don't really mind how it looks but I figured we'd want it to be used somehow?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree 100%, this is why I want to start building an idiomatic simd wrapper over stdsimd already. So that we can see what will be possible and what not, and correct stdsimd appropriately.

Copy link
Contributor Author

@gnzlbg gnzlbg Oct 25, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@alexcrichton I mean, this code isn't that bad:

pub trait Frsqrt {
    fn frsqrt(&self) -> Self;
}

impl Frsqrt for f64x2 {
    fn frsqrt(&self) -> Self {
        #[cfg(all(any(target_arch = "x86", target_arch = "x86_64"),
                  target_feature = "sse"))]
        {
            use self::stdsimd::vendor;
            use simd::f32x4;

            let t = self.as_f32x2();

            let u = unsafe {
                vendor::_mm_rsqrt_ps(
                    f32x4::new(t.extract(0), t.extract(1), 0., 0.)).as_f64x4()
            };
            f64x2::new(u.extract(0), u.extract(1))
        }
        #[cfg(all(any(target_arch = "arm", target_arch = "aarch64"),
                  target_feature = "neon"))]
        {
            use self::stdsimd::vendor;
            unsafe { vendor::vrsqrte_f32(self.as_f32x2()).as_f64x2() }
        }
        #[cfg(not(any(all(any(target_arch = "x86", target_arch = "x86_64"),
                          target_feature = "sse"),
                      all(any(target_arch = "arm", target_arch = "aarch64"),
                          target_feature = "neon")
        )))]
        {
            self.replace(0, 1. / self.extract(0).sqrt());
            self.replace(1, 1. / self.extract(1).sqrt());
            *self
        }
    }
}

when a user does f64x2::frsqrt(...) they get the fastest feature enabled at compile-time.

The piece that is missing is for run-time feature detection to propagate features top-down, so that:

if cfg_feature_enabled_2000X!("avx") {
    f64x2::frsqrt(...) // AVX version called
}  else {
    f64x2::frsqrt(...) // SSE version called
}

That way one could, in the most absurd case, do:

fn main() {
    if cfg_feature_enabled_2000X!("avx") {
        run()
     } else {
        run()
     }
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But IMO it is important to be able to write "generic" code that is a zero-cost abstraction, and the current function, as provided, is a zero-cost abstraction.

It just isn't able to choose the best implementation at run-time in a zero-cost way. Neither for users that want this (because if every intrinsic does the run-time check, then feature detection is not zero cost) nor for user that do not want this (because they don't want the run-time check at all).

Copy link
Contributor Author

@gnzlbg gnzlbg Oct 25, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So IMO if we want to be "best class" at this, we need to figure out a way on binary initialization to do run-time feature detection once, and basically choose a binary optimized for that feature set, and ideally, with minimal code-size explosion.

@alexcrichton alexcrichton merged commit 6fdfce5 into rust-lang:master Oct 22, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants