Closed as duplicate of #124216
Labels
A-SIMD, C-optimization, I-heavy, I-slow, O-x86_64, T-libs, regression-from-stable-to-stable
Description
use std::arch::x86_64::*;

pub fn mul_and_shift(a: __m128i, b: __m128i) -> __m128i {
    unsafe { _mm_srli_epi16(_mm_mulhi_epu16(a, b), 1) }
}
This should compile to the following, which is also what Clang generates:
mul_and_shift:
        pmulhuw xmm0, xmm1
        psrlw   xmm0, 1
        ret
But due to rust-lang/stdarch#1477, these intrinsics are mapped to portable SIMD operations, which are then compiled to this mess:
mul_and_shift:
        pmulhuw   xmm0, xmm1
        punpcklwd xmm1, xmm0
        punpckhwd xmm0, xmm0
        psrld     xmm0, 17
        psrld     xmm1, 17
        packssdw  xmm1, xmm0
        movdqa    xmm0, xmm1
        ret
This is a codegen regression in 1.75.0. IMO the corresponding parts of the stdarch PR should simply be reverted: I (and, I think, most people these days) use specialized non-portable intrinsics precisely when LLVM can't optimize generic code correctly, and the PR explicitly breaks this use case. But I'd like to track this and hear other people's opinions.
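To make the per-lane arithmetic concrete, here is a minimal scalar sketch of what `mul_and_shift` computes for a single 16-bit lane. The function names are hypothetical and not taken from stdarch. The "fused" variant is only one plausible reading of the `psrld ..., 17` instructions in the longer sequence (a 16-bit shift for the high half folded together with the extra shift by 1 on a widened 32-bit value); the issue itself does not confirm that this is how the optimizer gets there.

```rust
/// Scalar model of one 16-bit lane of `_mm_mulhi_epu16`: the high half of
/// the unsigned 16x16 -> 32-bit product.
pub fn mulhi_u16(a: u16, b: u16) -> u16 {
    // The product of two u16 values always fits in a u32.
    ((a as u32 * b as u32) >> 16) as u16
}

/// Scalar model of one lane of `mul_and_shift`: multiply-high followed by
/// a logical right shift by 1, as `_mm_srli_epi16(.., 1)` does per lane.
pub fn mul_and_shift_lane(a: u16, b: u16) -> u16 {
    mulhi_u16(a, b) >> 1
}

/// The same value written as a single shift of the widened product.
/// ASSUMPTION: this is only a guess at the form behind `psrld ..., 17`.
pub fn mul_and_shift_lane_fused(a: u16, b: u16) -> u16 {
    ((a as u32 * b as u32) >> 17) as u16
}
```

Both forms agree for every input, since shifting right by 16 and then by 1 discards the same bits as shifting by 17 once. The two-instruction `pmulhuw` + `psrlw` sequence corresponds to `mul_and_shift_lane` applied to all eight lanes.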
@rustbot label +C-optimization +I-heavy +I-slow +A-SIMD +O-x86_64 +T-libs +regression-from-stable-to-stable