https://godbolt.org/z/v7fq5q7q8
The following code:
pub fn example(x: &mut [f32; 1024]) {
    for x in x.iter_mut() {
        *x *= 4.0;
    }
}
when compiled with codegen_gcc, gives the following assembly:
example::example:
        test    rdi, rdi
        je      .L6
        movss   xmm1, DWORD PTR .LC0[rip]
        lea     rax, 4092[rdi]
        jmp     .L9
.L19:
        add     rdi, 4
        je      .L6
.L9:
        movss   xmm0, DWORD PTR [rdi]
        mulss   xmm0, xmm1
        movss   DWORD PTR [rdi], xmm0
        cmp     rdi, rax
        jne     .L19
.L6:
        ret
.LC0:
        .long   1082130432
Compared to the LLVM codegen, it seems (at least to me, but I'm definitely not an asm expert) that:
- the LLVM version is vectorised, the GCC version is not
- while the LLVM version has 1 label and a conditional jump (which is not surprising for a simple loop), the GCC version seems unnecessarily convoluted, with 3 labels and 4 jump instructions
- the GCC version starts with something that looks like a null check, which is strange given that &mut cannot be null
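
To illustrate the last point, here is a minimal sketch of why the null check looks redundant to me. Because `&mut T` is guaranteed to be non-null, `Option<&mut T>` can reuse the null address for `None` and stays pointer-sized. The helper name `option_ref_is_pointer_sized` is made up for this illustration and is not part of the reproduction above:

```rust
use std::mem::size_of;

// Same function as in the reproduction above.
pub fn example(x: &mut [f32; 1024]) {
    for x in x.iter_mut() {
        *x *= 4.0;
    }
}

// Hypothetical helper, for illustration only: returns true when
// `Option<&mut [f32; 1024]>` is exactly as large as a raw pointer.
// That can only hold if the null address is free to encode `None`,
// i.e. the reference itself can never be null.
pub fn option_ref_is_pointer_sized() -> bool {
    size_of::<Option<&mut [f32; 1024]>>() == size_of::<*mut [f32; 1024]>()
}
```

So in principle the backend could assume the argument of `example` is non-null and drop the initial `test rdi, rdi` / `je .L6` pair. Whether codegen_gcc currently passes that information on to GCC is something I have not checked.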
I'm not sure if this is really a problem, but I figured it was worth asking :)
Anyway, thank you for working on this project, it is a really exciting development :)