[RISC-V] expandload should compile to viota+vrgather #101914

Closed
Validark opened this issue Aug 5, 2024 · 1 comment · Fixed by #101954
@Validark

Validark commented Aug 5, 2024

```zig
export fn expandload16(a: *const [16]u8, b: u16, c: @Vector(16, u8)) @Vector(16, u8) {
    return struct {
        extern fn @"llvm.masked.expandload.v16i8"(@TypeOf(a), @Vector(16, u1), @TypeOf(c)) callconv(.Unspecified) @Vector(16, u8);
    }.@"llvm.masked.expandload.v16i8"(a, @as(@Vector(16, u1), @bitCast(b)), c);
}
```
```llvm
define dso_local <16 x i8> @expandload16(ptr nocapture nonnull readonly align 1 %0, i16 zeroext %1, <16 x i8> %2) local_unnamed_addr {
Entry:
  %3 = bitcast i16 %1 to <16 x i1>
  %4 = tail call fastcc <16 x i8> @llvm.masked.expandload.v16i8(ptr nonnull readonly align 1 %0, <16 x i1> %3, <16 x i8> %2)
  ret <16 x i8> %4
}

declare void @llvm.dbg.value(metadata, metadata, metadata) #1

declare fastcc <16 x i8> @llvm.masked.expandload.v16i8(ptr nocapture, <16 x i1>, <16 x i8>) #2
```

When compiled for the SiFive X280, the generated code tests the mask bit by bit and branches on each one:

```asm
...
.LBB0_2:
        andi    a2, a1, 4
        bnez    a2, .LBB0_20
.LBB0_3:
        andi    a2, a1, 8
        bnez    a2, .LBB0_21
.LBB0_4:
        andi    a2, a1, 16
        bnez    a2, .LBB0_22
.LBB0_5:
        andi    a2, a1, 32
        bnez    a2, .LBB0_23
.LBB0_6:
        andi    a2, a1, 64
        bnez    a2, .LBB0_24
.LBB0_7:
        andi    a2, a1, 128
        bnez    a2, .LBB0_25
.LBB0_8:
        andi    a2, a1, 256
        bnez    a2, .LBB0_26
.LBB0_9:
        andi    a2, a1, 512
        bnez    a2, .LBB0_27
.LBB0_10:
        andi    a2, a1, 1024
        bnez    a2, .LBB0_28
...
```

This should instead compile to `viota`+`vrgather`, following the `vdecompress` synthesis described in the RISC-V V spec:

https://github.com/riscv/riscv-v-spec/blob/master/v-spec.adoc#1651-synthesizing-vdecompress
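To see why the spec's `viota.m` + masked `vrgather.vv` sequence synthesizes `vdecompress`, here is a small Python model of the two operations (illustrative only; the function names mirror the instructions being modeled, not any real API):

```python
def viota(mask):
    """viota.m: element i receives the count of set mask bits below position i."""
    out, count = [], 0
    for m in mask:
        out.append(count)
        count += m
    return out

def vdecompress(packed, mask, passthru):
    """Synthesized vdecompress: a masked vrgather.vv using viota indices.

    Active lanes pull consecutive elements from `packed`;
    inactive lanes keep their passthru value."""
    idx = viota(mask)
    return [packed[idx[i]] if mask[i] else passthru[i]
            for i in range(len(mask))]

mask     = [1, 0, 1, 1, 0, 0, 1, 0]
packed   = [10, 20, 30, 40]   # the densely packed source elements
passthru = [0] * 8
print(vdecompress(packed, mask, passthru))
# [10, 0, 20, 30, 0, 0, 40, 0]
```

Each active lane i reads `packed[viota[i]]`, which is exactly the next unconsumed packed element, so no scalar branching on individual mask bits is needed.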

@llvmbot (Member) commented Aug 5, 2024

@llvm/issue-subscribers-backend-risc-v

Author: Niles Salter (Validark)


@wangpc-pp wangpc-pp self-assigned this Aug 5, 2024
wangpc-pp added a commit to wangpc-pp/llvm-project that referenced this issue Aug 5, 2024
We can use `iota+vrgather` to synthesize `vdecompress` and lower
expanding load to `vcpop+load+vdecompress`.

Fixes llvm#101914
wangpc-pp added a commit to wangpc-pp/llvm-project that referenced this issue Aug 5, 2024
We can use `viota.m` + indexed load to synthesize expanding load:
```
%res = llvm.masked.expandload(%ptr, %mask, %passthru)
->
%index = viota %mask
if elt_size > 8:
  %index = vsll.vi %index, log2(elt_size), %mask
%res = vluxei<n> %passthru, %ptr, %index, %mask
```

And if `%mask` is all ones, we can lower expanding load to a normal
unmasked load.

Fixes llvm#101914
wangpc-pp added a commit to wangpc-pp/llvm-project that referenced this issue Aug 6, 2024
We can use `viota.m` + indexed load to synthesize expanding load:
```
%res = llvm.masked.expandload(%ptr, %mask, %passthru)
->
%index = viota %mask
if elt_size > 8:
  %index = vsll.vi %index, log2(elt_size), %mask
%res = vluxei<n> %passthru, %ptr, %index, %mask
```

And if `%mask` is all ones, we can lower expanding load to a normal
unmasked load.

Fixes llvm#101914
wangpc-pp added a commit to wangpc-pp/llvm-project that referenced this issue Oct 23, 2024
We can use `viota.m` + indexed load to synthesize expanding load:
```
%res = llvm.masked.expandload(%ptr, %mask, %passthru)
->
%index = viota %mask
if elt_size > 8:
  %index = vsll.vi %index, log2(elt_size), %mask
%res = vluxei<n> %passthru, %ptr, %index, %mask
```

And if `%mask` is all ones, we can lower expanding load to a normal
unmasked load.

Fixes llvm#101914
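The `viota.m` + indexed-load lowering sketched in the commit message above can be modeled in Python, treating memory as a flat list of byte elements (an illustrative sketch, not the actual codegen):

```python
def viota(mask):
    """viota.m: element i receives the count of set mask bits below position i."""
    out, count = [], 0
    for m in mask:
        out.append(count)
        count += m
    return out

def expandload_indexed(memory, base, mask, passthru):
    """Model of the lowering for byte elements (elt_size == 1):
      %index = viota %mask
      %res   = vluxei8 %passthru, %ptr, %index, %mask  (masked indexed load)
    For wider elements the indices are first scaled with
    vsll.vi %index, log2(elt_size)."""
    index = viota(mask)
    return [memory[base + index[i]] if mask[i] else passthru[i]
            for i in range(len(mask))]

memory = [10, 20, 30, 40, 99, 99, 99, 99]
mask = [1, 0, 1, 1]
print(expandload_indexed(memory, 0, mask, [0] * 4))
# [10, 0, 20, 30]
```

Each active lane loads from `base + (number of active lanes before it)`, so the active lanes collectively read the contiguous run of elements an expanding load consumes.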
wangpc-pp added a commit to wangpc-pp/llvm-project that referenced this issue Oct 31, 2024
We can use `viota`+`vrgather` to synthesize `vdecompress` and lower
expanding load to `vcpop`+`load`+`vdecompress`.

And if `%mask` is all ones, we can lower expanding load to a normal
unmasked load.

Fixes llvm#101914.
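The `vcpop`+`load`+`vdecompress` lowering described in the commit message above can be modeled end to end in Python: count the active lanes, load that many contiguous elements, then decompress them into their masked positions (an illustrative sketch, not the actual codegen):

```python
def expandload_vdecompress(memory, base, mask, passthru):
    """Model of: vcpop.m, then one contiguous load, then a
    viota + masked vrgather decompress."""
    n = sum(mask)                    # vcpop.m: number of active lanes
    packed = memory[base:base + n]   # a single contiguous vector load
    out, count = [], 0
    for i, m in enumerate(mask):     # viota + masked vrgather, fused in one loop
        out.append(packed[count] if m else passthru[i])
        count += m
    return out

memory = [10, 20, 30, 40, 99, 99, 99, 99]
mask = [1, 0, 1, 1, 0, 0, 1, 0]
print(expandload_vdecompress(memory, 0, mask, [0] * 8))
# [10, 0, 20, 30, 0, 0, 40, 0]
```

Compared with the indexed-load variant, this reads memory with one unit-stride load of exactly `vcpop(mask)` elements, which is typically cheaper than a gather on hardware where indexed loads are slow.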
smallp-o-p pushed a commit to smallp-o-p/llvm-project that referenced this issue Nov 3, 2024
We can use `viota`+`vrgather` to synthesize `vdecompress` and lower
expanding load to `vcpop`+`load`+`vdecompress`.

And if `%mask` is all ones, we can lower expanding load to a normal
unmasked load.

Fixes llvm#101914.
NoumanAmir657 pushed a commit to NoumanAmir657/llvm-project that referenced this issue Nov 4, 2024
We can use `viota`+`vrgather` to synthesize `vdecompress` and lower
expanding load to `vcpop`+`load`+`vdecompress`.

And if `%mask` is all ones, we can lower expanding load to a normal
unmasked load.

Fixes llvm#101914.