igzip/riscv64: Optimize isal_adler32_rvv with 4x loop unrolling and tail agnostic(ta) #373

leiwen2025 · 2025-11-10T07:35:17Z

This PR optimizes the adler32_rvv implementation by introducing 4x loop unrolling and the tail-agnostic (ta) policy.

The optimization has been verified on the SG2044 platform, where adler32_warm throughput improves by about 30%:

SG2044:
        new: adler32_warm: runtime =    3062502 usecs, bandwidth 11996 MB in 3.0625 sec = 3917.29 MB/s
        old: adler32_warm: runtime =    3062465 usecs, bandwidth 9233 MB in 3.0625 sec = 3015.15 MB/s
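To make the arithmetic behind the unrolled kernel easier to review, here is a scalar C model. It is a sketch, not the RVV code in the patch, and all function names are hypothetical. It relies on the Adler-32 identity B = B0 + n*A0 + sum((n - i) * d[i]) and A = A0 + sum(d[i]): a vector of vl bytes that starts with `rem` bytes left in the buffer adds sum(d[j]) to A and sum((rem - j) * d[j]) to B's weighted part. That appears to be what the `vrsub.vx v12, v12, a2` / `vwmaccu.vv` pair in the diff computes per lane. The 4x body assumes the loop bound t0 equals 4*vl.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ADLER_MOD 65521u

/* Byte-at-a-time reference Adler-32 (RFC 1950); adler = (B << 16) | A. */
static uint32_t adler32_ref(uint32_t adler, const uint8_t *buf, size_t len)
{
    uint32_t a = adler & 0xffff, b = adler >> 16;
    for (size_t i = 0; i < len; i++) {
        a = (a + buf[i]) % ADLER_MOD;
        b = (b + a) % ADLER_MOD;
    }
    return (b << 16) | a;
}

/* One "vector" of vl bytes with `rem` bytes left in the buffer (a2 in the asm):
   A gains sum(d[j]); B's weighted part gains sum((rem - j) * d[j]). */
static void model_vector(const uint8_t *p, size_t vl, uint64_t rem,
                         uint64_t *sum, uint64_t *wsum)
{
    for (size_t j = 0; j < vl; j++) {
        *sum += p[j];
        *wsum = (*wsum + (rem - j) * p[j]) % ADLER_MOD;
    }
}

/* Hypothetical scalar model of a 4x-unrolled kernel plus a single-vector tail. */
static uint32_t adler32_unrolled_model(uint32_t adler, const uint8_t *buf,
                                       size_t len, size_t vl)
{
    uint64_t a0 = adler & 0xffff, b0 = adler >> 16;
    uint64_t n = len, sum = 0, wsum = 0;

    /* Unrolled body: four vectors per iteration while >= 4*vl bytes remain
       (assumed to mirror "bge a2, t0, unroll_loop_4x" with t0 == 4*vl). */
    while (len >= 4 * vl) {
        for (size_t k = 0; k < 4; k++)
            model_vector(buf + k * vl, vl, len - k * vl, &sum, &wsum);
        buf += 4 * vl;
        len -= 4 * vl; /* remaining length updated after its last use */
    }
    /* Tail: one vector per iteration, min(len, vl) bytes each. */
    while (len > 0) {
        size_t cur = len < vl ? len : vl;
        model_vector(buf, cur, len, &sum, &wsum);
        buf += cur;
        len -= cur;
    }
    /* A = A0 + sum(d[i]);  B = B0 + n*A0 + sum((n - i) * d[i]). */
    uint64_t a = (a0 + sum) % ADLER_MOD;
    uint64_t b = (b0 + (n % ADLER_MOD) * a0 + wsum) % ADLER_MOD;
    return (uint32_t)((b << 16) | a);
}

/* Cross-check the model against the reference for several lengths and vl. */
static void check_model(void)
{
    uint8_t buf[1000];
    for (size_t i = 0; i < sizeof buf; i++)
        buf[i] = (uint8_t)(i * 131 + 7);
    for (size_t vl = 1; vl <= 64; vl *= 2)
        for (size_t len = 0; len <= sizeof buf; len += 37)
            assert(adler32_unrolled_model(1, buf, len, vl) ==
                   adler32_ref(1, buf, len));
}
```

Because each weight depends only on the remaining length, the four sub-vectors in one unrolled iteration can all derive their weights from the same a2 value (offset by k*vl), and a2 only needs to be decremented once per iteration.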

sunyuechi · 2025-11-19T05:15:28Z

It looks like among the 3 commits, one is empty and the other two are the same. Can you merge them into one?

leiwen2025 · 2025-11-19T06:56:43Z

> It looks like among the 3 commits, one is empty and the other two are the same. Can you merge them into one?

Done. I have merged them as requested.

sunyuechi · 2025-11-20T06:00:01Z

igzip/riscv64/igzip_isal_adler32_rvv.S

+  add           a1, a1, t4
+  sub           a2, a2, t4
+
+  vsetvli       zero, t1, e32, m4, tu, ma


Maybe it's better to use zero, zero here (vsetvli zero, zero, e32, m4, ...) to keep processing the same number of elements as before? The same applies to some of the other vsetvli instructions.

I will make the modifications as suggested.
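For reference, vsetvli with rd = rs1 = x0 keeps the current vl and changes only vtype, which is how zero, zero expresses "the same number of elements as before". The RVV 1.0 specification reserves this form when the new SEW/LMUL ratio would change VLMAX. The small C check below sketches that condition; the VLEN values are illustrative, and it assumes (not confirmed by the diff) that the preceding byte loads run at e8, m1.

```c
#include <assert.h>
#include <stdbool.h>

/* VLMAX = LMUL * VLEN / SEW (RVV 1.0). LMUL is lmul_num / lmul_den so that
   fractional settings such as mf2 (1/2) can be expressed as well. */
static unsigned vlmax(unsigned vlen, unsigned sew,
                      unsigned lmul_num, unsigned lmul_den)
{
    assert(sew != 0 && lmul_den != 0);
    return vlen * lmul_num / (sew * lmul_den);
}

/* "vsetvli zero, zero, <new vtype>" keeps the current vl and changes only
   vtype. It is only well-defined when VLMAX does not change, i.e. the
   SEW/LMUL ratio stays the same; otherwise this use is reserved. */
static bool keep_vl_ok(unsigned vlen,
                       unsigned old_sew, unsigned old_num, unsigned old_den,
                       unsigned new_sew, unsigned new_num, unsigned new_den)
{
    return vlmax(vlen, old_sew, old_num, old_den) ==
           vlmax(vlen, new_sew, new_num, new_den);
}
```

With VLEN = 128, both e8/m1 and e32/m4 give VLMAX = 16, so keep_vl_ok(128, 8, 1, 1, 32, 4, 1) holds. Switching to e32/m2 instead would halve VLMAX, so that transition could not use the zero, zero form.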

sunyuechi · 2025-11-20T06:03:46Z

igzip/riscv64/igzip_isal_adler32_rvv.S

+  vle8.v        v2, (a4)
+  add           a5, a4, t1
+  vle8.v        v3, (a5)
+  mv            t5, a2


As in the version before unrolling, this can be moved outside the loop, as long as vrsub.vx is changed to use a2 and the update of a2 is moved a bit later.

Thanks for the review. I will modify this part.

sunyuechi · 2025-11-20T06:05:37Z

igzip/riscv64/igzip_isal_adler32_rvv.S

+  add           a4, a3, t1
+  vle8.v        v2, (a4)
+  add           a5, a4, t1
+  vle8.v        v3, (a5)


Here we can use the same register

I will make the modifications as suggested.

sunyuechi · 2025-11-20T06:07:11Z

igzip/riscv64/igzip_isal_adler32_rvv.S

+  vmv.x.s       a4, v0                               // B = a4
  vmv.x.s       t2, v24                              // A = t2
-  add           t3, t4, t3
+  add           t3, a4, t3


No need to rename the register here; t4 can stay.

I will make the modifications as suggested.

pablodelara · 2025-11-24T15:23:39Z

How is it looking now, @sunyuechi ?

sunyuechi · 2025-12-03T14:54:18Z

igzip/riscv64/igzip_isal_adler32_rvv.S

+  vid.v         v12                                 // 0, 1, 2, .. vl-1
+  vadd.vv       v8, v8, v4
+  vrsub.vx      v12, v12, a2                        // len, len-1, len-2
+  vwmaccu.vv    v16, v12, v4                        // v16: B += weight * next


These three comment lines have extra spaces added. Please remove them to restore the original alignment.

sunyuechi · 2025-12-03T14:54:28Z

igzip/riscv64/igzip_isal_adler32_rvv.S

+  sub           a2, a2, t1
+  bnez          a2, single

+3:


The local labels are numbered 1:, 3:, 4:, with 2: missing. It might be better to use 1:, 2:, 3: instead.

sunyuechi · 2025-12-03T14:54:47Z

igzip/riscv64/igzip_isal_adler32_rvv.S

+  vadd.vv       v8, v8, v28
+  vwmaccu.vv    v16, v12, v28
+  sub           a2, a2, a4
+  bge           a2, t0, unroll_loop_4x


Please move the sub instruction earlier to avoid the dependency with bge.

sunyuechi · 2025-12-03T14:55:39Z

igzip/riscv64/igzip_isal_adler32_rvv.S

+  vwmaccu.vv    v16, v12, v4                        // v16: B += weight * next
  add           a1, a1, t1
-  bnez          a2, 1b
+  sub           a2, a2, t1


Please move the sub instruction earlier to avoid the dependency with bge.

sunyuechi · 2025-12-03T14:56:26Z

After addressing the above minor issues, please squash the commit messages into one and it should be ready to merge.

leiwen2025 · 2025-12-04T01:05:11Z

> After addressing the above minor issues, please squash the commit messages into one and it should be ready to merge.

Thanks for the review. I'll address the issues and merge the commits.

sunyuechi · 2025-12-04T09:03:05Z

LGTM

leiwen2025 force-pushed the rv64-igzip-adler32rvv branch from 9e77959 to d571a01 on November 11, 2025 08:51

leiwen2025 force-pushed the rv64-igzip-adler32rvv branch from d571a01 to 3d3eee7 on November 19, 2025 06:50

sunyuechi reviewed Nov 20, 2025

leiwen2025 force-pushed the rv64-igzip-adler32rvv branch from f6b8c54 to 3d3eee7 on November 20, 2025 06:40

leiwen2025 requested a review from sunyuechi November 21, 2025 08:20

sunyuechi reviewed Dec 3, 2025

leiwen2025 force-pushed the rv64-igzip-adler32rvv branch from fbcf370 to 8b9e5e6 on December 4, 2025 08:07

igzip/riscv64: Optimize isal_adler32_rvv with 4x loop unrolling and tail agnostic(ta) (7a11b91)

Signed-off-by: WenLei <[email protected]>

leiwen2025 force-pushed the rv64-igzip-adler32rvv branch from 8b9e5e6 to 7a11b91 on December 4, 2025 08:09
