Skip to content

Conversation

@leiwen2025
Copy link

This PR optimizes adler32_rvv implementation by introducing 4x loop unrolling and tail agnostic(ta) policy.

The optimization has been verified on the SG2044 platform:

SG2044:
        new: adler32_warm: runtime =    3062502 usecs, bandwidth 11996 MB in 3.0625 sec = 3917.29 MB/s
        old: adler32_warm: runtime =    3062465 usecs, bandwidth 9233 MB in 3.0625 sec = 3015.15 MB/s

@leiwen2025 leiwen2025 force-pushed the rv64-igzip-adler32rvv branch from 9e77959 to d571a01 Compare November 11, 2025 08:51
@sunyuechi
Copy link
Contributor

It looks like among the 3 commits, one is empty and the other two are the same. Can you merge them into one?

@leiwen2025 leiwen2025 force-pushed the rv64-igzip-adler32rvv branch from d571a01 to 3d3eee7 Compare November 19, 2025 06:50
@leiwen2025
Copy link
Author

It looks like among the 3 commits, one is empty and the other two are the same. Can you merge them into one?

Done. I have merged them as requested.

add a1, a1, t4
sub a2, a2, t4

vsetvli zero, t1, e32, m4, tu, ma
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe it's better to use zero, zero here to represent processing the same number of elements as before? (including some other vset)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will make the modifications as suggested.

vle8.v v2, (a4)
add a5, a4, t1
vle8.v v3, (a5)
mv t5, a2
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as before unrolling, this can be moved outside the loop, as long as vrsub.vx usage is changed to a2, and the modification of a2 is moved to a bit later

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the review. I will modify this part

add a4, a3, t1
vle8.v v2, (a4)
add a5, a4, t1
vle8.v v3, (a5)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here we can use the same register

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will make the modifications as suggested.

vmv.x.s a4, v0 // B = a4
vmv.x.s t2, v24 // A = t2
add t3, t4, t3
add t3, a4, t3
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need to change the name here

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will make the modifications as suggested.

@pablodelara
Copy link
Contributor

How is it looking now, @sunyuechi ?

vid.v v12 // 0, 1, 2, .. vl-1
vadd.vv v8, v8, v4
vrsub.vx v12, v12, a2 // len, len-1, len-2
vwmaccu.vv v16, v12, v4 // v16: B += weight * next
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These three comment lines have added extra spaces. Please remove them
to restore the original alignment.

sub a2, a2, t1
bnez a2, single

3:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The numbering is 1: 3: 4: but missing 2:. It might be better to use
1: 2: 3: instead.

vadd.vv v8, v8, v28
vwmaccu.vv v16, v12, v28
sub a2, a2, a4
bge a2, t0, unroll_loop_4x
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please move the sub instruction earlier to avoid the dependency
with bge.

vwmaccu.vv v16, v12, v4 // v16: B += weight * next
add a1, a1, t1
bnez a2, 1b
sub a2, a2, t1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please move the sub instruction earlier to avoid the dependency
with bge.

@sunyuechi
Copy link
Contributor

After addressing the above minor issues,
please squash the commit messages into one and it should be ready to merge.

@leiwen2025
Copy link
Author

After addressing the above minor issues, please squash the commit messages into one and it should be ready to merge.

Thanks for the review. I 'll address the issues and merge the commits.

@leiwen2025 leiwen2025 force-pushed the rv64-igzip-adler32rvv branch from fbcf370 to 8b9e5e6 Compare December 4, 2025 08:07
@leiwen2025 leiwen2025 force-pushed the rv64-igzip-adler32rvv branch from 8b9e5e6 to 7a11b91 Compare December 4, 2025 08:09
@sunyuechi
Copy link
Contributor

LGTM

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants