[RISCV][EVL] Improve sdiv/udiv code generation for tail folding by EVL. #129538
Comments
@llvm/issue-subscribers-backend-risc-v Author: Mel Chen (Mel-Chen)
After #127180, the vectorizer emits vp.merge + general sdiv/udiv instead of vp.sdiv/udiv for tail folding by EVL.
However, using vp.udiv/sdiv may yield better performance. The improvement could come from fewer vsetvli instructions and lower vector register pressure.
The current IR and assembly for sdiv: https://godbolt.org/z/YvPhGa8df
Not yet sure at which stage this optimization should be applied. We need more discussion.
I think emitting a VP intrinsic here (also possibly for masked folding?) makes sense.
Just to check, my understanding was that before #127180 we still emitted a vp.merge (previously a select on LLVM 19: https://godbolt.org/z/bqGzxbWr5). We could either try and fold it away to a ... This potentially motivates the need for #125991.
If possible, I still prefer handling it in the backend to share more optimization resources. However, if it is too hard to handle it in the backend, we can address this issue in the vectorizer instead.
Oh... I originally thought that
After #127180, the vectorizer emits vp.merge + general sdiv/udiv instead of vp.sdiv/udiv for tail folding by EVL.
However, using vp.udiv/sdiv may yield better performance. The improvement could come from fewer vsetvli instructions and lower vector register pressure.
The current IR and assembly for sdiv: https://godbolt.org/z/YvPhGa8df
The vp intrinsic IR and assembly for sdiv: https://godbolt.org/z/1achsE3Wo
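To make the comparison concrete, here is a rough scalar model in C++ of the two forms. It is only a sketch, not the vectorizer's or the backend's actual code: it assumes the vp.merge supplies a safe divisor of 1 for the lanes past EVL and that the mask is all-true. The names `Vec`, `vp_merge`, `merge_then_sdiv`, `vp_sdiv`, and `active_lanes_match` are made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Vec = std::vector<int32_t>;

// Scalar model of llvm.vp.merge with an all-true mask: lanes below evl take
// on_true, the remaining lanes take on_false.
Vec vp_merge(const Vec &on_true, const Vec &on_false, std::size_t evl) {
  assert(on_true.size() == on_false.size() && evl <= on_true.size());
  Vec out(on_false);
  for (std::size_t i = 0; i < evl; ++i)
    out[i] = on_true[i];
  return out;
}

// Current form: merge a safe divisor (assumed to be 1) into the lanes past
// evl, then divide every lane of the full-width vector.
Vec merge_then_sdiv(const Vec &a, const Vec &b, std::size_t evl) {
  assert(a.size() == b.size());
  const Vec safe = vp_merge(b, Vec(b.size(), 1), evl);
  Vec out(a.size());
  for (std::size_t i = 0; i < a.size(); ++i)
    out[i] = a[i] / safe[i]; // lanes >= evl divide by 1, so a zero there is harmless
  return out;
}

// Proposed form: a model of llvm.vp.sdiv with an all-true mask. Only lanes
// below evl are computed; the real intrinsic leaves the rest as poison, which
// this model represents by returning only the active lanes.
Vec vp_sdiv(const Vec &a, const Vec &b, std::size_t evl) {
  assert(a.size() == b.size() && evl <= a.size());
  Vec out(evl);
  for (std::size_t i = 0; i < evl; ++i)
    out[i] = a[i] / b[i];
  return out;
}

// Hypothetical helper: do both forms agree on the lanes the loop actually uses?
bool active_lanes_match(const Vec &a, const Vec &b, std::size_t evl) {
  const Vec full = merge_then_sdiv(a, b, evl);
  const Vec vp = vp_sdiv(a, b, evl);
  for (std::size_t i = 0; i < evl; ++i)
    if (full[i] != vp[i])
      return false;
  return true;
}
```

Both forms agree on the active lanes (as long as those lanes avoid division by zero and INT_MIN / -1). The first form still spends an extra merge and a full-width divide on lanes whose results are never used, which is where the extra vsetvli and register pressure would come from.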
Not yet sure at which stage this optimization should be applied. We need more discussion.
Labeling it as a RISC-V backend issue for now.