Skip to content

[RISCV][EVL] Improve sdiv/udiv code generation for tail folding by EVL. #129538

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
Mel-Chen opened this issue Mar 3, 2025 · 4 comments
Open

Comments

@Mel-Chen
Copy link
Contributor

Mel-Chen commented Mar 3, 2025

After #127180, the vectorizer emits vp.merge + general sdiv/udiv instead of vp.sdiv/udiv for tail folding by EVL.
However, using vp.udiv/sdiv may yield better performance. The improvement could come from fewer vsetvli instructions and lower vector register pressure.

The current IR and assembly for sdiv: https://godbolt.org/z/YvPhGa8df
The vp intrinsic IR and assembly for sdiv: https://godbolt.org/z/1achsE3Wo

Not yet sure at which stage this optimization should be applied. We need more discussion.
Label it as RISCV backend issue for now.

@llvmbot
Copy link
Member

llvmbot commented Mar 3, 2025

@llvm/issue-subscribers-backend-risc-v

Author: Mel Chen (Mel-Chen)

After https://github.com//pull/127180, the vectorizer emits vp.merge + general sdiv/udiv instead of vp.sdiv/udiv for tail folding by EVL. However, using vp.udiv/sdiv may yield better performance. The improvement could come from fewer vsetvli instructions and lower vector register pressure.

The current IR and assembly for sdiv: https://godbolt.org/z/YvPhGa8df
The vp intrinsic IR and assembly for sdiv: https://godbolt.org/z/1achsE3Wo

Not yet sure at which stage this optimization should be applied. We need more discussion.
Label it as RISCV backend issue for now.

@Mel-Chen
Copy link
Contributor Author

Mel-Chen commented Mar 3, 2025

@lukel97
Copy link
Contributor

lukel97 commented Mar 3, 2025

I think emitting a VP intrinsic here (also possibly for masked folding?) makes sense.

After #127180, the vectorizer emits vp.merge + general sdiv/udiv instead of vp.sdiv/udiv for tail folding by EVL.

Just to check, my understanding was that before #127180 we still emitted a vp.merge (previously a select on LLVM 19: https://godbolt.org/z/bqGzxbWr5)

We could either try and fold it away to a VPWidenIntrinsicRecipe in transformRecipestoEVLRecipes, or emit it directly in VPRecipeBuilder::tryToWiden. I think the former might be cleaner if it means we can keep the EVL stuff in one place.

This potentially motivates the need for #125991

@Mel-Chen
Copy link
Contributor Author

Mel-Chen commented Mar 5, 2025

I think emitting a VP intrinsic here (also possibly for masked folding?) makes sense.

If possible, I still prefer handling it in the backend to share more optimization resources. However, if it is too hard to handle it in the backend, we can address this issue in the vectorizer instead.

Just to check, my understanding was that before #127180 we still emitted a vp.merge (previously a select on LLVM 19: https://godbolt.org/z/bqGzxbWr5)

Oh... I originally thought that masked_divisor = vp.merge(true, divisor, X, evl) + vp.sdiv/udiv(dividend, masked_divisor, mask, evl) could be combined into vp.sdiv/udiv(dividend, divisor, mask, evl). In that case, this indeed has little to do with #127180.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants