[EraVM] Do not set Uses=[Flags] unconditionally in JCl #558

Draft
wants to merge 786 commits into
base: main
from

Conversation

atrosinenko
Contributor

No description provided.

akiramenai and others added 30 commits April 8, 2024 00:05
This patch is a joint work by:
Dmitry Borisenkov <[email protected]>
Vladimir Radosavljevic <[email protected]>
This pass tries to replace an stdlib call with either a call to
the more efficient algorithm, or just an optimized sequence of instructions.
Currently, it handles only __exp(2^m, exp) calls and replaces them with calls
to __exp_pow2(), which simply computes the value 1 << (m * exp).
Signed-off-by: Vladimir Radosavljevic <[email protected]>
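
The identity behind this replacement, (2^m)^exp = 2^(m * exp), can be sketched in a few lines. This is only an illustration under stated assumptions: it uses 64-bit wrap-around arithmetic instead of EraVM's 256-bit words, and `expGeneric` and `expPow2` are hypothetical stand-ins, not the actual `__exp` and `__exp_pow2` routines.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a generic power routine: square-and-multiply
// with wrap-around modulo 2^64 (the real target uses 256-bit words).
std::uint64_t expGeneric(std::uint64_t base, std::uint64_t exp) {
  std::uint64_t result = 1;
  while (exp != 0) {
    if (exp & 1)
      result *= base;
    base *= base;
    exp >>= 1;
  }
  return result;
}

// Hypothetical stand-in for the power-of-two fast path: the base is 2^m,
// so the result is a single shift, 1 << (m * exp).
std::uint64_t expPow2(std::uint64_t m, std::uint64_t exp) {
  assert(m < 64 && "the base 2^m must fit in 64 bits in this sketch");
  // A shift by 64 or more would be undefined behaviour in C++; in
  // wrap-around arithmetic the mathematical result is 0, so return it.
  if (m != 0 && exp > 63 / m)
    return 0;
  return std::uint64_t{1} << (m * exp);
}
```

For any `m < 64`, `expPow2(m, e)` should agree with `expGeneric(std::uint64_t{1} << m, e)`, including the cases where the result wraps to zero.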
Also remove TODO and FIXME that are no longer needed.
Signed-off-by: Vladimir Radosavljevic <[email protected]>
Signed-off-by: Vladimir Radosavljevic <[email protected]>
When lowering SDIV and SREM nodes, we are generating
UDIV and UREM nodes with two outputs, instead of one.

Signed-off-by: Vladimir Radosavljevic <[email protected]>
Substitute MVT::fatptr with MVT::i256 when creating a constant DAG node.

To be squashed with ed7d384 when updating LLVM, to reduce local change
noise.
This allows -print-after-all/-print-before-all to dump MIR around
these passes.

Signed-off-by: Vladimir Radosavljevic <[email protected]>
* PostRA pass to fold conditional move with operand def instruction
* added IR tests and MIR tests
This pass recognizes memops that can be optimized into indexed
memops and rewrites them to favor the subsequent indexed memops
combine pass.
Signed-off-by: Vladimir Radosavljevic <[email protected]>
In order to efficiently emit expanded SELECT instruction, it is
best to coalesce a source register with the dest to avoid a redundant
copy.

This is done by tying the two operands so that the register allocator
assigns them the same register.
The pass only worked correctly if the loop body was a latch; otherwise a
phi with the wrong incoming BB was created. The patch changes the incoming
BB to the actual latch. It also ensures that the loop has only one latch.
Support log.decommit, tload, tstore.
Signed-off-by: Vladimir Radosavljevic <[email protected]>
When one of SELECT's input operands is zero, we can fold it with its
sole user to improve performance. The tie is used to ensure that the
two register operands in the user instruction are allocated to the same register.
Enna1 and others added 28 commits May 27, 2024 00:07
…in function (#88477)

With this change, the wall time of the GVN pass decreased from
873,745.492 ms to 367,375.304 ms on one of our internal test cases.
Make R1 register be understood as the default operand of `RETrl` and
`REVERTrl` instructions (but make this alias non-default for printing
until switched to the new asm syntax).

Change panic instruction to either accept jump target operand
or none at all.
In preparation for renaming of assembler mnemonics, split `EraVMOpcode`
definitions for ret, revert and panic instructions into "without label"
and "with label" variants, so their mnemonics can be specified
independently.
Add tests for folding select with overflow intrinsics where we shouldn't
attempt to invert the overflow LT caused by uaddo and umulo.

Co-authored-by: Alan Li <[email protected]>
PR: #521, Issue: #491.
After folding, the select instructions use the overflow condition code,
which is actually LT; this normally gives a shorter code sequence.
Please note that GE can only be used as the reversed code for the overflow
LT caused by usubo, but not for uaddo and umulo.

Co-authored-by: Alan Li <[email protected]>
PR: #521, Issue: #491.
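
One possible reading of the GE restriction can be checked with a small model. Everything below is an assumption made for illustration and is not taken from this PR: words are 8 bits wide instead of 256, LT is set on unsigned overflow or underflow, EQ is set when the truncated result is zero, and GE means "GT or EQ" with GT defined as neither LT nor EQ. The names `Flags`, `subFlags`, `addFlags`, `ge` and `geIsExactInverseOfLt` are hypothetical.

```cpp
#include <cstdint>

// Assumed, simplified flag model (8-bit words stand in for 256-bit ones).
struct Flags {
  bool lt; // assumed: set when the operation overflows or underflows
  bool eq; // assumed: set when the truncated result is zero
};

// Flags after a - b, in the style of usubo.
Flags subFlags(std::uint8_t a, std::uint8_t b) {
  return {a < b, static_cast<std::uint8_t>(a - b) == 0};
}

// Flags after a + b, in the style of uaddo.
Flags addFlags(std::uint8_t a, std::uint8_t b) {
  unsigned wide = unsigned(a) + unsigned(b);
  return {wide > 0xFF, static_cast<std::uint8_t>(wide) == 0};
}

// Assumed meaning of GE: "GT or EQ", where GT = !LT && !EQ.
bool ge(Flags f) { return f.eq || (!f.lt && !f.eq); }

// Returns true if GE is the exact negation of LT for every operand pair.
template <typename FlagFn>
bool geIsExactInverseOfLt(FlagFn flagsOf) {
  for (unsigned a = 0; a <= 0xFF; ++a)
    for (unsigned b = 0; b <= 0xFF; ++b) {
      Flags f = flagsOf(static_cast<std::uint8_t>(a),
                        static_cast<std::uint8_t>(b));
      if (ge(f) == f.lt)
        return false;
    }
  return true;
}
```

Under this model, subtraction never sets LT and EQ together, so GE is a safe reversal of LT. An addition such as 255 + 1 overflows to zero and sets both flags, so GE would still hold even though overflow occurred; a multiplication that overflows to zero would behave the same way.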
Make NOP support the remaining combinations of operands.

Reorganize the tests: add `nop.(s|txt)` test files similar to those for
arithmetic instructions, put the test cases for `nop`, `incsp` and
`decsp` aliases to `sp-changes.(s|txt)` files.
Simplify the syntax for specifying both "static" and "shard" modifiers
of a far call: either `mnemonic.st.sh` or `mnemonic.sh.st`.

Please note that ".st" and ".sh" are technically just parts of the
mnemonic's name, so they should go before a predicate, if any.
Clarify the `@fat.ptr.call` test case before renaming the mnemonics.

The instructions actually generated for `@fat.ptr.call` function are

    ptr.add    r2, r0, r1
    near_call  r0, @fat.ptr.arg, @DEFAULT_UNWIND
    ret

Before renaming `ptr.add` to `addp`, the first line was accidentally
matched by the `add r2, r0, r1` substring.
Rename RETURN and REVERT pseudos to make their purpose easier to
understand.

Remove InstAlias for panic instruction. Since the register operand of the panic
instruction was removed, it virtually became a MnemonicAlias, so update
the definition of OpPanic instead, as no alias is needed in the new syntax.
Reuse the same `parseMCOperandsCode` and `parseMCOperandsStack`
functions that are used by code emitter to analyze operands of MCInst.

While this makes it easier to improve the formatting of printed assembly code,
this commit strives to keep the existing formatting as much as possible,
so as not to combine refactoring with massive changes to the tests.
Explicitly specify EncodedOperandMode type and only convert it to int
when necessary.

Introduce MemOperandKind::OperandInvalid to not implicitly use
OperandCode as invalid stack operand kind.
Rename load/store as well as several special instructions to reflect
their current names according to new syntax.

Remove trailing "r" from several instructions that only have a fixed
set of supported operands and would need "rr" or "rrr" suffix anyway.
Update `(EraVMchange_sp GR256:$sp_change)` pattern to select NOPrrs
instead of NOPSPr (used by `llvm.restorestack` intrinsic).

Remove `(EraVMchange_sp imm16:$sp_change)` pattern as it is currently
unused and untested.

Change the `EraVMchange_sp` name to more descriptive `EraVMadd_to_sp`.
If looking for a miscompile revert candidate, look here!

The transform being enabled prefers comparing to a loop invariant
exit value for a secondary IV over using an otherwise dead primary
IV.  This increases register pressure (by requiring the exit value
to be live through the loop), but reduces the number of instructions
within the loop by one.

On RISC-V, which has a large number of scalar registers, this is
generally a profitable transform.  We lose the ability to use a beqz
on what is typically a count down IV, and pay the cost of computing
the exit value on the secondary IV in the loop preheader, but save
an add or sub in the loop body.  For anything except an extremely
short running loop, or one with extreme register pressure, this is
profitable.  On spec2017, we see a 0.42% geomean improvement in
dynamic icount, with no individual workload regressing by more than
0.25%.

Code size wise, we trade a (possibly compressible) beqz and a (possibly
compressible) addi for an uncompressible beq.  We also add instructions
in the preheader.  Net result is a slight regression overall, but
neutral or better inside the loop.

Previous versions of this transform had numerous corner-case correctness
bugs.  All of the ones I can spot by inspection have been fixed, and I
have run this through all of spec2017, but there may be further issues
lurking.  Adding uses to an IV is a fraught thing to do given poison
semantics, so this transform is somewhat inherently risky.

This patch is a reworked version of D134893 by @eop.  That patch has
been abandoned since May, so I picked it up, reworked it a bit, and
am landing it.
This patch makes `shouldFoldTerminatingConditionAfterLSR` return `true`
for the EraVM target, so LSR will try to replace the main induction variable
of a loop with one of the secondary IVs. This reduces the number of
instructions in the loop by eliminating the main IV increment.
The transformation increases register pressure, but that is rarely a
problem for EraVM.

PR: #599, Issue: #580.
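
At source level, the effect of folding the terminating condition can be sketched as follows. This is a hand-written illustration, not output of the pass: `sumBefore` and `sumAfter` are hypothetical functions, and the real transform operates on LLVM IR after LSR rather than on C++.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Before: the primary IV `i` only drives the exit test, while the
// secondary IV `p` walks the data. Both are updated on every iteration.
std::uint64_t sumBefore(const std::uint64_t *data, std::size_t n) {
  assert(data != nullptr || n == 0);
  std::uint64_t sum = 0;
  const std::uint64_t *p = data;
  for (std::size_t i = n; i != 0; --i) {
    sum += *p;
    ++p;
  }
  return sum;
}

// After: the exit test compares the secondary IV against a loop-invariant
// exit value. `end` is computed once before the loop and stays live
// through it (more register pressure), but the decrement of `i` is gone.
std::uint64_t sumAfter(const std::uint64_t *data, std::size_t n) {
  assert(data != nullptr || n == 0);
  std::uint64_t sum = 0;
  const std::uint64_t *end = data + n;
  for (const std::uint64_t *p = data; p != end; ++p)
    sum += *p;
  return sum;
}
```

Both functions return the same sum for any valid input; the second form does one fewer update per iteration at the cost of keeping `end` in a register.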
* Collapse OperandAddrMode and OperandAM fields of IBinary class
* Use named constants instead of numbers
* Drop synthetic Value field from OperandAddrModeValue class
  (renamed to SrcOperandMode)
This field's value has no particular meaning in either TableGen
or C++ and can be autogenerated if GenericEnum is ever desired.
Unlike mapping between different input operand kinds, the three other
`InstrMapping`s only have two options each that are flipped by the
mapper function.

Remove the synthetic Value field from `DestAddrModeValue` (renamed to
`DestOperandMode`) class. Keep Value fields in `mod_set_flags` and
`mod_swap` as these fields are meaningful for opcode computation.

Remove the `ToReg` vs. `ToRegReg` and `ToStack` vs. `ToStackReg`
separation as it is not really used.
Perform a few preparations before removing repeated definitions of
operands in `EraVMInstrInfo.td`:
* move the definitions of EraVM-specific `AsmOperandClass`es and
  `Operand`s, so that they can be used in `EraVMInstrFormats.td`
* define tablegen classes to be mixed into `Ixx_x` instruction classes