Skip to content

[AutoBump] Merge with fixes of 729f958c (Jan 22) (14) #552

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 349 commits into from
Jun 26, 2025

Conversation

jorickert
Copy link

No description provided.

jacquesguan and others added 30 commits January 23, 2025 10:30
…lvm#123701)

Now a `emitc.switch` with argument of `emitc.expression` wouldn't emit
its argument to cpp. This patch fix it.
…ng it LLVM dialect (llvm#123840)

With these changes, CUF atomic operations are handled as cudadevice
intrinsics and are converted straight to the LLVM dialect with the
`llvm.atomicrw` operation.

I am only submitting changes for `atomicadd` to gather feedback. If we
are to proceed with these changes I will add support for all other
applicable atomic operations following this pattern.
Increases minimum CAS size from 16 bit to 32 bit, for better SASS
codegen.

When atomics are emulated using atom.cas.b16, the SASS generated
includes 2 (nested) emulation loops. When emulated using an atom.cas.b32
loop, the SASS too has a single emulation loop. Using 32 bit CAS thus
results in better codegen.
When looking at the slowest lit tests, I'm seeing these four tests take
two to eight minutes. Test coverage on Linux should be sufficient for
the functionality on top of it not really being useful on Windows at
all.

This was observed when hacking on the new premerge in a windows VM.
…vm#116771)

Two options for clang
  -mno-scq:                Disable sc.q instruction.
  -mscq:                   Enable sc.q instruction.
The default is -mno-scq.
…lvm#123881)

This extension adds eight 48 bit load store instructions.

The current spec can be found at:
https://github.com/quic/riscv-unified-db/releases/latest

This patch adds assembler only support.

---------

Co-authored-by: Harsh Chandel <[email protected]>
https://discourse.llvm.org/t/rfc-profile-guided-static-data-partitioning/83744
proposes to partition static data sections.

This patch introduces a codegen pass. This patch produces jump table
hotness in the in-memory states (machine jump table info and entries).
Target-lowering and asm-printer consume the states and produce `.hot`
section suffix. The follow up PR
llvm#122215 implements such
changes.

---------

Co-authored-by: Ellis Hoag <[email protected]>
…#118656)

This patch is an extension to llvm#115128.

After profiling LLVM test-suite, I see a lot of loop nest of depth more
than `MaxLoopNestDepth` which is 10. Early exit for them would save
compile-time as it would avoid computing DependenceInfo and CacheCost.

Please see 'bound-max-depth' branch on compile-time-tracker.
Fixes llvm#113191

Issue: [flang][OpenMP] Runtime segfault when an allocatable variable is
used with copyin

Rootcause: The value of the threadprivate variable is not being copied
from the primary thread to the other threads within a parallel region.
As a result it tries to access a null pointer inside a parallel region
which causes segfault.

Fix: When allocatables used with copyin clause need to ensure that, on
entry to any parallel region each thread’s copy of a variable will
acquire the allocation status of the primary thread, before copying the
value of a threadprivate variable of the primary thread to the
threadprivate variable of each other member of the team.
When `try_table`'s catch clause's destination has a return type, as in
the case of catch with a concrete tag, catch_ref, and catch_all_ref. For
example:
```wasm
block exnref
  try_table (catch_all_ref 0)
    ...
  end_try_table
end_block
... use exnref ...
```

This code is not valid because the block's body type is not exnref. So
we add an unreachable after the 'end_try_table' to make the code valid
here:
```wasm
block exnref
  try_table (catch_all_ref 0)
    ...
  end_try_table
  unreachable                    ;; Newly added
end_block
```
Because 'unreachable' is a terminator we also need to split the BB.

---

We need to handle the same thing for unwind mismatch handling. In the
code below, we create a "trampoline BB" that will be the destination for
the nested `try_table`~`end_try_table` added to fix a unwind mismatch:
```wasm
try_table (catch ... )
  block exnref
    ...
    try_table (catch_all_ref N)
      some code
    end_try_table
    ...
  end_block                      ;; Trampoline BB
  throw_ref
end_try_table
```
While the `block` added for the trampoline BB has the return type
`exnref`, its body, which contains the nested `try_table` and other
code, wouldn't have the `exnref` return type. Most times it didn't
become a problem because the block's body ended with something like `br`
or `return`, but that may not always be the case, especially when there
is a loop. So we add an `unreachable` to make the code valid here too:
```wasm
try_table (catch ... )
  block exnref
    ...
    try_table (catch_all_ref N)
      some code
    end_try_table
    ...
    unreachable                  ;; Newly added
  end_block                      ;; Trampoline BB
  throw_ref
end_try_table
```
In this case we just append the `unreachable` at the end of the layout
predecessor BB. (This was tricky to do in the first (non-mismatch) case
because there `end_try_table` and `end_block` were added in the
beginning of an EH pad in `placeTryTableMarker` and moving
`end_try_table` and the new `unreachable` to the previous BB caused
other problems.)

---

This adds many `unreaachable`s to the output, but this adds
`unreachable` to only a few places to see if this is working. The
FileCheck lines in `exception.ll` and `cfg-stackify-eh.ll` are already
heavily redacted to only leave important control-flow instructions, so I
don't think it's worth adding `unreachable`s everywhere.
Resubmit, previously PR has compilation issues.
…24041)

`X86FrameLowering::emitSPUpdate()` assumes that 64-bit targets use a
64-bit stack pointer, but that's not true on x32.
When checking the stack pointer size, we need to look at
`Uses64BitFramePtr` rather than `Is64Bit`. This avoids generating
invalid instructions like `add esp, rcx`.

For impossibly-large stack frames (4 GiB or larger with a 32-bit stack
pointer), we were also generating invalid instructions like `mov eax,
5000000000`. The inline stack probe code already had a check for that
situation; I've moved the check into `emitSPUpdate()`, so any attempt to
allocate a 4 GiB stack frame with a 32-bit stack pointer will now trap
rather than adjusting ESP by the wrong amount. This also fixes the
"can't have 32-bit 16GB stack frame" assertion, which used to be
triggerable by user code but is now correct.

To help catch situations like this in the future, I've added
`-verify-machineinstrs` to the stack clash tests that generate large
stack frames.

This fixes the expensive-checks buildbot failure caused by llvm#113219.
…lvm#123916)

ecb5ea6 tried to fix cases when LLD
links what seems to be import library header objects from MSVC. However,
the fix seems incorrect; the review at https://reviews.llvm.org/D133627
concluded that if this (treating this kind of symbol as a common symbol)
is what link.exe does, it's fine.

However, this is most probably not what link.exe does. The symbol
mentioned in the commit message of
ecb5ea6 would be a common symbol with a
size of around 3 GB; this is not what might have been intended.

That commit tried to avoid running into the error ".idata$4 should not
refer to special section 0"; that issue is fixed for a similar style of
section symbols in 4a4a8a1.

Therefore, revert ecb5ea6 and extend
the fix from 4a4a8a1 to also work for
the section symbols in MSVC generated import libraries.

The main detail about them, is that for symbols of type
IMAGE_SYM_CLASS_SECTION, the Value field is not an offset, but it is an
optional set of flags, corresponding to the Characteristics of the
section header (although it may be empty).

This is a reland of a previous version of this commit, earlier merged in
9457418 / llvm#122811. The previous version
failed tests when run with address sanitizer. The issue was that the
synthesized coff_symbol_generic object actually will be used to access a
full coff_symbol16 or coff_symbol32 struct, see
DefinedCOFF::getCOFFSymbol. Therefore, we need to make a copy of the
full size of either of them.
Now that we have a dedicated abstraction for string tables, switch the
option parser library's string table over to it rather than using a raw
`const char*`. Also try to use the `StringTable::Offset` type rather
than a raw `unsigned` where we can to avoid accidental increments or
other issues.

This is based on review feedback for the initial switch of options to a
string table. Happy to tweak or adjust if desired here.
This is part of
https://discourse.llvm.org/t/rfc-introduce-opasm-type-attr-interface-for-pretty-print-in-asmprinter/83792.

OpAsmOpInterface controls the SSA Name/Block Name and Default Dialect
Prefix. This PR adds the usage of them by existing examples in MLIR.
…lvm#123934)

Once we get to SelectionDAG the IR should not be changing anymore, so we
can use BatchAAResults rather than AAResults to cache AA queries.

This should be a NFC change for targets that enable AA during codegen
(such as AArch64), but also give a nice compile-time improvement in some
cases. See:
llvm#123787 (comment)

Note: This follows Nikita's suggestion on llvm#123787.
HerrCai0907 and others added 24 commits January 24, 2025 19:29
…#123958)

`TimerGroup` don't need to use as field of `ClangTidyProfiling`.
We can construct it local during destructing.
…23454)

skip header file before register AST Matchers
it can avoid to matcher lots of ast node when lint header file
Add IDs for bit width that cover multiple LLTs: B32 B64 etc.
"Predicate" wrapper class for bool predicate functions used to
write pretty rules. Predicates can be combined using &&, || and !.
Lowering for splitting and widening loads.
Write rules for loads to not change existing mir tests from old
regbankselect.
…l coroutine clones (llvm#118628)

Summary:
CoroCloner, by calling into CloneFunctionInto, does a lot of repeated
work priming DIFinder and building a list of common module-level debug
info metadata. For programs compiled with full debug info this can get
very expensive.

This diff builds the data once and shares it between all clones.

Anecdata for a sample cpp source file compiled with full debug info:

|                 | Baseline | IdentityMD set | Prebuilt CommonDI (cur.) |
|-----------------|----------|----------------|--------------------------|
| CoroSplitPass   | 306ms    | 221ms          | 68ms                     |
| CoroCloner      | 101ms    | 72ms           | 0.5ms                    |
| CollectCommonDI | -        | -              | 63ms                     |
| Speed up        | 1x       | 1.4x           | 4.5x                     |

Note that CollectCommonDebugInfo happens once *per coroutine* rather than per clone.

Test Plan:
ninja check-llvm-unit
ninja check-llvm

Compiled a sample internal source file, checked time trace output for scope timings.
…2866)

Change existing code for G_PHI to match what LLVM-IR version is doing
via PHINode::hasConstantOrUndefValue. This is not safe for regular PHI
since it may appear with an undef operand and getVRegDef can fail.
Most notably this improves number of values that can be allocated
to sgpr in AMDGPURegBankSelect.
Common case here are phis that appear in structurize-cfg lowering
for cycles with multiple exits:
Undef incoming value is coming from block that reached cycle exit
condition, if other incoming is uniform keep the phi uniform despite
the fact it is joining values from pair of blocks that are entered
via divergent condition branch.
This is the behavior expected by DWARF. It also requires some fixups to
algorithms which were storing the addresses of some objects (Blocks and
Variables) relative to the beginning of the function.

There are plenty of things that still don't work in this setups, but
this change is sufficient for the expression evaluator to correctly
recognize the entry point of a function in this case.
…llvm#123745)

Add the following workflows:

- `fullbuild` on aarch64 ubuntu
- `overlay` on windows 2025
- `overlay` on aarch64 ubuntu

`ccache` variant is used on `aarch64` due to
hendrikmuhs/ccache-action#279
…ot (llvm#121463)

In function handleMFLOSlot, we may get a variable LastInstInFunction
with a value of true from function getNextMachineInstr and IInSlot may
be null which would trigger an assert.
So we need to skip this case.

Fix llvm#118223.
With the removal of mlir-vulkan-runner (as part of llvm#73457) in
e7e3c45, mlir-cpu-runner is now the
only runner for all CPU and GPU targets, and the "cpu" name has been
misleading for some time already. This commit renames it to mlir-runner.
[AutoBump] Merge with fixes of eb206e9 (Jan 24) (21)
[AutoBump] Merge with b4e81fd (Jan 24) (20)
[AutoBump] Merge with fixes of 8388040 (Jan 23) (19)
[AutoBump] Merge with 08195f3 (Jan 23) (18)
[AutoBump] Merge with fixes of 7e622b6 (Jan 22) (17)
Base automatically changed from bump_to_bd56950b to bump_to_67b9d3ff June 26, 2025 08:12
[AutoBump] Merge with 3057d0f (Jan 22) (16)
[AutoBump] Merge with fixes of 7986e0c (Jan 22) (15)
Base automatically changed from bump_to_67b9d3ff to bump_to_f4943464 June 26, 2025 08:14
@jorickert jorickert merged commit 6373ac4 into bump_to_f4943464 Jun 26, 2025
18 checks passed
@jorickert jorickert deleted the bump_to_729f958c branch June 26, 2025 08:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.