[AutoBump] Merge with fixes of 729f958c (Jan 22) (14) #552

jorickert · 2025-05-21T06:54:21Z

No description provided.

…lvm#123701) Previously, an `emitc.switch` whose argument is an `emitc.expression` did not emit that argument to C++. This patch fixes it.

…e constraints" (llvm#111143) Closes llvm#98592

…ng it LLVM dialect (llvm#123840) With these changes, CUF atomic operations are handled as cudadevice intrinsics and are converted straight to the LLVM dialect with the `llvm.atomicrmw` operation. I am only submitting changes for `atomicadd` to gather feedback. If we are to proceed with these changes, I will add support for all other applicable atomic operations following this pattern.

Increases minimum CAS size from 16 bit to 32 bit, for better SASS codegen. When atomics are emulated using atom.cas.b16, the SASS generated includes 2 (nested) emulation loops. When emulated using an atom.cas.b32 loop, the SASS too has a single emulation loop. Using 32 bit CAS thus results in better codegen.
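For readers unfamiliar with CAS-based emulation, the following is a minimal host-side C++ sketch of the general idea: a 16-bit atomic add implemented with a single 32-bit compare-and-swap loop. It is an analogy only, not the NVPTX lowering from this commit; the function name `fetchAdd16ViaCas32` and the lane convention are assumptions made for illustration.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical helper (not part of LLVM): atomically adds `value` to one
// 16-bit lane of a 32-bit word using only a 32-bit compare-and-swap.
// Lane 0 is the low half, lane 1 is the high half.
// Returns the previous 16-bit value of the selected lane.
inline std::uint16_t fetchAdd16ViaCas32(std::atomic<std::uint32_t> &word,
                                        unsigned lane, std::uint16_t value) {
  assert(lane < 2 && "a 32-bit word holds two 16-bit lanes");
  const unsigned shift = lane * 16u;
  const std::uint32_t mask = 0xFFFFu << shift;

  std::uint32_t expected = word.load(std::memory_order_relaxed);
  for (;;) {
    const std::uint16_t oldLane =
        static_cast<std::uint16_t>((expected & mask) >> shift);
    // The sum wraps modulo 2^16, like a native 16-bit add.
    const std::uint16_t newLane = static_cast<std::uint16_t>(oldLane + value);
    const std::uint32_t desired =
        (expected & ~mask) | (static_cast<std::uint32_t>(newLane) << shift);
    // On failure, `expected` is refreshed with the current word and the
    // single loop retries; the neighbouring lane is left untouched.
    if (word.compare_exchange_weak(expected, desired,
                                   std::memory_order_seq_cst,
                                   std::memory_order_relaxed))
      return oldLane;
  }
}
```

The point of the sketch is that the whole update needs only one retry loop when the CAS width matches the containing word.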

Widely supported but missing on AIX https://www.austingroupbugs.net/view.php?id=993

When looking at the slowest lit tests, I'm seeing these four tests take two to eight minutes. Test coverage on Linux should be sufficient for the functionality on top of it not really being useful on Windows at all. This was observed when hacking on the new premerge in a windows VM.

…vm#116771) Two options for clang:
- `-mno-scq`: Disable the sc.q instruction.
- `-mscq`: Enable the sc.q instruction.

The default is `-mno-scq`.

…llvm#97930)

…lvm#123881) This extension adds eight 48 bit load store instructions. The current spec can be found at: https://github.com/quic/riscv-unified-db/releases/latest This patch adds assembler only support. --------- Co-authored-by: Harsh Chandel <[email protected]>

https://discourse.llvm.org/t/rfc-profile-guided-static-data-partitioning/83744 proposes to partition static data sections. This patch introduces a codegen pass. This patch produces jump table hotness in the in-memory states (machine jump table info and entries). Target-lowering and asm-printer consume the states and produce `.hot` section suffix. The follow up PR llvm#122215 implements such changes. --------- Co-authored-by: Ellis Hoag <[email protected]>

…#118656) This patch is an extension to llvm#115128. After profiling the LLVM test-suite, I see many loop nests deeper than `MaxLoopNestDepth`, which is 10. Exiting early for them saves compile time, as it avoids computing DependenceInfo and CacheCost. Please see the 'bound-max-depth' branch on compile-time-tracker.
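As a rough illustration of the early-exit idea, the sketch below checks whether a loop nest exceeds a depth limit before any expensive analysis would run. The `LoopNode` type and the helpers `exceedsDepth` and `makeChain` are hypothetical stand-ins, not the LLVM `Loop` API or the code in this patch.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for a loop in a loop nest (not llvm::Loop).
struct LoopNode {
  std::vector<LoopNode> SubLoops;
};

// Returns true if the nest rooted at `L` is deeper than `Limit`.
// The walk stops descending as soon as the limit is exceeded, so the check
// itself stays cheap even for very deep nests.
inline bool exceedsDepth(const LoopNode &L, unsigned Limit, unsigned Depth = 1) {
  assert(Limit > 0 && "a depth limit of zero rejects every loop");
  if (Depth > Limit)
    return true;
  for (const LoopNode &Sub : L.SubLoops)
    if (exceedsDepth(Sub, Limit, Depth + 1))
      return true;
  return false;
}

// Test helper: builds a perfectly nested chain of `Depth` loops (Depth >= 1).
inline LoopNode makeChain(unsigned Depth) {
  LoopNode Root;
  LoopNode *Cur = &Root;
  for (unsigned I = 1; I < Depth; ++I) {
    Cur->SubLoops.emplace_back();
    Cur = &Cur->SubLoops.back();
  }
  return Root;
}
```

A pass following this pattern would call such a check first and return without changes when it reports true, skipping the costly analyses entirely.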

Fixes llvm#113191

Issue: [flang][OpenMP] Runtime segfault when an allocatable variable is used with copyin

Root cause: The value of the threadprivate variable is not copied from the primary thread to the other threads within a parallel region. As a result, the other threads access a null pointer inside the parallel region, which causes the segfault.

Fix: When allocatables are used with the copyin clause, ensure that on entry to any parallel region each thread's copy of the variable acquires the allocation status of the primary thread before the value of the primary thread's threadprivate variable is copied to the threadprivate variable of each other member of the team.

Consider the case where the destination of a `try_table` catch clause has a return type, as with catch with a concrete tag, catch_ref, and catch_all_ref. For example:

```wasm
block exnref
  try_table (catch_all_ref 0)
    ...
  end_try_table
end_block
... use exnref ...
```

This code is not valid because the block's body type is not exnref. So we add an `unreachable` after the `end_try_table` to make the code valid here:

```wasm
block exnref
  try_table (catch_all_ref 0)
    ...
  end_try_table
  unreachable     ;; Newly added
end_block
```

Because `unreachable` is a terminator, we also need to split the BB.

---

We need to handle the same thing for unwind mismatch handling. In the code below, we create a "trampoline BB" that will be the destination for the nested `try_table`~`end_try_table` added to fix an unwind mismatch:

```wasm
try_table (catch ... )
  block exnref
    ...
    try_table (catch_all_ref N)
      some code
    end_try_table
    ...
  end_block          ;; Trampoline BB
  throw_ref
end_try_table
```

While the `block` added for the trampoline BB has the return type `exnref`, its body, which contains the nested `try_table` and other code, wouldn't have the `exnref` return type. Most of the time this was not a problem because the block's body ended with something like `br` or `return`, but that may not always be the case, especially when there is a loop. So we add an `unreachable` to make the code valid here too:

```wasm
try_table (catch ... )
  block exnref
    ...
    try_table (catch_all_ref N)
      some code
    end_try_table
    ...
    unreachable      ;; Newly added
  end_block          ;; Trampoline BB
  throw_ref
end_try_table
```

In this case we just append the `unreachable` at the end of the layout predecessor BB. (This was tricky to do in the first (non-mismatch) case because there `end_try_table` and `end_block` were added at the beginning of an EH pad in `placeTryTableMarker`, and moving `end_try_table` and the new `unreachable` to the previous BB caused other problems.)

---

This adds many `unreachable`s to the output, but this patch adds `unreachable` to the test checks in only a few places to see if this is working. The FileCheck lines in `exception.ll` and `cfg-stackify-eh.ll` are already heavily redacted to leave only the important control-flow instructions, so I don't think it's worth adding `unreachable`s everywhere.

…ypes (llvm#123818) Fixes clangd/clangd#1249

Resubmit; the previous PR had compilation issues.

…24041) `X86FrameLowering::emitSPUpdate()` assumes that 64-bit targets use a 64-bit stack pointer, but that's not true on x32. When checking the stack pointer size, we need to look at `Uses64BitFramePtr` rather than `Is64Bit`. This avoids generating invalid instructions like `add esp, rcx`. For impossibly-large stack frames (4 GiB or larger with a 32-bit stack pointer), we were also generating invalid instructions like `mov eax, 5000000000`. The inline stack probe code already had a check for that situation; I've moved the check into `emitSPUpdate()`, so any attempt to allocate a 4 GiB stack frame with a 32-bit stack pointer will now trap rather than adjusting ESP by the wrong amount. This also fixes the "can't have 32-bit 16GB stack frame" assertion, which used to be triggerable by user code but is now correct. To help catch situations like this in the future, I've added `-verify-machineinstrs` to the stack clash tests that generate large stack frames. This fixes the expensive-checks buildbot failure caused by llvm#113219.

…lvm#123916) ecb5ea6 tried to fix cases when LLD links what seems to be import library header objects from MSVC. However, the fix seems incorrect; the review at https://reviews.llvm.org/D133627 concluded that if this (treating this kind of symbol as a common symbol) is what link.exe does, it's fine. However, this is most probably not what link.exe does. The symbol mentioned in the commit message of ecb5ea6 would be a common symbol with a size of around 3 GB; this is not what might have been intended. That commit tried to avoid running into the error ".idata$4 should not refer to special section 0"; that issue is fixed for a similar style of section symbols in 4a4a8a1. Therefore, revert ecb5ea6 and extend the fix from 4a4a8a1 to also work for the section symbols in MSVC generated import libraries. The main detail about them, is that for symbols of type IMAGE_SYM_CLASS_SECTION, the Value field is not an offset, but it is an optional set of flags, corresponding to the Characteristics of the section header (although it may be empty). This is a reland of a previous version of this commit, earlier merged in 9457418 / llvm#122811. The previous version failed tests when run with address sanitizer. The issue was that the synthesized coff_symbol_generic object actually will be used to access a full coff_symbol16 or coff_symbol32 struct, see DefinedCOFF::getCOFFSymbol. Therefore, we need to make a copy of the full size of either of them.

Now that we have a dedicated abstraction for string tables, switch the option parser library's string table over to it rather than using a raw `const char*`. Also try to use the `StringTable::Offset` type rather than a raw `unsigned` where we can to avoid accidental increments or other issues. This is based on review feedback for the initial switch of options to a string table. Happy to tweak or adjust if desired here.
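To illustrate why a dedicated offset type helps, here is a small self-contained sketch of a string table indexed by a wrapper type instead of a raw integer. `MiniStringTable` is an illustrative stand-in written for this note, not the actual `llvm::StringTable` class, and its interface is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Illustrative stand-in (not llvm::StringTable): a block of NUL-terminated
// strings that can only be indexed through a dedicated offset type.
class MiniStringTable {
public:
  class Offset {
  public:
    constexpr Offset() = default;
    constexpr explicit Offset(std::size_t V) : Value(V) {}
    constexpr std::size_t value() const { return Value; }
    // Deliberately no operator++ or arithmetic: an offset names the start of
    // one string, so incrementing it by accident would be a bug.
  private:
    std::size_t Value = 0;
  };

  constexpr explicit MiniStringTable(std::string_view Data) : Data(Data) {}

  // Returns the string that starts at `O` and runs to the next NUL byte.
  std::string_view operator[](Offset O) const {
    assert(O.value() < Data.size() && "offset out of range");
    std::string_view Rest = Data.substr(O.value());
    return Rest.substr(0, Rest.find('\0'));
  }

private:
  std::string_view Data;
};
```

Because `Offset` has an explicit constructor and no arithmetic operators, passing an unrelated `unsigned` or writing `Off + 1` fails to compile rather than silently reading from the middle of a string.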

This is part of https://discourse.llvm.org/t/rfc-introduce-opasm-type-attr-interface-for-pretty-print-in-asmprinter/83792. OpAsmOpInterface controls the SSA Name/Block Name and Default Dialect Prefix. This PR adds the usage of them by existing examples in MLIR.

…lvm#123934) Once we get to SelectionDAG the IR should not be changing anymore, so we can use BatchAAResults rather than AAResults to cache AA queries. This should be a NFC change for targets that enable AA during codegen (such as AArch64), but also give a nice compile-time improvement in some cases. See: llvm#123787 (comment) Note: This follows Nikita's suggestion on llvm#123787.

…#123958) `TimerGroup` does not need to be a field of `ClangTidyProfiling`. We can construct it locally in the destructor.

…23454) Skip header files before registering AST matchers. This avoids matching a large number of AST nodes when linting a header file.

Add IDs for bit width that cover multiple LLTs: B32 B64 etc. "Predicate" wrapper class for bool predicate functions used to write pretty rules. Predicates can be combined using &&, || and !. Lowering for splitting and widening loads. Write rules for loads to not change existing mir tests from old regbankselect.
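The combinable predicate wrapper described above can be sketched in plain C++ as follows. This is a generic illustration under assumed names (`Predicate<T>` over an arbitrary input type), not the class added to the AMDGPU backend by this commit.

```cpp
#include <cassert>
#include <functional>
#include <utility>

// Hypothetical wrapper around a bool-returning function over an input of
// type T. Wrapped predicates can be combined with &&, || and !.
template <typename T> class Predicate {
public:
  using Fn = std::function<bool(const T &)>;

  explicit Predicate(Fn F) : Pred(std::move(F)) {
    assert(Pred && "predicate must wrap a callable");
  }

  bool operator()(const T &X) const { return Pred(X); }

  // Each combinator returns a new predicate. Evaluation inside the lambda
  // still short-circuits, so the right-hand side only runs when needed.
  Predicate operator&&(const Predicate &RHS) const {
    Fn L = Pred, R = RHS.Pred;
    return Predicate([L, R](const T &X) { return L(X) && R(X); });
  }

  Predicate operator||(const Predicate &RHS) const {
    Fn L = Pred, R = RHS.Pred;
    return Predicate([L, R](const T &X) { return L(X) || R(X); });
  }

  Predicate operator!() const {
    Fn P = Pred;
    return Predicate([P](const T &X) { return !P(X); });
  }

private:
  Fn Pred;
};
```

With such a wrapper, a rule condition can be written as an expression like `(IsEven && !IsBig) || IsSpecial`, which reads closer to the rule it describes than nested function calls.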

…l coroutine clones (llvm#118628)

Summary: CoroCloner, by calling into CloneFunctionInto, does a lot of repeated work priming DIFinder and building a list of common module-level debug info metadata. For programs compiled with full debug info this can get very expensive. This diff builds the data once and shares it between all clones.

Anecdata for a sample cpp source file compiled with full debug info:

|                 | Baseline | IdentityMD set | Prebuilt CommonDI (cur.) |
|-----------------|----------|----------------|--------------------------|
| CoroSplitPass   | 306ms    | 221ms          | 68ms                     |
| CoroCloner      | 101ms    | 72ms           | 0.5ms                    |
| CollectCommonDI | -        | -              | 63ms                     |
| Speed up        | 1x       | 1.4x           | 4.5x                     |

Note that CollectCommonDebugInfo happens once *per coroutine* rather than per clone.

Test Plan:
ninja check-llvm-unit
ninja check-llvm

Compiled a sample internal source file, checked time trace output for scope timings.

…2866) Change the existing code for G_PHI to match what the LLVM IR version does via PHINode::hasConstantOrUndefValue. This is not safe for a regular PHI, since it may appear with an undef operand and getVRegDef can fail. Most notably, this increases the number of values that can be allocated to SGPRs in AMDGPURegBankSelect. The common case is phis that appear in structurize-cfg lowering for cycles with multiple exits: the undef incoming value comes from the block that reached the cycle exit condition, and if the other incoming value is uniform, the phi stays uniform even though it joins values from a pair of blocks that are entered via a divergent conditional branch.

This is the behavior expected by DWARF. It also requires some fixups to algorithms that were storing the addresses of some objects (Blocks and Variables) relative to the beginning of the function. There are plenty of things that still don't work in this setup, but this change is sufficient for the expression evaluator to correctly recognize the entry point of a function in this case.

…llvm#123745) Add the following workflows:
- `fullbuild` on aarch64 ubuntu
- `overlay` on windows 2025
- `overlay` on aarch64 ubuntu

The `ccache` variant is used on `aarch64` due to hendrikmuhs/ccache-action#279

…ot (llvm#121463) In function handleMFLOSlot, we may get a variable LastInstInFunction with a value of true from function getNextMachineInstr and IInSlot may be null which would trigger an assert. So we need to skip this case. Fix llvm#118223.

With the removal of mlir-vulkan-runner (as part of llvm#73457) in e7e3c45, mlir-cpu-runner is now the only runner for all CPU and GPU targets, and the "cpu" name has been misleading for some time already. This commit renames it to mlir-runner.

[AutoBump] Merge with fixes of eb206e9 (Jan 24) (21)

[AutoBump] Merge with b4e81fd (Jan 24) (20)

[AutoBump] Merge with fixes of 8388040 (Jan 23) (19)

[AutoBump] Merge with 08195f3 (Jan 23) (18)

[AutoBump] Merge with fixes of 7e622b6 (Jan 22) (17)

[AutoBump] Merge with 3057d0f (Jan 22) (16)

[AutoBump] Merge with fixes of 7986e0c (Jan 22) (15)

jacquesguan and others added 30 commits January 23, 2025 10:30

[emitc] Fix the translation switchop with argument of expressionop (l…

3ef90f8

…lvm#123701) Previously, an `emitc.switch` whose argument is an `emitc.expression` did not emit that argument to C++. This patch fixes it.

[Clang] Implement CWG 2628 "Implicit deduction guides should propagat…

b46fcb9

…e constraints" (llvm#111143) Closes llvm#98592

[Signals] Exclude dladdr for AIX after llvm#123879

1c5d971

Widely supported but missing on AIX https://www.austingroupbugs.net/view.php?id=993

[LoongArch] Support sc.q instruction for 128bit cmpxchg operation (ll…

19834b4

…vm#116771) Two options for clang:
- `-mno-scq`: Disable the sc.q instruction.
- `-mscq`: Enable the sc.q instruction.

The default is `-mno-scq`.

[Clang] [NFC] Mark UnresolvedSetImpl's move operations as defaulted (…

0bcf34e

…llvm#97930)

[LoongArch] Summary llvm20 release notes

d80b814

[LoongArch] Summary clang20 release notes

3c7a878

[LoongArch] Update lld20 release notes

aa273fd

[gn build] Port de209fa

646f034

Specify triple for llc test

ea49d47

Temporarily disable test on Fuchsia

5d8390d

[clang][CodeComplete] Use HeuristicResolver to resolve DependentNameT…

ba17485

…ypes (llvm#123818) Fixes clangd/clangd#1249

[GISel] Add more FP opcodes to CSE (llvm#123949)

220004d

Resubmit; the previous PR had compilation issues.

Android no longer supports arm < 7 (llvm#123952)

2b67ece

Remove reference to android-mips (llvm#124021)

2a51a0d

[libfuzzer] Clarify -max_len behavior on bigger files (llvm#123095)

091741a

[clang][Tooling] Prefer <atomic> for atomic_* family in C++

4b0df28

HerrCai0907 and others added 24 commits January 24, 2025 19:29

[clang-tidy][NFC] simplify TimerGroup in ClangTidyProfiling (llvm…

46a08ce

…#123958) `TimerGroup` does not need to be a field of `ClangTidyProfiling`. We can construct it locally in the destructor.

[clang-tidy][NFC] improve performance misc-unused-using-decls (llvm#1…

8e6d6a5

…23454) Skip header files before registering AST matchers. This avoids matching a large number of AST nodes when linting a header file.

[libc][workflow] improve ci coverage with windows-2025 and arm ubuntu (…

acc13db

…llvm#123745) Add the following workflows:
- `fullbuild` on aarch64 ubuntu
- `overlay` on windows 2025
- `overlay` on aarch64 ubuntu

The `ccache` variant is used on `aarch64` due to hendrikmuhs/ccache-action#279

[gn] port 4018317

b4e81fd

[AutoBump] Merge with fixes of 729f958 (Jan 22)

3e6b7bf

[AutoBump] Merge with fixes of 7986e0c (Jan 22)

6b2a4b3

Disable invalid test

b92605a

[AutoBump] Merge with 3057d0f (Jan 22)

cccba22

[AutoBump] Merge with fixes of 7e622b6 (Jan 22)

a28ff29

[AutoBump] Merge with 08195f3 (Jan 23)

d21dbbb

[AutoBump] Merge with fixes of 8388040 (Jan 23)

a8bfe24

[AutoBump] Merge with b4e81fd (Jan 24)

9dfe3ec

[AutoBump] Merge with fixes of eb206e9 (Jan 24)

f106a87

Merge pull request #559 from Xilinx/bump_to_eb206e9e

c48c70f

[AutoBump] Merge with fixes of eb206e9 (Jan 24) (21)

Merge pull request #558 from Xilinx/bump_to_b4e81fd1

19ec244

[AutoBump] Merge with b4e81fd (Jan 24) (20)

Merge pull request #557 from Xilinx/bump_to_8388040f

9a3a74a

[AutoBump] Merge with fixes of 8388040 (Jan 23) (19)

Merge pull request #556 from Xilinx/bump_to_08195f31

91fab1b

[AutoBump] Merge with 08195f3 (Jan 23) (18)

Merge pull request #555 from Xilinx/bump_to_7e622b61

789c55b

[AutoBump] Merge with fixes of 7e622b6 (Jan 22) (17)

Base automatically changed from bump_to_bd56950b to bump_to_67b9d3ff June 26, 2025 08:12

jorickert added 2 commits June 26, 2025 10:13

Merge pull request #554 from Xilinx/bump_to_3057d0f1

da007dd

[AutoBump] Merge with 3057d0f (Jan 22) (16)

Merge pull request #553 from Xilinx/bump_to_7986e0ca

3c0cc53

[AutoBump] Merge with fixes of 7986e0c (Jan 22) (15)

Base automatically changed from bump_to_67b9d3ff to bump_to_f4943464 June 26, 2025 08:14

jorickert merged commit 6373ac4 into bump_to_f4943464 Jun 26, 2025
18 checks passed

jorickert deleted the bump_to_729f958c branch June 26, 2025 08:14
