forked from llvm/llvm-project
[AutoBump] Merge with fixes of 729f958c (Jan 22) (14) #552
Merged
…lvm#123701) Currently, an `emitc.switch` whose argument is an `emitc.expression` doesn't emit its argument to C++. This patch fixes that.
…e constraints" (llvm#111143) Closes llvm#98592
…ng it LLVM dialect (llvm#123840) With these changes, CUF atomic operations are handled as cudadevice intrinsics and are converted straight to the LLVM dialect with the `llvm.atomicrmw` operation. I am only submitting changes for `atomicadd` to gather feedback. If we are to proceed with these changes, I will add support for all other applicable atomic operations following this pattern.
Increases the minimum CAS size from 16 bits to 32 bits, for better SASS codegen. When atomics are emulated using atom.cas.b16, the generated SASS includes two nested emulation loops. When they are emulated using an atom.cas.b32 loop, the SASS has only a single emulation loop. Using a 32-bit CAS thus results in better codegen.
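As an illustration of what "emulating a narrow atomic with a 32-bit CAS loop" means, here is a minimal host-side C++ sketch. It is not the NVPTX lowering itself. The function name `fetchAdd16ViaCas32` and the `halfIndex` parameter are hypothetical, and `std::atomic<uint32_t>` stands in for the `atom.cas.b32` instruction.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical helper: atomically adds `delta` to one 16-bit half of a 32-bit
// word, using only a 32-bit compare-and-swap. `halfIndex` selects the low (0)
// or high (1) half. Returns the previous value of that half.
inline uint16_t fetchAdd16ViaCas32(std::atomic<uint32_t> &word,
                                   unsigned halfIndex, uint16_t delta) {
  assert(halfIndex < 2 && "a 32-bit word has only two 16-bit halves");
  const unsigned shift = halfIndex * 16;
  const uint32_t mask = 0xFFFFu << shift;
  uint32_t expected = word.load(std::memory_order_relaxed);
  for (;;) {
    const uint16_t oldHalf = static_cast<uint16_t>((expected & mask) >> shift);
    const uint16_t newHalf = static_cast<uint16_t>(oldHalf + delta);
    // Replace only the selected half; the other half is carried over as-is.
    const uint32_t desired =
        (expected & ~mask) | (static_cast<uint32_t>(newHalf) << shift);
    // On failure, `expected` is reloaded with the current word and we retry.
    if (word.compare_exchange_weak(expected, desired,
                                   std::memory_order_seq_cst,
                                   std::memory_order_relaxed))
      return oldHalf;
  }
}
```

The single retry loop here corresponds to the single emulation loop the commit message describes for the 32-bit case.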
Widely supported but missing on AIX https://www.austingroupbugs.net/view.php?id=993
When looking at the slowest lit tests, I'm seeing these four tests take two to eight minutes. Test coverage on Linux should be sufficient for this functionality, and it is not really useful on Windows anyway. This was observed when hacking on the new premerge in a Windows VM.
…vm#116771) Two options for clang:
- `-mno-scq`: Disable the sc.q instruction.
- `-mscq`: Enable the sc.q instruction.

The default is `-mno-scq`.
…lvm#123881) This extension adds eight 48 bit load store instructions. The current spec can be found at: https://github.com/quic/riscv-unified-db/releases/latest This patch adds assembler only support. --------- Co-authored-by: Harsh Chandel <[email protected]>
https://discourse.llvm.org/t/rfc-profile-guided-static-data-partitioning/83744 proposes to partition static data sections. This patch introduces a codegen pass. This patch produces jump table hotness in the in-memory states (machine jump table info and entries). Target-lowering and asm-printer consume the states and produce `.hot` section suffix. The follow up PR llvm#122215 implements such changes. --------- Co-authored-by: Ellis Hoag <[email protected]>
…#118656) This patch is an extension to llvm#115128. After profiling the LLVM test-suite, I see many loop nests deeper than `MaxLoopNestDepth`, which is 10. Exiting early for them saves compile time because it avoids computing DependenceInfo and CacheCost. Please see the 'bound-max-depth' branch on compile-time-tracker.
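The early-exit idea can be sketched in isolation as below. This is an illustrative stand-in, not the pass's actual code. `LoopNode`, `loopNestDepth`, `shouldAnalyzeNest`, and `makeChain` are hypothetical names. Only the bound `MaxLoopNestDepth = 10` comes from the commit message.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical, simplified loop tree: each loop owns its directly nested loops.
struct LoopNode {
  std::vector<LoopNode> subLoops;
};

// Depth of the deepest nest rooted at `loop` (a loop with no children is 1).
inline unsigned loopNestDepth(const LoopNode &loop) {
  unsigned deepestChild = 0;
  for (const LoopNode &sub : loop.subLoops)
    deepestChild = std::max(deepestChild, loopNestDepth(sub));
  return 1 + deepestChild;
}

constexpr unsigned MaxLoopNestDepth = 10; // bound quoted in the commit message

// Cheap check done before any expensive analysis is computed.
inline bool shouldAnalyzeNest(const LoopNode &outermost) {
  return loopNestDepth(outermost) <= MaxLoopNestDepth;
}

// Test helper: builds a perfectly nested chain of `depth` loops (depth >= 1).
inline LoopNode makeChain(unsigned depth) {
  LoopNode root;
  LoopNode *cur = &root;
  for (unsigned i = 1; i < depth; ++i) {
    cur->subLoops.emplace_back();
    cur = &cur->subLoops.back();
  }
  return root;
}
```

The point of the design is ordering: the depth check is linear in the size of the nest, so running it first avoids paying for dependence and cache-cost analysis on nests that would be rejected anyway.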
Fixes llvm#113191
Issue: [flang][OpenMP] Runtime segfault when an allocatable variable is used with copyin
Root cause: The value of the threadprivate variable is not being copied from the primary thread to the other threads within a parallel region. As a result, the program tries to access a null pointer inside the parallel region, which causes the segfault.
Fix: When allocatables are used with the copyin clause, we need to ensure that, on entry to any parallel region, each thread’s copy of a variable acquires the allocation status of the primary thread before the value of the primary thread's threadprivate variable is copied to the threadprivate variable of each other member of the team.
This handles the case where a `try_table`'s catch clause's destination has a return type, as in the case of catch with a concrete tag, catch_ref, and catch_all_ref. For example:
```wasm
block exnref
  try_table (catch_all_ref 0)
    ...
  end_try_table
end_block
... use exnref ...
```
This code is not valid because the block's body type is not exnref. So we add an `unreachable` after the `end_try_table` to make the code valid here:
```wasm
block exnref
  try_table (catch_all_ref 0)
    ...
  end_try_table
  unreachable      ;; Newly added
end_block
```
Because `unreachable` is a terminator, we also need to split the BB.

---

We need to handle the same thing for unwind mismatch handling. In the code below, we create a "trampoline BB" that will be the destination for the nested `try_table`~`end_try_table` added to fix an unwind mismatch:
```wasm
try_table (catch ... )
  block exnref
    ...
    try_table (catch_all_ref N)
      some code
    end_try_table
    ...
  end_block        ;; Trampoline BB
  throw_ref
end_try_table
```
While the `block` added for the trampoline BB has the return type `exnref`, its body, which contains the nested `try_table` and other code, wouldn't have the `exnref` return type. Most of the time this wasn't a problem because the block's body ended with something like `br` or `return`, but that may not always be the case, especially when there is a loop. So we add an `unreachable` to make the code valid here too:
```wasm
try_table (catch ... )
  block exnref
    ...
    try_table (catch_all_ref N)
      some code
    end_try_table
    ...
    unreachable    ;; Newly added
  end_block        ;; Trampoline BB
  throw_ref
end_try_table
```
In this case we just append the `unreachable` at the end of the layout predecessor BB. (This was tricky to do in the first (non-mismatch) case because there `end_try_table` and `end_block` were added at the beginning of an EH pad in `placeTryTableMarker`, and moving `end_try_table` and the new `unreachable` to the previous BB caused other problems.)
---

This adds many `unreachable`s to the output, but the tests check for `unreachable` in only a few places to see if this is working. The FileCheck lines in `exception.ll` and `cfg-stackify-eh.ll` are already heavily redacted to only leave important control-flow instructions, so I don't think it's worth adding `unreachable`s everywhere.
Resubmit; the previous PR had compilation issues.
…24041) `X86FrameLowering::emitSPUpdate()` assumes that 64-bit targets use a 64-bit stack pointer, but that's not true on x32. When checking the stack pointer size, we need to look at `Uses64BitFramePtr` rather than `Is64Bit`. This avoids generating invalid instructions like `add esp, rcx`. For impossibly-large stack frames (4 GiB or larger with a 32-bit stack pointer), we were also generating invalid instructions like `mov eax, 5000000000`. The inline stack probe code already had a check for that situation; I've moved the check into `emitSPUpdate()`, so any attempt to allocate a 4 GiB stack frame with a 32-bit stack pointer will now trap rather than adjusting ESP by the wrong amount. This also fixes the "can't have 32-bit 16GB stack frame" assertion, which used to be triggerable by user code but is now correct. To help catch situations like this in the future, I've added `-verify-machineinstrs` to the stack clash tests that generate large stack frames. This fixes the expensive-checks buildbot failure caused by llvm#113219.
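The decision described above can be summarized in a small standalone sketch. This is not the code in `X86FrameLowering`. The enum `SPUpdateKind` and the function `classifySPUpdate` are hypothetical names used only to show why the stack-pointer width, rather than the target's bitness, has to drive the choice.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical outcome of deciding how to adjust the stack pointer.
enum class SPUpdateKind {
  Use64BitSP, // adjust RSP with 64-bit arithmetic
  Use32BitSP, // adjust ESP with 32-bit arithmetic (32-bit targets and x32)
  Trap        // frame cannot be addressed with a 32-bit stack pointer
};

// `uses64BitFramePtr` plays the role of `Uses64BitFramePtr`: on x32 the target
// is 64-bit, yet this flag is false because the stack pointer is 32 bits wide.
inline SPUpdateKind classifySPUpdate(bool uses64BitFramePtr,
                                     uint64_t frameBytes) {
  if (uses64BitFramePtr)
    return SPUpdateKind::Use64BitSP;
  // 4 GiB or larger does not fit in a 32-bit immediate or register.
  if (frameBytes > std::numeric_limits<uint32_t>::max())
    return SPUpdateKind::Trap;
  return SPUpdateKind::Use32BitSP;
}
```

For example, under these assumptions a 5000000000-byte frame with a 32-bit stack pointer is classified as `Trap` instead of producing an immediate that cannot be encoded.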
…lvm#123916) ecb5ea6 tried to fix cases when LLD links what seems to be import library header objects from MSVC. However, the fix seems incorrect; the review at https://reviews.llvm.org/D133627 concluded that if this (treating this kind of symbol as a common symbol) is what link.exe does, it's fine. However, this is most probably not what link.exe does. The symbol mentioned in the commit message of ecb5ea6 would be a common symbol with a size of around 3 GB; this is not what might have been intended. That commit tried to avoid running into the error ".idata$4 should not refer to special section 0"; that issue is fixed for a similar style of section symbols in 4a4a8a1. Therefore, revert ecb5ea6 and extend the fix from 4a4a8a1 to also work for the section symbols in MSVC generated import libraries. The main detail about them is that for symbols of type IMAGE_SYM_CLASS_SECTION, the Value field is not an offset, but an optional set of flags, corresponding to the Characteristics of the section header (although it may be empty). This is a reland of a previous version of this commit, earlier merged in 9457418 / llvm#122811. The previous version failed tests when run with address sanitizer. The issue was that the synthesized coff_symbol_generic object actually will be used to access a full coff_symbol16 or coff_symbol32 struct, see DefinedCOFF::getCOFFSymbol. Therefore, we need to make a copy of the full size of either of them.
Now that we have a dedicated abstraction for string tables, switch the option parser library's string table over to it rather than using a raw `const char*`. Also try to use the `StringTable::Offset` type rather than a raw `unsigned` where we can to avoid accidental increments or other issues. This is based on review feedback for the initial switch of options to a string table. Happy to tweak or adjust if desired here.
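To show why a dedicated offset type is safer than a raw `unsigned`, here is a minimal sketch of the pattern. It is a simplified stand-in and not LLVM's `StringTable`. The class name `SimpleStringTable` and its members are hypothetical.

```cpp
#include <string_view>

// Hypothetical, simplified string table: NUL-separated strings in one buffer.
class SimpleStringTable {
public:
  // Strongly typed offset. It deliberately has no arithmetic or increment
  // operators, so `offset + 1` or `++offset` does not compile.
  class Offset {
    unsigned Value = 0;

  public:
    constexpr Offset() = default;
    constexpr explicit Offset(unsigned V) : Value(V) {}
    constexpr unsigned value() const { return Value; }
  };

  constexpr explicit SimpleStringTable(std::string_view Data) : Data(Data) {}

  // Returns the string starting at `O`, up to (not including) the next NUL.
  std::string_view operator[](Offset O) const {
    std::string_view Rest = Data.substr(O.value());
    return Rest.substr(0, Rest.find('\0'));
  }

private:
  std::string_view Data;
};
```

Because the constructor is `explicit`, a plain integer cannot be passed where an `Offset` is expected, which is the kind of accidental misuse the commit message wants to rule out.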
This is part of https://discourse.llvm.org/t/rfc-introduce-opasm-type-attr-interface-for-pretty-print-in-asmprinter/83792. OpAsmOpInterface controls the SSA Name/Block Name and Default Dialect Prefix. This PR adds the usage of them by existing examples in MLIR.
…lvm#123934) Once we get to SelectionDAG the IR should not be changing anymore, so we can use BatchAAResults rather than AAResults to cache AA queries. This should be an NFC change for targets that enable AA during codegen (such as AArch64), but it should also give a nice compile-time improvement in some cases. See: llvm#123787 (comment) Note: This follows Nikita's suggestion on llvm#123787.
…#123958) `TimerGroup` doesn't need to be a field of `ClangTidyProfiling`. We can construct it locally during destruction.
…23454) Skip header files before registering AST matchers. This avoids matching lots of AST nodes when linting a header file.
Add IDs for bit widths that cover multiple LLTs: B32, B64, etc. Add a "Predicate" wrapper class for bool predicate functions, used to write pretty rules. Predicates can be combined using &&, || and !. Add lowering for splitting and widening loads. Write rules for loads so that existing MIR tests from the old regbankselect do not change.
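A wrapper of this kind can be sketched generically as follows. This is an assumption-laden illustration, not the class added by the patch. The template parameter `T`, the use of `std::function`, and all names are hypothetical.

```cpp
#include <functional>
#include <utility>

// Hypothetical wrapper around a bool-returning function of one argument.
template <typename T> class Predicate {
  std::function<bool(const T &)> Fn;

public:
  explicit Predicate(std::function<bool(const T &)> F) : Fn(std::move(F)) {}

  // Evaluate the wrapped predicate.
  bool operator()(const T &V) const { return Fn(V); }

  // Combinators build a new predicate. The wrapped calls return plain bool,
  // so evaluation of the combined predicate still short-circuits.
  Predicate operator&&(const Predicate &RHS) const {
    Predicate LHS = *this;
    return Predicate([LHS, RHS](const T &V) { return LHS(V) && RHS(V); });
  }
  Predicate operator||(const Predicate &RHS) const {
    Predicate LHS = *this;
    return Predicate([LHS, RHS](const T &V) { return LHS(V) || RHS(V); });
  }
  Predicate operator!() const {
    Predicate Inner = *this;
    return Predicate([Inner](const T &V) { return !Inner(V); });
  }
};
```

With two predicates `isEven` and `isPositive` over `int`, a rule condition could then be written as `isEven && !isPositive` and evaluated later on each candidate value.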
…l coroutine clones (llvm#118628)

Summary: CoroCloner, by calling into CloneFunctionInto, does a lot of repeated work priming DIFinder and building a list of common module-level debug info metadata. For programs compiled with full debug info this can get very expensive. This diff builds the data once and shares it between all clones.

Anecdata for a sample cpp source file compiled with full debug info:

|                 | Baseline | IdentityMD set | Prebuilt CommonDI (cur.) |
|-----------------|----------|----------------|--------------------------|
| CoroSplitPass   | 306ms    | 221ms          | 68ms                     |
| CoroCloner      | 101ms    | 72ms           | 0.5ms                    |
| CollectCommonDI | -        | -              | 63ms                     |
| Speed up        | 1x       | 1.4x           | 4.5x                     |

Note that CollectCommonDebugInfo happens once *per coroutine* rather than per clone.

Test Plan:
ninja check-llvm-unit
ninja check-llvm
Compiled a sample internal source file, checked time trace output for scope timings.
…2866) Change the existing code for G_PHI to match what the LLVM-IR version does via PHINode::hasConstantOrUndefValue. This is not safe for regular PHI, since it may appear with an undef operand and getVRegDef can fail. Most notably, this improves the number of values that can be allocated to sgpr in AMDGPURegBankSelect. A common case here is phis that appear in structurize-cfg lowering for cycles with multiple exits: the undef incoming value comes from a block that reached the cycle exit condition; if the other incoming value is uniform, keep the phi uniform even though it joins values from a pair of blocks that are entered via a divergent conditional branch.
This is the behavior expected by DWARF. It also requires some fixups to algorithms which were storing the addresses of some objects (Blocks and Variables) relative to the beginning of the function. There are plenty of things that still don't work in this setup, but this change is sufficient for the expression evaluator to correctly recognize the entry point of a function in this case.
…llvm#123745) Add the following workflows:
- `fullbuild` on aarch64 ubuntu
- `overlay` on windows 2025
- `overlay` on aarch64 ubuntu

The `ccache` variant is used on `aarch64` due to hendrikmuhs/ccache-action#279
…ot (llvm#121463) In function handleMFLOSlot, we may get a variable LastInstInFunction with a value of true from function getNextMachineInstr and IInSlot may be null which would trigger an assert. So we need to skip this case. Fix llvm#118223.
With the removal of mlir-vulkan-runner (as part of llvm#73457) in e7e3c45, mlir-cpu-runner is now the only runner for all CPU and GPU targets, and the "cpu" name has been misleading for some time already. This commit renames it to mlir-runner.
[AutoBump] Merge with fixes of eb206e9 (Jan 24) (21)
[AutoBump] Merge with b4e81fd (Jan 24) (20)
[AutoBump] Merge with fixes of 8388040 (Jan 23) (19)
[AutoBump] Merge with 08195f3 (Jan 23) (18)
[AutoBump] Merge with fixes of 7e622b6 (Jan 22) (17)
No description provided.