Skip to content

[AutoBump] Merge with fixes of 977d744b (Jan 20) (10) (May need downstream changes) #548

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 102 commits into from
Jun 26, 2025

Conversation

jorickert
Copy link

No description provided.

kchibisov and others added 30 commits January 20, 2025 14:48
…3036)

This is a follow up to 68a3908 (func: Set default dialect to
'emitc'), but for other instructions with blocks to make it look
consistent.
)

In https://reviews.llvm.org/D136765 / https://reviews.llvm.org/D144155,
the asan annotations for `std::vector` were modified to unpoison freed
backing memory on destruction, instead of leaving it poisoned. However,
calling `__clear()` instead of `clear()` skips informing the asan runtime
of this decrease in the accessible container size, which breaks the
invariant that the value of `old_mid` should match the value of `new_mid`
from the previous call to `__sanitizer_annotate_contiguous_container`, which
can trip the sanity checks for the partial poison between [d1, d2) and the
container redzone between [d2, c), if enabled. To fix this, ensure that
`clear()` is called instead, as is already done by `__vdeallocate()`.
Also remove `__clear()`, since it is no longer called.
The function getPartialReductionCost is already quite large and
is likely to grow in size as we add support for more cases in
future. Therefore, I think it's best to move this into the cpp
file.
llvm::FixedPoint is not trivially copyable.
…lvm#123611)

Summary:
We used this globally scoped `ext_no_call_asm` as a sort of hack around
the compiler that allowed the attributor to optimize out inline assembly
calls to PTX instructions. Quite some time ago I got rid of every inline
assembly call and replaced it with a builitin, so this can just be
deleted.

Furthermore, I use the `[[omp::assume]]` attribute directly for the
aligned barrier usage. This prints an unknown assumption warning (even
though it isn't) so I'm just silencing that for now until I fix it
later.

---------

Co-authored-by: Michael Kruse <[email protected]>
After a30e50f, AMDGPUAAResult is being called in more situations where
BasicAA isn't sure. This exposed some regressions where NoAlias is being
incorrectly returned for two identical pointers.

The fix is to check the underlying objects for equality before returning
NoAlias.
… option private-headers (llvm#121226)

[llvm-objdump] Print out xcoff load section of xcoff object file with
option private-headers
…#123076)

1. Remove `%c0 = arith.constant 0 : index` from testt functions. This
   extra Op is not needed (the index can be passed as an argument), so
   this is just noise.
2. Replaced `%cst_0` with `%pad` to communicate what the underlying SSA
   value is intended for.
3. Unified some comments.
Summary:
This is spelled `ompx_aligned_barrier` when used directly, but wasn't
included in the list of known assumptions. Fix that so now th test
works.
)

For the following variable

DenseMap<const Instruction *, std::pair<PartialReductionChain,
unsigned>>
  ScaledReductionExitInstrs;

we never actually need the PartialReductionChain when using the map.
I've cleaned this up so that this now becomes

  DenseMap<const Instruction *, unsigned> ScaledReductionMap;
The indices of SGPR register pairs need to be 2-aligned and SGPR
quadruplets need to be 4-aligned. With this patch, we report an error
when inline asm register constraints specify a misaligned register
index, instead of silently dropping the specified index.

Fixes llvm#123208

---------

Co-authored-by: Matt Arsenault <[email protected]>
…es (llvm#121079)

This is a follow-up for llvm#119110
and a fix for llvm#118450

RemoveDeadValues used to delete Values and analyzing the IR at the same
time, because of that, `isMemoryEffectFree` got invalid IR with
half-deleted linalg.generic operation. This PR separates analysis and
cleanup to prevent such situation.

Thank you!

---------

Co-authored-by: Renat Idrisov <[email protected]>
Co-authored-by: Andrzej Warzyński <[email protected]>
…us. (llvm#122807)

For similar reasons for fixed-width being prefered to scalable for
Neoverse V2, this patch enables the UseFixedOverScalableIfEqualCost
feature when using -mcpu=cortex-x2, x3, x4 and x925 that are similar to
Neoverse V2.
…_mul(x, C0)) (llvm#123468)

This PR introduces the following transformations:

- If C0 is not 0:  
umax(nuw_shl(x, C0), x + 1) -> x == 0 ? 1 : nuw_shl(x, C0)  
- If C0 is not 0 or 1:  
umax(nuw_mul(x, C0), x + 1) -> x == 0 ? 1 : nuw_mul(x, C0)  

Fixes llvm#122388.
Alive2 proof: https://alive2.llvm.org/ce/z/rkp_8U
…m#123617)

In accordance with llvm#123569

In order to keep the patch at reasonable size, this PR only covers for
the llvm subproject, unittests excluded.
…auses (llvm#121356)

No functional change.

(Also, tried to filter out all ALLOCATOR modifiers, but that makes some
other tests fail).
…tom flags (llvm#123577)

The test was failing in the case where a `multilib.yaml` file was
present in the installation. This is because the presence of a multilib
YAML file leads to the diagnosing of validity of the multilib custom
flags.

This patch fixes the test by creating a new YAML file with multilib
custom flags to be used by the test.
All targets build `__clc_mad` -- even SPIR-V targets -- since it
compiles to the optimal `llvm.fmuladd` intrinsic. There is no change to
the bytecode generated for non-SPIR-V targets.

The `mix` builtin, which is implemented as a wrapper around `mad`, is
left as an OpenCL-layer wrapper of `__clc_mad`. I don't know if it's
worth having a specific CLC version of `mix`.

The changes to the other CLC files/functions are moving uses of `mad` to
`__clc_mad`, and reformatting. There is an additional instance of
`trunc` becoming `__clc_trunc`, which was missed before.
…m#123629)

These instructions return a 64-bit result and a 1-bit carry, unlike
smul_lohi and umul_lohi which return a pair of 32-bit results.

This does not appear to make any difference in practice because the DAG
types are not used for anything before these nodes are converted to
MachineInstrs.
…123619)

But keep evaluating. This is what the current interpreter does as well.
Failed to notice them when landing that patch - apologies!
In gcc-15, explicit includes of `<cstdint>` are required when fixed-size
integers are used. In this file, this include only happened as a side
effect of including SmallVector.h

Although llvm compiles fine, the root-project would benefit from
explicitly including it here, so we can backport the patch.

Maybe interesting for @hahnjo and @vgvassilev
)

Currently we're using quite different internal names for the
`std::invoke` family of type traits. This adds a layer around the
current implementation to make it easier to understand when it is used
and makes it easier to define multiple implementations of it.
Correct CSE in SelectionDAG can make DAG combining more effective and
reduces the size of the DAG and thus should improve compile time.
ilogb libcall was not being constant folded correctly. This patch adds 
ilogb case in isMathLibCallNoop with correct error condition.

Fixes llvm#101873
Before this change InstrSet in SPIRVEmitIntrinsics was uninitialized
before running runOnFunction. This change adds a new function
getPreferredInstructionSet in SPIRVSubtarget.
…m#123477)

Use `mlir_target_link_libraries()` to link dependencies of libraries
that are not included in libMLIR, to ensure that they link to the dylib
when they are used in Flang. Otherwise, they implicitly pull in all
their static dependencies, effectively causing Flang binaries to
simultaneously link to the dylib and to static libraries, which is never
a good idea.

I have only covered the libraries that are used by Flang. If you wish, I
can extend this approach to all non-libMLIR libraries in MLIR, making
MLIR itself also link to the dylib consistently.
jhuber6 and others added 23 commits January 20, 2025 21:58
…122786)

Summary:
Right now we just default to device for each type, and mix an ad-hoc
scope with the one used by the compiler's builtins. Unify this can make
each version take the scope optionally.

For @ronlieb, this will remove the need for `add_system` in the fork as
well as the extra `cas` with system scope, just pass `system`.
It appears that omp_lib is not correctly (or maybe not at all?) found
from the build directory. This made a few buildbots break after
[PR#121356](llvm#121356) landed.
This is a workaround to unblock the buildbots.

https://lab.llvm.org/staging/#/builders/130/builds/12654
https://lab.llvm.org/buildbot/#/builders/140/builds/15102
https://lab.llvm.org/staging/#/builders/105/builds/13855
)

In the current state of the code, the transform computes entries for the
dependency matrix until `MaxMemInstrCount` which is 100. After 99th
entry, it terminates and thus overall wastes compile-time.

It would be nice if we can compute total number of entries upfront and
early exit if the number of entries > 100. However, computing the number
of entries is not always possible as it depends on two factors:
1. Number of load-store pairs in a loop.
2. Number of common loop levels for each of the pair.

This patch constrains the whole computation on the number of loads and
stores instructions in the loop.

In another approach, I experimented with computing 1 and constraining
the number of pairs, but that did not lead to any additional benefit in
terms of compile time. However, when other issues are fixed, I can
revisit this approach.
)

This is preparation for extending ReachingDefAnalysis to stack slots. We
should use `Register`, not `MCRegister` for something that can be a
physical register or a stack slot.
…CallOperatorInstantiationRAII (llvm#123687)

Now that the RAII object has a dedicate logic for handling nested
lambdas, where the inner lambda could reference any
captures/variables/parameters from the outer lambda, we can shift the
responsibility for managing lambdas away from SetupConstraintScope().

I think this also makes the structure clearer.

Fixes llvm#123441
…form (llvm#123575)

Enable ELFNixPlatform support for loongarch64. These are few simple
changes, but it allows us to use the orc runtime in ELF/LoongArch64
backend.

This change adds test cases targeting the LoongArch64 Linux platform to
the ORC runtime integration test suite. Since jitlink for loongarch64 is
ready for general use, and ELF-based platforms support defining multiple
static initializer table sections with differing priorities, some
relevant test cases in compiler-rt for ELFNixPlatform support can be
enabled.
…3465)

Turn free-standing `MemRefType`-related helper functions in
`BuiltinTypes.h` into member functions.
…FFLE` (llvm#123555)

This PR fixes operand order of `ILVOD.df` when lowering
`VECTOR_SHUFFLE`, the result was `<y[1], x[1]>` while it should be
`<x[1], y[1]>`.

* This PR is split from llvm#123040.
…20912)

On Windows, imported symbols must be searched with '__imp_' prefix.
Support imported global variables and imported functions.
PR llvm#122344 adds intrinsics for Bulk Async Copy
(non-tensor variants) using TMA. This patch
adds the corresponding NVVM Dialect Ops.

lit tests are added to verify the lowering to all
variants of the intrinsics.

Signed-off-by: Durgadoss R <[email protected]>
…123344)

Commit 1eed469 added logic to
reassociate a (add (add x y) -c) operand to a CSEL instruction with a
comparison involving x and c (or a similar constant) in order to obtain
a common (SUBS x c) instruction.
    
This commit extends this logic to non-constants. In this way, we also
reassociate a (sub (add x y) z) operand of a CSEL instruction to
(add (sub x z) y) if the CSEL compares x and z, for example.
    
Alive proof: https://alive2.llvm.org/ce/z/SEVpR
…vm#120363)

This started out as trying to combine bf16 fpround to BFCVT2
instructions, but ended up removing the aarch64.neon.nfcvt intrinsics in
favour of generating fpround instructions directly. This simplifies the
patterns and can lead to other optimizations. The BFCVT2 instruction is
adjusted to makes sure the types are valid, and a bfcvt2 is now
generated in more place. The old intrinsics are auto-upgraded to fptrunc
instructions too.
This will be sent by Arm's Guarded Control Stack extension when an
invalid return is executed.

The signal does have an address we could show, but it's the PC at which
the fault occured. The debugger has plenty of ways to show you that
already, so I've left it out.

```
(lldb) c
Process 460 resuming
Process 460 stopped
* thread #1, name = 'test', stop reason = signal SIGSEGV: control protection fault
    frame #0: 0x0000000000400784 test`main at main.c:57:1
   54  	  afunc();
   55  	  printf("return from main\n");
   56  	  return 0;
-> 57  	}
(lldb) dis
<...>
->  0x400784 <+100>: ret
```

The new test case generates the signal by corrupting the link register
then attempting to return. This will work whether we manually enable GCS
or the C library does it for us.

(in the former case you could just return from main and it would fault)
…rray is indexed by const evaluatable expressions (llvm#119340)"" (llvm#123713)

This reverts commit 7dd34ba.

Fixed the assertion violation reported by
7dd34ba

Co-authored-by: MalavikaSamak <[email protected]>
@jorickert jorickert changed the title [AutoBump] Merge with fixes of 977d744b (Jan 20) (10) [AutoBump] Merge with fixes of 977d744b (Jan 20) (10) (May need downstream changes) May 20, 2025
Base automatically changed from bump_to_57466db7 to bump_to_d70f54f2 June 26, 2025 08:12
Base automatically changed from bump_to_d70f54f2 to bump_to_f4943464 June 26, 2025 08:12
[AutoBump] Merge with 2a8c12b (Jan 21) (11)
@jorickert jorickert merged commit 815db3c into bump_to_f4943464 Jun 26, 2025
24 checks passed
@jorickert jorickert deleted the bump_to_977d744b branch June 26, 2025 08:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.