Skip to content

[pull] main from llvm:main #643

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 4,617 commits into
base: main
Choose a base branch
from
Open

[pull] main from llvm:main #643

wants to merge 4,617 commits into from

Conversation

pull[bot]
Copy link

@pull pull bot commented Jun 2, 2025

See Commits and Changes for more details.


Created by pull[bot] (v2.0.0-alpha.1)

Can you help keep this open source service alive? 💖 Please sponsor : )

@pull pull bot added the ⤵️ pull label Jun 2, 2025
preames and others added 29 commits July 8, 2025 09:41
…36546)

This commit updates the Hexagon backend to handle vxi1 call operands
Without HVX enabled. It ensures compatibility for vector types of sizes
4, 8, 16, 32, 64, and 128 x i1 when HVX is not enabled.
Follow-up to #146307

Moved MCInst storage to MCSection, enabling trivial ~MCRelaxableFragment
and eliminating the need for a fragment walk in ~MCSection.

Updated MCRelaxableFragment::getInst to construct an MCInst on demand.
Modified MCAssembler::relaxInstruction's mayNeedRelaxation to accept
opcode and operands instead of an MCInst, avoiding redundant MCInst
creation. Note that MCObjectStreamer::emitInstructionImpl calls
mayNeedRelaxation before determining the target fragment for the MCInst.

Unfortunately, we also have to encode `MCInst::Flags` to support
the EVEX prefix, e.g. `{evex} xorw $foo, %ax`

There is a small decrease in max-rss (stage1-ReleaseLTO-g (link only))
with negligible instructions:u change.
https://llvm-compile-time-tracker.com/compare.php?from=0b533f2d9f0551aaffb13dcac8e0fd0a952185b5&to=f26b57f33bc7ccae749a57dfc841de7ce2acc2ef&stat=max-rss&linkStats=on

Next: Enable MCFragment to store fixed-size data (was MCDataFragment's job)
and optional Opcode/Operands data (was MCRelaxableFragment's job),
and delete MCDataFragment/MCRelaxableFragment.
This will allow re-encoding of Data+Relax+Data+Relax sequences as
Frag+Frag. The saving should outweigh the downside of larger
MCFragment.

Pull Request: #147229
One of them operates on values, the other on shadows. It is confusing
for both of them to have the same name but only different number of
parameters.
…ourceLocation` in diagnostics (#147084)

The `SourceLocation` of a `RootSignatureToken` is incorrectly set to be
the "offset" into the concatenated string that denotes the
rootsignature. This causes an issue when the `StringLiteral` is a
multi-line expansion macro, since the offset will not account for the
characters between `StringLiteral` tokens.

This pr resolves this by retaining the `SourceLocation` information that
is kept in `StringLiteral` and then converting the offset in the
concatenated string into the proper `SourceLocation` using the
`StringLiteral::getLocationOfByte` interface. To do so, we will need to
adjust the `RootSignatureToken` to only hold its offset into the root
signature string. Then when the parser will use the token, it will need
to compute its actual `SourceLocation`.

See linked issue for more context.

For example:

```
#define DemoRootSignature \
 "CBV(b0)," \
 "RootConstants(num32BitConstants = 3, b0, invalid)"
  expected caret location ---------------^
  actual caret location ------------^
```

The caret points 5 characters early because the current offset did not
account for the characters:
```
'"' ' ' '\' ' ' '"'
 1   2   3   4   5
```

- Updates `RootSignatureParser` to retain `SourceLocation` information
by retaining the `StringLiteral` and passing the underlying `StringRef`
to the `Lexer`
- Updates `RootSignatureLexer` so that the constructed tokens only
reflect an offset into the `StringRef`
- Updates `RootSignatureParser` to directly construct its used `Lexer`
so that the `StringLiteral` is directly tied with the string used in the
`RootSignatureLexer`
- Updates `RootSignatureParser` to use
`StringLiteral::getLocationOfByte` to get the actual token location for
diagnostics
- Updates `ParseHLSLRootSignatureTest` to construct a phony
`AST`/`StringLiteral` for the test cases
- Adds a test to `RootSignature-err.hlsl` showing that the
`SourceLocation` is correctly set for diagnostics in a multi-line macro
expansion

Resolves: #146967
ptr-annotation.ll was incorrectly applying a decoration to an unsuitable
target.

The patch changes the decoration to a valid one for the test.
…arget triple on AIX (#147488)

PR #145685 introduced constructor overload ambiguity in the Triple
class, causing `updateTripleOSVersion()` to construct Triple objects
with `unknown` instead of the configured target triple (e.g.,
`powerpc-ibm-aix7.3.0.0`). This results in Clang driver errors like
`error: unknown target triple 'unknown'`.

Used `Twine` constructor with braced initialization to bypass ambiguity.

---------

Co-authored-by: Tony Varghese <[email protected]>
Co-authored-by: Matt Arsenault <[email protected]>
As reported in #145917 and #147309, there are situation's where flang
may crash. This is because `nextIt` in
`RewriteOpenMPLoopConstruct` gets re-assigned when an iterator is erased
from the block. If this is missed, Flang may attempt to access a
location in memory that is not accessable and cause a compiler crash.

This adds protection where the crash can occur, and a test with a
reproducer that can trigger the crash.

Fixes #147309
…ce cast to getRefPtrIfDeclareTarget

The patch introduced changes to add address spaces to a wider array of MLIR/LLVM values, however,
it was missing an address space cast that exists in our downstream implementation that's required
for declare target to work correctly.
I forgot to include a release note in #143520, and it also ocurred to me
that while #143514 is technically a bugfix in LLVM/Support, I think we
should have one for it as well.
Host associated variables were not being handled properly.
For array references, get the fixed shape extents from the value
type instead, that works correctly in all cases.
…char *` (#147301)

Some of these are even global mutable state — probably not what was
intended!
```cpp
static const char *AnalyzerCheckNamePrefix = "clang-analyzer-";
```
…#147435)

If a `do concurrent` loop is offloaded then there should be no CUDA data
transfer in it. Update the semantic and lowering to take that into
account.

`AssignmentChecker` has to be put into a separate pass because the
checkers in `SemanticsVisitor` cannot have the same `Enter/Leave`
functions. The `DoForallChecker` already has `Eneter/Leave` functions
for the `DoConstruct`.
This allows us to change the number of blocks stored according to the
size of BatchClass.

Also change the name `TransferBatch` to `Batch` given that it's never
the unit of transferring blocks.
…2521)

By separating the Unwind table into a different file, this functionality
can be a part of the DWARF library with no dependency on MC, which makes
it usable in the MC layer.

This is a continuation of
[PR#14520](#142520).
Number of threads on z/OS are controlled at the system level and thus we eed to XFAIL this test.
This fixes a bug introduced by aa24029,
"[VPlan] Unroll VPReplicateRecipe by VF", which cloned a
VPReplicateRecipe without transferring the flags from the original.

That can cause incorrect nsw/nuw flags to be emitted on the new
instructions, which may result in miscompiles.

It turns out there were no test-cases in the repo which end up hitting
the situation where the recipe requires instruction clones to have
different flags from the underlying instruction. The existing tests
covered the flags being correct when the replacement instruction is a
vectorized version of the initial instruction, but not when it required
clones. A new test is added covering this.
#147354)

Re-land #146582 now that the Flang bugs have been fixed.

There is no way in Arm64 Windows to indicate that a given function has
used the Frame Pointer as a General Purpose Register, as such stack
walks will always assume that the frame chain is valid and will follow
whatever value has been saved for the Frame Pointer (even if it is
pointing to data, etc.).

This change makes the Frame Pointer always reserved when building for
Arm64 Windows to avoid this issue.

We will be updating the official Windows ABI documentation to reflect
this requirement, and I will provide a link once it's available.
When complete record support was initially added, the parsing support
was left incomplete. This change adds the necessary parsing.
Fixes #146973

When an object with alignment requirements is placed on the stack, this
causes a stack realignment which causes AArch64 to use x19 to refer to
objects on the stack as there may be a gap between local variables and
the Stack Pointer. This causes issues with the MSVC C++ exception
personality as the offset to the catch object recorded in the handler
table no longer matches the object being used in the catch block itself.

The fix for this is to place catch objects into the fixed object area.
…_truncf to amdgpu (#146372)

- add conversion from arith.scaling_extf to amdgpu.scaled_ext_packed
- add conversion from arith.scaling_truncf to amdgpu.packed_scaled_trunc
Michael137 and others added 30 commits July 9, 2025 22:16
…7701)

This just moves the test from `libcxx` to `generic`. There are currently
no `std::function` formatters for libstdc++ so I didn't add a test-case
for it.

Split out from #146740
Implemented wcslcpy and tests.

---------

Co-authored-by: Sriya Pratipati <[email protected]>
…#147764)

When we create a lambda, we would skip over declaration contexts
representing a require expression body, which would lead to wrong
lookup.

Note that I wasn't able to establish why the code
in `Sema::createLambdaClosureType` was there to begin with (it's not
exactly recent)

The changes to mangling only ensure the status quo is preserved and do
not attempt to address the known issues of
mangling lambdas in require clauses.

In particular the itanium mangling is consistent with Clang before this
patch but differs from GCC's.

Fixes #147650
When an llvm tool crashes (e.g. from a segmentation fault),
SignalHandler will re-raise the signal. The effect is that crash reports
now contain SignalHandler in the stack trace. The crash reports are
still useful, but the presence of SignalHandler can confuse tooling and
automation that deduplicate or analyze crash reports.

rdar://150464802
Since profile inference improves sample coverage, it should be turned on by default.
This patch bumps the runner version from v3.222.0 to v3.226.0 as
v3.222.0 is too old at this point to connect to Github. This is needed
for the new premerge system given we are directly using this container.
This did not impact the existing libc++ CI as the runner was contained
in a separate container image.
Copy pasted the ctype equivalents

---------

Co-authored-by: Sriya Pratipati <[email protected]>
This is so that we'll be able to use it in compiler-rt as well.
Dependencies on LLVM Support were removed from the header by restoring
code from the original SipHash implementation.

Reviewers: kuhar, dwblaikie, ahmedbougacha

Reviewed By: dwblaikie

Pull Request: #134197
This PR adds a new transformation that turns sequences of `vector.to_elements` and `vector.from_elements` into a binary tree of `vector.shuffle` operations.

(Related RFC:
https://discourse.llvm.org/t/rfc-adding-vector-to-elements-op-to-the-vector-dialect/86779).

Example:

```
  %0:4 = vector.to_elements %a : vector<4xf32>
  %1:4 = vector.to_elements %b : vector<4xf32>
  %2:4 = vector.to_elements %c : vector<4xf32>
  %3 = vector.from_elements %0#0, %0#1, %0#2, %0#3,
                            %1#0, %1#1, %1#2, %1#3,
                            %2#0, %2#1, %2#2, %2#3 : vector<12xf32>

==>

  %0 = vector.shuffle %a, %b [0, 1, 2, 3, 4, 5, 6, 7] : vector<4xf32>, vector<4xf32>
  %1 = vector.shuffle %c, %c [0, 1, 2, 3, -1, -1, -1, -1] : vector<4xf32>, vector<4xf32>
  %2 = vector.shuffle %0, %1 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11] : vector<8xf32>, vector<8xf32>
```

The algorithm leverages the structured extraction/insertion information
of `vector.to_elements` and `vector.from_elements` operations and builds
a set of intervals to determine the vector length that should be used at
each level of the tree to combine the level inputs in pairs.

There are a few improvements that can be implemented in the future, such
as shuffle mask compression to avoid unnecessarily large vector lengths
with poison values, but I decided to keep things "simpler" and spend
more time documenting the different steps of the algorithm so that
people can follow along.
…placed with std::numeric_limits (#147623)

This PR addresses instances of compiler warning C4146 that can be
replaced with std::numeric_limits. Specifically, these are cases where a
literal such as '-1ULL' was used to assign a value to a uint64_t
variable. The intent is much cleaner if we use the appropriate
std::numeric_limits value<Type>::max() for these cases.


Addresses #147439
…s. (#147830)

These instructions operate on bytes so we need to round the demanded
bits up to the nearest byte which we aren't doing. I think we forgot to
update this when we changed from hasAllWUsers to hasNBitUsers.

We don't have any test case for these instruction so remove them until
we can put together a test.
The emulated PAC runtime functions emulate the ARMv8.3a pointer
authentication instructions and are intended for use in heterogeneous
testing environments. For more information, see the associated RFC:
https://discourse.llvm.org/t/rfc-emulated-pac/85557

Reviewers: llvm-beanz, petrhosek

Pull Request: #133530
This patch fixes:

  mlir/lib/Dialect/Vector/Transforms/LowerVectorToFromElementsToShuffleTree.cpp:42:20:
  error: unused variable 'kIndScale' [-Werror,-Wunused-const-variable]
* Introduce an error code for illegal_line_offset in sampleprof_error
namespace, and use it for line offset parsing error.
* Add `const` for `LineLocation::serialize`.
* Use structured binding, make_first/second_range in loops.

I'm working on a [sample-profile format
change](https://github.com/llvm/llvm-project/compare/users/mingmingl-llvm/samplefdo-profile-format)
to extend SampleFDO profile with vtable profiles. And this change splits
the non-functional changes.
…ing operands (#147583)

Added emission of the 2-element reduction instead of 2 extracts + scalar
op, when trying to vectorize operands of the instruction, if it is more
profitable.
implemented wcslcat and tests.

---------

Co-authored-by: Sriya Pratipati <[email protected]>
These scripts belong in the `mlgo-utils` directory when directly used
with python3. But since they are also used to package with pip, symlink
the entrypoint scripts to mlgo-utils directory. Adjust the bazel paths
to account for this as well. This loosely follows the same structure as lit.

Verified that I was also able to build the package successfully and use
the script.
…m built-ins into clc (#144333)

Changes in this PR:
* Declare most of workitem functions in clc and opencl folders.
* Call clc workitem function in corresponding OpenCL workitem function.
* Move ptx-nvidiacl workitem built-in implementations into clc.
* Move a few amdgcn workitem built-in implementations into clc.
* Include only needed headers in OpenCL workitem functions.
* Implement get_local_linear_id, get_max_sub_group_size,
get_num_sub_groups,
get_sub_group_id, get_sub_group_local_id, get_sub_group_size for
ptx-nvidiacl.

llvm-diff shows this PR adds a few new symbols to nvptx64--nvidiacl.bc.
llvm-diff shows no change to amdgcn--amdhsa.bc, nvptx--.bc and
nvptx64--.bc.
To prepare for other platforms, such as 64-bit AIX, that have a non-zero
mmap beginning address.

---------

Co-authored-by: David Justo <[email protected]>
The current instrumentation has more or and element extraction than a
coal mine:

```
[[TMP10:%.*]] = extractelement <16 x i32> [[TMP9]], i64 0
[[TMP11:%.*]] = and i32 [[TMP10]], 15
[[TMP43:%.*]] = or i32 [[TMP10]], [[TMP11]]
[[TMP12:%.*]] = extractelement <16 x i32> [[TMP9]], i64 1
[[TMP13:%.*]] = and i32 [[TMP12]], 15
[[TMP44:%.*]] = or i32 [[TMP12]], [[TMP13]]
    ...
[[TMP40:%.*]] = extractelement <16 x i32> [[TMP9]], i64 15
[[TMP41:%.*]] = and i32 [[TMP40]], 15
[[TMP57:%.*]] = or i32 [[TMP40]], [[TMP41]]
[[_MSCMP:%.*]] = icmp ne i32 [[TMP57]], 0
br i1 [[_MSCMP]], label [[TMP102:%.*]], label [[TMP103:%.*]], !prof [[PROF1]]
```

Simplify it to:

```
[[TMP10:%.*]] = trunc <16 x i32> [[T]] to <16 x i4>
[[TMP12:%.*]] = bitcast <16 x i4> [[TMP10]] to i64
[[_MSCMP:%.*]] = icmp ne i64 [[TMP12]], 0
br i1 [[_MSCMP]], label %[[BB13:.*]], label %[[BB14:.*]], !prof [[PROF1]]
```
…147668)

CWG papers requiring library support are also listed.
Attempt to fix these build failures:
https://lab.llvm.org/buildbot/#/builders/107/builds/12601

The suspected cause is that #133530 caused us to start
passing -std:c11 to MSVC, which activated this code path
that uses _Complex, which MSVC does not support. See:
https://learn.microsoft.com/en-us/cpp/c-runtime-library/complex-math-support

Fix it by also checking _MSC_VER.
…s in C++23 (#145164)

C++23 mandates that temporaries used in range-based for loops are
lifetime-extended
to cover the full loop. This patch adds a check for loop variables and
compiler-
generated `__range` bindings to apply the correct extension.

Includes test cases based on examples from CWG900/P2644R1.

Fixes #109793
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment