[pull] main from llvm:main #643

pull · 2025-06-02T20:46:13Z

See Commits and Changes for more details.

Created by pull[bot] (v2.0.0-alpha.1)

Can you help keep this open source service alive? 💖 Please sponsor : )

…36546) This commit updates the Hexagon backend to handle vxi1 call operands without HVX enabled. It ensures compatibility for vector types of sizes 4, 8, 16, 32, 64, and 128 x i1 when HVX is not enabled.

Follow-up to #146307.

Moved MCInst storage to MCSection, enabling a trivial ~MCRelaxableFragment and eliminating the need for a fragment walk in ~MCSection. Updated MCRelaxableFragment::getInst to construct an MCInst on demand. Modified MCAssembler::relaxInstruction's mayNeedRelaxation to accept an opcode and operands instead of an MCInst, avoiding redundant MCInst creation. Note that MCObjectStreamer::emitInstructionImpl calls mayNeedRelaxation before determining the target fragment for the MCInst.

Unfortunately, we also have to encode `MCInst::Flags` to support the EVEX prefix, e.g. `{evex} xorw $foo, %ax`.

There is a small decrease in max-rss (stage1-ReleaseLTO-g (link only)) with a negligible instructions:u change. https://llvm-compile-time-tracker.com/compare.php?from=0b533f2d9f0551aaffb13dcac8e0fd0a952185b5&to=f26b57f33bc7ccae749a57dfc841de7ce2acc2ef&stat=max-rss&linkStats=on

Next: Enable MCFragment to store fixed-size data (was MCDataFragment's job) and optional Opcode/Operands data (was MCRelaxableFragment's job), and delete MCDataFragment/MCRelaxableFragment. This will allow re-encoding of Data+Relax+Data+Relax sequences as Frag+Frag. The saving should outweigh the downside of a larger MCFragment. Pull Request: #147229
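
A minimal sketch of the storage layout described in this commit, assuming simplified stand-in types: `Operand`, `Inst`, `Section`, `RelaxableFragment`, and `addRelaxable` are hypothetical and are not the real `MCOperand`/`MCInst`/`MCSection`/`MCRelaxableFragment` API. The section owns one flat operand array; each fragment records only its opcode, flags, and a slice of that array, and rebuilds an instruction on demand.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>
#include <vector>

// Hypothetical stand-ins for MCOperand / MCInst (not the real MC classes).
struct Operand {
  std::int64_t imm;
};
struct Inst {
  unsigned opcode = 0;
  unsigned flags = 0;  // e.g. an encoding hint such as {evex}
  std::vector<Operand> operands;
};

// The section owns the operand storage for all of its relaxable fragments.
struct Section {
  std::vector<Operand> operandStorage;
};

// The fragment only records where its operands live, so it has no
// heap-owning members and is trivially destructible.
struct RelaxableFragment {
  unsigned opcode;
  unsigned flags;
  std::uint32_t operandBegin;
  std::uint32_t operandCount;

  // Rebuild an Inst on demand from the section-owned storage.
  Inst getInst(const Section &sec) const {
    Inst inst;
    inst.opcode = opcode;
    inst.flags = flags;
    inst.operands.assign(sec.operandStorage.begin() + operandBegin,
                         sec.operandStorage.begin() + operandBegin + operandCount);
    return inst;
  }
};

// Append an instruction's operands to the section and describe the slice.
RelaxableFragment addRelaxable(Section &sec, const Inst &inst) {
  auto begin = static_cast<std::uint32_t>(sec.operandStorage.size());
  sec.operandStorage.insert(sec.operandStorage.end(), inst.operands.begin(),
                            inst.operands.end());
  return {inst.opcode, inst.flags, begin,
          static_cast<std::uint32_t>(inst.operands.size())};
}
```

Because the fragment in this sketch holds no owning members, destroying it needs no work, which mirrors why the section no longer has to walk its fragments at teardown.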

Fix asserting in the error case.

One of them operates on values, the other on shadows. It is confusing for them to have the same name and differ only in the number of parameters.

…ourceLocation` in diagnostics (#147084)

The `SourceLocation` of a `RootSignatureToken` is incorrectly set to be the "offset" into the concatenated string that denotes the root signature. This causes an issue when the `StringLiteral` is a multi-line expansion macro, since the offset will not account for the characters between `StringLiteral` tokens.

This PR resolves this by retaining the `SourceLocation` information that is kept in `StringLiteral` and then converting the offset in the concatenated string into the proper `SourceLocation` using the `StringLiteral::getLocationOfByte` interface. To do so, we need to adjust the `RootSignatureToken` to only hold its offset into the root signature string. Then, when the parser uses the token, it computes the token's actual `SourceLocation`. See linked issue for more context.

For example:

```
#define DemoRootSignature \
  "CBV(b0)," \
  "RootConstants(num32BitConstants = 3, b0, invalid)"
  expected caret location ---------------^
  actual caret location ------------^
```

The caret points 5 characters early because the current offset did not account for the characters:

```
'"' ' ' '\' ' ' '"'
 1   2   3   4   5
```

- Updates `RootSignatureParser` to retain `SourceLocation` information by retaining the `StringLiteral` and passing the underlying `StringRef` to the `Lexer`
- Updates `RootSignatureLexer` so that the constructed tokens only reflect an offset into the `StringRef`
- Updates `RootSignatureParser` to directly construct its used `Lexer` so that the `StringLiteral` is directly tied with the string used in the `RootSignatureLexer`
- Updates `RootSignatureParser` to use `StringLiteral::getLocationOfByte` to get the actual token location for diagnostics
- Updates `ParseHLSLRootSignatureTest` to construct a phony `AST`/`StringLiteral` for the test cases
- Adds a test to `RootSignature-err.hlsl` showing that the `SourceLocation` is correctly set for diagnostics in a multi-line macro expansion

Resolves: #146967
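
The core idea, mapping an offset in the concatenated string back to the literal piece it came from, can be sketched in a simplified model. `Piece`, `SourcePos`, and `locationOfByte` are hypothetical names; Clang's real `StringLiteral::getLocationOfByte` also deals with escape sequences, macro expansion locations, and encodings, which this sketch ignores by assuming each content byte occupies exactly one source column.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model of one string-literal token: where its first content
// character sits in the source, and how many content characters it has.
struct Piece {
  unsigned line;
  unsigned column;  // column of the first character after the opening quote
  std::size_t length;
};

struct SourcePos {
  unsigned line;
  unsigned column;
};

// Map a byte offset into the concatenated string back to a source position
// by walking the pieces and subtracting each piece's length until the offset
// falls inside one. Assumes no escapes (one column per content byte).
SourcePos locationOfByte(const std::vector<Piece> &pieces, std::size_t offset) {
  for (const Piece &p : pieces) {
    if (offset < p.length)
      return {p.line, p.column + static_cast<unsigned>(offset)};
    offset -= p.length;
  }
  assert(false && "offset past end of string");
  return {0, 0};
}
```

With the two pieces from the macro above, `"CBV(b0),"` (8 bytes) and `"RootConstants(num32BitConstants = 3, b0, invalid)"` (49 bytes), offset 8 lands at the start of the second piece on the next line, instead of being counted as if the characters between the literals did not exist.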

ptr-annotation.ll was incorrectly applying a decoration to an unsuitable target. The patch changes the decoration to a valid one for the test.

…arget triple on AIX (#147488) PR #145685 introduced constructor overload ambiguity in the Triple class, causing `updateTripleOSVersion()` to construct Triple objects with `unknown` instead of the configured target triple (e.g., `powerpc-ibm-aix7.3.0.0`). This results in Clang driver errors like `error: unknown target triple 'unknown'`. Used `Twine` constructor with braced initialization to bypass ambiguity. --------- Co-authored-by: Tony Varghese <[email protected]> Co-authored-by: Matt Arsenault <[email protected]>

As reported in #145917 and #147309, there are situations where Flang may crash. This is because `nextIt` in `RewriteOpenMPLoopConstruct` gets re-assigned when an iterator is erased from the block. If this is missed, Flang may attempt to access a location in memory that is not accessible and cause a compiler crash. This adds protection where the crash can occur, and a test with a reproducer that can trigger the crash. Fixes #147309
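
The underlying hazard is the usual one of holding an iterator across an erase. Below is a minimal C++ sketch over `std::list`, not Flang's MLIR code: `foldTrailers` is a hypothetical pass that absorbs trailing "child" items (modeled here as negative values) into the preceding item. Because it caches `nextIt`, it must re-assign `nextIt` from the value `erase()` returns instead of continuing with the erased iterator.

```cpp
#include <cassert>
#include <iterator>
#include <list>

// Hypothetical pass: fold every run of negative "trailer" items into the item
// before it. erase() invalidates the erased iterator, so the cached nextIt is
// re-assigned from erase()'s return value (the element after the erased one).
void foldTrailers(std::list<int> &items) {
  for (auto it = items.begin(); it != items.end();) {
    auto nextIt = std::next(it);
    while (nextIt != items.end() && *nextIt < 0) {
      *it -= *nextIt;                // absorb the trailer's magnitude
      nextIt = items.erase(nextIt);  // never reuse the erased iterator
    }
    it = nextIt;
  }
}
```

For example, `{1, -2, 5, -1, -1, 3}` becomes `{3, 7, 3}`. Writing `items.erase(nextIt); ++nextIt;` instead would increment an invalidated iterator, which is undefined behavior of the same kind as the crash described above.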

…nds (#146078) Closes #142961

…ce cast to getRefPtrIfDeclareTarget The patch introduced changes to add address spaces to a wider array of MLIR/LLVM values, however, it was missing an address space cast that exists in our downstream implementation that's required for declare target to work correctly.

I forgot to include a release note in #143520, and it also occurred to me that while #143514 is technically a bugfix in LLVM/Support, I think we should have one for it as well.

Host associated variables were not being handled properly. For array references, get the fixed shape extents from the value type instead; this works correctly in all cases.

…char *` (#147301)

Some of these are even global mutable state, which is probably not what was intended!

```cpp
static const char *AnalyzerCheckNamePrefix = "clang-analyzer-";
```
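
Why `static const char *` counts as mutable state: `const` applies to the characters, not to the pointer, so the global itself can still be reassigned. A sketch using `std::string_view` as a standard-library stand-in for `llvm::StringLiteral` (the names `MutablePrefix`, `ConstPrefix`, and `retarget` are illustrative):

```cpp
#include <cassert>
#include <string_view>

// const char *: the characters are const, but the pointer is a mutable global.
static const char *MutablePrefix = "clang-analyzer-";

// constexpr: the object itself is a compile-time constant, and its length is
// known without calling strlen. (std::string_view stands in for
// llvm::StringLiteral here.)
static constexpr std::string_view ConstPrefix = "clang-analyzer-";

// Compiles without complaint and silently changes the "constant" prefix.
void retarget() { MutablePrefix = "oops-"; }

// ConstPrefix = "oops-";  // would not compile: ConstPrefix is const
```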

…#147435) If a `do concurrent` loop is offloaded, then there should be no CUDA data transfer in it. Update the semantics and lowering to take that into account. `AssignmentChecker` has to be put into a separate pass because the checkers in `SemanticsVisitor` cannot have the same `Enter/Leave` functions. The `DoForallChecker` already has `Enter/Leave` functions for the `DoConstruct`.

This allows us to change the number of blocks stored according to the size of BatchClass. Also change the name `TransferBatch` to `Batch` given that it's never the unit of transferring blocks.

…2521) By separating the Unwind table into a different file, this functionality can be a part of the DWARF library with no dependency on MC, which makes it usable in the MC layer. This is a continuation of [PR#142520](#142520).

The number of threads on z/OS is controlled at the system level, and thus we need to XFAIL this test.

This fixes a bug introduced by aa24029, "[VPlan] Unroll VPReplicateRecipe by VF", which cloned a VPReplicateRecipe without transferring the flags from the original. That can cause incorrect nsw/nuw flags to be emitted on the new instructions, which may result in miscompiles. It turns out there were no test-cases in the repo which end up hitting the situation where the recipe requires instruction clones to have different flags from the underlying instruction. The existing tests covered the flags being correct when the replacement instruction is a vectorized version of the initial instruction, but not when it required clones. A new test is added covering this.

#147354) Re-land #146582 now that the Flang bugs have been fixed. There is no way in Arm64 Windows to indicate that a given function has used the Frame Pointer as a General Purpose Register; as such, stack walks will always assume that the frame chain is valid and will follow whatever value has been saved for the Frame Pointer (even if it is pointing to data, etc.). This change makes the Frame Pointer always reserved when building for Arm64 Windows to avoid this issue. We will be updating the official Windows ABI documentation to reflect this requirement, and I will provide a link once it's available.

When complete record support was initially added, the parsing support was left incomplete. This change adds the necessary parsing.

…ts (#147566) Forked from llvm/test/CodeGen/X86.

Fixes #146973. When an object with alignment requirements is placed on the stack, this causes a stack realignment, which causes AArch64 to use x19 to refer to objects on the stack, as there may be a gap between local variables and the Stack Pointer. This causes issues with the MSVC C++ exception personality, as the offset to the catch object recorded in the handler table no longer matches the object being used in the catch block itself. The fix for this is to place catch objects into the fixed object area.

…_truncf to amdgpu (#146372)

- add conversion from arith.scaling_extf to amdgpu.scaled_ext_packed
- add conversion from arith.scaling_truncf to amdgpu.packed_scaled_trunc

…7701) This just moves the test from `libcxx` to `generic`. There are currently no `std::function` formatters for libstdc++ so I didn't add a test-case for it. Split out from #146740

Implemented wcslcpy and tests. --------- Co-authored-by: Sriya Pratipati <[email protected]>

…#147764) When we create a lambda, we would skip over declaration contexts representing a requires expression body, which would lead to wrong lookup. Note that I wasn't able to establish why the code in `Sema::createLambdaClosureType` was there to begin with (it's not exactly recent). The changes to mangling only ensure the status quo is preserved and do not attempt to address the known issues of mangling lambdas in requires clauses. In particular, the Itanium mangling is consistent with Clang before this patch but differs from GCC's. Fixes #147650

When an LLVM tool crashes (e.g. from a segmentation fault), SignalHandler will re-raise the signal. The effect is that crash reports now contain SignalHandler in the stack trace. The crash reports are still useful, but the presence of SignalHandler can confuse tooling and automation that deduplicate or analyze crash reports. rdar://150464802

Since profile inference improves sample coverage, it should be turned on by default.

This patch bumps the runner version from v3.222.0 to v3.226.0, as v3.222.0 is too old at this point to connect to GitHub. This is needed for the new premerge system given we are directly using this container. This did not impact the existing libc++ CI as the runner was contained in a separate container image.

Copy-pasted the ctype equivalents. --------- Co-authored-by: Sriya Pratipati <[email protected]>

This is so that we'll be able to use it in compiler-rt as well. Dependencies on LLVM Support were removed from the header by restoring code from the original SipHash implementation. Reviewers: kuhar, dwblaikie, ahmedbougacha Reviewed By: dwblaikie Pull Request: #134197

This PR adds a new transformation that turns sequences of `vector.to_elements` and `vector.from_elements` into a binary tree of `vector.shuffle` operations. (Related RFC: https://discourse.llvm.org/t/rfc-adding-vector-to-elements-op-to-the-vector-dialect/86779).

Example:

```
%0:4 = vector.to_elements %a : vector<4xf32>
%1:4 = vector.to_elements %b : vector<4xf32>
%2:4 = vector.to_elements %c : vector<4xf32>
%3 = vector.from_elements %0#0, %0#1, %0#2, %0#3, %1#0, %1#1, %1#2, %1#3, %2#0, %2#1, %2#2, %2#3 : vector<12xf32>

==>

%0 = vector.shuffle %a, %b [0, 1, 2, 3, 4, 5, 6, 7] : vector<4xf32>, vector<4xf32>
%1 = vector.shuffle %c, %c [0, 1, 2, 3, -1, -1, -1, -1] : vector<4xf32>, vector<4xf32>
%2 = vector.shuffle %0, %1 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11] : vector<8xf32>, vector<8xf32>
```

The algorithm leverages the structured extraction/insertion information of `vector.to_elements` and `vector.from_elements` operations and builds a set of intervals to determine the vector length that should be used at each level of the tree to combine the level inputs in pairs.

There are a few improvements that can be implemented in the future, such as shuffle mask compression to avoid unnecessarily large vector lengths with poison values, but I decided to keep things "simpler" and spend more time documenting the different steps of the algorithm so that people can follow along.
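
The pairing scheme in the example can be sketched in a simplified, hypothetical model (`Shuffle` and `buildShuffleTree` are illustrative names, not the MLIR pattern's code). It pairs adjacent inputs level by level, pairs an odd leftover input with itself, and pads each level's masks with `-1` (poison) to a common width, rather than using the interval-based length selection the PR describes. Inputs are described only by how many meaningful leading elements they carry and by their vector width; mask indices address the concatenation of both operands, so right-hand lanes are offset by the operand width.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// One shuffle in the tree: operand indices into the previous level's outputs
// and the shuffle mask (-1 marks a poison lane).
struct Shuffle {
  std::size_t lhs, rhs;
  std::vector<int> mask;
};

// valid[i]: number of meaningful leading elements of input i.
// width: vector length shared by all inputs of the current level.
std::vector<std::vector<Shuffle>> buildShuffleTree(std::vector<int> valid, int width) {
  std::vector<std::vector<Shuffle>> levels;
  while (valid.size() > 1) {
    const std::size_t n = valid.size();
    // Output width of this level: large enough for the biggest pair.
    int outWidth = 0;
    for (std::size_t i = 0; i < n; i += 2)
      outWidth = std::max(outWidth, valid[i] + (i + 1 < n ? valid[i + 1] : 0));
    std::vector<Shuffle> level;
    std::vector<int> nextValid;
    for (std::size_t i = 0; i < n; i += 2) {
      const bool hasRhs = i + 1 < n;
      const std::size_t rhs = hasRhs ? i + 1 : i;  // odd input pairs with itself
      std::vector<int> mask;
      for (int e = 0; e < valid[i]; ++e) mask.push_back(e);
      if (hasRhs)
        for (int e = 0; e < valid[rhs]; ++e) mask.push_back(width + e);
      nextValid.push_back(static_cast<int>(mask.size()));
      mask.resize(static_cast<std::size_t>(outWidth), -1);  // pad with poison
      level.push_back({i, rhs, mask});
    }
    levels.push_back(level);
    valid = nextValid;
    width = outWidth;
  }
  return levels;
}
```

For the three `vector<4xf32>` inputs in the example, `buildShuffleTree({4, 4, 4}, 4)` yields the same masks as the PR's output: `[0..7]` and `[0, 1, 2, 3, -1, -1, -1, -1]` at the first level, then `[0..11]` at the root.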

…placed with std::numeric_limits (#147623) This PR addresses instances of compiler warning C4146 that can be replaced with std::numeric_limits. Specifically, these are cases where a literal such as '-1ULL' was used to assign a value to a uint64_t variable. The intent is much clearer if we use the appropriate std::numeric_limits<Type>::max() value for these cases. Addresses #147439
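
The replacement in miniature: `-1ULL` relies on unsigned wrap-around to produce the all-ones value, while `std::numeric_limits<std::uint64_t>::max()` names that value directly. Both yield the same number; only the second states the intent (the variable names are illustrative).

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Before: unary minus applied to an unsigned literal. Well-defined (it wraps
// to the all-ones value), but MSVC reports warning C4146 for it.
const std::uint64_t AllOnesBefore = -1ULL;

// After: the maximum value is spelled out.
constexpr std::uint64_t AllOnesAfter = std::numeric_limits<std::uint64_t>::max();
```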

…s. (#147830) These instructions operate on bytes, so we need to round the demanded bits up to the nearest byte, which we aren't doing. I think we forgot to update this when we changed from hasAllWUsers to hasNBitUsers. We don't have any test cases for these instructions, so remove them until we can put together a test.
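
The missing rounding step is ordinary round-up-to-a-multiple-of-8 arithmetic. Below is a hypothetical helper, not the RISC-V backend's code; LLVM's `alignTo(Bits, 8)` from `MathExtras.h` computes the same value.

```cpp
#include <cassert>

// Round a demanded-bit count up to whole bytes: an instruction that reads its
// operand bytewise uses every bit of each byte it touches.
constexpr unsigned roundUpToByte(unsigned bits) { return (bits + 7) / 8 * 8; }
```

For example, if only the low 9 bits are demanded, a bytewise instruction still reads 16 bits, so 16 bits must be treated as demanded.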

The emulated PAC runtime functions emulate the ARMv8.3a pointer authentication instructions and are intended for use in heterogeneous testing environments. For more information, see the associated RFC: https://discourse.llvm.org/t/rfc-emulated-pac/85557 Reviewers: llvm-beanz, petrhosek Pull Request: #133530

…sjoint. (#147838)

This patch fixes: mlir/lib/Dialect/Vector/Transforms/LowerVectorToFromElementsToShuffleTree.cpp:42:20: error: unused variable 'kIndScale' [-Werror,-Wunused-const-variable]

* Introduce an error code for illegal_line_offset in the sampleprof_error namespace, and use it for line offset parsing errors.
* Add `const` for `LineLocation::serialize`.
* Use structured bindings and make_first/second_range in loops.

I'm working on a [sample-profile format change](https://github.com/llvm/llvm-project/compare/users/mingmingl-llvm/samplefdo-profile-format) to extend the SampleFDO profile with vtable profiles, and this change splits out the non-functional changes.

…ing operands (#147583) Added emission of the 2-element reduction instead of 2 extracts + scalar op, when trying to vectorize operands of the instruction, if it is more profitable.

Implemented wcslcat and tests. --------- Co-authored-by: Sriya Pratipati <[email protected]>

These scripts belong in the `mlgo-utils` directory when directly used with python3. But since they are also used to package with pip, symlink the entrypoint scripts to the mlgo-utils directory. Adjust the bazel paths to account for this as well. This loosely follows the same structure as lit. Verified that I was also able to build the package successfully and use the script.

…m built-ins into clc (#144333)

Changes in this PR:

* Declare most of the workitem functions in the clc and opencl folders.
* Call the clc workitem function in the corresponding OpenCL workitem function.
* Move ptx-nvidiacl workitem built-in implementations into clc.
* Move a few amdgcn workitem built-in implementations into clc.
* Include only needed headers in OpenCL workitem functions.
* Implement get_local_linear_id, get_max_sub_group_size, get_num_sub_groups, get_sub_group_id, get_sub_group_local_id, get_sub_group_size for ptx-nvidiacl.

llvm-diff shows this PR adds a few new symbols to nvptx64--nvidiacl.bc. llvm-diff shows no change to amdgcn--amdhsa.bc, nvptx--.bc and nvptx64--.bc.
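
Of the functions listed, `get_local_linear_id` has a definition fixed by the OpenCL specification: it flattens the 3-D local ID with dimension 0 varying fastest. A sketch of that formula, with `Dim3` and `localLinearId` as hypothetical stand-ins for the `get_local_id`/`get_local_size` queries (this is not libclc's ptx-nvidiacl code):

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical 3-component stand-in for get_local_id(d) / get_local_size(d).
struct Dim3 {
  std::size_t x, y, z;
};

// OpenCL's definition of get_local_linear_id():
//   id(2) * size(1) * size(0) + id(1) * size(0) + id(0)
std::size_t localLinearId(Dim3 id, Dim3 size) {
  return id.z * size.y * size.x + id.y * size.x + id.x;
}
```

In a 4 x 2 x 2 work-group, IDs run from 0 for `(0, 0, 0)` to 15 for `(3, 1, 1)`.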

To prepare for other platforms, such as 64-bit AIX, that have a non-zero mmap beginning address. --------- Co-authored-by: David Justo <[email protected]>

The current instrumentation has more `or`s and element extractions than a coal mine:

```
[[TMP10:%.*]] = extractelement <16 x i32> [[TMP9]], i64 0
[[TMP11:%.*]] = and i32 [[TMP10]], 15
[[TMP43:%.*]] = or i32 [[TMP10]], [[TMP11]]
[[TMP12:%.*]] = extractelement <16 x i32> [[TMP9]], i64 1
[[TMP13:%.*]] = and i32 [[TMP12]], 15
[[TMP44:%.*]] = or i32 [[TMP12]], [[TMP13]]
...
[[TMP40:%.*]] = extractelement <16 x i32> [[TMP9]], i64 15
[[TMP41:%.*]] = and i32 [[TMP40]], 15
[[TMP57:%.*]] = or i32 [[TMP40]], [[TMP41]]
[[_MSCMP:%.*]] = icmp ne i32 [[TMP57]], 0
br i1 [[_MSCMP]], label [[TMP102:%.*]], label [[TMP103:%.*]], !prof [[PROF1]]
```

Simplify it to:

```
[[TMP10:%.*]] = trunc <16 x i32> [[T]] to <16 x i4>
[[TMP12:%.*]] = bitcast <16 x i4> [[TMP10]] to i64
[[_MSCMP:%.*]] = icmp ne i64 [[TMP12]], 0
br i1 [[_MSCMP]], label %[[BB13:.*]], label %[[BB14:.*]], !prof [[PROF1]]
```
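
A scalar model of the check the simplified IR performs: the `trunc` to `<16 x i4>` keeps each lane's low 4 bits, the `bitcast` packs those 16 nibbles into one `i64`, and a single `icmp ne 0` asks whether any of them is set. The lane-by-lane loop is shown for contrast; both function names are hypothetical, and this models the check, not MSan's instrumentation code.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Lane-by-lane shape: mask each lane's low nibble and OR the results together.
bool anyLowNibbleSetPerLane(const std::array<std::uint32_t, 16> &lanes) {
  std::uint32_t acc = 0;
  for (std::uint32_t lane : lanes) acc |= lane & 0xF;
  return acc != 0;
}

// Packed shape, mirroring trunc <16 x i32> to <16 x i4> + bitcast to i64:
// place the 16 low nibbles side by side in one 64-bit word, compare once.
bool anyLowNibbleSetPacked(const std::array<std::uint32_t, 16> &lanes) {
  std::uint64_t packed = 0;
  for (std::size_t i = 0; i < 16; ++i)
    packed |= static_cast<std::uint64_t>(lanes[i] & 0xF) << (4 * i);
  return packed != 0;
}
```

16 lanes x 4 bits fill exactly 64 bits, which is why the whole check fits in one `i64` comparison.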

…147668) CWG papers requiring library support are also listed.

This patch implements clang intrinsic support for XAndesVSIntLoad. The document for the intrinsics can be found at: https://github.com/andestech/andes-vector-intrinsic-doc/blob/ast-v5_4_0-release-v5/auto-generated/andes-v5/intrinsic_funcs/04_andes_vector_int4_load_extension.adoc Co-authored-by: Lino Hsing-Yu Peng <[email protected]>

…bfmin directory. NFC. A follow-up commit for #147644.

Attempt to fix these build failures: https://lab.llvm.org/buildbot/#/builders/107/builds/12601 The suspected cause is that #133530 caused us to start passing -std:c11 to MSVC, which activated this code path that uses _Complex, which MSVC does not support. See: https://learn.microsoft.com/en-us/cpp/c-runtime-library/complex-math-support Fix it by also checking _MSC_VER.

…s in C++23 (#145164) C++23 mandates that temporaries used in range-based for loops are lifetime-extended to cover the full loop. This patch adds a check for loop variables and compiler-generated `__range` bindings to apply the correct extension. Includes test cases based on examples from CWG900/P2644R1. Fixes #109793

pull bot added the ⤵️ pull label Jun 2, 2025

preames and others added 29 commits July 8, 2025 09:41

[RISCV] Add coverage for optimizations in deinterleave load lowering

6f748fd

[Hexagon] Handle Call Operand vxi1 in Hexagon without HVX Enabled (#1…

de732df

…36546) This commit updates the Hexagon backend to handle vxi1 call operands without HVX enabled. It ensures compatibility for vector types of sizes 4, 8, 16, 32, 64, and 128 x i1 when HVX is not enabled.

WebAssembly: Add test for sincos intrinsic (#147467)

4a507b1

DAG: Fall back to separate sin and cos when softening sincos (#147468)

3697d6d

Fix asserting in the error case.

[NFC] [MSAN] disambiguate insertShadowCheck (#146616)

36dbe51

One of them operates on values, the other on shadows. It is confusing for them to have the same name and differ only in the number of parameters.

LoongArch: Add test for sincos intrinsic (#147471)

3614d49

[X86] Add test coverage for #143456

64c3ba8

[NFC][SPIRV] Fix test after spirv-val update (#147523)

320f682

ptr-annotation.ll was incorrectly applying a decoration to an unsuitable target. The patch changes the decoration to a valid one for the test.

[AArch64] Expand UADDLV patterns to handle two-step i8->i16->i32 exte…

adaa409

…nds (#146078) Closes #142961

[Clang] [Docs] Add release notes for #143514 and #143520 (#147562)

eb2b63c

I forgot to include a release note in #143520, and it also occurred to me that while #143514 is technically a bugfix in LLVM/Support, I think we should have one for it as well.

[flang] Fix optimization of array assignments after #146408 (#147371)

e976eaf

Host associated variables were not being handled properly. For array references, get the fixed shape extents from the value type instead; this works correctly in all cases.

[clang-tidy][NFC] Prefer constexpr llvm::StringLiteral over `const …

1e3f6a6

…char *` (#147301)

Some of these are even global mutable state, which is probably not what was intended!

```cpp
static const char *AnalyzerCheckNamePrefix = "clang-analyzer-";
```

[scudo] Make block storage in TransferBatch trailing objects (#144204)

8b65c9d

This allows us to change the number of blocks stored according to the size of BatchClass. Also change the name `TransferBatch` to `Batch` given that it's never the unit of transferring blocks.

[gn] port 0580563

2485c51

[gn build] Port c44c142

0863979

[libc++][z/OS] XFAIL thread_create_failure.pass.cpp on z/OS (#147520)

bc8aa97

The number of threads on z/OS is controlled at the system level, and thus we need to XFAIL this test.

[CIR] Add support for parsing complete records (#147403)

d09984e

When complete record support was initially added, the parsing support was left incomplete. This change adds the necessary parsing.

[NFCI][msan] Add avx512bw-intrinsics, avx512bw-intrinsics-upgrade tes…

c8048e7

…ts (#147566) Forked from llvm/test/CodeGen/X86.

[mlir][amdgpu] Add conversion from arith.scaling_extf / arith.scaling…

6f291cb

…_truncf to amdgpu (#146372)

- add conversion from arith.scaling_extf to amdgpu.scaled_ext_packed
- add conversion from arith.scaling_truncf to amdgpu.packed_scaled_trunc

Michael137 and others added 30 commits July 9, 2025 22:16

[lldb][test] Move std::function from libcxx to generic directory (#14…

9d8058e

…7701) This just moves the test from `libcxx` to `generic`. There are currently no `std::function` formatters for libstdc++ so I didn't add a test-case for it. Split out from #146740

[libc] wcslcpy implementation (#146571)

16f0462

Implemented wcslcpy and tests. --------- Co-authored-by: Sriya Pratipati <[email protected]>

[Driver][SamplePGO] Enable -fsample-profile-use-profi (#146795)

2fc4a4a

Since profile inference improves sample coverage, it should be turned on by default.

[libc] Added internal wctype functions (#147798)

f1acd69

Copy-pasted the ctype equivalents. --------- Co-authored-by: Sriya Pratipati <[email protected]>

[libc][NFC] fix comment typo ("documentation") (#147836)

071e302

[RISCV] Use Selection::haveNoCommonBitsSet in RISCVDAGToDAGISel::orDi…

574b66f

…sjoint. (#147838)

gn build: Port db03408

a37f0a0

[mlir] Fix a warning

cd65f8b

This patch fixes: mlir/lib/Dialect/Vector/Transforms/LowerVectorToFromElementsToShuffleTree.cpp:42:20: error: unused variable 'kIndScale' [-Werror,-Wunused-const-variable]

[SLP] Emit reduction instead of 2 extracts + scalar op, when vectoriz…

ac4a38e

…ing operands (#147583) Added emission of the 2-element reduction instead of 2 extracts + scalar op, when trying to vectorize operands of the instruction, if it is more profitable.

[libc] wcslcat implementation (#146588)

d5436b0

Implemented wcslcat and tests. --------- Co-authored-by: Sriya Pratipati <[email protected]>

[sanitizer_common] Introduce SANITIZER_MMAP_BEGIN macro (#147645)

d286540

To prepare for other platforms, such as 64-bit AIX, that have a non-zero mmap beginning address. --------- Co-authored-by: David Justo <[email protected]>

[mlir][xegpu] Relax rank restriction of TensorDescType (#145916)

75524de

[libc++][docs] Update paper & LWG issue lists after 2025-06 meeting (#…

e8a50a2

…147668) CWG papers requiring library support are also listed.

[RISCV] Move the intrinsic tests for vfwcvtbf16 and vfncvtbf16 to zvf…

2eab6f9

…bfmin directory. NFC. A follow-up commit for #147644.

[msan] Fix -Wunused-but-set-variable after #147839

1ae99f5

pull bot commented Jun 2, 2025