forked from llvm/llvm-project
[pull] main from llvm:main #643
Open: pull wants to merge 4,617 commits into tgockel:main from llvm:main
+750,196
−431,054
…36546) This commit updates the Hexagon backend to handle vxi1 call operands without HVX enabled. It ensures compatibility for vector types of 4, 8, 16, 32, 64, and 128 x i1 when HVX is not enabled.
Follow-up to #146307 Moved MCInst storage to MCSection, enabling trivial ~MCRelaxableFragment and eliminating the need for a fragment walk in ~MCSection. Updated MCRelaxableFragment::getInst to construct an MCInst on demand. Modified MCAssembler::relaxInstruction's mayNeedRelaxation to accept opcode and operands instead of an MCInst, avoiding redundant MCInst creation. Note that MCObjectStreamer::emitInstructionImpl calls mayNeedRelaxation before determining the target fragment for the MCInst. Unfortunately, we also have to encode `MCInst::Flags` to support the EVEX prefix, e.g. `{evex} xorw $foo, %ax` There is a small decrease in max-rss (stage1-ReleaseLTO-g (link only)) with negligible instructions:u change. https://llvm-compile-time-tracker.com/compare.php?from=0b533f2d9f0551aaffb13dcac8e0fd0a952185b5&to=f26b57f33bc7ccae749a57dfc841de7ce2acc2ef&stat=max-rss&linkStats=on Next: Enable MCFragment to store fixed-size data (was MCDataFragment's job) and optional Opcode/Operands data (was MCRelaxableFragment's job), and delete MCDataFragment/MCRelaxableFragment. This will allow re-encoding of Data+Relax+Data+Relax sequences as Frag+Frag. The saving should outweigh the downside of larger MCFragment. Pull Request: #147229
Fix asserting in the error case.
One of them operates on values, the other on shadows. It is confusing for both of them to have the same name but only different number of parameters.
…ourceLocation` in diagnostics (#147084)

The `SourceLocation` of a `RootSignatureToken` is incorrectly set to be the "offset" into the concatenated string that denotes the root signature. This causes an issue when the `StringLiteral` is a multi-line expansion macro, since the offset will not account for the characters between `StringLiteral` tokens.

This PR resolves this by retaining the `SourceLocation` information that is kept in `StringLiteral` and then converting the offset in the concatenated string into the proper `SourceLocation` using the `StringLiteral::getLocationOfByte` interface. To do so, the `RootSignatureToken` is adjusted to hold only its offset into the root signature string. When the parser uses the token, it computes the token's actual `SourceLocation`.

See linked issue for more context.

For example:

```
#define DemoRootSignature \
 "CBV(b0)," \
 "RootConstants(num32BitConstants = 3, b0, invalid)"
 expected caret location ---------------^
 actual caret location ------------^
```

The caret points 5 characters early because the current offset did not account for the characters:

```
'"' ' ' '\' ' ' '"'
 1   2   3   4   5
```

- Updates `RootSignatureParser` to retain `SourceLocation` information by retaining the `StringLiteral` and passing the underlying `StringRef` to the `Lexer`
- Updates `RootSignatureLexer` so that the constructed tokens only reflect an offset into the `StringRef`
- Updates `RootSignatureParser` to directly construct its used `Lexer` so that the `StringLiteral` is directly tied with the string used in the `RootSignatureLexer`
- Updates `RootSignatureParser` to use `StringLiteral::getLocationOfByte` to get the actual token location for diagnostics
- Updates `ParseHLSLRootSignatureTest` to construct a phony `AST`/`StringLiteral` for the test cases
- Adds a test to `RootSignature-err.hlsl` showing that the `SourceLocation` is correctly set for diagnostics in a multi-line macro expansion

Resolves: #146967
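The offset problem can be illustrated without Clang. The following is a minimal sketch, assuming plain `std::string` pieces in place of `StringLiteral` tokens and a hypothetical helper named `locatePiece`; the actual patch relies on `StringLiteral::getLocationOfByte`, which this does not reproduce.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper (not a Clang API). Each entry of `pieces` is the text
// between one pair of quotes in the source. Given a byte offset into the
// concatenation of all pieces, returns {piece index, offset in that piece}.
std::pair<std::size_t, std::size_t>
locatePiece(const std::vector<std::string> &pieces, std::size_t offset) {
  assert(!pieces.empty() && "need at least one piece");
  for (std::size_t i = 0; i < pieces.size(); ++i) {
    if (offset < pieces[i].size())
      return {i, offset};
    // Skip this piece; the characters between pieces (quote, backslash,
    // newline, whitespace) are not part of the concatenated string.
    offset -= pieces[i].size();
  }
  // Offsets at or past the end map to the end of the last piece.
  return {pieces.size() - 1, pieces.back().size()};
}
```

Once the piece and the local offset are known, a real implementation can add the local offset to that piece's own start location, which is what makes the caret land on the right column.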
ptr-annotation.ll was incorrectly applying a decoration to an unsuitable target. The patch changes the decoration to a valid one for the test.
…arget triple on AIX (#147488) PR #145685 introduced constructor overload ambiguity in the Triple class, causing `updateTripleOSVersion()` to construct Triple objects with `unknown` instead of the configured target triple (e.g., `powerpc-ibm-aix7.3.0.0`). This results in Clang driver errors like `error: unknown target triple 'unknown'`. Used `Twine` constructor with braced initialization to bypass ambiguity. --------- Co-authored-by: Tony Varghese <[email protected]> Co-authored-by: Matt Arsenault <[email protected]>
As reported in #145917 and #147309, there are situations where flang may crash. This is because `nextIt` in `RewriteOpenMPLoopConstruct` gets re-assigned when an iterator is erased from the block. If this is missed, Flang may attempt to access a location in memory that is not accessible and cause a compiler crash. This adds protection where the crash can occur, and a test with a reproducer that can trigger the crash. Fixes #147309
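The general hazard, erasing from a container while holding a saved "next" iterator, can be sketched with `std::list`. This is an illustration only: `eraseWithSuccessor`, `block`, and `victim` are made-up names, and the code is not taken from Flang.

```cpp
#include <cassert>
#include <iterator>
#include <list>

// Removes every element equal to `victim` together with the element that
// follows it. Illustrative only; loosely mirrors a rewrite that consumes a
// construct and its successor.
std::list<int> eraseWithSuccessor(std::list<int> block, int victim) {
  for (auto it = block.begin(); it != block.end();) {
    if (*it != victim) {
      ++it;
      continue;
    }
    auto nextIt = std::next(it);
    // Guard: the victim may be the last element, so there may be no successor.
    if (nextIt != block.end())
      nextIt = block.erase(nextIt); // Re-assign: the old nextIt is now invalid.
    block.erase(it);
    it = nextIt; // Safe because nextIt was refreshed after the erase above.
  }
  return block;
}
```

Forgetting either the end check or the re-assignment leaves `nextIt` referring to an erased element, which is undefined behaviour when it is next dereferenced.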
…ce cast to getRefPtrIfDeclareTarget The patch introduced changes to add address spaces to a wider array of MLIR/LLVM values, however, it was missing an address space cast that exists in our downstream implementation that's required for declare target to work correctly.
…char *` (#147301) Some of these are even global mutable state, which is probably not what was intended!

```cpp
static const char *AnalyzerCheckNamePrefix = "clang-analyzer-";
```
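Why such a declaration counts as mutable state can be shown in a few lines. This is a sketch with made-up names (`MutablePrefix`, `FixedPrefix`); making the pointer itself `const` is one possible remedy and is not necessarily the change this commit made.

```cpp
#include <cassert>
#include <cstring>
#include <type_traits>

// Pointer to const characters: the characters are read-only, but the pointer
// itself can still be re-pointed, so this is mutable global state.
static const char *MutablePrefix = "clang-analyzer-";

// Const pointer to const characters: neither can change.
static const char *const FixedPrefix = "clang-analyzer-";

static_assert(!std::is_const<decltype(MutablePrefix)>::value,
              "the pointer object itself is not const");
static_assert(std::is_const<decltype(FixedPrefix)>::value,
              "the pointer object itself is const");

// Shows that the first form can be silently redirected at run time.
const char *redirectAndRead() {
  MutablePrefix = "oops-"; // Compiles; assigning to FixedPrefix would not.
  return MutablePrefix;
}

const char *readFixed() { return FixedPrefix; }
```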
…#147435) If a `do concurrent` loop is offloaded then there should be no CUDA data transfer in it. Update the semantics and lowering to take that into account. `AssignmentChecker` has to be put into a separate pass because the checkers in `SemanticsVisitor` cannot have the same `Enter/Leave` functions. The `DoForallChecker` already has `Enter/Leave` functions for the `DoConstruct`.
This allows us to change the number of blocks stored according to the size of BatchClass. Also change the name `TransferBatch` to `Batch` given that it's never the unit of transferring blocks.
The number of threads on z/OS is controlled at the system level, and thus we need to XFAIL this test.
This fixes a bug introduced by aa24029, "[VPlan] Unroll VPReplicateRecipe by VF", which cloned a VPReplicateRecipe without transferring the flags from the original. That can cause incorrect nsw/nuw flags to be emitted on the new instructions, which may result in miscompiles. It turns out there were no test-cases in the repo which end up hitting the situation where the recipe requires instruction clones to have different flags from the underlying instruction. The existing tests covered the flags being correct when the replacement instruction is a vectorized version of the initial instruction, but not when it required clones. A new test is added covering this.
#147354) Re-land #146582 now that the Flang bugs have been fixed. There is no way in Arm64 Windows to indicate that a given function has used the Frame Pointer as a General Purpose Register, as such stack walks will always assume that the frame chain is valid and will follow whatever value has been saved for the Frame Pointer (even if it is pointing to data, etc.). This change makes the Frame Pointer always reserved when building for Arm64 Windows to avoid this issue. We will be updating the official Windows ABI documentation to reflect this requirement, and I will provide a link once it's available.
When complete record support was initially added, the parsing support was left incomplete. This change adds the necessary parsing.
…ts (#147566) Forked from llvm/test/CodeGen/X86.
Fixes #146973 When an object with alignment requirements is placed on the stack, this causes a stack realignment which causes AArch64 to use x19 to refer to objects on the stack as there may be a gap between local variables and the Stack Pointer. This causes issues with the MSVC C++ exception personality as the offset to the catch object recorded in the handler table no longer matches the object being used in the catch block itself. The fix for this is to place catch objects into the fixed object area.
…_truncf to amdgpu (#146372) - add conversion from arith.scaling_extf to amdgpu.scaled_ext_packed - add conversion from arith.scaling_truncf to amdgpu.packed_scaled_trunc
Implemented wcslcpy and tests. --------- Co-authored-by: Sriya Pratipati <[email protected]>
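For readers unfamiliar with the function, here is a minimal sketch of the behaviour usually expected from `wcslcpy`, assuming it follows the BSD `strlcpy` contract for wide characters. The names `sketch_wcslcpy` and `copyIntoFour` are invented for this illustration; this is not the code added by the commit.

```cpp
#include <cassert>
#include <cstddef>
#include <cwchar>
#include <string>

// Illustrative re-implementation. Copies at most size - 1 wide characters
// from src into dst, NUL-terminates whenever size > 0, and returns the length
// of src so that callers can detect truncation (return value >= size).
std::size_t sketch_wcslcpy(wchar_t *dst, const wchar_t *src, std::size_t size) {
  const std::size_t srcLen = std::wcslen(src);
  if (size == 0)
    return srcLen; // Nothing may be written, not even a terminator.
  const std::size_t n = srcLen < size - 1 ? srcLen : size - 1;
  std::wmemcpy(dst, src, n);
  dst[n] = L'\0';
  return srcLen;
}

// Convenience wrapper for experimenting: copies src into a 4-element buffer.
std::wstring copyIntoFour(const wchar_t *src) {
  wchar_t buf[4];
  sketch_wcslcpy(buf, src, 4);
  return buf;
}
```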
…#147764) When we create a lambda, we would skip over declaration contexts representing a requires-expression body, which would lead to wrong lookup. Note that I wasn't able to establish why the code in `Sema::createLambdaClosureType` was there to begin with (it's not exactly recent). The changes to mangling only ensure the status quo is preserved and do not attempt to address the known issues of mangling lambdas in requires-clauses. In particular, the Itanium mangling is consistent with Clang before this patch but differs from GCC's. Fixes #147650
When an llvm tool crashes (e.g. from a segmentation fault), SignalHandler will re-raise the signal. The effect is that crash reports now contain SignalHandler in the stack trace. The crash reports are still useful, but the presence of SignalHandler can confuse tooling and automation that deduplicate or analyze crash reports. rdar://150464802
Since profile inference improves sample coverage, it should be turned on by default.
This patch bumps the runner version from v3.222.0 to v3.226.0, as v3.222.0 is too old at this point to connect to GitHub. This is needed for the new premerge system given we are directly using this container. This did not impact the existing libc++ CI as the runner was contained in a separate container image.
Copy pasted the ctype equivalents --------- Co-authored-by: Sriya Pratipati <[email protected]>
This is so that we'll be able to use it in compiler-rt as well. Dependencies on LLVM Support were removed from the header by restoring code from the original SipHash implementation. Reviewers: kuhar, dwblaikie, ahmedbougacha Reviewed By: dwblaikie Pull Request: #134197
This PR adds a new transformation that turns sequences of `vector.to_elements` and `vector.from_elements` into a binary tree of `vector.shuffle` operations. (Related RFC: https://discourse.llvm.org/t/rfc-adding-vector-to-elements-op-to-the-vector-dialect/86779).

Example:

```
%0:4 = vector.to_elements %a : vector<4xf32>
%1:4 = vector.to_elements %b : vector<4xf32>
%2:4 = vector.to_elements %c : vector<4xf32>
%3 = vector.from_elements %0#0, %0#1, %0#2, %0#3, %1#0, %1#1, %1#2, %1#3, %2#0, %2#1, %2#2, %2#3 : vector<12xf32>

==>

%0 = vector.shuffle %a, %b [0, 1, 2, 3, 4, 5, 6, 7] : vector<4xf32>, vector<4xf32>
%1 = vector.shuffle %c, %c [0, 1, 2, 3, -1, -1, -1, -1] : vector<4xf32>, vector<4xf32>
%2 = vector.shuffle %0, %1 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11] : vector<8xf32>, vector<8xf32>
```

The algorithm leverages the structured extraction/insertion information of `vector.to_elements` and `vector.from_elements` operations and builds a set of intervals to determine the vector length that should be used at each level of the tree to combine the level inputs in pairs. There are a few improvements that can be implemented in the future, such as shuffle mask compression to avoid unnecessarily large vector lengths with poison values, but I decided to keep things "simpler" and spend more time documenting the different steps of the algorithm so that people can follow along.
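The three-input example can be modelled on plain `std::vector<float>` values to see why the shuffle tree reproduces the concatenation. This is a scalar sketch, not MLIR code: `shuffle` and `concat3` are hypothetical helpers, and NaN is used here only as a stand-in for the poison lanes that a `-1` mask entry produces.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Scalar model of a two-input shuffle: each mask entry indexes into the
// concatenation of lhs and rhs; -1 yields a placeholder lane (NaN here).
std::vector<float> shuffle(const std::vector<float> &lhs,
                           const std::vector<float> &rhs,
                           const std::vector<int> &mask) {
  std::vector<float> out;
  out.reserve(mask.size());
  for (int idx : mask) {
    if (idx < 0)
      out.push_back(std::numeric_limits<float>::quiet_NaN());
    else if (static_cast<std::size_t>(idx) < lhs.size())
      out.push_back(lhs[static_cast<std::size_t>(idx)]);
    else
      out.push_back(rhs.at(static_cast<std::size_t>(idx) - lhs.size()));
  }
  return out;
}

// Mirrors the example: combine a and b, pad c to the same width, then merge.
std::vector<float> concat3(const std::vector<float> &a,
                           const std::vector<float> &b,
                           const std::vector<float> &c) {
  std::vector<float> ab = shuffle(a, b, {0, 1, 2, 3, 4, 5, 6, 7});
  std::vector<float> cPadded = shuffle(c, c, {0, 1, 2, 3, -1, -1, -1, -1});
  return shuffle(ab, cPadded, {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11});
}
```

The padded lanes of the second-level input are never selected by the final mask, which is why the placeholder values do not reach the result.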
…placed with std::numeric_limits (#147623) This PR addresses instances of compiler warning C4146 that can be replaced with std::numeric_limits. Specifically, these are cases where a literal such as '-1ULL' was used to assign a value to a uint64_t variable. The intent is much clearer if we use the appropriate std::numeric_limits<Type>::max() value for these cases. Addresses #147439
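A small before-and-after sketch of the pattern, using made-up function names. Both spellings produce the same value on platforms where `unsigned long long` is 64 bits wide; only the second states the intent directly.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Legacy spelling: unary minus applied to an unsigned literal wraps around to
// the maximum value. MSVC reports warning C4146 for this.
std::uint64_t allOnesLegacy() { return -1ULL; }

// Explicit spelling of the same value.
std::uint64_t allOnesExplicit() {
  return std::numeric_limits<std::uint64_t>::max();
}
```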
…s. (#147830) These instructions operate on bytes so we need to round the demanded bits up to the nearest byte, which we aren't doing. I think we forgot to update this when we changed from hasAllWUsers to hasNBitUsers. We don't have any test case for these instructions, so remove them until we can put together a test.
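The missing rounding step amounts to the following arithmetic. This is a sketch with a hypothetical helper name, `roundUpToByte`; it is not an LLVM API and does not show how the backend would apply it.

```cpp
#include <cassert>
#include <climits>

// Rounds a bit count up to the next multiple of 8, since a byte-granular
// instruction reads whole bytes even when only part of one is demanded.
unsigned roundUpToByte(unsigned demandedBits) {
  assert(demandedBits <= UINT_MAX - 7u && "bit count too large to round");
  return (demandedBits + 7u) / 8u * 8u;
}
```

For example, a user that demands 9 bits really depends on 16 bits of a byte-wise result.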
The emulated PAC runtime functions emulate the ARMv8.3a pointer authentication instructions and are intended for use in heterogeneous testing environments. For more information, see the associated RFC: https://discourse.llvm.org/t/rfc-emulated-pac/85557 Reviewers: llvm-beanz, petrhosek Pull Request: #133530
This patch fixes: mlir/lib/Dialect/Vector/Transforms/LowerVectorToFromElementsToShuffleTree.cpp:42:20: error: unused variable 'kIndScale' [-Werror,-Wunused-const-variable]
* Introduce an error code for illegal_line_offset in sampleprof_error namespace, and use it for line offset parsing error.
* Add `const` for `LineLocation::serialize`.
* Use structured binding, make_first/second_range in loops.

I'm working on a [sample-profile format change](https://github.com/llvm/llvm-project/compare/users/mingmingl-llvm/samplefdo-profile-format) to extend SampleFDO profile with vtable profiles. This change splits out the non-functional changes.
…ing operands (#147583) Added emission of the 2-element reduction instead of 2 extracts + scalar op, when trying to vectorize operands of the instruction, if it is more profitable.
Implemented wcslcat and tests. --------- Co-authored-by: Sriya Pratipati <[email protected]>
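As a point of reference, here is a minimal sketch of the behaviour usually expected from `wcslcat`, assuming it follows the BSD `strlcat` contract for wide characters. `sketch_wcslcat` and `appendIntoSix` are invented names; this is not the code added by the commit.

```cpp
#include <cassert>
#include <cstddef>
#include <cwchar>
#include <string>

// Illustrative re-implementation. Appends src to dst without writing more
// than size wide characters in total (terminator included) and returns the
// length it tried to create, so a result >= size signals truncation.
std::size_t sketch_wcslcat(wchar_t *dst, const wchar_t *src, std::size_t size) {
  std::size_t dstLen = 0;
  while (dstLen < size && dst[dstLen] != L'\0')
    ++dstLen;
  const std::size_t srcLen = std::wcslen(src);
  if (dstLen == size)
    return size + srcLen; // dst is not terminated within size; append nothing.
  const std::size_t room = size - dstLen - 1;
  const std::size_t n = srcLen < room ? srcLen : room;
  std::wmemcpy(dst + dstLen, src, n);
  dst[dstLen + n] = L'\0';
  return dstLen + srcLen;
}

// Convenience wrapper for experimenting: starts a 6-element buffer with head
// (truncated to 5 characters) and appends tail to it.
std::wstring appendIntoSix(const wchar_t *head, const wchar_t *tail) {
  wchar_t buf[6] = {};
  std::wcsncpy(buf, head, 5); // buf[5] stays L'\0'.
  sketch_wcslcat(buf, tail, 6);
  return buf;
}
```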
These scripts belong in the `mlgo-utils` directory when directly used with python3. But since they are also used to package with pip, symlink the entrypoint scripts to mlgo-utils directory. Adjust the bazel paths to account for this as well. This loosely follows the same structure as lit. Verified that I was also able to build the package successfully and use the script.
…m built-ins into clc (#144333) Changes in this PR:

* Declare most of workitem functions in clc and opencl folders.
* Call clc workitem function in corresponding OpenCL workitem function.
* Move ptx-nvidiacl workitem built-in implementations into clc.
* Move a few amdgcn workitem built-in implementations into clc.
* Include only needed headers in OpenCL workitem functions.
* Implement get_local_linear_id, get_max_sub_group_size, get_num_sub_groups, get_sub_group_id, get_sub_group_local_id, get_sub_group_size for ptx-nvidiacl.

llvm-diff shows this PR adds a few new symbols to nvptx64--nvidiacl.bc. llvm-diff shows no change to amdgcn--amdhsa.bc, nvptx--.bc and nvptx64--.bc.
To prepare for other platforms, such as 64-bit AIX, that have a non-zero mmap beginning address. --------- Co-authored-by: David Justo <[email protected]>
The current instrumentation has more or and element extraction than a coal mine:

```
[[TMP10:%.*]] = extractelement <16 x i32> [[TMP9]], i64 0
[[TMP11:%.*]] = and i32 [[TMP10]], 15
[[TMP43:%.*]] = or i32 [[TMP10]], [[TMP11]]
[[TMP12:%.*]] = extractelement <16 x i32> [[TMP9]], i64 1
[[TMP13:%.*]] = and i32 [[TMP12]], 15
[[TMP44:%.*]] = or i32 [[TMP12]], [[TMP13]]
...
[[TMP40:%.*]] = extractelement <16 x i32> [[TMP9]], i64 15
[[TMP41:%.*]] = and i32 [[TMP40]], 15
[[TMP57:%.*]] = or i32 [[TMP40]], [[TMP41]]
[[_MSCMP:%.*]] = icmp ne i32 [[TMP57]], 0
br i1 [[_MSCMP]], label [[TMP102:%.*]], label [[TMP103:%.*]], !prof [[PROF1]]
```

Simplify it to:

```
[[TMP10:%.*]] = trunc <16 x i32> [[T]] to <16 x i4>
[[TMP12:%.*]] = bitcast <16 x i4> [[TMP10]] to i64
[[_MSCMP:%.*]] = icmp ne i64 [[TMP12]], 0
br i1 [[_MSCMP]], label %[[BB13:.*]], label %[[BB14:.*]], !prof [[PROF1]]
```
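What the simplified sequence computes can be modelled on scalars. The sketch below is an assumption-laden illustration, not a transcription of either IR listing: `anyLowNibbleSet` models the truncate, bitcast, and compare steps, `anyLowNibbleSetPerLane` is a lane-by-lane reference for the same question, and `splat` is a helper invented for testing.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

using Lanes = std::array<std::uint32_t, 16>;

// Packs the low 4 bits of each lane into one 64-bit word (models the trunc to
// <16 x i4> followed by the bitcast to i64) and reports whether any bit is
// set (models the icmp ne against 0). Lane order does not affect the answer.
bool anyLowNibbleSet(const Lanes &lanes) {
  std::uint64_t packed = 0;
  for (std::size_t i = 0; i < lanes.size(); ++i)
    packed |= static_cast<std::uint64_t>(lanes[i] & 0xFu) << (4 * i);
  return packed != 0;
}

// Lane-by-lane reference: mask each element and OR the results together.
bool anyLowNibbleSetPerLane(const Lanes &lanes) {
  std::uint32_t acc = 0;
  for (std::uint32_t lane : lanes)
    acc |= lane & 0xFu;
  return acc != 0;
}

// Test helper: every lane holds the same value.
Lanes splat(std::uint32_t value) {
  Lanes lanes;
  lanes.fill(value);
  return lanes;
}
```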
…147668) CWG papers requiring library support are also listed.
This patch implements clang intrinsic support for XAndesVSIntLoad. The document for the intrinsics can be found at: https://github.com/andestech/andes-vector-intrinsic-doc/blob/ast-v5_4_0-release-v5/auto-generated/andes-v5/intrinsic_funcs/04_andes_vector_int4_load_extension.adoc Co-authored-by: Lino Hsing-Yu Peng <[email protected]>
…bfmin directory. NFC. A follow-up commit for #147644.
Attempt to fix these build failures: https://lab.llvm.org/buildbot/#/builders/107/builds/12601 The suspected cause is that #133530 caused us to start passing -std:c11 to MSVC, which activated this code path that uses _Complex, which MSVC does not support. See: https://learn.microsoft.com/en-us/cpp/c-runtime-library/complex-math-support Fix it by also checking _MSC_VER.
…s in C++23 (#145164) C++23 mandates that temporaries used in range-based for loops are lifetime-extended to cover the full loop. This patch adds a check for loop variables and compiler-generated `__range` bindings to apply the correct extension. Includes test cases based on examples from CWG900/P2644R1. Fixes #109793
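The situation the rule addresses can be sketched as follows. `Holder`, `makeHolder`, and `sumPortable` are made-up names; the executable code uses the form that is safe in every language mode, and the form whose meaning changed in C++23 appears only in a comment.

```cpp
#include <cassert>
#include <vector>

struct Holder {
  std::vector<int> values;
  const std::vector<int> &items() const { return values; }
};

Holder makeHolder() { return Holder{{1, 2, 3}}; }

// Portable form: name the temporary so it outlives the loop regardless of
// which C++ standard is selected.
int sumPortable() {
  int sum = 0;
  Holder holder = makeHolder();
  for (int value : holder.items())
    sum += value;
  return sum;
}

// The shorter form
//   for (int value : makeHolder().items()) ...
// binds the range to a reference into a temporary Holder. Before C++23 that
// temporary is destroyed before the loop body runs, leaving the reference
// dangling; under the C++23 rule it lives until the loop finishes.
```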
See Commits and Changes for more details.
Created by
pull[bot] (v2.0.0-alpha.1)
Can you help keep this open source service alive? 💖 Please sponsor : )