Skip to content

[flang][fir] Basic lowering fir.do_concurrent locality specs to fir.do_loop ... unordered #138512

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
May 9, 2025

Conversation

ergawy
Copy link
Member

@ergawy ergawy commented May 5, 2025

Extends lowering fir.do_concurrent to fir.do_loop ... unordered by adding support for locality specifiers.

In particular, for local specifiers, a fir.alloca op is created using the localizer type. For local_init specifiers, the copy region is additionally inlined in the do concurrent loop's body.

PR stack:

@llvmbot llvmbot added flang Flang issues not falling into any other category flang:fir-hlfir labels May 5, 2025
@llvmbot
Copy link
Member

llvmbot commented May 5, 2025

@llvm/pr-subscribers-flang-fir-hlfir

Author: Kareem Ergawy (ergawy)

Changes

Extends lowering fir.do_concurrent to fir.do_loop ... unordered by adding support for locality specifiers.

In particular, for local specifiers, a fir.alloca op is created using the localizer type. For local_init specifiers, the copy region is additionally inlined in the do concurrent loop's body.


Full diff: https://github.com/llvm/llvm-project/pull/138512.diff

2 Files Affected:

  • (modified) flang/lib/Optimizer/Transforms/SimplifyFIROperations.cpp (+57-1)
  • (modified) flang/test/Transforms/do_concurrent-to-do_loop-unodered.fir (+61)
diff --git a/flang/lib/Optimizer/Transforms/SimplifyFIROperations.cpp b/flang/lib/Optimizer/Transforms/SimplifyFIROperations.cpp
index 6d106046b70f2..e2dc4e14ff650 100644
--- a/flang/lib/Optimizer/Transforms/SimplifyFIROperations.cpp
+++ b/flang/lib/Optimizer/Transforms/SimplifyFIROperations.cpp
@@ -149,6 +149,17 @@ mlir::LogicalResult BoxTotalElementsConversion::matchAndRewrite(
 
 class DoConcurrentConversion
     : public mlir::OpRewritePattern<fir::DoConcurrentOp> {
+  /// Looks up from the operation from and returns the LocalitySpecifierOp with
+  /// name symbolName
+  static fir::LocalitySpecifierOp
+  findLocalizer(mlir::Operation *from, mlir::SymbolRefAttr symbolName) {
+    fir::LocalitySpecifierOp localizer =
+        mlir::SymbolTable::lookupNearestSymbolFrom<fir::LocalitySpecifierOp>(
+            from, symbolName);
+    assert(localizer && "localizer not found in the symbol table");
+    return localizer;
+  }
+
 public:
   using mlir::OpRewritePattern<fir::DoConcurrentOp>::OpRewritePattern;
 
@@ -162,7 +173,52 @@ class DoConcurrentConversion
     assert(loop.getRegion().hasOneBlock());
     mlir::Block &loopBlock = loop.getRegion().getBlocks().front();
 
-    // Collect iteration variable(s) allocations do that we can move them
+    // Handle localization
+    if (!loop.getLocalVars().empty()) {
+      mlir::OpBuilder::InsertionGuard guard(rewriter);
+      rewriter.setInsertionPointToStart(&loop.getRegion().front());
+
+      std::optional<mlir::ArrayAttr> localSyms = loop.getLocalSyms();
+
+      for (auto [localVar, localArg, localizerSym] : llvm::zip_equal(
+               loop.getLocalVars(), loop.getRegionLocalArgs(), *localSyms)) {
+        mlir::SymbolRefAttr localizerName =
+            llvm::cast<mlir::SymbolRefAttr>(localizerSym);
+        fir::LocalitySpecifierOp localizer = findLocalizer(loop, localizerName);
+
+        mlir::Value localAlloc =
+            rewriter.create<fir::AllocaOp>(loop.getLoc(), localizer.getType());
+
+        if (localizer.getLocalitySpecifierType() ==
+            fir::LocalitySpecifierType::LocalInit) {
+          // It is reasonable to make this assumption since, at this stage,
+          // control-flow ops are not converted yet. Therefore, things like `if`
+          // conditions will still be represented by their encapsulating `fir`
+          // dialect ops.
+          assert(localizer.getCopyRegion().hasOneBlock() &&
+                 "Expected localizer to have a single block.");
+          mlir::Block *beforeLocalInit = rewriter.getInsertionBlock();
+          mlir::Block *afterLocalInit = rewriter.splitBlock(
+              rewriter.getInsertionBlock(), rewriter.getInsertionPoint());
+          rewriter.cloneRegionBefore(localizer.getCopyRegion(), afterLocalInit);
+          mlir::Block *copyRegionBody = beforeLocalInit->getNextNode();
+
+          rewriter.eraseOp(copyRegionBody->getTerminator());
+          rewriter.mergeBlocks(afterLocalInit, copyRegionBody);
+          rewriter.mergeBlocks(copyRegionBody, beforeLocalInit,
+                               {localVar, localArg});
+        }
+
+        rewriter.replaceAllUsesWith(localArg, localAlloc);
+      }
+
+      loop.getRegion().front().eraseArguments(loop.getNumInductionVars(),
+                                              loop.getNumLocalOperands());
+      loop.getLocalVarsMutable().clear();
+      loop.setLocalSymsAttr(nullptr);
+    }
+
+    // Collect iteration variable(s) allocations so that we can move them
     // outside the `fir.do_concurrent` wrapper.
     llvm::SmallVector<mlir::Operation *> opsToMove;
     for (mlir::Operation &op : llvm::drop_end(wrapperBlock))
diff --git a/flang/test/Transforms/do_concurrent-to-do_loop-unodered.fir b/flang/test/Transforms/do_concurrent-to-do_loop-unodered.fir
index d2ceafdda5b22..d9ef36b175598 100644
--- a/flang/test/Transforms/do_concurrent-to-do_loop-unodered.fir
+++ b/flang/test/Transforms/do_concurrent-to-do_loop-unodered.fir
@@ -121,3 +121,64 @@ func.func @dc_2d_reduction(%i_lb: index, %i_ub: index, %i_st: index,
 // CHECK:           }
 // CHECK:           return
 // CHECK:         }
+
+// -----
+
+fir.local {type = local} @local_localizer : i32
+
+fir.local {type = local_init} @local_init_localizer : i32 copy {
+^bb0(%arg0: !fir.ref<i32>, %arg1: !fir.ref<i32>):
+    %0 = fir.load %arg0 : !fir.ref<i32>
+    fir.store %0 to %arg1 : !fir.ref<i32>
+    fir.yield(%arg1 : !fir.ref<i32>)
+}
+
+func.func @do_concurrent_locality_specs() {
+  %3 = fir.alloca i32 {bindc_name = "local_init_var", uniq_name = "_QFdo_concurrentElocal_init_var"}
+  %4:2 = hlfir.declare %3 {uniq_name = "_QFdo_concurrentElocal_init_var"} : (!fir.ref<i32>) -> (!fir.ref<i32>, !fir.ref<i32>)
+  %5 = fir.alloca i32 {bindc_name = "local_var", uniq_name = "_QFdo_concurrentElocal_var"}
+  %6:2 = hlfir.declare %5 {uniq_name = "_QFdo_concurrentElocal_var"} : (!fir.ref<i32>) -> (!fir.ref<i32>, !fir.ref<i32>)
+  %c1 = arith.constant 1 : index
+  %c10 = arith.constant 1 : index
+  fir.do_concurrent {
+    %9 = fir.alloca i32 {bindc_name = "i"}
+    %10:2 = hlfir.declare %9 {uniq_name = "_QFdo_concurrentEi"} : (!fir.ref<i32>) -> (!fir.ref<i32>, !fir.ref<i32>)
+    fir.do_concurrent.loop (%arg0) = (%c1) to (%c10) step (%c1) local(@local_localizer %6#0 -> %arg1, @local_init_localizer %4#0 -> %arg2 : !fir.ref<i32>, !fir.ref<i32>) {
+      %11 = fir.convert %arg0 : (index) -> i32
+      fir.store %11 to %10#0 : !fir.ref<i32>
+      %13:2 = hlfir.declare %arg1 {uniq_name = "_QFdo_concurrentElocal_var"} : (!fir.ref<i32>) -> (!fir.ref<i32>, !fir.ref<i32>)
+      %15:2 = hlfir.declare %arg2 {uniq_name = "_QFdo_concurrentElocal_init_var"} : (!fir.ref<i32>) -> (!fir.ref<i32>, !fir.ref<i32>)
+      %17 = fir.load %10#0 : !fir.ref<i32>
+      %c5_i32 = arith.constant 5 : i32
+      %18 = arith.cmpi slt, %17, %c5_i32 : i32
+      fir.if %18 {
+        %c42_i32 = arith.constant 42 : i32
+        hlfir.assign %c42_i32 to %13#0 : i32, !fir.ref<i32>
+      } else {
+        %c84_i32 = arith.constant 84 : i32
+        hlfir.assign %c84_i32 to %15#0 : i32, !fir.ref<i32>
+      }
+    }
+  }
+  return
+}
+
+// CHECK-LABEL:   func.func @do_concurrent_locality_specs() {
+// CHECK:           %[[LOC_INIT_DECL:.*]]:2 = hlfir.declare %{{.*}} {uniq_name = "{{.*}}Elocal_init_var"}
+// CHECK:           fir.do_loop %{{.*}} = %{{.*}} to %{{.*}} step %{{.*}} unordered {
+// Verify localization of the `local` var.
+// CHECK:             %[[PRIV_LOC_ALLOC:.*]] = fir.alloca i32
+
+// Verify localization of the `local_init` var.
+// CHECK:             %[[PRIV_LOC_INIT_ALLOC:.*]] = fir.alloca i32
+// CHECK:             %[[LOC_INIT_VAL:.*]] = fir.load %[[LOC_INIT_DECL]]#0 : !fir.ref<i32>
+// CHECK:             fir.store %[[LOC_INIT_VAL]] to %[[PRIV_LOC_INIT_ALLOC]] : !fir.ref<i32>
+
+// CHECK:             %[[VAL_15:.*]]:2 = hlfir.declare %[[PRIV_LOC_ALLOC]]
+// CHECK:             %[[VAL_16:.*]]:2 = hlfir.declare %[[PRIV_LOC_INIT_ALLOC]]
+
+// CHECK:               hlfir.assign %{{.*}} to %[[VAL_15]]#0 : i32, !fir.ref<i32>
+// CHECK:               hlfir.assign %{{.*}} to %[[VAL_16]]#0 : i32, !fir.ref<i32>
+// CHECK:           }
+// CHECK:           return
+// CHECK:         }

@ergawy ergawy force-pushed the users/ergawy/fir-dc-local-spec-1 branch from 33312c3 to e673a6f Compare May 5, 2025 18:15
@ergawy ergawy force-pushed the users/ergawy/fir-dc-local-spec-2 branch from 07c29b3 to 18d42fc Compare May 5, 2025 18:17
@ergawy ergawy changed the title [flang][fir] Basci lowering fir.do_concurrent locality specs to fir.do_loop ... unordered [flang][fir] Basiclowering fir.do_concurrent locality specs to fir.do_loop ... unordered May 5, 2025
@ergawy ergawy changed the title [flang][fir] Basiclowering fir.do_concurrent locality specs to fir.do_loop ... unordered [flang][fir] Basic lowering fir.do_concurrent locality specs to fir.do_loop ... unordered May 5, 2025
Copy link
Contributor

@tblah tblah left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs a TODO message for locality specifiers which are not yet supported (as I understand it this means ones with init or dealloc regions)

Comment on lines +189 to +192
mlir::Value localAlloc =
rewriter.create<fir::AllocaOp>(loop.getLoc(), localizer.getType());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if this should be done on the heap because it will happen for every loop iteration. Does the spec say that it has to be on the stack?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nothing in the spec says it has to be on the stack AFAICT. However, this is following the behavior of handleLocalitySpecs in Bridge.cpp (without delayed localization). Added a todo, let me know if you disagree with looking into this later.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please could you use TODO(loc, "message") so that the compiler crashes (in a controlled way) instead of producing incorrect code if any of these do sneak through.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you meant for the init and dealloc todo right?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahh yes, sorry for commenting in a confusing place

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All good. Done.

@ergawy ergawy force-pushed the users/ergawy/fir-dc-local-spec-1 branch from e673a6f to 3562e43 Compare May 6, 2025 19:10
@ergawy ergawy force-pushed the users/ergawy/fir-dc-local-spec-2 branch from 18d42fc to a4acb56 Compare May 6, 2025 19:10
@ergawy
Copy link
Member Author

ergawy commented May 7, 2025

This needs a TODO message for locality specifiers which are not yet supported (as I understand it this means ones with init or dealloc regions)

Done. I think we might end up not supporting init and dealloc regions for fir.local ops since this is the case for early localization (https://github.com/llvm/llvm-project/blob/main/flang/lib/Lower/Bridge.cpp#L2052).

ergawy added a commit that referenced this pull request May 7, 2025
Adds a new `fir.local` op to model `local` and `local_init` locality
specifiers. This op is a clone of `omp.private`. In particular, this new
op also models the privatization/localization logic of an SSA value in
the `fir` dialect just like `omp.private` does for OpenMP.

PR stack:
- #137928
- #138505 (this PR)
- #138506
- #138512
- #138534
- #138816
@ergawy ergawy force-pushed the users/ergawy/fir-dc-local-spec-1 branch from 3562e43 to bf2bd22 Compare May 7, 2025 12:01
llvm-sync bot pushed a commit to arm/arm-toolchain that referenced this pull request May 7, 2025
…138505)

Adds a new `fir.local` op to model `local` and `local_init` locality
specifiers. This op is a clone of `omp.private`. In particular, this new
op also models the privatization/localization logic of an SSA value in
the `fir` dialect just like `omp.private` does for OpenMP.

PR stack:
- llvm/llvm-project#137928
- llvm/llvm-project#138505 (this PR)
- llvm/llvm-project#138506
- llvm/llvm-project#138512
- llvm/llvm-project#138534
- llvm/llvm-project#138816
Base automatically changed from users/ergawy/fir-dc-local-spec-1 to main May 8, 2025 19:42
ergawy added 2 commits May 9, 2025 05:31
…r.do_loop ... unordered`

Extends lowering `fir.do_concurrent` to `fir.do_loop ... unordered` by
adding support for locality specifiers.

In particular, for `local` specifiers, a `fir.alloca` op is created
using the localizer type. For `local_init` specifiers, the `copy` region
is additionally inlined in the `do concurrent` loop's body.
@ergawy ergawy force-pushed the users/ergawy/fir-dc-local-spec-2 branch from f94ef63 to a505cf5 Compare May 9, 2025 10:31
Copy link
Contributor

@tblah tblah left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the updates

@ergawy ergawy merged commit 2cc8734 into main May 9, 2025
11 checks passed
@ergawy ergawy deleted the users/ergawy/fir-dc-local-spec-2 branch May 9, 2025 14:27
llvm-sync bot pushed a commit to arm/arm-toolchain that referenced this pull request May 9, 2025
…pecs to `fir.do_loop ... unordered` (#138512)

Extends lowering `fir.do_concurrent` to `fir.do_loop ... unordered` by
adding support for locality specifiers.

In particular, for `local` specifiers, a `fir.alloca` op is created
using the localizer type. For `local_init` specifiers, the `copy` region
is additionally inlined in the `do concurrent` loop's body.

PR stack:
- llvm/llvm-project#137928
- llvm/llvm-project#138505
- llvm/llvm-project#138506
- llvm/llvm-project#138512 (this PR)
- llvm/llvm-project#138534
- llvm/llvm-project#138816
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
flang:fir-hlfir flang Flang issues not falling into any other category
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants