-
Notifications
You must be signed in to change notification settings - Fork 13.5k
[flang][fir] Basic lowering fir.do_concurrent
locality specs to fir.do_loop ... unordered
#138512
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@llvm/pr-subscribers-flang-fir-hlfir Author: Kareem Ergawy (ergawy) ChangesExtends lowering In particular, for Full diff: https://github.com/llvm/llvm-project/pull/138512.diff 2 Files Affected:
diff --git a/flang/lib/Optimizer/Transforms/SimplifyFIROperations.cpp b/flang/lib/Optimizer/Transforms/SimplifyFIROperations.cpp
index 6d106046b70f2..e2dc4e14ff650 100644
--- a/flang/lib/Optimizer/Transforms/SimplifyFIROperations.cpp
+++ b/flang/lib/Optimizer/Transforms/SimplifyFIROperations.cpp
@@ -149,6 +149,17 @@ mlir::LogicalResult BoxTotalElementsConversion::matchAndRewrite(
class DoConcurrentConversion
: public mlir::OpRewritePattern<fir::DoConcurrentOp> {
+ /// Looks up from the operation from and returns the LocalitySpecifierOp with
+ /// name symbolName
+ static fir::LocalitySpecifierOp
+ findLocalizer(mlir::Operation *from, mlir::SymbolRefAttr symbolName) {
+ fir::LocalitySpecifierOp localizer =
+ mlir::SymbolTable::lookupNearestSymbolFrom<fir::LocalitySpecifierOp>(
+ from, symbolName);
+ assert(localizer && "localizer not found in the symbol table");
+ return localizer;
+ }
+
public:
using mlir::OpRewritePattern<fir::DoConcurrentOp>::OpRewritePattern;
@@ -162,7 +173,52 @@ class DoConcurrentConversion
assert(loop.getRegion().hasOneBlock());
mlir::Block &loopBlock = loop.getRegion().getBlocks().front();
- // Collect iteration variable(s) allocations do that we can move them
+ // Handle localization
+ if (!loop.getLocalVars().empty()) {
+ mlir::OpBuilder::InsertionGuard guard(rewriter);
+ rewriter.setInsertionPointToStart(&loop.getRegion().front());
+
+ std::optional<mlir::ArrayAttr> localSyms = loop.getLocalSyms();
+
+ for (auto [localVar, localArg, localizerSym] : llvm::zip_equal(
+ loop.getLocalVars(), loop.getRegionLocalArgs(), *localSyms)) {
+ mlir::SymbolRefAttr localizerName =
+ llvm::cast<mlir::SymbolRefAttr>(localizerSym);
+ fir::LocalitySpecifierOp localizer = findLocalizer(loop, localizerName);
+
+ mlir::Value localAlloc =
+ rewriter.create<fir::AllocaOp>(loop.getLoc(), localizer.getType());
+
+ if (localizer.getLocalitySpecifierType() ==
+ fir::LocalitySpecifierType::LocalInit) {
+ // It is reasonable to make this assumption since, at this stage,
+ // control-flow ops are not converted yet. Therefore, things like `if`
+ // conditions will still be represented by their encapsulating `fir`
+ // dialect ops.
+ assert(localizer.getCopyRegion().hasOneBlock() &&
+ "Expected localizer to have a single block.");
+ mlir::Block *beforeLocalInit = rewriter.getInsertionBlock();
+ mlir::Block *afterLocalInit = rewriter.splitBlock(
+ rewriter.getInsertionBlock(), rewriter.getInsertionPoint());
+ rewriter.cloneRegionBefore(localizer.getCopyRegion(), afterLocalInit);
+ mlir::Block *copyRegionBody = beforeLocalInit->getNextNode();
+
+ rewriter.eraseOp(copyRegionBody->getTerminator());
+ rewriter.mergeBlocks(afterLocalInit, copyRegionBody);
+ rewriter.mergeBlocks(copyRegionBody, beforeLocalInit,
+ {localVar, localArg});
+ }
+
+ rewriter.replaceAllUsesWith(localArg, localAlloc);
+ }
+
+ loop.getRegion().front().eraseArguments(loop.getNumInductionVars(),
+ loop.getNumLocalOperands());
+ loop.getLocalVarsMutable().clear();
+ loop.setLocalSymsAttr(nullptr);
+ }
+
+ // Collect iteration variable(s) allocations so that we can move them
// outside the `fir.do_concurrent` wrapper.
llvm::SmallVector<mlir::Operation *> opsToMove;
for (mlir::Operation &op : llvm::drop_end(wrapperBlock))
diff --git a/flang/test/Transforms/do_concurrent-to-do_loop-unodered.fir b/flang/test/Transforms/do_concurrent-to-do_loop-unodered.fir
index d2ceafdda5b22..d9ef36b175598 100644
--- a/flang/test/Transforms/do_concurrent-to-do_loop-unodered.fir
+++ b/flang/test/Transforms/do_concurrent-to-do_loop-unodered.fir
@@ -121,3 +121,64 @@ func.func @dc_2d_reduction(%i_lb: index, %i_ub: index, %i_st: index,
// CHECK: }
// CHECK: return
// CHECK: }
+
+// -----
+
+fir.local {type = local} @local_localizer : i32
+
+fir.local {type = local_init} @local_init_localizer : i32 copy {
+^bb0(%arg0: !fir.ref<i32>, %arg1: !fir.ref<i32>):
+ %0 = fir.load %arg0 : !fir.ref<i32>
+ fir.store %0 to %arg1 : !fir.ref<i32>
+ fir.yield(%arg1 : !fir.ref<i32>)
+}
+
+func.func @do_concurrent_locality_specs() {
+ %3 = fir.alloca i32 {bindc_name = "local_init_var", uniq_name = "_QFdo_concurrentElocal_init_var"}
+ %4:2 = hlfir.declare %3 {uniq_name = "_QFdo_concurrentElocal_init_var"} : (!fir.ref<i32>) -> (!fir.ref<i32>, !fir.ref<i32>)
+ %5 = fir.alloca i32 {bindc_name = "local_var", uniq_name = "_QFdo_concurrentElocal_var"}
+ %6:2 = hlfir.declare %5 {uniq_name = "_QFdo_concurrentElocal_var"} : (!fir.ref<i32>) -> (!fir.ref<i32>, !fir.ref<i32>)
+ %c1 = arith.constant 1 : index
+ %c10 = arith.constant 1 : index
+ fir.do_concurrent {
+ %9 = fir.alloca i32 {bindc_name = "i"}
+ %10:2 = hlfir.declare %9 {uniq_name = "_QFdo_concurrentEi"} : (!fir.ref<i32>) -> (!fir.ref<i32>, !fir.ref<i32>)
+ fir.do_concurrent.loop (%arg0) = (%c1) to (%c10) step (%c1) local(@local_localizer %6#0 -> %arg1, @local_init_localizer %4#0 -> %arg2 : !fir.ref<i32>, !fir.ref<i32>) {
+ %11 = fir.convert %arg0 : (index) -> i32
+ fir.store %11 to %10#0 : !fir.ref<i32>
+ %13:2 = hlfir.declare %arg1 {uniq_name = "_QFdo_concurrentElocal_var"} : (!fir.ref<i32>) -> (!fir.ref<i32>, !fir.ref<i32>)
+ %15:2 = hlfir.declare %arg2 {uniq_name = "_QFdo_concurrentElocal_init_var"} : (!fir.ref<i32>) -> (!fir.ref<i32>, !fir.ref<i32>)
+ %17 = fir.load %10#0 : !fir.ref<i32>
+ %c5_i32 = arith.constant 5 : i32
+ %18 = arith.cmpi slt, %17, %c5_i32 : i32
+ fir.if %18 {
+ %c42_i32 = arith.constant 42 : i32
+ hlfir.assign %c42_i32 to %13#0 : i32, !fir.ref<i32>
+ } else {
+ %c84_i32 = arith.constant 84 : i32
+ hlfir.assign %c84_i32 to %15#0 : i32, !fir.ref<i32>
+ }
+ }
+ }
+ return
+}
+
+// CHECK-LABEL: func.func @do_concurrent_locality_specs() {
+// CHECK: %[[LOC_INIT_DECL:.*]]:2 = hlfir.declare %{{.*}} {uniq_name = "{{.*}}Elocal_init_var"}
+// CHECK: fir.do_loop %{{.*}} = %{{.*}} to %{{.*}} step %{{.*}} unordered {
+// Verify localization of the `local` var.
+// CHECK: %[[PRIV_LOC_ALLOC:.*]] = fir.alloca i32
+
+// Verify localization of the `local_init` var.
+// CHECK: %[[PRIV_LOC_INIT_ALLOC:.*]] = fir.alloca i32
+// CHECK: %[[LOC_INIT_VAL:.*]] = fir.load %[[LOC_INIT_DECL]]#0 : !fir.ref<i32>
+// CHECK: fir.store %[[LOC_INIT_VAL]] to %[[PRIV_LOC_INIT_ALLOC]] : !fir.ref<i32>
+
+// CHECK: %[[VAL_15:.*]]:2 = hlfir.declare %[[PRIV_LOC_ALLOC]]
+// CHECK: %[[VAL_16:.*]]:2 = hlfir.declare %[[PRIV_LOC_INIT_ALLOC]]
+
+// CHECK: hlfir.assign %{{.*}} to %[[VAL_15]]#0 : i32, !fir.ref<i32>
+// CHECK: hlfir.assign %{{.*}} to %[[VAL_16]]#0 : i32, !fir.ref<i32>
+// CHECK: }
+// CHECK: return
+// CHECK: }
|
33312c3
to
e673a6f
Compare
07c29b3
to
18d42fc
Compare
fir.do_concurrent
locality specs to fir.do_loop ... unordered
fir.do_concurrent
locality specs to fir.do_loop ... unordered
fir.do_concurrent
locality specs to fir.do_loop ... unordered
fir.do_concurrent
locality specs to fir.do_loop ... unordered
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This needs a TODO message for locality specifiers which are not yet supported (as I understand it this means ones with init or dealloc regions)
mlir::Value localAlloc = | ||
rewriter.create<fir::AllocaOp>(loop.getLoc(), localizer.getType()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if this should be done on the heap because it will happen for every loop iteration. Does the spec say that it has to be on the stack?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nothing in the spec says it has to be on the stack AFAICT. However, this is following the behavior of handleLocalitySpecs
in Bridge.cpp
(without delayed localization). Added a todo, let me know if you disagree with looking into this later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please could you use TODO(loc, "message") so that the compiler crashes (in a controlled way) instead of producing incorrect code if any of these do sneak through.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you meant for the init
and dealloc
todo right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ahh yes, sorry for commenting in a confusing place
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All good. Done.
e673a6f
to
3562e43
Compare
18d42fc
to
a4acb56
Compare
Done. I think we might end up not supporting |
Adds a new `fir.local` op to model `local` and `local_init` locality specifiers. This op is a clone of `omp.private`. In particular, this new op also models the privatization/localization logic of an SSA value in the `fir` dialect just like `omp.private` does for OpenMP. PR stack: - #137928 - #138505 (this PR) - #138506 - #138512 - #138534 - #138816
3562e43
to
bf2bd22
Compare
…138505) Adds a new `fir.local` op to model `local` and `local_init` locality specifiers. This op is a clone of `omp.private`. In particular, this new op also models the privatization/localization logic of an SSA value in the `fir` dialect just like `omp.private` does for OpenMP. PR stack: - llvm/llvm-project#137928 - llvm/llvm-project#138505 (this PR) - llvm/llvm-project#138506 - llvm/llvm-project#138512 - llvm/llvm-project#138534 - llvm/llvm-project#138816
…r.do_loop ... unordered` Extends lowering `fir.do_concurrent` to `fir.do_loop ... unordered` by adding support for locality specifiers. In particular, for `local` specifiers, a `fir.alloca` op is created using the localizer type. For `local_init` specifiers, the `copy` region is additionally inlined in the `do concurrent` loop's body.
f94ef63
to
a505cf5
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the updates
…pecs to `fir.do_loop ... unordered` (#138512) Extends lowering `fir.do_concurrent` to `fir.do_loop ... unordered` by adding support for locality specifiers. In particular, for `local` specifiers, a `fir.alloca` op is created using the localizer type. For `local_init` specifiers, the `copy` region is additionally inlined in the `do concurrent` loop's body. PR stack: - llvm/llvm-project#137928 - llvm/llvm-project#138505 - llvm/llvm-project#138506 - llvm/llvm-project#138512 (this PR) - llvm/llvm-project#138534 - llvm/llvm-project#138816
Extends lowering
fir.do_concurrent
tofir.do_loop ... unordered
by adding support for locality specifiers.In particular, for
local
specifiers, afir.alloca
op is created using the localizer type. Forlocal_init
specifiers, thecopy
region is additionally inlined in thedo concurrent
loop's body.PR stack:
do concurrent
loop nests tofir.do_concurrent
#137928fir.local
op for locality specifiers #138505fir.do_concurrent.loop
#138506fir.do_concurrent
locality specs tofir.do_loop ... unordered
#138512 (this PR)