-
Notifications
You must be signed in to change notification settings - Fork 14.3k
[flang][OpenMP] Enable tiling #143715
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
[flang][OpenMP] Enable tiling #143715
Conversation
✅ With the latest revision this PR passed the C/C++ code formatter. |
collapseValue = collapseValue - numTileSizes; | ||
int64_t result = collapseValue > numTileSizes ? collapseValue : numTileSizes; | ||
return result; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you explain to me why we need this calculation? To see the effect of this, I tried replacing these few lines with simply return collapseValue;
and ran all tests but no tests failed. So it seems this part is not tested. A test can also help explaining the purpose of the change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The general thing this computes is the number of loops that need to be considered in the source code. If you have collapse(4) on a loop nest with 2 loops that would be incorrect since we can max collapse 2 loops. However tiling creates new loops, so collapse(4) would theoretically be legal if tiling is done first e.g. tile(5,10) since that will result in 4 loops. This is not really testable though since collapse requires independent loops, which is only true for the 2 outer loops after tiling is done. There is a check for this, and an error message is given if the collapse value is larger than the number of loops that are tiled to prevent incorrect code. We could just use numTileSizes if that is present, but if collapse could handle dependent loops in the future the above calculation should be the correct one.
flang/lib/Lower/OpenMP/Utils.cpp
Outdated
std::int64_t sizesLengthValue = 0l; | ||
if (auto *clause = | ||
ClauseFinder::findUniqueClause<omp::clause::Sizes>(clauses)) { | ||
sizesLengthValue = clause->v.size(); | ||
found = true; | ||
} | ||
|
||
collapseValue = collapseValue - sizesLengthValue; | ||
collapseValue = | ||
collapseValue < sizesLengthValue ? sizesLengthValue : collapseValue; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This as well does not seem to affect on tests added/modified by the PR.
Let me know if I missed something.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No you didn't miss anything, and because of the limitations with tile/collapse only one case (tile vs no tile) is possible. We could go with a simpler calculation if we want to.
It was also a but of an illustration for when we allow multiple loop transformations it could be pretty tricky to know what is legal and not depending on how they are combined.
!CHECK: omp.loop_nest | ||
!CHECK-SAME: tiles(2, 5, 10) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add more details check lines for the loop nest structure?
Looking at the generated IR, we have 3 ops to represent the loop:
- The
i
loop is anomp.loop_nest
with atile
clause attached to it. - The
j
andk
loops arefir.do_loop
ops.
I kind of expected that the 3 loops will be collapsed into one omp.loop_nest
op. But I am probably still missing the big picture here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It should be a single omp.loop_nest op, if not then it is a bug. I should add to the omp.loop_nest op verifier that the number of tiles is not more than the number of loops that it represents.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I ran it on my machine and I'm getting this:
omp.wsloop private(@_QFtripleloopEi_private_i32 %8#0 -> %arg1, @_QFtripleloopEj_private_i32 %10#0 -> %arg2, @_QFtripleloopEk_private_i32 %12#0 -> %arg3 : !fir.ref, !fir.ref, !fir.ref) {
omp.loop_nest (%arg4, %arg5, %arg6) : i32 = (%c1_i32, %c1_i32_1, %c1_i32_3) to (%14, %15, %16) inclusive step (%c1_i32_0, %c1_i32_2, %c1_i32_4) tiles(2, 5, 10) {
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, sorry for the confusion. It turns out I commented out parts of your code (to understand their effects) and forgot to uncomment them again before testing.
Well, the collapsone value calculation that I was asking about here is the part responsible for computing the proper collapse value for tiled loops (for that particular test at least).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No problem, I added checks so that collapse(N) or tiles(1..N) makes sure that N is not greater than the number of loops in the omp.loop_nest op. It doesn't take into account the interaction between tiling and collapse, but we could add this also.
@@ -131,20 +133,48 @@ class CanonicalizationOfOmp { | |||
// Ignore compiler directives. | |||
if (GetConstructIf<parser::CompilerDirective>(*nextIt)) | |||
continue; | |||
// Keep track of the loops to handle the end loop directives | |||
std::stack<parser::OpenMPLoopConstruct *> loops; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please use std::vector
instead of stack unless you need to add/remove elements from both ends.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
llvm::SmallVector would be preferred unless you may need to store more than 4294967295 elements
https://llvm.org/docs/ProgrammersManual.html#vector
std::stack is just a wrapper around the actual container, std::deque
by default, but std::vector is possible as well (a stack only allows adding/removing elements from one end anyway). Unfortuntely, std::deque is a quite bulky container, e.g. 4096 bytes object size (!) with libc++.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is quite bad! Changed it to llvm::SmallVector.
// ExecutableConstruct -> DoConstruct | ||
// [ExecutableConstruct -> OmpEndLoopDirective] | ||
// ExecutableConstruct -> OmpEndLoopDirective (if available) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you mean to delete this line?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is the optional OmpEndLoopDirective associated with the optional BeginLoopDirective for the tile construct, so it is meant to be there. I will update the "After rewriting" comment to hopefully clarify what is going on.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done updating comments.
std::get<std::optional<parser::OmpEndLoopDirective>>( | ||
loops.top()->t) = std::move(*endDir); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should check if the actual directive matches the "begin loop" directive, e.g
!$omp do
!$omp tile ...
do ...
enddo
!$omp end do ! should be paired with "do", not "tile"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I looked into this and before this there was not checking for any mismatches, however in CheckDirectiveStructure.h is the code that makes sure that things are matched properly and errors out if not. I will take a look at this code and see if it needs improvement in any way. If there are no tests cases using tile in clause-validiy01/02 I can add that also.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the !$omp end tile optional? In that case we would need a special case for tile, since generally the mismatches are not handled here. If it is not optional then it should be handled later on in the checker.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I went with the special handling.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the !$omp end tile optional? In that case we would need a special case for tile, since generally the mismatches are not handled here. If it is not optional then it should be handled later on in the checker.
Yes, for most of the loop directives the end-directive is optional.
std::get<std::optional< | ||
common::Indirection<parser::OpenMPLoopConstruct>>>( | ||
loops.top()->t) = std::move(*innerOmpLoop); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you assign the lhs to a variable? You're using the same expression again in the "loops.push" call.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
CollectAssociatedLoopLevelsFromLoopConstruct( | ||
innerOptional.value().value(), levels, clauses); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doesn't the meaning of "level" change (by something like +1 or -1) when going to an inner construct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The level here refers to the number of loops in the source code that eventually get combined into a single omp.loop_nest op, so it will be the number of tiled loops or the collapse value that will determine the level.
CollectAssociatedLoopLevelFromClauseValue<parser::OmpClause::Ordered>( | ||
clause, levels, clauses); | ||
CollectAssociatedLoopLevelFromClauseValue<parser::OmpClause::Collapse>( | ||
clause, levels, clauses); | ||
CollectAssociatedLoopLevelFromClauseSize<parser::OmpClause::Sizes>( | ||
clause, levels, clauses); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would probably be easier to read if these functions were inlined here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
std::int64_t tileLevel = 0; | ||
for (auto [level, clause] : llvm::zip_equal(levels, clauses)) { | ||
if (isSizesClause(clause)) { | ||
tileLevel = level; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't it be tileLevel += level
? You can stack tile
s...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is unsupported right now. We don't have a way of representing nested tiling in MLIR, but will be possible when the loop transformation ops are added. I can add a check and report an error for now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Turns out nesting tile already causes parse errors, so I'm going to leave it as-is for now if that is okay?
8e08ac3
to
753653d
Compare
Making this into a regular PR and adding more reviewers since we are getting more into the details. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Took another look this time trying to understand the approach on a higher level. I have one concern/question. It seems like we are "hiding" the tile
directive too early in the pipeline (i.e. during PFT to the omp
dialect); both for the combined and the standalone cases. I would have expected to model tile
as a separate op in the dialect and handle mapping combined and standalone versions later in the compiler. In the standalone case, for instance, we conflate the loop
and tile
directives in one op which overloads omp.loop
with more semantics than it is intended to carry.
To me, the tile
is similar to simd
in the sense that it is a loop wrapper op that controls division of the loop nests' iterations. So I would have expected both to have similar modelling (i.e. a separate op that impelements the LoopWrapperInterface
).
You probably have good reason not to do what I suggested so just asking for my understanding.
std::tuple<OmpBeginLoopDirective, std::optional<DoConstruct>, | ||
std::optional<common::Indirection<OpenMPLoopConstruct>>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: can you add a comment here explaining that this is used mainly for tiling constructs?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this strictly only support nesting of two loop constructs or do you plan for it to work like a linked list: with each loop construct pointing to the next inner-most construct?
flang/lib/Lower/OpenMP/OpenMP.cpp
Outdated
@@ -2131,6 +2170,42 @@ genLoopOp(lower::AbstractConverter &converter, lower::SymMap &symTable, | |||
return loopOp; | |||
} | |||
|
|||
static mlir::omp::LoopOp genTiledLoopOp(lower::AbstractConverter &converter, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we reuse genLoopOp
instead of adding a new function?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I think that should be fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
The reason for not doing this right now is that to use a separate op for tiling, we would need the more of the complete implementation which includes canonical loop info (CLI) values, a separate tile operation, and extending the loop nest op in a different way. This is the next step that @Meinersbur is working on, but we wanted to make tiling available sooner. |
last.clauses.push_back(node); | ||
return true; | ||
} else { | ||
// llvm::errs() << "** OVERRIDING isAllowedClauseForDirective **\n"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kparzysz I'm not sure if there is a way to override this in a cleaner way? Is it easy to introduce a new internal clause (e..g tile_sizes) and instead of just moving the sizes clause from tile to the outer construct, we instead translate it into the internal clause? I'm not sure if there are surrounding data structures that will make this hard, like location information and such?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There should be only one leaf in the compound directive that "sizes" can apply to. Are you expecting a deviation from this in your implementation?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, since we are moving the sizes clause to the parent directive from tile to whatever happens to enclose it. That's why I was thinking about adding a new internal clause to specify tiling (just like collapse is specified) in the case where the tiling is not stand alone. This will no longer be needed though once @Meinersbur's patch is merged and we would use a separate op for tiling instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I looked into adding a new clause and allow it wherever collapse is allowed, but it doesn't work well because it affects clang. Practically the effect of always allowing the sizes clause almost the same as only allowing it in the same places as collapse, since collapse is allowed in almost all loop directives, but it is still not good. I will look for a different solution if possible, maybe keeping things separate until lowering to the omp.loop_nest op is doable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
An alternative would be to leave this unchanged and then modify the output of this.
Edit: Or perhaps remove "sizes" from the clause list at the beginning, and store the values somewhere else.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that this is best left unchanged. I'm looking into making changes to the genOMP functions so that we can either store the values as you suggest, or update the generated omp.loop_nest op afterwards. It depends on which direction is easier to pass information.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of trying to pass information around, the parse tree is available when collecting the information to create the loop nest ops, so that's the solution I went with. We no longer try to move the sizes clause to the parent loop construct.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Marked as unresolved again, hit the button by accident. @kparzysz I think the new solution is better since no information needs to be passed anywhere, we just don't get the to use the conveniences from the PFT.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the sizes clause is on the directive that allows it, then do you need to make any changes to this file at all?
Are you against a deadline for the tiling support? @Meinersbur's patch is already up for review and looked high quality to me. I don't think it is that far away from being approved. Maybe the tiling support should be based upon that patch. |
Yes, there is a bit of a deadline, however @Meinersbur 's patch is close to being approved it is probably best to get that in first, and then rework this patch. My concern is that this patch will cause merge issues that would delay the more permanent solution, which is not ideal. |
Looking at the other patch, the overlap is actually not that big since it focuses on stand-alone unrolling only, so I think the order is not that critical. |
@@ -112,15 +111,19 @@ class CanonicalizationOfOmp { | |||
// in the same iteration | |||
// | |||
// Original: | |||
// ExecutableConstruct -> OpenMPConstruct -> OpenMPLoopConstruct | |||
// OmpBeginLoopDirective | |||
// ExecutableConstruct -> OpenMPConstruct -> OpenMPLoopConstruct t-> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I found the implementation of this function quite complex to understand tbh, for me at least. Specially with the 2 nested while
loops (the old one and the one introduced by the PR).
While trying to understand what is going on, I tried to simpify it by flattening the logic a little bit:
void RewriteOpenMPLoopConstruct(parser::OpenMPLoopConstruct &x,
parser::Block &block, parser::Block::iterator it) {
// Check the sequence of DoConstruct and OmpEndLoopDirective
// in the same iteration
//
// Original:
// ExecutableConstruct -> OpenMPConstruct -> OpenMPLoopConstruct t->
// OmpBeginLoopDirective t-> OmpLoopDirective
// [ExecutableConstruct -> OpenMPConstruct -> OpenMPLoopConstruct u->
// OmpBeginLoopDirective t-> OmpLoopDirective t-> Tile v-> OMP_tile]
// ExecutableConstruct -> DoConstruct
// [ExecutableConstruct -> OmpEndLoopDirective] (note: tile)
// ExecutableConstruct -> OmpEndLoopDirective (if available)
//
// After rewriting:
// ExecutableConstruct -> OpenMPConstruct -> OpenMPLoopConstruct t->
// [OpenMPLoopConstruct t -> OmpBeginLoopDirective -> OmpLoopDirective
// OmpEndLoopDirective] (note: tile)
// OmpBeginLoopDirective t -> OmpLoopDirective -> DoConstruct
// OmpEndLoopDirective (if available)
parser::Block::iterator nextIt;
auto &beginDir{std::get<parser::OmpBeginLoopDirective>(x.t)};
auto &dir{std::get<parser::OmpLoopDirective>(beginDir.t)};
// Ignore compiler directives.
nextIt = std::find_if_not(++it, block.end(), [this](auto &y) {
return GetConstructIf<parser::CompilerDirective>(y);
});
if (nextIt == block.end()) {
return;
}
auto &optionalInnerTile = std::get<
std::optional<common::Indirection<parser::OpenMPLoopConstruct>>>(x.t);
if (auto *innerConstruct{
GetConstructIf<parser::OpenMPConstruct>(*nextIt)}) {
if (auto *innerOmpLoop{
std::get_if<parser::OpenMPLoopConstruct>(&innerConstruct->u)}) {
auto &innerBeginDir{
std::get<parser::OmpBeginLoopDirective>(innerOmpLoop->t)};
auto &innerDir{std::get<parser::OmpLoopDirective>(innerBeginDir.t)};
if (innerDir.v == llvm::omp::Directive::OMPD_tile) {
optionalInnerTile = std::move(*innerOmpLoop);
// Retrieveing the address so that DoConstruct or inner loop can be
// set later.
nextIt = block.erase(nextIt);
}
}
}
if (auto *doCons{GetConstructIf<parser::DoConstruct>(*nextIt)}) {
if (doCons->GetLoopControl()) {
parser::OpenMPLoopConstruct *innermostLoop =
optionalInnerTile ? &optionalInnerTile.value().value() : &x;
std::get<std::optional<parser::DoConstruct>>(innermostLoop->t) =
std::move(*doCons);
nextIt = block.erase(nextIt);
} else {
messages_.Say(dir.source,
"DO loop after the %s directive must have loop control"_err_en_US,
parser::ToUpperCaseLetters(dir.source.ToString()));
}
} else {
messages_.Say(dir.source,
"A DO loop must follow the %s directive"_err_en_US,
parser::ToUpperCaseLetters(dir.source.ToString()));
}
auto tryToMatchEndLoopDirective = [&, this](
parser::OpenMPLoopConstruct *loop) {
if (nextIt == block.end()) {
return false;
}
if (auto *endDir{GetConstructIf<parser::OmpEndLoopDirective>(*nextIt)}) {
auto &endOmpDirective{std::get<parser::OmpLoopDirective>(endDir->t)};
auto &loopBegin{std::get<parser::OmpBeginLoopDirective>(loop->t)};
auto &loopDir{std::get<parser::OmpLoopDirective>(loopBegin.t)};
// If the directive is a tile we try to match the corresponding
// end tile if it exsists. If it is not a tile directive we
// always assign the end loop directive and fall back on the
// existing directive structure checks.
if (loopDir.v != llvm::omp::Directive::OMPD_tile ||
loopDir.v == endOmpDirective.v) {
std::get<std::optional<parser::OmpEndLoopDirective>>(loop->t) =
std::move(*endDir);
nextIt = block.erase(nextIt);
}
return true;
}
// If there is a mismatch bail out.
return false;
};
if (optionalInnerTile) {
if (tryToMatchEndLoopDirective(&optionalInnerTile.value().value())) {
tryToMatchEndLoopDirective(&x);
}
} else {
tryToMatchEndLoopDirective(&x);
}
}
This is a, hopefully, more simplified version that matches the original logic. Feel free to ignore if I got the logic wrong.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would like to keep the existing one for now, so other reviewers don't have to go through it again. We can make this a separate cleanup PR afterwards. I'd move the lambda function outside of this function definition btw since it is not passed anywhere.
…d the tile sizes through the parse tree when getting the information needed to create the loop nest ops.
// Keep track of the loops to handle the end loop directives | ||
llvm::SmallVector<parser::OpenMPLoopConstruct *> loops; | ||
loops.push_back(&x); | ||
if (auto *innerConstruct{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if (auto *innerConstruct{ | |
if (auto *innerConstruct{GetOmpIf<parser::OpenMPLoopConstruct>(*nextIt)}) { |
We can simplify here and get the OpenMPLoopConstruct directly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay thanks! I will fix that.
@@ -131,20 +134,59 @@ class CanonicalizationOfOmp { | |||
// Ignore compiler directives. | |||
if (GetConstructIf<parser::CompilerDirective>(*nextIt)) | |||
continue; | |||
// Keep track of the loops to handle the end loop directives |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you show how the Parse Tree looks with these changes, and include tests for this?
I am currently working on something similar for tile
and unroll
as they can both be nested within an OMP Loop Construct as part of #110008 so the semantics allow for the nesting.
I am not working on lowering, only semantics support.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, I can add a test for this.
std::tuple<OmpBeginLoopDirective, std::optional<DoConstruct>, | ||
std::optional<common::Indirection<OpenMPLoopConstruct>>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this strictly only support nesting of two loop constructs or do you plan for it to work like a linked list: with each loop construct pointing to the next inner-most construct?
@@ -2992,18 +2991,60 @@ convertOmpLoopNest(Operation &opInst, llvm::IRBuilderBase &builder, | |||
loopInfos.push_back(*loopResult); | |||
} | |||
|
|||
// Collapse loops. Store the insertion point because LoopInfos may get | |||
// invalidated. | |||
// llvm::OpenMPIRBuilder::InsertPointTy afterIP = builder.saveIP(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Was keeping this unintentional?
TileSizes.push_back(TileVal); | ||
} | ||
|
||
std::vector<llvm::CanonicalLoopInfo *> NewLoops = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: SmallVector
auto AfterBB = NewLoops.front()->getAfter(); | ||
auto AfterAfterBB = AfterBB->getSingleSuccessor(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit (feel free to ignore): personally I find it a bit error-prone when auto
hides that something is a pointer.
auto AfterBB = NewLoops.front()->getAfter(); | |
auto AfterAfterBB = AfterBB->getSingleSuccessor(); | |
auto *AfterBB = NewLoops.front()->getAfter(); | |
auto *AfterAfterBB = AfterBB->getSingleSuccessor(); |
Alternatively, spelling out auto
would be fine here too.
std::vector<llvm::CanonicalLoopInfo *> NewLoops = | ||
ompBuilder->tileLoops(ompLoc.DL, loopInfos, TileSizes); | ||
|
||
// Collapse loops. Store the insertion point because LoopInfos may get |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand in what sense this is collapsing the loops
This patch enables tiling in flang. It modifies the front-end to handle tiling as a nested construct if it is enlosed by an outer loop construct, or it can be stand-alone. In MLIR tiling is handled by changing the the omp.loop_nest op to be able to represent both collapse and tiling. In the MLIR->LLVM-IR the lowering of the LoopNestOp is enhanced to first do the tiling if present, then collapse.
Notes/Details:
We are able to do tiling, and combine tiling with collapse. In the parse tree after canonicalization the tile loop construct becomes nested inside any outer loop construct. For now the nesting is done for the tile construct, but keep the do-construct at the top level to minimize any impact on existing lowering/passes, and when we prepare for lowering we associate the sizes clause with the outermost loop construct so the sizes clause is available directly when we want to lower to the omp.loop_nest op. The omp.loop_nest op has been enhanced to explicitly have a 'collapse' and 'tiles' attribute so that the transforms can be applied when lowering the code to LLVM-IR. This is a temporary solution since we are working on the full solution using canonical loops and loop transformation ops. The idea was to make a minimal, but still reasonable solution. Maybe there are some some ways to clean up this code. Please let me know if you have some ideas on how to do this better. One we are happy with the structure I can fix any typos/formatting and make this into a real PR if we decide to go ahead and merge this.