Skip to content

[flang][OpenMP] Enable tiling #143715

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 20 commits into
base: main
Choose a base branch
from
Open

Conversation

jsjodin
Copy link
Contributor

@jsjodin jsjodin commented Jun 11, 2025

This patch enables tiling in flang. It modifies the front-end to handle tiling as a nested construct if it is enlosed by an outer loop construct, or it can be stand-alone. In MLIR tiling is handled by changing the the omp.loop_nest op to be able to represent both collapse and tiling. In the MLIR->LLVM-IR the lowering of the LoopNestOp is enhanced to first do the tiling if present, then collapse.

Notes/Details:
We are able to do tiling, and combine tiling with collapse. In the parse tree after canonicalization the tile loop construct becomes nested inside any outer loop construct. For now the nesting is done for the tile construct, but keep the do-construct at the top level to minimize any impact on existing lowering/passes, and when we prepare for lowering we associate the sizes clause with the outermost loop construct so the sizes clause is available directly when we want to lower to the omp.loop_nest op. The omp.loop_nest op has been enhanced to explicitly have a 'collapse' and 'tiles' attribute so that the transforms can be applied when lowering the code to LLVM-IR. This is a temporary solution since we are working on the full solution using canonical loops and loop transformation ops. The idea was to make a minimal, but still reasonable solution. Maybe there are some some ways to clean up this code. Please let me know if you have some ideas on how to do this better. One we are happy with the structure I can fix any typos/formatting and make this into a real PR if we decide to go ahead and merge this.

Copy link

github-actions bot commented Jun 11, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

@jsjodin jsjodin requested a review from Meinersbur June 16, 2025 14:58
Comment on lines +53 to +55
collapseValue = collapseValue - numTileSizes;
int64_t result = collapseValue > numTileSizes ? collapseValue : numTileSizes;
return result;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you explain to me why we need this calculation? To see the effect of this, I tried replacing these few lines with simply return collapseValue; and ran all tests but no tests failed. So it seems this part is not tested. A test can also help explaining the purpose of the change.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The general thing this computes is the number of loops that need to be considered in the source code. If you have collapse(4) on a loop nest with 2 loops that would be incorrect since we can max collapse 2 loops. However tiling creates new loops, so collapse(4) would theoretically be legal if tiling is done first e.g. tile(5,10) since that will result in 4 loops. This is not really testable though since collapse requires independent loops, which is only true for the 2 outer loops after tiling is done. There is a check for this, and an error message is given if the collapse value is larger than the number of loops that are tiled to prevent incorrect code. We could just use numTileSizes if that is present, but if collapse could handle dependent loops in the future the above calculation should be the correct one.

Comment on lines 637 to 646
std::int64_t sizesLengthValue = 0l;
if (auto *clause =
ClauseFinder::findUniqueClause<omp::clause::Sizes>(clauses)) {
sizesLengthValue = clause->v.size();
found = true;
}

collapseValue = collapseValue - sizesLengthValue;
collapseValue =
collapseValue < sizesLengthValue ? sizesLengthValue : collapseValue;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This as well does not seem to affect on tests added/modified by the PR.

Let me know if I missed something.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No you didn't miss anything, and because of the limitations with tile/collapse only one case (tile vs no tile) is possible. We could go with a simpler calculation if we want to.

It was also a but of an illustration for when we allow multiple loop transformations it could be pretty tricky to know what is legal and not depending on how they are combined.

Comment on lines 16 to 17
!CHECK: omp.loop_nest
!CHECK-SAME: tiles(2, 5, 10)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add more details check lines for the loop nest structure?

Looking at the generated IR, we have 3 ops to represent the loop:

  1. The i loop is an omp.loop_nest with a tile clause attached to it.
  2. The j and k loops are fir.do_loop ops.

I kind of expected that the 3 loops will be collapsed into one omp.loop_nest op. But I am probably still missing the big picture here.

Copy link
Contributor Author

@jsjodin jsjodin Jun 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should be a single omp.loop_nest op, if not then it is a bug. I should add to the omp.loop_nest op verifier that the number of tiles is not more than the number of loops that it represents.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I ran it on my machine and I'm getting this:
omp.wsloop private(@_QFtripleloopEi_private_i32 %8#0 -> %arg1, @_QFtripleloopEj_private_i32 %10#0 -> %arg2, @_QFtripleloopEk_private_i32 %12#0 -> %arg3 : !fir.ref, !fir.ref, !fir.ref) {
omp.loop_nest (%arg4, %arg5, %arg6) : i32 = (%c1_i32, %c1_i32_1, %c1_i32_3) to (%14, %15, %16) inclusive step (%c1_i32_0, %c1_i32_2, %c1_i32_4) tiles(2, 5, 10) {

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, sorry for the confusion. It turns out I commented out parts of your code (to understand their effects) and forgot to uncomment them again before testing.

Well, the collapsone value calculation that I was asking about here is the part responsible for computing the proper collapse value for tiled loops (for that particular test at least).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No problem, I added checks so that collapse(N) or tiles(1..N) makes sure that N is not greater than the number of loops in the omp.loop_nest op. It doesn't take into account the interaction between tiling and collapse, but we could add this also.

@@ -131,20 +133,48 @@ class CanonicalizationOfOmp {
// Ignore compiler directives.
if (GetConstructIf<parser::CompilerDirective>(*nextIt))
continue;
// Keep track of the loops to handle the end loop directives
std::stack<parser::OpenMPLoopConstruct *> loops;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use std::vector instead of stack unless you need to add/remove elements from both ends.

Copy link
Member

@Meinersbur Meinersbur Jun 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

llvm::SmallVector would be preferred unless you may need to store more than 4294967295 elements

https://llvm.org/docs/ProgrammersManual.html#vector

std::stack is just a wrapper around the actual container, std::deque by default, but std::vector is possible as well (a stack only allows adding/removing elements from one end anyway). Unfortuntely, std::deque is a quite bulky container, e.g. 4096 bytes object size (!) with libc++.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is quite bad! Changed it to llvm::SmallVector.

// ExecutableConstruct -> DoConstruct
// [ExecutableConstruct -> OmpEndLoopDirective]
// ExecutableConstruct -> OmpEndLoopDirective (if available)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you mean to delete this line?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is the optional OmpEndLoopDirective associated with the optional BeginLoopDirective for the tile construct, so it is meant to be there. I will update the "After rewriting" comment to hopefully clarify what is going on.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done updating comments.

Comment on lines 171 to 172
std::get<std::optional<parser::OmpEndLoopDirective>>(
loops.top()->t) = std::move(*endDir);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should check if the actual directive matches the "begin loop" directive, e.g

!$omp do
!$omp tile ...
do ...
enddo
!$omp end do    ! should be paired with "do", not "tile"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I looked into this and before this there was not checking for any mismatches, however in CheckDirectiveStructure.h is the code that makes sure that things are matched properly and errors out if not. I will take a look at this code and see if it needs improvement in any way. If there are no tests cases using tile in clause-validiy01/02 I can add that also.

Copy link
Contributor Author

@jsjodin jsjodin Jun 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the !$omp end tile optional? In that case we would need a special case for tile, since generally the mismatches are not handled here. If it is not optional then it should be handled later on in the checker.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I went with the special handling.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the !$omp end tile optional? In that case we would need a special case for tile, since generally the mismatches are not handled here. If it is not optional then it should be handled later on in the checker.

Yes, for most of the loop directives the end-directive is optional.

Comment on lines 147 to 149
std::get<std::optional<
common::Indirection<parser::OpenMPLoopConstruct>>>(
loops.top()->t) = std::move(*innerOmpLoop);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you assign the lhs to a variable? You're using the same expression again in the "loops.push" call.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Comment on lines +1991 to +1994
CollectAssociatedLoopLevelsFromLoopConstruct(
innerOptional.value().value(), levels, clauses);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't the meaning of "level" change (by something like +1 or -1) when going to an inner construct?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The level here refers to the number of loops in the source code that eventually get combined into a single omp.loop_nest op, so it will be the number of tiled loops or the collapse value that will determine the level.

Comment on lines 2024 to 2031
CollectAssociatedLoopLevelFromClauseValue<parser::OmpClause::Ordered>(
clause, levels, clauses);
CollectAssociatedLoopLevelFromClauseValue<parser::OmpClause::Collapse>(
clause, levels, clauses);
CollectAssociatedLoopLevelFromClauseSize<parser::OmpClause::Sizes>(
clause, levels, clauses);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would probably be easier to read if these functions were inlined here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

std::int64_t tileLevel = 0;
for (auto [level, clause] : llvm::zip_equal(levels, clauses)) {
if (isSizesClause(clause)) {
tileLevel = level;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't it be tileLevel += level? You can stack tiles...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is unsupported right now. We don't have a way of representing nested tiling in MLIR, but will be possible when the loop transformation ops are added. I can add a check and report an error for now.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Turns out nesting tile already causes parse errors, so I'm going to leave it as-is for now if that is okay?

@jsjodin jsjodin force-pushed the jleyonberg/tiling branch from 8e08ac3 to 753653d Compare June 19, 2025 19:54
@jsjodin jsjodin requested review from kparzysz and tblah June 21, 2025 16:33
@jsjodin jsjodin marked this pull request as ready for review June 21, 2025 16:33
@jsjodin
Copy link
Contributor Author

jsjodin commented Jun 21, 2025

Making this into a regular PR and adding more reviewers since we are getting more into the details.

Copy link
Member

@ergawy ergawy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Took another look this time trying to understand the approach on a higher level. I have one concern/question. It seems like we are "hiding" the tile directive too early in the pipeline (i.e. during PFT to the omp dialect); both for the combined and the standalone cases. I would have expected to model tile as a separate op in the dialect and handle mapping combined and standalone versions later in the compiler. In the standalone case, for instance, we conflate the loop and tile directives in one op which overloads omp.loop with more semantics than it is intended to carry.

To me, the tile is similar to simd in the sense that it is a loop wrapper op that controls division of the loop nests' iterations. So I would have expected both to have similar modelling (i.e. a separate op that impelements the LoopWrapperInterface).

You probably have good reason not to do what I suggested so just asking for my understanding.

std::tuple<OmpBeginLoopDirective, std::optional<DoConstruct>,
std::optional<common::Indirection<OpenMPLoopConstruct>>,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: can you add a comment here explaining that this is used mainly for tiling constructs?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this strictly only support nesting of two loop constructs or do you plan for it to work like a linked list: with each loop construct pointing to the next inner-most construct?

@@ -2131,6 +2170,42 @@ genLoopOp(lower::AbstractConverter &converter, lower::SymMap &symTable,
return loopOp;
}

static mlir::omp::LoopOp genTiledLoopOp(lower::AbstractConverter &converter,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we reuse genLoopOp instead of adding a new function?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I think that should be fine.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@jsjodin
Copy link
Contributor Author

jsjodin commented Jun 23, 2025

Took another look this time trying to understand the approach on a higher level. I have one concern/question. It seems like we are "hiding" the tile directive too early in the pipeline (i.e. during PFT to the omp dialect); both for the combined and the standalone cases. I would have expected to model tile as a separate op in the dialect and handle mapping combined and standalone versions later in the compiler. In the standalone case, for instance, we conflate the loop and tile directives in one op which overloads omp.loop with more semantics than it is intended to carry.

To me, the tile is similar to simd in the sense that it is a loop wrapper op that controls division of the loop nests' iterations. So I would have expected both to have similar modelling (i.e. a separate op that impelements the LoopWrapperInterface).

You probably have good reason not to do what I suggested so just asking for my understanding.

The reason for not doing this right now is that to use a separate op for tiling, we would need the more of the complete implementation which includes canonical loop info (CLI) values, a separate tile operation, and extending the loop nest op in a different way. This is the next step that @Meinersbur is working on, but we wanted to make tiling available sooner.

last.clauses.push_back(node);
return true;
} else {
// llvm::errs() << "** OVERRIDING isAllowedClauseForDirective **\n";
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kparzysz I'm not sure if there is a way to override this in a cleaner way? Is it easy to introduce a new internal clause (e..g tile_sizes) and instead of just moving the sizes clause from tile to the outer construct, we instead translate it into the internal clause? I'm not sure if there are surrounding data structures that will make this hard, like location information and such?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There should be only one leaf in the compound directive that "sizes" can apply to. Are you expecting a deviation from this in your implementation?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, since we are moving the sizes clause to the parent directive from tile to whatever happens to enclose it. That's why I was thinking about adding a new internal clause to specify tiling (just like collapse is specified) in the case where the tiling is not stand alone. This will no longer be needed though once @Meinersbur's patch is merged and we would use a separate op for tiling instead.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I looked into adding a new clause and allow it wherever collapse is allowed, but it doesn't work well because it affects clang. Practically the effect of always allowing the sizes clause almost the same as only allowing it in the same places as collapse, since collapse is allowed in almost all loop directives, but it is still not good. I will look for a different solution if possible, maybe keeping things separate until lowering to the omp.loop_nest op is doable.

Copy link
Contributor

@kparzysz kparzysz Jun 23, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An alternative would be to leave this unchanged and then modify the output of this.

Edit: Or perhaps remove "sizes" from the clause list at the beginning, and store the values somewhere else.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that this is best left unchanged. I'm looking into making changes to the genOMP functions so that we can either store the values as you suggest, or update the generated omp.loop_nest op afterwards. It depends on which direction is easier to pass information.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of trying to pass information around, the parse tree is available when collecting the information to create the loop nest ops, so that's the solution I went with. We no longer try to move the sizes clause to the parent loop construct.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Marked as unresolved again, hit the button by accident. @kparzysz I think the new solution is better since no information needs to be passed anywhere, we just don't get the to use the conveniences from the PFT.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the sizes clause is on the directive that allows it, then do you need to make any changes to this file at all?

@tblah
Copy link
Contributor

tblah commented Jun 23, 2025

This is the next step that @Meinersbur is working on, but we wanted to make tiling available sooner.

Are you against a deadline for the tiling support? @Meinersbur's patch is already up for review and looked high quality to me. I don't think it is that far away from being approved. Maybe the tiling support should be based upon that patch.

@jsjodin
Copy link
Contributor Author

jsjodin commented Jun 23, 2025

This is the next step that @Meinersbur is working on, but we wanted to make tiling available sooner.

Are you against a deadline for the tiling support? @Meinersbur's patch is already up for review and looked high quality to me. I don't think it is that far away from being approved. Maybe the tiling support should be based upon that patch.

Yes, there is a bit of a deadline, however @Meinersbur 's patch is close to being approved it is probably best to get that in first, and then rework this patch. My concern is that this patch will cause merge issues that would delay the more permanent solution, which is not ideal.

@jsjodin
Copy link
Contributor Author

jsjodin commented Jun 23, 2025

This is the next step that @Meinersbur is working on, but we wanted to make tiling available sooner.

Are you against a deadline for the tiling support? @Meinersbur's patch is already up for review and looked high quality to me. I don't think it is that far away from being approved. Maybe the tiling support should be based upon that patch.

Yes, there is a bit of a deadline, however @Meinersbur 's patch is close to being approved it is probably best to get that in first, and then rework this patch. My concern is that this patch will cause merge issues that would delay the more permanent solution, which is not ideal.

Looking at the other patch, the overlap is actually not that big since it focuses on stand-alone unrolling only, so I think the order is not that critical.

@@ -112,15 +111,19 @@ class CanonicalizationOfOmp {
// in the same iteration
//
// Original:
// ExecutableConstruct -> OpenMPConstruct -> OpenMPLoopConstruct
// OmpBeginLoopDirective
// ExecutableConstruct -> OpenMPConstruct -> OpenMPLoopConstruct t->
Copy link
Member

@ergawy ergawy Jun 25, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found the implementation of this function quite complex to understand tbh, for me at least. Specially with the 2 nested while loops (the old one and the one introduced by the PR).

While trying to understand what is going on, I tried to simpify it by flattening the logic a little bit:

  void RewriteOpenMPLoopConstruct(parser::OpenMPLoopConstruct &x,
      parser::Block &block, parser::Block::iterator it) {
    // Check the sequence of DoConstruct and OmpEndLoopDirective
    // in the same iteration
    //
    // Original:
    //   ExecutableConstruct -> OpenMPConstruct -> OpenMPLoopConstruct t->
    //     OmpBeginLoopDirective t-> OmpLoopDirective
    //   [ExecutableConstruct -> OpenMPConstruct -> OpenMPLoopConstruct u->
    //     OmpBeginLoopDirective t-> OmpLoopDirective t-> Tile v-> OMP_tile]
    //   ExecutableConstruct -> DoConstruct
    //   [ExecutableConstruct -> OmpEndLoopDirective] (note: tile)
    //   ExecutableConstruct -> OmpEndLoopDirective (if available)
    //
    // After rewriting:
    //   ExecutableConstruct -> OpenMPConstruct -> OpenMPLoopConstruct t->
    //     [OpenMPLoopConstruct t -> OmpBeginLoopDirective -> OmpLoopDirective
    //                               OmpEndLoopDirective] (note: tile)
    //     OmpBeginLoopDirective t -> OmpLoopDirective -> DoConstruct
    //     OmpEndLoopDirective (if available)
    parser::Block::iterator nextIt;
    auto &beginDir{std::get<parser::OmpBeginLoopDirective>(x.t)};
    auto &dir{std::get<parser::OmpLoopDirective>(beginDir.t)};

    // Ignore compiler directives.
    nextIt = std::find_if_not(++it, block.end(), [this](auto &y) {
      return GetConstructIf<parser::CompilerDirective>(y);
    });

    if (nextIt == block.end()) {
      return;
    }

    auto &optionalInnerTile = std::get<
        std::optional<common::Indirection<parser::OpenMPLoopConstruct>>>(x.t);

    if (auto *innerConstruct{
            GetConstructIf<parser::OpenMPConstruct>(*nextIt)}) {
      if (auto *innerOmpLoop{
              std::get_if<parser::OpenMPLoopConstruct>(&innerConstruct->u)}) {
        auto &innerBeginDir{
            std::get<parser::OmpBeginLoopDirective>(innerOmpLoop->t)};
        auto &innerDir{std::get<parser::OmpLoopDirective>(innerBeginDir.t)};
        if (innerDir.v == llvm::omp::Directive::OMPD_tile) {
          optionalInnerTile = std::move(*innerOmpLoop);
          // Retrieveing the address so that DoConstruct or inner loop can be
          // set later.
          nextIt = block.erase(nextIt);
        }
      }
    }

    if (auto *doCons{GetConstructIf<parser::DoConstruct>(*nextIt)}) {
      if (doCons->GetLoopControl()) {
        parser::OpenMPLoopConstruct *innermostLoop =
            optionalInnerTile ? &optionalInnerTile.value().value() : &x;
        std::get<std::optional<parser::DoConstruct>>(innermostLoop->t) =
            std::move(*doCons);
        nextIt = block.erase(nextIt);

      } else {
        messages_.Say(dir.source,
            "DO loop after the %s directive must have loop control"_err_en_US,
            parser::ToUpperCaseLetters(dir.source.ToString()));
      }
    } else {
      messages_.Say(dir.source,
          "A DO loop must follow the %s directive"_err_en_US,
          parser::ToUpperCaseLetters(dir.source.ToString()));
    }

    auto tryToMatchEndLoopDirective = [&, this](
                                          parser::OpenMPLoopConstruct *loop) {
      if (nextIt == block.end()) {
        return false;
      }

      if (auto *endDir{GetConstructIf<parser::OmpEndLoopDirective>(*nextIt)}) {
        auto &endOmpDirective{std::get<parser::OmpLoopDirective>(endDir->t)};
        auto &loopBegin{std::get<parser::OmpBeginLoopDirective>(loop->t)};
        auto &loopDir{std::get<parser::OmpLoopDirective>(loopBegin.t)};

        // If the directive is a tile we try to match the corresponding
        // end tile if it exsists. If it is not a tile directive we
        // always assign the end loop directive and fall back on the
        // existing directive structure checks.
        if (loopDir.v != llvm::omp::Directive::OMPD_tile ||
            loopDir.v == endOmpDirective.v) {
          std::get<std::optional<parser::OmpEndLoopDirective>>(loop->t) =
              std::move(*endDir);
          nextIt = block.erase(nextIt);
        }

        return true;
      }

      // If there is a mismatch bail out.
      return false;
    };

    if (optionalInnerTile) {
      if (tryToMatchEndLoopDirective(&optionalInnerTile.value().value())) {
        tryToMatchEndLoopDirective(&x);
      }
    } else {
      tryToMatchEndLoopDirective(&x);
    }
  }

This is a, hopefully, more simplified version that matches the original logic. Feel free to ignore if I got the logic wrong.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would like to keep the existing one for now, so other reviewers don't have to go through it again. We can make this a separate cleanup PR afterwards. I'd move the lambda function outside of this function definition btw since it is not passed anywhere.

jsjodin added 2 commits June 25, 2025 10:11
…d the tile

sizes through the parse tree when getting the information needed to create the
loop nest ops.
// Keep track of the loops to handle the end loop directives
llvm::SmallVector<parser::OpenMPLoopConstruct *> loops;
loops.push_back(&x);
if (auto *innerConstruct{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if (auto *innerConstruct{
if (auto *innerConstruct{GetOmpIf<parser::OpenMPLoopConstruct>(*nextIt)}) {

We can simplify here and get the OpenMPLoopConstruct directly.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay thanks! I will fix that.

@@ -131,20 +134,59 @@ class CanonicalizationOfOmp {
// Ignore compiler directives.
if (GetConstructIf<parser::CompilerDirective>(*nextIt))
continue;
// Keep track of the loops to handle the end loop directives
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you show how the Parse Tree looks with these changes, and include tests for this?

I am currently working on something similar for tile and unroll as they can both be nested within an OMP Loop Construct as part of #110008 so the semantics allow for the nesting.

I am not working on lowering, only semantics support.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, I can add a test for this.

std::tuple<OmpBeginLoopDirective, std::optional<DoConstruct>,
std::optional<common::Indirection<OpenMPLoopConstruct>>,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this strictly only support nesting of two loop constructs or do you plan for it to work like a linked list: with each loop construct pointing to the next inner-most construct?

@@ -2992,18 +2991,60 @@ convertOmpLoopNest(Operation &opInst, llvm::IRBuilderBase &builder,
loopInfos.push_back(*loopResult);
}

// Collapse loops. Store the insertion point because LoopInfos may get
// invalidated.
// llvm::OpenMPIRBuilder::InsertPointTy afterIP = builder.saveIP();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was keeping this unintentional?

TileSizes.push_back(TileVal);
}

std::vector<llvm::CanonicalLoopInfo *> NewLoops =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: SmallVector

Comment on lines +3017 to +3018
auto AfterBB = NewLoops.front()->getAfter();
auto AfterAfterBB = AfterBB->getSingleSuccessor();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit (feel free to ignore): personally I find it a bit error-prone when auto hides that something is a pointer.

Suggested change
auto AfterBB = NewLoops.front()->getAfter();
auto AfterAfterBB = AfterBB->getSingleSuccessor();
auto *AfterBB = NewLoops.front()->getAfter();
auto *AfterAfterBB = AfterBB->getSingleSuccessor();

Alternatively, spelling out auto would be fine here too.

std::vector<llvm::CanonicalLoopInfo *> NewLoops =
ompBuilder->tileLoops(ompLoc.DL, loopInfos, TileSizes);

// Collapse loops. Store the insertion point because LoopInfos may get
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand in what sense this is collapsing the loops

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants