Skip to content

Uniformity Analysis: incorrect control-divergence result. #137277

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
jgu222 opened this issue Apr 25, 2025 · 6 comments
Open

Uniformity Analysis: incorrect control-divergence result. #137277

jgu222 opened this issue Apr 25, 2025 · 6 comments
Assignees

Comments

@jgu222
Copy link
Contributor

jgu222 commented Apr 25, 2025

The uniform analysis fails to recognize PHI node as divergence.

The test (phi_issue.ll) using AMDGPU to get divergence branch. The uniformity
analysis marks all PHI nodes at B6 as uniform, which is incorrect.
Basically, it is the following simple if-then-else (except there are multiple
basic blocks in both then and else brances):

       div-if  ;  divergent branch
      /       \
  x0=1    x1 = 2
     \        /
      \     /
    x = phi  x0, x1 

x is marked as UNIFORM, which is incorrect as div-if is divergent branch


;
; opt -mtriple amdgcn-unknown-amdhsa -passes='print<uniformity>' -disable-output  phi_issue.ll
;
define amdgpu_kernel void @test_ctrl_divergence(i32 %a, i32 %b, i32 %c, i32 %d) {
Entry:
  %tid = call i32 @llvm.amdgcn.workitem.id.x()
  %div.cond = icmp eq i32 %tid, 0
  br i1 %div.cond, label %B3, label %B0 ; divergent branch

B0:
  %a0 = add i32 %a, 1
  br label %B1

B1:
  %b0 = add i32 %b, 2
  br label %B2
  
B2:
  %c0 = add i32 %c, 3
  br label %B6

B3:
  %a1 = add i32 %a, 10
  br label %B4

B4:
  %b1 = add i32 %b, 20
  br label %B5
  
B5:
  %c1 = add i32 %c, 30
  br label %B6

B6:
  %div_a = phi i32 [%a0, %B2], [%a1,  %B5]
  %div_b = phi i32 [%b0, %B2], [%b1,  %B5]
  %div_c = phi i32 [%c0, %B2], [%c1,  %B5]
  br i1 %div.cond, label %B8, label %B7 ; divergent branch

B7:
  %d1 = add i32 %d, 1
  br label %B8

B8:
  %div_d = phi i32 [%d1, %B7], [%d, %B6]
  ret void
}


declare i32 @llvm.amdgcn.workitem.id.x() #0  

attributes #0 = {nounwind readnone }

%opt -mtriple amdgcn-unknown-amdhsa -passes='print<uniformity>' -disable-output  phi_issue.ll
...
LOCK Entry
DEFINITIONS
  DIVERGENT:   %tid = call i32 @llvm.amdgcn.workitem.id.x()
  DIVERGENT:   %div.cond = icmp eq i32 %tid, 0
TERMINATORS
  DIVERGENT:   br i1 %div.cond, label %B3, label %B0
END BLOCK
......
BLOCK B6
DEFINITIONS
               %div_a = phi i32 [ %a0, %B2 ], [ %a1, %B5 ]
               %div_b = phi i32 [ %b0, %B2 ], [ %b1, %B5 ]
               %div_c = phi i32 [ %c0, %B2 ], [ %c1, %B5 ]
TERMINATORS
  DIVERGENT:   br i1 %div.cond, label %B8, label %B7
END BLOCK
@jgu222
Copy link
Contributor Author

jgu222 commented Apr 28, 2025

The issue seems to be "early stop" logic inside GenericUniformityImpl.h/computeJoinPoints(), in which it uses FloorIdx to check early exit. That code seems to be from the previous code (no longer in the llvm trunk):

    SyncDependenceAnalysis.cpp/DivergencePropagator()

The original
commit 05ae04c
[DA][SDA] SyncDependenceAnalysis re-write

Differential Revision: https://reviews.llvm.org/D84413

This code was later to be refactored to support irreducible CFG (I think) from the following:
commit 475ce4c
RFC: Uniformity Analysis for Irreducible Control Flow
Differential Revision: https://reviews.llvm.org/D130746


If adding the following and the it can work correct.

diff --git a/llvm/include/llvm/ADT/GenericUniformityImpl.h b/llvm/include/llvm/ADT/GenericUniformityImpl.h
index d10355fff1be..1fbb245618d4 100644
--- a/llvm/include/llvm/ADT/GenericUniformityImpl.h
+++ b/llvm/include/llvm/ADT/GenericUniformityImpl.h
@@ -704,6 +704,7 @@ public:
         FloorIdx = LoweredFloorIdx;
         FloorLabel = Label;
       }
+      FloorIdx = LoweredFloorIdx;
     }

@jgu222
Copy link
Contributor Author

jgu222 commented Apr 28, 2025

I'd like to work on a fix. Could someone change it to "good first issue" and assign it to jgu222. thx

@ssahasra
Copy link
Collaborator

ssahasra commented May 5, 2025

Hi Junjie, apologies for the late reply! I have reassigned the issue to you. Please feel very welcome at contributing a fix!

@ssahasra
Copy link
Collaborator

ssahasra commented May 6, 2025

Adding @nhaehnle. I believe the initial analysis is correct. There is a problem with the early exit criterion that tracks the BlockIdx and FloorIdx, which are indexes into the post-order listing of basic blocks. The proposed fix will eliminate the bug, but it will defeat the purpose of the early exit. The general idea is that when we start traversing the POT starting with DivTermBlock, we want to stop once we are sure that there are no pending join blocks to be discovered. Otherwise, for every divergent branch, we may end up traversing upto the cycle exits (or function exit if we are not in any cycle).

The implemented approach happens to work only when there is at most one block on each path between a branch and its join block. It seems that this assumption was true for most uses of UniformityAnalysis where it matters, which is the AMDGPU backend. That's why it was never noticed until now. The correct solution is to set FloorIdx to the immediate post-dominator of DivTermBlock. The traversal does not need to continue beyond this point because there are no more join blocks left to discover.

[Informational note: Separately, the current implementation will work if the CFG is reconverging, i.e., one successor of every branch post-dominates the other successor of the branch. Maybe that was the assumption in the original code, but that assumption was never true when we submitted this implementation upstream.]

@jgu222
Copy link
Contributor Author

jgu222 commented May 7, 2025

Yes, setting FloorIdx to the IPD of DivTermBlock seems working. I have a similar approach (based on this IPD concept). It does not use FloorIdx, rather checking FreshLabels.count() (replacing while (true) with while (FreshLabels.count() > 1)). [ draft: https://github.com//pull/138806 ]

But my local patch have two failures on lit tests, both are for irreducible tests. Looks to me that the irreducible cycles do not have head-rewired cycles done when doing propagation (not sure if they should be).

jgu222 added a commit to jgu222/llvm-project that referenced this issue May 13, 2025
Control-divergence finds joins by propagating labels from the divergent
control branch. The code that checks the early stop for propagation is
not correct in some cases.

This change fixes this issue by stopping at the post-dominator of
the successors of the divergent branch.

llvm#137277
@jgu222
Copy link
Contributor Author

jgu222 commented May 13, 2025

New PR : #139667 (old one is closed).

Basically try to stop at the immediate post-dominator of divergent branch's successors. For a reducible graph, the way to check if it is the IPD is to check if there is only one block in FreshLabels.

Not faimiliar with the analysis of irreducible graph, so the change lets the code to propagate through the entire irreducible cycles until the node are no longer inside irreducible cycle. @ssahasra , please help check if it makes sense.
I have to disable one irreducible lit test. With this PR, that lit test have different analysis info, although divergence info are identical with and without this PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants