-
Notifications
You must be signed in to change notification settings - Fork 13.5k
[MLIR][XeGPU] Add unroll patterns for XeGPU (1/N) #137010
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
✅ With the latest revision this PR passed the C/C++ code formatter. |
Can you please add some justification/explanation regarding those unroll patterns |
for (int64_t i = 0; i < mIters; i++) { | ||
for (int64_t j = 0; j < nIters; j++) { | ||
Value tmpC; | ||
if (c) | ||
tmpC = cVals[i * nIters + j]; // init with acc | ||
for (int64_t k = 0; k < kIters; k++) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LLVM coding standards want preincrements (++i): https://llvm.org/docs/CodingStandards.html#prefer-preincrement
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! Fixed.
auto targetShape = *maybeTargetShape; | ||
|
||
auto convertedTdescTypes = getUnrolledTypes(tdescTy, targetShape); | ||
auto convertedTdesc = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use explicit type instead of auto: https://llvm.org/docs/CodingStandards.html#use-auto-type-deduction-to-make-code-more-readable
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Many places in this code might be more readable with explicit types.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. Fixed.
if (any_of(ranges, [](auto &v) { return v.size() == 0; }) || | ||
all_of(ranges, [](auto &v) { return v.size() == 1; })) { | ||
return failure(); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No {} for single statement if-body: https://llvm.org/docs/CodingStandards.html#don-t-use-braces-on-simple-single-statement-bodies-of-if-else-loop-statements
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! Fixed.
} | ||
} | ||
|
||
if (isa<xegpu::DpasOp>(op)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no {}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks! fixed.
Added a few minor comments. On the monkey-level this looks good to me. |
e24d75f
to
e873d59
Compare
Thanks @fschlimb, I made the changes according to your feedback. I hope I addressed all of your concerns. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It'll be interesting to later see if we could generalize and reuse vector unrolling to achieve the same. For now, I think it's a good addition to xegpu infrastructure and we'll see in practice how it holds up.
I take it depends on #138701?
Yeah. these patterns are supposed to be companions to vector unrolling patterns. They share the same idea, one is for XeGPU ops only, and one is for vector ops. A pass are supposed to use both of them. |
} | ||
|
||
private: | ||
const char *const packAttrName = "__xetile_blocking_pack__"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
xetile->xegpu I guess, here and in tests
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good catch. fixed it
/// provide a way to customize the native shape of the operation. | ||
struct UnrollOptions { | ||
using FilterConstraintFnType = std::function<LogicalResult(Operation *op)>; | ||
/// Callback function that indicates whether vector unrolling should be |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: let's place this comment above "using" to have uniform look :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed it
Hi @fschlimb and @adam-smnk, do you have more suggestions? |
no, LGTM! |
Hi @adam-smnk, I am going to merge this first. If you have more suggestions, feel free to let me know. Thanks for your help! |
Similar to vector ops, XeGPU ops need to be unrolled into smaller shapes such that they can be dispatched into a hardware instruction. This PR marks the initial phase of a series dedicated to incorporating unroll patterns for XeGPU operations. In this installment, we introduce patterns for the following operations: