-
Notifications
You must be signed in to change notification settings - Fork 216
Description
Project proposal: polyhedral loop optimizations for Bril
In this proposal, I will:
- Design and implement a polyhedral intermediate representation for Bril code
- Implement loop optimizations on the polyhedral intermediate representation
- Evaluate its performance on a benchmark suite of Bril code
What?
Consider the following code, which sums up the contents of a 100x100 array:
int total = 0;
for (int j = 0; j < 100; j++) {
for (int i = 0; i < 100; i++) {
total += arr[i,j];
}
}If the array is stored in row-major order, this is a slow way to sum it up: since the inner loop strides over 100 elements each iteration, there is no spatial locality! It would be more cache-friendly to loop over i in the outer loop and j in the inner loop.
A polyhedral intermediate representation is an IR used to do this kind of loop analysis and restructuring in modern compilers: LLVM and GCC both incorporate polyhedral IRs, as does the relatively new MLIR framework on top of LLVM. In a polyhedral representation, this program will be have add instruction, parameterized by the space {(i,j) : 0 <= i < 100, 0 <= j < 100} of points for which it's run.
I will implement a polyhedral intermediate representation for Bril programs, using the Bril memory extension, and use it to implement basic loop optimizations. I have, as a concrete goal, the following:
- Construct a polyhedral IR representation for a Bril program (in SSA form)
- Do a dependence analysis on the IR
- Analyze possible iteration orders for spatial cache locality
- Turn the polyhedral IR back into concrete loops
This will improve performance by improving cache locality.
Rationale and scope
There are, of course, many other things you can do with a polyhedral IR. A past 6120 project was also about the polyhedral model, but it was far more ambitious, both (a) using MLIR/LLVM instead of Bril and (b) setting out to implement more non-trivial loop optimizations, and it did not get to a point where it could measure results on benchmarks. In light of this, I am being intentionally unambitious with initial goals, just looking at improving cache locality for Bril programs.
If all goes well, there is practically unbounded room to make the project more complicated, by implementing new loop optimizations.
How?
There are a few sources on design considerations for the polyhedral model. I will follow the dragon book, chapter 11, which has a set of algorithms for the polyhedral loop analyses and transformations.
The code transformation parts depend on a lot of integer set transformations and algorithms; for these, I will use the isl library.
Evaluation
I will evaluate correctness and performance on multiple benchmark suites:
- The
corebril benchmark suite. This does not have arrays in it; therefore, it serves as a baseline to show that it preserves behavior and does not slow down normal code. - The
membril benchmark suite. This does have arrays in it, so it would show the performance change for code which does have arrays. - Purpose written array-manipulating benchmarks, such as matrix multiplication. These would be directly designed to benefit from my optimizations, and showcase the best-case performance improvement.
In terms of benchmarking methodology, running it via the interpreter is potentially a worse measurement of cache effects than running machine code, so I will benchmark it with the brilift Bril-to-native code compiler.
Team
Just me, @mb64