
Project Proposal: Polyhedral loop optimizations for Bril #508

@mb64


Project proposal: polyhedral loop optimizations for Bril

In this proposal, I will:

  1. Design and implement a polyhedral intermediate representation for Bril code
  2. Implement loop optimizations on the polyhedral intermediate representation
  3. Evaluate its performance on a benchmark suite of Bril code

What?

Consider the following code, which sums up the contents of a 100x100 array:

int total = 0;
for (int j = 0; j < 100; j++) {
    for (int i = 0; i < 100; i++) {
        total += arr[i][j];
    }
}

If the array is stored in row-major order, this is a slow way to sum it up: since the inner loop strides over 100 elements each iteration, there is no spatial locality! It would be more cache-friendly to loop over i in the outer loop and j in the inner loop.
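For illustration, here is the interchange written out in plain C (the function names are mine, chosen for this sketch): with i in the outer loop, the inner loop touches consecutive addresses instead of jumping a whole row each step.

```c
#include <assert.h>

#define N 100

/* Column-major traversal: poor locality for a row-major array,
 * since consecutive inner-loop accesses are N ints apart. */
static int sum_colmajor(int arr[N][N]) {
    int total = 0;
    for (int j = 0; j < N; j++)
        for (int i = 0; i < N; i++)
            total += arr[i][j];
    return total;
}

/* Interchanged loops: row-major traversal with unit stride. */
static int sum_rowmajor(int arr[N][N]) {
    int total = 0;
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            total += arr[i][j];
    return total;
}
```

Both functions compute the same total; only the order of the additions, and hence the memory access pattern, differs.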

A polyhedral intermediate representation is an IR used for this kind of loop analysis and restructuring in modern compilers: LLVM and GCC both incorporate polyhedral IRs (Polly and Graphite, respectively), as does the relatively new MLIR framework built on top of LLVM. In a polyhedral representation, this program would have a single add statement, parameterized by the space {(i,j) : 0 <= i < 100, 0 <= j < 100} of points at which it runs.

I will implement a polyhedral intermediate representation for Bril programs, using the Bril memory extension, and use it to implement basic loop optimizations. Concretely, my goals are:

  1. Construct a polyhedral IR representation for a Bril program (in SSA form)
  2. Do a dependence analysis on the IR
  3. Analyze possible iteration orders for spatial cache locality
  4. Turn the polyhedral IR back into concrete loops

This will improve performance by improving cache locality.
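As a sketch of step 2, the classical legality test for loop interchange works on dependence distance vectors: interchanging two loops permutes the components of every distance vector, and the transformation is legal iff no permuted vector becomes lexicographically negative. A minimal C sketch for a 2-deep nest (the representation and names here are hypothetical, not part of the planned IR):

```c
#include <assert.h>
#include <stdbool.h>

/* A dependence distance vector for a 2-deep loop nest: (di, dj)
 * means the dependent iteration runs (di, dj) iterations after
 * its source in the original loop order. */
struct dist { int di, dj; };

/* A 2-vector is lexicographically negative if its first
 * nonzero component is negative. */
static bool lex_negative(int a, int b) {
    return a < 0 || (a == 0 && b < 0);
}

/* Interchange swaps each vector's components; the transform is
 * legal iff no swapped vector is lexicographically negative. */
static bool interchange_legal(const struct dist *d, int n) {
    for (int k = 0; k < n; k++)
        if (lex_negative(d[k].dj, d[k].di))
            return false;
    return true;
}
```

For example, a dependence with distance (1, -1) blocks interchange, since the swapped vector (-1, 1) would make the dependent iteration run before its source.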

Rationale and scope

There are, of course, many other things you can do with a polyhedral IR. A past 6120 project also tackled the polyhedral model, but it was far more ambitious: it (a) used MLIR/LLVM instead of Bril and (b) set out to implement more non-trivial loop optimizations, and it did not get to the point of measuring results on benchmarks. In light of this, I am being intentionally unambitious with my initial goals, looking only at improving cache locality for Bril programs.

If all goes well, there is practically unbounded room to extend the project by implementing further loop optimizations.

How?

There are a few sources on design considerations for the polyhedral model. I will follow chapter 11 of the Dragon Book (Aho, Lam, Sethi, and Ullman, Compilers: Principles, Techniques, and Tools), which presents algorithms for polyhedral loop analysis and transformation.

The code transformation parts depend on many integer-set operations and algorithms; for these, I will use the isl integer set library.
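For concreteness, isl provides a textual format for integer sets and affine maps; the running example's iteration domain and two candidate schedules might be written roughly as follows (a sketch of the notation, not output from any tool):

```
# Iteration domain of the add statement S:
{ S[i, j] : 0 <= i < 100 and 0 <= j < 100 }

# Schedules mapping each statement instance to a time point;
# swapping the output dimensions expresses the loop interchange:
{ S[i, j] -> [j, i] }   # original order (j outer, i inner)
{ S[i, j] -> [i, j] }   # interchanged (i outer, j inner)
```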

Evaluation

I will evaluate correctness and performance on multiple benchmark suites:

  1. The core Bril benchmark suite. These benchmarks do not use arrays, so they serve as a baseline to show that the pass preserves behavior and does not slow down ordinary code.
  2. The mem Bril benchmark suite. These benchmarks do use arrays, so they show the performance change for array-manipulating code.
  3. Purpose-written array-manipulating benchmarks, such as matrix multiplication. These are designed to benefit directly from my optimizations and showcase the best-case performance improvement.
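As an example of the kind of purpose-written benchmark I have in mind (sketched here in C for clarity; the actual benchmarks would be Bril programs), naive matrix multiplication in i-j-k order walks a column of b in the inner loop, while the interchanged i-k-j order gets unit-stride access to both b and c:

```c
#include <string.h>

#define N 64

/* Naive i-j-k order: the inner loop reads a column of b (stride N). */
static void matmul_ijk(double a[N][N], double b[N][N], double c[N][N]) {
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++) {
            double sum = 0.0;
            for (int k = 0; k < N; k++)
                sum += a[i][k] * b[k][j];
            c[i][j] = sum;
        }
}

/* Interchanged i-k-j order: b and c are accessed with unit stride. */
static void matmul_ikj(double a[N][N], double b[N][N], double c[N][N]) {
    memset(c, 0, sizeof(double) * N * N);
    for (int i = 0; i < N; i++)
        for (int k = 0; k < N; k++)
            for (int j = 0; j < N; j++)
                c[i][j] += a[i][k] * b[k][j];
}
```

The two orderings compute the same product; the benchmark would compare their running times on the compiled output.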

In terms of benchmarking methodology, running benchmarks in the interpreter would measure cache effects poorly compared to running machine code, so I will benchmark with the brilift Bril-to-native-code compiler.

Team

Just me, @mb64
