+++
title = "Welcome to CS 6120!"
[extra]
bio = """
  Allen Wang is a CS M.Eng student at Cornell University. He's pretty tired right now.
"""
[[extra.authors]]
name = "Allen Wang"
+++

### Overview

The goal of this project was to implement global value numbering (GVN) for Bril programs using value partitioning. I then used the result to perform redundancy elimination based on an available-expressions analysis and benchmarked the performance impact.

### Value Numbering

Value numbering is a family of program analysis techniques that assign an identifying (value) number to each expression, such that expressions guaranteed to evaluate to the same value receive the same number. Separating values from the expressions that compute them lets us find expressions that are syntactically different from an earlier one but evaluate to the same value, and then remove them. For example, value numbering on this program:
```
sum1: int = add a b;
sum2: int = add b a;
prod: int = mul sum1 sum2;
```
finds that `sum1` and `sum2` evaluate to the same value (`add` is commutative), so the program can be optimized to:
```
sum1: int = add a b;
prod: int = mul sum1 sum1;
```
We went over one value numbering algorithm [in lesson 3](https://www.cs.cornell.edu/courses/cs6120/2025sp/lesson/3/). However, that algorithm assumes straight-line control flow, so it is only suitable for local value numbering within a single basic block.
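
As a concrete illustration of the local case, here is a minimal Python sketch of local value numbering over one basic block. It assumes instructions are dicts in Bril's JSON shape (`dest`, `op`, `type`, `args`, `value`), that the block is in SSA form so a redundant instruction can simply be dropped, and that only the listed pure operations are safe to deduplicate; the names `lvn`, `PURE`, and `COMMUTATIVE` are hypothetical, and this is not the course's reference implementation. Sorting the argument numbers of commutative operations is what lets it see that `add a b` and `add b a` are the same value.

```python
# Hypothetical local value numbering (LVN) sketch for a single basic block.
# Assumes Bril-JSON-style instruction dicts and SSA form, so a redundant
# instruction can be dropped once later uses are rewritten. Dropping is only
# safe if the dropped names are not used in other blocks.
COMMUTATIVE = {"add", "mul", "eq", "and", "or"}
PURE = COMMUTATIVE | {"sub", "div", "lt", "gt", "le", "ge", "not", "const", "id"}


def lvn(block):
    var2num = {}  # variable name -> value number
    num2var = {}  # value number -> canonical variable holding that value
    table = {}    # canonical expression -> value number
    out = []

    def number_of(var):
        # Names not seen yet (e.g. defined outside the block) get fresh numbers.
        if var not in var2num:
            var2num[var] = len(num2var)
            num2var[var2num[var]] = var
        return var2num[var]

    for instr in block:
        nums = [number_of(a) for a in instr.get("args", [])]
        new = dict(instr)
        if "args" in instr:
            # Rewrite arguments to the canonical variable for their value.
            new["args"] = [num2var[n] for n in nums]
        dest, op = instr.get("dest"), instr["op"]
        if dest is None or op not in PURE:
            if dest is not None:
                number_of(dest)  # effectful result: always a fresh value
            out.append(new)
            continue
        if op in COMMUTATIVE:
            nums.sort()  # `add a b` and `add b a` get the same key
        key = (op, tuple(nums), str(instr.get("type")), repr(instr.get("value")))
        if key in table:
            var2num[dest] = table[key]  # redundant: reuse the existing value
        else:
            table[key] = number_of(dest)
            out.append(new)
    return out


block = [
    {"dest": "sum1", "type": "int", "op": "add", "args": ["a", "b"]},
    {"dest": "sum2", "type": "int", "op": "add", "args": ["b", "a"]},
    {"dest": "prod", "type": "int", "op": "mul", "args": ["sum1", "sum2"]},
]
for instr in lvn(block):
    print(instr["dest"], instr["op"], instr["args"])
# sum1 add ['a', 'b']
# prod mul ['sum1', 'sum1']
```

The table only ever sees one block, which is exactly the limitation described above: nothing carries it across branches or joins.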

### Global Value Numbering

Global value numbering is a set of techniques that perform value numbering over an entire function rather than a single block. [This paper](https://www.cs.tufts.edu/~nr/cs257/archive/keith-cooper/value-numbering.pdf) covers both hash-based and partitioning implementations of GVN. There is already a hash-based implementation for Bril [in a 2019 CS 6120 blog post](https://www.cs.cornell.edu/courses/cs6120/2019fa/blog/global-value-numbering/), and it is conceptually very similar to local value numbering, so I decided to implement value partitioning instead.

### Value partitioning

Instead of hashing expressions to value numbers as local value numbering does, value partitioning directly computes congruence classes of expressions, where two expressions are congruent if they have the same opcode and their corresponding arguments are congruent. To perform value partitioning, we first convert the program to SSA form so that each value is defined by a unique variable. We start by assuming that all operations with the same opcode are in the same congruence class, then repeatedly split any class whose members cannot all be congruent until the partition stops changing. The result is the maximal fixed point, i.e. the coarsest partition consistent with the definition of congruence.
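
The starting partition can be sketched in a few lines of Python, assuming SSA value instructions as Bril-JSON-style dicts and a list of function parameters (the helper name `initial_partition` is hypothetical). Two details are assumptions of this sketch rather than part of the description above: values are also keyed by type, and constants are keyed by their literal, because `const 1` and `const 2` have no arguments and would otherwise never be split apart. Parameters are unknown values, so each gets its own class. Phi nodes are not treated specially here.

```python
from collections import defaultdict


def initial_partition(instrs, params=()):
    """Group SSA values by opcode (and type) as the starting partition.

    instrs: Bril-JSON-style value instructions (each has a "dest");
    params: function arguments, which are unknown values.
    """
    groups = defaultdict(set)
    for instr in instrs:
        key = (instr["op"], str(instr.get("type")))
        if instr["op"] == "const":
            # Constants have no arguments and would never be split, so each
            # literal is treated as its own "opcode".
            key += (repr(instr["value"]),)
        groups[key].add(instr["dest"])
    # Each parameter gets its own class, since nothing is known about it.
    return list(groups.values()) + [{p} for p in params]


instrs = [
    {"dest": "x", "op": "const", "type": "int", "value": 1},
    {"dest": "y", "op": "const", "type": "int", "value": 2},
    {"dest": "s", "op": "add", "type": "int", "args": ["a", "x"]},
    {"dest": "t", "op": "add", "type": "int", "args": ["a", "y"]},
]
print(initial_partition(instrs, params=["a"]))
```

Here `s` and `t` start in the same class; refinement would later split them, since their second arguments `x` and `y` are in different classes.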

We implemented the following value partitioning algorithm, which is given in the paper:
```
Initial partition: all values computed by the same opcode are in the same congruence class

worklist = classes in initial partition
while worklist is not empty:
    select and remove a class c from worklist
    for each possible arg position p:
        touched = ∅
        for each value v:
            if arg p of v is in c, add v to touched
        for each class s where some but not all members are touched:
            n = s ∩ touched        (n becomes a new class)
            s = s - n
            if s in worklist:
                add n to worklist
            else:
                add smaller of n and s to worklist
```
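
The pseudocode above translates fairly directly into Python. This is a minimal sketch, not the implementation used for this project: it assumes SSA names are strings, `args_of` (a hypothetical input) maps each value to its argument list, and all members of an initial class have the same number of arguments, which holds when classes are keyed by opcode. It recomputes `touched` by scanning every value, which is simple but slow; a real implementation would index uses instead.

```python
from collections import defaultdict


def partition(args_of, classes):
    """Split `classes` until every class is stable (the worklist loop above).

    args_of: SSA name -> list of argument names (names with no entry,
             such as parameters, take no arguments).
    classes: initial partition as a list of sets of SSA names.
    """
    classes = [set(c) for c in classes]
    class_of = {v: i for i, c in enumerate(classes) for v in c}
    worklist = set(range(len(classes)))
    max_arity = max((len(a) for a in args_of.values()), default=0)

    while worklist:
        c = worklist.pop()
        members = set(classes[c])  # snapshot: c itself may be split below
        for p in range(max_arity):
            # touched = values whose p-th argument lies in class c.
            touched = {v for v, args in args_of.items()
                       if len(args) > p and args[p] in members}
            by_class = defaultdict(set)
            for v in touched:
                by_class[class_of[v]].add(v)
            for s, n_members in by_class.items():
                if n_members == classes[s]:
                    continue  # all members touched: no split
                classes[s] -= n_members  # s = s - n
                n = len(classes)
                classes.append(n_members)
                for v in n_members:
                    class_of[v] = n
                if s in worklist:
                    worklist.add(n)
                else:
                    # Hopcroft's trick: processing the smaller half suffices.
                    worklist.add(n if len(n_members) <= len(classes[s]) else s)
    return classes


# x and y both compute `add a b`; z = add x a and w = add y b.
args_of = {"x": ["a", "b"], "y": ["a", "b"], "z": ["x", "a"], "w": ["y", "b"]}
initial = [{"a"}, {"b"}, {"x", "y", "z", "w"}]  # parameters, then all adds
final = partition(args_of, initial)
print(sorted(sorted(c) for c in final))
# [['a'], ['b'], ['w'], ['x', 'y'], ['z']]
```

`x` and `y` stay together because their arguments are pairwise congruent, while `z` and `w` are split because `a` and `b` are not.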
Once the partition stops changing, we pick a representative for each congruence class and replace every value in that class with its representative.
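
One way to do the renaming, sketched in Python under the assumption that the final partition is a list of sets of SSA names and instructions are Bril-JSON-style dicts. The helper `rename` is hypothetical, and choosing the alphabetically smallest name as representative is an arbitrary assumption; any member works as long as the choice is consistent. After renaming, the program is no longer in SSA form, since every congruent computation now defines the same name.

```python
def rename(instrs, classes):
    """Rewrite every dest and argument to its congruence-class representative.

    instrs: Bril-JSON-style instruction dicts; classes: list of sets of names.
    """
    # Arbitrary, deterministic choice of representative: the smallest name.
    rep = {v: min(c) for c in classes for v in c}
    renamed = []
    for instr in instrs:
        new = dict(instr)
        if "dest" in new:
            new["dest"] = rep.get(new["dest"], new["dest"])
        if "args" in new:
            new["args"] = [rep.get(a, a) for a in new["args"]]
        renamed.append(new)
    return renamed


instrs = [
    {"dest": "x", "op": "add", "type": "int", "args": ["a", "b"]},
    {"dest": "y", "op": "add", "type": "int", "args": ["a", "b"]},
    {"dest": "z", "op": "mul", "type": "int", "args": ["x", "y"]},
]
classes = [{"a"}, {"b"}, {"x", "y"}, {"z"}]
for instr in rename(instrs, classes):
    print(instr["dest"], instr["op"], instr["args"])
# x add ['a', 'b']
# x add ['a', 'b']
# z mul ['x', 'x']
```

Renaming by itself removes nothing; the duplicate definition of `x` is what the available-expressions pass in the next section eliminates.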

### Redundancy Elimination

After rewriting the program in terms of values instead of expressions, we still need to turn that into a performance improvement. To do this, we use an available-expressions dataflow analysis to compute which values are available at each point in the program. Fortunately, the renaming step makes this analysis very easy to define: every member of a congruence class now has the same name, and a name never holds two different values, so an expression can be identified by its name and never needs to be killed.

- The initial input (at the entry block) is the empty set.
- The transfer function takes the union of a block's input and every expression computed in the block. If an expression is already in the set when we reach it, it is redundant and can be removed.
- The merge function takes the intersection of the outputs of all of a block's predecessors.
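
The three rules above can be sketched in Python as follows. This is a hedged sketch rather than the project's code: it assumes renamed instructions grouped into basic blocks, a predecessor map, and that only the listed pure operations may be removed (`avail_eliminate` and `PURE` are hypothetical names). Because each value has a single name, the available set is just a set of names with no kill set. Non-entry blocks start optimistically with every name available so that loops converge to the largest solution.

```python
# Only side-effect-free operations are candidates for removal (assumption).
PURE = {"add", "sub", "mul", "div", "eq", "lt", "gt", "le", "ge",
        "not", "and", "or", "const", "id"}


def avail_eliminate(blocks, preds, entry):
    """Remove recomputations of names that are available on every path.

    blocks: label -> list of renamed instruction dicts
    preds:  label -> list of predecessor labels
    """
    every_name = {i["dest"] for b in blocks.values() for i in b if "dest" in i}
    # Optimistic initialization: assume every name is available everywhere.
    out = {b: set(every_name) for b in blocks}

    def avail_in(b):
        ins = [out[p] for p in preds.get(b, [])]
        if b == entry or not ins:
            return set()  # initial input: the empty set
        return set.intersection(*ins)  # merge: intersect predecessors' outputs

    changed = True
    while changed:
        changed = False
        for b, instrs in blocks.items():
            # Transfer: input plus every name defined in the block.
            new_out = avail_in(b) | {i["dest"] for i in instrs if "dest" in i}
            if new_out != out[b]:
                out[b] = new_out
                changed = True

    result = {}
    for b, instrs in blocks.items():
        avail, kept = avail_in(b), []
        for instr in instrs:
            dest = instr.get("dest")
            if dest in avail and instr["op"] in PURE:
                continue  # same name means same value: redundant
            kept.append(instr)
            if dest is not None:
                avail.add(dest)
        result[b] = kept
    return result


blocks = {
    "entry": [{"dest": "x", "op": "add", "args": ["a", "b"]},
              {"op": "br", "args": ["c"], "labels": ["left", "right"]}],
    "left": [{"dest": "x", "op": "add", "args": ["a", "b"]},
             {"dest": "m", "op": "mul", "args": ["a", "b"]}],
    "right": [{"dest": "n", "op": "sub", "args": ["a", "b"]}],
    "join": [{"dest": "m", "op": "mul", "args": ["a", "b"]}],
}
preds = {"left": ["entry"], "right": ["entry"], "join": ["left", "right"]}
for label, instrs in avail_eliminate(blocks, preds, "entry").items():
    print(label, [i.get("dest") for i in instrs])
# entry ['x', None]
# left ['m']
# right ['n']
# join ['m']
```

The second `x` in `left` is removed because `x` is available from `entry`, but `m` in `join` survives because it is only available along the `left` path; that partially redundant case is what PRE targets.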

I also tried implementing partial redundancy elimination (PRE), which moves computations that are redundant along only some execution paths backward through the control-flow graph so that they become fully redundant and can be removed. This requires global analyses to determine where computations can be safely moved and where moving them would save time, but I didn't have time to fully understand the approach and fix the bugs in my implementation.

### Implementation Notes

Getting GVN right was very finicky and required reading the paper very carefully. My biggest struggles were first understanding the partitioning algorithm, then finding and debugging the edge cases that arose from not reading the paper carefully enough. The most annoying edge case was handling phi instructions. My optimized programs kept producing different results from the originals, and after going through the source code and a log of executed instructions, I discovered that phi instructions were mysteriously being removed. I eventually pinpointed the problem: a phi instruction cannot be congruent with a phi instruction in a different block, because a phi's value depends on which predecessor edge of its own block was taken. Right after fixing this, I read through the paper again and found that it mentions this in an aside... I also had a lot of trouble implementing copy propagation to match the hash-based implementation, only to discover that a later section of the paper explicitly uses it as an example of an optimization that value partitioning cannot perform.
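
One way to encode the phi fix is in the key used to build the initial partition. In this hypothetical helper, a phi's key includes the label of its block, so phis from different blocks start in different classes; since partitioning only ever splits classes, they can never be merged later. The helper name `partition_key` and its signature are assumptions of this sketch.

```python
def partition_key(instr, block_label):
    """Hypothetical initial-partition key that keeps phis from different blocks apart."""
    key = (instr["op"], str(instr.get("type")))
    if instr["op"] == "phi":
        # A phi's result depends on which predecessor edge of its own block
        # was taken, so phis in different blocks must never share a class.
        # (Assumes phis in one block list their labels in the same order,
        # because partitioning compares arguments by position.)
        key += (block_label,)
    return key


p = {"dest": "p", "op": "phi", "type": "int", "args": ["a", "b"], "labels": ["l1", "l2"]}
q = {"dest": "q", "op": "phi", "type": "int", "args": ["a", "b"], "labels": ["l3", "l4"]}
print(partition_key(p, "join1") == partition_key(q, "join2"))  # False
```

Without the block label, `p` and `q` would look identical to the partitioning loop (same opcode, same arguments) and would wrongly end up in one class.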

### Evaluation

For correctness, I ran my optimizations on the core benchmarks with several different inputs to check that they did not change program behavior. I also wrote a series of test cases covering edge cases and optimizations that GVN should be able to identify. Every benchmark and hand-written test produced the same output before and after optimization. While fixing bugs, I also confirmed that the benchmarks caught incorrect implementations.

For performance, I ran the core benchmarks with the same inputs as the correctness tests and compared the number of instructions executed against the base SSA program. Using only the AVAIL-based removal, the reduction in instructions executed was 0% at minimum (global value numbering never adds operations to a program), 1.5% at the median, and around 58% at maximum. Most of the benchmarks were written directly in Bril, so they were already relatively optimized and offered few opportunities to find congruence classes across blocks.
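
The summary numbers above are percent reductions in dynamic instruction counts relative to the SSA baseline. A minimal sketch of that arithmetic, assuming per-benchmark counts have already been collected (for example from `brili -p`, which reports the total dynamic instruction count); the helper names and the counts in the usage example are made up for illustration and are not the measured results.

```python
from statistics import median


def reduction(base, opt):
    """Percent fewer dynamic instructions executed after optimization."""
    return 100.0 * (base - opt) / base


def summarize(counts):
    """counts: benchmark -> (baseline SSA count, optimized count)."""
    rs = [reduction(base, opt) for base, opt in counts.values()]
    return min(rs), median(rs), max(rs)


# Made-up counts for illustration only (not measured results).
counts = {"bench_a": (100, 100), "bench_b": (400, 390), "bench_c": (80, 60)}
print(summarize(counts))  # (0.0, 2.5, 25.0)
```

Because GVN never adds instructions, every per-benchmark reduction is at least 0%, which is why the reported minimum is exactly 0%.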

<img src="plot.png" alt="" width="60%">