feat(blog): Add Zihan and Ethan's final project blog #544
Signed-off-by: Ethan Uppal <[email protected]>
sampsyo left a comment
Hi there—looks like this is still in progress, so I won't read it yet. Please let me know when it's time to read the report.
| Ethan Uppal Cornell CS '27 | ||
| Zihan Li Cornell CS '25 |
Please use complete sentences.
See above; please fill in your bios.
| > [!NOTE] | ||
| > Some of these questions are redundant in the context of both sections and thus their answers will be too. |
Do not structure your blog post as a question-and-answer list. Remember that the audience is external: you need to write something that will be intelligible to someone who wants to learn about your project "from scratch."
24b7f93 to 7121d83
sampsyo left a comment
Nice work on the overall design & implementation here! It's cool that you were able to observe nontrivial speedups for one analysis. I think it would be wonderful to add some additional reflection about what you think the results mean, and what this tells us about the potential for parallelizing dataflow analyses in general.
| Ethan Uppal Cornell CS '27 | ||
| Zihan Li Cornell CS '25 |
See above; please fill in your bios.
| if out[b] changed: | ||
| Worklist += successors of b | ||
| ``` | ||
| In this [project](https://github.com/zihan0822/para-dflow), we built a parallel dataflow solver in Rust with bitset optimizations for our flattened Bril IR. We parallelized the KILL and GEN set computation and the condensed cfg traversal process. We focused on one forward pass analysis: reaching definition and one backward pass analysis: liveness analysis in particular. |
cfg -> CFG
| if out[b] changed: | ||
| Worklist += successors of b | ||
| ``` | ||
| In this [project](https://github.com/zihan0822/para-dflow), we built a parallel dataflow solver in Rust with bitset optimizations for our flattened Bril IR. We parallelized the KILL and GEN set computation and the condensed cfg traversal process. We focused on one forward pass analysis: reaching definition and one backward pass analysis: liveness analysis in particular. |
> We focused on one forward pass analysis: reaching definition and one backward pass analysis: liveness analysis in particular.

To make this legible, try commas or parentheses:

> We focused on one forward pass analysis (reaching definitions) and one backward pass analysis (liveness analysis) in particular.
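For readers following the thread, the worklist loop quoted in the diff can be sketched roughly as below for a forward analysis such as reaching definitions. This is only an illustration under stated assumptions: `BitSet` and `solve_forward` are hypothetical names, the bitset is a plain `Vec<u64>` rather than the SIMD-accelerated one the post mentions, and none of this is taken from the para-dflow repository.

```rust
use std::collections::VecDeque;

/// Hypothetical fixed-width bitset backed by 64-bit words (not the project's type).
#[derive(Clone, PartialEq, Eq, Debug)]
pub struct BitSet {
    words: Vec<u64>,
}

impl BitSet {
    pub fn new(bits: usize) -> Self {
        BitSet { words: vec![0; (bits + 63) / 64] }
    }
    pub fn insert(&mut self, i: usize) {
        self.words[i / 64] |= 1u64 << (i % 64);
    }
    pub fn contains(&self, i: usize) -> bool {
        self.words[i / 64] & (1u64 << (i % 64)) != 0
    }
    /// In-place union: self |= other.
    pub fn union_with(&mut self, other: &BitSet) {
        for (a, b) in self.words.iter_mut().zip(&other.words) {
            *a |= *b;
        }
    }
    /// Transfer function: GEN | (IN & !KILL), one word at a time.
    pub fn transfer(input: &BitSet, gen_set: &BitSet, kill_set: &BitSet) -> BitSet {
        let words = input.words.iter()
            .zip(&gen_set.words)
            .zip(&kill_set.words)
            .map(|((i, g), k)| *g | (*i & !*k))
            .collect();
        BitSet { words }
    }
}

/// Forward worklist solver. `succs[b]` lists the successors of block `b`;
/// the result is OUT[b] for every block.
pub fn solve_forward(
    succs: &[Vec<usize>],
    gen_sets: &[BitSet],
    kill_sets: &[BitSet],
    bits: usize,
) -> Vec<BitSet> {
    let n = succs.len();
    // Derive predecessor lists from the successor lists.
    let mut preds: Vec<Vec<usize>> = vec![Vec::new(); n];
    for (b, ss) in succs.iter().enumerate() {
        for &s in ss {
            preds[s].push(b);
        }
    }
    let mut out = vec![BitSet::new(bits); n];
    let mut worklist: VecDeque<usize> = (0..n).collect();
    while let Some(b) = worklist.pop_front() {
        // IN[b] is the union of OUT[p] over all predecessors p.
        let mut input = BitSet::new(bits);
        for &p in &preds[b] {
            input.union_with(&out[p]);
        }
        let new_out = BitSet::transfer(&input, &gen_sets[b], &kill_sets[b]);
        if new_out != out[b] {
            out[b] = new_out;
            // OUT[b] changed, so successors must be revisited.
            worklist.extend(succs[b].iter().copied());
        }
    }
    out
}
```

A backward analysis such as liveness would follow the same shape with the roles of predecessors and successors swapped.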
| ## Preparations | ||
| #### Flattened Bril Representation | ||
| We implemented a flattened representation for Bril to get rid of fragmented heap references in previous Bril representations implemented in [bril-rs](https://github.com/sampsyo/bril/tree/main/bril-rs). Here are some of our flattened equivalents. |
No need to pick on bril-rs. You can just say that you created a flattened representation that avoided the heap fragmentation that can come with a standard, pointer-based program representation.
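To make the idea of a flattened representation concrete, here is a minimal sketch in which instructions live in one contiguous vector and blocks refer to them by index range instead of through per-node heap pointers. All names (`VarId`, `Instr`, `FlatProgram`) and the tiny instruction set are assumptions for illustration; the project's actual types are elided in the quoted diff and may look quite different.

```rust
use std::ops::Range;

/// Variables are small integer ids rather than heap-allocated strings (hypothetical).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct VarId(pub u32);

/// A deliberately tiny instruction set, just enough to show the layout.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Instr {
    Const { dest: VarId, value: i64 },
    Add { dest: VarId, lhs: VarId, rhs: VarId },
    Print { arg: VarId },
}

/// All instructions sit in one vector; each block is a range into it.
#[derive(Default)]
pub struct FlatProgram {
    pub instrs: Vec<Instr>,
    pub blocks: Vec<Range<usize>>,
}

impl FlatProgram {
    /// Appends a block and returns its index.
    pub fn push_block(&mut self, body: &[Instr]) -> usize {
        let start = self.instrs.len();
        self.instrs.extend_from_slice(body);
        self.blocks.push(start..self.instrs.len());
        self.blocks.len() - 1
    }

    /// Borrows the instructions of block `b` as one contiguous slice.
    pub fn block(&self, b: usize) -> &[Instr] {
        &self.instrs[self.blocks[b].clone()]
    }
}
```

With this layout, every reference handed out by `block` borrows from the single `FlatProgram` value, which matches the post's remark about tying reference lifetimes to the program.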
| } | ||
| ``` | ||
| With this flattened representation, we hope to isolate the performance increase to just the dataflow analyses. It also simplifies things by tying all references’ lifetime to the program. We also provide a handy shim that transforms bril’s official repr to our flattened repr. |
bril -> Bril
repr -> representation
| Both `GEN[b]` and `KILL[b]` only depend on block local info. | ||
| We parallelize KILL and GEN computation with [rayon's par_iter](https://docs.rs/rayon/latest/rayon/). |
Again, parallelize over what? Are we parallelizing over basic blocks (and then scanning the instructions within each block sequentially), or are we parallelizing over the instructions within a block?
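One of the two options raised in this question, parallelizing across basic blocks while scanning each block's instructions sequentially, could look like the sketch below. It is an assumption, not a description of the project: the thread does not say which granularity the authors chose, the `Instr` shape and function names are hypothetical, and `std::thread::scope` stands in for rayon's `par_iter` only to keep the example dependency-free. The sets are the liveness ones (GEN as upward-exposed uses, KILL as definitions).

```rust
use std::collections::BTreeSet;
use std::thread;

/// Hypothetical instruction: an optional destination plus the variables it reads.
pub struct Instr {
    pub dest: Option<u32>,
    pub args: Vec<u32>,
}

/// Sequential scan of a single block. Because the result depends only on the
/// block's own instructions, different blocks can be processed independently.
pub fn gen_kill(block: &[Instr]) -> (BTreeSet<u32>, BTreeSet<u32>) {
    let mut gen_set = BTreeSet::new();
    let mut kill_set = BTreeSet::new();
    for instr in block {
        for &a in &instr.args {
            // A use counts toward GEN only if no earlier definition in this block covers it.
            if !kill_set.contains(&a) {
                gen_set.insert(a);
            }
        }
        if let Some(d) = instr.dest {
            kill_set.insert(d);
        }
    }
    (gen_set, kill_set)
}

/// Splits the blocks into contiguous chunks, one scoped thread per chunk.
/// Results come back in the original block order.
pub fn gen_kill_parallel(
    blocks: &[Vec<Instr>],
    workers: usize,
) -> Vec<(BTreeSet<u32>, BTreeSet<u32>)> {
    let workers = workers.max(1);
    let chunk = ((blocks.len() + workers - 1) / workers).max(1);
    thread::scope(|s| {
        let handles: Vec<_> = blocks
            .chunks(chunk)
            .map(|c| s.spawn(move || c.iter().map(|b| gen_kill(b)).collect::<Vec<_>>()))
            .collect();
        handles
            .into_iter()
            .flat_map(|h| h.join().expect("worker panicked"))
            .collect::<Vec<_>>()
    })
}
```

Parallelizing within a block instead would need extra coordination, since whether a use belongs in GEN depends on the definitions that precede it in the same block.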
| ##### 2. Condensed CFG traversal in parallel: |
Finding the SCCs and parallelizing across them is a good idea! Nice!
Do you do this for the sequential version too, or just the parallel version? It would be interesting to try both, i.e., to compare three treatments: "standard" sequential, sequential with SCCs, and parallel with SCCs.
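As background for the SCC discussion, a condensed CFG can be obtained with a standard strongly-connected-components pass. The sketch below uses Tarjan's algorithm in its recursive form; that choice, and the names `tarjan_scc` and `condense`, are assumptions made for illustration, since the thread does not say how the project computes its SCCs.

```rust
/// Tarjan's algorithm. Returns (comp, count) where comp[v] is the SCC id of node v.
/// Ids are assigned in reverse topological order of the condensed graph, so a
/// sink component gets id 0.
pub fn tarjan_scc(succs: &[Vec<usize>]) -> (Vec<usize>, usize) {
    struct State<'a> {
        succs: &'a [Vec<usize>],
        index: Vec<Option<usize>>,
        low: Vec<usize>,
        on_stack: Vec<bool>,
        stack: Vec<usize>,
        comp: Vec<usize>,
        next_index: usize,
        next_comp: usize,
    }

    fn visit(st: &mut State<'_>, v: usize) {
        st.index[v] = Some(st.next_index);
        st.low[v] = st.next_index;
        st.next_index += 1;
        st.stack.push(v);
        st.on_stack[v] = true;
        let succs = st.succs; // copy the shared reference so `st` stays free to mutate
        for &w in &succs[v] {
            let idx_w = st.index[w];
            match idx_w {
                None => {
                    visit(st, w);
                    st.low[v] = st.low[v].min(st.low[w]);
                }
                Some(iw) if st.on_stack[w] => st.low[v] = st.low[v].min(iw),
                Some(_) => {}
            }
        }
        // v is the root of an SCC: pop the stack down to v.
        if Some(st.low[v]) == st.index[v] {
            while let Some(w) = st.stack.pop() {
                st.on_stack[w] = false;
                st.comp[w] = st.next_comp;
                if w == v {
                    break;
                }
            }
            st.next_comp += 1;
        }
    }

    let n = succs.len();
    let mut st = State {
        succs,
        index: vec![None; n],
        low: vec![0; n],
        on_stack: vec![false; n],
        stack: Vec::new(),
        comp: vec![0; n],
        next_index: 0,
        next_comp: 0,
    };
    for v in 0..n {
        if st.index[v].is_none() {
            visit(&mut st, v);
        }
    }
    (st.comp, st.next_comp)
}

/// Builds the condensed DAG: one node per SCC, with deduplicated cross-SCC edges.
pub fn condense(succs: &[Vec<usize>]) -> (Vec<usize>, Vec<Vec<usize>>) {
    let (comp, n_comp) = tarjan_scc(succs);
    let mut dag: Vec<Vec<usize>> = vec![Vec::new(); n_comp];
    for (v, ss) in succs.iter().enumerate() {
        for &w in ss {
            if comp[v] != comp[w] && !dag[comp[v]].contains(&comp[w]) {
                dag[comp[v]].push(comp[w]);
            }
        }
    }
    (comp, dag)
}
```

Components with no path between them in the condensed DAG do not feed into each other, which is what makes them candidates for concurrent solving. The recursive form can overflow the stack on very deep graphs, so an iterative variant may be preferable in practice.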
| ## Evaluations | ||
| To test the correctness, we compare the results of sequential and parallel solver on core benchmarks and fuzzed programs to make sure they agree. |
What were the results?
| bril-fuzzer --num-block 1024 --block-size-mean 128 --max-nesting 3 | ||
| ``` | ||
| The sequential baseline is somewhat parallelized with SIMD accelerated bitset implementation. |
Can you say something more about your experimental setup? Some data that would be useful include hardware details, OS versions, Rust versions, etc., and especially the number of cores in your machine.
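Some of the requested setup details can be captured programmatically alongside the benchmark output. The helper below is a small sketch using only the Rust standard library; `setup_summary` is a hypothetical name. Note that `available_parallelism` reports an estimate of usable parallelism, which can differ from the physical core count (for example under container or affinity limits), and that the Rust toolchain version would still need to be recorded separately, for instance from `rustc --version`.

```rust
use std::env::consts::{ARCH, OS};
use std::thread;

/// Returns a one-line description of the machine the benchmark runs on.
pub fn setup_summary() -> String {
    // Falls back to 1 if the platform cannot report its parallelism.
    let cores = thread::available_parallelism()
        .map(|n| n.get())
        .unwrap_or(1);
    format!("os={} arch={} logical_cores={}", OS, ARCH, cores)
}
```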
**Liveness Analysis**: 1.85x faster

| Method     | Fastest (ms) | Slowest (ms) | Mean (ms) |
|------------|--------------|--------------|-----------|
| Parallel   | 231.6        | 233.9        | 232.7     |
| Sequential | 427.0        | 434.2        | 430.6     |

**Reaching Def**: 8% slow down

| Method     | Fastest (s) | Slowest (s) | Mean (s) |
|------------|-------------|-------------|----------|
| Parallel   | 17.4        | 24.11       | 20.76    |
| Sequential | 18.76       | 19.41       | 19.08    |
Can you say something about why you think the results turned out this way? Is it due to the profile imbalance you mention below, or something else?
How about any theories for where this might go in the future? Do you think this is a promising approach that could work for other analyses, or did you learn that this is a bad idea and we should stop here? It would be great to do a little reflection about what you think these results tell you, qualitatively speaking.
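For reference, the headline figures follow directly from the mean times in the quoted tables. The helpers below are a small sketch of that arithmetic (the function names are made up for illustration): 430.6 ms / 232.7 ms gives roughly 1.85x for liveness, and 20.76 s against 19.08 s gives a slowdown of a little under 9% for reaching definitions. The reaching-definitions parallel runs also range from 17.4 s to 24.11 s, a much wider spread than the sequential runs, which may be worth keeping in mind when reading the mean.

```rust
/// Speedup of the parallel solver over the sequential baseline (above 1.0 means faster).
pub fn speedup(sequential: f64, parallel: f64) -> f64 {
    sequential / parallel
}

/// Extra running time of the parallel solver relative to sequential, in percent
/// (positive means the parallel solver is slower).
pub fn slowdown_percent(sequential: f64, parallel: f64) -> f64 {
    (parallel / sequential - 1.0) * 100.0
}
```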
21721f8 to 9218138
Signed-off-by: Ethan Uppal <[email protected]>
c26566d to 61bbcc5
Signed-off-by: Ethan Uppal <[email protected]>
61bbcc5 to 4db6a5f
Signed-off-by: Ethan Uppal <[email protected]>
…ppal/cs6120-fork-ignore into zihan-ethan-final-project
We updated the evaluation setup for reaching definitions, and the parallel solver now shows a consistent speedup of more than 1.2x.
After profiling, we realized that the main bottleneck for reaching definitions was computing The main modifications we made here were:
Those modifications were applied to both the sequential and parallel solvers.

Wonderful! This is looking great. Seriously impressive work here.
Closes #509, closes #512