
Conversation

@ethanuppal (Contributor):

Closes #509, closes #512

@ethanuppal changed the title from "feat(blog): Add Zihan and Ethan's WIP final project blog" to "feat(blog): Add Zihan and Ethan's final project blog" on May 14, 2025.
@sampsyo (Owner) left a comment:

Hi there—looks like this is still in progress, so I won't read it yet. Please let me know when it's time to read the report.

Comment on lines 5 to 6
Ethan Uppal Cornell CS '27
Zihan Li Cornell CS '25
sampsyo (Owner):

Please use complete sentences.

sampsyo (Owner):

See above; please fill in your bios.

Comment on lines 17 to 18
> [!NOTE]
> Some of these questions are redundant in the context of both sections and thus their answers will be too.
sampsyo (Owner):

Do not structure your blog post as a question-and-answer list. Remember that the audience is external: you need to write something that will be intelligible to someone who wants to learn about your project "from scratch."

@zihan0822 force-pushed the zihan-ethan-final-project branch from 24b7f93 to 7121d83 on May 15, 2025 14:51.
@sampsyo (Owner) left a comment:

Nice work on the overall design & implementation here! It's cool that you were able to observe nontrivial speedups for one analysis. I think it would be wonderful to add some additional reflection about what you think the results mean, and what this tells us about the potential for parallelizing dataflow analyses in general.

Comment on lines 5 to 6
Ethan Uppal Cornell CS '27
Zihan Li Cornell CS '25
sampsyo (Owner):

See above; please fill in your bios.

```
if out[b] changed:
    Worklist += successors of b
```
In this [project](https://github.com/zihan0822/para-dflow), we built a parallel dataflow solver in Rust with bitset optimizations for our flattened Bril IR. We parallelized the KILL and GEN set computation and the condensed cfg traversal process. We focused on one forward pass analysis: reaching definition and one backward pass analysis: liveness analysis in particular.
sampsyo (Owner):

cfg -> CFG

```
if out[b] changed:
    Worklist += successors of b
```
In this [project](https://github.com/zihan0822/para-dflow), we built a parallel dataflow solver in Rust with bitset optimizations for our flattened Bril IR. We parallelized the KILL and GEN set computation and the condensed cfg traversal process. We focused on one forward pass analysis: reaching definition and one backward pass analysis: liveness analysis in particular.
sampsyo (Owner):

> We focused on one forward pass analysis: reaching definition and one backward pass analysis: liveness analysis in particular.

To make this legible, try commas or parentheses:

We focused on one forward pass analysis (reaching definitions) and one backward pass analysis (liveness analysis) in particular.
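
For context, the fragment quoted in the hunk above is the tail of the standard forward worklist iteration. A minimal sequential sketch of that loop, with plain `Vec<bool>` sets standing in for the solver's bitsets and hypothetical names throughout (not the project's actual code), looks roughly like this:

```rust
use std::collections::VecDeque;

/// Illustrative sequential forward worklist solver (e.g. for reaching
/// definitions). `gen`, `kill`, and the CFG edges are precomputed.
fn solve_forward(
    gen: &[Vec<bool>],
    kill: &[Vec<bool>],
    preds: &[Vec<usize>],
    succs: &[Vec<usize>],
    num_defs: usize,
) -> Vec<Vec<bool>> {
    let num_blocks = gen.len();
    let mut out = vec![vec![false; num_defs]; num_blocks];
    let mut worklist: VecDeque<usize> = (0..num_blocks).collect();

    while let Some(b) = worklist.pop_front() {
        // in[b] = union of out[p] over all predecessors p of b
        let mut in_b = vec![false; num_defs];
        for &p in &preds[b] {
            for d in 0..num_defs {
                in_b[d] |= out[p][d];
            }
        }
        // out[b] = GEN[b] ∪ (in[b] \ KILL[b])
        let new_out: Vec<bool> = (0..num_defs)
            .map(|d| gen[b][d] || (in_b[d] && !kill[b][d]))
            .collect();
        // If out[b] changed, revisit the successors of b.
        if new_out != out[b] {
            out[b] = new_out;
            worklist.extend(succs[b].iter().copied());
        }
    }
    out
}
```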


## Preparations
#### Flattened Bril Representation
We implemented a flattened representation for Bril to get rid of fragmented heap references in previous Bril representations implemented in [bril-rs](https://github.com/sampsyo/bril/tree/main/bril-rs). Here are some of our flattened equivalents.
sampsyo (Owner):

No need to pick on bril-rs. You can just say that you created a flattened representation that avoided the heap fragmentation that can come with a standard, pointer-based program representation.


With this flattened representation, we hope to isolate the performance increase to just the dataflow analyses. It also simplifies things by tying all references’ lifetime to the program. We also provide a handy shim that transforms bril’s official repr to our flattened repr.
sampsyo (Owner):

bril -> Bril
repr -> representation
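
For readers who want a concrete picture of what "flattened" means here, a rough illustration (with hypothetical type and field names, not the project's actual definitions) is that instructions live in one contiguous array, names are interned, and basic blocks are just index ranges, so there are no per-instruction heap allocations to chase:

```rust
/// Hypothetical flattened program layout (illustrative only).
struct FlatProgram {
    /// Every instruction in the program, stored contiguously.
    instructions: Vec<FlatInstruction>,
    /// Interned variable and label names; instructions refer to entries
    /// here by index instead of owning `String`s.
    symbols: Vec<String>,
    /// Each basic block is a half-open range into `instructions`.
    blocks: Vec<std::ops::Range<u32>>,
}

#[derive(Clone, Copy)]
struct FlatInstruction {
    opcode: u16,
    /// Index into `symbols` for the destination variable, if any.
    dest: Option<u32>,
    /// Arguments as a (start, len) span of a shared argument arena
    /// (omitted here for brevity).
    args: (u32, u32),
}
```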


Both `GEN[b]` and `KILL[b]` depend only on block-local information.

We parallelize KILL and GEN computation with [rayon's par_iter](https://docs.rs/rayon/latest/rayon/).
sampsyo (Owner):

Again, parallelize over what? Are we parallelizing over basic blocks (and then scanning the instructions within each block sequentially), or are we parallelizing over the instructions within a block?
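
Whichever granularity the post intends, a sketch of the coarser-grained option (parallelizing across basic blocks with rayon's `par_iter`, while scanning the instructions inside each block sequentially) would look something like this; the types and helpers are invented for illustration:

```rust
use rayon::prelude::*;

/// Stand-in types for this sketch (not the project's real definitions).
struct Instruction {
    /// Global index of the definition this instruction makes, if any.
    defines: Option<usize>,
}
type BitSet = Vec<bool>; // the real solver uses a packed, SIMD-friendly bitset

/// Compute GEN for one block by scanning its instructions sequentially.
/// A full reaching-definitions version would also fill KILL using a
/// precomputed map from each variable to all of its definition indices
/// (omitted here).
fn local_gen_kill(block: &[Instruction], num_defs: usize) -> (BitSet, BitSet) {
    let mut gen = vec![false; num_defs];
    let kill = vec![false; num_defs];
    for inst in block {
        if let Some(d) = inst.defines {
            gen[d] = true;
        }
    }
    (gen, kill)
}

/// Parallelize across basic blocks: each block's local sets are
/// independent of every other block's, so rayon's work-stealing pool
/// can process the blocks concurrently.
fn all_gen_kill(blocks: &[Vec<Instruction>], num_defs: usize) -> Vec<(BitSet, BitSet)> {
    blocks
        .par_iter()
        .map(|block| local_gen_kill(block, num_defs))
        .collect()
}
```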




##### 2. Condensed CFG traversal in parallel:
sampsyo (Owner):

Finding the SCCs and parallelizing across them is a good idea! Nice!

Do you do this for the sequential version too, or just the parallel version? It would be interesting to try both, i.e., to compare three treatments: "standard" sequential, sequential with SCCs, and parallel with SCCs.
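
To make the SCC idea concrete, one way such a traversal could be scheduled (a sketch built on petgraph's `condensation`/`toposort` and rayon; the project's actual traversal may well differ) is to group SCCs into dependency levels and solve each level's SCCs in parallel:

```rust
use petgraph::algo::{condensation, toposort};
use petgraph::graph::DiGraph;
use petgraph::Direction;
use rayon::prelude::*;

/// Condense the CFG into a DAG of SCCs, assign each SCC a level such
/// that all of its predecessors sit in earlier levels, then solve the
/// SCCs of each level in parallel. Illustrative sketch only.
fn solve_by_scc_levels(cfg: &DiGraph<usize, ()>) {
    // Each node of `dag` is a Vec of original block ids forming one SCC.
    let dag = condensation(cfg.clone(), true);
    let order = toposort(&dag, None).expect("condensation is acyclic");

    // level[scc] = 1 + max level over its predecessors (0 for sources).
    let mut level = vec![0usize; dag.node_count()];
    for &n in &order {
        level[n.index()] = dag
            .neighbors_directed(n, Direction::Incoming)
            .map(|p| level[p.index()] + 1)
            .max()
            .unwrap_or(0);
    }
    let max_level = level.iter().copied().max().unwrap_or(0);

    // Process levels in order; SCCs within one level are independent.
    for l in 0..=max_level {
        let ready: Vec<Vec<usize>> = order
            .iter()
            .filter(|n| level[n.index()] == l)
            .map(|&n| dag[n].clone())
            .collect();
        ready.par_iter().for_each(|scc_blocks| {
            // Run the (sequential) worklist iteration restricted to the
            // blocks of this SCC until it reaches a local fixed point.
            let _ = scc_blocks;
        });
    }
}
```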



## Evaluations
To test correctness, we compare the results of the sequential and parallel solvers on core benchmarks and fuzzed programs to make sure they agree.
sampsyo (Owner):

What were the results?

```
bril-fuzzer --num-block 1024 --block-size-mean 128 --max-nesting 3
```

The sequential baseline is itself somewhat parallelized via its SIMD-accelerated bitset implementation.
sampsyo (Owner):

Can you say something more about your experimental setup? Some data that would be useful include hardware details, OS versions, Rust versions, etc., and especially the number of cores in your machine.
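
As a point of reference for what the "SIMD accelerated bitset" mentioned above buys: the hot operation in these analyses is a destructive set union, which on a word-packed bitset handles 64 definitions per loop iteration and vectorizes well. A minimal illustrative version (not the project's implementation):

```rust
/// Union `src` into `dst`, returning whether `dst` changed. Both slices
/// pack 64 set members per `u64` word, so the compiler can vectorize
/// this loop.
fn union_into(dst: &mut [u64], src: &[u64]) -> bool {
    debug_assert_eq!(dst.len(), src.len());
    let mut changed = false;
    for (d, &s) in dst.iter_mut().zip(src) {
        let new = *d | s;
        changed |= new != *d;
        *d = new;
    }
    changed
}
```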

Comment on lines 114 to 125
**Liveness Analysis**: 1.85x faster
| Method | Fastest (ms) | Slowest (ms) | Mean (ms) |
|------------|--------------|---------------|-----------|
| Parallel | 231.6 | 233.9 | 232.7 |
| Sequential | 427.0 | 434.2 | 430.6 |


**Reaching Definitions**: 8% slowdown
| Method | Fastest (s) | Slowest (s) | Mean (s) |
|------------|--------------|---------------|-----------|
| Parallel | 17.4 | 24.11 | 20.76 |
| Sequential | 18.76 | 19.41 | 19.08 |
sampsyo (Owner):

Can you say something about why you think the results turned out this way? Is it due to the profile imbalance you mention below, or something else?

How about any theories for where this might go in the future? Do you think this is a promising approach that could work for other analyses, or did you learn that this is a bad idea and we should stop here? It would be great to do a little reflection about what you think these results tell you, qualitatively speaking.

@ethanuppal force-pushed the zihan-ethan-final-project branch 6 times, most recently from 21721f8 to 9218138, on May 16, 2025 04:01.
@ethanuppal force-pushed the zihan-ethan-final-project branch 2 times, most recently from c26566d to 61bbcc5, on May 16, 2025 04:13.
@ethanuppal force-pushed the zihan-ethan-final-project branch from 61bbcc5 to 4db6a5f on May 16, 2025 04:13.
@sampsyo added the 2025sp label on May 16, 2025.
@zihan0822 (Contributor):

We updated the evaluation setup for reaching definitions; we now see a consistent 1.2x+ speedup with the parallel solver.
After profiling, we realized that the main bottleneck for reaching definitions was computing DEFS, for which we could not find an effective parallel solution (discussed a bit more in the blog).
The performance gain from parallelizing GEN and the dataflow pass was overshadowed by this bottleneck.


The main modifications we made here were:

  1. Instead of tracking the instruction offset for each definition, we only tracked the block id associated with it. This reduced the memory footprint by a factor of num_total_instructions / num_blocks.
  2. We allocated a bitset arena to serve the frequent bitset allocation requests.

These modifications were applied to both the sequential and parallel solvers.
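
For readers curious what a bitset arena might look like, here is a minimal fixed-capacity sketch of the idea (names and layout invented for illustration; the project's allocator may differ): all bitsets share one contiguous backing buffer, so the solver's frequent bitset allocations become cheap bump allocations instead of separate heap allocations.

```rust
/// Minimal fixed-capacity bitset arena (illustrative only).
struct BitsetArena {
    words: Vec<u64>,
    words_per_set: usize,
    next: usize,
}

impl BitsetArena {
    /// Reserve space for `capacity` bitsets of `num_bits` bits each.
    fn new(num_bits: usize, capacity: usize) -> Self {
        let words_per_set = (num_bits + 63) / 64; // round up to whole words
        BitsetArena {
            words: vec![0; words_per_set * capacity],
            words_per_set,
            next: 0,
        }
    }

    /// Hand out the next zeroed bitset as a mutable slice of words.
    fn alloc(&mut self) -> &mut [u64] {
        let start = self.next * self.words_per_set;
        self.next += 1;
        &mut self.words[start..start + self.words_per_set]
    }
}
```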

@sampsyo (Owner) commented on May 18, 2025:

Wonderful! This is looking great. Seriously impressive work here.

@sampsyo merged commit 69ca357 into sampsyo:2025sp on May 18, 2025. 2 checks passed.


Successfully merging this pull request may close these issues: "Project Proposal: Parallelize Data Flow Analysis" and "Project Proposal: Parallel Dataflow Solver".