-
Notifications
You must be signed in to change notification settings - Fork 216
[BLOG] Technology Mapping with Egraphs #532
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
sampsyo
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi, everybody—thanks for the detailed writeup; this made the adjusted goal and the concrete approach clear. It's too bad that the original plan to use SmoothE did not work out. But it was definitely interesting to see how well ILP-based extraction worked for this workload.
One big question I had was: while I understand how an ILP-based extractor could be an interesting intermediate step, what was your motivation for adding support for good_lp? Were there specific limitations in egg's existing ILP solver support that you needed to work around? It might be nice to add some additional motivation for this in the body of the post.
|
|
||
| ## Introduction | ||
| ### E-graphs and Equality Saturation | ||
| Equality-graphs, or more simply, e-graphs, efficiently represent equivalence classes of expressions. This makes them useful for superoptimization, where, given an input program, we seek to find the sequence of optimizations that emits the *best* program, minimizing some user-provided cost function. E-graphs are directed graphs with edges from expressions (*e-nodes* in an e-graph), to a group of e-nodes (an *e-class* in an e-graph, where all the e-nodes contained are equivalent expressions). To efficiently maintain and merge these equivalence classes, e-graphs rely on a union-find data structure. The example below shows an arithmetic expression `a * 2 / 2` represented as an e-graph. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe it would be nice to link to the Wikipedia page, or some other definition? https://en.wikipedia.org/wiki/E-graph
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added
|
|
||
| <img src="canonical-example-before.svg" alt="" width="33%"> | ||
|
|
||
| In *equality saturation*, we apply pattern-based *rewrites* to repeatedly produce equivalent expressions. Each rewrite grows the e-graph, without losing the previous versions of the expression. So if two expressions always evaluate to the same result, they belong in the same e-class. Upon saturation, an e-graph will have undergone enough rewrites to reach a fixed point where it encodes all possible programs simultaneously. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Equality saturation probably also deserves a hyperlink.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
|
|
||
| <img src="canonical-example-before.svg" alt="" width="33%"> | ||
|
|
||
| In *equality saturation*, we apply pattern-based *rewrites* to repeatedly produce equivalent expressions. Each rewrite grows the e-graph, without losing the previous versions of the expression. So if two expressions always evaluate to the same result, they belong in the same e-class. Upon saturation, an e-graph will have undergone enough rewrites to reach a fixed point where it encodes all possible programs simultaneously. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"all possible programs" perhaps could use a little more precision here… maybe you mean all the possible results of applying all the rewrite rules?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Clarified that "all possible programs resulting from all rewrite rules are encoded in the e-graph"
| ### E-graphs and Equality Saturation | ||
| Equality-graphs, or more simply, e-graphs, efficiently represent equivalence classes of expressions. This makes them useful for superoptimization, where, given an input program, we seek to find the sequence of optimizations that emits the *best* program, minimizing some user-provided cost function. E-graphs are directed graphs with edges from expressions (*e-nodes* in an e-graph), to a group of e-nodes (an *e-class* in an e-graph, where all the e-nodes contained are equivalent expressions). To efficiently maintain and merge these equivalence classes, e-graphs rely on a union-find data structure. The example below shows an arithmetic expression `a * 2 / 2` represented as an e-graph. | ||
|
|
||
| <img src="canonical-example-before.svg" alt="" width="33%"> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems like a nice idea to use alt text, if you can think of anything…
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Descriptions added for each img
| The second is an [exact extractor](https://github.com/egraphs-good/egg/blob/v0.10.0/src/lp_extract.rs) which formulates extraction as a mixed integer linear programming problem and solves it using the [COIN-OR](https://github.com/coin-or/Cbc) Branch-and-cut solver. | ||
|
|
||
| ### E-graphs for Technology Mapping | ||
| The task of technology mapping in logic synthesis is to express a given Boolean function as a network of gates from a standard cell library (for ASICs) or programmable LUTs (for FPGAs) so that an objective function, such as total area, is optimized. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"Technology mapping" is another good place to use a hyperlink to some kind of place where people can learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added
|
|
||
| ## Results | ||
|
|
||
| We compare the number of LUTs in the designs extracted using greedy and ILP extraction on the [ISCAS85](https://github.com/matth2k/synth-benchmarks/tree/main/verilog/iscas85) design benchmarks. Since finding an exact solution quickly becomes prohibitive in terms of extraction time, (the size of the e-graph and corresponding linear programming problem gets too large), we incrementally increase the number of "rewrite iterations" until reaching 10 . This limits the size of the e-graph by restricting the number of times rewrites are applied. Some benchmarks/solvers were not able to complete even a single iteration within the 30 minute timeout threshold. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
time, (the size of the e-graph and corresponding linear programming problem gets too large), we
Use commas or parentheses here, but not both.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
| | c7552 | 335 | 325 | DNF | 315 (7) | DNF | | ||
| | c880 | DNF | DNF | DNF | DNF | DNF | | ||
|
|
||
| The times to solution are reported in the table below |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What units?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
number of LUTs in the synthesized design (specified in column headers)
|
|
||
| | Benchmark | Greedy Time | Microlp Time | HiGHS Time | CBC Time | | ||
| |-----------|-------------------|-------------------|-------------------|--------------------| | ||
| | c1355 | 0.046134086 | DNF | 640.92384104 | 230.738058229 | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please reduce the number of significant figures. For example, it's enough to say that this benchmark using greedy extraction takes 0.05 units (seconds?); we do not need all these decimal digits.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
| | c6288 | 0.136245309 | DNF | 608.710588015 | DNF | | ||
| | c7552 | 0.045072024 | DNF | 600.26766583 | DNF | | ||
| | c880 | DNF | DNF | DNF | DNF | | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For all these results, is there some kind of context you can give to convey whether this is "good" or "bad"? The ideal thing to have done here would be to compare against ABC or similar (on both QoR and running time). But given that it's probably too late to do that, is there any way you could just give some kind of very vague context for how well this system is doing? Are the LUT count reductions anywhere close to the results of traditional tools? How about the running times—is 600 units of time (seconds?) in the ballpark of a traditional tool?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree. There is still time to rerun with ABC to provide an optimized baseline
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will report when this is complete
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we are using yosys synth_xilinx to synthesize a design targetting an FPGA in all configurations, an ABC tech mapping is always performed. This means that any further LUT decreases in the greedy and exact configurations are in addition to the improvements from ABC. This has been clarified in the text and configuration names.
|
|
||
| We also compare the area (µm²) of the designs extracted using greedy and exact extraction (this time only using the top-performing - HiGHs - solver) on the ISCAS85 verilog design benchmarks using a standard cell library. Synthesizing an ASIC design using exact extraction becomes prohibitive faster than synthesis targetting an FPGA due to the larger design search space. There is no "No Optimization" column in this chart. To convert to a standard cell library, a set of rewrite rules must be applied required to convert digital logic to standard cells. We also compare against the Synopsys commercial design compiler to show the design quality a tuned EDA tool can achieve. | ||
|
|
||
| | Bench | Greedy Area | Exact Area (Node Limit) (msynth) | Synopsys | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does "msynth" mean here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is actually another tool. similar to lvv, but uses rewrite rules that convert to a standard cell library to target ASICs. clarified in the text: "We use another e-graph-based logic synthesis tool, called msynth, which operates similarly to lvv, but can synthesize ASIC designs using a standard cell library."
This is a reasonable question which deserved further explanation. We added more information to motivate our implementation decision in Design Extraction in the post. Briefly, the Coin-OR Branch-and-Cut solver used by egg emits errors for some of the large problems constructed for ASIC synthesis. We hypothesized that the issue had to do with the solver itself, since previous users have also triggered unexpected behavior with their problem formulations. We figured we would encounter similar issues with other solvers as well, so being able to quickly switch between them to see if any of them could produce a result would be helpful. In addition, many solvers actually implement different algorithms, so some might be faster than others depending on the problem. This motivated the integration of a good_lp based extractor, which would allow us to quickly switch between solvers at runtime. Also, good_lp provides a clean API for expressing problem variables, constraints and objective function, making it easy to understand the code. Our implementation has drawbacks (performance) as shown in the Results section, though. |
|
Got it; that absolutely makes sense! Thanks for all the changes overall. Can you let me know when I should take another look (and publish) by requesting a review with the widget on the right-hand side of the GitHub PR page? |
|
I have finished reviewing. Thanks for the feedback on this! |
|
Looks great; nice work on this! The expanded detail in the evaluation is especially helpful to see how things worked out. |
closes #511