Add flat bril project blog post #535
Conversation
sampsyo left a comment
Really nice work here, y'all! I found your description of the flattening process detailed and useful, and it's awesome that you got all the way to implementing and evaluating a "native" interpreter. Fantastic!
I have a few questions within. And one more: would you be interested in contributing your new representation & interpreter back to the Bril monorepo? I imagine that other people might find it useful!
| 1. Infrastructure to convert existing Bril JSON files to/from our flattened format
| 2. An alternate Bril interpreter that operates directly on the flattened data structure (as opposed to the existing one, `brili`, which has to parse JSON)
| Our complete implementation can be found at https://github.com/ngernest/flat-bril/tree/main
Consider integrating this link into the text, like "our complete implementation is [open source on GitHub][…]".
| ### Flat data structures
| A key design decision was what the flattened data structures would look like. We clearly need similar fields to those of the original representation, but we need a strategy for flattening `Vec`s and `String`s and other non-flat types. Instead of storing variables, labels and function names as Strings and Vec<String>s, we aggregate all the names referenced throughout a function into three contiguous arrays of bytes that are stored in our function representation. Then instead of using Strings, our instruction representation stores pairs of indices that can be used to extract the relevant name from the function-level contiguous array. One subtlety here is that if we want to use a Vec, we cannot just index into the array containing all the names: we would not know where the boundaries between each name are! We need to use an intermediary array that stores indices into the top-level name array.
You probably want some more code backticks when you refer to Rust types. For example:
as Strings and Vec<String>s
->
as `String`s and `Vec<String>`s
| For instance, in a function with three variable names v1, v2 and v3, the function-level variable array will look like v1v2v3. If an instruction has two arguments v1 and v2, our representation stores a pair of indices into an intermediate array that itself contains pairs of indices. A lookup would proceed as (0,1) → (0,0), (1,1) → v1, v2.
A lookup would proceed as (0,1) → (0,0), (1,1) → v1, v2
Not sure I understand this sentence… are (0, 0) and (1, 1) supposed to be pairs of indices in the name buffer? Assuming half-open intervals, wouldn't that be (0, 2) and (2, 4)? Or, with closed intervals, (0, 1) and (2, 3)?
I rewrote this section in the .md file, hopefully it's clearer now!
(In hindsight, our design is a bit confusing in that we use closed intervals throughout, and we enforce the invariant that the bytes for the actual variable-name strings must be stored contiguously, and their indexes in the interstitial indexes array must also be stored contiguously.)
Got it; makes sense.
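For readers puzzling over the same point, here is a minimal sketch of the closed-interval, two-level lookup as I understand it from the discussion above; the `NameTable` type and `lookup` function are hypothetical names for illustration, not the actual flat-bril code:

```rust
// Sketch of the two-level, closed-interval name lookup described above.
// All names here (NameTable, lookup) are hypothetical.
struct NameTable<'a> {
    /// All variable names for a function, concatenated: b"v1v2v3"
    bytes: &'a [u8],
    /// Closed byte intervals into `bytes`, one per name: [(0, 1), (2, 3), (4, 5)]
    name_spans: &'a [(u32, u32)],
}

impl<'a> NameTable<'a> {
    /// Resolve a closed interval over `name_spans` into the named strings.
    /// E.g. (0, 1) -> spans (0, 1) and (2, 3) -> ["v1", "v2"].
    fn lookup(&self, lo: u32, hi: u32) -> Vec<&'a str> {
        self.name_spans[lo as usize..=hi as usize]
            .iter()
            .map(|&(start, end)| {
                std::str::from_utf8(&self.bytes[start as usize..=end as usize]).unwrap()
            })
            .collect()
    }
}

fn main() {
    let table = NameTable {
        bytes: b"v1v2v3",
        name_spans: &[(0, 1), (2, 3), (4, 5)],
    };
    // An instruction storing the interval (0, 1) resolves to its two arguments.
    assert_eq!(table.lookup(0, 1), vec!["v1", "v2"]);
}
```

With closed intervals and the contiguity invariant, an instruction's whole argument list is just one `(lo, hi)` pair over the interstitial array.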
| }
| ```
| Although our function representation does contain Vec s to enable construction, these will be flattened to slices once the entire Function has been created
Although our function representation does contain Vec s to enable construction, these will be flattened to slices once the entire Function has been created
->
Although our function representation does contain `Vec`s to enable construction, these will be flattened to slices once the entire Function has been created.
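As a small aside on the `Vec`-to-slice point above, a construction-time `Vec` can be frozen into an immutable boxed slice once building is finished; a minimal sketch (the `freeze` helper is illustrative):

```rust
// Sketch: freeze a construction-time Vec into an immutable boxed slice.
fn freeze(buffer: Vec<u8>) -> Box<[u8]> {
    // Shrinks the allocation to fit and discards any spare capacity.
    buffer.into_boxed_slice()
}

fn main() {
    let mut names = Vec::new();
    names.extend_from_slice(b"v1v2v3");
    let frozen: Box<[u8]> = freeze(names);
    assert_eq!(&frozen[..], b"v1v2v3");
}
```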
| ### Zerocopy
| In order to facilitate conversion between slices of bytes and our flat data structures, we used the zerocopy crate. In practice this meant adding the `IntoBytes` and `FromBytes` traits to our flat representations, and specifying their byte representations using the `repr` attribute. It ended up being quite challenging to get this working: zerocopy was quite finicky about what was allowable in a struct using the zerocopy traits. We ended up needing to create new “extra-flat” versions of many of our data structures to get this to work, and experimenting with different `repr` options. For instance, we discovered that zerocopy could not convert pairs of `u32`s, or `Option`s, to bytes, so we needed to create a new `I32Pair` struct that itself implemented the zerocopy traits (`I32` as opposed to `u32` because we used -1 to represent the case where the `Option` is None). We probably could have used these extra-flat data structures everywhere, rather than converting between multiple versions, but since we had written most of our flattening logic using the original data structures, we decided to stick with using new extra-flat versions.
Please use a hyperlink to the zerocopy crate. And probably also to the Rust documentation for repr.
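To make the zerocopy discussion concrete, here is a minimal sketch of an "extra-flat" pair along the lines described above, assuming zerocopy 0.8's derive and method names (these vary across zerocopy versions); `I32Pair` is the post's own name, but the field names and usage are illustrative, not the actual flat-bril definition:

```rust
use zerocopy::{FromBytes, Immutable, IntoBytes, KnownLayout};

// Sketch of an "extra-flat" pair that zerocopy can view as raw bytes.
// -1 in either field stands in for a `None`, as described above.
#[derive(FromBytes, IntoBytes, Immutable, KnownLayout, Debug, PartialEq)]
#[repr(C)]
struct I32Pair {
    lo: i32,
    hi: i32,
}

fn main() {
    let pair = I32Pair { lo: 3, hi: -1 }; // hi = -1 encodes "no value"
    let bytes: &[u8] = pair.as_bytes(); // zero-cost view of the struct as bytes
    let back = I32Pair::read_from_bytes(bytes).unwrap(); // parse it back out
    assert_eq!(back, pair);
}
```

`#[repr(C)]` plus two `i32` fields leaves no padding bytes, which is what makes the `IntoBytes` derive happy here.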
| ## Evaluation
| For our evaluation, we decided to test flat bril on 70 core Bril benchmarks. To check the correctness of our implementation, we used Turnt to verify that all benchmarks run with our flat bril interpreter returned the same result as the reference Brili interpreter, which they did. Additionally, to check the correctness of our infrastructure for converting JSON files to/from our flattened format, we manually checked that the JSON output converted back from the flat format matched the original. To measure the performance impact, we did the following using Hyperfine:
Consider adding hyperlinks to Turnt, the reference interpreter, and Hyperfine.
| Below is a table showing the time taken for JSON roundtrips. We tested this on all the core benchmarks, but due to space constraints, we only list a few here. These are averaged over 10 runs, with a warmup of 3.
| | Benchmark | Mean [ms] | Min [ms] | Max [ms] | Relative |
What is the "Relative" column here? What's being compared to what?
(By default, the Markdown table that Hyperfine outputs includes this "Relative" column, but there is no baseline here for us to compare against, so the column is inappropriate.) I removed the column -- thanks!
| | `nop` | 521.6 ± 114.0 | 438.3 | 744.3 | 1.19 ± 0.26 |
| | `perfect` | 503.1 ± 37.1 | 462.7 | 578.5 | 1.15 ± 0.09 |
| | `reverse` | 486.1 ± 61.1 | 432.5 | 616.1 | 1.11 ± 0.14 |
| | `rot13` | 438.5 ± 7.8 | 428.4 | 452.8 | 1.00 |
Is it really the case that a JSON -> flat -> JSON round-trip takes ~500 ms (half a second)? That seems like a really long time! Because these programs are pretty small, I'd have expected it to be, like, less than a millisecond. Any guesses about what is taking so long?
Thanks for the catch! We realized that the original JSON roundtrip benchmark results were faulty for two reasons:
- The command we passed to Hyperfine for benchmarking was of the form `bril2json < SOME_BRIL_FILE | cargo run -- ...`, and Samply revealed that a lot of the ~500ms was dominated by `bril2json`'s file I/O. I updated our code so that the JSON benchmarks don't require piping the output of `bril2json` to our executable.
- We forgot to do `cargo build --release` before benchmarking (so some optimizations from the `release` build might have been missed).

I've rerun the benchmarks using the `--release` build and updated the figure in the blogpost!
Sounds good. FWIW, there might also be a significant cost to using `cargo run` instead of just running the executable directly…
| Besides comparing our interpreter with the other Brili implementations, we also used the Samply profiler to figure out where our interpreter was spending most of its time. The stack chart below (obtained from the Firefox Profiler UI) demonstrates how our interpreter executable spends its time on the benchmark `catalan.bril` – we chose to highlight this benchmark since it has many recursive function calls and the difference between our interpreter and the TypeScript/Rust Brili interpreters’ performance is noticeable.
| The stack chart shows that most of the executable’s runtime was spent in interpreter-related functions: `mmap`-ing the flat file format was relatively quick, and so was converting the JSON to a flat file format (omitted from the screenshot). Zooming into the stack chart (second screenshot below) and focusing on the function calls towards the bottom, we see that when interpreting individual instructions, our interpreter calls a lot of standard library functions that manipulate `HashMap`s and `Vec`s. This is because the environment datatype in our interpreter is defined as `HashMap<&str, BrilValue>` (mapping variable names to Bril values), i.e. keys in the hashmap are (references to) strings. We realized that strings were the canonical way to represent variables in the environment, since different indexes in the arguments field of an instruction may point to the same underlying variable. However, the fast Rust Brili interpreter canonicalises variable names into a numerical representation (as described in their blogpost), allowing for faster look-ups in their environment. We suspect that this is one of the reasons why the Rust Brili interpreter out-performs our flat interpreter.
Thanks for doing this investigation! I agree that it is likely that this is the performance bottleneck here. It would be interesting to try to alleviate this bottleneck by requiring that variable name indices be unique…
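To make the bottleneck (and the suggested fix) concrete, here is a hedged sketch contrasting a string-keyed environment with an index-keyed one; `BrilValue` is the post's own name, while everything else here is illustrative:

```rust
use std::collections::HashMap;

#[derive(Clone, Debug, PartialEq)]
enum BrilValue {
    Int(i64),
    Bool(bool),
}

fn main() {
    // Current design: variable names are the keys, so every get/set
    // hashes and compares a string.
    let mut by_name: HashMap<&str, BrilValue> = HashMap::new();
    by_name.insert("v1", BrilValue::Int(42));
    assert_eq!(by_name.get("v1"), Some(&BrilValue::Int(42)));

    // If every variable had a unique canonical index (as in the fast
    // Rust Brili), the environment could be a plain vector indexed by
    // that number instead, avoiding string hashing entirely.
    let num_vars = 2;
    let mut by_index: Vec<Option<BrilValue>> = vec![None; num_vars];
    by_index[0] = Some(BrilValue::Int(42)); // variable 0 = "v1"
    assert_eq!(by_index[0], Some(BrilValue::Int(42)));
}
```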
| Our interpreter compares favorably to Rust Brili in terms of peak physical memory usage (max resident set size). Notably, our interpreter has an order of magnitude fewer page faults and involuntary context switches compared to the TypeScript interpreter (250 vs 2036 and 210 vs 2480), although it’s unclear whether this is due to our choice of implementation language (Rust vs TypeScript) as opposed to our data structure flattening strategy, since Rust Brili also has similar metrics.
| ### What else could be done?
| At the moment, our implementation creates `Vec`-based buffers to store the variables/labels it sees within a Bril program. Since Bril programs are unlikely to contain excessively large no. of variables/labels, perhaps we could use Rust’s `smallvec` or `arrayvec` libraries to create these buffers instead of the standard library’s `Vec` data structure. The aforementioned libraries offer specialized short vectors which minimize heap allocations, which might offer performance benefits over heap-allocated `Vec`s. However, we would need to properly benchmark these supposed benefits to see if swapping to these libraries is worth it.
no. -> number (you're not under space limitations here :)
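For reference, a minimal sketch of what the `smallvec` swap might look like, assuming the `smallvec` crate's `SmallVec<[T; N]>` API; the inline capacity of 32 is an arbitrary choice for illustration:

```rust
use smallvec::SmallVec;

fn main() {
    // Up to 32 bytes live inline on the stack; longer name buffers
    // spill to the heap automatically.
    let mut names: SmallVec<[u8; 32]> = SmallVec::new();
    names.extend_from_slice(b"v1v2v3");
    assert!(!names.spilled()); // still inline: no heap allocation yet
    assert_eq!(&names[..], b"v1v2v3");
}
```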
Thanks Adrian for the comments! I rewrote a few sections of the blogpost and reran some benchmarks based on the comments. I'll put up a PR for the Bril monorepo later!
Looks great; thanks!!
Closes #501
@ngernest
@katherinewu312