BLOG: Simon's course project blog post #531
Conversation
**sampsyo** left a comment:
Excellent work on this, @scober! This was a very cool experiment, and you did a great job of breaking down the steps in the design of the required compiler pipeline. The writeup is extremely clear, to the point where it would be easy for someone to reproduce this work if they wanted to. Awesome!
I just have a few suggestions within.
> It is worth noting briefly that the two formulas are only equivalent when $x$ is restricted to be a real number. If we allowed $x$ to be any complex number (or further restricted $x$ to be an integer!) the two formulas would no longer be equivalent. We might say that the formulas are equivalent in the _theory_ of real numbers. For our purposes, a theory is a formal language together with some axioms defining which formulas constructed from that language are true.
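For a concrete illustration (my example, not necessarily the post's pair): $\exists y \, (y \cdot y = x)$ and $x \geq 0$ are equivalent in the theory of real numbers. Over the integers, the first formula additionally demands that $x$ be a perfect square; over the complex numbers, the first holds for every $x$ while the second does not.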
> When a formula can be written with and without quantifiers, the _quantifier-free_ version has some advantages from a computational perspective. The main advantage in the context of formal mathematical statements is that in most theories we care about, the quantifier-free version is decidable while the quantifier-ful version is not. If we squint and pretend our two formulas are computer programs, the quantifier-free version looks like a few arithmetic operations while the quantifier-ful version looks an awful lot like a loop that might never halt!
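A minimal Python sketch of the general point (example mine, not the formulas from the post):

```python
# Quantifier-ful: "does there exist an integer x with x + x == n?"
# As a program, this is a search loop; over an unbounded domain it may never halt.
def exists_half(n, search_limit=10_000):
    return any(x + x == n for x in range(-search_limit, search_limit))

# Quantifier-free equivalent (over the integers): just arithmetic, no loop.
def exists_half_qe(n):
    return n % 2 == 0
```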
**sampsyo:**
I really like this explanation for how to view QE from a "programming" perspective.
**scober:**
Thank you!
> Some theories have known procedures for removing quantifiers from any formula in that theory and, in fact, the Z3 SMT solver has a [quantifier elimination tactic](https://microsoft.github.io/z3guide/docs/strategies/summary/#tactic-qe) which attempts to apply such procedures.
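For instance, a tiny z3py session applying that tactic might look like this (formula mine; the exact printed form of the result may vary):

```python
from z3 import And, Exists, Int, Tactic

x, y = Int('x'), Int('y')
# "there is an integer y with x < y < 10" -- quantifier-free equivalent: x <= 8
formula = Exists(y, And(x < y, y < 10))
print(Tactic('qe')(formula))  # prints the quantifier-free goal, e.g. [[x <= 8]]
```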
> Of course, performing quantifier elimination cannot make an undecidable formula decidable. If one can remove the quantifiers from a formula, that formula was decidable to begin with (you can't make an undecidable formula decidable without changing its meaning). So the reason Z3 has a quantifier elimination tactic is that it can make the deciding _cheaper_. Even if our "loopy" pseudo-program above eventually terminates, it might take a long time. It would be easier to do the arithmetic instead.
**sampsyo:**
This is absolutely not the most important thing in the world, but is it correct to say that an individual formula can be decidable or undecidable? If every formula is either true or false, then for a given formula P, one of the following two “decision procedures” must suffice to decide P:
- for all p, decide(p) = true
- for all p, decide(p) = false

In other words, maybe the argument you want to make here is less about individual formulas and more about whether any QE procedure can succeed for all formulas simultaneously…
**scober:**
This is a great point. Decidability is absolutely a property that is properly discussed in relation to classes of formulas, not individual formulas.
> As we saw above, true quantifiers generally translate to loops that may never terminate. In most programming circles, writing loops that you know ahead of time might never terminate is generally frowned upon, so we will need more limited substitutes. We will use the commonly available `any` and `all` functions as our restricted quantifiers. `any` returns `true` if and only if _any_ of its inputs is `true`, while `all` returns `true` if and only if _all_ of its inputs are `true` (get it?). This makes `any` an _existential quantifier_ ($\exists$) and `all` a _universal quantifier_ ($\forall$).
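A quick demonstration of the correspondence (example mine):

```python
nums = range(-10, 10)

# Existential quantifier: "there exists an x in nums with x + x == 6"
print(any(x + x == 6 for x in nums))  # True, witnessed by x = 3

# Universal quantifier: "for all x in nums, x < 10"
print(all(x < 10 for x in nums))      # True, since range(-10, 10) excludes 10
```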
> We will also restrict ourselves to a single theory: Presburger Integer Arithmetic. Formulas in Presburger Arithmetic are built from the following language: $\{ \neg, \wedge, \vee, \exists, \forall \} \cup \mathbb{N} \cup \{ +, -, <, = \} \cup \{ P_m : m \in \mathbb{N} \}$, where $P_m$ is a function that returns true if its input is divisible by $m$ and the first three symbols are the standard first-order predicate logic _not_, _and_, and _or_. For this project, we will ignore the $P_m$ because the process for removing quantifiers from formulas that contain $P_m$ can result in potentially-expensive-to-compute quantifier-free formulas, and we want to make programs faster!
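As a taste of the language (an illustrative sentence of mine, not one from the post), "every integer is even or has an even successor" is the true Presburger sentence

$$\forall x \, (P_2(x) \vee P_2(x + 1)).$$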
**sampsyo:**
Maybe it would be nice to link to the Wikipedia page? https://en.wikipedia.org/wiki/Presburger_arithmetic
> ##### A Brief Aside On A Technical Hiccup
> When I started out, I was hoping to design my AST-rewriter/optimization pass as a function decorator that would rewrite only decorated functions at runtime. Obviously (in hindsight), compiling a single function's hand-mangled AST outside of the context of its enclosing file is difficult to do at all and impossible to do correctly in general. So I pivoted to a design where I parse whole Python source files and write whole Python source files back out. Of course, performing interesting and correct static transformations on untyped Python source files is also impossible (depending on your definition of interesting), so I would call this redesign only a partial victory.
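A minimal sketch of that parse-a-file, write-a-file pipeline, with a hypothetical `QuantifierEliminator` standing in for the real pass:

```python
import ast

class QuantifierEliminator(ast.NodeTransformer):
    """Stand-in for the real pass; the identity transform here."""

def rewrite_source_file(in_path: str, out_path: str) -> None:
    # Parse the whole file so the rewrite sees every definition in context.
    with open(in_path) as f:
        tree = ast.parse(f.read(), filename=in_path)
    tree = QuantifierEliminator().visit(tree)
    ast.fix_missing_locations(tree)
    # Write a whole Python source file back out (ast.unparse needs Python 3.9+).
    with open(out_path, "w") as f:
        f.write(ast.unparse(tree))
```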
**sampsyo:**
[nothing to address in your post, just an idle comment]

> compiling a single function's hand-mangled AST outside of the context of its enclosing file is difficult to do at all and impossible to do correctly in general

I don’t doubt that this is true, but it makes me wonder how other Python JIT setups (such as Numba) work when they use decorators to mark functions to compile…
**scober:**
I also had this question and I think there are two big answers:
- Some (many? all?) decorator-based Python JIT setups target a subset of Python. Numba, for example, will refuse to compile Python code containing `del` or `eval` (or many other operations).
- If you have the resources of a larger project, you can invest the time that I couldn't in figuring out the fiddly details. One somewhat popular approach that seems to resolve the closure/namespacing problems of out-of-context compilation (in some common cases) is to make a copy of the whole file, modify the AST of the function you decorate, recompile the whole file separately, then frankenstein the resulting compiled code back into the half-compiled original file (see the sketch below).
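A rough sketch of that trick (my reconstruction of the general idea, not code from Numba or any other project):

```python
import ast
import inspect

def rewrite_at_runtime(func):
    # Re-parse the *whole* enclosing file so names resolve in context.
    path = inspect.getsourcefile(func)
    with open(path) as f:
        tree = ast.parse(f.read(), filename=path)
    for node in ast.walk(tree):
        if isinstance(node, ast.FunctionDef) and node.name == func.__name__:
            node.decorator_list = []  # avoid re-triggering this decorator
            # ... the actual AST rewrite of `node` would happen here ...
    # Recompile the whole file separately, in a scratch namespace.
    # (Beware: this re-runs the module's top-level side effects.)
    scratch = {"__name__": "_rewritten_" + func.__module__}
    exec(compile(tree, path, "exec"), scratch)
    # Frankenstein the rewritten function back into the caller's world.
    return scratch[func.__name__]
```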
> Anyone who is curious about performing full quantifier elimination on Presburger Arithmetic formulas is welcome to take a look at [my source for the algorithm](https://amedvedev.ccny.cuny.edu/mathA4400s16/vandenDries.pdf). It is Theorem 4.4.2, and the pass I do in this project is a version of the one presented there.
> The primary difficulty I had in translating a description of the algorithm in a math text to a piece of running code that modified computer programs was the irregularity of Python ASTs. My final design ended up being a series of passes that applied more and more regularity to ASTs until I could perform a simple, final transformation. With that in mind, the rest of this section will be a high-level tour of the various regularizing passes the algorithm performs.
**sampsyo:**
[another unimportant remark] I wonder how possible this would be if you were to start with Python bytecode… it would be way harder to reconstruct the necessary hierarchy of closures and comprehensions, probably, but at least the low-level details would be regularized?
**scober:**
I wondered about this. Operating on Python bytecode would presumably give you AST regularity, but it turns out that a lot of the irregularity is algebraic. In practice I ended up spending way more time on algebraic regularization (e.g. `x < y` having a different structure from `not x >= y`) than I did on AST regularization (e.g. `p or q or r` having a different structure than `p or (q or r)`).
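To illustrate the algebraic kind, a toy regularization pass (mine, not the post's code) that rewrites negated comparisons into their un-negated forms:

```python
import ast

# Map each comparison operator to the operator of its negation.
NEGATED = {ast.GtE: ast.Lt, ast.Gt: ast.LtE, ast.LtE: ast.Gt, ast.Lt: ast.GtE}

class PushNotThroughCompare(ast.NodeTransformer):
    def visit_UnaryOp(self, node):
        self.generic_visit(node)
        if (isinstance(node.op, ast.Not)
                and isinstance(node.operand, ast.Compare)
                and len(node.operand.ops) == 1
                and type(node.operand.ops[0]) in NEGATED):
            cmp = node.operand
            cmp.ops = [NEGATED[type(cmp.ops[0])]()]  # e.g. `>=` becomes `<`
            return cmp  # drop the `not`, keep the flipped comparison
        return node

tree = ast.parse("not x >= y")
print(ast.unparse(PushNotThroughCompare().visit(tree)))  # x < y
```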
> The key insight is that of all the upper bound sub-predicates of the form `m*x < <expr>`, only one of them will matter. Only one of them will be the _least upper bound_. Similarly, only the _greatest lower bound_ will actually restrict what values of `x` return `True`. We don't know statically which upper bound is the least upper bound, because the upper bounds may have variables in them, but we can always check at runtime.
> In a pure predicate logic setting, these special bounds can be "calculated" by statically unrolling a big loop that checks if each (lower bound, upper bound) pair is correct. But in Python we can just call `max` and `min`!
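Schematically (my illustration; the post's fully eliminated example appears further down): if `x` is bounded below by `l1` and `l2` and above by `u1` and `u2`, the integer quantifier collapses to one comparison:

```python
# Before (conceptually): any(l1 < x and l2 < x and x < u1 and x < u2
#                            for x in <all integers>)
# After: an integer strictly between the greatest lower bound and the
# least upper bound exists exactly when they are at least 2 apart.
def exists_between(l1, l2, u1, u2):
    return max(l1, l2) + 1 < min(u1, u2)
```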
**sampsyo:**
Any chance it would be easy to show a Python example like the one above but using max/min to show the result of this final transformation? Is it like this?

`any(x < min(…) and x > max(…) for x in range(...))`

**scober:**
I added an example of a fully eliminated program further down, if that example isn't clear please let me know!
**sampsyo:**
Looks great!
> Compiler passes (even those for class projects) should not change the behavior of the code they modify. This pass does not meet that standard for two reasons:
> 1. As I mentioned above, this is a static transformation of untyped Python code. There is no guarantee that all of the variables in the program are integers or even that `+` does what I expect. So the best we can hope for is that it is correct _for integer inputs_.
> 2. If I found all of the bugs in my implementation with testing I think that might be the first recorded instance of that happening in all of human history.
**sampsyo:**
🤪
> That being said, I did some testing! Because this pass operates on such a simple sub-language of Python, I built a random program generator and used those programs as my test inputs. I generated around 60 programs and when one of them failed (my pass changed the program behavior) I would whittle it down to a minimal failing example and keep that as its own test.
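For flavor, a toy generator in that spirit (an assumption about the general shape, not the post's actual generator):

```python
import random

def random_atom(names):
    # An atom is a comparison between variables and small integer constants.
    terms = names + [str(random.randint(-5, 5))]
    lhs, rhs = random.sample(terms, 2)
    return f"({lhs} {random.choice(['<', '=='])} {rhs})"

def random_formula(names, depth=2):
    if depth == 0:
        return random_atom(names)
    op = random.choice([" and ", " or "])
    return "(" + op.join(random_formula(names, depth - 1) for _ in range(2)) + ")"

# A random single-quantifier test program over a small input range.
print(f"print(any({random_formula(['x', 'y'])} for x in range(-10, 10)))")
```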
**sampsyo:**
Really cool (and practical) approach to constructing tests. How many tests did you end up with after going through this process a few times?
**scober:**
I added a note about this, but it was roughly 60 randomly generated programs, of which 5-ish failed and were minimized, although at least one failing program triggered multiple bugs and had to be minimized more than once to different minimal failing examples.
> I chose to test the performance of my pass by testing big batches of function invocations, since the running time of individual functions is very short and I was worried that overhead caused by I/O, garbage collection, or other interpreter behaviors would cause too much noise.
> I tested the performance improvement of my pass by taking my randomly generated test programs and running them with significantly more inputs than I tested on (I found that for testing, an input range of $[-10, 10)$ was sufficient, but for performance testing I did $[-10000, 10000)$). I also made sure to generate my inputs in a different process than the one the tested function ran in, and to print out the result of each quantifier function to prevent the Python compiler from optimizing away any of the function's work. I collected my numbers using [hyperfine](https://github.com/sharkdp/hyperfine). The file numbers start at 11 because the first 10 randomly generated programs were generated with slightly more restrictive parameters; all numbers are the result of 15 runs and are reported as _mean (standard deviation)_ in milliseconds:
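For reference, a hyperfine invocation consistent with that description (script name hypothetical) would be:

```
hyperfine --runs 15 'python3 qe_perf_11.py'
```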
**sampsyo:**
While your approach to batching is one way to mitigate the effects of interpreter startup time, FWIW another would be to take replicated measurements inside of the interpreter, à la the standard-library timeit module. [nothing to change here]
**scober:**
Do you know if timeit avoids counting gc time?
**sampsyo:**
Huh, it appears that it does turn off GC. Fascinating! https://stackoverflow.com/q/4523817
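For anyone curious, a replicated in-interpreter measurement with `timeit` looks roughly like this (example mine):

```python
import timeit

# timeit turns off garbage collection during the timed run by default;
# pass setup="gc.enable()" to measure with the collector on.
secs = timeit.timeit("any(x + x == 6 for x in range(20_000))", number=1_000)
print(f"{secs:.3f} s for 1000 calls")
```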
> One final measure of a compiler pass is how applicable it is to code that people actually write. To measure that, I gathered a bunch of open source Python code. I didn't gather it in any principled way; I just grabbed a bunch from the [GitHub trending page](https://github.com/trending/python?since=daily) and from [this collection of Python projects](https://github.com/vinta/awesome-python).
> According to [cloc](https://github.com/AlDanial/cloc), I searched ~1.68 million lines of Python code and found 0 lines that I could apply my quantifier elimination pass to. Possibly that number could be improved by expanding the pass (by re-including the $P_m$s, for example), but I suspect that people just don't write many Presburger Arithmetic quantifier expressions in their Python code.
**sampsyo:**
Not exactly a shocking result, but too bad nonetheless! I had a fun time using a naive text search query to look for other possible candidates: https://github.com/search?q=%2Fany.*%3C.*for+i+in+range%2F+language%3APython&type=code
And yeah, slim pickings. Here are a few examples that are kinda close, maybe, although I didn't think about it very hard:
https://github.com/arin17bishwa/myCP_sols/blob/1e452b4d192b6df9f897602a1341b194aff15079/CF/1363A.py#L73
https://github.com/CalcoDev/aoc-2024/blob/a3ccbc85bf86b98424a76428605e59747a57611c/day2.py#L7
**scober:**
Text search is genius! My search was limited to stuff I could actually apply my pass to so it definitely missed out on some things that were possible but not yet supported. The second example you found could definitely be supported with a little (or maybe a lot of) work.
**sampsyo** left a comment:
Excellent! Again, nice work on this!
closes #505