BLOG: Simon's course project blog post #531
Conversation
**sampsyo** left a comment:
Excellent work on this, @scober! This was a very cool experiment, and you did a great job of breaking down the steps in the design of the required compiler pipeline. The writeup is extremely clear, to the point where it would be easy for someone to reproduce this work if they wanted to. Awesome!
I just have a few suggestions within.
> It is worth noting briefly that the two formulas are only equivalent when $x$ is restricted to be a real number. If we allowed $x$ to be any complex number (or further restricted $x$ to be an integer!) the two formulas would no longer be equivalent. We might say that the formulas are equivalent in the _theory_ of real numbers. For our purposes, a theory is a formal language together with some axioms defining which formulas constructed from that language are true.
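For a concrete illustration (my example, not necessarily the post's pair): $\exists y \, (y \cdot y = x)$ and $x \geq 0$ are equivalent in the theory of real numbers. Over the integers, the first formula additionally demands that $x$ be a perfect square; over the complex numbers, the first holds for every $x$ while the second does not.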
> When a formula can be written with and without quantifiers, the _quantifier-free_ version has some advantages from a computational perspective. The main advantage in the context of formal mathematical statements is that in most theories we care about, the quantifier-free version is decidable while the quantifier-ful version is not. If we squint and pretend our two formulas are computer programs, the quantifier-free version looks like a few arithmetic operations while the quantifier-ful version looks an awful lot like a loop that might never halt!
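A minimal Python sketch of the general point (example mine, not the formulas from the post):

```python
# Quantifier-ful: "does there exist an integer x with x + x == n?"
# As a program, this is a search loop; over an unbounded domain it may never halt.
def exists_half(n, search_limit=10_000):
    return any(x + x == n for x in range(-search_limit, search_limit))

# Quantifier-free equivalent (over the integers): just arithmetic, no loop.
def exists_half_qe(n):
    return n % 2 == 0
```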
**sampsyo:**
I really like this explanation for how to view QE from a "programming" perspective.
**scober:**
Thank you!
> Some theories have known procedures for removing quantifiers from any formula in that theory and, in fact, the Z3 SMT solver has a [quantifier elimination tactic](https://microsoft.github.io/z3guide/docs/strategies/summary/#tactic-qe) which attempts to apply such procedures.
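For instance, a tiny z3py session applying that tactic might look like this (formula mine; the exact printed form of the result may vary):

```python
from z3 import And, Exists, Int, Tactic

x, y = Int('x'), Int('y')
# "there is an integer y with x < y < 10" -- quantifier-free equivalent: x <= 8
formula = Exists(y, And(x < y, y < 10))
print(Tactic('qe')(formula))  # prints the quantifier-free goal, e.g. [[x <= 8]]
```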
> Of course, performing quantifier elimination cannot make an undecidable formula decidable. If one can remove the quantifiers from a formula, that formula was decidable to begin with (you can't make an undecidable formula decidable without changing its meaning). So the reason Z3 has a quantifier elimination tactic is that it can make the deciding _cheaper_. Even if our "loopy" pseudo-program above eventually terminates, it might take a long time. It would be easier to do the arithmetic instead.
**sampsyo:**
This is absolutely not the most important thing in the world, but is it correct to say that an individual formula can be decidable or undecidable? If every formula is either true or false, then for a given formula P, one of the following two “decision procedures” must suffice to decide P:
- for all p, decide(p) = true
- for all p, decide(p) = false

In other words, maybe the argument you want to make here is less about individual formulas and more about whether any QE procedure can succeed for all formulas simultaneously…
**scober:**
This is a great point. Decidability is absolutely a property that is properly discussed in relation to classes of formulas, not individual formulas.
> As we saw above, true quantifiers generally translate to loops that may never terminate. In most programming circles, writing loops that you know ahead of time might never terminate is generally frowned upon, so we will need more limited substitutes. We will use the commonly available `any` and `all` functions as our restricted quantifiers. `any` returns `true` if and only if _any_ of its inputs is `true`, while `all` returns `true` if and only if _all_ of its inputs are `true` (get it?). This makes `any` an _existential quantifier_ ($\exists$) and `all` a _universal quantifier_ ($\forall$).
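A quick demonstration of the correspondence (example mine):

```python
nums = range(-10, 10)

# Existential quantifier: "there exists an x in nums with x + x == 6"
print(any(x + x == 6 for x in nums))  # True, witnessed by x = 3

# Universal quantifier: "for all x in nums, x < 10"
print(all(x < 10 for x in nums))      # True, since range(-10, 10) excludes 10
```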
> We will also restrict ourselves to a single theory: Presburger Integer Arithmetic. Formulas in Presburger Arithmetic are built from the following language: $\{ \neg, \wedge, \vee, \exists, \forall \} \cup \mathbb{N} \cup \{ +, -, <, = \} \cup \{ P_m : m \in \mathbb{N} \}$, where $P_m$ is a function that returns true if its input is divisible by $m$ and the first three symbols are the standard first-order predicate logic _not_, _and_, and _or_. For this project, we will ignore the $P_m$ because the process for removing quantifiers from formulas that contain $P_m$ can result in potentially-expensive-to-compute quantifier-free formulas, and we want to make programs faster!
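As a taste of the language (an illustrative sentence of mine, not one from the post), "every integer is even or has an even successor" is the true Presburger sentence

$$\forall x \, (P_2(x) \vee P_2(x + 1)).$$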
**sampsyo:**
Maybe it would be nice to link to the Wikipedia page? https://en.wikipedia.org/wiki/Presburger_arithmetic
> ##### A Brief Aside On A Technical Hiccup
> When I started out, I was hoping to design my AST-rewriter/optimization pass as a function decorator that would rewrite only decorated functions at runtime. Obviously (in hindsight), compiling a single function's hand-mangled AST outside of the context of its enclosing file is difficult to do at all and impossible to do correctly in general. So I pivoted to a design where I parse whole Python source files and write whole Python source files back out. Of course, performing interesting and correct static transformations on untyped Python source files is also impossible (depending on your definition of interesting), so I would call this redesign only a partial victory.
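A minimal sketch of that parse-a-file, write-a-file pipeline, with a hypothetical `QuantifierEliminator` standing in for the real pass:

```python
import ast

class QuantifierEliminator(ast.NodeTransformer):
    """Stand-in for the real pass; the identity transform here."""

def rewrite_source_file(in_path: str, out_path: str) -> None:
    # Parse the whole file so the rewrite sees every definition in context.
    with open(in_path) as f:
        tree = ast.parse(f.read(), filename=in_path)
    tree = QuantifierEliminator().visit(tree)
    ast.fix_missing_locations(tree)
    # Write a whole Python source file back out (ast.unparse needs Python 3.9+).
    with open(out_path, "w") as f:
        f.write(ast.unparse(tree))
```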
**sampsyo:**
[nothing to address in your post, just an idle comment]

> compiling a single function's hand-mangled AST outside of the context of its enclosing file is difficult to do at all and impossible to do correctly in general

I don’t doubt that this is true, but it makes me wonder how other Python JIT setups (such as Numba) work when they use decorators to mark functions to compile…
**scober:**
I also had this question and I think there are two big answers:
- Some (many? all?) decorator-based Python JIT setups target a subset of Python. Numba, for example, will refuse to compile Python code containing `del` or `eval` (or many other operations).
- If you have the resources of a larger project, you can invest the time that I couldn't in figuring out the fiddly details. One somewhat popular approach that seems to resolve the closure/namespacing problems of out-of-context compilation (in some common cases) is to make a copy of the whole file, modify the AST of the function you decorate, recompile the whole file separately, then frankenstein the resulting compiled code back into the half-compiled original file (see the sketch below).
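A rough sketch of that trick (my reconstruction of the general idea, not code from Numba or any other project):

```python
import ast
import inspect

def rewrite_at_runtime(func):
    # Re-parse the *whole* enclosing file so names resolve in context.
    path = inspect.getsourcefile(func)
    with open(path) as f:
        tree = ast.parse(f.read(), filename=path)
    for node in ast.walk(tree):
        if isinstance(node, ast.FunctionDef) and node.name == func.__name__:
            node.decorator_list = []  # avoid re-triggering this decorator
            # ... the actual AST rewrite of `node` would happen here ...
    # Recompile the whole file separately, in a scratch namespace.
    # (Beware: this re-runs the module's top-level side effects.)
    scratch = {"__name__": "_rewritten_" + func.__module__}
    exec(compile(tree, path, "exec"), scratch)
    # Frankenstein the rewritten function back into the caller's world.
    return scratch[func.__name__]
```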
> Anyone who is curious about performing full quantifier elimination on Presburger Arithmetic formulas is welcome to take a look at [my source for the algorithm](https://amedvedev.ccny.cuny.edu/mathA4400s16/vandenDries.pdf). It is Theorem 4.4.2, and the pass I do in this project is a version of the one presented there.
> The primary difficulty I had in translating a description of the algorithm in a math text to a piece of running code that modified computer programs was the irregularity of Python ASTs. My final design ended up being a series of passes that applied more and more regularity to ASTs until I could perform a simple, final transformation. With that in mind, the rest of this section will be a high-level tour of the various regularizing passes the algorithm performs.
**sampsyo:**
[another unimportant remark] I wonder how possible this would be if you were to start with Python bytecode… it would be way harder to reconstruct the necessary hierarchy of closures and comprehensions, probably, but at least the low-level details would be regularized?
**scober:**
I wondered about this. Operating on Python bytecode would presumably give you AST regularity, but it turns out that a lot of the irregularity is algebraic. In practice I ended up spending way more time on algebraic regularization (e.g. `x < y` having a different structure from `not x >= y`) than I did on AST regularization (e.g. `p or q or r` having a different structure than `p or (q or r)`).
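To illustrate the algebraic kind, a toy regularization pass (mine, not the post's code) that rewrites negated comparisons into their un-negated forms:

```python
import ast

# Map each comparison operator to the operator of its negation.
NEGATED = {ast.GtE: ast.Lt, ast.Gt: ast.LtE, ast.LtE: ast.Gt, ast.Lt: ast.GtE}

class PushNotThroughCompare(ast.NodeTransformer):
    def visit_UnaryOp(self, node):
        self.generic_visit(node)
        if (isinstance(node.op, ast.Not)
                and isinstance(node.operand, ast.Compare)
                and len(node.operand.ops) == 1
                and type(node.operand.ops[0]) in NEGATED):
            cmp = node.operand
            cmp.ops = [NEGATED[type(cmp.ops[0])]()]  # e.g. `>=` becomes `<`
            return cmp  # drop the `not`, keep the flipped comparison
        return node

tree = ast.parse("not x >= y")
print(ast.unparse(PushNotThroughCompare().visit(tree)))  # x < y
```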
> The key insight is that of all the upper bound sub-predicates of the form `m*x < <expr>`, only one of them will matter. Only one of them will be the _least upper bound_. Similarly, only the _greatest lower bound_ will actually restrict what values of `x` return `True`. We don't know statically which upper bound is the least upper bound, because the upper bounds may have variables in them, but we can always check at runtime.
> In a pure predicate logic setting, these special bounds can be "calculated" by statically unrolling a big loop that checks if each (lower bound, upper bound) pair is correct. But in Python we can just call `max` and `min`!
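Schematically (my illustration; the post's fully eliminated example appears further down): if `x` is bounded below by `l1` and `l2` and above by `u1` and `u2`, the integer quantifier collapses to one comparison:

```python
# Before (conceptually): any(l1 < x and l2 < x and x < u1 and x < u2
#                            for x in <all integers>)
# After: an integer strictly between the greatest lower bound and the
# least upper bound exists exactly when they are at least 2 apart.
def exists_between(l1, l2, u1, u2):
    return max(l1, l2) + 1 < min(u1, u2)
```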
**sampsyo:**
Any chance it would be easy to show a Python example like the one above but using max/min to show the result of this final transformation? Is it like this?

`any(x < min(…) and x > max(…) for x in range(...))`

**scober:**
I added an example of a fully eliminated program further down, if that example isn't clear please let me know!
**sampsyo:**
Looks great!
> Compiler passes (even those for class projects) should not change the behavior of the code they modify. This pass does not meet that standard for two reasons:
> 1. As I mentioned above, this is a static transformation of untyped Python code. There is no guarantee that all of the variables in the program are integers or even that `+` does what I expect. So the best we can hope for is that it is correct _for integer inputs_.
> 2. If I found all of the bugs in my implementation with testing I think that might be the first recorded instance of that happening in all of human history.
**sampsyo:**
🤪
> That being said, I did some testing! Because this pass operates on such a simple sub-language of Python, I built a random program generator and used those programs as my test inputs. I generated around 60 programs and when one of them failed (my pass changed the program behavior) I would whittle it down to a minimal failing example and keep that as its own test.
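For flavor, a toy generator in that spirit (an assumption about the general shape, not the post's actual generator):

```python
import random

def random_atom(names):
    # An atom is a comparison between variables and small integer constants.
    terms = names + [str(random.randint(-5, 5))]
    lhs, rhs = random.sample(terms, 2)
    return f"({lhs} {random.choice(['<', '=='])} {rhs})"

def random_formula(names, depth=2):
    if depth == 0:
        return random_atom(names)
    op = random.choice([" and ", " or "])
    return "(" + op.join(random_formula(names, depth - 1) for _ in range(2)) + ")"

# A random single-quantifier test program over a small input range.
print(f"print(any({random_formula(['x', 'y'])} for x in range(-10, 10)))")
```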
**sampsyo:**
Really cool (and practical) approach to constructing tests. How many tests did you end up with after going through this process a few times?
**scober:**
I added a note about this, but it was roughly 60 randomly generated programs, of which 5-ish failed and were minimized, although at least one failing program triggered multiple bugs and had to be minimized more than once to different minimal failing examples.
> I chose to test the performance of my pass by testing big batches of function invocations, since the running time of individual functions is very short and I was worried that overhead caused by I/O, garbage collection, or other interpreter behaviors would cause too much noise.
> I tested the performance improvement of my pass by taking my randomly generated test programs and running them with significantly more inputs than I tested on (I found that for testing, an input range of $[-10, 10)$ was sufficient, but for performance testing I did $[-10000, 10000)$). I also made sure to generate my inputs in a different process than the one the tested function ran in, and to print out the result of each quantifier function to prevent the Python compiler from optimizing away any of the function's work. I collected my numbers using [hyperfine](https://github.com/sharkdp/hyperfine). The file numbers start at 11 because the first 10 randomly generated programs were generated with slightly more restrictive parameters; all numbers are the result of 15 runs and are reported as _mean (standard deviation)_ in milliseconds:
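For reference, a hyperfine invocation consistent with that description (script name hypothetical) would be:

```
hyperfine --runs 15 'python3 qe_perf_11.py'
```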
**sampsyo:**
While your approach to batching is one way to mitigate the effects of interpreter startup time, FWIW another would be to take replicated measurements inside of the interpreter, à la the standard-library timeit module. [nothing to change here]
**scober:**
Do you know if timeit avoids counting gc time?
**sampsyo:**
Huh, it appears that it does turn off GC. Fascinating! https://stackoverflow.com/q/4523817
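For anyone curious, a replicated in-interpreter measurement with `timeit` looks roughly like this (example mine):

```python
import timeit

# timeit turns off garbage collection during the timed run by default;
# pass setup="gc.enable()" to measure with the collector on.
secs = timeit.timeit("any(x + x == 6 for x in range(20_000))", number=1_000)
print(f"{secs:.3f} s for 1000 calls")
```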
> One final measure of a compiler pass is how applicable it is to code that people actually write. To measure that, I gathered a bunch of open source Python code. I didn't gather it in any principled way; I just grabbed a bunch from the [GitHub trending page](https://github.com/trending/python?since=daily) and from [this collection of Python projects](https://github.com/vinta/awesome-python).
> According to [cloc](https://github.com/AlDanial/cloc), I searched ~1.68 million lines of Python code and found 0 lines that I could apply my quantifier elimination pass to. Possibly that number could be improved by expanding the pass (by re-including the $P_m$s, for example), but I suspect that people just don't write many Presburger Arithmetic quantifier expressions in their Python code.
**sampsyo:**
Not exactly a shocking result, but too bad nonetheless! I had a fun time using a naive text search query to look for other possible candidates: https://github.com/search?q=%2Fany.*%3C.*for+i+in+range%2F+language%3APython&type=code
And yeah, slim pickings. Here are a few examples that are kinda close, maybe, although I didn't think about it very hard:
https://github.com/arin17bishwa/myCP_sols/blob/1e452b4d192b6df9f897602a1341b194aff15079/CF/1363A.py#L73
https://github.com/CalcoDev/aoc-2024/blob/a3ccbc85bf86b98424a76428605e59747a57611c/day2.py#L7
**scober:**
Text search is genius! My search was limited to stuff I could actually apply my pass to so it definitely missed out on some things that were possible but not yet supported. The second example you found could definitely be supported with a little (or maybe a lot of) work.
**sampsyo** left a comment:
Excellent! Again, nice work on this!
closes #505