Description
Motivation and Problem Statement
There has been a flurry of work on verifying numerical methods and, in particular, bounding round-off error. (See FPTaylor, interval analysis approaches, tools listed on FPBench, etc.) These verification tools are often implemented as under-documented research code; it is unclear how to properly validate them in practice, and they likely contain bugs. This project aspires to develop techniques to 1) test floating-point verification tools, and 2) find bugs in real-world verification software.
Approach
We can't simply compare the output of two different FP verifiers: if the bounds differ, it may be due to differences in the precision of the analyses rather than a soundness issue.
One might want to test FP verification tools by executing programs over their concrete domain (of floats). For example, for a given program P, a verification tool claims that

$$\forall \vec{x},\quad \left| \llbracket P(\vec{x}) \rrbracket_{real} - \llbracket P(\vec{x}) \rrbracket_{float} \right| \leq B$$

for some bound B that the tool computes, and one could try to violate that claim by running P on concrete float inputs.
However, observe that many (but not all) verification tools operate over an abstract domain using overapproximate simplifying assumptions. For example, the "standard model of floating-point error" assumes that each floating-point operation (op x y) can introduce a relative error of up to machine epsilon:

(op_float x y) = (op_real x y) * (1 + e)

for some |e| ≤ ε, where ε is the unit roundoff of the format (2^-53 for IEEE double precision).
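As a concrete illustration, here is a minimal sketch (in Python, not tied to any particular tool's API) of this abstract semantics: an expression tree is evaluated in exact rational arithmetic, and each operation's result is multiplied by (1 + e_i) for a caller-supplied error term. The tuple encoding, `eval_std_model`, and `EPS` are names invented for this sketch.

```python
from fractions import Fraction

EPS = Fraction(1, 2**53)  # unit roundoff for IEEE double precision

OPS = {'+': lambda a, b: a + b, '-': lambda a, b: a - b,
       '*': lambda a, b: a * b, '/': lambda a, b: a / b}

def eval_std_model(expr, env, errs):
    """Evaluate `expr` under the standard model: every operation's exact
    result is multiplied by (1 + e_i), with one error term per operation
    taken from `errs` in postorder.  All-zero error terms recover the
    exact real-number evaluation."""
    it = iter(errs)

    def go(node):
        tag = node[0]
        if tag == 'var':
            return env[node[1]]
        if tag == 'const':
            return Fraction(node[1])
        op, lhs, rhs = node
        return OPS[op](go(lhs), go(rhs)) * (1 + next(it))

    return go(expr)

# (a - b) / (c + d): three operations, hence three error terms (sub, add, div)
expr = ('/', ('-', ('var', 'a'), ('var', 'b')),
             ('+', ('var', 'c'), ('var', 'd')))
env = {v: Fraction(x) for v, x in zip('abcd', (3, 1, 1, 1))}
exact = eval_std_model(expr, env, [0, 0, 0])
perturbed = eval_std_model(expr, env, [EPS, -EPS, EPS])
print(abs(perturbed - exact))  # deviation permitted by the standard model
```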
The essential insight I wish to exploit is that we can find bugs with respect to the overapproximation rather than with respect to the actual IEEE spec. To construct a bug in such a verification tool, it suffices to furnish a counterexample: a program, an input vector, and an assignment of error terms (each with |e| ≤ ε) under which the resulting error exceeds the tool's claimed bound.
Note that this notion of correctness is with respect to the assumptions and overapproximations used by the floating point verification tool itself, NOT with respect to the IEEE floating point spec. In particular, it is possible for the (faulty) error bound to be accidentally correct with respect to the IEEE floating point spec (as in, no concrete set of inputs to the program will violate the error guarantee under the floating point semantics) but not sound (the error guarantee was constructed in a way that violates the soundness proof and overapproximations assumed by the verification tool).
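Concretely, reusing `eval_std_model` and `EPS` from the sketch above and assuming the tool's bound B is an absolute error bound, the counterexample check might look like this:

```python
def is_counterexample(expr, env, errs, bound):
    """Refute a claimed absolute error bound w.r.t. the standard-model
    overapproximation (NOT the IEEE spec): the error terms are admissible
    (|e_i| <= EPS), yet the deviation exceeds the bound."""
    assert all(abs(e) <= EPS for e in errs)
    exact = eval_std_model(expr, env, [0] * len(errs))
    perturbed = eval_std_model(expr, env, errs)
    return abs(perturbed - exact) > bound
```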
MVP
For a program P with a claimed error bound B, randomly sample input vectors (from the input ranges the tool was given) and error terms uniformly from [-ε, ε], evaluate P under the standard-model semantics, and report a bug whenever the observed error exceeds B.
Note: Some verification tools take advantage of special hardcoded facts about floats. For example, we know that multiplying by 2 and Sterbenz subtraction introduce no error. I would need to be careful to model all of the special hardcoded facts that the verification tool has access to (e.g., by pinning the corresponding error terms to zero).
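A minimal sketch of that MVP loop, again reusing the helpers above; the input ranges, trial count, and the set of operations treated as exact are all assumptions that would have to be matched to the tool under test:

```python
import random

def fuzz_bound(expr, num_ops, bound, input_box, exact_ops=(), trials=100_000):
    """Randomly sample inputs from the tool's input ranges and error terms
    uniformly from [-EPS, EPS]; report any assignment that violates the
    claimed bound.  Operations the tool knows are exact (e.g. multiplying
    by 2, Sterbenz subtraction) get their error term pinned to zero."""
    for _ in range(trials):
        env = {v: Fraction(random.uniform(lo, hi))
               for v, (lo, hi) in input_box.items()}
        errs = [Fraction(0) if i in exact_ops
                else Fraction(random.uniform(-1, 1)) * EPS
                for i in range(num_ops)]
        if is_counterexample(expr, env, errs, bound):
            return env, errs  # a bug w.r.t. the tool's own model
    return None

# e.g. fuzz_bound(expr, num_ops=3, bound=Fraction(1, 2**50),
#                 input_box={'a': (1, 2), 'b': (1, 2), 'c': (1, 2), 'd': (1, 2)})
```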
Going Further (with a backwards static analysis)
A simple way to improve on the MVP would be to do a backwards static analysis on the program that decides, for each operation, which direction its error term should be pushed to drive the overall error as high as possible, instead of sampling the error terms blindly.
As an example, consider the following program, whose error terms we wish to maximize:
(a - b) / (c + d)
To maximize the error of the above program, we wish to maximize the error terms of the numerator (a - b) while pushing the error terms of the denominator (c + d) in the opposite direction, since (when the result is positive) a larger numerator and a smaller denominator both increase the quotient. The backwards analysis propagates this desired direction from the root of the expression down to every operation.
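One possible realization is the following first-order heuristic, building on the helpers above (the function names and tuple encoding are this sketch's own): a backwards pass that assigns each operation the error-term sign that pushes the root value in the desired direction.

```python
def count_ops(node):
    return 0 if node[0] in ('var', 'const') else \
        1 + count_ops(node[1]) + count_ops(node[2])

def exact_value(node, env):
    return eval_std_model(node, env, [0] * count_ops(node))

def sign(x):
    return 1 if x >= 0 else -1

def choose_error_signs(expr, env, want=1):
    """Backwards analysis: return one sign per operation (same postorder as
    eval_std_model) so that each operation's error term pushes the root
    value in direction `want` (+1 = up).  The desired direction flips
    through the right operand of '-', and through '*' and '/' according to
    the sign of the other operand (first-order reasoning only)."""
    if expr[0] in ('var', 'const'):
        return []
    op, lhs, rhs = expr
    lv, rv = exact_value(lhs, env), exact_value(rhs, env)
    if op == '+':
        want_l, want_r = want, want
    elif op == '-':
        want_l, want_r = want, -want
    elif op == '*':
        want_l, want_r = want * sign(rv), want * sign(lv)
    else:  # '/': the result rises with the numerator, falls with the denominator
        want_l, want_r = want * sign(rv), -want * sign(lv)
    # this operation's own perturbation is (exact result) * e, so pick sign(e) accordingly
    own = want * sign(OPS[op](lv, rv))
    return (choose_error_signs(lhs, env, want_l)
            + choose_error_signs(rhs, env, want_r) + [own])

# Directed error terms for (a - b) / (c + d):
# errs = [s * EPS for s in choose_error_signs(expr, env)]
```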
Going Even Further (with autodiff and smarter fuzzing)
Note that we are still randomly (perhaps uniformly) sampling for good inputs. A smarter approach would be to treat the error as a function of the inputs, compute its gradient with automatic differentiation, and use gradient ascent (or gradient-guided fuzzing) to search for inputs that maximize the error.
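A crude stand-in for that idea (hypothetical helper names, with a greedy finite-difference search in place of a real autodiff framework) might hill-climb each input inside its allowed range:

```python
def error_at(expr, env, errs):
    exact = eval_std_model(expr, env, [0] * len(errs))
    return abs(eval_std_model(expr, env, errs) - exact)

def hill_climb_inputs(expr, env, errs, input_box, steps=200):
    """Greedy coordinate search: nudge each input up or down by a small step
    whenever that increases the standard-model error.  A real implementation
    would differentiate the error expression w.r.t. the inputs instead."""
    env = dict(env)
    for _ in range(steps):
        for v, (lo, hi) in input_box.items():
            step = (Fraction(hi) - Fraction(lo)) / 1000  # assumed step size
            for cand in (env[v] + step, env[v] - step):
                trial = dict(env, **{v: cand})
                if lo <= cand <= hi and error_at(expr, trial, errs) > error_at(expr, env, errs):
                    env = trial
    return env
```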
Going Much Further (somewhat speculative)
Wouldn't it be cool if we could prove that the error-term assignment suggested by the backwards static analysis was indeed worst-case for the program (up to our overapproximations)? Then we might be able to combine this information with other verification techniques to provide new kinds of floating point error bounds.
Proposed Evaluation
I plan to run this technique on all FPBench benchmarks and on as many verification tools as possible that accept FPCore programs as input. Success will be measured by the number of bugs found.
Team members
Just me; open to others joining.