Project Proposal: Tractable Validation of Floating-Point Verification Tools #510

@InnovativeInventor

Description

Motivation and Problem Statement

There has been a flurry of work on verifying numerical methods and, in particular, bounding round-off error. (See FPTaylor, interval analysis approaches, tools listed on FPBench, etc.) These verification tools are often implemented as underdocumented research code, and it is unclear how to properly validate them in practice. They also likely contain bugs. This project aspires to develop techniques to 1) test floating-point verification tools, and 2) find bugs in real-world verification software.

Approach

We can't simply compare the output of two different FP verifiers. If the bounds differ, it may be due to differences in precision, rather than soundness issues.

One might want to test FP verification tools by executing programs over their concrete domain (of floats). For example, for a given program $P$, a verification tool might claim that:

\forall \vec{x},\ \llbracket P(\vec{x}) \rrbracket_{\text{real}} - \llbracket P(\vec{x}) \rrbracket_{\text{float}} \leq B

for some bound $B$. A naive approach would be to search for a concrete vector $\vec{x}$ that violates the above bound. This might find bugs.
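The naive search can be sketched in a few lines. Everything here is made up for illustration: $P$ is an arbitrary toy program, and the bound `B` is a placeholder, not a bound produced by any real tool. Floats lift exactly to rationals via `Fraction`, which gives us the "real" semantics for free.

```python
import random
from fractions import Fraction

def P_float(a: float, b: float) -> float:
    return (a - b) * (a + b)           # evaluated in IEEE binary64

def P_real(a: float, b: float) -> Fraction:
    ra, rb = Fraction(a), Fraction(b)  # floats convert exactly to rationals
    return (ra - rb) * (ra + rb)       # evaluated with no round-off

B = Fraction(1, 10**20)  # hypothetical bound "claimed" by a verifier

def find_violation(trials: int = 10_000, seed: int = 0):
    """Sample concrete inputs and return the first pair whose observed
    float-vs-real error exceeds the claimed bound B, or None."""
    rng = random.Random(seed)
    for _ in range(trials):
        a, b = rng.uniform(-1e3, 1e3), rng.uniform(-1e3, 1e3)
        if abs(Fraction(P_float(a, b)) - P_real(a, b)) > B:
            return (a, b)
    return None
```

With a placeholder bound this tight the search succeeds almost immediately; against a sound tool's actual bound it would usually come up empty, which is exactly the limitation discussed next.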

However, observe that many (but not all) verification tools operate over an abstract domain using overapproximate simplifying assumptions. For example, the "standard model of floating-point error" assumes that each floating-point operation (op x y) can introduce a relative error of at most the unit roundoff $u$. To be precise, for every operation, the following overapproximation holds:

    (op_float x y) = (op_real x y) * (1 + e)

for some $e$ with $|e| \leq u$.
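The standard model is easy to check empirically for a single operation. The sketch below assumes IEEE binary64 with round-to-nearest, for which the unit roundoff is $u = 2^{-53}$; `Fraction` lets us compute the incurred relative error exactly.

```python
from fractions import Fraction

U = Fraction(1, 2**53)  # unit roundoff, binary64 round-to-nearest

def standard_model_holds(x: float, y: float) -> bool:
    """Check that fl(x + y) = (x + y) * (1 + e) for some |e| <= u."""
    exact = Fraction(x) + Fraction(y)  # the real-valued sum, computed exactly
    rounded = Fraction(x + y)          # the float sum, lifted to a rational
    if exact == 0:
        return rounded == 0
    e = (rounded - exact) / exact      # the relative error actually incurred
    return abs(e) <= U
```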

The essential insight I wish to exploit is that we can find bugs with respect to the overapproximation rather than with respect to the actual IEEE spec. To demonstrate a bug in such a verification tool, it suffices to furnish a counterexample: a program, an input vector $\vec{x}$, and a trace of epsilons added at every single floating-point operation that together violate the error guarantees provided by the verification tool.

Note that this notion of correctness is with respect to the assumptions and overapproximations used by the floating-point verification tool itself, NOT with respect to the IEEE floating-point spec. In particular, it is possible for a (faulty) error bound to be accidentally correct with respect to the IEEE spec (no concrete set of inputs to the program violates the error guarantee under actual floating-point semantics) yet still unsound (the error guarantee was constructed in a way that violates the soundness proof and overapproximations assumed by the verification tool).

MVP

For a program $P$, an MVP would be to randomly sample valid input vectors $\vec{x}$. Then, for each valid input, interpret $P$ with input $\vec{x}$ under an infinite-precision semantics (to mimic the reals) and under an infinite-precision semantics that adds up to $u$ relative error at every single operation. Check whether the observed error violates the error guarantees given by the verification tool.

Note: Some verification tools take advantage of special hardcoded facts about floats. For example, we know that multiplying by 2 and Sterbenz subtraction do not introduce error. I would need to be careful to mirror all the special hardcoded facts that the verification tool has access to.
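Both of these facts are easy to confirm empirically with exact rationals. A sketch, assuming binary64 with no overflow or underflow (the conditions below are the standard statements of the two facts):

```python
from fractions import Fraction

def times_two_exact(x: float) -> bool:
    """Multiplying by 2 only shifts the exponent, so it is exact
    (assuming no overflow)."""
    return Fraction(2.0 * x) == 2 * Fraction(x)

def sterbenz_exact(a: float, b: float) -> bool:
    """Sterbenz's lemma: if b/2 <= a <= 2b (same sign), a - b is exact."""
    assert b / 2 <= a <= 2 * b  # precondition of the lemma
    return Fraction(a - b) == Fraction(a) - Fraction(b)
```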

Going Further (with a backwards static analysis)

A simple way to improve on the MVP would be to do a backwards static analysis on the program $P$ to find the appropriate directions to point the epsilons at every single floating-point operation.

As an example, consider the following program, whose error terms we wish to maximize:

    (a - b) / (c + d)

To maximize the error of the above program, we wish to maximize the error terms of $(a - b)$ and minimize the error terms of $(c + d)$. To maximize the error terms of $(a - b)$, we need to maximize the error terms in $a$ and minimize the error terms in $b$. To minimize the error terms of $(c + d)$, we need to minimize the error terms in $c$ and $d$. We can continue this analysis until we know exactly how to point our error terms at every single floating-point operation such that the error term in the final output of the program is maximized.
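This backwards pass can be sketched as a recursive walk over a toy expression language, emitting one sign per operation. A simplification to flag loudly: the sign rules below assume all subterm values are positive, as in the worked example; a real analysis would also have to track value signs.

```python
def assign_eps_signs(expr, want=+1, out=None):
    """Walk the expression top-down, tracking whether each subterm's error
    should be maximized (+1) or minimized (-1); record one sign per op."""
    if out is None:
        out = []
    if isinstance(expr, str):  # variable: no operation, so no epsilon
        return out
    op, lhs, rhs = expr
    out.append(want)  # point this operation's epsilon the desired way
    if op in ("+", "*"):
        child = (want, want)   # both children push in the same direction
    elif op == "-":
        child = (want, -want)  # subtrahend error pulls the other way
    else:  # "/": a larger denominator shrinks the quotient
        child = (want, -want)
    assign_eps_signs(lhs, child[0], out)
    assign_eps_signs(rhs, child[1], out)
    return out

P = ("/", ("-", "a", "b"), ("+", "c", "d"))
signs = assign_eps_signs(P)  # one sign per operation, in pre-order: /, -, +
```

For this program the walk yields `[+1, +1, -1]`: maximize the division and subtraction errors, minimize the addition error, matching the hand analysis above.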

Going Even Further (with autodiff and smarter fuzzing)

Note that we are still randomly (perhaps uniformly) sampling for good $\vec{x}$. This is a search problem. Also observe that most benchmark floating-point programs $P$ are differentiable. To find good $\vec{x}$, we can hook $P$ up to a fuzzer (that tries to increase the observed error) and run an automatic differentiation tool on $P$ to greedily / locally optimize $\vec{x}$. Explore and exploit.
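A crude stand-in for the gradient step, as a sketch: finite differences replace a real autodiff tool, and the program, starting point, and step size are all made up. The search keeps the best input vector seen, so it can only improve on its starting point.

```python
from fractions import Fraction

def observed_error(x):
    """Float-vs-real error of P(a, b, c, d) = (a - b) / (c + d)."""
    a, b, c, d = x
    flt = (a - b) / (c + d)  # IEEE binary64 semantics
    real = (Fraction(a) - Fraction(b)) / (Fraction(c) + Fraction(d))
    return abs(Fraction(flt) - real)

def hill_climb(x, steps=50, h=1e-9, lr=1e-3):
    """Greedy local search: step along a finite-difference gradient of the
    error, remembering the best input vector seen so far."""
    best, best_err = list(x), observed_error(x)
    for _ in range(steps):
        grad = [float((observed_error(x[:i] + [x[i] + h] + x[i+1:])
                       - observed_error(x)) / Fraction(h))
                for i in range(len(x))]
        x = [xi + lr * g for xi, g in zip(x, grad)]
        e = observed_error(x)
        if e > best_err:
            best, best_err = list(x), e
    return best

x0 = [1.0, 0.999, 2.0, 5.0]
x_star = hill_climb(x0)  # never worse than x0, by construction
```

In the real project this local optimizer would be one arm of an explore/exploit loop, with the fuzzer providing the exploration.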

Going Much Further (somewhat speculative)

Wouldn't it be cool if we could prove that the backwards static analysis suggested was indeed worst-case for the program (up to our overapproximations)? Then we might be able to combine this information with other verification techniques to provide new kinds of floating point error bounds.

Proposed Evaluation

I plan to run this technique on all FPBench benchmarks and on as many verification tools as possible that take FPCore programs as input. Success will be measured by the number of bugs found.

Team members

Just me; open to others joining.
