Dissertation
(Computer Science)
2011
UMI Number: 3471671
Copyright 2011 by ProQuest LLC.
All rights reserved. This edition of the work is protected against
unauthorized copying under Title 17, United States Code.
Abstract
The first refutationally complete inference systems for first-order logic, called
instance-based systems, were based on Herbrand’s theorem, which implies that first-order
logic satisfiability can be reduced to propositional logic satisfiability (SAT). Out of this
line of research came the landmark SAT solving DPLL algorithm. Soon after DPLL made its
debut, Robinson introduced the simple combinatorial resolution rule, which drew interest
away from instance-based systems. Recently, with the increase in computational power of the
personal computer, there has been renewed interest in systems for first-order logic theorem
proving that utilize SAT solvers. Here, we present three novel solutions for the first-order
logic validity problem that utilize SAT.
As our first solution, we reduce first-order logic validity to SAT by encoding a proof
of first-order logic unsatisfiability in propositional logic and using a SAT solver to
determine whether the encoding is satisfiable. Specifically, we encode a closed connection
tableau proof of unsatisfiability. A satisfiable propositional encoding implies validity of
the first-order problem. We provide an encoding using SAT, provide soundness and
completeness proofs, and discuss our implementation, called CHEWTPTP-SAT, along with
results. We also give an encoding in SMT, provide soundness and completeness proofs, discuss
our implementation, CHEWTPTP-SMT, and discuss when the encoding in SMT may be better than an
encoding in SAT.
plete system. We allow a set of clauses, S, to be distributed among two sets P and R in any
combination so long as S = P ∪ R. SInst-Gen is run on P with conclusions added to P,
resolution is run on R with conclusions added to R, and resolution is run on P × R with
along with experimental results. We also discuss a newer implementation, called EVC3,
which combines the SMT solver named CVC3 with the equational theorem prover, E.
Our third solution establishes a framework, called Γ + Λ, which allows a wide range
of first-order logic calculi to be combined into a single sound and refutationally complete
system. This framework can be used to combine instance generation methods that use SAT
with other inference systems. In order to combine two systems, Γ and Λ, into a single sys-
tem, we require Γ to be productive and require Λ to have both the lifting and total-saturation
properties. This framework allows a set of clauses to be distributed among two sets, P and
R, like SIG-Res, so that Γ can be used on P and Λ can be used on R. A limited amount
of information is passed between the systems to establish completeness of the combined
system. We give the inference rules for Γ + Λ, establish soundness and completeness, and
show how Inst-Gen-Eq and superposition can be combined in this framework.
Acknowledgments
During these last six years, while working toward the degree of Doctor of Philosophy,
I have been blessed to have the support of many truly great family members, colleagues
and friends. I thank you all! You have taught me perseverance, modesty and compas-
sion. You have provided encouragement, knowledge, respect and love. You have given me
opportunities that few in this world have been given. For all this, I am indebted to you.
Ahmad Almomani
Susan Blanchard
Elmer Deshane
Howard Deshane
Deena & Brian Donnelly
Michael Felland
Claudette Foisy
Dr. Kathleen Fowler
Michael Fowler
Drs. Illona & Donald Ferguson
Dr. Daqing Hou
Dr. Christopher Lynch
Dr. James Lynch
Dr. Alexis Maciel
Dr. Jeanna Matthews
Judge Earl McBride
Pamula & Ralph McGregor
Gary McGregor
Sara Morrison
The Brothers of Phi Kappa Sigma
Dr. Joseph Skufca
Janice Searleman
Cindy Smith
Dr. Christino Tamon
Elizabeth Thomas
Dean Peter Turner
Thank you!
Contents
Abstract
Acknowledgments
1 Introduction
1.1 Techniques for Automated Theorem Proving
1.1.1 Instance Generation Based Systems
1.1.2 Resolution Based Systems
2 First-Order Logic
2.1 First-Order Formula
2.2 Normal Forms
2.3 Substitutions
3 Encoding First-Order Proofs in SAT and SMT
3.1 Preliminaries
3.1.1 Propositional Logic
3.4 Conclusion
4.1 Preliminaries
4.1.1 Jeroslow Constant
4.1.2 Term Orderings
4.1.3 Interpretations
4.1.4 Closures
4.2 Semantic Selection Instance Generation and Ordered Resolution
4.3 SIG-Res
4.4 Completeness
4.5 Spectrum
4.6 Conclusion
5 The Γ + Λ Framework
5.1 Preliminaries
5.2 Transforming Inference Rules to Support Hypothetical Clauses
5.3 Γ + Λ Inference Rules
6 Conclusion
List of Figures
5.1 λ
5.2 λ
5.3 Learn and Delete Inference Rules
5.4 SIG-Sup Inference Rules
5.5 EVC3 System Diagram
List of Tables
Chapter 1
Introduction
Reasoning is the act of learning from the use of models and information [83]. We reason
consciously and subconsciously. We use reason to form intuition, to form habit and we
reason to determine causality, a necessity for survival. We also reason to understand rela-
tionships among entities we perceive and conceive to better understand the universe around
us.
Since antiquity we have also sought to understand the nature of valid reasoning itself
in the study of logic. Since Aristotle [1], attempts have been made to model reasoning in
an effort to mechanize reasoning with the goal of eliminating the errors found in human
reasoning. More recently, with the advent of the computer many have sought to automate
certain forms of reasoning.
Many different logics have been developed over time including propositional, temporal,
modal, Hoare, first-order and higher-order logics. In 1930, Gödel [19] proved that
there exist complete and sound inference systems for first-order logic, and in 1936 and 1937,
Church and Turing [22, 24], respectively, proved independently that first-order validity is
undecidable. Together these results establish that first-order validity is semi-decidable
but not decidable. So the best that we can do is construct a sound and complete inference
system for first-order logic that, when given a valid formula, will eventually halt and
output an indication of validity. However, if an invalid formula is given, the system may
never halt.
One subset of the automated reasoning (AR) community focuses on automated theorem
proving (ATP), that is, the study of systems that semi-decide the first-order validity
problem.
Since φ is valid iff ¬φ is unsatisfiable, many ATP systems alternatively seek to prove
unsatisfiability rather than validity. A formula is unsatisfiable if no model exists which
makes the formula true. If a system, when given an unsatisfiable problem, will eventually
halt and output unsatisfiable, we call the system refutationally complete.
Today’s refutationally complete ATP systems are based on a wide array of techniques.
Some seek to provide a proof of unsatisfiability (e.g. resolution and tableaux proof en-
coding) while others attempt to prove unsatisfiability by showing no model exists (e.g.
instance generation). These ATP systems can be used as general theorem provers or as
back end provers in larger software systems such as software verification tools.
Below we discuss some common techniques for first-order theorem proving, then dis-
cuss our contributions in ATP and our current work and conclude with an outline of this
dissertation.
1.1 Techniques for Automated Theorem Proving
Most modern automated theorem proving systems are based on superposition, semantic
selection instance generation, or tableaux methods. In this section we describe each of
these methods and provide a brief history of the development of each.
1.1.1 Instance Generation Based Systems
Instance generation techniques seek to find a set of ground (variable-free) instances of
the input problem that is unsatisfiable when viewed as a propositional logic formula, as
determined by a SAT solver. If the SAT solver returns unsatisfiable, the input problem is
unsatisfiable. If the SAT solver returns satisfiable and new instances can be generated
(according to the calculus), they are added and the process is repeated; otherwise the input
problem is found to be satisfiable.
One of the first implemented general first-order logic theorem provers was created in
1960 by Davis and Putnam [29]. Their procedure was based on Herbrand’s Theorem [21]
which shows that first-order logic unsatisfiability can be reduced to propositional logic
unsatisfiability.
We say the Herbrand Universe of a formula φ is the set of all ground terms that can
be constructed using the function symbols and constants in φ (if φ contains no constant
we include a distinguished constant). For example, if P is a predicate symbol, f is a
function symbol, a is a constant and x is a variable, the Herbrand Universe for P(f(x), a)
is {a, f(a), f(f(a)), f(f(f(a))), ...}. Herbrand’s Theorem can be stated as follows: a
first-order formula φ is unsatisfiable if and only if there exists a finite set of ground
instances of φ (obtained by replacing variables with elements of the Herbrand Universe)
that is unsatisfiable. Since all of the instances in the set being checked for satisfiability
are ground, the formulas in this set can be viewed as propositional formulas whose
satisfiability can be checked by a
Then we seek to show that ¬φ is unsatisfiable. In this trivial problem, to determine the
unsatisfiability of ¬φ using Herbrand’s Theorem we simply need to create a single ground
instance of ¬φ, namely, ¬φ′ [Formula 1.3].
(¬P ∨ Q) ∧ P ∧ ¬Q (1.4)
We derive ¬φ′ from ¬φ by replacing all occurrences of the variable x by the constant
a. Since ¬φ′ is unsatisfiable when viewed as a propositional formula [Formula 1.4], by
Herbrand’s Theorem ¬φ is unsatisfiable as a first-order formula, thus φ is valid.
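The two ingredients used in this example, enumerating the Herbrand Universe and checking a ground formula propositionally, can be illustrated with a minimal Python sketch. The function names, the restriction to unary function symbols, and the brute-force truth table are assumptions made for illustration only; this is not the procedure of Davis and Putnam nor of any solver discussed in this dissertation.

```python
from itertools import product

def herbrand_levels(constants, unary_functions, depth):
    """Ground terms built from the constants and unary function symbols,
    using at most `depth` nested function applications (illustrative helper)."""
    terms = list(constants)
    frontier = list(constants)
    for _ in range(depth):
        # Apply every function symbol to every term of the previous level.
        frontier = [f"{f}({t})" for f in unary_functions for t in frontier]
        terms.extend(frontier)
    return terms

def is_unsatisfiable(formula, atoms):
    """Brute-force truth table: True if no assignment to `atoms` satisfies `formula`."""
    return not any(
        formula(**dict(zip(atoms, values)))
        for values in product([False, True], repeat=len(atoms))
    )

# Formula 1.4: (¬P ∨ Q) ∧ P ∧ ¬Q, with P and Q treated as propositional atoms.
formula_1_4 = lambda P, Q: ((not P) or Q) and P and (not Q)

print(herbrand_levels(["a"], ["f"], 3))           # ['a', 'f(a)', 'f(f(a))', 'f(f(f(a)))']
print(is_unsatisfiable(formula_1_4, ["P", "Q"]))  # True
```

The truth table is exponential in the number of atoms, which is why practical systems hand the ground instances to a SAT solver instead.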
Davis and Putnam’s procedure naively adds all possible ground instances, incrementally,
to the set being checked for propositional satisfiability until unsatisfiability is
detected. Although considering all ground instances was later found to be unnecessary,
their use of conjunctive normal form (CNF), their procedure for satisfiability testing at
the propositional level, and the improved procedure by Davis, Logemann and Loveland [34],
now called the DPLL procedure, were revolutionary and are the basis for the most efficient
satisfiability (SAT) solvers today. Today’s SAT solvers based on the DPLL procedure can
efficiently handle tens of thousands of variables and clauses [108].
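To make the propositional core concrete, the following is a minimal sketch of a DPLL-style search over CNF, written in Python. It assumes clauses are lists of nonzero integers (a negative integer denotes a negated variable), and it implements only unit propagation and splitting. It omits the pure-literal rule and every optimization found in modern solvers, so it should be read as an illustration of the idea rather than as the procedure of [29, 34] or the algorithm inside any particular SAT solver.

```python
def dpll(clauses, assignment=None):
    """Return a satisfying (possibly partial) assignment {var: bool}, or None.

    `clauses` is a list of clauses; each clause is a list of nonzero ints,
    where -n stands for the negation of variable n (assumed encoding).
    """
    assignment = dict(assignment or {})
    clauses = [list(c) for c in clauses]
    while True:  # unit propagation
        simplified, unit = [], None
        for clause in clauses:
            unassigned, satisfied = [], False
            for lit in clause:
                value = assignment.get(abs(lit))
                if value is None:
                    unassigned.append(lit)
                elif value == (lit > 0):
                    satisfied = True
                    break
            if satisfied:
                continue
            if not unassigned:
                return None  # every literal is false: conflict
            if len(unassigned) == 1 and unit is None:
                unit = unassigned[0]
            simplified.append(unassigned)
        clauses = simplified
        if unit is None:
            break
        assignment[abs(unit)] = unit > 0
    if not clauses:
        return assignment  # all clauses satisfied
    var = abs(clauses[0][0])  # split on an unassigned variable
    for value in (True, False):
        result = dpll(clauses, {**assignment, var: value})
        if result is not None:
            return result
    return None

# Formula 1.4 in CNF with P = 1 and Q = 2: (¬P ∨ Q) ∧ P ∧ ¬Q
print(dpll([[-1, 2], [1], [-2]]))  # None, i.e. unsatisfiable
```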
Since Davis and Putnam’s work, there have been a great many advances in theorem
provers based on Herbrand’s Theorem, which are called instance-based theorem provers.
One line of research has been in saturation-based instance generation methods [106].
These methods, like Davis and Putnam’s method, repeatedly call upon a SAT solver to
determine the satisfiability of the current set of ground instances. If unsatisfiable the solver
halts and indicates such, otherwise the set may be augmented with additional instances and
again checked for satisfiability.
Notable in this line of research is the work by Prawitz [30] who showed that not all
elements in the Herbrand Universe need be used. Elements in the Herbrand Universe
were chosen using a matching scheme among complementary literals [53]. This technique
evolved into what is now known as unification. In [35], Davis combined the matching
technique of Prawitz with the techniques in his previous paper.
A lull transpired in this line of research, while the focus was on resolution-based
systems, until the late eighties when computational power began to increase significantly
and methods based on DPLL and SAT again attracted interest. In 1988 Jeroslow [63] published
a new saturation-based instance generation method called Partial Instantiation (PI). In this
work he proved that not all variables in the non-ground instances need be replaced by
terms in the Herbrand Universe. Substitutions can be identified using the matching tech-
nique and those variables that are not involved in the unification can simply be mapped to
a distinguished constant, the Jeroslow constant. Hooker, Rago, Chandru and Shrivastava
are noteworthy for developing the first complete PI method for the full first-order predicate
calculus called Primal PI [99] which is now known as instance generation with seman-
tic selection. Ganzinger and Korovin, among many other contributions, formalized and
proved the completeness of the instance generation inference rule (Inst-Gen) and instance
generation with semantic selection and hyper inferences (SInst-Gen) in [106].
Saturation-based instance generation methods perform especially well on problems that
are close to propositional logic, where the efficiency of the SAT solver is exploited. One
such class of first-order logic problems is Effectively Propositional Logic (EPR), also
known as the Bernays-Schönfinkel class [18]. Formulas in this class have the form ∃*∀*φ
where φ is quantifier-free and function-free. As seen in the results from the annual
Conference on Automated Deduction (CADE) ATP System Competition (CASC) [131], the
saturation-based instance generation theorem prover iProver [142] has won the EPR category
since 2008.
It should be noted that the success of iProver, and thus a defense of instance generation
solvers as a viable general first-order theorem proving technique, has not only been in the
EPR class, but iProver has also ranked in the top 5 in the FOF (first-order formula) and
CNF (conjunctive normal form) divisions, the classes of general first-order formulas, in
the 3 most recent CASC competitions. This is evidence that techniques in saturation based
1.1.2 Resolution Based Systems
Another well known line of research in general first-order logic theorem proving began
with Robinson’s landmark paper [36] which describes his resolution principle and unifica-
tion algorithm which computes the most general unifier (mgu) between two atoms. Here
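\[
\frac{L \lor \Gamma \qquad \lnot K \lor \Delta}{(\Gamma \lor \Delta)\sigma} \quad \text{(Resolution)}
\]
where σ = mgu(L, K)
\[
\frac{L \lor K \lor \Gamma}{(L \lor \Gamma)\sigma} \quad \text{(Factoring)}
\]
where σ = mgu(L, K)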
A key refinement of resolution called ordered resolution (formerly A-ordered resolu-
tion) was made in [42, 92] which restricts the search space significantly. With ordered res-
olution, resolution inferences are only made when complementary maximal literals, based
on an ordering of terms, from different clauses are unifiable. Ordered resolution can be
efficient in practice, because it tends to produce literals in the conclusion of an inference
that are smaller than in the premises. This is not always the case however, because the most
general unifier may prevent that, but it often happens in practice. As a simple example,
consider a set of clauses consisting of one non-ground clause C = ¬P(x) ∨ P(f(x))
and any number of ground clauses. Any ordered resolution inference between two ground
clauses will produce another ground clause which does not introduce any new literals. Any
ordered resolution inference between a ground clause and C will produce a new ground
clause in which an occurrence of the symbol f has disappeared, so the process clearly
halts. This reduction property does not hold for instance generation methods. If this set
of clauses is fed to an instantiation-based prover, it may run forever, depending on the
model created by the SAT solver. In our experiments, this does run forever in practice.
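As a worked instance of this reduction, assume a term ordering under which P(f(x)) is the maximal literal of C, and take the ground unit clause ¬P(f(f(a))); both the constant a and this particular ground clause are chosen here only for illustration. Applying the resolution rule with L = P(f(x)), Γ = ¬P(x), K = P(f(f(a))) and empty Δ gives

\[
\frac{\lnot P(x) \lor P(f(x)) \qquad \lnot P(f(f(a)))}{\lnot P(f(a))}
\qquad \text{where } \sigma = \mathrm{mgu}(P(f(x)),\, P(f(f(a)))) = \{x \mapsto f(a)\}
\]

The conclusion (Γ ∨ Δ)σ = ¬P(f(a)) contains one fewer occurrence of f than the ground premise. One further inference with C yields ¬P(a), which cannot be unified with P(f(x)), so no more inferences with C are possible from this clause.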
→ x ≈ x (reflexivity)
x ≈ y → y ≈ x (symmetry)
x ≈ y ∧ y ≈ z → x ≈ z (transitivity)
x1 ≈ y1 ∧ ... ∧ xn ≈ yn → f(x1, ..., xn) ≈ f(y1, ..., yn) (monotonicity I)
x1 ≈ y1 ∧ ... ∧ xn ≈ yn ∧ P(x1, ..., xn) → P(y1, ..., yn) (monotonicity II)
Figure 1.2: Equality Congruence Axioms
Many other refinements have been made to Robinson’s resolution inference system,
e.g. semantic resolution [38] and set of support resolution [48]. As early as 1960 [32],
researchers have also constructed ways to handle equality directly in the calculus. A naive
approach to handling equality would be to include the congruence axioms 1 (Figure 1.2)
for the equality predicate in the problem and use resolution and factoring inferences.
1
One monotonicity I axiom is added for each non-constant n-ary function symbol in the language and one
monotonicity II axiom is added for each predicate symbol in the language
This however causes an explosion in the search space.
To avoid this, in 1969, Wos and Robinson in [41] introduced the inference rule called
paramodulation (Figure 1.3) that has been a cornerstone of modern resolution based theo-
rem provers that handle equality. In their work they proved that the paramodulation infer-
ence rule along with resolution, factoring and additional function reflexivity axioms 2 form
a refutationally complete system.
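\[
\frac{L[u]_p \lor \Gamma \qquad l \approx r \lor \Delta}{(L[r]_p \lor \Gamma \lor \Delta)\sigma} \quad \text{(Paramodulation)}^{3}
\]
where σ = mgu(l, u)
\[
\frac{\dfrac{f(x) = x \qquad P(f(a))}{P(a)}\; x \to a \qquad \lnot P(a)}{\bot}
\]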
Brand showed in [47], as did Peterson later in [54], that paramodulation, resolution and
factoring alone form a refutationally complete system and that paramodulation into
variables, which is allowable in Wos and Robinson’s work, is not necessary.
In 1970, working independently from Wos and Robinson, Knuth and Bendix published
the first ordered paramodulation calculus for systems of equations (the word problem)
called Knuth-Bendix completion [44]. Later, in 1987, Rusinowitch and Hsiang [60] (and
Peterson, in 1983, combined the works of Wos and Robinson with that of Knuth and
Bendix to produce what is known as ordered paramodulation (Figure 1.4), a refutationally
complete system for first-order logic that uses term orderings to restrict paramodulation
inferences [54].
\[
\frac{L[u]_p \lor \Gamma \qquad l \approx r \lor \Delta}{(L[r]_p \lor \Gamma \lor \Delta)\sigma} \quad \text{(Ordered Paramodulation)}
\]
where
1. σ = mgu(u, l)
2. u is not a variable
3. rσ ⋡ lσ
In 1991, Pais and Peterson [67] introduced maximal paramodulation (Figure 1.5)
which utilizes term orderings as well as orderings on literals [91].
\[
\frac{L[u]_p \lor \Gamma \qquad l \approx r \lor \Delta}{(L[r]_p \lor \Gamma \lor \Delta)\sigma} \quad \text{(Maximal Paramodulation)}
\]
where
1. σ = mgu(u, l)
2. u is not a variable
3. rσ ⋡ lσ
4. L[u]_p σ is maximal w.r.t. ≻ in (L[u]_p ∨ Γ)σ
5. (l ≈ r)σ is maximal w.r.t. ≻ in (l ≈ r ∨ Δ)σ
The next refinement, called superposition, considers all literals as equations. Here the
selected literal from the left premise is of the form s ≈ t or s ≉ t. We include the case
for the former in Figure 1.6. In 1988, Zhang and Kapur were the first to propose this rule
[62]. Bachmair and Ganzinger showed in [64] that Zhang and Kapur’s system was incomplete in
the presence of tautology elimination, and provided a complete superposition calculus which
includes the additional equality factoring inference rule.
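\[
\frac{s[u]_p \approx t \lor \Gamma \qquad l \approx r \lor \Delta}{(s[r]_p \approx t \lor \Gamma \lor \Delta)\sigma} \quad \text{(Superposition)}
\]
where
1. σ = mgu(u, l)
2. u is not a variable
3. tσ ⋡ s[u]σ
4. rσ ⋡ lσ
5. (s[u]_p ≈ t)σ is maximal w.r.t. ≻ in (s[u]_p ≈ t ∨ Γ)σ
6. (l ≈ r)σ is maximal w.r.t. ≻ in (l ≈ r ∨ Δ)σ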
more efficient in practice, there are some classes of problems that are suited better for
instantiation-based methods. As stated above, instantiation-based methods work especially
well on problems that are close to propositional problems, because then the key technique
is the DPLL procedure in the SAT solver. On the EPR class of first-order logic problems
A tableau is a finitely branching tree with nodes labeled by formulas. Tableaux have been
studied as far back as 1955 in works by Beth [25] and Hintikka [26] as a way to represent
refutation proofs in first-order predicate calculus theorem proving. For example a proof for
[Figure: tableau; recoverable node labels: p(a), ¬p(x), q(x), q(a)]
In standard refutational theorem proving we attempt to prove the unsatisfiability of a
set of clauses and allow an unbounded number of renamed instances of each clause. In
rigid theorem proving, only one instance of each clause is allowed. The tableau above is an
example of a rigid proof (in the form of a rigid tableau). Rigid theorem proving has been
studied as early as [45, 50] and can be used to solve the general theorem proving problem
as described in [88]. To solve the general theorem proving problem we can repeatedly call a
rigid theorem proving solver, each time adding additional renamed instances of the original
clauses, until a proof is found [126].
Our work has focused on three novel methods for general first-order theorem proving that
utilize SAT. The first line of research solves the general first-order logic theorem proving
problem by encoding the existence of rigid tableau proofs in propositional logic and using
an external SAT solver to determine the satisfiability of the encodings. We extended
this work by encoding the existence of rigid tableau proofs in propositional logic modulo
theories in the hope of reducing the size of the encoding. The second line of research
combined the instance generation and resolution inference systems into one refutationally
complete inference system. The third line provides a general framework for combining
inference systems and shows that Inst-Gen-Eq and superposition can be combined in this
framework. Below we give a general overview of each of these systems and describe my
personal contributions.
Rather than directly computing a first-order logic refutation proof, one can encode the ex-
istence of a first-order logic refutation proof in propositional logic and establish the satisfi-
ability of the encoding using a SAT solver. In this case, if the SAT solver returns satisfiable
then a refutation proof exists for the first-order formula. This is an attractive solution to
the general first-order theorem proving problem, especially with the advancement of SAT
solving technology and computing power, since the majority of computational work is done
general first-order theorem proving. We also provide a proof of its completeness, describe
our implementation called ChewTPTP-SAT and give results.
ChewTPTP-SAT is a sound and complete first-order theorem prover. While we are able
to identify problems that ChewTPTP-SAT is able to solve and other theorem provers are
I contributed to the development of the encoding, wrote the Lex/Yacc TPTP parser for the
implementation, ran the experimental tests and compiled the results, wrote the soundness
and completeness proofs and authored the paper that we submitted for publication.
In [128], a joint work with Bongio, Katrak, Lin and Lynch, we state that our original
ChewTPTP-SAT implementation [126] performed well on some problems, but some of the
encodings created huge sets of clauses. Some parts of our encoding represented choices
made, such as which clause to extend each literal with, and other parts of our encoding
represented deterministic procedures, such as deciding the consistency of unification con-
straints and deciding the acyclicity of the DAG which verifies that a particular property
holds of the DAG. In our solver, ChewTPTP-SAT, we had an eager encoding of unification
and acyclicity. In experimental results with Horn clauses (clauses containing no more
than one positive literal), approximately 99% of the clauses generated were encoding the
Satisfiability modulo Theories (SMT) [119] and replaced Minisat [108] with the SMT solver
Yices [147].
Below we discuss our encoding for a tableau proof in propositional logic modulo theories
(datatypes and arithmetic). We discuss our implementation called ChewTPTP-SMT,
compiled the results, wrote the soundness and completeness proofs and authored the paper
that we submitted for publication. In addition, using CHEWTPTP-SAT as a code base, I
wrote a significant amount of the new code for CHEWTPTP-SMT.
1.2.3 Combining Instance Generation and Resolution
Both instance generation and resolution methods have a long history in the automated rea-
soning community and are the basis for the most advanced modern ATP systems. Both
types of systems have their strengths and weaknesses. As we saw above, instance-based
provers may run forever on the satisfiable problem consisting of a single non-ground clause
¬P (x) ∨ P (f (x)) combined with any number of ground clauses and resolution-based sys-
tems may run forever on problems in EPR.
Below and in [136], a joint work with Lynch, we showed that we can combine both
instance generation and resolution into a single refutationally complete inference system
with the aim of getting the best of both methods. We proposed the inference system
SIG-Res, which combines semantic selection instance generation (SInst-Gen) with ordered
in SInst-Gen, while inferences that involve clauses in R are resolution inferences. Unsatis-
fiability is witnessed by the unsatisfiability of P under Inst-Gen or the unsatisfiability of R
under resolution. Satisfiability is witnessed by a saturated system with P being satisfiable
under Inst-Gen and R not containing the empty clause.
Spectrum is a sound and complete implementation of SIG-Res. More work, however,
needs to go into its development in order to make it competitive with state-of-the-art
theorem provers. Specifically, we need to include more redundancy elimination techniques.
Even though our implementation is not as sophisticated as state-of-the-art provers, we
identify a class of problems that, when run on Spectrum, are solved faster by SIG-Res than
by SInst-Gen or resolution alone. We discuss these results and suggest improvements below.
My personal contributions to this work are the soundness and completeness proofs for
SIG-Res, primary authorship of the published paper and sole development of Spectrum.
1.2.4 Γ + Λ
In joint work, also with Lynch, we have developed a framework called Γ + Λ which allows
two different (or possibly the same) inference systems to be combined into a single system.
The only conditions on Γ and Λ are that they both be sound and refutationally complete, Γ
be productive and Λ have the lifting and total-saturation properties. When these conditions
are met, the resulting system is sound and refutationally complete.
As in SIG-Res, clauses are partitioned into two sets, P and R. Here, in a fair way,
We present the inference system for Γ + Λ, discuss the soundness and provide a proof
of its completeness. We also demonstrate how Inst-Gen-Eq and superposition can be com-
bined into a single inference system, which we call SIG-Sup, when viewed in the Γ + Λ
framework.
developed. Currently, EVC3 is a sound and complete implementation of SIG-Res but does
not yet implement SIG-Sup.
Contributions to this work include all that is written in Chapter 5. This includes the
formal description of the Γ + Λ framework, the completeness proof, the formal description
of SIG-Sup, and our current work on EVC3. We note that it is our intent to submit a version
of Chapter 5 for publication.
1.3 Outline of this Dissertation
In Chapter 2 we present the first-order predicate calculus and definitions for terms used
throughout this dissertation. In Chapter 3 we discuss our method of using propositional
encodings of tableaux proofs for first-order theorem proving and our implementation
CHEWTPTP-SAT as well as our propositional encodings (modulo theories) and our imple-
mentation CHEWTPTP-SMT. In Chapter 4 we discuss our inference system SIG-Res and
our implementation Spectrum and in Chapter 5 we discuss the Γ + Λ framework, SIG-Sup
and our initial work on an implementation called EVC3. A conclusion is given in Chapter
6.
Chapter 2
First-Order Logic
The alphabet for first-order logic consists of function symbols, predicate symbols,
variables, the quantifiers ∀ (universal) and ∃ (existential), the logical connectives
¬ (negation), ∨ (disjunction), ∧ (conjunction), → (implication), ↔ (if and only if), and
parentheses.
A signature, Σ, is a triple (F , P, α), where F is a set of function symbols, P is a set of
n-ary function symbol and t1, ..., tn are terms then f(t1, ..., tn) is a term. We denote
the size (number of symbols) of a term t as |t| and the number of occurrences of the
variable x in term t as |t|_x.
If P is any n-ary predicate symbol and t1, ..., tn are terms then P(t1, ..., tn) is an
atomic formula (atom). As a special case we allow the infix predicate symbol ≈, and if ti
and tj are terms then ti ≈ tj is an atom. The negation of the equality atom ti ≈ tj is
written ti ≉ tj.
2.1 First-Order Formula
Well-formed formulas in first-order logic are constructed using the standard rules of for-
mula construction given in Figure 2.1 [43].
A literal is either an atom (positive literal) or the negation of an atom (negative literal).
Suppose L is a literal. The complement of L is denoted L̄ and is defined as follows: if
L = A for some atom A then L̄ = ¬A, else if L = ¬A for some atom A then L̄ = A. If L
and K are literals with the same predicate symbol and opposite signs we say that L and K
of clauses. We denote by ∅ the empty set and denote the empty clause by ⊥. We define a
Horn clause as a clause which contains at most one positive literal. A clause which contains
only negative literals is called a negative clause.
2.2 Normal Forms
A formula is in negation normal form if it does not contain the logical connectives → and
↔ and if all negations are applied to atoms. Any well-formed formula in first-order
logic can be transformed into a logically equivalent formula in negation normal form by
exhaustively applying the transformation rules in Figure 2.2 (in any order) [89]:
1. (A → B) ⇒ (¬A ∨ B)
2. (A ↔ B) ⇒ ((¬A ∨ B) ∧ (¬B ∨ A))
3a. ¬(∀v)A ⇒ (∃v)¬A
3b. ¬(∃v)A ⇒ (∀v)¬A
4. ¬¬A ⇒ A
5a. ¬(A ∧ B) ⇒ (¬A ∨ ¬B)
5b. ¬(A ∨ B) ⇒ (¬A ∧ ¬B)
Figure 2.2: Negation Normal Form Transformation Rules
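The rules of Figure 2.2 can be applied mechanically. The following Python sketch shows one way to do so; the nested-tuple encoding of formulas (for example ("imp", A, B) or ("forall", "x", A)) and the function name nnf are assumptions introduced here for illustration and are not taken from any implementation described in this dissertation.

```python
def nnf(f):
    """Rewrite formula `f` into negation normal form using rules 1-5b.

    Assumed encoding: ("atom", name), ("not", A), ("and", A, B), ("or", A, B),
    ("imp", A, B), ("iff", A, B), ("forall", v, A), ("exists", v, A).
    """
    op = f[0]
    if op == "atom":
        return f
    if op == "imp":                       # rule 1
        return nnf(("or", ("not", f[1]), f[2]))
    if op == "iff":                       # rule 2
        a, b = f[1], f[2]
        return nnf(("and", ("or", ("not", a), b), ("or", ("not", b), a)))
    if op in ("and", "or"):
        return (op, nnf(f[1]), nnf(f[2]))
    if op in ("forall", "exists"):
        return (op, f[1], nnf(f[2]))
    # Remaining case: op == "not"
    g = f[1]
    gop = g[0]
    if gop == "atom":
        return f                          # negation already applied to an atom
    if gop == "not":                      # rule 4
        return nnf(g[1])
    if gop == "forall":                   # rule 3a
        return ("exists", g[1], nnf(("not", g[2])))
    if gop == "exists":                   # rule 3b
        return ("forall", g[1], nnf(("not", g[2])))
    if gop == "and":                      # rule 5a
        return ("or", nnf(("not", g[1])), nnf(("not", g[2])))
    if gop == "or":                       # rule 5b
        return ("and", nnf(("not", g[1])), nnf(("not", g[2])))
    # Negated → or ↔: eliminate the connective first, then push the negation in.
    return nnf(("not", nnf(g)))

P, Q = ("atom", "P(x)"), ("atom", "Q(x)")
# ¬(∀x)(P(x) → Q(x)) becomes (∃x)(P(x) ∧ ¬Q(x))
print(nnf(("not", ("forall", "x", ("imp", P, Q)))))
```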
When a formula contains two or more variables that have the same name but are under
the scope of different quantifiers we apply a renaming to the formula to give the variables
under the scope of the innermost quantifier fresh variable names. For example we transform
[134].
1
If Φ is a first-order formula and l and r are terms then Φ_{l→r} denotes the formula Φ with
all occurrences of l replaced by r
1. Convert to negation normal form
2. Rename apart
3. Skolemize
4. Remove universal quantifiers
5. Distribute ∨s over ∧s 2
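Step 5 of the list above can be sketched in Python as follows. The sketch assumes its input is a quantifier-free formula already in negation normal form, encoded as nested tuples ("and", A, B), ("or", A, B), ("not", atom) and ("atom", name); this encoding and the function name distribute are illustrative assumptions only.

```python
def distribute(f):
    """Distribute ∨ over ∧ in a quantifier-free NNF formula.

    Literals, i.e. ("atom", name) and ("not", ("atom", name)), are returned unchanged.
    """
    op = f[0]
    if op == "and":
        return ("and", distribute(f[1]), distribute(f[2]))
    if op == "or":
        a, b = distribute(f[1]), distribute(f[2])
        if a[0] == "and":   # (B ∧ C) ∨ A  becomes  (B ∨ A) ∧ (C ∨ A)
            return ("and", distribute(("or", a[1], b)), distribute(("or", a[2], b)))
        if b[0] == "and":   # A ∨ (B ∧ C)  becomes  (A ∨ B) ∧ (A ∨ C)
            return ("and", distribute(("or", a, b[1])), distribute(("or", a, b[2])))
        return ("or", a, b)
    return f

A, B, C = ("atom", "A"), ("atom", "B"), ("atom", "C")
print(distribute(("or", A, ("and", B, C))))
# ('and', ('or', ('atom', 'A'), ('atom', 'B')), ('or', ('atom', 'A'), ('atom', 'C')))
```

Repeated distribution can enlarge a formula exponentially in the worst case, which is worth keeping in mind when reading clause counts for encodings.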
2.3 Substitutions
general unifier and can be accessed in any number of ways. If such a unifier exists, we say
that L and K are unifiable. A most general unifier of L and K, denoted mgu(L, K), is
a unifier, σ, of L and K such that for every unifier, τ , of L and K, there exists some
substitution ρ such that τ = σρ over the variables of L and K. A renaming is an injective
substitution that maps variables to variables and we say that two literals are variants if there
exists a renaming which unifies them.
A substitution that maps at least one variable of an expression E to a non-variable term
is called a proper instantiator of E. We say that a clause C′ is a (proper) instance of a clause
C if there exists some substitution (proper instantiator) σ such that C′ = Cσ. For a set of
clauses S, we denote the set of all ground instances of the clauses in S as Gr(S).
2 We distribute ∨s over ∧s using A ∨ (B ∧ C) ⇝ (A ∨ B) ∧ (A ∨ C) and (B ∧ C) ∨ A ⇝ (B ∨ A) ∧ (C ∨ A) for all formulas A, B and C.
Algorithm 1: Unify(Term p, Term q)
Let Q be an empty queue of pairs of terms
Let mgu be an empty list of pairs of terms
Push(Q, (p, q))
while Q is not empty do
    (s, t) := Pop(Q)
    if s is a term of the form f(s1, ..., sn) then
        if t is a variable then
            Push(Q, (t, s))
        else if t is a term with a different top symbol than f then
            Return false
        else
            Push each pair of corresponding arguments of s and t onto Q
    else if s is a variable then
        if t is the same variable as s then
            Continue at top of while loop
        else if s occurs in t then
            Return false
        else
            Replace all instances of s in Q and mgu with t and Push(mgu, (s, t))
Return true
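The following is a minimal executable sketch of Algorithm 1, written for illustration and not taken from our implementation. It assumes a hypothetical term representation in which a variable is a Python string and a compound term or constant is a tuple whose first entry is the function symbol. Unlike the pseudocode, it returns the computed bindings (or `None` on failure) rather than a Boolean.

```python
from collections import deque

# Assumed representation: a variable is a str; a compound term is a tuple
# (symbol, arg1, ..., argn); a constant is a 1-tuple such as ('a',).

def is_var(t):
    return isinstance(t, str)

def occurs(v, t):
    """True if variable v occurs in term t (the occurs check)."""
    if is_var(t):
        return v == t
    return any(occurs(v, a) for a in t[1:])

def substitute(t, v, r):
    """Replace every occurrence of variable v in term t by term r."""
    if is_var(t):
        return r if t == v else t
    return (t[0],) + tuple(substitute(a, v, r) for a in t[1:])

def unify(p, q):
    """Return an mgu of p and q as a dict, or None if they do not unify."""
    queue = deque([(p, q)])
    mgu = []
    while queue:
        s, t = queue.popleft()
        if not is_var(s):
            if is_var(t):
                queue.append((t, s))              # orient: variable on the left
            elif s[0] != t[0] or len(s) != len(t):
                return None                       # clash of top symbols
            else:
                queue.extend(zip(s[1:], t[1:]))   # decompose into argument pairs
        elif s == t:
            continue
        elif occurs(s, t):
            return None                           # occurs-check failure
        else:
            # eliminate s everywhere, then record the binding s -> t
            queue = deque((substitute(a, s, t), substitute(b, s, t))
                          for a, b in queue)
            mgu = [(v, substitute(r, s, t)) for v, r in mgu]
            mgu.append((s, t))
    return dict(mgu)


# f(X, g(Y)) and f(g(Z), X)
sigma = unify(('f', 'X', ('g', 'Y')), ('f', ('g', 'Z'), 'X'))
```

A failed unification is signalled by `None`, so callers should test `is None` rather than truthiness, since the empty substitution `{}` is a valid result.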
Chapter 3
An impressive success recently in theorem proving has been the efficiency of SAT solving
methods based on the DPLL method [29, 34]. The success of these methods seems to be
based on the fact that the data structures are defined in advance and an exponential number
encode rigid first order unsatisfiability, which only requires us to encode that each clause
appears at most a fixed number of times. Rigid unsatisfiability has been studied as early as
[45, 50].
In order to encode a rigid proof, we need to decide what kind of proof should be en-
coded. We chose to encode a connection tableau proof because all of the clauses in the
proof are instances of the original clauses whereas resolution proofs introduce clauses not
contained in the original set. A set of first order clauses is rigidly unsatisfiable if and only if
there exists a closed rigid connection tableau for that set of clauses [88]. Our method uses
this fact and solves the unsatisfiability of a set of rigid clauses by encoding the existence of
a rigid connection tableau in SAT.
The idea of the encoding is the following. We encode the existence of a clause as the
root of the connection tableau. We encode the fact that every literal assigned to a non-leaf
node is extended with a clause containing a complementary literal. Those things are easy
to encode, and do not take up much space. There are three things which are more costly to
encode.
First, we must encode the fact that two literals are complementary, in other words that
their corresponding atoms are unifiable. For that, we essentially have to encode a unification
algorithm. In our encoding of unification, we leave out the occurs check, because it is
expensive to encode and because occurs-check failures are rare. Instead, we check for such
failures after the SAT solver returns the truth assignment. If the assignment does contain an
occurs-check failure, we add a propositional clause that rules it out and call the SAT solver again.
Second, the above encoding leaves open the possibility that the connection tableau is
infinite. Therefore, we must encode the fact that the connection tableau is finite, i.e., that
the connection tableau contains no cycle.
Third, we must encode the fact that every literal assigned to a leaf node is closed by a
previous literal on its branch. Our encoding is simpler for the Horn case, because it is only
necessary to close a literal with the previous one on the branch. For the non-Horn case, we
must encode the fact that there is a complementary literal higher up in the tree. Since the
same clause may occur on two different branches, and a literal on that clause may close
with different literals on different branches, we may need to add more than one copy of a
clause in the rigid non-Horn case, because of the fact that the literal is closed differently.
But we still try to get as much structure sharing in our tree as possible. Note that rigid
Horn clause satisfiability is NP-complete, but rigid non-Horn clause satisfiability is
Σ^p_2-complete¹ [68]. So it is not surprising that a SAT solver cannot solve rigid non-Horn clause
satisfiability directly.
Once we construct a propositional encoding of a rigid connection tableau proof for a
first-order formula, we can use a SAT solver to establish the satisfiability of the encoding.
Since SAT is a decidable problem, the SAT solver will return SATISFIABLE if
a closed rigid connection tableau exists, and UNSATISFIABLE if a closed rigid connection
tableau does not exist. If the encoding is satisfiable, we can recover the rigid connection tableau proof from
the truth assignment returned by the SAT solver.
Since we encode rigid proofs, the proof of unsatisfiability of a set of clauses may require
augmenting the encoding using additional fresh variants of each clause. However, there are
also applications which only require rigid proofs [122]. The SAT encodings are given in
Section 3.2 and algorithms follow.
In [128] we encode a rigid connection tableau proof of first-order unsatisfiability in
SMT rather than SAT. We express the choices involved in building the tableau using propositional
logic, and the verification of unification and acyclicity using underlying theories. In
our implementation, called ChewTPTP-SMT, we use the SMT solver Yices.
Yices has a theory for recursive datatypes, which can be used to represent terms. A
term can be defined by using constructors as function symbols. Each function symbol
in E, then we assert an inequality xu < xv for some real numbers xu and xv . Then G
1 Σ^p_2 := NP^{Σ^p_1} := NP^{NP} is the set of decision problems solvable by a Turing machine in NP that is augmented by an oracle in NP.
is acyclic if and only if the set of inequalities is consistent. The encodings in SMT can
be found in Section 3.3 with a description of our implementation ChewTPTP-SMT and
experimental results to follow.
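To illustrate the correspondence between acyclicity and consistent strict inequalities, the sketch below tries to build values x_u with x_u < x_v for every edge (u, v) by topological sorting. It is an illustration under our own assumptions and says nothing about how Yices decides the constraints; the function name `order_witness` is hypothetical, and integers are used in place of reals for simplicity.

```python
from collections import deque

def order_witness(nodes, edges):
    """Return {node: value} with value[u] < value[v] for every edge (u, v),
    or None if no such assignment exists (i.e. the graph has a cycle)."""
    indegree = {n: 0 for n in nodes}
    successors = {n: [] for n in nodes}
    for u, v in edges:
        successors[u].append(v)
        indegree[v] += 1
    ready = deque(n for n in nodes if indegree[n] == 0)
    x = {}
    while ready:
        u = ready.popleft()
        x[u] = len(x)                 # next position in the topological order
        for v in successors[u]:
            indegree[v] -= 1
            if indegree[v] == 0:
                ready.append(v)
    # nodes on a cycle never reach indegree 0 and are therefore never numbered
    return x if len(x) == len(nodes) else None


acyclic = order_witness([1, 2, 3], [(1, 2), (2, 3), (1, 3)])
cyclic = order_witness([1, 2], [(1, 2), (2, 1)])
```

When a witness is returned, it satisfies every inequality; when the graph contains a cycle u → ... → u, the inequalities would force x_u < x_u, so no witness exists and the function returns `None`.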
3.1 Preliminaries
The alphabet for propositional logic formula consists of propositional variables and the
Definition 1 Rigid clausal tableaux are trees with nodes labeled with literals, branches
labeled either open or closed, and edges labeled with zero or more assignments. Rigid
clausal tableaux are inductively defined as follows. Let S = {C1, ..., Cn} be a set of clauses.
If T is a tree consisting of a single unlabeled node (the root node) N then T is a rigid
clausal tableau for S. The branch consisting of only the root node N is open. If N is a leaf
node on an open branch B in the rigid clausal tableau T for S and one of the following
inference rules is applied to T then the resulting tree is a rigid clausal tableau for S.
(Expansion rule) Let Ck = Lk1 ∨ ... ∨ Lki be a clause in S. Construct a new tableau
T′ by adding i nodes to N and labeling them Lk1 through Lki. Label each of the i branches
open.
(Closure rule) Suppose Lij is the literal at N and for some predecessor node M with
literal Lpq there exists some most general unifier σ such that Lij σ = ¬Lpq σ and the assignments
of σ are consistent with the assignments of T. Construct T′ from T by labeling the
edge from Lpq to Lij with the assignments used in the unification and by closing the branch
of N.
We call the clause which is added to the root node the start clause and we say that a
clause is in a tableau if the clause was used in an application of the expansion rule.
Definition 2 A clausal tableau is connected if each clause (except the start clause) in the
tableau contains some literal which is unifiable with the negation of its predecessor [93].
Definition 3 (Extension Rule) Let N be a node in the tableau T and let Ck be a clause in
S such that there exists a literal Lki in Ck which is unifiable with the negation of N. Apply
the expansion rule with Ck and immediately apply the closure rule with Lki .
The calculus for connection tableaux (or model elimination tableau [93]) consists of the
expansion rule (for the start clause only), the closure rule, and the extension rule. We call a
tableau closed if each leaf node has been closed by an application of the closure rule.
By [93] we can require that the start clause be a negative clause, since there exists a
negative clause in any minimally unsatisfiable set.
Unless otherwise stated, we let F be a set of first order logic formulas. The main problem
A result of Tableau Theory is the completeness and soundness of closed (rigid) connec-
tion tableaux.
Theorem 1 There exists a closed (rigid) connection tableau for F iff F is (rigidly)
unsatisfiable [88].
Our method to determine the rigid satisfiability of F generates a set of propositional logic
clauses for F which encodes a closed rigid connection tableau for F. We provide two
encodings, the first for problems which contain only Horn clauses and the second for those
containing non-Horn clauses. Given F, we give unique symbols to each of the clauses in
F and each of the literals in each clause. We represent clause i by Ci. We represent the
j-th literal in clause i by Lij (which is used to label the tableau). Note that as multiple
copies of a clause may appear in a rigid connection tableau, multiple nodes may have the
same literal label. And whereas the same literal may appear in distinct clauses, its occurrences are
identified with different labels. We denote by Aij the atom of Lij. Therefore Lij is either
of the form Aij or ¬Aij.
We define the variables cm , lmn , emnq , uk , pmq as follows: Define cm = T iff Cm appears in
the tableau. Define lmn = T iff Lmn is an internal node in the tableau. Define emnq = T iff
\[ \bigvee_{C_m\ \text{is a negative clause}} c_m \qquad (3.1) \]
If Cm appears in the tableau and Lmn is a negative literal then Lmn is an internal node
in the tableau:
¬cm ∨ lmn (3.2)
If Lmn is an internal node in the tableau then for some qj , Cqj is an extension of Lmn :
where {Cq1 ...Cqk } represents the set of all clauses whose positive literals are unifiable with
Lmn
If Cq is an extension of Lmn then Cq exists in the tableau:
¬emnq ∨ cq (3.4)
If Cq is an extension of Lmn and τ is an assignment of the mgu used to unify Aqr with
Amn then τ is implied by the mgu:
If for two assignments x = s and x = t there does not exist a mgu θ such that sθ = tθ
then both assignments can not be true:
Transitivity of the path relation:
¬pmm (3.10)
For non-Horn problems we use an alternative set of variables and generate a different set
of clauses.
We define the variables sm, cmn, lmn, emnqj, uτ, oijkl, pmq, and qmnij as follows. Define
sm = T iff Cm is the start clause. Define cmn = T iff Cm appears in the tableau and Lmn
is used to close its parent. Define lmn = T iff Lmn is a node in the tableau and is not a leaf
node created by an application of the closure rule. Define emnqj = T iff Cq is an extension
of Lmn and Lmn is used to close Lqj. Define uτ = T iff τ is an assignment implied by
the unifiers used in the applications of the closure rules. Define oijkl = T iff Lkl is used
to close Lij. Define pmq = T iff there exists a path from a literal in Cm to Cq. Define
qmnij = T iff Lmn is a leaf and Lmn is closed by a literal between the root node and Lij.
The clauses are as follows.
There exists a start clause in the tableau which only contains negative literals:
\[ \bigvee_{C_m\ \text{is a negative clause}} s_m \qquad (3.11) \]
If Cm is the start clause in the tableau then each literal Lmn of Cm is in the tableau:
If Ci appears in the tableau and Lij is the complement of a literal in its parent then all
other literals of Ci are in the tableau:
If Lij exists in the tableau and is not a leaf node created by an application of the closure
rule then either Lij is closed by a literal between the root and Lij or there is an extension
of Lij :
\[ \neg l_{ij} \lor q_{ijij} \lor \bigvee_{k,l} e_{ijkl} \qquad (3.14) \]
If Lij is extended with Ck then Ck is in the tableau and some Lkl of Ck is closed by Lij :
If clause Cm is an extension of Lij and τ is an assignment of the mgu used to unify Aml
with Aij then τ is true:
If for two assignments x = s and x = t there does not exist a mgu θ such that sθ = tθ
then both assignments can not be true:
If x = s, x = t, σ = mgu(s, t) and y = r ∈ σ then y = r:
If Lij is used to close Lkl then their atoms must be unifiable by some unifier σ, hence
each assignment of σ is true:
If Lij has the same sign as Lkl or their respective atoms are not unifiable then Lij is not
¬oijkl where Lij and Lkl have the same sign or Aij and Akl are not unifiable (3.20)
If leaf Lij is closed by a literal between the root and Lkl and clause Ck is an extension
of Lmn then Lij is closed by some literal between the root and Lmn :
¬pii (3.24)
If Ci is the start clause then there are no extensions into any of the literals in Ci :
If Ci is the start clause and Lmn is a leaf which is closed by a literal between the root
node and Lij , then Lmn must be closed with Lij :
We provide three algorithms, each with subtle differences. The first algorithm, HTE, attempts
to find a rigid proof and takes as an argument a problem containing only Horn
clauses. The second, NHTE, also attempts to find a rigid proof and takes as an argument
a non-Horn problem. The last algorithm, NRTE, seeks to find a non-rigid proof and takes
set of problem clauses. The number of copies required can be bounded by k^n, where n is
the number of clauses in F and k is the maximum number of literals in any clause in F . In
the case of the non-rigid algorithm, new instances of clauses in F which are standardized
apart are added to the problem clauses.
Each algorithm initially enters a while loop. While in the loop the set of clauses S,
which encode the closed rigid connection tableau, is given to an external SAT solver. The
SAT solver returns satisfiable or unsatisfiable and if the set of clauses is satisfiable, the SAT
solver returns a model M. If the SAT solver returns satisfiable we check if the assignments
which are assigned true in M are consistent. If not, we add additional clauses to S to resolve
these inconsistencies and call the SAT solver again. If the algorithm determines that S is
satisfiable and the assignments which are assigned true are consistent, the algorithm returns
is created which prevents the conflict. These clauses are added to the original set of clauses
generated by the algorithm which are again checked by the SAT solver.
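The solve, check, and refine loop described above can be sketched as follows. This is an illustration under stated assumptions, not the code of our implementation: a brute-force enumerator stands in for the external SAT solver, clauses are lists of non-zero integers (a negative integer denotes a negated variable), and `find_conflict` is a hypothetical callback that returns a list of variables which are true in the model but represent jointly inconsistent assignments, or `None` if the model is consistent.

```python
from itertools import product

def brute_force_sat(num_vars, clauses):
    """Stand-in for a SAT solver: return a model {var: bool} or None.
    Exponential in num_vars; for illustration only."""
    for bits in product([False, True], repeat=num_vars):
        model = {i + 1: bits[i] for i in range(num_vars)}
        if all(any(model[abs(lit)] == (lit > 0) for lit in clause)
               for clause in clauses):
            return model
    return None

def refine(num_vars, clauses, find_conflict):
    """Repeat: solve, check the model, and block any inconsistency found."""
    clauses = list(clauses)
    while True:
        model = brute_force_sat(num_vars, clauses)
        if model is None:
            return None                   # encoding unsatisfiable
        conflict = find_conflict(model)
        if conflict is None:
            return model                  # satisfiable and consistent
        # forbid the conflicting variables from all being true together
        clauses.append([-v for v in conflict])


def find_conflict(model):
    # hypothetical check: variables 1 and 3 stand for incompatible assignments
    return [1, 3] if model[1] and model[3] else None

result = refine(3, [[1], [2, 3]], find_conflict)
```

Each added clause excludes the model that produced it, so the loop terminates. A practical implementation would replace `brute_force_sat` with calls to a real solver.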
In the following proofs we refer to the sets of clauses generated by HTE by the enumeration
given in Section 3.2.1.
Theorem 2 (Completeness) Let F be a set of first order logic Horn clauses. If F is rigidly
Algorithm 3: Rigid Algorithm For Non-Horn Problems (NHTE)
input : F, a multi-set of FO formulas in conjunctive normal form
output: RIGIDLY UNSATISFIABLE
F′ := F;
S′ := ∅;
while true do
    Generate S, the encoding for F′;
    result := SAT-Solver(S ∪ S′);
    if result == SATISFIABLE and the model M is consistent then
        return RIGIDLY UNSATISFIABLE;
    else if result == SATISFIABLE then
        S′ := Unify-Substitution(M);
    else
        F′ := F′ ∪ F;
Proof Assume F is rigidly unsatisfiable and let S be the set of clauses for F generated by
HTE. As F is rigidly unsatisfiable, by Theorem 1 there exists a closed rigid connection
tableau T. It also follows that the start node of T contains only negative literals. From T
we will construct a map from the variables in S to {true, false} so that S is satisfiable.
If Cm appears in the tableau set cm = true, otherwise set cm = false. If Lmn is an
internal node in the tableau set lmn = true, otherwise set lmn = false. If Cq is an extension
of Lmn set emnq = true, otherwise set emnq = false. If τ is an assignment implied by the
unifiers used in applications of the closure rule set uτ = true, otherwise set uτ = false, and
if there exists a path from Cm to Cq set pmq = true, otherwise set pmq = false.
As T has a start node containing only negative literals, there exists a variable in Set 1
which is true. Thus Set 1 of S is satisfiable.
As T is a connection tableau and each extension of T closes the branch containing the
positive literal of a clause, and since each clause contains at most one positive literal, then
each negative literal in T is an internal node. Hence each variable representing a clause in
T is true iff its negative literal variables are also true. Thus Set 2 is satisfiable.
Since each negative literal in T must be extended it follows that each variable represent-
ing a negative literal in T is true iff the variable representing its extension is true. Therefore
Set 3 is satisfiable. Furthermore since each extension of T extends a literal to all the literals
in a clause, an extension variable is true iff the clause variable associated with the extension
is true. Thus Set 4 is satisfiable.
Since each extension in T unifies complementary literals, it follows that an extension
variable is true iff each of the variables representing the assignments in the unifier used in
the unification of the complementary literals are true. Hence Set 5 is satisfiable. It also
follows by the consistency of T that inconsistent assignments can not both be true, thus
for each pair of variables representing inconsistent assignments we have one is true iff the
other is false. Hence Set 6 is satisfiable. In addition if two assignments map the same
variable to unifiable terms s and t then the assignments used in the unification of s and t
must be true. Therefore Set 7 is satisfiable.
Now as there exists paths between literals and clauses via extensions in T , if a variable
representing an extension is true then the variable representing the path is true. Thus Set 8
is satisfiable. And since the paths in T have a transitive relation and no cycles exist in T ,
Sets 9 and 10 are satisfiable respectively.
Therefore, since each of the sets of clauses in S is satisfiable, the SAT solver
called in HTE returns a satisfiable model with consistent assignments, hence HTE returns
RIGIDLY UNSATISFIABLE.
unsatisfiable.
Our proof of soundness uses the satisfiability map produced by HTE to construct a
tableau for F .
Proof Suppose HTE on F returns RIGIDLY UNSATISFIABLE. Then there exists a set of
clauses S generated by HTE and a model M for which S is satisfiable. Furthermore the set
of assignment variables that are true in M correspond to a consistent set of assignments.
We construct a closed rigid connection tableau T for F using M and S as follows.
Since S is satisfiable the clause C = c1 ∨ ... ∨ cn in Set 1 of S is satisfiable. Since C
is satisfiable at least one of the variables in C is assigned true. Let cm where m ∈ [1..n]
be a variable of C such that cm = true. We begin constructing T by setting Cm as the start
clause² of T.
Now as cm = true and Set 2 is satisfiable, each of the variables corresponding to the
literals in Cm are true. Thus for each literal Lmn in Cm we create a node directly off the
root and label it Lmn .
Let Lmn be a literal in Cm . Now as lmn is true and Set 3 is satisfiable there exists some
variable emnqi which is true and as Set 4 is satisfiable emnqi = true implies cqi = true. We
2 It may be the case that more than one variable of C is assigned true. This corresponds to the fact that there may be more than one closed rigid connection tableau for F.
therefore expand the node labeled Lmn in T with clause Cqi. We continue this process until
all literal, clause, and extension variables which are assigned true have been addressed. By
the satisfiability of Sets 2 − 4, T is closed.
Now let emnqi be a variable in M which is set to true. Since Set 5 is satisfiable, emnqi
implies that a set of assignments are true. We label the edge from Lmn to the positive
literal in Cqi with these assignments. Since each extension unifies adjacent complementary
literals and the assignments in M are consistent, T is connected and consistent.
The satisfiability of Sets 8 − 10 ensures that there are no cycles in T, hence T is a tree.
Here we provide the completeness theorem of NHTE which takes as input non-Horn prob-
lems. In the proofs, we refer to the sets of clauses generated by NHTE by the enumeration
given in Section 3.2.2.
Proof Assume F is rigidly unsatisfiable and let S be the set of clauses for F generated by
NHTE. By Theorem 1, as F is rigidly unsatisfiable, there exists a closed rigid connection tableau
T for F. It also follows that the start node of T contains only negative literals.
Given T we will construct a map from the variables in
is an extension of Lmn and Lqj closes Lmn. Set uτ = T iff τ is an assignment implied by
substitutions used in the closure rules. Set oijkl = T iff Lkl is used to close Lij but not
during an application of the expansion rule. Set pmq = T iff there exists a path from a
literal in Cm to Cq . Set qmnij = T iff Lmn is a leaf and is closed by a literal between the
root node and Lij .
As T has a start node containing only negative literals, there exists a variable in Set 11
which is true, thus Set 11 of S is satisfiable. Since each of the literals in the start clause
are in T and are not closed by an application of the expansion rule then their respective
Since each extension in T adds a clause to T , Set 15 is satisfiable. Since each extension
in T unifies complementary literals, it follows that an extension variable is true iff each
of the variables representing the assignments in the unifier used in the unification of the
complementary literals are true. Hence Set 16 is satisfiable. It also follows by the consis-
tency of T that inconsistent assignments cannot both be true, thus for each pair of variables
representing inconsistent assignments, one is true iff the other is false. Hence Set 17 is
satisfiable. In addition if two assignments map the same variable to unifiable terms s and
t then the assignments used in the unification of s and t must be true. Therefore Set 18 is
satisfiable.
As each pair of literals which are used in a non-extension closure are complements,
if a variable representing the non-extension closure between two literals is true then the
variables representing the assignments implied by unification of their atoms are true. Hence
Set 19 is satisfiable. Since two literals which have the same sign, or whose atoms
are not unifiable, cannot be used in a non-extension closure, Set 20 is satisfiable.
Suppose Lij is a leaf and is closed by a literal between the root and Lkl . If the clause
containing Lkl is an extension of some node Lmn then either Lmn is a complement of Lij or
Lij is closed by a literal between the root node and Lmn . It follows that Set 21 is satisfiable.
Now as there exists paths between literals and clauses via extensions in T , if a variable
representing an extension is true then the variable representing the path is true. Thus Set
22 is satisfiable. And since the paths in T have a transitive relation and no cycles exist in
T , Sets 23 and 24 are satisfiable respectively.
As the start clause has no expansions into it, Set 25 is satisfiable. Now suppose a leaf
Lij in T is closed by a non-extension closure with a literal between the root and a literal Lmn
of the start clause. Since there are no literals between the root and the literals of the start
clause, Lij must be closed by Lmn. Hence Set 26 is satisfiable.
Therefore, as each of the sets of clauses in S is satisfiable, the SAT solver called
in NHTE returns SATISFIABLE. It follows that, as T is a tableau, the assignments implied
by the closure rule are consistent. Hence, NHTE returns RIGIDLY UNSATISFIABLE.
3.2.6 ChewTPTP-SAT
We have implemented our tableau encoding method in a command line program written
in C++ called ChewTPTP-SAT. The default options assume the input file is in TPTP CNF
format [79]. By default the program assumes the input problem is non-Horn and uses the
non-Horn algorithm with one instance of the clauses in the input file. The user may specify
alternate settings by including the following flags. The flag -h indicates the problem is
Horn, -r specifies the user wishes the program to run one of the rigid algorithms, -i
allows the user to input the number of instances of the problem to use, and -p instructs the
program to print a proof. Other options are provided to control input and output.
The program initially parses the input file and constructs a data structure to hold the
clauses in memory. The program then constructs the sets of clauses defined in section
3.2.1 or section 3.2.2. While generating the clauses, a data structure is kept which maps
each variable to a unique integer. We use the integers to format the clauses in a MiniSat
[108] readable format. ChewTPTP-SAT then forks a process and invokes MiniSat on the
set of generated clauses and MiniSat determines the satisfiability of the set. When MiniSat
returns, we inspect the file output by MiniSat. If the file contains an indication of satisfi-
ability we check that the substitutions are unifiable and if so, we use the model provided
by MiniSat to construct a proof. If MiniSat returns an indication of unsatisfiability, the
program returns SATISFIABLE in the rigid Horn case, and may add additional clauses and
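The step of mapping each propositional variable to a unique integer and writing the clauses in a solver-readable form can be illustrated with the DIMACS CNF format, which MiniSat accepts as input. The sketch below is in Python rather than C++ and is not the ChewTPTP-SAT source; the names `VarMap` and `to_dimacs`, and the variable names such as `c1` and `l11`, are hypothetical.

```python
class VarMap:
    """Assigns consecutive positive integers to variable names on first use."""
    def __init__(self):
        self.ids = {}

    def id(self, name):
        # len(self.ids) is read before insertion, so ids start at 1
        return self.ids.setdefault(name, len(self.ids) + 1)


def to_dimacs(clauses, varmap):
    """clauses: list of clauses, each a list of (name, is_positive) pairs.
    Returns the problem as DIMACS CNF text."""
    lines = []
    for clause in clauses:
        lits = []
        for name, positive in clause:
            n = varmap.id(name)
            lits.append(str(n if positive else -n))
        lines.append(" ".join(lits) + " 0")      # each clause ends with 0
    header = "p cnf %d %d" % (len(varmap.ids), len(lines))
    return "\n".join([header] + lines) + "\n"


# the unit clause c1 and the clause (not c1 or l11)
text = to_dimacs([[("c1", True)], [("c1", False), ("l11", True)]], VarMap())
```

Keeping the map from names to integers makes it possible to translate a model returned by the solver back into clause, literal, and extension variables when a proof is reconstructed.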
Experimental Results
Preliminary results on 1365 Horn and non-Horn CNF problems without equality in the
TPTP Library show that 221 of them have rigid proofs requiring a single instance. We have
found that ChewTPTP-SAT was able to solve some problems which many theorem provers
could not within a 600 second time limit, e.g. the non-Horn problems ANA003-4.p and
ANA004-4.p. And although we have not tested the library extensively by adding additional
instances, ChewTPTP-SAT was successful in solving non-rigid problems that others were
unable to solve, e.g. ANA003-2 was proved with 2 instances in less than 5 seconds.
Table 3.1 below gives some statistics on the problems mentioned above and a few
other problems. The first column identifies the name of the problem in the TPTP library
and the second column identifies whether or not the problem is Horn. The third column
identifies the number of instances that were required to prove the problem. The fourth col-
umn gives the number of seconds ChewTPTP-SAT took to generate the tableau encoding(s)
Table 3.1: Statistics on Selected Problems
Name      Horn  Instances  Clause Gen (sec)  MiniSat (sec)  Clauses  Variables
ALG002-1  N     2          1.2               65.93          411020   13844
ANA003-2  Y     2          0.1               4.88           183821   7238
ANA003-4  N     1          1.1               0.06           34774    2616
ANA004-4  N     1          1.61              0.3            44142    3160
COL121-2  Y     1          1.35              0.16           47725    2322
GRP029-2  Y     1          0.08              1.41           241272   7943
PUZ031-1  N     1          0.24              0.71           662145   14672
and the fifth column gives the total time (in seconds) that MiniSat ran on the problem. The
sixth and last columns give the number of clauses and variables respectively that were input
to MiniSat when MiniSat returned SATISFIABLE.
In our ChewTPTP-SAT implementation, some problems generate large encodings for
MiniSat to solve, and MiniSat usually solves them very quickly. The implementation shows
promise given that it can solve some problems quickly that many other theorem provers
cannot solve. Obviously it will perform best on problems that do not need many instances
of the clauses. From our results, it appears that more than 15% of the problems without
equality in the TPTP library are rigidly unsatisfiable, requiring only one instance of each
clause. Further investigation, however, needs to be done to identify which class of problems
our method does better on.
Implementation Status
ChewTPTP-SAT, in its current state, is a sound and complete theorem prover for first-order
logic without equality. Though we have been successful in finding problems that
ChewTPTP-SAT can solve and other theorem provers cannot, the implementation is not an
overall competitive solution to the first-order validity problem. For example, when submitted
to CASC-J4, the 2008 CADE Automated Theorem Proving System Competition
[131], ChewTPTP-SAT solved 6 of 100 problems in the CNF division whereas Vampire,
the winner of the division, solved 93 problems.
We have however identified ways to improve our implementation which we believe will
make the implementation more competitive. One future modification to the implementation
deals with the manner in which the SAT solver is called. Currently, each time the encoding
is modified, the entire encoding is sent to a new instance of the SAT solver. This results in
the SAT solver searching the same space repeatedly. To avoid this repeated work, a single
instance of the SAT solver can be kept in memory while the encoding is augmented. New
clauses added to the encoding can be sent to the SAT solver and when all new clauses
are added, a new satisfiability query can be made. Eliminating restarts should provide a
Our second method to determine the rigid unsatisfiability of F generates a set S of propositional
logic clauses modulo the theories of unification and arithmetic for F which encodes
a rigid closed connection tableau for F and tests the satisfiability of S with an SMT solver.
We provide two encodings, the first for problems containing only Horn clauses and the
second for those containing non-Horn clauses. Given F we enumerate each of the clauses
in F and each of the literals in each clause. We denote clause i by Ci and denote the j-th
literal in clause i by Lij. We denote by Aij the atom of Lij. Therefore Lij is either of the
form Aij or ¬Aij.
We define a set of propositional variables cm , lmn , emnq , disjoint from the symbols in
F , as follows: Define cm = T iff Cm appears in the tableau. Define lmn = T iff Lmn is
an internal node in the tableau. Define emnq = T iff Cq is an extension of Lmn . For each
pair of clauses Ci and Cj we define xi < xj = T (where xi and xj do not exist in F )
iff there exists a path from Ci to Cj . For each pair of atoms Ai and Aj in F , we define
(Ai = Aj ) = T iff Ai and Aj are the two atoms involved in an application of the closure
rule.
Below we list the set of clauses that we generate and provide their meaning.
At least one clause containing only negative literals appears in the tableau:
\[ \bigvee_{C_m\ \text{is a negative clause}} c_m \qquad (3.27) \]
If Cm appears in the tableau and Lmn is a negative literal then Lmn is an internal node
in the tableau:
cm ⇒ lmn (3.28)
If Lmn is an internal node in the tableau then for some qj , Cqj is an extension of Lmn :
where {Cq1 ...Cqk } represent the set of all clauses whose positive literals are unifiable
with Lmn
If Cq is an extension of Lmn then Cq exists in the tableau:
emnq ⇒ cq (3.30)
If Cq is an extension of Lmn and Lqr is the positive literal in Cq then Amn and Aqr are
unifiable:
emnq ⇒ (xm < xq ) (3.32)
The encoding is satisfiable if and only if the original set of first order Horn clauses
is rigidly unsatisfiable. We encode non-rigid unsatisfiability by continually adding new
instances of each clause, renamed apart.
For non-Horn problems we use a different set of variables and generate a different set of
clauses.
We define the variables, disjoint from the symbols in F, sm , cmn , lmn , emnqj , oijkl and
qmnij as follows: Define sm = T iff Cm is the start clause. Define cmn = T iff Cm
appears in the tableau and Lmn is complementary to its parent. Define lmn = T iff Lmn is
a node in the tableau and is not a leaf node created by an application of the extension rule.
Define emnqj = T iff Cq is an extension of Lmn and Lqj is the complement of Lmn . Define
oijkl = T iff Lij and Lkl are a pair of literals used in a closure but not by the extension rule.
If a path to a node N contains the complement of N, then we say that the path is closed.
Define qmnij = T iff Lmn is a leaf and Lij is a node on a path from the root node to Lmn
and every path from the root to Lij contains a complement of Lmn . For each pair of clauses
Ci and Cj we define xi < xj = T (where xi and xj do not exist in F ) iff there exists a path
from Ci to Cj . For each pair of atoms Ai and Aj in F , we define (Ai = Aj ) = T iff Ai and
⋁_{Cm is a negative clause} sm (3.33)
If Cm is the start clause in the tableau then each literal Lmn of Cm is in the tableau:
sm ⇒ lmn (3.34)
If Ci appears in the tableau and Lij is the complement of a literal in its parent then all
other literals of Ci are in the tableau:
If Lij exists in the tableau and is not a leaf node created by an application of the closure
rule then either every branch ending at Lij is closed or there is an extension of Lij :
lij ⇒ (qijij ∨ (⋁_{k,l} eijkl )) (3.36)
If Lij is extended with Ck then Ck is in the tableau and some Lkl of Ck is the comple-
ment of Lij :
If clause Cm is an extension of Lij and literals Lij and Lml are complements then Aij
and Aml are unifiable.
If Lij and Lkl are a pair used in a closure then they must be unifiable:
If Lij has the same sign as Lkl or their respective atoms are not unifiable then they are
not complements:
¬oijkl where Lij and Lkl are not unifiable (3.40)
If every path through Lkl to leaf Lij is closed and Ck is an extension of Lmn then either
Lij is a complement of Lmn or every path through Lmn to Lij is closed:
If Ci is the start clause then there are no inferences into any of the literals in Ci :
si ⇒ ¬eklij (3.43)
If Ci is the start clause, Lmn is a leaf, and all paths that traverse Lij to Lmn are closed,
then Lij and Lmn are complementary:
We represent our tableau as a DAG, so there is some structure sharing. But even with
the structure sharing, a non-Horn clause tableau may need more than one instance of the
3.3.3 ChewTPTP-SMT
We have implemented our tableau encoding in our theorem prover ChewTPTP-SMT, which
is an extension of ChewTPTP-SAT[126]. In ChewTPTP-SAT, instead of using theories, we
encoded the consistency of the unifiers and the acyclicity of the tableau with additional
propositional clauses. To encode the consistency of the unifiers, we encoded the equations
that would be created if a unification algorithm were run. We do not know ahead of time
which unifiers we will have to create, so we encode everything that can possibly occur
when the unification algorithm is run. To encode the absence of a cycle, we encode the
existence of a path from one clause to another and the fact that there is no path from a
clause to itself. This requires encoding all possible transitivity and irreflexivity axioms that
may occur.
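The cubic cost of the propositional acyclicity encoding can be illustrated with a minimal sketch. It assumes one propositional variable per ordered pair of clauses, written here as the hypothetical tuple ("path", i, j), and simply enumerates the irreflexivity and transitivity axioms; the function name and the clause representation are illustrative and are not taken from ChewTPTP-SAT.

```python
from itertools import product

def acyclicity_axioms(n):
    """Enumerate path axioms for n clauses (illustrative representation).

    A literal is either ("path", i, j) or ("not", ("path", i, j)).
    Each clause is a list of literals read as a disjunction.
    """
    clauses = []
    # Irreflexivity: there is no path from a clause to itself.
    for i in range(n):
        clauses.append([("not", ("path", i, i))])
    # Transitivity: path(i, j) and path(j, k) imply path(i, k).
    for i, j, k in product(range(n), repeat=3):
        clauses.append([
            ("not", ("path", i, j)),
            ("not", ("path", j, k)),
            ("path", i, k),
        ])
    return clauses

if __name__ == "__main__":
    for n in (5, 10, 20):
        print(n, len(acyclicity_axioms(n)))  # grows as n**3 + n
```

The count n^3 + n is dominated by the transitivity axioms, which is the cubic term the text refers to.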
Experimental Results
We tested our prover in all three settings on a subset of TPTP [79] problems. Tables 3.2-3.5
provide empirical data from these tests. SMT-Y denotes our prover run in SMT mode, SAT-Y
is SAT mode using Yices, and SAT-M is SAT mode using Minisat. For Horn clauses, we
ran ChewTPTP on all the Horn problems in the TPTP database, but for non-Horn we only
had time to run it through the GRP problems. We report all problems that both provers
solved within five minutes but on which SAT-M took more than one second. We believe the
problems in these tables are representative of the overall results. Columns in the table show the
running time of each method, the clause generation time rounded off to the nearest second,
the number of clauses generated, and the number of variables generated for each method.
We also show whether or not the problem is rigidly satisfiable. For these experiments, we
only tested rigid satisfiability with one instance of each clause.
ChewTPTP. In the Horn case the running time was reduced significantly, except for a small
percentage of exceptions. In the non-Horn case, working modulo theories often increased
the running time. Generally, Yices was faster than Minisat on SAT problems without theo-
ries.
We believe we have an explanation for our results. In the Horn problems the number
of clauses is reduced by an order of magnitude, whereas in the non-Horn problems the
number of clauses is not reduced by much. This suggests that working modulo theories is
only useful when the number of clauses is reduced significantly.
In the Horn encoding, everything can be encoded in O(n^2) except for the encoding
of unification and acyclicity, which require O(n^3) space. When we remove the clauses
used to represent unification and acyclicity, the number of clauses is now O(n^2). However,
for the encoding of non-Horn clauses, we must encode the fact of a leaf node having a
is useful to encode properties using theories. We conjecture that if the number of clauses
can be reduced by a factor of n, then the encoding is useful, but if the asymptotic complexity
remains the same, then it is not a good idea.
Implementation Status
The current version of ChewTPTP-SMT is a sound and complete first-order logic theo-
rem prover. Though ChewTPTP-SMT runs well on Horn problems, it is not currently a
competitive solution to the first-order validity problem compared to other theorem provers.
Similar to ChewTPTP-SAT, eliminating restarts will provide a significant improvement to
the performance of ChewTPTP-SMT. Additional improvement in performance would be seen
if the non-Horn encoding could be reduced by a factor of n; however, at this point, we have
Table 3.2: ChewTPTP Times For Horn Problems
Name   Clause Gen SAT-M/Y (s)   Clause Gen SMT-Y (s)   Total SAT-M (s)   Total SAT-Y (s)   Total SMT-Y (s)
PUZ008-1.p 1 0 1.06 0.89 0.11
NLP106-1.p 2 0 1.8 1.9 0.06
NLP104-1.p 2 0 1.82 1.9 0.05
NLP105-1.p 2 0 1.83 1.89 0.06
NLP107-1.p 2 0 2.47 1.99 0.06
GRP033-3.p 1 0 2.48 1.8 0.28
NLP109-1.p 1 0 2.49 1.99 0.05
NLP113-1.p 2 0 2.51 2.01 0.06
NLP110-1.p 2 0 2.74 1.84 0.07
NLP112-1.p 2 0 2.92 1.92 0.07
NLP111-1.p 1 0 2.94 1.93 0.06
NLP108-1.p 2 0 2.94 1.94 0.07
PUZ036-1.005.p 3 0 4.33 2.92 0.03
RNG037-2.p 4 0 5.33 5.35 6.2
RNG038-2.p 4 0 5.34 3.89 19.94
RNG001-5.p 4 0 6.93 5.32 0.84
SWV015-1.p 9 0 9.64 10.08 0.08
SWV017-1.p 11 0 10.82 11.27 0.1
RNG006-2.p 7 0 11.19 7.53 6.03
Table 3.3: ChewTPTP Clause and Variable Count For Horn Problems
Name   Cls Ct SAT-M/Y   Cls Ct SMT-Y   Var Ct SAT-M/Y   Var Ct SMT-Y   Result
PUZ008-1.p 52957 323 207608 216 sat
NLP106-1.p 130174 338 513774 392 unsat
NLP104-1.p 130724 344 515712 398 unsat
NLP105-1.p 130724 344 515712 398 unsat
NLP107-1.p 137380 315 542996 370 unsat
GRP033-3.p 115013 737 445065 383 sat
NLP109-1.p 137380 315 542996 370 unsat
NLP113-1.p 137897 319 544836 374 unsat
NLP110-1.p 128150 296 506951 350 unsat
NLP112-1.p 135667 287 537099 342 unsat
NLP111-1.p 135667 287 537099 342 unsat
NLP108-1.p 135667 287 537099 342 unsat
PUZ036-1.005.p 185292 45 729464 91 unsat
RNG037-2.p 221760 1524 876393 714 sat
RNG038-2.p 230063 1522 910786 718 sat
RNG001-5.p 258888 1527 1026821 725 sat
SWV015-1.p 559284 1047 2105121 532 unsat
SWV017-1.p 625119 1137 2354882 578 unsat
RNG006-2.p 432194 2058 1702459 925 sat
Table 3.4: ChewTPTP Times For Non-Horn Problems
Name   Clause Gen SAT-M/Y (s)   Clause Gen SMT-Y (s)   Total SAT-M (s)   Total SAT-Y (s)   Total SMT-Y (s)
ANA025-2.p 1 0 1.02 1.04 2.43
COL121-2.p 0 1 1.02 0.92 1.41
ANA004-4.p 1 0 1.33 1.87 2.77
GRA001-1.p 2 2 1.92 1.74 4.08
ANA029-2.p 2 2 2.05 2.08 4.68
ANA005-2.p 2 1 2.38 2.31 4.72
ANA004-2.p 2 1 2.39 2.3 5.06
ANA003-2.p 3 1 2.96 2.81 5.53
GRP123-1.003.p 3 2 3.41 3.76 18.11
ANA001-1.p 4 2 4 3.84 7.94
GRP123-2.003.p 4 3 5.55 5.37 17.66
ANA002-2.p 5 3 5.73 5.34 10.56
ANA002-1.p 5 3 6.17 5.67 11.84
GRP124-2.004.p 9 6 10.51 11.4 43.91
GRP033-3.p 15 6 20.11 15.69 23.18
GRP123-3.003.p 28 20 30.63 30.73 80.84
ALG002-1.p 1 1 43.51 64.92 75.33
ANA004-5.p 2 1 47.25 21.5 83.54
GRP124-3.004.p 46 31 88.23 83.83 171
COM003-2.p 82 49 88.72 84.54 168.1
Table 3.5: ChewTPTP Clause and Variable Count For Non-Horn Problems
Name   Cls Ct SAT-M/Y   Cls Ct SMT-Y   Var Ct SAT-M/Y   Var Ct SMT-Y   Result
ANA025-2.p 41129 36020 2655 2286 sat
COL121-2.p 47725 20335 2322 1538 sat
ANA004-4.p 44142 36844 3160 2631 sat
GRA001-1.p 64222 60849 3292 3161 sat
ANA029-2.p 79860 66884 4107 3388 sat
ANA005-2.p 93806 68206 4907 3802 unsat
ANA004-2.p 93806 68206 4907 3802 unsat
ANA003-2.p 114945 78930 5654 4243 unsat
GRP123-1.003.p 111866 94335 4589 3596 unsat
ANA001-1.p 154246 113596 6680 5185 unsat
GRP123-2.003.p 180783 154243 6723 5450 unsat
ANA002-2.p 226149 151313 7457 5436 unsat
ANA002-1.p 229871 151313 7544 5437 unsat
GRP124-2.004.p 339070 283967 10854 8953 unsat
GRP033-3.p 699160 301901 15989 8961 sat
GRP123-3.003.p 1003831 934044 17763 15377 unsat
ALG002-1.p 54559 32731 3524 2460 unsat
ANA004-5.p 101166 44953 4981 3196 unsat
GRP124-3.004.p 1596801 1468732 25314 21981 unsat
COM003-2.p 2920669 2365922 46818 36051 sat
yet to develop such an encoding.
3.4 Conclusion
We introduced in [126] and [128] two novel approaches to first-order theorem proving
and the acyclicity of the tableau. In SAT, it was necessary to add cubically many clauses
to encode the solving of unification. In addition, it was necessary to add cubically many
clauses to encode the acyclicity of the tableau. However, when encoding this information in
SMT, there was no need to encode the solving of unification, since this was accomplished
directly with the Yices recursive datatype theory. The number of unification clauses was
reduced from a cubic to a quadratic number. Similarly for acyclicity of tableau, we did
not need to encode the transitivity and irreflexivity of the path relation. We only needed to
express edges in the tableau as inequalities. The number of clauses to represent acyclicity
also dropped from a cubic number to a quadratic number.
In the SMT Horn encoding, all the other information in the tableau can also be encoded
with a quadratic number of clauses. Therefore the entire encoding of the existence of a
tableau dropped from a cubic number of clauses in SAT to a quadratic number in SMT. This
drastically reduced the number of clauses, and simultaneously decreased the time needed
to decide the satisfiability of the clauses. There was only a small reduction in number of
clauses for non-Horn clauses, because we still need to encode the fact that all paths in the
tableau can be closed. Therefore the entire encoding is still cubic, and the running time was
actually worse. We conjecture a rule of thumb saying that it is worthwhile to use theories
if the number of clauses is reduced by a factor of n, but not worthwhile if the asymptotic
number remains the same.
Future work includes finding ways to use SMT to further reduce
the representation for non-Horn clauses, ideally cutting it down to a quadratic number of
clauses. It would be possible to define a theory to do this directly, but we have not yet
figured out how to do it with the existing theories in Yices. In addition, in order to prove
the general first order problem we also need to find a good heuristic to decide exactly
which clauses should be copied. We would like a method to decide satisfiability from rigid
satisfiability. It would be useful to have an encoding of rigid clauses modulo a non-rigid
theory, as discussed in [122]. This way, we could immediately identify some clauses as
non-rigid, and work modulo those clauses.
Though our implementations are not yet state-of-the-art, our work shows the usefulness
of SAT and SMT to theorem proving in first order logic. We suspect there are other logics
Chapter 4
Although resolution methods appear to be more efficient in practice, there are some classes
of problems that are suited better for instantiation-based methods. In [136] we show that we
can combine both instance generation and resolution into a single inference system while
retaining completeness with the aim of getting the best of both methods. We define the
inference system named SIG-Res that combines semantic selection instance generation
(SInst-Gen) with ordered resolution.
Each clause in the given set of clauses is determined, by some heuristic, to be an in-
stantiation clause and placed in the set P or a resolution clause and placed in the set R or
placed in both P and R. Clauses from P are given to a SAT solver and inferences among
them are treated as in SInst-Gen, while any inference which involves a clause in R is a
resolution inference.
Our combination of instance generation and resolution differs from the method used in
the instantiation-based theorem prover iProver [133] which uses resolution inferences to
simplify clauses, i.e. if a conclusion of a resolution inference strictly subsumes one of its
premises then the conclusion is added to the set of clauses sent to the SAT solver and the
subsumed premise is removed. Our inference system also allows for the use of resolution
for the simplification of the clauses in P , but differs from iProver in that it restricts certain
clauses, the clauses in R, from any instance generation inference.
Our idea is similar to the idea of Satisfiability Modulo Theories (SMT), where clauses
in P represent data, and the clauses in R represent a theory. This is similar to the SMELS
inference system [132] and the DPLL(Γ + T ) inference system [141]. The difference be-
tween those inference systems and ours is that in those inference systems, P must only
contain ground clauses, and the theory is all the nonground clauses, whereas in our case we
4.1 Preliminaries
⊥ is used to denote a distinguished constant called the Jeroslow constant and the substi-
tution which maps all variables to ⊥. If L is a literal then L⊥ denotes the ground literal
A binary relation on a set A is a set of ordered pairs of elements from A. A binary relation
infinite descending chain of elements. An ordering is total on S if for every distinct pair
of elements x and y in S it holds that x ≻ y or y ≻ x. An ordering is stable under
substitution if for any substitution σ and for all x and y in S it holds that if x ≻ y then
xσ ≻ yσ.
Given a signature Σ we say that ≻ is compatible with Σ-operations if s ≻ s′ implies
f (t1 , ..., ti−1 , s, ti+1 , ..., tn ) ≻ f (t1 , ..., ti−1 , s′ , ti+1 , ..., tn ) for all f /n ∈ Σ, for all terms
s, s′ , t1 , ..., tn ∈ TΣ∪X and for all indices i ∈ N, 1 ≤ i ≤ n.
We say that ≻ has the subterm property if s ≻ s′ whenever s′ is a proper subterm of
substitution and total on ground terms. We extend > to atoms in such a way so that for any
atom A we have ¬A > A. The ordering > is extended to clauses by considering a clause
as a multiset of literals.
Given a clause C, a literal L ∈ C is maximal in C if there is no K ∈ C such that K >
L. We define a mapping, max from clauses to multisets of literals such that max(C) =
{L|L is maximal in C}.
If X is a set of variables, Σ is a signature, and > is a strict partial ordering on Σ a
(regular) symbol weight assignment is a function λ : Σ ∪ X → N. Furthermore, we say λ
is admissible for > if and only if
ordering on the symbols in the signature and a weight function on terms. We give below
the version found in [127].
We extend a weight assignment λ to a function wλ : TΣ∪X → N on terms as follows:
• For x ∈ X:
wλ (x) = λ(x)
• For f (t1 , · · · , tn ) with f ∈ Σ:
wλ (f (t1 , · · · , tn )) = λ(f ) + Σ_{i=1}^{n} wλ (ti )
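The extension of a weight assignment to terms can be sketched in a few lines. The representation below is an assumption made for illustration only: variables are plain strings, compound terms are pairs of a symbol and a list of arguments, and the assignment λ is a dictionary covering both symbols and variables.

```python
def weight(term, lam):
    """Compute w_lambda(term) for an illustrative term representation.

    A variable is a string; a compound term is (symbol, [arguments]);
    a constant is a compound term with an empty argument list.
    """
    if isinstance(term, str):           # variable case: w(x) = lambda(x)
        return lam[term]
    symbol, args = term                 # compound case
    return lam[symbol] + sum(weight(t, lam) for t in args)

if __name__ == "__main__":
    lam = {"f": 2, "a": 1, "x": 1}      # hypothetical weight assignment
    term = ("f", ["x", ("a", [])])      # the term f(x, a)
    print(weight(term, lam))            # lambda(f) + lambda(x) + lambda(a)
```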
or
(KBO2) ∀x ∈ X : |s|x ≥ |t|x , w(s) > w(t) and one of the following cases holds:
4.1.3 Interpretations
4.1.4 Closures
Resolution
The main idea behind all saturation-based instance generation methods is to augment a set
of clauses with sufficiently many proper instances so that the satisfiability of the set can be
determined by a SAT solver. Additional instances are generated using some form of the
Inst-Gen [106] inference rule. An instance generation with semantic selection inference
system (SInst-Gen) (See Figure 4.1) uses a selection function and the notion of conflicts
to determine exactly which clauses are to be used as premises in the instance generation
inferences.
Let P be a set of first order clauses and view P ⊥ as a set of propositional clauses.
Under this setting, if P ⊥ is unsatisfiable, then P is unsatisfiable and our work is done.
Otherwise a model for P ⊥ is denoted as I⊥ and we define a selection function, sel(C, I⊥ ),
which maps each clause C ∈ P to a singleton set {L} such that L ∈ C and L⊥ is true in
I⊥ .
The ordered resolution and factoring inference rules are well known in the literature.
For completeness they are given in Figure 4.2. The strength of ordered resolution is in
its ability to reduce the search space by requiring only inferences between clauses which
isfiable. As is the case with SInst-Gen, ordered resolution with factoring is refutationally
complete, but for some satisfiable problems may not halt.
L ∨ Γ    K ∨ Δ
--------------- (Ordered Resolution)
   (Γ ∨ Δ)σ
where
L ∨ K ∨ Δ
---------- (Factoring)
(L ∨ Δ)σ
where σ = mgu(L, K)
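Both rules depend on computing a most general unifier. The following is a minimal sketch of syntactic unification with an occurs check, under an assumed representation in which variables are strings and compound terms or atoms are tuples whose first component is the symbol. The function names are illustrative and are not taken from any implementation discussed here.

```python
def walk(t, s):
    """Follow variable bindings in substitution s until t is unbound."""
    while isinstance(t, str) and t in s:
        t = s[t]
    return t

def occurs(v, t, s):
    """Return True if variable v occurs in term t under substitution s."""
    t = walk(t, s)
    if t == v:
        return True
    return isinstance(t, tuple) and any(occurs(v, a, s) for a in t[1:])

def unify(a, b, s=None):
    """Return an mgu of a and b as a dict (triangular form), or None."""
    s = {} if s is None else s
    a, b = walk(a, s), walk(b, s)
    if a == b:
        return s
    if isinstance(a, str):
        return None if occurs(a, b, s) else {**s, a: b}
    if isinstance(b, str):
        return unify(b, a, s)
    if a[0] != b[0] or len(a) != len(b):
        return None                     # symbol or arity clash
    for x, y in zip(a[1:], b[1:]):
        s = unify(x, y, s)
        if s is None:
            return None
    return s

def apply(t, s):
    """Apply substitution s to term t, resolving chained bindings."""
    t = walk(t, s)
    if isinstance(t, str):
        return t
    return (t[0],) + tuple(apply(a, s) for a in t[1:])

if __name__ == "__main__":
    L = ("P", "x", ("a",))              # P(x, a)
    K = ("P", ("b",), "y")              # P(b, y)
    sigma = unify(L, K)
    print(sigma, apply(L, sigma))       # the factor keeps one copy, P(b, a)
```

For factoring, applying the resulting substitution to L ∨ Δ yields the conclusion; for resolution the unifier would be computed between one literal and the complement of the other.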
4.3 SIG-Res
The inferences in SIG-Res are variations of SInst-Gen, ordered resolution and factoring
(see Figure 4.3). SIG-Res is an inference system that establishes two sets of clauses. Given
a problem in CNF, S, which we wish to prove satisfiable or unsatisfiable, we create two sets
of clauses, P ⊆ S and R ⊆ S, not necessarily disjoint, such that P ∪ R = S. Given some
clause C ∈ S, C is designated as either a clause in P , a clause in R, or both, according to
the spectrum, the distribution heuristic can distribute all the clauses to R, leaving P empty,
making the system a resolution system. This flexibility allows any number of heuristics to
be used and heuristics to be tailored to specific classes of problems. An open question is
which heuristics perform best and for which classes of problems. In Section 4.5 we describe
one general heuristic, GSM, which we have incorporated into our implementation.
The selection function, sel(C, I⊥ ), where C ∈ P ∪ R and I⊥ is a model for P ⊥, is
defined as follows. For clarity, we note that sel(C, I⊥ ) returns a singleton set if C ∈ P and
a non-empty set if C ∈ R.
sel(C, I⊥ ) = {L} for some L ∈ C such that L⊥ ∈ I⊥ , if C ∈ P
sel(C, I⊥ ) = max(C), if C ∈ R
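The two cases of the selection function can be sketched as follows. This is an illustration under stated assumptions, not the Spectrum code: literals are triples (sign, predicate, arguments) with flat arguments, the set VARIABLES and the helper names are hypothetical, the model I⊥ is given as a set of ground literals, and the literal ordering is passed in as a function `greater`.

```python
VARIABLES = {"x", "y", "z"}             # hypothetical variable names

def ground(literal):
    """Map every variable of a literal to the constant ⊥ (gives L⊥)."""
    sign, pred, args = literal
    return (sign, pred, tuple("⊥" if a in VARIABLES else a for a in args))

def max_literals(clause, greater):
    """max(C): literals L with no K in C such that K > L."""
    return {L for L in clause if not any(greater(K, L) for K in clause)}

def sel(clause, in_P, model, greater):
    """Singleton {L} with L⊥ true in the model if C is in P, else max(C)."""
    if in_P:
        for L in clause:
            if ground(L) in model:
                return {L}
        raise ValueError("the model does not satisfy this clause")
    return max_literals(clause, greater)

def by_arity(K, L):
    """Toy strict ordering used only for this example."""
    return len(K[2]) > len(L[2])

if __name__ == "__main__":
    clause = [(True, "P", ("x",)), (False, "Q", ("x", "a"))]
    model = {(False, "Q", ("⊥", "a"))}
    print(sel(clause, True, model, by_arity))
    print(sel(clause, False, model, by_arity))
```

The toy ordering `by_arity` merely stands in for a real term ordering such as KBO.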
We will have the usual redundancy notions for saturation inference systems. We can
define deletion rules to say that a clause can be deleted if it is implied by zero or more
smaller clauses. For example, tautologies can be deleted. The clause ordering, as we will
define it in the next section, will restrict what subsumptions can be done. In particular, if
a clause C is in R, we say that C is subsumed by a clause D if there exists a substitution
σ such that Dσ is a subset of C. If C is a clause in P , we say that C is subsumed by D if
equal to the counter. An alternative method is to perform all possible inferences with the
exception that we restrict conclusions generated during each iteration from being considered
as premises until the next iteration. We have implemented SIG-Res in a theorem prover
called Spectrum. Our implementation uses the latter method and follows Algorithm 5.
4.4 Completeness
i. Cσ < Dτ or
ii. Cσ = Dτ and C = Dρ where ρ is a proper instantiator of D
L ∨ Γ        K ∨ Δ
--------------------- (SInst-Gen)
(L ∨ Γ)σ    (K ∨ Δ)σ
where
1. L ∨ Γ ∈ P and K ∨ Δ ∈ P
2. L ∈ sel(L ∨ Γ, I⊥ ) and K ∈ sel(K ∨ Δ, I⊥ )
3. σ = mgu(L, K)
4. (L ∨ Γ)σ ∈ P and (K ∨ Δ)σ ∈ P
where
1. L ∨ Γ ∈ R or K ∨ Δ ∈ R
2. L ∈ sel(L ∨ Γ, I⊥ ) and K ∈ sel(K ∨ Δ, I⊥ )
3. σ = mgu(L, K)
4. (Γ ∨ Δ)σ ∈ P if L ∨ Γ ∉ R or K ∨ Δ ∉ R
L ∨ K ∨ Δ
---------- (Factoring)
(L ∨ Δ)σ
where
1. σ = mgu(L, K)
2. (L ∨ Δ)σ ∈ P if L ∨ K ∨ Δ ∉ R
We denote by ≺S any (subsumption) closure ordering with the following property: for
any closures C · σ and D · τ , C · σ ≺S D · τ if
i. Cσ < Dτ or
clauses divided into sets Pi and Ri , Ii is a model of Pi ⊥, seli is a selection function based
on the model Ii , and Si+1 results from applying an inference rule or deletion rule on Si . The
sequence has as its limit the set of persistent clauses S∞ = ⋃_{i≥0} ⋂_{j≥i} Sj . By definition of
redundancy, if a clause is redundant in some Si it is redundant in S∞ .
tuples (Sj , Ij , selj ) where Ij makes A false. If D∞ = (S0 , I0 , sel0 ), · · · , (Si , Ii , seli ), · · · ,
then we define I∞ = j≥0 Ij .
S∞ is called saturated if the conclusion of every inference of (S∞ , I∞ , sel∞ ) is in S∞
or is redundant in S∞ . A derivation is fair if no inference is ignored forever, i.e. the
conclusion of every inference among persistent clauses is persistent or redundant in S∞ . A
Otherwise C = ∅.
Theorem 7 Let S = P ∪ R be a multiset of clauses saturated under SIG-Res. S is satisfi-
able if P ⊥ is satisfied by I⊥ and S does not contain the empty clause.
ground instances of S.
Suppose on the contrary that IS is not a model for the set of ground instances of S. Let
C = C · σ be the minimal ground closure of S that is false or undefined in IS .
As P ⊥ is satisfied by I⊥ , it follows that the set of ground instances of P is satisfiable
Since B σ = (D ∨ C )σ is true in IS and C σ is not true in IS then D σ must be
satisfied in IS . Now as D σ is not true in I D , then it follows that D ≺ B. Therefore
(D ∨ K)σ is smaller than (D ∨ C )σ. Hence Kσ = Lσ is smaller than C σ, which is a
contradiction as Lσ ∈ max(C).
Case 2: Suppose now that D ∈ R. Since D = {Kσ}, Kσ ∈ max(D). Therefore
D σ is smaller than Kσ. Hence B is strictly smaller than C. And as D σ is not true in IS
and C σ is not true in IS we have B σ is not true in IS . If B is in S, this contradicts the
minimality of C.
Since SIG-Res is refutationally complete and the inferences are sound, it should be
clear that it is only necessary that at some point in time we insert the conclusions of SIG-
Res inferences into the appropriate set as defined by the inference rules. Prior to that time,
without affecting completeness, we can insert conclusions from inferences into P or R with
disregard to the algorithm if by doing so we can find a solution quicker.
4.5 Spectrum
We have implemented SIG-Res in a theorem prover for first order logic called Spectrum.
The name comes from the fact that given a set of clauses, our choices to construct the sets
P and R are among a spectrum.
Spectrum is written in C++, has a built-in parser for CNF problems in the TPTP format
[79] and outputs results in accordance with the SZS ontology [79]. It takes as arguments
a filename and mode and outputs satisfiable or unsatisfiable. The modes determine how
the clauses will be distributed to the sets P and R. There are a number of distribution
modes which Spectrum can be run in. When running Spectrum with the -p flag, Spectrum
places all clauses in P , hence makes Spectrum run essentially as an instantiation-based
theorem prover. The flag -r makes Spectrum run essentially as a resolution theorem prover
by placing all the clauses in R. Running Spectrum without a mode flag runs our default
When the program begins, the program distributes the clauses to the two sets P and
R in accordance with the distribution mode and if a clause is inserted in R, its maximal
literals are identified. After distributing the clauses, Spectrum follows Algorithm 5.
As we begin the instance generation phase on the set P , Yices [147] is used to check the
literals that we use for determining if conflicts exist. If a conflict exists we instantiate the
new clauses and check to see if the new clauses already exist in P . If not, we add them
to P . To ensure that we do not run the instance generation phase forever we do not al-
low conclusions to SInst-Gen inferences to be premises until after the next call to the SAT
solver.
Following the instance generation phase, we check for resolution inferences. We first
resolve all unchecked pairs of clauses where both clauses are in R, and then for the
unchecked pairs where one clause is in P and the other is in R. To ensure fairness, we
exclude from being premises SInst-Gen conclusions that were added during the previous
instantiation phase and conclusions from resolution and factoring inferences that are added
in the current iteration. If an inference is made, we check to see if it is the empty clause.
If so, Spectrum reports unsatisfiable and halts. Otherwise, if one of the premises is in P
we perform the simple redundancy check as stated above and when appropriate add the
conclusion to P . If, on the other hand, both premises are in R we check for factors. If a
Experimental Results
We have tested Spectrum on 450 unsatisfiable problems rated easy in the TPTP library.
These problems, in general, are not challenging for state of the art theorem provers, but
allow us to compare the different modes of our implementation and give us simple proofs
to analyze. Of the 450 problems we tested, Spectrum run in GSM mode for 300 seconds
solved 192 problems 2 . Of these 192 problems when given the same time limit, 18 could
not be solved by Spectrum run in -p mode where the problem is solved using only instance
generation or in -r mode where only resolution inferences are allowed. Interestingly, 16
of these are in the LCL class of problems, the class of Propositional Logic Calculi. Many
of these problems contain the axioms of propositional logic which have clauses that are
similar to the transitivity property. These can produce a large number of clauses under
resolution. These clauses, when run under our heuristic, are put in P to avoid this condition.
Also present are clauses which we call growing clauses because of their tendency to produce
larger and larger clauses. These growing clauses, e.g. ¬P (x) ∨ P (f (x)), contain pairs of
complementary literals where each argument in the first is a subterm of the second and there
exists at least one argument that is a proper subterm. Growing clauses, under our heuristic,
since they have only a single maximal literal, are put in R which avoids this problem.
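A possible reading of the growing-clause condition can be sketched as a small check. The sketch assumes that arguments are compared position by position, that variables are strings, and that compound terms and atoms are tuples headed by their symbol; the function names are hypothetical, and the positional reading is our interpretation of the description above rather than a definition taken from Spectrum.

```python
def is_subterm(s, t):
    """True if s is a (not necessarily proper) subterm of t."""
    if s == t:
        return True
    return isinstance(t, tuple) and any(is_subterm(s, a) for a in t[1:])

def grows(first, second):
    """Each argument of `first` is a subterm of the matching argument of
    `second`, and at least one of them is a proper subterm."""
    if first[0] != second[0] or len(first) != len(second):
        return False
    pairs = list(zip(first[1:], second[1:]))
    return (all(is_subterm(a, b) for a, b in pairs)
            and any(a != b for a, b in pairs))

def is_growing_clause(clause):
    """clause: list of (sign, atom); look for a complementary growing pair."""
    for sign1, atom1 in clause:
        for sign2, atom2 in clause:
            if sign1 != sign2 and grows(atom1, atom2):
                return True
    return False

if __name__ == "__main__":
    growing = [(False, ("P", "x")), (True, ("P", ("f", "x")))]
    transitivity = [(False, ("P", "x", "y")), (False, ("P", "y", "z")),
                    (True, ("P", "x", "z"))]
    print(is_growing_clause(growing), is_growing_clause(transitivity))
```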
2
These results reflect that our implementation is not yet competitive and lacks some key processes such
as robust redundancy deletion.
Algorithm 5: Spectrum(P,R)
input : Sets P and R containing FO formula in conjunctive normal form
output: SATISFIABLE or UNSATISFIABLE
while true do
    NP := ∅;
    NR := ∅;
    Run SAT on P⊥;
    if P⊥ is unsatisfiable then return UNSATISFIABLE;
    for C1 , C2 ∈ P do
        if conflict(C1 , C2 ) = true then
            NP := NP ∪ (SInst-Gen(C1 , C2 ) \ P );
    for C1 ∈ P, C2 ∈ R do
        D := Resolution(C1 , C2 );
        if ⊥ ∈ D then
            return UNSATISFIABLE;
        else if D ≠ ∅ then
            NP := NP ∪ (D \ P );
    for C1 , C2 ∈ R do
        D := Resolution(C1 , C2 );
        if ⊥ ∈ D then
            return UNSATISFIABLE;
        else if D ≠ ∅ then
            for C ∈ D do
                F := Factor(C);
                for B ∈ F do
                    T := distribute(B);
                    NT := NT ∪ ({B} \ T );
    if NP = ∅ and NR = ∅ then
        return SATISFIABLE;
    else
        P := P ∪ NP ;
        R := R ∪ NR ;
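The control flow of Algorithm 5 can be exercised on ground clauses with a small, simplified sketch. It is not the Spectrum implementation: clauses are sets of nonzero integers (a negative integer is a negated atom), the SAT call is replaced by a brute-force truth-table check, instance generation is omitted because ground clauses have no proper instances, resolution is unordered, duplicate literals merge automatically so factoring is implicit, and conclusions of inferences between two clauses of R are always returned to R instead of being passed through a distribution function.

```python
from itertools import product

def satisfiable(clauses):
    """Brute-force propositional check standing in for the SAT solver."""
    atoms = sorted({abs(l) for c in clauses for l in c})
    for values in product([False, True], repeat=len(atoms)):
        model = dict(zip(atoms, values))
        if all(any(model[abs(l)] == (l > 0) for l in c) for c in clauses):
            return True
    return False

def resolvents(c1, c2):
    """All binary resolvents of two ground clauses (frozensets of ints)."""
    return {frozenset((c1 - {l}) | (c2 - {-l})) for l in c1 if -l in c2}

def spectrum_ground(P, R):
    """Ground-only analogue of the main loop of Algorithm 5."""
    P = {frozenset(c) for c in P}
    R = {frozenset(c) for c in R}
    empty = frozenset()
    while True:
        if not satisfiable(P):
            return "UNSATISFIABLE"
        new_p, new_r = set(), set()
        for c1 in P:                    # one premise in P, one in R
            for c2 in R:
                d = resolvents(c1, c2)
                if empty in d:
                    return "UNSATISFIABLE"
                new_p |= d - P
        for c1 in R:                    # both premises in R
            for c2 in R:
                d = resolvents(c1, c2)
                if empty in d:
                    return "UNSATISFIABLE"
                new_r |= d - R
        if not new_p and not new_r:     # nothing new: saturated
            return "SATISFIABLE"
        P |= new_p
        R |= new_r

if __name__ == "__main__":
    print(spectrum_ground([{1}], [{-1, 2}, {-2}]))
    print(spectrum_ground([{1, 2}], [{-1}]))
```

Because only finitely many ground clauses exist over a finite set of atoms, this toy loop always terminates, which is not true of the first-order procedure.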
There are several examples from LCL, and also the GRP problem we illustrate below,
where our heuristic performs better than solely using instance generation or resolution.
The GRP problem is an example that contains clauses that can cause infinite growth, so it
is not good for systems implementing only instance generation. At the same time, it
contains clauses similar to transitivity, for which ordered resolution is explosive. We believe
these examples show the use of our technique and the potential for further research into this
area.
Example Proof
One problem in the TPTP library that illustrates another benefit of SIG-Res with the GSM
heuristic is problem GRP006-1. Spectrum using our heuristic solved this problem in less
than 1 second, but did not find a solution using instantiation or resolution alone. The initial
distribution of clauses and an SIG-Res proof are given in Figure 4.4. As can be seen, by
placing the clauses with more than one maximal literal, specifically clauses 3 and 4, in P
we avoid many resolution inferences that are not necessary for the proof. We also avoid
generating many SInst-Gen inferences by placing clause 4 in P and clause 6 in R.
Before determining the problem unsatisfiable, Spectrum makes 3 passes through the
while loop generating 32 clauses. During the initial iteration, no conflicts are found and
resolution and factoring inferences produce a total of 9 new clauses. During the second
iteration, 2 conflicts produce 2 new clauses and resolution and factoring produce 21 new
clauses. During the third iteration, Yices returns unsatisfiable because clauses 2, 13 and
14 are inconsistent. This example shows that the clauses in a problem may have different
properties and that by controlling the types of inferences that are applied to the clauses we
may eliminate unnecessary inferences and may produce a solution sooner than if using
resolution or instantiation inferences alone.
Clauses in P
1. ¬E(inv(a))
2. E(a)
3. ¬P (x, y, z) ∨ ¬P (y, w, v) ∨ ¬P (x, v, t) ∨ P (z, w, t)
4. ¬P (x, y, z) ∨ ¬P (y, w, v) ∨ ¬P (z, w, t) ∨ P (x, v, t)
Clauses in R
5. ¬E(x) ∨ ¬E(y) ∨ ¬P (x, inv(y), z) ∨ E(z)
6. P (inv(x), x, id)
7. P (x, inv(x), id)
8. P (x, id, x)
9. P (id, x, x)
Implementation Status
Spectrum is a sound and complete first-order theorem prover but is not competitive com-
pared to state-of-the-art theorem provers. Much of the disparity is due to the lack of ma-
turity of Spectrum. Many theorem provers that rank high in competition, for example
Vampire [102] and E [101], are more than a decade old and thus have been fine-tuned over
the years. One major deficiency of Spectrum is the lack of redundancy elimination. Cur-
rently Spectrum only removes tautologies and performs forward subsumption checking on
4.6 Conclusion
SIG-Res is a sound and complete inference system that combines SInst-Gen with
resolution. Here, given a set of clauses, S, we distribute the clauses into two sets, P and R,
and in a fair way run SInst-Gen on P and run resolution on pairs of clauses in S so long
as one premise is in R. Factoring is applied to conclusions of resolution inferences. We
provide a heuristic, called Ground/Single Max, for distributing the clauses of S into P
and R. We also provide soundness and completeness proofs and discuss our implementation
named Spectrum. Initial results identify a class of problems, LCL, on which SIG-Res
outperforms SInst-Gen and resolution alone. We identify a few ways in which Spectrum’s
competitiveness can be increased.
The Completeness Proof for SIG-Res relies on ordered resolution. It may be interesting
to determine if the completeness proof for SIG-Res can be extended to ordered resolution
with selection and if so, how it affects the implementation’s performance. Another area that
might be worthy of investigating is determining for which classes of problems is SIG-Res
a decision procedure and for those classes, what is the complexity?
Chapter 5
The Γ + Λ Framework
Due to the common occurrence of equality in formulas of interest to the automated the-
orem proving community, as early as 1969, with Robinson and Wos’ introduction of the
paramodulation inference system [41], many have investigated calculi for first-order pred-
icate calculus with equality. The combination of paramodulation and Knuth and Bendix’s
completion [54] was a significant step forward which led to the superposition1 calculus [64]
which restricts the search space for paramodulation by only requiring inferences that meet
additional ordering constraints.
Recently, with provers like Vampire [102] and E [101] demonstrating the strengths
Λ, which allows the combination of different pairs of sound and refutationally complete
calculi. We require that the two inference systems, Γ and Λ, have certain properties. First,
both calculi must be sound and refutationally complete. Second, Γ must be productive.
1
Throughout this chapter when we refer to superposition we mean the sound and complete inference
system that includes equational factoring and equational resolution
2
Here, SMT can be combined with any inference system with the reduction property for counterexamples
Informally3, an inference system, Γ, is productive if when Γ is used to saturate a set of
clauses, say P , we can incrementally construct a set of clauses, a candidate set, that at the
limit can be used to produce a model for P when P is satisfiable. Third, Λ must have
the lifting and total-saturation property4. Lifting is defined in the standard way [87] and
total-saturation, informally, ensures that, all potential inferences are made.
In our method we first separate the input clauses into two sets 5 . The idea is, given a
set of input clauses, to pre-process the clauses to determine which inference systems the
clauses are best suited for. We choose two sound and refutationally complete inference
systems, Γ and Λ, requiring Γ to be productive and Λ to have the lifting and total-saturation
properties, and construct two sets of clauses, P0 and R0 , by including in P0 the clauses best
suited for Γ and including in R0 those best suited for Λ.
We then initialize two sets, P = P0 and R = R0, and, in a fair way, apply Γ to P and
apply Λ to M ∪ R, where M is a candidate set for P. Conclusions of Γ rules and Λ rules are
added to P and R respectively. Unsatisfiable cores found in R are learned from, with the
new clauses added to P. If at any point P is determined to be unsatisfiable, the set of input
clauses is deemed unsatisfiable. Satisfiability of P0 ∪ R0 is witnessed by the satisfiability of P under
Hence our completeness proof can be seen as a generalization of the completeness proof
they provide.
Below we provide preliminary definitions in Section 5.1, discuss how to transform an
³ A formal definition of productive is given in Section 5.1.
⁴ Lifting and total-saturation are also defined in Section 5.1.
⁵ This idea was originally suggested in [136], where we combine instance generation and resolution.
⁶ M may in fact equal P, but then no benefit is gained other than that Λ may witness the
unsatisfiability of P more quickly than Γ.
inference system into one which supports hypothetical clauses in Section 5.2, give a formal
description of Γ + Λ in Section 5.3 and provide proofs of soundness and completeness in
Section 5.4. In Section 5.5 we show how Inst-Gen-Eq can be combined with superposi-
tion in this framework. In particular, in Section 5.5.1 we discuss the productive, lifting and
total-saturation properties in terms of Inst-Gen-Eq and superposition and in Section 5.5.5
we discuss our implementation to date.
5.1 Preliminaries
Let S be a set of first-order logic formulas, renamed apart, in CNF and Γ and Λ be sound
and refutationally complete first-order logic calculi.
A distribution heuristic can be defined as a function dist : S → {{Γ}, {Λ}, {Γ, Λ}}
that maps the clauses in S to a nonempty subset of {Γ, Λ}. Now let P = {C|Γ ∈ dist(C)}
and let R = {C|Λ ∈ dist(C)}. Clearly S = P ∪ R.
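The construction of P and R from a distribution heuristic can be sketched in a few lines. The following is only an illustration of the definition above, not our implementation: the clause representation (frozensets of literal strings) and the particular heuristic `dist_by_size` are assumptions made for the example.

```python
from typing import Callable, FrozenSet, Iterable, Set, Tuple

Clause = FrozenSet[str]          # assumed representation: a clause is a set of literal strings
GAMMA, LAMBDA = "Γ", "Λ"


def distribute(
    clauses: Iterable[Clause],
    dist: Callable[[Clause], Set[str]],
) -> Tuple[Set[Clause], Set[Clause]]:
    """Build P = {C | Γ ∈ dist(C)} and R = {C | Λ ∈ dist(C)}."""
    P: Set[Clause] = set()
    R: Set[Clause] = set()
    for clause in clauses:
        targets = dist(clause)
        # dist must return a non-empty subset of {Γ, Λ}; this is what guarantees S = P ∪ R.
        if not targets or not targets <= {GAMMA, LAMBDA}:
            raise ValueError(f"dist returned an invalid target set: {targets!r}")
        if GAMMA in targets:
            P.add(clause)
        if LAMBDA in targets:
            R.add(clause)
    return P, R


def dist_by_size(clause: Clause) -> Set[str]:
    """Hypothetical heuristic: unit clauses go to both calculi, the rest are split by length."""
    if len(clause) == 1:
        return {GAMMA, LAMBDA}
    return {GAMMA} if len(clause) > 2 else {LAMBDA}


S = {
    frozenset({"p(a)"}),
    frozenset({"~p(X)", "q(X)"}),
    frozenset({"~q(X)", "r(X)", "s(X)"}),
}
P, R = distribute(S, dist_by_size)
```

Because every clause is sent to at least one calculus, P ∪ R = S holds by construction; a clause sent to both simply appears in both sets.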
A candidate set is any set (including the empty set) of clauses. Let M1 , M2 , · · · be
a sequence of candidate sets. We define a persistent candidate set, denoted M∞ , in the
following way. Let C1 , C2, · · · be an enumeration of all the clauses in Mi for all i ≥ 1.
Let T0 be the set of all candidate sets. For each i ≥ 1 if Ci appears in infinitely many
Mj ∈ Ti−1 then Ti = {Mj |Ci ∈ Mj ∈ Ti−1 }. Otherwise, Ti = Ti−1 . Now let S0 = ∅. For
each i > 0, if Ci is in an infinite number of Tj then Si = Si−1 ∪ {Ci } otherwise Si = Si−1 .
Then we define M∞ = ⋃_{i≥1} Si.
A derivation of an inference system, Γ, is a sequence, S0, S1, · · ·, of states of a system
such that for each i > 0, Si is the result of applying a Γ inference rule to Si−1. We assume
the conclusions of all inferences are renamed apart. A fair derivation is one where all
inferences are eventually performed. The persistent set is S∞ = ⋃_{0≤i} ⋂_{i≤j} Sj.
We say that the inference system Γ is productive if for every fair derivation of Γ on P
yielding P0 , P1 , · · · with the limit P∞ there exists a sequence of candidate sets M0 , M1 , · · ·
with the limit M∞ , the persistent candidate set, such that there exists M ⊆ Gr(M∞ ) where
M is consistent and M |= P∞ .
Given a candidate set, M, a hypothetical clause is a clause of the form H C where
σ
and σ being the identity. An element K of a candidate set can be written as a hypothetical
clause with H = {K}, C = K and σ being the identity.
Given a set (possibly empty), S, of hypothetical clauses, we define hyp(S) as the union
of all hypotheses of all hypothetical clauses in S. We define conjoin(S) as the conjunction
We say the inference system Λ has the lifting property if, given a set of clauses
C1, · · ·, Cn and a set of ground substitutions σ1, · · ·, σn, whenever by Λ we have
C1σ1, · · ·, Cnσn ⊢ C′, then by Λ we have C1, · · ·, Cn ⊢ C such that for some ground
substitution τ we have C′ = Cτ, or there exists some 1 ≤ i ≤ n such that C′ = Ciτ. We say Λ has the total-
inference on R.
We say a set of clauses T is an unsatisfiable core in R if R is unsatisfiable, T ⊂ R,
and T is a minimal set of clauses in R such that T is unsatisfiable. If Λ produces an
cal Clauses
Inference rules can be modified in a straightforward way to support the use of hypothetical
         D1, · · ·, Dn
(λ)  ─────────────────   where τ is the substitution used in the inference
         C1, · · ·, Cm

Figure 5.1: λ
We can construct a new inference rule λ′, shown in Figure 5.2, from λ that supports
hypothetical clauses as follows.

          H1   D1 , · · · , Hn   Dn
             σ1               σn
(λ′)  ───────────────────────────────   where  (i) τ is the substitution used in λ
          H   C1 , · · · , H   Cm               (ii) H = ⋃_{1≤i≤n} Hi
            σ               σ                   (iii) σ = σ1 ◦ · · · ◦ σn ◦ τ

Figure 5.2: λ′
λ′ is constructed by replacing clauses with hypothetical clauses and adding two new
conditions. The first new condition requires that the hypothesis of each premise be added to
the hypothesis of each conclusion. The second states that the substitution in the conclusion
is the composition of all the substitutions from the premises with the substitution, τ, used
by the inference rule. We also note that λ may have other conditions. These are added to
the conditions of λ′.
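The lifting of a rule to hypothetical clauses can be sketched as a higher-order function. This is a minimal illustration under several assumptions of ours: terms are nested tuples with capitalised strings as variables, a whole clause is represented by a single term, composition is read left to right (apply the first substitution, then the second), and `toy_rule` is a hypothetical stand-in for a real Λ rule.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, List, Sequence, Tuple, Union

Term = Union[str, tuple]   # assumption: capitalised strings are variables, tuples are compound terms
Subst = Dict[str, Term]
Rule = Callable[[Sequence[Term]], Tuple[List[Term], Subst]]   # premises -> (conclusions, τ)


def apply(term: Term, subst: Subst) -> Term:
    """Apply a substitution to a term."""
    if isinstance(term, tuple):
        return (term[0],) + tuple(apply(arg, subst) for arg in term[1:])
    return subst.get(term, term)


def compose(first: Subst, second: Subst) -> Subst:
    """Substitution that has the effect of applying `first` and then `second`."""
    result = {var: apply(term, second) for var, term in first.items()}
    for var, term in second.items():
        result.setdefault(var, term)
    return result


@dataclass(frozen=True)
class HypClause:
    hyp: FrozenSet[str]                        # H: names of the hypotheses the clause depends on
    clause: Term                               # C: the clause, kept as a single term for brevity
    subst: Tuple[Tuple[str, Term], ...] = ()   # σ, stored as sorted pairs so the object is hashable


def lift(rule: Rule) -> Callable[[Sequence[HypClause]], List[HypClause]]:
    """Turn a rule on clauses into the corresponding rule on hypothetical clauses."""
    def lifted(premises: Sequence[HypClause]) -> List[HypClause]:
        conclusions, tau = rule([p.clause for p in premises])
        hyp = frozenset().union(*(p.hyp for p in premises))      # union of the premises' hypotheses
        sigma: Subst = {}
        for p in premises:                                       # σ1 ◦ ... ◦ σn
            sigma = compose(sigma, dict(p.subst))
        sigma = compose(sigma, tau)                              # ... ◦ τ
        frozen = tuple(sorted(sigma.items()))
        return [HypClause(hyp, c, frozen) for c in conclusions]
    return lifted


def toy_rule(clauses: Sequence[Term]) -> Tuple[List[Term], Subst]:
    """Hypothetical stand-in for a Λ rule: instantiate X by a in the first premise."""
    tau = {"X": "a"}
    return [apply(clauses[0], tau)], tau


k1 = HypClause(frozenset({"K1"}), ("p", "X"))
k2 = HypClause(frozenset({"K2"}), ("q", "Y"), (("Y", "X"),))
(result,) = lift(toy_rule)([k1, k2])
```

The original rule is untouched; only the bookkeeping of hypotheses and substitutions is added around it, which is why any side conditions of the original rule carry over unchanged.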
5.3 Γ + Λ Inference Rules
Again let S be a set of first-order logic formulas in CNF and let Γ and Λ be sound and
refutationally complete first-order logic calculi, but now also require that Γ be productive
and that Λ have both the lifting and total-saturation properties. We let P and R be sets
constructed by a distribution heuristic such that P ∪ R = S.
The inference rules for Γ + Λ consist of the modified versions (discussed in Section 5.2)
of the Γ and Λ inference rules plus a new inference rule called Learn and a deletion rule
called Delete, given in Figure 5.3. (We note that in the definitions of the inference rules
below, we denote the set of all premises used in a rule as Prem and the set of all conclusions
of a rule as Concl.)
The Γ inference rules remain unchanged aside from requiring the premises to be in P .
Each Λ inference rule is however replaced with a new inference rule to support the use of
hypothetical clauses. When we refer to Λ inference rules in the remainder of this chapter,
we assume the rules support hypothetical clauses. One additional condition is applied to all
Λ inference rules, that is, the premises are required to be in M ∪ R where M is the current
candidate set. A hypothetical clause (H C) ∈ R is interpreted as Hσ ∧ R∗ |= C, where
σ
Given a candidate set, M, Delete removes from R the hypothetical clauses that are not
implied by M, that is, given a hypothetical clause H C ∈ R, if H ⊈ M then H C is
σ σ
removed from R. The Delete rule given in Figure 5.3 is a state transition diagram. e1 |e2 |e3
denotes a state of the Γ + Λ system where e1 is the current candidate model, e2 = P and
e3 = R. We note that Delete is not required for completeness.
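As a small illustration of Delete, read as the prose describes it (hypothetical clauses whose hypotheses are no longer all contained in the candidate set are removed), consider the sketch below. The representation of hypotheses as literal strings and of clauses as opaque strings is an assumption made for the example.

```python
from typing import FrozenSet, NamedTuple, Set


class HypClause(NamedTuple):
    hyp: FrozenSet[str]   # H: hypotheses, assumed here to be literals of the candidate set
    clause: str           # C: the derived clause, kept as an opaque string


def delete(candidate: Set[str], R: Set[HypClause]) -> Set[HypClause]:
    """Keep only the hypothetical clauses whose hypotheses all lie in the candidate set."""
    return {hc for hc in R if hc.hyp <= candidate}


stale = HypClause(frozenset({"p(a)", "~q(a)"}), "s(a)")
R = {
    HypClause(frozenset(), "q(a) | r(a)"),        # an input clause carries no hypotheses
    HypClause(frozenset({"p(a)"}), "r(a)"),
    stale,
}
M = {"p(a)", "q(a)"}   # the candidate set after ~q(a) has been replaced by q(a)
kept = delete(M, R)
```

Clauses with an empty hypothesis are never deleted, which matches the intuition that the input clauses in R do not depend on the candidate set.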
          H1   C1 , · · · , Hn   Cn
             σ1               σn
Learn  ───────────────────────────────   where  (i) Prem ⊆ M ∪ R
                   ¬(Hσ)                        (ii) {C1, · · ·, Cn} witnesses unsatisfiability of M ∪
                                                (iii) H = conjoin(Prem)
                                                (iv) σ = σ1 ◦ · · · ◦ σn

           M | P | R , H   C
                         σ
Delete  ──────────────────────   where H ⊈ M
           M | P | R

Figure 5.3: Learn and Delete Inference Rules
1. P0 ∪ R0 = S,
2. for all i ≥ 0 Pi , Ri and Mi are multisets of clauses where Mi is a candidate set for
Pi and
3. for all i ≥ 0 one of the following holds: (a) Pi+1 results from applying a Γ inference
rule or Learn rule on Pi and Ri+1 = Ri or (b) Ri+1 results from applying a Λ
candidate set M. Λ(M ∪ R, n) returns the set R after the application of n steps in the
saturation of M ∪ R with Λ inference rules and given an unsatisfiable core, K, Learn(K)
returns a set containing the conclusion(s) to the Learn inference rule with the clauses in K
as the premises.
Algorithm 6: Γ + Λ
  input : Sets P and R containing FO formulas with equality in CNF
  output: SATISFIABLE or UNSATISFIABLE
  Let m, n be positive integers.
  while P and R are not saturated do
      P := Γ(P, m);
      if P is unsatisfiable witnessed by Γ then
          return UNSATISFIABLE;
  return SATISFIABLE;
that at least one of the clauses, in a set of hypotheses that produce an unsatisfiable core, is
not true. Delete is sound since it removes hypothetical clauses that are no longer implied
by the current candidate set.
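The conclusion of Learn can be illustrated on ground literals. The sketch below assumes that the hypotheses are literals written as strings with `~` for negation and that the substitution σ has already been applied; both are simplifications of ours.

```python
from typing import FrozenSet, Iterable, Set


def negate(literal: str) -> str:
    """Negate a literal written as 'p(a)' or '~p(a)'."""
    return literal[1:] if literal.startswith("~") else "~" + literal


def learn(hypotheses: Iterable[str]) -> FrozenSet[str]:
    """Return ¬(h1 ∧ ... ∧ hn) as the clause ¬h1 ∨ ... ∨ ¬hn."""
    return frozenset(negate(h) for h in hypotheses)


def satisfies(model: Set[str], clause: FrozenSet[str]) -> bool:
    """A set of literals satisfies a clause if it contains one of the clause's literals."""
    return any(lit in model for lit in clause)


conflict = {"p(a)", "~q(a)"}       # hypotheses of a derived empty clause
learned = learn(conflict)
```

Any candidate set that contains all of the conflicting hypotheses falsifies the learned clause, so adding it to P forces at least one of them to change. With no hypotheses at all, the learned clause is the empty clause.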
Theorem 8 Γ + Λ is sound.
Proof Let S0 = P0 ∪R0 be a multiset of clauses and let Γ and Λ be sound and refutationally
complete calculi where Γ is productive and Λ has the lifting and total-saturation properties.
Suppose {P∞ , R∞ , N∞ } is the limit of a derivation of Γ + Λ on S0 and suppose Γ does not
witness the unsatisfiability of P∞ . Let P ,R, and N be the set of ground instances of P∞ ,
R∞ and N∞ , respectively. As Γ is productive and P∞ is satisfiable, there exists M ⊆ N
such that M is consistent and M |= P . Let RM = {H C ∈ R|H ⊆ M}.
We claim M ∪ RM is saturated by Λ, that is, the conclusion of every Λ inference on
M ∪ RM is in RM or is redundant in RM . Suppose on the contrary that there exists some
P∞ is satisfiable, hence a contradiction. If H = ∅, then M |= conjoin(H). By Learn,
¬(conjoin(H)) ∈ P∞ or is redundant in P∞ . Thus M |= ¬(conjoin(H)), a contradiction.
Hence M ∪ RM is satisfiable.
The only work that we are aware of that combines SInst-Gen and superposition is the
system by Ganzinger and Korovin called Inst-Gen-Eq [111]. In their work they provide
a calculus which extends SInst-Gen with superposition to support formulas that include
equality⁸.
In their method, the role of superposition is to generate a proof of inconsistency in the
current candidate model. When a proof of an inconsistency is found, the substitution used
in the proof is applied to the clauses whose literals are in the proof via SInst-Gen. These
new instances help to refine the next candidate model.
We believe their method can be enhanced in our framework so that the full power of
superposition can be utilized. In this section we show how, using the Γ + Λ framework,
Inst-Gen-Eq and superposition can be combined in a way that takes full advantage of each
of the two calculi. We call this inference system SIG-Sup.
Among the requirements of the Γ + Λ framework is that the inference system Γ must be
productive and Λ must have the lifting and total-saturation properties. Here we show that
⁸ Later, in [142], Korovin and Sticksel describe their implementation iProver-Eq, which
implements the calculus given in [111].
Inst-Gen-Eq is productive and discuss the lifting and total-saturation properties of
superposition.
For the sake of convenience, we provide definitions for terms that are used in this sec-
the next candidate model. The process is repeated until P⊥ is unsatisfiable or there exists
an IP that is consistent.
In this setting, since for each ground instance, one literal is selected in each IP and
since there are finitely many literals in a clause, one literal will be chosen infinitely often.
[90]. And since we can exhaustively apply superposition on all subsets of R, even in the
presence of the empty clause, superposition has the total-saturation property.
For clarity, we note that selP (C, I⊥) returns a singleton set and selM∪R (C) returns a
non-empty set. The inference rules for SIG-Sup are given in Figure 5.4.
Since both Inst-Gen-Eq and superposition are sound and refutationally complete,
Inst-Gen-Eq is productive and superposition has the lifting and total-saturation proper-
ties, by the soundness and completeness of Γ + Λ it follows that SIG-Sup is both sound
and complete.
The system we propose takes as input a first order formula (with equality) in CNF, S, and
first, by some heuristic, establishes two sets of clauses, P and R, such that P contains
clauses that are better suited for Inst-Gen-Eq and R contains clauses that are better suited
for superposition. Any heuristic can be used so long as P ∪ R = S. In [136] we proposed
a heuristic we called Ground-Single Max (GSM), where P contains ground clauses and
clauses containing more than one maximal literal, and R contains all other clauses. Although
the distribution heuristic does not alter the system’s completeness, it can obviously affect
performance.
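To make the GSM split concrete, the following sketch distributes a few clauses. It is an illustration only: literals are plain strings, identifiers starting with an upper-case letter are taken to be variables, and maximality is judged by a crude symbol count (`weight`), which is a stand-in of ours for a proper term ordering.

```python
import re
from typing import FrozenSet, Iterable, List, Set, Tuple

Clause = FrozenSet[str]
TOKEN = re.compile(r"[A-Za-z_]\w*")


def is_ground(clause: Clause) -> bool:
    """Assumed convention: identifiers starting with an upper-case letter are variables."""
    return not any(tok[0].isupper() for lit in clause for tok in TOKEN.findall(lit))


def weight(literal: str) -> int:
    """Crude stand-in for a term ordering: count the symbols in the literal."""
    return len(TOKEN.findall(literal))


def maximal_literals(clause: Clause) -> List[str]:
    """Literals that no other literal of the clause strictly outweighs."""
    top = max(weight(lit) for lit in clause)
    return sorted(lit for lit in clause if weight(lit) == top)


def gsm(clauses: Iterable[Clause]) -> Tuple[Set[Clause], Set[Clause]]:
    """Ground-Single Max: P gets ground clauses and clauses with several maximal literals."""
    P, R = set(), set()
    for clause in clauses:
        if is_ground(clause) or len(maximal_literals(clause)) > 1:
            P.add(clause)
        else:
            R.add(clause)
    return P, R


ground = frozenset({"p(a)", "~q(b)"})
transitivity = frozenset({"~p(X,Y)", "~p(Y,Z)", "p(X,Z)"})
step = frozenset({"~p(X)", "q(f(X))"})
P, R = gsm([ground, transitivity, step])
```

Under this stand-in ordering the transitivity clause has three equally heavy literals and goes to P, while the clause with a single heaviest literal goes to R.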
After the clauses have been distributed, the system starts in search mode⁹, where we
first check whether P, when viewed as a set of ground formulas, is unsatisfiable. This is
a quick check. To do this, we jeroslowize P, replacing all variables in P with
the Jeroslow constant, traditionally denoted ⊥, to produce P⊥. We check the satisfiability
of P⊥ using SMT, possibly adding additional clauses, G, to P in the process to rule out
inconsistencies in the ground formulas. If P⊥ is unsatisfiable, we halt and conclude that
S is unsatisfiable.
If on the other hand, P⊥ is satisfiable, we switch to saturation mode and construct a set
of selected literals, IP . That is, for each clause C ∈ P we choose a single literal L in C
such that L⊥ ∈ I⊥ and add it to IP . Clearly IP , our candidate set, implies P .
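The ground abstraction and the construction of IP can be sketched as follows. This is a simplified illustration, assuming literals are strings, variables are identifiers starting with an upper-case letter, and the model of P⊥ is given as the set of ground literals it makes true; the function names are ours.

```python
import re
from typing import Dict, FrozenSet, Set

Clause = FrozenSet[str]
BOTTOM = "⊥"
VARIABLE = re.compile(r"\b[A-Z]\w*")   # assumed convention: variables start with an upper-case letter


def jeroslowize(literal: str) -> str:
    """Replace every variable in a literal by the Jeroslow constant."""
    return VARIABLE.sub(BOTTOM, literal)


def select_literals(P: Set[Clause], model: Set[str]) -> Dict[Clause, str]:
    """For each clause C in P, pick one literal L whose ground abstraction L⊥ is true in the model."""
    selected = {}
    for clause in P:
        candidates = sorted(lit for lit in clause if jeroslowize(lit) in model)
        if not candidates:
            raise ValueError(f"the model does not satisfy the abstraction of {set(clause)}")
        selected[clause] = candidates[0]   # any choice works; the first keeps the example deterministic
    return selected


c1 = frozenset({"p(X)", "q(X,a)"})
c2 = frozenset({"~p(b)", "r(Y)"})
model = {"p(⊥)", "~q(⊥,a)", "~p(b)", "r(⊥)"}    # literals true in a model of the abstraction
I_P = select_literals({c1, c2}, model)
```

Note that p(⊥) and ~p(b) can both be true in the ground model, since ⊥ and b are different constants; such hidden conflicts between selected literals are what the later saturation step is meant to expose.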
Then given IP and R, we first remove any hypothetical clauses in R that contain lit-
erals in their hypotheses that are not in IP via Delete inferences. Then, in a fair way, we
such a clause is produced we enter conflict resolution mode. In conflict resolution mode we
analyze the conflict to determine a course of action. If H = ∅, then the initial set of clauses
R is unsatisfiable and we report that S is unsatisfiable and halt. (Note that this is equivalent
to adding the empty clause to P via Learn). If H can be derived from IP only, then
σ
there is a conflict in IP , the candidate model. Here the learned clause is a tautology which
need not be added to P . But we do add to P new instances (using SInst-Gen) of the clauses
whose selected literals are in H and go back into search mode directing the SMT solver
⁹ This is similar to the search and conflict resolution modes used in [130]. In our system we
differentiate between search, saturation, and conflict modes.
to again check the consistency of P⊥ . The substitution used in the SInst-Gen inference is
the substitution used in the proof of inconsistency of IP . If IP conflicts with R we apply
the Learn inference rule to H , adding a new clause to P , and re-enter search mode. If
σ
IP ∪ R becomes saturated and R does not contain the empty clause, we conclude that S is
satisfiable and halt.
We note that the Γ + Λ framework requires (i) that P be saturated by Inst-Gen-Eq,
thus requiring the saturation of the candidate model by superposition and (ii) that the can-
didate set union R be saturated by superposition. Since the candidate model, IP , used
in Inst-Gen-Eq is exactly our candidate set, doing these saturation processes separately
would result in IP being saturated by superposition twice. In the method we propose, in-
consistencies in the candidate model/set, IP , are found during the saturation of IP ∪ R by
superposition. These inconsistencies trigger SInst-Gen inferences, resulting in the satura-
ity of P efficiently using SMT, (iii) controlling the size of R by eliminating hypothetical
clauses that are not implied by the current interpretation and (iv) minimizing the size of the
clauses in R by applying superposition on IP × R rather than P × R.
An algorithm for applying the inferences in SIG-Sup in a fair way is given in Algorithm
7. Note that we do not follow Algorithm 6. The main difference is that we guide the
saturation of P with SInst-Gen using the inconsistencies in IP detected in the saturation of
IP ∪R with superposition. This has the effect of saturating P with Inst-Gen-Eq and IP ∪R
with superposition while avoiding running superposition on IP twice. Since superposition
has the total-saturation property we are assured that any inconsistency in IP will be found,
resulting in SInst-Gen inferences being added to P .
We define the functions in Algorithm 7 as follows. SMT(P ) takes a set of clauses, P ,
as an argument and returns a set of clauses. SMT(P ) generates a ground abstraction of P ,
becomes saturated, the function halts and returns the set R. Inst-Gen(P, σ) takes as input a
set of clauses P and returns a set of instances, using the substitution σ, of the clauses in P .
Learn(H, σ) returns a singleton set containing the conclusion to the Learn inference rule.
Algorithm 7: SIGSup(P, R)
  input : Sets P and R containing FO formulas with equality in CNF
  output: SATISFIABLE or UNSATISFIABLE
  Let I⊥, IP be sets of clauses.
  while true do
      I⊥ := SMT(P);
      if I⊥ = ∅ then
          return UNSATISFIABLE;
      IP := {L | L = sel(C, I⊥), ∀C ∈ P};
      R := Superposition(IP ∪ R);
      if H ∈ R then
           σ
          if H = ∅ then
              return UNSATISFIABLE;
          else if IP H then
                      σ
              P := P ∪ Inst-Gen(P, σ);
          else
              P := P ∪ Learn(H, σ);
      else
          return SATISFIABLE;
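The control flow of Algorithm 7 can be sketched as a loop over pluggable components. This is not the EVC3 implementation: every callable is a parameter, the `toy_*` functions are propositional stand-ins of ours that only handle unit clauses, the unsatisfiable case of the SMT step is signalled by `None` rather than an empty set, and a round limit is added so the sketch always terminates.

```python
from typing import Dict, FrozenSet, Optional, Set, Tuple

Clause = FrozenSet[str]
Conflict = Tuple[FrozenSet[str], Dict[str, str]]   # (H, σ) of a derived hypothetical empty clause


def sig_sup(P, R, smt, select, superposition, candidate_only, inst_gen, learn, max_rounds=100) -> str:
    for _ in range(max_rounds):
        model = smt(P)                              # search mode
        if model is None:
            return "UNSATISFIABLE"
        I_P = select(P, model)                      # candidate set of selected literals
        R, conflict = superposition(I_P, R)         # saturation mode
        if conflict is None:
            return "SATISFIABLE"
        hyp, sigma = conflict                       # conflict resolution mode
        if not hyp:
            return "UNSATISFIABLE"                  # R alone is inconsistent
        if candidate_only(I_P, conflict):
            P = P | inst_gen(P, sigma)              # conflict inside the candidate set
        else:
            P = P | learn(conflict)                 # conflict between candidate set and R
    return "UNKNOWN"


def negate(lit: str) -> str:
    return lit[1:] if lit.startswith("~") else "~" + lit


def toy_smt(P: Set[Clause]) -> Optional[Set[str]]:
    """Unit clauses only: the model is the set of literals unless it is contradictory."""
    lits = {lit for clause in P for lit in clause}
    if frozenset() in P or any(negate(lit) in lits for lit in lits):
        return None
    return lits


def toy_select(P: Set[Clause], model: Set[str]) -> Set[str]:
    return {min(lit for lit in clause if lit in model) for clause in P}


def toy_superposition(I_P: Set[str], R: Set[Clause]):
    """Report a conflict when every literal of some clause in R is contradicted by I_P."""
    for clause in sorted(R, key=sorted):
        if all(negate(lit) in I_P for lit in clause):
            return R, (frozenset(negate(lit) for lit in clause), {})
    return R, None


def toy_learn(conflict: Conflict) -> Set[Clause]:
    hyp, _sigma = conflict
    return {frozenset(negate(h) for h in hyp)}


def run(P: Set[Clause], R: Set[Clause]) -> str:
    return sig_sup(P, R, toy_smt, toy_select, toy_superposition,
                   candidate_only=lambda I_P, conflict: False,
                   inst_gen=lambda P, sigma: set(),
                   learn=toy_learn)
```

With P = {p} and R = {¬p}, the first round selects p, the stand-in saturation reports a conflict with hypothesis {p}, Learn adds ¬p to P, and the second round finds P inconsistent.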
5.5.5 EVC3
EVC3, which is still under development, is our attempt to combine the strengths of existing
software systems to solve the first-order validity problem using SIG-Sup. The current
version is a sound and complete theorem prover that implements the SIG-Res inference
system.
In EVC3 we couple together the SMT solver CVC3 [124] and the purely equational
prover E [112]. We chose CVC3 and E for a number of reasons. First, because both
systems are open source, we can offer the source code of our version freely to the public.
Second, both systems are well known and have reputations for their strong performance
and usability in the SMT and ATP communities respectively. Third, by using existing code
bases, we eliminate the need to “reinvent the wheel”.
In developing EVC3, special care was taken not to alter any existing code in CVC3,
but to utilize the facilities and architecture of CVC3 and augment its code base, initially
with an implementation of the SIG-Res inference system. To execute the system with
our new code, the command line flag +sig-res is used. Otherwise, EVC3 behaves
identically to CVC3. The only code modified in E was its main function, to facilitate the
creation of an API for its incremental use.
The source code is maintained under a single directory structure and can be configured
and built using a single configure script and make command. The executable runs in a
single process without threading.
A high level view of EVC3 can be found in Figure 5.5. The components defined by
solid bold lines are completely new. Components defined by bold broken lines have been
augmented to support running CVC3 and E together using our quantifier theory reasoner.
At first, CVC3 initializes all of its subsystems; i.e. SAT Solver, DPLL Engine, Search
Engine, and Theory Core (solver). When EVC3 is run with the +sig-res flag, CVC3
installs our new quantifier theory decision procedure which implements the SIG-Res infer-
ence rules and CVC3 installs and initializes E.
After the subsystems are initialized, a problem is read into the system, either inter-
actively via the command line or from a file. A new parser was added to CVC3 which
allows problems in TPTP3 [79] format to be read into the system, which is done using the
built in modes are ig-only (all clauses are added to P ), res-only (all clauses are
added to R), and the default mode gsm (defined above). Maximality in the GSM heuristic
is determined via a Knuth-Bendix ordering.
Once the formulas are distributed, CVC3 continually executes a loop in its SAT solver,
back to the SAT solver and the SAT solver continues in its loop. If no inconsistency is
found by the Theory Solver, CVC3 halts and reports satisfiable.
Our quantifier theory decision procedure in the SIG-Res Quantifier Theory Module main-
tains the set of clauses P and retrieves the clauses in R from E when needed.
The quantifier theory decision procedure then applies the inference rules of SIG-Res in
a fair manner until one of the following three states occurs, at which time it passes control
back to the SAT solver.
1. It deduces the empty clause, implying the original set of clauses, F , is unsatisfiable.
When this occurs the quantifier decision procedure adds ⊥ to P⊥ forcing the SAT
solver to report unsatisfiable.
2. It concludes that F is saturated with respect to SIG-Res, implying F is satisfiable,
and returns nothing.
Resolution and factoring inferences on clauses in R are delegated to E, whereas all
inferences involving clauses in P are performed directly in the quantifier decision procedure.
E API
A new API was created to allow CVC3 and E to communicate and exchange data between
their systems. The E API, found in the class SPSolver, allows a user to initialize E,
add clauses to E’s initial set of clauses, run E for a fixed number of steps of its saturation
process, query E’s state, retrieve the set of clauses processed by E, and gracefully terminate
E.
Implementation Status
The current version of EVC3 is a sound and complete theorem prover for first-order formulas
without equality. In EVC3, we currently support the SIG-Res inference system and are
working to extend it to support SIG-Sup.
As EVC3 is new and still under development, it is not yet competitive with state-of-the-art
theorem provers. Before it can compete with the more mature provers we must address
a number of issues. The major work to be done is to augment the parser and quantification
theory to allow formulas with equality and to modify the algorithm in the quantification
theory to implement SIG-Sup.
We would like to include the ability to pass arguments to E on the command line rather
than having the arguments hard coded in the initialization of E. This will allow the user
to utilize the command line flags of E. Also, a tighter coupling of E to CVC3, achieved by
creating methods to convert E data to CVC3 data and vice versa, will improve efficiency.
We would like to extend our quantifier decision procedure to handle full first-order logic
formulas rather than just CNF formulas. More work also needs to go into the quantifier reasoner
on clauses in P and steps taken in E’s saturation process and iii) the distribution of clauses
to P and R.
5.6 Conclusion
tions of inference systems to be combined into a single sound and complete system. The
only requirements on the inference systems are that they both be sound and refutationally
complete and that one be productive and the other have both the lifting property and the
total-saturation property.
provide the inference rules for SIG-Sup, give an informal description of the system and
provide an algorithm that can be used to implement such a system.
Future work consists of extending EVC3 to support SIG-Sup. Other problems of interest
include whether superposition can be used as Γ, in particular, whether we can establish
that superposition is productive. It would also be interesting to consider the use of Γ + Λ
in distributed computing. Since only small amounts of data are exchanged between Γ and Λ,
our framework may provide a new paradigm for this area of research. Lastly, as with all
inference systems of practical use, redundancy criteria in our framework should be
investigated.
Superposition ⎧
⎪
⎪ τ = mgu(u, l)
⎪
⎪
⎪
⎪ u is not a variable
⎪
⎪
⎪
⎪ L is of the form s[u]p t or s[u]p t
⎪
⎪
⎪
⎪ tτ s[u]τ and rτ lτ
L[u]p ∨ Σ r∨Δ ⎨
J K l (L)τ ∈ selM ∪R ((L ∨ Σ)τ )
σ σ if
(L[r]p ∨ Σ ∨ Δ)τ ⎪
⎪ (l r)τ ∈ selM ∪R ((l r ∨ Δ)τ )
H ⎪
⎪
σ ⎪
⎪ L[u]p ∨ Σ ∈ M ∪ R and l r ∨ Δ ∈ M ∪ R
⎪
⎪
⎪
⎪ (L[r]p ∨ Σ ∨ Δ)τ ∈ R
⎪
⎪
⎪
⎪ H =J ∪K
⎩
σ = σ ◦ σ ◦ τ
Equality Resolution ⎧
⎪
⎪ τ = mgu(l, r)
⎪
⎪
H l r∨Δ ⎨ (l r)τ ∈ selM ∪R ((l r ∨ Δ)τ )
⎪
σ if H l r∨Δ∈M ∪R
σ
⎪
⎪
H (Δ)τ ⎪
⎪ σ = σ ◦ τ
σ ⎪
⎩ H (Δ)τ ∈ R
σ
Equality Factoring ⎧
⎪
⎪ τ = mgu(l, s)
⎪
⎪ rτ lτ and tτ sτ
⎪
⎪
⎪
⎨ (l r)τ ∈ sel((l r ∨ s t ∨ Δ)τ )
H l r∨s t∨Δ
σ if H l r∨s t∨Δ∈M ∪R
t∨r t ∨ Δ)τ ⎪
⎪ σ
H (l ⎪
⎪
σ ⎪
⎪ σ = σ ◦ τ
⎪
⎩ H (l t ∨ r t ∨ Δ)τ ∈ R
σ
SInst-Gen ⎧
⎪
⎪ H ∈R
⎪
⎪ σ
⎪
⎪ I H
⎪
⎨
P
σ
L∨Γ if L∨Γ∈P
(L ∨ Γ)σ ⎪
⎪ L ∈ selP (L ∨ Γ)
⎪
⎪
⎪
⎪ {L} ∈ H ⊆ IP
⎪
⎩
(L ∨ Γ)σ ∈ P
Learn
H H ∈R
σ
σ if
¬(Hσ) IP H ¬(Hσ) ∈ P
σ
Delete
           IP | P | R , H   C
                          σ
        ───────────────────────   if H ⊈ IP
           IP | P | R

Figure 5.4: SIG-Sup Inference Rules
[Figure 5.5: High-level view of EVC3. Labeled components: USER INTERFACE, CVC3, E PROVER,
DPLL ENGINE, SEARCH ENGINE, THEORY CORE, SIG-RES, SAT SOLVER, THEORY 1, THEORY K, THEORY QUANT.]
Chapter 6
Conclusion
Loveland (DPLL) procedure [29, 34] which revolutionized the solving of problems in SAT.
Soon after however, Robinson presented his resolution rule [36] which changed the focus
of research in the community. Rather than trying to reduce first-order logic unsatisfiability
to propositional logic unsatisfiability, resolution seeks to find a proof of the empty clause
(falsum). Following up his work on resolution, Robinson joined up with Wos to derive
paramodulation [41] which along with resolution and factoring form a refutationally com-
plete inference system for first-order logic with equality. Near the same time that paramod-
ulation was developed Knuth and Bendix developed a rewriting system for unit equations
called Knuth-Bendix completion [44]. The combination of Knuth-Bendix completion and
able to solve problems containing tens of thousands of variables and clauses. This renewed
interest in instance generation-based systems. Jeroslow [63] determined that new ground
instances could be generated by considering only pairs of conflicting literals from different
clauses and that any variable not involved in the conflict could simply be mapped to a
distinguished constant. He called this partial instantiation. Hooker, Rago, Chandru and
Shrivastava then formulated the first refutationally complete partial instantiation method in
their Primal method [99], and Ganzinger and Korovin formalized primal partial instantiation
in their semantic instance generation rule, SInst-Gen, and proved its completeness [106].
iProver [142], a state-of-the-art instance generation prover, is among the most competitive
provers in theorem proving competitions. This suggests that, although resolution-based
systems are currently at the top of the heap, it may be only a matter of time before
instance-based systems are shown to be equally or perhaps more effective for theorem proving.
We discussed three novel approaches for solving the first-order validity problem us-
ing SAT. The first reduces first-order validity to propositional satisfiability. The second
establishes a method to combine SInst-Gen with resolution. The third provides a general
framework for combining different inference systems into a single system.
proved its soundness and completeness [126]. We also identified, with our ChewTPTP-
SAT implementation, problems that our system was able to solve but others could not.
Though not currently competitive, we have reason to believe that by eliminating restarts in
the implementation, the solver will be much more competitive.
Following this work was joint work with Bongio, Katrak, Lin and Lynch where we
established an encoding of a closed rigid connection tableaux proof in SMT [128]. In this
encoding we encoded the choices made in constructing the connection tableaux in SAT and
encoded the unification checks and finiteness of the tableaux in SMT. In an implementation
of this encoding, called ChewTPTP-SMT, we found that the encodings were significantly
smaller in the Horn case, and hence faster than the SAT encoding, but that the results were
worse in the non-Horn case. As with ChewTPTP-SAT, more work needs to be done on the
implementation before it can be competitive with state-of-the-art theorem provers.
Eliminating restarts and finding a way to reduce the non-Horn encoding by a factor of n are
initial improvements that can be made. Other future work on the tableaux encoding method
includes looking at ways to be able to use SMT to further reduce the representation for non-
Horn clauses, ideally cutting it down to a quadratic number of clauses. In addition, in order
to prove the general first order problem we also need to find a good heuristic to decide
exactly which clauses should be copied. We would like a method to decide satisfiability
from rigid satisfiability. It would be useful to have an encoding of rigid clauses modulo
a non-rigid theory, as discussed in [122]. This way, we could immediately identify some
clauses as non-rigid, and work modulo those clauses.
In the second line of research, in collaboration with Lynch [136], we developed a refu-
tationally complete inference system called SIG-Res which combines semantic selection
instance generation and resolution. In this system, we create two sets of clauses, R and P,
and only allow resolution inferences between pairs of clauses where at least one clause is
in R. New instances of clauses in P are generated using SInst-Gen. We established the
soundness and completeness of SIG-Res and have demonstrated in our implementation,
named Spectrum, that we are able to solve some problems faster using SIG-Res than using
SInst-Gen or resolution alone. Though the system is sound and complete, a lack of maturity
has hindered its performance relative to state-of-the-art theorem provers. More work needs to
go into redundancy elimination in order for it to compete with the leading solvers. A no-
SInst-Gen or resolution alone. We believe these problems work best using SIG-Res because
the heuristic places growing clauses in R and clauses with a transitive-like structure
in P.
The last line of research we discuss is a framework, called Γ + Λ, which allows the
combination of two inference systems, Γ and Λ. The requirements are that Γ and Λ be sound
and refutationally complete, that Γ be productive, and that Λ have the lifting and
total-saturation properties. Any Γ and Λ with these properties can be combined under our
framework into a single sound and refutationally complete system. We present the inference
rules for Γ + Λ, prove its completeness, and show how, under this framework, Inst-Gen-Eq and
superposition can be combined into a single sound and refutationally complete system. We
discuss our work in progress, called EVC3, which will eventually be extended to support
SIG-Sup. The current version, though sound and complete, only implements the SIG-Res
inference system.
Other problems of interest related to the Γ + Λ framework include whether superposition
can be used as Γ, in particular, whether we can establish that superposition is productive.
It would also be interesting to consider the use of Γ + Λ in distributed computing. Since
only small amounts of data are exchanged between Γ and Λ, our framework may provide a new
paradigm for this area of research. Lastly, as with all inference systems of practical use,
redundancy criteria in our framework should be investigated.
Bibliography
[1] Aristotle. Prior Analytics. In A.J. Jenkinson, trans., Internet Classics Archive. Origi-
[2] P. Abaelardus. Dialectica. In L.M. de Rijk, trans., Petrus Abaelardus. Dialectica. First
Complete Edition of the Parisian Manuscript with an Introduction, Assen: Van Gorcum,
1970. Originally written in 1160.
[4] R. Descartes. The Geometry of Rene Descartes, D. E. Smith and M. L. Latham, trans.,
Dover, 1954.
[5] A. Arnauld and P. Nicole. Logic or the Art of Thinking. In T.S. Baynes, trans., J.
Gordon, ed., The Port-Royal Logic, Translated from the French; with Introductions,
Notes, and Appendix, Hamilton, Adams, and Co., London, 1861. Originally written in
1662.
[6] G. Boole. The Mathematical Analysis of Logic: Being an Essay Towards a Calculus of
Deductive Reasoning, 1847.
[7] A. De Morgan. Formal Logic: or, The Calculus of Inference, Necessary and Probable,
1847.
[8] G. Boole. An Investigation of the Laws of Thought on Which are Founded the Mathe-
matical Theories of Logic and Probabilities, 1854.
[9] T.S. Baynes. The Port-Royal Logic, Translated from the French; with Introductions,
Notes, and Appendix, J. Gordon, ed., Hamilton, Adams, and Co., London, 1861.
[10] C.S. Peirce. Harvard Lecture 1. In Houser N., et.al., eds., Writings of Charles S.
Peirce, volume 1, pp. 162-175, 2000. Originally written in 1865.
[11] C.S. Peirce. Description of a Notation for the Logic of Relatives, Resulting from an
Amplification of the Conceptions of Boole’s Calculus of Logic, Welch, Bigelow, and
Company, Cambridge, 1870.
[12] G. Frege. Begriffsschrift, A Formal Language, Modeled upon that of Arithmetic, for
Pure Thought. In J. van Heijenoort, ed., From Frege to Gödel, Harvard University Press,
Cambridge, Massachusetts, 1967. Originally written in 1879.
[18] P. Bernays and M. Schönfinkel. Zum Entscheidungsproblem der mathematischen
Logik. In Mathematische Annalen, volume 99, pp. 342-372, 1928.
[19] K. Gödel. Die Vollständigkeit der Axiome des logischen Funktionenkalküls. In Monat-
[20] K. Gödel. Über formal unentscheidbare Sätze der Principia mathematica und
verwandter Systeme I. In Monatshefte für Mathematik und Physik 38, pp. 173-198, 1931.
[27] A. Newell and H.A. Simon. The Logic Theory Machine - A Complex Information
Processing System. In IRE Transactions on Information Theory, volume 2, issue 3, pp.
61-79, 1956.
[28] D. Prawitz, H. Prawitz and N. Voghera. A Mechanical Proof Procedure and its
Realization in an Electronic Computer. In Journal of the ACM, volume 7, issue 2, 1960.
Journal of the ACM, volume 7, issue 3, 1960. Reprinted in [Siekmann, Wrightson 1983].
[30] D. Prawitz. An Improved Proof Procedure. In Theoria, volume 26, issue 2, 1960.
Reprinted in [Siekmann, Wrightson 1983].
[31] P.C. Gilmore. A Proof Method for Quantification Theory: Its Justification and
Realization. In IBM Journal of Research and Development, volume 4, issue 1, pp. 28-35,
[32] H. Wang. Toward Mechanical Mathematics. In IBM Journal of Research and Devel-
opment, volume 4, pp. 2-22, 1960.
[33] W. Kneale and M. Kneale. The Development of Logic, Oxford University Press, 1962.
[35] M. Davis. Eliminating the Irrelevant from Mechanical Proofs. In Proceedings of Sym-
posia in Applied Math, volume 15, pp. 15-30, 1963. Reprinted in [Siekmann, Wrightson
1983].
[37] J. van Heijenoort. From Frege to Gödel, Harvard University Press, Cambridge,
Massachusetts, 1967.
[38] J. R. Slagle. Automatic Theorem Proving with Renamable and Semantic Resolution.
In Journal of the ACM, volume 14, number 4, pp. 687-697, 1967.
[41] G. Robinson and L. Wos. Paramodulation and Theorem Proving in First-order Theo-
ries with Equality. In Machine Intelligence 4, Edinburgh University Press, pp. 135-150,
1969.
[42] R. Kowalski and P.J. Hayes. Semantic Trees in Automatic Theorem Proving. In
Machine Intelligence 4, Edinburgh University Press, pp. 87-101.
[43] J.L. Bell and A.B. Slomson. Models and Ultraproducts, An Introduction, Dover, 1969.
[44] D. Knuth and P. Bendix. Simple Word Problems in Universal Algebras. In Computa-
tional Problems in Abstract Algebra, Pergamon Press, pp. 263-297, 1970.
[45] C. Chang and C.R. Lee. Symbolic Logic and Mechanical Theorem Proving, Aca-
demic Press, New York and London, 1973.
[46] W. Bibel and J. Schreiber. Proof Search in a Gentzen-like System of First Order Logic.
In Proceedings of the International Computing Symposium, North Holland, pp. 205-212,
1975.
[47] D. Brand. Proving Theorems with the Modification Method. In SIAM Journal on
[49] G. Nelson and D. Oppen. Simplification by Cooperating Decision Procedures. In
ACM Transactions on Programming Languages and Systems, volume 1, number 2, pp.
245-257, 1979.
[50] P.B. Andrews. Theorem Proving via General Matings. In Journal of the Association
for Computing Machinery, volume 28, number 2, pp. 193-214, 1981.
[51] H. Putnam. Peirce the Logician. In Historia Mathematica, volume 9, issue 3, pp.
290-301, 1982.
[53] M. Davis. The Prehistory and Early History of Automated Deduction. In Automation
of Reasoning. Classical Papers in Computational Logic, Springer, 1983.
with Equality. In SIAM Journal on Computing, volume 12, issue 1, pp. 82-100, 1983.
[55] R. Shostak. Deciding Combinations of Theories. In Journal of the ACM, volume 31,
issue 1, pp. 1-12, 1984.
[57] P. King. Jean Buridan’s Logic: The Treatise on Supposition; The Treatise on Conse-
quences: Translation from the Latin with a Philosophical Introduction, Reidel, 1985.
[59] D.W. Loveland. Automated Theorem Proving: Mapping Logic into AI. In Proceed-
ings of the International Symposium on Methodologies for Intelligent Systems, Press,
pp. 214-229, 1986.
[62] H. Zhang and D. Kapur. First-order Theorem Proving Using Conditional Rewrite
Rules. In Proceedings of the 9th International Conference on Automated Deduction,
Lecture Notes in Computer Science, volume 310, pp. 1-20, 1988.
In Decision Support Systems, Elsevier Science B.V., volume 4, issue 2, pp. 183-197,
1988.
[65] S. Lee. CLIN: An Automated Reasoning System Using Clause Linking, Doctoral
Dissertation in Philosophy, University of North Carolina at Chapel Hill, 1990.
[66] R. Dipert. The Life and Work of Ernst Schröder. In Modern Logic, volume 1, pp.
117-139, 1990.
[67] J. Pais and G. Peterson. Using Forcing to Prove Completeness of Resolution and
Paramodulation. In Journal of Symbolic Computation, volume 11, pp. 3-19, 1991.
[68] J. Goubault. The Complexity of Resource-Bounded First-Order Classical Logic. In
Lecture Notes In Computer Science, Proceedings of the 11th Annual Symposium on The-
oretical Aspects of Computer Science, volume 775, Springer-Verlag, pp. 59-70, 1994.
[69] S. Lee and D. Plaisted. Problem Solving by Searching for Models with a Theorem
[71] P. King and S. Shapiro. The History Of Logic. In The Oxford Companion to Philos-
ophy, Oxford University Press, pp. 496, 1995.
[73] M. Moser, C. Lynch and C. Steinbach. Model Elimination with Basic Ordered
[74] C. Barrett, D. Dill and J. Levitt. Validity Checking for Combinations of Theories with
Equality. In Proceedings of the First International Conference on Formal Methods in
Computer-Aided Design, Springer-Verlag, pp. 187-201, 1996.
[75] D. Cyrluk, P. Lincoln and N. Shankar. On Shostak’s Decision Procedure for Combi-
nations of Theories. In Proceedings of the 13th International Conference on Automated
Deduction, Springer-Verlag, pp. 463-477, 1996.
Workshop on Theorem Proving with Analytic Tableaux and Related Methods, Springer-
Verlag, pp. 110-126, 1996.
Automated Deduction, Lecture Notes in Computer Science, Springer, volume 1249, pp.
101-115, 1997.
[79] G. Sutcliffe and C.B. Suttner. The TPTP Problem Library: CNF Release v1.2.1. In
Journal of Automated Reasoning, volume 21, number 2, pp. 177-203, 1998.
[84] R. Letz and G. Stenz. Proof and Model Generation with Disconnection Tableaux.
[87] L. Bachmair and H. Ganzinger. Resolution Theorem Proving. In Handbook of
Automated Reasoning, A. Robinson and A. Voronkov, eds., volume 1, MIT Press,
Cambridge, pp. 19, 2001.
[93] R. Letz and G. Stenz. Model Elimination and Connection Tableau Procedures. In
Handbook of Automated Reasoning, A. Robinson and A. Voronkov, eds., volume 2,
MIT Press, Cambridge, pp. 2015, 2001.
[96] P. King. The Metaphysics and Natural Philosophy of Buridan, J.M.M.H Thijssen and
J. Zupko, eds., Brill, 2001.
[97] C. Barrett, D.L. Dill and A. Stump. Checking Satisfiability of First-order Formulas
[99] J.N. Hooker, G. Rago, V. Chandru and A. Shrivastava. Partial Instantiation Methods
for Inference in First-order Logic. In Journal of Automated Reasoning, volume 28, p.
200, 2002.
[103] A. Riazanov and A. Voronkov. Efficient Instance Retrieval with Standard and
Relational Path Indexing. In Proceedings of the 19th International Conference On Automated
Deduction, Lecture Notes in Computer Science, Springer, volume 2741, pp. 380-396,
2003.
[104] P. Baumgartner and C. Tinelli. The Model Evolution Calculus. In Proceedings of
the 19th International Conference On Automated Deduction, Lecture Notes in Computer
Science, Springer, volume 2741, pp. 350-364, 2003.
[112] S. Schulz. System Description: E 0.81. In Proceedings of the 2nd International
Joint Conference on Automated Reasoning, volume 3097, pp. 223-228, 2004.
ings of the 16th International Conference on Term Rewriting and Applications, Lecture
Notes in Computer Science, Springer, volume 3467, pp. 453-468, 2005.
[114] P. Baumgartner and C. Tinelli. The Model Evolution Calculus with Equality. In
Proceedings of the 20th International Conference on Automated Deduction, Lecture
Notes in Artificial Intelligence, Springer, volume 3632, pp. 392-408, 2005.
[115] R. Nieuwenhuis, A. Oliveras and C. Tinelli. Abstract DPLL and Abstract DPLL
Modulo Theories. In Proceedings of the 11th International Conference on Logic for
Programming, Artificial Intelligence and Reasoning, Lecture Notes in Computer Sci-
ence, Springer, volume 3452, pp. 36-50, 2005.
[116] R. Nieuwenhuis and A. Oliveras. DPLL(T) with Exhaustive Theory Propagation and
Its Application to Difference Logic. In Proceedings of the 17th International Conference
on Computer Aided Verification, Lecture Notes in Computer Science, Springer, volume
3576, pp. 321-334, 2005.
[119] R. Nieuwenhuis, A. Oliveras and C. Tinelli. Solving SAT and SAT Modulo Theo-
ries: From an Abstract Davis-Putnam-Logemann-Loveland Procedure to DPLL(T). In
Journal of the ACM, volume 53, number 6, pp. 937-977, 2006.
[122] S. Delaune, H. Lin and C. Lynch. Protocol Verification Via Rigid/Flexible Resolu-
tion. In Proceedings of the 14th International Conference on Logic for Programming,
Artificial Intelligence and Reasoning, pp. 242-256, 2007.
[124] C. Barrett and C. Tinelli. CVC3. In Proceedings of the 19th International Con-
409, 2007.
[126] T. Deshane, W. Hu, P. Jablonski, H. Lin, C. Lynch and R.E. McGregor. Encoding
First Order Proofs in SAT. In Proceedings of the 21st International Conference on
Automated Deduction, Springer-Verlag, pp. 476-491, 2007.
[127] M. Ludwig and U. Waldmann. An Extension of the Knuth-Bendix Ordering with
LPO-like Properties. In Proceedings of the 14th International Conference on Logic for
Programming, Springer-Verlag, pp. 348-362, 2007.
[128] J. Bongio, C. Katrak, H. Lin, C. Lynch and R.E. McGregor. Encoding First Order
Proofs in SMT. In Electronic Notes in Theoretical Computer Science, volume 198,
number 2, pp. 71-84.
[132] C. Lynch and D. Tran. SMELS: Satisfiability Modulo Equality with Lazy Superpo-
sition. In Proceedings of the 6th International Symposium on Automated Technology for
Verification and Analysis, Lecture Notes in Computer Science, Springer, volume 5311,
pp. 186-200, 2008.
[134] S.J. Russell and P. Norvig. Artificial Intelligence: A Modern Approach, Prentice
Hall, 3rd Edition, 2009.
[135] C. Sticksel. Efficient Ground Satisfiability Solving In An Instantiation-based
Method For First-order Theorem Proving. Presented at the 16th Workshop on Auto-
mated Reasoning, 2009.
[136] C. Lynch and R.E. McGregor. Combining Instance Generation and Resolution. In
Proceedings of the 7th International Conference on Frontiers of Combining Systems,
Springer-Verlag, pp. 304-318, 2009.
368-382, 2009.
[143] P. Baumgartner and E. Thorstensen. Instance Based Methods - An Overview. In
Künstliche Intelligenz, volume 24, number 1, pp. 35-42, 2010.
cvc3/doc/index.html.