The Complexity of Boolean Functions
The Complexity of Boolean Functions
Complexity of
Boolean
Functions
Ingo Wegener
WARNING:
This version of the book is for your personal use only. The material
is copyrighted and may not be redistributed.
Copyright c 1987 by John Wiley & Sons Ltd, and B. G. Teubner, Stuttgart.
All rights reserved.
No part of this book may be reproduced by any means, or transmitted, or translated
into a machine language without the written permission of the publisher.
Library of Congress Cataloguing in Publication Data:
Wegener, Ingo
The complexity of boolean functions.
(Wiley-Teubner series in computer science)
Bibliography: p.
Includes index.
1. Algebra, Boolean. 2. Computational complexity.
I. Title. II. Series.
AQ10.3.W44 1987 511.324 87-10388
ISBN 0 471 91555 6 (Wiley)
British Library Cataloguing in Publication Data:
Wegener, Ingo
The complexity of Boolean functions.(Wiley-Teubner series in computer science).
1. Electronic data processingMathematics 2. Algebra, Boolean
I. Title. II. Teubner, B. G.
004.01511324
QA76.9.M3
ISBN 0 471 91555 6
CIP-Kurztitelaufnahme der Deutschen Bibliothek
Wegener, Ingo
The complexity of Boolean functions/Ingo Wegener.Stuttgart: Teubner; Chichester; New York; Brisbane; Toronto; Singapore: Wiley, 1987
(Wiley-Teubner series in computer science)
ISBN 3 519 02107 2 (Teubner)
ISBN 0 471 91555 6 (Wiley)
Printed and bound in Great Britain
Preface
vi
them.
I should like to express my thanks to Annemarie Fellmann, who
set up the manuscript, to Linda Stapleton for the careful reading of
the text, and to Christa, whose complexity (in its extended denition,
as the sum of all features and qualities) far exceeds the complexity of
all Boolean functions.
Ingo Wegener
Contents
1.
1.1
1.2
1.3
1.4
1.5
2.
2.1
2.2
2.3
2.4
2.5
2.6
3.
3.1
3.2
3.3
3.4
3.5
3.6
3.7
1
1
3
6
10
15
19
25
29
31
33
35
36
39
39
51
67
74
76
78
81
83
vii
22
22
viii
4.
4.1
4.2
4.3
4.4
4.5
4.6
4.7
4.8
87
87
88
93
96
98
06
107
110
117
5.
5.1
5.2
5.3
5.4
5.5
5.6
119
119
122
125
127
133
Monotone circuits
Introduction
Design of circuits for sorting and threshold functions
Lower bounds for threshold functions
Lower bounds for sorting and merging
Replacement rules
Boolean sums
Boolean convolution
Boolean matrix product
A generalized Boolean matrix product
Razborovs method
An exponential lower bound for clique functions
Other applications of Razborovs method
145
145
148
154
158
160
163
168
170
173
180
184
192
6.
6.1
6.2
6.3
6.4
6.5
6.6
6.7
6.8
6.9
6.10
6.11
6.12
138
142
ix
195
203
207
214
7.
7.1
7.2
7.3
7.4
218
218
221
225
229
233
8.
8.1
8.2
8.3
8.4
8.5
8.6
8.7
8.8
Formula size
Threshold - 2
Design of ecient formulas for threshold - k
Ecient formulas for all threshold functions
The depth of symmetric functions
The Hodes and Specker method
The Fischer, Meyer and Paterson method
The Nechiporuk method
The Krapchenko method
Exercises
235
235
239
243
247
249
251
253
258
263
9.
9.1
9.2
9.3
9.4
9.5
9.6
267
267
271
277
279
282
285
9.7
9.8
288
292
294
10.
10.1
10.2
10.3
296
296
301
306
318
11.
11.1
11.2
11.3
11.4
11.5
Bounded-depth circuits
Introduction
The design of bounded-depth circuits
An exponential lower bound for the parity function
The complexity of symmetric functions
Hierarchy results
Exercises
320
320
321
325
332
337
338
12.
12.1
12.2
12.3
340
340
344
352
359
13.
13.1
13.2
13.3
13.4
13.5
361
361
363
368
373
380
387
396
411
xi
414
414
418
421
423
431
436
439
References
442
Index
455
1.1 Introduction
functions, the number of possible inputs as well as the number of possible outputs is nite. Obviously, all these functions are computable.
In 2 we introduce a rather general computation model, namely circuits. Circuits build a model for sequential computations as well as
for parallel computations. Furthermore, this model is rather robust.
For several other models we show that the complexity of Boolean
functions in these models does not dier signicantly from the circuit
complexity. Considering circuits we do not take into account the specic technical and organizational details of a computer. Instead of
that, we concentrate on the essential subjects.
The time we require for the computation of a particular function
can be reduced in two entirely dierent ways, either using better computers or better algorithms. We like to determine the complexity of a
function independently from the stage of the development of technology. We only mention a universal time bound for electronic computers.
For any basic step at least 5 6 1033 seconds are needed (Simon (77)).
Boolean functions and their complexity have been investigated
since a long time, at least since Shannons (49) pioneering paper. The
earlier papers of Shannon (38) and Riordan and Shannon (42) should
also be cited. I tried to mention the most relevant papers on the complexity of Boolean functions. In particular, I attempted to present also
results of papers written in Russian. Because of a lack of exchange
several results have been discovered independently in both parts of
the world.
There is large number of textbooks on logical design and switching circuits like Caldwell (64), Edwards (73), Gumm and Poguntke (81), Hill and Peterson (81), Lee (78), Mendelson (82), Miller (79),
Muroga (79), and Weyh (72). These books are essentially concerned
with the minimization of Boolean functions in circuits with only two
logical levels. We only deal with this problem in Ch. 2 briey. The
algebraical starting-point of Hotz (72) will not be continued here. We
develop the theory of the complexity of Boolean functions in the sense
of the book by Savage (76) and the survey papers by Fischer (74),
Harper and Savage (73), Paterson (76), and Wegener (84 a). As almost 60% of our more than 300 cited papers were published later
than Savages book, many results are presented for the rst time in
a textbook. The fact that more than 40% of the relevant papers on
the complexity of Boolean functions are published in the eighties is a
statistical argument for the claim that the importance of this subject
has increased during the last years.
Most of the book is self-contained. Fundamental concepts of linear
algebra, analysis, combinatorics, the theory of ecient algorithms (see
Aho, Hopcroft and Ullman (74) or Knuth (81)) and the complexity
theory (see Garey and Johnson (79) or Paul (78)) will be applied.
ma (x) =
af 1 (1)
a(n)
xn
sb (x) .
bf 1 (0)
The rst and second representation are called disjunctive and conjunctive normal form resp. (DNF and CNF).
Proof : By denition, ma (x) = 1 i x = a and sa (x) = 0 i x = a .
f(x) equals 1 i x f 1(1) i one of the minterms ma(x) for a f 1 (1)
computes 1 . Similar arguments work for the CNF of f .
f(x) =
A{1
n}
xi
(2.1)
iA
We only use a small set of well-known operations, the addition of digits, the application of multiplication tables, comparison of digits, and
if - tests. All our calculations are based on these basic operations only.
Here we choose a nite set of one - output Boolean functions as basis. Inputs of our calculations are the variables x1
xn and w.l.o.g.
also the constants 0 and 1 . We do neither distinguish between constants and constant functions nor between variables and projections
x xi . One computation step is the application of one of the basic operations to some inputs and/or already computed data.
In the following we give a correct description of such a computation
called circuit.
DEFINITION 3.1 : An -circuit works for a xed number n of
Boolean input variables x1
xn . It consists of a nite number b
of gates G(1)
G(b) . Gate G(i) is dened by its type i
and, if i Bn(i) , some n(i)-tuple (P(1)
P(n(i))) of predecessors.
P(j) may be some element from {0 1 x1
xn G(1)
G(i 1)} .
By res G(i) we denote the Boolean function computed at G(i) . res is
dened inductively. For an input I res I is equal to I .
If G(i) = ( i P(1)
P(n(i))) ,
res G(i) (x) = i (res P(1)(x)
(3.1)
(3.2)
We obtain the following circuit where all edges are directed top - down.
G1 = ( x1 x2) G2 = ( G1 x3) G3 = ( x1 x2)
G4 = ( G1 x3) G5 = ( G3 G4) (y1 y0) = (G5 G2)
x1
G3
x2
x3
G1
G4
G5 = y1
G2
Fig. 3.1
In the following we dene circuits in a more informal way.
Many circuits are computing the same function. So we look for
optimal circuits, i.e. we need criterions to compare the eciency of
circuits. If a circuit is used for a sequential computation the number
of gates measures the time for the computation. In order to ease
the discussion we assume that the necessary time is for all basic operations the same. Circuits (or chips) in the hardware of computers
10
(4.1)
Any function computable by an -circuit may be computed by an circuit with fan-out 1 . This can be proved by induction on c = C (f) .
Nothing has to be proved for c = 0 . For c 0 we consider an -circuit
for f with c gates. Let g1
gr be the functions computed at the
predecessors of the last gate. Since C (gi) c , gi can be computed
by an -circuit with fan-out 1 . We take disjoint -circuits with fanout 1 for g1
gr and combine them to an -circuit with fan-out 1
11
for f . The depth of the new circuit is not larger than that of the old
one, thus D (f) = Ds (f) for all s . In future we do not investigate
Ds anymore. With the above procedure the size of the circuit may
increase rapidly. For s 2 , we can bound the increase of size by
the following algorithm of Johnson, Savage and Welch (72). We also
bound the fan-out of the variables by s .
If some gate G (or some variable) has fan-out r s we use s 1
outgoing wires in the same way as before and the last outgoing wire
to save the information of G . We build a subcircuit in which again
res G is computed. We still have to simulate r (s 1) outgoing wires
of G. If s 2 , the number of unsimulated wires decreases with each
step by s 1 . How can we save the information of gate G ? By
computing the identity x x . Let l () be the smallest number of
gates in order to compute a function g = res G at some gate given g
as input. We claim that l () {1 2} . Let be a nonconstant
basic operation. Let Bm . Since is not constant, input vectors
exist diering only at one position (w.l.o.g. the last one) such that
(a1
am1 1) = (a1
am1 0) . We need only one wire out
of G to compute (a1
am1 res G ) which equals res G , implying
l () = 1 , or res G . In the second case we repeat the procedure and
compute ( res G ) = res G implying l () = 2 . At the end we obtain
a circuit for f in which the fan-out is bounded by s .
THEOREM 4.1 : Let k be the fan-in of the basis , i.e. the largest
number of inputs for a function of . If f Bn may be computed by
an -circuit and if s 2 then
Cs (f) (1 + l ()(k 1) (s 1)) C (f)
(4.2)
12
r if r 1
(4.3)
(4.4)
13
tween formula size and depth (see Ch. 7). Another reason is that
Boolean formulas correspond to those expressions we usually call formulas. Given a formula we may also bound the fan-out of the inputs
by 1 by using many copies of the inputs. From our graph representation we obtain a tree where the root is the last gate. Basically this is
the representation of arithmetical expressions by trees.
We could be satised. Bounding the fan-out does not increase the
depth of the circuit and the size has to be increased only by a small
constant factor, if s 2 . But with both algorithms discussed we
cannot bound the increase of size and depth simultaneously. This was
achieved at rst by an algorithm of Hoover, Klawe and Pippenger (84).
Size and depth will increase only by a constant factor. Perhaps the
breadth is still increasing (see Schnorr (77) for a discussion of the
importance of breadth).
We present the algorithm only for the case l () = 1 . We saw that
p identity gates are sucient to simulate a gate of fan-out r where
p is the smallest integer such that r s + p(s 1) . For s = 3 we
show in Fig. 4.1 a how Johnson, Savage and Welch replaced a gate of
fan-out 12 . In general, we obtain a tree consisting of a chain of p + 1
nodes whose fan-out is bounded by s . Any other tree with p+1 nodes,
r leaves and fan-out bounded by s (as shown in Fig. 4.1 b) will also do
the job. The root is the gate that has to be simulated, and the other p
nodes are identity gates. The r outgoing wires can be used to simulate
the r outgoing wires of the gate we simulate. The number of gates
behaves as in the algorithm of Johnson et al. We have some inuence
on the increase in depth of the circuit by choosing appropriate trees.
In a given circuit S with b gates G1
Gb we work bottom-up.
Let Sb = S . We construct Si1 from Si by replacing gate Gi by an
appropriate tree. Then S = S0 is a circuit of fan-out s equivalent
to S . The best thing we could do in each step is to replace Gi by a
tree Ti such that the longest path in Si1 , starting at the root of Ti ,
is kept as short as possible. In the following we describe an ecient
algorithm for the choice of Ti .
14
a)
b)
Fig. 4.1
(4.5)
15
(4.6)
and
D(S ) (1 + l () log s k) D(S)
(4.7)
1.5 Discussion
It turned out that circuits build an excellent model for the computation of Boolean functions. Certainly circuit complexity and depth
of a Boolean function cannot be measured unambigously. These complexity measures depend on
the costs and the computation time of the dierent types of gates
16
This eect is unpleasant. How can we nd out whether f is easier than g ? The results of 3 and 4 showed that the eect of
the above mentioned criterions on circuit complexity and depth of a
Boolean function can be estimated by a constant factor (with the only
exceptions of incomplete bases and the fan-out restriction 1). If we
ignore constant factors, we can limit ourselves to a xed circuit model.
The basis is B2 , all gates cause the same cost, and the fan-out is not
restricted. Comparing two functions f and g not only C(f) and C(g)
but also D(f) and D(g) dier by a constant factor. In fact we do not
consider some denite function f but natural sequences of functions
fn . Instead of the addition of two 7-bit numbers, a function f B14 8 ,
we investigate the sequence of functions fn B2n n+1 where fn is the
addition of two n-bit numbers.
Let (fn) and (gn ) be sequences of functions. If C(fn) = 11 n and
C(gn) = n2 , C(fn ) C(gn) for n 11 but C(fn) C(gn) is bounded
by 11 and converges to 0 . We state that (gn) is more complex than
(fn) , since for all circuit models the quotient of the complexity of fn
and the complexity of gn converges to 0 . We ignore that gn may be
computed more eciently than fn for small n. We are more interested
in the asymptotic behavior of C(fn) and C(gn) .
Certainly, it would be best to know C(fn) exactly. If it is too
dicult to achieve this knowledge, then, in general, the asymptotic
behavior describes the complexity of fn quite good. Sometimes the
concentration on asymptotics may lead to absurd results.
If C(fn) = 15 n34816 and C(gn) = 2n 100 , C(fn) C(gn) converges to 0 ,
but for all relevant n the complexity of fn is larger than the complexity
of gn . But this is an unrealistic example that probably would not
occur. In the following we introduce the big-oh notation.
17
DEFINITION 5.1 :
large n .
i)
Let f g :
0 for
f = O(g) (f does not grow faster than g) if f(n) g(n) c for some
constant c and large n .
0.
18
T(n)
n
n log2 n
n2
n3
2n
1 000
140
31
10
9
60 000
4 893
244
39
15
3 600 000
200 000
1 897
153
21
Tab. 5.1
T(n)
n
n log n
n2
n3
2n
m
m
m
m
m
10 m
(nearly) 10 m
3 16 m 10 3 162
2 15 m 10 2 153
m + 3 3 10 23 3
Tab. 5.2
This notation is based on the experience that algorithms whose running time is a polynomial of very large degree or whose running time
19
EXERCISES
1. What is the cardinality of Bn m ?
2. Let f(x1 x2 x3) = (y1 y0) be the fulladder of 3. y1 is monotone
but y0 is not. Design an { }-circuit for y1 .
3. f is called non degenerated if f depends essentially on all its variables, i.e. the subfunctions of f for xi = 0 and xi = 1 are dierent.
Let Nk be the number of non degenerated functions f Bk and
N0 = 2 .
20
Then
0kn
n
k
Nk = |Bn | .
xn y1
b) fn (x0
xn1 y0
yn ) = 1 i xi = yi for all i .
xi 2 i
yn1) = 1 i
0in1
c) fn (x1
yi 2 i .
0in1
xn ) = 1 i x1 + + xn 2 .
c) { } .
21
log i
1in
i1
c)
1in
i 2i .
d)
1in
0.
18. nlog n does not grow polynomially and also not exponentially.
19. If f grows polynomially, there exists a constant k such that
f(n) nk + k for all n .
22
DEFINITION 1.1 : A partially dened Boolean function is a function f : {0 1}n {0 1 ? } . Bn is the set of all partially dened
Boolean functions on n variables.
A circuit computes f Bn at gate G i f(x) = resG (x) for all
x f 1 ({0 1}).
Since inputs outside of f 1({0 1}) are not possible (or just not
expected ?!), it does not matter which output a circuit produces for
inputs a f 1(?) . Since Bn Bn , all our considerations are valid also
for completely dened Boolean functions. We assume that f is given
by a table of length N = 2n . We are looking for ecient procedures for
the construction of good circuits. The running time of these algorithms
has to be measured in terms of their input size, namely N , the length
of the table, and not n , the length of the inputs of f .
23
The knowledge of circuits, especially of ecient circuits for an arbitrary function is far away from the knowledge that is required to
design always ecient circuits. Therefore one has restricted oneself to
a subclass of circuits. The term minimization of a Boolean function
stands for the design of an optimal circuit in the class of 2-circuits
(for generalizations of the concept of 2 -circuits see Ch. 11). Inputs
of 2-circuits are all literals x1 x1
xn xn . In the rst step we may
compute arbitrary conjunctions (products) of literals. In the second
step we compute the disjunction (sum) of all terms computed in the
rst step. We obtain a sum-of-products for f which also is called polynomial for f . The DNF of f is an example of a polynomial for f . Here
we look for minimal polynomials, i.e. polynomials of minimal cost.
From the practical point of view polynomials have the advantage
that there are only two logical levels needed, the level of disjunctions
is following the level of conjunctions.
DEFINITION 1.2 :
i)
24
25
26
By Lemma 2.1 the sets Qk are computed correctly. Also the sets of
prime implicants Pk are computed correctly. If an implicant of length
k has no proper shortening of length k 1 which is an implicant, then
it has no proper shortening which is an implicant and therefore it is a
prime implicant. In order to obtain an ecient implementation of Algorithm 2.1 we should make sure that Qi does not contain any monom
twice. During the construction of Qi it is not necessary to test for all
pairs (m m ) of monoms in Qi+1 whether m = m xj and m = m xj for
some j . It is sucient to consider pairs (m m ) where the number of
negated variables in m is by 1 larger than the corresponding number
in m . Let Qi+1 l be the set of m Qi+1 with l negated variables. For
m Qi+1 l and all negated variables xj in m it is sucient to test
whether the monom mj where we have replaced xj in m by xj is in
Qi+1 l 1 . Finally we should mark all m Qi+1 which have shortenings
in Qi . Then Pi+1 is the set of unmarked monoms m Qi+1 .
27
LEMMA 2.2 :
i)
28
PI(f) = {a b c a b d b c c d a c}
The PI-table of f
abc
abd
bc
cd
ac
0
0
1
0
0
0
0
1
1
0
1
0
0
0
0
1
1
0
0
0
0
1
0
1
0
0
0
1
0
1
0
0
1
1
1
0
0
0
0
1
0
0
0
1
1
29
0111
abd
cd
1
1
30
EXAMPLE 3.1 : The Karnaugh diagram for the function of Example 2.1
a b 00 01 11 10
cd
00
01
11
10
0
0
1
1
1
1
1
0
0
0
1
1
0
0
1
1
We nd f(a b c d) in column ab and row cd . Where are the neighbors of (a b c d) ? It is easy to check that the neighbors can be
reached by one step in one of the four directions. The left neighbor of
an element in the rst column is the last element in the same row, and
so on. These diagrams are clearly arranged for n = 4 . For n 4 , we
even obtain smaller diagrams. For n = 5 , we use two of the diagrams
above, one for e = 0 and one for e = 1 . Then the fth neighbor may
be found at the same position of the other diagram. For n = 6 , we
already have to work with 4 of these diagrams. For n 7 the situation
becomes unintelligible and Karnaugh diagrams should not be used.
In our example each one in the diagram corresponds to an implicant
of length 4 . Ones which are neighbors correspond to implicants of
length 3 . The ones in the rst column correspond to a b c and the
rst two ones in the second column correspond to a b c . The 1-colored
subcube for a b c can be enlarged to the 1-colored subcube of the ones
in the rst and last column corresponding to the implicant b c . Since
the 1-colored subcube for a b c cannot be enlarged, a b c is a prime
implicant. We easily detect prime implicants in Karnaugh diagrams.
Furthermore, we see that the one in the rst row is contained only
in one maximal 1-colored subcube, namely for a b c , which therefore
31
Quine (53) has shown that the computation of the always unique
minimal polynomial for a monotone Boolean function is easy.
THEOREM 4.1 : Each prime implicant of a monotone function
f Mn only contains positive literals.
Proof : Let m = m xj I(f) . It is sucient to prove that the shortening m I(f) . If m (a) = 1 either aj = 0 implying m xj (a) = 1 and
f(a) = 1 or aj = 1 . In the last case let b be dened by bj = 0 and
bi = ai for i = j . Then m xj (b) = 1 implying f(b) = 1 . Since b a
and f is monotone, also f(a) = 1 . In either case m (a) = 1 implies
f(a) = 1 . Hence m I(f) .
32
The minimal polynomial for f Mn is also called monotone disjunctive normal form (MDNF).
THEOREM 4.3 : The set of functions computable by monotone
circuits, i.e. { }-circuits, is equal to the set of monotone functions.
(4.1)
and
(f g)(a) = max{f(a) g(a)} max{f(b) g(b)} = (f g)(b)
(4.2)
Monotone circuits will be investigated in detail in Ch. 6. The monotone basis { } is denoted by m and the corresponding complexity
measures are denoted by Cm and Dm .
33
34
DEFINITION 5.2 : A 0-1-matrix is called reduced if each row contains at least one 1-entry, each column contains at least two 1-entries
and if no columns c and c have the property c c .
THEOREM 5.1 : For each reduced matrix M there exists a Boolean
function f , whose reduced PI-table equals M . Furthermore f can be
chosen such that all prime implicants of the reduced PI-table have the
same length.
Proof : Let n be the number of rows of M and let S be the set of
columns of M . It will turn out that the following function f Bn+2
satises the assertion of the theorem.
For a {0 1}n we denote a1 an , the parity of a , by |a| . The
vector of zeros only is denoted by 0 .
f(a 0 0) = 1 a = 0 .
(5.1)
(5.2)
(5.3)
(5.4)
(5.5)
35
2.6 Discussion
As we have shown the minimization of a Boolean function is (probably) a hard problem. Furthermore, a minimal polynomial for f does
not lead to an optimal circuit for f . We only obtain an optimal circuit
in the rather restricted class of two-level-circuits. The following example of Lupanov (65 a) shows that a very simple function may have
36
EXERCISES
1. Compute a minimal polynomial for
f(a b c d) = a b c d a b c d a b c d a b c d a b c d
abcd abcd
2. How often do we obtain m Qi while constructing Qi according
to the Quine and McCluskey algorithm ?
37
xi(a) = min{f(a1
ai1 ai 1 ai+1
an) f(a1
an)}
38
xi = f
xi
if j j s 1 .
39
| z x} and
x = max{z
| z x}
DEFINITION 1.1 : The addition function fnadd B2n n+1 has two
n-bit numbers x and y as inputs and computes the (n + 1)-bit representation s of |x| + |y| .
In this section fn means fnadd . How ecient is the addition method
we learnt in school ? We use a halfadder to compute s0 = x0 y0 and
the carry bit c0 = x0 y0 . Afterwards we use n 1 fulladders for
the computation of si and ci from xi , yi and ci1 . Finally sn = cn1 .
40
and
(1.1)
cj = xj yj (xj yj ) cj1
(1.2)
41
x(2i1)L x(2i1)L 1
x(2i2)L
y2iL 1
y(2i1)L y(2i1)L 1
y(2i2)L
(1.3)
and c
(1.4)
1l k
(1.5)
= 3n log n + 10n 6
42
D(S1 ) = 1 and
(1.6)
C(S5) = n 1
D(S5 ) = 1
(1.7)
(1.8)
ui vi+1 vj
=
0ij
43
i+1
j exactly a zero and a one (vi+1 = = vj = 1). More
generally, we dene for b a
Gb a = gba+1 (ub vb
(1.9)
ui vi+1 vb
=
aib
V b a = va v b
(1.10)
ua
va+1
vb1 vb
ua+1
vb1 vb
ub
vb+1
vb+1
vb+1
vb+2
vd1
vb+2
vd1
vb+2
vd1
vd
vd Vd b+1
vd
ub+1
vb+2 vd1 vd
ud1 vd
ud
Gb a
Gd b+1
Fig. 1.1
Gd a = Gd b+1 Gb a Vd b+1
Vd a = Vb a Vd b+1
and
(1.11)
(1.12)
44
(1.13)
(1.14)
(1.15)
45
(1.16)
G2m i 2r 1 2m (i+1) 2r V2m 1 2m i 2r
All triangles on the right side have length 2r , the rectangles can be
computed in depth m , all conjunctions between triangles and rectangles can be done in parallel and by divide-and-conquer, the outer
disjunction can be performed in depth m r .
D(g 2m ) m r + 1 + max{D(g 2r ) m}
(1.17)
since all triangles and rectangles can be computed in parallel. For the
sake of simplicity let us assume that m = h(l ) = 2l . Then we choose
r = h(l 1) . By induction we can prove that
D(g 2h(l ) ) h(l + 1)
(1.18)
(2m )1 2 + 2
(1.19)
in particular d(t j) = 0
(1.20)
46
(1.21)
2h(l )h(l 1) = 2l 1
and
2m h(t1) 2t1
(1.22)
(1.23)
Vj d(l j)
(1.24)
V
1rk
(1.25)
(1.25) is correct by our standard arguments. The rectangles are computed before step l as has been shown above. The triangles on the
right have been computed at step l 1 . Finally Gj d(t j) = Gj 0 is the
carry bit at position (j + 1) 2 1 .
47
(1.26)
m + (2m )1 2 + 2
The size of S3 is estimated in the same way. For the computation of
all rectangles the following number of gates is sucient.
2l (2l 1) 2
k
1l t1 0j2m 1 1ke(l +1 j)
1l t1 0j2m 1
= 2m 2(4t1 1) 3 2t1
(1.27)
(1.28)
2l t 0j2m 1
(1.29)
(1.30)
and
C(S) 3n + 5 2m (1 2 ) + 2m 22t1
where m = log n , m = m and t
= 2(2m)1
(1.31)
(2m )1 2 + 2 . For
+3
(1.32)
48
Since addition is the most fundamental operation, we present another adder which simultaneously has linear size and logarithmic depth
(Ladner and Fischer (80)). The structure of this adder is easier than
Krapchenkos adder.
At rst we solve the prex problem, the ecient computation of
all prexes pi = x1 xi for an associative operation . Later
we explain how the prex problem may be used for the design of an
ecient adder. Ladner and Fischer present a family of algorithms
Ak(n) for inputs of length n . For n = 1 nothing has to be done. Let
n 1.
A0(n) : In parallel we apply A1 ( n 2 ) to x1
x n 2 and A0( n 2 )
to x n 2 +1
xn . Afterwards pi is computed for i n 2 . All
pi = (x1 x n 2 ) (x n 2 +1 xi ) for i
n 2 may be
computed in one step each in parallel.
Ak (n) (k 1) : In parallel we compute the n 2 pairs x1 x2 ,
x3 x4
. Afterwards we apply Ak1( n 2 ) to these pairs and, if
n is odd, xn . We compute all p2i p1 and pn . The missing n 2 1
prexes p2i+1 = p2i x2i+1 can be computed in parallel.
By C(k n) and D(k n) we denote the size and depth resp. of Ak (n) .
Furthermore D (k n) is the depth of pn using Ak (n) . Considering the
description of the algorithms we conclude
C(k 1) = D(k 1) = 0
(1.33)
(1.34)
(1.36)
D(k n) D(k 1 n 2 ) + 2
(1.37)
D (k n) D(k 1 n 2 ) + 1
(1.38)
We have used the fact that Ak(n) computes pn before the last step.
The solution of (1.33) (1.38) easily follows from induction.
49
THEOREM 1.4 :
0 k log n
(1.39)
D(k n) k + log n
(1.40)
How can we use the prex problem for the addition of binary numbers ? We use the subcircuits S1 and S5 of Krapchenkos adder with
size 2n and n 1 resp. and depth 1 each. S1 computes a coding of the
inputs bits.
uj = xj yj
v j = xj y j
(1.41)
sj = vj cj1
for 1 j n 1
sn = cn1
(1.42)
(1.43)
(1.44)
(1.45)
and
(1.46)
(1.47)
50
(1.48)
51
3.2 Multiplication
(ai + bi + ci ) 2i =
0in1
52
vi 2i = |u| + |v|
ui 2i +
=
0in
0in
2
3
n+
2
+
3
2
3
+ +
2
3
2
3
n+2
(2.2)
53
(2.3)
= 2n |x | |y | + 2n 2 (|x | |y | + |x | |y |) + |x | |y |
In (2.3) we multiply four times numbers of length n 2 . The multiplications by 2n or 2n 2 are shifts which are gratis in circuits. Moreover
we perform three additions which have linear size. For the size of the
resulting circuit C(n) we obtain the recursion
C(n) 4 C(n 2) + c n and C(1) = 1
(2.4)
(2.5)
C(1) = D(1) = 1
Obviously D(n) = O(log2 n) and, by Exercise 1, C(n) = O(nlog 3 ) .
THEOREM 2.2 : Circuits for multiplication may have size O(nlog 3 )
and depth O(log2 n) . log 3 1 585 .
For sequential computations the school method of addition is optimal whereas the school method of multiplication is not. Only for
rather long numbers the multiplication method of Karatsuba and Ofman is better than the school method. The reader is asked to investigate exactly the following multiplication method M(k) . If n k , use
the school method and, if n k , start with the method of Karatsuba
and Ofman but solve subproblems for numbers with at most k bits by
54
s0 ) , and cL14L .
55
6 5 4 3 2 1
vi
2 1
2 1
ci
1 1 1 1
x i + yi
1 2 1
0
Tab. 2.1
By denition,
4L = 4 2p = 4 m 4 4 mod m
and (0
(2.6)
s0 )
s
0 ) where |s1 | = |v1 + c0 | 1 . Since
s
1 cL1 s0 ) is a radix-4 representation of x + y .
s
2 ,
56
We also have to consider the computation of a radix-4 representation for x 2s mod m . A multiplication by a power of 2 consists of
a multiplication by a power of 4 and, if necessary, a multiplication
by 2 , i.e. an addition. That is why we consider only the computation
of x 4r mod m . Since 4L 4 mod m , as already shown,
x 4r
xhr 4h +
rhL1
(2.7)
0hr1
+
+
x+
i = max{xi 0} and xi = min{xi 0} . Let (xi 1 xi 0 ) and (xi 1 xi 0 )
57
k 2
and l = n b = 2
k 2
(2.8)
xi zi and g(z) =
Let f(z) =
0ib1
l
yi zi be polynomials. By
0ib1
58
DEFINITION 2.3 :
The convolution v = (v2b2
(xb1
x0) and y = (yb1
y0) is given by
vk =
xi yj
v0) of x =
(2.9)
i+j=k
xj yij
0ji
xj yb+ij
w0) of the
(2.10)
i jb1
59
w0 ) .
(2.11)
(2.12)
(2.13)
LEMMA 2.3 : p
0ib1
60
vi 2il
(2.14)
0i2b1
wi
(i + 1) 22l
(2.15)
61
ai ri si mod m
(2.16)
1ik
(2.17)
0ib1
m
62
63
(1 + r2 k )
rjk =
0jn1
(2.18)
0ip1
This claim is obvious for p = 1 . The induction step follows from the
following elementary calculation
rjk = (1 + rk )
0jn1
(r2)jk
(2.19)
0j(n 2)1
i
(1 + (r2)2 k ) =
= (1 + rk )
0ip2
i
(1 + r2 k )
0ip1
i
64
fj =
(2.20)
0in1
a2i (r2j)i + rj
=
0i(n 2)1
a2i+1 (r2j)i
0i(n 2)1
Instead of evaluating f at r0 r1
rn1 we may evaluate the polynomials given by (a0 a2
an2) and (a1 a3
an1) at r0 r2
r2n2 .
Since rn = 1 , it is even sucient to evaluate both polynomials of degree (n 2) 1 at r0 r2
rn2 . Since it is easy to prove that r2 is
an (n 2 )-th root of identity, we may compute DFTn 2(a0 a2
an2)
and DFTn 2 (a1 a3
an1) by the same procedure. To start with we
0 1
n1
compute r r
r . Afterwards we may compute fj with two further operations. The preprocessing, the computation of all ri , can be
done in linear size and logarithmic depth. The problem is not harder
than the prex problem (see 1). For the complexity of the other
computations we obtain the following recurring relations
C(n) 2 C(n 2) + 2n and D(n) D(n 2) + 2
(2.21)
Here we took advantage of the fact that the two subproblems and
afterwards the computation of all fj can be done in parallel. By (2.21)
the claim of the theorem follows.
65
equals 1. For i
j (the case i
rik rjk =
0kn1
j is analogous)
r(ij)k = 0
(2.22)
0kn1
DFT1
2n (DFT2n (a)DFT2n (b)) , where is the componentwise multiplication, is the convolution of a and b .
66
w = (w0 s w1
sn1 wn1) where w = (w0
wn1) is the
negative envelope of a = (a0
an1) and b = (b0
bn1) .
1
fi gi rim
(2.23)
0in1
= n1
sj aj sk bk r(j+km)i
0in1 0jn1 0kn1
n1
sj+k aj bk
=
0jn1 0kn1
r(j+km) i
0in1
aj bk + sn
j+k=m
aj bk
(2.24)
j+k=n+m
= sm (vm vn+m) = sm wm
Here we used the fact that sn = rn
= 1 .
67
3.3 Division
Since y z = y(1 z) , we consider only the computation of the inverse z1 . In general the binary expansion of z1 is not nite, e.g.
for z = 3 . So we are satised with the computation of the n most
signicant bits of z1 . W.l.o.g. 1 2 z 1 .
DEFINITION 3.1 :
(zn1 = 1 zn2
z0) as input.
68
=
z
1 x 1 x 1 + x(0) 1 + x(1)
1 + x(k 1)
Pk(x)
=
1 x(k)
(3.1)
(3.2)
69
(3.3)
(3.4)
x(j 2) ,
(3.5)
70
(3.7)
By (3.1)
|z1 Pk (x)| = |z1 (1 x(k)) z1|
= x(k) (1 x) 2 22
(3.8)
k
By Lemma 3.2
|Pk(x) Pk (x)| k (x) (3 k 2) 2 2s
(3.9)
(3.10)
By this choice |z1 Pk(x)| 2n+1 and we may output the n most
signicant bits of Pk (x). Altogether we perform one subtraction and
2k2 multiplications of (s+1)-bit numbers which have to be performed
sequentially.
THEOREM 3.2 : The algorithm of Anderson et al. leads to a circuit
for division which has depth O(log2 n) and size O(n2 log n) (if we use
convential multiplication circuits) or size O(n log2 n log log n) (if we use
Schonhage and Strassen multiplication circuits).
For several years one believed, that depth O(log2 n) is necessary for
division circuits. Reif (83) was the rst to beat this bound. Generalizing the ideas of Schonhage and Strassen (see 2) to the multiplication
of n numbers he designed a division circuit of polynomial size and
71
depth O(log n log log n) . Afterwards Beame, Cook and Hoover (84)
applied methods of McKenzie and Cook (84) and proved that the
depth of division is (log n) .
One-output circuits of depth d have at most 2d 1 gates. Therefore
circuits of depth O(log n) always have polynomial size. Since the new
division circuit beats the older ones only for rather large n , we do
not estimate accurately the depth of the new circuit. Because of the
approximation algorithm introduced at the beginning of this section,
it is sucient to prove that n n-bit numbers may be multiplied in
depth O(log n) . Then we can compute all x(j) in parallel in depth
O(log n) and afterwards we multiply all 1 + x (j) in depth O(log n) .
For the basic results of number theory that we apply the reader is
referred to any textbook on number theory, e.g. Ayoub (63). For our
algorithm it is crucial that problems for small numbers may be solved
eciently by table-look-up. Let T be a table (a1 b1)
(aN bN )
where N and the length of each ai are bounded by a polynomial in n .
If all ai are dierent, we may compute for x {a1
aN } in depth
O(log n) that bi which has the same index as x = ai . Obviously we
can test in depth O(log n) whether x = aj . Let cj = 1 i x = aj . All cj
can be computed in parallel. Afterwards all ym where ym is the m-th
output bit, can be computed in parallel as disjunction of all cj bjm
(1 j N). Here bjm is the m-th bit of bj . Altogether the circuit
has depth O(log n) . In general, tables of functions f have exponential
length and table-look-up is not very ecient.
By table-look-up circuits we multiply the input numbers modulo
small prime numbers. By the Chinese Remainder Theorem these products are sucient to compute the product exactly. The size of the
product is bounded by M = (2n 1)n . Therefore it is sucient to
compute the product mod m for some m M . We describe the main
steps of our algorithm before we discuss their ecient implementation.
ALGORITHM 3.1 :
Input : x1
xn , n n-bit numbers.
72
yij
1in
xi mod pj for 1 j r .
1in
xi =
0kn1
(3.11)
0kn1
73
k gind(k) mod pj
(3.12)
yij
1in
(3.13)
1in
xj rj mod p
(3.14)
1jn
74
The class of symmetric functions contains many fundamental functions like sorting and all types of counting functions.
DEFINITION 4.1 :
Sn m is the class of all symmetric functions
f Bn m , that is all functions f such that for all permutations n
f(x1
xn ) = f(x(1)
x(n) ) .
Each vector a {0 1}n with exactly i ones is a permutation of
any other vector with exactly i ones. That is why f is symmetric i
f only depends on the number of ones in the input. For symmetric
functions we may shorten the table (a f(a)) for f to the (value) vector
v(f) = (v0
vn) such that f(x) = vi if x1 + +xn = i . We introduce
some fundamental symmetric functions.
DEFINITION 4.2 :
i)
75
f(x) =
(4.1)
0kn
v0
En) + n and
(4.2)
D(f) D(E0
En) + log(n + 1)
(4.3)
76
xn can be computed
By this lemma E0
En can be computed from a in linear size
and logarithmic depth since k = log(n + 1) . Altogether we obtain
an ecient circuit for all symmetric functions.
THEOREM 4.1 : Each symmetric function f Sn with one output
as well as the sorting function may be computed by a circuit of size
O(n) and depth O(log n) .
77
a 0 x0
xn1) =
(5.1)
a 0 xn
xn1))
a 0 x0
x(n
2)1 ))
Since SA1 (x0) = x0 , (5.1) leads to a circuit for SAn of size C(n) and
depth D(n) where
C(n) = 2 C(n 2) + 3
D(n) = D(n 2) + 2
(5.2)
C(1) = D(1) = 0
The solution of this recurring relation is C(n) = 3n 3 and D(n) =
2 log n .
LEMMA 5.1 : The storage access function SAn can be computed by
a circuit of size 3n 3 and depth 2 log n .
In Ch. 5 we prove a lower bound of 2n 2 for the circuit size of
SAn . Klein and Paterson (80) proved that this lower bound is nearly
optimal.
Let Mi (a) be the minterm of length k computing 1 i |a| = i .
Obviously
Mi (a) xi
SAn (a x) =
(5.3)
0in1
SAn (a x) =
0ir1 0js1
Mi (b)
=
0ir1
Mj (c) xis+j
0js1
(5.4)
78
By Lemma 4.1 we can compute all Mi (b) and all Mj (c) by a circuit of
size r+s+o(r+s) = O(n1 2) and depth log(k 2) = log log n 1 . For
the computation of all Mj (c) xis+j n -gates and n r -gates are
sucient, the depth of this part of the circuit is k 2 +1 . Afterwards
SAn (a,x) can be computed by r -gates and r 1 -gates in depth
k 2 + 1 . This way we have proved
THEOREM 5.1 : The storage access function SAn can be computed
by circuits of size 2n + O(n1 2) and depth log n + log log n + 1 .
xik ykj
(6.1)
1kn
We investigate arithmetic circuits ({+ }-circuits, straight line programs) for the computation of Z = (zij) . Arithmetic circuits lead to
Boolean circuits if we replace each arithmetic operation by a Boolean
circuit of suitable input size for this operation. Furthermore we are interested in the Boolean matrix product which is useful in graph theory
(see Exercises). Here we consider matrices X = (xij) and Y = (yij) of
ones and zeros only. The Boolean matrix product Z = (zij) is dened
by
zij =
xik ykj
(6.2)
1kn
79
paper that this school method is not optimal for the arithmetic matrix
product. His arithmetic circuit has size O(nlog 7 ) and depth O(log n) .
Arlazarov, Dinic, Kronrod and Faradzev (70) designed an arithmetic
circuit of size O(n3 log n) that only works for 0-1-matrices but is better than the school method also for very small n . For 9 years no one
improved Strassens algorithm. Then a violent development started.
Its end was the arithmetic circuit of Coppersmith and Winograd (82)
whose size is O(nc) for some c
2 496 . Pan (84) gives a survey on
this development. Now Strassen (86) improved the exponent to some
c 2 479 . We describe here only Strassens classical algorithm and
point out how the computation of the Boolean matrix product may
be improved by this algorithm too.
Strassens algorithm depends on divide-and-conquer. Again we
assume n = 2k . We partition X , Y and Z into four matrices of n 2
rows and n 2 columns each.
X=
X11 X12
X21 X22
Y=
Y11 Y12
Y21 Y22
Z=
Z11 Z12
Z21 Z22
80
(6.3)
m4 + m5
m1 + m2 m4 + m6
m6 + m7
m2 m3 + m5 m7
(6.4)
Let C(n) and D(n) be the size and depth resp. of Strassens arithmetic
circuit. Then
C(n) = 7 C(n 2) + 18(n 2)2
D(n) = D(n 2) + 3
(6.5)
C(1) = D(1) = 1
implying
THEOREM 6.1 : Strassens algorithm leads to an arithmetic circuit
for matrix multiplication of size 7nlog 7 6n2 and depth 3 log n + 1 .
(log 7 2 81) .
We emphasize that Strassens algorithm as well as Karatsuba and
Ofmans multiplication algorithm is based on additions, multiplications and subtractions. If only additions and multiplications (of positive numbers) are admissible operations, then the school method cannot be improved (see Ch. 6). The protable use of subtractions for a
problem where subtractions seem to be superuous should be taken
as a warning. One should be very careful with stating that certain
operations are obviously not ecient for certain problems.
Fischer and Meyer (71) applied Strassens algorithm to Boolean
matrix multiplication. Similar results for the other matrix multiplication methods have been obtained by Lotti and Romani (80) and
Adleman, Booth, Preparata and Ruzzo (78).
81
The inputs of the Boolean matrix product are numbers xij yij
{0 1} . Let zij be the Boolean matrix product and zij the conventional
matrix product. Obviously
0 zij n
zij is an integer
zij = 0
zij = 0
and
(6.6)
Strassens algorithm consists of additions, substractions and multiplications. Therefore, by (6.6), we can compute zij correctly if we perform
all computations in m for some m n . In particular, the length of
the numbers is O(log n) . Finally all zij , by (6.6) the disjunction of all
bits of zij , can be computed in parallel in depth O(log log n) . Since all
multiplications of Strassens agorithm can be done in parallel, we may
estimate the complexity of the new algorithm for the Boolean matrix
product by our results of 1 and 2.
THEOREM 6.2 : The Boolean matrix product can be computed in
size O(nlog 7 log n log log n log log log n) and depth O(log n) .
3.7 Determinant
In Ch. 9 we consider the simulation of programs by circuits in general. Here we investigate as a second example the computation of the
determinant (the rst one was the Boolean matrix product). The well
known algorithm based on Gaussian elimination whose time complexity is O(n3) can be simulated by a circuit of size O(n3) . Gaussian
elimination is a typical sequential algorithm, so we need additional
tricks to reduce the depth.
DEFINITION 7.1 : The determinant of a Boolean n n-matrix X
with respect to the eld 2 is detn (X) =
x1 (1)
xn (n) .
n
82
THEOREM 7.1 :
3n3 + n2 4n.
83
(1 i k 2 j k)
(7.1)
by altogether 4k (k 1) gates. Finally we have to eliminate the mth row where um = 1 . Let t1 = u1 and tj = tj1 uj . t1
tk1
can be computed by k 2 gates. Then t1 = = tm1 = 0 while
tm = = tk1 = 1 . Therefore the elements yij of Xk1 can be
computed by
yij = ti zij ti zi+1 j
(7.2)
(7.3)
(9k2 7k 3) = 3n3 + n2 4n
(7.4)
2kn
EXERCISES
1. Prove that the recursion relation R(n) = a R(n b) + cn , R(1) = c
for n = bk , b
1, a c
0 has the solution R(n) = (n) , if
a b , R(n) = (n log n) , if a = b , R(n) = (nlogb a ) , if a b .
2. The Carry-Look-Ahead Adder partitions the input bits into g(n)
groups of nearly the same size. If we know the carry bit for some
group we add the bits of this group using the school method and
we compute directly by (1.8) the carry bit for the next group.
Estimate size and depth of this adder.
84
12. ei 2
85
En ) in 4 as exact as
86
xn y1
yn ) =
1i jn |ij|m
for all m .
27. f(x1
87
(1.1)
88
DEFINITION 1.2 : For a complexity measure CM and a class of functions Fn we denote by CM(Fn ) the complexity of the hardest functions
in Fn , i.e. max{CM(f)|f Fn } .
DEFINITION 1.3 : The Shannon eect is valid for a class of functions
Fn and a complexity measure CM if CM(f) CM(Fn ) o(CM(Fn ))
for almost all f Fn .
We prove that the Shannon eect is valid for several classes of
Boolean functions and complexity measures. With Shannons counting argument we prove that almost all functions are hard, i.e. their
complexity is at least an o(an) for some large an . Then we design
circuits (or formulas) for all functions whose complexity is bounded
by an .
For almost all functions we obtain nearly optimal circuits. These
circuits may be used for f Bn if we have no idea of designing a better
circuit for f . In Ch. 3 we have seen that we can design much more
ecient circuits for many fundamental functions.
89
If b = b(n) = C(Bn ) , |Bn | functions in Bn can be computed by a B2circuit of size b . By Lemma 2.1 we can conclude S(b n) |Bn | . We
use this inequality for an estimation of b . When considering S(b n)
and |Bn | as functions of n we see that S(b n) is exponential while |Bn |
is even double exponential. Since S(b n) as a function of b is also only
exponential, b = b(n) grows exponentially. We now estimate b more
exactly. By Stirlings Formula b! c bb+1 2 eb for some constant
c 0 and thus
log S(b n) log |Bn |
(2.1)
2b log(b + n + 1) + 4b + log b
(b + 1 2) log b + b log e log c 2n
For n suciently large, b n + 1 and therefore
b log b + (6 + log e)b + (1 2) log b log c 2n
(2.2)
If b 2n n1 , we could conclude
2n n1(n log n + 6 + log e) + (1 2)(n log n) log c 2n
(2.3)
(2.4)
90
(2.5)
implying
C(Bn) 2C(Bn1) + 3
C(B2) = 1 and
C(Bn) 2n 3
(2.6)
(2.7)
(2.8)
hence
k
C(Bk) 3 22 + 6 22
k1
(2.9)
91
k
C(Bn) 3(2nk + 22 ) + 6 22
k1
for arbitrary k .
(2.10)
For k = log n 1
C(Bn) 12 2n n1 + o(2n n1 )
(2.11)
These simple ideas already lead to circuits whose size is for almost
all Boolean functions only by a factor of 12 larger than the size of
an optimal circuit. In order to eliminate the factor 12 , we have to
work harder. Lupanov (58) introduced the so-called (k s) - Lupanov
representation of Boolean functions. We represent the values of f by a
table of 2k rows for the dierent values of (x1
xk) and 2nk columns
for the dierent values of (xk+1
xn) . The rows are partitioned to
p = 2k s1 2k s1 + 1 blocks A1
Ap such that A1
Ap1
contain s rows and Ap contains s s rows. We try to nd simpler
functions than f and to reduce f to several of these simpler functions.
Let fi (x) = f(x) if (x1
xk ) Ai and fi (x) = 0 otherwise. Obviously
f = f1 fp .
Let Bi w be the set of columns whose intersection with Ai is equal
to w {0 1}s (for i = p , w {0 1}s ) and let fi w (x) = fi (x) if
(xk+1
xn) Bi w and fi w (x) = 0 otherwise. Obviously f is the
disjunction of all fi w . Now consider the 2k 2nk table of fi w . All
rows outside of Ai consist of zeros only, the rows of Ai have only two
dierent types of columns, columns w and columns of zeros only. We
represent fi w as the conjunction of fi1w and fi2w where fi1w (x) = 1 i for
some j wj = 1 and (x1
xk ) is the j -th row of Ai , and fi2w (x) = 1
i (xk+1
xn ) Bi w . Altogether we obtain Lupanovs (k s)-representation
f(x1
xn ) =
1ip w
fi1w (x1
xn )
(2.12)
92
(2.13)
93
We proceed as in 2.
LEMMA 3.1 : At most F(b n) = (n + 2)b+1 16b 4b functions f Bn
can be computed by B2 -formulas of size b .
Proof : We estimate the number of B2 -formulas of size b . W.l.o.g. the
last gate is the only one without successor and the output is computed
at the last gate. As already discussed in Ch. 2 B2 -formulas are binary
trees if we copy the variables and constants. There exist less than 4b
binary trees with b inner nodes. For each gate starting at the root
of the tree we have for each of the two predecessors at most the two
possibilities of choosing an inner node or a leaf. Each gate is labelled
by one of the 16 functions of B2 . Finally each of the b + 1 leaves is
labelled by one of the n variables or the 2 constants. Altogether we
obtain the claimed bound.
(3.1)
(3.1) is not fullled for b = 2n log1 n(1 (log log n)1) and suciently large n . The same estimation holds for classes Bn Bn where
log |Bn | = 2n (1 log1 n) .
THEOREM 3.1 : For suciently large n at least |Bn |(1 2s(n) ) of
the |Bn | functions in Bn , where s(n) = 2n log1 n , are of formula size
at least 2n log1 n(1 (log log n)1) . In particular, almost all f Bn
are of formula size at least 2n log1 n(1 (log log n)1) .
We already proved in 2 an upper bound of 2n 3 on L(Bn ) . By
94
Lupanovs (k s)-representation we shall obtain for each f Bn a B2 formula with 6 logical levels and (2 + o(1))2n log1 n gates. Since in
formulas all gates have fan-out 1 we have to compute the functions
fi1w and fi2w in another way as in 1. As only a few inputs are mapped
to 1 by fi1w or fi2w , we consider functions f Bn where |f 1(1)| = r is
small.
DEFINITION 3.1 : L(r n) = max{L(f) | f Bn , |f 1(1)| = r} .
The obvious upper bound on L(r n) is rn 1 , consider the DNF.
This upper bound has been improved by Finikov (57) for small r .
LEMMA 3.2 : L(r n) 2n 1 + r2r1 .
Proof : We describe a function f Bn where |f 1(1)| = r by an r nmatrix consisting of the r inputs in f 1(1) . For small r , in particular
r log n , several columns of the matrix are the same. We make the
most of this fact.
We do not increase the formula size of f by interchanging the roles
of xj and xj . With such interchanges we obtain a matrix whose rst
row consists of zeros only. Now the number l of dierent columns in
the matrix is bounded by 2r1 . Let c1
cl be the dierent columns
and A(i) be the set of numbers j such that the j -th column equals ci .
Let a(i) be the minimal element of A(i) .
By denition, f(x) = 1 i x equals one of the rows of the matrix.
This can now be tested eciently. f1 tests whether for each i all
variables xj (j A(i)) have the same value, and afterwards f2 tests
whether x and some row agree at the positions a(i) (1 i l ). f2 is
the disjunction of r monoms of length l , hence L(f2) rl 1 , and
f1(x) =
xp
1il pA(i)
xp
pA(i)
(3.2)
95
(3.3)
= (2 + o(1))nr log1 n
96
L(fi2w )
L(r m)qi(r)
(3.4)
(rm 1)qi(r) +
r log2 m
m log2 m
rlog2 m
r
rqi(r)
(3.5)
Here the lower bound for almost all Boolean functions can be derived easily from the results on formula size by the following fundamental lemma. A partial converse is proved in Ch. 7.
97
Proof : W.l.o.g. we assume that |f 1(1)| 2n1 and use the DNF.
Otherwise we could use the CNF. All minterms can be computed in
parallel in depth log n . As the number of minterms is bounded by
2n1 , we can sum up the minterms in depth n 1 .
By Theorem 4.1 and Theorem 4.2 the Shannon eect is valid for Bn
and the depth of B2 -circuits. The trivial upper bound of Theorem 4.2
has been improved several times. We describe the history of these
improvements.
DEFINITION 4.1 : Let log(1) x = log x and log(k) x = log log(k1) x .
For x 1 , log x = 0 and , for x
1 , log x is the minimal k such
that log(k) x 1 .
log x is an increasing function tending to as x . But log x
is growing rather slowly. So log x 5 for all x 265536 .
Let a(n) be the following number sequence : a(0) = 8 , a(i) = 2a(i1) +
a(i 1) . Now we are able to describe the improved bounds on D(Bn ) .
Spira (71 b):
Muller and Preparata (71):
McColl and Paterson (77):
Gaskov (78):
n + log n .
n + i for n a(i) .
n + 1.
n log log n + 2 + o(1) .
98
mc (z) f(y c) =
c{0 1}m
(4.1)
c{0 1}m
99
t1
f(x) = 0
(5.1)
x 1 + + xn
t+1
f(x) = 1
(5.2)
n 2
n 2
and
(5.3)
(5.4)
n
(2n 2 + n2 2n5 n 2n4)
n 21
(5.5)
if n is even, and
|Mn | 2 2E(n) exp
n
(2(n+3) 2 n2 2n6 n 2n3)
(n 3) 2
n
(2(n+1) 2 + n2 2n4)
+
(n 1) 2
if n is odd.
By these results almost all monotone Boolean functions have only
prime implicants and prime clauses whose length is about n 2 . For
our purposes (see also 2 and 3) a good estimation of log |Mn | is
sucient. Such a result is much easier to prove.
PROPOSITION 5.1 : The number of monotone functions f Mn is
larger than 2E(n) where E(n) = nn2 is larger than c2n n1 2 for some
constant c 0 .
Proof : The estimation of E(n) follows from Stirlings formula. E(n) is
the number of dierent monotone monoms of length n 2 . For each
subset T of this set of monoms we dene fT as the disjunction of all
m T . Since no monom of length n 2 is a shortening of another,
100
101
(5.6)
xn ) =
(5.7)
xi(j) f(x1
i(m+1)
xm i(m + 1)
i(n))
i(n){0 1} i(j)=1
Step 2 : Compute for each block all monotone functions having only
prime implicants of that block, or equivalently, having at most one
prime implicant of each chain of the block.
102
103
(5.8)
(5.9)
1ji
104
(5.10)
CB
H(Cj) = E(m)
1jE(m)
(5.11)
1jE(m)
E(m)1(|Cj| + 1)
E(m) log
1jE(m)
(5.12)
(5.13)
(5.14)
(5.15)
Inserting (5.14) and (5.15) into (5.13) we have proved the following
upper bound on Cm (Mn ) .
THEOREM 5.3 : Each monotone function f Mn can be computed
by a monotone circuit of size O(2n n3 2 log n) .
105
(5.16)
f1 = h1 hn2
where Dm (hi ) i
(5.17)
We have seen that the Shannon eect is valid for Mn and monotone
depth. Henno (79) generalized McColls design to functions in mvalued logic. By a further improvement of Hennos design Wegener (82
b) proved the Shannon eect also for the monotone functions in mvalued logic and (monotone) depth.
106
The proof of the Shannon eect for various classes of functions and
complexity measures is a main subject in Russian language literature.
There also weak Shannon eects are studied intensively. The Shannon
eect is valid for Fn and CM i almost all functions f Fn have nearly
the same complexity k(n) and k(n) = CM(Fn ) is the complexity of the
hardest function in Fn . The weak Shannon eect concentrates on the
rst aspect.
DEFINITION 6.1 : The weak Shannon eect is valid for the class
of functions Fn and the complexity measure CM if CM(f) k(n)
o(k(n)) and CM(f) k(n) + o(k(n)) for almost all f Fn and some
number sequence k(n) .
The validity of the weak Shannon eect for Mn and circuit size
and monotone circuit size has been proved by Ugolnikov (76) and
Nurmeev (81) resp. It is not known whether the Shannon eect is
valid too. The complexity k(n) of almost all functions is not even
known for the (monotone) circuit size. How can the weak Shannon
eect be proved without knowing k(n) ? Often Nigmatullins (73)
variational principle is used. If the weak Shannon eect is not valid
there exist some number sequence k(n) , some constant c
0 , and
some subclasses Gn Hn Fn such that for innitely many n
|Gn | |Hn | = (|Fn |)
(6.1)
CM(f) k(n)
if f Gn
CM(f) (1 + c)k(n)
if f Hn
and
(6.2)
(6.3)
107
Nurmeevs proof is based on Korshunovs description of the structure of almost all f Mn (see 5). However, the proof is technically
involved.
n
k
(7.1)
108
xi gi (x)
(7.2)
1in
By Lemma 7.1 an O(n2 log1 n)-bound for H1n n implies the same
k
bound for H2n and O(nk log1 n)-bounds for Hk1
n n and Hn . Savage (74)
presented the following design of monotone circuits for f H1n n . The
variables are partitioned to b = n log n blocks of at most log n
variables each. All Boolean sums on each block can be computed with
less than 2 log n n -gates similarly to the computation of monoms
in 5. Now each fi H1n can be computed with b 1 -gates from
its subfunctions on the blocks. Since we have n outputs and b blocks,
we can compute f with (2b 1)n-gates. We combine this result with
Lemma 7.1.
THEOREM 7.2 :
Each k-homogeneous function f Hkn can be
computed by a (monotone) circuit of size (2+o(1)) nk log1 n , if k 2 .
We note that we have designed asymptotically optimal monotone
circuits for almost all f H1n n which consist of -gates only. In Ch. 6
we get to know some f H1n n such that optimal monotone circuits
for f have to contain -gates disproving the conjecture that Boolean
sums may be computed optimally using -gates only.
Similar results for formulas are much harder to achieve. Using
the MDNF of f each prime implicant requires its own -gate. We can
109
(7.3)
1mn 2
110
n
2
= n2 n and
m(i)
(7.5)
(7.6)
After the elimination of all Km(i1) m(i1) the graph has at most e(i 1)
edges, each Km(i) m(i) has m(i)2 edges, therefore the algorithm selects
at most e(i 1) m(i)2 copies of Km(i) m(i) . The cost of these subgraphs
altogether is bounded by 2e(i 1) m(i) . According to (7.4) and (7.6)
this can be estimated by 16 n2 (4i1 log n) . Summing up these terms
we obtain the upper bound of 22 n2 log1 n .
111
112
m log c 2r = (c log c)
(8.1)
(8.2)
For each f B2 there are control bits such that the universal gate
realizes f(x1 x2) by its DNF.
The construction of universal graphs is much more dicult.
Valiant (76 b) designed universal circuits of size O(c log c) if c n .
By the results of 4, Ch. 1 , we have to consider only circuits of fanout 2 . We have to simulate the gates G1
Gc of each circuit S
at the distinguished gates G1
Gc of the universal circuit S . The
distinguished gates of S will be universal gates, i.e. that the type of
Gi can be simulated. Since Gj may be a direct successor of Gi in S ,
if j i , we have to be able to transfer the result of Gi to any other
gate Gj where j
i . We design the universal circuit in such a way
that, if Gj is direct successor of Gi in S , we have a path from Gi to Gj
in S such that on all edges of this graph the result of Gi is computed.
Since the successors of Gi are dierent for dierent S , we use universal switches consisting of two true inputs x1 and x2 , one control bit
y , and two output bits computing
z1 = y x1 y x2
and z2 = y x1 y x2
(8.3)
In either case both inputs are saved, but by setting the control bit y
we control in which direction the inputs are transferred to. If y = 1 ,
z1 = x1 and z2 = x2 , and, if y = 0 , z1 = x2 and z2 = x1 .
113
114
Proof : According to our preliminary remarks we only have to construct graphs U(m) of size O(m log m) , fan-out and fan-in 2 for all
nodes, fan-out and fan-in 1 for m distinguished nodes, such that all
directed acyclic graphs of size m and fan-out and fan-in 1 can be simulated. For m = 1 a single node and for m = 2 two connected nodes
are appropriate. For m 4 and m = 2k + 2 , we use a recursive construction whose skeleton is given in Fig. 8.1. The edges are directed
from left to right. For m = m 1 we eliminate pm . The skeleton of
U(m) has to be completed by two copies of U(k) on {q1
qk} and
on {r1
rk} . Since the q- and r-nodes have fan-out and fan-in 1
in the skeleton of U(m) as well as in U(k) (by induction hypothesis),
these nodes have fan-out and fan-in 2 in U(m) . Obviously the distinguished nodes p1
pm have fan-out and fan-in 1 in U(m) . How
can we simulate directed acyclic graphs G of size m and fan-out and
fan-in 1 ? We understand the pairs (p2i1 p2i) of nodes as supernodes.
The fan-out and fan-in of the supernodes is 2 . By Lemma 8.1 we partition the set of edges of G to E1 and E2 . Edges (p2i1 p2i) in G are
simulated directly. The edges leaving a supernode, w.l.o.g. (p1 p2) ,
can be simulated in the following way. If the edge leaving p1 is in E1
and the edge leaving p2 is in E2 (the other case is similar), we shall
use edge disjoint paths from p1 to q1 and from p2 to r1 .
q1
p1
qk1
p2
qk
pm3 pm2
r1
rk1
Fig. 8.1
pm1 pm
rk
115
If the edges leaving p1 and p2 end in the supernodes p2i1 2i and p2j1 2j
resp. we shall take a path from q1 to qi1 and from r1 to rj1 resp. in
the appropriate U(k) . All these paths can be chosen edge disjoint by
induction hypothesis. Finally the paths from qi1 and ri1 to p2i1
and p2i can be chosen edge disjoint. Thus the simulation is always
successful.
Let C(m) be the size of U(m) . Then, by construction
C(m) 2 C( m 2 1) + 5 m
C(1) = 1
C(2) = 2
(8.4)
116
(8.5)
117
may be estimated by
O(dh1 c2 2h )
For h = log c the size is O(c3 d log1 c) .
(8.6)
EXERCISES
1. The Shannon eect is valid for Bn and C{ } .
2. Estimate C(Bn m) for xed m and discuss the result.
3. Prove Theorem 5.2.
4. Let h(n) = |(n)| . Apply Shannons counting argument to
C(n) (Bn) . For which h(n) the bounds are
a) non exponential b) polynomial ?
5. If f(x) = xi(1) xi(k) for dierent i(j) , then C(f) = Cm(f) =
L(f) = Lm (f) = k 1 .
6. The weak Shannon eect but not the Shannon eect is valid for
H1n and C .
7. Dene a class of Boolean functions such that Shannons counting
argument leads to much too small bounds.
118
119
120
121
where c = C(f) . Each gate has fan-in 2 implying that the number of
edges equals 2 c . Hence 2c n + c 1 and c n 1 .
122
The rst bound of size 2n O(1) has been proved by Kloss and
Malyshev (65). We discuss some other applications of the elimination
method.
DEFINITION 2.1 : A Boolean function f Bn belongs to the class
Qn2 3 if for all dierent i j {1
n} we obtain at least three dierent
subfunctions by replacing xi and xj by constants and if we obtain a
subfunction in Qn1
2 3 (if n 4) by replacing xi by an appropriate
constant ci .
Qn2 3 is dened in such a way that a lower bound can be proved
easily (Schnorr (74)).
THEOREM 2.1 : C(f) 2n 3 if f Qn2 3 .
123
23 .
The following lower bound for the storage access function SAn has
been proved by Paul (77). We repeat that SAn (a x) = x|a| .
THEOREM 2.2 : C(SAn ) 2 n 2 .
Proof : Replacing some address variable ai by a constant has the eect
that the function becomes independent from half of the x-variables.
Therefore we replace only x-variables, but this does not lead to storage
access functions. In order to apply induction, we investigate a larger
class of functions.
124
a 0 x0
xn1) = x|a|
if |a| S
(2.1)
125
The proof of Theorem 2.2 makes clear that type- gates are more
dicult to deal with than type- gates. By replacing one input of a
type- gate by a constant it is impossible to replace its output by a
constant. Therefore it is not astonishing that lower bounds for the
basis U2 = B2 { } are easier to prove. Schnorr (74) proved
that parity requires in U2 -circuits three times as many gates as in
B2-circuits.
126
xn y1
yn ) = 1 i (x1
xn) = (y1
yn )
(3.1)
(3.2)
(3.3)
The lower bound on C(fn=) follows from Theorem 1.1. The basis of
the induction for the lower bound on the U2-complexity is contained
in Theorem 3.1.
Now it is sucient to prove the existence of some pair (xi yi) of
variables and some constant c such that we may eliminate 4 gates if we
replace xi and yi by c . Similarly to the proof of Theorem 3.1 we nd
some variable z , whose replacement by a suitable constant c eliminates
3 gates. Afterwards the function is not independent from the partner
z of z . Replacing z also by c eliminates at least one further gate.
Redkin (73) proved that the { }-complexity of fn= is 5n 1 .
Furthermore parity has complexity 4(n 1) in { }-circuits and
complexity 7(n 1) in { }- or { }-circuits. Soprunenko (65)
investigated the basis {NAND} . The complexity of x1 xn is
2(n 1) whereas the complexity of x1 xn is 3(n 1) .
127
COROLLARY 4.2 :
and m 3 .
Proof : Exercise.
C(Cnk m) 2 5 n 0 5 m 4 if 0 k
128
(4.1)
(4.2)
by the elimination method. (4.1) can be derived from (4.2) by induction. For the proof of (4.2) it is sucient to prove that we either may
eliminate 3 gates by replacing some xi by a constant (the subfunction
of f obviously is in Sk1
n1 ) , or we may eliminate 5 gates by replacing
some xi and xj by dierent constants (in this case the subfunction of
f is in Sk1
n2 ) .
We investigate an optimal circuit for f Skn . Furthermore we
choose among all optimal circuits one where the number of edges leaving the variables is as small as possible. At the moment we may forget
this condition, it will be useful in the last case of the proof. We do
not replace more than 2 variables by constants. Since k 1 , n 5
and we cannot obtain constant subfunctions. Therefore gates being
replaced by constants have at least one successor which also can be
eliminated.
129
xj
xi
+
B1
E1
+
C1
E2
B2
C2
Eq
Bp
Cq
Fig. 4.1
130
xi
B1
xj
E
+
Fig. 4.2
131
xj
xi
A
E
C
xj
xi
A
E
+
xj
xi
A
G
D
D
Case 2.1
Case 2.2.a
Fig. 4.3
D
Case 2.2.b
132
(4.3)
= xj (xi xm 1 a b) xi xm a xm b xi a b c
If we replace xi by 0 and xm by 1 a b , resG and therefore the whole
circuit becomes independent from xj . But the subfunction f of f we
have to compute is not constant. Since f is symmetric, it depends
essentially on xj . Contradiction.
Case 3.2 : E is a gate. We rebuild the circuit as shown in Fig. 4.4.
The number of gates is not increased, but the number of edges leaving
xj
xi
A
Case 3
E
+
xj
xi
A
xm
+
Case 3.1
Fig. 4.4
xi
A
xj
E
+
Case 3.2
133
(4.4)
Similarly
resG = (xi resE d0) (xj resE d1) d2
(4.5)
5.5 A 3n - bound
(5.1)
q (x|a| x|b| )
For p = 0 we obtain the function considered by Paul (77). The
object of this section is the proof of the following theorem.
134
135
136
137
138
139
function f Bn on the working tape and can compute the lexicographical successor of f . We start with the lexicographically rst function,
the constant 0 . For each function f we produce one after another each
circuit of size 2nn1 1 and compare its output with f . If we nd
a circuit for f we produce the next Boolean function. Otherwise the
considered function f is equal to fn , and by table-look-up we compute
the output fn(x) . Therefore Turing machines with large resources are
powerful enough to construct hard Boolean functions. Nevertheless
explicitly dened Boolean functions are dened via Turing machines.
DEFINITION 6.1 : Let s : . The sequence fn Bn of Boolean
functions is s-explicitly dened if the language L = fn1 (1) can be den
140
C(gn) d 2n
(6.1)
n
dened by h1
n (1) = L {0 1} is bounded by O(t(n) log t(n)) .
Since the reduction machine M is p(n) time bounded for some polynomial p , also the length of its output is bounded by p(n) if the input
w has length n . We extend the outputs, if necessary, by blanks B
such that the output (for each input of length n) has length p(n) .
These outputs can be encoded by (a1 b1
ap(n) bp(n) ) {0 1}2 p(n)
where (ai bi ) is an encoding of the i -th output letter. By the result cited above all ai and bi can be computed by a circuit of size
O(p(n) log p(n)) .
141
(6.2)
y1 yp(n) = 1
The so constructed circuit for fn contains one copy of an optimal circuit
for gi (1 i p(n)) and beyond that q(n) gates for a polynomial q .
Therefore
2n n1 C(fn) q(n) +
C(gi)
(6.3)
1ip(n)
We dene i(n) {1
functions among g1
(6.4)
C(gi(n)) d 2(i(n))
for some d ,
0.
(6.5)
142
EXERCISES
1. Apply the simple linear bound (Theorem 1.1) to the functions
considered in Ch. 3.
2. Generalize the simple linear bound to -circuits where Br .
3. If f1
fm) m 1 + min{C(f1)
C(fm)}
143
xi(r)
1rm
xj(r) xk(r)
1rl
144
19. Apply Theorem 4.1 to Enk (see Def. 4.2, Ch. 3).
20. Complete the proof of Theorem 5.1.
21. Let fn (a b c r x) = xr|a| (x|b| x|c| ) (compare Def. 5.1). Modify
the proof of Theorem 5.1 and prove that C(fn) 3n 3 .
22. Dene a short 0-1-encoding of circuits. Describe how a Turing
machine with short working tape can simulate encoded circuits.
145
6. MONOTONE CIRCUITS
6.1 Introduction
146
147
At the end of this introduction we discuss some properties of monotone circuits. Why is it much more dicult to investigate { }circuits than to investigate monotone circuits ? If f Bn is given by
PI(f) it is a hard problem to compute PI(f) . If f g Mn are given
by PI(f) and PI(g) it is easy to compute PI(f g) and PI(f g) . By
denition
fg =
t
tPI(f)
t =
t PI(g)
(1.1)
tPI(f)PI(g)
(1.2)
t PI(f) PI(g) : t
t } (1.3)
fg=
(1.4)
t PI(g)
tPI(f)
tt
tPI(f) t PI(g)
and
PI(f g) = {t t | t PI(f) t PI(g)
u PI(f) u PI(g) : t t
(1.5)
uu }
148
(2.1)
= 2 (k 1)n k2
This result is interesting for small k . For large k we apply the duality principle for monotone functions. The dual function fd of f is
f(x1
xn ) . By the rules of de Morgan we obtain a monotone circuit for fd by replacing in a monotone circuit for f -gates by -gates
149
and vice versa. Obviously Cm (f) = Cm(fd ) . The dual function of Tnk
is Tnk (x1
xn ) . (Tnk )d computes 1 i at most k 1 of the negated
variables are 1 i at least n k + 1 of the variables are 1 . Hence
(Tnk)d = Tnnk+1 . We summarize our results.
PROPOSITION 2.1 : i) Cm (Tnk) = Cm (Tnnk+1) .
ii) Cm (Sn ) n(n 1) .
iii) Cm (Tnk ) (2k 1)n k2 .
The reader is asked to look for improvements of these simple upper
bounds. We present a sorting network (Batcher (68)) whose size is
O(n log2 n) and whose depth is O(log2 n) . This sorting network is
based on the sorting by merging principle. W.l.o.g. n = 2k .
ALGORITHM 2.1.a :
Input : Boolean variables x1
Output : Sn (x1
xn) = (Tnn(x1
xn .
xn )
Tn1(x1
xn)) .
xn
and for
150
lists. But only two rank numbers are possible for each element.
ALGORITHM 2.1.b :
Input : a1
Output : z1
am and b1
151
hence
M(m) = m log m + 1
DM(m) = DM(m 2) + 1 and
(2.2)
(2.3)
DM(1) = 1
hence
(2.4)
(2.5)
DM(m) = log m + 1
Let S(n) and DS(n) be the number of comparisons and the depth of
Batchers sorting algorithm for n = 2k .
S(n) = 2 S(n 2) + M(n 2) and S(1) = 0
hence
(2.6)
(2.7)
DS(1) = 0
hence
(2.8)
(2.9)
(2.10)
(2.11)
(2.12)
Dm (Mn ) log n
(2.13)
152
xj+1
xj where j = r 2i (0 r
2ki) and j = j + 2i are sorted.
In 4 we prove that Batchers merging algorithm is asymptotically
optimal. The upper bound for sorting has been improved by Ajtai,
Komlos and Szemeredi (83).
THEOREM 2.2 : One can design a sorting network of size O(n log n)
and depth O(log n) .
We do not present this AKS sorting network as it is rather complicated. Theorem 2.2 is a powerful theoretical result. But the AKS
sorting network beats Batchers sorting network only for very large n ,
in particular, only for n larger than in real life applications (see e.g.
Paterson (83)). The AKS sorting network sorts the input sequence but
no subsequence. It is an open problem whether Batchers algorithm
can be improved signicantly for small n .
For the majority function we do not know of any monotone circuit
of size o(n log n) . The monotone complexity of the majority function is
still unknown. For constant k we improve Proposition 2.1. We design
a monotone circuit of size kn + o(n) . This result has been announced
by Adleman and has been proved by Dunne (84). The conjecture that
kn o(n) monotone gates are necessary is also open.
THEOREM 2.3 : Cm (Tnk) kn + o(n) for constant k .
Proof : Tnk is the disjunction of all monotone monoms of length k .
If B1
Br form a partition of X = {x1
xn} the function
Trk(T1(B1)
T1(Br)) has only prime implicants of length k . The
number of prime implicants is large if all Bi are of almost the same
size. Certainly, we do not obtain all prime implicants of length k .
Therefore we use several partitions of X .
For the sake of simplicity we concentrate on those n where n = pk
for some natural number p . Let p(k) = pk and r = p(k 1) . We
153
Tk (X) =
p(k1)
Tk
1qk
Tp1(Bqp(k1) )
(2.14)
p(k1)
Cm (Tk ) k n + k Cm (Tk
k n 1 + (k p) + (k p)2 +
(2.15)
+ kk1(2 k p) = k n + o(n)
A number r {1
pk} or r {1
pk1} is represented by
a vector r = (r1
rk) {0
p 1}k or r = (r1
rk1)
{0
p 1}k1 . For 1 r pk1 the set Bqr includes the p variables
rq1 j rq
rk1) for some 0 j p 1 . It is
xi where i = (r1
q
obvious that the sets Br (1 r pk1) build a partition of X . We
claim that we nd for dierent j(1)
j(k) {1
pk} some q such
that xj(1)
xj(k) are in dierent sets of the q-th partition.
If xi and xj are in the same set Bqr , i and j agree at all but the q-th
position. If i = j and q = q , it is impossible that xi xj Bqr Bqr . This
proves the claim for k = 2 . For k 2 either q = k is appropriate or two
indices, w.l.o.g. j(k1) and j(k) , agree on all but the last position. We
obtain j (l ) by cancelling the last position of j(l ) . Then j (k1) = j (k)
and among j (1)
j (k) are at most k1 dierent vectors. We obtain
q
B r (1 q k 1) by cancelling the last position of all elements in
Bqr . By induction hypothesis we nd some q {1
k 1} such
that for l = m either j (l ) = j (m) or xj (l ) and xj (m) are not in the
same set B qr . This q is appropriate. If j (l ) = j (m) , j(l ) and j(m)
dier at the last position, hence xj(l ) and xj(m) are for l = m not in the
same set Bqr .
154
k1
2
(3.1)
2n 4 + log n .
155
) 3 5 n O(1) .
156
r 2.
157
xi
G1
xj
G2
Fig. 3.1
g
G3
G4
xj
G2
xk
G5
G3
xl
Fig. 3.2
G4
158
xn ) = Tn1
n2(x1
xn1)) (3.2)
xn = 0) and Cm (Tn1
n2) = Cm (T2 ) by the duality principle.
159
log i log(n!)
(4.1)
1in
The lower bound for the merging function has been proved by Lamagna (79).
160
exercise).
d(i j) e(j) log e(j) 2
log e(j)
+ e(j) =: t(j)
(4.2)
1ik
(4.3)
1ik
Cm(Mn )
(4.4)
t(j)
1jn
A replacement rule for monotone circuits is a theorem of the following type: If some monotone circuit for f computes at some gate
G a function g with certain properties and if we replace gate G by a
circuit for the monotone function h (depending perhaps on g), then
the new circuit also computes f . If h is a constant or a variable,
the given circuit is not optimal. In particular, we obtain results on
161
the structure of optimal monotone circuits. Later we apply replacement rules also in situations where h is a more complicated function.
Nechiporuk (71) and Paterson (75) used already replacement rules.
Mehlhorn and Galil (76) presented the replacement rules in the generalized form we discuss here.
It is easy to verify the correctness of the replacement rules, but it
is dicult and more important to get a feeling why such replacement
rules work. Let g be computed in a monotone circuit for f . If t
PI(g) but t t PI(f) for all monoms t (including the empty monom),
t is of no use for the computation of f . At -gates t can only be
lengthened. At -gates either t is saved or t is eliminated by the law
of simplication. Because of the conditions on t we have to eliminate t
and all its lengthenings. Hence it is reasonable to conjecture that g can
be replaced by h where PI(h) = PI(g) {t} . If all prime implicants
of f have a length of at most k and if all prime implicants of g have
a length of at least k + 1 , we can apply the same replacement rule
several times and can replace g by the constant 0 .
THEOREM 5.1 : Let f g Mn and t PI(g) where t t PI(f)
for all monoms t (including the empty monom). Let h be dened by
PI(h) = PI(g) {t} . If g is computed in some monotone circuit for f
and if we replace g by h the new circuit also computes f .
Proof : Let S be the given circuit for f and let S be the new circuit
computing f . By denition h g . Hence by monotonicity f f . If
f = f we choose some input a where f (a) = 0 and f(a) = 1 . Since
we have changed S only at one gate, h(a) = 0 and g(a) = 1 . Since
g = h t , t(a) = 1 . Let t be a prime implicant of f where t(a) = 1 .
We prove that t is a submonom of t in contradiction to the denition
of t .
Let xi be a variable in t . ai = 1 since t(a) = 1 . Let bi = 0 and
bj = aj if j = i . Obviously b a and t(b) = 0 . Hence f (b) = 0 ,
h(b) = 0 and g(b) = 0 . For input b the circuits S and S compute the
162
same output since they compute the same value at that gate where
they dier. Hence f(b) = f (b) = 0 . In particular t (b) = 0 . Since b
and a dier only at position i , xi is a variable in t and t is a submonom
of t .
(5.1)
(5.2)
163
si k1 1 h1 C{}(f)
(si 1)
(6.1)
1in
Proof : The upper bound is obvious. For the lower bound we consider
an optimal {}-circuit for f . The only functions computed in {}circuits are Boolean sums, since constants may be eliminated. At least
si 1 gates are necessary for the computation of fi and at least si k1
1 of the functions computed at these gates are Boolean sums of more
than k summands. We only count these gates. By Denition 6.1 such
a gate is useful for at most h outputs. Hence the lower bound follows.
164
One might conjecture that -gates are powerless for Boolean sums.
This has been disproved by Tarjan (78).
THEOREM 6.1 : Let f M11 14 be dened by
f1 = p y , f2 = q z ,
f 5 = x1 y , f 6 = x1 x 2 y ,
f 8 = x1 z , f 9 = x1 x 2 z ,
f11 = p u x1 x2 x3 y ,
f13 = r w x1 x2 x3 y ,
Then Cm (f) 17
f3 = r y ,
f4 = s z ,
f 7 = x1 x 2 x 3 y ,
f10 = x1 x2 x3 z ,
f12 = q u x1 x2 x3 z ,
f14 = s w x1 x2 x3 z .
18 = C{}(f) .
g = f7 f10 = x1 x2 x3 yz
(6.2)
Then
f11 = f1 (g u) f12 = f2 (g u)
(6.3)
f13 = f3 (g w) f14 = f4 (g w)
One -gate and 6 -gates are sucient for the computation of
f11
f14 if f1
f10 and the variables are given. Hence Cm (f) 17 .
Obviously C{} (f) 18 . For the lower bound it is sucient to
prove that 8 -gates are necessary for the computation of f11
f14 if
f1
f10 and the variables are given and if -gates are not available.
This proof is left as an exercise.
17 Cm (f)
(6.4)
165
(h h )1
1in
si k1 1 Cm (f) C{}(f)
(si 1) (6.5)
1in
(6.6)
166
or fj = t hj
for j {1
l}
(6.7)
(6.8)
167
1
j!
n22
(j+1)
z(n j)
(j 1)1 j n21 j +
t1
n
2
(6.9)
q
0100
0
0 0
0 0
0010
0
Mb d = 0 0
(6.10)
q b d mod p
0
0
0
0
1
1 0
0000
0
00
1000
0
The corresponding Boolean sums are (1 1)-disjoint. Otherwise we
could nd some ik = ak + bk p and jl = cl + dl p (1 k , l 2) such
that i1 = i2 j1 = j2 and M(ik jl ) = 1 .
168
By denition
cl ak + bk dl mod p for 1 k , l 2 and
(6.11)
(6.12)
xn1 y0
yn1) =
xi yj
(0 k 2n 2)
(7.1)
i+j=k
xi 2ki
0i n
and y =
yi 2ki
0i n
169
1 2
ri
(7.2)
1in
COROLLARY 7.1 : The monotone complexity of the Boolean convolution is at least n3 2 while its circuit complexity is O(n log2 n log log n) .
Proof of Theorem 7.1 : If we replace x1 by 0 we obtain a subfunction
of f which is a semi-disjoint bilinear form with unchanged parameters
r2
rn . Therefore it is sucient to prove that we can eliminate at
1 2
least r1 gates if x1 = 0 . Let s1 be the number of outputs depending
essentially on x1 and consisting of only one prime implicant. If x1 = 0
we can eliminate those s1 -gates where such outputs are computed.
In the following we eliminate at least (r1 s1 )1 2 -gates, altogether
1 2
at least r1 gates.
We consider only outputs fk that depend essentially on x1 and have
more than one prime implicant. Let G0
Gm be a path from x1 to
fk . This path includes at least one -gate Gl where xi yj for some i = 1
and some j is an implicant of the computed function gl . Otherwise
each prime implicant of gl would include either x1 or two x-variables or
170
171
zij =
xik ykj
(1 i I , 1 j J)
(8.1)
1kK
172
For the lower bound on the number of -gates we apply the second
replacement rule of Theorem 5.2.
DEFINITION 8.1 : The variables xik (1 i I) and ykj (1 j J)
are called variables of type k .
173
Several new methods for the proof of lower bounds on the complexity of monotone circuits have been developed during the investigation
of the following generalized Boolean matrix product (Wegener (79 a)
and (82 a)). Let Y be the Boolean matrix product of matrix X1 and
the transposed of matrix X2 . yij = 1 i the i -th row of X1 and the
j -th row of X2 have a common 1. This consideration of the Boolean
matrix product can be generalized to a direct matrix product of m
M N-matrices X1
Xm . For each choice of one row of each matrix the corresponding output computes 1 i the chosen rows have a
common 1 .
m
DEFINITION 9.1 : The generalized Boolean matrix product fMN
is a
monotone function on m M N variables with Mm outputs (m M 2).
The variables form m M N-matrices. xihl is the element of the i -th
matrix at position (h l ) . xihl is a variable of type l .
For 1 h1
yh1
(h1
hm =
hm M let
x1h1l x2h2l
1l N
hm l ) = x1h1 l
xm
hm l
xm
hm l is a prime implicant of type l .
(9.1)
174
m
THEOREM 9.1 : Cm (fMN
) N Mm (2 + (M 1)1) 3 N Mm .
2im
(9.2)
Afterwards each output can be computed with N 1 -gates.
Proof : Let t1 =
kA
xki l and t2 =
k
xkj l . By denition of A , t1
kA
175
m
We interpret this lemma. A monom is useful for some fMN
if it
m
is a (not necessarily proper) shortening of a prime implicant of fMN
.
If g includes several useful monoms of type l , we can replace g by
g t where t is the common part of all useful monoms of type l , i.e.
t is the monom consisting of all variables which are members of all
useful monoms of type l . The replacement of g by g t should not
cause any costs. Therefore we do not count -gates and assume that
all monoms of less than m variables are given as inputs. The length
of a useful monom is bounded by m and the common part of dierent
useful monoms includes less than m variables.
176
m
m
m
Cm (fMN
) Cm (fMN
) Cm(fMN
)
(1 2) N Mm if
vG (h1
1h1
hm M 1l N
hm l ) 1
(9.3)
At each gate we can distribute at most the value 1 among the prime
implicants. This ensures that for an optimal -circuit S
177
m
v(G) Cm (S) = Cm (fMN
)
v(S) :=
(9.4)
G -gate in S
m
is a lower bound on Cm(fMN
) . Finally we prove the necessity of giving
value 1 2 to each prime implicant, i.e. we prove for all (h1
hm l )
that
v(h1
hm l ) :=
hm l )
vG (h1
G -gate in S
1 2
(9.5)
hm l )
v(h1
1h1
hm M 1l N
(9.6)
(1 2) N Mm
and
(9.7)
m
).
vG (t) = 0 for all other t PI(fMN
Obviously
v (G) :=
vG (h1
1h1
hm M 1l N
hm l ) 1 2
(9.8)
178
1
1
+
if t PI(g) , t PI(g ) PI(g ) , these prime implicants
2q 2q
are created at G ,
1
if t PI(g) PI(g ) , t PI(g ) ,
2q
1
if t PI(g) PI(g ) , t PI(g ) , these prime implicants are
2q
preserved at G ,
0 if t PI(g) or t PI(g) PI(g ) PI(g ) .
179
Let s1
sD be the inputs of S(t) leading directly into an -gate.
1 2
(9.9)
for bi = vG(i)
(t) . W.l.o.g. b1 bD . We choose some wi PI(si )
such that some proper lengthening wi of wi is prime implicant of
resG(i) , vG(i)
(wi)
0 , and the type of wi diers from the types of
w1
wi1
. We can always choose w1 = t . If the choice of wi
b1 + + bD i bi
1 2
(9.10)
wD s1
sD
(9.11)
We claim that
s1
sD yt
(9.12)
w D yt
(9.13)
180
L M , x1
1 (1)
n
x1
n (1) {0 1} L .
N L is
181
Mt Nt L :
M f 1 (1)
(10.1)
(Mi Ni ) and
1it
f 1(1) M
(Mi Ni )}
1it
M = Mt
Nt = Mt Nt (Mt Nt)
(Mi Ni ) = f 1(1)
1it
(10.2)
(Mi Ni)
1it
182
Nt
(Mi Ni)
(Mi Ni) M
1it
M = Mt
(Mi Ni)
1it
Nt
(10.4)
Mt Nt (ft1(1) gt1(1))
f 1 (1)
(10.3)
1it1
(Mi Ni )
1it1
(Mi Ni ) and
1it
(Mi Ni )
(10.5)
1it1
= Mt
Nt
(Mi Ni) = M
1it
(Mi Ni)
1it
(10.6)
(Mi Ni)
(10.7)
We try to better understand computations in L . Exactly the functions f where f 1 (1) L have CL -complexity 0 . Again we have to
choose an appropriate class of functions which are given as inputs. If
f 1(1) L , then f 1 (1) cannot be computed in an L-circuit. (10.1)
describes suciently good approximations M of f 1(1) . We do not
demand that the approximating sets M , Mi and Ni can be computed
in an L-circuit of size t .
How should we choose a lattice L where it can be proved that CL (f)
is large ? If L is too dense, we can choose M in such a way that
f 1(1) M and M f 1(1) are small sets. Then these sets can be
183
184
W1
Wr imply W (W1
Wr W) if |W| l , |Wi | l for
1 i r and Wi Wj W for all i = j .
Wr
W for some W1
ii) A implies W (A
A.
W) if W1
iii) A is closed if (A
W) implies (W A) .
Wr
185
B = AB
B = (A B)
LEMMA 11.2 : (L
and
(11.1)
(11.2)
(11.3)
) is a legitimate lattice.
Before we estimate the L-complexity of clique functions, we investigate the structure of closed systems A . If B A , also all B B
where |B | l are elements of A . Obviously B
B B . This observation allows us to describe closed systems by their minimal sets,
namely sets B A where no proper subset of B is included in A . A
consists of its minimal sets, and all sets of at most l elements including a minimal set. Later we shall establish relations between prime
186
(11.4)
Otherwise we choose
FC has property P(r 1 k |C|) .
W W1
Wr1 FC , U
W such that Wi Wj U for i = j .
Let W = W C F , U = U C
W , Wi = Wi C F for
1 i r 1 and Wr = D F . Since C D , Wi Wj U for i = j
in contradiction to the assumption that F has property P(r k) . By
induction hypothesis |FC | (r 2)k|C| . Since D is xed, the condition W D = C is fullled for only one set C . If W D = C = W D
but W = W , also W C = W C . Finally we make use of the fact
that |D| k , since D F .
187
|F| =
|FC |
CD
0ik
0i|D|
|D|
ki
i (r 2)
(11.5)
k
ki
= (r 1)k
i (r 2)
g(g 1) (g l + 1)
1
gl
(11.6)
188
=
1ir
(11.7)
(1 Pr(Ei | E))
1ir
0j
g p(i) j
g
q(i)
0j
gj
g
=
l g
(g l + 1)
gl
(11.8)
LEMMA 11.5 :
of V . Then
Pr(G(h) C
C )n
(g l + 1)
gl
(11.9)
(g l + 1)
gl
(11.10)
Let Wi be the chosen set for the construction of Ci out of Ci1 . G(h)
contains a clique on D i D is colored. G(h) is not in Ci1 i all sets
in Ci1 are not colored. G(h) is in Ci i some set in Ci is colored.
By construction Wi has to be colored. Furthermore by construction
Ci1 Wi , i.e. B1
Br Wi for sets Bj Ci1 . Altogether the
event G(h) Ci Ci1 implies that B1
Br are not colored
but Wi is colored. The probability of this event has been estimated in
Lemma 11.4.
189
n
m(r 1)
and
(11.11)
(l +1) 2
(m1 2 +1) 2
4m3 2 log n
3
(11.12)
Proof : The rst inequality of (11.11) follows from Theorem 10.1. The
last inequalities of (11.11) and (11.12) follow from easy calculations.
The lower bound on CL(cl n m ) is still to be proved.
Let t = CL (cl n m) and let M M1 N1
Mt Nt be sets fullling
the conditions of (10.1) , the denition of CL . By denition we can
choose closed systems A A1 B1
At Bt in V(l ) where M = A ,
Mi = Ai and Ni = Bi .
Case 1 : M is not the set of all graphs.
We consider those mn graphs which contain exactly the edges of
an m-clique, i.e. the graphs corresponding to prime implicants. Each
of these graphs is by (10.1) contained in M or some (Mi Ni ) . We
prove that M can include at most half of these graphs, and that each
-set can include at most 4 (m(r 1) n) (l +1) 2 mn of these graphs.
In order to cover all m-cliques, it is necessary that
4
m(r 1)
n
(l +1) 2
n
1 n
t+
m
2 m
n
m
(11.13)
190
nk
mk
(11.14)
By elementary calculations
1 2 . Hence
(r 1)k
2kl
nk
mk
n
m
nk
mk
(r 1)k
2kl
n
m
n
(m n)k
m
(1 2)k (1 2)
2kl
(11.15)
n
m
Ni )
(11.16)
= A i Bi A i Bi
If the m-clique on Z belongs to (Mi Ni ) we can nd some minimal
set U Z in Ai and some minimal set W Z in Bi , but no subset of
Z belongs to Ai Bi . Since Ai and Bi are closed, U W Z will be
in Ai Bi if |U W| l . Hence |U W| l , and one of the sets U or
W includes at least (l + 1) 2 elements. If the m-clique on Z belongs
to (Mi Ni) , a minimal set of Ai or of Bi with at least (l + 1) 2
elements is included in Z . In the same way as before we can estimate
the number of m-cliques in (Mi Ni) by
191
2(r 1)
(l +1) 2 kl
nk
mk
n
2
m
m(r 1)
n
(l +1) 2
0i
1
2
(11.17)
n
=4
m
m(r 1)
n
(l +1) 2
Ni ) (Mi Ni )
(11.18)
= (Ai Bi ) Ai Bi = Ci Ci
Let h be a random (m 1)-coloring of V , then G(h) is a complete
(m 1)-partite graph. By Lemma 11.5 , the denitions of l and r and
an elementary calculation
Pr G(h) Ci Ci
(11.19)
nl 1 (m 1) (m l ) (m 1)l
m 2 m
= n
and
Pr 1 i t : G(h) Ci Ci
t n
(11.20)
Since all complete (m1)-partite graphs are in the union of all -sets,
n
m(r 1)
(l +1) 2
(11.21)
192
By similar methods Alon and Boppana (85) proved also the following bounds.
DEFINITION 12.1 : cl n p q is the class of all monotone functions
f MN (where N = n2 ) such that f(x) = 0 if G(x) (see Def. 11.1)
contains no p-clique and f(x) = 1 if G(x) contains a q-clique.
THEOREM 12.1 : Let f cl n p q , 4 p q and p q n (8 log n)
for p = p1 2 . Then
1
Cm(f)
8
4p q log n
(p +1) 2
1 (p +1)
2
8
(12.1)
(12.2)
2) m 3
(12.3)
193
zA B yA B
(12.4)
AB
yA
yA yB
(12.5)
We know now that C(f) = o(Cm(f)) for sorting, Boolean convolution, Boolean matrix product, and certain clique functions. The lower
bounds known on Cm (f) C(f) are polynomial. Can this quotient be
superpolynomial or even exponential ? To answer this question we
consider perfect matchings.
DEFINITION 12.2 : PMn is dened on n2 variables xij (1 i j n) .
194
PMn (x) =
x1 (1) xn (n)
(12.6)
195
196
(13.1)
f(a) = f (a) , since we do not have changed the circuit for these inputs.
Altogether f = f .
(13.2)
All f Mn have pseudo complements, but these pseudo complements may be hard to compute. Slice functions have pseudo complements which are easy to compute. Moreover, slice functions set up a
basis of all monotone functions (Wegener (85 a) and (86 b)).
DEFINITION 13.3 : The k -th slice f k of f Bn is dened by
f k = (f Enk ) Tnk+1 = (f Tnk ) Tnk+1
THEOREM 13.2 : i) C(f) C(f 0
f n ) + O(n) .
(13.3)
197
f n ) C(f) + O(n) .
ii) C(f 0
iii) Cm (f 0
f=
(13.4)
0kn
f n)
O(C(f k)) + Cm (0
n) where
0kn
Cm (0
n) = Cm (Tn1
k (Xi ) | 1 i n 0 k n) .
198
Tpi1(x1
0pk
xi ) = Tpi1(x1
xi1) Tni
kp (xi+1
i1
xi1) Tp1
(x1
xn) and
xi1) xi
(13.5)
(13.6)
199
200
and zij =
i
ym
(13.7)
m=j
All yji can be computed with n gates, afterwards all zij for xed i can
be computed with at most 3n(i) 3 gates. Altogether we can compute
all yji and zij with at most 5n k 3 gates. yji and zij are almost pseudo
inputs and pseudo complements for xij with respect to functions in Fkn .
THEOREM 13.4 : All yji and zij can be computed with O(n) gates. If
we replace in a standard circuit for f Fkn xij by yji and xij by zij , then
f is replaced by some function f where f = f Tnk+1 . Hence
Cm(f) = O(C(f)) + O(n) + Cm (Tnk+1)
(13.8)
201
(13.9)
202
Dunne (84) applied the concept of slice functions and proved for
the most complex (n k)-homogeneous functions f Hnk
(see Ch. 4,
n
7) that C(f) = (Cm(f)) . More precisely, Cm (f) = O(C(f))+O(nk1)
for f Hnk
and constant k .
n
The monotone complexity of the set of all pseudo complements has
been determined by Wegener (86 b).
THEOREM 13.6 : Cm (0
n) = (n2) .
(13.11)
(13.12)
203
204
m
2
N
N
N
N
cl M
n m = (cl n m EM ) TM+1 = (cl n m TM ) TM+1
(14.1)
DEFINITION 14.2 : f = f
n 2
n 2
2
, cl ln n
= TN
l +1 for N =
n 2
2
(14.2)
n
2
This theorem due to Dunne (84) implies that the central slice of
cl n n 2 has polynomial (monotone) circuits i cl n n 2 has polynomial
circuits. Furthermore cl n n 2 is an NP-complete predicate, since the
reduction in the following proof can be computed in polynomial time.
Proof of Theorem 14.2 : It is sucient to prove that cl ln n 2 is a subfunction of cl 5n 5n 2 . We denote the vertices by v1
vn w 1
w4n
and replace all variables corresponding to edges which are adjacent
205
4n
2n
+ 4n2
2n2 2n = 8n2 3n
2
2
(14.3)
2
2
edges. Let r = 5n
2 l 2n
2
2 +2n . Then 0 r 8n 3n . We
decide that exactly r of the 8n2 3n variable edges exist. Altogether
the graph contains now 5n
2 l edges.
2
2
Results similar to those in Theorem 14.1 and 14.2 can also be
proved for other NP-complete predicates (see Exercises). In order to
obtain more results on the complexity of the slices of some function,
we compare the complexity of the k-slice with the complexity of the
(k + 1)-slice of some function f. Dunne (84) proved that f k+1 is not
much harder than f k , whereas Wegener (86 b) proved that f k+1 may
be much easier than f k .
206
(14.4)
1in
n1
THEOREM 14.4 :
Let c(k n) = n1
. There are
k1 log k1
functions f Mn with prime implicants of length k only, such that
Cm(f k ) = (c(k n)) and Cm(f l ) = O(n log n) for l k .
Proof : Let Fk n be the set of all functions f such that all prime
implicants have length k and each monom of length k not including
x1 is a prime implicant of f . Then f Fk n is dened by a subset
of all monoms of length k including x1 . Hence log |Fk n | = n1
.
k1
By Shannons counting argument Cm (f k ) = (c(k n)) for almost all
f Fk n .
207
We are not able to prove non linear lower bounds on the circuit
complexity of explicitly dened Boolean functions. For monotone circuits we know several methods for the proof of lower bounds, and for
slice functions f lower bounds on Cm (f) imply lower bounds on C(f) .
Hence we should apply our lower bound methods to slice functions.
The reader should convince himself that our bounds (at least in their
pure form) do not work for slice functions. In this section we discuss some particularities of monotone circuits for slice functions f and
present some problems whose solution implies lower bounds on Cm (f)
and therefore also on C(f) (Wegener (85 a) and (86 b)).
Let PIk (g) be the set of prime implicants of g whose length is k .
Let Mkn be the set of f Mn where PIl (f) = for l k . Mkn includes
all k-slices.
LEMMA 15.1 : Let S be a monotone circuit for f Mkn . If we replace
the inputs xi by yi = xi Tnk , then the new circuit S also computes f .
The proof of this and the following lemmas is left to the reader.
For the computation of the pseudo inputs n + Cm (Tnk) gates suce.
All functions computed in S are in Mkn . In the following sense slice
functions are the easiest functions in Mkn .
208
(15.1)
(15.2)
Set circuits form the principal item of monotone circuits and circuits for slice functions. So we obtain a set theoretical or combinatorial
representation of circuits.
For the classes of functions Fkn and Gkn (see Def. 13.4 and 13.5) we
can use the pseudo inputs yji and in set circuits the sets Yji = PI(yji ) ,
the prime implicants in Yji all have exactly one variable of each class
209
Xi . This holds for all gates of a set circuit with inputs Yji . This leads
to both a geometrical and combinatorial representation of set circuits.
The set of monoms with exactly one variable of each class Xi
(1 i k) can be represented as the set theoretical product
Q = {1
n(i)} where (r(1)
r(k)) Q corresponds to the
1in
monom x1r(1)
210
4. Gij =
i+j=l
8. Kl =
i+j=l
211
n = 16 , m = 4 , k = 12 , h(k) = 3 ,
d1(k) = 8 , d2(k) = 4 ,
T12 = (G3 K8) (G4 K4)
Fig. 15.1
yn) . For f1
gi (x y)
fn Mn we
(15.3)
1in
gn) = Cm (f1
fn ) + n .
gn ) + n 1 .
Proof : The upper bounds follow from the denition and the lower
bound can be proved by the elimination method.
For many functions, in particular those functions we have investigated in 4 9 , we suppose that equality holds in Lemma 15.4 ii.
Let us consider k-slices F1
Fn Mn . Then Fi = Fi Tnk+1 where
PI(Fi ) = PIk (Fi ) . Since yi Fi is in general no slice, we dene the
(k + 1)-slices G G1
Gn by
212
Gi (x y) = Gi (x y) T2n
k+2 (x y) where
Gi (x y) = yi Fi (x)
(15.4)
and G(x y) =
Gi (x y)
(15.5)
1in
Fn )+Cm(T2n
k+2)+2n .
Gn ) + n 1 .
C(G1
Gn ) C(G) + C (T2n
k+2) + 2n for
(15.6)
2n
yj Fj (x) T2n
k+2 (x y) yi Tk+2(x y)
=
1jn
= (Fi (x) yi )
2n
Fj (x) yj yi (T2n
k+2(x y) yi ) Tk+2 (x y)
j=i
213
{n} .
Operations : , (binary).
Outputs : H1
Problem : SC(H1
Hn where Hi = {j | xj PI(fi )} .
Hn ) = ? ( =
(|Hi | 1) ?) .
1in
214
EXERCISES :
1. Let f Mn be in MDNF. If we replace by and by + ,
we obtain the polynomial p(f) . Then Cm(f) C{+ }(p(f)) , but
equality does not hold in general.
2. If a sorting network sorts all 0-1-sequences, it also sorts all sequences of real numbers.
3. Cm (Tn2) = log n .
4. Cm (f) = 2 n for f(x1
x3n) =
215
13. Let f be the Boolean sum of Theorem 6.1. Then Cm (f) = 17 and
C{}(f) = 18 .
14. Which is the largest lower bound that can be proved using (6.9)
and Theorem 6.2 ?
16. For a semi-disjoint bilinear form f let G(f) be the bipartite graph
including the edge (r s) i xr ys PI(fk) for some k . Let V(r s)
be the connected component including the edge (r s) . f is called
a disjoint bilinear form i V(r s) and PI(fk) have at most one edge
(or prime implicant) in common.
a) The Boolean matrix product is a disjoint bilinear form.
b) The Boolean convolution is not a disjoint bilinear form.
| PI(fk )| .
1km
216
25. Design
v(h1
large).
m
with N Mm -gates where
for fMN
i+1
for all prime implicants (i xed, m
l) =
2i
-circuits
hm
1
.
2
b) For almost half of the gates v(G) = 0 , for the other gates
v(G) = 1 .
a) For almost all gates v(G) =
217
PI(g) is
3
for fM2
).
218
We investigate the relations between the complexity measures circuit size C , formula size L and depth D . A more intensive study of
formulas (in Ch. 8) is motivated by the result that D(f) = (log L(f)) .
For practical purposes circuits of small size and depth are preferred. It
is an open problem, whether functions of small circuit size and small
depth always have circuits of small size and depth. Trade-os can be
proved only for formula size and depth.
(1.1)
For a basis
k() =
D (sel) + 1
log 3 1
(1.2)
The variable x decides which of the variables y or z we select. If sel is computable over , k() is small. In particular,
k(B2) = 3 (log 3 1) 5 13 . Obviously sel is not monotone. Let
sel (x y z) = y xz . Then sel = sel for all inputs where y z .
219
(1.3)
(1.5)
(1.6)
220
(1.7)
(1.8)
(1.9)
(1.10)
L (f2 i) l 2 l 0 2l 2 3 (2l 1) 3
(1.11)
Hence
D (f2) D (sel) + k() log((2l 1) 3 + 1)
(1.12)
(1.13)
= k() log(l + 1)
(1.14)
221
(2.1)
222
(2.2)
(2.3)
Hence, by denition,
D(A(d)) d
(2.4)
(2.5)
(2.6)
223
(2.7)
We claim that
2w + m z + 2 for w = max{x y}
(2.8)
hence y x + 2
(2.9)
(2.10)
r 2
D(y)} + 1 + m
(2.11)
(2.12)
D(w 1) 2(w 1)
(2.13)
224
2A( r 2 ) + r 2 2
(2.16)
H(r) + 1 + log(H(r) + 1)
(2.17)
We claim that A(r) H(r) for some appropriate k and all r . Parameter k is chosen such that A(r) H(r) for r R . If r R by (2.15) ,
the induction hypothesis and (2.17)
A(r) + 1 + log(A(r) + 1)
2 H( r 2 ) + r 2 2
2 A( r 2 ) + r 2 2
(2.18)
H(r) + 1 + log(H(r) + 1)
(2.19)
(2.20)
(2.21)
(2.22)
The largest dierences known between L(f) and C(f) are proved in
Ch. 8. For a storage access function f for indirect addresses C(f) =
(n) but L(f) = (n2 log1 n) and for the parity function g and =
{ } C (g) = (n) but L (f) = (n2 ) .
225
A circuit is ecient if size and depth are small. For the existence of
ecient circuits for f it is not sucient that C(f) and D(f) are small.
It might be possible that all circuits of small depth have large size
and vice versa. In Ch. 3 we have designed circuits of small depth and
size, the only exception is division. We do not know whether there
is a division circuit of size O(n log2 n log log n) and depth O(log n) .
For the joint minimization of depth and circuit size we dene a new
complexity measure PCD (P = product).
DEFINITION 3.1 : For f Bn and a basis
PCD (f) = min{C(S) D(S) | S is an -circuit for f} and
(3.1)
(3.2)
226
computations. For VLSI chips (see also Ch. 12, 2) one tries to minimize the area A of the chip and simultaneously the cycle length T .
It has turned out that AT2 is a suitable joint complexity measure.
By information ow arguments one can prove for many problems,
among them the multiplication of binary numbers, that AT2 = (n2) .
Since for multiplication A = (n) and T = (log n) chips where
AT2 = O(n2 ) may exist only for (log n) = T = O(n1 2) . Mehlhorn
and Preparata (83) designed for this range of T VLSI chips optimal
with respect to AT2 . The user himself can decide whether A or T
is more important to him. We are far away from similar results for
circuit size and depth.
xn y1
yn ) =
xi yi yn
(4.1)
1in
227
(4.2)
2+1
yn ) fn 2(x y )
(4.3)
L(F1 ) = 2
hence
(4.4)
D(F1 ) = 1
(4.5)
L(Fn ) = (1 2) n log n + 2 n
D(Fn ) = max{D(Fn 2 ) log(n 2)} + 2
hence
D(Fn ) = 2 log n + 1
The following trade-os have been proved by Commentz-Walter (79) for the monotone basis and by Commentz-Walter and Sattler (80) for the basis { } .
THEOREM 4.2 : PLDm (fn )
1
n log2 n .
128
228
1
n log n log log n(log log log log n)1
8
Proof : Obvious.
yn1 x2
xn) = fn1(y1
yi xi+1 xn =
1in1
yn1 x2
xn )
(4.6)
(yi xi+1 xn )
1in1
xn y1
yn1 1)
LEMMA 4.3 :
d1+s1
d1
Proof : Elementary.
d1+s
d1
d+s
d
if s 1 .
229
d+s
d
| ds p} .
Then
Proof :
By Stirlings formula we can approximate n! by
(2)1 2 nn+1 2 en , more precisely, the quotient of n! and its approximation is in the interval [1 e1 (11n)] . Hence
d+s
d
log
1
1
d+s
1
log e log(2) + log
11(d + s)
2
2
ds
(4.7)
= (d + s)
= (d + s) H
d
d
s
s
log
log
d+s
d+s d+s
d+s
s
d
d+s d+s
where
H(x 1 x) = x log x (1 x) log(1 x)
(4.8)
d+s
d
2 d log
d+s
d
= 2 d log 1 +
(4.9)
s
d
(4.10)
1 2
log(1 + ) | 1}
(4.11)
230
0 s
n log2 t(d s)
n log2 n
128
128
It is easier to work with t(d s) than with PLDm . The main reason
is that depth and formula size are both bounded independently. s is a
bound for the average number of leaves of the formula labelled by xi or
yi . It would be more convenient to have bounds on the number of xi and yj -leaves (for xed i and j) and not only on the average number.
Let t (d s) be the maximal n such that there is a monotone formula
Fn for fn such that D(Fn ) d and for i j {1
n} the number of
xi - and yj -leaves is bounded by s . Obviously t (d s) t(d s) . We
prove an upper bound on t(d s) depending on t (d s) , and afterwards
we estimate the more manageable measure t (d s) .
LEMMA 4.6 : t(d s) 3 t (d 6s) .
231
(4.13)
LEMMA 4.7 : t (d s)
d+s
d
1.
This lemma is the main step of the proof. Again we rst show that
this lemma implies the theorem.
Proof of Lemma 4.5 : By Lemma 4.7 t (d s) is bounded by m(ds)
for the function m of Lemma 4.4. Hence
log t(d s) log t (d 6s) + log 3 2 (6 ds)1 2 + log 3
8 (ds)1
(4.14)
For the proof of Lemma 4.7 we investigate the last gate of monotone
formulas for fn . If this gate is an -gate we apply Lemma 4.2 and
investigate the corresponding dual formula whose last gate is an gate. If fn = g1 g2 , we nd by the following lemma large subfunctions
of fn in g1 and g2 .
LEMMA 4.8 : If fn = g1 g2 , there is a partition of {1
n} to I
and J such that we may replace the variables xi and yi for i I (i J)
by constants in order to obtain the subfunction f|I| (f|J|). Furthermore
g1 or g2 depends essentially on all y-variables.
232
233
d1+s
d1+s1
1+
1+1
d1
d1
d+s
1
d
EXERCISES
1. Generalize Spiras theorem to complete bases Br .
2. Compute C (sel) and D (sel) for dierent bases B2 , in particular = {NAND} and = { } .
3. Let F be a formula of size l and let F be the equivalent formula
constructed in the proof of Spiras theorem. Estimate the size of
F.
4. Which functions are candidates for the relation D(f)
(log C(f)) ?
5. Let n = 2m and let f be the Boolean matrix product of n nmatrices. What is the minimal size of a monotone circuit for f
whose depth is bounded by m + 1 ?
234
235
8. FORMULA SIZE
8.1 Threshold - 2
n 2
n 2
n 2
Tn2 (x) = T1 (x ) T1 (x ) T2 (x ) T2 (x )
(1.1)
236
(1.2)
237
suces. Hence uj1 and uj2 have no common variable and all prime
implicants have length 2 . Similarly g2 = t2 w1 wq .
We rearrange the -gates in the formulas for g1 and g2 , such that
t1 t2 u = u1 up , and w = w1 wq are computed. Then
g1 = t1 u , g2 = t2 w and
g = g1 g 2 = t1 t 2 t 1 w t 2 u u w
(1.3)
(1.4)
of the same size. Let f be the function computed by the new formula.
Tn2 f , since g g . Let us assume, that Tn2 (a) = 0 but f (a) = 1
for some input a . Then g(a) = 0 and g (a) = 1 . This is only possible
if u(a) = 1 or w(a) = 1 . But u and w have only prime implicants of
length 2 , hence Tn2 (a) = 1 in contradiction to the assumption. The
new formula is an optimal monotone formula for Tn2 . We continue in
the same way until we obtain a single-level formula.
238
(p p(m)) = n p
Lm (Tn2) =
1mn
log |Mm |
(1.5)
1mn
|Mm |1
= n p n log
1mn
ai
1in
ai
1 n
(1.6)
1in
Lm (Tn2) n p n log (1 n)
|Mm |
(1.7)
1mn
n.
is a subfunction
Proof : For k n 2 we use the fact that Tnk+2
2
of Tnk . For k
n 2 we apply the duality principle, which implies
Lm (Tnk) = Lm (Tnn+1k) .
239
n 2
(2.1)
0pk
These formulas for Tnk have size O(n(log n)k1) (Korobkov (56)
and Exercises).
Khasin (69) constructed formulas of size
k1
O(n(log n)
(log log n)k2) , Kleiman and Pippenger (78) improved
log n
x (i)
(2.2)
1jk iA(j)
240
of length k each. Hence the size of the formula for Tnk is O(n log n) .
We present the constructive solution due to Friedman (84). We
obtain a formula of size c n log n for a quite large c . It is possible
to reduce c by more complicated considerations. We again consider
functions f as described in (2.2). If A(1)
A(k) is an arbitrary
partition of {1
n} , f has formula size n and f Tnk . xi(1)
xi(k)
PI(f) i the variables xi(1)
xi(k) are in dierent sets of the partition.
Hence we are looking for sets Am j (1 m k , 1 j r) with the
following properties
Ak j are disjoint subsets of {1
A1 j
i(k) {1
n} we nd some j such that
for dierent i(1)
each Am j contains exactly one of the elements i(1)
i(k) .
n} for each j ,
Then
Tnk = F1 Fr
where Fj =
xi
(2.3)
1mk iAm j
The rst property ensures that all prime implicants have length k .
The second property ensures that each monom of length k is a prime
implicant of some Fj . A class of sets Am j with these properties is called
an (n k)-scheme of size r . The size of the corresponding formula is
rn.
We explain the new ideas for k = 2 . W.l.o.g. n = 2r . Let posj (l )
be the j -th bit of the binary representation of l {1
2r} where
1 j r . The sets
Am j = {l | posj (l ) = m} for 0 m 1 and 1 j r
(2.4)
241
for 1 m 3 , 1 j r , 1 t(1)
t(2)
(2.5)
t(3) b .
(j t(1)
t(k))
for 1 m k , 1 j r , 1 t(1)
(2.6)
t(k) b
242
where
r logb 2 + R = r (logb 2 + 1 (1 l )) = r (1 (1 2l ))
= r (1 (1 c)) = r r
(2.7)
(2.8)
Here we used the fact that logb 2 = 1 (2l ) . The greedy algorithm
works as follows. Choose s1 {1
b}r arbitrary. Label all forbidden vectors. If s1
si1 are chosen, choose si {1
b}r as an
243
244
(3.1)
2n1
(3.2)
This implies
Prob Fi Tnm
a{0 1}n
(3.3)
2n 2n1 = 1 2
Hence
Lm (Tnm) 22i
(3.4)
Let
fi = max Prob(Fi (a) = 1) | Tnm (a) = 0
and
(3.5)
4
3
2
LEMMA 3.1 : fi = fi1
4fi1
+ 4fi1
245
fi = fi1 fi1 {0 1 (3 + 5) 2}
(3.6)
for = (3 5) 2 and
(3.7)
hi = hi1 hi1 {(1 + 5) 2 0 1 1}
The only x points in the interval (0 1) are fi1 = and hi1 = 1 .
Considering the representation of fi and hi in Lemma 3.1 we note that
fi fi1 i fi1 and that hi hi1 i hi1 1 . If f0 and
h0 1 , then fi and hi are decreasing sequences converging to 0 .
To us it is important that fl hl
2n1 for some l = O(log n) (see
(3.2) (3.4)).
It turned out that p = 2 (2m 1) is a good choice. Let a be an
input with m 1 ones. Then
f0 = Prob(F0 (a) = 1) = (m 1)p = (n 1)
(3.8)
= (n1)
Let b be an input with m ones. Then
h0 = Prob(F0(a) = 0) = 1 Prob(F0(b) = 1) = 1 mp
(3.9)
= 1 (n 1) = 1 (n1)
We shall see that f0 and 1 h0 are large enough such that fi
and hi decrease fast enough.
The function fi fi1 is continuous (in fi1) . Hence fi fi1 is
bounded below by a positive constant if fi1 [ ] for some xed
0 . If fi , fj for some j = i + O(1) . But for which i
it holds that fi if f0 = (n1) ? For which l fl 2n1 if
246
fj
2
By Lemma 3.1 fi 4 fi1
and hi 4 h2i1 for fi1 hi1 (0 1] . Let
= 1 16 and fj hj . For l j and k = 2l j
4 fl21
fl
and also hl
fl hl
43 fl42
4k1 fjk
(1 4)k = 22k
(3.10)
If fi
and hi
l = i + log n + O(1) .
(3.11)
2n1 for
1 , then fl hl
fi1 = fi = 4 + O( 2 ) and
(3.12)
hi1 = 1 hi = 1 4 + O( 2)
(3.13)
Hence
0 0
: fi =
fi
hi1 = 1 hi
and
1
(3.14)
0 . Hence by (3.14) fi
We know that f0 c n1 for some c
i
1
i1
1
c n if c n
. If we choose i = (log n) (log ) + c
for some appropriate c , then fi and (by similar arguments)
hi 1 . Altogether Prob(Fl = Tnm) 1 2 for all 4 , some
appropriate c depending on and
l = (log n)(1 + log1 ) + c
(3.15)
4 = 2(3
5)
(3.16)
247
The best lower bounds on the formula size of the majority function
(over the monotone basis and over { }) are of size (n2) . Hence
the investigation of the (monotone) formula size of threshold functions
is not yet complete.
248
(4.2)
3 2k1 .
249
(4.3)
(4.4)
2k+1 .
0 such
250
that the following statement holds for all r 3 and all f Bn whose
formula size with respect to is bounded above by n(log log n r) .
There is a set of n r variables such that the subfunction f Br of
f where we have replaced these n r variables by zeros is symmetric,
and its value vector v(f ) = (v0
vr) has the following form: v1 =
v3 = v5 = and v2 = v4 = v6 = . Hence f is uniquely dened
by v0 v1 and v2 .
The proof is based on the fundamental principle of the Ramsey
theory. Simple objects, here formulas of small size, are locally simple,
i.e. have very simple symmetric functions on not too few variables as
subfunctions. We leave out the proof here. A direct application of the
Ramsey theory to lower bounds on the formula size has been worked
out by Pudlak (83). Since the Ramsey numbers are increasing very
fast, such lower bounds have to be of small size.
Here we explain only one simple application of this method, more
applications are posed as exercises.
THEOREM 5.2 : L(Tn2 ) = (n log log n) .
Proof : Let r = 3 . If we replace n3 variables by zeros, Tn2 is replaced
by T32 . For the value vector of T32 v1 = v3 .
251
Also the method due to Fischer, Meyer and Paterson (82) is based
on the fact that simple functions have very simple subfunctions. This
method yields larger lower bounds than the Hodes and Specker method
but for a smaller class of functions. The class of very simple functions
is the class of ane functions x1 xm c for c {0 1} . The size
of an ane function is the number of variables on which it depends
essentially. A subfunction f of f is called a central restriction of f if
n1 n0 {0,1} for the number of variables replaced by ones (n1 ) and
zeros (n0) .
THEOREM 6.1 : Let a(f) be the maximal size of an ane subfunction of f when considering only central restrictions. Then for all
Boolean functions f Bn (n )
L(f) n (log n log a(f)) for some constant
0.
(6.1)
We also omit the tedious proof of this theorem. The following result
states a rather general criterion for an application of this method.
Afterwards we present a function for which the bound of Theorem 6.1
is tight.
THEOREM 6.2 : If f(a) = c for all inputs with exactly k ones and
f(a) = c for all inputs with exactly k + 2 ones, then
L(f) n log min{k n k}
0.
(6.2)
252
(6.3)
n 2
n 2
n 2
Dn1 (x) = D1 (x ) D1 (x ) D0 (x ) D0 (x )
n 2
(6.4)
(6.5)
253
The lower bound due to Nechiporuk (66) is based on the observation that there cannot be a small formula for a function with many
dierent subfunctions. There have to be dierent subformulas for different subfunctions.
DEFINITION 7.1 : Let S X be a set of variables. All subfunctions
f of f , dened on S , are called S-subfunctions.
THEOREM 7.1 : Let f Bn depend essentially on all its variables,
let S1
Sk X be disjoint sets of variables, and let si be the number
of Si -subfunctions of f . Then
L(f) N(S1
Sk ) := (1 4)
log si
(7.1)
1ik
(7.2)
254
|Pi | 2 (|Wi | + 1)
(7.3)
The dierent replacements of the variables xj Si lead to si different subformulas. We measure the local inuence of dierent replacements. Let p be a path in Pi and let us x a replacement of all
variables xj Si . If h is computed at the rst gate of p , then the function computed on the last edge of p is 0 , 1 , h or h , since no variable
xk Si has any inuence on this path. Since Ti can be partitioned
into the paths p Pi , the number of Si -subfunctions is bounded by
4|Pi| . Hence
si 4|Pi|
(7.4)
Sk ) is
(7.5)
(7.6)
1iq
Furthermore
log si
q ik
2t(i)
q ik
(7.7)
255
(7.8)
By Theorem 5.1, Ch. 3 , the circuit complexity of SAn , the storage access function for direct addressing, is bounded by 2n + o(n) .
Hence each x|a| log p+j (1 j log p) can be computed by 2 p log1 p +
o(p log1 p) gates. The whole vector d can be computed with 2p+o(p)
gates. Afterwards ISAn can be computed with 2p + o(p) gates. Altogether 4p + o(p) = 2n + o(n) gates are sucient for the computation
of ISAn .
256
Nechiporuks method has been applied to many functions. We refer to Harper and Savage (72) for the marriage problem and to Paterson (73) for the recognition of context free languages. We investigate
the determinant (Kloss (66)) and the clique functions (Sch
urfeld (84)).
The determinant detn BN where N = n2 is dened by
det(x11
n
xnn ) =
x1 (1)
xn (n)
(7.9)
gc(y1
y1 c12 c13
1 y2 c23
0 1 y3
yn) = det
n
0 0 0
0 0 0
c1 n2 c1 n1
c2 n2 c2 n1
c3 n2 c3 n1
1
0
yn1
1
c1n
c2n
c3n
cn1 n
yn
(7.10)
Since there are (n2 n) 2 matrix elements cij (i j) above the main
diagonal, it is sucient to prove that gc and gc are dierent functions
for dierent c and c .
For n = 2 gc(y1 y2) = y1 y2 c12 and the assertion is obvious. The
case n = 3 is left as an exercise.
257
For n
3 we apply matrix operations which do not change the
determinant. We multiply the second row by y1 and add the result to
the rst row. The new rst row equals
(0 y1 y2 c12 y1 c23 c13
y1 c2n c1n )
(7.11)
The rst column has a one in the second row and zeros in all other
rows. Hence we can erase the rst column and the second row of the
matrix. For y1 = 1 we obtain an (n 1) (n 1)-matrix of type
(7.10). By induction hypothesis we conclude that gc and gc dier if
cij = cij for some i 3 or c1k c2k = c1k c2k for some k 3 .
By similar arguments for the last two columns (instead of the rst
two rows) we conclude that gc and gc dier if cij = cij for some
j n 2 or ck n1 ckn = ck n1 ckn for some k n 2 . Altogether we have to consider only the situation where cij = cij for all
(i j) {(1 n 1) (1 n) (2 n 1) (2 n)} and c1k c2k = c1k c2k for
k {n 1 n} . Let y1 = 0 . Then we can erase the rst column and
the second row of the matrix. We obtain (n 1) (n 1)-matrices
M and M which agree at all positions except perhaps the last two
positions of the rst row. Since c = c and c1k c2k = c1k c2k for
k {n 1 n} , c1 n1 = c1 n1 or c1n = c1n or both. By an expansion
according to the rst row we compute the determinants of M and M .
The summands for the rst n3 positions are equal for both matrices.
The (n 2)-th summand is yn if c1 n1 = 1 (c1 n1 = 1 resp.) and 0
else, and the last summand is 1 if c1n = 1 (c1n = 1 resp.) and 0 else.
Hence gc (y) gc (y) is equal to yn or 1 or yn 1 according to the three
cases above. This ensures that gc = gc if c = c .
258
(n i) 2
(n i) 2
(7.12)
1in1
= (1 16)
259
FC(f) = FC(f)
for f Bn and
(8.1)
260
and
(8.2)
(8.3)
and
(8.4)
1
1
K(f) + K(g) h2f a1
+ h2g a1
g b
f b
(8.5)
(8.6)
261
(8.7)
Hence
LU (fn ) 4 LU (fn 2)
LU (f1) = 1
and LU (fn ) n2
(8.8)
262
Let l PC(f) be dened in a similar way for prime clauses. Then K(f)
l PI(f) l PC(f) .
Proof : Let A f 1(1) and B f 1(0) . For a A we nd some prime
implicant t of f such that t(a) = 1 and the length of t is bounded by
l PI(f). If a f 1(0) is a neighbor of a , t(a ) = 0 . Hence a and a
dier in a position i where xi or xi is a literal of t .
This implies |H(A B)| l PI(f)|A| . Similarly |H(A B)| l PC(f)|B| .
2n 2i1 = n 2n
|A| =
1 2i . (8.9)
where n =
1in
1in
n = (1 2)
2in
2i (1 4)
(8.10)
2in
Furthermore n (1 n ) 1 3 and n1 n .
Let M be a given n n-matrix and M a neighbor of M . W.l.o.g.
M and M dier exactly at position (1 1) . We compute detn M and
detn M by an expansion of the rst row. Then detn M = detn M i
the (n 1) (n 1)-matrix M consisting of the last n 1 rows
and columns of M (or M ) is regular. We have n2 possibilities for the
263
2
(8.11)
Altogether
2
K(det)
n
2
(1 4) n1
n4 22n
2
n 2n (1 n ) 2n
1 n1 n1 4
1 4
n
n
4 n 1 n
12
(8.12)
EXERCISES
1. L(f g) = L(f) + L(g) if f and g depend essentially on disjoint sets
of variables.
x 6 ) = x1 x 4 x 1 x 6 x 2 x 4 x 2 x 6 x 3 x 6 x 4 x 5 x 4 x 6
264
d) Lm (T42) = 4 .
4. Prove that the Korobkov formulas for Tnk (see (2.1)) have size
O(n(log n)k1) .
5. The Fibonacci numbers are dened by a0 = a1 = 1 and
an = an1 + an2 . It is well-known that
n+1
n+1
( 5)
5 for = (1 2)(1 + 5) .
an =
Let = {NAND } where (x y) = x y and D (f) d .
Then f is a disjunction of monoms whose length is bounded by ad .
6. We use the notation of Exercise 5. If f Sn is not constant,
v0) . Esti-
265
14. The Nechiporuk method yields only linear bounds for symmetric
functions.
15. (Nechiporuk (66)) Let m and n be powers of 2 , m = O(log n) ,
m and n large enough that there are dierent yij {0 1}m ,
1 i l = n m , 1 j m , having at least two ones each. Let
xij be variables, and let gi j k (x) be the disjunction of all xkl such
that yij has a 1 at position l . Let
xij
f(x) =
1il 1jm
gi j k (x)
1kl k=i
sponding to the Boolean matrix product (see 15, Ch. 6). Then
L(f) = (n3) .
18. We generalize the denition of f in Exercise 17. Let k 3 and
let X1
Xk1 Z be (k 1)-dimensional n n-matrices of
Boolean variables. For N = k nk1 let f MN be the disjunction
of all
zi(1)
1
i(k1) xr(1) r(k2)i(1)
xk1
r(1)
r(k2)i(k1)
(k1)
).
266
20. Let f Bn . Let f(a) = c for all inputs a with exactly k ones and
f(a) = c for all inputs a with exactly k + 1 ones. Estimate K(f) .
21. 2 log n DU (x1 xn ) 2 log n .
267
9.1 Introduction
268
269
(1.1)
(1.2)
(1.3)
270
The subclass of NP-complete languages contains the most dicult problems in NP in the following sense. Either P = NP or no
NP-complete language is in P (see Cook (71), Karp (72), Specker
and Strassen (76), and Garey and Johnson (79)). The majority of
the experts believes that NP = P . Then all NP-complete languages
(one knows of more than 1000 ones) have no polynomial algorithm.
We mention only two NP-complete languages, the set of all n-vertex
graphs (n arbitrary) with an n 2-clique and the class of all sets of
Boolean clauses (disjunctions of literals) which can be satised simultaneously (SAT = satisability problem).
In order to describe the complexity of a language L relative to
another language A , one considers Turing machines with oracle A .
These machines have an extra tape called the oracle tape. If the machine
reaches the oracle state, it can decide in one step whether the word y
written on the oracle tape is an element of A . If one can design an
efficient Turing machine with oracle A for L , the following holds: an
efficient algorithm for A implies an efficient algorithm for L , because
we can use the algorithm for A as a subroutine replacing the oracle
queries of the Turing machine with oracle A .
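As a loose illustration of this subroutine replacement, here is a minimal Python sketch (ours; the names decide_A, decide_L_with_oracle and the concrete queries are invented for illustration):

    def decide_L_with_oracle(x, oracle_A):
        # A hypothetical polynomial-time decision procedure for L that
        # may query its oracle on words it writes onto the oracle tape.
        y = x[::-1]                # some polynomial-time computable query
        return oracle_A(y) and len(x) % 2 == 0

    def decide_A(y):               # an efficient algorithm for A ...
        return y.count("1") > 1

    def decide_L(x):               # ... yields an efficient algorithm for L
        return decide_L_with_oracle(x, decide_A)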
DEFINITION 1.3 : Let A be a language. P(A) or NP(A) is the class
of languages which can be decided by a polynomial deterministic or
nondeterministic, resp., Turing machine with oracle A . For a class C
of languages, P(C) is the union of all P(A) where A ∈ C ; NP(C) is
defined similarly.
DEFINITION 1.4 : The following hierarchy of languages is called
the Stockmeyer hierarchy (Stockmeyer (76)). Σ0 = Π0 = Δ0 = P ,
Σn = NP(Σ_{n−1}) , in particular Σ1 = NP . Πn consists of all L whose
complement is contained in Σn , and Δn = P(Σ_{n−1}) .
Obviously Σ_{n−1} ⊆ Σn . It is an open problem whether this hierarchy is proper. If Σn = Σ_{n+1} , also Σn = Σ_{n+k} for all k ≥ 0 . In
order to prove NP ≠ P , it is sufficient to prove that Σn ≠ P for some
n ≥ 1 . Kannan (82) proved by diagonalization the existence of languages Lk ∈ Σ3 ∩ Π3 such that |Lk ∩ {0,1}^n| is polynomially bounded
and Lk has no circuits of size O(n^k) . At the end of this survey we
state a characterization of the classes Σn and Πn (Stockmeyer (76)).
THEOREM 1.1 : L ∈ Σn (L ∈ Πn) iff the predicate x ∈ L can be
expressed by a quantified formula
(Q1 x1) ⋯ (Qn xn) : T(x, x1, …, xn) .  (1.4)
(2.1)
and
(2.2)
(2.3)
In O(n) steps the 0-th step can be simulated. For the simulation
of the t-th step M′ has to know the state q of M . The left mark
is shifted by one position to the left, then the head of M′ turns to
the right until it finds the marked register (a, #) . M′ evaluates in
its central unit δ(q, a) = (q′, a′, d) , it bears in mind q′ instead of q ,
replaces the contents of the register by (a′, B) , if d = R , and by
(a′, #) , if d = L or d = N . If d = R , the next register to the right
with contents (b, B) is marked by (b, #) in the next step. One goes
right to the right mark (#, #) which is shifted one position to the
right. The head turns back to the left. If d = L , (a′, #) is replaced
by (a′, B) , and the register to the left containing (a′′, B) is marked by
(a′′, #) . The simulation stops when the left mark (#, #) is reached.
M′ is oblivious, and the t-th computation step is simulated in O(t + n)
steps. Altogether t′(n) = O(t(n)²) for the running time of M′ . A more
efficient simulation is due to Fischer and Pippenger ((73) and (79))
(see also Schnorr (76 a)).
THEOREM 2.2 : A deterministic Turing machine M with time complexity t(n) can be simulated by an oblivious Turing machine M′ with
time complexity O(t(n) log t(n)) .
Proof : Again we use a step-by-step simulation. We shift the information of the tape in such a way that each step is simulated in
register 0 . A move of the head to the right (left) is simulated by a
shift of the information to the left (right). This idea again leads to
an O(t(n)²) algorithm. To improve the running time we divide the
information into blocks such that a few small blocks have to be shifted
often, but large blocks have to be shifted rarely.
We like to shift a block of length l = 2^m in time O(l) by l positions
to the right or left. This can be done by an oblivious Turing machine
with 3 tapes. One extra tape is used for always adding 1 until the sum
equals l , which is stored on a second track of the tape. The second
extra tape is used to copy the information.
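The shifting idea itself can be pictured with two stacks: the scanned symbol always sits in register 0, and a head move is simulated by moving one symbol between the stacks. A small Python sketch (ours; the tape contents and move sequence are placeholders):

    from collections import deque

    def run_shifting(tape, moves):
        """Simulate head motion by shifting the tape so that the scanned
        symbol is always register 0 (the front of `right`): a move to the
        right shifts the information one position to the left, and vice
        versa."""
        left, right = deque(), deque(tape)   # right[0] is register 0
        for d in moves:                       # d in {"R", "L"}
            if d == "R":
                left.append(right.popleft() if right else " ")
            else:
                right.appendleft(left.pop() if left else " ")
        return "".join(left) + "|" + "".join(right)  # "|" marks register 0

    print(run_shifting("abcde", "RRL"))  # prints "a|bcde"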
0 recursively by
(2.4)
(3.1)
and not only on s(n) . Before we formulate the main result of this section (Borodin (77)), we prove a simple connection between the time
complexity and the space complexity of Turing machines. If a Turing machine runs too long on a short working tape, it reaches some
configuration for the second time. This computation cycle is repeated
infinitely often, and the machine never stops. Let the registers of
k(n) = c · n · s(n) · a^{s(n)} …  (3.2)
(3.3)
If t(n) > k(n) , the Turing machine runs for some input into a cycle and
does not stop at all on this input. Hence by (3.3)
log t(n) ≤ log k(n) = O(l(n)) .  (3.4)
(3.5)
Proof : We assume that the Turing machine does not stop in q⁺ , but
that it remains in q⁺ and does not change the contents of the registers
of the working tape. Then the computation on x can be described by
the sequence of configurations k0(x), …, k_{t(n)}(x) , and x ∈ L iff register 1 of the working tape contains a 1 in k_{t(n)}(x) .
For each configuration k(x) the direct successor k′(x) is unique.
k′(x) does not depend on the whole input vector but only on that bit
xi which is read by the Turing machine in configuration k(x) . Let
A = A(x) be the k(n) × k(n)-matrix where a_{k,k′} = 1 (k , k′ configurations) iff k′ is the direct successor of k on input x . Since a_{k,k′}
depends only on one bit of x , A can be computed in depth 1 . Let
A^i = (a^i_{k,k′}) be the i-th power of A with respect to Boolean matrix
multiplication. Since
a^i_{k,k′} = ∨_{k′′} a^{i−1}_{k,k′′} ∧ a_{k′′,k′} ,  (3.6)
a^i_{k,k′} = 1 iff on input x and starting in configuration k we reach configuration k′ after i steps. We compute A^T for T = 2^{⌈log t(n)⌉} by ⌈log t(n)⌉ Boolean matrix multiplications. The depth of each matrix multiplication is ⌈log k(n)⌉ + 1 . Finally
fn(x) = ∨_{k∈Ka} a^T_{k0,k} .  (3.7)
Each Boolean function fn ∈ Bn can be computed by a Turing machine in n steps without working tape. The machine bears in mind the
whole input x and accepts iff x ∈ fn⁻¹(1) . But the number of states of
the machine is of size 2^n and grows with n . We like to design a Turing machine which decides L , the union of all fn⁻¹(1) where fn ∈ Bn .
But L can be non-recursive even if C(fn) is bounded by a polynomial.
A simulation of circuits by Turing machines is possible if we provide
the Turing machine with some extra information depending on the
length of the input but not on the input itself (Pippenger (77 b), (79),
Cook (80)).
DEFINITION 4.1 : A non-uniform Turing machine M is a Turing
machine provided with an extra read-only tape (oracle tape) containing for inputs x of length n an oracle a_n . The computation time t(x)
is defined in the usual way; the space s(x) is the sum of the number of different registers on the working tape scanned on input x and
log |a_n| .
the result of each gate is used only once. We number the gates of a
formula of depth dn in postorder. The postorder of a binary tree T
with left subtree Tl and right subtree Tr is defined by
postorder(T) = postorder(Tl) postorder(Tr) root(T) .  (4.1)
A gate is encoded by its type and numbers il , ir ∈ {0, …, n} . If
il = 0 , the left direct predecessor is another gate and, if il = j ∈
{1, …, n} , the left direct predecessor is the variable xj . In the same
way ir is an encoding of the right predecessor. Each gate is encoded
by O(log n) bits. Since formulas of depth d contain at most 2^d − 1
gates, log |a_n| = O(d_n) .
The Turing machine simulates the formula given in the oracle step-by-step. If we consider the definition of the postorder, we conclude
that gate G has to be simulated immediately after we have simulated
the left and the right subtree of the tree rooted at G . If we erase all
results that we have already used, only the results of the roots of the
two subtrees are not erased. Hence the inputs of G can be found in
the following way. The value of variables is looked up on the input
tape; the results of the j ∈ {0, 1, 2} inputs which are other gates are
the last j bits on the working tape. These j bits are read and erased,
and the result of G is added at the right end of the working tape.
Since each gate can be simulated in time O(n) , the claim on the
computation time of the Turing machine follows. It is easy to prove
by induction on the depth of the formula that we never store more
than dn results on that working tape where we store the results of
the gates. We use O(log n) registers on a further working tape for
counting. Hence the space complexity s(n) of the Turing machine is
bounded by O(dn) .
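The stack discipline of this simulation is easy to reproduce. The following Python sketch (ours, not the book's encoding; the gate tuples (op, il, ir) stand in for the O(log n)-bit codes) evaluates a formula given in postorder, erasing used results and appending the new one at the right end:

    def eval_postorder(gates, x):
        """Evaluate a formula given in postorder. Each gate is (op, il, ir):
        index 0 means 'take a previous gate result from the stack', index
        j >= 1 means variable x_j. Used results are popped (erased), the
        gate's result is appended at the right end, as in the simulation."""
        ops = {"AND": lambda a, b: a & b, "OR": lambda a, b: a | b}
        stack = []
        for op, il, ir in gates:
            r = stack.pop() if ir == 0 else x[ir - 1]
            l = stack.pop() if il == 0 else x[il - 1]
            stack.append(ops[op](l, r))
        return stack.pop()

    # (x1 AND x2) OR x3 in postorder: the inner gate first, then the root.
    print(eval_postorder([("AND", 1, 2), ("OR", 0, 3)], [1, 0, 1]))  # 1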
(5.2)
(5.3)
(5.4)
for some B ∈ NP and h ∈ Poly . Since NP = Σ1 , there is by Theorem 1.1 some L′ ∈ P and some polynomial p such that
L = {x | ∃ y ∈ {0,1}^{p(|x|)} : (x, h(|x|), y) ∈ L′} .  (5.5)
(5.6)
(6.1)
Prob(M(x) = χ_L(x)) ≥ 1/2 + ε .  (6.2)
iii) R (probabilistic polynomial with one-sided error) is the class of languages L ⊆ {0,1}* such that there is some ppTm M where
Prob(M(x) = 1) ≥ 1/2 for x ∈ L and  (6.3)
Prob(M(x) = 0) = 1 for x ∉ L .
iv) ZPP (probabilistic polynomial with zero error) is the class of languages L ⊆ {0,1}* such that there is some ppTm M where
Prob(M(x) = 0) = 0 and Prob(M(x) = 1) ≥ 1/2 for x ∈ L , and  (6.4)
Prob(M(x) = 1) = 0 and Prob(M(x) = 0) ≥ 1/2 for x ∉ L .
Obviously
R ⊆ NP .  (6.5)
The error probability of BPP algorithms can be decreased by independent repetitions of the algorithm.
LEMMA 6.1 : Let M be a ppTm for L ∈ BPP fulfilling (6.2). Let
Mt (t odd) be that probabilistic Turing machine which simulates M
t times independently, which accepts (rejects) x if more than
t/2 simulations accept (reject) x , and which otherwise does not decide
about x . Then
Prob(Mt(x) = χ_L(x)) ≥ 1 − 2^{−m}  if  (6.6)
t ≥ 2 (m − 1) / log(1 / (1 − 4 ε²)) .  (6.7)
(1/2 + ε)^i (1/2 − ε)^i = (1/4 − ε²)^i .  (6.8)
C(t, i) p^i (1 − p)^{t−i} ≤ C(t, i) (1/4 − ε²)^{t/2} .  (6.9)
Hence
Prob(Mt(x) = χ_L(x)) ≥ 1 − (1/4 − ε²)^{t/2} Σ_{0≤i≤t/2} C(t, i) ≥ 1 − (1/2) (1 − 4 ε²)^{t/2} .  (6.10)
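The majority vote of Lemma 6.1 is easy to simulate numerically. A small Python sketch (ours; the toy machine M and the parameters are invented for illustration):

    import random

    def amplify(M, x, t):
        """Majority vote over t independent simulations of the ppTm M
        (t odd, so ties are impossible), as in Lemma 6.1."""
        votes = sum(M(x) for _ in range(t))
        return 1 if votes > t // 2 else 0

    def M(x, eps=0.1):
        # Toy stand-in for a BPP machine: correct with probability 1/2 + eps.
        correct = sum(x) % 2
        return correct if random.random() < 0.5 + eps else 1 - correct

    x = (1, 0, 1)
    runs = [amplify(M, x, 501) for _ in range(200)]
    print(sum(r == sum(x) % 2 for r in runs) / 200)  # close to 1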
(6.12)
Again, the experts do not expect that. But beware of experts. The
experts also believed that non-uniform algebraic decision trees cannot
solve NP-complete problems in polynomial time. However, Meyer auf
der Heide (84) proved that the knapsack problem can be solved by
algebraic decision trees in polynomial time.
DEFINITION 7.1 : A language L is called polynomially self-reducible
if it can be decided by a polynomially time bounded Turing machine
with oracle L which asks its oracle for inputs of length n only for words
of length m < n .
LEMMA 7.1 : SAT is polynomially self-reducible.
Proof : Let the input be a set of clauses where at least one clause
includes x1 or x̄1 . Then we replace at first x1 by 1 and ask the oracle
whether the new set of clauses can be satisfied. Afterwards we repeat
this procedure for x1 = 0 . We accept iff one of the oracle questions is
answered positively.
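The two oracle calls of this proof are straightforward to write down. A Python sketch (ours; clauses are lists of signed variable numbers in the DIMACS style, and brute_sat is only a stand-in for the oracle):

    from itertools import product

    def sat_self_reduce(clauses, oracle):
        """Decide SAT by two oracle calls on instances without x1,
        as in the proof of Lemma 7.1 (v means x_v, -v its negation)."""
        def assign(cs, lit):
            out = []
            for c in cs:
                if lit in c:
                    continue              # clause satisfied, drop it
                out.append([l for l in c if l != -lit])
            return out
        # accept iff one of the two restricted instances is satisfiable
        return oracle(assign(clauses, 1)) or oracle(assign(clauses, -1))

    def brute_sat(cs, nvars=3):           # stand-in for the oracle
        return any(all(any((l > 0) == a[abs(l) - 1] for l in c) for c in cs)
                   for a in product((False, True), repeat=nvars))

    print(sat_self_reduce([[1, 2], [-1, 3]], brute_sat))  # True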
(7.1)
(7.2)
This lemma serves as a technical tool for the proof of the following
theorem (Balcázar, Book and Schöning (84)). The complexity classes
Σk and Σk(A) are defined in § 1 .
THEOREM 7.1 : Let A ∈ Σk-Poly be polynomially self-reducible.
Then Σ2(A) ⊆ Σ_{k+2} .
Before we prove this theorem, we use it to prove the announced
result due to Karp and Lipton (80).
THEOREM 7.2 : If SAT has polynomial circuits, then Σ3 = Σ2 and
the Stockmeyer hierarchy collapses at the third stage.
Proof : If SAT has polynomial circuits, then SAT ∈ P-Poly =
Σ0-Poly (Theorem 5.1). Hence Σ2(SAT) ⊆ Σ2 ⊆ Σ3 (Theorem 7.1
and Lemma 7.1). Since SAT is NP-complete, Σ3 = NP(NP(NP)) =
Σ2(SAT) . Hence Σ2 = Σ3 .
(7.3)
(7.4)
Here q is a polynomial where q(|x|) is a bound for |y| and |z| . Since
M is working in polynomial time r(|x|) , also the length of each oracle
word is bounded by r(|x|) . Let p be the polynomial p′ ∘ r . We want
to prove that
L = {x | (∃ w)_p : ((∀ u)_r R(u, w) ∧ (∃ y)_q (∀ z)_q S(w, x, y, z))}  (7.5)
where R(u, w) holds iff (u ∈ Bw ⟺ u ∈ L(M′, Bw)) and
S(w, x, y, z) holds iff (x, y, z) ∈ L(M′′, Bw) .
R , S ∈ P(Σk) since Bw ∈ Σk . (7.5) implies
L = {x | (∃ w)_p (∃ y)_q (∀ u)_r (∀ z)_q : R(u, w) ∧ S(w, x, y, z)} .  (7.6)
THEOREM 8.2 : … depth dn :
i) Sn UE-uniform ⟹ Sn UB-uniform ,
ii) Sn UD-uniform ⟹ Sn UBC-uniform , and
NC = UX-SIZE, DEPTH(n^{O(1)}, log^{O(1)} n) for each of these uniformity notions UX .
EXERCISES
1. An oblivious t(n) time bounded Turing machine with k tapes can
be simulated by circuits of size O(t(n)) .
2. A t(n) time bounded Turing machine with k tapes can be simulated by an O(t(n)²) time bounded Turing machine with one tape.
3. Specify an oblivious Turing machine for sim(0) .
10.1 Hierarchies
How large are the gaps in the complexity hierarchies for Boolean
functions with respect to circuit size, formula size and depth? A gap
is a non-empty interval of integers none of which is the complexity
of any Boolean function. Other hierarchies are investigated in Ch. 11
and Ch. 14.
Let Bn denote the set of all Boolean functions depending essentially
on n variables; n is fixed for the rest of this section. For any complexity
measure M , let M(r) be the set of all f ∈ Bn where M(f) ≤ r . The
gap problem is to find for each r the smallest increment m(r) such
that M(r) is a proper subset of M(r + m(r)) .
We are interested in cΩ(j) , lΩ(j) and dΩ(j) for binary bases Ω .
Tiekenheinrich (83) generalized the depth results to arbitrary bases.
Obviously cΩ(j) is only interesting for those j where C(j + 1) ≠ ∅
and C(j) ≠ Bn . For any complexity measure M , let M(Bn) be the
complexity of the hardest function in Bn with respect to M . It has
been conjectured that
dΩ(j) = cΩ(j) = lΩ(j) = 1 for all Ω and all interesting j .  (1.1)
It is easy to prove that dΩ(j) ≤ 2 and cΩ(j) ≤ lΩ(j) ≤ j + 1 (McColl (78 a)). The best results are summarized in the following theorem
(Wegener (81) and Paterson and Wegener (86)).
THEOREM 1.1 : Let Ω ∈ {B2, U2, Ωm} . Then
dΩ(j) = 1 for all Ω ⊆ B2 and log n − 1 ≤ j < DΩ(Bn) .  (1.2)
cΩ(j) = 1  if n − 2 ≤ j < CΩ(B_{n−1}) ,  (1.3)
cΩ(j) ≤ n  if CΩ(B_{n−1}) ≤ j < CΩ(Bn) , and  (1.4)
lΩ(j) ≤ n  if n − 2 ≤ j < LΩ(Bn) .  (1.5)
… if n − 2 ≤ j < CΩ(B_{n−1}) and … CΩ(Bn) …  (1.6)
(1.7)
Proof of Theorem 1.1 : The idea of the proof is to take some function
f in Bn of maximal complexity and construct a chain of functions
from the constant function 0 up to f such that the circuit size cannot
increase by much at any step in the chain. We then conclude that
there can be no large gaps.
Let f ∈ Bn be a function of maximal circuit size with respect to Ω .
Let f⁻¹(1) = {a1, …, ar} . For the case Ω = Ωm we shall assume that
the a_s are ordered in some way, so that for s ≤ t , a_t does not contain
more ones than a_s . Let f0 ≡ 0 and f1, …, fr = f where
f_k⁻¹(1) = {a1, …, a_k} for 0 ≤ k ≤ r .  (1.8)
Let a_k = (e(1), …, e(n)) . We use the minterm of a_k ,
m_k(x) = x1^{e(1)} ⋯ x_n^{e(n)} ,  (1.9)
and the monotone monom
t_k(x) = ∧_{i | e(i)=1} x_i .  (1.10)
Then f_k = f_{k−1} ∨ m_k (or f_k = f_{k−1} ∨ t_k for Ω = Ωm) and  (1.11)
C(f_k) ≤ C(f_{k−1}) + n .  (1.12)
It is possible that f_k ∉ Bn . If f_k depends essentially only on m variables, we assume w.l.o.g. that these variables are x1, …, xm . In
f_k = f_{k−1} ∨ m_k or f_k = f_{k−1} ∨ t_k we replace x_{m+1}, …, x_n by 0 . Therefore
(1.12) is improved to
C(f_k) ≤ C(f_{k−1}) + m .  (1.13)
(1.14)
DΩ(f_k) ≤ max{DΩ(f_{k−1}), log n + 1} + 1 and  (1.15)
CΩ(f_k) ≤ CΩ(f_{k−1}) + n .  (1.16)
(1.17)
V(h_{i−1}) ⊆ V(h_i) ,  (1.18)
(1.19)
C(g_i) ≤ C(g_{i−1}) + 1 .
For any n − 1 ≤ j ≤ C(B_{n−1}) + 1 we find some i(j) such that
C(g_{i(j)}) = j . This proves (1.3).
We construct h0, …, hm . Let f ∈ B_{n−1} be of maximal circuit size.
Let f_r = f and r = |f⁻¹(1)| . We construct f_{k−1} as before by removing
a (minimal) element of f_k⁻¹(1) , but now regarding f_k as a member
of B_{s(k)} where s(k) = |V(f_k)| . The effect of this is to ensure that
the chain is …, f_{k−1,0}, …, f_{k−1,s(k)}, f_{k,0}, … , where for a_k = (e(1), …, e(s(k)))
f_{k−1,i} = f_{k−1} ∨ (x_n ∧ x1^{e(1)} ⋯ x_i^{e(i)}) for 0 ≤ i ≤ s(k) .  (1.20)
In particular f_{0,0} = x_n , f_{r,0} = f_r ∨ x_n for f_r ∈ B_{n−1} , and
f_{k−1,i} ∨ (x_n ∧ x1^{e(1)} ⋯ x_{i+1}^{e(i+1)}) = f_{k−1,i+1} .  (1.21)
In an optimal circuit for f_{k−1,s(k)} we replace x_n by 1 . Then we compute f_k (see (1.20)). Since f_{k,0} = f_k ∨ x_n , also C(f_{k,0}) ≤ C(f_{k−1,s(k)}) .
We prove (1.18). Let s = s(k) . V(f_{k−1,i}) is a subset of
{x1, …, x_s, x_n} . We regard f_{k−1,i} as a member of B_{s+1} . f_{k−1,i} depends
essentially on x_n . Let a_k be that vector which has been removed
from f_k⁻¹(1) for the construction of f_{k−1}⁻¹(1) . Then a_k ∈ {0,1}^s and
f_{k−1,i}(a_k, 0) = 0 but f_{k−1,i}(a_k, 1) = 1 . If x_n = 0 , f_{k−1,i} = f_{k−1} and
f_{k−1,i} depends essentially on all variables in V(f_{k−1}) = {x1, …, x_{s′}}
where s′ = s(k−1) . Moreover f_{k−1,i} depends essentially on x1, …, x_i .
If f_{k−1,i} was not depending essentially on x_j (j ≤ i) , we could use
the procedure for the proof of (1.16) and would obtain a circuit
for f_k not depending on x_j . This would be a contradiction to the
fact that f_k depends essentially on x1, …, x_s . For s′′ = max{s′, i} ,
V(f_{k−1,i}) = {x1, …, x_{s′′}, x_n} . Therefore V(f_{k−1,i}) ⊆ V(f_{k−1,i+1}) and
V(f_{k−1,s}) = {x1, …, x_s, x_n} = V(f_{k,0}) .
The counterexamples show that Theorem 1.1 is at least almost optimal. The general proof of (1.2) (Wegener (81)) is based on the assumption that the constants 0 and 1 are inputs of the circuit. Strangely
enough, this assumption is necessary at least for small n . Let Ω be
the complete basis {∧, ⊕, 1} (ring sum expansion) and let 1 not be
an input of the circuit. B1 = {x1, x̄1} , DΩ(x1) = 0 but DΩ(x̄1) = 2 .
Therefore dΩ(0) = 2 .
(2.1)
(2.2)
Proof : i) Because of the fan-out restriction of formulas we need disjoint formulas for f and g .
ii) The upper bound is obvious. Let a ∈ f⁻¹(0) . By
definition (f ∨ g)(a, y) = g(y) . Therefore each formula for (f ∨ g)
has at least L(g) + 1 leaves labelled by y-variables. The existence of
L(f) + 1 x-leaves is proved in the same way. Altogether each formula
for (f ∨ g) has at least L(f) + L(g) + 2 leaves. This implies the
lower bound.
… g(x(1), …, x(m), h(x(m+1), …, x(n))) …  (2.3)
(2.4)
(2.5)
COROLLARY 2.1 : … o(n / log n) .
10.3 Reductions
Reducibility is a key concept in complexity theory. Polynomial-time reducibility is central to the concept of NP-completeness (see
Garey and Johnson (79)). Reducibility can be used to show that the
complexities of different problems are related. This can be possible
even though we do not know the complexity of some problem. Reducibility permits one to establish lower and upper bounds on the
complexity of problem A relative to problem B or vice versa. If A is
reducible to B , this means that A is not much harder than B . Lower
bounds on the complexity of A translate to similar lower bounds on the
complexity of B . An efficient algorithm for B translates to a similarly
efficient algorithm for A . This requires that the necessary resources
for the reducibility function are negligible compared to the complexity
of A and B .
The monograph of Garey and Johnson (79) is an excellent guide to
reducibility concepts based on Turing machine complexity. Because of
the efficient simulations of Turing machines by circuits (Ch. 9, §§ 2, 3)
all these results can be translated to results on circuit complexity.
We discuss three reducibility concepts that were defined with a view
to the complexity of Boolean functions. The three concepts are NC1-reducibility, projection reducibility and constant depth reducibility.
NC1-reducibility is defined via circuits with oracles (Cook (83),
Wilson (83)). We remember that NCk is the class of all sequences fn ∈ Bn having UBC-uniform circuits Sn of polynomial size and
O(log^k n) depth (see Ch. 9, § 8).
We often have used (see Ch. 3) implicitly the notion of NC1-reducibility, although we have not discussed the uniformity of the
circuits. In order to practise the use of this reducibility concept,
we present some NC1-reducibility results on arithmetic functions
(Alt (84), Beame, Cook and Hoover (84)). Let MUL (more precisely
MULn) be the multiplication of two n-bit integers, SQU the squaring
of an n-bit integer, POW the powering of an n-bit number, i.e. the
computation of x, x², …, x^n , and DIV the computation of the n most
significant bits of x^{−1} .
Proof : MUL =1 SQU , since both problems are in NC1 . For this
claim we use the circuits designed in Ch. 3. Nevertheless we state
explicit reductions. SQU ≤1 MUL , since SQU(x) = MUL(x, x) .
MUL ≤1 SQU , since
x · y = (1/2) ((x + y)² − x² − y²)  (3.1)
and there are UBC -uniform circuits of logarithmic depth for addition,
subtraction, and division by 2 .
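Identity (3.1) is easily checked with integers. A one-line Python sketch (ours; squ stands in for the squaring oracle):

    def mul_via_squ(x, y, squ=lambda z: z * z):
        # x * y = ((x + y)^2 - x^2 - y^2) / 2: three squaring-oracle calls
        # plus additions, subtractions and a division by 2 (a right shift).
        return (squ(x + y) - squ(x) - squ(y)) >> 1

    print(mul_via_squ(1234, 5678) == 1234 * 5678)  # True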
SQU ≤1 POW , since SQU is a subproblem of POW .
SQU ≤1 DIV by transitivity, and POW ≤1 DIV . An explicit
reduction is given by
x² = 1 / (1/x − 1/(x + 1)) − x .  (3.2)
(3.3)
y := 2^{2n³ + 2n²} (2^{2n²} − x)^{−1} = 2^{2n³} (1 − 2^{−2n²} x)^{−1} = Σ_{0≤i} 2^{2n²(n−i)} x^i .  (3.4)
x^n has at most n² significant bits. After computing enough (but polynomially many) bits of y we can read off x, x², …, x^n in the binary
representation of y .
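With exact integer arithmetic, the padding trick of (3.4) can be replayed directly. A Python sketch (ours; the block length s = 2n² + 2n is a generous choice for this demonstration, not the book's exact constant):

    def powers_via_div(x, n):
        """Read x, x^2, ..., x^n out of one division, cf. (3.4): with a
        block length s so large that the powers cannot overlap, the binary
        expansion of 2^(s(n+1)) / (2^s - x) contains all powers of x."""
        s = 2 * n * n + 2 * n          # assumed block length, ample here
        y = (1 << (s * (n + 1))) // ((1 << s) - x)
        return [(y >> (s * (n - i))) & ((1 << s) - 1) for i in range(1, n + 1)]

    print(powers_via_div(13, 4))  # [13, 169, 2197, 28561]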
n(yp(n) ))
(3.5)
yp(n) }
not equal to PARn . Nevertheless PAR is not much harder than MAJ
by the following reduction:
PARn(x) = ∨_{k odd} (Tnk(x) ∧ ¬Tnk+1(x)) .  (3.6)
c and size p(n) containing oracle gates for gj or ḡj with j ≤ p(n) (the
size and the depth of the oracle gates is 1) and on each path there is at
most one oracle gate.
Notation : f ≤cd g .
THEOREM 3.4 : i) ≤cd is reflexive and transitive.
ii) f ≤proj g ⟹ f ≤cd g .
iii) f ≤cd g , g ∈ SIZE-DEPTH(S(n), D(n)) , S and D monotone ⟹
f ∈ SIZE-DEPTH(p(n) S(p(n)), c D(p(n))) for some polynomial p
and constant c . In particular g ∈ SIZE-DEPTH(poly, const) ⟹
f ∈ SIZE-DEPTH(poly, const) .
The easy proof of this theorem is left as an exercise. By Theorem 3.4 ii) the results of Theorem 3.3 hold also for constant depth reducibility. For simple problems like PAR and MAJ nothing can be
proved with projection reducibility, but a lot is known about constant
depth reducibility. Some of the following results have been proved by
Furst, Saxe and Sipser (84) but most of them are due to Chandra,
Stockmeyer and Vishkin (84).
DEFINITION 3.7 :
PAR : Input: x1, …, xn . Output: x1 ⊕ ⋯ ⊕ xn .
ZMc : Input: x1, …, xn . Output: 1 iff x1 + ⋯ + xn ≡ 0 mod 2^c .
BCOUNT : Input: x1, …, xn . Output: the binary representation of
x1 + ⋯ + xn .
UCOUNT : Input: x1, …, xn . Output: the unary representation
of x1 + ⋯ + xn .
THEOREM 3.5 : PAR =cd ZMc ≤cd MUL =cd SOR =cd MADD =cd
THR =cd MAJ =cd BCOUNT =cd UCOUNT ≤cd USTCON .
By Theorem 3.4 ii) we can combine the results of Theorem 3.3
and Theorem 3.5. The lower bound for parity (which we prove in
Ch. 11) translates to lower bounds for all problems in Theorem 3.3
and Theorem 3.5. Again reducibility is a powerful tool.
We do not prove that MUL, …, UCOUNT ≤cd USTCON , only
the weaker claim PAR, ZMc ≤cd USTCON . This claim is sufficient
for the translation of the lower bound on parity to the other problems.
Proof of Theorem 3.5 :
PAR ≤cd ZMc : We use 2^{c−1} copies of each xi and an oracle gate for
ZMc on these n 2^{c−1} inputs.
ZMc ≤cd PAR : Obviously PAR =cd ZMc for c = 1 . Because of
transitivity it is sufficient to prove ZMc ≤cd ZM(c−1) if c ≥ 2 . Let
x1, …, xn be the inputs and let us compute yij = xi ∧ xj for 1 ≤ i < j ≤ n .
It is sufficient to prove that
ZMc(x1, …, xn) = ZM(c−1)(x1, …, xn) ∧ ZM(c−1)(y12, …, y_{n−1,n}) .  (3.7)
Let s be the sum of all xi and t the sum of all yij . Then
t = Σ_{1≤i<j≤n} yij = (1/2) Σ_{i≠j} xi xj = (1/2) ((Σ_{1≤i≤n} xi)² − Σ_{1≤i≤n} xi²) = (1/2) (s² − s) .  (3.8)
(3.9)
We compute (Tn1(x), …, Tnn(x)) by oracle gates. Then we compute
PAR(x) by (3.6).
For the proof that the second group of problems contains equivalent problems with respect to constant depth reducibility it is (by
transitivity) sufficient to prove that MAJ ≤cd MUL ≤cd MADD ≤cd
BCOUNT ≤cd SOR ≤cd UCOUNT ≤cd THR ≤cd MAJ .
MAJ ≤cd MUL : If we are able to compute the binary representation
cn of x1 + ⋯ + xn , we are done. MAJ(x) = 1 iff x1 + ⋯ + xn ≥ n/2 .
This comparison can be performed in depth 2 and polynomial size by
the disjunctive normal form, since the length of cn is k = ⌈log(n + 1)⌉ .
For the computation of cn we use a padding trick already used in Ch. 3,
§ 2. Let a be the binary number of length nk with xi at position k(i−1)
and zeros elsewhere. Then a is the sum of all xi 2^{k(i−1)} . Let b be the
binary number of length nk with ones at the positions k(i − 1) for
1 ≤ i ≤ n and zeros elsewhere. Then b is the sum of all 2^{k(i−1)} . We
compute (at an oracle gate) c , the product of a and b . Then c is the
sum of all ci 2^{k(i−1)} with k-bit numbers ci contained in c . It is easy to
see that cn = x1 + ⋯ + xn .
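The padding trick can be checked with ordinary integers. A Python sketch (ours; the oracle gate is the single multiplication a * b, and the block length k is chosen so that no carries cross block boundaries):

    def count_ones_via_mul(x):
        """Compute x_1 + ... + x_n with one multiplication, using k-bit
        blocks so that block sums never overflow into the next block."""
        n = len(x)
        k = (n + 1).bit_length()
        a = sum(xi << (k * i) for i, xi in enumerate(x))
        b = sum(1 << (k * i) for i in range(n))
        c = a * b
        # the middle block of c is the sum of all x_i
        return (c >> (k * (n - 1))) & ((1 << k) - 1)

    x = [1, 0, 1, 1, 0, 1, 1]
    print(count_ones_via_mul(x), sum(x))  # 5 5; MAJ(x) = 1 iff 5 >= n/2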
MUL ≤cd MADD : Obvious by the school method for multiplication.
MADD ≤cd BCOUNT : Let ai = (a_{i,m−1}, …, a_{i,0}) for 1 ≤ i ≤ m be
the m numbers we have to sum up. We use in parallel oracle gates to
compute the l = ⌈log(m + 1)⌉-bit numbers bj = a_{1,j} + ⋯ + a_{m,j} . Then …
cij = 1 iff ai ≥ aj or (ai = aj and i ≥ j) . Then we compute in parallel by
oracle gates dj , the sum of all cij . dj is the unary representation of
the position of aj in the sorted list of a1, …, am . By similar methods
as in the last reduction (BCOUNT ≤cd SOR) we compute the sorted
list of a1, …, am .
UCOUNT ≤cd THR : We compute in parallel by oracle gates yi =
THR(x1, …, xn, i) . Then UCOUNT(x1, …, xn) = (yn, …, y1) .
THR ≤cd MAJ : The input consists of x1, …, xm and k =
(km, …, k1) , a number in unary representation. We compute by an
oracle gate z = MAJ(x1, …, xm, k̄1, …, k̄m, 1) .
z = 1 iff x1 + ⋯ + xm + k̄1 + ⋯ + k̄m + 1 ≥ m + 1 . There are l ones
in k if k represents l . Then k̄1 + ⋯ + k̄m = m − l . Hence z = 1 iff
x1 + ⋯ + xm ≥ l iff THR(x, k) = 1 .
In order to relate the complexity of PAR and ZMc to all problems considered in Theorem 3.3 we prove PAR ≤cd USTCON . We
compute the adjacency matrix A of an undirected graph G on the
vertices v0, …, v_{n+1} . Let x1, …, xn be the inputs of PAR and let
x0 = x_{n+1} = 1 . Let aii = 0 and let
aij = xi ∧ x̄_{i+1} ∧ ⋯ ∧ x̄_{j−1} ∧ xj  for i < j .  (3.10)
G contains exactly one path from v0 to v_{n+1} ; this path passes through
all vi with xi = 1 . The length of this path is even iff x1 ⊕ ⋯ ⊕ xn =
1 . We square in polynomial size and constant depth the Boolean
matrix A . The result is B , the adjacency matrix of the graph G′
where vi and vj are connected by an edge iff they are connected in G
by a path of length 2 , i.e. v0 and v_{n+1} are connected in G′ by a path iff
x1 ⊕ ⋯ ⊕ xn = 1 . Therefore one oracle gate for USTCON is sufficient
for the computation of x1 ⊕ ⋯ ⊕ xn .
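The construction (3.10) is easy to simulate. A Python sketch (ours, using numpy; the breadth-first search at the end merely plays the role of the USTCON oracle on the squared graph):

    import numpy as np

    def parity_via_ustcon(x):
        """Parity via one Boolean matrix squaring and one USTCON query,
        following (3.10): consecutive ones (with the dummies
        x0 = x_{n+1} = 1) are joined by an edge."""
        bits = [1] + list(x) + [1]
        ones = [i for i, b in enumerate(bits) if b]
        m = len(bits)
        A = np.zeros((m, m), dtype=int)
        for u, v in zip(ones, ones[1:]):
            A[u, v] = A[v, u] = 1
        B = (A @ A) > 0                    # paths of length 2 in G
        reach, frontier = {0}, {0}         # oracle: v0 to v_{n+1} in B?
        while frontier:
            frontier = {int(j) for i in frontier
                        for j in np.nonzero(B[i])[0]} - reach
            reach |= frontier
        return int(m - 1 in reach)

    for x in ([1, 0, 1], [1, 1, 1]):
        print(parity_via_ustcon(x), sum(x) % 2)  # agree on both inputs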
EXERCISES
1. Let Nn be the class of all f ∈ Mn whose prime implicants all have
length n/2 . Then there is for each log n ≤ j ≤ D(Nn) (or
Dm(Nn)) a function gj ∈ Nn where D(gj) = j (or Dm(gj) = j) .
2. Prove (1.2) for the bases { } and {NAND} .
3. Prove dΩ(j) ≤ 2 for all binary bases Ω and all interesting j .
4. Prove (1.5).
5. Prove (1.3) and (1.4) for further bases.
6. cΩ(j) = 1 for n = 2 , 0 ≤ j < CΩ(B2) and Ω = {…} or
Ω = {NAND} .
11.1 Introduction
Ch. 10). In § 3 we prove that polynomial circuits for the parity function have depth Ω((log n) / (log log n)) . Applying the reducibility results of Ch. 10, § 3 , we conclude that many fundamental functions
with polynomial circuit size are not in SIZE-DEPTH(poly, const) .
Therefore we should use circuits where the number of logical levels is
increasing with the input length.
In § 2 we prove for some fundamental functions that they are in
SIZE-DEPTH(poly, const) and design almost optimal circuits for the
parity function. The announced lower bound for the parity function
is proved in § 3. In § 4 we describe which symmetric functions are
contained in SIZE-DEPTH(poly, const) . In § 5 we discuss hierarchy
problems.
We finish this introduction with two concluding remarks. Bounded-depth circuits also represent an elegant model for PLAs (programmable logic arrays). Furthermore, results on bounded-depth circuits are related to results on parallel computers (see Ch. 13).
DEFINITION 2.1 :
ADD : Addition of two m-bit numbers, n = 2m .
COM : The comparison problem for two m-bit numbers x =
(x_{m−1}, …, x0) and y = (y_{m−1}, …, y0) . Output 1 iff |x| ≥ |y| .
n = 2m .
MAX : The computation of the maximum of m m-bit numbers, n =
m² .
MER : The merging problem for two sorted lists of m m-bit numbers,
n = 2 m² .
U→B : Input: an n-bit unary number k . Output: the binary
representation of k .
THEOREM 2.1 : ADD , COM , MAX , MER and U→B are in
SIZE-DEPTH(poly, const) .
Proof : ADD : We implement the carry look-ahead method (see Ch. 3,
§ 1). We compute uj = xj ∧ yj and vj = xj ⊕ yj (0 ≤ j ≤ m−1) in depth 2 .
The carry bit cj is the disjunction of all ui ∧ v_{i+1} ∧ ⋯ ∧ vj (0 ≤ i ≤ j) (see
Ch. 3, (1.8)). The sum bits sj are computed by s0 = v0 , sm = c_{m−1} and
sj = vj ⊕ c_{j−1} (1 ≤ j ≤ m−1) . The size of the circuit is O(n²) (the
number of wires is O(n³)) and the depth is 4 if vj and sj are computed
by Σ2-circuits for a ⊕ b .
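For concreteness, the carry look-ahead computation can be written out directly. A Python sketch (ours; bits are given least significant first):

    def cla_add(x, y):
        """Carry look-ahead addition, bit 0 least significant:
        u_j = x_j AND y_j (generate), v_j = x_j XOR y_j (propagate)."""
        m = len(x)
        u = [a & b for a, b in zip(x, y)]
        v = [a ^ b for a, b in zip(x, y)]
        # c_j is the disjunction of all u_i AND v_{i+1} AND ... AND v_j
        c = [any(u[i] and all(v[i + 1:j + 1]) for i in range(j + 1))
             for j in range(m)]
        return [v[0]] + [v[j] ^ c[j - 1] for j in range(1, m)] + [int(c[m - 1])]

    x, y = [1, 0, 1, 1], [1, 1, 0, 1]   # 13 and 11
    s = cla_add(x, y)
    print(sum(b << i for i, b in enumerate(s)))  # 24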
COM : |x| > |y| iff there is an i such that yi = 0 , xi = 1 and yj = xj for
all j > i . This can be computed by
∨_{0≤i≤m−1} ( xi ∧ ȳi ∧ ∧_{i<j≤m−1} ((xj ∧ yj) ∨ (x̄j ∧ ȳj)) ) .  (2.1)
∨_{1≤j≤m} (EQk(dj) ∧ b_{ji})  (2.2)
… 2^{n^{1/(k−1)}}) .
ii) PAR has Σ_{k(n)}- and Π_{k(n)}-circuits of size O(n² log^{−1} n) if k(n) =
(log n) / log log n + 1 .
Proof : i) has already been proved for n = r^{k−1} . The general case is
left as an exercise.
ii) In step i we combine the outputs of step i−1 to the least number of
blocks whose size is bounded by ⌈log n⌉ + 1 . The number of blocks in
step i is bounded by max{1, n / log^i n} . k(n) − 1 steps are sufficient
in order to obtain one block, since (log n)^{k(n)−1} ≥ n . Altogether we
require less than 2 n / log n Σ2- and Π2-parity circuits working on at
most ⌈log n⌉ + 1 inputs.
The following main lemma tells us that if we apply a random restriction, we can with high probability convert Σ2-circuits to equivalent Π2-circuits of small size. We need the notion of the 1-fan-in of a
circuit. This is the maximal fan-in of all gates on the first level. The
proof of the first technical lemma is left to the reader.
… (1 + 2p/(1+p))^t … (1 + 4p/(1+p))^t … ≤ 5 p t , where  (3.1)
(3.2)
Pr(l PI(g^ρ) ≥ s | f^ρ ≡ 1) ≤ …  (3.3)
(3.4)
Pr(l PI_Y(g^ρ) ≥ s | f^ρ ≡ 1 , g1^ρ ≢ 1)
≤ Σ_{Y′⊆T , Y′≠∅} Pr(σ(Y′) | f^ρ ≡ 1 , g1^ρ ≢ 1) · Pr(l PI_{Y∖Y′}(g^ρ) ≥ s − |Y′| | f^ρ ≡ 1 , g1^ρ ≢ 1 , σ(Y′)) ,
since Pr(l PI_Y(g^ρ) ≥ s | σ(∅)) = 0 . We claim that
Pr(σ(Y′) | f^ρ ≡ 1 , g1^ρ ≢ 1) ≤ (2p/(1+p))^{|Y′|}  and  (3.5)
Σ_{Y′⊆T , Y′≠∅} (2p/(1+p))^{|Y′|} (2^{|Y′|} − 1) s^{|Y′|} … ≤ Σ_{0≤i≤|T|} C(|T|, i) (2p/(1+p))^i (2^i − 1) …  (3.7)
Σ_{0≤i≤t} C(t, i) (4p/(1+p))^i … ≤ (1 + 4p/(1+p))^t …  (3.8)
… (2p/(1+p)) (1 + 2p/(1+p)) …  (3.9)
… = … (2p/(1+p))^{|Y′|} .  (3.10)
σ : Y′ → {0, 1} …  (3.11)
Pr(l PI(g^ρ) ≥ s − |Y′| | f^ρ ≡ 1) … . Since g1 is … , let gj be defined in a similar way. Then g = g1 ∧ ⋯ ∧ gm .
(3.12)
The Main Lemma has many applications. The most appropriate function is the parity function, as all prime implicants and prime
clauses of the parity function have length n , and all subfunctions are
parity functions or negated parity functions.
section whether they belong to SIZE-DEPTH(poly, const) (Brustmann and Wegener (86)). Our results are based on the lower bound
techniques due to Håstad (86). Weaker results based on the lower
bound technique due to Furst, Saxe and Sipser (84) have been obtained by Fagin, Klawe, Pippenger and Stockmeyer (85) and Denenberg, Gurevich and Shelah (83). It is fundamental to know a lower
bound for the majority function.
THEOREM 4.1 : For some constant n0 and all n ≥ n0^k , Σk- and Πk-circuits for the majority function have more than 2^{c(k) n^{1/(k−1)}} gates ,
where c(k) = (1/10)^{k/(k−1)} ≤ 1/10 .
Ck(Tn⌈n/2⌉) ≤ n · Ck(MAJn)  (4.1)
for n ≥ n0^{k+1} .  (4.2)
longest constant substring of v(f) . For f ∈ Bn let l min(f) be the length
of a shortest prime implicant or prime clause.
LEMMA 4.1 : l min(f) = n + 1 − vmax(f) for f ∈ Sn .
Proof : A prime implicant t of length k with l variables and k − l
negated variables implies v_l = ⋯ = v_{n−k+l} = 1 and therefore the
existence of a constant substring of v(f) of length n + 1 − k . Furthermore, we obtain a maximal constant substring: if v_{l−1} = 1 or
v_{n−k+l+1} = 1 , we could shorten t by a variable or negated variable
resp. If v_l = ⋯ = v_{n−k+l} = 1 is a maximal constant substring of
v(f) , then the monom x1 ⋯ x_l x̄_{l+1} ⋯ x̄_k is a prime implicant of f of
length k . Dual arguments hold for prime clauses and substrings of
v(f) consisting of zeros.
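Lemma 4.1 reduces l min for symmetric functions to a scan of the value vector. A Python sketch (ours; the value vector convention (v0, …, vn) follows the text):

    def lmin_symmetric(value_vector):
        """For a symmetric f given by its value vector (v_0, ..., v_n),
        Lemma 4.1 gives l_min(f) = n + 1 - v_max(f), where v_max is the
        length of a longest constant substring of the vector."""
        n = len(value_vector) - 1
        vmax = run = 1
        for a, b in zip(value_vector, value_vector[1:]):
            run = run + 1 if a == b else 1
            vmax = max(vmax, run)
        return n + 1 - vmax

    # Parity on 4 variables: vector (0,1,0,1,0), v_max = 1, l_min = 4 = n.
    print(lmin_symmetric((0, 1, 0, 1, 0)))
    # Majority on 5 variables: vector (0,0,0,1,1,1), v_max = 3, l_min = 3.
    print(lmin_symmetric((0, 0, 0, 1, 1, 1)))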
(4.3)
≥ (n + 1)/2 , then for w = vmax(fn) and
(4.4)
(4.5)
(4.6)
We have proved that the parity function has large complexity with
respect to depth-k circuits. This lower bound implies many others by
the reducibility results of Ch. 10, § 3. What happens if unbounded
fan-in parity gates are added to the set of admissible gates? Then
ZMc is easy to compute, since ZMc ≤cd PAR . Razborov (86) proved
that the complexity of the majority function in depth-k {∧, ∨, ⊕}-circuits of unbounded fan-in is exponential. This holds also for all
functions f with MAJ ≤cd f . It is an open problem to decide which
functions are difficult in depth-k circuits consisting of threshold gates
of unbounded fan-in.
Razborov's new and striking result belongs to the class of hierarchy
results, since we increase the power of the basis in order to be able to
solve more problems with polynomial circuits. We turn our thoughts
to another type of hierarchy results.
DEFINITION 5.1 : Let Σk(P) and Πk(P) be the class of all sequences
f = (fn) of functions fn ∈ Bn which can be computed by polynomial
Σk-circuits and Πk-circuits resp. Σk^m(P) and Πk^m(P) are defined in a
similar way with respect to monotone depth-k circuits.
Obviously for all k
Σk(P) ⊆ Σ_{k+1}(P) ⊆ SIZE-DEPTH(poly, const) ,  (5.1)
(5.2)
(5.3)
Σk^m(P) ⊆ Σk(P) ∩ {f = (fn) | fn ∈ Mn} .  (5.4)
complexity classes. This was first done by Klawe, Paul, Pippenger and
Yannakakis (84) for monotone depth-k circuits and then by Yao (85)
in the general case.
DEFINITION 5.2 : Let n = m^k . Let us denote the variables by
x_{i(1) … i(k)} (1 ≤ i(j) ≤ m) . Let Q = ∃ , if k is odd, and Q = ∀ , if k is
even. Then F_{k,n}(x) = 1 iff the predicate
∃ i(1) ∀ i(2) ∃ i(3) ⋯ Q i(k) : x_{i(1) … i(k)} = 1  (5.5)
is satisfied.
In the preceding section we already discussed F_{2,n} = hn .
THEOREM 5.1 : Let Fk = (F_{k,n}) . Then Fk ∈ Σk^m(P) , but all depth
k − 1 circuits for Fk have exponential size.
Again Håstad (86) proved similar results with simpler proofs. We
do not present the proofs, which are based on the lower bound method
for the parity function.
Such hierarchy results have further implications. Furst, Saxe and
Sipser (84) have shown tight relations to classical complexity problems.
The results of this chapter imply that the complexity classes Σk and
Σ_{k+1} (see Def. 1.4, Ch. 9) as well as the complexity classes Σk and
PSPACE can be separated by oracles.
EXERCISES
1. If a Σk-formula is defined on n variables and consists of b gates,
then the number of wires can be bounded by b (n + b) and the
number of leaves can be bounded by b n .
DEFINITION 1.1 : A synchronous circuit is a circuit with the additional property that all paths from the inputs to some gate G have
the same length.
In Ch. 11 we have considered only synchronous bounded-depth
circuits, since this restriction does not change essentially the model of
bounded-depth circuits. Let Cs and Ds be the complexity measures
for synchronous circuits with binary gates. We remember that PCD(f)
is the product complexity (see Def. 3.1, Ch. 7) , namely the minimal
C(S) · D(S) for all circuits S computing f .
THEOREM 1.1 : i) Ds(f) = D(f) for all f ∈ Bn .
ii) Cs(f) ≤ PCD(f) ≤ C(f)² for all f ∈ Bn .
Proof : i) and the second inequality of ii) are obvious. Let S be a
circuit for f where C(S) · D(S) = PCD(f) . For each gate G , let d(G)
be the length of the longest path to G . Let G1 and G2 be the direct
predecessors of G ; w.l.o.g. d(G1) ≥ d(G2) . Then d(G1) = d(G) − 1 .
We add a path of d(G) − d(G2) − 1 identity gates to the edge from G2 .
This result implies that the size of each synchronous adder is considerably larger than the size of an optimal asynchronous adder. For
functions f with one output no example in which C(f) = o(Cs(f)) is
known. One might think that the carry function cn , the disjunction
of all ui ∧ v_{i+1} ∧ ⋯ ∧ vn (0 ≤ i ≤ n) , is a candidate for such a gap (see also
Ch. 7, § 4). But Cs(cn) is linear (Wippersteg (82)). Harper (77) and
Harper and Savage (79) generalized the bottleneck argument of Theorem 1.2 to functions with one output. The gates on level l ≤ D(f)
contain all necessary information on f and all subfunctions of f . Therefore the number of gates on level l cannot be small if f has many
subfunctions.
DEFINITION 1.2 : For f ∈ Bn let N(f, A) be the number of subfunctions g of f on the set of variables A ⊆ X = {x1, …, xn} . Let
N*(f, a) = C(n, a)^{−1} Σ_{A⊆X , |A|=a} log N(f, A) .  (1.1)
N*(f, a) ≥ … Σ_{0≤k≤b} C(b, k) C(n−b, a−k) … 2^{…} (1 − r)^{−1} … .  (1.2)
Let s(k) = C(b, k) C(n−b, a−k) 2^k … and
q(k) := s(k+1)/s(k) = 2 (b − k)(a − k) / ((k + 1)(n − a − b + 1 + k)) .  (1.3)
Since q(0) = r and q(k) ≤ q(0) , we obtain s(k) ≤ s(0) r^k and
Σ_{0≤k≤b} s(k) ≤ C(n−b, a) Σ_{0≤k≤b} r^k ≤ C(n−b, a) (1 − r)^{−1} .  (1.4)
If r < 1 and a ∈ {1, …, n} , then  (1.5)
Cs(f) ≥ (1 − ε) d N*(f, a) .  (1.6)
N(…, A) ≤ Π_{1≤i≤m} N(gi, A) ,  (1.7)
N*(…, a) ≤ Σ_{1≤i≤m} N*(gi, a) .  (1.8)
(1.9)
(1.10)
… d} is at least
Fig. 2.1
Proof : The first two inequalities follow from the definition. For the
last inequality we consider an optimal circuit S for f . Let c = C(f) .
We embed input xi at (i, 0) and gate Gj at (0, j) . An edge e from
xi or Gi to Gj is embedded by a continuous function e* = (e*1, e*2)
where e*k : [0, 1] → ℝ for k ∈ {1, 2} , e*(0) = (i, 0) or e*(0) = (0, i)
resp. and e*(1) = (0, j) . We define all embeddings such that e*2 is
decreasing. If the edges leading to G1, …, G_{i−1} are embedded, then
we embed the two edges leading to Gi in such a way that all previous
edges are crossed once at most. Since the circuit contains 2c edges,
the number of crossings is bounded by C(2c, 2) = c (2c − 1) . We replace
each crossing by the planar circuit of Lemma 2.1. In addition to the
c gates of S we obtain at most 3 c (2c − 1) new gates.
Fig. 2.2
If we replace this crossing by the planar circuit of Lemma 2.1 , we obtain a cycle starting at w leading via G to the z-input of the crossing-circuit, then leading to the y-output of the crossing-circuit and
back to w .
This problem does not occur for our embedding in the proof of
Theorem 2.1. All edges are embedded top-down. If some edges e and
e′ lie on the same path, their embeddings do not cross.
Theorem 2.1 implies an upper bound of O(2^{2n} n^{−2}) on the planar
circuit complexity of all f ∈ Bn . Savage (81) improved this simple
bound.
The upper bound on Cp(f) has been improved by McColl and Paterson (84) to (61/48) 2^n . McColl (85 b) proved that for almost all
f ∈ Bn the planar circuit complexity is larger than (1/8) 2^n − (1/4) n .
This implies that C(f) = o(Cp(f)) for almost all f ∈ Bn .
With information flow arguments and the Planar Separator Theorem due to Lipton and Tarjan ((79) and (80)), Savage (81) proved
Ω(n²)-bounds on the planar circuit complexity of several n-output
functions. Larger lower bounds imply (due to Theorem 2.1) nonlinear
bounds on the circuit complexity of the same functions.
The investigation of planar circuits is motivated by the realization
of circuits on chips. Today, chips consist of h levels, where h ≥ 1 is
a small constant. We introduce the fundamental VLSI-circuit model.
For some constant λ > 0 , gates occupy an area of λ² , and the minimum distance between two wires is λ . A VLSI-circuit of h levels
on a rectangular chip of length l and width w consists of a three-dimensional array of cells of area λ² . Each cell can contain a gate, a
wire or a wire branching. Wires cross each other at different levels
of the circuit. A crossing of wires occupies a constant amount of area.
The area occupied by a wire depends on the embedding of the circuit.
VLSI-circuits are synchronized sequential machines. The output of
gate G at the point of time t may be the input of gate G′ at the point
of time t + 1 , i.e. the circuits are in general not cycle-free. For each
input there is a definite input port, a cell at the border of the chip,
some y = (y1, …, ym) where
f(x1, …, xn, y1, …, ym) = (x_{π(1)}, …, x_{π(n)}) .  (2.1)
xn1 ,
y1, …, ym where m = ⌈log n⌉ . Let |y| be the binary number represented
by (y1, …, ym) . Output: z0, …, z_{n−1} where zi = x_{(i+|y|) mod n} .
MUL : Input: n-bit numbers x, y < M . Output: the n-bit number
z ≡ x y mod M .
CYCCON : Cyclic convolution. Input: 2n k-bit numbers
x0, y0, …, x_{n−1}, y_{n−1} . Output: n k-bit numbers z0, …, z_{n−1} where
z_l ≡ Σ_{i+j ≡ l mod n} xi yj mod M  for M = 2^k − 1 .  (2.2)
MVP : matrix-vector product. Input: n k-bit numbers x1, …, xn and
an n × n-matrix Y = (yij) where yij ∈ {0,1} . Output: n k-bit numbers
z1, …, zn where
z_l ≡ Σ_{1≤i≤n} y_{l,i} xi mod M  for M = 2^k − 1 .  (2.3)
Σ_{0≤i≤n−1} xi 2^{(i+s) mod n} ≡ Σ_{0≤i≤n−s−1} xi 2^{i+s} + Σ_{n−s≤i≤n−1} xi 2^{i+s−n} mod M , for s ∈ {0, …, n−1} .  (2.4)
(2.5)
(2.6)
(2.7)
xi has to cross the cut if the input port for xi is in another part of
the circuit than the output port for the π^{−1}(i)-th output. In this case
k(i, π) := 1 , otherwise k(i, π) := 0 .
Let Gij be the set of all π ∈ G where π(i) = j . It follows from
elementary group theory (see e.g. Huppert (67)) that all Gij are of
the same size. Hence π^{−1}(j) = i for |G| n^{−1} permutations π ∈ G . It
follows that
Σ_{π∈G} k(i, π) ≥ |G| n^{−1} L_{out} ,  (2.8)
(2.9)
(2.10)
Σ_{π∈G} Σ_{1≤i≤n} k(i, π) ≥ …  (2.11)
y1, …, ym which take the values 0 and 1 independently with probability 1/2 . According to this probability distribution the output S(x) of
S on input x is a random variable.
DEFINITION 3.2 : Let A, B ⊆ {0,1}^n and 0 ≤ q < p ≤ 1 . The
probabilistic circuit S separates A and B with respect to (p, q) if
∀ x ∈ A : Pr(S(x) = 1) ≥ p and  (3.1)
∀ x ∈ B : Pr(S(x) = 1) ≤ q .  (3.2)
Notation: (S, A, B, p, q) .
S is an ε-computation of f , if (S, f⁻¹(1), f⁻¹(0), (1/2) + ε, (1/2) − ε) is satisfied.
The following considerations can be generalized to circuits with
binary gates or formulas. But we concentrate our view upon bounded-depth circuits with gates of unbounded fan-in (see Ch. 11). We shall
prove Theorem 2.2, Ch. 11 , by designing probabilistic circuits for
threshold functions.
At first, we present simple transformations and show how probabilistic circuits of very small error probability lead to deterministic circuits. Afterwards we investigate how we can reduce the error
probability. On the one hand, we improve log^{−(r+1)} n-computations to
log^{−r} n-computations, if r ≥ 1 , and on the other hand, we improve
log^{−1} n-computations to computations of very small error probability.
All results are due to Ajtai and Ben-Or (84).
LEMMA 3.1 : Let (S, A, B, p, q) be satisfied for A, B ⊆ {0,1}^n and
0 ≤ q < p ≤ 1 .
i) If p′ ≤ p and q′ ≥ q , (S, A, B, p′, q′) is satisfied.
satisfied.
iv) If q < 2^{−n} and p > 1 − 2^{−n} , there is a deterministic circuit Sd
such that C(Sd) ≤ C(S) , D(Sd) ≤ D(S) and (Sd, A, B, 1, 0) are
satisfied.
Proof : i) Obvious by definition.
ii) We negate the output of S and apply the deMorgan rules bottom-up.
iii) We use l copies of S with independent random inputs and combine
the outputs by an ∧-gate. S^l(x) = 1 iff all copies of S compute 1 .
The assertion follows since the copies of S have independent random
inputs.
iv) Since q < p , A and B are disjoint. Let f be defined by f⁻¹(1) =
A . For x ∈ A ∪ B , the error probability, i.e. the probability that
S(x) ≠ f(x) , is smaller than 2^{−n} . Let I(x) be the random variable
where I(x) = 1 iff S(x) ≠ f(x) and I(x) = 0 otherwise. Then E(I(x)) =
Pr(S(x) ≠ f(x)) . Hence
E( Σ_{x∈A∪B} I(x) ) = Σ_{x∈A∪B} E(I(x)) < 1 .  (3.3)
Therefore there is a vector (y1*, …, ym*) ∈ {0,1}^m which
leads to zero error. Sd is constructed by replacing the random inputs
y1, …, ym by the constants y1*, …, ym* .
and
(3.5)
1 + (2 log n) (log^{−(r+1)} n) = 1 + 2 log^{−r} n  (3.6)
(1 − n^{−2})^l ≤ (1 − n^{−2})^{(n² … ) ln 2} ≤ e^{−ln 2} = 1/2 . Hence  (3.8)
… = (1/2) n^{−(n²/3)} …  (3.9)
… = (1/2) 8^{−n} …  (3.10)
= (1/2) (1 + O(n^{−2})) .
For the second factor, … ≥ 1/3 for large n . If x ≥ 0 ,
exp{−x} = Σ_{0≤i} (−x)^i / i! = 1 − x + x²/2! − ⋯ ≥ 1 − x ,  (3.11)
… ≥ 1 − (1/2) log^{−(r+1)} n ,  (3.12)
(1/2) (1 + O(n^{−2})) (1 − (1/2) log^{−(r+1)} n) ≥ (1/2) (1 − log^{−(r+1)} n) for large n .  (3.13)
The construction applies L. 3.1 i & ii and L. 3.1 iii with l = 2 log n ,
l = 2 n² ln n , l = n³ and l = n , and finally L. 3.1 iv .  (3.14)
(3.16)
(3.17)
fn(x) = 0 if x1 + ⋯ + xn < n/2 .  (3.18)
(3.19)
We are content with this definition and the remark that many
RNC^k algorithms are known.
EXERCISES
1. Prove the upper bound in Theorem 1.2.
2. Each synchronous circuit for the addition of an n-bit number and
a 1-bit number has size Ω(n log n) .
3. Estimate Cs(f) for all f ∈ Bn by synchronizing Lupanov's circuit
(Ch. 4, § 2).
4. Each synchronous circuit for the computation of the pseudo-complements of a k-slice (Ch. 6, § 13) contains at least log(n − 1) −
log(n/k + 2) gates.
5. The complete graph Km is planar iff m ≤ 4 .
6. The complete bipartite graph K3,3 is not planar.
7. (McColl (81)). The following functions are not computable in
monotone, planar circuits.
a) f1(x, y) = (y, x) .
b) f2(x, y) = (xy, x) .
c) f3(x, y) = (xy, x, y) .
8. (McColl (85 a)). T53 (and also Tnk for 2 ≤ k ≤ n − 1) is not
computable in a monotone, planar circuit.
13.1 Introduction
have random access to a shared memory. Processor j obtains information from processor i by reading an information written by processor i .
These parallel random access machines represent no realistic model,
but on the one hand it is convenient to design algorithms in this model,
and, on the other hand, efficient algorithms are known for the simulation of these algorithms on those realistic parallel computers discussed
above (see e.g. Mehlhorn and Vishkin (84) or Alt, Hagerup, Mehlhorn
and Preparata (86)). Hence we investigate only parallel random access machines, nowadays the standard model of parallel computers, at
least for the purposes of complexity theory.
Σ_{1≤i≤d} … ≤ 2 d + 2 s + 2 (d − s) = 4 d .  (2.1)
ii) There is for each gate of the circuit a cell in the shared memory and
for each edge of the circuit a processor. In time step 0 the contents
of the cells representing ∧-gates are replaced by ones. In time step i
(1 ≤ i ≤ d) all gates on the i-th level are simulated. The processors
for the edges leading to these gates read the inputs of the edges. If the
level is an ∧-level, a processor writes 0 into the corresponding cell for
the gate iff the processor has read 0 . ∨-gates are simulated similarly.
(3.1)
We simulate Wn step-by-step. The difficulty is to find the information in the memories, since by numbers of length L(n) one may
address 2^{L(n)} different cells. These are 2^{L(n)} possible addresses. For
each definite input each processor may change the contents of at most
t(n) cells. Therefore the circuits use an internal representation of the
memories of Wn . Everything written at time step k gets the internal address k . If this information is deleted at a later time step,
the information is marked as invalid. The index l refers to the local
memory and index c to the common memory. For all 1 ≤ p ≤ p(n) ,
1 ≤ k ≤ t(n) , k ≤ t ≤ t(n) we shall define al(p, k) , vl(p, k) and
wl(p, k, t) . al(p, k) is a number of length L(n) and indicates the address of that cell of the local memory into which the p-th processor
has written at time step k . vl(p, k) is also a number of length L(n) and
equals the number which the p-th processor has written at time step k .
The bit wl(p, k, t) indicates whether at time step t the information the
p-th processor has written into the local memory at time step k is still
valid (wl(p, k, t) = 1) or has been deleted (wl(p, k, t) = 0) . In the
same way we define ac(p, k) , vc(p, k) and wc(p, k, t) . At the beginning all local cells contain zeros, only the first n cells of the common
memory contain the input. Hence we define ac(i, 0) = i , vc(i, 0) = xi
and wc(i, 0, 0) = 1 for 1 ≤ i ≤ n . Here we assume, as we do in the
whole proof, that numbers are padded with zeros if they do not have the
necessary length. All other parameters are equal to 0 for t = 0 .
Let l(p) be the number of lines in the program of the p-th processor. Let i(p, l) , j(p, l) , c(p, l) and r(p, l) be the parameters in the l-th
line of the p-th program. Here and in the following we assume that
non-existing parameters are replaced by zeros and that the empty disjunction is zero. Let ic(p, l, t) = 1 iff during the (t+1)-st computation
step processor p is in line l of its program. Obviously ic(p, l, 0) = 1 iff
l = 1 .
Let us assume that t computation steps of Wn have been simulated
correctly. This is satisfied at the beginning for t = 0 . We describe
the simulation of the (t + 1)-st computation step. Let EQ(a, b) = 1 iff
a = b , let
a ∧ (b1, …, bm) = (a ∧ b1, …, a ∧ bm) and  (3.2)
(a1, …, am) ∨ (b1, …, bm) = (a1 ∨ b1, …, am ∨ bm) .  (3.3)
I(p, l, t) = ∨_{1≤k≤t} EQ(i(p, l), al(p, k)) ∧ wl(p, k, t) ∧ vl(p, k) .  (3.4)
The equality test ensures that we are looking for information at the
correct address only, and the validity bit wl ensures that we consider only valid information. If we consider a computation step or
an if-test, we compute J(p, l, t) in the same way. R(p, l, t) equals
c(p, l) (reading of constants), or I(p, l, t) ∘ J(p, l, t) (computation step),
or I(p, l, t) (indirect writing), or 1 or 0 if I(p, l, t) = J(p, l, t) or
I(p, l, t) ≠ J(p, l, t) resp. (if-test). For steps of indirect reading
R(p, l, t) equals the contents of the cell I(p, l, t) of the local or common
memory. Hence R(p, l, t) can be computed by (3.4) if we replace i(p, l)
by I(p, l, t) . For the common memory we replace al(p, k) , vl(p, k) , and
wl(p, k, t) by ac(p, k) , vc(p, k) and wc(p, k, t) resp. and compute the
disjunction over all p , since each processor may have written into the
common memory. In every case all R(p, l, t) are computed in polynomial size and constant depth. Only for indirect writing, A(p, l, t)
is not a constant. Then A(p, l, t) is computed in the same way as
R(p, l, t) . Finally
R(p, t) = ∨_{1≤l≤l(p)} ic(p, l, t) ∧ R(p, l, t) and  (3.5)
A(p, t) = ∨_{1≤l≤l(p)} ic(p, l, t) ∧ A(p, l, t) .  (3.6)
test and the result of this test leads us to line l′ . Hence all ic(p, l, t+1)
are computed in polynomial size and constant depth.
Let α(p, t) = 1 iff the p-th processor writes into its local memory
during the (t + 1)-st computation step. Let β(p, t) = 1 iff the p-th
processor tries to write into the common memory during the (t + 1)-st
computation step. α(p, t) as well as β(p, t) is the disjunction of some
ic(p, l, t) . Now the local memories are updated. Let al(p, t + 1) =
A(p, t) , vl(p, t + 1) = R(p, t) and wl(p, t + 1, t + 1) = α(p, t) . For
1 ≤ k ≤ t , let wl(p, k, t + 1) = 0 iff wl(p, k, t) = 0 or α(p, t) = 1
and al(p, k) = A(p, t) . An information is not valid iff it was not valid
before or the p-th processor writes into that cell of its local memory
where this information has been stored.
For the updating of the common memory we have to decide write
conflicts. Let γ(p, t) = 1 iff the p-th processor actually writes some
information into the common memory at the (t + 1)-st computation
step. Then
γ(p, t) = β(p, t) ∧ ( ∧_{1≤q<p} ¬β(q, t) ) .  (3.7)
Combining this simulation and the lower bounds of Ch. 11 we obtain lower bounds on the complexity of RES WRAMs.
(4.1)
(4.2)
2
THEOREM 4.1 : Let a = (1 + 5) 2 2 618 . ORn can be
computed by an EREW PRAM with n realistic processors and communication width n in time loga n .
Proof : It is essential that a processor may transfer information if it
does not write. We consider the situation of two memory cells M and
M containing the Boolean variables x and y resp. and a processor P
knowing the Boolean variable z . P reads the contents of M , computes
r = y z and writes 1 into M i r = 1 . Then M contains x y z ,
the disjunction of 3 variables. If r = 1 , this value is written into M .
If r = 0 , x y z = x . M contains this information, since P does not
write anything.
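The growth of knowledge in this scheme can be counted directly. A Python sketch (ours; the update rule for Ht is our reading of (4.4), chosen consistently with (4.3)):

    def erew_or_rounds(n):
        """Knowledge growth in the EREW OR algorithm: after step t the
        i-th processor knows OR(i, G_t) and the i-th cell holds OR(i, H_t),
        with G_t = G_{t-1} + H_{t-1} (the processor reads one cell) and
        H_t = H_{t-1} + G_t (it writes only if that changes the cell)."""
        g, h, t = 0, 1, 0
        while h < n:
            g = g + h
            h = h + g
            t += 1
        return t

    # H grows by roughly the factor a = phi^2 = 2.618 per round.
    print([erew_or_rounds(n) for n in (2, 10, 100, 10**6)])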
This idea can be generalized and parallelized. W.l.o.g. the input tape is not read-only, and we have no further memory cells. Let
OR(i, j) be the disjunction of xi, …, x_{i+j−1} . Let Pt(i) be the knowledge of the i-th processor after t computation steps, and let Mt(i) be
the contents of the i-th memory cell after t computation steps. Then
P0(i) = OR(i, G0) for G0 = 0 and M0(i) = OR(i, H0) for H0 = 1 . Let
P_{t−1}(i) = OR(i, G_{t−1}) and M_{t−1}(i) = OR(i, H_{t−1}) .
During the t-th computation step the i-th processor reads the contents of the (i + G_{t−1})-th cell (if i + G_{t−1} ≤ n) and computes
Pt(i) = P_{t−1}(i) ∨ M_{t−1}(i + G_{t−1}) .  (4.3)
(4.4)
F1 = 1 , … , Ft = (φ^t − (−1/φ)^t) / √5  for φ = (√5 + 1)/2 .  (4.6)
L0 = 1 , K_{t+1} = Kt + Lt , L_{t+1} = 3 Kt + 4 Lt .  (4.7)
Lt = ((3 + √21) b^t + (√21 − 3) b̄^t) / (2 √21) ≤ b^t .  (4.8)
Hence
(4.9)
(4.10)
n = |L(M1, T, a)| ≤ L_T ≤ b^T  and  T ≥ log_b n .  (4.11)
(4.12)
(4.13)
(4.14)
(4.15)
(4.16)
for the set Y(M, t + 1, a) of indices i such that some processor P writes
into M at t + 1 on a(i) but not on a . It is sufficient to prove that
|Y(M, t + 1, a)| ≤ 3 K_{t+1} .  (4.17)
of PRAM programs.
We conclude that P … ≤ 2 K(P, t + 1, a(1)) or
1 ≤ K(P, t + 1, a(2)) or …  (4.18)
(4.19)
We combine (4.18) with (4.19) and obtain the following estimation for
r = |Y(M, t + 1, a)| :
r (r − K_{t+1}) ≤ 2 r K_{t+1}  and  r ≤ 3 K_{t+1} .  (4.20)
(4.21)
THEOREM 4.3 : T(f) ≥ log_b c(f) for the PRAM time complexity of f .
Because of Proposition 4.1 the conjecture is not weaker than Theorem 4.3. The conjecture is a more natural assertion, since l max is
a more natural complexity measure than c . It is open whether the
conjecture is really stronger than Theorem 4.3. What is the largest
difference between c(f) and l max(f) ?
Does there exist a sequence fn ∈ Bn such that c(fn) = o(l max(fn)) or
even log c(fn) = o(log l max(fn)) ? Only in the second case the conjecture is stronger than Theorem 4.3.
In § 7 we estimate the critical and the sensitive complexity of almost all functions and of the easiest functions. It will turn out that
the bound of Theorem 4.3 is often tight.
13.5 The complexity of PRAMs and WRAMs with small communication width
COROLLARY 5.1 : Each f ∈ Bn can be computed in time
O((n/m)^{1/2} + log m) by an EREW PRAM with O((nm)^{1/2}) powerful
processors and communication width m .
Proof : We use the approach of the proof of Theorem 5.1 and collect during each time step all available information as in the proof of
Theorem 2.1 i) .
gn(x1, …, xn) = x1 ∧ (x2 ⊕ ⋯ ⊕ xn) .  (5.1)
Then l min(gn) = 1 , but by Theorem 5.2 and the fact that PAR_{n−1}
is a subfunction of gn , the time complexity of gn is not smaller than
Ω(((n − 1)/m)^{1/2}) .
Proof of Theorem 5.2 : We add m processors with numbers larger
than those of the given processors. The i-th additional processor always reads the contents of the i-th memory cell (not on the read-only
input tape) and tries to write this information again into the same
cell. Hence for each memory cell there is always a processor which
writes into it.
Let k = l min(f) . A processor which knows less than k inputs does
not know the output. The processors gather their information from
reading inputs on the input tape or information in common memory
cells. During t computation steps a processor may read directly at
most t inputs. For efficient computations the amount of information
flowing through the common memory cells needs to be large. We
estimate this information flow. We construct (deterministic) restrictions such that for the so-constructed subfunctions the contents of all
memory cells do not depend on the input.
At the beginning we consider all inputs, namely the cube E0 =
{0,1}^n . We construct cubes E_{1,1}, …, E_{1,m}, …, E_{T,1}, …, E_{T,m} (T =
T(fn)) such that each cube is a subcube of the one before. Let us
construct E_{t,l} and let E be the previous cube, namely E_{t,l−1} if l > 1 or
E_{t−1,m} if l = 1 . For a ∈ E let p(a) be the number of the processor that
writes into the l-th memory cell Ml at t on a . We choose a_{t,l} ∈ E
such that p(a_{t,l}) ≤ p(a) for a ∈ E . Let i1, …, ir be the indices of
those inputs which the p(a_{t,l})-th processor has read during the first
t computation steps directly on the input tape. Obviously r ≤ t .
Let E_{t,l} be the set of all a ∈ E which agree with a_{t,l} at the positions
i1, …, ir . E_{t,l} is a subcube of E whose dimension is by r smaller than
the dimension of E . Since r ≤ t , the dimension of E_{T,m} is at least
n − m Σ_{1≤t≤T} t = n − m T(T + 1)/2 .  (5.2)
CLAIM : fn is constant on ET m .
(5.3)
Proof of the Claim : The diction, that a processor writes the same in
several situations, should also include the case that a processor never
writes. We prove that the computation paths for the inputs a ∈ E_{T,m}
are essentially the same. The initial configuration does not depend
on the input. Then we choose some input a and a processor p that
writes on a into the first common cell at t = 1 such that no processor
p′ < p writes on some a′ ∈ E0 into M1 at t = 1 . We restrict the input
set to those inputs which agree with a at that position which has been
read by p . Let a′ ∈ E_{1,1} . No processor p′ < p writes into M1 on a′
at t = 1 (by construction). Processor p cannot distinguish between a
and a′ . Hence p writes on both inputs the same into M1 and switches
to the same state.
Let us consider E_{t,l} and the previous cube E . We assume that the
contents of all Mi at the time steps 0, …, t−1 and of Mi (1 ≤ i ≤ l−1)
at time step t do not depend on a ∈ E . Then we choose some input
a ∈ E and a processor p writing on a into Ml at time step t such that
no processor p′ < p writes on some a′ ∈ E into Ml at t . We restrict the
input set to those inputs which agree with a at those positions which
have been read by p on the input tape. Let a′ ∈ E_{t,l} . No processor
p′ < p writes into Ml on a′ at t (by construction). Processor p does
the same on a and a′ , since it has read the same information on the
input tape and in the common memory. Hence p writes the same on
a and on a′ into Ml and switches to the same state.
The contents of the output cell M1 is for all a ∈ E_{T,m} the same.
Hence f is constant on E_{T,m} .
Σ_{1≤t≤T} t (t + 1)/2 ≤ … m T³ …  (5.4)
(5.5)
bj = a(h)j  if j ∈ R(h−1) ,
bj = ej  if j ∈ R′(h−1) ,
bj = …  if j ∉ (R(h−1) ∪ R′(h−1)) .  (5.6)
On input a(h−1) , the i(h−1)-st processor reads on the input tape
during the first t computation steps only variables which either have
been fixed for the construction of E_{t,l,h−1} or have indices in R(h−1) .
On input b , the i(h−1)-st processor reads the same information, as
all fixed variables agree with e . Since the i(h−1)-st processor writes
on a(h−1) into Ml at t , it writes also on b into Ml at t . In the same
way we conclude that the i(h)-th processor writes on b into Ml at t .
The assumption, that R′(h−1) and R(h−1) are disjoint, leads to a
write conflict which cannot be solved by a PRAM.
Case 2 : i(h−1) = i(h) . The inputs a(h−1) and a(h) agree on all
variables which have been fixed during the construction of E_{t,l,h−2} .
Let t′ be the first time step where the i(h)-th processor reads on the
input tape on a(h−1) a variable which has not been fixed. During the
computation steps 1, …, t′−1 the i(h)-th processor cannot distinguish
between a(h−1) and a(h) . Hence it reads on both inputs at t′ in the
same input cell. The index of this cell is contained in R(h−1) and in
R′(h−1) .
Beame (published by Vishkin and Wigderson (85)) considered another complexity measure.
DEFINITION 5.1 : Let m(f) = min{|f⁻¹(0)|, |f⁻¹(1)|} and let M(f) =
n − log m(f) .
THEOREM 5.4 : If a PRAM computes fn ∈ Bn in time T(fn) with
communication width m , then T(fn) = Ω((M(fn)/m)^{1/2}) .
We omit the proof of this theorem. Obviously M(PARn) = 1 and
we obtain a trivial bound. But M(ORn) = n and the lower bound
Ω((n/m)^{1/3}) of Theorem 5.3 is improved to Ω((n/m)^{1/2}) . This bound is
optimal for ORn if m = O(n/log² n) (see Theorem 5.1). For a comparison of the lower bounds of Theorems 5.3 and 5.4 we remark that
M(fn) ≤ l max(fn) for all fn ∈ Bn and M(fn) = O(1) for almost all fn ∈ Bn
(see Exercises).
functions f where l min(f) is sufficiently large. The lower bound can
be extended via the simulation of § 2 and the reducibility results of
Ch. 10, § 3 , to many fundamental functions and graph functions. The
optimality of the lower bound follows from the simulation in § 2 and
the upper bound of Ch. 11, § 2.
Beame's proof works with probabilistic methods. One of the crucial ideas is the description of a computation by processor and cell
partitions.
DEFINITION 6.1 : For each WRAM , its i-th processor Pi , its
j-th common memory cell Mj and any time step t ∈ {0, …, T} we
define the processor partition P(i, t) and the cell partition M(j, t) .
Two inputs x and y are equivalent with respect to P(i, t) iff they
lead to the same state of Pi at time step t . Two inputs x and y are
equivalent with respect to M(j, t) iff they lead to the same contents of
Mj at time step t .
DEFINITION 6.2 : Let A = (A1, …, Am) be a partition of {0, 1}^n. Let fi(x) = 1 iff x ∈ Ai. Let (as in Ch. 11, § 3) l_PI(fi) denote the maximal length of a prime implicant of fi. The degree d(A) of A is the maximum of all l_PI(fi).
Since ANDn, the conjunction of n variables, can be computed by a WRAM in one step, the degree of a cell partition may increase violently during one step. But after applying a random restriction (see Def. 3.1, Ch. 11), with large probability the degree is not too large. The projection g of a parity function is again a parity function. Hence the degree of the output cell at the end of the computation, i.e. at time step T, is as large as the input size. We choose restrictions such that, on the one hand, the number of variables does not decrease too fast and, on the other hand, the degree of the partitions increases only slowly. Then the computation time T cannot be too small for the parity function.
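A small sketch of such a random restriction (Python; all names are ours) in the style of Def. 3.1, Ch. 11, together with the observation on parity used above:

import random

def random_restriction(n, q):
    # R_q: each variable stays free (None) with probability q and is
    # otherwise fixed to 0 or 1, each with probability (1 - q)/2.
    return [None if random.random() < q
            else random.randint(0, 1) for _ in range(n)]

def restrict_parity(rho):
    # The projection of PAR_n under rho is again a parity function
    # (possibly negated) of the free variables.
    free = [i for i, b in enumerate(rho) if b is None]
    offset = sum(b for b in rho if b is not None) % 2
    return free, offset

random.seed(0)
print(restrict_parity(random_restriction(8, q=0.25)))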
Let

q = 4p / ((1 + p)(2^{1/r} − 1)) ≤ 6 p r .   (6.1)

Then the probability considered in the Main Lemma (Ch. 11, § 3) is bounded by

q^s ≤ (6 p r)^s .   (6.2)
… (1/2)(1/24) n^{1/T} … ,   (6.3)

p(n) ≤ (1/4) · 2^{(1/96) n^{1/T}}   (6.4)

and

c(n) ≤ (1/4) · 2^{(1/12)(n/T!)^{1/T}} ,   (6.5)

… p(n) + c(n) …
T ≥ log n / (O(1) + log log n) = log n / log log n − O(log n / (log log n)²)   (6.6)

and

T ≥ log n / (2 log log n) − O(log n / (log log n)²) .   (6.7)
s ≥ d(M(1, T)^{ρ(T)}) = D_T   (6.8)

and

D_T ≥ (1/48) n (96 s)^{−(T−1)} ,   (6.9)

hence

s ≥ (1/48) n (96 s)^{−(T−1)} .   (6.10)
Each processor knows at most one variable. Hence there are at most two memory cells into which a definite processor may write at time step 1. For at most 2 p(n) memory cells Mj the degree of [F(j, 1)] may be larger than s. The probability that the degree of all [F(j, 1)] is less than s is at least 1 − 2^{−2s−1}. Since [F(j, 1)] is a refinement of M(j, 1), the function f describing an equivalence class of M(j, 1) is the disjunction of some functions gi describing equivalence classes of [F(j, 1)]. If l_PI(gi) ≤ s for all i, also l_PI(f) ≤ s. Hence the degree of M(j, 1) is bounded by the degree of [F(j, 1)]. The probability that D1 ≥ n/48, the expected number of remaining variables, is at least 1/3. Hence we can choose a restriction ρ(1) for which all conditions hold simultaneously.
For the induction step we assume that the claim holds for some t ≥ 1. The state of the i-th processor at t + 1 depends only on the state of this processor at t (the partition P(i, t)) and on the contents of that cell which the processor reads at t. For all inputs of an equivalence class of P(i, t) this is the same cell Mj. Hence each equivalence class of P(i, t + 1) is the intersection of some equivalence class of P(i, t) and some equivalence class of M(j, t), and

d(P(i, t + 1)^{ρ(t)}) ≤ d(P(i, t)^{ρ(t)}) + d(M(j, t)^{ρ(t)}) ≤ 2 s .   (6.11)
We look for a restriction ρ(t+1) which keeps the degrees of the processor and cell partitions small and keeps the number of variables large. Let ρ ∈ R_q be a random restriction for some q chosen later. By (6.11) and the Main Lemma for r = 2s

Pr( d(P(i, t + 1)^{ρ(t) ρ}) ≥ s ) ≤ (12 q s)^s   (6.12)

and similarly

Pr( d(M(j, t + 1)^{ρ(t) ρ}) ≥ s ) ≤ (12 q s)^s .   (6.13), (6.14)
We hope that the degree of all processor and all cell partitions is simultaneously bounded by s for some restriction ρ. We have to consider p(n) processors and infinitely many memory cells. But for those memory cells into which no processor writes at t + 1 it is sure by the induction hypothesis that the degree of M(j, t + 1)^{ρ(t)} is bounded by s. By (6.11) the equivalence classes of P(i, t + 1)^{ρ(t)} are described by functions whose prime implicants have a length of at most 2s. Such a prime implicant is satisfied by a fraction of 2^{−2s} of all inputs. Hence P(i, t + 1)^{ρ(t)} partitions the input set into at most 2^{2s} subsets. This implies that the i-th processor may write only into one of 2^{2s} different memory cells at t + 1. Altogether, only for 2^{2s} p(n) memory cells Mj it is not sure that the degree of M(j, t + 1)^{ρ(t) ρ} is bounded by s. Let q = 1/(96 s). The probability that the degree of all processor and cell partitions (with respect to ρ(t) and ρ) is not bounded by s is (since s = 4 log p(n)) at most
(2^{2s} + 1) p(n) (12 q s)^s = (2^{2s} + 1) p(n) 2^{−3s}   (6.15)
  = (1 + 2^{−2s}) p(n) 2^{−s} = (1 + 2^{−2s}) p(n)^{−3} ≤ (1 + 2^{−2s}) / 4 .

The probability that D_{t+1} is less than its expected value D_t / (96 s) ≥ (1/48) n (96 s)^{−t} is bounded by 2/3. Since

(1 + 2^{−2s}) / 4 + 2/3 < 1 ,   (6.16)

we can choose a restriction ρ(t+1) such that all assertions are satisfied for time step t + 1.
T ≥ log n / (3 log log n) − O(log n / (log log n)²)   (6.17)
T ≥ log n / (5 log log n) − O(log n / (log log n)²)   (6.18)
s ≥ (1/48) n (96 s)^{−(T−1)}   (6.19)

(6.20), (6.21), (6.22)
THEOREM 6.3 : The lower bounds of Theorem 6.1 hold for almost all fn ∈ Bn.

Furthermore (6.20) holds for all fn ∈ Bn. If l_min(fn) is not too small, we obtain powerful lower bounds on the WRAM complexity of these functions.
THEOREM 7.1 : i) l_min(f) ≤ l_max(f) and c(f) ≤ l_max(f) for all f ∈ Bn.

ii) l_min(ORn) = 1 but c(ORn) = l_max(ORn) = n for the symmetric and monotone function ORn ∈ Bn.

iii) There are functions fn ∈ Bn where c(fn) = n/2 + 2 but l_max(fn) = n − 1.

iv) For all n = 6m, there are functions fn ∈ Bn where c(fn) = (1/2) n but l_min(fn) = (5/6) n.

v) c(f) ≥ l_max(f) 2^{l_max(f)−n} for all f ∈ Bn; in particular c(fn) = n iff l_max(fn) = n for fn ∈ Bn.
Proof : i) The first part is obvious and the second part is Proposition 4.1.

ii) is obvious.

iii) Let fn ∈ Sn be defined by its value vector v(fn) = (v0, …, vn) where vi = 1 iff i ∈ {⌈n/2⌉, ⌈n/2⌉ + 1}. The assertion holds for these functions. The proof is left to the reader (who should apply Theorem 7.3).
iv) Let f ∈ B4 be defined by the following Karnaugh diagram.

f      00  01  11  10
00      0   1   1   1
01      0   0   0   1
11      1   1   0   1
10      0   1   0   0
By case inspection, c(f) = 2 and l_min(f) = 3. If n = 4m, let fn ∈ Bn be equal to the ⊕-sum of m copies of f on disjoint sets of variables. Then c(fn) = (1/2) n and l_min(fn) = (3/4) n. Paterson (pers. comm.) defined some f ∈ B6 where c(f) = 3 and l_min(f) = 5. This leads, by the above arguments, to the claim of part iv) of the theorem.
v) We use a pigeon-hole argument. Let k = l_max(f). Then we find an (n − k)-dimensional subcube S where f is constant such that f is not constant on any subcube S′ which properly contains S. By definition

∑_{a∈S} c(f, a) ≤ c(f) |S| = c(f) 2^{n−k} .   (7.1)

Since S is not contained in any larger constant subcube, each of the k fixed directions is sensitive for at least one a ∈ S. Hence

∑_{a∈S} c(f, a) ≥ k   (7.2)

and c(f) ≥ k 2^{k−n}.
We remember that vmax(fn) and vmin(fn) denote for fn ∈ Sn the length of a longest and a shortest maximal constant substring of v(fn) resp. (see Ch. 11, § 4 and Exercises).
(7.3)
By Theorem 7.3, l_min(f), c(f) and l_max(f) can be computed for f ∈ Sn from v(f) in linear time O(n). For further fundamental functions it is possible to compute the complexity with respect to these complexity measures. But in general this computation is NP-hard. Therefore it is helpful to know the complexity of an easiest function in some class. This yields a lower bound for all other functions in this class.
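For tiny n all three measures can be evaluated by brute force from the truth table. A Python sketch under our reading of the definitions (c(f, a) counts the variables whose flip changes the value at a; l(f, a) is the size of a smallest certificate at a, i.e. n minus the largest dimension of a constant subcube through a; the names are ours):

from itertools import combinations

def sensitivity(f, n, a):
    # c(f, a): number of variables i with f(a) != f(a with bit i flipped)
    return sum(f(a ^ (1 << i)) != f(a) for i in range(n))

def certificate(f, n, a):
    # l(f, a): size of a smallest set S of variables such that fixing
    # a's bits on S forces the value f(a); brute force, only for tiny n
    for k in range(n + 1):
        for S in combinations(range(n), k):
            free = [i for i in range(n) if i not in S]
            forced = True
            for m in range(1 << len(free)):
                b = a
                for t, i in enumerate(free):
                    b = (b & ~(1 << i)) | (((m >> t) & 1) << i)
                if f(b) != f(a):
                    forced = False
                    break
            if forced:
                return k
    return n

n = 4
f = lambda a: 1 if a != 0 else 0                   # OR_4
inputs = range(1 << n)
print(max(sensitivity(f, n, a) for a in inputs))   # c(OR_4)     -> 4
print(min(certificate(f, n, a) for a in inputs))   # l_min(OR_4) -> 1
print(max(certificate(f, n, a) for a in inputs))   # l_max(OR_4) -> 4

The output 4, 1, 4 agrees with Theorem 7.1 ii) for n = 4.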
THEOREM 7.4 : If n ≥ 2,

(1/2) log n − (1/2) log log n + 1/2 ≤ c(Bn) ≤ l_max(Bn) ≤ l_max(Mn)
  = c(Mn) ≤ (1/2) log n + (1/4) log log n + O(1) .   (7.4)
Proof : It follows from Theorem 7.1 and Theorem 7.2 that c(Bn) ≤ l_max(Bn) ≤ l_max(Mn) = c(Mn).
For the upper bound we define a monotone storage access function MSAn on n + k variables x = (x1, …, xk) and y = (y1, …, yn) where n = (k choose k/2). Let 𝒜 be the class of all subsets of {1, …, k} of size k/2 and let π : 𝒜 → {1, …, n} be one-to-one. Then

MSAn(x, y) = T^k_{k/2+1}(x) ∨ ⋁_{A∈𝒜} ( ⋀_{i∈A} x_i ) y_{π(A)} .   (7.5)
Only address vectors with exactly k/2 ones are valid for MSAn. We claim that

c(MSAn) = k/2 + 1 .   (7.6)
4 n ≤ c · 2^{2c}   (7.7)
Hence c(f) ≥ (1/2) log n − (1/2) log log n + 1/2 .
This result indicates again how excellent the lower bound of Theorem 4.3 is.
THEOREM 7.6 : c(Sn) = l_max(Sn) = ⌈(n + 1)/2⌉ .

Proof : Obviously c(Sn) ≤ l_max(Sn). For fn = T^n_{⌈(n+1)/2⌉}, l_max(fn) = ⌈(n + 1)/2⌉, since T^n_k has only prime implicants of length k and prime clauses of length n + 1 − k. Hence l_max(Sn) ≤ ⌈(n + 1)/2⌉. If f ∈ Sn is not constant, vi ≠ vi+1 for some i. By Theorem 7.3 iii)

c(f) ≥ max{i + 1, n − i} ≥ ⌈(n + 1)/2⌉   (7.8)

and c(Sn) ≥ ⌈(n + 1)/2⌉.
f(x_{1,2}, …, x_{n−1,n}) = f(x_{π(1),π(2)}, …, x_{π(n−1),π(n)})   (7.9)
THEOREM 7.7 : i) If n ≥ 4, l_max(MG_N) = c(MG_N) = n − 1 for the monotone graph property MG_N ("G contains no isolated vertex"),

MG_N = ⋀_{1≤i≤n} ( ⋁_{1≤j<i} x_{ji} ∨ ⋁_{i<j≤n} x_{ij} ) .   (7.10)
Each prime clause has length n − 1. The i-th clause computes 0 iff the i-th vertex is isolated. The prime implicants correspond to minimal graphs without isolated vertices. These are spanning forests where each tree contains at least 2 vertices. The number of edges in such spanning forests, and therefore also the length of the prime implicants, is bounded by n − 1. Hence l_max(MG_N) ≤ n − 1.

Since MG_N ∈ M_N, l_max(MG_N) = c(MG_N) by Theorem 7.2. It is sufficient to prove that l_max(MG_N) ≥ n − 1. All prime implicants
(7.11)
(n − l − m)(l + 1) ≥ n − 1   (7.12)

(7.13)

If l ≥ 1, this is equivalent to

l² − 2 l n + l + n² − 3 n + 3 ≤ (n − 4) l .   (7.14)

(7.15)

l ≤ n − (1/2)(3 n − 27/4)^{1/2}   (7.16)

(7.17)

… edges. It …   (7.18)
Until now we have not counted the missing edges within the vertex set {v_{n−l}, …, v_n}. Since m ≥ 1, at least l + 1 = n − r + 1 edges are missing between the two vertex sets. Let z be the number of vertices v_i (i ≤ n − l) which are not connected to at least two vertices v_j where j > n − l. Then we may count at least n − r + 1 + z missing edges between the two vertex sets. Furthermore there are n − r + 1 − z vertices v_i (i ≤ n − l) for which only one edge to the other vertex set is missing. If z ≥ r − 2, we have already counted enough missing edges. Otherwise we partition the n − r + 1 − z vertices v_i (i ≤ n − l) with one missing edge into r − 1 equivalence classes: v_i and v_j are equivalent if they have the same missing neighbor v_k (1 ≤ k ≤ n − l − 1 = r − 1). If v_i and v_j are in the k-th equivalence class, the edge between v_i and v_j is missing; otherwise v_i, v_j and all v_l (1 ≤ l ≤ r − 1, l ≠ k) would build an r-clique and G would be a satisfying graph.

Let N(k) be the size of the k-th equivalence class. Altogether we have proved the existence of

n − r + 1 + z + ∑_{1≤k≤r−1} (N(k) choose 2)   (7.19)

missing edges.

(7.20)

∑_{1≤k≤r−1} …   (7.21)

(7.22)
It follows from Theorem 7.7 and Theorem 4.3 that the PRAM time complexity of all graph properties is Ω(log N), and from Theorem 7.7 and Theorem 5.3 that the time complexity of all graph properties with respect to PRAMs of communication width m is Ω((n/m)^{1/3}) = Ω(N^{1/6} m^{−1/3}). The last lower bound can be improved to Ω((N/m)^{1/3}) for most of the fundamental graph properties f ∈ G_N by proving that l_max(f) = Ω(N) for these graph properties. Nearly all fundamental graph properties are monotone. Schürfeld and Wegener (86) present conditions which can be tested efficiently and which lead for most of the fundamental graph properties to the assertion that these graph properties are Ω(N)-critical.
We have seen that some functions have small critical and small maximal sensitive complexity, that several functions have small minimal sensitive complexity, and that many functions have a large complexity with respect to all these complexity measures. We present tight bounds for the complexity of almost all functions in Bn, Mn or Sn. These results have already been applied in Ch. 11, § 4 and Ch. 13, § 6, and they generalize all lower bounds of Ch. 11 and Ch. 13.
THEOREM 7.8 : i) The fraction of f ∈ Bn where c(f) = n − 1 or c(f) = n tends to e^{−1} or 1 − e^{−1} resp.; hence c(f) ≥ n − 1 for almost all f ∈ Bn.

ii) c(f) = l_max(f) for almost all f ∈ Bn.

iii) Let ω(n) be any function tending to ∞ as n → ∞. Then

n − log n − log log n − ω(n) ≤ l_min(f)   (7.23)
Proof : Let a ∈ {0, 1}^n. Then

Pr(c(f, a) = n) = 2^{−n} and Pr(c(f, a) = n − 1) = n 2^{−n} .   (7.24)

Let us assume for a moment that c(f, a) and c(f, b) are independent, which is only satisfied if d(a, b) ≥ 3 for the Hamming distance d. Then

Pr(c(f) = n) = 1 − (1 − 2^{−n})^{2^n} → 1 − e^{−1}   (7.25)

and
(7.27)
(7.28)

X² = ∑_a X_a² + ∑_{a≠b} X_a X_b = ∑_a X_a + ∑_{a≠b} X_a X_b .   (7.29)

E(X_a X_b) = E(X_a) E(X_b) if d(a, b) > 2 .   (7.30)

(7.31)

Altogether

E(X²) ≤ E(X) + E²(X) + O(n⁴ 2^{−n}) and   (7.32)

Pr(X = 0) ≤ E(X²)/E²(X) − 1 ≤ 1/n + O(n² 2^{−n}) ,   (7.33)

which tends to 0 as n → ∞.
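The limiting fractions of part i) can be observed empirically for small n (a Python sketch; the sampling parameters are arbitrary). For n = 4 the two estimates are already close to 1 − e^{−1} ≈ 0.632 and e^{−1} ≈ 0.368:

import random

def critical(f, n):
    # c(f) = max over a of the number of sensitive variables at a
    return max(sum(f[a ^ (1 << i)] != f[a] for i in range(n))
               for a in range(1 << n))

random.seed(1)
n, trials = 4, 5000
at_n = at_n1 = 0
for _ in range(trials):
    f = [random.randint(0, 1) for _ in range(1 << n)]   # a random f in B_n
    c = critical(f, n)
    at_n += (c == n)
    at_n1 += (c == n - 1)
print(at_n / trials, at_n1 / trials)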
… as n → ∞ .   (7.34), (7.35)

and   (7.36), (7.37)
… d(n) …   (7.39)

f(a) = 0 if a1 + ⋯ + an ≤ n/2 − 2 , and   (7.40)

f(a) = 1 if a1 + ⋯ + an ≥ n/2 + 2   (7.41)

(7.42)
for almost all f ∈ Mn. For the proof of the exact bounds of the theorem we refer to Bublitz, Schürfeld, Voigt and Wegener (86). Here we are satisfied with the weaker result (7.42).
(7.43)
(7.44)
EXERCISES

1. (Cook, Dwork and Reischuk (86)). A PRAM is called almost oblivious if the number of the memory cell into which the i-th processor writes at time step t is allowed to depend on the input length but not on the input itself. A PRAM is called oblivious if also the decision whether the i-th processor writes at time step t may depend only on the input length and not on the input itself. The time complexity of oblivious PRAMs computing f is not smaller than log c(f) + 1. Hint: K_{t+1} = L_{t+1}, L_{t+1} = K_t + L_t.

2. The time complexity of almost oblivious PRAMs computing f is not smaller than t if c(f) > F_{2t+1}, the (2t+1)-st Fibonacci number. Hint: K_{t+1} = K_t + L_t, L_{t+1} = K_{t+1} + L_t.
11. How long are the prime implicants and prime clauses of f_{n,t} ∈ Bn where f_{n,t}(a) = 1 iff a_i = ⋯ = a_{i+t−1} (indices mod n) for some i?

12. Each non-constant graph property depends on all N variables.

13. The graph properties
a) "G contains a k-clique",
b) "G contains a Hamiltonian circuit",
c) "G contains an Eulerian circuit",
d) "G contains a perfect matching"
are Ω(N)-critical.

14. The graph property "G contains a vertex whose degree is at least dn" is Ω(N)-critical, if 0 < d < 1.

15. The graph property "G contains no isolated vertex" has minimal sensitive complexity n/2.

16. l_min(f) = min{l(f, 0), l(f, 1)} for f ∈ Mn and the constant inputs 0 and 1.

17. Determine the number of f ∈ Mn where l_min(f) ≥ n/2 + 1.

18. Let M be the complexity measure of Def. 5.1. Then M(f) ≤ l_max(f) for all f ∈ Bn and M(f) = O(1) for almost all f ∈ Bn.

19. The number of f ∈ Sn where c(f) < n or l_max(f) < n is 2 F_{n−1} for the (n − 1)-st Fibonacci number F_{n−1}.
(1.1)

(1.2)
DEFINITION 2.1 : f ∈ Bn is called elusive (or evasive) if each decision tree for f has depth n, i.e. tests all variables on some path.
This result indicates that many functions are elusive. This fact is also underlined by the following asymptotic results.

∑_{0≤k≤2^{n−1}} (2^{n−1} choose k)(2^{n−1} choose 2^{n−1}−k) = (2^n choose 2^{n−1})   (2.1)

(2.2)
(2.3)
∑_{1≤i≤k} …   (3.1)

… if t ≥ 3 .   (3.2)
Let r(i) = |Si| and let t(i) be the number of nodes in an optimal BP for f labelled with some xj ∈ Si. Each Si-subfunction of f can be computed by a BP of size t(i); we only have to replace the other variables by the appropriate constants. Hence

si ≤ N(r(i), t(i)) ≤ r(i)^{t(i)} (t(i))^{2 t(i)}   if t(i) ≥ 3   (3.3)

and

…   if t(i) ≤ 3 .   (3.4), (3.5)

Since, by definition, BP(f) is the sum of all t(i), we have proved the theorem.
From the arguments used in Ch. 8, § 7, we obtain the following results. By Theorem 3.1 one cannot prove larger lower bounds than bounds of size n²/log² n. BP(ISAn) = Θ(n²/log² n) but C(ISAn) = O(n) for the storage access function for indirect addressing, BP(detn) = Ω(n³/log n) for the determinant, and BP(cl_{n,m}) = Ω((n/m)³/log n) for clique functions.

Pudlak (84 b) translated the Hodes and Specker method (Ch. 8, § 5) to BPs and proved Ω(n log log n / log log log n) lower bounds on …
Proof : The following claim holds for all BP1s but in general not for BPs. If p is a path in a BP1, then there is an input a(p) for which p is part of the computation path. Let fp be the subfunction of f where we have replaced the variables read on p by the proper constants.

Now we consider an optimal BP1 G for f ∈ Sn. Let p′ and p′′ be paths from the source to the computation node v. We claim that l(p′) = l(p′′), where l(p) is the length of p, that we read the same input variables on p′ and p′′ (perhaps with different results) and that f_{p′} = f_{p′′}.

If v is followed for some b ∈ {0, 1} by b-sinks only, v can be replaced by a b-sink. Hence we assume that f_{p′} and f_{p′′} are not constant. f_{p′} is a symmetric function on n − l(p′) variables. By Theorem 2.2 iii) the longest path p starting in v has length n − l(p′). The same holds for p′′. Hence l(p′) = l(p′′). On p we read all variables that have not been read on p′, and also all variables that have not been read on p′′. In particular, we read on p′ and p′′ the same set of variables. The BP1 with source v computes f_{p′} and f_{p′′}. Hence f_{p′} = f_{p′′}.

Now we relabel the computation nodes such that the nodes on level l (the BP1 is synchronous by the claim above) are labelled by x_{l+1}. We claim that the new BP1 G′ computes f. Let p be a path in G from the source to a b-sink. If we have chosen m0 times the left successor and m1 times the right successor on p, then f(a) = b for all inputs a with at least m0 zeros and at least m1 ones. If the same path is used in G′, the input contains at least m0 zeros and at least m1 ones. The output b is computed correctly.
By this theorem on the structure of optimal BP1s for symmetric functions f, we can design optimal BP1s efficiently. Level 0 consists of the source labelled by x1. At each node v on level l we still have to compute a symmetric function fv on n − l variables. The node v gets a second label i indicating that (vi, …, v_{i+n−l}) is the value vector of fv. This additional label is i = 0 for the source. For a node v on level l labelled by x_{l+1} and i we need two successors v0 and v1 on …
THEOREM 4.2 : There is an O(n · BP1(f))-time and O(n)-space algorithm for the computation of an optimal BP1 for f ∈ Sn given by its value vector.

Together with this algorithm we obtain the following characterization of the BP1-complexity of symmetric functions.
THEOREM 4.3 : For f ∈ Sn let r_l(f) (0 ≤ l ≤ n − 1) be the number of different non-constant subvectors (vi, …, v_{i+n−l}) (0 ≤ i ≤ l) of the value vector v(f).

i) BP1(f) = ∑_{0≤l≤n−1} r_l(f) .

ii) BP1(f) ≤ ∑_{0≤l≤n−1} min{l + 1, 2^{n−l+1} − 2} = n²/2 − n log n + O(n) .
Proof : i) follows from Theorem 4.1 and the algorithm from Theorem 4.2. For ii) we state that r_l(f) ≤ l + 1 (we consider only l + 1 subvectors) and that r_l(f) ≤ 2^{n−l+1} − 2 (we consider non-constant vectors of length n − l + 1).
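Theorem 4.3 i) makes BP1(f) easy to compute from the value vector. A Python sketch (the function name is ours):

def bp1_size(v):
    # BP1(f) for f in S_n with value vector v = (v_0, ..., v_n): by
    # Theorem 4.3 i) it is the sum, over the levels l = 0, ..., n-1, of
    # the number r_l(f) of different non-constant subvectors
    # (v_i, ..., v_{i+n-l}) with 0 <= i <= l.
    n = len(v) - 1
    total = 0
    for l in range(n):
        windows = {tuple(v[i: i + n - l + 1]) for i in range(l + 1)}
        total += sum(1 for w in windows if len(set(w)) > 1)
    return total

n = 8
print(bp1_size([i % 2 for i in range(n + 1)]))            # PAR_8 -> 15
print(bp1_size([int(i > n // 2) for i in range(n + 1)]))  # a threshold function

For the parity vector (0, 1, 0, 1, …) this sum is 1 + 2(n − 1) = 2n − 1, in accordance with Exercise 5.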
THEOREM 4.4 : i) … = n²/4 + Θ(n) .

ii) BP1(E^n_{n/2}) = n²/4 + Θ(n) .

iii) BPk(E^n_{n/2}) = O(n^{(k+1)/k}) .

iv) BP(E^n_{n/2}) = …
… ∑_j (m choose j) …   (4.1)
Proof : (4.1) follows easily from the first part of the theorem. For each 0-1-sequence a1, …, am with at most k(0) zeros and at most k(1) ones there is a computation node in G. The lower bound in (4.1) is equal to the number of these sequences.

We turn to the proof of the structural results. We do not reach a sink before we have tested all variables of a prime implicant or of a prime clause with the right result. cl_{n,k} is monotone; consequently, the variables of a prime implicant have to be tested with result 1. All prime implicants of cl_{n,k} have length (k choose 2) = k(1); they correspond to a minimal graph with a k-clique. According to the results of Ch. 13, prime clauses correspond to maximal graphs without any k-clique. The largest maximal graphs are, by Turán's theorem (see e.g. Bollobás (78)), complete (k − 1)-partite graphs where the size of all parts is ⌊n/(k − 1)⌋ or ⌈n/(k − 1)⌉. We have to estimate the number of missing edges. This number is underestimated if we assume that all parts have size ⌊n/(k − 1)⌋ and each vertex is not connected to ⌊n/(k − 1)⌋ − 1 other vertices. Hence l(0) ≥ (1/2) n (⌊n/(k − 1)⌋ − 1) for the length l(0) of the shortest prime clause. Since l(0) > k(0), neither v(p) nor v(q) is a sink.

Let us assume that w = v(p) = v(q). Since p and q are different paths, there is a first node w′ where the paths separate. W.l.o.g. w′ is labelled by x_{1,2}. Let Gp and Gq be the partially specified graphs specified by the computation paths p and q resp. Edges tested positively are called existing, whereas edges tested negatively are called forbidden; all other edges are called variable. W.l.o.g. the edge (1, 2) exists in Gp and is forbidden in Gq. Let G′ be the part of G whose source …
(4.2)

(4.3)

… the lower bound (1/2)(k(0) + k(1) choose k(1)) is 2^{n/3 − o(n)} … 2^{n/6 − o(n)} … for constant k.
ii) By definition of m(n), k(0) and k(1) are of size n/3 − o(n). Each BP1 for cl_{n,m(n)} contains at the top a complete binary tree of depth n/3 − o(n).

iii) cl_{n−m(n), n/2−m(n)} is a subfunction of cl_{n, n/2}. Let m(n) = n/2 − (n/3)^{1/2}. Then k(0) and k(1) are of size n/6 − o(n).

Proof : After m(n) positive tests we still have not found a k-clique of special type. After m(n) negative tests at most (1/2) n^{2/3} vertices lie on a forbidden edge. Hence there is a consecutive sequence of k(n) vertices which may still build a k(n)-clique of special type. The depth of all sinks is larger than m(n).
For the proof of the claim we assume that v(H) = v(H′) although |H ∩ H′| ≤ n/2 − 3. Let p′(H) and p′(H′) be those parts of p(H) and p(H′) resp. that lead from the source to v(H). On p′(H) and p′(H′) we have tested the same set of variables. Otherwise some path, where not all variables are tested, would lead from the source to a 1-sink, although all prime implicants have length N.

We investigate the computation path leading from the source via p′(H) to v(H) and then via the second part of p(H′) to a 1-sink. This computation path is the path p(H′′) for some set H′′ of size n/2. |H ∩ H′′| ≥ n/2 − 2 by definition of v(H). In particular, H′′ contains some vertex i ∉ H′. We prove the claim by proving that H′′ = H′. Each positive test increases the number of vertices lying on existing edges at most by 2. Hence there is some j ∈ H′′ such that no edge (j, ·) has been tested positively on p′(H′). For all k ∈ H′′ − {j}, the edge (j, k) is tested positively on the second part of p(H′′) and therefore on p(H′). All these vertices k ∈ H′′ and j have to belong to H′ because of the definition of cl^o_n.

… cl_{n,3} .
We have defined the width of BPs in Definition 1.3. By Theorem 1.4 all f ∈ Bn can be computed by a width-2 BP. Therefore w-k-BP(f), the minimum size of a width-k BP for f, is well-defined for k ≥ 2. We have already proved in Theorem 1.4 that depth-bounded circuits can be simulated efficiently by width-bounded branching programs. But the converse is false: the parity function has linear width-2 complexity but exponential size in circuits of constant depth (see Ch. 11, § 3). Later we present a complete characterization of BPs of bounded width and polynomial size.
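The linear width-2 upper bound for parity is easy to see directly: the two nodes on a level can record the parity of the variables read so far, which gives a width-2 BP with 2n − 1 inner nodes. A minimal Python sketch (the representation is ours):

def eval_width2_parity_bp(x):
    # Layered width-2 BP for PAR_n: the two nodes on level i stand for
    # "x_1 + ... + x_i is even" resp. "odd"; level 0 has a single node
    # (the source), so 2n - 1 inner nodes suffice.
    node = 0
    for bit in x:          # level i reads x_{i+1}
        node ^= bit        # move to the matching node of level i+1
    return node            # the final node determines the sink value

print(eval_width2_parity_bp([1, 0, 1, 1]))   # -> 1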
Before that we report the history of lower bounds. Borodin, Dolev, Fich and Paul (83) proved that width-2 BPs for the majority function have size Ω(n²/log n). By probabilistic methods Yao (83) improved this result: width-2 BPs for the majority function cannot have polynomial size. Shearer (pers. comm.) proved an exponential lower bound on the width-2 BP complexity of the counting function C^n_{1,3} computing 1 iff x1 + ⋯ + xn ≡ 1 mod 3.

For k ≥ 3 no large lower bounds on the width-k BP complexity of explicitly defined Boolean functions are known. Using Ramsey theory (see e.g. Graham, Rothschild and Spencer (80)), Chandra, Fortune and Lipton (83) proved that there is no constant k such that the majority function can be computed by width-k BPs of linear size. Ajtai et al. (86) could prove non-trivial lower bounds for BPs whose width is not larger than (log n)^{O(1)} (poly log). Almost all symmetric functions, and in particular the explicitly defined function "x1 + ⋯ + xn is a quadratic residue mod p" for a fixed prime p between n^{1/4} and n^{1/3}, cannot be computed by poly-log width BPs of size o(n log n / log log n).

All these results are motivated by the conjecture that the majority function cannot be computed by BPs of constant width and polynomial size. But this conjecture has been proved false by Barrington (86).
THEOREM 5.1 : Let fn ∈ Bn. There is, for some constant k, a sequence of width-k BPs Gn computing fn with polynomial size iff there is a sequence Sn of circuits with binary gates computing fn with polynomial size and depth O(log n).
…   (5.1)
We describe a 5-cycle σ by the string (a1 a2 a3 a4 a5) where σ(ai) = a_{i+1} for i ≤ 4 and σ(a5) = a1.
DEFINITION 5.2 : A permutation branching program (PBP) of width k and length (or depth) l is given by a sequence of instructions (j(i), gi, hi) for 0 ≤ i < l, where 1 ≤ j(i) ≤ n and gi, hi are permutations on {1, …, k}. A PBP has on each level 0 ≤ i ≤ l k nodes v_{1,i}, …, v_{k,i}. On level i we realize πi(x) = gi if x_{j(i)} = 0 and πi(x) = hi if x_{j(i)} = 1. The PBP computes π(x) = π_{l−1}(x) ∘ ⋯ ∘ π0(x) on input x. The PBP computes fn ∈ Bn via σ if π(x) = id for x ∈ fn^{−1}(0) and π(x) = σ ≠ id for x ∈ fn^{−1}(1).
LEMMA 5.1 : A PBP for f of width k and length l can be simulated by a BP of width k and length k l.

Proof : The PBP has k sources on level 0. We obtain k BPs of width k and length l (one for each source). The nodes v_{m,i} (1 ≤ m ≤ k) are labelled by x_{j(i)}, and the wires from level i to level i + 1 correspond to gi and hi. The nodes on the last level are replaced by sinks. In the r-th BP the σ(r)-th node on the last level is replaced by a 1-sink, all other sinks are 0-sinks. This BP computes 1 iff π(x)(r) = σ(r). Hence f(x) = 1 iff all BPs compute 1. Similarly to the proof of Theorem 1.4 we combine the k BPs to a BP for f. We do not have to increase the width, since all sinks lie on the last level.
Hence it is sufficient to design PBPs of width 5. We restrict ourselves to 5-cycles, which have properties that serve our purposes.

LEMMA 5.2 : If the PBP G computes f via the 5-cycle σ and τ is another 5-cycle, then there is a PBP G′ of the same length computing f via τ.

Proof : The existence of γ where τ = γ σ γ^{−1} follows from elementary group theory (or by case inspection). In G we simply …
LEMMA 5.3 : If the PBP G computes f via the 5-cycle σ, then there is a PBP G′ of the same length computing the complement of f via a 5-cycle.

Proof : Obviously σ^{−1} is a 5-cycle. We only change the last instruction: g_{l−1} is replaced by σ^{−1} g_{l−1} and h_{l−1} is replaced by σ^{−1} h_{l−1}. Then π′(x) = σ^{−1} π(x). Hence π′(x) = σ^{−1} for x ∈ f^{−1}(0) and π′(x) = id for x ∈ f^{−1}(1). G′ computes the complement of f via σ^{−1}.
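What is still missing for Theorem 5.1 is the simulation of an AND-gate; this is Barrington's commutator trick. The following Python sketch is our reconstruction of that step, not the book's text (all names are ours; s1 and s2 are concrete 5-cycles whose commutator is again a 5-cycle):

ID = (0, 1, 2, 3, 4)

def compose(p, q):
    # (p o q)(i) = p[q[i]]: apply q first, then p
    return tuple(p[q[i]] for i in range(5))

def inverse(p):
    inv = [0] * 5
    for i, pi in enumerate(p):
        inv[pi] = i
    return tuple(inv)

def cycle(a):
    # the 5-cycle mapping a[i] to a[i+1] (indices mod 5)
    p = list(ID)
    for i in range(5):
        p[a[i]] = a[(i + 1) % 5]
    return tuple(p)

def orbit(p):
    # the cycle string of p starting at 0
    o = [0]
    while len(o) < 5:
        o.append(p[o[-1]])
    return o

def transporter(src, dst):
    # a gamma with gamma src gamma^-1 = dst, for 5-cycles src and dst
    g = [0] * 5
    for a, b in zip(orbit(src), orbit(dst)):
        g[a] = b
    return tuple(g)

def evaluate(pbp, x):
    # pi(x) = pi_{l-1} o ... o pi_0(x) as in Definition 5.2
    pi = ID
    for j, g, h in pbp:
        pi = compose(h if x[j] else g, pi)
    return pi

def literal(j, sigma):
    # a length-1 PBP computing x_j via sigma
    return [(j, ID, sigma)]

def conjugate(pbp, gamma):
    # Lemma 5.2: the conjugated PBP computes f via gamma sigma gamma^-1
    gi = inverse(gamma)
    return [(j, compose(gamma, compose(g, gi)),
                compose(gamma, compose(h, gi))) for j, g, h in pbp]

def pbp_and(p1, s1, p2, s2):
    # Barrington's trick: G1 G2 G1' G2' realizes the commutator
    # s2^-1 s1^-1 s2 s1 iff f1 = f2 = 1, and id otherwise.
    p1i = conjugate(p1, transporter(s1, inverse(s1)))
    p2i = conjugate(p2, transporter(s2, inverse(s2)))
    s = compose(inverse(s2), compose(inverse(s1), compose(s2, s1)))
    return p1 + p2 + p1i + p2i, s

s1, s2 = cycle((0, 1, 2, 3, 4)), cycle((0, 2, 4, 3, 1))
g, s = pbp_and(literal(0, s1), s1, literal(1, s2), s2)
assert len(set(orbit(s))) == 5 and s != ID      # the commutator is a 5-cycle
for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    assert evaluate(g, x) == (s if x[0] and x[1] else ID)

Negation is Lemma 5.3, and the change of the 5-cycle is Lemma 5.2 (the conjugate function above). Applied recursively to a circuit of depth O(log n), the trick multiplies the length by 4 per level and thus yields a PBP of width 5 and polynomial length.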
This result also explains why one was not able to prove non-polynomial lower bounds on the width-k BP complexity of explicitly defined functions for k ≥ 5.
14.6 Hierarchies
THEOREM 6.1 : The majority function has non-polynomial complexity with respect to circuits of constant depth and BPs of width 2, but it has polynomial complexity with respect to monotone circuits, monotone formulas, BP1s and BPs of width 5.
THEOREM 6.2 : The clique function cl_{n,k(n)} for cliques of special type and size k(n) = n^{1/3} has exponential complexity with respect to BP1s, but it has polynomial complexity with respect to width-2 BPs.
THEOREM 6.3 : cl_{n,3} has exponential complexity with respect to BP1s, but it has polynomial complexity with respect to BPs of constant width and circuits.

Proof : We only have to prove the upper bound. In constant depth we decide with O(n³) binary gates for each 3-clique whether it exists. Then we compute the parity of these results in logarithmic depth and size O(n³). Finally we apply Theorem 5.1.
In Ch. 11, § 5, we have seen that the classes Σk(P) build a proper hierarchy.
DEFINITION 6.1 : Let w-k-BP(P) be the class of sequences f = (fn) of functions fn ∈ Bn which can be computed by width-k BPs of polynomial size.

Obviously w-k-BP(P) ⊆ w-(k+1)-BP(P) for k ≥ 2. The results of Barrington (86) (see § 5) imply that this hierarchy collapses at the fifth level. But, from the results on the majority function, we know that w-2-BP(P) ≠ w-5-BP(P).

DEFINITION 6.2 : Let BPk(P) be the class of sequences f = (fn) of functions fn ∈ Bn which can be computed by read-k-times-only BPs of polynomial size.
… BP1(P) ≠ BP2(P) .
The size of each Tl and H_{l,j} is O(n²). Hence the total size is O(n⁴). The BP is a read-twice-only BP. Each path leads for some l at most through the BP1s T1, …, Tl, H_{l,l+1}, …. The variable x_{i,j} is tested in Ti, if i ≤ l, in Tj, if j ≤ l, in H_{l,i}, if l < i, and in H_{l,j}, if l < j. Hence each edge is tested at most twice.
EXERCISES

1. The number of 1-leaves in a decision tree for f ∈ Bn is not smaller than the number of gates on level 1 in a Σ2-circuit for f.

2. Let DT(f) be the decision tree complexity of f. Then D(f) ≤ c log(DT(f) + 1) for some constant c.

3. Estimate the time complexity of the Turing machine we used for the proof of (1.1).

4. Let f ∈ Bn be not elusive. Let f′ ∈ Bn be a function differing from f on exactly one input. Then f′ is elusive.
5. BP(PARn) = 2n − 1.

6. Determine BP1(E^n_k) and BP1(T^n_k) exactly.

7. The upper bound in Theorem 4.3 ii) is optimal. Hint: Consider de Bruijn sequences (see e.g. Knuth (81)) as value vectors.

8. Prove Theorem 4.4. Hint: Chinese Remainder Theorem.

9. Design an efficient algorithm for the construction of optimal decision trees for symmetric functions.

10. Prove a counterpart of Theorem 4.3 for decision trees.

11. Determine DT(PARn).

12. BP1(cl^o_n) = 2^{O(n)}.

13. There is a BP1 for cl_{n,k} where two paths, on which at most k(1) + 1 variables are tested positively and at most O(k(0)) variables are tested negatively, lead to the same node.

14. (Dunne (85), just as 15.) Assume that for f ∈ Bn, all sets V ⊆ {x1, …, xn} of size bounded by m, and all xi ∈ V there is a restriction ρ : {x1, …, xn} − V → {0, 1} such that f_ρ ∈ {xi, x̄i}. Then BP1(f) ≥ 2^{m−1} − 1.

15. The BP1-complexity of the Hamiltonian circuit function and the BP1-complexity of the logical permanent are exponential.

16. Prove an upper bound (as small as possible) on the constant-width BP-complexity of the majority function.
17. k-cl^o_n ∈ BPk(P).
References
Adleman(78): Two theorems on random polynomial time. 19.FOCS,75-83.
Adleman;Booth;Preparata;Ruzzo(78): Improved time and space bounds for Boolean
matrix multiplication. Acta Informatica 11, 61-70.
Ahlswede;Wegener(86): Search problems. Wiley (in press).
Aho;Hopcroft;Ullman(74): The design and analysis of computer algorithms.
Addison-Wesley.
Ajtai(83): Σ¹₁-formulae on finite structures. Ann.Pure and Appl.Logic 24, 1-48.
Ajtai;Babai;Hajnal;Komlos;Pudlak;Rodl;Szemeredi;Turan(86): Two lower bounds for branching programs. 18.STOC, 30-38.
Ajtai;Ben-Or(84): A theorem on probabilistic constant depth computations.
16.STOC, 471-474.
Ajtai;Komlos;Szemeredi(83): An O(n log n) sorting network. 15.STOC, 1-9.
Alekseev(73): On the number of k-valued monotonic functions. Sov.Math.Dokl.14, 87-91.
Alt(84): Comparison of arithmetic functions with respect to Boolean circuit depth.
16.STOC, 466-470.
Alt;Hagerup;Mehlhorn;Preparata(86): Simulation of idealized parallel computers on
more realistic ones. 12.MFCS, LNCS 233, 199-208.
Alon;Boppana(85): The monotone circuit complexity of Boolean functions.
Preprint.
Anderson;Earle;Goldschmidt;Powers(67): The IBM system/360 model 91: floating-point execution unit. IBM J.Res.Dev.11, 34-53.
Andreev(85): On a method for obtaining lower bounds for the complexity of individual monotone functions. Sov.Math.Dokl. 31, 530-534.
Arlazarov;Dinic;Kronrod;Faradzev(70): On economical construction of the transitive closure of a directed graph. Sov.Math.Dokl.11, 1209-1210.
Ashenhurst(57): The decomposition of switching functions. Symp. Theory of
Switching, 74-116.
Ayoub(63): An introduction to the analytical theory of numbers. Amer.Math.Soc.
Balcazar;Book;Schoning(84): Sparse oracles, lowness and highness. 11.MFCS,
LNCS 176, 185-193.
Barrington(86): Bounded-width polynomial-size branching programs recognize exactly those languages in NC1. 18.STOC, 1-5.
Barth(80): Monotone Bilinearformen. TR Univ. Saarbrücken.
Bassalygo(82): Asymptotically optimal switching circuits. Probl.Inf.Transm.17,
206-211.
Batcher(68): Sorting networks and their applications. AFIPS 32, 307-314.
Beame(86a): Limits on the power of concurrent-write parallel machines. 18.STOC,
169-176.
Beame(86b): Lower bounds in parallel machine computation. Ph.D.Thesis, Univ.
Toronto.
Hedtstück(85): Über die Argumentkomplexität Boolescher Funktionen. Diss. Univ. Stuttgart.
Henno(79): The depth of monotone functions in multivalued logic. IPL 8, 176-177.
McColl;Paterson(77): The depth of all Boolean functions. SIAM J.on Comp.6, 373-380.
McColl;Paterson(84): The planar realization of Boolean functions. TR Univ. Warwick.
McKenzie;Cook(84): The parallel complexity of some Abelian permutation group.
TR Univ. Toronto.
Mead;Conway(80): Introduction to VLSI systems. Addison Wesley.
Mead;Rem(79): Cost and performance of VLSI computing structures. IEEE J.Solid
State Circuits 14, 455-462.
Mehlhorn(77): Effiziente Algorithmen. Teubner.
Mehlhorn(79): Some remarks on Boolean sums. Acta Inform.12, 371-375.
Mehlhorn;Galil(76): Monotone switching circuits and Boolean matrix product.
Computing 16, 99-111.
Mehlhorn;Preparata(83): Area-time optimal VLSI integer multiplier with minimum
computation time. IC 58, 137-156.
Mehlhorn;Vishkin(84): Randomized and deterministic simulations of PRAMs by
parallel machines with restricted granularity of parallel memories. Acta
Inform.21, 339-374.
Mendelson(82): Boolesche Algebra und logische Schaltungen. McGraw-Hill.
Meyer;Stockmeyer(72): The equivalence problem for regular expressions with squaring requires exponential space. 13.SWAT, 125-129.
Meyer auf der Heide(84): A polynomial linear search algorithm for the n-dimensional
knapsack problem. JACM 31, 668-676.
Meyer auf der Heide(86): Ecient simulations among several models of parallel
computers. SIAM J.on Comp.15, 106-119.
Mileto;Putzolu(64): Average values of quantities appearing in Boolean function minimization. IEEE Trans.El.Comp.13, 87-92.
Mileto;Putzolu(65): Statistical complexity of algorithms for Boolean function minimization. JACM 12, 364-375.
Miller,R.E.(79): Switching theory. Robert E.Krieger Publ.Comp.
Miller,W.(75): Computer search for numerical instability. JACM 22, 512-521.
Muller;Preparata(75): Bounds to complexities of networks for sorting and switching.
JACM 22, 195-201.
Muller;Preparata(76): Restructuring of arithmetic expressions for parallel evaluation. JACM 23, 534-543.
JACM 23, 534-543.
Muroga(79): Logic design and switching theory. John Wiley.
Nechiporuk(66): A Boolean function. Sov.Math.Dokl.7, 999-1000.
Nechiporuk(71): On a Boolean matrix. Syst.Th.Res.21, 236-239.
Nigmatullin(67a): Certain metric relations in the unit cube. Discr.Anal.9, 47-58.
Nigmatullin(67b): A variational principle in an algebra of logic. Discr.Anal.10,
69-89.
Weiss(83): An n^{3/2} lower bound on the monotone complexity of Boolean convolution. IC 59, 184-188.
Weyh(72): Elemente der Schaltungsalgebra. Oldenbourg.
Wilson(83): Relativized circuit complexity. 24.FOCS, 329-334.
Wippersteg(82): Einige Ergebnisse für synchrone Schaltkreise. Dipl.arb. Univ. Bielefeld.
Yablonskii(57): On the impossibility of eliminating trial of all functions from P2 in
solving some problems on circuit theory. Dokl.Akad.Nauk.USSR 124, 44-47.
Yao(83): Lower bounds by probabilistic arguments. 24.FOCS, 420-428.
Yao(85): Separating the polynomial-time hierarchy by oracles. 26.FOCS, 1-10.
Yap(83): Some consequences of non-uniform conditions of uniform classes. TCS 26,
287-300.
Zak(84): An exponential lower bound for one-time-only branching programs.
11.MFCS, LNCS 176, 562-566.
FCT: Intern. Conf. on Fundamentals of Computation Theory
FOCS: IEEE Symp. on Foundations of Computer Science
IC: Information and Control
ICALP: Intern. Colloquium on Automata, Languages and Programming
IPL: Information Processing Letters
JACM: Journal of the ACM
JCSS: Journal of Computer and System Sciences
LNCS: Lecture Notes in Computer Science (Springer)
MFCS: Symp. on Mathematical Foundations of Computer Science
STACS: Symp. on Theoretical Aspects of Computer Science
STOC: ACM Symp. on Theory of Computing
SWAT: IEEE Symp. on Switching and Automata Theory
TCS: Theoretical Computer Science
TCS-GI: GI Conference on Theoretical Computer Science
TR: Technical Report
Index
addition 7,39,124,313,322,341,348
affine function 251
Ajtai,Komlos and Szemeredi sorting
network 152
almost all 87
arithmetic circuit 64
Ashenhurst decomposition 304
basis 7
Batcher sorting network 149
bilinear function 169
bipartite matching 310
Boolean convolution 58,168,209,350
Boolean matrix product 78,107,170,350
Boolean sum 36,107,163,213
BPP 286
branching program 414
canonical slice 203
carry function 39,226,341
carry-look-ahead-adder 83
carry save adder 51
cell partition 388
central slice 204
Chinese Remainder Theorem 61
circuit 7
circuit size 9
circuit value problem 310
clause 36
clique function 107,184,189,192,203,
204,257,270,384,421,422,427,436
clique-only function 430,438
communication width 363
comparison function 143,322
complete basis 9
conditional sum adder 40
configuration 278
conjunctive normal form 5
connectivity 85,309
constant depth reducibility 312
counting function 74,123,127,252,314
CO WRAM 363
CRCW PRAM 363
CREW PRAM 362
critical complexity 379
data rate 348
de Morgan laws 4
decision tree 419
decoding circuit 90
depth 9
determinant 81,256,262,343,422
direct product 301
Discrete Fourier Transform 63
disjoint bilinear form 215
disjunctive normal form 5
division 67,308
dual function 148
elimination method 121
elusive function 418
equality test 126
EREW PRAM 362
essential dependence 19,120
Eulerian cycle 309
evasive function 418
exactly-k-function 74,426
explicitly defined function 119,139
exponential growth 17
EXP TAPE HARD 139
fan-in 11
fan-out 10
Fast Fourier Transform 62
Fischer, Meyer and Paterson method
251
formula 12
formula size 12
gate 7
generating circuit 283
graded set of Boolean functions 389
graph property 402
Hamiltonian circuit 217,426
hierarchy 296,337,394,436
(h, k)-disjoint Boolean sum 163
Hodes and Specker method 249
homogeneous function 107
Horner scheme 227
implicant 24
Karnaugh diagram 30
Krapchenko method 258
(k, s)-Lupanov representation 91
Las Vegas algorithm 286
logical level 23,320
prime implicant 24
probabilistic computation model 285,
352
processor partition 388
programmable circuit 110
projection reducibility 146,309
pseudo complement 195
quadratic function 107
Quine and McCluskey algorithm 25
radix-4 representation 54
random restriction 326
read-once-only branching program
423
replacement rule 160
ring sum expansion 6
root of identity 62
satisfiability problem 270,289
Schönhage and Strassen algorithm 56
selection function 20,218
self reducibility 289
semi-disjoint bilinear form 169
sensitive complexity 374
set circuit 208
set cover problem 33
Shannon effect 88
Shannon's counting argument 87
Σk-circuit 320
single level 236
size 9
SIZE - DEPTH(poly const) 312
slice function 195
sorting function 148,158,313
sorting network 74,148
space complexity 269
Stockmeyer hierarchy 270
storage access function 76,123,374,420
storage access function for indirect
addressing 255,422
Strassen algorithm 79
strong connectivity 310
subtraction 50
symmetric function 74
synchronous circuit 340
table of prime implicants 27
threshold function 74,107,127,148,154,
196,235,239,243,250,313,323,357,422