Lecture Logical Effort
Dinesh Sharma
EE Department
IIT Bombay, Mumbai
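The p-channel drain current $I_{dp}$ charges the load capacitor $C_L$:
$I_{dp} = C_L \frac{dV_{out}}{dt}$, so $dt = C_L \frac{dV_{out}}{I_{dp}}$ for the rise time.
When the input is high, the n-channel transistor discharges $C_L$ in the same way. Writing the transistor current as $K' (W/L)\, f(V_{out}, V_{gs})$, the time taken by the output to move from $V_1$ to $V_2$ is
$\tau_L = \frac{C_L}{K' W/L} \int_{V_1}^{V_2} \frac{dV_{out}}{f(V_{out}, V_{gs})}$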
In digital design, we keep the channel length at its minimum value, so
L is a constant. Let us initially ignore the parasitic capacitances. We
can see that
$\frac{W \tau_L}{C_L} = \text{constant}$, so $\tau_L \propto \frac{C_L}{W}$
This tells us that the delay associated with a gate charging a load
capacitor scales directly with CL and inversely with W , the width of the
charging/discharging transistor. This linear dependence permits us to
design logic stages easily.
Tapered Buffer
[Figure: tapered buffer. A chain of inverters with scale factors $s_1, s_2, \ldots, s_i, \ldots, s_{n-1}, s_n$ drives the load $C_L$.]
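The total delay is proportional to the sum of the stage ratios $s_{k+1}/s_k$. Only two terms in this sum contain $s_i$: $s_i/s_{i-1}$ and $s_{i+1}/s_i$. Since all scale factors $s_i$ are independent, the derivative of all the rest of the terms is 0. Therefore,
$\frac{1}{s_{i-1}} - \frac{s_{i+1}}{s_i^2} = 0$, which gives $\frac{s_i}{s_{i-1}} = \frac{s_{i+1}}{s_i}$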
The first stage of the tapered buffer has a size which can be
driven by any CMOS gate. Its input capacitance corresponding to
this size is Cin .
Each subsequent stage has a drive capability which is ρ times the
drive capability of the previous stage.
Since the drive capability is being stepped up by ρ in n stages, we
should have
$\rho^n = \frac{C_L}{C_{in}}$, so $\rho = \left(\frac{C_L}{C_{in}}\right)^{1/n}$
We define the ratio H ≡ CL /Cin .
$\rho^n = \frac{C_L}{C_{in}} = H$, so $n = \frac{\ln H}{\ln \rho}$
$d_{total} = n \rho \tau = \frac{\ln H}{\ln \rho}\, \rho\, \tau = \tau \ln H \, \frac{\rho}{\ln \rho}$
In order to minimize dtotal , we set its derivative with respect to ρ to 0.
This gives
$\tau \ln H \left[ \frac{1}{\ln \rho} - \frac{\rho}{(\ln \rho)^2} \cdot \frac{1}{\rho} \right] = 0$, which leads to $\frac{1}{\ln \rho} = \frac{1}{(\ln \rho)^2}$
Thus we obtain the result that the optimum stage ratio for a tapered
buffer is e.
$n = \frac{\ln H}{\ln \rho} = \ln H = \ln \frac{C_L}{C_{in}}$
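The tapered buffer result can be checked numerically. The sketch below is illustrative only: the function name `tapered_buffer`, the example load of 1000 and the choice to round the stage count to a whole number are assumptions made here, not part of the lecture. Delays are in units of τ and self-loading is ignored, as in the derivation above.

```python
import math

def tapered_buffer(c_load, c_in):
    """Return (n, rho, delay) for an inverter chain, delay in units of tau.

    Self-loading is ignored, as in the derivation above. Rounding n to a
    whole number of stages is an assumption of this sketch.
    """
    H = c_load / c_in
    n = max(1, round(math.log(H)))   # ideal n = ln H when rho = e
    rho = H ** (1.0 / n)             # stage ratio giving rho**n = H
    return n, rho, n * rho           # d_total = n * rho (times tau)

# Hypothetical example: a load 1000 times the input capacitance.
n, rho, delay = tapered_buffer(c_load=1000.0, c_in=1.0)
print(n, round(rho, 3), round(delay, 3))
```

With a whole number of stages the ratio cannot be exactly e, so the delay comes out slightly above the ideal value of e ln H.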
We have found that the optimum stage ratio for a tapered buffer is
e, and the number of inverters in the chain is ln(Cout /Cin ).
These results were computed for a situation where the only logic
gates used were inverters, and loading due to driver transistors
themselves in the logic gate was ignored.
Now we would like to see how to optimize the delay for the general
case, where any logic gate can be used in multi-stage logic and
the effect of self loading is not ignored.
We would also like to take the realistic case, where the logic path
is not a linear chain and outputs have a fanout ≠ 1.
Effects of Self-Loading
The input capacitance of the next stage is not the only load on a
logic gate.
In addition to the input capacitance of the next stage, a logic gate
has to drive the capacitance associated with its own output
transistors.
This loading comes from the drain capacitance of the output
transistors as well as their drain to gate capacitance.
This additional capacitance is proportional to width of the driver
transistors W . Thus,
CL = Cext + Cp W
Where Cp is the parasitic capacitance per unit width of driver
transistors.
Thus, if we make a logic gate larger, its parasitic load also increases
proportionally.
Dinesh Sharma (IIT B) Logical Effort September 29, 2021 17 / 111
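Since $C_L = C_{ext} + W C_p$ and the delay is proportional to $C_L / W$, with the input capacitance $C_{in}$ of the gate proportional to $W$, we get
$\tau_L = \text{const.} \times \frac{C_{ext}}{C_{in}} + \tau_p$
where $\tau_p$ is a parasitic delay which does not depend on the size of the gate.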
Effect of Branching
In general, a logic chain will contain points where multiple gates are
driven by a stage.
The effect of this branching (or fanout) must be taken into account
while computing delay.
[Figure: stage i drives stage i+1 on the path under consideration, as well as other gates off the path.]
If a stage drives multiple gates, its actual load is the sum of the input
capacitances of all branches that it drives.
Thus the delay of this gate is higher by the factor Ctotal /Conpath
compared to the delay if it was driving only one path.
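As a small numerical illustration of this factor, the sketch below computes Ctotal/Conpath for one stage. The function name `branching_effort` and all capacitance values are hypothetical, chosen only to show the arithmetic.

```python
def branching_effort(c_on_path, c_off_path):
    """b = C_total / C_onpath, where C_total = C_onpath + C_offpath."""
    return (c_on_path + c_off_path) / c_on_path

# Hypothetical numbers: the on-path gate presents 4 units of capacitance,
# and two other gates of 4 units each hang on the same node.
b = branching_effort(c_on_path=4.0, c_off_path=8.0)

delay_single_path = 4.0                      # effort delay if only the on-path gate were driven
delay_with_branching = b * delay_single_path # delay is higher by the factor b
print(b, delay_with_branching)
```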
Logical Effort
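$d = f + p = gh + p$, with $h = \frac{C_{out}}{C_{in}}$
Here $f$ is the stage effort, $g$ the logical effort, $h$ the electrical effort and $p$ the parasitic delay of the gate.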
[Figure: n input NAND. The p-channel transistors (width γ each) are in parallel between VDD and Out; the n-channel transistors (width n each) are in series.]
Each input of the n input NAND gate loads its driver with (n + γ) units of capacitance.
The minimum inverter loads its driver by (1 + γ) units.
So the logical effort of an n input NAND gate is (n + γ)/(1 + γ).
This reduces to 4/3 for a 2 input NAND with γ = 2, as expected.
[Figure: n input NOR. The p-channel transistors (width nγ each) are in series between VDD and Out; the n-channel transistors (width 1 each) are in parallel.]
Each input of the n input NOR gate loads its driver with (1 + nγ) units of capacitance.
The minimum inverter loads its driver by (1 + γ) units.
So the logical effort of an n input NOR gate is (1 + nγ)/(1 + γ).
This reduces to 5/3 for a 2 input NOR with γ = 2, as expected.
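[Figure: 2 input multiplexer built from two tri-stateable inverters controlled by Sel and its complement. The p-channel transistors have width 2γ and the n-channel transistors have width 2.]
Each input of the multiplexer is loaded with a capacitance ∝ (2 + 2γ).
The minimum inverter loads its driver by (1 + γ) units.
So the logical effort for the multiplexer is (2 + 2γ)/(1 + γ) = 2.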
It is interesting to see that the logical effort will remain 2 for every data
input even when we parallel n tri-stateable inverters to form an n input
mux.
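The three logical effort expressions above are easy to tabulate. The following is a minimal sketch; the function names `g_nand`, `g_nor` and `g_mux` are made up for this illustration, and exact fractions are used so that the results can be compared with the values quoted in the text.

```python
from fractions import Fraction

def g_nand(n, gamma=2):
    """Logical effort of an n input NAND: (n + gamma) / (1 + gamma).

    gamma should be an int or Fraction so that the result stays exact.
    """
    return Fraction(n + gamma, 1 + gamma)

def g_nor(n, gamma=2):
    """Logical effort of an n input NOR: (1 + n*gamma) / (1 + gamma)."""
    return Fraction(1 + n * gamma, 1 + gamma)

def g_mux(gamma=2):
    """Logical effort of a mux data input: (2 + 2*gamma) / (1 + gamma)."""
    return Fraction(2 + 2 * gamma, 1 + gamma)

for n in (2, 3, 4, 8):
    print(n, g_nand(n), g_nor(n))
print("mux", g_mux())
```

Note that the mux value is 2 for any γ, and that a 1 input NAND or NOR reduces to the inverter with logical effort 1.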
Parasitic Delay
Given the series/parallel connections of any logic gate and the rules
associated with these connections, we can compute the logical effort
of the gate.
Branching Effort
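The branching effort of a stage is defined as
$b = \frac{C_{total}}{C_{onpath}}$
where $C_{total}$ is the total load driven by the stage and $C_{onpath}$ is the part of it that lies on the path under consideration.
With the electrical effort of each stage taken along the path, the product of the electrical efforts telescopes:
$H = \prod_i h_i = \frac{C_{in2}}{C_{in1}} \cdot \frac{C_{in3}}{C_{in2}} \cdots \frac{C_L}{C_{inn}} = \frac{C_L}{C_{in1}}$
The branching effort b provides the correction for the actual loading
seen by a logic stage.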
Path Effort
We can now define the path effort, F as the product of all logical efforts
and branch corrected electrical efforts.
$F = \prod_{i=1}^{N} g_i b_i h_i = \prod_{i=1}^{N} g_i \cdot \prod_{i=1}^{N} b_i \cdot \prod_{i=1}^{N} h_i$
So,
$F = GBH$, where $G = \prod_{i=1}^{N} g_i$, $B = \prod_{i=1}^{N} b_i$, and $H = \prod_{i=1}^{N} h_i$
The equation that defines the path effort looks quite similar to the
definition of the stage effort.
Notice, however, that unlike the stage effort f , the path effort F
does not define the delay of the path.
The total delay is the sum of individual delays and not their
product:
$D = \sum d_i = \sum g_i b_i h_i + \sum p_i$
Still, F is a useful quantity for optimisation of path delays as we
shall see later.
Path Delay
We need to adjust the size of each stage (and hence Ci ) such that
D is minimum.
Therefore we should set the partial derivative of the expression for
D with respect to each of Ci to 0.
All pi are size independent, and therefore give 0 on differentiating
with respect to Ci .
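Only two terms in the first sum involve $C_i$. These are
$g_{i-1} b_{i-1} \frac{C_i}{C_{i-1}} + g_i b_i \frac{C_{i+1}}{C_i}$.
$\frac{\partial D}{\partial C_i} = 0 = \frac{g_{i-1} b_{i-1}}{C_{i-1}} - \frac{g_i b_i C_{i+1}}{C_i^2}$
This leads to $g_{i-1} b_{i-1} \frac{C_i}{C_{i-1}} = g_i b_i \frac{C_{i+1}}{C_i}$
So $g_{i-1} b_{i-1} h_{i-1} = g_i b_i h_i$ for all $i$.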
The path delay is minimized when each stage in the path has the
same stage effort, f = gbh.
Since the path effort F is the product of all stage efforts and the
stage effort has to be equal for all stages for minimum delay, we
must have $\hat{f} = F^{1/N}$. (A hat over a symbol indicates an expression
that achieves minimum delay.)
For this optimum effort, we obtain
$\hat{D} = N F^{1/N} + P = N (GBH)^{1/N} + P$
$h = \frac{\hat{f}}{gb} = \frac{(GBH)^{1/N}}{gb}$
Since $h_i \equiv \frac{C_{in_{i+1}}}{C_{in_i}}$, we have $\frac{C_{in_{i+1}}}{C_{in_i}} = \frac{(GBH)^{1/N}}{g_i b_i}$
This gives
$C_{in_i} = \frac{g_i b_i}{(GBH)^{1/N}} \, C_{in_{i+1}}$
We can use this recursive relation for computing the scale, and hence
transistor sizes for all stages, starting with the last one.
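The equal effort condition can be checked numerically. The sketch below is an illustration under assumed values (a chain of three inverters with g = b = 1, p = 0.6, an input capacitance of 1 and a load of 64); the helper names `path_delay` and `equal_effort_sizes` are hypothetical. It sizes the path with the recursive relation and then confirms that changing one intermediate size increases the delay.

```python
import math

def path_delay(c_in, gs, bs, ps):
    """D = sum(g_i b_i h_i) + sum(p_i), with h_i = c_in[i+1] / c_in[i].

    c_in holds the input capacitance of each stage followed by the load,
    so it has one more entry than gs, bs and ps.
    """
    effort = sum(g * b * c_in[i + 1] / c_in[i]
                 for i, (g, b) in enumerate(zip(gs, bs)))
    return effort + sum(ps)

def equal_effort_sizes(c_load, gs, bs, f_hat):
    """Walk back from the load: C_in(i) = g_i b_i C_in(i+1) / f_hat."""
    caps = [c_load]
    for g, b in zip(reversed(gs), reversed(bs)):
        caps.insert(0, g * b * caps[0] / f_hat)
    return caps

# Assumed example: three inverters, no branching.
gs, bs, ps = [1.0, 1.0, 1.0], [1.0, 1.0, 1.0], [0.6, 0.6, 0.6]
c_first, c_load = 1.0, 64.0

F = math.prod(gs) * math.prod(bs) * (c_load / c_first)   # F = GBH
f_hat = F ** (1.0 / len(gs))                             # optimum stage effort

best = equal_effort_sizes(c_load, gs, bs, f_hat)
d_best = path_delay(best, gs, bs, ps)

worse = list(best)
worse[1] *= 1.3                      # disturb one intermediate stage
d_worse = path_delay(worse, gs, bs, ps)
print(best, d_best, d_worse)
```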
The path logical effort, G, is the product of the logical efforts of the
logic gates along the path. In the following example, we assume γ = 2
and pinv = 0.6.
G = 10/3 × 1 = 3.33 for configuration a,
G = 6/3 × 5/3 = 3.33 for case b, and
G = 4/3 × 5/3 × 4/3 × 1 = 2.96 for configuration c.
Since there is no branching, B = 1.
We estimate the parasitic delay of n input NANDs and NORs to be
n pinv .
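We can write the total delay for the three configurations, using $\hat{D} = N (GBH)^{1/N} + P$, as:
Case a: $\hat{D} = 2 (3.33 H)^{1/2} + 9 p_{inv}$
Case b: $\hat{D} = 2 (3.33 H)^{1/2} + 6 p_{inv}$
Case c: $\hat{D} = 4 (2.96 H)^{1/4} + 7 p_{inv}$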
It is clear from these equations that case b will always be better than a.
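The comparison can be made concrete by evaluating the minimum delay $N (GBH)^{1/N} + P$ for a few values of H. In this sketch the gate make-up of each configuration is inferred from the logical efforts quoted above (a: 8 input NAND and inverter, b: 4 input NAND and 2 input NOR, c: NAND, NOR, NAND, inverter); that reading, the helper names and the sample values of H are assumptions.

```python
P_INV = 0.6

def min_delay(N, G, P, H, B=1.0):
    """Minimum path delay N * (G*B*H)**(1/N) + P."""
    return N * (G * B * H) ** (1.0 / N) + P

# N stages, path logical effort G, total parasitic delay P (n * p_inv per n input gate)
configs = {
    "a": dict(N=2, G=(10 / 3) * 1, P=(8 + 1) * P_INV),
    "b": dict(N=2, G=(6 / 3) * (5 / 3), P=(4 + 2) * P_INV),
    "c": dict(N=4, G=(4 / 3) * (5 / 3) * (4 / 3) * 1, P=(2 + 2 + 2 + 1) * P_INV),
}

def delays(H):
    """Minimum delay of each configuration for a given electrical effort H."""
    return {name: min_delay(H=H, **cfg) for name, cfg in configs.items()}

for H in (1, 4, 16, 100):
    print(H, {k: round(v, 2) for k, v in delays(H).items()})
```

Cases a and b have the same effort delay, so b wins by its smaller parasitic delay for every H. Whether c beats b depends on H: the four stage path pays off only when the electrical effort is large enough.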
Let us take the example of 8-input AND circuit to see how all
geometries can be calculated using logical effort.
We take the input capacitance of a minimum inverter as the unit of
capacitance.
The unit of time is τ , the delay of a reference inverter driving
another identical reference inverter excluding its parasitic delay.
The unit of transistor width will be the width of the n transistor in
the reference inverter.
We shall take γ = 2 and pinv = 0.6 in this example.
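$G = \frac{4}{3} \times \frac{5}{3} \times \frac{4}{3} \times 1 = 80/27 = 2.963$, $B = 1$, $H = \frac{64}{4} = 16$
So $F = GBH = 47.41$, and the optimum stage effort is $\hat{f} = F^{1/4} = 2.624$.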
We begin from the load end in this example. The last stage is an
inverter, with g = 1, b = 1.
$h = 2.624 = \frac{C_{out}}{C_{in}} = \frac{64}{C_{in}}$, so $C_{in} = \frac{64}{2.624} = 24.39$
[Figure: the four stage path, with g = 4/3, 5/3, 4/3, 1 and p = 2 pinv, 2 pinv, 2 pinv, pinv for the successive stages.]
So the final stage inverter is scaled up by 24.39 compared to the
reference inverter. Taking the n channel transistor width in the
reference inverter as the unit,
n-channel transistor width = 24.39
p-channel transistor width = γ × 24.39 = 48.78.
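The third stage, a 2 input NAND with $g = 4/3$, is sized in the same way: $h = 2.624 \times 3/4 = 1.968$, so its input capacitance is $24.39/1.968 \approx 12.39$.
For the second stage, the 2 input NOR:
$\hat{f} = 2.624 = gbh = \frac{5}{3} \times 1 \times h$, so $h = 1.5744$
$h = 1.5744 = \frac{C_{out}}{C_{in}} = \frac{12.3936}{C_{in}}$
Therefore $C_{in} = \frac{12.3936}{1.5744} = 7.872$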
Scale factor for this stage is 7.872/g = (7.872 × 3)/5 = 4.7232.
The reference 2 input NOR gate has n channel transistor width = 1,
and p channel transistor width = 4.
n-channel transistor width = 4.72,
p channel transistor width = 4 × 4.7232 = 18.89
$\hat{f} = 2.624 = gbh = \frac{4}{3} \times 1 \times h$, so $h = 1.9680$
$h = 1.968 = \frac{C_{out}}{C_{in}} = \frac{7.872}{C_{in}}$
Therefore $C_{in} = \frac{7.872}{1.968} = 4$
This agrees with our specification that the input capacitance of the first
stage should be equivalent to 4 inverters.
The scale factor will be 4/g = 4/(4/3) = 3. In the reference NAND, all
transistors have a width = 2, so in the first stage, all transistors will
have a width = 3 × 2 = 6.
Stage             I          II         III        IV
Logic type        2in NAND   2in NOR    2in NAND   Inverter
g                 4/3        5/3        4/3        1
Cin               4          7.87       12.39      24.39
Scale Factor      3          4.7232     9.2952     24.39
n width           6          4.72       18.59      24.39
p width           6          18.89      18.59      48.78
Parasitic Delay   1.2        1.2        1.2        0.6
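The table can be reproduced with a few lines of code. This is a sketch of the backward sizing procedure used in the example, not a general tool: the variable names are made up here, and the reference transistor widths are those quoted in the text for γ = 2 (NAND: 2 and 2, NOR: 1 and 4, inverter: 1 and 2).

```python
import math

GAMMA, P_INV = 2, 0.6

# (logic type, logical effort g, reference (n width, p width), parasitic delay)
stages = [
    ("2in NAND", 4 / 3, (2, GAMMA), 2 * P_INV),
    ("2in NOR", 5 / 3, (1, 2 * GAMMA), 2 * P_INV),
    ("2in NAND", 4 / 3, (2, GAMMA), 2 * P_INV),
    ("Inverter", 1.0, (1, GAMMA), P_INV),
]
c_first, c_load = 4.0, 64.0

G = math.prod(g for _, g, _, _ in stages)
F = G * 1.0 * (c_load / c_first)          # B = 1: no branching
f_hat = F ** (1.0 / len(stages))          # optimum stage effort

rows = []
c_next = c_load
for name, g, (wn, wp), p in reversed(stages):
    c_in = g * c_next / f_hat             # C_in(i) = g_i b_i C_in(i+1) / f_hat
    scale = c_in / g                      # size relative to the reference gate
    rows.insert(0, (name, c_in, scale, scale * wn, scale * wp))
    c_next = c_in

d_hat = len(stages) * f_hat + sum(p for *_, p in stages)

for name, c_in, scale, wn, wp in rows:
    print(f"{name:9s} Cin={c_in:6.2f} scale={scale:7.4f} n={wn:6.2f} p={wp:6.2f}")
print("total delay =", round(d_hat, 3))
```

Small differences in the last digit relative to the table come from rounding of the stage effort to 2.624 in the hand calculation.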
Additional inverters in the logic chain permit sharing the effort over
a larger number of stages, which can reduce the total delay.
We first find the optimum value of N.
If N > n1 , we shall have the opportunity of reducing the delay by
adding inverters.
$\hat{D} = N F^{1/N} + \sum_{i=1}^{n_1} p_i + (N - n_1) p_{inv}$
The first term in the equation above is the effort delay of N stages.
The sum of parasitic delays of n1 logic gates gives us the second
term.
Finally, the parasitic delay of n2 = N − n1 inverters gives the third
term.
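One direct way to find the optimum N is to evaluate this expression for successive stage counts and keep the smallest. The sketch below does that by brute force. The function name `best_stage_count`, the search limit and the example values are assumptions, and the sketch ignores the fact that an odd number of added inverters changes the polarity of the output.

```python
def best_stage_count(F, gate_parasitics, p_inv, max_extra=20):
    """Return (N, delay) minimizing N*F**(1/N) + sum(p_i) + (N - n1)*p_inv.

    gate_parasitics lists the parasitic delays of the n1 logic gates that
    the function requires; up to max_extra inverters may be added.
    """
    n1 = len(gate_parasitics)
    P = sum(gate_parasitics)

    def d_hat(N):
        return N * F ** (1.0 / N) + P + (N - n1) * p_inv

    N = min(range(n1, n1 + max_extra + 1), key=d_hat)
    return N, d_hat(N)

# The 8-input AND path of the earlier example: no extra inverters help.
print(best_stage_count(47.41, [1.2, 1.2, 1.2, 0.6], p_inv=0.6))
# A hypothetical single inverter with a large path effort: extra stages help.
print(best_stage_count(1000.0, [0.6], p_inv=0.6))
```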
We define the optimum stage effort $\rho \equiv F^{1/N}$. Then
$\hat{D} = N(\rho + p_{inv}) + \sum_{i=1}^{n_1} p_i - n_1 p_{inv}$
Since $N = \ln F / \ln \rho$, this becomes
$\hat{D} = \frac{\ln F}{\ln \rho} (\rho + p_{inv}) + \sum_{i=1}^{n_1} p_i - n_1 p_{inv}$
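$\frac{\partial \hat{D}}{\partial \rho} = 0 = -\frac{\ln F}{(\ln \rho)^2} \cdot \frac{1}{\rho} \cdot (\rho + p_{inv}) + \frac{\ln F}{\ln \rho}$
Therefore,
$\frac{\ln F}{\ln \rho} = \frac{\ln F}{(\ln \rho)^2} \cdot \frac{1}{\rho} \cdot (\rho + p_{inv})$
and so,
$1 = \frac{1}{\rho \ln \rho} (\rho + p_{inv})$
This gives
$\rho + p_{inv} = \rho \ln \rho$
which can be written as
$p_{inv} + \rho (1 - \ln \rho) = 0$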
This equation cannot be solved in closed form and either iterative
solutions or graphical solutions have to be used to determine ρ from
pinv .
In the special case when $p_{inv} = 0$, we have $\rho (1 - \ln \rho) = 0$, so $\ln \rho = 1$ and $\rho = e$.
For non-zero values of pinv , this equation can be solved iteratively
using Newton Raphson technique.
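We define
$f(\rho) = \rho (1 - \ln \rho) + p_{inv} = 0$
Then $f'(\rho) = (1 - \ln \rho) + \rho \left(-\frac{1}{\rho}\right) = -\ln \rho$
The Newton Raphson update $\rho_{next} = \rho - f(\rho)/f'(\rho)$ then simplifies to $\rho_{next} = (\rho + p_{inv})/\ln \rho$.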
Let us illustrate the iterative method by taking pinv = 1.
We know that for pinv = 0, the value of ρ is e. A guess value to start
iterations can be ρ = 3.
ρ         ρnext = (ρ + pinv)/ln ρ
3.0000    3.6410
3.6410    3.5914
3.5914    3.5911
3.5911    3.5911
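The iteration tabulated above is short enough to script. The following sketch applies the update ρnext = (ρ + pinv)/ln ρ until the value stops changing; the function name `optimum_stage_effort`, the tolerance and the iteration limit are choices made for this illustration.

```python
import math

def optimum_stage_effort(p_inv, rho=3.0, tol=1e-10, max_iter=50):
    """Solve p_inv + rho*(1 - ln rho) = 0 by Newton Raphson iteration.

    For f(rho) = rho*(1 - ln rho) + p_inv, f'(rho) = -ln rho, so the
    Newton step rho - f/f' reduces to (rho + p_inv) / ln(rho).
    """
    for _ in range(max_iter):
        rho_next = (rho + p_inv) / math.log(rho)
        if abs(rho_next - rho) < tol:
            return rho_next
        rho = rho_next
    raise RuntimeError("iteration did not converge")

for p in (0.0, 0.6, 1.0):
    rho = optimum_stage_effort(p)
    print(p, round(rho, 4), round(rho + p, 4))   # stage delay d = rho + p
```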
We can also solve the above equation graphically.
[Figure: graphical solution; one of the plotted curves is $\rho \ln \rho$.]
Stage delay
The table below gives the values of ρ and the corresponding stage
delay for several values of parasitic delay p.
p      ρ           ln ρ     d = ρ + p
0      2.718 (e)   1.000    2.718
0.2    2.912       1.069    3.11
0.4    3.093       1.129    3.49
0.6    3.266       1.184    3.87
0.8    3.432       1.233    4.23
1.0    3.591       1.278    4.59
1.5    3.967       1.378    5.47
2.0    4.319       1.463    6.32
Given the value of f , we can start from the last stage and work
backwards as earlier to calculate all transistor geometries.
For the last stage the output capacitance is known (=CL ). The
input capacitance can be calculated from
$C_{in_N} = \frac{g_N}{f} C_L$
This gives the scale factor for this stage from which, geometries of
transistors in the last stage can be computed.
For each preceding stage, we use the recursive relation
$C_{in_i} = \frac{g_i b_i}{f} \, C_{in_{i+1}}$
From Cini , we can calculate the scale factor, and hence the
geometry of all transistors for this stage.
We would like to know how much the path delay changes if the number
of stages deviates from the optimum.
What is a fork?
We often need a signal and its complement simultaneously. When
the signal changes its value, the complement should also change
at the same time.
Otherwise, there will be a short interval during which both the
signal and its complement are TRUE or FALSE simultaneously.
This can lead to malfunction.
The trivial solution of using an inverter to generate the
complement will not meet this requirement – since in this case,
the complement will switch an inverter delay later.
We can meet the requirement of a signal and its complement
changing almost simultaneously by using a “fork” – which is a
circuit with a common input feeding two branches with different
number of inversions.
The delay of the two branches must be equalized as closely as
possible.
Fork Design
Typically, the two branches of a fork will have n and (n+1) inverters.
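A fork is named with the number of inverters in its branches.
For example, the circuit shown in the figure is a 3-4 fork.
[Figure: a 3-4 fork. The input node presents a load of 4, split as 4r into the upper branch, which drives a load of 256, and 4(1-r) into the lower branch, which drives a load of 512.]
Specifications for the fork will include the total capacitive load placed by
both branches at the input node.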
The terminal loads at the two outputs need not be the same!
This is because typically, one arm of the fork drives nMOS transistors
while the other drives pMOS transistors – and their sizes may not be
the same.
Designing a Fork
It is easy to design each individual arm of the fork for minimum
delay using logical effort techniques.
However, how do we ensure that the optimum delays of the two
arms are equal?
To equalize the delays, we use the branching delay to balance the
difference of delays in the two arms.
Take the 3-4 fork above (input loads 4r and 4(1-r), output loads 256 and 512) as an
example. The specification demands that the two branches together should place
a load of 4 on the upstream driver.
Notice that for a fork with n and n+1 inverters, the difference of delay is
a smaller fraction of the total delay if n is large.
The choice of n is a trade off between the robustness of delay
matching and power dissipation as well as complexity.
By choosing an appropriate value of r between 0 and 1, we can adjust the
optimized delays of the two branches, such that these are equal.
Design of a fork essentially requires the evaluation of such a suitable
value for the parameter r.
Design Example
We want to evaluate the value of r such that the optimum delay in the
two branches is equal.
All g and b values are 1 in this example.
Since the input capacitance is dependent on r, the value of H is not known and
depends on r.
For the upper branch, the input capacitance is $4r$, while the output
capacitance is 256.
Thus $H_1 = 256/4r = 64/r$.
All g and b values are 1.
$F_1 = 64/r$, and correspondingly, $\hat{f}_1 = \left(\frac{64}{r}\right)^{1/3} = 4 r^{-1/3}$
The delay through the upper arm of the fork is $D_1 = 3 \hat{f}_1 + 3 p_{inv} = 12 r^{-1/3} + 3 p_{inv}$
For the lower branch, the input capacitance is $4(1 - r)$, while the
output capacitance is 512.
Thus $H_2 = 512/4(1 - r) = 128/(1 - r)$.
All g and b values are 1,
$F_2 = \frac{128}{1 - r}$, and correspondingly, $\hat{f}_2 = \left(\frac{128}{1 - r}\right)^{1/4} = 3.3636 (1 - r)^{-1/4}$
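Equalizing Delays
The delays of the two arms are $D_1 = 3 \hat{f}_1 + 3 p_{inv}$ and $D_2 = 4 \hat{f}_2 + 4 p_{inv}$, with $\hat{f}_1 = 4 r^{-1/3}$ and $\hat{f}_2 = 3.3636 (1 - r)^{-1/4}$. Setting their difference to zero, we need the root of
$f(r) = 13.4543 (1 - r)^{-1/4} - 12 r^{-1/3} + p_{inv} = 0$
(The values of f(r) tabulated for the iterative solution correspond to $p_{inv} = 2$.)
Its derivative is
$f'(r) = -\frac{13.4543}{4} (1 - r)^{-5/4} (-1) + \frac{12}{3} r^{-4/3} = 3.3636 (1 - r)^{-5/4} + 4 r^{-4/3}$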
We can now solve this non-linear equation using Newton Raphson
iterations.
Iterative Solution
Taking the initial guess for r as 0.5, successive values for r can be
tabulated as:
r           f(r)            f'(r)      next r
0.5         2.88095         18.0794    0.34065
0.34065     -0.251373       22.4743    0.351835
0.351835    -0.00333406     21.8878    0.351987
0.351987    -5.78369e-07    21.8803    0.351987
0.351987    -1.42109e-14    21.8803    0.351987
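The iteration in this table can be reproduced with the sketch below. It takes f(r) to be the difference between the delay of the four inverter arm and that of the three inverter arm, with a constant term of 2. That constant is an inference: it is the value that reproduces the tabulated f(r) and is not stated explicitly in the text. The names `f`, `f_prime` and `solve_r` are likewise chosen for this illustration.

```python
A = 4 * 128 ** 0.25      # 13.4543: four stages, each with effort 128**0.25 * (1-r)**-0.25
P_INV = 2.0              # assumed constant term, inferred from the tabulated f(r)

def f(r, p_inv=P_INV):
    """Delay of the 4 inverter arm minus delay of the 3 inverter arm."""
    return A * (1 - r) ** -0.25 - 12 * r ** (-1 / 3) + p_inv

def f_prime(r):
    """Derivative of f with respect to r."""
    return (A / 4) * (1 - r) ** -1.25 + 4 * r ** (-4 / 3)

def solve_r(r=0.5, tol=1e-12, max_iter=50):
    """Newton Raphson iteration r <- r - f(r)/f'(r), starting from r = 0.5."""
    for _ in range(max_iter):
        step = f(r) / f_prime(r)
        r -= step
        if abs(step) < tol:
            return r
    raise RuntimeError("Newton Raphson did not converge")

r_opt = solve_r()
print(round(r_opt, 6), round(4 * r_opt ** (-1 / 3), 6))   # r and the upper arm stage effort
```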
$\hat{f}_1 = \frac{4}{r^{1/3}} = \frac{4}{0.351987^{1/3}} = \frac{4}{0.70606} = 5.665232$
All stages are inverters with g = 1, b = 1.
Since f̂ = gbh = 5.665232, h = 5.665232 for all stages.
The first inverter should have an input capacitance of 4r = 1.408
The next inverter should have an input capacitance of
1.408 × h = 1.408 × 5.665232 = 7.976.
Input capacitance for the final inverter will be
7.976 × h = 7.976 × 5.665232 = 45.188.
The final inverter can drive a load of 45.188 × 5.665232 = 256 as
required.
To see the robustness of the design, let us assume that the actual
load capacitors in both the branches are higher by 10%.
Without changing inverter sizes, what are the delays with the changed
values and how much is the difference in delays of the two branches?
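One way to explore this question numerically is sketched below. It sizes both arms for their nominal loads using r = 0.351987, then recomputes the delays with the inverter sizes frozen and both loads raised by 10%. The helper names are hypothetical, and the parasitic delay per inverter is taken as 2, a value inferred from the tabulated f(r) rather than stated in the text.

```python
R = 0.351987        # value of r from the iterative solution
P_INV = 2.0         # assumed parasitic delay per inverter (inferred, see above)

def branch_sizes(c_in, c_load, n):
    """Input capacitances of n inverters sized for equal stage effort."""
    h = (c_load / c_in) ** (1.0 / n)
    return [c_in * h ** k for k in range(n)]

def branch_delay(sizes, c_load, p_inv=P_INV):
    """Sum of h_i + p_inv over the chain (g = b = 1 for inverters)."""
    caps = sizes + [c_load]
    return sum(caps[i + 1] / caps[i] for i in range(len(sizes))) + len(sizes) * p_inv

upper = branch_sizes(4 * R, 256.0, 3)        # 3 inverter arm
lower = branch_sizes(4 * (1 - R), 512.0, 4)  # 4 inverter arm

nominal = (branch_delay(upper, 256.0), branch_delay(lower, 512.0))
loaded = (branch_delay(upper, 256.0 * 1.1), branch_delay(lower, 512.0 * 1.1))
skew = loaded[0] - loaded[1]                 # upper arm delay minus lower arm delay

print([round(c, 3) for c in upper])
print(nominal, loaded, round(skew, 4))
```

Only the last stage of each arm sees the extra load, so each delay grows by one tenth of that arm's stage effort. The arm with fewer stages has the larger stage effort and therefore slows down more.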