
Lecture Logical Effort

This document discusses designing multi-stage logic circuits. It begins by analyzing the delay of a single-stage inverter circuit. It then introduces the concept of a tapered buffer, which uses multiple progressively larger inverters to drive a large capacitive load. The optimal design for a tapered buffer is described, where the size of each successive inverter stage is a constant factor larger than the previous stage. This minimum-delay tapered buffer design has all stage ratios equal to the same value ρ.


Designing Multi-Stage Logic

Dinesh Sharma

EE Department
IIT Bombay, Mumbai

September 29, 2021

Dinesh Sharma (IIT B) Logical Effort September 29, 2021 1 / 111


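Single Stage Delay

In a CMOS gate, for any digital input, either the pull-down or the pull-up network is OFF.

[Figure: CMOS inverter with supply VDD and load capacitor C_L. With input V_iL the output charges to V_oH; with input V_iH it discharges to V_oL.]

When the input is low, the pull-up network charges the load:
\[ I_{dp} = C_L \frac{dV_{out}}{dt}, \quad\text{so}\quad dt = C_L \frac{dV_{out}}{I_{dp}} \quad\text{(for rise time)} \]

When the input is high, the pull-down network discharges the load:
\[ I_{dn} = -C_L \frac{dV_{out}}{dt}, \quad\text{so}\quad dt = -C_L \frac{dV_{out}}{I_{dn}} \quad\text{(for fall time)} \]

where the transistor currents are of the form
\[ I_d = K' \frac{W}{L} f(V_{ds}, V_{gs}), \quad\text{where}\quad K' = \mu C_{ox} \]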
Integrating from the start voltage $V_1$ to the end voltage $V_2$, we get expressions of the type
\[ \tau_L = \frac{C_L}{K' \, W/L} \int_{V_1}^{V_2} \frac{dV_{out}}{f(V_{out}, V_{gs})} \]
where $\tau_L$ is the time taken to charge/discharge the load capacitor $C_L$ from $V_1$ to $V_2$, and $K'$ is the conductance factor given by $\mu C_{ox}$.
The integral on the right hand side of this equation is a definite integral. It will evaluate to some constant depending on the voltages defining the 'High' and 'Low' logic levels, the supply voltage, turn on voltages, drain saturation voltage etc.


In digital design, we keep the channel length at its minimum value, so L is a constant. Let us initially ignore the parasitic capacitances. We can see that
\[ \frac{W \tau_L}{C_L} = \text{constant}, \quad\text{so}\quad \tau_L \propto \frac{C_L}{W} \]
This tells us that the delay associated with a gate charging a load capacitor scales directly with $C_L$ and inversely with W, the width of the charging/discharging transistor. This linear dependence permits us to design logic stages easily.


Tapered Buffer

As an example of multi-stage logic, let us take the case when a large capacitor is to be driven by a CMOS circuit.
A minimum sized inverter will take too long to charge this capacitor.
Therefore, we would like to scale up the inverter (multiply all transistor widths by a scale factor) in order to drive this large capacitor.
However, the input capacitance of this scaled up inverter may be too large for a minimum sized inverter to drive!
Therefore, we need a medium sized inverter to drive the large final inverter.


To drive a large capacitive load, we need a wide transistor, and so we scale up the inverter driving this load.
To drive this scaled up inverter, we need another inverter of medium size.
We keep adding inverters, till the first inverter in the chain is small enough to be driven by standard CMOS logic.
This kind of buffering is referred to as a tapered buffer.

[Figure: Tapered buffer. A chain of inverters with scale factors s_1, s_2, ..., s_i, ..., s_{n-1}, s_n driving the load C_L.]


How do we decide the number of inverters to include in this chain?
What should be the scale factors for each successive stage to minimize the total delay?


Let the i'th inverter in the chain be scaled up by a factor $s_i$ relative to a minimum sized inverter.
Let the delay of a minimum sized inverter driving another minimum sized inverter be $\tau$.
The i'th inverter provides a charging current which is $s_i$ times that of the minimum sized inverter.
However, the load it sees is $s_{i+1}$ times the input capacitance of a minimum inverter.
Therefore the delay associated with the i'th stage is $\frac{s_{i+1}}{s_i}\tau$.


The delay associated with the i'th stage is $\frac{s_{i+1}}{s_i}\tau$. So the total delay of the inverter chain is given by
\[ d_{total} = \sum_{i=1}^{n} \frac{s_{i+1}}{s_i}\,\tau = \tau \sum_{i=1}^{n} \frac{s_{i+1}}{s_i} \]
In order to minimize the total delay, we should put the partial derivative with respect to each of the $s_i$ equal to zero. Therefore,
\[ \frac{d}{ds_i}\,\tau\left( \frac{s_2}{s_1} + \cdots + \frac{s_i}{s_{i-1}} + \frac{s_{i+1}}{s_i} + \cdots \right) = 0 \]


Only two terms in the sum contain $s_i$. Since all scale factors $s_i$ are independent, the derivative of all the rest of the terms is 0. Therefore,
\[ \frac{1}{s_{i-1}} - \frac{s_{i+1}}{s_i^2} = 0, \quad\text{which gives}\quad \frac{s_i}{s_{i-1}} = \frac{s_{i+1}}{s_i} \]


For minimum delay through the tapered buffer, we must therefore have $\frac{s_i}{s_{i-1}} = \frac{s_{i+1}}{s_i}$ for every stage.
The stage ratio, which is the factor by which an inverter is larger than the previous one, should be the same for all stages.
Let this constant stage ratio for the tapered buffer be $\rho$.
The delay contributed by the i'th stage is $\frac{s_{i+1}}{s_i}\tau = \rho\tau$.
The total delay of n stages is then $n\rho\tau$.
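As a quick numerical check of the equal stage ratio result, the sketch below compares a three inverter chain with equal stage ratios against one whose middle stage is mis-sized. The function name `chain_delay` and the numbers (first stage of size 1, load of 64 minimum inverter input capacitances) are illustrative assumptions and not part of the lecture. Delays are in units of τ and self-loading is ignored, as in the derivation so far.

```python
def chain_delay(sizes, load):
    """Total delay (in units of tau) of an inverter chain.

    sizes: scale factors s_1..s_n of the inverters in the chain.
    load:  final load, in units of the minimum inverter input capacitance.
    Stage i contributes s_{i+1}/s_i; the last stage sees the load.
    """
    seen = list(sizes[1:]) + [load]
    return sum(nxt / cur for cur, nxt in zip(sizes, seen))


# Hypothetical example: 3 stages, first stage of size 1, load of 64 units.
equal_ratio = [1, 4, 16]   # stage ratios 4, 4, 4
mis_sized = [1, 2, 16]     # stage ratios 2, 8, 4

print(chain_delay(equal_ratio, 64))  # 4 + 4 + 4 = 12
print(chain_delay(mis_sized, 64))    # 2 + 8 + 4 = 14
```

Both chains have the same first stage and the same load, so only the distribution of the stage ratios differs; the equal-ratio chain is the faster one.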


The first stage of the tapered buffer has a size which can be driven by any CMOS gate. The input capacitance corresponding to this size is $C_{in}$.
Each subsequent stage has a drive capability which is $\rho$ times the drive capability of the previous stage.
Since the drive capability is stepped up by $\rho$ at each of the n stages, we should have
\[ \rho^n = \frac{C_L}{C_{in}}, \quad\text{so}\quad \rho = \left(\frac{C_L}{C_{in}}\right)^{1/n} \]
We define the ratio $H \equiv C_L/C_{in}$.


\[ \rho^n = \frac{C_L}{C_{in}} = H, \quad\text{so}\quad n = \frac{\ln H}{\ln \rho} \]
\[ d_{total} = n\rho\tau = \frac{\ln H}{\ln \rho}\,\rho\,\tau = \tau \ln H \,\frac{\rho}{\ln \rho} \]
In order to minimize $d_{total}$, we set its derivative with respect to $\rho$ to 0. This gives
\[ \tau \ln H \left[ \frac{1}{\ln \rho} - \frac{\rho}{(\ln \rho)^2}\,\frac{1}{\rho} \right] = 0, \quad\text{which leads to}\quad \frac{1}{\ln \rho} = \frac{1}{(\ln \rho)^2} \]
Thus $\ln \rho = 1$, and therefore $\rho = e$.


Thus we obtain the result that the optimum stage ratio for a tapered buffer is e.

The optimum number of stages in the buffer is given by
\[ n = \frac{\ln H}{\ln \rho} = \ln H = \ln\frac{C_L}{C_{in}} \]
Since the number of inverters in the chain must be an integer, we use the integer nearest to this value.

$C_L/C_{in}$ is a given design parameter. We take the integer n closest to $\ln(C_L/C_{in})$ and make a tapered buffer with n stages. The stage ratio is then $H^{1/n}$, which is close to e ≈ 2.718.
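The design recipe above can be sketched in a few lines of Python. The function name `tapered_buffer`, the lower bound of one stage, and the example value H = 1000 are assumptions made for illustration. Once n is rounded to an integer, the stage ratio that exactly spans the load is $H^{1/n}$ rather than e itself.

```python
import math


def tapered_buffer(c_load, c_in):
    """Return (n, rho) for a tapered buffer with H = c_load / c_in.

    n is the integer nearest to ln(H), with at least one stage
    (the lower bound is an assumption of this sketch).
    rho is the stage ratio H**(1/n) that results for that integer n.
    """
    H = c_load / c_in
    n = max(1, round(math.log(H)))
    rho = H ** (1.0 / n)
    return n, rho


# Hypothetical example: the load is 1000 times the first stage input capacitance.
n, rho = tapered_buffer(c_load=1000.0, c_in=1.0)
print(n, rho)        # n is 7 because ln(1000) is about 6.9
print(n * rho)       # total delay n*rho in units of tau, self-loading ignored
```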


Generalizing the Tapered Buffer

We have found that the optimum stage ratio for a tapered buffer is e, and the number of inverters in the chain is $\ln(C_L/C_{in})$.
These results were computed for a situation where the only logic gates used were inverters, and loading due to driver transistors themselves in the logic gate was ignored.
Now we would like to see how to optimize the delay for the general case, where any logic gate can be used in multi-stage logic and the effect of self loading is not ignored.
We would also like to take the realistic case, where the logic path is not a linear chain and outputs have a fanout ≠ 1.


Delay in the General Case

To obtain delay for a general chain of logic gates, we need to:
Take the effects of self-loading into account.
Consider the delay introduced by logic gates other than inverters.
Include the effect of branching or fanout, which will occur in a general logic chain.


Effects of Self-Loading

The input capacitance of the next stage is not the only load on a logic gate.
In addition to the input capacitance of the next stage, a logic gate has to drive the capacitance associated with its own output transistors.
This loading comes from the drain capacitance of the output transistors as well as their drain to gate capacitance.
This additional capacitance is proportional to the width W of the driver transistors. Thus,
\[ C_L = C_{ext} + C_p W \]
where $C_p$ is the parasitic capacitance per unit width of driver transistors.
Thus, if we make a logic gate larger, its parasitic load also increases proportionally.
Therefore the delay of the stage is
\[ \tau_L \propto \frac{C_L}{W} = \frac{C_{ext} + W C_p}{W}, \quad\text{which gives}\quad \tau_L \propto \frac{C_{ext}}{W} + C_p \]
Thus the parasitic delay associated with self-loading is independent of W.


Our earlier expression for gate delay was $\tau_L \propto C_{next}/W$, where $C_{next}$ was the capacitance presented by the next logic gate input.
To take self-loading into account, we just have to add a constant to this delay, so that $\tau_L \propto C_{ext}/W + C_p$, where $C_p$ is independent of W.
Since the input capacitance $C_{in}$ of the driving gate is itself proportional to W, we can write the above expression as
\[ \tau_L = \text{const.} \times \frac{C_{ext}}{C_{in}} + \tau_p \]
where $\tau_p$ is the parasitic delay associated with a logic gate driving its own output capacitance, and is independent of the size of the driving logic gate or the load.


Multi-stage Logic with Other Gates

Transistor sizes in a gate are adjusted based on several considerations.
1. Difference in mobility of pMOS and nMOS transistors. A pMOS transistor has to be wider than an nMOS transistor to provide the same drive current because the hole mobility is lower than the electron mobility. For example, we may scale the width of pMOS transistors to be double that of nMOS transistors connected in the same configuration. This ratio is represented by γ.
2. If there are n series connected transistors, their widths should be scaled up by n in order to make the output drive of the logic gate equivalent to that of an unscaled inverter.
3. After accounting for mobility differences and series connections, we may scale up all transistors by some factor in order to drive larger loads. (This was $s_i$ in our earlier discussion.)


Load presented by 2 input NAND and NOR gates to their drivers

[Figure: transistor sizes (W/L) for an inverter (pMOS 2/1, nMOS 1/1), a 2 input NAND (two parallel pMOS of 2/1, two series nMOS of 2/1) and a 2 input NOR (two series pMOS of 4/1, two parallel nMOS of 1/1).]

Here we have assumed γ = 2.

Transistor sizes are adjusted to account for mobility differences and series connection.
Further, we may scale up all transistors by some factor in order to drive larger loads.
This has an impact on the capacitive loading placed on the previous stage.


While an inverter places a load of 3 units on the previous gate, a 2 input NAND gate with the same output drive loads the previous stage with a capacitance of 4 units.
A 2 input NOR gate loads the previous stage with a capacitance of 5 units.
We must account for a loading factor for different gate types when optimizing multi-stage logic.


Delay of Other Logic Gates

A minimum sized NAND gate presents a load capacitance which is 4/3 times that presented by a minimum sized inverter. Similarly, a minimum sized NOR gate loads the previous stage with a capacitance which is 5/3 times that of a minimum sized inverter.
These gates then have an output drive which is the same as that of a minimum inverter.
It is more convenient to find the delay of a logic gate that has the same input capacitance as an inverter. (This way, the logic type will affect its own delay and not that of the previous gate.)
Therefore the delay of a NAND gate which presents the same input capacitance as an inverter will be 4/3 times the inverter delay, and the NOR gate will have a delay 5/3 times that of an inverter.


Effect of Branching

In general, a logic chain will contain points where multiple gates are driven by a stage.
The effect of this branching (or fanout) must be taken into account while computing delay.

[Figure: Stage i drives Stage i+1 on the path under consideration, and also drives other gates off the path.]

If a stage drives multiple gates, its actual load is the sum of the input capacitances of all branches that it drives.
Thus the delay of this gate is higher by the factor $C_{total}/C_{onpath}$ compared to the delay if it were driving only one path.


The stage driving multiple paths incurs higher delay compared to the case when it drives only a single path.
We can compute the gate delay as if it were driving only the next gate in the path, and then scale it up by the factor $C_{total}/C_{onpath}$ to obtain the delay with fanout.
If there is no branching, this factor is just 1, because $C_{total} = C_{onpath}$.
Except for this correction, we can model the logic chain as if it is just a linear path.


Method of Logical Effort

Optimization of Multi-Stage Logic

Now that we have found ways to take care of
1. Inclusion of parasitic delay
2. Use of logic gates other than inverters
3. Taking branching into account
we can optimize the cumulative delay of a general logic chain.
This treatment was first suggested by Ivan Sutherland, Bob Sproull and David Harris. They expanded on this technique in a book that they authored:
Logical Effort: Designing Fast CMOS Circuits
Ivan E. Sutherland, Bob F. Sproull, David L. Harris
Morgan Kaufmann Publishers, Inc.


Generalized Logic Chain

We can generalize the optimization made for a tapered buffer to a generic logic path by incorporating the corrections discussed above.
We can account for the parasitic delay by adding a size independent delay. The parasitic delay depends only on the logic type and not on the size of the gate.
We can account for different types of logic by scaling up the stage delay of an inverter by the ratio of the input capacitances of the gate and the inverter.
This correction factor is called the logical effort. This is also size independent and depends only on the type of gate being used.
We can account for branching by scaling up the charge/discharge time by the ratio of the total capacitance the gate drives to the input capacitance of the next stage on the path.


Logical Effort

For delay calculation, we can treat a CMOS logic gate as an inverter, with a delay which is scaled up depending on its input capacitance relative to an inverter with the same output drive.
This correction factor is called the logical effort of this gate.
Logical effort is independent of sizing of the gate. A NAND gate providing twice the output drive of a minimum inverter will have an input capacitance which is 4/3 times that of an inverter sized up by a factor of 2.
Thus, the logical effort depends on the logic function provided by a gate and on nothing else. (It will depend on γ, of course, but that is a technological constant.)


Single gate delay with Logical Effort

We can now handle the delay of a logic gate in a general case.

Self loading: The effect of self loading can be incorporated in delay calculation by using the expression d = f + p, where f is the effort delay, which depends on transistor currents and load capacitance, while p is the parasitic delay, which is size independent.
Logic type: We further express f as a product of two quantities, g and h. Here h is the electrical effort of this stage, given by the ratio of output capacitance to input capacitance, and g is the logical effort, which accounts for the extra loading caused by a gate as compared to an inverter.


The effort delay is the product of logical effort and electrical effort: f = gh.
We can express the delay introduced by a logic gate as
\[ d = f + p = gh + p \]
All items in this equation are dimensionless.
Delay is measured in units of τ, which is the delay of a minimum inverter driving another minimum inverter, not including the parasitic delay.
This way of expressing delays separates the effects of different components which cause delay, so these can be handled independently.
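A minimal sketch of the stage delay expression d = gh + p follows. The function name `stage_delay`, the electrical effort of 4, the parasitic delay of 2 and the value of τ are illustrative assumptions; only g = 4/3 for a 2 input NAND with γ = 2 comes from the earlier discussion.

```python
def stage_delay(g, h, p):
    """Delay of one logic stage in units of tau: d = g*h + p."""
    return g * h + p


# 2 input NAND (g = 4/3 for gamma = 2) driving a load four times its own
# input capacitance. The parasitic delay p = 2 is an assumed value.
d = stage_delay(g=4 / 3, h=4, p=2)

tau_ps = 5.0                      # hypothetical technology constant, in ps
print(d)                          # delay as a unitless multiple of tau
print(d * tau_ps)                 # absolute delay in ps for the assumed tau
```

Keeping d unitless and multiplying by τ only at the end mirrors the separation of technology dependence from gate type and sizing.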


Using the method of logical effort, we separate the effects of different components which cause delay, so these can be handled independently.
Dependence on technology is encapsulated in τ, which is the delay of a minimum inverter driving another minimum inverter, excluding delay due to self loading.
All delays are expressed as a multiple of this quantity.
Thus delays in this formulation are unitless numbers and must be multiplied by τ to get the absolute value of delay.


Dependence on Gate Sizing

Dependence on sizing is encapsulated in h. This is the only component of delay which is size dependent.
h is also a unitless quantity, because it is defined as the ratio of output capacitance to input capacitance:
\[ h = \frac{C_{out}}{C_{in}} \]
These capacitances can be measured in units of the input capacitance of an inverter.
$C_{in}$ is a measure of the factor by which a gate has been scaled up in order to make it faster.


Dependence on Logic Type

Dependence on logic type is expressed through g. This accounts for the series/parallel configuration of transistors in a particular gate.
If the p type transistors are twice the size of n type transistors in minimum inverters, the logical effort of 2 input NAND gates is 4/3, while that of 2 input NOR gates is 5/3.
Logical effort of other gates can be calculated easily.
g depends on logic type and nothing else.


Effect of Self Loading

The effect of parasitic delays due to the load presented by the driver transistors themselves is expressed through p.
p is size independent.
It is also a dimensionless quantity as this delay is also expressed in units of τ.


Logical Effort for Common CMOS Gates

The logical effort of an inverter is, by definition, 1.
The logical effort of other logic functions depends on their circuit topology.
n input NAND and NOR gates are shown in the figure below.

[Figure: n input NAND (n parallel pMOS transistors of width γ, n series nMOS transistors of width n) and n input NOR (n series pMOS transistors of width nγ, n parallel nMOS transistors of width 1).]


Logical Effort for n input NAND

The n input NAND gate has n nMOS transistors in series and n pMOS transistors in parallel.
Each nMOS should be n times wider than the nMOS in the minimum inverter.
Each pMOS will have the same width as that of the minimum inverter, which is γ times the minimum size (to account for lower hole mobility).
So the total capacitive load on any input is n + γ units.


Each input of the n input NAND gate loads its driver with (n + γ) units of capacitance.
The minimum inverter loads its driver by (1 + γ) units.
So the logical effort of an n input NAND gate is (n + γ)/(1 + γ).
This reduces to 4/3 for a 2 input NAND with γ = 2, as expected.


Logical Effort for n input NOR

The n input NOR gate has n pMOS transistors in series and n nMOS transistors in parallel.
Each pMOS should be n times wider than the pMOS in the minimum inverter, which itself is γ times the minimum size (to account for lower hole mobility).
Each nMOS will have the same width as that of the minimum inverter.
So the total capacitive load on any input is 1 + nγ units.


Each input of the n input NOR gate loads its driver with (1 + nγ) units of capacitance.
The minimum inverter loads its driver by (1 + γ) units.
So the logical effort of an n input NOR gate is (1 + nγ)/(1 + γ).
This reduces to 5/3 for a 2 input NOR with γ = 2, as expected.
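The two formulas for logical effort can be tabulated with a short script. The function names `g_nand` and `g_nor` are illustrative, and exact fractions are used so that values such as 4/3 print without rounding; this sketch assumes γ is given as an integer or a `Fraction`.

```python
from fractions import Fraction


def g_nand(n, gamma=2):
    """Logical effort of an n input NAND: (n + gamma) / (1 + gamma)."""
    return Fraction(n + gamma, 1 + gamma)


def g_nor(n, gamma=2):
    """Logical effort of an n input NOR: (1 + n*gamma) / (1 + gamma)."""
    return Fraction(1 + n * gamma, 1 + gamma)


# Tabulate both for a few input counts with gamma = 2.
for n in (2, 3, 4):
    print(n, g_nand(n), g_nor(n))
```

For γ = 2 the NOR value grows faster with n than the NAND value, because the series pMOS transistors are scaled by both n and γ.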


Logical Effort for a multiplexer

[Figure: 2 input multiplexer built from two tri-stateable inverters with data inputs In1 and In2, controlled by the Sel signals. pMOS transistors have width 2γ and nMOS transistors have width 2.]

Each data input of the multiplexer loads its driver with a capacitance ∝ (2 + 2γ).
The minimum inverter loads its driver by (1 + γ) units.
So the logical effort for the multiplexer is (2 + 2γ)/(1 + γ) = 2.

It is interesting to see that the logical effort will remain 2 for every data input even when we parallel n tri-stateable inverters to form an n input mux.


Parasitic Delay

Given the series/parallel connections of any logic gate and the rules associated with these connections, we can compute the logical effort of the gate.

How do we estimate the parasitic delay?

The parasitic delay of an inverter can be obtained through simulation or measurement of the delay of an inverter loaded by a varying number of inverters. The slope of the delay versus load gives τ, while the intercept gives the parasitic delay of the inverter ($p_{inv}$, once expressed in units of τ).
We can estimate the parasitic delay of a logic gate relative to that of an inverter by looking at the ratio of the diffusion area directly connected to the output node in the gate, as compared to that in an inverter.
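The slope and intercept extraction can be sketched with an ordinary least-squares line fit. The delay values below are synthetic numbers made up for illustration (they are not measurements or simulation results), and the helper name `fit_line` is an assumption of this sketch.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit y = slope*x + intercept."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


# Synthetic data: delay (ps) of an inverter driving h identical inverters.
fanout = [1, 2, 3, 4, 5]
delay_ps = [10.0, 15.0, 20.0, 25.0, 30.0]

tau, intercept = fit_line(fanout, delay_ps)   # slope is tau, in ps
p_inv = intercept / tau                       # parasitic delay in units of tau
print(tau, intercept, p_inv)
```

Dividing the intercept by the slope converts the parasitic delay from absolute time into the unitless form used in d = gh + p.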


We can approximate the parasitic delay by considering only the self capacitance which is directly connected to the output.
In case of an inverter, the drain of the n channel transistor (of width 1) and the drain of the p channel transistor (of width γ) are directly connected to the output node.
Thus the parasitic capacitance for an inverter is ∝ (1 + γ).
For any other gate, we find the total width of the transistors which are directly connected to the output node, and find its ratio with (1 + γ).
This permits us to express the parasitic delay of any gate as a multiple of the parasitic delay of the inverter.


Parasitic Delay of n input NAND

In case of an n input NAND gate, only one of the series connected n channel transistors (with width = n) is connected to the output node.
All p channel transistors (each with width γ) are connected to the output node.
The total parasitic capacitance at the output node is ∝ (n + nγ).

Hence the parasitic delay of an n input NAND gate is (n + nγ)/(1 + γ) = n times the parasitic delay of an inverter.


Parasitic Delay of n input NOR

In case of an n input NOR gate, all n channel transistors (each with width = 1) are connected to the output node.
Only one of the series connected p channel transistors (with width nγ) is connected to the output node.
The total parasitic capacitance at the output node is ∝ (n + nγ).

Hence the parasitic delay of an n input NOR gate is also (n + nγ)/(1 + γ) = n times the parasitic delay of an inverter.


Parasitic Delay of n input Multiplexer

There are n tri-stateable inverters connected in parallel in an n input multiplexer.

[Figure: n input multiplexer made of n tri-stateable inverters in parallel, with data inputs In0, In1, ..., Inn and select signals Sel0, Sel1, ..., Seln. pMOS transistors have width 2γ and nMOS transistors have width 2.]

Each tri-stateable inverter connects an n channel transistor of width 2 and a p channel transistor of width 2γ to the output.
The total parasitic capacitance at the output node is then ∝ 2n(1 + γ).
The parasitic delay of the n input mux is therefore 2n times the parasitic delay of an inverter.


Logical effort and Parasitic Delay of common gates

Gate            Logical Effort        Parasitic Delay
Inverter        1                     p_inv
2 input NAND    (2 + γ)/(1 + γ)       2 p_inv
n input NAND    (n + γ)/(1 + γ)       n p_inv
2 input NOR     (1 + 2γ)/(1 + γ)      2 p_inv
n input NOR     (1 + nγ)/(1 + γ)      n p_inv
2 way mux       2                     4 p_inv
n way mux       2                     2n p_inv

The logical effort for an n input mux is independent of n. However, a multiplexer with more inputs will have a larger parasitic delay.
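The parasitic delay column of the table can be captured as a small lookup. The function name `parasitic_delay`, the gate labels, and the default `p_inv = 1.0` are placeholders chosen for this sketch; the actual value of p_inv has to come from simulation or measurement.

```python
def parasitic_delay(gate, n=1, p_inv=1.0):
    """Parasitic delay of a gate as a multiple of p_inv, following the table.

    gate:  "inv", "nand", "nor" or "mux"
    n:     number of inputs (ignored for the inverter)
    p_inv: parasitic delay of the inverter; 1.0 is only a placeholder.
    """
    multiple = {"inv": 1, "nand": n, "nor": n, "mux": 2 * n}[gate]
    return multiple * p_inv


print(parasitic_delay("nand", n=3))   # 3 * p_inv
print(parasitic_delay("mux", n=2))    # 4 * p_inv
```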


Design of Multi-stage Logic

In multi-stage logic, we consider the logical effort $g_i$ and electrical effort $h_i$ of each stage.
We define the path logical effort as the product of the logical efforts of all stages on a given path:
\[ G \equiv \prod_{i=1}^{N} g_i \]
Similarly, we define the path electrical effort as the product of the electrical efforts of all stages on a given path:
\[ H \equiv \prod_{i=1}^{N} h_i, \quad\text{where}\quad h_i = \frac{C_{out_i}}{C_{in_i}} \text{ for stage } i \]


If there is no branching, the load on a particular stage is just the input capacitance of the next stage, that is, $C_{out_i} = C_{in_{i+1}}$. When we multiply all electrical efforts along a path, all except the first and last capacitances cancel.
Thus we get
\[ H = \frac{C_L}{C_{in_1}} \]
where $C_L$ is the final load capacitance and $C_{in_1}$ is the input capacitance of the first stage.


Effect of Branching

If, however, there is branching at some stage i, $C_{out_i} \neq C_{in_{i+1}}$.
Now $C_{out_i}$ includes not only $C_{in_{i+1}}$ but also the input capacitance of other logic gates which are not on the path under consideration.

[Figure: Stage i drives Stage i+1 on the path under consideration, and also drives other gates off the path.]

\[ C_{total} \equiv C_{onpath} + C_{offpath}, \quad\text{where}\quad C_{onpath} = C_{in_{i+1}} \]
We need to introduce a correction factor for this. We define a branching effort as
\[ b \equiv \frac{C_{onpath} + C_{offpath}}{C_{onpath}} \]


Branching Effort

The branching effort is defined as
\[ b \equiv \frac{C_{onpath} + C_{offpath}}{C_{onpath}}, \quad\text{while}\quad h \equiv \frac{C_{onpath}}{C_{in}} \]
The $C_{onpath}$ in the electrical effort h considers only the on path loading. Because of this, H retains its cancellation property:
\[ H = \prod_i h_i = \frac{C_{in_2}}{C_{in_1}} \, \frac{C_{in_3}}{C_{in_2}} \cdots \frac{C_L}{C_{in_n}} = \frac{C_L}{C_{in_1}} \]
The branching effort b provides the correction for the actual loading seen by a logic stage:
\[ b_i h_i = \frac{C_{total_{i+1}}}{C_{onpath_{i+1}}} \, \frac{C_{onpath_{i+1}}}{C_{in_i}} = \frac{C_{total_{i+1}}}{C_{in_i}} \]


Path Branching Effort

If there is no branching at a stage, the corresponding $b_i$ value is 1 since $C_{offpath} = 0$. So, multiplication by $b_i$ does not change anything.
If, however, there is branching at the i'th stage, multiplying $h_i$ by $b_i$ corrects the output capacitance, because $b_i h_i = C_{total_{i+1}}/C_{in_i}$.
We define the branching effort of the whole path, denoted by B, as the product of the branching effort at each stage along the path:
\[ B = \prod_{i=1}^{N} b_i \]


Path Effort

We can now define the path effort, F, as the product of all logical efforts and branch corrected electrical efforts:
\[ F = \prod_{i=1}^{N} g_i b_i h_i = \prod_{i=1}^{N} g_i \prod_{i=1}^{N} b_i \prod_{i=1}^{N} h_i \]
So,
\[ F = GBH, \quad\text{where}\quad G = \prod_{i=1}^{N} g_i, \quad B = \prod_{i=1}^{N} b_i, \quad\text{and}\quad H = \prod_{i=1}^{N} h_i \]
The advantage of using the branching correction separately from electrical effort is that H retains its cancellation property for the capacitance of intermediate stages and is defined only by the final load and the input capacitance ($H = C_L/C_{in_1}$).
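A minimal sketch of the path effort calculation is given below. The function name `path_effort` and the example path (an inverter, a 2 input NAND with a branching effort of 2, and a 2 input NOR, with γ = 2, driving 45 units of load from a first stage input capacitance of 1 unit) are hypothetical. `math.prod` requires Python 3.8 or later.

```python
from math import prod


def path_effort(g, b, c_load, c_in1):
    """Path effort F = G * B * H.

    g, b:   per-stage logical efforts and branching efforts.
    c_load: final load capacitance C_L.
    c_in1:  input capacitance of the first stage.
    H uses the cancellation property, H = C_L / C_in1.
    """
    G = prod(g)
    B = prod(b)
    H = c_load / c_in1
    return G * B * H


# Hypothetical path: inverter -> NAND2 (branching effort 2) -> NOR2.
g = [1, 4 / 3, 5 / 3]
b = [1, 2, 1]
F = path_effort(g, b, c_load=45.0, c_in1=1.0)
print(F)   # G = 20/9, B = 2, H = 45, so F is 200 up to rounding
```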
Design of multi-stage logic Path Effort

Path Effort

The equation that defines the path effort looks quite similar to the definition of the stage effort.
Notice, however, that unlike the stage effort f, the path effort F does not define the delay of the path.
The total delay is the sum of individual delays and not their product:

$$D = \sum d_i = \sum g_i b_i h_i + \sum p_i$$

Still, F is a useful quantity for optimisation of path delays, as we shall see later.


Path Delay

The path delay, D, is the sum of the delays of each of the N stages of logic in the path.
As in the expression for the delay of a single stage, we separate the contribution to path delay from effort and the delay due to parasitic capacitances:

$$D = \sum d_i = D_F + P$$

where $D_F$ is the path effort delay, while P is the path parasitic delay.
The path effort delay is simply $D_F = \sum g_i b_i h_i$ and the path parasitic delay is $P = \sum p_i$.


Minimizing Path Delay

The total path delay is given by

$$D = \sum d_i = \sum g_i b_i h_i + \sum p_i = \sum g_i b_i \frac{C_{i+1}}{C_i} + \sum p_i$$

We need to adjust the size of each stage (and hence Ci ) such that
D is minimum.
Therefore we should set the partial derivative of the expression for
D with respect to each of Ci to 0.
All pi are size independent, and therefore give 0 on differentiating
with respect to Ci .
Only two terms in the first sum involve Ci . These are
gi−1 bi−1 Ci /Ci−1 + gi bi Ci+1 /Ci.


Minimizing Path Delay

In the expression for path delay, only two terms depend on $C_i$. These are $g_{i-1} b_{i-1} C_i / C_{i-1} + g_i b_i C_{i+1} / C_i$.
All other terms will give 0 on differentiation with respect to $C_i$.

$$\frac{\partial D}{\partial C_i} = 0 = \frac{g_{i-1} b_{i-1}}{C_{i-1}} - \frac{g_i b_i C_{i+1}}{C_i^2}$$

This leads to

$$g_{i-1} b_{i-1} \frac{C_i}{C_{i-1}} = g_i b_i \frac{C_{i+1}}{C_i}$$

So $g_{i-1} b_{i-1} h_{i-1} = g_i b_i h_i$ for all i.


Minimizing Path Delay

The path delay is minimized when each stage in the path has the same stage effort, f = gbh.
Since the path effort F is the product of all stage efforts and the stage effort has to be equal for all stages for minimum delay, we must have $\hat{f} = F^{1/N}$. (A hat over a symbol indicates an expression that achieves minimum delay.)
For this optimum effort, we obtain

$$\hat{D} = N F^{1/N} + P = N (GBH)^{1/N} + P$$

$\hat{D}$ is the minimum delay possible for this path.
We achieve this minimum delay by appropriate sizing of transistors in each stage of the logic path, so that all stages have the same effort.
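The equal-effort result can be checked numerically. The sketch below is only an illustration: the function names and the 3-stage path with F = 64 and P = 1.8 are assumed values, chosen so that the numbers are easy to follow.

```python
def min_path_delay(F, N, P):
    """Optimum stage effort and minimum delay for path effort F spread over N stages."""
    f_hat = F ** (1.0 / N)
    return f_hat, N * f_hat + P

def path_delay(stage_efforts, P):
    """Delay for arbitrary stage efforts: sum of the efforts plus the parasitic delay."""
    return sum(stage_efforts) + P

# Hypothetical 3-stage path with F = 64 and total parasitic delay P = 1.8.
f_hat, d_hat = min_path_delay(F=64.0, N=3, P=1.8)
d_unequal = path_delay([2.0, 4.0, 8.0], P=1.8)  # same product (64), unequal efforts
print(f"f_hat = {f_hat:.3f}, D_hat = {d_hat:.3f}, unequal-effort delay = {d_unequal:.3f}")
```

The stage efforts 2, 4 and 8 have the same product as three efforts of 4, but their sum is larger, so the delay is higher than $\hat{D}$.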


Minimizing Path Delay

Minimization of path delay requires that each logic stage be designed so that the stage effort $f \equiv g_i b_i h_i$ is the same for all stages.
This gives the optimum value of stage effort as $\hat{f} = F^{1/N} = (GBH)^{1/N}$.
Since F = GBH is known, we can compute $\hat{f}$. (While the individual $h_i$ are not known, H is known to be $C_L / C_{in,1}$ because of the cancellation property.)
Since $\hat{f}$ can be calculated, we can compute h for all stages.
We start with the last stage, where the output capacitance is known ($= C_L$). Knowing h, we can compute $C_{in}$, and hence the scale factor for this stage.


Sizing for Minimum Path Delay

$$h = \frac{\hat{f}}{gb} = \frac{(GBH)^{1/N}}{gb}$$

Since $h_i \equiv C_{in,i+1} / C_{in,i}$,

$$\frac{C_{in,i+1}}{C_{in,i}} = \frac{(GBH)^{1/N}}{g_i b_i}$$

This gives

$$C_{in,i} = \frac{g_i b_i}{(GBH)^{1/N}} C_{in,i+1}$$

We can use this recursive relation for computing the scale, and hence the transistor sizes, for all stages, starting with the last one.


Sizing for Minimum Path Delay

We have the recursive relation

$$C_{in,i} = \frac{g_i b_i}{(GBH)^{1/N}} C_{in,i+1}$$

Knowing $C_L$, we can compute the input capacitance of the last stage.
From the input capacitance of the last stage, we can compute the input capacitance of the stage preceding it using the recursive relation.
In this way, $C_{in}$ for every stage can be determined.
From $C_{in}$, we can calculate the scale factor, and hence the geometry of transistors for each stage.
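The backward recursion can be sketched in a few lines. This is a minimal illustration rather than a design tool: the function name `size_path` and the three-inverter example are assumptions, and the recursion is applied exactly as written above, with the final load standing in for the capacitance after the last stage.

```python
from math import prod

def size_path(g, b, c_in_first, c_load):
    """Return (f_hat, c_in) where c_in[i] is the input capacitance of stage i."""
    n = len(g)
    F = prod(g) * prod(b) * c_load / c_in_first
    f_hat = F ** (1.0 / n)
    c_in = [0.0] * n
    c_next = c_load                      # capacitance following the stage being sized
    for i in range(n - 1, -1, -1):       # last stage first
        c_in[i] = g[i] * b[i] * c_next / f_hat
        c_next = c_in[i]
    return f_hat, c_in

# Hypothetical chain of three inverters (g = b = 1) driving a load of 64 from C_in = 1.
f_hat, c_in = size_path(g=[1.0, 1.0, 1.0], b=[1.0, 1.0, 1.0],
                        c_in_first=1.0, c_load=64.0)
print(f"f_hat = {f_hat:.3f}, input capacitances = {[round(c, 3) for c in c_in]}")
```

A useful sanity check is that the recursion lands back on the specified input capacitance of the first stage.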


Example: An 8-input AND network

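When a large number of inputs must be combined, there are several options for the structure of the circuit.
The figure below shows three configurations for computing the AND function of eight inputs. Which one is best?

[Figure: three configurations for the 8-input AND.
(a) 8-input NAND (g = 10/3, p = 8 pinv) followed by an inverter (g = 1, p = pinv).
(b) 4-input NANDs (g = 2, p = 4 pinv) followed by a 2-input NOR (g = 5/3, p = 2 pinv).
(c) 2-input NANDs (g = 4/3, p = 2 pinv), 2-input NORs (g = 5/3, p = 2 pinv), a 2-input NAND (g = 4/3, p = 2 pinv) and an inverter (g = 1, p = pinv).]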


Example: An 8-input AND network

The path logical effort, G, is the product of the logical efforts of the
logic gates along the path. In the following example, we assume γ = 2
and pinv = 0.6.
G = 10/3 × 1 = 3.33 for configuration a,
G = 6/3 × 5/3 = 3.33 for case b, and
G = 4/3 × 5/3 × 4/3 × 1 = 2.96 for configuration c.
Since there is no branching, B = 1.
We estimate the parasitic delay of n input NANDs and NORs to be
n pinv .


Example: An 8-input AND network

We can write the total delay for the three configurations as:

Configuration a: $D = 2(3.33H)^{1/2} + 5.4$
Configuration b: $D = 2(3.33H)^{1/2} + 3.6$
Configuration c: $D = 4(2.96H)^{1/4} + 4.2$

It is clear from these equations that case b will always be better than a.


Example: An 8-input AND network

Configuration a: $D = 2(3.33H)^{1/2} + 5.4$
Configuration b: $D = 2(3.33H)^{1/2} + 3.6$
Configuration c: $D = 4(2.96H)^{1/4} + 4.2$

The choice between cases b and c depends on the electrical effort, H.
When H = 1, configuration b has a delay of 7.25, while that for configuration c is 9.45. So in this case, b will be best.
For H = 12, the delays for configurations b and c are 16.25 and 13.97 respectively, so configuration c will be best.
The equations show that for high electrical effort, case c yields the least delay because the $H^{1/4}$ factor dominates. (In the current example, configuration c is best for H > 5.68.)
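The comparison between the configurations is easy to reproduce numerically. The sketch below evaluates the three delay expressions using the exact path logical efforts 10/3 and 80/27, and locates the crossover between b and c by bisection; the function names and the bisection bracket [1, 12] are choices made for this illustration.

```python
def delay_a(H):
    return 2 * (10 / 3 * H) ** 0.5 + 5.4

def delay_b(H):
    return 2 * (10 / 3 * H) ** 0.5 + 3.6

def delay_c(H):
    return 4 * (80 / 27 * H) ** 0.25 + 4.2

def crossover_b_c(lo=1.0, hi=12.0, tol=1e-6):
    """Bisection for the H at which b and c have equal delay (b is faster at lo, c at hi)."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if delay_b(mid) < delay_c(mid):
            lo = mid        # b is still faster, crossover lies above mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

for H in (1.0, 12.0):
    print(f"H = {H:4.1f}: a = {delay_a(H):.2f}, b = {delay_b(H):.2f}, c = {delay_c(H):.2f}")
print(f"b and c cross near H = {crossover_b_c():.2f}")
```

Configurations a and b differ only in their parasitic term, which is why b is faster than a for every H.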

Example: An 8-input AND network

Let us take the example of 8-input AND circuit to see how all
geometries can be calculated using logical effort.
We take the input capacitance of a minimum inverter as the unit of
capacitance.
The unit of time is τ , the delay of a reference inverter driving
another identical reference inverter excluding its parasitic delay.
The unit of transistor width will be the width of the n transistor in
the reference inverter.
We shall take γ = 2 and pinv = 0.6 in this example.


Example: 8-input AND geometry computation

We are given that the load to be driven is equivalent to 64 reference inverters ($C_{out} = 64$).
The input source is designed to drive up to 4 reference inverters ($C_{in} = 4$).
H = 64/4 = 16. As discussed before, configuration c, which is a 4 stage design, is best for this value of H.


Example: 8-input AND geometry computation

We can take the logic path from any of the inputs to the final output.
On this path, we shall encounter:
1 a 2 input NAND (g = 4/3, b = 1),
2 a 2 input NOR (g = 5/3, b = 1),
3 a 2 input NAND (g = 4/3, b = 1), and
4 an inverter (g = 1, b = 1)

[Figure: configuration c, with g = 4/3, 5/3, 4/3, 1 and p = 2 pinv, 2 pinv, 2 pinv, pinv for the four stages.]

$$G = \frac{4}{3} \times \frac{5}{3} \times \frac{4}{3} \times 1 = \frac{80}{27} = 2.963, \quad B = 1, \quad H = \frac{64}{4} = 16$$

$$F = GBH = 47.4074, \quad \text{so} \quad \hat{f} = 47.4074^{1/4} = 2.624$$


Relating Cin to Transistor Geometry

Unit of capacitance is the input capacitance of the reference inverter.
Unit of width is the width of the n channel transistor in the
reference inverter.
So 1 + γ units of width correspond to 1 unit of capacitance.
The input capacitance of a reference logic gate, obtained from the
reference inverter by series parallel rules, is g.
Thus, (Cin = g) ⇒ scale factor = 1 over the reference logic gate.
For any given Cin , scale factor = Cin /g.
If we know Cin , we can find the scale factor by dividing it by g, and then
multiply all transistor widths in the reference logic gate by this scale
factor to get individual transistor geometries.
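The scale-factor rule can be written as a one-line calculation. In the sketch below, the function name `gate_widths` and the example input capacitance of 8 are assumptions for illustration; the reference 2-input NAND widths (n = 2, p = 2) are those that follow from γ = 2.

```python
GAMMA = 2.0   # p to n width ratio of the reference inverter

def gate_widths(c_in, g, ref_n_width, ref_p_width):
    """Scale a reference gate so that each input presents capacitance c_in.

    The reference gate has input capacitance g, so the scale factor is c_in / g.
    """
    scale = c_in / g
    return scale, ref_n_width * scale, ref_p_width * scale

# Hypothetical requirement: a 2-input NAND (g = 4/3) whose inputs each present C_in = 8.
scale, n_width, p_width = gate_widths(c_in=8.0, g=4/3, ref_n_width=2.0, ref_p_width=2.0)

# Cross-check: (1 + gamma) units of width correspond to one unit of capacitance.
c_in_check = (n_width + p_width) / (1.0 + GAMMA)
print(f"scale = {scale:.2f}, n width = {n_width:.2f}, p width = {p_width:.2f}, C_in = {c_in_check:.2f}")
```

The cross-check converts the widths connected to one input back into capacitance, which should return the requested value.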


Last stage: inverter

We begin from the load end in this example. The last stage is an inverter, with g = 1, b = 1.

$$\hat{f} = 2.624 = gbh = 1 \times 1 \times h, \quad \text{so} \quad h = 2.624$$

$$h = 2.624 = \frac{C_{out}}{C_{in}} = \frac{64}{C_{in}}, \quad \text{so} \quad C_{in} = \frac{64}{2.624} = 24.39$$

So the final stage inverter is scaled up by 24.39 compared to the reference inverter. Taking the n channel transistor width in the reference inverter as the unit,
n-channel transistor width = 24.39
p-channel transistor width = γ × 24.39 = 48.78.


3rd stage: 2 input NAND


The stage driving the inverter is a 2 input NAND.
So g = 4/3, b = 1, $C_{out}$ = 24.39.

$$\hat{f} = 2.624 = gbh = \frac{4}{3} \times 1 \times h, \quad \text{so} \quad h = \frac{2.624 \times 3}{4} = 1.968$$

$$h = 1.968 = \frac{C_{out}}{C_{in}} = \frac{24.39}{C_{in}}, \quad \text{therefore} \quad C_{in} = \frac{24.39}{1.968} = 12.3936$$

Scale factor for this stage is 12.3936/g = (12.3936 × 3)/4 = 9.2952.
The reference 2 input NAND gate has n channel transistor width = 2, and p channel transistor width = 2.
Therefore the transistor geometries in this stage will be 2 × 9.2952 = 18.59 for both n and p channel transistors.

2nd stage: 2 input NOR


The stage driving the 2 input NAND is a 2 input NOR.
So g = 5/3, b = 1, $C_{out}$ = 12.3936.

$$\hat{f} = 2.624 = gbh = \frac{5}{3} \times 1 \times h, \quad \text{so} \quad h = 1.5744$$

$$h = 1.5744 = \frac{C_{out}}{C_{in}} = \frac{12.3936}{C_{in}}, \quad \text{therefore} \quad C_{in} = \frac{12.3936}{1.5744} = 7.872$$

Scale factor for this stage is 7.872/g = (7.872 × 3)/5 = 4.7232.
The reference 2 input NOR gate has n channel transistor width = 1, and p channel transistor width = 4.
n-channel transistor width = 1 × 4.7232 = 4.72,
p-channel transistor width = 4 × 4.7232 = 18.89.

First stage: 2 input NAND


Finally, we come to the first stage, which is a 2 input NAND.
So g = 4/3, b = 1, $C_{out}$ = 7.872.

$$\hat{f} = 2.624 = gbh = \frac{4}{3} \times 1 \times h, \quad \text{so} \quad h = 1.968$$

$$h = 1.968 = \frac{C_{out}}{C_{in}} = \frac{7.872}{C_{in}}, \quad \text{therefore} \quad C_{in} = \frac{7.872}{1.968} = 4$$

This agrees with our specification that the input capacitance of the first stage should be equivalent to 4 inverters.
The scale factor will be 4/g = 4/(4/3) = 3. In the reference NAND, all transistors have a width = 2. So in the first stage, all transistors will have a width = 3 × 2 = 6.

Design of 8 input AND: Summary of results

The following table gives the geometries of transistors in all stages:

Stage             I          II         III        IV
Logic type        2in NAND   2in NOR    2in NAND   Inverter
g                 4/3        5/3        4/3        1
Cin               4          7.87       12.39      24.39
Scale Factor      3          4.7232     9.2952     24.39
n width           6          4.72       18.59      24.39
p width           6          18.89      18.59      48.78
Parasitic Delay   1.2        1.2        1.2        0.6

The total delay of the 4 stage implementation is:

$$4\hat{f} + \sum p_i = 4 \times 2.624 + 4.2 = 10.496 + 4.2 \approx 14.7$$
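The whole worked example can be cross-checked with a short script. The sketch below is an illustration under the example's own assumptions (γ = 2, pinv = 0.6, no branching); the names `STAGES` and `design` are hypothetical, and the reference widths are the ones quoted in the stage-by-stage calculation.

```python
from math import prod

P_INV = 0.6                 # parasitic delay of the reference inverter
C_IN, C_LOAD = 4.0, 64.0    # specified input capacitance and load

# (gate, g, parasitic delay in units of p_inv, reference n width, reference p width)
STAGES = [
    ("2in NAND", 4/3, 2, 2.0, 2.0),
    ("2in NOR",  5/3, 2, 1.0, 4.0),
    ("2in NAND", 4/3, 2, 2.0, 2.0),
    ("Inverter", 1.0, 1, 1.0, 2.0),
]

def design(stages, c_in, c_load, p_inv):
    """Size every stage (no branching, B = 1) and return (f_hat, rows, delay)."""
    n = len(stages)
    F = prod(s[1] for s in stages) * c_load / c_in
    f_hat = F ** (1.0 / n)
    rows, c_next = [], c_load
    for name, g, _, ref_n, ref_p in reversed(stages):
        c_stage = g * c_next / f_hat          # recursive relation with b = 1
        scale = c_stage / g
        rows.append((name, c_stage, scale, ref_n * scale, ref_p * scale))
        c_next = c_stage
    rows.reverse()
    delay = n * f_hat + p_inv * sum(s[2] for s in stages)
    return f_hat, rows, delay

f_hat, rows, delay = design(STAGES, C_IN, C_LOAD, P_INV)
for name, c_stage, scale, wn, wp in rows:
    print(f"{name:9s} Cin = {c_stage:6.2f}  scale = {scale:7.4f}  n = {wn:6.2f}  p = {wp:6.2f}")
print(f"f_hat = {f_hat:.3f}, total delay = {delay:.2f}")
```

Small differences in the last digit relative to the table come from the table using the rounded value 2.624 for the stage effort.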


Optimizing the path length

The delay optimization procedure discussed earlier assumes that the number of stages N is known.
However, this may not be the optimum path length, and we can sometimes get a faster circuit by buffering intermediate outputs in this logic path. (For example, the tapered buffer had an optimum number of inverters for minimum delay.)
How do we determine the optimum number of stages in the general case?


Optimizing the path length

We assume that there is a logic path containing n1 stages and we are free to add n2 inverters to this path, if that results in a lower overall delay.
We now consider the optimization of this logic path containing
N = n1 + n2 stages.
We shall assume that there is no requirement for n2 to be even.
(This implies that an inverted output is equally acceptable or else,
the logic path and inputs can be suitably altered to produce the
desired output).
The optimization problem is to find the scale factors for each of the
N = n1 + n2 stages, such that the delay is minimum.


Optimizing the path length

The path effort F = GBH for the n1 stages of logic is known.
This is because the logical effort of each of the logic gates is known to us, so that G may be evaluated.
Branching, if any, in the logic chain is also known, so B can be evaluated.
Finally, H depends only on the final load capacitance and the input capacitance, which are the starting specifications for the optimization.


Optimizing the path length

Addition of n2 inverters does not change the value of G, since g = 1 for each of the n2 inverters.
Similarly, the inverters do not introduce any additional branching,
so B remains the same.
Finally, H is defined by the final load and the input capacitance of
the first logic element, which is not changed by the inserted
inverters.
Therefore F = GBH for the N = n1 + n2 stages (including n2
inverters) is the same as the F for n1 logic stages.


Optimizing the path length

Additional inverters in the logic chain permit sharing the effort over
a larger number of stages, which can reduce the total delay.
We first find the optimum value of N.
If N > n1 , we shall have the opportunity of reducing the delay by
adding inverters.


Optimizing the path length

The total delay in the N stages is the sum of delays of n1 logic stages and n2 inverters.
For optimum delay, the stage effort should be equal for all the N stages.
Therefore, the stage effort of the logic stages as well as the inverters is the same and is $= F^{1/N}$.
So the total delay is:

$$\hat{D} = N F^{1/N} + \sum_{i=1}^{n_1} p_i + (N - n_1) p_{inv}$$


Optimizing the path length

$$\hat{D} = N F^{1/N} + \sum_{i=1}^{n_1} p_i + (N - n_1) p_{inv}$$

The first term in the equation above is the effort delay of N stages.
The sum of parasitic delays of n1 logic gates gives us the second term.
Finally, the parasitic delay of n2 = N − n1 inverters gives the third term.
We define the optimum stage effort $\rho \equiv F^{1/N}$. Then

$$\hat{D} = N(\rho + p_{inv}) + \sum_{i=1}^{n_1} p_i - n_1 p_{inv}$$


Optimizing the path length

$$\hat{D} = N(\rho + p_{inv}) + \sum_{i=1}^{n_1} p_i - n_1 p_{inv}$$

Since $\rho \equiv F^{1/N}$, $N = \ln F / \ln \rho$. Therefore,

$$\hat{D} = \frac{\ln F}{\ln \rho}(\rho + p_{inv}) + \sum_{i=1}^{n_1} p_i - n_1 p_{inv}$$

Notice that F, n1, $p_i$ and $p_{inv}$ are given constants.
We can optimize the total delay with respect to ρ by setting the derivative of $\hat{D}$ with respect to ρ to zero.


Optimizing the path length

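$$\hat{D} = \frac{\ln F}{\ln \rho}(\rho + p_{inv}) + \sum_{i=1}^{n_1} p_i - n_1 p_{inv}$$

Differentiating with the product rule, we get

$$\frac{\partial \hat{D}}{\partial \rho} = 0 = -\frac{\ln F}{(\ln \rho)^2} \cdot \frac{1}{\rho} \cdot (\rho + p_{inv}) + \frac{\ln F}{\ln \rho} \cdot 1$$

Therefore,

$$\frac{\ln F}{\ln \rho} = \frac{\ln F}{(\ln \rho)^2} \cdot \frac{1}{\rho} \cdot (\rho + p_{inv})$$

and so,

$$1 = \frac{1}{\rho \ln \rho}(\rho + p_{inv})$$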


Optimizing the path length

$$1 = \frac{1}{\rho \ln \rho}(\rho + p_{inv})$$

This gives

$$\rho + p_{inv} = \rho \ln \rho$$

which can be written as

$$p_{inv} + \rho(1 - \ln \rho) = 0$$

This condition is independent of F, and the value of ρ is uniquely defined by $p_{inv}$.


Solving for Stage Ratio ρ

$$p_{inv} + \rho(1 - \ln \rho) = 0$$

This equation cannot be solved in closed form, and either iterative or graphical solutions have to be used to determine ρ from $p_{inv}$.
In the special case when $p_{inv} = 0$, we have

$$\rho(1 - \ln \rho) = 0, \quad \text{so} \quad \ln \rho = 1, \quad \text{which gives} \quad \rho = e$$

This corresponds to the case of the tapered inverter chain that we had solved before.


Solving for Stage Ratio ρ

$$p_{inv} + \rho(1 - \ln \rho) = 0$$

For non-zero values of $p_{inv}$, this equation can be solved iteratively using the Newton Raphson technique.
We define

$$f(\rho) = \rho(1 - \ln \rho) + p_{inv} = 0$$

Then

$$f'(\rho) = (1 - \ln \rho) + \rho\left(-\frac{1}{\rho}\right) = -\ln \rho$$

Let us illustrate the iterative method by taking $p_{inv} = 1$.
We know that for $p_{inv} = 0$, the value of ρ is e. A guess value to start iterations can be ρ = 3.


Solving for Stage Ratio ρ

$$f(\rho) = \rho(1 - \ln \rho) + p_{inv} = 0 \quad \text{and} \quad f'(\rho) = -\ln \rho$$

We illustrate the iterative solution of this equation taking $p_{inv} = 1$ and the initial guess ρ = 3.
Each successive guess for ρ can be calculated as

$$\rho_{next} = \rho - \frac{f(\rho)}{f'(\rho)} = \rho + \frac{\rho(1 - \ln \rho) + p_{inv}}{\ln \rho}$$

So

$$\rho_{next} = \frac{\rho + p_{inv}}{\ln \rho}$$


Solving for Stage Ratio ρ

ρ ρnext = (ρ + pinv )/ ln ρ
3.0000 3.6410
3.6410 3.5914
3.5914 3.5911
3.5911 3.5911

The value of ρ converges to four decimal digits within four iterations.
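The iteration in the table is short enough to run directly. The sketch below applies the update $\rho_{next} = (\rho + p_{inv}) / \ln \rho$ until successive values agree; the function name `solve_rho`, the tolerance and the iteration cap are assumptions of this example.

```python
from math import log

def solve_rho(p_inv, rho=3.0, tol=1e-9, max_iter=100):
    """Newton Raphson solution of p_inv + rho * (1 - ln(rho)) = 0."""
    for _ in range(max_iter):
        rho_next = (rho + p_inv) / log(rho)   # simplified Newton update
        if abs(rho_next - rho) < tol:
            return rho_next
        rho = rho_next
    raise RuntimeError("rho iteration did not converge")

for p in (0.0, 0.6, 1.0):
    print(f"p_inv = {p:.1f} -> rho = {solve_rho(p):.4f}")
```

Starting from ρ = 3 keeps ln ρ well away from zero, so the division in the update is safe for the parasitic delays considered here.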


Solving for Stage Ratio ρ

$$\rho(1 - \ln \rho) + p_{inv} = 0, \quad \text{so} \quad \rho \ln \rho = \rho + p_{inv}$$

We can also solve the above equation graphically by plotting $\rho + p_{inv}$ and $\rho \ln \rho$ as functions of ρ. The value of ρ at which these two intersect is the solution of the equation.

[Figure: ρ ln ρ and ρ + p plotted against ρ (from 2.5 to 4.5) for p = 0, 0.2, 0.4, 0.6, 0.8, 1.0, 1.5 and 2. The intersections give the solution for different values of pinv.]

For a given CMOS process, $p_{inv}$ is fixed. Therefore, the value of ρ needs to be computed only once for a given process.


Stage delay

The table below gives the values of ρ and the corresponding stage
delay for several values of parasitic delay p.

p ρ ln ρ d =ρ+p
0 2.718 (e) 1.000 2.718
0.2 2.912 1.069 3.11
0.4 3.093 1.129 3.49
0.6 3.266 1.184 3.87
0.8 3.432 1.233 4.23
1.0 3.591 1.278 4.59
1.5 3.967 1.378 5.47
2.0 4.319 1.463 6.32


Finding the optimum number of stages

Once ρ is known, we can evaluate the optimum number of logic stages for a given path effort as N = ln F / ln ρ.
This value will be fractional in general and needs to be rounded to the nearest integer.
If N > n1 (where n1 is the number of logic stages required for implementing the desired logic function), we can insert N − n1 inverter stages to optimize the overall delay.
After this, the stage effort can be re-calculated as $f = F^{1/N}$.
Once f and N are known, we can proceed as before to compute the sizes of each stage.


Finding the optimum number of stages

If we have the freedom of inserting a number of inverters in the logic path to optimize delay, we use the following procedure:
From $p_{inv}$, find the ideal stage effort ρ by solving $p_{inv} + \rho(1 - \ln \rho) = 0$. This can be done through iterative solutions or graphically as described earlier.
Once ρ is known, find the number of stages as ln F / ln ρ. The nearest integer to the value so found is the optimum number of stages N.
If N > n1, insert (N − n1) inverters anywhere in the logic path. If N ≤ n1, take N = n1.
The stage effort is now adjusted to $f = F^{1/N}$.
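The procedure above can be sketched as a small function. The names `solve_rho` and `choose_stage_count` are hypothetical, and the first example (a single logic stage with F = 1000 and pinv = 1) is an assumed input; the second example reuses F = 47.4074 with four logic stages and pinv = 0.6 from the 8-input AND design.

```python
from math import log

def solve_rho(p_inv, rho=3.0, tol=1e-9, max_iter=100):
    """Solve p_inv + rho * (1 - ln(rho)) = 0 by Newton Raphson iteration."""
    for _ in range(max_iter):
        rho_next = (rho + p_inv) / log(rho)
        if abs(rho_next - rho) < tol:
            return rho_next
        rho = rho_next
    raise RuntimeError("rho iteration did not converge")

def choose_stage_count(F, n1, p_inv):
    """Return (N, inverters to insert, adjusted stage effort f)."""
    rho = solve_rho(p_inv)
    n_ideal = log(F) / log(rho)      # fractional in general
    N = max(n1, round(n_ideal))      # never fewer than the logic stages themselves
    return N, N - n1, F ** (1.0 / N)

N, extra, f = choose_stage_count(F=1000.0, n1=1, p_inv=1.0)
print(f"F = 1000: N = {N}, insert {extra} inverters, f = {f:.3f}")

N2, extra2, f2 = choose_stage_count(F=47.4074, n1=4, p_inv=0.6)
print(f"8-input AND: N = {N2}, insert {extra2} inverters, f = {f2:.3f}")
```

For the 8-input AND the ideal count comes out below four, so the rule N = n1 applies and no inverters are added.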


Given the value of f, we can start from the last stage and work backwards as earlier to calculate all transistor geometries.
For the last stage, the output capacitance is known ($= C_L$). The input capacitance can be calculated from

$$C_{in,N} = \frac{g_N}{f} C_L$$

This gives the scale factor for this stage, from which the geometries of transistors in the last stage can be computed.
For each preceding stage, we use the recursive relation

$$C_{in,i} = \frac{g_i b_i}{f} C_{in,i+1}$$

From $C_{in,i}$, we can calculate the scale factor, and hence the geometry of all transistors for this stage.


Sensitivity of Delay to the Number of Stages

We would like to know how much the path delay changes if the number
of stages deviates from the optimum.

N/N̂     D/D̂        N/N̂     D/D̂
0.25    7.42        1.4     1.06
0.5     1.46        2.0     1.24
0.7     1.09        3.0     1.62
1.0     1.00        4.0     2.01

This table assumes pinv = 0.6.

We can see that delay is quite insensitive to the number of stages, provided the deviation from optimum is not too large. Using 40% higher N increases the delay by just 6%.


Sensitivity of Delay to the Number of Stages

As we have seen, using 40% higher N increases the delay by just 6%.
In fact, doubling the number of stages from optimum increases the
delay only 24%.
Using half as many stages as the optimum increases the delay by
46%.
Delay penalty is smaller for higher than necessary N compared to
lower N for the same deviation from the optimum.
A stage or two more or less in a design with many stages will
make little difference, provided proper transistor sizes are used.
Only when very few stages are required does a change of one or
two stages make a large difference.
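The sensitivity figures can be approximated with a simple model. The sketch below is an assumption-laden illustration: it treats the path as a chain in which every stage has parasitic delay pinv, so that D = N(F^{1/N} + pinv), and it uses ρ = 3.266 for pinv = 0.6 from the table of stage ratios. With N = k·N̂, the stage effort becomes ρ^{1/k}, which gives the ratio computed by the hypothetical function `delay_ratio`.

```python
RHO = 3.266    # optimum stage effort for p_inv = 0.6 (from the table of stage ratios)
P_INV = 0.6

def delay_ratio(k, rho=RHO, p_inv=P_INV):
    """D / D_hat when the number of stages is k times the optimum.

    Assumes every stage has parasitic delay p_inv, so D = N * (F**(1/N) + p_inv),
    and F**(1/N) = rho**(1/k) when N = k * N_hat.
    """
    return k * (rho ** (1.0 / k) + p_inv) / (rho + p_inv)

for k in (0.5, 0.7, 1.0, 1.4, 2.0, 3.0, 4.0):
    print(f"N/N_hat = {k:4.2f} -> D/D_hat = {delay_ratio(k):.3f}")
```

Under these assumptions the ratios agree with the tabulated values to within about 0.01, and they show the same asymmetry: too many stages cost less than too few.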



Fork Design

What is a fork?
We often need a signal and its complement simultaneously. When
the signal changes its value, the complement should also change
at the same time.
Otherwise, there will be a short interval during which both the
signal and its complement are TRUE or FALSE simultaneously.
This can lead to malfunction.
The trivial solution of using an inverter to generate the
complement will not meet this requirement – since in this case,
the complement will switch an inverter delay later.
We can meet the requirement of a signal and its complement
changing almost simultaneously by using a “fork” – which is a
circuit with a common input feeding two branches with different
number of inversions.
The delay of the two branches must be equalized as closely as
possible.

How is a fork designed?

Typically, the two branches of a fork will have n and (n+1) inverters.
A fork is named with the number of inverters in its branches. For example, the circuit in the figure is a 3-4 fork.

[Figure: a 3-4 fork. A common input with total load 4 feeds a 3-inverter branch (input capacitance 4r, output load 256) and a 4-inverter branch (input capacitance 4(1-r), output load 512).]

Specifications for the fork will include the total capacitive load placed by both branches at the input node.
The terminal loads at the two outputs need not be the same!
This is because typically, one arm of the fork drives nMOS transistors while the other drives pMOS transistors, and their sizes may not be the same.


Designing a Fork
It is easy to design each individual arm of the fork for minimum delay using logical effort techniques.
However, how do we ensure that the optimum delays of the two arms are equal?
To equalize the delays, we use the branching at the input node to balance the difference of delays in the two arms.
Take the 3-4 fork shown in the figure as an example. The specification demands that the two branches together should place a load of 4 on the upstream driver.

[Figure: the 3-4 fork, with branch input capacitances 4r and 4(1-r), total input load 4, and output loads 256 and 512.]

If we split the input capacitance between the first inverters of the two branches as 4r and 4(1-r), the total input capacitance will be 4 for all choices of r.

Designing a Fork

Notice that for a fork with n and n+1 inverters, the difference of delay is a smaller fraction of the total delay if n is large.
The choice of n is a trade off between the robustness of delay matching and power dissipation as well as complexity.
By choosing an appropriate value of r between 0 and 1, we can adjust the optimized delays of the two branches, such that these are equal.
Design of a fork essentially requires the evaluation of such a suitable value for the parameter r.


Design Example

Let us go through the design of the 3-4 fork shown earlier.
We want to design a 3-4 fork with the total input capacitance (to be driven by the up-stream driver) equal to 4 times the unit inverter input capacitance.
The final load on the branch with 3 inverters is equivalent to 256
minimum inverters, while that on the 4 inverter branch is
equivalent to 512 minimum inverters.
Assume Pinv = 2.0, γ = 2.2.
The input capacitance is divided in the ratio r:(1-r) for the 3 and 4
inverter branches of the fork respectively.


Design Example

We want to evaluate the value of r such that the optimum delay in the two branches is equal.
All g and b values are 1 in this example.
Since the input capacitance of each branch is dependent on r, the value of H for each branch is not known in advance and depends on r.
We shall use the Newton Raphson technique to evaluate r, starting with a guess value of r = 0.5.


The Upper Branch

For the upper branch, the input capacitance is 4r, while the output capacitance is 256.
Thus $H_1 = 256/(4r) = 64/r$.
All g and b values are 1.

$$F_1 = \frac{64}{r}, \quad \text{and correspondingly,} \quad \hat{f}_1 = \left(\frac{64}{r}\right)^{1/3} = 4 r^{-1/3}$$

The delay through the upper arm of the fork is

$$D_1 = 3\hat{f}_1 + 3P_{inv} = 12 r^{-1/3} + 3P_{inv}$$


The lower Branch

For the lower branch, the input capacitance is 4(1 − r), while the output capacitance is 512.
Thus $H_2 = 512/(4(1 - r)) = 128/(1 - r)$.
All g and b values are 1.

$$F_2 = \frac{128}{1 - r}, \quad \text{and correspondingly,} \quad \hat{f}_2 = \left(\frac{128}{1 - r}\right)^{1/4} = 3.3636 (1 - r)^{-1/4}$$

The delay through the lower arm of the fork is

$$D_2 = 4\hat{f}_2 + 4P_{inv} = 13.4543 (1 - r)^{-1/4} + 4P_{inv}$$


Equalizing Delays

Condition for the two delays to be equal is: $D_2 - D_1 = 0$

$$13.4543 (1 - r)^{-1/4} - 12 r^{-1/3} + P_{inv} = 0$$

Defining $f(r) \equiv 13.4543 (1 - r)^{-1/4} - 12 r^{-1/3} + P_{inv}$, we seek the value of r which will make f(r) = 0.
The derivative of f(r) may be written as

$$f'(r) = -\frac{13.4543}{4}(1 - r)^{-5/4}(-1) + \frac{12}{3} r^{-4/3} = 3.3636 (1 - r)^{-5/4} + 4 r^{-4/3}$$

We can now solve this non-linear equation using Newton Raphson iterations.


Iterative Solution

Taking the initial guess for r as 0.5, successive values for r can be
tabulated as:
r f(r) f’(r) next r
0.5 2.88095 18.0794 0.34065
0.34065 -0.251373 22.4743 0.351835
0.351835 -0.00333406 21.8878 0.351987
0.351987 -5.78369e-07 21.8803 0.351987
0.351987 -1.42109e-14 21.8803 0.351987

Thus, r = 0.352 will equalize delays.
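The iteration tabulated above can be reproduced with a few lines of code. The sketch below writes the delay difference $D_2 - D_1$ and its derivative for this particular 3-4 fork (loads 256 and 512, total input capacitance 4, Pinv = 2); the function names, tolerance and iteration cap are assumptions of this example.

```python
P_INV = 2.0

def delay_diff(r):
    """D2 - D1 for the 3-4 fork with loads 256 and 512 and total input capacitance 4."""
    d1 = 3 * (64.0 / r) ** (1 / 3) + 3 * P_INV            # 3-inverter branch
    d2 = 4 * (128.0 / (1.0 - r)) ** 0.25 + 4 * P_INV      # 4-inverter branch
    return d2 - d1

def delay_diff_slope(r):
    """Derivative of delay_diff with respect to r."""
    return 128.0 ** 0.25 * (1.0 - r) ** -1.25 + 4.0 * r ** (-4 / 3)

def solve_r(r=0.5, tol=1e-10, max_iter=50):
    """Newton Raphson iteration for the split r that equalizes the two branch delays."""
    for _ in range(max_iter):
        step = delay_diff(r) / delay_diff_slope(r)
        r -= step
        if abs(step) < tol:
            return r
    raise RuntimeError("Newton Raphson iteration did not converge")

r = solve_r()
print(f"r = {r:.6f}, residual = {delay_diff(r):.2e}")
```

Because the derivative is positive throughout 0 < r < 1, the delay difference is monotonic in r and the root is unique.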


Sizing Inverters in the Upper Branch

For the upper branch,

$$\hat{f}_1 = \frac{4}{r^{1/3}} = \frac{4}{0.351987^{1/3}} = \frac{4}{0.70606} = 5.665232$$

All stages are inverters with g = 1, b = 1.
Since $\hat{f} = gbh = 5.665232$, h = 5.665232 for all stages.
The first inverter should have an input capacitance of 4r = 1.408.
The next inverter should have an input capacitance of 1.408 × h = 1.408 × 5.665232 = 7.976.
Input capacitance for the final inverter will be 7.976 × h = 7.976 × 5.665232 = 45.188.
The final inverter can drive a load of 45.188 × 5.665232 = 256 as required.


Sizing Inverters in the Upper Branch

Alternatively, we could have started with the output. As before:
$\hat{f} = gbh = 5.665232$, g = 1, b = 1, so h = 5.665232 for all stages.
Final $C_{out}$ = 256.
For the last inverter, $C_{in}$ = 256/5.665232 = 45.188. This becomes the output capacitance of the second inverter.
For the second inverter, $C_{in}$ = 45.188/h = 45.188/5.665232 = 7.976. This becomes the output capacitance of the first inverter.
Finally, since the output capacitance of the first inverter is 7.976, its input capacitance is 7.976/5.665232 = 1.408.
This agrees with the value 4r as required.


Sizing Inverters in the Lower Branch

For the lower branch, $\hat{f}_2 = 3.3636/(1 - r)^{1/4} = 3.749$.
Again all stages are inverters with g = 1, b = 1. Therefore, for all stages, h = 3.749.
Input capacitance of the first inverter is 4 × (1 − r) = 2.592.
Input capacitance of the following three inverters should be 2.592 × 3.749 = 9.717, 9.717 × 3.749 = 36.43 and 36.43 × 3.749 = 136.57.
The final inverter can drive a capacitance of 136.57 × 3.749 = 512 as expected.


Transistor geometries for the upper branch

Input capacitance of 1 corresponds to n channel width of 1 and p channel width of γ.
Therefore an inverter stage with input capacitance of $C_{in}$ will have n channel transistor width of $C_{in}$ and p channel transistor width of γ × $C_{in}$.

Gate              Cin      n width   p width
First Inverter    1.408    1.408     3.10
Second Inverter   7.976    7.976     17.548
Third Inverter    45.188   45.188    99.413


Transistor geometries for the lower branch

Given the input capacitance values for all inverters, the n channel widths are equal to the capacitance, while the p channel widths are γ (= 2.2) times this value.

Gate              Cin      nMOS width   pMOS width
First Inverter    2.592    2.59         5.70
Second Inverter   9.717    9.72         21.38
Third Inverter    36.43    36.43        80.15
Fourth Inverter   136.57   136.57       300.46


Delays of the two branches

Delay for the upper branch $= 3\hat{f}_1 + 3P_{inv} = 3 \times 5.665 + 6 = 22.995$.
Delay for the lower branch $= 4\hat{f}_2 + 4P_{inv} = 4 \times 3.749 + 8 = 22.995$.

To see the robustness of the design, let us assume that the actual load capacitors in both the branches are higher by 10%.
Without changing inverter sizes, what are the delays with the changed values and how much is the difference in delays of the two branches?


Robustness of Delay matching


Let the actual load capacitor values in both the branches be higher by 10%. Without changing inverter sizes, let us evaluate the difference in delays of the two branches.
Since inverter sizes remain the same, all inverters except the final one see the same load.
Therefore the change in the final load capacitor will change the delay of the last stage only.
The remaining delays will remain the same.

$$D_1 = 2\hat{f}_1 + 1.1 \times \hat{f}_1 + 3P_{inv} = 3.1 \times 5.665 + 6 = 17.562 + 6 = 23.562$$

$$D_2 = 3\hat{f}_2 + 1.1 \times \hat{f}_2 + 4P_{inv} = 4.1 \times 3.749 + 8 = 15.371 + 8 = 23.371$$

The difference in delays is therefore about 0.19, or 0.192 before rounding of the intermediate values. (The upper branch is slower by this amount.)
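The robustness check can be repeated directly from the stage capacitances. The sketch below sums the electrical effort of each inverter and adds the parasitic delay; the function name `chain_delay` is hypothetical, and the capacitance lists are the rounded values from the sizing tables, so the last digit may differ slightly from the hand calculation.

```python
P_INV = 2.0

def chain_delay(stage_c_in, c_load, p_inv=P_INV):
    """Delay of an inverter chain (g = b = 1): sum of C_out / C_in per stage plus parasitics."""
    caps = list(stage_c_in) + [c_load]
    effort_delay = sum(caps[i + 1] / caps[i] for i in range(len(stage_c_in)))
    return effort_delay + len(stage_c_in) * p_inv

UPPER = [1.408, 7.976, 45.188]            # input capacitances, 3-inverter branch
LOWER = [2.592, 9.717, 36.43, 136.57]     # input capacitances, 4-inverter branch

nominal_gap = chain_delay(UPPER, 256.0) - chain_delay(LOWER, 512.0)
d1 = chain_delay(UPPER, 1.1 * 256.0)      # loads increased by 10%, sizes unchanged
d2 = chain_delay(LOWER, 1.1 * 512.0)
print(f"nominal gap = {nominal_gap:.3f}")
print(f"D1 = {d1:.3f}, D2 = {d2:.3f}, gap with 10% higher loads = {d1 - d2:.3f}")
```

Only the last stage of each branch sees the increased load, which is why the gap stays below one percent of the total delay.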
