Lecture Notes Digital CMOS IC
MOSFET Overview
Rajeevan Amirtharajah
University of California, Davis
Jeff Parkhurst
Intel Corporation
Permission to Use: Conditions & Acknowledgment
• Permission is granted to copy and distribute this slide
set for educational purposes only, provided that the
complete bibliographic citation and the following credit line
are included: "Copyright 2002 J. Rabaey et al."
Permission is granted to alter and distribute this
material provided that the following credit line is
included: "Adapted from (complete bibliographic
citation). Copyright 2002 J. Rabaey et al."
This material may not be copied or distributed for
commercial purposes without express written
permission of the copyright holders.
• Slides 13-17 adapted from CSE477 VLSI Digital Circuits
Lecture Slides by Vijay Narayanan and Mary Jane Irwin,
Penn State University
• Labs
Tuesdays 6 PM – 9 PM 2157/2161 Kemper
Wednesdays 5 PM – 8 PM 2157/2161 Kemper
• Letter grades
• A: 100 - 90%
• B: 90 - 80%
• C: 80 - 70%
• D: 70 - 60%
• F: below 60%
[Figure: performance growth of graphics chips vs. CPUs, 1H96–2H00]
[Figure: power (Watts) of Intel processors vs. year, 1971–2000: 4004, 8008, 8080, 8085, 8086, 286, 386, 486, Pentium®, P6]
[Figure: "…chips might become hot…": power density trend compared with a nuclear reactor and a nozzle; temperature (C) axis]
[Figure: standby power as a percentage of total power, 2000–2010, with power labels 12 W, 88 W, 400 W, and 1.7 kW; battery (40+ lbs)]
Source: Borkar, De, Intel®
Emerging Microsensor Applications
• Industrial plants and power line monitoring (courtesy ABB)
• Operating room of the future (courtesy John Guttag)
[Figure: microsensor node; scale bar: one centimeter]
[Figure: cross-sections of NMOS (N+ source and drain in a P-substrate) and PMOS (P+ source and drain in an N-substrate), each with a bulk (substrate) terminal]
Amirtharajah/Parkhurst, EEC 118 Spring 2011 30
MOS Transistor Symbols
[Figure: three equivalent symbol styles for NMOS and PMOS, each with gate (G), drain (D), source (S), and bulk (B) terminals]
Note on MOS Transistor Symbols
• All symbols appear in literature
– Symbols with arrows are conventional in analog papers
– PMOS with a bubble on the gate is conventional in digital
circuits papers
• Sometimes the bulk terminal is omitted; it is implicitly
connected to the supply (GND for NMOS, VDD for PMOS):
[Figure: three-terminal NMOS and PMOS symbols]
[Figure: MOSFET geometry: channel width W, channel length L, gate oxide thickness tox, and gate overlap xd]
MOS Transistor Regions of Operation
• Three main regions of operation
• Cutoff: VGS < VT
No inversion layer formed, drain and source are
isolated by depleted channel. IDS ≈ 0
• Linear (Triode, Ohmic): VGS > VT, VDS < VGS-VT
Inversion layer connects drain and source.
Current is almost linear with VDS (like a resistor)
• Saturation: VGS > VT, VDS ≥ VGS-VT
Channel is “pinched-off”. Current saturates
(becomes independent of VDS, to first order).
• Inverter
– Logic symbol
– CMOS inverter circuit
– CMOS inverter layout (top view of lithographic
masks)
Inverter Fabrication: NWELL and Oxides
• N-wells created
• Thick field oxide grown surrounding active
regions
• Thin gate oxide grown over active regions
• Polysilicon deposited
– Chemical vapor deposition (places the poly)
– Dry plasma etch (removes unwanted poly)
[Figure: uncorrected vs. corrected (channel-length modulation) I-V curves]
$$\text{Saturation: } I_D = \frac{\mu C_{ox}}{2}\frac{W}{L}(V_{GS}-V_T)^2(1+\lambda V_{DS})$$
• MOS Structure
• MOSFET Scaling
• MOSFET Capacitances
Outline
• Finish Lecture 1 Slides
• Switch Example
• MOSFET Structure
• MOSFET Regimes of Operation
• Scaling
• Parasitic Capacitances
NMOS Transistor I-V Characteristics I
[Figure: NMOS cross-section with Vg > VT0, Vs = Vd = 0, VB = 0: inversion layer under the gate between source and drain, surrounded by the depletion region in the P-substrate]
Threshold Voltage Components
• Four physical components of the threshold voltage
1. Work function difference between gate and channel
(depends on metal or polysilicon gate): ΦGC
2. Gate voltage to invert surface potential: -2ΦF
3. Gate voltage to offset depletion region charge:
QB/Cox
4. Gate voltage to offset fixed charges in the gate oxide
and oxide-channel interface: Qox/Cox
$C_{ox} = \dfrac{\varepsilon_{ox}}{t_{ox}}$: gate oxide capacitance per unit area
Threshold Voltage Summary
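• If VSB = 0 (no substrate bias):
$$V_{T0} = \Phi_{GC} - 2\phi_F - \frac{Q_{B0}}{C_{ox}} - \frac{Q_{ox}}{C_{ox}} \qquad \text{(K\&L 3.20)}$$
• With substrate bias VSB:
$$V_T = V_{T0} + \gamma\left(\sqrt{|-2\phi_F + V_{SB}|} - \sqrt{|2\phi_F|}\right) \qquad (3.19)$$

                                 NMOS       PMOS
Substrate Fermi potential        φF < 0     φF > 0
Depletion charge density         QB < 0     QB > 0
Substrate bias coefficient       γ > 0      γ < 0

• If Vx > 0, then VSB(A) > 0, so VT(A) > VT0
[Figure: series NMOS transistors A and B; the source of A is the intermediate node Vx, and B (source grounded) has threshold VT0]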
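A minimal sketch of the body-effect expression (3.19), assuming illustrative NMOS values VT0 = 0.4 V, γ = 0.4 V^1/2, and φF = −0.3 V (these numbers are assumptions, not from the notes). It shows how a nonzero VSB, as for transistor A in a series stack, raises VT above VT0.

```python
import math


def threshold_voltage(vt0, gamma, phi_f, vsb):
    """Body-effect threshold: VT = VT0 + gamma*(sqrt(|-2*phiF + VSB|) - sqrt(|2*phiF|)).

    For NMOS: phi_f < 0, gamma > 0, VSB >= 0.
    """
    return vt0 + gamma * (math.sqrt(abs(-2.0 * phi_f + vsb)) - math.sqrt(abs(2.0 * phi_f)))


# Illustrative (assumed) NMOS parameters
VT0, GAMMA, PHI_F = 0.4, 0.4, -0.3

for vsb in (0.0, 0.5, 1.0):
    vt = threshold_voltage(VT0, GAMMA, PHI_F, vsb)
    print(f"VSB = {vsb:.1f} V -> VT = {vt:.3f} V")
```

For VSB = 0 the two square roots cancel and VT reduces to VT0, matching the no-substrate-bias case.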
$$I_D = \mu_n C_{ox}\frac{W}{L}\left[(V_{GS}-V_T)V_{DS} - \tfrac{1}{2}V_{DS}^2\right]$$
Device transconductance: $k_n = \mu_n C_{ox}\dfrac{W}{L}$
Process transconductance: $k_n' = \mu_n C_{ox}$
$$I_D = k_n'\frac{W}{L}\left[(V_{GS}-V_T)V_{DS} - \tfrac{1}{2}V_{DS}^2\right]$$
Saturation Region
• When VDS = VGS - VT:
– VGD = VT, so the gate-to-channel voltage at the drain end is no longer large enough to invert the surface
– Channel is “pinched off”
• If VDS is further increased, IDS does not increase (to first order)
– As VDS increases, the pinch-off point moves closer to the source
– Channel between that point and the drain is depleted
– High electric field in the depleted region accelerates electrons towards the drain
[Figure: NMOS in saturation with Vg > VT0, Vs = 0, Vd > VGS - VT0, VB = 0: the inversion layer ends at the pinch-off point before the drain]
Saturation I/V Equation
• As drain voltage increases, channel remains
pinched off
– Channel voltage remains constant
– Current saturates (no increase with increasing VDS)
• To get saturation current, use linear equation with
VDS = VGS - VT
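The three regions of operation can be collected into one square-law function. This is a sketch assuming the long-channel model above, with kn = μnCox W/L and an optional λ applied only in saturation; the numbers in the usage lines are illustrative assumptions.

```python
def nmos_drain_current(vgs, vds, k_n, vt, lam=0.0):
    """Long-channel (square-law) NMOS drain current.

    k_n : device transconductance mu_n * Cox * W / L  [A/V^2]
    vt  : threshold voltage [V]
    lam : channel-length modulation coefficient, applied in saturation only
    """
    if vgs <= vt:                       # cutoff: no inversion layer
        return 0.0
    vov = vgs - vt                      # overdrive VGS - VT
    if vds < vov:                       # linear (triode) region
        return k_n * (vov * vds - 0.5 * vds ** 2)
    return 0.5 * k_n * vov ** 2 * (1.0 + lam * vds)  # saturation: linear eq. at VDS = VGS - VT


# Illustrative values: k_n = 200 uA/V^2, VT = 0.5 V, VGS = 1.5 V
for vds in (0.25, 0.5, 1.0, 2.0):
    print(vds, nmos_drain_current(1.5, vds, 200e-6, 0.5))
```

With λ = 0 the linear and saturation expressions meet at VDS = VGS − VT; with λ > 0 this simple version has a small step there, since the linear-region expression is left uncorrected.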
[Figure: ID vs. VDS for VGS1 < VGS2 < VGS3; the curve VDS = VGS - VT separates the linear and saturation regions]
[Figure: ID vs. VDS with channel-length modulation: effective channel length L' = L − ΔL]
• Overlap capacitances
– Gate electrode overlaps source and drain regions
– CGB = CoxWLeff
[Figure: total gate capacitance Cg,total (no overlap, xd = 0)]
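For a P-N junction:
$$C_j = \frac{A}{2}\sqrt{\frac{2q\varepsilon_{Si}}{V_0 - V}\cdot\frac{N_d N_a}{N_d + N_a}}$$
If V = 0, capacitance per unit area is
$$C_{j0} = \sqrt{\frac{q\varepsilon_{Si}}{2V_0}\cdot\frac{N_d N_a}{N_d + N_a}}$$
General form:
$$C_j = \frac{A C_{j0}}{\left(1 - \dfrac{V}{V_0}\right)^m}$$
$$K_{eq} = \frac{-2\sqrt{V_0}}{V_2 - V_1}\left(\sqrt{V_0 - V_2} - \sqrt{V_0 - V_1}\right) \quad \text{(abrupt junction only)}$$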
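Because Cj(V) is nonlinear, Keq is the average of Cj(V)/Cj0 over the voltage transition from V1 to V2. The sketch below, assuming an abrupt junction (m = 1/2) and an illustrative built-in potential V0 = 0.9 V, compares the closed-form Keq with a direct numerical average. V is the junction voltage (negative for reverse bias), e.g. a drain junction swinging from −5 V to −2.5 V; those voltages are also assumptions for illustration.

```python
import math


def cj_ratio(v, v0, m=0.5):
    """Cj(V) / (A * Cj0) for junction voltage v (v < 0 is reverse bias)."""
    return 1.0 / (1.0 - v / v0) ** m


def keq_abrupt(v1, v2, v0):
    """Closed-form Keq for an abrupt junction (m = 1/2) switching from v1 to v2."""
    return -2.0 * math.sqrt(v0) / (v2 - v1) * (math.sqrt(v0 - v2) - math.sqrt(v0 - v1))


def keq_numeric(v1, v2, v0, m=0.5, n=10000):
    """Average of Cj(V)/Cj0 over [v1, v2] using the midpoint rule."""
    h = (v2 - v1) / n
    total = sum(cj_ratio(v1 + (i + 0.5) * h, v0, m) for i in range(n))
    return total * h / (v2 - v1)


V0 = 0.9  # assumed built-in potential [V]
print(keq_abrupt(-5.0, -2.5, V0), keq_numeric(-5.0, -2.5, V0))
```

Multiplying A·Cj0 by Keq gives a single linear capacitance that stores the same charge over the transition as the nonlinear junction.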
• Inverter Characteristics
• CMOS Inverters
Outline
• Review: Inverter Transfer Characteristics
• Lecture 3: Noise Margins, Rise & Fall Times,
Inverter Delay
• CMOS Inverters: Rabaey 1.3.2, 5 (Kang &
Leblebici, 5.1-5.3 and 6.1-6.2)
[Figure: PMOS and NMOS ID–VDS load lines for Vin = 1, 2, 3, 4 V superimposed to construct the VTC]
[Figure: inverter VTC (Vout vs. Vin) divided into regions: N cutoff/P linear, N sat/P linear, N sat/P sat, N linear/P sat, N linear/P cutoff]
• Increase W of PMOS: kp increases, VTC moves to the right
• Increase W of NMOS: kn increases, VTC moves to the left
[Figure: VTCs (Vout vs. Vin, 0 to VDD) for kp = kn and kp = 5kn]
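• At Vin = VIL, the NMOS is saturated and the PMOS is linear; KCL gives
$$\frac{k_n}{2}(V_{GS,n}-V_{T0,n})^2 = \frac{k_p}{2}\left[2(V_{GS,p}-V_{T0,p})V_{DS,p}-V_{DS,p}^2\right]$$
$$\frac{k_n}{2}(V_{in}-V_{T0,n})^2 = \frac{k_p}{2}\left[2(V_{in}-V_{DD}-V_{T0,p})(V_{out}-V_{DD})-(V_{out}-V_{DD})^2\right]$$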
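• Differentiate and set dVout/dVin to –1
$$k_n(V_{in}-V_{T0,n}) = k_p\left[(V_{in}-V_{DD}-V_{T0,p})\frac{dV_{out}}{dV_{in}} + (V_{out}-V_{DD}) - (V_{out}-V_{DD})\frac{dV_{out}}{dV_{in}}\right]$$
$$V_{IL} = \frac{2V_{out} + V_{T0,p} - V_{DD} + k_R V_{T0,n}}{1 + k_R}, \qquad k_R = \frac{k_n}{k_p}$$
• Solve simultaneously with KCL to find VIL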
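• At Vin = VIH, the NMOS is linear and the PMOS is saturated; KCL gives
$$\frac{k_n}{2}\left[2(V_{in}-V_{T0,n})V_{out}-V_{out}^2\right] = \frac{k_p}{2}(V_{in}-V_{DD}-V_{T0,p})^2$$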
• Differentiate and set dVout/dVin to –1
$$k_n\left[(V_{in}-V_{T0,n})\frac{dV_{out}}{dV_{in}} + V_{out} - V_{out}\frac{dV_{out}}{dV_{in}}\right] = k_p(V_{in}-V_{DD}-V_{T0,p})$$
$$k_n(2V_{out}-V_{IH}+V_{T0,n}) = k_p(V_{IH}-V_{DD}-V_{T0,p})$$
$$V_{IH} = \frac{V_{DD}+V_{T0,p}+k_R(2V_{out}+V_{T0,n})}{1+k_R}, \qquad k_R = \frac{k_n}{k_p}$$
• Solve simultaneously with KCL to find VIH
Amirtharajah, EEC 116 Fall 2011 17
CMOS Inverter: VM Calculation
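• KCL (NMOS & PMOS saturated):
$$\frac{k_n}{2}(V_{in}-V_{T0,n})^2 = \frac{k_p}{2}(V_{in}-V_{DD}-V_{T0,p})^2$$
• Solve for VM = Vin = Vout:
$$V_M = \frac{V_{T0,n} + \sqrt{\dfrac{1}{k_R}}\,(V_{DD}+V_{T0,p})}{1 + \sqrt{\dfrac{1}{k_R}}}, \qquad k_R = \frac{k_n}{k_p}$$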
CMOS Inverter: Achieving Ideal VM
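$$V_M = \frac{V_{T0,n} + \sqrt{\dfrac{1}{k_R}}\,(V_{DD}+V_{T0,p})}{1 + \sqrt{\dfrac{1}{k_R}}}, \qquad k_R = \frac{k_n}{k_p}$$
• Ideally, VM = VDD/2, which requires
$$k_{R,ideal} = \left(\frac{V_{DD}/2 + V_{T0,p}}{V_{DD}/2 - V_{T0,n}}\right)^2$$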
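• For a symmetric inverter (kR = 1, VT0,n = −VT0,p = VT0):
$$V_{IL} = \frac{1}{8}(3V_{DD}+2V_{T0}), \qquad V_{IH} = \frac{1}{8}(5V_{DD}-2V_{T0})$$
$$V_{IL} + V_{IH} = V_{DD}$$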
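The closed-form results can be checked by building the VTC numerically. This sketch assumes the same square-law models used in the KCL equations (no channel-length modulation), solves KCL for Vout at each Vin by bisection, and locates the two points where dVout/dVin = −1. VDD = 5 V, VT0 = 1 V, and kn = kp are illustrative assumptions.

```python
import math


def i_square_law(vgs, vds, k, vt):
    """Square-law drain current for vds >= 0 (PMOS is passed in as magnitudes)."""
    if vgs <= vt:
        return 0.0
    vov = vgs - vt
    if vds < vov:
        return k * (vov * vds - 0.5 * vds ** 2)
    return 0.5 * k * vov ** 2


def vout_for_vin(vin, vdd, kn, kp, vtn, vtp, iters=60):
    """Solve KCL I_D,n = I_D,p for Vout by bisection (vtp < 0 for the PMOS)."""
    lo, hi = 0.0, vdd
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        i_n = i_square_law(vin, mid, kn, vtn)               # NMOS: VGS = Vin, VDS = Vout
        i_p = i_square_law(vdd - vin, vdd - mid, kp, -vtp)  # PMOS: VSG, VSD, |VTp|
        if i_n > i_p:   # NMOS wins, so the true Vout is lower
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


def vil_vih_numeric(vdd, kn, kp, vtn, vtp, step=1e-3):
    """Return the first and last input voltages where the VTC slope crosses -1."""
    n = int(round(vdd / step))
    vin = [i * step for i in range(n + 1)]
    vout = [vout_for_vin(v, vdd, kn, kp, vtn, vtp) for v in vin]
    pts = [(vin[i], (vout[i + 1] - vout[i - 1]) / (2 * step)) for i in range(1, n)]
    crossings = [a[0] for a, b in zip(pts, pts[1:]) if (a[1] + 1.0) * (b[1] + 1.0) <= 0.0]
    return crossings[0], crossings[-1]


def vm_closed_form(vdd, kn, kp, vtn, vtp):
    """VM from the both-saturated KCL; r = sqrt(1/kR) = sqrt(kp/kn)."""
    r = math.sqrt(kp / kn)
    return (vtn + r * (vdd + vtp)) / (1.0 + r)


VDD, VT0 = 5.0, 1.0
vil, vih = vil_vih_numeric(VDD, 1.0, 1.0, VT0, -VT0)
print(vil, (3 * VDD + 2 * VT0) / 8)   # numeric vs. closed form
print(vih, (5 * VDD - 2 * VT0) / 8)
print(vm_closed_form(VDD, 1.0, 1.0, VT0, -VT0))
```

Making the NMOS stronger (larger kn) lowers VM in `vm_closed_form`, consistent with the VTC moving to the left when W of the NMOS is increased.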
Announcements
• Lab 2 this week, report due next week
• Lab 1 reports due this week at lab section
• HW 2 due this Friday at 4 PM in box, Kemper
2131
Acknowledgments
• Slides due to Rajit Manohar from ECE 547
Advanced VLSI Design at Cornell University
[Figure: CMOS inverter with input Vin driving output node f; parasitics Cgd,p, Cdb,p, Cgd,n, Cdb,n, Cgs,n, Csb,n, interconnect Cint, and receiver gate Cg]
Capacitance on node f (output):
• Junction cap: Cdb,p and Cdb,n
• Gate capacitance: Cgd,p and Cgd,n
• Interconnect cap: Cint
• Receiver gate cap: Cg
CMOS Inverter Junction Capacitances
• Junction capacitances Cdb,p and Cdb,n:
– Equation for junction cap:
$$C_j(V) = \frac{A C_{j0}}{\left(1 - \dfrac{V}{\phi_0}\right)^m}, \qquad C_{j0} = \left(\frac{\varepsilon q}{2}\,\frac{N_a N_d}{N_a + N_d}\,\frac{1}{\phi_0}\right)^m$$
– Non-linear, depends on voltage across junction
– Use Keq factor to get equivalent capacitance for a voltage transition:
$$C_{db} = A K_{eq} C_j + P K_{eq,sw} C_{jsw}$$
$$C_{gd,p} = C_{gd,n} = C_{ox} W L_D$$
However, also need to consider Miller effect ...
CMOS Inverter Capacitances: Miller Effect
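[Figure: Miller effect: the gate-drain capacitance Cgd1 between Vin and Vout is replaced by an equivalent 2Cgd1 from Vout to ground, since Vin and Vout swing in opposite directions]
[Figure: NMOS current ID,n discharging Cload at the inverter output]
$$I = C\frac{dV}{dt}$$
$$I_{D,n} = -C_{load}\frac{dV_{out}}{dt}$$
Need to determine ID,n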
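[Figure: falling output waveform with time points t0, t1, t2]
• t0 to t1: NMOS saturated while Vout falls from VOH to VOH − VT0,n
$$\int_{t_0}^{t_1} dt = \frac{-2C_L}{k_n(V_{OH}-V_{T0,n})^2}\int_{V_{OH}}^{V_{OH}-V_{T0,n}} dV_{out}$$
$$t_1 - t_0 = \frac{2C_L V_{T0,n}}{k_n(V_{OH}-V_{T0,n})^2}$$
• t1 to t2: NMOS linear
$$I_{DS} = k_n\left[(V_{OH}-V_{T0,n})V_{out} - \tfrac{1}{2}V_{out}^2\right]$$
$$t_2 - t_1 = -C_L\int_{V_{OH}-V_{T0,n}}^{(V_{OH}+V_{OL})/2} \frac{dV_{out}}{k_n\left[(V_{OH}-V_{T0,n})V_{out} - \tfrac{1}{2}V_{out}^2\right]}$$
$$t_2 - t_1 = \frac{C_L}{k_n(V_{OH}-V_{T0,n})}\ln\left[\frac{2(V_{OH}-V_{T0,n}) - (V_{OH}+V_{OL})/2}{(V_{OH}+V_{OL})/2}\right]$$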
$$t_{PLH} = \frac{C_{load}\left(\tfrac{1}{2}V_{DD} - V_{SS}\right)}{I_{avg}}$$
CMOS Inverter Delay: 2nd Approximation
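• Another approximate method:
– Again assume constant Iavg
– Iavg = current I1 at the start of the transition (V1 = VDD), held until V2 = ½VDD
$$t_{PHL} = \frac{C_{load}V_{DD}}{k_n(V_{DD}-V_{Tn})^2}$$
$$t_{PLH} = \frac{C_{load}V_{DD}}{k_p(V_{DD}-|V_{Tp}|)^2}$$
[Figure: drain current vs. time between t1 and t2]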
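The two-segment tPHL derivation and the average-current approximation can be compared against a direct numerical integration of dt = C_L dV / I_D,n. This sketch assumes an ideal step input (so VGS = VOH throughout), square-law NMOS current, and illustrative values C_L = 100 fF, kn = 100 µA/V², VOH = VDD = 5 V, VOL = 0, VT0,n = 1 V.

```python
import math


def t_phl_two_segment(c_l, k_n, v_oh, v_ol, vt_n):
    """(t1 - t0) + (t2 - t1): NMOS saturated, then linear, until Vout = (VOH + VOL)/2."""
    a = v_oh - vt_n
    v50 = 0.5 * (v_oh + v_ol)
    t_sat = 2.0 * c_l * vt_n / (k_n * a ** 2)
    t_lin = c_l / (k_n * a) * math.log((2.0 * a - v50) / v50)
    return t_sat + t_lin


def t_phl_numeric(c_l, k_n, v_oh, v_ol, vt_n, n=200000):
    """Integrate dt = C_L dV / I_D,n from the 50% point up to VOH (midpoint rule)."""
    a = v_oh - vt_n
    v50 = 0.5 * (v_oh + v_ol)
    h = (v_oh - v50) / n
    total = 0.0
    for i in range(n):
        v = v50 + (i + 0.5) * h
        i_d = 0.5 * k_n * a ** 2 if v >= a else k_n * (a * v - 0.5 * v ** 2)
        total += c_l * h / i_d
    return total


def t_phl_avg_current(c_load, k_n, v_dd, vt_n):
    """2nd approximation: constant I1 = (k_n/2)(VDD - VTn)^2 over a VDD/2 swing."""
    return c_load * v_dd / (k_n * (v_dd - vt_n) ** 2)


# Illustrative (assumed) values
CL, KN, VDD, VT = 100e-15, 100e-6, 5.0, 1.0
print(t_phl_two_segment(CL, KN, VDD, 0.0, VT))
print(t_phl_numeric(CL, KN, VDD, 0.0, VT))
print(t_phl_avg_current(CL, KN, VDD, VT))
```

For these values the constant-current estimate lands within a few percent of the two-segment result, while being much simpler to evaluate.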
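Empirical equations (non-zero input rise/fall time):
$$t_{pHL}(\text{actual}) = \sqrt{t_{pHL}(\text{step})^2 + \left(\frac{t_r}{2}\right)^2}$$
$$t_{pLH}(\text{actual}) = \sqrt{t_{pLH}(\text{step})^2 + \left(\frac{t_f}{2}\right)^2}$$
[Figure: tpHL (ns) vs. trise (ns)]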
How to Improve Delay?
• Minimize load capacitances
– Small interconnect capacitance
– Small Cg of next stage
• Raise supply voltage
– Increases current faster than increased swing ΔV
• Increase transistor gain factor
– Increase transistor drive current for
charging/discharging output capacitance
• Use low threshold voltage devices
– Drawback: more subthreshold leakage power dissipation
Inverter Power Consumption
• Static power consumption (ideal) = 0
– In practice, subthreshold leakage (increased by DIBL, Drain-Induced Barrier Lowering), gate leakage, and junction leakage are still present
• Dynamic power consumption
$$P_{avg} = \frac{1}{T}\int_0^T v(t)\,i(t)\,dt$$
$$P_{avg} = \frac{1}{T}\left[\int_0^{T/2} V_{out}\left(-C_{load}\frac{dV_{out}}{dt}\right)dt + \int_{T/2}^{T}(V_{DD}-V_{out})\left(C_{load}\frac{dV_{out}}{dt}\right)dt\right]$$
$$P_{avg} = \frac{1}{T}\left[\left(-C_{load}\frac{V_{out}^2}{2}\right)\Big|_0^{T/2} + \left(V_{DD}V_{out}C_{load} - \frac{C_{load}V_{out}^2}{2}\right)\Big|_{T/2}^{T}\right]$$
$$P_{avg} = \frac{1}{T}C_{load}V_{DD}^2 = C_{load}V_{DD}^2 f$$
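A quick numerical check that the two half-cycle integrals add up to C_load·VDD² per cycle regardless of edge speed. The sketch assumes exponential (RC-shaped) output edges that fully settle in each half period; the 50 fF load, 1.2 V supply, 1 GHz clock, and time constants are illustrative assumptions.

```python
import math


def energy_per_cycle(c_load, vdd, tau, n=100000):
    """Numerically integrate the two half-cycle terms of P_avg * T for RC-shaped edges.

    First half: Vout falls as VDD*exp(-t/tau); NMOS dissipates Vout * (-C dVout/dt).
    Second half: Vout rises as VDD*(1 - exp(-t/tau)); PMOS dissipates (VDD - Vout) * C dVout/dt.
    Each half lasts 20*tau so the output fully settles.
    """
    t_half = 20.0 * tau
    dt = t_half / n
    e_fall = e_rise = 0.0
    for i in range(n):
        t = (i + 0.5) * dt
        decay = math.exp(-t / tau)
        v_fall, dv_fall = vdd * decay, -vdd / tau * decay
        v_rise, dv_rise = vdd * (1.0 - decay), vdd / tau * decay
        e_fall += v_fall * (-c_load * dv_fall) * dt
        e_rise += (vdd - v_rise) * c_load * dv_rise * dt
    return e_fall + e_rise


C, VDD, F = 50e-15, 1.2, 1e9   # illustrative: 50 fF load, 1.2 V supply, 1 GHz
E = energy_per_cycle(C, VDD, tau=10e-12)
print(E, C * VDD ** 2)          # numeric vs. C*VDD^2
print(E * F, C * VDD ** 2 * F)  # P_avg = E / T = C*VDD^2*f
```

Changing `tau` changes how fast each edge is, but not the energy per cycle, which is why Pavg depends only on C_load, VDD, and f.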
Next Time: Combinational Logic
Announcements
• Quiz 1 today!
• Lab 2 reports due this week
• Lab 3 this week
• HW 3 due this Friday at 4 PM in box, Kemper
2131
Static CMOS
• Complementary pullup network (PUN) and pulldown network (PDN)
• Only one network is on at a time
• PUN: PMOS devices
– Why?
• PDN: NMOS devices
– Why?
[Figure: static CMOS gate with inputs A, B, C driving a PUN to VDD and a PDN to GND; output F]
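• PUN and PDN are dual networks (series and parallel connections are swapped)
• If CMOS gate implements logic function F:
– PUN implements function F
– PDN implements function $G = \overline{F}$
• Example (NOR2, series PMOS pullup):
– PUN: $F = \overline{A+B} = \overline{A}\cdot\overline{B}$
– PDN: $G = \overline{F} = A + B$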
A B C | F
0 0 0 | 1
0 0 1 | 1
0 1 0 | 1
0 1 1 | 1
1 0 0 | 1
1 0 1 | 0
1 1 0 | 0
1 1 1 | 0
$F = \overline{A\cdot(B+C)}$
Example: Complex Gate
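Design CMOS gate for this logic function:
$$F = \overline{A\cdot(B+C)} = \overline{A} + \overline{B}\cdot\overline{C}$$
[Figure: PDN: NMOS A (WN) in series with parallel NMOS B and C (WN each); PUN: PMOS A (WP) in parallel with series PMOS B and C (WP each); output F]
• What is the worst-case pulldown delay?
• Effective inverter for delay calculation:
[Figure: equivalent inverter with NMOS width ½WN and PMOS width ½WP]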
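The duality between PDN and PUN can be checked at the switch level. This is a sketch that assumes ideal switches (NMOS on when its gate is 1, PMOS on when its gate is 0) and represents a network as nested series/parallel tuples; the tuple encoding and function names are illustrative, not from the notes. It derives the PUN from the PDN of the example gate and confirms that exactly one network conducts for every input.

```python
from itertools import product


def conducts(net, is_on):
    """True if a series/parallel switch network conducts.

    net is ("sw", name) | ("series", n1, n2, ...) | ("par", n1, n2, ...);
    is_on(name) says whether the transistor with that gate input is on.
    """
    kind = net[0]
    if kind == "sw":
        return is_on(net[1])
    if kind == "series":
        return all(conducts(n, is_on) for n in net[1:])
    if kind == "par":
        return any(conducts(n, is_on) for n in net[1:])
    raise ValueError(f"unknown network type {kind!r}")


def dual(net):
    """Dual network: swap series and parallel connections (PDN <-> PUN)."""
    if net[0] == "sw":
        return net
    swapped = "par" if net[0] == "series" else "series"
    return (swapped,) + tuple(dual(n) for n in net[1:])


# PDN for F = not(A*(B+C)): NMOS A in series with parallel NMOS B and C
PDN = ("series", ("sw", "A"), ("par", ("sw", "B"), ("sw", "C")))
PUN = dual(PDN)  # PMOS A in parallel with series PMOS B and C


def gate_output(inputs):
    def nmos_on(name):
        return inputs[name] == 1   # NMOS conducts with a high gate

    def pmos_on(name):
        return inputs[name] == 0   # PMOS conducts with a low gate

    pulls_down = conducts(PDN, nmos_on)
    pulls_up = conducts(PUN, pmos_on)
    assert pulls_up != pulls_down, "static CMOS: exactly one network is on"
    return 1 if pulls_up else 0


for a, b, c in product((0, 1), repeat=3):
    print(a, b, c, gate_output({"A": a, "B": b, "C": c}))
```

The printed table should reproduce the truth table for F; the internal assertion would fail if the two networks were not complementary.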
• And-Or-Invert (AOI)
– Sum of products boolean function
– Parallel branches of series connected NMOS
• Or-And-Invert (OAI)
– Product of sums boolean function
– Series connection of sets of parallel NMOS
[Figure: graph with nodes vdd, n1, F, and gnd and edges labeled by inputs A and B]
– Convert to layout using consistent Euler paths
Propagation Delay Analysis - The Switch Model
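[Figure: switch models of static CMOS gates: each transistor replaced by an on-resistance RON (Rp for PMOS, Rn for NMOS) controlled by its input, driving load CL]
tp = 0.69 Ron CL
$$t_p = RC\ln\left(\frac{V_0}{V_1}\right) = RC\ln\left(\frac{V_{DD}}{\tfrac{1}{2}V_{DD}}\right) = RC\ln 2 \approx 0.69\,RC$$
Standard RC-delay equations from literature:
$$t_{phl} = 0.69\,R_n C_L, \qquad t_{plh} = 0.69\,R_p C_L$$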
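The 0.69 factor is ln 2, the time for an RC discharge to reach half its starting voltage. A minimal forward-Euler sketch, assuming an illustrative Ron = 10 kΩ and CL = 10 fF, reproduces it numerically.

```python
import math


def t50_discharge(r, c, steps_per_rc=100000):
    """Forward-Euler integration of dV/dt = -V/(RC) from VDD; returns time to reach VDD/2."""
    dt = r * c / steps_per_rc
    v, t = 1.0, 0.0          # normalized: VDD = 1
    while v > 0.5:
        v -= v / (r * c) * dt
        t += dt
    return t


R, C = 10e3, 10e-15          # illustrative Ron = 10 kOhm, CL = 10 fF
print(t50_discharge(R, C), math.log(2) * R * C, 0.69 * R * C)
```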
[Figure: transistor sizing examples for complex gates with inputs A to D, relative widths 1, 2, and 4, driving CL]
[Figure: two orderings (a) and (b) of a series stack M1 to M3 with inputs In1 to In3, internal capacitances C1 to C3, and load CL; the stack behaves like a distributed RC line]
Fast Complex Gates - Design Techniques (3)
• Improved Logic Design
[Figure: alternative gate structures for inputs In1 to In4 between VDD and GND, each driving CL]
[Figure: NMOS pull-down stack of transistors 1 to 4 (inputs In1 to In4, In1 nearest Vout), with Cgd, Cgs, Cdb, and Csb of each transistor and internal nodes 2 to 4]
Note that the value of Cload for calculating propagation delay depends on which capacitances need to be discharged or charged when the critical signal arrives.
Example: In1 = In3 = In4 = 1, In2 = 0, and In2 switches from low to high. Nodes 3 and 4 are already discharged to ground, so for Vout to go from high to low, the Vout node and node 2 must be discharged:
CL = Cgd5 + Cgd7 + Cgd8 + 2Cgd6 (Miller) + Cdb5 + Cdb6 + Cdb7 + Cdb8 + Cgd1 + Cdb1 + Cgs1 + Csb1 + 2Cgd2 + Cdb2 + Cw
Announcements
• Lab 3 this week at lab section
• HW 3 due this Friday at 4 PM in box, Kemper
2131
• Quizzes will be handed back in lab section
Amirtharajah/Parkhurst, EEC 118 Spring 2011 CMOS VLSI Design 4th Ed. 8
Introduction
Chip designers face a bewildering array of choices
– What is the best circuit topology for a function?
– How many stages of logic give least delay?
– How wide should the transistors be?
Example
Ben Bitdiddle is the memory designer for the Motoroil 68W86,
an embedded automotive processor. Help Ben design the
decoder for a register file.
[Figure: 4:16 decoder driving the 16 word lines of a 16-word × 32-bit register file from address inputs A[3:0] and their complements]
Decoder specifications:
– 16 word register file
– Each word is 32 bits wide
– Each bit presents load of 3 unit-sized transistors
– True and complementary address inputs A[3:0]
– Each input may drive 10 unit-sized transistors
Ben needs to decide:
– How many stages to use?
– How large should each gate be?
– How fast can decoder operate?
Delay in a Logic Gate
Express delays in process-independent unit: $d = d_{abs}/\tau$
– $\tau = 3RC$: ≈ 3 ps in 65 nm process, 60 ps in 0.6 μm process
Delay has two components: $d = f + p$
f: effort delay = gh (a.k.a. stage effort)
– Again has two components
g: logical effort
– Measures relative ability of gate to deliver current
– g ≡ 1 for inverter
h: electrical effort = Cout / Cin
– Ratio of output to input capacitance
– Sometimes called fanout
p: parasitic (intrinsic) delay
– Represents delay of gate driving no load
– Set by internal parasitic capacitance
Delay Plots
d = f + p = gh + p
Inverter: g = 1, p = 1, so d = h + 1
2-input NAND: g = 4/3, p = 2, so d = (4/3)h + 2
What about NOR2?
[Figure: normalized delay d vs. electrical effort h = Cout / Cin (0 to 5) for the inverter and the 2-input NAND, each split into effort delay f and parasitic delay p]
Computing Logical Effort
DEF: Logical effort is the ratio of the input
capacitance of a gate to the input capacitance of an
inverter delivering the same output current.
Measure from delay vs. fanout plots
Or estimate by counting transistor widths
[Figure: unit-drive gates with transistor widths: inverter (PMOS 2, NMOS 1), NAND2 (PMOS 2, 2; NMOS 2, 2), NOR2 (PMOS 4, 4; NMOS 1, 1)]
Catalog of Gates
Logical effort of common gates
Catalog of Gates
Parasitic delay of common gates
– In multiples of pinv (≈1)

Gate type          Number of inputs
                   1    2    3    4    n
Inverter           1
NAND                    2    3    4    n
NOR                     2    3    4    n
Tristate / mux     2    4    6    8    2n
XOR, XNOR               4    6    8
Example: Ring Oscillator
Estimate the frequency of an N-stage ring oscillator
Example: FO4 Inverter
Estimate the delay of a fanout-of-4 (FO4) inverter
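A small sketch of the single-stage model d = gh + p applied to the two estimates above. It assumes a unit inverter with PMOS width 2 and NMOS width 1 (input capacitance 3), estimates logical effort by counting input transistor widths, and takes τ ≈ 3 ps for a 65 nm process from the delay slide. In a ring oscillator each inverter drives one identical inverter (h = 1), and an FO4 inverter drives four (h = 4).

```python
def logical_effort(input_cap, inv_input_cap=3.0):
    """g = Cin of the gate / Cin of an inverter delivering the same output current.
    A unit inverter has PMOS width 2 and NMOS width 1, so its input capacitance is 3."""
    return input_cap / inv_input_cap


def stage_delay(g, h, p):
    """Normalized delay d = f + p = g*h + p (units of tau)."""
    return g * h + p


G_INV, P_INV = logical_effort(2 + 1), 1.0     # inverter
G_NAND2, P_NAND2 = logical_effort(2 + 2), 2.0  # NAND2: PMOS 2, NMOS 2 per input
G_NOR2, P_NOR2 = logical_effort(4 + 1), 2.0    # NOR2: PMOS 4, NMOS 1 per input

TAU = 3e-12  # ~3 ps in a 65 nm process (value from the notes)


def ring_oscillator_frequency(n_stages, tau=TAU):
    """Each inverter drives one identical inverter (h = 1), so d = 2 tau per stage.
    One oscillation period is two trips around the ring: T = 2 * N * d."""
    d = stage_delay(G_INV, 1.0, P_INV) * tau
    return 1.0 / (2 * n_stages * d)


def fo4_delay(tau=TAU):
    """Inverter driving four copies of itself: h = 4, so d = 4 + 1 = 5 tau."""
    return stage_delay(G_INV, 4.0, P_INV) * tau


print(ring_oscillator_frequency(31) / 1e9, "GHz")
print(fo4_delay() / 1e-12, "ps")
print(stage_delay(G_NAND2, 4.0, P_NAND2), stage_delay(G_NOR2, 4.0, P_NOR2))
```

The 31-stage ring is an arbitrary illustration; the last line compares NAND2 and NOR2 at a fanout of 4 and shows the NOR2 is slower.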
Multistage Logic Networks
Logical effort generalizes to multistage networks
Path logical effort: $G = \prod g_i$
Path electrical effort: $H = \dfrac{C_{out\text{-}path}}{C_{in\text{-}path}}$
Path effort: $F = \prod f_i = \prod g_i h_i$
[Figure: four-stage path with input capacitance 10, intermediate loads x, y, z, and output load 20]
g1 = 1, g2 = 5/3, g3 = 4/3, g4 = 1
h1 = x/10, h2 = y/x, h3 = z/y, h4 = 20/z
Can we write F = GH?
Paths that Branch
No! Consider paths that branch:
[Figure: a gate with input capacitance 5 drives two gates of input capacitance 15, each driving a load of 90]
G = 1
H = 90 / 5 = 18
GH = 18
h1 = (15 + 15) / 5 = 6
h2 = 90 / 15 = 6
F = g1g2h1h2 = 36 = 2GH
Branching Effort
Introduce branching effort
– Accounts for branching between stages in path
$$b = \frac{C_{on\text{-}path} + C_{off\text{-}path}}{C_{on\text{-}path}}, \qquad B = \prod b_i$$
Note: $\prod h_i = BH$
Now we compute the path effort
– $F = GBH$
Multistage Delays
Path effort delay: $D_F = \sum f_i$
Path delay: $D = \sum d_i = D_F + P$
Designing Fast Circuits
$D = \sum d_i = D_F + P$
Delay is smallest when each stage bears same effort:
$$\hat{f} = g_i h_i = F^{1/N}$$
Gate Sizes
How wide should the gates be for least delay?
$$\hat{f} = gh = g\frac{C_{out}}{C_{in}} \;\Rightarrow\; C_{in_i} = \frac{g_i C_{out_i}}{\hat{f}}$$
Working backward, apply capacitance
transformation to find input capacitance of each gate
given load it drives.
Check work by verifying input cap spec is met.
Example: 3-stage path
Select gate sizes x and y for least delay from A to B
[Figure: path from A to B: a gate with input capacitance 8 drives three gates of size x; each x gate drives two gates of size y; each y gate drives a load of 45]
Example: 3-stage path
Work backward for sizes
y = 45 * (5/3) / 5 = 15
x = (15*2) * (5/3) / 5 = 10
[Figure: resulting transistor widths: input NAND2 P: 4, N: 4; NAND3 (x = 10) P: 4, N: 6; NOR2 (y = 15) P: 12, N: 3; load 45 at B]
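The sizing procedure can be checked with a short script. It assumes the gate types implied by the transistor widths above (a NAND2 with input capacitance 8 driving three NAND3 gates of size x, each driving two NOR2 gates of size y, each loaded by 45), so g = 4/3, 5/3, 5/3, per-stage branching 3, 2, 1, and p = 2, 3, 2 from the parasitic-delay catalog. The function name and argument layout are illustrative.

```python
import math


def size_path(gs, bs, ps, c_in, c_out):
    """Logical-effort sizing of an N-stage path.

    gs: logical effort of each stage, bs: branching effort of each stage
    (stage i drives bs[i] copies of the next load), ps: parasitic delays,
    c_in: input capacitance of the first stage, c_out: on-path output load.
    Returns (F, f_hat, D, list of input capacitances per stage).
    """
    n = len(gs)
    F = math.prod(gs) * math.prod(bs) * (c_out / c_in)  # F = G * B * H
    f_hat = F ** (1.0 / n)                              # equal effort per stage
    D = n * f_hat + sum(ps)                             # D = N * F^(1/N) + P
    caps = [0.0] * n
    on_path_load = c_out
    for i in reversed(range(n)):                        # work backward from the load
        caps[i] = gs[i] * bs[i] * on_path_load / f_hat  # Cin_i = g_i * Cout_i / f_hat
        on_path_load = caps[i]
    return F, f_hat, D, caps


F, f_hat, D, caps = size_path(gs=[4 / 3, 5 / 3, 5 / 3], bs=[3, 2, 1], ps=[2, 3, 2], c_in=8, c_out=45)
print(F, f_hat, D, caps)   # caps[0] should reproduce the 8-unit input spec
```

The same function reproduces the branching example: two inverters with b = 2 at the first stage, input capacitance 5, and load 90 give F = 36 = 2GH.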
Best Number of Stages
How many stages should a path use?
– Minimizing number of stages is not always fastest
Example: drive 64-bit datapath with unit inverter
$$D = NF^{1/N} + P = N(64)^{1/N} + N$$
[Figure: initial driver of size 1 and datapath load of 64 connected by N = 1 to 4 inverters; intermediate sizes 8 (N = 2); 4, 16 (N = 3); 2.8, 8, 23 (N = 4)]
N:   1    2    3    4
f:   64   8    4    2.8
D:   65   18   15   15.3
Fastest: N = 3
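The table can be regenerated, and extended past N = 4, with a few lines. This sketch assumes a chain of identical inverters with pinv = 1, so P = N.

```python
def chain_delay(n, path_effort, p_inv=1.0):
    """D = N * F**(1/N) + N * p_inv for a path of N inverters."""
    return n * path_effort ** (1.0 / n) + n * p_inv


for n in range(1, 7):
    print(n, round(64 ** (1.0 / n), 2), round(chain_delay(n, 64), 1))

best_n = min(range(1, 7), key=lambda n: chain_delay(n, 64))
print("fastest N =", best_n)
```

Beyond N = 3 the delay rises again because each added inverter contributes its own parasitic delay.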
Derivation
Consider adding inverters to end of path
– How many give least delay?
[Figure: logic block of n1 stages with path effort F, followed by N − n1 extra inverters]
$$D = NF^{1/N} + \sum_{i=1}^{n_1} p_i + (N - n_1)\,p_{inv}$$
$$\frac{\partial D}{\partial N} = -F^{1/N}\ln F^{1/N} + F^{1/N} + p_{inv} = 0$$
Define best stage effort $\rho = F^{1/N}$:
$$p_{inv} + \rho(1 - \ln\rho) = 0$$
Best Stage Effort
pinv + ρ (1 − ln ρ ) = 0 has no closed-form solution
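Although there is no closed form, the equation is easy to solve numerically. A minimal bisection sketch, assuming the root lies between 1 and 20 (true for moderate pinv):

```python
import math


def best_stage_effort(p_inv, lo=1.0, hi=20.0, iters=100):
    """Solve p_inv + rho * (1 - ln rho) = 0 for rho by bisection.

    For rho > 1 the left side decreases monotonically (its derivative is -ln rho),
    so the root is bracketed by [lo, hi] for moderate p_inv.
    """
    def residual(rho):
        return p_inv + rho * (1.0 - math.log(rho))

    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if residual(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


for p in (0.0, 0.5, 1.0, 2.0):
    print(p, round(best_stage_effort(p), 2))
```

With pinv = 0 the root is e ≈ 2.72; with pinv = 1 it is about 3.59, consistent with the rule of thumb that paths are fastest when effort delays are about 4.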
Sensitivity Analysis
How sensitive is delay to using exactly the best number of stages?
[Figure: D(N)/D(N̂) vs. N/N̂ from 0.5 to 2.0; labeled values 1.15, 1.26, and 1.51, with stage efforts ρ = 2.4 and ρ = 6 marked]
Example, Revisited
Number of Stages
Decoder effort is mainly electrical and branching
Electrical Effort: H = (32*3) / 10 = 9.6
Branching Effort: B = 8 (each address bit, true or complementary, drives 8 of the 16 word-line gates)
Gate Sizes & Delay
Logical Effort: G = 1 * 6/3 * 1 = 2
Path Effort: F = GBH = 154
Stage Effort: $\hat{f} = F^{1/3} = 5.36$
Path Delay: $D = 3\hat{f} + 1 + 4 + 1 = 22.1$
Gate sizes: z = 96*1/5.36 = 18, y = 18*2/5.36 = 6.7
[Figure: INV-NAND4-INV decoder: address inputs A[3:0] and their complements, each with input capacitance 10, feed NAND4 gates of size y, which drive word-line inverters of size z (word[0] to word[15])]
Comparison
Compare many alternatives with a spreadsheet
$D = N(76.8\,G)^{1/N} + P$

Design                          N   G      P   D
NOR4                            1   3      4   234
NAND4-INV                       2   2      5   29.8
NAND2-NOR2                      2   20/9   4   30.1
INV-NAND4-INV                   3   2      6   22.1
NAND4-INV-INV-INV               4   2      7   21.1
NAND2-NOR2-INV-INV              4   20/9   6   20.5
NAND2-INV-NAND2-INV             4   16/9   6   19.7
INV-NAND2-INV-NAND2-INV         5   16/9   7   20.4
NAND2-INV-NAND2-INV-INV-INV     6   16/9   8   21.6
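The "spreadsheet" comparison is a one-line formula per design. This sketch takes BH = 8 × 9.6 = 76.8 from the decoder slides and the N, G, P values from the table, and recomputes D for each alternative.

```python
BH = 8 * 9.6  # branching effort B times electrical effort H for the decoder = 76.8

DESIGNS = [  # (name, N stages, path logical effort G, path parasitic delay P)
    ("NOR4", 1, 3, 4),
    ("NAND4-INV", 2, 2, 5),
    ("NAND2-NOR2", 2, 20 / 9, 4),
    ("INV-NAND4-INV", 3, 2, 6),
    ("NAND4-INV-INV-INV", 4, 2, 7),
    ("NAND2-NOR2-INV-INV", 4, 20 / 9, 6),
    ("NAND2-INV-NAND2-INV", 4, 16 / 9, 6),
    ("INV-NAND2-INV-NAND2-INV", 5, 16 / 9, 7),
    ("NAND2-INV-NAND2-INV-INV-INV", 6, 16 / 9, 8),
]


def path_delay(n, g, p, bh=BH):
    """D = N * (B*H*G)**(1/N) + P."""
    return n * (bh * g) ** (1.0 / n) + p


for name, n, g, p in DESIGNS:
    print(f"{name:30s} N={n}  D={path_delay(n, g, p):6.1f}")

fastest = min(DESIGNS, key=lambda d: path_delay(*d[1:]))
print("fastest:", fastest[0])
```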
Review of Definitions
Term                Stage                                   Path
number of stages    1                                       N
logical effort      g                                       G = ∏ gi
electrical effort   h = Cout / Cin                          H = Cout-path / Cin-path
branching effort    b = (Con-path + Coff-path) / Con-path   B = ∏ bi
effort              f = gh                                  F = GBH
effort delay        f                                       DF = ∑ fi
parasitic delay     p                                       P = ∑ pi
delay               d = f + p                               D = ∑ di = DF + P
Method of Logical Effort
1) Compute path effort F = GBH
2) Estimate best number of stages N = log4 F
3) Sketch path with N stages
4) Estimate least delay $D = NF^{1/N} + P$
5) Determine best stage effort $\hat{f} = F^{1/N}$
6) Find gate sizes $C_{in_i} = \dfrac{g_i C_{out_i}}{\hat{f}}$
Limits of Logical Effort
Chicken and egg problem
– Need path to compute G
– But don’t know number of stages without G
Simplistic delay model
– Neglects input rise time effects
Interconnect
– Iteration required in designs with wire
Maximum speed only
– Not minimum area/power for constrained delay
Summary
Logical effort is useful for thinking of delay in circuits
– Numeric logical effort characterizes gates
– NANDs are faster than NORs in CMOS
– Paths are fastest when effort delays are ~4
– Path delay is weakly sensitive to stages, sizes
– But using fewer stages doesn’t mean faster paths
– Delay of path is about log4F FO4 inverter delays
– Inverters and NAND2 best for driving large caps
Provides language for discussing fast circuits
– But requires practice to master
Next Topic: Sequential Logic
– Pipelining
Announcements
• Quiz 2 on Monday, April 26
• Midterm on Monday, May 3
– Covers material through Lecture (Monday 4/26)
• HW4 due Friday, 4PM in box, Kemper 2131
• Lab 3, Part 2 report due next week
Outline
• Review: Static CMOS Logic
• Finish Static CMOS transient analysis
• Sequential MOS Logic Circuits: Rabaey, 7.1-7.3
(Kang & Leblebici, 8.1-8.5)
[Figure: combinational LOGIC block (delay tp,comb) from In to Out between registers clocked by Φ]
• Two information storage mechanisms
– Positive feedback-based (static) circuits
– Charge storage-based (dynamic) circuits
• Clock signal Φ controls timing of state (memory)
updates
Amirtharajah/Parkhurst, EEC 118 Spring 2010 5
Positive Feedback: Bistability
[Figure: two cross-coupled inverters (Vi2 = Vo1, Vi1 = Vo2) and their overlaid transfer curves, intersecting at stable points A and B and metastable point C]
Metastability
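[Figure: a small deviation δ from metastable point C is amplified around the loop toward stable point A or B]
[Figure: SR latch with inputs S, R and outputs Q, Q̄]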
• Allows control of the state of the bistable element
• One input state is not allowed
• Gating S and R with the clock prevents the latch
from responding except during one phase of the
clock cycle
S  R  Q  Q̄
0  0  1  1   (not allowed)
0  1  1  0
1  0  0  1
1  1  Q  Q̄   (memory)
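The table matches a latch built from two cross-coupled NAND gates with active-low S and R inputs; that gate-level structure is an assumption here, since the schematic did not survive extraction. The sketch iterates the two gate equations from a stored state until the outputs settle, reproducing the set, reset, memory, and not-allowed rows.

```python
def nand(a, b):
    return 1 - (a & b)


def sr_latch(s, r, q=0, qb=1, max_iters=10):
    """Cross-coupled NAND SR latch (assumed structure): Q = NAND(S, Qbar), Qbar = NAND(R, Q).

    Starts from the stored state (q, qb) and re-evaluates until stable.
    """
    for _ in range(max_iters):
        q_new = nand(s, qb)
        qb_new = nand(r, q_new)
        if (q_new, qb_new) == (q, qb):
            break
        q, qb = q_new, qb_new
    return q, qb


for s, r in ((0, 0), (0, 1), (1, 0), (1, 1)):
    print(s, r, sr_latch(s, r, q=1, qb=0), sr_latch(s, r, q=0, qb=1))
```

For S = R = 1 the result depends on the starting state (memory); for S = R = 0 both outputs are forced to 1, which is why that input state is not allowed.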
Other Latches
• Clocked SR latch
– Adds clock input. Latch output can only be
set/reset when clk=1 (or clk=0)
• Other latch types:
– JK latch: Removes “not allowed” state – e.g.,
toggles when inputs are both 1
– T latch: Toggles when T input = 1
– D latch: Output = D input
[Figure: tristate buffer from A to F with enable en]
• When en = 0, F is “floating”, i.e. high impedance
Positive Dynamic Transmission Gate Latch
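[Figure: transmission gate clocked by Clk and its complement passes D onto node capacitance C0 at the input of inverter I0, which drives Q]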
• No feedback devices
• Data stored on input capacitance of inverter I0
• Dynamic logic issues apply: leakage, capacitive
coupling, charge sharing
Transmission Gate Positive Static Latch
[Figure: positive static latch built from transmission gates clocked by Clk and its complement, with an inverter feedback loop; input D, output Q]
NMOS Pass Gate Positive Static Latch
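[Figure: static latch using NMOS pass transistors clocked by Clk and its complement; internal high level limited to VDD − VTn; input D, outputs Q and Q̄]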
• Fewer devices, less area, lower clock load
• Threshold drop on internal nodes implies more static
power, less noise margin
Master-Slave Flip-Flop
• By cascading two level-sensitive latches, one
type of edge triggered flip-flop is created
• JK latch can be used for first stage so that no
input combinations are invalid
• SR latch is then used for the second stage
because inputs cannot be invalid
• Rather than using logic gate-based latches, can
cascade latches such as above (e.g.,
transmission gate dynamic or static latches)
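[Figure: dynamic master-slave flip-flop from two transmission-gate latches clocked by Clk and its complement, storing data on C0 and C1]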
• No feedback devices
• Data stored on input capacitances of inverters I0 and I1
• Dynamic logic issues apply: leakage, capacitive
coupling, charge sharing
Clocked Circuit Timing
• Timing definitions:
– Clock-to-Q or Propagation Delay (tclkQ): delay of
flip-flop from clock edge to output Q
– Setup Time (tsetup): amount of time before clock
edge that data has to be stable. If data arrives
after this time, it will not be latched correctly.
– Hold Time (thold): amount of time after clock edge
that data has to be stable.
• It is possible to trade off setup and hold time with
flip-flop circuit design
– Modify data and clock timing relationship by
delaying one of the two signals
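[Figure: timing diagram: input data must be stable from tsetup before until thold after the clock edge; output data becomes stable tpFF after the clock edge]
[Figure: combinational LOGIC block (delay tp,comb) between flip-flops clocked by Φ]
$$t_{pFF} + t_{p,comb} + t_{setup} < T = \frac{1}{f}$$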
• Signals must propagate out of flip-flop, through
combinational logic, and be stable before next
clock edge (clock period = T, clock frequency = f)
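The setup constraint translates directly into a minimum clock period. A minimal sketch, assuming illustrative (not measured) values of 50 ps clock-to-Q, 700 ps of combinational logic, and 50 ps setup time; it checks only this single-stage setup inequality, not hold time.

```python
def min_clock_period(t_pff, t_pcomb, t_setup):
    """Boundary period for t_pFF + t_p,comb + t_setup < T."""
    return t_pff + t_pcomb + t_setup


def meets_setup(t_pff, t_pcomb, t_setup, f_clk):
    """True if the single-stage setup constraint holds at clock frequency f_clk."""
    return t_pff + t_pcomb + t_setup < 1.0 / f_clk


# Illustrative (assumed) numbers: 50 ps clock-to-Q, 700 ps logic, 50 ps setup
T_MIN = min_clock_period(50e-12, 700e-12, 50e-12)
print(T_MIN, 1.0 / T_MIN)
print(meets_setup(50e-12, 700e-12, 50e-12, 1.0e9))
print(meets_setup(50e-12, 700e-12, 50e-12, 1.5e9))
```

Because the inequality is strict, the clock must run slightly slower than 1 / T_MIN in practice.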
Staticized Dynamic Positive Edge-Triggered FF
[Figure: staticized dynamic positive edge-triggered flip-flop: master (I0, I1, C0) and slave (I2, I3, C1) stages with transmission gates clocked by Clk and its complement; input D, output Q]
Race Through and Feedback Paths
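[Figure: transmission-gate latch clocked by Clk and its complement, with internal node A, feedback node B, input D, output Q]
1. If Clk and its complement are both high simultaneously (clock overlap), there is a race condition from D to Q
2. Node A can be driven simultaneously by D and B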
Nonoverlapping Clocks Methodology
[Figure: the same latch clocked by nonoverlapping clocks PHI0 and PHI1; node A, feedback node B, input D, output Q]
• Guarantee that the nonoverlap period is long enough
• Note: internal nodes are left high-impedance (high Z) during the nonoverlap period
C2MOS Edge Triggered Flip-Flop
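[Figure: C2MOS edge-triggered flip-flop: two clocked CMOS inverter stages (clocked by Clk and its complement) storing data on C0 and C1; input D, output Q]
[Figure: pipelining example: inputs a and b pass through registers (REG) clocked by φ, logic (log), and an output register producing Out]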
Announcements
• Complete Lab 4 this week
F = A ⊕ S
• If S = 0, F = A; if S = 1, F = ~A
Review: Transmission Gate Multiplexer
$$F = A\overline{S} + BS$$
Dynamic CMOS
• Operation
– Clk low during pre-charge
• Mp is on while Mn is off
• Output charged to Vdd
– Clk high during evaluate
• Mn is on while Mp is off
• Output pulled down according to PDN function
• PDN design same as static CMOS
[Figure: two cascaded dynamic gates clocked by clk (precharge device Mp, evaluate device Mn); Out 1 drives the NMOS network of the second gate, which produces Out 2]
• During the pre-charge stage, all inputs to the second gate are high: Out 2 could discharge before Out 1 discharges
Cascading Multiple Stages
• During the pre-charge stage, all inputs to the second gate are high
– At the beginning of the evaluate stage, Out 2 starts to discharge
– Out 1 goes through its evaluation stage concurrently and goes low
• Hence Out 2 was supposed to stay high, but it has already (partly) discharged, and the lost charge is not restored until the next pre-charge
• Dynamic logic driven by the same clock cannot be cascaded directly
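The failure can be put in numbers with a deliberately idealized sketch. It assumes (hypothetically) that both gates are dynamic inverters with a single-NMOS pull-down, that Out 1 falls abruptly t_d1 after the evaluate phase starts, and that Out 2 decays as exp(-t/tau2) while its input is high; nothing recharges Out 2 until the next pre-charge.

```python
import math


def out2_after_evaluate(in1: int, t_d1: float, tau2: float, vdd: float = 1.0) -> float:
    """Voltage left on Out 2 at the end of evaluation (idealized RC model).

    in1:  input of gate 1; if 1, Out 1 falls from VDD to 0 after t_d1.
    tau2: hypothetical RC time constant of gate 2's pull-down path.
    """
    if in1:
        # Correct result: Out 1 ends low, so Out 2 should stay at VDD.
        # But gate 2's pull-down conducts for t_d1 while Out 1 is still high.
        return vdd * math.exp(-t_d1 / tau2)
    # Out 1 stays high, so Out 2 legitimately discharges
    # (evaluate phase assumed long compared with tau2).
    return 0.0


v = out2_after_evaluate(1, t_d1=30e-12, tau2=60e-12)
print(f"Out 2 = {v:.3f} x VDD (should be 1.000)")
```

With these made-up delays, Out 2 ends near 0.61 VDD instead of VDD, which the next stage may read as the wrong logic value.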
Domino Logic
• Add an inverter between dynamic gates
– Inverter drives the gate’s fanout – increased
performance
• Sometimes the inverter is replaced with a more
complex static CMOS gate
– Incorporates more logic per stage to improve
speed
• Static CMOS gate improves overall circuit
dynamic noise margins
– Effective capacitance Cload is doubled when the gate evaluates, because the gate must later precharge
– Frequency must be multiplied by the probability that an evaluation will occur
• Power is usually higher for domino logic, except when it replaces prior logic with very high activity factors
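The power bullets above can be illustrated with a sketch under simple assumptions. A static gate dissipates α·C·VDD²·f with α = p0·p1 for independent successive outputs. A dynamic node that evaluates with probability p_eval draws C·VDD² from the supply on the following precharge, which is the same as ½·(2C)·VDD²·f·p_eval, matching the "doubled capacitance" wording. The 2-input NOR example, load, and supply values are hypothetical.

```python
def static_power(p_one: float, c_load: float, vdd: float, f: float) -> float:
    """Static CMOS: alpha_0->1 = p0 * p1, assuming independent successive outputs."""
    alpha = (1.0 - p_one) * p_one
    return alpha * c_load * vdd ** 2 * f


def domino_power(p_eval: float, c_load: float, vdd: float, f: float) -> float:
    """Dynamic node: C * VDD^2 drawn on each precharge that follows an evaluation."""
    return p_eval * c_load * vdd ** 2 * f


# Hypothetical 2-input NOR, inputs independent and equally likely 0 or 1.
c, v, f = 10e-15, 1.2, 1e9
p_static = static_power(0.25, c, v, f)   # NOR output is 1 only for A = B = 0
p_domino = domino_power(0.75, c, v, f)   # pull-down conducts unless A = B = 0
print(f"static: {p_static * 1e6:.2f} uW, domino: {p_domino * 1e6:.2f} uW")
```

For this gate the dynamic version burns four times the switching power, consistent with the bullet that domino power is usually higher.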
[Figure: dynamic stage with an NMOS network followed by a dynamic stage with a PMOS network, both clocked by clk]
Zipper Logic
• Like NORA logic, but
– PMOS precharge and NMOS pre-discharge transistors are kept weakly on during the evaluation stage
– Array multipliers
Announcements
• Homework 5 this week
• Lab 4 Parts 1 + 2 – keep working!
• Midterm next Monday, May 2 (in class)
$$\frac{1}{W_{peff}} = \frac{1}{W_{pa}} + \frac{1}{W_{pb}}, \qquad W_{neff} = W_{nb}$$
Review of Sizing
• Gate delays depend on which inputs switch
– Normally sized for worst-case delay
– Best-case (fastest) delay also important due to
race conditions in a pipelined datapath
• Switching threshold VM normally considers all
inputs switching
• Delay estimation
– Combine switching transistors into equivalent
inverter
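Combining switching transistors into an equivalent inverter reduces to two rules when all devices share the same channel length (an assumption here): series devices add like resistors (1/W_eff = Σ 1/W_i), and parallel devices that switch together add their widths. The widths below are hypothetical.

```python
def series_width(*widths: float) -> float:
    """Series stack of equal-length devices: 1 / W_eff = sum(1 / W_i)."""
    return 1.0 / sum(1.0 / w for w in widths)


def parallel_width(*widths: float) -> float:
    """Parallel devices that all switch together: W_eff = sum(W_i)."""
    return sum(widths)


# Hypothetical 2-input NOR: Wpa = Wpb = 4 um (series), Wna = Wnb = 1 um (parallel).
w_peff = series_width(4.0, 4.0)
w_neff_only_b = 1.0                      # only input B switches: W_neff = W_nb
w_neff_both = parallel_width(1.0, 1.0)   # both inputs switch together
print(w_peff, w_neff_only_b, w_neff_both)
```

Sizing for worst-case delay uses the single-input case (W_neff = W_nb here); the both-switching case gives the best-case delay that matters for race conditions.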
– VT = 0.5V
• 1st: Find delay of inverter
• 2nd: Find delay of NAND
– Vin = 0
– NMOS in cutoff: no drain current
$$I_{Dp} = \frac{1}{2} k_p (-V_{DD} - V_{Tp})^2 \quad \text{(neglecting } \lambda\text{)}$$
– Setting Idn = Idp:
[Figure: I–V curves for Vin = 1V, 2V, 3V with VGS = −VDD; transmission gate with Vin = VDD and Vout = 0V at t = 0]
• Equivalent resistance Req is the parallel combination of Req,n and Req,p
• Req is relatively constant
$$R_{eq,n} \approx \frac{1}{k_n (V_{DD} - V_{tn})}, \qquad R_{eq,p} \approx \frac{1}{k_p (V_{DD} - |V_{tp}|)}$$
$$R_{eq} \approx \frac{1}{k_n (V_{DD} - V_{tn}) + k_p (V_{DD} - |V_{tp}|)}$$
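A short sketch that evaluates the approximate transmission-gate resistance above and turns it into a delay with the single-pole RC step response, whose 50% point is ln(2)·R·C ≈ 0.69·R·C. The device parameters (k = 100 µA/V², VDD = 2.5 V, |Vt| = 0.5 V) and the 100 fF load are hypothetical.

```python
import math


def tg_req_approx(kn: float, kp: float, vdd: float, vtn: float, vtp_abs: float) -> float:
    """Req ~ 1 / (kn (VDD - Vtn) + kp (VDD - |Vtp|)): the two paths in parallel."""
    g_n = kn * (vdd - vtn)        # ~ 1 / Req,n
    g_p = kp * (vdd - vtp_abs)    # ~ 1 / Req,p
    return 1.0 / (g_n + g_p)


def rc_delay_50(r: float, c: float) -> float:
    """50% point of a single-pole RC step response: ln(2) * R * C."""
    return math.log(2) * r * c


r = tg_req_approx(kn=100e-6, kp=100e-6, vdd=2.5, vtn=0.5, vtp_abs=0.5)
t = rc_delay_50(r, 100e-15)
print(f"Req ~ {r:.0f} ohm, tp ~ {t * 1e12:.1f} ps")
```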
Equivalent Resistance – Region 1
• NMOS saturation:
$$R_{eq,n} = \frac{V_{DD} - V_{out}}{\frac{1}{2} k_n (V_{DD} - V_{out} - V_{tn})^2}$$
• PMOS saturation:
$$R_{eq,p} = \frac{V_{DD} - V_{out}}{\frac{1}{2} k_p (-V_{DD} - V_{tp})^2}$$
Equivalent Resistance – Region 2
• NMOS saturation:
$$R_{eq,n} = \frac{V_{DD} - V_{out}}{\frac{1}{2} k_n (V_{DD} - V_{out} - V_{tn})^2}$$
• PMOS linear:
$$R_{eq,p} = \frac{2(V_{DD} - V_{out})}{k_p\left[2(V_{DD} - |V_{TP}|)(V_{DD} - V_{out}) - (V_{DD} - V_{out})^2\right]} = \frac{2}{k_p\left[2(V_{DD} - |V_{TP}|) - (V_{DD} - V_{out})\right]}$$
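The region-by-region expressions can be swept over Vout to see that their parallel combination stays fairly flat. This sketch assumes the PMOS is saturated for Vout < |Vtp| and linear above, and that the NMOS (gate and input at VDD) stays saturated until it cuts off at Vout = VDD − Vtn. With signed Vtp, (−VDD − Vtp)² equals (VDD − |Vtp|)², which is the form used here. All numeric values are hypothetical.

```python
import math


def req_n(vout: float, vdd: float, vtn: float, kn: float) -> float:
    """NMOS of the TG charging Vout: saturated until it cuts off."""
    v_ov = vdd - vout - vtn
    if v_ov <= 0:
        return math.inf                        # cut off for Vout >= VDD - Vtn
    return (vdd - vout) / (0.5 * kn * v_ov ** 2)


def req_p(vout: float, vdd: float, vtp_abs: float, kp: float) -> float:
    """PMOS of the TG (gate at 0 V): saturated below |Vtp|, linear above."""
    v_sd = vdd - vout
    if vout < vtp_abs:                         # Region 1: saturation
        return v_sd / (0.5 * kp * (vdd - vtp_abs) ** 2)
    return 2.0 / (kp * (2.0 * (vdd - vtp_abs) - v_sd))   # linear region


def req_total(vout, vdd, vtn, vtp_abs, kn, kp):
    """Parallel combination; 1 / inf evaluates to 0.0 when the NMOS is off."""
    return 1.0 / (1.0 / req_n(vout, vdd, vtn, kn) + 1.0 / req_p(vout, vdd, vtp_abs, kp))


for vout in (0.0, 0.5, 1.0, 1.5, 2.0, 2.5):
    print(f"Vout = {vout:.1f} V: Req = {req_total(vout, 2.5, 0.5, 0.5, 1e-4, 1e-4):.0f} ohm")
```

The PMOS expressions meet at Vout = |Vtp|, and at Vout = VDD the linear form reduces to 1/(kp(VDD − |Vtp|)), the approximation on the previous slide.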
F = A ⊕ S
• If S = 0, F = A; if S = 1, F = ~A
Transmission Gate Multiplexer
$$F = A\overline{S} + BS$$
Full Transmission Gate Logic
B C
F = A BC
A
B C
• PMOS devices in parallel with NMOS transistors pass
full VDD (only one logic path shown above)
• Requires more devices, but each can be sized smaller
than static CMOS
• Output inverter reduces impact of fanout
Next Topic: Dynamic Circuits
– Improved speed
– Reduced area
Outline
• Today: Alternative MOS Logic Styles
• Dynamic MOS Logic Circuits: Rabaey 6.3 (Kang &
Leblebici, 9.4-9.6)
• DRAM, SRAM
Announcements
• Finish Lab 5 this week
• Quiz 3 Wednesday
• Homework 7 issued later this week, due next
week
• Lab 6 next week, report due June 2
[Figure: 3T DRAM cell (M1–M3 with write and read lines and a storage node) and SRAM cell (M1–M4, storage nodes Q and $\overline{Q}$, capacitances Cc, word line WL)]
$$\frac{k_{n,3}}{2}(V_{DD} - V_Q - V_{TN})^2 = \frac{k_{n,1}}{2}\left(2(V_{DD} - V_{TN})V_Q - V_Q^2\right)$$
• Guarantee VQ < VTN: plug VQ = VTN into the ID equations and require ID3 < ID1 at VQ = VTN:
$$\frac{k_{n,3}}{k_{n,1}} = \frac{(W/L)_3}{(W/L)_1} < \frac{2(V_{DD} - 1.5V_{TN})V_{TN}}{(V_{DD} - 2V_{TN})^2}$$
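A sketch of the read-stability bound above, with M3 saturated and M1 linear as in the current equation, and body effect ignored as there. Because both sides contain VDD − VTN, the current equation also has a closed-form root, V_Q = (VDD − VTN)(1 − sqrt(k1/(k1 + k3))), which lets us confirm that the ratio bound lands exactly at V_Q = VTN. The VDD = 5 V, VTN = 1 V values are hypothetical.

```python
import math


def max_cell_ratio(vdd: float, vtn: float) -> float:
    """Upper bound on k_n3 / k_n1 = (W/L)_3 / (W/L)_1 that keeps V_Q < V_TN."""
    if vdd <= 2.0 * vtn:
        raise ValueError("the expression assumes VDD > 2 * VTN")
    return 2.0 * (vdd - 1.5 * vtn) * vtn / (vdd - 2.0 * vtn) ** 2


def read_node_voltage(kn3: float, kn1: float, vdd: float, vtn: float) -> float:
    """Smaller root of k3/2 (VDD - VQ - VTN)^2 = k1/2 (2 (VDD - VTN) VQ - VQ^2)."""
    a = vdd - vtn
    return a * (1.0 - math.sqrt(kn1 / (kn1 + kn3)))


ratio = max_cell_ratio(5.0, 1.0)
print(f"(W/L)_3 / (W/L)_1 < {ratio:.3f}")
print(f"V_Q at the bound: {read_node_voltage(ratio, 1.0, 5.0, 1.0):.3f} V")
print(f"V_Q at half the bound: {read_node_voltage(ratio / 2, 1.0, 5.0, 1.0):.3f} V")
```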
SRAM Design: Write “0” (1st Analysis)
[Figure: 6T SRAM cell during a write: BL = 0V, $\overline{BL}$ = Vdd; transistors M1–M6 with storage nodes Q and $\overline{Q}$]
• Assume $V_Q$ = Vdd and $V_{\overline{Q}}$ = 0V
• Data must be forced into the cell
– $V_Q$ must fall below the switching threshold of the inverter to turn M2 off
– This allows $V_{\overline{Q}}$ to rise above the Vt of M1
• This discharges node Q and stores a 0
• Assume VBL remains at 0V: M3 linear, M5 linear ($V_Q$ = VDD/2)
SRAM Design: Write “0” (2nd Analysis)
[Figure: the same 6T SRAM cell with BL = 0V and $\overline{BL}$ = Vdd]
• Assume $V_Q$ = Vdd and $V_{\overline{Q}}$ = 0V
• Data must be forced into the cell
– $V_Q$ must fall below the threshold of the NMOS (turns M2 off)
– This allows $V_{\overline{Q}}$ to rise above the Vt of M1
• This discharges node Q and stores a 0
• Assume VBL remains at 0V: M5 sat., M3 linear ($V_Q$ = VTN)
[Figure: cross-coupled inverters with noise sources VN inserted at each inverter input]
• Hypothetical noise sources added to inverter inputs
• SNM corresponds to the largest noise disturbance that won’t disrupt cell operation
• SNM can be determined graphically from the butterfly plot
Hold Static Noise Margin
• Plot two mirrored Vin/Vout curves (butterfly plot)
• SNM = side of the largest square inscribed between the curves, taken in the smaller of the two lobes
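To make the inscribed-square definition concrete, the sketch below computes a hold SNM numerically. It assumes a hypothetical logistic inverter VTC (not a transistor model) and a symmetric cell. For each line of slope +1, it finds where that line meets the two mirrored curves; because both curves are decreasing, the x-distance between those two points is the side of the axis-aligned square whose diagonal lies on the line. The SNM is the largest such side in the smaller lobe, which is the idea behind the usual 45°-rotation method.

```python
import math


def vtc(x: float, vdd: float = 1.0, gain: float = 10.0) -> float:
    """Hypothetical logistic inverter transfer curve (illustration only)."""
    return vdd / (1.0 + math.exp(gain * (x - vdd / 2)))


def _solve_increasing(fn, target: float, lo: float, hi: float, iters: int = 60) -> float:
    """Bisection for fn(x) = target, where fn is increasing on [lo, hi]."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if fn(mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def hold_snm(vdd: float = 1.0, gain: float = 10.0, samples: int = 1001) -> float:
    def f(v: float) -> float:
        return vtc(v, vdd, gain)

    # Offsets c for which the line x - y = c meets both curves inside [0, vdd].
    c_lo = max(-f(0.0), f(vdd) - vdd)
    c_hi = min(vdd - f(vdd), f(0.0))
    best_pos = best_neg = 0.0
    for i in range(samples):
        c = c_lo + (c_hi - c_lo) * i / (samples - 1)
        # Curve A (y = f(x)) meets the line where x - f(x) = c.
        x_a = _solve_increasing(lambda x: x - f(x), c, 0.0, vdd)
        # Curve B (x = f(y)) meets it where f(y) - y = c, i.e. y - f(y) = -c.
        y_b = _solve_increasing(lambda y: y - f(y), -c, 0.0, vdd)
        x_b = y_b + c                # equals f(y_b), the x coordinate on curve B
        side = x_b - x_a             # square side; the sign identifies the lobe
        best_pos = max(best_pos, side)
        best_neg = max(best_neg, -side)
    # The cell's margin is set by the smaller of the two lobes.
    return min(best_pos, best_neg)


print(f"hold SNM, gain 10: {hold_snm(gain=10.0):.3f} V")
print(f"hold SNM, gain 20: {hold_snm(gain=20.0):.3f} V")
```

A steeper (higher-gain) VTC makes the lobes squarer, so the SNM grows toward its ideal limit of VDD/2.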
Read Static Noise Margin
[Figure: butterfly plot (Vin vs. Vout) with the SNM square]
• Diode-connected access NMOS
simulates precharged bitline
[Figure: butterfly plot (Vin vs. Vout) with the SNM square]
• Always-on ground-connected
access NMOS simulates bitline
driven low, use Read Ckt for Write 1
Memory Peripherals
• Memory core (memory cells) largely determined by
technological considerations
– Emphasizes reduced area, sacrifices speed, reliability
– Peripheral circuits can recover some of the lost
performance
• Address Decoders
– Row Decoders: one-hot decoding for word lines
– Column Decoders: $2^L$-to-1 multiplexers for bit lines
• I/O Buffers and Drivers
• Sense Amplifiers
• Memory Timing and Control
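A behavioral sketch of the two decoder types, assuming (for illustration) that the upper address bits select the row and the lowest L bits select the column. The function names are hypothetical; this models logic behavior only, not decoder circuits.

```python
def split_address(addr: int, col_bits: int) -> tuple[int, int]:
    """Upper bits -> row address, lowest col_bits bits -> column address."""
    return addr >> col_bits, addr & ((1 << col_bits) - 1)


def row_decode(row: int, row_bits: int) -> list[int]:
    """One-hot word lines: exactly one of 2**row_bits outputs is 1."""
    if not 0 <= row < 2 ** row_bits:
        raise ValueError("row address out of range")
    return [1 if r == row else 0 for r in range(2 ** row_bits)]


def column_mux(bit_lines: list[int], col: int) -> int:
    """2**L-to-1 selection of one bit line by an L-bit column address."""
    return bit_lines[col]


row, col = split_address(0b10110, col_bits=2)   # row 5, column 2
word_lines = row_decode(row, row_bits=3)
print(row, col, word_lines)
print(column_mux([0, 1, 1, 0], col))
```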
– Delay estimation
Announcements
• HW7: Optional
– Issued later today
• Lab 6: Memories
– Issued this evening, due last day of class
• Quiz 4 Wednesday
[Figure: inverter driving CL, with dynamic current Idyn and short-circuit current Isc; Isc(t) flows while VT < vin(t) < VDD − VT, peaking at Ipeak, for an interval tsc on each input transition]
• Large CL: VDS ≈ 0 across the device that is turning off, so Isc ≈ 0
• If inputs switch fast and the output switches slowly, very little short-circuit current results
– Translates to slower propagation delays, which might not be tolerable
Short Circuit Power Dissipation
• Small CL: VDS ≈ VDD across the device that is turning off, so Isc ≈ IMAX
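A back-of-the-envelope sketch using the triangular approximation suggested by the Isc(t) waveform: each input transition produces a current pulse of height I_peak and base t_sc, so one rising plus one falling transition per cycle costs E = VDD·I_peak·t_sc. The numbers are hypothetical, and the load is assumed to toggle once up and once down every clock cycle.

```python
def short_circuit_power(t_sc: float, i_peak: float, vdd: float, f: float) -> float:
    """Two triangular pulses per cycle, each with area I_peak * t_sc / 2."""
    energy_per_cycle = 2 * (0.5 * i_peak * t_sc) * vdd
    return energy_per_cycle * f


def dynamic_power(c_load: float, vdd: float, f: float) -> float:
    """C * VDD^2 * f for a node that charges once per cycle."""
    return c_load * vdd ** 2 * f


p_sc = short_circuit_power(t_sc=50e-12, i_peak=100e-6, vdd=1.2, f=1e9)
p_dyn = dynamic_power(c_load=20e-15, vdd=1.2, f=1e9)
print(f"P_sc = {p_sc * 1e6:.1f} uW, P_dyn = {p_dyn * 1e6:.1f} uW")
```

Slower input edges widen t_sc and raise P_sc, while a larger CL lowers I_peak, which is the tradeoff described in the bullets.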
[Figure: delay vs. power as W/L decreases]
Power-Delay Product (PDP) tries to balance the power and delay tradeoff
Power Delay Product Optimum
• Just like Vt scaling vs. power supply, there are diminishing
returns for sizing
– Preceding curve shows delay vs. power
• Obtained by modifying the size of the gate to analyze
delay and power
• By decreasing W/L, delay goes up but power goes down
– After a while, decreasing W/L increases delay
tremendously without lowering power
• By increasing W/L, delay goes down but power goes up
– After a while, increasing W/L costs you tremendously
in power without lowering delay
• Optimal point where slope of curve is -1
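A sketch of the sizing tradeoff with a made-up model: delay D(s) = 1 + 4/s falls with size s while power P(s) = 1 + s rises. Minimizing the product P·D means d(ln P + ln D) = 0, so at the minimum-PDP point the curve's slope on log–log axes is −1. The script finds that point by grid search and checks the slope numerically; the model coefficients are assumptions for illustration only.

```python
import math


def delay(s: float) -> float:
    """Hypothetical delay: fixed part plus a part inversely proportional to size."""
    return 1.0 + 4.0 / s


def power(s: float) -> float:
    """Hypothetical power: fixed part plus a part proportional to size."""
    return 1.0 + s


sizes = [0.1 + 0.001 * i for i in range(9901)]          # s from 0.1 to 10
s_opt = min(sizes, key=lambda s: delay(s) * power(s))


def loglog_slope(s: float, h: float = 1e-4) -> float:
    """d ln(delay) / d ln(power), by central differences."""
    return (math.log(delay(s + h)) - math.log(delay(s - h))) / (
        math.log(power(s + h)) - math.log(power(s - h)))


print(f"min PDP at s = {s_opt:.3f}, PDP = {delay(s_opt) * power(s_opt):.3f}")
print(f"log-log slope there = {loglog_slope(s_opt):.4f}")
```

For this model the analytic optimum is s = 2 with PDP = 9; sizing further in either direction trades a large change in one quantity for a small change in the other.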
Pipeline Approach to Voltage Scaling
• Start with a single design with two registers
– Assume the logic between them runs at freq = fmax
• Now break the logic into N separate parts with equal delay
– Separate each part by a register
– Each part is N times faster (New fmax = N x Old fmax)
• Vdd can be lowered to slow the logic down to the original fmax
– However, each added register adds capacitance
• Power savings could be as much as 80% once all things
are considered
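A sketch of the pipelining arithmetic under stated assumptions: gate delay follows the long-channel estimate t ∝ VDD/(VDD − VT)² (arbitrary units), each of the N stages may become N times slower, register timing overhead is ignored, and the added registers raise switched capacitance by a hypothetical 15%. With P = C·VDD²·f and f unchanged, the power ratio is (1 + overhead)·(VDD,new/VDD,old)².

```python
def gate_delay(vdd: float, vt: float) -> float:
    """Hypothetical long-channel delay model: t ~ VDD / (VDD - VT)^2."""
    return vdd / (vdd - vt) ** 2


def scaled_vdd(v0: float, vt: float, slowdown: float, iters: int = 100) -> float:
    """Bisection for the VDD in (VT, v0] whose delay is `slowdown` x delay(v0)."""
    target = slowdown * gate_delay(v0, vt)
    lo, hi = vt + 1e-9, v0       # delay(lo) is huge; delay(hi) <= target
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if gate_delay(mid, vt) > target:
            lo = mid             # too slow: need a higher supply
        else:
            hi = mid
    return 0.5 * (lo + hi)


def pipelined_power_ratio(v0: float, vt: float, n_stages: int, reg_overhead: float):
    """P_new / P_old with f unchanged and capacitance scaled by (1 + overhead)."""
    v_new = scaled_vdd(v0, vt, n_stages)
    return (1.0 + reg_overhead) * (v_new / v0) ** 2, v_new


ratio, v_new = pipelined_power_ratio(v0=2.5, vt=0.5, n_stages=2, reg_overhead=0.15)
print(f"VDD: 2.5 V -> {v_new:.3f} V, power ratio = {ratio:.2f}")
```

The same arithmetic covers the parallel approach, where N copies at f/N give P = N·(1 + overhead)·C·V²·(f/N).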
[Figure: two parallel copies of the LOGIC datapath (A, B), each clocked at f/2]
• Parallelize computation up to N times
• Reduce clock frequency by factor N
• Reduce voltage to meet relaxed frequency constraint
Tradeoffs of Parallelization
• Amount of parallelism in application may be limited
• Extra capacitance overhead of multiple datapaths
– N times higher input loading
– N-to-1 selector on output
– Lower clock frequency somewhat offset by higher clock
load
• Consumes more area, more devices, and more leakage power,
especially in deep submicron
• Voltage reduction typically results in dramatic power
gains
– ~3X power reduction
– Delay estimation
Outline
• Review and Finish: Low Power Design
• Interconnect Effects: Rabaey Ch. 4 and Ch. 9
(Kang & Leblebici, 6.5-6.6)
• Lumped modeling:
$$R = \frac{\rho L}{A} = \frac{\rho L}{tW} = R_{sq}\frac{L}{W}, \qquad R_{sq} = \frac{\rho}{t}$$
Amirtharajah, EEC 118 Spring 2011 7
Parallel-Plate Capacitance
• Width large compared to dielectric thickness, height
small compared to width: E field lines orthogonal to
substrate
[Figure: wire of width W, length L, and thickness t on a dielectric of height h above the substrate]
$$C = \frac{\varepsilon_r \varepsilon_0}{h} WL$$
Fringing Field Capacitance
• When height comparable to width, must account for
fringing field component as well
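[Figure: wire cross-section (width W, thickness t) at height h above the substrate, showing fringing fields]
$$c = c_{pp} + c_{fringe} = \frac{\varepsilon_r \varepsilon_0}{h}\left(W - \frac{t}{2}\right) + \frac{2\pi \varepsilon_r \varepsilon_0}{\log(2h/t + 1)}$$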
• Model is simple and works fairly well (Rabaey, 2nd ed.)
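A sketch that evaluates the sheet-resistance and fringing-capacitance formulas for one wire. It assumes "log" in the fringing term is the natural logarithm, and uses hypothetical dimensions (1 mm long, 0.5 µm wide and thick, 1 µm above the substrate) with aluminum-like resistivity and an SiO2-like εr of 3.9.

```python
import math

EPS0 = 8.854e-12  # vacuum permittivity, F/m


def wire_resistance(rho: float, length: float, width: float, thickness: float) -> float:
    """R = Rsq * L / W with Rsq = rho / t."""
    r_sq = rho / thickness
    return r_sq * length / width


def wire_capacitance(eps_r: float, length: float, width: float,
                     thickness: float, height: float) -> float:
    """Parallel-plate plus fringing term, per unit length, times the length."""
    eps = eps_r * EPS0
    c_pp = eps / height * (width - thickness / 2)
    c_fringe = 2 * math.pi * eps / math.log(2 * height / thickness + 1)
    return (c_pp + c_fringe) * length


R = wire_resistance(rho=2.7e-8, length=1e-3, width=0.5e-6, thickness=0.5e-6)
C = wire_capacitance(eps_r=3.9, length=1e-3, width=0.5e-6, thickness=0.5e-6, height=1e-6)
print(f"R = {R:.1f} ohm, C = {C * 1e15:.0f} fF")
```

For this narrow wire the fringing term dominates, which is why the parallel-plate model alone underestimates capacitance once width is comparable to height.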
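[Figure: wire B between neighbors A and C, coupled to each through mutual capacitance Cm]
• When a neighbor switches opposite to B, the voltage across Cm swings from −VDD to VDD:
$$\Delta Q = C_m(V_f - V_i) = C_m(V_{DD} - (-V_{DD})) = 2C_mV_{DD}$$
• Ceff = 0 when A and C switch in the same direction as B
• Ceff = 2Cm when one neighbor switches with B and the other against it
• Ceff = 4Cm when A and C both switch opposite to B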
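The cases above follow from a Miller-style rule: each neighbor contributes Cm·(1 − ΔV_n/ΔV_B) to the capacitance seen by B. This sketch encodes transitions as +1 (rise), −1 (fall), or 0 (static); the function name and encoding are hypothetical.

```python
def effective_coupling_cap(dv_victim: int, dv_neighbors: list[int], cm: float) -> float:
    """Effective coupling capacitance seen by switching wire B.

    Each neighbor adds Cm * (1 - dv_n / dv_victim):
    0 if it switches with B, Cm if static, 2Cm if it switches opposite.
    """
    if dv_victim == 0:
        raise ValueError("victim wire B must switch")
    return sum(cm * (1 - dn / dv_victim) for dn in dv_neighbors)


cm = 1.0  # express results in units of Cm
print(effective_coupling_cap(+1, [+1, +1], cm))   # same direction  -> 0
print(effective_coupling_cap(+1, [+1, -1], cm))   # one each way    -> 2 Cm
print(effective_coupling_cap(+1, [-1, -1], cm))   # both opposite   -> 4 Cm
print(effective_coupling_cap(+1, [0, 0], cm))     # static          -> 2 Cm
```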
Data Dependent Switched Capacitance 2
• When adjacent wires are static, mutual capacitance is
effectively to ground
[Figure: B switching with static neighbors A and C held at combinations of 0 and 1; Ceff = 2Cm]
• Remember: it is the charging of capacitance where we
account for energy from supply, not discharging
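[Figure: T-model (R/2, C, R/2) and N-segment distributed RC ladder with C/N per node]
$$c\,l = \varepsilon\mu$$
(c and l are the capacitance and inductance per unit length)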
– Assumes uniform or “average” dielectric
Summary
• Many important effects to consider in interconnect design
– Resistance, capacitance, and inductance can all affect signal
performance
– For signals with long rise/fall times, only resistance and capacitance
need to be considered
• Several models useful for RC interconnect delay analysis
– Simple lumped (1 R, 1 C) model: easy to analyze and/or
simulate, but pessimistic
– T-model (2 Req = R/2, 1 C): more accurate than lumped
– Distributed model (N Req = R/N, N Ceq = C/N): most accurate;
use the Elmore delay approximation for hand analysis
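A sketch comparing the three models with the Elmore delay, which sums each node capacitance times the resistance shared between the source and that node. For an N-segment ladder (R/N, C/N per segment) this gives RC(N + 1)/(2N): RC for the lumped case (N = 1), approaching RC/2 as N grows. The T-model with an unloaded output also gives RC/2, since its single C sees only the first R/2. Function names are hypothetical.

```python
def elmore_ladder(r_total: float, c_total: float, n: int) -> float:
    """Elmore delay at the far end of an N-segment RC ladder."""
    r_seg, c_seg = r_total / n, c_total / n
    # Node i sees resistance i * r_seg back to the source.
    return sum(c_seg * (i * r_seg) for i in range(1, n + 1))


def elmore_t_model(r_total: float, c_total: float) -> float:
    """R/2 - C - R/2 with no load: C sees only the first R/2."""
    return c_total * r_total / 2


for n in (1, 2, 10, 1000):
    print(f"N = {n:4d}: tau = {elmore_ladder(1.0, 1.0, n):.4f} RC")
print(f"T-model: tau = {elmore_t_model(1.0, 1.0):.4f} RC")
```

This shows why the lumped model is pessimistic: it charges all of C through all of R, doubling the time constant of the distributed line.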
Next Topic: Design for Manufacturability
Outline
• Finish interconnect discussion
• Manufacturability: Rabaey G, H (Kang & Leblebici,
14)
[Figure: histogram of propagation delay (ps) in 10 ps bins from 0 to 100 ps; y-axis: count]
• Delay variations with parameters, loading, VDD, and T
Yield Estimation and Maximization
• Parametric Yield: ratio of total acceptable circuits to total
manufactured circuits
– Design for manufacturability aims to maximize yield (and $$)
• Yield statistics are usually complicated since circuit
performance is a complex function of the parameters
• Numerous methods for estimating and maximizing yield
– Response surface models (RSM): compact analytical model
fit to circuit simulations using Design of Experiments
– Direct Monte Carlo circuit simulations or the RSM can be
used to estimate yields
– Designer controlled parameters then adjusted to maximize
yield estimates
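A minimal Monte Carlo sketch of parametric yield estimation. It stands in a hypothetical linear response surface for the circuit (delay in ps versus two normalized Gaussian parameter deviations); the coefficients are made up, not fitted to any simulation. The fixed seed makes the estimate repeatable.

```python
import random


def delay_ps(dvt: float, dl: float) -> float:
    """Hypothetical response surface: delay vs. normalized VT and L deviations."""
    return 50.0 + 5.0 * dvt + 8.0 * dl


def estimate_yield(spec_ps: float, n: int = 100_000, seed: int = 1) -> float:
    """Fraction of sampled 'circuits' whose delay meets the spec."""
    rng = random.Random(seed)
    passed = 0
    for _ in range(n):
        dvt = rng.gauss(0.0, 1.0)   # normalized threshold-voltage deviation
        dl = rng.gauss(0.0, 1.0)    # normalized channel-length deviation
        if delay_ps(dvt, dl) <= spec_ps:
            passed += 1
    return passed / n


y60 = estimate_yield(60.0)
y70 = estimate_yield(70.0)
print(f"yield at 60 ps spec: {y60:.3f}, at 70 ps spec: {y70:.3f}")
```

Here delay is Gaussian with mean 50 ps and sigma about 9.4 ps, so a 60 ps spec gives roughly 85% yield. A designer-controlled knob (for example, a sizing change that lowers the 50 ps nominal) would be tuned to raise this estimate.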
Worst-Case Design 1
• Given range of variations for process, voltage,
temperature identify worst (best) cases for performance
parameter of interest
– Process corner models from fab define limits of device
performance
– Labeled by NMOS-PMOS pairs, e.g. Typical NMOS-Typical
PMOS (TT)
– Usual additional corners: Fast NMOS-Fast PMOS (FF),
Slow NMOS-Slow PMOS (SS), Fast NMOS-Slow PMOS
(FS), Slow NMOS-Fast PMOS (SF)
– Usual voltage corners: Nominal VDD +/- 10%
– Temperature range: 0–100 °C
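A sketch of worst-case (and best-case) corner enumeration over the process, voltage, and temperature sets listed above. The per-corner delay factors and the temperature coefficient are made-up numbers, used only to exercise the search; real values come from the fab's corner models and circuit simulation.

```python
from itertools import product

# Made-up relative delay factors per process corner (illustration only).
PROCESS_FACTOR = {"TT": 1.00, "FF": 0.85, "SS": 1.20, "FS": 1.02, "SF": 1.05}


def pvt_corners(vdd_nom: float, temps=(0.0, 100.0)):
    """All (process, VDD, temperature) combinations: nominal VDD +/- 10%."""
    voltages = (0.9 * vdd_nom, vdd_nom, 1.1 * vdd_nom)
    return list(product(PROCESS_FACTOR, voltages, temps))


def relative_delay(corner, vdd_nom: float) -> float:
    """Hypothetical model: slower at lower VDD and at higher temperature."""
    proc, vdd, temp = corner
    return PROCESS_FACTOR[proc] * (vdd_nom / vdd) * (1.0 + 0.002 * temp)


corners = pvt_corners(1.2)
worst = max(corners, key=lambda c: relative_delay(c, 1.2))
best = min(corners, key=lambda c: relative_delay(c, 1.2))
print(f"{len(corners)} corners; worst: {worst}; best: {best}")
```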
[Figure panels: 10 nm FET and 180 nm FET]