CA L7 Unit3 Slides

The document discusses computer architecture and arithmetic logic units. It provides an overview of integer addition and subtraction, digital logic design, and how to build an ALU from basic logic gates. It describes how a 1-bit ALU can perform operations like AND, OR, and addition and how multiple 1-bit ALUs can be combined to form a larger ALU.

Uploaded by

Incia Saleem
Computer Architecture

(EE 371 / CE 321 / CS 330)

Dr. Farhan Khan


Assistant Professor,
Dhanani School of Science & Engineering,
Habib University

1
Outline: Unit 3

• Arithmetic for Computers: Review

• Digital Logic Design: Review

• Arithmetic Logic Unit (ALU)

• Single-Cycle Processor Implementation

2
Integer Addition

3
Integer Addition: Overflow

Overflow occurs when the result of an operation cannot be represented with the available hardware, in this case a 64-bit word.

• Adding a +ve and a –ve operand
– Overflow is not possible
• Adding two +ve numbers
– Overflow has occurred if the sign bit of the result is 1
• Adding two –ve numbers
– Overflow has occurred if the sign bit of the result is 0
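
The sign-bit rules above can be checked with a minimal Python sketch. The names `to_signed` and `add_overflows` are hypothetical helpers introduced only for illustration, and the operands are assumed to already fit in a signed 64-bit word.

```python
BITS = 64
MASK = (1 << BITS) - 1

def to_signed(x):
    """Interpret the low 64 bits of x as a two's complement integer."""
    x &= MASK
    return x - (1 << BITS) if x >> (BITS - 1) else x

def add_overflows(a, b):
    """Return (wrapped_sum, overflow) for a signed 64-bit addition."""
    result = to_signed(a + b)
    # Overflow is only possible when both operands have the same sign
    # and the sign bit of the result differs from it.
    overflow = ((a < 0) == (b < 0)) and ((result < 0) != (a < 0))
    return result, overflow
```

For example, adding 1 to the largest positive value wraps to a negative result, which the sign-bit rule flags as overflow.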

4
Integer Subtraction

• Subtraction through Addition


– Add first operand and negation of the second operand
– Example: 7 – 6 = 7 + (-6)

5
Integer Subtraction: Overflow

Overflow occurs when the result of an operation cannot be represented with the available hardware, in this case a 64-bit word.

• Subtracting two +ve or two –ve operands
– Overflow is not possible
• Subtracting a +ve operand from a –ve operand
– Overflow has occurred if the sign bit of the result is 0
• Subtracting a –ve operand from a +ve operand
– Overflow has occurred if the sign bit of the result is 1
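
A minimal Python sketch of subtraction through addition together with the overflow rules above. `to_signed` and `sub_overflows` are hypothetical names used only for this illustration, and the operands are assumed to fit in a signed 64-bit word.

```python
BITS = 64
MASK = (1 << BITS) - 1

def to_signed(x):
    """Interpret the low 64 bits of x as a two's complement integer."""
    x &= MASK
    return x - (1 << BITS) if x >> (BITS - 1) else x

def sub_overflows(a, b):
    """Return (wrapped_difference, overflow) for signed 64-bit a - b."""
    neg_b = ~b + 1                      # two's complement negation of b
    result = to_signed(a + neg_b)       # subtraction done as addition
    # Overflow is only possible when the operands have different signs
    # and the sign of the result differs from the sign of a.
    overflow = ((a < 0) != (b < 0)) and ((result < 0) != (a < 0))
    return result, overflow
```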

6
Review of Logic Design

[Figure: a system maps inputs (Input 1, Input 2) to outputs (Output 1, Output 2); in a digital logic system every input and output is 0 or 1]

7
Review of Logic Design

• 2 Types of Logic Systems

[Figure: combinational logic system with 0/1 inputs and 0/1 outputs]

The output of a combinational logic system depends only on the current input.

[Figure: sequential logic system with 0/1 inputs, 0/1 outputs, and an internal memory]

The output of a sequential logic system can depend both on the inputs and internal memory.

8
Review of Logic Design

• Truth Table
– Because a combinational logic block contains no memory, it can be completely specified
by defining the values of the outputs for each possible set of input values.
– Such a description is normally given as a truth table.
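
As a small illustration, a truth table can be generated by enumerating every input combination. The helper name `truth_table` is hypothetical and the sketch is in Python.

```python
from itertools import product

def truth_table(func, n_inputs):
    """List (inputs, output) for every combination of n_inputs bits."""
    return [(bits, func(*bits)) for bits in product((0, 1), repeat=n_inputs)]

def and_gate(a, b):
    return a & b

# Print one row per input combination of a 2-input AND block.
for inputs, output in truth_table(and_gate, 2):
    print(inputs, output)
```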

9
Review of Logic Design

• Logic Gates
– Logic blocks/systems are built from Logic Gates that implement basic logic functions

[Figure: AND gate (output A·B), OR gate (output A + B), and inverter (output NOT A)]

10
Review of Logic Design

• Decoder
– The decoder has an n-bit input and 2^n outputs, where only one output is asserted for each input combination.
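
A minimal Python sketch of this behaviour, assuming the outputs are numbered 0 to 2^n − 1 and that output i is asserted when the input value is i. The function name `decode` is hypothetical.

```python
def decode(value, n):
    """Return the 2**n decoder outputs for an n-bit input value."""
    if not 0 <= value < (1 << n):
        raise ValueError("value does not fit in n bits")
    # Exactly one output line is asserted: the one whose index equals value.
    return [1 if i == value else 0 for i in range(1 << n)]
```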

11
Review of Logic Design

• Multiplexor (Selector)
– Consider the two-input multiplexor.
– This multiplexor has three inputs: two data values and a selector (or control) value.
– The selector value determines which of the inputs becomes the output.

– Multiplexors can be created with an arbitrary number of data inputs.
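
A short Python sketch of both cases. It assumes that a selector value of 0 picks the first data input; `mux2` and `mux` are hypothetical names.

```python
def mux2(a, b, s):
    """Two-input multiplexor on 1-bit values, written as gate logic:
    (a AND NOT s) OR (b AND s)."""
    return (a & (1 - s)) | (b & s)

def mux(inputs, select):
    """Multiplexor with an arbitrary number of data inputs."""
    return inputs[select]
```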

13
Arithmetic Logic Unit (ALU)

• The arithmetic logic unit (ALU) is the brain of the computer, the device that
performs the arithmetic operations like addition and subtraction or logical
operations like AND and OR.

• Let’s focus on 1-bit ALU and progressively build an ALU with more
functionality

15
Arithmetic Logic Unit (ALU)

The 1-bit Logical Unit for AND and OR.

16
1-bit Adder

17
Arithmetic Logic Unit (ALU)

The 1-bit Logical Unit for AND and OR.
The 1-bit ALU that performs AND, OR, and addition.
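
The 1-bit ALU in the figure can be sketched in Python as a full adder plus a result multiplexor. The encoding of `operation` (0 = AND, 1 = OR, 2 = add) and the function names are assumptions made for this illustration.

```python
def full_adder(a, b, carry_in):
    """1-bit adder: returns (sum, carry_out)."""
    s = a ^ b ^ carry_in
    carry_out = (a & b) | (a & carry_in) | (b & carry_in)
    return s, carry_out

def alu_1bit(a, b, carry_in, operation):
    """1-bit ALU: all three results are computed, the multiplexor picks one."""
    s, carry_out = full_adder(a, b, carry_in)
    result = (a & b, a | b, s)[operation]
    return result, carry_out
```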

18
Arithmetic Logic Unit (ALU)

A 64-bit ALU constructed from 64 1-bit ALUs

19


Arithmetic Logic Unit (ALU)

The 1-bit ALU that performs AND, OR, and addition.

Support for subtraction (through addition).

• By selecting the inverted b (Binvert = 1) and setting CarryIn to 1 in the least significant bit of the ALU, we get two's complement subtraction of b from a instead of addition of b to a.
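
A minimal Python sketch of a 64-bit ripple-carry adder with a Binvert line and a CarryIn for the least significant bit. The function name `ripple_add` is hypothetical, and operands are treated as 64-bit patterns.

```python
BITS = 64
MASK = (1 << BITS) - 1

def ripple_add(a, b, binvert=0, carry_in=0):
    """Add a and b bit by bit; when binvert is 1 every bit of b is inverted."""
    a &= MASK
    b &= MASK
    carry = carry_in                      # CarryIn of the least significant bit
    result = 0
    for i in range(BITS):
        ai = (a >> i) & 1
        bi = ((b >> i) & 1) ^ binvert     # select b or its inverse
        s = ai ^ bi ^ carry
        carry = (ai & bi) | (ai & carry) | (bi & carry)
        result |= s << i
    return result
```

With `binvert=1` and `carry_in=1` the same hardware computes a + (NOT b) + 1, which is a − b in two's complement.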
20
Arithmetic Logic Unit (ALU)

The 1-bit ALU that performs AND, OR, addition, and subtraction (through addition).

Support for NOR operation (through AND).
• DeMorgan's Theorem: NOT (a OR b) = (NOT a) AND (NOT b)
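
A small Python check of this identity on 1-bit values. It assumes the ALU has an Ainvert line alongside Binvert (the Ainvert line is named later in the ALU Control slide); `and_with_inverts` is a hypothetical name.

```python
def and_with_inverts(a, b, ainvert, binvert):
    """AND gate whose inputs can each be inverted first."""
    a_in = a ^ ainvert
    b_in = b ^ binvert
    return a_in & b_in

def nor(a, b):
    """Reference NOR on 1-bit values."""
    return 1 - (a | b)
```

Setting both invert lines to 1 and selecting the AND result yields NOR for every input combination.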

21
Arithmetic Logic Unit (ALU)

The 1-bit ALU that performs AND, OR, addition, subtraction (through addition) and NOR operation.

Support for slt operation.
• slt rd, rs1, rs2
rd = 1 if rs1 < rs2
rd = 0 otherwise
• Consequently, slt will set all but the least significant bit to 0, with the least significant bit set according to the comparison.

22
Arithmetic Logic Unit (ALU)

The 1-bit ALU that performs AND, OR, addition, subtraction (through addition) and NOR operation.

Support for slt operation.
• For the ALU to perform slt, we first need to expand the three-input multiplexor to add an input for the slt result. We call that new input Less and use it only for slt.
23
Arithmetic Logic Unit (ALU)

Support for slt operation.

• We must connect 0 to the Less input for the upper 63 bits of the ALU, since those bits are always set to 0.

• How to set the least significant bit?

• We want the least significant bit of the slt result to be 1 if a < b
• That is, we want the least significant bit to be 1 if a − b is negative and 0 if a − b is zero or positive
• This desired result corresponds exactly to the sign bit after performing a − b: 1 means negative and 0 means zero or positive
• So, we need only connect the sign bit (adder output of the most significant bit) to the Less input of the least significant bit to get slt.
• This needs a modified 1-bit ALU at the most significant bit

A 64-bit ALU constructed from 64 1-bit ALUs (notice the ALU for the most significant bit)

24
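
The sign-bit idea can be sketched in Python as follows. The function name `slt` and the use of Python integers as 64-bit patterns are assumptions for illustration only.

```python
BITS = 64
MASK = (1 << BITS) - 1

def slt(a, b):
    """Return 1 if a < b, using only the sign bit of a - b."""
    a &= MASK
    b &= MASK
    diff = (a + (~b & MASK) + 1) & MASK   # a - b via inverted b and CarryIn = 1
    sign_bit = diff >> (BITS - 1)         # adder output of the most significant bit
    return sign_bit                       # fed to the Less input of bit 0
```

Note that this plain sign-bit test ignores overflow in a − b: in this sketch `slt(-2**63, 1)` returns 0 even though the first operand is smaller.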
Arithmetic Logic Unit (ALU)

Most significant bit

The 1-bit ALU that performs AND, OR, addition, subtraction (through addition) and NOR operation.

Support for slt operation.
• We need to connect the sign bit (adder output of the most significant bit) to the Less input of the least significant bit to get slt.
• This needs a modified 1-bit ALU at the most significant bit (addition of the Set output)

25
Arithmetic Logic Unit (ALU)

Support for beq operation.

• beq operation branches if two registers are equal.

• The easiest way to test equality with the ALU is to subtract b from a and then test to see if the result is 0.

• To test if the result is 0, the simplest way is to OR all the outputs together and then send that signal through an inverter.
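
A minimal Python sketch of the zero test, with hypothetical names `zero_flag` and `registers_equal`, treating values as 64-bit patterns.

```python
BITS = 64
MASK = (1 << BITS) - 1

def zero_flag(result):
    """OR all 64 result bits together, then invert."""
    any_bit_set = 0
    for i in range(BITS):
        any_bit_set |= (result >> i) & 1
    return 1 - any_bit_set

def registers_equal(a, b):
    """beq-style test: subtract b from a and check the Zero output."""
    return zero_flag((a - b) & MASK)
```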

27
Arithmetic Logic Unit (ALU)

Simplifying Control

• Notice that every time we want the ALU to subtract, we set both CarryIn and Binvert to 1.

• For adds or logical operations, we want both control lines to be 0.

• We can therefore simplify control of the ALU by combining the CarryIn and Binvert to a single control line called Bnegate.

28
Arithmetic Logic Unit (ALU)

ALU Control

• We can think of the combination of the 1-bit Ainvert line, the 1-bit Bnegate line, and the 2-bit Operation lines as 4-bit control lines for the ALU, telling it to perform add, subtract, AND, OR, NOR, or set less than.
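
A behavioural Python sketch of a 64-bit ALU driven by these four control lines. It assumes the bit order Ainvert, Bnegate, Operation[1:0] and the encoding used in the Patterson and Hennessy textbook (0000 AND, 0001 OR, 0010 add, 0110 subtract, 0111 set less than, 1100 NOR); the slides do not list the values, so treat them as an assumption. The function name `alu` is hypothetical.

```python
BITS = 64
MASK = (1 << BITS) - 1

def alu(a, b, control):
    """Return (result, zero) for a 4-bit control value."""
    ainvert = (control >> 3) & 1
    bnegate = (control >> 2) & 1
    operation = control & 0b11
    a_in = (~a & MASK) if ainvert else (a & MASK)
    b_in = (~b & MASK) if bnegate else (b & MASK)
    total = (a_in + b_in + bnegate) & MASK    # Bnegate also drives CarryIn
    if operation == 0b00:
        result = a_in & b_in                  # AND (NOR when both inputs inverted)
    elif operation == 0b01:
        result = a_in | b_in                  # OR
    elif operation == 0b10:
        result = total                        # add or subtract
    else:
        result = total >> (BITS - 1)          # slt: sign bit of a - b
    return result, int(result == 0)
```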

30
Arithmetic Logic Unit (ALU)

Symbol for a Complete ALU

31
Processor Implementation (Chapter 4)

• Classical performance equation: CPU Time = Instruction Count × CPI × Clock Cycle Time

• Instruction Count
– Determined by Compiler and Instruction Set Architecture
• CPI and Clock cycle time
– Determined by Processor Implementation

– We will examine two RISC-V processor implementations
• A simplified version
• A more realistic pipelined version

33
RISC-V Processor Implementation: Simplified Version

• Supports a simplified subset of RISC-V instructions


– Memory reference: ld, sd
– Arithmetic/logical: add, sub, and, or
– Control transfer: beq

34
RISC-V Processor Implementation: Overview

35
RISC-V Processor Implementation: Instruction Execution

1. Program Counter (PC) → Instruction memory
– fetch instruction

2. Register numbers → register file
– read registers

3. Depending on instruction class, use ALU to calculate
– Arithmetic result
– Memory address for load/store
– Branch comparison

4. Access data memory for load/store

5. PC ← target address or PC + 4

44
RISC-V Processor Implementation: Overview

45
RISC-V Processor Implementation: Multiplexers

46
RISC-V Processor Implementation: Control

47
RISC-V Processor Implementation: Overview

• Datapath Design

• Control Lines and Control Unit

59
Building a Datapath

• Datapath
– Elements that process data and addresses in the CPU
– Examples: Registers, ALUs, mux’s, memories, …

• We will build a RISC-V datapath incrementally

60
Building a Datapath: Instruction Fetch

61
RISC-V R-format Instructions

funct7   rs2      rs1      funct3   rd       opcode
7 bits   5 bits   5 bits   3 bits   5 bits   7 bits

• Instruction fields
– opcode: operation code
– rd: destination register number
– funct3: 3-bit function code (additional opcode)
– rs1: the first source register number
– rs2: the second source register number
– funct7: 7-bit function code (additional opcode)
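
A minimal Python sketch that splits a 32-bit R-format instruction word into these fields, with opcode in the least significant bits. The function name `decode_r_type` is hypothetical.

```python
def decode_r_type(instr):
    """Extract the R-format fields from a 32-bit instruction word."""
    return {
        "opcode": instr & 0x7F,          # bits 6:0
        "rd":     (instr >> 7) & 0x1F,   # bits 11:7
        "funct3": (instr >> 12) & 0x7,   # bits 14:12
        "rs1":    (instr >> 15) & 0x1F,  # bits 19:15
        "rs2":    (instr >> 20) & 0x1F,  # bits 24:20
        "funct7": (instr >> 25) & 0x7F,  # bits 31:25
    }

# 0x002081B3 is the encoding of: add x3, x1, x2
fields = decode_r_type(0x002081B3)
```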

62
Building a Datapath: R-Format Instructions

• Read two register operands


• Perform arithmetic/logical operation
• Write register result

63
RISC-V I-format Load Instruction

immediate   rs1      funct3   rd       opcode
12 bits     5 bits   3 bits   5 bits   7 bits

• Instruction fields
– opcode: operation code
– rd: destination register number
– funct3: 3-bit function code (additional opcode)
– rs1: source or base address register number
– Immediate: constant operand, or offset added to base address (2’s
complement, sign extended)

64
RISC-V S-format Store Instructions

imm[11:5]   rs2      rs1      funct3   imm[4:0]   opcode
7 bits      5 bits   5 bits   3 bits   5 bits     7 bits

• Instruction fields
– opcode: operation code
– funct3: 3-bit function code (additional opcode)
– rs1: base address register number
– rs2: source operand register number
– immediate: offset added to base address (split so that the rs1 and rs2 fields always stay in the same place across the various instruction formats)
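
A short Python sketch of how the split immediate can be reassembled and sign-extended. The function name `s_type_immediate` is hypothetical.

```python
def s_type_immediate(instr):
    """Rebuild the 12-bit S-format immediate and sign-extend it."""
    upper = (instr >> 25) & 0x7F       # imm[11:5]
    lower = (instr >> 7) & 0x1F        # imm[4:0]
    imm = (upper << 5) | lower
    # Bit 11 is the sign bit of the 12-bit two's complement value.
    return imm - 0x1000 if imm & 0x800 else imm
```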

65
Building a Datapath: Load-Store Instructions

• Read register operands

• Calculate address using 12-bit offset


– Use ALU, but sign-extend offset

• Load: Read memory and update register

• Store: Write register value to memory

66
Building a Datapath: Composing the Elements

• R-Type + Load-Store Datapath

67
Conditional Branches

• Branch to a labeled instruction if a condition is true


– Otherwise, continue sequentially

• beq rs1, rs2, L1


– if (rs1 == rs2) branch to instruction labeled L1

• bne rs1, rs2, L1


– if (rs1 != rs2) branch to instruction labeled L1

68
Branch If Equal (beq)

• beq rs1, rs2, L1


– if (rs1 == rs2) branch to instruction labeled L1

– beq rs1, rs2, L1 → beq rs1, rs2, offset

– To calculate the Branch Target Address, the PC acts as the base address while a 12-bit offset field is available in the instruction.

– The offset field is left-shifted by 1 bit before being added to the PC to calculate the branch target address.

– Why? The offset counts 2-byte units (halfwords) rather than bytes. If we restrict ourselves to 32-bit instructions (as in this course), every target is a multiple of 4 bytes and a shift by 2 would be enough. However, the shift by 1 is there to support the extension to the RISC-V ISA that contains 16-bit compressed instructions.
69
Branch If Equal (beq)

beq rs1, rs2, offset

• Read register operands

• Compare operands
– Use ALU, subtract and check Zero output

• Calculate target address
– Sign-extend displacement
– Shift left 1 place (halfword displacement)
– Add to PC value
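
The target address calculation can be sketched in Python as below. The sketch treats the 12-bit offset field as one already assembled value and ignores how its bits are arranged inside the instruction word; `sign_extend` and `branch_target` are hypothetical names.

```python
def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as two's complement."""
    sign = 1 << (bits - 1)
    return (value & (sign - 1)) - (value & sign)

def branch_target(pc, offset_field):
    """PC + (sign-extended 12-bit offset shifted left by 1)."""
    return pc + (sign_extend(offset_field, 12) << 1)
```

An offset field of 4 therefore moves the PC forward by 8 bytes, that is, two 32-bit instructions.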
70
Building a Datapath: Composing the Elements

• Full Datapath

71
RISC-V Processor Implementation: Overview

• Datapath Design (Continued)

• Control Lines and Control Unit

72
ALU Usage for Different Instructions

• Load-Store Instructions
– ALU used for addition

• Branch If Equal (beq) Instruction


– ALU used for subtraction

• R-type Instruction
– ALU operation depends on the funct7 and funct3 fields

76
Generating the ALU Control Input

• We can generate the 4-bit ALU control input using a small control unit (ALUControl)

• Inputs to ALU Control
– funct7 and funct3 fields of the instruction and a 2-bit control field, which we call ALUOp

• Output of ALU Control
– 4-bit ALU control signal
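
A minimal Python sketch of such a control unit. The ALUOp encoding (00 for load/store, 01 for beq, 10 for R-type) and the 4-bit output values follow the Patterson and Hennessy textbook convention and are assumptions here, since the slide text does not list them; the function name `alu_control` is hypothetical.

```python
ADD, SUB, AND, OR = 0b0010, 0b0110, 0b0000, 0b0001

# (funct7, funct3) pairs of the R-type instructions in the simplified subset
R_TYPE_TABLE = {
    (0b0000000, 0b000): ADD,   # add
    (0b0100000, 0b000): SUB,   # sub
    (0b0000000, 0b111): AND,   # and
    (0b0000000, 0b110): OR,    # or
}

def alu_control(alu_op, funct7=0, funct3=0):
    """Map ALUOp plus the funct fields to the 4-bit ALU control signal."""
    if alu_op == 0b00:         # ld / sd: address calculation
        return ADD
    if alu_op == 0b01:         # beq: compare by subtracting
        return SUB
    if alu_op == 0b10:         # R-type: decided by the funct fields
        return R_TYPE_TABLE[(funct7, funct3)]
    raise ValueError("unsupported ALUOp")
```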

77
Datapath with Control

78
Generating the ALU Control Input
Truth Table

80
Control Signals Other than ALU

81
Main Control Unit

82
Datapath with Control: R-Type Instruction

83
Datapath with Control: Load Instruction

84
Datapath with Control: Beq Instruction

85
Implementation of Main Control Unit

• Control signals are derived from binary encoded instructions.

• The Main Control Unit can set all control signals, except PCSrc, based solely on the opcode field of the instruction.

• To generate the PCSrc signal, the Branch signal from the Main Control Unit is ANDed with the Zero signal from the ALU.

86
Implementation of Main Control Unit

Truth Table
• Control signals are derived from binary encoded instructions.

• The Main Control Unit can set all control signals, except PCSrc, based solely on the opcode field of the instruction.

• To generate the PCSrc signal, the Branch signal from the Main Control Unit is ANDed with the Zero signal from the ALU.
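
A small Python sketch of how PCSrc selects the next PC value. The function name `next_pc` and its parameters are hypothetical.

```python
def next_pc(pc, branch, zero, branch_target):
    """Select the next PC: PCSrc = Branch AND Zero."""
    pc_src = branch & zero
    # PCSrc = 1 takes the branch target, otherwise fall through to PC + 4.
    return branch_target if pc_src else pc + 4
```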

87
Datapath with Control

88
RISC-V Instruction Execution Steps

• RISC-V instructions classically take five steps:

1. IF: Fetch instruction from memory.
2. ID: Decode instruction and read registers.
3. EX: Execute the operation or calculate an address.
4. MEM: Access an operand in data memory (if necessary).
5. WB: Write the result back into a register (if necessary).

89
Steps in RISC-V Instruction Execution

90
Combinational Logic Elements: Example

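[Figure: four combinational elements: a logic gate (inputs A, B; output Y), an adder (inputs A, B; output Y), a multiplexer (inputs I0, I1; select S; output Y), and an ALU (inputs A, B; function select F; output Y)]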

92
Sequential Logic (State) Elements: Example

• Register: stores data in a circuit


– Uses a clock signal to determine when to update the stored value
– Edge-triggered: update when Clk changes from 0 to 1

[Figure: register with data input D, clock input Clk, and output Q, with a timing diagram of Clk, D, and Q]

93
Sequential Logic (State) Elements: Example

• Register with Write Control:


– Only updates on clock edge when write control input is 1
– Used when stored value is required later

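[Figure: register with data input D, clock input Clk, Write control input, and output Q, with a timing diagram of Clk, Write, D, and Q]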

94
Combination of Combinational and Sequential Logic Elements

• Because only state elements can store a data value, any collection of
combinational logic must have its inputs come from a set of state
elements and its outputs written into a set of state elements

• When two state elements surround a block of combinational logic


– All signals must propagate from state element 1, through the combinational logic, and
to state element 2 in the time of one clock cycle.
– The time necessary for the signals to reach state element 2 defines the length of the
clock cycle

95
Datapath with Control

96
Steps in RISC-V Instruction Execution

97
Steps in RISC-V Instruction Execution

ld x1, 100(x4)
ld x2, 200(x4)
ld x3, 300(x4)

98
Steps in RISC-V Instruction Execution

ld x1, 100(x4)
ld x2, 200(x4)
ld x3, 300(x4)

Clock Cycle: 1

99
Steps in RISC-V Instruction Execution

ld x1, 100(x4)
ld x2, 200(x4)
ld x3, 300(x4)

Clock Cycle: 2

100


Steps in RISC-V Instruction Execution

ld x1, 100(x4)
ld x2, 200(x4)
ld x3, 300(x4)

Clock Cycle: 3

101


Performance Issues of Single Cycle Processor Design

• CPI is 1.

• However, clock cycle time is determined by the longest possible path/delay in the data path.
– Critical Path: Load Instruction
– Load instruction uses five functional units in series: Instruction memory → register file → ALU → data memory → register file

103
Steps in RISC-V Instruction Execution

[Figure: step delays: IF 200 ps, ID 100 ps, EX 200 ps, MEM 200 ps, WB 100 ps]

104
RISC-V Performance: Single Cycle

• Assume that the operation times for the major functional units of the data path are:
– 100 ps for register read or write
– 200 ps for memory access for instructions or data
– 200 ps for ALU operation

Single-Cycle Implementation
• Clock cycle time must support the longest instruction (i.e. ld)
• Clock Cycle Time: 200 + 100 + 200 + 200 + 100 = 800 ps
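
The same arithmetic as a Python sketch. The unit delays come from the slide; the list of units each instruction class uses is an assumption based on the execution steps described earlier, and all names are hypothetical.

```python
UNIT_DELAY_PS = {
    "instruction_memory": 200,
    "register_read": 100,
    "alu": 200,
    "data_memory": 200,
    "register_write": 100,
}

# Assumed functional units used in series by each instruction class.
UNITS_USED = {
    "ld": ["instruction_memory", "register_read", "alu", "data_memory", "register_write"],
    "sd": ["instruction_memory", "register_read", "alu", "data_memory"],
    "R-format": ["instruction_memory", "register_read", "alu", "register_write"],
    "beq": ["instruction_memory", "register_read", "alu"],
}

def instruction_time(name):
    """Total delay in ps along the path of one instruction class."""
    return sum(UNIT_DELAY_PS[unit] for unit in UNITS_USED[name])

def single_cycle_clock():
    """The single-cycle clock must cover the slowest instruction."""
    return max(instruction_time(name) for name in UNITS_USED)
```

Every instruction, even a short one such as beq, then takes the full clock cycle set by ld.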

105
Steps in RISC-V Instruction Execution

[Figure: step delays: IF 200 ps, ID 100 ps, EX 200 ps, MEM 200 ps, WB 100 ps; Clock Cycle = 800 ps]

106
RISC-V Performance: Single Cycle

Single-cycle

107
Performance Issues of Single Cycle Processor Design

• CPI is 1.

• However, clock cycle time is determined by the longest possible path/delay in the data path.
– Critical Path: Load Instruction
– Load instruction uses five functional units in series: Instruction memory → register file → ALU → data memory → register file

• Although CPI is 1, the overall performance of a single cycle processor implementation is poor because the clock cycle is too long.

• We can improve performance by Pipelining

109
