Intro To Prog.-3
The Von Neumann Architecture, named after the mathematician John von Neumann, is the foundation
for most modern computers. It describes a system where a Central Processing Unit (CPU), Memory,
and Input/Output (I/O) devices are interconnected, and the same memory is used for both data and
program instructions. Let us break down each component and their roles, including the important
registers that handle data flow within the CPU.
1. Memory (RAM)
2. Central Processing Unit (CPU)
3. Input/Output (I/O)
1.1 Memory (RAM)
The memory is used to store both program instructions and data. It can be thought of as a collection of
cells where each cell has a unique address. Memory is essential for holding the data that the CPU
processes, as well as the instructions telling the CPU what to do.
● Key Features
○ Stores program instructions and data in the same space.
○ Data is accessed using addresses.
○ Works as short-term storage (RAM).
The CPU is the brain of the computer that executes instructions and processes data. It consists of the Control Unit (CU), the Arithmetic Logic Unit (ALU), and a set of registers.
Registers are small, fast storage locations inside the CPU that temporarily hold data and instructions.
They are critical for the CPU’s operation, allowing it to perform tasks like arithmetic operations, data
transfer, and instruction execution quickly, as registers are much faster than accessing data from main
memory (RAM).
1. Small Size- Registers typically store only a small amount of data (e.g., 8, 16, 32, or 64 bits).
2. High Speed- Registers operate at the CPU’s clock speed, making them the fastest memory type
available in a computer system.
3. Dedicated Functions- Different types of registers have specific roles in processing data and
instructions.
Types of Registers
The main registers are the Program Counter (PC), Instruction Register (IR), Memory Address Register (MAR), Memory Data Register (MDR), and Accumulator (ACC); each is described in detail later. During the instruction cycle, registers are used as follows:
1. Fetch- The CPU retrieves an instruction or data from memory and stores it in a register.
2. Execute- The CPU processes data held in the registers (e.g., performing an addition in the
Arithmetic Logic Unit, or ALU).
3. Store- The result of the operation is often stored back in a register for further use or sent to
memory.
The I/O devices are responsible for interacting with the external world. These include devices like
keyboards, monitors, printers, and hard drives. The I/O system allows data to be entered into or extracted
from the computer.
The flow of data and instructions through the system is critical to understanding how the architecture
operates. The Von Neumann Cycle explains the step-by-step process the CPU follows to execute
instructions. Fig. 2 illustrates this cycle.
1. Fetch
○ The PC holds the address of the next instruction.
○ The CU sends this address to the MAR, which accesses the memory location.
○ The instruction at that address is retrieved by the MDR and then stored in the IR.
2. Decode
○ The CU decodes the instruction in the IR to determine what operation needs to be
performed.
○ It identifies the type of operation (e.g., addition, data transfer) and the necessary operands
(data) for execution.
3. Execute
○ The instruction is executed by the ALU, performing the required operation (e.g.,
arithmetic, logical operations).
○ Any result from this operation is stored in the Accumulator or written back to memory.
4. Store (optional)
○ If the instruction involves writing data to memory, the MAR will hold the address, and
the MDR will store the data that needs to be written.
5. Update
○ The PC is updated to point to the next instruction in sequence, and the cycle repeats.
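The cycle can be made concrete with a small simulation. The following is a toy sketch in C, not a model of any real processor: the one-word instruction format (opcode in the high byte, operand address in the low byte) and the four opcodes are invented for illustration, but the PC, MAR, MDR, IR, and ACC play exactly the roles listed above.

#include <stdio.h>

/* Toy Von Neumann machine: one memory array holds both the program and its
   data. The instruction encoding below is invented for this sketch. */
enum { HALT = 0, LOAD = 1, ADD = 2, STORE = 3 };

int main(void) {
    int memory[32] = {0};
    /* Program: ACC = mem[20] + mem[21]; mem[22] = ACC; halt. */
    memory[0] = (LOAD  << 8) | 20;
    memory[1] = (ADD   << 8) | 21;
    memory[2] = (STORE << 8) | 22;
    memory[3] = (HALT  << 8);
    memory[20] = 7;   /* data */
    memory[21] = 5;

    int PC = 0, IR = 0, ACC = 0, MAR = 0, MDR = 0;

    for (;;) {
        /* Fetch: PC -> MAR -> memory -> MDR -> IR, then advance the PC. */
        MAR = PC;
        MDR = memory[MAR];
        IR  = MDR;
        PC++;

        /* Decode: split the instruction into opcode and operand address. */
        int opcode  = IR >> 8;
        int address = IR & 0xFF;

        /* Execute (and store, when the instruction writes to memory). */
        if (opcode == HALT)  break;
        if (opcode == LOAD)  { MAR = address; MDR = memory[MAR]; ACC = MDR; }
        if (opcode == ADD)   { MAR = address; MDR = memory[MAR]; ACC += MDR; }
        if (opcode == STORE) { MAR = address; MDR = ACC; memory[MAR] = MDR; }
    }

    printf("mem[22] = %d\n", memory[22]);   /* prints 12 */
    return 0;
}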
The Control Unit is the orchestrator of all operations in the CPU. It directs the flow of data between the
CPU, memory, and I/O devices. The CU does not perform calculations but ensures that the right data is in
the right place at the right time.
● Key Functions
○ Fetching the next instruction from memory (using the PC, MAR, and MDR).
○ Decoding the instruction and directing the ALU and registers accordingly.
○ Controlling the timing of operations to ensure that each step of the cycle happens in the
correct order.
A bus is a communication pathway for transferring data and signals between the components of the
computer. There are two main types of buses:
1. Address Bus
○ Carries the address of the memory location where data needs to be read from or written
to.
○ Unidirectional, meaning it only travels from the CPU to memory or I/O devices.
2. Data Bus
○ Transfers the actual data between memory, CPU, and I/O devices.
○ Bidirectional, meaning data can travel both to and from the CPU.
Data flow- The arrows represent the flow of data between the CPU, memory, and I/O devices via the
buses.
Control Unit- Directs the operations, fetching instructions, and managing the flow of data.
Fig. 2 : Work-flow of the Von Neumann Cycle
1. Fetching an instruction from memory using the PC, MAR, and MDR.
2. Decoding the instruction in the IR.
3. Executing the instruction using the ALU and storing the result in the ACC or back in memory.
4. The PC is updated to fetch the next instruction, repeating the cycle.
The Von Neumann Architecture is fundamental to modern computing systems. Understanding how
memory, the CPU, registers, and buses work together helps in mastering not only computer architecture
but also low-level programming in languages like C. The diagram below illustrates the overall framework.
+---------------------+
| Input |
+----------+----------+
|
v
+--------------------------------------------+
| Central Processing Unit (CPU) |
| +-----------------------------------+ |
| | Control Unit (CU) | |
| +-----------------------------------+ |
| | Program Counter (PC) | |
| | Instruction Register (IR) | |
| | Memory Address Register (MAR) | |
| | Memory Data Register (MDR) | |
| | Accumulator (ACC) | |
| +-----------------------------------+ |
| Arithmetic Logic Unit (ALU) |
+--------------------------------------------+
|
v
+-------------------------+
| Memory |
+-------------------------+
|
v
+----------------------+
| Output |
+----------------------+
The Memory Address Register (MAR) is critical in the Von Neumann architecture, as it—
1. Holds the memory address of the data or instruction that the CPU needs to access. This could be
an address to fetch data from, store data to, or fetch an instruction from.
2. Communicates with the memory unit by sending the address of the required data/instruction.
3. Works together with the Memory Data Register (MDR) to complete memory read and write
operations–
○ The MAR specifies where in memory to look or place data.
○ The MDR contains the actual data to be fetched or stored at that memory location.
In terms of the fetch-decode-execute cycle, the MAR plays a key role during the fetch and store stages:
● Fetch– The MAR holds the address of the instruction to be fetched from memory.
● Store– During data storage, the MAR holds the address in memory where data from the
Accumulator (ACC) or another register should be written.
8.2 Summary
● MDR holds data being transferred between memory and the CPU.
● It works alongside the MAR, where MAR holds the address, and MDR holds the data for that
address.
The Instruction Register (IR) is a special register in the CPU that holds the current instruction being
executed. Once an instruction is fetched from memory, it is stored in the IR for decoding and execution by
the control unit.
1. Instruction Storage
○ The IR temporarily holds the instruction that has been fetched from memory. This allows
the control unit to interpret the instruction and issue the necessary control signals for
execution.
2. Instruction Decoding
○ The instruction in the IR is decoded by the control unit to determine what operation the
CPU should perform (e.g., addition, subtraction, memory read/write).
3. Control Signal Generation
○ Based on the instruction in the IR, the control unit generates control signals to direct
other parts of the CPU (such as the Arithmetic Logic Unit or memory) on how to execute
the instruction.
The Instruction Register holds and decodes the current instruction, playing a key role in the
fetch-decode-execute cycle of the CPU. It ensures the CPU knows which operation to perform next.
The Accumulator (ACC) is a special-purpose register in the CPU used to store intermediate results of
arithmetic and logic operations performed by the Arithmetic Logic Unit (ALU).
1. Temporary Storage: It holds the result of an operation temporarily before it is either used in
further calculations or stored in memory.
2. ALU Operations: The ALU typically performs operations like addition, subtraction, and bitwise
operations using the value stored in the accumulator.
3. Efficient Data Processing: Using the accumulator allows the CPU to process data more quickly,
reducing the need to repeatedly store and fetch data from memory.
Example
When the CPU adds two numbers, one number is often placed in the accumulator, and the result of the
addition is stored back in the accumulator.
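As a rough software analogy (the local variable acc below merely stands in for the ACC register; whether the compiler actually keeps it in a CPU register is up to the compiler), the accumulator pattern looks like this:

#include <stdio.h>

/* Accumulator pattern: keep the running result in one fast location instead
   of writing every partial sum back to memory. */
int main(void) {
    int data[4] = {3, 1, 4, 1};
    int acc = 0;                    /* stands in for the ACC register */
    for (int i = 0; i < 4; i++)
        acc += data[i];             /* the ALU adds into the accumulator */
    printf("sum = %d\n", acc);      /* the result leaves the "accumulator" only once */
    return 0;
}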
In summary, the ACC is crucial for optimizing CPU performance by minimizing data transfer between
the CPU and memory during computation.
The Arithmetic Logic Unit (ALU) is a critical component of the Central Processing Unit (CPU). It is
responsible for performing all arithmetic and logical operations in a computer. Here is a breakdown of its
functions—
1. Arithmetic Operations
○ Addition (+), Subtraction (-)
○ Multiplication (*), Division (/)
○ It also handles operations like incrementing and decrementing.
2. Logical Operations
○ AND, OR, NOT, XOR– These are bitwise operations.
○ Comparison operations like equal to (==), greater than (>), less than (<).
3. Shifting Operations
○ Shift Left and Shift Right: Used to move bits in a number left or right, which can also be
used for multiplication / division by powers of 2.
4. Flags
○ The ALU often sets or clears flags (special bits) in the status register based on the result
of an operation.
○ The flag register is a special register in the CPU that contains status flags set by the
ALU based on the outcome of operations.
○ Common flags include:
■ Zero Flag (Z)- Set when the result is zero.
■ Carry Flag (C)- Set when an operation results in a carry out of the most
significant bit.
■ Overflow Flag (O)- Set when an arithmetic overflow occurs.
■ Sign Flag (S)- Set if the result is negative.
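To see how these flags are derived, here is a small C sketch that recomputes Z, C, S, and V for an 8-bit addition. The function name add8_flags and the bit manipulations are illustrative only, not a real ALU implementation.

#include <stdio.h>
#include <stdint.h>

/* Recompute the common ALU flags for an 8-bit addition. */
static void add8_flags(uint8_t a, uint8_t b) {
    uint16_t wide = (uint16_t)a + b;          /* keep the carry-out bit */
    uint8_t  r    = (uint8_t)wide;

    int Z = (r == 0);                         /* Zero flag */
    int C = (wide > 0xFF);                    /* Carry flag */
    int S = (r & 0x80) != 0;                  /* Sign flag */
    /* Overflow: the operands share a sign that the result does not have. */
    int V = (~(a ^ b) & (a ^ r) & 0x80) != 0;

    printf("%3u + %3u = %3u  Z=%d C=%d S=%d V=%d\n", a, b, r, Z, C, S, V);
}

int main(void) {
    add8_flags(200, 100);  /* unsigned wrap: carry set, result 44 */
    add8_flags(100, 100);  /* signed overflow: two positives give a negative */
    add8_flags(128, 128);  /* result 0: zero, carry and overflow flags set */
    return 0;
}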
11.2 Role of ALU in the CPU
The CPU instruction cycle (also known as the fetch-decode-execute cycle) is the process by which a
CPU retrieves, interprets, and executes instructions from memory. It consists of three main steps, but
sometimes includes a fourth step called store. Here is an overview of each stage—
● The Program Counter (PC) contains the address of the next instruction.
● The CPU sends this address to memory using the MAR (Memory Address Register).
● The instruction at that memory location is retrieved (fetched) and loaded into the Instruction
Register (IR).
● The PC is incremented to point to the next instruction.
● In the decode stage, the Control Unit interprets the instruction in the IR to determine the operation and its operands.
● In the execute stage, the ALU or other units carry out the operation.
● In the store stage, the result of the instruction (if any) is written back to memory or a register.
● This final stage is not always considered separate but is part of the overall execution.
13. Diagram of the CPU Instruction Cycle
+-----------------------------------+
| Start of Cycle |
+-----------------------------------+
|
v
+-----------------------+
| 1. Fetch Instruction |
+-----------------------+
|
v
+-----------------------+
| 2. Decode Instruction |
+-----------------------+
|
v
+-----------------------+
|3. Execute Instruction |
+-----------------------+
|
v
+-----------------------+
|4. Store (if necessary)|
+-----------------------+
|
v
+-----------------------+
| PC points to next |
| instruction(PC++) |
+-----------------------+
|
v
+-----------------------------------+
| Next Instruction |
+-----------------------------------+
The Control Unit (CU) is a fundamental part of the Central Processing Unit (CPU) that manages the
execution of instructions. It coordinates the operations of the CPU, directing data flow between the CPU
and other components like memory, the Arithmetic Logic Unit (ALU), and input/output devices. The CU
ensures that the CPU carries out instructions in the correct sequence and timing.
1. Instruction Fetching
○ The Control Unit retrieves (fetches) instructions from the memory, typically using the
Program Counter (PC) to get the address of the next instruction to execute.
2. Instruction Decoding
○ After fetching, the CU decodes the instruction to understand what operation needs to be
performed (e.g., arithmetic, logic, control operations).
3. Generating Control Signals
○ The Control Unit generates a series of control signals that direct various parts of the CPU
and peripheral devices. These signals dictate the actions to be performed at each step of
the instruction cycle, for example:
■ Directing the ALU for arithmetic/logic operations.
■ Telling memory whether to read or write data.
■ Managing data transfers between registers and memory.
■ Controlling the data bus and address bus.
4. Coordinating Data Flow
○ The Control Unit ensures data is transferred between memory, registers, and the ALU in
the correct order. For example:
■ It sends signals to fetch operands from registers or memory before passing them
to the ALU for processing.
■ After execution, it ensures the result is stored back into a register or memory.
5. Instruction Sequencing
○ The Control Unit manages the sequence of execution for multiple instructions by
incrementing the Program Counter (PC) and handling jumps, loops, or conditional
instructions.
Timing signals help synchronize the operations in the CPU, ensuring that each action happens in the
correct sequence and at the right moment. This is controlled by the system clock.
The instruction cycle (also known as the fetch-decode-execute cycle) involves the following steps,
coordinated by the control unit:
1. Fetch
○ The Control Unit retrieves the next instruction from memory (pointed to by the Program
Counter).
○ The address of the instruction is loaded into the Memory Address Register (MAR), and
the instruction itself is fetched into the Instruction Register (IR).
2. Decode
○ The Control Unit decodes the instruction to determine what operation is required.
○ The necessary control signals are generated based on the decoded instruction.
3. Execute
○ The CU directs the ALU or other components to execute the instruction. This might
involve reading/writing from memory, performing arithmetic, or branching to a new
address.
○ Any result is placed in the appropriate register or sent to memory.
4. Store
○ If necessary, the result of the operation is stored in memory or a register.
○ The control unit updates the Program Counter (PC) to point to the next instruction and
repeats the cycle.
For a simple addition operation, like A = B + C, the control unit performs the following—
1. Fetch
○ The instruction ADD A, B, C is fetched from memory.
2. Decode
○ The CU decodes the instruction to understand it involves adding the values in registers B
and C, and storing the result in A.
3. Execute
○ The CU sends control signals to the ALU to add the values in B and C.
○ It also ensures the data is routed to the ALU through the data bus.
4. Store
○ The result of the addition is stored in register A.
○ The CU updates the Program Counter to point to the next instruction in memory.
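A minimal C sketch of this decode-and-execute flow is given below. The opcode value, the packed instruction encoding, and the register-file layout are all invented for the example; a real control unit works with hardware signals, not a switch statement.

#include <stdio.h>

/* Decode and execute a three-register ADD, control-unit style. */
enum { OP_ADD = 1 };

int main(void) {
    int reg[4] = {0, 0, 6, 7};            /* reg[1]=A, reg[2]=B, reg[3]=C */
    /* "ADD A, B, C" packed as opcode | dest | src1 | src2 */
    int instruction = (OP_ADD << 12) | (1 << 8) | (2 << 4) | 3;

    /* Decode: extract the operation and the operand registers. */
    int opcode = (instruction >> 12) & 0xF;
    int dest   = (instruction >> 8)  & 0xF;
    int src1   = (instruction >> 4)  & 0xF;
    int src2   =  instruction        & 0xF;

    /* Execute + store: route the operands to the ALU and write the result back. */
    switch (opcode) {
    case OP_ADD:
        reg[dest] = reg[src1] + reg[src2];
        break;
    }
    printf("A = %d\n", reg[dest]);        /* prints 13 */
    return 0;
}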
Overall gist
● The Control Unit (CU) is the "brain" of the CPU that orchestrates the execution of instructions
by controlling data flow, generating control signals, and managing the instruction cycle.
● It is responsible for fetching, decoding, and executing instructions, ensuring that the CPU
components work together in harmony.
● By coordinating the ALU, registers, memory, and I/O devices, the control unit ensures the
computer can execute programs correctly and efficiently.
The Program Counter (PC) is a crucial register in the CPU that keeps track of the next instruction to be
executed in a program. It stores the memory address of the next instruction that the processor should
fetch, decode, and execute.
1. Instruction Sequencing
○ The PC ensures the CPU executes instructions in the correct sequence by pointing to the
location of the next instruction in memory.
2. Automatic Increment
○ After fetching an instruction, the PC is automatically incremented to point to the address
of the following instruction, unless a jump or branch occurs.
3. Handling Jumps/Branches
○ In case of jumps, branches, or function calls, the PC is updated with the new target
address to change the flow of control.
The Program Counter directs the flow of execution by tracking the address of the next instruction,
ensuring the CPU executes programs in the correct order.
A CPU interrupt is a signal that temporarily suspends the currently running program so the processor can handle an urgent event. The main elements of interrupt handling are:
1. Interrupt Signal- The signal that triggers the interrupt, which can come from hardware devices
(e.g., keyboard input, mouse movement) or software (e.g., system calls).
2. Interrupt Service Routine (ISR)- A special function that is executed when an interrupt occurs.
The CPU transfers control to the ISR to handle the specific event that triggered the interrupt.
3. Interrupt Vector Table- A table of memory addresses pointing to the ISRs. The CPU uses this
table to find the appropriate ISR based on the interrupt signal.
4. Saving the Context- Before executing the ISR, the CPU saves the current state (context) of the
program, such as register values and the Program Counter (PC), so it can resume normal
execution after handling the interrupt.
1. Hardware Interrupts
○ Triggered by external hardware devices, such as a keyboard, mouse, or network card.
○ Examples: A key press on a keyboard or a signal from a timer.
2. Software Interrupts
○ Triggered by software, often through a system call or instruction. It is used for requesting
system services.
○ Example: A system call requesting an I/O operation from the operating system.
3. Exceptions (Traps)
○ These are internally generated by the CPU when an error or exceptional condition occurs
(e.g., illegal operation, divide by zero).
CPU interrupts allow the processor to respond immediately to critical tasks, making multitasking and
handling I/O operations more efficient. They ensure the CPU can prioritize high-importance tasks, such as
responding to hardware or system requests, without losing track of ongoing processes.
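As a user-space analogy only (not real hardware interrupt handling), a POSIX signal handler behaves much like an ISR: it is registered in advance, runs when the event arrives, and the interrupted program resumes afterwards. The sketch below assumes a POSIX system.

#include <signal.h>
#include <stdio.h>
#include <unistd.h>

/* The handler plays the role of an ISR: do the minimum and return quickly. */
static volatile sig_atomic_t got_signal = 0;

static void handler(int signum) {
    (void)signum;
    got_signal = 1;                     /* record the event for the main program */
}

int main(void) {
    signal(SIGINT, handler);            /* register the handler, like filling a vector entry */
    printf("Press Ctrl+C to trigger the handler...\n");
    while (!got_signal)                 /* the "main program" keeps running */
        pause();                        /* sleep until a signal arrives */
    printf("Interrupt handled; resuming normal flow.\n");
    return 0;
}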
22. Address and Data Bus: Their Substantial Role in a Computer System
In a computer system, the address bus and data bus are critical components that facilitate
communication between the CPU and other hardware components like memory and I/O devices. They
work together to ensure that data is correctly transferred and processed by the CPU, contributing to the
seamless execution of instructions.
What is a Bus?
A bus is a collection of wires or communication lines that allow data to be transferred between different
parts of a computer. It serves as the "communication highway" for signals and data. There are several
types of buses in a computer system, but the address bus and data bus are two key ones that enable CPU
communication with memory and peripheral devices.
The address bus is responsible for carrying memory addresses from the CPU to other components like
memory (RAM) or I/O devices. The CPU uses the address bus to specify the location where data should
be read from or written to.
● Unidirectional– The address bus typically flows in one direction—from the CPU to memory or
I/O devices. The CPU sends the address to identify the location of data, but the data does not
travel back along the address bus.
● Size- The size of the address bus (in bits) determines how much memory the system can address.
For example-
○ An 8-bit address bus can address 2^8 = 256 memory locations.
○ A 16-bit address bus can address 2^16 = 65,536 memory locations (64KB).
○ A 32-bit address bus can address 2^32 = 4,294,967,296 memory locations (4GB).
● Role in Memory Access–
○ When the CPU wants to access data, it places the memory address of the required data
onto the address bus. The memory unit or I/O device then listens to the bus and responds
to that specific address by reading from or writing to it.
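The addressable-memory figures listed above follow from one line of arithmetic: an n-bit address bus can select 2^n distinct locations. A quick C check (byte-addressable memory assumed):

#include <stdio.h>

int main(void) {
    int widths[] = {8, 16, 32};
    for (int i = 0; i < 3; i++) {
        unsigned long long locations = 1ULL << widths[i];
        printf("%2d-bit address bus -> %llu locations\n", widths[i], locations);
    }
    return 0;
}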
The data bus carries the actual data being transferred between the CPU, memory, and I/O devices. Once
the CPU has identified the location of the data (using the address bus), the data bus is used to transfer the
information between the CPU and the specified location.
● Once the CPU sends the address to the memory or I/O device via the address bus, the data bus is
used to either retrieve data from that address or send data to be written to that address.
● For instance, in a read operation, the CPU places an address on the address bus, and the memory
responds by placing the requested data on the data bus, which is then received by the CPU.
● In a write operation, the CPU sends the data it wants to store onto the data bus, and the memory
or I/O device writes this data to the specified location.
The address bus and data bus work in tandem to transfer data between the CPU and other components.
Here's how they collaborate in a typical memory access cycle—
Step-by-Step Process
● Read/Write Signals- Indicate whether the operation is a read (retrieving data) or a write (storing
data).
● Clock Signals- Synchronize the timing of data transfers.
● Enable Signals- Ensure only the correct device is active at any given time to avoid conflicts on
the buses.
Though address and data buses are more of a hardware concept, certain C programs demonstrate the idea
of memory access and data transfer—-
#include <stdio.h>

int main() {
    int data = 10;      // Data stored in memory
    int *ptr;           // Pointer to simulate the address bus

    ptr = &data;        // 'ptr' holds the address of 'data' (like an address bus)

    printf("Address of data: %p\n", (void *)ptr);    // Address bus action
    printf("Value at the address: %d\n", *ptr);      // Data bus action (read data)

    *ptr = 20;          // Data bus action (write data to that address)
    printf("New value at the address: %d\n", *ptr);  // Read back the written value

    return 0;
}
In this example–
● ptr = &data simulates the address bus by holding the address of data.
● *ptr = 20 simulates the data bus, writing the value 20 to the memory location.
● Reading and printing the value via *ptr shows how the data bus retrieves the data stored at the
memory address.
22.6 Address Bus and Data Bus in Modern Computers
In modern computing—
● Address bus width- Modern systems typically have 32-bit or 64-bit address buses, allowing
access to large amounts of memory (4GB for 32-bit, 18.4 exabytes for 64-bit).
● Data bus width- With advances in technology, modern CPUs often have data buses that can
handle 64 bits or even more, improving the speed and efficiency of data transfer.
Summary
● The address bus carries memory addresses from the CPU to memory and I/O devices to specify
where data should be read from or written to.
● The data bus transfers the actual data between the CPU, memory, and I/O devices. It is
bidirectional, allowing data to flow both to and from the CPU.
● Together, the address and data buses enable the smooth transfer of data and instructions in a
computer system, forming the backbone of CPU-memory communication.
Understanding the address bus and data bus is essential to grasp how computers handle data storage and
retrieval, making them crucial elements in both hardware and low-level programming concepts.
The system clock generates a series of regular electrical pulses, known as clock cycles, which act as a
timing signal to synchronize the operations of all components connected to the buses. Each component on
the bus (CPU, memory, I/O devices) operates in sync with the clock. The clock defines when each action
occurs, like reading data or transferring addresses.
● Clock Cycles- Each operation (like fetching, decoding, reading/writing data) typically requires
one or more clock cycles.
● Frequency- The speed of the clock is measured in hertz (Hz), typically megahertz (MHz) or
gigahertz (GHz), and it dictates how fast operations can be synchronized. Higher clock speeds
mean faster synchronization.
A timing diagram helps visualize how synchronization happens. For example, during a read operation–
1. The address bus is loaded with a memory address during a specific clock cycle.
2. The memory responds with the data and places it on the data bus in sync with the clock.
3. The data is transferred to the CPU within the clock cycle window.
● Handshaking- Devices may use a method called handshaking, where one device sends a signal to
indicate readiness (e.g., for a data transfer), and another device responds when it is ready to
proceed. This ensures that data is only transferred when both devices are synchronized.
● Bus Arbitration- In complex systems with multiple devices using the same bus, bus arbitration
ensures that only one device can use the bus at a time. A bus controller or arbiter prioritizes which
device gets to control the bus, based on the clock timing.
● Synchronous Buses- These buses operate strictly according to the system clock. All data
transfers happen at specific clock cycles. This ensures predictable and regular data transfer but
limits flexibility.
● Asynchronous Buses- These buses do not rely on a common clock and instead use control
signals for timing. Each device signals when it is ready for a data transfer, providing more
flexibility but often resulting in slower operations.
Summary
● Synchronization is achieved through the system clock, which ensures that all data transfers on
the buses happen at regular, predictable intervals.
● Control signals generated by the CPU manage when devices are allowed to communicate on the
bus.
● Bus arbitration ensures that multiple devices can share the bus without conflicts.
Bus protocols define the rules that components follow when communicating over a shared bus. Their key aspects include:
1. Bus Arbitration
○ In systems where multiple devices share the same bus, arbitration ensures that only one
device can use the bus at a time. Bus protocols define how a device requests access to the
bus and how the arbiter decides which device gets access.
○ Types
■ Centralized Arbitration- A central controller (bus arbiter) manages bus access.
■ Distributed Arbitration- Devices negotiate among themselves to determine bus
access.
2. Data Transfer
○ Bus protocols define how data is transferred between the components. This includes the
format of the data, how it is placed on the data bus, and how the receiving device knows
that valid data is available.
○ Data Width- Specifies how many bits of data are transferred in one operation (e.g.,
32-bit, 64-bit).
○ Addressing- The protocol outlines how memory addresses are placed on the address bus
and how the target device identifies that the data is meant for it.
3. Handshaking
○ Handshaking is the process by which two devices communicate to ensure that both are
ready for data transfer. It prevents data loss by ensuring that the sender doesn’t transmit
data until the receiver is ready, and vice versa.
○ Common signals include acknowledge (ACK) and ready.
4. Timing
○ Bus protocols define the timing of signals on the bus. This includes when devices can
place data on the bus, how long signals must be held, and when the next operation can
occur.
○ Synchronous Protocols- Timing is controlled by the system clock, and all operations
occur in sync with clock cycles.
○ Asynchronous Protocols- Devices use control signals to coordinate data transfer
independently of the clock.
5. Error Detection
○ Some bus protocols include error-checking mechanisms to detect data transmission
errors. This can include methods like parity bits or checksums to ensure that data is
transmitted accurately.
6. Bus Transactions
○ A bus transaction refers to the complete process of transferring data between devices,
including:
■ Request Phase- A device requests to initiate communication on the bus.
■ Address Phase- The requesting device places the address of the target device or
memory on the address bus.
■ Data Phase- Data is transmitted between the devices.
■ Completion Phase- The transaction is completed, and the bus is released for the
next transaction.
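Item 5 above mentions parity bits as a simple error-detection mechanism. The sketch below shows even parity over a single byte; it illustrates the idea only and is not the scheme used by any particular bus standard.

#include <stdio.h>
#include <stdint.h>

/* Return 1 if the byte contains an odd number of 1 bits. */
static int parity(uint8_t byte) {
    int ones = 0;
    for (int i = 0; i < 8; i++)
        ones += (byte >> i) & 1;
    return ones & 1;
}

int main(void) {
    uint8_t data = 0x5A;               /* 0101 1010 -> four 1 bits */
    int sent_parity = parity(data);    /* sender transmits data + this bit */

    uint8_t received = data ^ 0x04;    /* simulate one flipped bit in transit */
    if (parity(received) != sent_parity)
        printf("parity mismatch: transmission error detected\n");
    else
        printf("data accepted\n");
    return 0;
}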
● USB (Universal Serial Bus) is a widely used serial bus protocol for connecting external devices
(keyboards, mice, storage devices, etc.) to a computer.
○ Data Transfer- USB uses a serial method to send data in packets.
○ Handshaking- It includes a request and acknowledgement phase to ensure that the host
and device are ready for communication.
○ Speed- USB standards define different data transfer rates (e.g., USB 2.0, USB 3.0).
Summary
Bus protocols are essential for managing how data is transmitted over the bus, ensuring synchronization,
efficiency, and error-free communication between the CPU, memory, and I/O devices. Different bus
protocols cater to different requirements, like speed, reliability, and complexity.
Bus arbitration is the process of managing and controlling access to a shared bus in a computer system.
In systems where multiple devices (CPU, memory, I/O devices) share a common bus, bus arbitration
ensures that only one device can use the bus at a time, preventing conflicts.
● In computer systems, multiple devices often need to communicate over the same bus to perform
operations like reading from memory or writing data to an I/O device. Without arbitration,
devices could try to use the bus simultaneously, leading to collisions and incorrect data transfer.
1. Centralized Arbitration
○ A single bus arbiter (controller) is responsible for managing which device can access the
bus. The arbiter receives requests from devices and grants access based on predefined
rules (e.g., priority).
○ Example- In the PCI bus, the arbiter controls which device (e.g., CPU, network card)
can access the bus.
2. Distributed Arbitration
○ In this method, there is no central arbiter. Devices negotiate among themselves to decide
who gets to use the bus. They follow a predefined protocol, often involving priority levels
or a round-robin system.
1. Daisy Chaining
○ Devices are connected in a chain. The highest-priority device is at the start of the chain,
and it gets first access to the bus. If it doesn't need the bus, the request passes down the
chain.
○ Simple but can lead to starvation for lower-priority devices.
2. Polling
○ The arbiter polls each device in sequence to check if it needs to use the bus. The arbiter
then grants the bus to the requesting device.
○ Advantage- Prevents starvation. Disadvantage: Can be slow if many devices are polled
before a request is found.
3. Priority Arbitration
○ Devices are assigned priority levels. When multiple devices request the bus, the arbiter
grants access to the highest-priority device.
○ Advantage- High-priority tasks are processed first.
○ Disadvantage- Lower-priority devices may experience delays.
4. Round-Robin Arbitration
○ The arbiter grants bus access to devices in a circular order. Each device gets its turn,
ensuring fairness among devices.
○ Advantage- Ensures that no device gets unfairly delayed.
○ Disadvantage- May not prioritize urgent tasks.
1. Bus Request- A device sends a request to the arbiter, indicating it needs the bus.
2. Arbitration- The arbiter selects one device based on the arbitration technique being used (e.g.,
priority, round-robin).
3. Bus Grant- The arbiter grants control of the bus to the selected device.
4. Data Transfer- The device uses the bus to transfer data.
5. Bus Release- Once the transfer is complete, the device releases control of the bus.
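A fixed-priority arbiter can be sketched in a few lines of C. The device numbering, the request bitmask, and the rule that the lowest-numbered device wins are all assumptions made for illustration.

#include <stdio.h>

/* Each bit of `requests` is one device asking for the bus;
   the lowest-numbered requesting device has the highest priority. */
static int arbitrate(unsigned requests) {
    for (int device = 0; device < 8; device++)
        if (requests & (1u << device))
            return device;             /* bus grant */
    return -1;                         /* no device is requesting */
}

int main(void) {
    unsigned requests = 0;
    requests |= 1u << 3;               /* device 3 sends a bus request */
    requests |= 1u << 5;               /* device 5 sends a bus request */

    int winner = arbitrate(requests);  /* device 3 wins (higher priority) */
    printf("bus granted to device %d\n", winner);

    requests &= ~(1u << winner);       /* bus release after the transfer */
    printf("next grant: device %d\n", arbitrate(requests));   /* device 5 */
    return 0;
}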
Summary
● Registers are fast, small storage locations within the CPU used to hold data and instructions
during processing.
● Bus arbitration manages access to a shared bus, ensuring that multiple devices can communicate
without conflict, using methods like centralized or distributed arbitration, and techniques such as
priority, round-robin, polling, or daisy chaining.
Bus timing refers to the synchronization of data transfer over the communication pathways (buses) in a
computer system, ensuring that data is correctly sent and received between various components (like the
CPU, memory, and I/O devices). Proper timing is crucial for maintaining data integrity and system
performance.
1. Bus Cycle
○ A bus cycle is a complete sequence of operations that allows data to be transferred
between devices. It typically consists of several phases, including address setup, data
transfer, and acknowledgment.
○ The duration of a bus cycle is determined by the slowest device involved in the transfer.
2. Timing Signals
○ Timing signals control the operations of the bus. These signals dictate when data can be
placed on the bus, when it can be read, and when operations can commence.
○ Common signals include Clock signals, which provide a periodic pulse that synchronizes
all components, and Control signals, which manage the direction and type of data
transfer.
3. Setup and Hold Time
○ Setup Time- The minimum time before the clock edge that data must be stable on the
bus for the receiving device to latch it correctly.
○ Hold Time- The minimum time after the clock edge that data must remain stable on the
bus to ensure proper reading by the receiving device.
4. Bus Arbitration
○ When multiple devices want to use the bus simultaneously, arbitration determines which
device gets access to the bus.
○ Timing in arbitration ensures that devices do not interfere with each other, providing
orderly access to the bus.
5. Data Transfer Timing
○ Timing impacts the speed of data transfers. Faster bus speeds can lead to shorter cycles
and increased throughput, while slower speeds may create bottlenecks in data processing.
○ Various factors, including bus width (number of bits transferred in parallel), clock
frequency, and signal integrity, affect transfer timing.
A related performance technique is out-of-order execution, which lets the CPU execute instructions as their operands and execution units become available rather than in strict program order. Its main benefits are:
● Increased Throughput- More instructions can be executed in parallel, increasing overall CPU
throughput and performance.
● Reduced Stalls- By executing independent instructions immediately, out-of-order execution
reduces the stalls caused by data dependencies.
● Better Resource Utilization- The CPU can utilize its execution units more efficiently by
working on the next available instructions instead of waiting for specific ones.
26.3 Challenges
Summary
Bus timing is crucial for ensuring that data transfers over buses are synchronized and efficient, involving
various aspects like bus cycles, timing signals, and arbitration. Out-of-order execution is a powerful
technique that allows CPUs to execute instructions based on resource availability rather than strict
program order, leading to increased throughput and better resource utilization. Both concepts play
significant roles in the performance of modern computer systems.
26.4 Register Renaming
Register renaming is a technique used in modern CPUs to avoid conflicts that arise from multiple
instructions trying to use the same register, leading to false dependencies (also called name
dependencies). It allows the CPU to execute more instructions in parallel, improving performance by
resolving these conflicts dynamically during execution.
In a CPU, there is a limited number of physical registers available. Multiple instructions may need to use
the same registers, creating a problem when an instruction has to wait for another to finish using a
register, even though the values are unrelated. This can cause stalls and limit instruction-level parallelism.
1. Write-after-Write (WAW) Hazard: Occurs when two instructions write to the same register, but
the second write must wait for the first one to complete.
2. Write-after-Read (WAR) Hazard: Occurs when an instruction writes to a register before an
earlier instruction reads from it.
Instead of having multiple instructions share the same physical registers, register renaming maps logical
registers (used in the program code) to a larger pool of physical registers (used by the hardware). This
removes the artificial dependency between instructions, allowing them to execute independently and in
parallel.
Procedure
1. When an instruction is decoded, the CPU checks if the register specified in the instruction is
already in use by another instruction.
2. If the register is in use, the CPU assigns a new physical register to the instruction, effectively
renaming the register.
3. This allows subsequent instructions to execute without waiting for the original register to become
available.
Example
1. a = b + 2; // Writes to R1
2. c = d + 4; // Also writes to R1 (a false WAW dependency)
After renaming, each write is given its own physical register:
1. a = b + 2; // R1 (renamed to P1)
2. c = d + 4; // R1 (renamed to P2)
Now, both instructions can execute in parallel because they are using different physical registers, even
though they were originally targeting the same logical register.
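A rename table can be sketched as an array that maps logical registers to physical ones. The sizes, the register names (R0..R3, P0..P7), and the naive allocate-by-counting free list below are invented for illustration and are far simpler than real hardware.

#include <stdio.h>

#define NUM_LOGICAL  4    /* R0..R3 as written in the program */
#define NUM_PHYSICAL 8    /* P0..P7 available inside the CPU  */

static int rename_table[NUM_LOGICAL];   /* logical -> physical mapping */
static int next_free = NUM_LOGICAL;     /* next unused physical register */

/* An instruction that writes logical register Rn gets a fresh physical one. */
static int rename_dest(int logical_reg) {
    int phys = next_free++;
    rename_table[logical_reg] = phys;   /* later readers of Rn now use this one */
    return phys;
}

int main(void) {
    for (int r = 0; r < NUM_LOGICAL; r++)
        rename_table[r] = r;            /* initially Rn maps to Pn */

    /* Two instructions that both write R1 (a WAW name dependency): */
    int p_first  = rename_dest(1);      /* a = b + 2  -> writes P4 */
    int p_second = rename_dest(1);      /* c = d + 4  -> writes P5 */

    printf("first write of R1 goes to P%d, second to P%d\n", p_first, p_second);
    return 0;
}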
● Eliminates false dependencies- Instructions that have no true data dependencies can execute
simultaneously without waiting for shared registers.
● Improves parallelism- Multiple instructions can be processed at the same time (out-of-order
execution), enhancing overall CPU throughput.
● Reduces pipeline stalls- By avoiding name conflicts, register renaming helps to minimize stalls
and improve the efficiency of the instruction pipeline.
● Reorder Buffer (ROB): Tracks instructions and assigns physical registers to logical ones.
● Physical Register File: A large pool of registers where renaming maps logical registers to
different physical registers.
Summary
Register renaming resolves false dependencies caused by different instructions trying to use the same
registers. By mapping logical registers to a larger set of physical registers, the CPU can allow multiple
instructions to execute in parallel, reducing pipeline stalls and improving instruction-level parallelism. It
is a key feature in modern superscalar and out-of-order execution architectures.
ALU (Arithmetic Logic Unit) flags are special bits in the CPU's status register (often called the flag
register) that provide information about the result of an arithmetic or logical operation. These flags are
used by the CPU to make decisions, such as branching, based on the outcomes of operations.
● When performing arithmetic operations, flags are used to branch or make decisions in programs–
○ Zero flag is used in loops or conditional statements to check if an operation resulted in
zero.
○ Carry flag is important in multi-byte arithmetic operations where the result of one
operation carries over to the next byte.
ALU flags provide essential information about the outcome of arithmetic and logic operations, helping
the CPU handle tasks like conditional branching or multi-byte operations.
27.4 ALU Overflow Flags
The ALU (Arithmetic Logic Unit) overflow flag is a status bit in a CPU that indicates whether an
arithmetic operation has resulted in an overflow condition. This flag is crucial for ensuring the correctness
of calculations, especially when working with fixed-size data types.
Overflow occurs when the result of an arithmetic operation exceeds the maximum (or minimum) value that can be represented within the allocated bits for a particular data type. For example, an 8-bit signed integer can only represent values from -128 to 127.
1. Unsigned Overflow
○ In unsigned arithmetic, overflow happens when a calculation produces a value greater
than the maximum value representable with the given number of bits.
○ Example
■ For an 8-bit unsigned integer, the maximum value is 255. If you add 200 and 100, the true result (300) does not fit; the stored result wraps around to 44 (300 - 256) and the carry flag is set.
2. Signed Overflow
○ In signed arithmetic, the overflow flag is set based on the signs of the operands and the result.
1. For Addition
○ In signed addition
■ Overflow occurs if
■ Adding two positive numbers results in a negative number.
■ Adding two negative numbers results in a positive number.
■ Mathematically
■ (A>0 and B>0 and Result<0) → Overflow.
■ (A<0 and B<0 and Result>0) → Overflow.
2. For Subtraction
○ In signed subtraction
■ Overflow occurs if
■ Subtracting a negative number from a positive number yields a negative
result.
■ Subtracting a positive number from a negative number yields a positive
result.
■ Mathematically
■ (A>0 and B<0 and Result<0) → Overflow.
■ (A<0 and B>0 and Result>0) → Overflow.
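These sign rules translate directly into code. The sketch below checks the addition case for an 8-bit result (the subtraction case is analogous); the truncation to 8 bits assumes the usual two's-complement wrap-around.

#include <stdio.h>

/* Detect signed overflow of an 8-bit addition using the sign rules above. */
static int add_overflows(int a, int b) {
    int r = (signed char)(a + b);        /* wrap the sum to 8 bits */
    return (a > 0 && b > 0 && r < 0) ||  /* positive + positive gave a negative */
           (a < 0 && b < 0 && r > 0);    /* negative + negative gave a positive */
}

int main(void) {
    printf("100 + 50   overflow? %d\n", add_overflows(100, 50));    /* 1 */
    printf("100 + 20   overflow? %d\n", add_overflows(100, 20));    /* 0 */
    printf("-100 + -50 overflow? %d\n", add_overflows(-100, -50));  /* 1 */
    return 0;
}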
● The overflow flag is essential for software that relies on precise arithmetic operations, such as
financial calculations, scientific computations, and systems programming.
● It enables the detection of errors that may arise from overflow conditions, allowing developers to
implement appropriate error handling, such as raising exceptions or adjusting computations.
Summary
The ALU overflow flag indicates whether an arithmetic operation has resulted in an overflow condition,
affecting both unsigned and signed arithmetic. It helps detect errors in calculations where the result
exceeds the representable range of a data type, ensuring the correctness and reliability of computational
results in various applications.
Modern CPUs place several levels of cache memory (L1, L2, L3) between the processor and main memory:
● L1 Cache Miss: If the CPU can’t find the needed data in L1, it checks L2. If L2 also doesn’t have
the data, it checks L3, and then finally main memory.
● Cache Hierarchy: This hierarchy balances speed and capacity. L1 is the smallest and fastest,
handling critical data. L2 is larger and slower but can hold more data. L3 is the largest and shared,
serving as a last cache buffer before main memory.
● Cache Coherency: In multi-core processors, data in the caches must remain consistent across
cores. Cache coherency protocols (like MESI – Modified, Exclusive, Shared, Invalid) ensure that
data accessed by one core is reflected across all cores, preventing conflicts.
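The MESI idea can be sketched as a small state machine. The transition functions below cover only a few illustrative events and ignore the bus and directory machinery of a real protocol.

#include <stdio.h>

typedef enum { MODIFIED, EXCLUSIVE, SHARED, INVALID } MesiState;

/* This core writes the line: it must end up Modified. */
static MesiState on_local_write(MesiState s)  { (void)s; return MODIFIED; }

/* Another core writes the same line (write-invalidate): our copy dies. */
static MesiState on_remote_write(MesiState s) { (void)s; return INVALID; }

/* Another core reads a line we hold: Exclusive/Modified degrade to Shared. */
static MesiState on_remote_read(MesiState s) {
    return (s == MODIFIED || s == EXCLUSIVE) ? SHARED : s;
}

static const char *name(MesiState s) {
    static const char *names[] = {"Modified", "Exclusive", "Shared", "Invalid"};
    return names[s];
}

int main(void) {
    MesiState line = EXCLUSIVE;              /* this core loaded the line alone */
    line = on_local_write(line);             /* -> Modified */
    printf("after local write : %s\n", name(line));
    line = on_remote_read(line);             /* -> Shared (data supplied to the reader) */
    printf("after remote read : %s\n", name(line));
    line = on_remote_write(line);            /* -> Invalid */
    printf("after remote write: %s\n", name(line));
    return 0;
}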
Summary
Cache memory is organized into levels (L1, L2, L3) to improve CPU performance by providing fast,
temporary storage for frequently accessed data. Each level balances speed, capacity, and proximity to the
CPU.
Cache coherency protocols ensure that multiple caches (in a multi-core or multi-processor system)
maintain a consistent view of data when the same memory location is stored in different caches. Without
coherency, a processor might read stale or incorrect data due to concurrent updates by another processor.
In multi-core systems, each core may have its own L1 and L2 caches. If two cores modify data stored at
the same memory location, the caches need to communicate to ensure that each core sees the most
up-to-date version of the data.
1. Write-Invalidate Protocol
○ When one processor writes to a memory location, it invalidates (removes) the copy of
that memory location in the caches of other processors. Only the processor performing
the write has a valid copy, forcing others to fetch the updated data from memory when
they need it.
2. Write-Update Protocol
○ When one processor writes to a memory location, it updates the same location in all
other processors’ caches. This ensures that all caches have the same value, but increases
the communication overhead.
In both cases, cache coherency protocols ensure that all processors in a multi-core system have a consistent view of data stored in caches.
Cache synchronization refers to the methods and mechanisms used to ensure that the data stored in the
CPU cache (which is faster but smaller) is consistent with the main memory (which is slower but larger)
and, in multi-core or multi-processor systems, among the caches of different CPUs. Ensuring cache
coherence is crucial for maintaining data integrity and system performance.
1. Cache Coherency
○ Cache coherency ensures that multiple caches in a system maintain the same view of
shared data. This is particularly important in multi-core processors where different cores
might cache the same memory location.
○ Coherency protocols manage the visibility of data changes made in one cache to other
caches.
2. Cache Invalidation
○ When one cache updates a data value, other caches holding the same value need to be
informed to invalidate their copies. This prevents stale data from being read.
○ For example, if Core A modifies a value in its cache, Core B should invalidate its cached
copy of that value.
3. Cache Update
○ In some protocols, instead of invalidating the data, the modified data can be sent to other
caches to update their copies. This method can be more efficient in some scenarios but
may require more bandwidth.
4. Bus Protocols
○ Cache synchronization often involves bus protocols that manage communication between
caches and memory. For example, when a processor makes a change, it may issue a
signal on the bus to update or invalidate other caches.
5. Memory Consistency Models
○ Different systems have different memory consistency models that dictate how memory
operations (read and write) appear to be executed in relation to one another. These models
define the rules for synchronization among caches.
1. MESI Protocol
○ The MESI protocol (Modified, Exclusive, Shared, Invalid) is a widely used cache
coherency protocol.
■ Modified (M): The cache line is modified and is not present in any other cache.
■ Exclusive (E): The cache line is not modified and is only in the current cache.
■ Shared (S): The cache line is shared among multiple caches.
■ Invalid (I): The cache line is invalid.
○ Transitions between these states ensure that the caches remain consistent with each other.
2. MOESI Protocol
○ An extension of the MESI protocol, the MOESI protocol includes an Owned state, which
allows a cache to have a modified copy while indicating to other caches that it is
responsible for supplying that data.
3. Directory-Based Protocols
○ These protocols maintain a central directory that keeps track of which caches have copies
of each memory block. When a cache wants to access a block, it checks the directory,
ensuring coherence across caches.
● Snooping– Caches listen to the bus to detect changes made by other caches and take appropriate
actions (like invalidating or updating their copies).
● Write-Through vs. Write-Back
○ Write-Through: Every write to the cache is immediately written to main memory,
ensuring consistency but potentially reducing performance.
○ Write-Back: The cache can delay writing back to memory until it needs to evict that
block, which can lead to inconsistencies if not managed properly.
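The difference between the two write policies can be sketched with a one-line toy cache; the cache structure, the single tag, and the function names below are invented purely for illustration.

#include <stdio.h>
#include <string.h>

#define LINE_SIZE 4

static int memory[16];
static int cache_line[LINE_SIZE];
static int cache_tag   = 0;      /* the line holds memory block 0 (addresses 0..3) */
static int cache_dirty = 0;      /* write-back only: line differs from memory */

static void write_through(int addr, int value) {
    cache_line[addr % LINE_SIZE] = value;
    memory[addr] = value;        /* memory is updated on every write */
}

static void write_back(int addr, int value) {
    cache_line[addr % LINE_SIZE] = value;
    cache_dirty = 1;             /* memory is updated only later, on eviction */
}

static void evict(void) {
    if (cache_dirty) {           /* flush the whole line back to memory */
        memcpy(&memory[cache_tag * LINE_SIZE], cache_line, sizeof cache_line);
        cache_dirty = 0;
    }
}

int main(void) {
    write_through(1, 42);
    printf("write-through: memory[1] = %d (already consistent)\n", memory[1]);

    write_back(2, 99);
    printf("write-back:    memory[2] = %d (stale until eviction)\n", memory[2]);
    evict();
    printf("after evict:   memory[2] = %d\n", memory[2]);
    return 0;
}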
Summary
Cache synchronization is essential for maintaining data consistency across caches in multi-core or
multi-processor systems. Mechanisms like cache coherency protocols (e.g., MESI, MOESI), invalidation
and update strategies, and memory consistency models work together to ensure that caches reflect the
most recent data from main memory and from each other. Effective cache synchronization helps optimize
performance while ensuring data integrity in complex computing environments.
Cache coherence refers to the consistency of data stored in local caches of a shared resource, ensuring that
all caches reflect the same value for shared data. Here are some practical examples to illustrate how cache
coherence works in different scenarios:
Imagine a multi-core system where two cores, Core A and Core B, access a shared variable, X.
● Initial State:
○ Both Core A and Core B load X from main memory. Each core caches X in their
respective caches.
○ Cache States:
■ Core A: X in Shared (S) state
■ Core B: X in Shared (S) state
● Scenario 1: Core A Modifies X:
○ Core A changes the value of X to 5.
○ This action changes the state of X in Core A’s cache to Modified (M), and it sends an
invalidation signal on the bus to Core B, informing it that X has been modified.
○ Cache States:
■ Core A: X in Modified (M) state
■ Core B: X in Invalid (I) state (since it has received the invalidation)
● Scenario 2: Core B Tries to Read X:
○ When Core B attempts to read X, it finds that its cache line is invalid.
○ Core B must fetch the updated value of X from Core A (or directly from memory if Core
A is not available).
○ The state of X in Core A remains Modified (M), while Core B's cache fetches the value
and marks it as Shared (S) after reading the new value.
A second walk-through of the same write-invalidate sequence, this time writing the value 10:
● Initial State:
○ Both Core A and Core B load X into their caches.
○ Cache States:
■ Core A: X in Shared (S) state
■ Core B: X in Shared (S) state
● Scenario: Core A Modifies X:
○ Core A writes a new value to X, changing it to 10.
○ Core A changes its state to Modified (M) and sends an invalidation signal to Core B.
○ Core B receives the invalidation and changes X to the Invalid (I) state.
○ Core A now has the only copy of the modified X.
● Scenario: Core B Requests X:
○ If Core B wants to access X, it sees that its cache state is Invalid (I).
○ Core B sends a request for X to Core A.
○ Core A responds with the value 10, and Core B marks X as Shared (S) after fetching the
value, indicating it now has a valid copy of the data.
28.7.3 Directory-Based Coherence Example
In a directory-based coherence protocol, a centralized directory keeps track of which caches hold copies
of shared data.
● Initial State:
○ Variable Y is stored in main memory and loaded into caches of Core C and Core D.
○ The directory shows:
■ Main Memory: Y
■ Core C: Holds Y (State: Shared)
■ Core D: Holds Y (State: Shared)
● Scenario: Core C Modifies Y:
○ Core C updates Y to 20.
○ The directory is updated to reflect that Core C has modified the data, changing the state to
Modified for Core C and invalidating the copies in Core D.
○ The directory now indicates:
■ Core C: Holds Y (State: Modified)
■ Core D: Holds Y (State: Invalid)
● Scenario: Core D Requests Y:
○ If Core D tries to read Y, it notices its cache state is Invalid.
○ Core D requests the current value from the directory, which forwards the request to Core
C.
○ Core C sends the updated value 20 back to Core D, which then marks its state as Shared
in the directory.
Summary
These examples demonstrate how cache coherence protocols, such as MESI, MOESI, and directory-based
protocols, work to maintain data consistency across multiple caches in multi-core systems. They handle
scenarios involving modifications, reads, and the invalidation of stale data, ensuring that all cores have a
coherent view of the shared data. Proper management of cache coherence is essential for the performance
and reliability of modern multi-core processors.
Cache consistency refers to the coherence of data stored in multiple cache memory systems within a
computing environment. In systems where multiple processors or cores have their own caches, ensuring
that all caches reflect the most current data state is essential. Here are some examples and scenarios that
illustrate cache consistency.
In a write-through cache, every write operation to the cache is immediately followed by a write to the
main memory. This ensures that the data in the cache and main memory are always consistent.
● Scenario:
○ Core 1 writes the value 42 to memory address A.
○ The value 42 is written both to Core 1's cache and the main memory simultaneously.
○ Any subsequent read of memory address A by any core will return 42, ensuring
consistency.
In a write-back cache, modifications to data are made in the cache only, and the main memory is updated
only when the cache line is evicted.
● Scenario:
○ Core 1 writes the value 42 to memory address A, which is stored in its cache.
○ Core 2 reads memory address A before Core 1 writes back to the main memory.
○ Core 2 may read an outdated value, causing inconsistency. To maintain consistency,
mechanisms like cache coherence protocols are necessary.
Cache coherence protocols are implemented to ensure that all caches reflect the most current data. Two
popular protocols are MESI (Modified, Exclusive, Shared, Invalid) and MOESI (Modified, Owner,
Exclusive, Shared, Invalid).
● Scenario:
○ Assume both Core 1 and Core 2 have caches with memory address A loaded.
○ Core 1 modifies address A to 42 (state changes to Modified).
○ The MESI protocol invalidates Core 2's copy of address A to prevent it from returning
stale data.
○ When Core 2 tries to read address A, it finds it in the Invalid state and retrieves the
updated value from main memory, ensuring consistency.
In directory-based cache coherence schemes, a directory keeps track of the state of each cache line.
● Scenario:
○ Multiple cores (Core 1, Core 2, Core 3) cache the same memory address A.
○ The directory indicates which caches hold copies of address A.
○ When Core 1 writes to address A, it notifies the directory to mark Core 2's and Core 3's
caches as Invalid.
○ Any subsequent reads by Core 2 or Core 3 will require them to fetch the new value from
Core 1’s cache or the main memory, thus maintaining consistency.
In systems that require sequential consistency, operations appear to execute in a specific order,
maintaining the order of writes and reads.
● Scenario:
○ If Core 1 writes to memory address A and Core 2 reads from it, Core 2 will always see
the updated value if the system is sequentially consistent.
○ Even if Core 2 executes its read before Core 1's write completes, the read operation will
be delayed until the write is visible, ensuring a consistent view of memory across all
cores.
Summary
Cache consistency is crucial in multiprocessor systems to ensure that all cores see a coherent view of
shared data. Examples such as write-through and write-back caching, cache coherence protocols (like
MESI and MOESI), directory-based coherence, and sequential consistency highlight the various strategies
and mechanisms employed to maintain cache consistency. Proper implementation of these techniques
prevents stale or incorrect data from being read and ensures reliable and predictable system behavior.
Cache eviction refers to the process of removing data from a cache when it becomes full or when certain
conditions require space to be made for new data. This is an essential mechanism in cache management,
as it ensures that the most relevant data is retained in the cache for efficient access while older or less
frequently used data is discarded.
● Limited Size: Caches have a limited size, meaning they cannot hold all possible data. When the
cache reaches its capacity, new data cannot be added without removing existing data.
● Data Relevance: Some data becomes stale or less relevant over time. Eviction helps keep the
cache populated with the most useful data, improving access times and overall performance.
● Efficiency: Efficient cache eviction strategies help optimize the performance of memory access
patterns in applications, especially in environments with high data access rates.
28.9.2 Cache Eviction Policies
Various policies determine which data should be evicted from the cache when new data needs to be stored. The most common include LRU (least recently used), FIFO (first in, first out), LFU (least frequently used), random replacement, and ARC (adaptive replacement cache).
Cache eviction is a crucial process in cache memory management that involves removing data from the
cache to make room for new data when the cache is full. Various eviction policies, such as LRU, FIFO,
LFU, Random Replacement, and ARC, determine which data to remove, aiming to optimize cache
performance and reduce latency. Effective cache eviction strategies enhance system efficiency by
ensuring that relevant and frequently accessed data remains available for fast access.
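As an illustration of one such policy, here is a minimal LRU sketch for a tiny fully associative cache. The per-slot age counters and the slot layout are software stand-ins for the recency tracking that real caches perform in hardware.

#include <stdio.h>

#define SLOTS 3

static int keys[SLOTS] = {-1, -1, -1};   /* -1 means the slot is empty */
static int age[SLOTS];                   /* larger age = used longer ago */

static void cache_access(int key) {
    for (int i = 0; i < SLOTS; i++) age[i]++;        /* every slot gets older */

    for (int i = 0; i < SLOTS; i++) {                /* hit: refresh recency */
        if (keys[i] == key) { age[i] = 0; printf("hit  on %d\n", key); return; }
    }

    int victim = 0;                                  /* miss: pick the oldest slot */
    for (int i = 1; i < SLOTS; i++)
        if (age[i] > age[victim]) victim = i;

    if (keys[victim] != -1)
        printf("miss on %d, evicting key %d (least recently used)\n", key, keys[victim]);
    else
        printf("miss on %d, filling an empty slot\n", key);
    keys[victim] = key;
    age[victim]  = 0;
}

int main(void) {
    int trace[] = {1, 2, 3, 1, 4};    /* accessing 4 evicts key 2, the LRU entry */
    for (int i = 0; i < 5; i++)
        cache_access(trace[i]);
    return 0;
}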
https://medium.com/@apoorva.holkar22/cache-memory-organization-f8770cd89b7
Dear Students! You can find diagrams of the cache-memory work-flow at the URL given above.