Parallel Architecture
Sathish Vadhiyar
Department of Computational and Data Sciences
Supercomputer Education and Research Centre
Indian Institute of Science, Bangalore, India
September 13, 2019 SERC Training Workshop
Motivations of Parallel Computing
• Faster execution times
– From days or months to hours or seconds
– E.g., climate modelling, bioinformatics
• Large amounts of data dictate parallelism
• Parallelism is more natural for certain kinds of problems, e.g., climate modelling
• Due to computer architecture trends
– CPU speeds have saturated
– Slow memory bandwidths
PARALLEL ARCHITECTURES
Classification of Architectures – Flynn’s
classification
In terms of parallelism in
instruction and data stream
• Single Instruction Single
Data (SISD): Serial
Computers
• Single Instruction Multiple
Data (SIMD)
- Vector processors and
processor arrays
- Examples: CM-2, Cray-90,
Cray YMP, Hitachi 3600
Courtesy: http://www.llnl.gov/computing/tutorials/parallel_comp/
Classification of Architectures – Flynn’s
classification
• Multiple Instruction Single
Data (MISD): Not popular
• Multiple Instruction
Multiple Data (MIMD)
- Most popular
- IBM SP and most other
supercomputers,
clusters, computational
Grids etc.
Courtesy: http://www.llnl.gov/computing/tutorials/parallel_comp/
Classification 2:
Shared Memory vs Message Passing
• Shared memory machine: The n
processors share physical address space
– Communication can be done through this
shared memory
[Figure: two organizations – processors P connected through an interconnect to a common main memory (shared memory), versus each processor P paired with its own memory M and connected by an interconnect (distributed memory)]
• The alternative is sometimes referred
to as a message passing machine or a
distributed memory machine
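To make the contrast concrete, below is a minimal sketch of the message passing style in C with MPI (assumed to be available; illustrative only, not any particular machine's code). On a shared memory machine the same exchange could instead be a write followed by a read of a variable in the shared address space, e.g., using threads.

/* Sketch: explicit communication on a distributed memory (message passing)
 * machine. Rank 0's data is not directly addressable by rank 1, so it must
 * be sent over the interconnect. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    int rank, x = 0;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        x = 42;                                  /* data lives in rank 0's local memory */
        MPI_Send(&x, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        MPI_Recv(&x, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("rank 1 received x = %d\n", x);
    }
    MPI_Finalize();
    return 0;
}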
Shared Memory Machines
• The shared memory could itself be distributed among the processor nodes
– Each processor might have some portion of
the shared physical address space that is
physically close to it and therefore
accessible in less time
– Terms: NUMA vs UMA architecture
• Non-Uniform Memory Access
• Uniform Memory Access
Classification of Architectures – Based on
Memory
• Distributed memory
Courtesy: http://www.llnl.gov/computing/tutorials/parallel_comp/
Multi-cores and Many-cores
INTERCONNECTION NETWORKS
Interconnects
• Used in both shared memory and
distributed memory architectures
• In shared memory: Used to connect
processors to memory
• In distributed memory: Used to connect
different processors
• Components
– Interface (PCI or PCI-e): for connecting
processor to network link
– Network link connected to a communication
network (network of connections)
Communication network
• Consists of switching elements to which
processors are connected through ports
• Switch: network of switching elements
• Switching elements connected with each
other using a pattern of connections
• Pattern defines the network topology
• In shared memory systems, memory units
are also connected to communication
network
Network Topologies
• Bus, ring – used in small-
scale shared memory
systems
• Crossbar switch – used in
some small-scale shared
memory machines, small or
medium-scale distributed
memory machines
Multistage network – Omega network
• To reduce switching complexity
• Omega network – consists of log P stages, each with P/2 switching elements
• Contention
– In crossbar – non-blocking
– In Omega – blocking can occur even when simultaneous communications involve disjoint source and destination pairs
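The Omega network is self-routing: the destination address alone decides the switch settings. Below is a minimal sketch in C of this destination-tag routing for a hypothetical 8-input (3-stage) Omega network; it only traces the path taken and does not model contention.

/* Sketch: destination-tag (self-routing) on an Omega network with P = 2^n
 * inputs. Hypothetical example, not tied to any particular machine. */
#include <stdio.h>

/* Left-rotate the low n bits of x by one position (perfect shuffle). */
static unsigned shuffle(unsigned x, int n) {
    unsigned msb = (x >> (n - 1)) & 1u;
    return ((x << 1) | msb) & ((1u << n) - 1u);
}

/* Trace the route from src to dst through the n = log2(P) stages.
 * At stage i the switch output is chosen by bit (n-1-i) of dst:
 * 0 -> upper ("straight") output, 1 -> lower ("cross") output. */
static void route(unsigned src, unsigned dst, int n) {
    unsigned pos = src;
    printf("route %u -> %u:\n", src, dst);
    for (int i = 0; i < n; i++) {
        pos = shuffle(pos, n);                 /* inter-stage shuffle     */
        unsigned bit = (dst >> (n - 1 - i)) & 1u;
        pos = (pos & ~1u) | bit;               /* switch sets the low bit */
        printf("  stage %d: switch %u, output %s\n",
               i, pos >> 1, bit ? "lower" : "upper");
    }
}

int main(void) {
    route(5, 2, 3);   /* 8-input Omega network: 3 stages of 4 switches */
    return 0;
}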
Mesh, Torus, Hypercubes, Fat-tree
• Commonly used network topologies in
distributed memory architectures
• Hypercubes are networks with d dimensions – 2^d nodes, each connected to one neighbour per dimension
[Figure: 2-D mesh, torus, and hypercube (binary n-cube) topologies; hypercubes shown for n = 2 and n = 3]
Fat Tree Networks
• Binary tree
• Processors arranged in leaves
• Other nodes correspond to switches
• Fundamental property: the number of links from a node to its children = the number of links from the node to its parent
• Edges become fatter as we traverse up the
tree
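A small numeric sketch of this property (assuming a binary fat tree over P = 2^d leaf processors with unit-width leaf links; the numbers are illustrative): since a switch's uplink must match the combined width of its two child links, link width doubles at every level.

/* Sketch: link widths in an idealized binary fat tree with P = 2^d leaves
 * and unit-width leaf links (assumed values, for illustration only). */
#include <stdio.h>

int main(void) {
    int d = 4;                /* P = 2^d = 16 leaf processors      */
    unsigned width = 1;       /* width of the leaf-to-switch links */
    for (int level = 0; level < d; level++) {
        printf("links at level %d: width %u\n", level, width);
        width *= 2;           /* uplink width = sum of the two child link widths */
    }
    return 0;
}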
Evaluating Interconnection Topologies
• Diameter – maximum distance between any two processing nodes
– Fully connected – 1
– Star – 2
– Ring – p/2
– Hypercube – log p
• Connectivity – multiplicity of paths between 2 nodes. Minimum number of arcs to be removed from the network to break it into two disconnected networks
– Linear array – 1
– Ring – 2
– 2-D mesh – 2
– 2-D mesh with wraparound – 4
– d-dimensional hypercube – d
Evaluating Interconnection Topologies
• Bisection width – minimum number of links to be removed from the network to partition it into 2 equal halves
– Ring – 2
– P-node 2-D mesh – √P
– Tree – 1
– Star – 1
– Completely connected – P²/4
– Hypercube – P/2
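As a quick check of these formulas, the short C sketch below evaluates diameter, connectivity, and bisection width for a d-dimensional hypercube with P = 2^d nodes (d = 4 is just an example value).

/* Sketch: topology metrics of a d-dimensional hypercube using the formulas
 * above: diameter = d, connectivity = d, bisection width = P/2. */
#include <stdio.h>

int main(void) {
    int d = 4;                       /* illustrative: a 16-node hypercube */
    unsigned P = 1u << d;
    printf("P = %u nodes\n", P);
    printf("diameter        = %d (log2 P)\n", d);
    printf("connectivity    = %d (one link per dimension)\n", d);
    printf("bisection width = %u (P/2)\n", P / 2);
    return 0;
}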
Evaluating Interconnection Topologies
• Channel width – number of bits that can be simultaneously communicated over a link, i.e., the number of physical wires between 2 nodes
• Channel rate – peak rate at which a single physical wire can deliver bits
• Channel bandwidth – channel rate times channel width
• Bisection bandwidth – maximum volume of communication between the two halves of the network, i.e., bisection width times channel bandwidth
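Putting these definitions together, the sketch below computes channel bandwidth and bisection bandwidth from made-up numbers (the per-wire rate, channel width, and bisection width are illustrative, not those of any particular machine).

/* Sketch: bandwidth calculations using the definitions above, with assumed
 * example values. */
#include <stdio.h>

int main(void) {
    double channel_rate_gbps = 10.0;  /* per-wire rate in Gbit/s (assumed)     */
    int    channel_width     = 4;     /* physical wires per link (assumed)     */
    int    bisection_width   = 8;     /* e.g., a 16-node hypercube has P/2 = 8 */

    double channel_bw   = channel_rate_gbps * channel_width; /* per-link bandwidth   */
    double bisection_bw = channel_bw * bisection_width;      /* across the bisection */

    printf("channel bandwidth   = %.1f Gbit/s\n", channel_bw);
    printf("bisection bandwidth = %.1f Gbit/s\n", bisection_bw);
    return 0;
}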
SHARED MEMORY AND CACHES
Shared Memory Architecture: Caches
[Figure: P1 and P2 both cache X = 0 from main memory; P1 then writes X = 1 in its own cache, and a later Read X by P2 hits P2's stale cached copy X = 0 – wrong data!]
Cache Coherence Problem
• If each processor in a shared memory multiprocessor machine has a data cache
– Potential data consistency problem: the cache coherence problem
– Arises when a shared variable is modified in a private cache
• Objective: processes shouldn't read 'stale' data
• Solutions
– Hardware: cache coherence mechanisms
Cache Coherence Protocols
• Write update – propagate the updated cache line to other processors on every write
• Write invalidate – a write invalidates copies in other caches; a processor then gets the updated cache line when it next reads the data
Invalidation Based Cache Coherence
[Figure: P1 and P2 both cache X = 0; when P1 writes X = 1, an invalidate message removes P2's cached copy, so P2's next Read X fetches the updated value X = 1]
Cache Coherence using invalidate protocols
• 3 states associated with data items
– Shared – the data item is cached by 2 or more processors and all copies are consistent
– Invalid – another processor (say P0) has updated the data item, so this cached copy is stale
– Dirty – the state of the updated data item in P0 (the only valid copy)
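Below is a toy C sketch of an invalidation-based protocol over these three states, for a single cache line shared by two processors (a simplified illustration, not a full MSI/MESI implementation).

/* Sketch: a toy invalidation-based coherence protocol over the three states
 * on the slide (Shared, Invalid, Dirty), for one cache line and two caches. */
#include <stdio.h>

typedef enum { INVALID, SHARED, DIRTY } State;

static State st[2];                  /* per-processor state of line X */
static const char *name(State s) {
    return s == INVALID ? "Invalid" : s == SHARED ? "Shared" : "Dirty";
}

static void read_x(int p) {          /* processor p reads X */
    if (st[p] == INVALID) {
        /* Miss: fetch the line; a Dirty owner supplies the data and drops to Shared. */
        for (int q = 0; q < 2; q++)
            if (st[q] == DIRTY) st[q] = SHARED;
        st[p] = SHARED;
    }
    printf("P%d read : P0=%s P1=%s\n", p, name(st[0]), name(st[1]));
}

static void write_x(int p) {         /* processor p writes X */
    for (int q = 0; q < 2; q++)      /* invalidate all other copies */
        if (q != p) st[q] = INVALID;
    st[p] = DIRTY;
    printf("P%d write: P0=%s P1=%s\n", p, name(st[0]), name(st[1]));
}

int main(void) {
    st[0] = st[1] = INVALID;
    read_x(1);     /* P1 caches X as Shared                */
    write_x(0);    /* P0 writes: P1 invalidated, P0 Dirty  */
    read_x(1);     /* P1 re-reads and gets the new value   */
    return 0;
}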