KCS 713 Unit 1 Lecture 5
Shailesh Saxena
Department of CS
Contents
Parallel Computing
SISD (Single Instruction Single Data Stream)
SIMD (Single Instruction Multiple Data Stream)
MISD (Multiple Instruction Single Data Stream)
MIMD (Multiple Instruction Multiple Data Stream)
Important Questions
References
Parallel Computing
Traditionally, software has been written for serial computation
Parallel computing, in contrast, is the simultaneous use of multiple compute resources to solve a computational problem
It is accomplished by breaking the problem into independent parts so that each processing element can execute its part of the algorithm simultaneously with the others
The computational problem should be solved in less time with multiple compute resources than with a single compute resource
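To make the decomposition concrete, here is a minimal Python sketch (illustrative only; the four-way split and the helper partial_sum are arbitrary choices, not from the slides) that breaks one summation into independent parts and runs them simultaneously with a process pool:

    # Split one problem into independent parts that run simultaneously.
    from multiprocessing import Pool

    def partial_sum(bounds):
        """Sum one independent chunk of the overall range."""
        lo, hi = bounds
        return sum(range(lo, hi))

    if __name__ == "__main__":
        n = 10_000_000
        workers = 4
        step = n // workers
        chunks = [(i * step, (i + 1) * step) for i in range(workers)]
        chunks[-1] = (chunks[-1][0], n)      # last chunk absorbs any remainder
        with Pool(workers) as pool:
            total = sum(pool.map(partial_sum, chunks))  # parts run concurrently
        print(total == sum(range(n)))        # True: same answer as serial code

Each worker executes its part of the algorithm at the same time as the others, so with enough independent work the problem is solved in less time than on a single compute resource.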
Uses for Parallel Computing
Science and Engineering
Historically, parallel computing has been used to model difficult problems in many areas of science
and engineering
Industrial and Commercial
Today, commercial applications provide an equal or greater driving force in the development of
faster computers
Data mining, Oil exploration, Web search engines, Medical imaging and diagnosis,
Pharmaceutical design, Financial and economic modeling, Advanced graphics and virtual reality,
Collaborative work environments
Why Use Parallel Computing?
In theory, throwing more resources at a task will shorten its time to completion, with potential
cost savings
Provide concurrency
A single compute resource can only do one thing at a time. Multiple computing resources can
be doing many things simultaneously
For example, the Access Grid (www.accessgrid.org) provides a global collaboration network where people from around the world can meet and conduct work "virtually"
Limits to serial computing
Transmission speeds
• The speed of a serial computer is directly dependent upon how fast data can move through
hardware.
• Absolute limits are the speed of light and the transmission limit of copper wire
• Increasing speeds necessitate increasing proximity of processing elements
Limits to miniaturization
• Processor technology is allowing an increasing number of transistors to be placed
on a chip
• However, even with molecular or atomic-level components, a limit will be reached on how
small components can be
Economic limitations
• It is increasingly expensive to make a single processor faster
• Using a larger number of moderately fast commodity processors to achieve the
same or better performance is less expensive
Current computer architectures are increasingly relying upon hardware-level parallelism to improve performance
• Multiple execution units
• Pipelined instructions
• Multi-core
Von Neumann Architecture
Named after the Hungarian mathematician John von Neumann who first authored the
general requirements for an electronic computer in his 1945 papers
Since then, virtually all computers have followed this basic design
There are different ways to classify parallel computers. One of the most widely used classifications is called Flynn's Taxonomy
Based upon the number of concurrent instruction and data streams
available in the architecture
Single Instruction, Single Data
• A sequential computer which exploits no parallelism in either the
instruction or data streams
• Single Instruction: Only one instruction stream is being acted on by the
CPU during any one clock cycle
• Single Data: Only one data stream is being used as input during any one clock cycle
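A SISD machine is what an ordinary sequential program assumes; a minimal Python illustration (purely for contrast with the parallel classes below):

    # SISD in miniature: one instruction stream operating on one data stream.
    data = [1, 2, 3, 4]
    total = 0
    for x in data:      # one data element is consumed per step
        total += x      # one instruction acts on that single datum per step
    print(total)        # 10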
Single Instruction, Multiple Data
A computer which exploits multiple data streams against a single instruction stream to perform operations
• Single Instruction: All processing units execute the same instruction
at any given clock cycle
• Multiple Data: Each processing unit can operate on a different
data element
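As an illustrative sketch (assuming NumPy is available; not part of the taxonomy itself), vectorized array arithmetic has SIMD character: one logical instruction is applied across many data elements, and on most CPUs it maps onto the hardware's SIMD units:

    # SIMD-style computation: one instruction ("add"), many data elements.
    import numpy as np

    a = np.arange(1_000_000, dtype=np.float64)
    b = np.ones_like(a)
    c = a + b          # a single logical instruction over a million elements
    print(c[:3])       # [1. 2. 3.]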
Multiple Instruction, Single Data
Multiple Instruction: Each processing unit operates on the
data independently via separate instruction streams
Single Data: A single data stream is fed into multiple
processing units
Some conceivable uses might be:
• Multiple cryptography algorithms
attempting to crack a single coded
message
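A hedged sketch of that idea in Python, using standard-library hash functions to stand in for the different cryptography algorithms (the particular algorithms chosen here are illustrative):

    # MISD-flavoured sketch: several instruction streams (different algorithms)
    # all consume the same single data stream (one message).
    import hashlib

    message = b"single coded message"
    for algo in (hashlib.md5, hashlib.sha1, hashlib.sha256):
        digest = algo(message).hexdigest()
        print(algo().name, digest[:16])   # each algorithm processes the same data

On a true MISD machine each algorithm would run on its own processing unit; here they simply illustrate independent instruction streams sharing one data stream.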
Multiple Instruction, Multiple Data
Multiple autonomous processors simultaneously
executing different instructions on different data
• Multiple Instruction: Every processor may be executing a
different instruction stream
• Multiple Data: Every processor may be working with a different data stream
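A small Python sketch of MIMD-style execution (the two worker functions are invented for illustration): autonomous processes run different instructions on different data at the same time:

    # MIMD sketch: different instruction streams on different data streams.
    from multiprocessing import Process

    def count_words(text):
        print("words:", len(text.split()))

    def sum_numbers(numbers):
        print("sum:", sum(numbers))

    if __name__ == "__main__":
        p1 = Process(target=count_words, args=("to be or not to be",))
        p2 = Process(target=sum_numbers, args=([1, 2, 3, 4],))
        p1.start(); p2.start()   # both run independently and simultaneously
        p1.join(); p2.join()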
Parallel computing
Using a parallel computer to solve a single problem faster
Parallel computer
Multiple-processor or multi-core system supporting parallel programming
Parallel programming
Programming in a language that supports concurrency explicitly
Task
A logically discrete section of computational work.
A task is typically a program or program-like set of instructions that is executed by a
processor
A parallel program consists of multiple tasks running on multiple processors
Shared Memory
From a hardware point of view, describes a computer architecture where all processors have direct access to common physical memory
In a programming sense, it describes a model where all parallel tasks have the same
"picture" of memory and can directly address and access the same logical memory
locations regardless of where the physical memory actually exists
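A minimal sketch of the shared-memory programming model in Python, with multiprocessing.Value standing in for a shared physical memory location (an illustrative choice, not the only mechanism):

    # Shared memory: all tasks address the same logical memory location.
    from multiprocessing import Process, Value

    def increment(counter):
        for _ in range(1000):
            with counter.get_lock():     # guard the shared location
                counter.value += 1

    if __name__ == "__main__":
        counter = Value("i", 0)          # one location, visible to every task
        workers = [Process(target=increment, args=(counter,)) for _ in range(4)]
        for w in workers: w.start()
        for w in workers: w.join()
        print(counter.value)             # 4000: every update was seen by all

The lock around the shared location previews the synchronization responsibility discussed below.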
Symmetric Multi-Processor (SMP)
Hardware architecture where multiple processors share a single address space and
access to all resources; shared memory computing
Distributed Memory
In hardware, refers to network-based memory access for physical memory that is not common
As a programming model, tasks can only logically "see" local machine memory and must
use communications to access memory on other machines where other tasks are
executing
Communications
Parallel tasks typically need to exchange data. There are several ways this can be
accomplished, such as through a shared memory bus or over a network, however the actual
event of data exchange is commonly referred to as communications regardless of the
method employed
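As an illustrative sketch of explicit communications, a multiprocessing Pipe can stand in for the network link between machines in a distributed-memory system:

    # Distributed memory: tasks cannot see each other's memory, so data
    # moves only through explicit message passing (communications).
    from multiprocessing import Pipe, Process

    def worker(conn):
        data = conn.recv()           # receive: the only way to see remote data
        conn.send(sum(data))         # send the result back
        conn.close()

    if __name__ == "__main__":
        parent, child = Pipe()
        p = Process(target=worker, args=(child,))
        p.start()
        parent.send([1, 2, 3, 4])    # communicate the data to the other task
        print(parent.recv())         # 10
        p.join()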
Synchronization
Coordination of parallel tasks in real time, very often associated with
communications
Often implemented by establishing a synchronization point within an application where a
task may not proceed further until another task(s) reaches the same or logically equivalent
point
Synchronization usually involves waiting by at least one task, and can therefore cause a
parallel application's wall clock execution time to increase
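A small sketch of such a synchronization point using a barrier (thread-based here for simplicity; the principle is the same for processes):

    # Synchronization: no task may pass the barrier until all tasks reach it,
    # so the faster tasks wait, adding to wall clock execution time.
    import threading
    import time

    barrier = threading.Barrier(3)

    def task(name, work_seconds):
        time.sleep(work_seconds)         # simulate unequal amounts of work
        print(name, "reached the barrier")
        barrier.wait()                   # block until all three tasks arrive
        print(name, "proceeding")

    threads = [threading.Thread(target=task, args=(f"task-{i}", i * 0.1))
               for i in range(3)]
    for t in threads: t.start()
    for t in threads: t.join()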
Parallel Computer Memory Architectures
Shared Memory
All processors access all memory as global address space
Multiple processors can operate independently but share the same memory resources
Changes in a memory location effected by one processor are visible to all other processors
Shared memory machines are classified as UMA and NUMA, based upon memory access times
Uniform Memory Access (UMA)
• Identical processors
• Equal access and access times to memory
• Sometimes called CC-UMA (Cache Coherent UMA)
Cache coherent means if one processor updates a location in shared memory, all the other processors know about the update. Cache coherency is accomplished at the hardware level
Non-Uniform Memory Access (NUMA)
• Often made by physically linking two or more SMPs
• One SMP can directly access memory of another SMP
• Not all processors have equal access time to all memories
• Memory access across link is slower
• If cache coherency is maintained, then may also be called CC- NUMA - Cache Coherent NUMA
Advantages
Global address space provides a user-friendly programming
perspective to memory
Data sharing between tasks is both fast and uniform due to the proximity of memory to CPUs
Disadvantages
The primary disadvantage is the lack of scalability between memory and CPUs
• Adding more CPUs can geometrically increase traffic on the shared memory-CPU path and, for cache coherent systems, geometrically increase traffic associated with cache/memory management
• The programmer is responsible for synchronization constructs that ensure "correct" access to global memory
Important Questions
References
Dan C. Marinescu, "Cloud Computing: Theory and Practice", Elsevier (MK), 2013.
Rajkumar Buyya, James Broberg, Andrzej Goscinski, "Cloud Computing: Principles and Paradigms", Wiley, 2014.
Kai Hwang, Geoffrey C. Fox and Jack J. Dongarra, "Distributed and Cloud Computing", Elsevier (MK), 2012.
John W. Rittinghouse, James F. Ransome, "Cloud Computing: Implementation, Management and Security", CRC Press, 2013.