
CS3551 DISTRIBUTED COMPUTING

UNIT I INTRODUCTION

Introduction: Definition – Relation to Computer System Components – Motivation – Message-Passing Systems versus Shared Memory Systems – Primitives for Distributed Communication – Synchronous versus Asynchronous Executions – Design Issues and Challenges;

A Model of Distributed Computations: A Distributed Program – A Model of Distributed Executions – Models of Communication Networks – Global State of a Distributed System

1 INTRODUCTION

1.1 DEFINITION
Also known as distributed computing, a distributed system is a collection of independent
components located on different machines that exchange messages with each other in order to
achieve common goals.
To the end-user, the distributed system appears as a single interface or computer. The hope is
that together the system can maximize resources and information while preventing failures: if
one component fails, the availability of the service is not affected.
Today, data is more distributed than ever, and modern applications no longer run in isolation.
The vast majority of products and applications rely on distributed systems.

Elements of a Distributed System


The most important functions of distributed computing are:
• Resource sharing - hardware, software, or data can be shared among the nodes
• Openness - the degree to which the software is designed to be openly developed and shared
• Concurrency - multiple machines can process the same function at the same time
• Scalability - the ability of computing and processing capacity to grow when extended to
many machines
• Fault tolerance - how easily and quickly failures in parts of the system can be detected and
recovered from
• Transparency - the extent to which one node can locate and communicate with other
nodes in the system
Modern distributed systems have evolved to include autonomous processes that might run on
the same physical machine, but interact by exchanging messages with each other.

Examples of Distributed Systems


Networks
The earliest example of a distributed system appeared in the 1970s, when Ethernet was invented
and local area networks (LANs) were created. For the first time, computers could send
messages to other systems with a local address. Peer-to-peer networks evolved, and e-mail
and then the Internet as we know it continue to be the biggest, ever-growing examples of
distributed systems. As the Internet moved from IPv4 to IPv6, distributed systems have
evolved from “LAN” based to “Internet” based.

Telecommunication networks
Telephone and cellular networks are also examples of distributed networks. Telephone
networks have existed for over a century and began as an early example of a peer-to-peer
network. Cellular networks are distributed networks with base stations physically distributed
in areas called cells. As telephone networks have evolved to VoIP (voice over IP), they
continue to grow in complexity as distributed networks.

Distributed Real-time Systems


Many industries use real-time systems that are distributed locally and globally. Airlines use
flight control systems, Uber and Lyft use dispatch systems, manufacturing plants use
automation control systems, logistics and e-commerce companies use real-time tracking systems.

Parallel Processing
There used to be a distinction between parallel computing and distributed systems. Parallel
computing was focused on how to run software on multiple threads or processors that accessed
the same data and memory. Distributed systems meant separate machines with their own
processors and memory. With the rise of modern operating systems, processors and cloud
services these days, distributed computing also encompasses parallel processing.

Distributed artificial intelligence


Distributed artificial intelligence is a way to use large-scale computing power and parallel
processing to learn and process very large data sets using multiple agents.

Distributed Database Systems


A distributed database is a database that is spread over multiple servers and/or physical
locations, with the data replicated across systems.
Most popular applications use a distributed database and need to be aware of the homogenous
or heterogenous nature of the distributed database system.
A homogenous distributed database means that each system has the same database
management system and data model. Such databases are easier to manage, and performance is
scaled by adding new nodes and locations.
Heterogenous distributed databases allow for multiple data models and different database
management systems. Gateways are used to translate the data between nodes; such systems
usually arise as a result of merging applications and systems.

Differences between centralized and distributed systems

Centralized Systems:
• Several jobs are done on a particular central processing unit (CPU).
• They have shared memory and shared variables.
• A global clock is present.

Distributed Systems:
• Jobs are distributed among several processors, which are interconnected by a computer network.
• They have no global state, i.e., no shared memory and no shared variables.
• There is no global clock.

1.2 RELATION TO COMPUTER SYSTEM COMPONENTS

Fig 1.1: Example of a Distributed System

As shown in Fig 1.1, each computer has a memory-processing unit, and the computers are
connected by a communication network. Each system connected to the distributed network
hosts distributed software, a middleware technology. This software drives the Distributed
System (DS) while preserving the heterogeneity of the DS. A computation or run in a
distributed system is the execution of processes to achieve a common goal.

Fig 1.2: Interaction of layers of network



The interaction of the layers of the network with the operating system and middleware is shown
in Fig 1.2. The middleware contains important library functions for facilitating the operations of
the DS.
The distributed system uses a layered architecture to break down the complexity of system
design. The middleware is the distributed software that drives the distributed system while
providing transparency of heterogeneity at the platform level.

Examples of middleware: the Object Management Group's (OMG) Common Object Request
Broker Architecture (CORBA), Remote Procedure Call (RPC), and the Message Passing
Interface (MPI).

1.3 MOTIVATION
The motivation for using a distributed system is some or all of the following requirements:

1. Inherently distributed computations

Inherently distributed systems are systems that are distributed by their very nature; in other words,
they are composed of subsystems that are physically and geographically separated, and the
computation itself is inherently distributed. E.g., money transfer in banking.

2. Resource sharing

Resources such as peripherals, complete data sets in databases, special libraries, as well as data
(variable/files) cannot be fully replicated at all the sites. Further, they cannot be placed at a single
site. Therefore, such resources are typically distributed across the system.

For example, distributed databases such as DB2 partition the data sets across several servers

3. Access to geographically remote data and resources

In many scenarios, the data cannot be replicated at every site participating in the distributed
execution because it may be too large or too sensitive to be replicated.

For example, payroll data within a multinational corporation is both too large and too sensitive to
be replicated at every branch office/site.

4. Enhanced reliability

A distributed system has the inherent potential to provide increased reliability because of the
possibility of replicating resources and executions, as well as the reality that geographically
distributed resources are not likely to crash/malfunction at the same time under normal
circumstances. Reliability entails several aspects:

a. availability, i.e., the resource should be accessible at all times;

b. integrity, i.e., the value/state of the resource should be correct;

c. fault-tolerance, i.e., the ability to recover from system failures.

5. Increased performance/cost ratio

By resource sharing and accessing geographically remote data and resources, the performance/cost
ratio is increased.

6. Scalability

As the processors are usually connected by a wide-area network, adding more processors does not
pose a direct bottleneck for the communication network.

7. Modularity and incremental expandability

Heterogeneous processors may be easily added into the system without affecting the performance,
as long as those processors are running the same middleware algorithms. Similarly, existing
processors may be easily replaced by other processors.

1.4 MESSAGE-PASSING SYSTEMS VERSUS SHARED MEMORY SYSTEMS


Processes may communicate either by exchanging messages or via shared data variables, with
control variables for synchronization among the processors. The communication between the
tasks in multiprocessor systems takes place through two main modes:

Message passing systems:

 This allows multiple processes to read from and write data to the message queue without
being directly connected to each other.
 Messages are stored in the queue until their recipient retrieves them. Message queues are
quite useful for inter-process communication and are used by most operating systems.
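As a concrete illustration, here is a minimal sketch of this style of communication using POSIX message queues, a real API available on Linux; the queue name /demo_queue and the sizes are illustrative, and the example compiles with -lrt:

#include <fcntl.h>
#include <mqueue.h>
#include <stdio.h>
#include <string.h>

int main(void) {
    struct mq_attr attr = { .mq_flags = 0, .mq_maxmsg = 10,
                            .mq_msgsize = 64, .mq_curmsgs = 0 };
    /* Create (or open) a named queue; sender and receiver refer to it
       by name, so they need not be directly connected to each other. */
    mqd_t q = mq_open("/demo_queue", O_CREAT | O_RDWR, 0600, &attr);
    if (q == (mqd_t)-1) { perror("mq_open"); return 1; }

    const char *msg = "hello";
    mq_send(q, msg, strlen(msg) + 1, 0);   /* enqueue the message */

    char buf[64];
    mq_receive(q, buf, sizeof buf, NULL);  /* dequeue; blocks if empty */
    printf("received: %s\n", buf);

    mq_close(q);
    mq_unlink("/demo_queue");
    return 0;
}

Here a single process both sends and receives for brevity; in practice the two ends would be separate processes opening the same named queue.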
Shared memory systems:

 Shared memory is memory that can be simultaneously accessed by multiple
processes. This is done so that the processes can communicate with each other.
 Communication among processors takes place through shared data variables, with control
variables for synchronization among the processors.
 Semaphores and monitors are common synchronization mechanisms on shared memory
systems.
 When the shared memory model is implemented in a distributed environment, it is termed
distributed shared memory (DSM).



a) Message Passing Model b) Shared Memory Model

Fig 1.3: Inter-process communication models
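A minimal sketch of shared-memory communication on one machine, assuming a parent and child process that share an anonymous mapping and synchronize with a process-shared POSIX semaphore (compile with -pthread):

#include <semaphore.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

struct shared { sem_t ready; int value; };

int main(void) {
    /* Anonymous shared mapping, visible to both processes after fork(). */
    struct shared *s = mmap(NULL, sizeof *s, PROT_READ | PROT_WRITE,
                            MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    sem_init(&s->ready, 1 /* shared between processes */, 0);

    if (fork() == 0) {            /* child: the writer */
        s->value = 42;            /* write the shared variable */
        sem_post(&s->ready);      /* signal that the data is ready */
        _exit(0);
    }
    sem_wait(&s->ready);          /* parent blocks until the child posts */
    printf("read shared value: %d\n", s->value);
    wait(NULL);
    return 0;
}

The semaphore plays the role of the control variable mentioned above: without it, the parent could read the shared variable before the child has written it.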


Differences between message passing and shared memory models

Message Passing Distributed Shared Memory


Services Offered: The processes share variables directly, so no
Variables have to be marshalled from one marshalling and unmarshalling. Shared
process, transmitted andunmarshalled into variables can be named, stored and accessed
other variables at thereceiving process. inDSM.
Processes can communicate with other Here, a process does not have private address
processes. They can be protected from one space. So one process can alter the execution of
another by having private address spaces. other.
This technique can be used in heterogeneous This cannot be used to heterogeneous
computers. computers.
Synchronization between processes is through Synchronization is through locks and
message passing primitives. semaphores.
Processes communicating via message passing Processes communicating through DSM
must execute at the same time. may execute with non-overlapping lifetimes.
Efficiency:
All remote data accesses are explicit and Any particular read or update may or may
therefore the programmer is always aware of notinvolve communication by the underlying
whether a particular operation is in-process runtime support.
orinvolves the expense of communication.

1.4.1 Emulating message-passing on a shared memory system (MP → SM)

 The shared memory system can be made to act as message passing system. The shared
address space can be partitioned into disjoint parts, one part being assigned to each
processor.



 Send and receive operations are implemented by writing to and reading from the
destination/sender processor's address space. The read and write operations are
synchronized.
 Specifically, a separate location can be reserved as the mailbox for each ordered pair of
processes.
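A minimal single-machine sketch of this idea, using two threads and one flag-synchronized mailbox for the ordered pair (sender, receiver); a full emulation would reserve one such mailbox per ordered pair of processes (compile with -pthread):

#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>

/* The mailbox reserved for the ordered pair (sender -> receiver). */
struct mailbox { atomic_int full; int data; };
static struct mailbox mbox = { 0, 0 };

/* "send": write into the pair's partition, then raise the flag. */
static void mp_send(struct mailbox *m, int msg) {
    while (atomic_load(&m->full)) ;      /* wait until the mailbox is free */
    m->data = msg;
    atomic_store(&m->full, 1);
}

/* "receive": wait for the flag, read the data, then clear the flag. */
static int mp_receive(struct mailbox *m) {
    while (!atomic_load(&m->full)) ;     /* wait for a message */
    int msg = m->data;
    atomic_store(&m->full, 0);
    return msg;
}

static void *receiver(void *arg) {
    (void)arg;
    printf("received %d\n", mp_receive(&mbox));
    return NULL;
}

int main(void) {
    pthread_t t;
    pthread_create(&t, NULL, receiver, NULL);
    mp_send(&mbox, 7);
    pthread_join(t, NULL);
    return 0;
}

The atomic flag provides the synchronization of the read and write operations; the busy-wait loops are kept only for brevity.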

1.4.2 Emulating shared memory on a message-passing system (SM → MP)

This involves the use of “send” and “receive” operations for “write” and “read” operations. Each
shared location can be modeled as a separate process; “write” to a shared location is emulated by
sending an update message to the corresponding owner process; a “read” to a shared location is
emulated by sending a query message to the owner process. As accessing another processor’s
memory requires send and receive operations, this emulation is expensive. Although emulating
shared memory might seem to be more attractive from a programmer’s perspective, it must be
remembered that in a distributed system, it is only an abstraction. Thus, the latencies involved in
read and write operations may be high even when using shared memory emulation because the
read and write operations are implemented by using network-wide communication under the
covers.
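A minimal sketch of the owner-process idea, assuming one shared location x whose owner is a child process, with a "write" emulated by an update message and a "read" by a query message over pipes (the message format is illustrative):

#include <stdio.h>
#include <sys/wait.h>
#include <unistd.h>

struct req { char op; int value; };            /* 'w' = write, 'r' = read */

int main(void) {
    int to_owner[2], from_owner[2];
    pipe(to_owner); pipe(from_owner);

    if (fork() == 0) {                         /* owner process of location x */
        close(to_owner[1]); close(from_owner[0]);  /* close unused pipe ends */
        int x = 0;
        struct req r;
        while (read(to_owner[0], &r, sizeof r) == sizeof r) {
            if (r.op == 'w') x = r.value;      /* apply the update message */
            else write(from_owner[1], &x, sizeof x); /* answer the query */
        }
        _exit(0);
    }
    close(to_owner[0]); close(from_owner[1]);  /* parent closes unused ends */

    /* Emulated write(x, 42): send an update message to the owner. */
    struct req w = { 'w', 42 };
    write(to_owner[1], &w, sizeof w);

    /* Emulated read(x): send a query message and wait for the reply. */
    struct req q = { 'r', 0 };
    int x;
    write(to_owner[1], &q, sizeof q);
    read(from_owner[0], &x, sizeof x);
    printf("read x = %d\n", x);

    close(to_owner[1]);                        /* owner's read() then returns 0 */
    wait(NULL);
    return 0;
}

Every access to x costs a message exchange, which is exactly why this emulation is expensive: the shared-memory abstraction hides, but does not remove, the underlying communication.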

An application can of course use a combination of shared memory and message-passing. In a
MIMD message-passing multicomputer system, each “processor” may be a tightly coupled
multiprocessor system with shared memory. Within the multiprocessor system, the processors
communicate via shared memory. Between two computers, the communication is by message
passing. As message-passing systems are more common and more suited for wide-area distributed
systems, we will consider message-passing systems more extensively than we consider shared
memory systems.

1.5 PRIMITIVES FOR DISTRIBUTED COMMUNICATION


1.5.1 Blocking / Non blocking / Synchronous / Asynchronous

 Message send and message receive communication primitives are done through Send() and
Receive(), respectively.
 A Send primitive has two parameters: the destination and the buffer in the user space that
holds the data to be sent.
 The Receive primitive also has two parameters: the source from which the data is to be
received and the user buffer into which the data is to be received.
There are two ways of sending data when the Send primitive is called:
 Buffered: The standard option copies the data from the user buffer to the kernel buffer. The
data later gets copied from the kernel buffer onto the network. For the Receive primitive, the

buffered option is usually required because the data may already have arrived when the
primitive is invoked, and needs a storage place in the kernel.
 Unbuffered: The data gets copied directly from the user buffer onto the network.

Blocking primitives

 The primitive commands wait for the message to be delivered. The execution of the
processes is blocked.
 The sending process must wait after a send until an acknowledgement is made by the
receiver.
 The receiving process must wait for the expected message from the sending process.
 The receipt is determined by polling a common buffer or by an interrupt.
 This is a form of synchronization or synchronous communication.
 A primitive is blocking if control returns to the invoking process after the processing for the
primitive completes.

Non Blocking primitives

 If send is non blocking, it returns control to the caller immediately, before the message is sent.
 The advantage of this scheme is that the sending process can continue computing in parallel
with the message transmission, instead of having the CPU go idle.
 This is a form of asynchronous communication.
 A primitive is non-blocking if control returns back to the invoking process immediately after
invocation, even though the operation has not completed.
 For a non-blocking Send, control returns to the process even before the data is copied out of
the user buffer.
 For a non-blocking Receive, control returns to the process even before the data may have
arrived from the sender.

Synchronous primitives

 A Send or a Receive primitive is synchronous if both the Send() and Receive() handshake
with each other.
 The processing for the Send primitive completes only after the invoking processor learns
that the other corresponding Receive primitive has also been invoked and that the receive
operation has been completed.
 The processing for the Receive primitive completes when the data to be received is copied
into the receiver’s user buffer.
Asynchronous primitives

 A Send primitive is said to be asynchronous if control returns back to the invoking process
after the data item to be sent has been copied out of the user-specified buffer.



 It does not make sense to define asynchronous Receive primitives.
 Implementing non-blocking operations is tricky.
 For non-blocking primitives, a return parameter on the primitive call returns a system-
generated handle which can be later used to check the status of completion of the call.
 The process can check for the completion:
 Checking if the handle has been flagged or posted
 Issue a Wait with a list of handles as parameters: usually blocks until one of the
parameter handles is posted.
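MPI's non-blocking primitives follow exactly this handle-based pattern; as a hedged sketch (assuming an MPI installation, run with mpirun -np 2):

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    int data = 0;
    MPI_Request req;                 /* the system-generated handle */

    if (rank == 0) {
        data = 99;
        /* Non-blocking send: returns immediately with a handle. */
        MPI_Isend(&data, 1, MPI_INT, 1, 0, MPI_COMM_WORLD, &req);
        /* ... computation can overlap with the transmission here ... */
        MPI_Wait(&req, MPI_STATUS_IGNORE);   /* block until completion */
    } else if (rank == 1) {
        /* Non-blocking receive; poll the handle until it is posted. */
        MPI_Irecv(&data, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &req);
        int done = 0;
        while (!done)
            MPI_Test(&req, &done, MPI_STATUS_IGNORE);
        printf("rank 1 received %d\n", data);
    }
    MPI_Finalize();
    return 0;
}

MPI_Test corresponds to checking whether the handle has been flagged or posted, and MPI_Wait to the blocking Wait described above.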

The send and receive primitives can be implemented in four modes:

 Blocking synchronous
 Non- blocking synchronous
 Blocking asynchronous
 Non- blocking asynchronous

Four modes of send operation

Blocking synchronous Send:

 The data gets copied from the user buffer to the kernel buffer and is then sent over the
network.
 After the data is copied to the receiver’s system buffer and a Receive call has been issued, an
acknowledgement back to the sender causes control to return to the process that invoked the
Send operation and completes the Send.

Non-blocking synchronous Send:


 Control returns back to the invoking process as soon as the copy of data from the user buffer
to the kernel buffer is initiated.
 A parameter in the non-blocking call also gets set with the handle of a location that the user
process can later check for the completion of the synchronous send operation.
 The location gets posted after an acknowledgement returns from the receiver.
 The user process can keep checking for the completion of the non-blocking synchronous
Send by testing the returned handle, or it can invoke the blocking Wait operation on the
returned handle

Blocking asynchronous Send:

 The user process that invokes the Send is blocked until the data is copied from the user’s
buffer to the kernel buffer.

Non-blocking asynchronous Send:



 The user process that invokes the Send is blocked until the transfer of the data from the
user’s buffer to the kernel buffer is initiated.
 Control returns to the user process as soon as this transfer is initiated, and a parameter in the
non-blocking call also gets set with the handle of a location that the user process can check
later using the Wait operation for the completion of the asynchronous Send.

The asynchronous Send completes when the data has been copied out of the user’s buffer. The
checking for the completion may be necessary if the user wants to reuse the buffer from which the
data was sent.

Fig 1.4 a) Blocking synchronous send and blocking receive; Fig 1.4 b) Non-blocking synchronous send and blocking receive

Fig 1.4 c) Blocking asynchronous send; Fig 1.4 d) Non-blocking asynchronous send

Modes of receive operation

Blocking Receive:

The Receive call blocks until the data expected arrives and is written in the specified user buffer.
Then control is returned to the user process.



Non-blocking Receive:

 The Receive call will cause the kernel to register the call and return the handle of a location
that the user process can later check for the completion of the non-blocking Receive
operation.
 This location gets posted by the kernel after the expected data arrives and is copied to the
user-specified buffer. The user process can check for the completion of the non-blocking
Receive by invoking the Wait operation on the returned handle.

1.5.2 Processor Synchrony

“Processor synchrony indicates that all the processors execute in lock-step with their clocks
synchronized.”

Since distributed systems do not follow a common clock, this abstraction is implemented using
some form of barrier synchronization to ensure that no processor begins executing the next step of
code until all the processors have completed executing the previous steps of code assigned to each
of the processors.
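Barrier synchronization is directly available in message-passing libraries; a minimal sketch with MPI (the three steps are illustrative):

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    for (int step = 0; step < 3; step++) {
        printf("process %d executing step %d\n", rank, step);
        /* No process starts step+1 until all have completed step. */
        MPI_Barrier(MPI_COMM_WORLD);
    }
    MPI_Finalize();
    return 0;
}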

1.5.3 Libraries and standards

There exists a wide range of primitives for message passing. The Message Passing Interface (MPI)
library and the PVM (Parallel Virtual Machine) library are used largely by the scientific community.

Message Passing Interface (MPI): This is a standardized and portable message- passing system to
function on a wide variety of parallel computers. MPI primarily addresses the message-passing
parallel programming model: data is moved from the address space of one process to that of
another process through cooperative operations on each process.

The primary goal of the Message Passing Interface is to provide a widely used standard for writing
message passing programs.

Parallel Virtual Machine (PVM): It is a software tool for parallel networking of computers. It is
designed to allow a network of heterogeneous Unix and/or Windows machines to be used as a
single distributed parallel processor.

Remote Procedure Call (RPC): RPC is a common model of request-reply protocol and a powerful
technique for constructing distributed, client-server based applications. In RPC, the procedure need
not exist in the same address space as the calling procedure. The two processes may be on the same
system, or they may be on different systems with a network connecting them. By using RPC,
programmers of distributed applications avoid the details of the interface with the network; RPC
makes the client/server model of computing more powerful and easier to program.
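A minimal sketch of the request-reply pattern underlying RPC, in which the "remote" procedure square() lives in a different address space (a child process) and is invoked by exchanging messages (the procedure and message layout are illustrative):

#include <stdio.h>
#include <sys/socket.h>
#include <sys/wait.h>
#include <unistd.h>

static int square(int x) { return x * x; }   /* the "remote" procedure */

int main(void) {
    int sv[2];
    socketpair(AF_UNIX, SOCK_STREAM, 0, sv);

    if (fork() == 0) {                  /* server: owns the procedure */
        int arg, res;
        read(sv[1], &arg, sizeof arg);  /* receive the request */
        res = square(arg);              /* execute the procedure */
        write(sv[1], &res, sizeof res); /* send the reply */
        _exit(0);
    }

    /* Client-side "stub": marshal the argument, send the request,
       and block for the reply; to the caller it looks like a local call. */
    int arg = 7, res;
    write(sv[0], &arg, sizeof arg);
    read(sv[0], &res, sizeof res);
    printf("square(7) = %d\n", res);
    wait(NULL);
    return 0;
}

Real RPC systems add stub generation, naming, and failure handling on top of this basic request-reply exchange.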

Remote Method Invocation (RMI): RMI (Remote Method Invocation) is a way that a programmer
can write object-oriented programming in which objects on different computers can interact in a

distributed network. It is a set of protocols being developed by Sun's JavaSoft division that enables
Java objects to communicate remotely with other Java objects.


Differences between RMI and RPC

RMI:
• RMI uses an object-oriented paradigm where the user needs to know the object and the
method of the object to invoke.
• RMI handles the complexities of passing along the invocation from the local to the remote
computer. But instead of passing a procedural call, RMI passes a reference to the object and
the method that is being called.

RPC:
• RPC is not object oriented and does not deal with objects. Rather, it calls specific
subroutines that are already established.
• With RPC, a call looks like a local call. RPC handles the complexities involved with passing
the call from the local to the remote computer.

The commonalities between RMI and RPC are as follows:

 They both support programming with interfaces.


 They are constructed on top of request-reply protocols.
 They both offer a similar level of transparency.

Common Object Request Broker Architecture (CORBA): CORBA describes a messaging


mechanism by which objects distributed over a network can communicate with each other
irrespective of the platform and language used to develop those objects. The data representation is
concerned with an external representation for the structured and primitive types that can be passed
as the arguments and results of remote method invocations in CORBA. It can be used by a variety
of programming languages.

1.6 SYNCHRONOUS VERSUS ASYNCHRONOUS EXECUTIONS


The execution of processes in distributed systems may be synchronous or asynchronous.

Example: In asynchronous communication between systems, the caller sends a message and
continues with its other tasks without waiting for the answer. When the response eventually arrives,
it is handled like any other arriving message. This is in contrast with synchronous communication,
where the caller waits for the answer.

Asynchronous Execution:
A communication among processes is considered asynchronous when every
communicating process can have a different observation of the order of the messages
being exchanged. In an asynchronous execution:
 There is no processor synchrony and there is no bound on the drift rate
of processor clocks.
 Message delays are finite but unbounded.
 There is no upper bound on the time taken by a process to execute a step.

Fig 1.5: Asynchronous execution in message passing system

An example asynchronous execution with four processes P0 to P3 is shown in Fig 1.5. The arrows
denote the messages; the tail and head of an arrow mark the send and receive events for that
message, denoted by a circle and vertical line, respectively. Non-communication events, also
termed internal events, are shown by shaded circles.

Synchronous Execution:

A communication among processes is considered synchronous when every process observes the
same order of messages within the system. Likewise, an execution is considered synchronous
when every individual process in the system observes the same total order of all the events which
happen within it. In a synchronous execution:

 Processors are synchronized and the clock drift rate between any two processors is bounded.
 Message delivery times are such that they occur in one logical step or round.
 There is an upper bound on the time taken by a process to execute a step.

An example of a synchronous execution with four processes P0 to P3 is shown in Fig 1.6. The
arrows denote the messages. All the messages sent in a round are received within that same round.



Fig 1.6: Synchronous execution

Emulating an asynchronous system by a synchronous system (A → S)

An asynchronous program can be emulated on a synchronous system fairly trivially, as the
synchronous system is a special case of an asynchronous system – all communication finishes
within the same round in which it is initiated.

Emulating a synchronous system by an asynchronous system (S → A)

A synchronous program can be emulated on an asynchronous system using a tool called a
synchronizer.

Emulation for a fault free system

Fig 1.7: Emulations in a failure free message passing system

If system A can be emulated by system B, denoted A/B, and if a problem is not solvable in B, then it
is also not solvable in A. Likewise, if a problem is solvable in A, it is also solvable in B. Hence, in a
sense, all four classes are equivalent in terms of computability in failure-free systems.

1.7 DESIGN ISSUES AND CHALLENGES


The design of distributed systems has numerous challenges. They can be categorized into:



o Issues related to system and operating systems design
o Issues related to algorithm design
o Issues arising due to emerging technologies
The above three classes are not mutually exclusive.
1.7.1 Issues related to system and operating systems design
The following are some of the common challenges to be addressed in designing a distributed
system from system perspective:
Communication: This task involves designing suitable communication mechanisms among
the various processes in the networks.
Examples: RPC, RMI
Processes: The main challenges involved are: process and thread management at both client
and server environments, migration of code between systems, design of software and mobile
agents.
Naming: Devising easy to use and robust schemes for names, identifiers, and addresses is
essential for locating resources and processes in a transparent and scalable manner. The
remote and highly varied geographical locations make this task difficult.
Synchronization: Mutual exclusion, leader election, deploying physical clocks, global state
recording are some synchronization mechanisms.
Data storage and access schemes: Designing file systems for easy and efficient data storage
with implicit accessing mechanisms is essential for distributed operation.
Consistency and replication: The notion of distributed systems goes hand in hand with
replication of data to provide a high degree of scalability. The replicas should be handled
with care, since data consistency is a prime issue.
Fault tolerance: This requires maintenance of fail proof links, nodes, and processes. Some of
the common fault tolerant techniques are resilience, reliable communication, distributed
commit, checkpointing and recovery, agreement and consensus, failure detection, and self-
stabilization.
Security: Cryptography, secure channels, access control, key management – generation and
distribution, authorization, and secure group management are some of the security measures
that are imposed on distributed systems.
Applications Programming Interface (API) and transparency: User friendliness and ease
of use are very important for distributed services to be adopted by a wide community.
Transparency, which is hiding the inner implementation policy from users, is of the following
types:
 Access transparency: hides differences in data representation
 Location transparency: hides differences in locations by providing uniform access to
data located at remote locations.
 Migration transparency: allows relocating resources without changing names.
 Replication transparency: Makes the user unaware whether he is working on original
or replicated data.



 Concurrency transparency: Masks the concurrent use of shared resources for the user.
 Failure transparency: system being reliable and fault-tolerant.
Scalability and modularity: The algorithms, data and services must be as distributed as
possible. Various techniques such as replication, caching and cache management, and
asynchronous processing help to achieve scalability.

1.7.2 Algorithmic challenges in distributed computing


Designing useful execution models and frameworks
The interleaving model, partial order model, input/output automata model and the
Temporal Logic of Actions (TLA) are some examples of models that provide different
degrees of infrastructure.
Dynamic distributed graph algorithms and distributed routing algorithms
o The distributed system is generally modeled as a distributed graph.
o Hence graph algorithms are the basis for a large number of higher-level communication,
data dissemination, object location, and object search functions.
o These algorithms must have the capacity to deal with highly dynamic graph
characteristics. They are expected to function like routing algorithms.
o The performance of these algorithms has a direct impact on user-perceived latency, data
traffic, and load in the network.
Time and global state in a distributed system
o Geographically remote resources demand synchronization based on logical
time.
o Logical time is relative and eliminates the overhead of providing physical time for
applications. Logical time can (i) capture the logic and inter-process dependencies and (ii)
track the relative progress of each process.
o Maintaining the global state of the system across space involves the role of the time
dimension for consistency. This can be done with extra effort in a coordinated manner.
o Deriving appropriate measures of concurrency also involves the time dimension, as
the execution and communication speed of threads may vary a lot.
Synchronization/coordination mechanisms
o Synchronization is essential for the distributed processes to facilitate concurrent
execution without affecting other processes.
o The synchronization mechanisms also involve resource management and concurrency
management mechanisms.
o Some techniques for providing synchronization are:
 Physical clock synchronization: Physical clocks usually diverge in their values
due to hardware limitations. Keeping them synchronized is a fundamental
challenge to maintain common time.



 Leader election: All the processes need to agree on which process will play the
role of a distinguished process or a leader process. A leader is necessary even
for many distributed algorithms because there is often some asymmetry.
 Mutual exclusion: Access to the critical resource(s) has to be coordinated.
 Deadlock detection and resolution: Deadlock detection should be coordinated to avoid
duplicate work, and deadlock resolution should be coordinated to avoid unnecessary
aborts of processes.
 Termination detection: This requires cooperation among the processes to detect the
specific global state of quiescence.
 Garbage collection: Detecting garbage requires coordination among the
processes.
Group communication, multicast, and ordered message delivery
o A group is a collection of processes that share a common context and collaborate on a
common task within an application domain. Group management protocols are needed
for group communication wherein processes can join and leave groups dynamically, or
fail.
o The concurrent execution of remote processes may sometimes violate the semantics and
order of the distributed program. Hence, a formal specification of the semantics of
ordered delivery needs to be formulated and then implemented.
Monitoring distributed events and predicates
o Predicates defined on program variables that are local to different processes are used
for specifying conditions on the global system state.
o On-line algorithms for monitoring such predicates are hence important.
o An important paradigm for monitoring distributed events is that of event streaming,
wherein streams of relevant events reported from different processes are examined
collectively to detect predicates.
o The specification of such predicates uses physical or logical time relationships.
Distributed program design and verification tools
Methodically designed and verifiably correct programs can greatly reduce the overhead of
software design, debugging, and engineering. Designing these is a big challenge.
Debugging distributed programs
Debugging distributed programs is much harder because of the concurrency and replication
involved. Adequate debugging mechanisms and tools are the need of the hour.
Data replication, consistency models, and caching
o Fast access to data and other resources is important in distributed systems.
o Managing replicas and their updates faces concurrency problems.
o Placement of the replicas in the systems is also a challenge because resources usually
cannot be freely replicated.
World Wide Web design – caching, searching, scheduling
o WWW is a commonly known distributed system.



o The issues of object replication and caching, prefetching of objects have to be done on
WWW also.
o Object search and navigation on the web are important functions in the operation of
the web.
Distributed shared memory abstraction
o A shared memory abstraction is easier to program with, since it does not involve
managing the communication tasks.
o The communication is done by the middleware via message passing.
o The overhead of shared memory is to be dealt with by the middleware technology.
o Some of the methodologies that perform the task of communication in shared memory
distributed systems are:
 Wait-free algorithms: A wait-free algorithm guarantees that a process can complete
its execution irrespective of the actions of other processes. Such algorithms control
the access to shared resources in the shared memory abstraction, but they are
expensive.
 Mutual exclusion: Concurrent access of processes to a shared resource or data
is executed in mutually exclusive manner. Only one process is allowed to
execute the critical section at any given time. In a distributed system, shared
variables or a local kernel cannot be used to implement mutual exclusion.
Message passing is the sole means for implementing distributed mutual
exclusion.
 Register constructions: Architectures must be designed in such a way that registers
allow concurrent access without any restrictions on the concurrency
permitted.
Reliable and fault-tolerant distributed systems
The following are some of the fault tolerant strategies:
 Consensus algorithms: Consensus algorithms allow correctly functioning processes
to reach agreement among themselves in spite of the existence of malicious processes.
The goal of the malicious processes is to prevent the correctly functioning processes
from reaching agreement. The malicious processes operate by sending messages with
misleading information, to confuse the correctly functioning processes.
 Replication and replica management: The Triple Modular Redundancy (TMR)
technique is used in software and hardware implementation. TMR is a fault-tolerant
form of N-modular redundancy, in which three systems perform a process and that
result is processed by a majority-voting system to produce a single output.
 Voting and quorum systems: Providing redundancy in the active or passive
components in the system and then performing voting based on some quorum
criterion is a classical way of dealing with fault-tolerance. Designing efficient
algorithms for this purpose is the challenge.



 Distributed databases and distributed commit: The distributed databases should
also follow atomicity, consistency, isolation and durability (ACID) properties.
 Self-stabilizing systems: All system executions have associated good (or legal) states
and bad (or illegal) states; during correct functioning, the system makes transitions
among the good states. A self-stabilizing algorithm guarantees to take the system to a
good state even if a bad state were to arise due to some error. Self-stabilizing
algorithms require some in-built redundancy to track additional variables of the state
and to do extra work.
 Checkpointing and recovery algorithms: Checkpointing is periodically recording the
current state on secondary storage so that, in case of a failure, the entire computation
is not lost but can be recovered from one of the recently taken checkpoints.
Checkpointing in a distributed environment is difficult because if the checkpoints at
the different processes are not coordinated, the local checkpoints may become useless
because they are inconsistent with the checkpoints at other processes.
 Failure detectors: Asynchronous distributed systems do not have a bound on the
message transmission time. This makes message passing very difficult, since the
receiver does not know how long to wait. Failure detectors probabilistically suspect
another process as having failed and then converge on a determination of the
up/down status of the suspected process.
Load balancing
The objective of load balancing is to gain higher throughput and reduce user-perceived
latency. Load balancing may be necessary because of a variety of factors, such as high
network traffic or a high request rate causing the network connection to become a bottleneck,
or high computational load. The following are some forms of load balancing:
 Data migration: The ability to move data around in the system, based on the access
pattern of the users
 Computation migration: The ability to relocate processes in order to perform a
redistribution of the workload.
 Distributed scheduling: This achieves a better turnaround time for the users by using
idle processing power in the system more efficiently.
Real-time scheduling
Real-time scheduling becomes more challenging when a global view of the system state is
absent and on-line or dynamic changes are more frequent. The message propagation delays,
which are network-dependent, are hard to control or predict. This is a hindrance to meeting
the QoS requirements of the network.

Performance
User perceived latency in distributed systems must be reduced. The common issues in
performance:



 Metrics: Appropriate metrics must be defined for measuring the performance of
theoretical distributed algorithms and its implementation.
 Measurement methods/tools: As the distributed system is a complex entity, appropriate
methodologies and tools must be developed for measuring the performance metrics.
1.7.3 Applications of distributed computing and newer challenges
The deployment environment of distributed systems ranges from mobile systems to cloud storage.
All the environments have their own challenges:
Mobile systems
o Mobile systems which use wireless communication in shared broadcast medium have
issues related to physical layer such as transmission range, power, battery power
consumption, interfacing with wired internet, signal processing and interference.
o The issues pertaining to other higher layers include routing, location management,
channel allocation, localization and position estimation, and mobility management.
o Apart from the above-mentioned common challenges, the architectural differences of
mobile networks demand varied treatment. The two architectures are:
o Base-station approach (cellular approach): The geographical region is divided into
hexagonal physical locations called cells. The powerful base station transmits signals
to all other nodes in its range.
o Ad-hoc network approach: This is an infrastructure-less approach which does not have
any base station to transmit signals. Instead, all the responsibility is distributed among
the mobile nodes.
o It is evident that the two approaches work in different environments with different
principles of communication. Designing a distributed system to cater to these varied
needs is a great challenge.

Sensor networks
o A sensor is a processor with an electro-mechanical interface that is capable of sensing
physical parameters.
o They are low-cost devices with limited computational power and battery life. They
are designed to handle streaming data and route it to external computer networks and
processes.
o They are susceptible to faults and have to reconfigure themselves.
o These features introduce a whole new set of challenges, such as position estimation
and time estimation, when designing a distributed system.
Ubiquitous or pervasive computing
o In Ubiquitous systems the processors are embedded in the environment to perform
application functions in the background.
o Examples: Intelligent devices, smart homes etc.
o With recent advancements, they are distributed systems operating in wireless
environments through actuator mechanisms.



o They can be self-organizing and network-centric with limited resources.
Peer-to-peer computing
o Peer-to-peer (P2P) computing is computing over an application layer network where
all interactions among the processors are at the same level.
o This is a form of symmetric computation, in contrast to the client-server paradigm.
o They are self-organizing with or without regular structure to the network.
o Some of the key challenges include: object storage mechanisms; efficient object lookup
and retrieval in a scalable manner; dynamic reconfiguration with nodes as well as
objects joining and leaving the network randomly; replication strategies to expedite
object search; tradeoffs between object size, latency, and table sizes; and anonymity,
privacy, and security.
Publish-subscribe, content distribution, and multimedia
o Present-day users require only the information of interest to them.
o In a dynamic environment where the information constantly fluctuates there is great
demand for
i) Publish: an efficient mechanism for distributing this information
ii) Subscribe: an efficient mechanism to allow end users to indicate interest in
receiving specific kinds of information
iii) An efficient mechanism for aggregating large volumes of published information
and filtering it as per the user’s subscription filter.
o Content distribution refers to a mechanism that categorizes the information based on
parameters.
o The publish subscribe and content distribution overlap each other.
o Multimedia data introduces special issues because of its large size.
Distributed agents
o Agents are software processes or sometimes robots that move around the system to
do specific tasks for which they are programmed.
o Agents collect and process information and can exchange such information with other
agents.
o Challenges in distributed agent systems include coordination mechanisms among the
agents, controlling the mobility of the agents, their software design and interfaces.
Distributed data mining
o Data mining algorithms process large amount of data to detect patterns and trends in
the data, to mine or extract useful information.
o The mining can be done by applying database and artificial intelligence techniques to
a data repository.
Grid computing
o Grid computing is deployed to manage resources. For instance, idle CPU cycles of
machines connected to the network will be available to others.



o The challenges include: scheduling jobs, frameworks for implementing quality of
service, real-time guarantees, and security.
Security in distributed systems
The challenges of security in a distributed setting include confidentiality, authentication, and
availability. These must be addressed using efficient and scalable solutions.
2 A MODEL OF DISTRIBUTED COMPUTATIONS

2.1 A DISTRIBUTED PROGRAM

 A distributed program is composed of a set of n asynchronous processes p1, p2, ..., pi, ..., pn
that communicate by message passing over the communication network. Each process may
run on a different processor.
 The processes do not share a global memory and communicate solely by passing messages.
These processes do not share a global clock that is instantaneously accessible to these
processes.
 Process execution and message transfer are asynchronous – a process may execute an action
spontaneously and a process sending a message does not wait for the delivery of the
message to be complete.
 The global state of a distributed computation is composed of the states of the processes and
the communication channels. The state of a process is characterized by the state of its local
memory and depends upon the context.
 The state of a channel is characterized by the set of messages in transit in the channel.
 Let Cij denote the channel from process pi to process pj and let mij denote a message sent by
pi to pj.
 The message transmission delay is finite and unpredictable.

2.2 A MODEL OF DISTRIBUTED EXECUTIONS

 The execution of a process consists of a sequential execution of its actions.


 The actions are atomic and the actions of a process are modeled as three types of events:
internal events, message send events, and message receive events.
 The occurrence of events changes the states of respective processes and channels, thus
causing transitions in the global system state.
 An internal event changes the state of the process at which it occurs.
 A send event changes the state of the process that sends the message and the state of the
channel on which the message is sent.
 The execution of process pi produces a sequence of events ei1, ei2, ei3, …, and it is denoted by
Hi = (hi, →i), where hi is the set of events produced by pi and the binary relation →i defines
a linear order on these events.



 The relation →msg captures the dependency that exists due to message passing between two
events: it defines causal dependencies between the pairs of corresponding send and receive
events.

Fig 1.8: Space-time diagram of a distributed execution

 An internal event changes the state of the process at which it occurs. A send event changes
the state of the process that sends the message and the state of the channel on which the
message is sent.
 A receive event changes the state of the process that receives the message and the state of the
channel on which the message is received.

2.2.1 Causal Precedence Relations

Happened-Before Relation

The partial ordering obtained by generalizing the relationship between two processes is called the
happened-before relation, causal ordering, or potential causal ordering. This term was coined
by Lamport. Happens-before defines a partial order of events in a distributed system; some
events cannot be ordered with respect to each other. We say A → B if A happens before B.
A → B is defined using the following rules:
 Local ordering: A and B occur on the same process and A occurs before B.
 Messages: send(m) → receive(m) for any message m
 Transitivity: e → e’’ if e → e’ and e’ → e’’

Ordering can be based on two situations:


1. If two events occur in the same process, then they occurred in the order observed.
2. During message passing, the event of sending the message occurred before the event of
receiving it.

Lamport's ordering is the happened-before relation, denoted by →:


 a→b, if a and b are events in the same process and a occurred before b.



 a→b, if a is the event of sending a message m in a process and b is the event of the same
message m being received by another process.
 If a→b and b→c, then a→c, i.e., Lamport's ordering is transitive.

When any one of the above conditions is satisfied, it can be concluded that a and b are causally
related. Consider two events c and d; if c→d and d→c are both false (i.e., they are not causally
related), then c and d are said to be concurrent events, denoted as c||d.
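A minimal sketch of Lamport's logical clocks, which assign integer timestamps consistent with the happened-before relation (the two-process scenario is illustrative):

#include <stdio.h>

/* Lamport logical clock rules:
   - increment the local clock before each internal or send event;
   - on a receive, set clock = max(local, received timestamp) + 1. */

static int tick(int *clock) {               /* internal or send event */
    return ++*clock;
}

static int on_receive(int *clock, int msg_ts) {
    *clock = (*clock > msg_ts ? *clock : msg_ts) + 1;
    return *clock;
}

int main(void) {
    int c1 = 0, c2 = 0;                     /* clocks of p1 and p2 */

    int ts_a = tick(&c1);                   /* p1: send event a; ts travels with m */
    int ts_b = on_receive(&c2, ts_a);       /* p2: receive event b */

    /* a -> b implies C(a) < C(b). */
    printf("C(a) = %d, C(b) = %d\n", ts_a, ts_b);
    return 0;
}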

Causal Precedence



2.2.2 Logical vs physical concurrency

Physical and logical concurrency are two notions that often create confusion in distributed systems.
Physical concurrency: Several program units of the same program execute at the same time on
different processors.
Logical concurrency: The programmer is given the illusion of concurrent execution, while the
program units actually execute in interleaved fashion on a single processor.

Differences between logical and physical concurrency

Logical concurrency:
• Several units of the same program execute simultaneously on the same processor, giving the
programmer the illusion that they are executing on multiple processors.
• Implemented through interleaving.

Physical concurrency:
• Several program units of the same program execute at the same time on different processors.
• Implemented with uni-processors with I/O channels, multiple CPUs, or networks of uni- or
multi-CPU machines.

In a distributed computation, two events are logically concurrent if and only if they do not causally
affect each other.
 Physical concurrency, on the other hand, has a connotation that the events occur at the same
instant in physical time.



 Two or more events may be logically concurrent even though they do not occur at the same
instant in physical time.
 However, if the processor speeds and message delays had been different, the execution
of these events could very well have coincided in physical time.
 Whether a set of logically concurrent events coincide in the physical time or not, does not
change the outcome of the computation.
 Therefore, even though a set of logically concurrent events may not have occurred at the
same instant in physical time, we can assume that these events occurred at the same instant
in physical time.

2.3 MODELS OF COMMUNICATION NETWORKS


There are several models of the service provided by communication networks, namely, FIFO, Non-
FIFO, and causal ordering.
 In the FIFO model, each channel acts as a first-in first-out message queue and thus, message
ordering is preserved by a channel.
 In the non-FIFO model, a channel acts like a set in which the sender process adds messages
and the receiver process removes messages from it in a random order.
 The “causal ordering” model is based on Lamport’s “happens before” relation.
 A system that supports the causal ordering model satisfies the following property:
CO: for any two messages mij and mkj sent to the same destination pj, if send(mij) →
send(mkj), then rec(mij) → rec(mkj).
 This property ensures that causally related messages destined to the same destination are
delivered in an order that is consistent with their causality relation.
 Causally ordered delivery of messages implies FIFO message delivery. (Note that CO ⊂
FIFO ⊂ non-FIFO.)
 Causal ordering model considerably simplifies the design of distributed algorithms because
it provides a built-in synchronization.

2.4 GLOBAL STATE OF A DISTRIBUTED SYSTEM

“The global state of a distributed system is a collection of the local states of its components, namely,
the processes and the communication channels.”
• The state of a process is defined by the contents of processor registers, stacks, local memory, etc.
and depends on the local context of the distributed application.
• The state of a channel is given by the set of messages in transit in the channel.
• The occurrence of events changes the states of respective processes and channels.



• An internal event changes the state of the process at which it occurs.
• A send event changes the state of the process that sends the message and the state of the channel
on which the message is sent.
• A receive event changes the state of the process that receives the message and the state of the
channel on which the message is received.

A Consistent Global State


 Even if the state of all the components is not recorded at the same instant, such a state will be
meaningful provided every message that is recorded as received is also recorded as sent.
 Basic idea is that a state should not violate causality – an effect should not be present without
its cause. A message cannot be received if it was not sent.
 Such states are called consistent global states and are meaningful global states
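This consistency condition can be checked mechanically. A minimal sketch, assuming a recorded global state is represented simply as per-process lists of sent and received message identifiers (the representation is illustrative):

#include <stdbool.h>
#include <stdio.h>

#define MAX_MSGS 16

/* A recorded local state: the ids of messages this process had sent
   and received at its recording instant. */
struct local_state {
    int sent[MAX_MSGS];  int n_sent;
    int recvd[MAX_MSGS]; int n_recvd;
};

/* Consistent iff every message recorded as received is also recorded
   as sent: no effect may be present without its cause. */
static bool is_consistent(const struct local_state *procs, int n) {
    for (int i = 0; i < n; i++)
        for (int r = 0; r < procs[i].n_recvd; r++) {
            bool sent = false;
            for (int j = 0; j < n && !sent; j++)
                for (int s = 0; s < procs[j].n_sent; s++)
                    if (procs[j].sent[s] == procs[i].recvd[r])
                        { sent = true; break; }
            if (!sent) return false;    /* received but never sent */
        }
    return true;
}

int main(void) {
    /* p0 recorded sending message 1; p1 recorded receiving messages 1
       and 2, but no process recorded sending message 2: inconsistent. */
    struct local_state procs[2] = {
        { .sent = {1}, .n_sent = 1, .n_recvd = 0 },
        { .n_sent = 0, .recvd = {1, 2}, .n_recvd = 2 },
    };
    printf("consistent? %s\n", is_consistent(procs, 2) ? "yes" : "no");
    return 0;
}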



An Example
Consider the distributed execution shown in the accompanying figure.
