Module-3
Distributed scheduling - Distributed shared memory - Distributed file system - Multimedia file systems - File
placement - Caching.
3.1 Scheduling in Distributed Systems:
The techniques that are used for scheduling the processes in distributed systems are as follows:
1. Task Assignment Approach: In the Task Assignment Approach, the user-submitted process is
composed of multiple related tasks which are scheduled to appropriate nodes in a system to
improve the performance of a system as a whole.
2. Load Balancing Approach: In the Load Balancing Approach, as the name implies, the
workload is balanced among the nodes of the system.
3. Load Sharing Approach: In the Load Sharing Approach, it is ensured that no node remains
idle while processes are waiting to be processed (see the sketch after this list).
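To make the contrast concrete, here is a minimal Python sketch (the node names and load counts are invented for illustration): load balancing always places new work on the least-loaded node, while load sharing only reacts when some node is idle even though work is waiting elsewhere.

    # Illustrative sketch: load balancing vs. load sharing.
    # Node loads are invented numbers (count of waiting processes).
    loads = {"node-A": 3, "node-B": 0, "node-C": 5}

    def assign_load_balancing(loads):
        """Load balancing: always place new work on the least-loaded node."""
        return min(loads, key=loads.get)

    def needs_load_sharing(loads):
        """Load sharing: act only if a node is idle while work waits elsewhere."""
        return any(l == 0 for l in loads.values()) and any(l > 0 for l in loads.values())

    target = assign_load_balancing(loads)
    loads[target] += 1                       # the new process is queued on node-B
    print(target, needs_load_sharing(loads))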
3.1.1 Characteristics of a Good Scheduling Algorithm:
The following are the required characteristics of a Good Scheduling Algorithm:
The scheduling algorithms that require prior knowledge about the properties and resource
requirements of a process submitted by a user put a burden on the user. Hence, a good
scheduling algorithm does not require prior specification regarding the user-submitted
process.
A good scheduling algorithm must exhibit the dynamic scheduling of processes as the initial
allocation of the process to a system might need to be changed with time to balance the load
of the system.
The algorithm must be flexible enough to make process migration decisions when there is a
change in the system load.
The algorithm must possess stability so that processors are utilized optimally. This is
possible only when thrashing overhead is minimized and no time is wasted in unnecessary
process migration.
An algorithm that makes decisions quickly is preferable. For example, heuristic methods that
take less time due to less computational work give near-optimal results, whereas an
exhaustive search provides an optimal solution but takes far more time.
3.2 What is Distributed Shared Memory and its Advantages?
Distributed shared memory can be achieved via both software and hardware. Hardware examples
include cache coherence circuits and network interface controllers. In contrast, software DSM
systems implemented at the library or language level are not transparent and developers usually
have to program them differently.
What is Distributed Shared Memory?
It is a mechanism that manages memory across multiple nodes and makes inter-process
communication transparent to end-users, so that applications behave as if they were
running on shared memory. DSM allows user processes to access shared data without
explicit inter-process communication. In DSM, every node has its own memory and
provides memory read and write services along with consistency protocols. Distributed
shared memory implements the shared memory model in distributed systems even though
there is no physical shared memory: all the nodes share the virtual address space
provided by the shared memory model, and data moves between the main memories of the
different nodes.
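As a toy illustration of this idea, the following Python sketch (invented page size, no real networking or consistency protocol) models each node holding pages of a shared virtual address space and transparently fetching a missing page from its owner on access:

    # Toy DSM sketch: a shared address space split into pages, each owned
    # by one node; a read of a non-resident page fetches it from the owner.
    PAGE_SIZE = 4  # invented page size, in words

    class Node:
        def __init__(self, name):
            self.name = name
            self.pages = {}          # page number -> list of words (local copy)

    directory = {}                   # page number -> owning Node

    def read(node, address):
        page, offset = divmod(address, PAGE_SIZE)
        if page not in node.pages:   # page fault: fetch a copy from the owner
            owner = directory[page]
            node.pages[page] = list(owner.pages[page])
        return node.pages[page][offset]

    a, b = Node("A"), Node("B")
    a.pages[0] = [10, 11, 12, 13]
    directory[0] = a
    print(read(b, 2))                # B transparently reads A's memory -> 12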
3.2.1 Types of Distributed Shared Memory
1. On-Chip Memory
The data is present in the CPU portion of the chip.
Memory is directly connected to address lines.
On-Chip Memory DSM is expensive and complex.
2. Bus-Based Multiprocessors
A set of parallel wires called a bus acts as a connection between CPU and memory.
Simultaneous access to the same memory by multiple CPUs is prevented using arbitration
algorithms.
Cache memory is used to reduce network traffic.
3. Ring-Based Multiprocessors
There is no global centralized memory present in Ring-based DSM.
All nodes are connected via a token passing ring.
In ring-based DSM, a single address space is divided among the nodes to form the shared area.
Advantages of Distributed Shared Memory
Simpler Abstraction: The programmer need not be concerned with data movement; because the
address space is the same across nodes, DSM is easier to program against than RPC.
Easier Portability: The access protocols used in DSM allow for a natural transition from
sequential to distributed systems. DSM programs are portable as they use a common
programming interface.
Locality of Data: Data is moved in large blocks, i.e., data near the memory location
currently being fetched is likely to be needed soon, so it is fetched as well.
On-Demand Data Movement: The on-demand data movement provided by DSM eliminates a separate
data-exchange phase.
Larger Memory Space: DSM provides a large virtual memory space; the total memory size is the
sum of the memory sizes of all the nodes, so paging activity is reduced.
Better Performance: DSM improves performance and efficiency by speeding up access to
data.
Flexible Communication Environment: Nodes can join and leave the DSM system without
affecting the others, as there is no need for the sender and receiver to exist at the same time.
Process Migration Simplified: Since all nodes share the same address space, a process can
easily be moved to a different machine.
Disadvantages of Distributed Shared Memory
Accessibility: Data access in DSM is slow compared with non-distributed memory.
Consistency: When programming in DSM systems, programmers need to maintain
consistency themselves.
Message Passing: DSM uses asynchronous message passing and is therefore not as efficient as
other message-passing implementations.
Data Redundancy: Because DSM allows simultaneous access to data, consistency problems and
data redundancy are common disadvantages.
Lower Performance: The CPU is slowed down by remote accesses, and even cache memory does
not always remedy the situation.
Conclusion
Distributed shared memory is a concept used in distributed applications to improve
performance and resource sharing. By addressing data movement and consistency issues,
DSM enables users to focus on application logic rather than the management of the
distributed application.
3.3 What is DFS (Distributed File System)?
A Distributed File System (DFS) is a file system that is spread across multiple file servers or
multiple locations. It allows programs to access and store files as if they were local, and
allows users to access files from any computer on the network.
A distributed file system (DFS) is a networked architecture that allows multiple users and applications
to access and manage files across various machines as if they were on a local storage device. Instead
of storing data on a single server, a DFS spreads files across multiple locations, enhancing redundancy
and reliability.
This setup not only improves performance by enabling parallel access but also simplifies data
sharing and collaboration among users.
By abstracting the complexities of the underlying hardware, a distributed file system
provides a seamless experience for file operations, making it easier to manage large volumes
of data in a scalable manner.
Components of DFS
Location Transparency: Location transparency is achieved through the namespace component.
Redundancy: Redundancy is done through a file replication component.
In the case of failure and heavy load, these components together improve data availability by
allowing data in different locations to be logically grouped under one folder, which is
known as the "DFS root". It is not necessary to use the two components of DFS together: it is
possible to use the namespace component without the file replication component, and it is
equally possible to use the file replication component between servers without the namespace
component.
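As a rough sketch of the namespace idea (the root, folder, and server names below are invented; a real DFS namespace is managed by Windows Server, not application code), a logical path under the DFS root simply maps to one or more physical server shares, and a folder with more than one target models replication:

    # Sketch: a DFS-style namespace maps logical folders under one root
    # to physical shares; multiple targets per folder model replication.
    namespace = {                       # invented example paths
        r"\\corp\dfsroot\docs":  [r"\\server1\docs", r"\\server2\docs"],
        r"\\corp\dfsroot\media": [r"\\server3\media"],
    }

    def resolve(logical_path):
        """Return the physical targets behind a logical DFS path."""
        return namespace.get(logical_path, [])

    print(resolve(r"\\corp\dfsroot\docs"))  # either replica can serve the data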
Distributed File System Replication
Early iterations of DFS made use of Microsoft's File Replication Service (FRS), which allowed for
straightforward file replication between servers: FRS recognises new or updated files and
distributes the most recent version of the whole file to all servers. Windows Server 2003 R2
introduced "DFS Replication" (DFSR), which improves on FRS by copying only the portions of
files that have changed and by minimising network traffic with data compression. Additionally,
it provides users with flexible configuration options to manage network traffic on a configurable
schedule.
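The idea of copying only the changed portions of a file can be sketched as follows. This is a simplified illustration, not Microsoft's actual Remote Differential Compression: the block size is invented, and real DFSR additionally compresses the data it sends.

    import hashlib

    BLOCK = 4096  # invented block size; DFSR's real chunking differs

    def block_hashes(data):
        """Hash each fixed-size block of a file's contents."""
        return [hashlib.sha256(data[i:i + BLOCK]).hexdigest()
                for i in range(0, len(data), BLOCK)]

    def changed_blocks(old, new):
        """Indices of blocks that differ and must be sent to the replica."""
        old_h, new_h = block_hashes(old), block_hashes(new)
        return [i for i, h in enumerate(new_h)
                if i >= len(old_h) or h != old_h[i]]

    old = b"A" * 10000
    new = b"A" * 8192 + b"B" * 1808      # only the tail changed
    print(changed_blocks(old, new))      # -> [2]: one block sent, not the file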
Features of DFS
Transparency
o Structure transparency: There is no need for the client to know about the number or
locations of file servers and the storage devices. Multiple file servers should be
provided for performance, adaptability, and dependability.
o Access transparency: Both local and remote files should be accessible in the same manner.
The file system should automatically locate the accessed file and send it to the client's side.
o Naming transparency: There should not be any hint of the file's location in its
name. Once a name is given to a file, it should not change while the file is
transferred from one node to another.
o Replication transparency: If a file is replicated on multiple nodes, the copies of the
file and their locations should be hidden from the clients.
User mobility: It will automatically bring the user's home directory to the node where the
user logs in.
Performance: Performance is based on the average amount of time needed to satisfy
client requests. This time covers CPU time + time taken to access secondary storage +
network access time. It is advisable that the performance of the Distributed File System be
similar to that of a centralized file system.
Simplicity and ease of use: The user interface of the file system should be simple and the
number of commands should be small.
High availability: A Distributed File System should be able to continue functioning in the
event of partial failures such as a link failure, a node failure, or a storage drive crash.
A highly reliable and adaptable distributed file system should have multiple
independent file servers controlling multiple independent storage devices.
Scalability: Since growing the network by adding new machines or joining two networks
together is routine, the distributed system will inevitably grow over time. As a result, a good
distributed file system should be built to scale quickly as the number of nodes and users in
the system grows. Service should not be substantially disrupted as the number of nodes and
users grows.
Data integrity: Multiple users frequently share a file system. The integrity of data saved in a
shared file must be guaranteed by the file system. That is, concurrent access requests from
many users who are competing for access to the same file must be correctly synchronized
using a concurrency control method. Atomic transactions are a high-level concurrency
management mechanism for data integrity that is frequently offered to users by a file
system.
Security: A distributed file system should be secure so that its users can trust that their data
will be kept private. To safeguard the information contained in the file system from
unwanted and unauthorized access, security mechanisms must be implemented.
History of DFS
The server component of the Distributed File System was initially introduced as an add-on feature. It
was added to Windows NT 4.0 Server and was known as "DFS 4.1". It was later included as a
standard component in all editions of Windows 2000 Server. Client-side support was included
in Windows NT 4.0 and in later versions of Windows. Linux kernels 2.6.14 and later
come with an SMB client VFS known as "cifs" which supports DFS. Mac OS X 10.7 (Lion) and
onwards also support DFS.
Applications of DFS
NFS: NFS stands for Network File System. It is a client-server architecture that allows a
computer user to view, store, and update files remotely. The protocol of NFS is one of the
several distributed file system standards for Network-Attached Storage (NAS).
CIFS: CIFS stands for Common Internet File System. CIFS is a dialect of SMB; that is, CIFS is
an implementation of the SMB protocol, designed by Microsoft.
SMB: SMB stands for Server Message Block. It is a file-sharing protocol that was
invented by IBM. The SMB protocol was created to allow computers to perform read and
write operations on files on a remote host over a Local Area Network (LAN). The directories
on the remote host that can be accessed via SMB are called "shares".
Hadoop: Hadoop is a group of open-source software services. It gives a software framework
for distributed storage and processing of big data using the MapReduce programming model.
The core of Hadoop contains a storage part, known as the Hadoop Distributed File System
(HDFS), and a processing part, which is the MapReduce programming model.
NetWare: NetWare is a discontinued computer network operating system developed by Novell,
Inc. It primarily used cooperative multitasking to run different services on a personal computer,
using the IPX network protocol.
Working of DFS
There are two ways in which DFS can be implemented:
Standalone DFS namespace: It allows only for DFS roots that exist on the local
computer and do not use Active Directory. A standalone DFS can only be accessed on
the computer on which it is created. It does not provide any fault tolerance and cannot be
linked to any other DFS. Standalone DFS roots are rarely encountered because of their limited
utility.
Domain-based DFS namespace: It stores the configuration of DFS in Active Directory,
creating the DFS namespace root accessible
at \\<domainname>\<dfsroot> or \\<FQDN>\<dfsroot>
Advantages of Distributed File System(DFS)
DFS allows multiple users to access or store data.
It allows data to be shared remotely.
It improves file availability, access time, and network efficiency.
It improves the capacity to change the size of the data and also improves the ability to
exchange data.
A Distributed File System provides transparency of data even if a server or disk fails.
Disadvantages of Distributed File System(DFS)
In a Distributed File System, nodes and connections need to be secured, so we can say
that security is at stake.
There is a possibility of losing messages and data in the network while they move from one
node to another.
Database connections in a Distributed File System are complicated.
Handling the database is also harder in a Distributed File System than in a single-user
system.
There is a chance of overloading if all nodes try to send data at once.
Conclusion
The Distributed File System (DFS) allows users to easily access and share files across multiple
servers, increasing availability, performance, and scalability. DFS improves data access and
dependability by offering characteristics such as location transparency, redundancy, and
replication. However, it also presents challenges such as security threats, data loss, and complex
database management. Despite these drawbacks, DFS remains an important technology for
efficient and robust data management in distributed environments.
3.4 What are Multimedia Systems:
A multimedia system is responsible for developing a multimedia application. A multimedia
application is a bundle of different kinds of data. A multimedia computer system is one that
can create, integrate, store, retrieve, and delete two or more types of media material in digital
form, such as audio, image, video, and text information.
Following are some major characteristics or features of a Multimedia System:
Very High Processing Power:
To deal with large amounts of data, very high processing power is needed.
File System:
The file system must be efficient enough to meet the requirements of continuous media. These
media files require very high disk-bandwidth rates, while disks usually have low transfer rates
and high latency. To satisfy the requirements of multimedia data, disk schedulers must reduce
latency to ensure high bandwidth (see the worked example below).
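As a rough worked example of these bandwidth requirements (all figures are invented for illustration), the number of continuous-media streams a disk can sustain is bounded by its effective transfer rate, after seek overhead, divided by the per-stream rate:

    # Back-of-the-envelope admission control for continuous media.
    # All figures are invented for illustration.
    disk_rate_mbps   = 200.0   # effective disk transfer rate, MB/s
    seek_overhead    = 0.3     # fraction of time lost to seeks/latency
    stream_rate_mbps = 0.625   # one 5 Mbit/s video stream, in MB/s

    usable = disk_rate_mbps * (1 - seek_overhead)
    print(int(usable // stream_rate_mbps))   # max concurrent streams -> 224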
File formats that support multimedia:
Multimedia data comes in a variety of media formats and file representations, including JPEG,
MPEG, AVI, MID, WAV, DOC, GIF, PNG, etc. AVI files can contain both audio and video data in a
file container that allows synchronous audio-with-video playback. Like the DVD video format,
AVI files support multiple streams of audio and video. Because of restrictions on converting
from one format to another, the use of data in a specific format is limited as well.
Input/Output:
In multimedia applications, the input and output should be continuous and fast. Real-time
recording as well as playback of data are common in most of the multimedia applications
which need efficient I/O.
Operating System:
The operating system must provide fast response times for interactive applications, high
throughput for batch applications, and real-time scheduling.
Storage and Memory:
Multimedia systems require storage for large-capacity objects such as video, audio,
animation, and images. Depending on the compression scheme and reliability requirements,
video and audio require large amounts of memory.
Network Support:
It includes the internet, intranets, LAN, WAN, ATM, mobile telephony, and others. In recent years,
there has been tremendous growth of multimedia applications on the internet, such as
streaming video, IP telephony, interactive games, teleconferencing, virtual worlds, distance
learning, and so on. These multimedia networking applications are referred to as continuous-
media applications and require low communication latency. Communication latency is the
time it takes for a data packet to be received by the remote computer.
Software Tools:
For the development of multimedia applications, various software tools are required, such as
programming languages, graphics software, multimedia editing software, scripting
languages, authoring tools, and design software. In addition, device
drivers are required for interfacing with the multimedia peripherals.
3.5 File Caching in Distributed File Systems
File caching enhances I/O performance because previously read files are kept in main
memory. Because the files are available locally, no network transfer is needed when requests
for these files are repeated. The performance improvement of the file system depends on the
locality of the file access pattern. Caching also helps with reliability and scalability.
File caching is an important feature of distributed file systems that helps to improve
performance by reducing network traffic and minimizing disk access. In a distributed file
system, files are stored across multiple servers or nodes, and file caching involves
temporarily storing frequently accessed files in memory or on local disks to reduce the need
for network access or disk access.
Here are some ways file caching is implemented in distributed file systems:
Client-side caching: In this approach, the client machine stores a local copy of frequently
accessed files. When the file is requested, the client checks if the local copy is up-to-date
and, if so, uses it instead of requesting the file from the server. This reduces network traffic
and improves performance by reducing the need for network access.
Server-side caching: In this approach, the server stores frequently accessed files in memory
or on local disks to reduce the need for disk access. When a file is requested, the server
checks if it is in the cache and, if so, returns it without accessing the disk. This approach can
also reduce network traffic by reducing the need to transfer files over the network.
Distributed caching: In this approach, the file cache is distributed across multiple servers or
nodes. When a file is requested, the system checks if it is in the cache and, if so, returns it
from the nearest server. This approach reduces network traffic by minimizing the need for
data to be transferred across the network.
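A minimal sketch of the client-side approach in Python (the server table, version numbers, and file name are invented; real systems validate cached copies with timestamps, leases, or server callbacks):

    # Sketch of client-side caching with version-based validation.
    server_files = {"report.txt": (3, b"v3 contents")}   # name -> (version, data)

    cache = {}                                           # name -> (version, data)

    def server_version(name):
        """A cheap validation call: ask the server for the current version."""
        return server_files[name][0]

    def read_file(name):
        """Serve from the local cache when it is still up to date."""
        if name in cache and cache[name][0] == server_version(name):
            return cache[name][1]            # cache hit: no file transfer
        version, data = server_files[name]   # miss or stale: fetch from server
        cache[name] = (version, data)
        return data

    print(read_file("report.txt"))           # first read: fetched from server
    print(read_file("report.txt"))           # second read: served from cache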
Advantages of file caching in distributed file systems include:
1. Improved performance: By reducing network traffic and minimizing disk access, file caching
can significantly improve the performance of distributed file systems.
2. Reduced latency: File caching can reduce latency by allowing files to be accessed more
quickly without the need for network access or disk access.
3. Better resource utilization: File caching allows frequently accessed files to be stored in
memory or on local disks, reducing the need for network or disk access and improving
resource utilization.
However, there are also some disadvantages to file caching in distributed file systems,
including:
1. Increased complexity: File caching can add complexity to distributed file systems, requiring
additional software and hardware to manage and maintain the cache.
2. Cache consistency issues: Keeping the cache up-to-date can be a challenge, and
inconsistencies between the cache and the actual file system can occur.
3. Increased memory usage: File caching requires additional memory resources to store
frequently accessed files, which can lead to increased memory usage on client machines
and servers.
Overall, file caching is an important feature of distributed file systems that can improve
performance and reduce latency. However, it also introduces some complexity and requires
careful management to ensure cache consistency and efficient resource utilization.
The majority of today's distributed file systems employ some form of caching. File caching
schemes are determined by a number of criteria, including cached-data granularity, cache
size (large/small, fixed/dynamic), replacement policy, cache location, modification
propagation mechanisms, and cache validation.
Cache Location: In a client-server system where both the client and the server have main
memory and a disk, the cached file might be kept in the main memory or on the disk of either
the client or the server.