
CS 441/541: Operating System Concepts

I/O and Mass Storage

Prof. Foley
Spring 2019 1
Announcements

• Sign up for a demo slot for the Synchronization Project


• See Calendly link on Canvas
• See project description for the structure of the demo

• File systems assignment will be released next week


• Don’t forget to work on your research projects!

Spring 2019 2
The I/O Problem

• There are many I/O devices that the system needs in


order to do interesting work
• The access patterns and speeds of these devices vary
• Solution:
• Hierarchical Approach:
• Fast things are connected via faster buses, closer to the CPU
• Less fast things are on less fast, shared buses

Spring 2019 3
General System Architecture

Memory bus – used very


often, speed of this
connection has a HUGE
impact on the overall
performance of the machine

PCI – fast I/O devices that


require low latency and high
bandwidth. Fairly frequently
used. (Network cards, GPUs)

Peripherals – everything
else! Infrequent access,
devices are much
slower. (Keyboard,
mouse, external drives
and devices, storage)
Spring 2019 4
Modern System Architecture

Trends:
• Faster, fatter buses to
fastest devices
(memory, GPU)
• I/O chip dedicated to
managing I/O from
other devices
(network, storage,
slow connections)
• New faster versions
of buses to work with
the new architecture
(PCIe, eSATA)

Spring 2019 5
I/O Device

• Every I/O device must expose an interface so it can interact with


the host system, and has an internal structure to store data and
perform actions pertinent to its function.
• Interface
• Mechanism to communicate with the device. Can be expressed as registers.
• Internal Structure
• Microcontroller, memory, other chips, and code that communicates with the
host system through the interface.

Spring 2019 6
Canonical Protocol

• An OS can tell the device what to do by issuing commands to the device. However,
commands can only be issued when the device is not busy with something else.

While (STATUS == BUSY)


; // wait until device is not busy
Write data to DATA register
Write command to COMMAND register
(Doing so starts the device and executes the command)
While (STATUS == BUSY)
; // wait until device is done with your request

Spring 2019 7
Two ways to Check on a Device

• Polling
• Periodically check on the device to see if it is ready
• Pros:
• Simple
• Good for something that will happen fast
• Cons:
• Setting an appropriate time interval
• No overlap of I/O and computation
• Interrupts
• Main goal is to be able to do something productive while waiting
on I/O
• OS issues the request, calling process is put to sleep, context
switch to another task.
• When the device is ready, it issues a hardware interrupt.
Spring 2019 8
Interrupts

• Allows for other things to happen while we wait on I/O


• Pros:
• Good resource utilization
• Good for unpredictable I/O
• Good for slow I/O
• Cons:
• More complex hardware and software
• Can be slower than polling

Spring 2019 9
More I/O Optimizations: DMA

• Without DMA, when data is ready at a device, the CPU
needs to copy it into kernel memory, and then copy it
into user memory.
• DMA – Direct Memory Access
• OS sets up a region of memory for the device to write to when the
data is ready, OS issues the request to the device, the device
directly writes the data to memory and notifies the OS when the
data is ready.
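A hedged sketch of what the DMA variant might look like in C. The register names, the virt_to_phys() helper, and sleep_until_interrupt() are illustrative placeholders (assumed to be provided by the OS), not a real device or kernel API.

#include <stdint.h>

/* Hypothetical DMA-capable device registers (placeholders). */
#define DMA_ADDR_REG   (*(volatile uint64_t *)0xFEB10000u)
#define DMA_LEN_REG    (*(volatile uint32_t *)0xFEB10008u)
#define DMA_CMD_REG    (*(volatile uint32_t *)0xFEB1000Cu)
#define CMD_READ_DMA   0x2u

extern uint64_t virt_to_phys(void *vaddr);     /* assumed OS helper */
extern void     sleep_until_interrupt(void);   /* assumed OS helper */

/* Start a DMA read: the device copies the data into 'buf' on its own
 * and raises an interrupt when done, so the CPU never touches the bytes. */
static void dma_read(void *buf, uint32_t len)
{
    DMA_ADDR_REG = virt_to_phys(buf); /* tell device where to write     */
    DMA_LEN_REG  = len;               /* and how many bytes             */
    DMA_CMD_REG  = CMD_READ_DMA;      /* kick off the transfer          */
    sleep_until_interrupt();          /* run something else meanwhile;  */
                                      /* the interrupt handler wakes us */
}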

Spring 2019 10
What is a driver?

• For the canonical device, we said the OS writes to the registers, but how does the
OS know how to write to those registers and what the commands are?
• OS has a basic API for dealing with devices that is exposed to the users (POSIX is
one such API)
• OS provides a generic device interface for reading and writing blocks of data to
devices
• Device drivers implement the specific interface to the devices based on the
generic API requests.
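A simplified sketch of this layering, assuming a made-up block_ops interface: the OS defines a generic read/write-block API, and each driver supplies device-specific implementations behind it. Real kernels define much richer versions of this idea.

#include <stdint.h>

/* Generic block-device interface the OS could expose to the file system
 * layer (illustrative only; not any particular kernel's actual API). */
struct block_ops {
    int (*read_block)(void *dev, uint64_t block_num, void *buf);
    int (*write_block)(void *dev, uint64_t block_num, const void *buf);
};

/* A specific driver knows this device's registers and command set and
 * fills in the function pointers accordingly. */
static int mydisk_read(void *dev, uint64_t block_num, void *buf)
{
    (void)dev; (void)block_num; (void)buf;
    /* ... program the device's registers, wait, copy data ... */
    return 0;
}

static int mydisk_write(void *dev, uint64_t block_num, const void *buf)
{
    (void)dev; (void)block_num; (void)buf;
    /* ... program the device's registers, wait ... */
    return 0;
}

static const struct block_ops mydisk_ops = {
    .read_block  = mydisk_read,
    .write_block = mydisk_write,
};

/* Upper layers only ever call through the generic interface. */
static int fs_read_block(const struct block_ops *ops, void *dev,
                         uint64_t blk, void *buf)
{
    return ops->read_block(dev, blk, buf);
}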

Spring 2019 11
Mass-Storage

Spring 2019 12
Mass Storage

• File systems are implemented on top of non-volatile data


storage devices.
• Hard disk drives are currently the predominant persistent
storage device. This has been the case for decades and
has led to optimizations in the operating system for
dealing with these devices.
• Tape storage was a dominant secondary storage device,
and remains a popular tertiary storage device for long
term backups and very large datasets.
• Optimizations for tapes also exist. tar, a common UNIX utility,
stands for tape archiver and lays out a set of files in a way
that is efficient for tape storage.

Spring 2019 13
Mass-Storage

• Transfer Rate
• Rate of data flow between drive and
computer
• Positioning Time
• Seek time + rotational latency
• Seek Time:
• Time to move disk arm to desired cylinder
• Rotational Latency:
• Time for desired sector to rotate under the
disk head
• Bandwidth
• Rate of I/O transfer of sectors
• Head Crash
• Results from head making contact with
the disk surface

Spring 2019 14
Mass-Storage

• I/O Bus:
  • Wires connecting drive to computer
  • EIDE, ATA, SATA, USB, Fibre Channel, SCSI
• Controllers: Control data transfers on an I/O bus
  • Host Controller: Computer side
  • Disk Controller: Built into the disk

• Performing a disk I/O operation:
  • Place command into the host controller
  • Host controller sends the command via messages to the disk controller
  • Disk controller operates the disk-drive hardware
    • Disk cache <-> surface
  • Transfer back to host
    • Disk cache <-> host controller
Spring 2019 15
Operating System Support

• 2 major jobs of an OS
• Manage physical devices
• Present a virtual machine abstraction to applications
• For hard disks, OS provides 2 abstractions:
• Raw device: Array of data blocks
• File system: Queuing and scheduling of interleaved requests
from several applications

Spring 2019 22
Disk Structure

• Disks are addressed as large 1-Dimensional arrays of


logical blocks
• Logical Block: Smallest unit of transfer (512B or 4KB)
• Sequential mapping of logical blocks to sectors
• Sector 0:
• First sector of the first track on the outermost cylinder
• Proceed:
• Through first track, then
• Remainder of tracks on that cylinder, then
• Through the rest of the cylinders from outer
to inner

Spring 2019 23
Disk Structure

Why might we want the mapping of logical
blocks to sectors to start with the first sector of the
first track on the outermost cylinder?

Spring 2019 24
Disk Scheduling

• OS is responsible for using hardware efficiently


• For disks this means having fast access and large bandwidth
• Access Time:
• Seek Time: Time for the disk arm to move the head to the
cylinder containing the desired sector
• Rotational Latency: Additional time for the disk to rotate the
desired sector to the disk head
• Bandwidth: Total number of bytes transferred, divided
by the total time between the first request for service
and the completion of the last transfer

Improve both access time and bandwidth by managing the


order in which disk I/O requests are serviced
Spring 2019 25
Disk Scheduling

• I/O Request System Call:


• Input/Output
• Disk Address
• Memory Address
• Number of sectors

• If disk is available,
• Process request immediately
• Otherwise,
• Place request in queue
• OS chooses a pending request when disk becomes available
Improve both access time and bandwidth by managing the
order in which disk I/O requests are serviced
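The request parameters listed above could be bundled into a per-request structure that the OS keeps on the pending queue; this is an illustrative sketch, not a specific kernel's definition.

#include <stdbool.h>
#include <stdint.h>

/* One pending disk I/O request (illustrative layout). */
struct disk_request {
    bool      is_write;          /* input (read) or output (write)     */
    uint64_t  disk_addr;         /* disk address, e.g. starting sector */
    void     *mem_addr;          /* memory buffer to transfer to/from  */
    uint32_t  nsectors;          /* number of sectors to transfer      */
    struct disk_request *next;   /* link in the per-disk pending queue */
};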
Spring 2019 26
Disk Scheduling

• Request Queue:
• Head: 53
• Queue: 98, 183, 37, 122, 14, 124, 65, 67 (Cylinders)
• Algorithms
• FCFS: First-Come, First-Served
• SSTF: Shortest-Seek-Time-First
• SCAN: a.k.a., Elevator Algorithm
• C-SCAN: Circular SCAN
• LOOK
• C-LOOK

Primary Goal: Minimize seek time
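A small C sketch that computes total head movement for this queue under FCFS and SSTF; the totals it prints (640 and 236 cylinders) match the tallies on the following slides.

#include <stdio.h>
#include <stdlib.h>
#include <stdbool.h>

#define NREQ 8

/* Total head movement when requests are served in arrival order. */
static int fcfs(int head, const int req[NREQ])
{
    int total = 0;
    for (int i = 0; i < NREQ; i++) {
        total += abs(req[i] - head);
        head = req[i];
    }
    return total;
}

/* Total head movement when the closest pending request is served next. */
static int sstf(int head, const int req[NREQ])
{
    bool done[NREQ] = { false };
    int total = 0;
    for (int served = 0; served < NREQ; served++) {
        int best = -1;
        for (int i = 0; i < NREQ; i++)
            if (!done[i] && (best < 0 ||
                             abs(req[i] - head) < abs(req[best] - head)))
                best = i;
        total += abs(req[best] - head);
        head = req[best];
        done[best] = true;
    }
    return total;
}

int main(void)
{
    const int req[NREQ] = { 98, 183, 37, 122, 14, 124, 65, 67 };
    printf("FCFS: %d cylinders\n", fcfs(53, req));  /* 640 */
    printf("SSTF: %d cylinders\n", sstf(53, req));  /* 236 */
    return 0;
}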


Spring 2019 27
Mass-Storage

Spring 2019 28
First-Come, First-Served (FCFS)

Spring 2019 29
First-Come, First-Served (FCFS)

• 53 -> 98: 45
• 98 -> 183: 85
• 183 -> 37: 146
• 37 -> 122: 85
• 122 -> 14: 108
• 14 -> 124: 110
• 124 -> 65: 59
• 65 -> 67: 2

• Fair, but does not perform well due to wild swings between
cylinders (e.g., 122 to 14)
• Total Head Movement: 640 Cylinders (640/8 = 80)
Spring 2019 30
Shortest-Seek-Time-First (SSTF)

• Select request with the minimum seek time from the current
position

Spring 2019 31
Shortest-Seek-Time-First (SSTF)

• 53 -> 65: 12
• 65 -> 67: 2
• 67 -> 37: 30
• 37 -> 14: 23
• 14 -> 98: 84
• 98 -> 122: 24
• 122 -> 124: 2
• 124 -> 183: 59

• Select request with the minimum seek time from the current
position – May cause starvation of some requests
• Total Head Movement: 236 Cylinders (236/8 = 29.5)
Spring 2019 32
Elevator (SCAN)

• Move disk arm from one end to the other, servicing requests on the
way. Reverse direction when you hit the other end

Spring 2019 33
Elevator (SCAN)

• 53 -> 37: 16
• 37 -> 14: 23
• 14 -> 0: 14
• 0 -> 65: 65
• 65 -> 67: 2
• 67 -> 98: 31
• 98 -> 122: 24
• 122 -> 124: 2
• 124 -> 183: 59
• Move disk arm from one end to the other, servicing requests on the
way. Reverse direction when you hit the other end
• Total Head Movement: 236 Cylinders (236/8 = 29.5)
Spring 2019 34
Circular Elevator (C-SCAN)

• Move disk arm from one end to the other servicing requests on
the way. Return to the beginning of the disk when you hit the
other end.

Spring 2019 35
Circular Elevator (C-SCAN)
• 53 -> 65: 12
• 65 -> 67: 2
• 67 -> 98: 31
• 98 -> 122: 24
• 122 -> 124: 2
• 124 -> 183: 59
• 183 -> 199: 16
• 199 -> 0: 199
• 0 -> 14: 14
• 14 -> 37: 23
• Move disk arm from one end to the other servicing requests on
the way. Return to the beginning of the disk when you hit the
other end. Results in a more uniform wait time
• Total Head Movement: 382 Cylinders (382/8 = 47.8)
Spring 2019 36
Circular Look (C-LOOK)

• Move disk arm from one end to the other servicing requests on the
way. Return to the first req. near the beginning of the disk when
you service the last request in that direction

Spring 2019 37
Circular Look (C-LOOK)
• 53 -> 65: 12
• 65 -> 67: 2
• 67 -> 98: 31
• 98 -> 122: 24
• 122 -> 124: 2
• 124 -> 183: 59
• 183 -> 14: 169
• 14 -> 37: 23

• Move disk arm from one end to the other servicing requests on the
way. Return to the first req. near the beginning of the disk when
you service the last request in that direction
• Total Head Movement: 322 Cylinders (322/8 = 40.25)
Spring 2019 38
Selecting a Scheduling Algorithm

• Performance depends on the number and types of requests
• SSTF is common because of improvement over FCFS
• SCAN and C-SCAN perform better for systems that place a heavy load on the disk
  • Less likely to cause starvation
• File-allocation method influences disk requests
  • Sequential reading of a contiguously allocated file will generate requests that
    are close together on disk
• Default choice: SSTF or LOOK

Algorithm   Total Movement   Avg. Movement
FCFS        640              80.0
SSTF        236              29.5
SCAN        236              29.5
C-SCAN      382              47.8
C-LOOK      322              40.3

Spring 2019 39
What Does Linux Do?

• Anticipatory Scheduling:
• Assume: An I/O request will be closely followed by another
nearby request
• After servicing a request – Wait!
• If a nearby request occurs soon, service it.
• If not, C-LOOK.

Spring 2019 40
Disk Management

• Low-Level Formatting (Physical Formatting)


Dividing a disk into sectors that the disk controller can
read/write
• Special data structure per sector
<header, data, trailer>
• Sector size = 512 bytes
• Error-Correcting Code (ECC):
Used when reading a sector to determine if it has been damaged.
Updated when writing to a sector.

Spring 2019 41
Disk Management

• Low-Level Formatting (Physical Formatting)


Dividing a disk into sectors that the disk controller can
read/write
• Special data structure per sector
<header, data, trailer>
• Sector size = 512 bytes
• 4K sector sizes becoming more popular

• Error-Correcting Code (ECC): (stored in header)


During reading, used to determine if sector is damaged.
During writing, updated to reflect state of current data.
• Hamming coding or Reed-Solomon coding are popular

Spring 2019 42
Disk Management

• Logical Formatting: File System


• OS puts its own data structures on disk
• Partition: Group of one or more cylinders
• To increase efficiency the file system groups blocks into clusters
• Disk I/O occurs in blocks
• File I/O occurs in clusters
• Helps ensure that I/O has more sequential-access and fewer random-access
characteristics

Spring 2019 43
Swap-Space Management

• Swap-space:
Virtual memory using disk as an extension of main
memory
• Typically a separate disk partition

Spring 2019 44
Performance

• 2 aspects of disk storage speed:


• Latency
• Bandwidth (Bytes per second)

• Sustained Bandwidth:
Avg. data rate during a large transfer
• Data rate when the data stream is actually flowing

• Effective Bandwidth:
Avg. over the entire I/O time
• Includes seek()/locate(), and cartridge switching
• Data rate provided by the drive
Spring 2019 45
Performance

• Access Latency:
Time needed to locate data
• Disk: (~5 milliseconds)
Move arm to the cylinder and wait for rotational latency
• Tape: (tens of seconds)
Winding tape reels until the selected block reaches the tape head
• If in a silo (jukebox) then we must add cartridge swapping time
[Could mean minutes]

Spring 2019 46
Disk Attachment

• Host-Attached Storage
Storage accessed through local I/O ports that talk to I/O
buses
• SCSI, Fibre Channel, SATA
• Network-Attached Storage (NAS)
Storage made available over a network rather than a
local connection
• NFS, CIFS
• Storage-Area Network (SAN)
Private network connecting
servers and storage units

Spring 2019 47
Hardware Failures

Spring 2019 48
Hardware Failures

Spring 2019 49
Software Failures More on failure

• Software failures can be characterized by tracking the software
defect density in the system. This number can be estimated from
historical defect data. Defect density will
depend on the following factors:
• Software process used to develop the design and code (use of peer level
design/code reviews, unit testing)
• Complexity of the software
• Size of the software
• Experience of the team developing the software
• Percentage of code reused from a previous stable project
• Rigor and depth of testing before product is shipped.
• Defect density is typically measured in number of defects per
thousand lines of code (defects/KLOC).

Spring 2019 50
RAID
http://www.acnc.com/raidedu/0
• RAID:
Redundant Arrays of Independent Disks
• Improve reliability and, possibly, performance
• “A case for redundant arrays of inexpensive disks”,
Patterson, Gibson, Katz in 1988.
• Reliability through redundancy
• MTBF: Mean Time Between Failure
• 10x250 GB disks with MTBF of 100 years each would result in a
2.5 TB System with a MTBF of 100/10 = 10 years

• 3 Core concepts: Mirroring, Striping, Parity


• 6 levels, but most popular levels are: 0, 1, 5, 1+0
Spring 2019 51
RAID Video: Boring, but nice visuals

• Mirroring:
Duplicate every disk.
• 1 logical disk = 2 physical disks, size of 1 physical disk
• Writes carried out twice
• Is there a performance benefit?
• Striping:
Use a group of physical disks as one larger, logical disk
• 1 logical disk = 2 physical disks, size of 2 physical disks
• Bit-level Striping:
Splitting the bits of each byte across multiple disks
• Block-level Striping: (Most common)
Splitting the blocks of a file across multiple disks
• Parity: (ECC)
An XOR calculation used to detect and recover from failures

Spring 2019 52
Error Correcting Codes (ECC)

Spring 2019 53
Parity Bits

• Even/Odd Parity
• Given a set of bits add one more bit as a checksum
• XOR each of the values in order, to determine even parity
• 100101 -> Parity bit 1
• 110011 -> Parity bit 0

Data Parity Bit Final Data


11001000
11010111
10110011
10011001
!

Spring 2019 54
Parity Bits

• Even/Odd Parity
• Given a set of bits add one more bit as a checksum
• XOR each of the values in order, to determine even parity
• 100101 -> Parity bit 1
• 110011 -> Parity bit 0

Data Parity Bit Final Data


11001000 1 110010001
11010111
10110011
10011001
!

Spring 2019 55
Parity Bits

• Even/Odd Parity
• Given a set of bits add one more bit as a checksum
• XOR each of the values in order, to determine even parity
• 100101 -> Parity bit 1
• 110011 -> Parity bit 0

Data Parity Bit Final Data


11001000 1 110010001
11010111 0 110101110
10110011 1 101100111
10011001 0 100110010
!
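A short C sketch of the same calculation: the even-parity bit of a byte is the XOR of its eight bits, which reproduces the parity column in the completed table above.

#include <stdio.h>
#include <stdint.h>

/* Even-parity bit of a byte: XOR of all eight bits. */
static unsigned parity_bit(uint8_t b)
{
    unsigned p = 0;
    for (int i = 0; i < 8; i++)
        p ^= (b >> i) & 1u;
    return p;
}

int main(void)
{
    /* The data rows from the table above:
     * 11001000, 11010111, 10110011, 10011001 */
    uint8_t rows[4] = { 0xC8, 0xD7, 0xB3, 0x99 };
    for (int i = 0; i < 4; i++)
        printf("parity of 0x%02X = %u\n", rows[i], parity_bit(rows[i]));
    return 0;   /* prints 1, 0, 1, 0 as in the table */
}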

Spring 2019 56
Parity Bits

• Even Parity with RAID 4


• RAID 4 keeps a separate disk for the parity bits
• XOR each of the values in order, to determine even parity
• Disk0(101) XOR Disk1(011) XOR Disk2(100) = P(010)

Disk0 Disk1 Disk2 Parity Disk


10101010 01010101 11110000
00111100 11000011 10011001
11001000 11010111 10110011
01110011 00011110 00111101
!

Spring 2019 57
Parity Bits

• Even Parity with RAID 4


• RAID 4 keeps a separate disk for the parity bits
• XOR each of the values in order, to determine even parity
• Disk0(101) XOR Disk1(011) XOR Disk2(100) = P(010)

Disk0 Disk1 Disk2 Parity Disk


10101010 01010101 11110000 00001111
00111100 11000011 10011001
11001000 11010111 10110011
01110011 00011110 00111101
!

Spring 2019 58
Parity Bits

• Even Parity with RAID 4


• RAID 4 keeps a separate disk for the parity bits
• XOR each of the values in order, to determine even parity
• Disk0(101) XOR Disk1(011) XOR Disk2(100) = P(010)

Disk0 Disk1 Disk2 Parity Disk


10101010 01010101 11110000 00001111
00111100 11000011 10011001 01100110
11001000 11010111 10110011 10101100
01110011 00011110 00111101 01010000
!
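A C sketch of the RAID 4 idea using the first row of the table above: the parity disk stores the XOR of the data disks, and a failed disk's block can be rebuilt by XOR-ing the surviving disks with the parity.

#include <stdio.h>
#include <stdint.h>

#define NDATA 3   /* data disks, as in the example above */

/* Parity block = XOR of the corresponding blocks on every data disk. */
static uint8_t compute_parity(const uint8_t disk[NDATA])
{
    uint8_t p = 0;
    for (int i = 0; i < NDATA; i++)
        p ^= disk[i];
    return p;
}

/* Rebuild a failed disk's block by XOR-ing the surviving disks and parity. */
static uint8_t rebuild(const uint8_t disk[NDATA], uint8_t parity, int failed)
{
    uint8_t r = parity;
    for (int i = 0; i < NDATA; i++)
        if (i != failed)
            r ^= disk[i];
    return r;
}

int main(void)
{
    /* First row of the table: 10101010, 01010101, 11110000. */
    uint8_t row[NDATA] = { 0xAA, 0x55, 0xF0 };
    uint8_t parity = compute_parity(row);                      /* 00001111 */
    printf("parity     = 0x%02X\n", parity);
    printf("rebuilt d1 = 0x%02X\n", rebuild(row, parity, 1));  /* 01010101 */
    return 0;
}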

Spring 2019 59
Parity Bits

• Even Parity with RAID 4


• RAID 4 keeps a separate disk for the parity bits
• XOR each of the values in order, to determine even parity
• Disk0(101) XOR Disk1(011) XOR Disk2(100) = P(010)

Disk0 Disk1 Disk2 Parity Disk


10101010 01010101 11110000 00001111
00111100 11000011 10011001 01100110
11001000 11010111 10110011 10101100
01110011 00011110 00111101 01010000
!

11111011

Spring 2019 60
Parity Bits

• Even Parity with RAID 4


• RAID 4 keeps a separate disk for the parity bits
• XOR each of the values in order, to determine even parity
• Disk0(101) XOR Disk1(011) XOR Disk2(100) = P(010)

Disk0 Disk1 Disk2 Parity Disk


10101010 01010101 11110000 00001111
00111100 11000011 10011001 01100110
11001000 11010111 10110011 10101100
01110011 00011110 00111101 01010000
!

11111011
10111111
10010001
01000000

Spring 2019 61
Parity Bits
video

• Hamming Code (7,4)


• 4 Bits of data + 3 parity bits = 7 bits total
• +1 parity bit for SECDED
• Single Error Correction, Double Error Detection
• Data encoded in example below: 0101

Data : 0101 Data : 0101


Encoded: 0100101 Encoded: 01001011
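A C sketch of a Hamming(7,4) encoder, assuming the common positional layout p1 p2 d1 p3 d2 d3 d4; for data 0101 it prints 0100101 as shown above, and XOR-ing all seven bits yields the extra SECDED parity bit (giving 01001011).

#include <stdio.h>

/* Encode 4 data bits d[0..3] = d1..d4 into 7 bits laid out as
 * p1 p2 d1 p3 d2 d3 d4 (a common Hamming(7,4) convention). */
static void hamming74_encode(const int d[4], int out[7])
{
    int p1 = d[0] ^ d[1] ^ d[3];   /* covers positions 1,3,5,7 */
    int p2 = d[0] ^ d[2] ^ d[3];   /* covers positions 2,3,6,7 */
    int p3 = d[1] ^ d[2] ^ d[3];   /* covers positions 4,5,6,7 */
    int enc[7] = { p1, p2, d[0], p3, d[1], d[2], d[3] };
    for (int i = 0; i < 7; i++)
        out[i] = enc[i];
}

int main(void)
{
    int d[4] = { 0, 1, 0, 1 };     /* data 0101, as in the slide */
    int enc[7], overall = 0;
    hamming74_encode(d, enc);
    for (int i = 0; i < 7; i++) {
        printf("%d", enc[i]);      /* prints 0100101 */
        overall ^= enc[i];
    }
    printf(" + SECDED bit %d\n", overall); /* eighth bit for SECDED */
    return 0;
}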
Spring 2019 62
RAID Levels: RAID 0

RAID 0: Striping
• Break data into blocks, each block
written to a different disk
• Advantages:
• I/O performance improvement by
distributing I/O load
• Simple, easy
• Disadvantages:
• Not fault-tolerant!

Spring 2019 63
RAID Levels: RAID 1

RAID 1: Mirroring
• Duplicate data between disks
• Advantages:
• Fault-tolerant
• 2x Read transaction rate,
1x Write transaction
• Disadvantages:
• High disk overhead
(2 physical disks to 1 logical)

Spring 2019 64
RAID Levels: RAID 2

RAID 2: Hamming (Bit-level) ECC


• Hamming ECC bits computed for each byte (8 bits) of data.
Bits of the data striped across disks.
• Parity bits recorded on separate disks.
• On Read, verify correct data and correct if
needed
• Advantages:
• Single bit data error correction “on the fly”
• High transfer rates
• Disadvantages:
• Typically inefficient
• Can detect, but not repair double-bit corruption

Spring 2019 65
RAID Levels: RAID 3

RAID 3: Byte-level ECC


• 1 parity bit for each byte.
Bytes are striped across disks.
• Parity bits recorded on separate disk.
• Check parity on Read, Update on Write
• Advantages:
• High transfer rates for sequential read/write
• Fewer disks needed for parity than RAID 2
• Disadvantages:
• Since a block (which contains multiple bytes)
is spread out across the disks, all disks must
participate in every I/O operation (serializing
I/O)
• Parity disk bottleneck on simultaneous writes

Spring 2019 66
RAID Levels: RAID 4

RAID 4: Block-level ECC


• 1 parity bit for each block.
Blocks are striped across disks.
• Parity bits recorded on separate disk.
• Check parity on Read, Update on Write
• Essentially, RAID 0 with parity bits
• Advantages:
• High transfer rates for large files
• Good random access for reads
• Fewer disks needed for parity than RAID 2
(same as RAID 3)
• Disadvantages:
• Slow write transfer rates for small files
• Parity disk is bottleneck on simultaneous
writes
Spring 2019 67
RAID Levels: RAID 5

RAID 5: Block-level Distributed ECC


• Similar to RAID 4, but distributes parity
among all disks.
• Cannot store parity on same disk as data
• Check parity on Read, Update on Write

• Advantages:
• Continues to operate with any single disk failed
• Good transfer rates
• Disadvantages:
• Requires N+1 disks (distributed parity)
• Cannot lose more than one disk at a time
• Difficult to rebuild after a failure

Spring 2019 68
RAID Levels: RAID 6

RAID 6: Block-level Distributed &


Replicated ECC
• Similar to RAID 5, but adds a second, independent
parity block distributed among the disks.
• Cannot store parity on same disk as data
• Check parity on Read,
Update on Write
• Advantages:
• Allows for multiple disk failures
• Good for mission critical applications
• Disadvantages:
• Requires N+2 drives (dual parity)
• Complex

Spring 2019 69
RAID Levels: RAID 0+1 & 1+0

• Performance advantages of RAID 0,


Fault tolerance advantages of RAID 1
RAID 0+1: Mirror of Stripes
• Create a secondary striped set to
mirror a primary striped set.
• First stripe then mirror that stripe
• Typically only able to handle 1 drive failure
at a time
RAID 1+0: Stripe of Mirrors
• Create a striped set of disks to mirror
from a mirrored set
• First mirror then stripe the mirrors
• Often a better choice, provides better fault
tolerance and rebuild performance

Spring 2019 70
Extensions

• RAID does not always prevent or detect data corruption


• Checksums can be associated with files to help detect if
the file was corrupted (and sometimes help in repairing
the data).
• Store the checksum separately from the data it protects (e.g., alongside the pointer to the data), not with the data itself

Spring 2019 71
