ilovepdf_merged_merged

This document provides an introduction to the Internet and its underlying structure, including the concepts of network edge, core, protocols, and physical media. It outlines the importance of protocols in communication and the differences between packet switching and circuit switching. Additionally, it discusses various access networks and their characteristics, emphasizing the interconnected nature of the Internet as a 'network of networks'.

Uploaded by

tanayyurtturk
Copyright
© © All Rights Reserved

Chapter 1

Introduction

A note on the use of these PowerPoint slides:

We’re making these slides freely available to all (faculty, students, readers).
They’re in PowerPoint form so you see the animations; and can add, modify,
and delete slides (including this one) and slide content to suit your needs.
They obviously represent a lot of work on our part. In return for use, we only
ask the following:
▪ If you use these slides (e.g., in a class) that you mention their source
(after all, we’d like people to use our book!)
▪ If you post any slides on a www site, that you note that they are adapted
from (or perhaps identical to) our slides, and note our copyright of this
material.

Thanks and enjoy! JFK/KWR

All material copyright 1996-2016
J.F Kurose and K.W. Ross, All Rights Reserved

Computer Networking: A Top Down Approach
7th Edition, Global Edition
Jim Kurose, Keith Ross
Pearson
April 2016
Introduction 1-1
Chapter 1: introduction
our goal:
▪ get “feel” and terminology
▪ more depth, detail later in course
▪ approach:
• use Internet as example

overview:
▪ what’s the Internet?
▪ what’s a protocol?
▪ network edge; hosts, access net, physical media
▪ network core: packet/circuit switching, Internet structure
▪ performance: loss, delay, throughput
▪ security
▪ protocol layers, service models
▪ history

Introduction 1-2
Chapter 1: roadmap
1.1 what is the Internet?
1.2 network edge
▪ end systems, access networks, links
1.3 network core
▪ packet switching, circuit switching, network structure
1.4 delay, loss, throughput in networks
1.5 protocol layers, service models
1.6 networks under attack: security
1.7 history

Introduction 1-3
What’s the Internet: “nuts and bolts” view
▪ billions of connected computing devices:
• hosts = end systems
• running network apps
▪ communication links
• fiber, copper, radio, satellite
• transmission rate: bandwidth
▪ packet switches: forward packets (chunks of data)
• routers and switches

[figure: PC, server, wireless laptop, smartphone, wireless links, wired links, and routers, spread across mobile network, global ISP, regional ISP, home network, and institutional network]

Introduction 1-4
“Fun” Internet-connected devices

▪ Web-enabled toaster + weather forecaster
▪ IP picture frame: http://www.ceiva.com/
▪ Tweet-a-watt: monitor energy use
▪ Slingbox: watch, control cable TV remotely
▪ sensorized bed mattress
▪ Internet refrigerator
▪ Internet phones

Introduction 1-5
What’s the Internet: “nuts and bolts” view
▪ Internet: “network of networks”
• interconnected ISPs
▪ protocols control sending, receiving of messages
• e.g., TCP, IP, HTTP, Skype, 802.11
▪ Internet standards
• RFC: Request for Comments
• IETF: Internet Engineering Task Force

[figure: mobile network, global ISP, regional ISP, home network, institutional network]

Introduction 1-6
What’s the Internet: a service view
▪ infrastructure that provides services to applications:
• Web, VoIP, email, games, e-commerce, social nets, …
▪ provides programming interface to apps
• hooks that allow sending and receiving app programs to “connect” to Internet
• provides service options, analogous to postal service

[figure: mobile network, global ISP, regional ISP, home network, institutional network]

Introduction 1-7
What’s a protocol?
human protocols:
▪ “what’s the time?”
▪ “I have a question”
▪ introductions
… specific messages sent
… specific actions taken when messages received, or other events

network protocols:
▪ machines rather than humans
▪ all communication activity in Internet governed by protocols

protocols define format, order of messages sent and received among network entities, and actions taken on message transmission, receipt
Introduction 1-8
What’s a protocol?
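a human protocol and a computer network protocol:

▪ human protocol: “Hi”, “Hi”, “Got the time?”, “2:00”
▪ computer network protocol: TCP connection request, TCP connection response, Get http://www.awl.com/kurose-ross, <file>

[figure: the two exchanges drawn side by side against a vertical time axis]

Q: other human protocols?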


Introduction 1-9
Chapter 1: roadmap
1.1 what is the Internet?
1.2 network edge
▪ end systems, access networks, links
1.3 network core
▪ packet switching, circuit switching, network structure
1.4 delay, loss, throughput in networks
1.5 protocol layers, service models
1.6 networks under attack: security
1.7 history

Introduction 1-10
A closer look at network structure:
▪ network edge:
• hosts: clients and servers
• servers often in data centers
▪ access networks, physical media: wired, wireless communication links
▪ network core:
• interconnected routers
• network of networks

[figure: mobile network, global ISP, regional ISP, home network, institutional network]

Introduction 1-11
Access networks and physical media

Q: How to connect end systems to edge router?
▪ residential access nets
▪ institutional access networks (school, company)
▪ mobile access networks

keep in mind:
▪ bandwidth (bits per second) of access network?
▪ shared or dedicated?

Introduction 1-12
Access network: digital subscriber line (DSL)
[figure: DSL modem and splitter at the home; dedicated line to the DSLAM (DSL access multiplexer) at the central office, which connects to the telephone network and the ISP; voice, data transmitted at different frequencies over dedicated line to central office]

▪ use existing telephone line to central office DSLAM
• data over DSL phone line goes to Internet
• voice over DSL phone line goes to telephone net
▪ < 2.5 Mbps upstream transmission rate (typically < 1 Mbps)
▪ < 24 Mbps downstream transmission rate (typically < 10 Mbps)
Introduction 1-13
Access network: cable network
[figure: cable modem and splitter in each home, shared cable to the cable headend; channels 1-6 carry VIDEO, channels 7-8 carry DATA, channel 9 carries CONTROL]

frequency division multiplexing: different channels transmitted in different frequency bands
Introduction 1-14
Access network: cable network
[figure: cable modem and splitter in each home; cable headend with CMTS (cable modem termination system) connected to the ISP; data, TV transmitted at different frequencies over shared cable distribution network]

▪ HFC: hybrid fiber coax
• asymmetric: up to 30 Mbps downstream transmission rate, 2 Mbps upstream transmission rate
▪ network of cable, fiber attaches homes to ISP router
• homes share access network to cable headend
• unlike DSL, which has dedicated access to central office
Introduction 1-15
Access network: home network
[figure: home network; the components below are often combined in single box]
▪ cable or DSL modem: to/from headend or central office
▪ router, firewall, NAT
▪ wireless access point (54 Mbps), serving wireless devices
▪ wired Ethernet (1 Gbps)

Introduction 1-16
Enterprise access networks (Ethernet)

[figure: Ethernet switch connecting end systems and institutional mail, web servers to the institutional router, which has the institutional link to ISP (Internet)]

▪ typically used in companies, universities, etc.
▪ 10 Mbps, 100 Mbps, 1 Gbps, 10 Gbps transmission rates
▪ today, end systems typically connect into Ethernet switch

Introduction 1-17
Wireless access networks
▪ shared wireless access network connects end system to router
• via base station aka “access point”

wireless LANs:
▪ within building (100 ft.)
▪ 802.11b/g/n (WiFi): 11, 54, 450 Mbps transmission rate

wide-area wireless access:
▪ provided by telco (cellular) operator, 10’s km
▪ between 1 and 10 Mbps
▪ 3G, 4G: LTE

[figure: both kinds of access network connect to Internet]

Introduction 1-18
Host: sends packets of data
host sending function:
▪ takes application message
▪ breaks into smaller chunks, known as packets, of length L bits
▪ transmits packet into access network at transmission rate R
• link transmission rate, aka link capacity, aka link bandwidth

[figure: host sends two packets, L bits each, onto a link; R: link transmission rate]

packet transmission delay = time needed to transmit L-bit packet into link = L (bits) / R (bits/sec)
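The L/R relationship can be checked with a few lines of Python. This is only a sketch: the function name and the sample values (a 1500-byte packet on a 100 Mbps link) are assumptions chosen for illustration and do not come from the slides.

```python
def transmission_delay(packet_bits: float, rate_bps: float) -> float:
    """Seconds needed to push a packet of packet_bits onto a link of rate_bps."""
    if rate_bps <= 0:
        raise ValueError("link rate must be positive")
    return packet_bits / rate_bps

# Illustrative (assumed) values: a 1500-byte packet on a 100 Mbps link.
delay = transmission_delay(1500 * 8, 100e6)
print(f"{delay * 1e6:.0f} microseconds")
```

Doubling R halves the delay, and doubling L doubles it; the length of the link plays no part in this quantity.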
Introduction 1-19
Physical media
▪ bit: propagates between transmitter/receiver pairs
▪ physical link: what lies between transmitter & receiver
▪ guided media:
• signals propagate in solid media: copper, fiber, coax
▪ unguided media:
• signals propagate freely, e.g., radio

twisted pair (TP)
▪ two insulated copper wires
• Category 5: 100 Mbps, 1 Gbps Ethernet
• Category 6: 10 Gbps

Introduction 1-20
Physical media: coax, fiber
coaxial cable:
▪ two concentric copper conductors
▪ bidirectional
▪ broadband:
• multiple channels on cable
• HFC

fiber optic cable:
▪ glass fiber carrying light pulses, each pulse a bit
▪ high-speed operation:
• high-speed point-to-point transmission (e.g., 10’s-100’s Gbps transmission rate)
▪ low error rate:
• repeaters spaced far apart
• immune to electromagnetic noise

Introduction 1-21
Physical media: radio
▪ signal carried in electromagnetic spectrum
▪ no physical “wire”
▪ bidirectional
▪ propagation environment effects:
• reflection
• obstruction by objects
• interference

radio link types:
▪ terrestrial microwave
• e.g. up to 45 Mbps channels
▪ LAN (e.g., WiFi)
• 54 Mbps
▪ wide-area (e.g., cellular)
• 4G cellular: ~ 10 Mbps
▪ satellite
• Kbps to 45 Mbps channel (or multiple smaller channels)
• 270 msec end-end delay
• geosynchronous versus low altitude

Introduction 1-22
Chapter 1: roadmap
1.1 what is the Internet?
1.2 network edge
▪ end systems, access networks, links
1.3 network core
▪ packet switching, circuit switching, network structure
1.4 delay, loss, throughput in networks
1.5 protocol layers, service models
1.6 networks under attack: security
1.7 history

Introduction 1-23
The network core
▪ mesh of interconnected
routers
▪ packet-switching: hosts
break application-layer
messages into packets
• forward packets from one
router to the next, across
links on path from source
to destination
• each packet transmitted at
full link capacity

Introduction 1-24
Packet-switching: store-and-forward

[figure: source sends packets 3, 2, 1 (L bits per packet) through one router to destination; each link runs at R bps]

▪ takes L/R seconds to transmit (push out) L-bit packet into link at R bps
▪ store and forward: entire packet must arrive at router before it can be transmitted on next link
▪ end-end delay = 2L/R (assuming zero propagation delay)

one-hop numerical example:
▪ L = 7.5 Mbits
▪ R = 1.5 Mbps
▪ one-hop transmission delay = 5 sec

more on delay shortly …
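The store-and-forward arithmetic can be reproduced as a small sketch. The function name is hypothetical, and the extension to several packets (each later packet arrives one L/R after the previous one) is a standard generalization that this slide does not state; propagation delay is assumed to be zero, as on the slide.

```python
def store_and_forward_delay(packet_bits, rate_bps, num_links, num_packets=1):
    """End-to-end delay over num_links equal-rate links, zero propagation delay.

    The first packet needs num_links * L/R to arrive; each later packet
    arrives one L/R after the previous, since the links work as a pipeline.
    """
    per_hop = packet_bits / rate_bps
    return (num_links + num_packets - 1) * per_hop

# The slide's numbers: L = 7.5 Mbits, R = 1.5 Mbps.
print(store_and_forward_delay(7.5e6, 1.5e6, num_links=1))  # one hop: L/R
print(store_and_forward_delay(7.5e6, 1.5e6, num_links=2))  # source-router-destination: 2L/R
```

With one packet and two links this gives the 2L/R of the slide; with the three packets in the figure it would give 4L/R for the last packet to arrive.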
Introduction 1-25
Packet Switching: queueing delay, loss

[figure: hosts A and B, R = 100 Mb/s link, router with queue of packets waiting for output link, R = 1.5 Mb/s output link, hosts C, D, E]

queuing and loss:
▪ if arrival rate (in bits) to link exceeds transmission rate of link for a period of time:
• packets will queue, wait to be transmitted on link
• packets can be dropped (lost) if memory (buffer) fills up
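A toy simulation makes the two outcomes (queueing, then loss) concrete. Everything here is an illustrative assumption: time is cut into ticks, the link sends a fixed number of packets per tick, and the buffer drops arrivals when full (drop-tail). It is a sketch of the idea, not a model of any real router.

```python
from collections import deque

def simulate_queue(arrivals_per_tick, service_per_tick, buffer_size):
    """Drop-tail FIFO queue: returns (packets sent, packets dropped, max backlog)."""
    queue = deque()
    sent = dropped = max_backlog = 0
    for arriving in arrivals_per_tick:
        for _ in range(arriving):
            if len(queue) < buffer_size:
                queue.append(1)      # packet waits for the output link
            else:
                dropped += 1         # buffer full: packet is lost
        max_backlog = max(max_backlog, len(queue))
        for _ in range(min(service_per_tick, len(queue))):
            queue.popleft()          # link transmits one packet
            sent += 1
    return sent, dropped, max_backlog

# Assumed workload: a burst of 3 packets per tick for 4 ticks, then silence.
print(simulate_queue([3, 3, 3, 3, 0, 0, 0, 0], service_per_tick=1, buffer_size=5))
```

During the burst the arrival rate exceeds the service rate, so the backlog climbs to the buffer size and further arrivals are dropped; once arrivals stop, the queue drains.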

Introduction 1-26
Two key network-core functions
routing: determines source-destination route taken by packets
▪ routing algorithms

forwarding: move packets from router’s input to appropriate router output

[figure: routing algorithm fills in the local forwarding table; destination address in arriving packet’s header selects one of output links 1, 2, 3]

local forwarding table:
header value    output link
0100            3
0101            2
0111            2
1001            1
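Forwarding with the table above amounts to a lookup keyed on the header value. The sketch below copies the four entries from the slide; the function name and the choice to return None for an unknown header are assumptions made for illustration.

```python
# Forwarding table copied from the slide: header value -> output link.
FORWARDING_TABLE = {
    "0100": 3,
    "0101": 2,
    "0111": 2,
    "1001": 1,
}

def forward(header_value: str):
    """Return the output link for a packet, or None if no entry matches."""
    return FORWARDING_TABLE.get(header_value)

for header in ("0111", "1001", "1111"):
    print(header, "->", forward(header))
```

Routing is the separate, slower process that computes the table's contents; forwarding only consults it, packet by packet.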
Introduction 1-27
Alternative core: circuit switching
end-end resources allocated
to, reserved for “call”
between source & dest:
▪ in diagram, each link has four
circuits.
• call gets 2nd circuit in top
link and 1st circuit in right
link.
▪ dedicated resources: no sharing
• circuit-like (guaranteed)
performance
▪ circuit segment idle if not used
by call (no sharing)
▪ commonly used in traditional
telephone networks
Introduction 1-28
Circuit switching: FDM versus TDM
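Example: 4 users

[figure: FDM, frequency vs. time: each user holds its own frequency band for the whole time]
[figure: TDM, frequency vs. time: each user gets the whole band during its recurring time slot]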
Introduction 1-29
Packet switching versus circuit switching
packet switching allows more users to use network!

example:
▪ 1 Mb/s link
▪ each user:
• 100 kb/s when “active”
• active 10% of time
▪ circuit-switching:
• 10 users
▪ packet switching:
• with 35 users, probability > 10 active at same time is about .0004 *

[figure: N users sharing a 1 Mbps link]

Q: how did we get value 0.0004?
Q: what happens if > 35 users ?

* Check out the online interactive exercises for more examples: http://gaia.cs.umass.edu/kurose_ross/interactive/
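One way to answer the first question is to treat the number of active users as a binomial random variable. This rests on an assumption the slide leaves implicit: users are active independently of one another. The function name is hypothetical.

```python
from math import comb

def prob_more_than(k: int, n: int, p: float) -> float:
    """P(X > k) for X ~ Binomial(n, p): more than k of n users active at once."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1, n + 1))

# Slide's numbers: 35 users, each active 10% of the time, link carries 10 at once.
print(f"{prob_more_than(10, 35, 0.1):.6f}")
```

The result is roughly 0.0004, matching the figure quoted on the slide. Raising n in the call is a quick way to explore the second question.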
Introduction 1-30
Packet switching versus circuit switching
is packet switching a “slam dunk winner?”
▪ great for bursty data
• resource sharing
• simpler, no call setup
▪ excessive congestion possible: packet delay and loss
• protocols needed for reliable data transfer, congestion
control
▪ Q: How to provide circuit-like behavior?
• bandwidth guarantees needed for audio/video apps
• still an unsolved problem (chapter 7)

Q: human analogies of reserved resources (circuit switching) versus on-demand allocation (packet-switching)?
Introduction 1-31
Internet structure: network of networks
▪ End systems connect to Internet via access ISPs (Internet
Service Providers)
• residential, company and university ISPs
▪ Access ISPs in turn must be interconnected.
• so that any two hosts can send packets to each other
▪ Resulting network of networks is very complex
• evolution was driven by economics and national policies
▪ Let’s take a stepwise approach to describe current Internet
structure

Introduction 1-32
Internet structure: network of networks
Question: given millions of access ISPs, how to connect them together?

[figure: many unconnected access nets]

Introduction 1-33
Internet structure: network of networks
Option: connect each access ISP to every other access ISP?

connecting each access ISP to each other directly doesn’t scale: O(N^2) connections.

[figure: access nets joined pairwise by direct links]
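The O(N^2) growth is easy to see by counting pairs: a full mesh of N networks needs N(N-1)/2 links. The sketch below is illustrative; the function name and the sample sizes are assumptions.

```python
def full_mesh_links(n: int) -> int:
    """Number of direct links needed so every pair of n networks is connected."""
    return n * (n - 1) // 2

for n in (10, 1_000, 1_000_000):
    print(f"{n:>9} access ISPs -> {full_mesh_links(n):,} links")
```

At a million access ISPs the count is close to half a trillion links, which is why the direct-connection option is set aside.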

Introduction 1-34
Internet structure: network of networks
Option: connect each access ISP to one global transit ISP?
Customer and provider ISPs have economic agreement.

[figure: every access net linked to a single global ISP at the center]

Introduction 1-35
Internet structure: network of networks
But if one global ISP is viable business, there will be competitors ….

[figure: access nets linked to competing global ISPs A, B, and C]

Introduction 1-36
Internet structure: network of networks
But if one global ISP is viable business, there will be competitors …. which must be interconnected

[figure: ISPs A, B, and C interconnected through Internet exchange points (IXPs) and a peering link, with access nets attached]

Introduction 1-37
Internet structure: network of networks
… and regional networks may arise to connect access nets to ISPs

[figure: ISPs A, B, and C with IXPs; a regional net connects several access nets to them]

Introduction 1-38
Internet structure: network of networks
… and content provider networks (e.g., Google, Microsoft, Akamai) may run their own network, to bring services, content close to end users

[figure: ISPs A, B, and C with IXPs and a regional net, plus a content provider network]

Introduction 1-39
Internet structure: network of networks

[figure: Tier 1 ISPs and Google at the top, interconnected through IXPs; Regional ISPs below them; access ISPs at the bottom]

▪ at center: small # of well-connected large networks
• “tier-1” commercial ISPs (e.g., Level 3, Sprint, AT&T, NTT), national & international coverage
• content provider network (e.g., Google): private network that connects its data centers to Internet, often bypassing tier-1, regional ISPs
Introduction 1-40
Tier-1 ISP: e.g., Sprint

[figure: Sprint backbone; each POP (point-of-presence) has links to/from backbone, peering links, and links to/from customers]

Introduction 1-41
Chapter 1: roadmap
1.1 what is the Internet?
1.2 network edge
▪ end systems, access networks, links
1.3 network core
▪ packet switching, circuit switching, network structure
1.4 delay, loss, throughput in networks
1.5 protocol layers, service models
1.6 networks under attack: security
1.7 history

Introduction 1-42
How do loss and delay occur?
packets queue in router buffers
▪ packet arrival rate to link (temporarily) exceeds output link capacity
▪ packets queue, wait for turn

[figure: router between A and B, showing]
▪ packet being transmitted (delay)
▪ packets queueing (delay)
▪ free (available) buffers: arriving packets dropped (loss) if no free buffers

Introduction 1-43
Four sources of packet delay
[figure: packet path from A through a router to B, marking nodal processing, queueing, transmission, and propagation]

d_nodal = d_proc + d_queue + d_trans + d_prop

d_proc: nodal processing
▪ check bit errors
▪ determine output link
▪ typically < msec

d_queue: queueing delay
▪ time waiting at output link for transmission
▪ depends on congestion level of router
Introduction 1-44
Four sources of packet delay
[figure: packet path from A through a router to B, marking nodal processing, queueing, transmission, and propagation]

d_nodal = d_proc + d_queue + d_trans + d_prop

d_trans: transmission delay:
▪ L: packet length (bits)
▪ R: link bandwidth (bps)
▪ d_trans = L/R

d_prop: propagation delay:
▪ d: length of physical link
▪ s: propagation speed (~2x10^8 m/sec)
▪ d_prop = d/s

d_trans and d_prop very different

* Check out the online interactive exercises for more examples: http://gaia.cs.umass.edu/kurose_ross/interactive/
* Check out the Java applet for an interactive animation on trans vs. prop delay
Introduction 1-45
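The four delay components add up directly, which a short sketch can show. The function name, the default propagation speed of 2x10^8 m/sec taken from the slide, and the sample packet, link rate, and distance are all illustrative assumptions.

```python
def nodal_delay(packet_bits, rate_bps, link_m, speed_mps=2e8, d_proc=0.0, d_queue=0.0):
    """Sum of processing, queueing, transmission, and propagation delay, in seconds."""
    d_trans = packet_bits / rate_bps   # L/R: time to push the bits onto the link
    d_prop = link_m / speed_mps        # d/s: time for one bit to cross the link
    return d_proc + d_queue + d_trans + d_prop

# Assumed values: 12,000-bit packet, 10 Mbps link, 2,000 km of cable.
total = nodal_delay(12_000, 10e6, 2_000_000)
print(f"{total * 1e3:.1f} ms")
```

In this example propagation (10 ms) dominates transmission (1.2 ms); on a short, slow link the balance would be reversed, which is the sense in which the two are very different.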
Caravan analogy
[figure: ten-car caravan, toll booth, 100 km of highway, second toll booth, another 100 km]

▪ cars “propagate” at 100 km/hr
▪ toll booth takes 12 sec to service car (bit transmission time)
▪ car ~ bit; caravan ~ packet
▪ Q: How long until caravan is lined up before 2nd toll booth?
▪ time to “push” entire caravan through toll booth onto highway = 12*10 = 120 sec
▪ time for last car to propagate from 1st to 2nd toll booth: 100km/(100km/hr) = 1 hr
▪ A: 62 minutes
Introduction 1-46
Caravan analogy (more)
[figure: ten-car caravan, toll booth, 100 km of highway, second toll booth, another 100 km]

▪ suppose cars now “propagate” at 1000 km/hr
▪ and suppose toll booth now takes one min to service a car
▪ Q: Will cars arrive to 2nd booth before all cars serviced at first booth?
• A: Yes! after 7 min, first car arrives at second booth; three cars still at first booth
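Both caravan answers can be recomputed with a few lines. The helper names are hypothetical; the numbers passed in are the ones used on the two caravan slides.

```python
def caravan_times(cars, service_min, distance_km, speed_kmh):
    """Return (first car arrival, whole caravan arrival) at the 2nd booth, in minutes."""
    travel_min = distance_km * 60 / speed_kmh
    first_arrival = service_min + travel_min          # first car: one service, then drive
    last_arrival = cars * service_min + travel_min    # last car waits for all services
    return first_arrival, last_arrival

def cars_still_at_first_booth(cars, service_min, at_minute):
    """Cars not yet released by the first booth at time at_minute."""
    served = min(cars, int(at_minute // service_min))
    return cars - served

# Slow cars (100 km/hr), 12-second service: whole caravan arrives after 62 minutes.
print(caravan_times(10, 12 / 60, 100, 100))
# Fast cars (1000 km/hr), 1-minute service: first car arrives while others still wait.
first, _ = caravan_times(10, 1, 100, 1000)
print(first, cars_still_at_first_booth(10, 1, first))
```

The second case is the network analogue of the first bits of a packet reaching the next router before the sender has finished transmitting the packet.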

Introduction 1-47
Queueing delay (revisited)

▪ R: link bandwidth (bps)
▪ L: packet length (bits)
▪ a: average packet arrival rate

traffic intensity = La/R

▪ La/R ~ 0: avg. queueing delay small
▪ La/R -> 1: avg. queueing delay large
▪ La/R > 1: more “work” arriving than can be serviced, average delay infinite!

[figure: average queueing delay plotted against traffic intensity La/R, growing sharply as La/R -> 1]

* Check online interactive animation on queuing and loss
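Traffic intensity is a single ratio, so it is simple to compute for a given link. In the sketch below the function names, the sample link, and especially the 0.9 cutoff used to label an intensity as "near 1" are illustrative assumptions; the slides give no numeric threshold.

```python
def traffic_intensity(packet_bits, arrivals_per_sec, rate_bps):
    """La/R: average bits arriving per second divided by the link rate."""
    return packet_bits * arrivals_per_sec / rate_bps

def describe(intensity):
    """Rough reading of La/R, following the three cases on the slide."""
    if intensity > 1:
        return "overloaded: queue grows without bound"
    if intensity > 0.9:          # 0.9 is an arbitrary illustrative cutoff
        return "near 1: large queueing delay"
    return "small queueing delay"

# Assumed link: 1,000-bit packets on a 1 Mbps link at three arrival rates.
for a in (100, 950, 1200):
    rho = traffic_intensity(1_000, a, 1e6)
    print(a, rho, describe(rho))
```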
Introduction 1-48
“Real” Internet delays and routes
▪ what do “real” Internet delay & loss look like?
▪ traceroute program: provides delay
measurement from source to router along end-
end Internet path towards destination. For all i:
• sends three packets that will reach router i on path
towards destination
• router i will return packets to sender
• sender times interval between transmission and reply.

[figure: source sends 3 probes toward each successive router on the path]

Introduction 1-49
“Real” Internet delays, routes
traceroute: gaia.cs.umass.edu to www.eurecom.fr
1 cs-gw (128.119.240.254) 1 ms 1 ms 2 ms
2 border1-rt-fa5-1-0.gw.umass.edu (128.119.3.145) 1 ms 1 ms 2 ms
3 cht-vbns.gw.umass.edu (128.119.3.130) 6 ms 5 ms 5 ms
4 jn1-at1-0-0-19.wor.vbns.net (204.147.132.129) 16 ms 11 ms 13 ms
5 jn1-so7-0-0-0.wae.vbns.net (204.147.136.136) 21 ms 18 ms 18 ms
6 abilene-vbns.abilene.ucaid.edu (198.32.11.9) 22 ms 18 ms 22 ms
7 nycm-wash.abilene.ucaid.edu (198.32.8.46) 22 ms 22 ms 22 ms
8 62.40.103.253 (62.40.103.253) 104 ms 109 ms 106 ms
9 de2-1.de1.de.geant.net (62.40.96.129) 109 ms 102 ms 104 ms
10 de.fr1.fr.geant.net (62.40.96.50) 113 ms 121 ms 114 ms
11 renater-gw.fr1.fr.geant.net (62.40.103.54) 112 ms 114 ms 112 ms
12 nio-n2.cssi.renater.fr (193.51.206.13) 111 ms 114 ms 116 ms
13 nice.cssi.renater.fr (195.220.98.102) 123 ms 125 ms 124 ms
14 r3t2-nice.cssi.renater.fr (195.220.98.110) 126 ms 126 ms 124 ms
15 eurecom-valbonne.r3t2.ft.net (193.48.50.54) 135 ms 128 ms 133 ms
16 194.214.211.25 (194.214.211.25) 126 ms 128 ms 126 ms
17 * * *
18 * * *
19 fantasia.eurecom.fr (193.55.113.142) 132 ms 128 ms 136 ms

notes on the output:
▪ line 1: 3 delay measurements from gaia.cs.umass.edu to cs-gw.cs.umass.edu
▪ lines 7 to 8: jump in delay marks the trans-oceanic link
▪ * means no response (probe lost, router not replying)

* Do some traceroutes from exotic countries at www.traceroute.org


Introduction 1-50
Packet loss
▪ queue (aka buffer) preceding link has finite capacity
▪ packet arriving to full queue dropped (aka lost)
▪ lost packet may be retransmitted by previous node, by source end system, or not at all

[figure: A and B send into a router; buffer (waiting area) holds queued packets behind the packet being transmitted; packet arriving to full buffer is lost]

* Check out the Java applet for an interactive animation on queuing and loss
Introduction 1-51
Throughput
▪ throughput: rate (bits/time unit) at which bits
transferred between sender/receiver
• instantaneous: rate at given point in time
• average: rate over longer period of time

[figure: server, with file of F bits to send to client, sends bits (fluid) into pipe; first link: pipe that can carry fluid at rate Rs bits/sec (link capacity Rs bits/sec); second link: pipe that can carry fluid at rate Rc bits/sec (link capacity Rc bits/sec)]

Introduction 1-52
Throughput (more)
▪ Rs < Rc: What is average end-end throughput?
[figure: link of Rs bits/sec followed by link of Rc bits/sec]

▪ Rs > Rc: What is average end-end throughput?
[figure: link of Rs bits/sec followed by link of Rc bits/sec]

bottleneck link: link on end-end path that constrains end-end throughput
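In both cases the answer is the rate of the slower link, since the path can deliver bits no faster than its bottleneck. A minimal sketch follows; the function names and sample rates are assumptions, and the transfer-time estimate ignores per-packet and propagation delays.

```python
def end_to_end_throughput(*link_rates_bps):
    """Average throughput of a path is capped by its slowest (bottleneck) link."""
    return min(link_rates_bps)

def transfer_time(file_bits, *link_rates_bps):
    """Approximate seconds to move file_bits, ignoring per-packet delays."""
    return file_bits / end_to_end_throughput(*link_rates_bps)

# Assumed rates: Rs = 2 Mbps server link, Rc = 1 Mbps client link, 32 Mbit file.
print(end_to_end_throughput(2e6, 1e6))
print(transfer_time(32e6, 2e6, 1e6))
```

Because the functions accept any number of links, a shared backbone link can be included by passing its per-connection share as one more rate.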
Introduction 1-53
Throughput: Internet scenario

▪ per-connection end-end throughput: min(Rc, Rs, R/10)
▪ in practice: Rc or Rs is often bottleneck

[figure: servers on links Rs and clients on links Rc; 10 connections (fairly) share backbone bottleneck link R bits/sec]

* Check out the online interactive exercises for more examples: http://gaia.cs.umass.edu/kurose_ross/interactive/
Introduction 1-54
Chapter 1: roadmap
1.1 what is the Internet?
1.2 network edge
▪ end systems, access networks, links
1.3 network core
▪ packet switching, circuit switching, network structure
1.4 delay, loss, throughput in networks
1.5 protocol layers, service models
1.6 networks under attack: security
1.7 history

Introduction 1-55
Protocol “layers”
Networks are complex, with many “pieces”:
▪ hosts
▪ routers
▪ links of various media
▪ applications
▪ protocols
▪ hardware, software

Question: is there any hope of organizing structure of network?
…. or at least our discussion of networks?

Introduction 1-56
Organization of air travel
ticket (purchase) ticket (complain)

baggage (check) baggage (claim)

gates (load) gates (unload)

runway takeoff runway landing

airplane routing airplane routing


airplane routing

▪ a series of steps

Introduction 1-57
Layering of airline functionality

departure airport    intermediate air-traffic control centers    arrival airport      layer
ticket (purchase)                                                ticket (complain)    ticket
baggage (check)                                                  baggage (claim)      baggage
gates (load)                                                     gates (unload)       gate
runway (takeoff)                                                 runway (land)        takeoff/landing
airplane routing     airplane routing, airplane routing          airplane routing     airplane routing

layers: each layer implements a service
▪ via its own internal-layer actions
▪ relying on services provided by layer below

Introduction 1-58
Why layering?
dealing with complex systems:
▪ explicit structure allows identification,
relationship of complex system’s pieces
• layered reference model for discussion
▪ modularization eases maintenance, updating of
system
• change of implementation of layer’s service
transparent to rest of system
• e.g., change in gate procedure doesn’t affect rest of
system
▪ layering considered harmful?

Introduction 1-59
Internet protocol stack
▪ application: supporting network applications
• FTP, SMTP, HTTP
▪ transport: process-process data transfer
• TCP, UDP
▪ network: routing of datagrams from source to destination
• IP, routing protocols
▪ link: data transfer between neighboring network elements
• Ethernet, 802.11 (WiFi), PPP
▪ physical: bits “on the wire”

[figure: stack, top to bottom: application, transport, network, link, physical]
Introduction 1-60
ISO/OSI reference model
▪ presentation: allow applications to interpret meaning of data, e.g., encryption, compression, machine-specific conventions
▪ session: synchronization, checkpointing, recovery of data exchange
▪ Internet stack “missing” these layers!
• these services, if needed, must be implemented in application
• needed?

[figure: stack, top to bottom: application, presentation, session, transport, network, link, physical]

Introduction 1-61
Encapsulation

source:
message     M             application
segment     Ht M          transport
datagram    Hn Ht M       network
frame       Hl Hn Ht M    link
                          physical

switch (link, physical): handles frame Hl Hn Ht M
router (network, link, physical): handles datagram Hn Ht M carried in frame Hl Hn Ht M

destination:
M             application
Ht M          transport
Hn Ht M       network
Hl Hn Ht M    link
              physical
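The header nesting can be imitated with byte strings. This is a toy sketch only: the header contents (the literal tags Ht, Hn, Hl followed by a separator) and the function names are assumptions made for illustration, and real headers carry structured fields rather than fixed tags.

```python
def encapsulate(message: bytes) -> bytes:
    """Wrap a message in transport, network, and link headers (outermost last)."""
    segment = b"Ht|" + message      # transport header + message
    datagram = b"Hn|" + segment     # network header + segment
    frame = b"Hl|" + datagram       # link header + datagram
    return frame

def decapsulate(frame: bytes) -> bytes:
    """Strip the headers in reverse order, as the destination stack does."""
    for header in (b"Hl|", b"Hn|", b"Ht|"):
        if not frame.startswith(header):
            raise ValueError(f"expected header {header!r}")
        frame = frame[len(header):]
    return frame

frame = encapsulate(b"M")
print(frame)
print(decapsulate(frame))
```

A switch would inspect only the outermost (link) header and a router the link and network headers, leaving the inner layers untouched.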

Introduction 1-62
Chapter 1: roadmap
1.1 what is the Internet?
1.2 network edge
▪ end systems, access networks, links
1.3 network core
▪ packet switching, circuit switching, network structure
1.4 delay, loss, throughput in networks
1.5 protocol layers, service models
1.6 networks under attack: security
1.7 history

Introduction 1-63
Network security
▪ field of network security:
• how bad guys can attack computer networks
• how we can defend networks against attacks
• how to design architectures that are immune to attacks
▪ Internet not originally designed with (much)
security in mind
• original vision: “a group of mutually trusting users
attached to a transparent network” ☺
• Internet protocol designers playing “catch-up”
• security considerations in all layers!

Introduction 1-64
Bad guys: put malware into hosts via Internet
▪ malware can get in host from:
• virus: self-replicating infection by receiving/executing
object (e.g., e-mail attachment)
• worm: self-replicating infection by passively receiving
object that gets itself executed
▪ spyware malware can record keystrokes, web sites visited, upload info to collection site
▪ infected host can be enrolled in botnet, used for spam, DDoS attacks

Introduction 1-65
Bad guys: attack server, network infrastructure
Denial of Service (DoS): attackers make resources (server, bandwidth) unavailable to legitimate traffic by overwhelming resource with bogus traffic

1. select target
2. break into hosts around the network (see botnet)
3. send packets to target from compromised hosts

[figure: compromised hosts sending packets to the target]

Introduction 1-66
Bad guys can sniff packets
packet “sniffing”:
▪ broadcast media (shared Ethernet, wireless)
▪ promiscuous network interface reads/records all packets (e.g., including passwords!) passing by

[figure: hosts A, B, C on a shared medium; packet with src:B dest:A payload]

▪ Wireshark software used for end-of-chapter labs is a (free) packet-sniffer
Introduction 1-67
Bad guys can use fake addresses
IP spoofing: send packet with false source address

[figure: hosts A, B, C; packet with src:B dest:A payload]

… lots more on security (throughout, Chapter 8)

Introduction 1-68
Chapter 1: roadmap
1.1 what is the Internet?
1.2 network edge
▪ end systems, access networks, links
1.3 network core
▪ packet switching, circuit switching, network structure
1.4 delay, loss, throughput in networks
1.5 protocol layers, service models
1.6 networks under attack: security
1.7 history

Introduction 1-69
Internet history
1961-1972: Early packet-switching principles
▪ 1961: Kleinrock - queueing theory shows effectiveness of packet-switching
▪ 1964: Baran - packet-switching in military nets
▪ 1967: ARPAnet conceived by Advanced Research Projects Agency
▪ 1969: first ARPAnet node operational
▪ 1972:
• ARPAnet public demo
• NCP (Network Control Protocol) first host-host protocol
• first e-mail program
• ARPAnet has 15 nodes

Introduction 1-70
Internet history
1972-1980: Internetworking, new and proprietary nets

▪ 1970: ALOHAnet satellite network in Hawaii
▪ 1974: Cerf and Kahn - architecture for interconnecting networks
▪ 1976: Ethernet at Xerox PARC
▪ late 70’s: proprietary architectures: DECnet, SNA, XNA
▪ late 70’s: switching fixed length packets (ATM precursor)
▪ 1979: ARPAnet has 200 nodes

Cerf and Kahn’s internetworking principles:
• minimalism, autonomy - no internal changes required to interconnect networks
• best effort service model
• stateless routers
• decentralized control
define today’s Internet architecture

Introduction 1-71
Internet history
1980-1990: new protocols, a proliferation of networks

▪ 1983: deployment of TCP/IP
▪ 1982: smtp e-mail protocol defined
▪ 1983: DNS defined for name-to-IP-address translation
▪ 1985: ftp protocol defined
▪ 1988: TCP congestion control
▪ new national networks: CSnet, BITnet, NSFnet, Minitel
▪ 100,000 hosts connected to confederation of networks

Introduction 1-72
Internet history
1990, 2000’s: commercialization, the Web, new apps
▪ early 1990’s: ARPAnet decommissioned
▪ 1991: NSF lifts restrictions on commercial use of NSFnet (decommissioned, 1995)
▪ early 1990s: Web
• hypertext [Bush 1945, Nelson 1960’s]
• HTML, HTTP: Berners-Lee
• 1994: Mosaic, later Netscape
• late 1990’s: commercialization of the Web

late 1990’s - 2000’s:
▪ more killer apps: instant messaging, P2P file sharing
▪ network security to forefront
▪ est. 50 million hosts, 100 million+ users
▪ backbone links running at Gbps

Introduction 1-73
Internet history
2005-present
▪ ~5B devices attached to Internet (2016)
• smartphones and tablets
▪ aggressive deployment of broadband access
▪ increasing ubiquity of high-speed wireless access
▪ emergence of online social networks:
• Facebook: ~ one billion users
▪ service providers (Google, Microsoft) create their own
networks
• bypass Internet, providing “instantaneous” access to
search, video content, email, etc.
▪ e-commerce, universities, enterprises running their
services in “cloud” (e.g., Amazon EC2)

Introduction 1-74
Introduction: summary
covered a “ton” of material!
▪ Internet overview
▪ what’s a protocol?
▪ network edge, core, access network
• packet-switching versus circuit-switching
• Internet structure
▪ performance: loss, delay, throughput
▪ layering, service models
▪ security
▪ history

you now have:
▪ context, overview, “feel” of networking
▪ more depth, detail to follow!

Introduction 1-75
Chapter 1
Additional Slides

Introduction 1-76
[figure: packet analyzer alongside application (www browser, email client); packet capture (pcap) passes the analyzer a copy of all Ethernet frames sent/received; OS stack: Transport (TCP/UDP), Network (IP), Link (Ethernet), Physical]
Chapter 2
Application Layer

Application Layer 2-1
Chapter 2: outline
2.1 principles of network applications
2.2 Web and HTTP
2.3 electronic mail
• SMTP, POP3, IMAP
2.4 DNS
2.5 P2P applications
2.6 video streaming and content distribution networks
2.7 socket programming with UDP and TCP

Application Layer 2-2


Chapter 2: application layer
our goals:
▪ conceptual, implementation aspects of network application protocols
• transport-layer service models
• client-server paradigm
• peer-to-peer paradigm
• content distribution networks
▪ learn about protocols by examining popular application-level protocols
• HTTP
• FTP
• SMTP / POP3 / IMAP
• DNS
▪ creating network applications
• socket API

Application Layer 2-3


Some network apps
▪ e-mail
▪ web
▪ text messaging
▪ remote login
▪ P2P file sharing
▪ multi-user network games
▪ streaming stored video (YouTube, Hulu, Netflix)
▪ voice over IP (e.g., Skype)
▪ real-time video conferencing
▪ social networking
▪ search
▪ …

Application Layer 2-4


Creating a network app
write programs that:
▪ run on (different) end systems
▪ communicate over network
▪ e.g., web server software communicates with browser software

no need to write software for network-core devices
▪ network-core devices do not run user applications
▪ applications on end systems allow for rapid app development, propagation

[Figure: protocol stacks (application, transport, network, data link, physical) on the communicating end systems.]
Application Layer 2-5
Application architectures
possible structure of applications:
▪ client-server
▪ peer-to-peer (P2P)

Application Layer 2-6


Client-server architecture
server:
▪ always-on host
▪ permanent IP address
▪ data centers for scaling

clients:
▪ communicate with server
▪ may be intermittently connected
▪ may have dynamic IP addresses
▪ do not communicate directly with each other
Application Layer 2-7
P2P architecture
▪ no always-on server
▪ arbitrary end systems directly communicate
▪ peers request service from other peers, provide service in return to other peers
• self scalability – new peers bring new service capacity, as well as new service demands
▪ peers are intermittently connected and change IP addresses
• complex management

Application Layer 2-8


Processes communicating
process: program running within a host
▪ within same host, two processes communicate using inter-process communication (defined by OS)
▪ processes in different hosts communicate by exchanging messages

clients, servers
client process: process that initiates communication
server process: process that waits to be contacted
▪ aside: applications with P2P architectures have client processes & server processes

Application Layer 2-9


Sockets
▪ process sends/receives messages to/from its socket
▪ socket analogous to door
• sending process shoves message out door
• sending process relies on transport infrastructure on
other side of door to deliver message to socket at
receiving process

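[Figure: two hosts connected through the Internet. On each host the process (application layer) is controlled by the app developer; the transport, network, link, and physical layers are controlled by the OS; the socket sits between the process and the transport layer.]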

Application Layer 2-10


Addressing processes
▪ to receive messages, process must have identifier
▪ host device has unique 32-bit IP address
▪ Q: does IP address of host on which process runs suffice for identifying the process?
▪ A: no, many processes can be running on same host
▪ identifier includes both IP address and port numbers associated with process on host.
▪ example port numbers:
• HTTP server: 80
• mail server: 25
▪ to send HTTP message to gaia.cs.umass.edu web server:
• IP address: 128.119.245.12
• port number: 80
▪ more shortly…

Application Layer 2-11


App-layer protocol defines
▪ types of messages exchanged,
• e.g., request, response
▪ message syntax:
• what fields in messages & how fields are delineated
▪ message semantics
• meaning of information in fields
▪ rules for when and how processes send & respond to messages

open protocols:
▪ defined in RFCs
▪ allows for interoperability
▪ e.g., HTTP, SMTP
proprietary protocols:
▪ e.g., Skype

Application Layer 2-12


What transport service does an app need?
data integrity
▪ some apps (e.g., file transfer, web transactions) require 100% reliable data transfer
▪ other apps (e.g., audio) can tolerate some loss

timing
▪ some apps (e.g., Internet telephony, interactive games) require low delay to be “effective”

throughput
▪ some apps (e.g., multimedia) require minimum amount of throughput to be “effective”
▪ other apps (“elastic apps”) make use of whatever throughput they get

security
▪ encryption, data integrity, …

Application Layer 2-13


Transport service requirements: common apps

application            data loss      throughput                               time sensitive
file transfer          no loss        elastic                                  no
e-mail                 no loss        elastic                                  no
Web documents          no loss        elastic                                  no
real-time audio/video  loss-tolerant  audio: 5kbps-1Mbps, video: 10kbps-5Mbps  yes, 100’s msec
stored audio/video     loss-tolerant  same as above                            yes, few secs
interactive games      loss-tolerant  few kbps up                              yes, 100’s msec
text messaging         no loss        elastic                                  yes and no

Application Layer 2-14


Internet transport protocols services
TCP service:
▪ reliable transport between sending and receiving process
▪ flow control: sender won’t overwhelm receiver
▪ congestion control: throttle sender when network overloaded
▪ does not provide: timing, minimum throughput guarantee, security
▪ connection-oriented: setup required between client and server processes

UDP service:
▪ unreliable data transfer between sending and receiving process
▪ does not provide: reliability, flow control, congestion control, timing, throughput guarantee, security, or connection setup

Q: why bother? Why is there a UDP?
Application Layer 2-15
Internet apps: application, transport protocols

application             application layer protocol            underlying transport protocol
e-mail                  SMTP [RFC 2821]                       TCP
remote terminal access  Telnet [RFC 854]                      TCP
Web                     HTTP [RFC 2616]                       TCP
file transfer           FTP [RFC 959]                         TCP
streaming multimedia    HTTP (e.g., YouTube), RTP [RFC 1889]  TCP or UDP
Internet telephony      SIP, RTP, proprietary (e.g., Skype)   TCP or UDP

Application Layer 2-16


Securing TCP

TCP & UDP
▪ no encryption
▪ cleartext passwds sent into socket traverse Internet in cleartext

SSL
▪ provides encrypted TCP connection
▪ data integrity
▪ end-point authentication

SSL is at app layer
▪ apps use SSL libraries, that “talk” to TCP

SSL socket API
▪ cleartext passwords sent into socket traverse Internet encrypted
▪ see Chapter 8

Application Layer 2-17


Chapter 2: outline
2.1 principles of network applications
2.2 Web and HTTP
2.3 electronic mail
• SMTP, POP3, IMAP
2.4 DNS
2.5 P2P applications
2.6 video streaming and content distribution networks
2.7 socket programming with UDP and TCP

Application Layer 2-18


Web and HTTP
First, a review…
▪ web page consists of objects
▪ object can be HTML file, JPEG image, Java applet,
audio file,…
▪ web page consists of base HTML-file which
includes several referenced objects
▪ each object is addressable by a URL, e.g.,
www.someschool.edu/someDept/pic.gif

host name path name

Application Layer 2-19


HTTP overview
HTTP: hypertext transfer protocol
▪ Web’s application layer protocol
▪ client/server model
• client: browser that requests, receives, (using HTTP protocol) and “displays” Web objects
• server: Web server sends (using HTTP protocol) objects in response to requests

[Figure: a PC running Firefox browser and an iPhone running Safari browser exchange HTTP requests and responses with a server running Apache Web server.]

Application Layer 2-20


HTTP overview (continued)
uses TCP:
▪ client initiates TCP connection (creates socket) to server, port 80
▪ server accepts TCP connection from client
▪ HTTP messages (application-layer protocol messages) exchanged between browser (HTTP client) and Web server (HTTP server)
▪ TCP connection closed

HTTP is “stateless”
▪ server maintains no information about past client requests

aside
protocols that maintain “state” are complex!
▪ past history (state) must be maintained
▪ if server/client crashes, their views of “state” may be inconsistent, must be reconciled

Application Layer 2-21


HTTP connections
non-persistent HTTP
▪ at most one object sent over TCP connection
• connection then closed
▪ downloading multiple objects required multiple connections

persistent HTTP
▪ multiple objects can be sent over single TCP connection between client, server

Application Layer 2-22


Non-persistent HTTP
suppose user enters URL: www.someSchool.edu/someDepartment/home.index (contains text, references to 10 jpeg images)

1a. HTTP client initiates TCP connection to HTTP server (process) at www.someSchool.edu on port 80
1b. HTTP server at host www.someSchool.edu waiting for TCP connection at port 80. “accepts” connection, notifying client
2. HTTP client sends HTTP request message (containing URL) into TCP connection socket. Message indicates that client wants object someDepartment/home.index
3. HTTP server receives request message, forms response message containing requested object, and sends message into its socket
Application Layer 2-23
Non-persistent HTTP (cont.)
4. HTTP server closes TCP connection.
5. HTTP client receives response message containing html file, displays html. Parsing html file, finds 10 referenced jpeg objects
6. Steps 1-5 repeated for each of 10 jpeg objects

Application Layer 2-24


Non-persistent HTTP: response time

RTT (definition): time for a small packet to travel from client to server and back

HTTP response time:
▪ one RTT to initiate TCP connection
▪ one RTT for HTTP request and first few bytes of HTTP response to return
▪ file transmission time
▪ non-persistent HTTP response time = 2RTT + file transmission time

[Figure: client/server timeline: initiate TCP connection (one RTT), request file (one RTT), time to transmit file, file received.]
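
The response-time formula above can be turned into a few lines of arithmetic. The sketch below is only an illustration: the function names and the sample numbers in the usage note are assumptions, and it models strictly serial non-persistent HTTP (no parallel connections).

```python
def non_persistent_response_time(rtt, transmission_time):
    """One object: one RTT for TCP setup, one RTT for request/response, plus transmission."""
    return 2 * rtt + transmission_time


def serial_page_load_time(rtt, transmission_times):
    """Base HTML file plus referenced objects, each fetched over its own TCP connection,
    one after another (an assumption; real browsers often fetch in parallel)."""
    return sum(non_persistent_response_time(rtt, t) for t in transmission_times)
```

For example, with an assumed RTT of 0.1 s and 11 objects (one base file plus 10 images) that each take 0.05 s to transmit, `serial_page_load_time(0.1, [0.05] * 11)` adds up 11 terms of 0.25 s each.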

Application Layer 2-25


Persistent HTTP

non-persistent HTTP issues:
▪ requires 2 RTTs per object
▪ OS overhead for each TCP connection
▪ browsers often open parallel TCP connections to fetch referenced objects

persistent HTTP:
▪ server leaves connection open after sending response
▪ subsequent HTTP messages between same client/server sent over open connection
▪ client sends requests as soon as it encounters a referenced object
▪ as little as one RTT for all the referenced objects

Application Layer 2-26


HTTP request message
▪ two types of HTTP messages: request, response
▪ HTTP request message:
• ASCII (human-readable format)

request line (GET, POST, HEAD commands):
GET /index.html HTTP/1.1\r\n
header lines:
Host: www-net.cs.umass.edu\r\n
User-Agent: Firefox/3.6.10\r\n
Accept: text/html,application/xhtml+xml\r\n
Accept-Language: en-us,en;q=0.5\r\n
Accept-Encoding: gzip,deflate\r\n
Accept-Charset: ISO-8859-1,utf-8;q=0.7\r\n
Keep-Alive: 115\r\n
Connection: keep-alive\r\n
carriage return, line feed at start of line indicates end of header lines:
\r\n

(\r is the carriage return character, \n is the line-feed character)

* Check out the online interactive exercises for more examples: http://gaia.cs.umass.edu/kurose_ross/interactive/
Application Layer 2-27
HTTP request message: general format

request line:  method sp URL sp version cr lf
header lines:  header field name: value cr lf
               ...
               header field name: value cr lf
blank line:    cr lf
body:          entity body
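
The general format above (request line, header lines, blank line, entity body) can be sketched in code. This is a minimal illustration, not a full HTTP implementation: the helper names `build_request` and `parse_request` are hypothetical, and it assumes well-formed ASCII input with single-space separators in the request line.

```python
CRLF = "\r\n"


def build_request(method, url, headers, body=""):
    """Assemble request line, header lines, the blank line, and an optional entity body."""
    lines = [f"{method} {url} HTTP/1.1"]
    lines += [f"{name}: {value}" for name, value in headers.items()]
    return CRLF.join(lines) + CRLF + CRLF + body


def parse_request(raw):
    """Split a request back into (method, url, version, headers, body)."""
    head, _, body = raw.partition(CRLF + CRLF)  # blank line ends the header lines
    request_line, *header_lines = head.split(CRLF)
    method, url, version = request_line.split(" ")
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip()] = value.strip()
    return method, url, version, headers, body
```

Calling `build_request("GET", "/index.html", {"Host": "www-net.cs.umass.edu"})` produces the first two lines of the sample request on the previous slide, followed by the terminating blank line.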

Application Layer 2-28


Uploading form input
POST method:
▪ web page often includes
form input
▪ input is uploaded to server
in entity body

URL method:
▪ uses GET method
▪ input is uploaded in URL
field of request line:
www.somesite.com/animalsearch?monkeys&banana

Application Layer 2-29


Method types
HTTP/1.0:
▪ GET
▪ POST
▪ HEAD
• asks server to leave requested object out of response

HTTP/1.1:
▪ GET, POST, HEAD
▪ PUT
• uploads file in entity body to path specified in URL field
▪ DELETE
• deletes file specified in the URL field

Application Layer 2-30


HTTP response message
status line (protocol, status code, status phrase):
HTTP/1.1 200 OK\r\n
header lines:
Date: Sun, 26 Sep 2010 20:09:20 GMT\r\n
Server: Apache/2.0.52 (CentOS)\r\n
Last-Modified: Tue, 30 Oct 2007 17:00:02 GMT\r\n
ETag: "17dc6-a5c-bf716880"\r\n
Accept-Ranges: bytes\r\n
Content-Length: 2652\r\n
Keep-Alive: timeout=10, max=100\r\n
Connection: Keep-Alive\r\n
Content-Type: text/html; charset=ISO-8859-1\r\n
\r\n
data, e.g., requested HTML file:
data data data data data ...

* Check out the online interactive exercises for more examples: http://gaia.cs.umass.edu/kurose_ross/interactive/
Application Layer 2-31
HTTP response status codes
▪ status code appears in 1st line in server-to-client response message.
▪ some sample codes:
200 OK
• request succeeded, requested object later in this msg
301 Moved Permanently
• requested object moved, new location specified later in this msg
(Location:)
400 Bad Request
• request msg not understood by server
404 Not Found
• requested document not found on this server
505 HTTP Version Not Supported
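
Since the status code sits in the first line of the response, a client can read it with a simple split. The sketch below is illustrative only; `parse_status_line` and `status_class` are hypothetical helper names, and the grouping by first digit follows the standard HTTP status classes.

```python
def parse_status_line(line):
    """Split 'HTTP/1.1 200 OK' into (version, code, phrase); the phrase may contain spaces."""
    version, code, phrase = line.rstrip("\r\n").split(" ", 2)
    return version, int(code), phrase


def status_class(code):
    """Map a status code to its class using the first digit."""
    classes = {2: "success", 3: "redirection", 4: "client error", 5: "server error"}
    return classes.get(code // 100, "other")
```

For instance, parsing `"HTTP/1.1 404 Not Found\r\n"` yields the code 404, which `status_class` reports as a client error.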
Application Layer 2-32
Trying out HTTP (client side) for yourself
1. Telnet to your favorite Web server:
telnet gaia.cs.umass.edu 80
opens TCP connection to port 80 (default HTTP server port) at gaia.cs.umass.edu. anything typed in will be sent to port 80 at gaia.cs.umass.edu

2. type in a GET HTTP request:
GET /kurose_ross/interactive/index.php HTTP/1.1
Host: gaia.cs.umass.edu
by typing this in (hit carriage return twice), you send this minimal (but complete) GET request to HTTP server

3. look at response message sent by HTTP server!
(or use Wireshark to look at captured HTTP request/response)
Application Layer 2-33
User-server state: cookies
many Web sites use cookies
four components:
1) cookie header line of HTTP response message
2) cookie header line in next HTTP request message
3) cookie file kept on user’s host, managed by user’s browser
4) back-end database at Web site

example:
▪ Susan always accesses Internet from PC
▪ visits specific e-commerce site for first time
▪ when initial HTTP request arrives at site, site creates:
• unique ID
• entry in backend database for ID
Application Layer 2-34
Cookies: keeping “state” (cont.)
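[Figure: cookie exchange between client and Amazon server]
▪ client cookie file initially holds: ebay 8734
▪ client sends usual http request msg; Amazon server creates ID 1678 for user and creates entry in backend database
▪ server sends usual http response with set-cookie: 1678; client cookie file now holds: ebay 8734, amazon 1678
▪ client sends usual http request msg with cookie: 1678; server accesses backend database, takes cookie-specific action, and sends usual http response msg
▪ one week later: client sends usual http request msg with cookie: 1678; server again accesses backend database, takes cookie-specific action, and sends usual http response msg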
Application Layer 2-35
Cookies (continued)
what cookies can be used for:
▪ authorization
▪ shopping carts
▪ recommendations
▪ user session state (Web e-mail)

aside
cookies and privacy:
▪ cookies permit sites to learn a lot about you
▪ you may supply name and e-mail to sites

how to keep “state”:
▪ protocol endpoints: maintain state at sender/receiver over multiple transactions
▪ cookies: http messages carry state

Application Layer 2-36


Web caches (proxy server)
goal: satisfy client request without involving origin server
▪ user sets browser: Web accesses via cache
▪ browser sends all HTTP requests to cache
• object in cache: cache returns object
• else cache requests object from origin server, then returns object to client

[Figure: clients send requests to the proxy server, which contacts the origin servers on a cache miss.]

Application Layer 2-37


More about Web caching
▪ cache acts as both client and server
• server for original requesting client
• client to origin server
▪ typically cache is installed by ISP (university, company, residential ISP)

why Web caching?
▪ reduce response time for client request
▪ reduce traffic on an institution’s access link
▪ Internet dense with caches: enables “poor” content providers to effectively deliver content (so too does P2P file sharing)

Application Layer 2-38


Caching example:
assumptions:
▪ avg object size: 100K bits
▪ avg request rate from browsers to origin servers: 15/sec
▪ avg data rate to browsers: 1.50 Mbps
▪ RTT from institutional router to any origin server: 2 sec
▪ access link rate: 1.54 Mbps

consequences:
▪ LAN utilization: 1.50 Mbps / 1 Gbps = 0.15%
▪ access link utilization: 1.50 / 1.54 ≈ 97% (problem!)
▪ total delay = Internet delay + access delay + LAN delay
= 2 sec + minutes + usecs

[Figure: institutional network (1 Gbps LAN) connected through a 1.54 Mbps access link to the public Internet and origin servers.]

Application Layer 2-39


Caching example: fatter access link
assumptions:
▪ avg object size: 100K bits
▪ avg request rate from browsers to origin servers: 15/sec
▪ avg data rate to browsers: 1.50 Mbps
▪ RTT from institutional router to any origin server: 2 sec
▪ access link rate: 154 Mbps (upgraded from 1.54 Mbps)

consequences:
▪ LAN utilization: 1.50 Mbps / 1 Gbps = 0.15%
▪ access link utilization: 1.50 / 154 ≈ 0.97%
▪ total delay = Internet delay + access delay + LAN delay
= 2 sec + msecs + usecs

[Figure: institutional network (1 Gbps LAN) connected through a 154 Mbps access link to the public Internet and origin servers.]

Cost: increased access link speed (not cheap!)


Application Layer 2-40
Caching example: install local cache
assumptions:
▪ avg object size: 100K bits
▪ avg request rate from browsers to origin servers: 15/sec
▪ avg data rate to browsers: 1.50 Mbps
▪ RTT from institutional router to any origin server: 2 sec
▪ access link rate: 1.54 Mbps

consequences:
▪ LAN utilization: 1.50 Mbps / 1 Gbps = 0.15%
▪ access link utilization = ?
▪ total delay = ?
How to compute link utilization, delay?

[Figure: institutional network (1 Gbps LAN) with a local web cache, connected through a 1.54 Mbps access link to the public Internet and origin servers.]

Cost: web cache (cheap!)
Application Layer 2-41
Caching example: install local cache
Calculating access link utilization, delay with cache:
▪ suppose cache hit rate is 0.4
• 40% requests satisfied at cache, 60% requests satisfied at origin
▪ access link utilization:
• 60% of requests use access link
• data rate to browsers over access link = 0.6 * 1.50 Mbps = 0.9 Mbps
• utilization = 0.9/1.54 = 0.58
▪ total delay
• = 0.6 * (delay from origin servers) + 0.4 * (delay when satisfied at cache)
• = 0.6 (2.01) + 0.4 (~msecs) = ~ 1.2 secs
• less than with 154 Mbps link (and cheaper too!)
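
The cache calculation above can be checked with a short script. This is a sketch under stated assumptions: the function names are hypothetical, and the 0.01 s cache delay used in the usage note is an assumed stand-in for the "~msecs" on the slide.

```python
def access_link_utilization(request_rate, object_bits, link_bps, hit_rate=0.0):
    """Fraction of the access link used by requests that miss the cache."""
    miss_traffic_bps = (1.0 - hit_rate) * request_rate * object_bits
    return miss_traffic_bps / link_bps


def average_delay(hit_rate, origin_delay_s, cache_delay_s):
    """Weighted average of origin-server delay (misses) and cache delay (hits)."""
    return (1.0 - hit_rate) * origin_delay_s + hit_rate * cache_delay_s
```

With the slide's numbers, `access_link_utilization(15, 100_000, 1.54e6, hit_rate=0.4)` comes out near 0.58, and `average_delay(0.4, 2.01, 0.01)` comes out near 1.2 seconds.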
Application Layer 2-42
Conditional GET
▪ Goal: don’t send object if cache has up-to-date cached version
• no object transmission delay
• lower link utilization
▪ cache: specify date of cached copy in HTTP request
If-modified-since: <date>
▪ server: response contains no object if cached copy is up-to-date:
HTTP/1.0 304 Not Modified

[Figure: two client/server exchanges]
▪ object not modified before <date>: HTTP request msg with If-modified-since: <date>; HTTP response HTTP/1.0 304 Not Modified
▪ object modified after <date>: HTTP request msg with If-modified-since: <date>; HTTP response HTTP/1.0 200 OK <data>
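
The server-side decision in a conditional GET can be sketched as a date comparison. This is a toy model, not a real server: `conditional_get` is a hypothetical function, it assumes both dates are well-formed HTTP dates in GMT, and it returns only the status line plus a flag saying whether the object would be sent.

```python
from email.utils import parsedate_to_datetime


def conditional_get(last_modified, if_modified_since=None):
    """Return (status_line, object_sent) for a toy origin server.

    last_modified: HTTP date string for the object's last modification.
    if_modified_since: date from the request header, or None for a plain GET.
    """
    modified = parsedate_to_datetime(last_modified)
    if if_modified_since is not None:
        cached_copy_date = parsedate_to_datetime(if_modified_since)
        if modified <= cached_copy_date:
            return "HTTP/1.0 304 Not Modified", False  # cached copy is up to date
    return "HTTP/1.0 200 OK", True  # send the object
```

A cache that holds a copy dated after the object's last modification gets the 304 reply with no object; any other request gets 200 OK with the data.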
Application Layer 2-43
Chapter 2: outline
2.1 principles of network applications
2.2 Web and HTTP
2.3 electronic mail
• SMTP, POP3, IMAP
2.4 DNS
2.5 P2P applications
2.6 video streaming and content distribution networks
2.7 socket programming with UDP and TCP

Application Layer 2-44


Electronic mail
Three major components:
▪ user agents
▪ mail servers
▪ simple mail transfer protocol: SMTP

User Agent
▪ a.k.a. “mail reader”
▪ composing, editing, reading mail messages
▪ e.g., Outlook, Thunderbird, iPhone mail client
▪ outgoing, incoming messages stored on server

[Figure: user agents attached to mail servers; each mail server holds user mailboxes and an outgoing message queue; mail servers exchange mail using SMTP.]
Application Layer 2-45
Electronic mail: mail servers
mail servers:
▪ mailbox contains incoming messages for user
▪ message queue of outgoing (to be sent) mail messages
▪ SMTP protocol between mail servers to send email messages
• client: sending mail server
• “server”: receiving mail server


Application Layer 2-46


Electronic Mail: SMTP [RFC 2821]
▪ uses TCP to reliably transfer email message from
client to server, port 25
▪ direct transfer: sending server to receiving
server
▪ three phases of transfer
• handshaking (greeting)
• transfer of messages
• closure
▪ command/response interaction (like HTTP)
• commands: ASCII text
• response: status code and phrase
▪ messages must be in 7-bit ASCII
Application Layer 2-47
Scenario: Alice sends message to Bob
1) Alice uses UA to compose message “to” [email protected]
2) Alice’s UA sends message to her mail server; message placed in message queue
3) client side of SMTP opens TCP connection with Bob’s mail server
4) SMTP client sends Alice’s message over the TCP connection
5) Bob’s mail server places the message in Bob’s mailbox
6) Bob invokes his user agent to read message

[Figure: Alice’s user agent (1) → Alice’s mail server (2, 3) → Bob’s mail server (4, 5) → Bob’s user agent (6).]
Application Layer 2-48
Sample SMTP interaction
S: 220 hamburger.edu
C: HELO crepes.fr
S: 250 Hello crepes.fr, pleased to meet you
C: MAIL FROM: <[email protected]>
S: 250 [email protected]... Sender ok
C: RCPT TO: <[email protected]>
S: 250 [email protected] ... Recipient ok
C: DATA
S: 354 Enter mail, end with "." on a line by itself
C: Do you like ketchup?
C: How about pickles?
C: .
S: 250 Message accepted for delivery
C: QUIT
S: 221 hamburger.edu closing connection

Application Layer 2-49


Try SMTP interaction for yourself:
▪ telnet servername 25
▪ see 220 reply from server
▪ enter HELO, MAIL FROM, RCPT TO, DATA, QUIT
commands

above lets you send email without using email client (reader)

Application Layer 2-50


SMTP: final words
▪ SMTP uses persistent connections
▪ SMTP requires message (header & body) to be in 7-bit ASCII
▪ SMTP server uses CRLF.CRLF to determine end of message

comparison with HTTP:
▪ HTTP: pull
▪ SMTP: push
▪ both have ASCII command/response interaction, status codes
▪ HTTP: each object encapsulated in its own response message
▪ SMTP: multiple objects sent in multipart message

Application Layer 2-51


Mail message format

SMTP: protocol for exchanging email messages
RFC 822: standard for text message format:
▪ header lines, e.g.,
• To:
• From:
• Subject:
different from SMTP MAIL FROM, RCPT TO: commands!
▪ blank line
▪ Body: the “message”
• ASCII characters only

Application Layer 2-52


Mail access protocols
[Figure: sender’s user agent → (SMTP) → sender’s mail server → (SMTP) → receiver’s mail server → (mail access protocol, e.g., POP, IMAP) → receiver’s user agent.]

▪ SMTP: delivery/storage to receiver’s server
▪ mail access protocol: retrieval from server
• POP: Post Office Protocol [RFC 1939]: authorization, download
• IMAP: Internet Mail Access Protocol [RFC 1730]: more features, including manipulation of stored messages on server
• HTTP: gmail, Hotmail, Yahoo! Mail, etc.

Application Layer 2-53


POP3 protocol
authorization phase
▪ client commands:
• user: declare username
• pass: password
▪ server responses
• +OK
• -ERR

transaction phase, client:
▪ list: list message numbers
▪ retr: retrieve message by number
▪ dele: delete
▪ quit

S: +OK POP3 server ready
C: user bob
S: +OK
C: pass hungry
S: +OK user successfully logged on
C: list
S: 1 498
S: 2 912
S: .
C: retr 1
S: <message 1 contents>
S: .
C: dele 1
C: retr 2
S: <message 2 contents>
S: .
C: dele 2
C: quit
S: +OK POP3 server signing off
Application Layer 2-54
POP3 (more) and IMAP
more about POP3
▪ previous example uses POP3 “download and delete” mode
• Bob cannot re-read e-mail if he changes client
▪ POP3 “download-and-keep”: copies of messages on different clients
▪ POP3 is stateless across sessions

IMAP
▪ keeps all messages in one place: at server
▪ allows user to organize messages in folders
▪ keeps user state across sessions:
• names of folders and mappings between message IDs and folder name

Application Layer 2-55


Chapter 2: outline
2.1 principles of network applications
2.2 Web and HTTP
2.3 electronic mail
• SMTP, POP3, IMAP
2.4 DNS
2.5 P2P applications
2.6 video streaming and content distribution networks
2.7 socket programming with UDP and TCP

Application Layer 2-56


DNS: domain name system
people: many identifiers:
• SSN, name, passport #
Internet hosts, routers:
• IP address (32 bit) - used for addressing datagrams
• “name”, e.g., www.yahoo.com - used by humans
Q: how to map between IP address and name, and vice versa?

Domain Name System:
▪ distributed database implemented in hierarchy of many name servers
▪ application-layer protocol: hosts, name servers communicate to resolve names (address/name translation)
• note: core Internet function, implemented as application-layer protocol
• complexity at network’s “edge”

Application Layer 2-57


DNS: services, structure
DNS services
▪ hostname to IP address translation
▪ host aliasing
• canonical, alias names
▪ mail server aliasing
▪ load distribution
• replicated Web servers: many IP addresses correspond to one name

why not centralize DNS?
▪ single point of failure
▪ traffic volume
▪ distant centralized database
▪ maintenance
A: doesn’t scale!

Application Layer 2-58


DNS: a distributed, hierarchical database
[Figure: DNS hierarchy]
Root DNS Servers
• com DNS servers: yahoo.com DNS servers, amazon.com DNS servers
• org DNS servers: pbs.org DNS servers
• edu DNS servers: poly.edu DNS servers, umass.edu DNS servers

client wants IP for www.amazon.com; 1st approximation:
▪ client queries root server to find com DNS server
▪ client queries .com DNS server to get amazon.com DNS server
▪ client queries amazon.com DNS server to get IP address for www.amazon.com

Application Layer 2-59


DNS: root name servers
▪ contacted by local name server that can not resolve name
▪ root name server:
• contacts authoritative name server if name mapping not known
• gets mapping
• returns mapping to local name server

▪ 13 logical root name “servers” worldwide
• each “server” replicated many times

a. Verisign, Los Angeles CA (5 other sites)
b. USC-ISI Marina del Rey, CA
c. Cogent, Herndon, VA (5 other sites)
d. U Maryland College Park, MD
e. NASA Mt View, CA
f. Internet Software C. Palo Alto, CA (and 48 other sites)
g. US DoD Columbus, OH (5 other sites)
h. ARL Aberdeen, MD
i. Netnod, Stockholm (37 other sites)
j. Verisign, Dulles VA (69 other sites)
k. RIPE London (17 other sites)
l. ICANN Los Angeles, CA (41 other sites)
m. WIDE Tokyo (5 other sites)

Application Layer 2-60


TLD, authoritative servers
top-level domain (TLD) servers:
• responsible for com, org, net, edu, aero, jobs, museum,
and all top-level country domains, e.g.: uk, fr, ca, jp
• Network Solutions maintains servers for .com TLD
• Educause for .edu TLD
authoritative DNS servers:
• organization’s own DNS server(s), providing
authoritative hostname to IP mappings for organization’s
named hosts
• can be maintained by organization or service provider

Application Layer 2-61


Local DNS name server
▪ does not strictly belong to hierarchy
▪ each ISP (residential ISP, company, university) has
one
• also called “default name server”
▪ when host makes DNS query, query is sent to its
local DNS server
• has local cache of recent name-to-address translation
pairs (but may be out of date!)
• acts as proxy, forwards query into hierarchy

Application Layer 2-62


DNS name resolution example
▪ host at cis.poly.edu wants IP address for gaia.cs.umass.edu

iterated query:
▪ contacted server replies with name of server to contact
▪ “I don’t know this name, but ask this server”

[Figure: message sequence]
1. requesting host cis.poly.edu → local DNS server dns.poly.edu
2-3. local DNS server ↔ root DNS server
4-5. local DNS server ↔ TLD DNS server
6-7. local DNS server ↔ authoritative DNS server dns.cs.umass.edu
8. local DNS server → requesting host
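
The iterated query pattern, where each contacted server either answers or names the next server to ask, can be sketched with an in-memory table. Everything below is a toy model: the `SERVERS` table, the server labels "root" and "tld-edu", and both functions are hypothetical; only the hostnames and the address 128.119.245.12 come from these slides. No real DNS traffic is sent.

```python
# Toy name-server data (hypothetical stand-in for real DNS servers).
SERVERS = {
    "root": {"referrals": {"edu": "tld-edu"}},
    "tld-edu": {"referrals": {"umass.edu": "dns.cs.umass.edu"}},
    "dns.cs.umass.edu": {"answers": {"gaia.cs.umass.edu": "128.119.245.12"}},
}


def query(server, name):
    """One query to one server: returns ('answer', ip), ('referral', next_server), or ('error', None)."""
    data = SERVERS[server]
    if name in data.get("answers", {}):
        return "answer", data["answers"][name]
    for suffix, next_server in data.get("referrals", {}).items():
        if name == suffix or name.endswith("." + suffix):
            return "referral", next_server  # "I don't know this name, but ask this server"
    return "error", None


def resolve_iteratively(name, start="root"):
    """What the local DNS server does: follow referrals itself until a server answers."""
    server, contacted = start, []
    while True:
        contacted.append(server)
        kind, value = query(server, name)
        if kind == "answer":
            return value, contacted
        if kind == "error":
            raise LookupError(name)
        server = value
```

Resolving gaia.cs.umass.edu this way contacts the root, TLD, and authoritative servers in turn, mirroring steps 2 through 7 of the figure; the burden of following referrals stays with the local server rather than with the servers higher in the hierarchy.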

Application Layer 2-63


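DNS name resolution example
recursive query:
▪ puts burden of name resolution on contacted name server
▪ heavy load at upper levels of hierarchy?

[Figure: message sequence]
1. requesting host cis.poly.edu → local DNS server dns.poly.edu
2. local DNS server → root DNS server
3. root DNS server → TLD DNS server
4. TLD DNS server → authoritative DNS server dns.cs.umass.edu
5. authoritative DNS server → TLD DNS server
6. TLD DNS server → root DNS server
7. root DNS server → local DNS server
8. local DNS server → requesting host (answer for gaia.cs.umass.edu)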

Application Layer 2-64


DNS: caching, updating records
▪ once (any) name server learns mapping, it caches
mapping
• cache entries timeout (disappear) after some time (TTL)
• TLD servers typically cached in local name servers
• thus root name servers not often visited
▪ cached entries may be out-of-date (best effort
name-to-address translation!)
• if a named host changes IP address, the change may not be known Internet-wide until all TTLs expire
▪ update/notify mechanisms proposed IETF standard
• RFC 2136
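
The caching behaviour described above (entries disappear after their TTL, and may be stale until then) can be sketched as a small class. This is an illustration only: `DnsCache` is a hypothetical name, and the injectable `clock` parameter is an assumption added so that expiry can be demonstrated without waiting.

```python
import time


class DnsCache:
    """Name-to-address cache whose entries expire after their TTL (in seconds)."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock      # injectable so expiry can be simulated
        self._entries = {}       # name -> (address, expiry time)

    def put(self, name, address, ttl):
        self._entries[name] = (address, self._clock() + ttl)

    def get(self, name):
        entry = self._entries.get(name)
        if entry is None:
            return None
        address, expires_at = entry
        if self._clock() >= expires_at:
            del self._entries[name]  # entry timed out: caller must query the hierarchy again
            return None
        return address               # may be out of date if the host changed IP within the TTL
```

Until the TTL runs out, `get` keeps returning the cached address even if the host has since moved, which is why the slide calls the translation best effort.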

Application Layer 2-65


DNS records
DNS: distributed database storing resource records (RR)
RR format: (name, value, type, ttl)

type=A
▪ name is hostname
▪ value is IP address

type=NS
• name is domain (e.g., foo.com)
• value is hostname of authoritative name server for this domain

type=CNAME
▪ name is alias name for some “canonical” (the real) name
▪ www.ibm.com is really servereast.backup2.ibm.com
▪ value is canonical name

type=MX
▪ value is name of mailserver associated with name

Application Layer 2-66


DNS protocol, messages
▪ query and reply messages, both with same message format

message header
▪ identification: 16 bit # for query, reply to query uses same #
▪ flags:
• query or reply
• recursion desired
• recursion available
• reply is authoritative

message layout (each header field is 2 bytes):
identification | flags
# questions | # answer RRs
# authority RRs | # additional RRs
questions (variable # of questions)
answers (variable # of RRs)
authority (variable # of RRs)
additional info (variable # of RRs)

Application Layer 2-67


DNS protocol, messages

message layout (each header field is 2 bytes):
identification | flags
# questions | # answer RRs
# authority RRs | # additional RRs
questions (variable # of questions): name, type fields for a query
answers (variable # of RRs): RRs in response to query
authority (variable # of RRs): records for authoritative servers
additional info (variable # of RRs): additional “helpful” info that may be used
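
The fixed part of the message, six 2-byte fields, maps directly onto Python's `struct` module. The sketch below packs and unpacks only that 12-byte header; the function names are hypothetical, the flag value 0x0100 (recursion desired) follows the standard DNS header bit layout, and the variable-length question and RR sections are left out.

```python
import struct

# identification, flags, # questions, # answer RRs, # authority RRs, # additional RRs:
# six unsigned 16-bit integers in network (big-endian) byte order.
HEADER = struct.Struct("!HHHHHH")

RECURSION_DESIRED = 0x0100  # flag bit asking the contacted server to resolve recursively


def pack_header(identification, flags, questions, answers=0, authority=0, additional=0):
    """Build the 12-byte DNS message header."""
    return HEADER.pack(identification, flags, questions, answers, authority, additional)


def unpack_header(data):
    """Read the six header fields back from the start of a message."""
    return HEADER.unpack(data[:HEADER.size])
```

A query with one question would use `pack_header(0x1234, RECURSION_DESIRED, 1)`; the reply carries the same identification value so the client can match it to the query.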
Application Layer 2-68
Inserting records into DNS
▪ example: new startup “Network Utopia”
▪ register name networkutopia.com at DNS registrar
(e.g., Network Solutions)
• provide names, IP addresses of authoritative name server
(primary and secondary)
• registrar inserts two RRs into .com TLD server:
(networkutopia.com, dns1.networkutopia.com, NS)
(dns1.networkutopia.com, 212.212.212.1, A)
▪ create authoritative server type A record for
www.networkutopia.com; type MX record for
networkutopia.com

Application Layer 2-69


Attacking DNS
DDoS attacks
▪ bombard root servers with traffic
• not successful to date
• traffic filtering
• local DNS servers cache IPs of TLD servers, allowing root server bypass
▪ bombard TLD servers
• potentially more dangerous

redirect attacks
▪ man-in-middle
• intercept queries
▪ DNS poisoning
• send bogus replies to DNS server, which caches

exploit DNS for DDoS
▪ send queries with spoofed source address: target IP
▪ requires amplification
Application Layer 2-70
Chapter 2: outline
2.1 principles of network applications
2.2 Web and HTTP
2.3 electronic mail
• SMTP, POP3, IMAP
2.4 DNS
2.5 P2P applications
2.6 video streaming and content distribution networks
2.7 socket programming with UDP and TCP

Application Layer 2-71


Pure P2P architecture
▪ no always-on server
▪ arbitrary end systems
directly communicate
▪ peers are intermittently
connected and change
IP addresses
examples:
• file distribution
(BitTorrent)
• Streaming (KanKan)
• VoIP (Skype)

Application Layer 2-72


File distribution: client-server vs P2P
Question: how much time to distribute file (size F) from
one server to N peers?
• peer upload/download capacity is limited resource

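us: server upload capacity
ui: peer i upload capacity
di: peer i download capacity

[figure: server holding file of size F, with upload capacity us, connected through a network (with abundant bandwidth) to N peers with upload capacities u1 … uN and download capacities d1 … dN]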

Application Layer 2-73


File distribution time: client-server
▪ server transmission: must sequentially send (upload) N file copies:
• time to send one copy: F/us
• time to send N copies: NF/us
▪ client: each client must download file copy
• dmin = min client download rate
• download time of the slowest client: at least F/dmin

time to distribute F to N clients using client-server approach:
Dc-s ≥ max{NF/us, F/dmin}

the NF/us term increases linearly in N
Application Layer 2-74
File distribution time: P2P
▪ server transmission: must upload at least one copy
• time to send one copy: F/us
▪ client: each client must download file copy
• download time of the slowest client: at least F/dmin
▪ clients: as aggregate must download NF bits
• max upload rate (limiting max download rate) is us + Σui

time to distribute F to N clients using P2P approach:
DP2P ≥ max{F/us, F/dmin, NF/(us + Σui)}

NF increases linearly in N …
… but so does us + Σui, as each peer brings service capacity
Application Layer 2-75
Client-server vs. P2P: example
client upload rate = u, F/u = 1 hour, us = 10u, dmin ≥ us

[plot: Minimum Distribution Time (axis from 0 to 3.5) versus N (axis from 0 to 35), with one curve for Client-Server and one for P2P]
Application Layer 2-76
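The two lower bounds can be evaluated directly for the example parameters. The sketch below is only an illustration: the function names are made up for this example, and it takes dmin = us, which is one choice consistent with the stated condition dmin ≥ us.

```python
def client_server_time(file_size, server_upload, download_rates):
    """Lower bound on client-server distribution time: max{NF/us, F/dmin}."""
    n = len(download_rates)
    return max(n * file_size / server_upload,
               file_size / min(download_rates))

def p2p_time(file_size, server_upload, upload_rates, download_rates):
    """Lower bound on P2P distribution time: max{F/us, F/dmin, NF/(us + sum ui)}."""
    n = len(download_rates)
    return max(file_size / server_upload,
               file_size / min(download_rates),
               n * file_size / (server_upload + sum(upload_rates)))

# Example parameters: F/u = 1 hour, us = 10u, dmin = us (assumed).
u = 1.0           # peer upload rate (file sizes per hour)
F = 1.0           # file size, chosen so that F/u = 1 hour
us = 10 * u
for n in (5, 10, 20, 30):
    ups = [u] * n
    downs = [us] * n
    print(n, client_server_time(F, us, downs), p2p_time(F, us, ups, downs))
```

With these numbers the client-server bound is N/10 hours, while the P2P bound is N/(10 + N) hours and therefore never reaches one hour.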
P2P file distribution: BitTorrent
▪ file divided into 256 KB chunks
▪ peers in torrent send/receive file chunks
tracker: tracks peers participating in torrent
torrent: group of peers exchanging chunks of a file

Alice arrives …
… obtains list of peers from tracker
… and begins exchanging file chunks with peers in torrent

Application Layer 2-77


P2P file distribution: BitTorrent
▪ peer joining torrent:
• has no chunks, but will
accumulate them over time
from other peers
• registers with tracker to get
list of peers, connects to
subset of peers
(“neighbors”)
▪ while downloading, peer uploads chunks to other peers
▪ peer may change peers with whom it exchanges chunks
▪ churn: peers may come and go
▪ once peer has entire file, it may (selfishly) leave or
(altruistically) remain in torrent

Application Layer 2-78


BitTorrent: requesting, sending file chunks

requesting chunks:
▪ at any given time, different peers have different subsets of file chunks
▪ periodically, Alice asks each peer for list of chunks that they have
▪ Alice requests missing chunks from peers, rarest first

sending chunks: tit-for-tat
▪ Alice sends chunks to those four peers currently sending her chunks at highest rate
• other peers are choked by Alice (do not receive chunks from her)
• re-evaluate top 4 every 10 secs
▪ every 30 secs: randomly select another peer, start sending chunks
• “optimistically unchoke” this peer
• newly chosen peer may join top 4

Application Layer 2-79
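The two decisions Alice makes, which chunk to request and which peers to unchoke, can be sketched as follows. This is a simplified model rather than the BitTorrent protocol itself: the function names, the dictionary inputs, and the tie-breaking by chunk id are assumptions made for illustration.

```python
import random
from collections import Counter

def rarest_first(my_chunks, neighbor_chunks):
    """Order the chunks Alice is missing from rarest to most common.

    neighbor_chunks maps a peer name to the set of chunk ids that peer holds.
    Ties are broken by chunk id (an arbitrary choice for this sketch).
    """
    counts = Counter(chunk
                     for chunks in neighbor_chunks.values()
                     for chunk in chunks
                     if chunk not in my_chunks)
    return sorted(counts, key=lambda chunk: (counts[chunk], chunk))

def choose_unchoked(rates_to_alice, top_n=4, rng=random):
    """Pick the top_n peers by upload rate to Alice, plus one random other peer.

    Returns (top_peers, optimistically_unchoked_peer or None).
    """
    ranked = sorted(rates_to_alice, key=rates_to_alice.get, reverse=True)
    top, others = ranked[:top_n], ranked[top_n:]
    optimistic = rng.choice(others) if others else None
    return top, optimistic

neighbors = {"bob": {0, 1, 2}, "carol": {1, 2}, "dave": {2, 3}}
print(rarest_first({2}, neighbors))
print(choose_unchoked({"a": 5, "b": 9, "c": 1, "d": 7, "e": 3, "f": 2}))
```

In a real client the first function would be re-run as peers report new chunks, and the second on the 10-second and 30-second timers described above.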


BitTorrent: tit-for-tat
(1) Alice “optimistically unchokes” Bob
(2) Alice becomes one of Bob’s top-four providers; Bob reciprocates
(3) Bob becomes one of Alice’s top-four providers

higher upload rate: find better trading partners, get file faster!

Application Layer 2-80


Chapter 2: outline
2.1 principles of network applications
2.2 Web and HTTP
2.3 electronic mail
• SMTP, POP3, IMAP
2.4 DNS
2.5 P2P applications
2.6 video streaming and content distribution networks (CDNs)
2.7 socket programming with UDP and TCP

Application Layer 2-81


Video Streaming and CDNs: context
▪ video traffic: major consumer of Internet bandwidth
• Netflix, YouTube: 37%, 16% of downstream
residential ISP traffic
• ~1B YouTube users, ~75M Netflix users
▪ challenge: scale - how to reach ~1B
users?
• single mega-video server won’t work (why?)
▪ challenge: heterogeneity
▪ different users have different capabilities (e.g.,
wired versus mobile; bandwidth rich versus
bandwidth poor)
▪ solution: distributed, application-level
infrastructure

Application Layer 2-82


Multimedia: video
▪ video: sequence of images displayed at constant rate
• e.g., 24 images/sec
▪ digital image: array of pixels
• each pixel represented by bits
▪ coding: use redundancy within and between images to decrease # bits used to encode image
• spatial (within image)
• temporal (from one image to next)

spatial coding example: instead of sending N values of same color (all purple), send only two values: color value (purple) and number of repeated values (N)

temporal coding example: instead of sending complete frame at i+1, send only differences from frame i

[figure: frame i and frame i+1]

Application Layer 2-83
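Both coding ideas can be illustrated on a toy one-dimensional "frame". The sketch below is only an analogy to what real codecs do: the function names and the list-of-color-names representation are assumptions, and no actual video format works on data this simple.

```python
def run_length_encode(pixels):
    """Spatial coding: collapse runs of equal values into (value, count) pairs."""
    runs = []
    for value in pixels:
        if runs and runs[-1][0] == value:
            runs[-1][1] += 1
        else:
            runs.append([value, 1])
    return [tuple(run) for run in runs]

def frame_delta(frame_i, frame_next):
    """Temporal coding: list only the (position, new value) pairs that changed."""
    return [(pos, new)
            for pos, (old, new) in enumerate(zip(frame_i, frame_next))
            if old != new]

def apply_delta(frame_i, delta):
    """Rebuild frame i+1 from frame i and the list of differences."""
    frame = list(frame_i)
    for pos, value in delta:
        frame[pos] = value
    return frame

frame_i = ["purple"] * 6 + ["white", "white"]
frame_next = ["purple"] * 5 + ["white", "white", "white"]
print(run_length_encode(frame_i))
print(frame_delta(frame_i, frame_next))
```

Here eight pixel values shrink to two runs, and the next frame is described by a single changed position.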


Multimedia: video
▪ CBR (constant bit rate): video encoding rate fixed
▪ VBR (variable bit rate): video encoding rate changes as amount of spatial, temporal coding changes
▪ examples:
• MPEG 1 (CD-ROM) 1.5 Mbps
• MPEG2 (DVD) 3-6 Mbps
• MPEG4 (often used in Internet, < 1 Mbps)

Application Layer 2-84


Streaming stored video:
simple scenario:

[figure: video server (stored video) connected through the Internet to client]

Application Layer 2-85


Streaming multimedia: DASH
▪ DASH: Dynamic, Adaptive Streaming over HTTP
▪ server:
• divides video file into multiple chunks
• each chunk stored, encoded at different rates
• manifest file: provides URLs for different chunks
▪ client:
• periodically measures server-to-client bandwidth
• consulting manifest, requests one chunk at a time
• chooses maximum coding rate sustainable given
current bandwidth
• can choose different coding rates at different points
in time (depending on available bandwidth at time)
Application Layer 2-86
Streaming multimedia: DASH
▪ DASH: Dynamic, Adaptive Streaming over HTTP
▪ “intelligence” at client: client determines
• when to request chunk (so that buffer starvation, or
overflow does not occur)
• what encoding rate to request (higher quality when
more bandwidth available)
• where to request chunk (can request from URL server
that is “close” to client or has high available
bandwidth)

Application Layer 2-87
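The client-side rate choice can be sketched as below. This is a hedged illustration of the idea only: the manifest is modelled as a plain dictionary from encoding rate to chunk URLs, the `video.example` URLs are placeholders, and real DASH manifests (MPD files) and buffer-aware clients are considerably richer.

```python
def choose_encoding_rate(available_rates_kbps, measured_bandwidth_kbps):
    """Pick the highest encoding rate the measured bandwidth can sustain.

    Falls back to the lowest available rate when none is sustainable.
    """
    sustainable = [rate for rate in available_rates_kbps
                   if rate <= measured_bandwidth_kbps]
    return max(sustainable) if sustainable else min(available_rates_kbps)

def plan_chunk_requests(manifest, bandwidth_samples_kbps):
    """Choose one URL per chunk, given one bandwidth measurement per chunk.

    manifest maps encoding rate (kbps) to the list of chunk URLs at that rate
    (a hypothetical layout used only for this sketch).
    """
    rates = sorted(manifest)
    return [manifest[choose_encoding_rate(rates, bandwidth)][index]
            for index, bandwidth in enumerate(bandwidth_samples_kbps)]

manifest = {rate: [f"http://video.example/{rate}k/chunk{i}" for i in range(3)]
            for rate in (300, 1000, 3000)}
for url in plan_chunk_requests(manifest, [3500, 800, 1200]):
    print(url)
```

Each chunk is requested separately, so the chosen rate can change from one chunk to the next as the measured bandwidth changes.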


Content distribution networks
▪ challenge: how to stream content (selected from
millions of videos) to hundreds of thousands of
simultaneous users?

▪ option 1: single, large “mega-server”


• single point of failure
• point of network congestion
• long path to distant clients
• multiple copies of video sent over outgoing link

….quite simply: this solution doesn’t scale

Application Layer 2-88


Content distribution networks
▪ challenge: how to stream content (selected from
millions of videos) to hundreds of thousands of
simultaneous users?

▪ option 2: store/serve multiple copies of videos at multiple geographically distributed sites (CDN)
• enter deep: push CDN servers deep into many access
networks
• close to users
• used by Akamai, 1700 locations
• bring home: smaller number (10’s) of larger clusters in
POPs near (but not within) access networks
• used by Limelight

Application Layer 2-89


Content Distribution Networks (CDNs)
▪ CDN: stores copies of content at CDN nodes
• e.g. Netflix stores copies of Mad Men
▪ subscriber requests content from CDN
• directed to nearby copy, retrieves content
• may choose different copy if network path congested

manifest file
where’s Madmen?

Application Layer 2-90


Content Distribution Networks (CDNs)

OTT (“over the top”): Internet host-host communication as a service


OTT challenges: coping with a congested Internet
▪ from which CDN node to retrieve content?
▪ viewer behavior in presence of congestion?
▪ what content to place in which CDN node?
more .. in chapter 7
CDN content access: a closer look
Bob (client) requests video http://netcinema.com/6Y7B23V
▪ video stored in CDN at http://KingCDN.com/NetC6y&B23V

1. Bob gets URL for video http://netcinema.com/6Y7B23V from netcinema.com web page
2. resolve http://netcinema.com/6Y7B23V via Bob’s local DNS
3. netcinema’s authoritative DNS returns URL http://KingCDN.com/NetC6y&B23V
4&5. resolve http://KingCDN.com/NetC6y&B23V via KingCDN’s authoritative DNS, which returns IP address of KingCDN server with video
6. request video from KingCDN server, streamed via HTTP

[figure: Bob, Bob’s local DNS server, netcinema’s authoritative DNS, KingCDN authoritative DNS, and KingCDN.com server, with arrows numbered 1 to 6]
Application Layer 2-92
Case study: Netflix
▪ Netflix registration, accounting servers
▪ Amazon cloud: upload copies of multiple versions of video to CDN servers
▪ CDN servers

1. Bob manages Netflix account
2. Bob browses Netflix video
3. manifest file returned for requested video
4. DASH streaming

Application Layer 2-93


Chapter 2: outline
2.1 principles of network applications
2.2 Web and HTTP
2.3 electronic mail
• SMTP, POP3, IMAP
2.4 DNS
2.5 P2P applications
2.6 video streaming and content distribution networks
2.7 socket programming with UDP and TCP

Application Layer 2-94


Socket programming
goal: learn how to build client/server applications that
communicate using sockets
socket: door between application process and end-to-end transport protocol

[figure: two end systems connected across the Internet; in each, the application process (controlled by app developer) sits above the socket, and the transport, network, link, and physical layers below it are controlled by OS]

Application Layer 2-95


Socket programming
Two socket types for two transport services:
• UDP: unreliable datagram
• TCP: reliable, byte stream-oriented

Application Example:
1. client reads a line of characters (data) from its
keyboard and sends data to server
2. server receives the data and converts characters
to uppercase
3. server sends modified data to client
4. client receives modified data and displays line on
its screen
Application Layer 2-96
Socket programming with UDP
UDP: no “connection” between client & server
▪ no handshaking before sending data
▪ sender explicitly attaches IP destination address and
port # to each packet
▪ receiver extracts sender IP address and port# from
received packet
UDP: transmitted data may be lost or received
out-of-order
Application viewpoint:
▪ UDP provides unreliable transfer of groups of bytes
(“datagrams”) between client and server

Application Layer 2-97


Client/server socket interaction: UDP

server (running on serverIP):
▪ create socket, port = x: serverSocket = socket(AF_INET,SOCK_DGRAM)
▪ read datagram from serverSocket
▪ write reply to serverSocket, specifying client address, port number

client:
▪ create socket: clientSocket = socket(AF_INET,SOCK_DGRAM)
▪ create datagram with server IP and port = x; send datagram via clientSocket
▪ read datagram from clientSocket
▪ close clientSocket

Application 2-98
Example app: UDP client
Python UDPClient
# include Python’s socket library
from socket import *
serverName = 'hostname'
serverPort = 12000
# create UDP socket for server
clientSocket = socket(AF_INET, SOCK_DGRAM)
# get user keyboard input
message = input('Input lowercase sentence:')
# attach server name, port to message; send into socket
clientSocket.sendto(message.encode(), (serverName, serverPort))
# read reply characters from socket into string
modifiedMessage, serverAddress = clientSocket.recvfrom(2048)
# print out received string and close socket
print(modifiedMessage.decode())
clientSocket.close()
Application Layer 2-99
Example app: UDP server
Python UDPServer
from socket import *
serverPort = 12000
# create UDP socket
serverSocket = socket(AF_INET, SOCK_DGRAM)
# bind socket to local port number 12000
serverSocket.bind(('', serverPort))
print('The server is ready to receive')
# loop forever
while True:
    # read from UDP socket into message, getting client’s address (client IP and port)
    message, clientAddress = serverSocket.recvfrom(2048)
    modifiedMessage = message.decode().upper()
    # send upper case string back to this client
    serverSocket.sendto(modifiedMessage.encode(), clientAddress)

Application Layer 2-100
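To try the UDP exchange without two terminals, client and server can be run in one process over the loopback interface. The sketch below is an assumption-laden test harness, not part of the slides' example: the function names are invented, the server handles exactly one datagram, and port 0 is used so the OS picks a free port instead of 12000.

```python
import socket
import threading

def serve_one_datagram(server_socket):
    """Server side: read one datagram and send it back in upper case."""
    message, client_address = server_socket.recvfrom(2048)
    server_socket.sendto(message.decode().upper().encode(), client_address)

def uppercase_over_udp(text, timeout=2.0):
    """Send `text` to a one-shot UDP server on 127.0.0.1 and return its reply."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as server_socket, \
         socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as client_socket:
        server_socket.bind(("127.0.0.1", 0))   # port 0: OS picks a free port
        server_socket.settimeout(timeout)
        client_socket.settimeout(timeout)
        server_address = server_socket.getsockname()
        worker = threading.Thread(target=serve_one_datagram,
                                  args=(server_socket,))
        worker.start()
        # No connection setup: the destination address travels with the datagram.
        client_socket.sendto(text.encode(), server_address)
        reply, _ = client_socket.recvfrom(2048)
        worker.join()
        return reply.decode()

print(uppercase_over_udp("hello udp"))
```

The timeouts matter because UDP gives no delivery guarantee: without them a lost datagram would leave both sides blocked forever.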


Socket programming with TCP
client must contact server
▪ server process must first be running
▪ server must have created socket (door) that welcomes client’s contact

client contacts server by:
▪ creating TCP socket, specifying IP address, port number of server process
▪ when client creates socket: client TCP establishes connection to server TCP

▪ when contacted by client, server TCP creates new socket for server process to communicate with that particular client
• allows server to talk with multiple clients
• source port numbers used to distinguish clients (more in Chap 3)

application viewpoint:
TCP provides reliable, in-order byte-stream transfer (“pipe”) between client and server

Application Layer 2-101


Client/server socket interaction: TCP
server (running on hostid):
▪ create socket, port = x, for incoming request: serverSocket = socket()
▪ wait for incoming connection request: connectionSocket = serverSocket.accept()
▪ read request from connectionSocket
▪ write reply to connectionSocket
▪ close connectionSocket

client:
▪ create socket, connect to hostid, port = x: clientSocket = socket()
▪ send request using clientSocket
▪ read reply from clientSocket
▪ close clientSocket

TCP connection setup takes place between the client’s connect and the server’s accept()

Application Layer 2-102


Example app: TCP client
Python TCPClient
from socket import *
serverName = 'servername'
serverPort = 12000
# create TCP socket for server, remote port 12000
clientSocket = socket(AF_INET, SOCK_STREAM)
clientSocket.connect((serverName, serverPort))
sentence = input('Input lowercase sentence:')
# no need to attach server name, port
clientSocket.send(sentence.encode())
modifiedSentence = clientSocket.recv(1024)
print('From Server:', modifiedSentence.decode())
clientSocket.close()

Application Layer 2-103


Example app: TCP server
Python TCPServer
from socket import *
serverPort = 12000
# create TCP welcoming socket
serverSocket = socket(AF_INET, SOCK_STREAM)
serverSocket.bind(('', serverPort))
# server begins listening for incoming TCP requests
serverSocket.listen(1)
print('The server is ready to receive')
# loop forever
while True:
    # server waits on accept() for incoming requests, new socket created on return
    connectionSocket, addr = serverSocket.accept()
    # read bytes from socket (but not address as in UDP)
    sentence = connectionSocket.recv(1024).decode()
    capitalizedSentence = sentence.upper()
    connectionSocket.send(capitalizedSentence.encode())
    # close connection to this client (but not welcoming socket)
    connectionSocket.close()
Application Layer 2-104
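The same one-process trick works for TCP and makes the two server sockets visible: the welcoming socket only accepts, while the socket returned by accept() carries the data. As with any such sketch, the function names are invented here, the server handles a single connection, and the short message is assumed to arrive in one recv() call, which TCP does not guarantee in general.

```python
import socket
import threading

def serve_one_connection(welcoming_socket):
    """Accept one client, read its sentence, reply in upper case, then close."""
    connection_socket, _client_address = welcoming_socket.accept()
    with connection_socket:
        sentence = connection_socket.recv(1024).decode()
        connection_socket.sendall(sentence.upper().encode())

def uppercase_over_tcp(text, timeout=2.0):
    """Send `text` to a one-shot TCP server on 127.0.0.1 and return its reply."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as welcoming_socket:
        welcoming_socket.bind(("127.0.0.1", 0))   # port 0: OS picks a free port
        welcoming_socket.listen(1)
        welcoming_socket.settimeout(timeout)
        server_address = welcoming_socket.getsockname()
        worker = threading.Thread(target=serve_one_connection,
                                  args=(welcoming_socket,))
        worker.start()
        # connect() triggers the TCP handshake; no address is attached to sends.
        with socket.create_connection(server_address, timeout=timeout) as client_socket:
            client_socket.sendall(text.encode())
            reply = client_socket.recv(1024)
        worker.join()
        return reply.decode()

print(uppercase_over_tcp("hello tcp"))
```

A server that must handle long messages would loop on recv() until it sees an agreed delimiter or length, because TCP delivers a byte stream rather than message boundaries.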
Chapter 2: summary
our study of network apps now complete!
▪ application architectures
• client-server
• P2P
▪ application service requirements:
• reliability, bandwidth, delay
▪ Internet transport service model
• connection-oriented, reliable: TCP
• unreliable, datagrams: UDP
▪ specific protocols:
• HTTP
• SMTP, POP, IMAP
• DNS
• P2P: BitTorrent
▪ video streaming, CDNs
▪ socket programming: TCP, UDP sockets

Application Layer 2-105


Chapter 2: summary
most importantly: learned about protocols!

▪ typical request/reply message exchange:
• client requests info or service
• server responds with data, status code
▪ message formats:
• headers: fields giving info about data
• data: info (payload) being communicated

important themes:
▪ control vs. messages
• in-band, out-of-band
▪ centralized vs. decentralized
▪ stateless vs. stateful
▪ reliable vs. unreliable message transfer
▪ “complexity at network edge”

Application Layer 2-106


Chapter 3
Transport Layer

Computer Networking: A Top Down Approach
7th Edition, Global Edition
Jim Kurose, Keith Ross
Pearson
April 2016

A note on the use of these Powerpoint slides:
We’re making these slides freely available to all (faculty, students, readers).
They’re in PowerPoint form so you see the animations; and can add, modify,
and delete slides (including this one) and slide content to suit your needs.
They obviously represent a lot of work on our part. In return for use, we only
ask the following:
▪ If you use these slides (e.g., in a class) that you mention their source
(after all, we’d like people to use our book!)
▪ If you post any slides on a www site, that you note that they are adapted
from (or perhaps identical to) our slides, and note our copyright of this
material.
Thanks and enjoy! JFK/KWR
All material copyright 1996-2016
J.F Kurose and K.W. Ross, All Rights Reserved
Transport Layer 3-1
Chapter 3: Transport Layer
our goals:
▪ understand principles behind transport layer services:
• multiplexing, demultiplexing
• reliable data transfer
• flow control
• congestion control
▪ learn about Internet transport layer protocols:
• UDP: connectionless transport
• TCP: connection-oriented reliable transport
• TCP congestion control

Transport Layer 3-2


Chapter 3 outline
3.1 transport-layer services
3.2 multiplexing and demultiplexing
3.3 connectionless transport: UDP
3.4 principles of reliable data transfer
3.5 connection-oriented transport: TCP
• segment structure
• reliable data transfer
• flow control
• connection management
3.6 principles of congestion control
3.7 TCP congestion control

Transport Layer 3-3


Transport services and protocols
▪ provide logical communication between app processes running on different hosts
▪ transport protocols run in end systems
• send side: breaks app messages into segments, passes to network layer
• rcv side: reassembles segments into messages, passes to app layer
▪ more than one transport protocol available to apps
• Internet: TCP and UDP

[figure: application, transport, network, data link, physical protocol stack in each of two end systems]
Transport Layer 3-4
Transport vs. network layer
▪ network layer: logical communication between hosts
▪ transport layer: logical communication between processes
• relies on, enhances, network layer services

household analogy:
12 kids in Ann’s house sending letters to 12 kids in Bill’s house:
▪ hosts = houses
▪ processes = kids
▪ app messages = letters in envelopes
▪ transport protocol = Ann and Bill who demux to in-house siblings
▪ network-layer protocol = postal service

Transport Layer 3-5


Internet transport-layer protocols
▪ reliable, in-order delivery (TCP)
• congestion control
• flow control
• connection setup
▪ unreliable, unordered delivery: UDP
• no-frills extension of “best-effort” IP
▪ services not available:
• delay guarantees
• bandwidth guarantees

[figure: end systems running application, transport, network, data link, and physical layers, connected through routers that run only network, data link, and physical layers]

Transport Layer 3-6


Chapter 3 outline
3.1 transport-layer services
3.2 multiplexing and demultiplexing
3.3 connectionless transport: UDP
3.4 principles of reliable data transfer
3.5 connection-oriented transport: TCP
• segment structure
• reliable data transfer
• flow control
• connection management
3.6 principles of congestion control
3.7 TCP congestion control

Transport Layer 3-7


Multiplexing/demultiplexing
multiplexing at sender: handle data from multiple sockets, add transport header (later used for demultiplexing)

demultiplexing at receiver: use header info to deliver received segments to correct socket

[figure: three hosts, each with application, transport, network, link, and physical layers; processes P1 to P4 attach to the transport layer through sockets]

Transport Layer 3-8


How demultiplexing works
▪ host receives IP datagrams
• each datagram has source IP address, destination IP address
• each datagram carries one transport-layer segment
• each segment has source, destination port number
▪ host uses IP addresses & port numbers to direct segment to appropriate socket

TCP/UDP segment format (32 bits wide):
source port # | dest port #
other header fields
application data (payload)

Transport Layer 3-9


Connectionless demultiplexing
▪ recall: created socket has host-local port #:
DatagramSocket mySocket1 = new DatagramSocket(12534);
▪ recall: when creating datagram to send into UDP socket, must specify
• destination IP address
• destination port #
▪ when host receives UDP segment:
• checks destination port # in segment
• directs UDP segment to socket with that port #

IP datagrams with same dest. port #, but different source IP addresses and/or source port numbers will be directed to same socket at dest

Transport Layer 3-10


Connectionless demux: example
host with process P3: DatagramSocket mySocket2 = new DatagramSocket(9157);
server with process P1: DatagramSocket serverSocket = new DatagramSocket(6428);
host with process P4: DatagramSocket mySocket1 = new DatagramSocket(5775);

segments exchanged between P3 and the server:
▪ source port: 9157, dest port: 6428
▪ source port: 6428, dest port: 9157
segments exchanged between P4 and the server:
▪ source port: ?, dest port: ?
▪ source port: ?, dest port: ?
Transport Layer 3-11
Connection-oriented demux
▪ TCP socket identified by 4-tuple:
• source IP address
• source port number
• dest IP address
• dest port number
▪ demux: receiver uses all four values to direct segment to appropriate socket
▪ server host may support many simultaneous TCP sockets:
• each socket identified by its own 4-tuple
▪ web servers have different sockets for each connecting client
• non-persistent HTTP will have different socket for each request

Transport Layer 3-12
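The difference between the two demultiplexing rules comes down to the lookup key. The following sketch models a host's socket tables with dictionaries; the class and method names and the socket labels are made up for illustration, and a real operating system keeps far more state per socket.

```python
class Demultiplexer:
    """Toy model of a host's UDP and TCP socket lookup tables."""

    def __init__(self):
        self.udp_sockets = {}   # dest port -> socket label
        self.tcp_sockets = {}   # (src IP, src port, dest IP, dest port) -> label

    def bind_udp(self, port, label):
        self.udp_sockets[port] = label

    def add_tcp_connection(self, src_ip, src_port, dst_ip, dst_port, label):
        self.tcp_sockets[(src_ip, src_port, dst_ip, dst_port)] = label

    def demux_udp(self, src_ip, src_port, dst_ip, dst_port):
        # Connectionless: only the destination port selects the socket.
        return self.udp_sockets.get(dst_port)

    def demux_tcp(self, src_ip, src_port, dst_ip, dst_port):
        # Connection-oriented: all four values select the socket.
        return self.tcp_sockets.get((src_ip, src_port, dst_ip, dst_port))

server_b = Demultiplexer()
server_b.bind_udp(6428, "serverSocket")
server_b.add_tcp_connection("A", 9157, "B", 80, "P4 socket")
server_b.add_tcp_connection("C", 5775, "B", 80, "P5 socket")
server_b.add_tcp_connection("C", 9157, "B", 80, "P6 socket")

print(server_b.demux_udp("A", 9157, "B", 6428), server_b.demux_udp("C", 5775, "B", 6428))
print(server_b.demux_tcp("A", 9157, "B", 80), server_b.demux_tcp("C", 9157, "B", 80))
```

Both UDP datagrams land in the same socket, while the TCP segments with the same destination port 80 reach different sockets because their source IP address or source port differs.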


Connection-oriented demux: example

▪ host: IP address A, process P3
▪ server: IP address B, processes P4, P5, P6
▪ host: IP address C, processes P2, P3

segments:
▪ source IP,port: B,80; dest IP,port: A,9157
▪ source IP,port: A,9157; dest IP,port: B,80
▪ source IP,port: C,5775; dest IP,port: B,80
▪ source IP,port: C,9157; dest IP,port: B,80

three segments, all destined to IP address: B, dest port: 80 are demultiplexed to different sockets
Transport Layer 3-13
Connection-oriented demux: example
threaded server
▪ host: IP address A, process P3
▪ server: IP address B, single process P4
▪ host: IP address C, processes P2, P3

segments:
▪ source IP,port: B,80; dest IP,port: A,9157
▪ source IP,port: A,9157; dest IP,port: B,80
▪ source IP,port: C,5775; dest IP,port: B,80
▪ source IP,port: C,9157; dest IP,port: B,80

Transport Layer 3-14


Chapter 3 outline
3.1 transport-layer services
3.2 multiplexing and demultiplexing
3.3 connectionless transport: UDP
3.4 principles of reliable data transfer
3.5 connection-oriented transport: TCP
• segment structure
• reliable data transfer
• flow control
• connection management
3.6 principles of congestion control
3.7 TCP congestion control

Transport Layer 3-15


UDP: User Datagram Protocol [RFC 768]
▪ “no frills,” “bare bones” Internet transport protocol
▪ “best effort” service, UDP segments may be:
• lost
• delivered out-of-order to app
▪ connectionless:
• no handshaking between UDP sender, receiver
• each UDP segment handled independently of others
▪ UDP use:
• streaming multimedia apps (loss tolerant, rate sensitive)
• DNS
• SNMP
▪ reliable transfer over UDP:
• add reliability at application layer
• application-specific error recovery!
Transport Layer 3-16
UDP: segment header
UDP segment format (32 bits wide):
source port # | dest port #
length | checksum
application data (payload)

length: in bytes of UDP segment, including header

why is there a UDP?
▪ no connection establishment (which can add delay)
▪ simple: no connection state at sender, receiver
▪ small header size
▪ no congestion control: UDP can blast away as fast as desired

Transport Layer 3-17
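The four header fields are each 16 bits, so the whole header is 8 bytes. A minimal sketch of packing and unpacking it follows; the function names are assumptions for this example, and the checksum is left at 0 here rather than computed.

```python
import struct

UDP_HEADER = struct.Struct("!HHHH")   # source port, dest port, length, checksum

def build_udp_segment(src_port, dst_port, payload, checksum=0):
    """Prepend the 8-byte UDP header to the payload."""
    length = UDP_HEADER.size + len(payload)   # length covers header + data
    return UDP_HEADER.pack(src_port, dst_port, length, checksum) + payload

def parse_udp_segment(segment):
    """Split a UDP segment into its header fields and payload."""
    src_port, dst_port, length, checksum = UDP_HEADER.unpack(segment[:UDP_HEADER.size])
    header = {"src_port": src_port, "dst_port": dst_port,
              "length": length, "checksum": checksum}
    return header, segment[UDP_HEADER.size:length]

segment = build_udp_segment(9157, 6428, b"hello")
print(len(segment), parse_udp_segment(segment))
```

Because the length field is 16 bits, a UDP segment (header plus data) can never exceed 65,535 bytes.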


UDP checksum
Goal: detect “errors” (e.g., flipped bits) in transmitted
segment
sender:
▪ treat segment contents, including header fields, as sequence of 16-bit integers
▪ checksum: addition (one’s complement sum) of segment contents
▪ sender puts checksum value into UDP checksum field

receiver:
▪ compute checksum of received segment
▪ check if computed checksum equals checksum field value:
• NO - error detected
• YES - no error detected. But maybe errors nonetheless? More later ….

Transport Layer 3-18


Internet checksum: example
example: add two 16-bit integers
             1 1 1 0 0 1 1 0 0 1 1 0 0 1 1 0
             1 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1

wraparound 1 1 0 1 1 1 0 1 1 1 0 1 1 1 0 1 1

sum          1 0 1 1 1 0 1 1 1 0 1 1 1 1 0 0
checksum     0 1 0 0 0 1 0 0 0 1 0 0 0 0 1 1

Note: when adding numbers, a carryout from the most significant bit needs to be added to the result

* Check out the online interactive exercises for more


examples: http://gaia.cs.umass.edu/kurose_ross/interactive/ Transport Layer 3-19
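The two-word example can be reproduced with a few lines of Python. This is a sketch over a list of 16-bit integers: the function names are chosen for this example, and a real UDP implementation would also include a pseudo-header and pad odd-length data, which is left out here.

```python
def ones_complement_sum16(words):
    """Add 16-bit words, wrapping any carry out of bit 15 back into the sum."""
    total = 0
    for word in words:
        total += word
        total = (total & 0xFFFF) + (total >> 16)   # wraparound of the carry
    return total

def internet_checksum(words):
    """Checksum = bitwise complement of the one's complement sum."""
    return ~ones_complement_sum16(words) & 0xFFFF

def receiver_check(words, checksum_field):
    """Receiver recomputes the checksum and compares it with the header field."""
    return internet_checksum(words) == checksum_field

a = 0b1110011001100110
b = 0b1101010101010101
print(format(ones_complement_sum16([a, b]), "016b"))
print(format(internet_checksum([a, b]), "016b"))
print(receiver_check([a, b], internet_checksum([a, b])), receiver_check([a ^ 1, b], internet_checksum([a, b])))
```

Flipping a single bit changes the recomputed checksum, so the receiver's comparison fails; two errors that cancel in the sum would go undetected.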
Chapter 3 outline
3.1 transport-layer services
3.2 multiplexing and demultiplexing
3.3 connectionless transport: UDP
3.4 principles of reliable data transfer
3.5 connection-oriented transport: TCP
• segment structure
• reliable data transfer
• flow control
• connection management
3.6 principles of congestion control
3.7 TCP congestion control

Transport Layer 3-20


Principles of reliable data transfer
▪ important in application, transport, link layers
• top-10 list of important networking topics!

▪ characteristics of unreliable channel will determine complexity of reliable data transfer protocol (rdt)
Transport Layer 3-21
Reliable data transfer: getting started
send side:
▪ rdt_send(): called from above (e.g., by app.). Passed data to deliver to receiver upper layer
▪ udt_send(): called by rdt, to transfer packet over unreliable channel to receiver

receive side:
▪ rdt_rcv(): called when packet arrives on rcv-side of channel
▪ deliver_data(): called by rdt to deliver data to upper layer

Transport Layer 3-24


Reliable data transfer: getting started
we’ll:
▪ incrementally develop sender, receiver sides of
reliable data transfer protocol (rdt)
▪ consider only unidirectional data transfer
• but control info will flow in both directions!
▪ use finite state machines (FSM) to specify sender,
receiver

state: when in this “state” next state uniquely determined by next event

[figure: arrow from state 1 to state 2, labeled above the line with the event causing state transition and below the line with the actions taken on state transition]

Transport Layer 3-25


rdt1.0: reliable transfer over a reliable channel
▪ underlying channel perfectly reliable
• no bit errors
• no loss of packets
▪ separate FSMs for sender, receiver:
• sender sends data into underlying channel
• receiver reads data from underlying channel

sender:
▪ state “Wait for call from above”
• rdt_send(data): packet = make_pkt(data); udt_send(packet)

receiver:
▪ state “Wait for call from below”
• rdt_rcv(packet): extract(packet,data); deliver_data(data)

Transport Layer 3-26


rdt2.0: channel with bit errors
▪ underlying channel may flip bits in packet
• checksum to detect bit errors
▪ the question: how to recover from errors:
• acknowledgements (ACKs): receiver explicitly tells sender
that pkt received OK
• negative acknowledgements (NAKs): receiver explicitly tells
sender that pkt had errors
• sender retransmits pkt on receipt of NAK
▪ new mechanisms in rdt2.0 (beyond rdt1.0):
• error detection
• receiver feedback: control msgs (ACK,NAK) rcvr->sender

How do humans recover from “errors” during conversation?

Transport Layer 3-27


rdt2.0: channel with bit errors
▪ underlying channel may flip bits in packet
• checksum to detect bit errors
▪ the question: how to recover from errors:
• acknowledgements (ACKs): receiver explicitly tells sender
that pkt received OK
• negative acknowledgements (NAKs): receiver explicitly tells
sender that pkt had errors
• sender retransmits pkt on receipt of NAK
▪ new mechanisms in rdt2.0 (beyond rdt1.0):
• error detection
• feedback: control msgs (ACK,NAK) from receiver to
sender

Transport Layer 3-28


rdt2.0: FSM specification
sender:
▪ state “Wait for call from above”
• rdt_send(data): sndpkt = make_pkt(data, checksum); udt_send(sndpkt); go to “Wait for ACK or NAK”
▪ state “Wait for ACK or NAK”
• rdt_rcv(rcvpkt) && isNAK(rcvpkt): udt_send(sndpkt); stay in this state
• rdt_rcv(rcvpkt) && isACK(rcvpkt): Λ (no action); go to “Wait for call from above”

receiver:
▪ state “Wait for call from below”
• rdt_rcv(rcvpkt) && corrupt(rcvpkt): udt_send(NAK)
• rdt_rcv(rcvpkt) && notcorrupt(rcvpkt): extract(rcvpkt,data); deliver_data(data); udt_send(ACK)

Transport Layer 3-29
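The rdt2.0 behavior can be exercised with a small simulation. The sketch below makes several assumptions that are not in the slides: the checksum is a toy byte sum, corruption is injected by listing which transmission attempts get a flipped bit, and ACK/NAK messages are assumed to arrive intact (exactly the assumption rdt2.0 itself makes).

```python
def toy_checksum(data):
    """Toy error-detecting code (an assumption of this sketch): sum of the bytes."""
    return sum(data) % 65536

def make_pkt(data):
    return {"data": data, "checksum": toy_checksum(data)}

def corrupt(pkt):
    return toy_checksum(pkt["data"]) != pkt["checksum"]

def rdt20_transfer(messages, corrupted_attempts=()):
    """Send non-empty byte strings stop-and-wait; return (delivered, feedback log).

    corrupted_attempts holds the 0-based indices of data transmissions whose
    first byte gets one bit flipped by the channel.
    """
    delivered, feedback = [], []
    attempt = 0
    for data in messages:
        sndpkt = make_pkt(data)
        while True:
            rcvpkt = dict(sndpkt)                       # udt_send over the channel
            if attempt in corrupted_attempts:
                flipped = bytes([rcvpkt["data"][0] ^ 0x01])
                rcvpkt["data"] = flipped + rcvpkt["data"][1:]
            attempt += 1
            if corrupt(rcvpkt):
                feedback.append("NAK")                  # receiver: udt_send(NAK)
                continue                                # sender retransmits sndpkt
            delivered.append(rcvpkt["data"])            # receiver: deliver_data
            feedback.append("ACK")                      # receiver: udt_send(ACK)
            break
    return delivered, feedback

print(rdt20_transfer([b"hi", b"yo"], corrupted_attempts={1}))
```

The second message is corrupted on its first transmission, NAKed, and delivered on the retransmission.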


rdt2.0: operation with no errors
path taken through the rdt2.0 FSMs when the packet is not corrupted:
▪ sender: rdt_send(data): sndpkt = make_pkt(data, checksum); udt_send(sndpkt); go to “Wait for ACK or NAK”
▪ receiver: rdt_rcv(rcvpkt) && notcorrupt(rcvpkt): extract(rcvpkt,data); deliver_data(data); udt_send(ACK)
▪ sender: rdt_rcv(rcvpkt) && isACK(rcvpkt): Λ (no action); go to “Wait for call from above”

Transport Layer 3-30


rdt2.0: error scenario
path taken through the rdt2.0 FSMs when the packet is corrupted:
▪ sender: rdt_send(data): sndpkt = make_pkt(data, checksum); udt_send(sndpkt); go to “Wait for ACK or NAK”
▪ receiver: rdt_rcv(rcvpkt) && corrupt(rcvpkt): udt_send(NAK)
▪ sender: rdt_rcv(rcvpkt) && isNAK(rcvpkt): udt_send(sndpkt); stay in “Wait for ACK or NAK”
▪ receiver: rdt_rcv(rcvpkt) && notcorrupt(rcvpkt): extract(rcvpkt,data); deliver_data(data); udt_send(ACK)
▪ sender: rdt_rcv(rcvpkt) && isACK(rcvpkt): Λ (no action); go to “Wait for call from above”

Transport Layer 3-31


rdt2.0 has a fatal flaw!
what happens if ACK/NAK corrupted?
▪ sender doesn’t know what happened at receiver!
▪ can’t just retransmit: possible duplicate

handling duplicates:
▪ sender retransmits current pkt if ACK/NAK corrupted
▪ sender adds sequence number to each pkt
▪ receiver discards (doesn’t deliver up) duplicate pkt

stop and wait
sender sends one packet, then waits for receiver response

Transport Layer 3-32


rdt2.1: sender, handles garbled ACK/NAKs
▪ state “Wait for call 0 from above”
• rdt_send(data): sndpkt = make_pkt(0, data, checksum); udt_send(sndpkt); go to “Wait for ACK or NAK 0”
▪ state “Wait for ACK or NAK 0”
• rdt_rcv(rcvpkt) && ( corrupt(rcvpkt) || isNAK(rcvpkt) ): udt_send(sndpkt); stay in this state
• rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && isACK(rcvpkt): Λ; go to “Wait for call 1 from above”
▪ state “Wait for call 1 from above”
• rdt_send(data): sndpkt = make_pkt(1, data, checksum); udt_send(sndpkt); go to “Wait for ACK or NAK 1”
▪ state “Wait for ACK or NAK 1”
• rdt_rcv(rcvpkt) && ( corrupt(rcvpkt) || isNAK(rcvpkt) ): udt_send(sndpkt); stay in this state
• rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && isACK(rcvpkt): Λ; go to “Wait for call 0 from above”

Transport Layer 3-33


rdt2.1: receiver, handles garbled ACK/NAKs
▪ state “Wait for 0 from below”
• rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && has_seq0(rcvpkt): extract(rcvpkt,data); deliver_data(data); sndpkt = make_pkt(ACK, chksum); udt_send(sndpkt); go to “Wait for 1 from below”
• rdt_rcv(rcvpkt) && corrupt(rcvpkt): sndpkt = make_pkt(NAK, chksum); udt_send(sndpkt); stay in this state
• rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && has_seq1(rcvpkt): sndpkt = make_pkt(ACK, chksum); udt_send(sndpkt); stay in this state
▪ state “Wait for 1 from below”
• rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && has_seq1(rcvpkt): extract(rcvpkt,data); deliver_data(data); sndpkt = make_pkt(ACK, chksum); udt_send(sndpkt); go to “Wait for 0 from below”
• rdt_rcv(rcvpkt) && corrupt(rcvpkt): sndpkt = make_pkt(NAK, chksum); udt_send(sndpkt); stay in this state
• rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && has_seq0(rcvpkt): sndpkt = make_pkt(ACK, chksum); udt_send(sndpkt); stay in this state

Transport Layer 3-34


rdt2.1: discussion
sender:
▪ seq # added to pkt
▪ two seq. #’s (0,1) will suffice. Why?
▪ must check if received ACK/NAK corrupted
▪ twice as many states
• state must “remember” whether “expected” pkt should have seq # of 0 or 1

receiver:
▪ must check if received packet is duplicate
• state indicates whether 0 or 1 is expected pkt seq #
▪ note: receiver can not know if its last ACK/NAK received OK at sender

Transport Layer 3-35


rdt2.2: a NAK-free protocol
▪ same functionality as rdt2.1, using ACKs only
▪ instead of NAK, receiver sends ACK for last pkt
received OK
• receiver must explicitly include seq # of pkt being ACKed
▪ duplicate ACK at sender results in same action as
NAK: retransmit current pkt

Transport Layer 3-36
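The NAK-free receiver is small enough to sketch as a class. This is an illustrative model, not the slides' FSM notation: the class name, the `is_corrupt` flag standing in for a checksum test, and the integer return value standing in for the ACK packet are all assumptions.

```python
class Rdt22Receiver:
    """Alternating-bit receiver that answers every packet with an ACK number."""

    def __init__(self):
        self.expected_seq = 0    # sequence number the receiver is waiting for
        self.delivered = []      # data handed to the upper layer

    def rdt_rcv(self, seq, data, is_corrupt=False):
        """Process one arriving packet and return the seq # carried by the ACK."""
        if is_corrupt or seq != self.expected_seq:
            # Corrupted or duplicate packet: re-ACK the last packet received OK.
            return 1 - self.expected_seq
        self.delivered.append(data)              # deliver_data(data)
        self.expected_seq = 1 - self.expected_seq
        return seq                               # ACK the packet just accepted

receiver = Rdt22Receiver()
acks = [receiver.rdt_rcv(0, "a"),                   # new packet 0
        receiver.rdt_rcv(0, "a"),                   # duplicate of packet 0
        receiver.rdt_rcv(1, "b", is_corrupt=True),  # corrupted packet 1
        receiver.rdt_rcv(1, "b")]                   # clean retransmission
print(acks, receiver.delivered)
```

A sender waiting for ACK 1 that sees ACK 0 again treats it like a NAK and retransmits, which is why the explicit NAK message is no longer needed.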


rdt2.2: sender, receiver fragments
sender FSM fragment:
▪ state "Wait for call 0 from above", on rdt_send(data)
    sndpkt = make_pkt(0, data, checksum)
    udt_send(sndpkt)
    → Wait for ACK 0
▪ state "Wait for ACK 0", on rdt_rcv(rcvpkt) && ( corrupt(rcvpkt) || isACK(rcvpkt,1) )
    udt_send(sndpkt)
▪ state "Wait for ACK 0", on rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && isACK(rcvpkt,0)
    Λ (no action)
receiver FSM fragment:
▪ state "Wait for 0 from below", on rdt_rcv(rcvpkt) && (corrupt(rcvpkt) || has_seq1(rcvpkt))
    udt_send(sndpkt)
▪ transition into "Wait for 0 from below", on rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && has_seq1(rcvpkt)
    extract(rcvpkt,data)
    deliver_data(data)
    sndpkt = make_pkt(ACK1, chksum)
    udt_send(sndpkt)
Transport Layer 3-37
rdt3.0: channels with errors and loss

new assumption: underlying channel can also lose packets (data, ACKs)
  • checksum, seq. #, ACKs, retransmissions will be of help … but not enough
approach: sender waits “reasonable” amount of time for ACK
▪ retransmits if no ACK received in this time
▪ if pkt (or ACK) just delayed (not lost):
  • retransmission will be duplicate, but seq. #’s already handles this
  • receiver must specify seq # of pkt being ACKed
▪ requires countdown timer

Transport Layer 3-38


rdt3.0 sender
state "Wait for call 0 from above":
▪ rdt_send(data)
    sndpkt = make_pkt(0, data, checksum)
    udt_send(sndpkt)
    start_timer
    → Wait for ACK0
▪ rdt_rcv(rcvpkt)
    Λ (no action)
state "Wait for ACK0":
▪ rdt_rcv(rcvpkt) && ( corrupt(rcvpkt) || isACK(rcvpkt,1) )
    Λ (no action)
▪ timeout
    udt_send(sndpkt)
    start_timer
▪ rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && isACK(rcvpkt,0)
    stop_timer
    → Wait for call 1 from above
state "Wait for call 1 from above":
▪ rdt_send(data)
    sndpkt = make_pkt(1, data, checksum)
    udt_send(sndpkt)
    start_timer
    → Wait for ACK1
▪ rdt_rcv(rcvpkt)
    Λ (no action)
state "Wait for ACK1":
▪ rdt_rcv(rcvpkt) && ( corrupt(rcvpkt) || isACK(rcvpkt,0) )
    Λ (no action)
▪ timeout
    udt_send(sndpkt)
    start_timer
▪ rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && isACK(rcvpkt,1)
    stop_timer
    → Wait for call 0 from above

Transport Layer 3-39


rdt3.0 in action
(a) no loss
▪ sender: send pkt0
▪ receiver: rcv pkt0, send ack0
▪ sender: rcv ack0, send pkt1
▪ receiver: rcv pkt1, send ack1
▪ sender: rcv ack1, send pkt0
▪ receiver: rcv pkt0, send ack0

(b) packet loss
▪ sender: send pkt0
▪ receiver: rcv pkt0, send ack0
▪ sender: rcv ack0, send pkt1
▪ pkt1 lost (X)
▪ sender: timeout, resend pkt1
▪ receiver: rcv pkt1, send ack1
▪ sender: rcv ack1, send pkt0
▪ receiver: rcv pkt0, send ack0


Transport Layer 3-40
rdt3.0 in action
sender receiver
sender receiver send pkt0 pkt0
send pkt0 pkt0 rcv pkt0
send ack0
rcv pkt0 ack0
send ack0 rcv ack0
ack0 send pkt1 pkt1
rcv ack0 rcv pkt1
send pkt1 pkt1
send ack1
rcv pkt1 ack1
ack1 send ack1
X
loss timeout
resend pkt1 pkt1
rcv pkt1
timeout
resend pkt1 pkt1 rcv ack1 (detect duplicate)
rcv pkt1 send pkt0
pkt0
send ack1
(detect duplicate) ack1
ack1 send ack1 rcv ack1 rcv pkt0
rcv ack1 send pkt0
ack0 send ack0
send pkt0 pkt0 pkt0
rcv pkt0
rcv pkt0 ack0 (detect duplicate)
ack0 send ack0 send ack0

(c) ACK loss (d) premature timeout/ delayed ACK

Transport Layer 3-41


Performance of rdt3.0
▪ rdt3.0 is correct, but performance stinks
▪ e.g.: 1 Gbps link, 15 ms prop. delay, 8000 bit packet:
    D_trans = L / R = 8000 bits / (10^9 bits/sec) = 8 microsecs
▪ U_sender: utilization – fraction of time sender busy sending
    U_sender = (L/R) / (RTT + L/R) = .008 / 30.008 = 0.00027
▪ if RTT=30 msec, 1KB pkt every 30 msec: 33kB/sec thruput over 1 Gbps link
▪ network protocol limits use of physical resources!
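
The arithmetic on this slide can be checked with a few lines of Python. This is only a sketch: the function names are made up for illustration, and the inputs are the slide's own numbers (8000-bit packet, 1 Gbps link, 30 msec RTT).

```python
def stop_and_wait_utilization(packet_bits, link_bps, rtt_s):
    """Fraction of time the sender is busy: (L/R) / (RTT + L/R)."""
    d_trans = packet_bits / link_bps
    return d_trans / (rtt_s + d_trans)


def stop_and_wait_throughput_bps(packet_bits, link_bps, rtt_s):
    """One packet is delivered every RTT + L/R seconds."""
    return packet_bits / (rtt_s + packet_bits / link_bps)


u_sender = stop_and_wait_utilization(8000, 1e9, 0.030)
thruput_bps = stop_and_wait_throughput_bps(8000, 1e9, 0.030)
print(f"U_sender = {u_sender:.5f}")
print(f"throughput = {thruput_bps / 8 / 1000:.1f} kB/sec")
```

The throughput comes out at roughly 33 kB/sec, a tiny fraction of the 1 Gbps the link could carry.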
Transport Layer 3-42
rdt3.0: stop-and-wait operation
sender receiver
first packet bit transmitted, t = 0
last packet bit transmitted, t = L / R

first packet bit arrives


RTT last packet bit arrives, send ACK

ACK arrives, send next packet, t = RTT + L / R

U_sender = (L/R) / (RTT + L/R) = .008 / 30.008 = 0.00027

Transport Layer 3-43


Pipelined protocols
pipelining: sender allows multiple, “in-flight”, yet-to-be-acknowledged pkts
  • range of sequence numbers must be increased
  • buffering at sender and/or receiver
▪ two generic forms of pipelined protocols: go-Back-N, selective repeat
Transport Layer 3-44
Pipelining: increased utilization
sender receiver
first packet bit transmitted, t = 0
last bit transmitted, t = L / R

first packet bit arrives


RTT last packet bit arrives, send ACK
last bit of 2nd packet arrives, send ACK
last bit of 3rd packet arrives, send ACK
ACK arrives, send next packet, t = RTT + L / R
3-packet pipelining increases utilization by a factor of 3!

U_sender = (3L/R) / (RTT + L/R) = .024 / 30.008 = 0.00080
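
As a rough check of the factor-of-3 claim, the sketch below generalizes the utilization formula to n packets in flight. The function names and the "window needed to fill the pipe" helper are illustrative additions, not part of the slides; the numbers are the slide's 8000-bit packet, 1 Gbps link and 30 msec RTT.

```python
import math


def pipelined_utilization(n_packets, packet_bits, link_bps, rtt_s):
    """U_sender with n packets in flight: n*L / (RTT*R + L), capped at 1."""
    return min(1.0, n_packets * packet_bits / (rtt_s * link_bps + packet_bits))


def window_to_fill_pipe(packet_bits, link_bps, rtt_s):
    """Smallest n for which the sender never has to idle."""
    return math.ceil((rtt_s * link_bps + packet_bits) / packet_bits)


u1 = pipelined_utilization(1, 8000, 1e9, 0.030)
u3 = pipelined_utilization(3, 8000, 1e9, 0.030)
n_full = window_to_fill_pipe(8000, 1e9, 0.030)
print(f"U(1) = {u1:.5f}, U(3) = {u3:.5f}, packets to fill pipe: {n_full}")
```

Even three packets in flight leave the link almost idle; it takes a window in the thousands of packets to keep this link busy.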

Transport Layer 3-45


Pipelined protocols: overview
Go-back-N:
▪ sender can have up to N unacked packets in pipeline
▪ receiver only sends cumulative ack
  • doesn’t ack packet if there’s a gap
▪ sender has timer for oldest unacked packet
  • when timer expires, retransmit all unacked packets
Selective Repeat:
▪ sender can have up to N unack’ed packets in pipeline
▪ rcvr sends individual ack for each packet
▪ sender maintains timer for each unacked packet
  • when timer expires, retransmit only that unacked packet

Transport Layer 3-46


Go-Back-N: sender
▪ k-bit seq # in pkt header
▪ “window” of up to N, consecutive unack’ed pkts allowed

▪ ACK(n): ACKs all pkts up to, including seq # n - “cumulative ACK”
  • may receive duplicate ACKs (see receiver)
▪ timer for oldest in-flight pkt
▪ timeout(n): retransmit packet n and all higher seq # pkts in window
Transport Layer 3-47
GBN: sender extended FSM
initialization (Λ):
    base=1
    nextseqnum=1
state "Wait":
▪ rdt_send(data)
    if (nextseqnum < base+N) {
        sndpkt[nextseqnum] = make_pkt(nextseqnum,data,chksum)
        udt_send(sndpkt[nextseqnum])
        if (base == nextseqnum)
            start_timer
        nextseqnum++
    }
    else
        refuse_data(data)
▪ timeout
    start_timer
    udt_send(sndpkt[base])
    udt_send(sndpkt[base+1])
    …
    udt_send(sndpkt[nextseqnum-1])
▪ rdt_rcv(rcvpkt) && notcorrupt(rcvpkt)
    base = getacknum(rcvpkt)+1
    if (base == nextseqnum)
        stop_timer
    else
        start_timer
▪ rdt_rcv(rcvpkt) && corrupt(rcvpkt)
    Λ (no action)
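
The sender FSM can be sketched in Python roughly as follows. This is an illustration under assumptions, not the book's code: the class name, the `send_fn` callback and the boolean `timer_running` flag (standing in for a real timer) are invented here, and the `max()` guard against stale ACKs is an addition the FSM does not show.

```python
class GBNSender:
    """Go-Back-N sender sketch; a real timer is replaced by a flag."""

    def __init__(self, window_size, send_fn):
        self.N = window_size
        self.base = 1
        self.nextseqnum = 1
        self.sndpkt = {}            # seq # -> packet kept for retransmission
        self.timer_running = False
        self.udt_send = send_fn     # hypothetical unreliable-channel callback

    def rdt_send(self, data):
        if self.nextseqnum >= self.base + self.N:
            return False            # window full: refuse_data(data)
        pkt = (self.nextseqnum, data)
        self.sndpkt[self.nextseqnum] = pkt
        self.udt_send(pkt)
        if self.base == self.nextseqnum:
            self.timer_running = True   # start_timer
        self.nextseqnum += 1
        return True

    def timeout(self):
        self.timer_running = True       # restart timer
        for seq in range(self.base, self.nextseqnum):
            self.udt_send(self.sndpkt[seq])   # resend every unacked pkt

    def rdt_rcv_ack(self, acknum, corrupt=False):
        if corrupt:
            return                      # corrupt ACK: no action
        # Cumulative ACK; max() keeps a stale ACK from moving base backwards.
        self.base = max(self.base, acknum + 1)
        self.timer_running = self.base != self.nextseqnum


sent = []
sender = GBNSender(4, sent.append)
accepted = [sender.rdt_send(d) for d in "abcdef"]   # window of 4: last two refused
sender.rdt_rcv_ack(2)                               # pkts 1 and 2 acknowledged
sender.timeout()                                    # pkts 3 and 4 go out again
print(accepted, [seq for seq, _ in sent], sender.base)
```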
Transport Layer 3-48
GBN: receiver extended FSM
initialization (Λ):
    expectedseqnum=1
    sndpkt = make_pkt(expectedseqnum,ACK,chksum)
state "Wait":
▪ default
    udt_send(sndpkt)
▪ rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && hasseqnum(rcvpkt,expectedseqnum)
    extract(rcvpkt,data)
    deliver_data(data)
    sndpkt = make_pkt(expectedseqnum,ACK,chksum)
    udt_send(sndpkt)
    expectedseqnum++

ACK-only: always send ACK for correctly-received pkt with highest in-order seq #
  • may generate duplicate ACKs
  • need only remember expectedseqnum
▪ out-of-order pkt:
  • discard (don’t buffer): no receiver buffering!
  • re-ACK pkt with highest in-order seq #
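
A minimal receiver sketch along the same lines is shown here. The class name is illustrative, and the initial ACK value of 0 (meaning "nothing received yet") is an assumption made for this example.

```python
class GBNReceiver:
    """Go-Back-N receiver sketch: no buffering, cumulative ACKs only."""

    def __init__(self):
        self.expectedseqnum = 1
        self.last_ack = 0       # assumed: 0 means nothing received in order yet
        self.delivered = []

    def rdt_rcv(self, seqnum, data, corrupt=False):
        """Return the ACK number sent in response to this packet."""
        if not corrupt and seqnum == self.expectedseqnum:
            self.delivered.append(data)     # deliver_data(data)
            self.last_ack = seqnum
            self.expectedseqnum += 1
        # Otherwise the packet is discarded and the last ACK is re-sent.
        return self.last_ack


rcv = GBNReceiver()
# pkt 3 is lost at first, so 4 and 5 are discarded until 3, 4, 5 are resent.
acks = [rcv.rdt_rcv(n, f"data{n}") for n in (1, 2, 4, 5, 3, 4, 5)]
print(acks, rcv.delivered)
```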
Transport Layer 3-49
GBN in action
sender window (N=4) sender receiver
012345678 send pkt0
012345678 send pkt1
012345678 send pkt2 receive pkt0, send ack0
012345678 send pkt3 Xloss receive pkt1, send ack1
(wait)
receive pkt3, discard,
012345678 rcv ack0, send pkt4 (re)send ack1
012345678 rcv ack1, send pkt5 receive pkt4, discard,
(re)send ack1
ignore duplicate ACK receive pkt5, discard,
(re)send ack1
pkt 2 timeout
012345678 send pkt2
012345678 send pkt3
012345678 send pkt4 rcv pkt2, deliver, send ack2
012345678 send pkt5 rcv pkt3, deliver, send ack3
rcv pkt4, deliver, send ack4
rcv pkt5, deliver, send ack5

Transport Layer 3-50


Selective repeat
▪ receiver individually acknowledges all correctly
received pkts
• buffers pkts, as needed, for eventual in-order delivery
to upper layer
▪ sender only resends pkts for which ACK not
received
• sender timer for each unACKed pkt
▪ sender window
• N consecutive seq #’s
• limits seq #s of sent, unACKed pkts

Transport Layer 3-51


Selective repeat: sender, receiver windows

Transport Layer 3-52


Selective repeat
sender
data from above:
▪ if next available seq # in window, send pkt
timeout(n):
▪ resend pkt n, restart timer
ACK(n) in [sendbase,sendbase+N-1]:
▪ mark pkt n as received
▪ if n smallest unACKed pkt, advance window base to next unACKed seq #
receiver
pkt n in [rcvbase, rcvbase+N-1]
▪ send ACK(n)
▪ out-of-order: buffer
▪ in-order: deliver (also deliver buffered, in-order pkts), advance window to next not-yet-received pkt
pkt n in [rcvbase-N,rcvbase-1]
▪ ACK(n)
otherwise:
▪ ignore
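
The receiver rules above can be sketched as follows. This is an illustration only: the class name is invented, and sequence numbers are treated as unbounded integers (no wraparound), which sidesteps the finite-sequence-space problem a real implementation has to handle.

```python
class SRReceiver:
    """Selective-repeat receiver sketch with unbounded sequence numbers."""

    def __init__(self, window_size):
        self.N = window_size
        self.rcvbase = 0
        self.buffer = {}        # out-of-order pkts waiting for delivery
        self.delivered = []

    def rdt_rcv(self, n, data):
        """Return the ACK number to send, or None if the pkt is ignored."""
        if self.rcvbase <= n <= self.rcvbase + self.N - 1:
            self.buffer.setdefault(n, data)
            # Deliver any in-order run starting at rcvbase, advancing the window.
            while self.rcvbase in self.buffer:
                self.delivered.append(self.buffer.pop(self.rcvbase))
                self.rcvbase += 1
            return n
        if self.rcvbase - self.N <= n <= self.rcvbase - 1:
            return n            # already delivered: re-ACK so sender can advance
        return None             # otherwise: ignore


sr = SRReceiver(4)
# pkt 2 is lost at first and arrives last, after 3, 4, 5 were buffered.
sr_acks = [sr.rdt_rcv(n, f"pkt{n}") for n in (0, 1, 3, 4, 5, 2)]
print(sr_acks, sr.delivered, sr.rcvbase)
```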

Transport Layer 3-53


Selective repeat in action
sender window (N=4) sender receiver
012345678 send pkt0
012345678 send pkt1
012345678 send pkt2 receive pkt0, send ack0
012345678 send pkt3 Xloss receive pkt1, send ack1
(wait)
receive pkt3, buffer,
012345678 rcv ack0, send pkt4 send ack3
012345678 rcv ack1, send pkt5 receive pkt4, buffer,
send ack4
record ack3 arrived receive pkt5, buffer,
send ack5
pkt 2 timeout
012345678 send pkt2
012345678 record ack4 arrived
012345678 rcv pkt2; deliver pkt2,
record ack5 arrived
012345678 pkt3, pkt4, pkt5; send ack2

Q: what happens when ack2 arrives?

Transport Layer 3-54


Selective repeat: dilemma
example:
▪ seq #’s: 0, 1, 2, 3
▪ window size=3
▪ receiver sees no difference in two scenarios!
▪ duplicate data accepted as new in (b)
Q: what relationship between seq # size and window size to avoid problem in (b)?
[figure: sender window (after receipt) and receiver window (after receipt) over seq # space 0123012]
(a) no problem: pkt0, pkt1, pkt2 delivered; pkt3 lost (X); new pkt0 sent; receiver will accept packet with seq number 0
(b) oops!: pkt0, pkt1, pkt2 delivered; ACKs lost (X); timeout, retransmit pkt0; receiver will accept packet with seq number 0
receiver can’t see sender side. receiver behavior identical in both cases! something’s (very) wrong!
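
One way to explore the question is to compare the sequence numbers the sender might still retransmit with the ones the receiver now treats as new. The sketch below does this for the worst case where a full window was received but every ACK was lost; the function name is illustrative, and the conclusion it points to (window size at most half the sequence-number space) is the standard textbook answer rather than something stated on the slide.

```python
def receiver_confuses_old_and_new(seq_space, window):
    """True if a retransmitted old pkt can look like a new one.

    Worst case: the receiver got pkts 0..window-1 and advanced its window,
    but all ACKs were lost, so the sender may still resend 0..window-1.
    """
    old = {n % seq_space for n in range(0, window)}
    new = {n % seq_space for n in range(window, 2 * window)}
    return bool(old & new)


slide_example = receiver_confuses_old_and_new(seq_space=4, window=3)
smaller_window = receiver_confuses_old_and_new(seq_space=4, window=2)
print(slide_example, smaller_window)
```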
Transport Layer 3-55
Chapter 3 outline
3.1 transport-layer services
3.2 multiplexing and demultiplexing
3.3 connectionless transport: UDP
3.4 principles of reliable data transfer
3.5 connection-oriented transport: TCP
  • segment structure
  • reliable data transfer
  • flow control
  • connection management
3.6 principles of congestion control
3.7 TCP congestion control

Transport Layer 3-56


TCP: Overview RFCs: 793,1122,1323, 2018, 2581

▪ point-to-point:
  • one sender, one receiver
▪ reliable, in-order byte stream:
  • no “message boundaries”
▪ pipelined:
  • TCP congestion and flow control set window size
▪ full duplex data:
  • bi-directional data flow in same connection
  • MSS: maximum segment size
▪ connection-oriented:
  • handshaking (exchange of control msgs) inits sender, receiver state before data exchange
▪ flow controlled:
  • sender will not overwhelm receiver
Transport Layer 3-57
TCP segment structure
segment layout (32 bits wide):
▪ source port #, dest port #
▪ sequence number, acknowledgement number: counting by bytes of data (not segments!)
▪ head len, not used, flag bits U A P R S F, receive window: # bytes rcvr willing to accept
▪ checksum: Internet checksum (as in UDP); Urg data pointer
▪ options (variable length)
▪ application data (variable length)
flag bits:
▪ URG: urgent data (generally not used)
▪ ACK: ACK # valid
▪ PSH: push data now (generally not used)
▪ RST, SYN, FIN: connection estab (setup, teardown commands)

Transport Layer 3-58


TCP seq. numbers, ACKs
sequence numbers:
  • byte stream “number” of first byte in segment’s data
acknowledgements:
  • seq # of next byte expected from other side
  • cumulative ACK
Q: how receiver handles out-of-order segments
  • A: TCP spec doesn’t say, - up to implementor
[figure: outgoing segment from sender and incoming segment to sender (fields: source port #, dest port #, sequence number, acknowledgement number, rwnd, checksum, urg pointer); sender sequence number space with window size N: sent ACKed | sent, not-yet ACKed (“in-flight”) | usable but not yet sent | not usable]

Transport Layer 3-59


TCP seq. numbers, ACKs
simple telnet scenario (Host A, Host B):
▪ User types ‘C’: A → B: Seq=42, ACK=79, data = ‘C’
▪ host ACKs receipt of ‘C’, echoes back ‘C’: B → A: Seq=79, ACK=43, data = ‘C’
▪ host ACKs receipt of echoed ‘C’: A → B: Seq=43, ACK=80

Transport Layer 3-60


TCP round trip time, timeout
Q: how to set TCP timeout value?
▪ longer than RTT
  • but RTT varies
▪ too short: premature timeout, unnecessary retransmissions
▪ too long: slow reaction to segment loss
Q: how to estimate RTT?
▪ SampleRTT: measured time from segment transmission until ACK receipt
  • ignore retransmissions
▪ SampleRTT will vary, want estimated RTT “smoother”
  • average several recent measurements, not just current SampleRTT

Transport Layer 3-61


TCP round trip time, timeout
EstimatedRTT = (1- )*EstimatedRTT + *SampleRTT
▪ exponential weighted moving average
▪ influence of past sample decreases exponentially fast
▪ typical value:  = 0.125 RTT: gaia.cs.umass.edu to fantasia.eurecom.fr

350

RTT: gaia.cs.umass.edu to fantasia.eurecom.fr


RTT (milliseconds)

300

250
RTT (milliseconds)

200

sampleRTT
150

EstimatedRTT

100
1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106
time (seconnds)
time (seconds) Transport Layer 3-62
SampleRTT Estimated RTT
TCP round trip time, timeout
▪ timeout interval: EstimatedRTT plus “safety margin”
  • large variation in EstimatedRTT -> larger safety margin
▪ estimate SampleRTT deviation from EstimatedRTT:
    DevRTT = (1-β)*DevRTT + β*|SampleRTT-EstimatedRTT|
    (typically, β = 0.25)

TimeoutInterval = EstimatedRTT + 4*DevRTT
    (EstimatedRTT: estimated RTT; 4*DevRTT: “safety margin”)
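
The two moving averages and the timeout formula can be combined in a small estimator, sketched below. The class name is illustrative. Two details are assumptions borrowed from RFC 6298 rather than from the slides: DevRTT starts at half the first sample, and DevRTT is updated before EstimatedRTT so that it uses the previous estimate.

```python
class RTTEstimator:
    """EWMA-based RTT estimator with α = 0.125 and β = 0.25 by default."""

    def __init__(self, first_sample, alpha=0.125, beta=0.25):
        self.alpha = alpha
        self.beta = beta
        self.estimated_rtt = first_sample
        self.dev_rtt = first_sample / 2      # assumed initial value

    def update(self, sample_rtt):
        # Deviation first, so it is measured against the previous estimate.
        self.dev_rtt = ((1 - self.beta) * self.dev_rtt
                        + self.beta * abs(sample_rtt - self.estimated_rtt))
        self.estimated_rtt = ((1 - self.alpha) * self.estimated_rtt
                              + self.alpha * sample_rtt)

    def timeout_interval(self):
        return self.estimated_rtt + 4 * self.dev_rtt


est = RTTEstimator(first_sample=100.0)   # milliseconds
est.update(120.0)
print(est.estimated_rtt, est.dev_rtt, est.timeout_interval())
```

With a first sample of 100 ms followed by 120 ms, the estimate moves only to 102.5 ms, while the safety margin keeps the timeout well above either sample.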

* Check out the online interactive exercises for more examples: http://gaia.cs.umass.edu/kurose_ross/interactive/
Transport Layer 3-63


TCP reliable data transfer
▪ TCP creates rdt service on top of IP’s unreliable service
  • pipelined segments
  • cumulative acks
  • single retransmission timer
▪ retransmissions triggered by:
  • timeout events
  • duplicate acks
let’s initially consider simplified TCP sender:
  • ignore duplicate acks
  • ignore flow control, congestion control

Transport Layer 3-65


TCP sender events:
data rcvd from app:
▪ create segment with seq #
▪ seq # is byte-stream number of first data byte in segment
▪ start timer if not already running
  • think of timer as for oldest unacked segment
  • expiration interval: TimeOutInterval
timeout:
▪ retransmit segment that caused timeout
▪ restart timer
ack rcvd:
▪ if ack acknowledges previously unacked segments
  • update what is known to be ACKed
  • start timer if there are still unacked segments

Transport Layer 3-66


TCP sender (simplified)
initialization (Λ):
    NextSeqNum = InitialSeqNum
    SendBase = InitialSeqNum
state "wait for event":
▪ data received from application above
    create segment, seq. #: NextSeqNum
    pass segment to IP (i.e., “send”)
    NextSeqNum = NextSeqNum + length(data)
    if (timer currently not running)
        start timer
▪ timeout
    retransmit not-yet-acked segment with smallest seq. #
    start timer
▪ ACK received, with ACK field value y
    if (y > SendBase) {
        SendBase = y
        /* SendBase–1: last cumulatively ACKed byte */
        if (there are currently not-yet-acked segments)
            start timer
        else stop timer
    }
Transport Layer 3-67
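
The three events of the simplified sender could be written in Python roughly as below. The class name, the `send_fn` callback and the `timer_running` flag (a stand-in for a real timer) are assumptions made for this sketch; sequence numbers count bytes, as in the slides.

```python
class SimplifiedTCPSender:
    """Simplified TCP sender sketch: no flow or congestion control."""

    def __init__(self, initial_seq_num, send_fn):
        self.next_seq_num = initial_seq_num
        self.send_base = initial_seq_num
        self.unacked = {}               # seq # of first byte -> segment data
        self.timer_running = False
        self.send = send_fn             # hypothetical "pass segment to IP"

    def on_app_data(self, data):
        seq = self.next_seq_num
        self.unacked[seq] = data
        self.send(seq, data)
        self.next_seq_num += len(data)
        if not self.timer_running:
            self.timer_running = True   # start timer

    def on_timeout(self):
        if self.unacked:
            seq = min(self.unacked)     # not-yet-acked segment, smallest seq #
            self.send(seq, self.unacked[seq])
            self.timer_running = True   # start timer

    def on_ack(self, y):
        if y > self.send_base:
            self.send_base = y          # send_base - 1: last ACKed byte
            fully_acked = [s for s, d in self.unacked.items() if s + len(d) <= y]
            for seq in fully_acked:
                del self.unacked[seq]
            self.timer_running = bool(self.unacked)


log = []
tcp = SimplifiedTCPSender(92, lambda seq, data: log.append((seq, len(data))))
tcp.on_app_data(b"x" * 8)       # Seq=92, 8 bytes of data
tcp.on_app_data(b"y" * 20)      # Seq=100, 20 bytes of data
tcp.on_timeout()                # only the oldest segment (Seq=92) is resent
tcp.on_ack(120)                 # cumulative ACK covers both segments
print(log, tcp.send_base, tcp.timer_running)
```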
TCP: retransmission scenarios
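lost ACK scenario (Host A, Host B):
▪ A → B: Seq=92, 8 bytes of data
▪ B → A: ACK=100 (lost, X)
▪ timeout at A
▪ A → B: Seq=92, 8 bytes of data
▪ B → A: ACK=100
▪ SendBase=100

premature timeout (Host A, Host B):
▪ SendBase=92
▪ A → B: Seq=92, 8 bytes of data
▪ A → B: Seq=100, 20 bytes of data
▪ B → A: ACK=100, then ACK=120 (both still in transit when the timer expires)
▪ timeout at A
▪ A → B: Seq=92, 8 bytes of data
▪ SendBase=100, then SendBase=120
▪ B → A: ACK=120
▪ SendBase=120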


Transport Layer 3-68
TCP: retransmission scenarios
cumulative ACK (Host A, Host B):
▪ A → B: Seq=92, 8 bytes of data
▪ A → B: Seq=100, 20 bytes of data
▪ B → A: ACK=100 (lost, X)
▪ B → A: ACK=120 (arrives before timeout)
▪ A → B: Seq=120, 15 bytes of data
Transport Layer 3-69
TCP ACK generation [RFC 1122, RFC 2581]

event at receiver → TCP receiver action

▪ arrival of in-order segment with expected seq #. All data up to expected seq # already ACKed
  → delayed ACK. Wait up to 500ms for next segment. If no next segment, send ACK
▪ arrival of in-order segment with expected seq #. One other segment has ACK pending
  → immediately send single cumulative ACK, ACKing both in-order segments
▪ arrival of out-of-order segment higher-than-expect seq. # . Gap detected
  → immediately send duplicate ACK, indicating seq. # of next expected byte
▪ arrival of segment that partially or completely fills gap
  → immediate send ACK, provided that segment starts at lower end of gap

Transport Layer 3-70


TCP fast retransmit
▪ time-out period often relatively long:
  • long delay before resending lost packet
▪ detect lost segments via duplicate ACKs.
  • sender often sends many segments back-to-back
  • if segment is lost, there will likely be many duplicate ACKs.
TCP fast retransmit
if sender receives 3 duplicate ACKs for same data (“triple duplicate ACKs”), resend unacked segment with smallest seq #
▪ likely that unacked segment lost, so don’t wait for timeout
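
A duplicate-ACK counter is enough to trigger fast retransmit. The sketch below is illustrative only (the class name and return convention are invented here): the first ACK for a byte is "new", and the third repeat of it fires the retransmission.

```python
class DupAckDetector:
    """Counts duplicate ACKs and signals when fast retransmit should fire."""

    def __init__(self, send_base):
        self.send_base = send_base
        self.dup_acks = 0

    def on_ack(self, y):
        """Return True exactly when the third duplicate ACK arrives."""
        if y > self.send_base:
            self.send_base = y      # new ACK: advance and reset the counter
            self.dup_acks = 0
            return False
        self.dup_acks += 1          # same ACK value again: a duplicate
        return self.dup_acks == 3


detector = DupAckDetector(send_base=92)
# One new ACK=100, then three duplicates because the Seq=100 segment was lost.
fired = [detector.on_ack(100) for _ in range(4)]
print(fired)
```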

Transport Layer 3-71


TCP fast retransmit
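Host A, Host B:
▪ A → B: Seq=92, 8 bytes of data
▪ A → B: Seq=100, 20 bytes of data (lost, X)
▪ B → A: ACK=100, four times (the original ACK plus three duplicates)
▪ A → B: Seq=100, 20 bytes of data (resent before the timeout expires)

fast retransmit after sender receipt of triple duplicate ACK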
Transport Layer 3-72


TCP flow control
application may remove data from TCP socket buffers ….
… slower than TCP receiver is delivering (sender is sending)

flow control
receiver controls sender, so sender won’t overflow receiver’s buffer by transmitting too much, too fast
[figure: receiver protocol stack: application process (application), TCP socket receiver buffers (OS), TCP code, IP code, segments arriving from sender]

Transport Layer 3-74


TCP flow control
▪ receiver “advertises” free buffer space by including rwnd value in TCP header of receiver-to-sender segments
  • RcvBuffer size set via socket options (typical default is 4096 bytes)
  • many operating systems autoadjust RcvBuffer
▪ sender limits amount of unacked (“in-flight”) data to receiver’s rwnd value
▪ guarantees receive buffer will not overflow
[figure: receiver-side buffering: TCP segment payloads enter RcvBuffer, which holds buffered data plus rwnd (free buffer space); data leaves to application process]
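
In code, the receiver's advertisement and the sender's check might look like this. The function names and the `last_byte_rcvd` / `last_byte_read` parameters are assumptions for the sketch (they are the usual textbook bookkeeping variables, not names that appear on this slide).

```python
def advertised_rwnd(rcv_buffer, last_byte_rcvd, last_byte_read):
    """Free buffer space: RcvBuffer minus data buffered but not yet read."""
    return rcv_buffer - (last_byte_rcvd - last_byte_read)


def sender_may_send(last_byte_sent, last_byte_acked, rwnd, nbytes):
    """Allow nbytes more only if in-flight data stays within rwnd."""
    return (last_byte_sent - last_byte_acked) + nbytes <= rwnd


rwnd = advertised_rwnd(rcv_buffer=4096, last_byte_rcvd=3000, last_byte_read=1000)
ok_small = sender_may_send(5000, 4000, rwnd, nbytes=1000)
ok_large = sender_may_send(5000, 4000, rwnd, nbytes=1200)
print(rwnd, ok_small, ok_large)
```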
Transport Layer 3-75


Connection Management
before exchanging data, sender/receiver “handshake”:
▪ agree to establish connection (each knowing the other willing
to establish connection)
▪ agree on connection parameters

at each side (application above network):
  connection state: ESTAB
  connection variables:
    seq # client-to-server, server-to-client
    rcvBuffer size at server, client

client: Socket clientSocket = new Socket("hostname","port number");
server: Socket connectionSocket = welcomeSocket.accept();

Transport Layer 3-77


Agreeing to establish a connection

2-way handshake:
▪ “Let’s talk” → ESTAB; “OK” → ESTAB
▪ choose x; req_conn(x) → ESTAB; acc_conn(x) → ESTAB
Q: will 2-way handshake always work in network?
▪ variable delays
▪ retransmitted messages (e.g. req_conn(x)) due to message loss
▪ message reordering
▪ can’t “see” other side

Transport Layer 3-78


Agreeing to establish a connection
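2-way handshake failure scenarios:
scenario 1:
▪ client: choose x, send req_conn(x); server: ESTAB, send acc_conn(x); client: ESTAB
▪ client: retransmit req_conn(x)
▪ connection x completes: client terminates, server forgets x
▪ retransmitted req_conn(x) arrives; server: ESTAB
▪ half open connection! (no client!)
scenario 2:
▪ client: choose x, send req_conn(x); server: ESTAB, send acc_conn(x); client: ESTAB
▪ client: send data(x+1); server: accept data(x+1)
▪ client: retransmit req_conn(x), retransmit data(x+1)
▪ connection x completes: client terminates, server forgets x
▪ retransmitted req_conn(x) arrives; server: ESTAB
▪ retransmitted data(x+1) arrives; server: accept data(x+1)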
Transport Layer 3-79
TCP 3-way handshake

client state LISTEN, server state LISTEN
▪ client: choose init seq num, x; send TCP SYN msg (SYNbit=1, Seq=x); client state SYNSENT
▪ server: choose init seq num, y; send TCP SYNACK msg, acking SYN (SYNbit=1, Seq=y, ACKbit=1; ACKnum=x+1); server state SYN RCVD
▪ client: received SYNACK(x) indicates server is live; send ACK for SYNACK (ACKbit=1, ACKnum=y+1); this segment may contain client-to-server data; client state ESTAB
▪ server: received ACK(y) indicates client is live; server state ESTAB

Transport Layer 3-80


TCP 3-way handshake: FSM

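state "closed":
▪ server: Socket connectionSocket = welcomeSocket.accept();
    Λ (no action)
    → listen
▪ client: Socket clientSocket = new Socket("hostname","port number");
    send SYN(seq=x)
    → SYN sent
state "listen":
▪ receive SYN(x)
    send SYNACK(seq=y,ACKnum=x+1)
    create new socket for communication back to client
    → SYN rcvd
state "SYN rcvd":
▪ receive ACK(ACKnum=y+1)
    Λ (no action)
    → ESTAB
state "SYN sent":
▪ receive SYNACK(seq=y,ACKnum=x+1)
    send ACK(ACKnum=y+1)
    → ESTAB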

Transport Layer 3-81


TCP: closing a connection
▪ client, server each close their side of connection
• send TCP segment with FIN bit = 1
▪ respond to received FIN with ACK
• on receiving FIN, ACK can be combined with own FIN
▪ simultaneous FIN exchanges can be handled

Transport Layer 3-82


TCP: closing a connection
client state ESTAB, server state ESTAB
▪ client: clientSocket.close(); send FINbit=1, seq=x; client state FIN_WAIT_1: can no longer send but can receive data
▪ server: state CLOSE_WAIT; send ACKbit=1; ACKnum=x+1; can still send data
▪ client: state FIN_WAIT_2: wait for server close
▪ server: send FINbit=1, seq=y; state LAST_ACK: can no longer send data
▪ client: state TIMED_WAIT; send ACKbit=1; ACKnum=y+1; timed wait for 2*max segment lifetime, then CLOSED
▪ server: on receiving the ACK, CLOSED

Transport Layer 3-83




Principles of congestion control
congestion:
▪ informally: “too many sources sending too much
data too fast for network to handle”
▪ different from flow control!
▪ manifestations:
• lost packets (buffer overflow at routers)
• long delays (queueing in router buffers)
▪ a top-10 problem!

Transport Layer 3-85


Causes/costs of congestion: scenario 1
▪ two senders, two receivers
▪ one router, infinite buffers
▪ output link capacity: R
▪ no retransmission
[figure: Host A and Host B send original data λin into unlimited shared output link buffers; throughput: λout; plots of λout vs. λin and of delay vs. λin, with λin up to R/2]
▪ maximum per-connection throughput: R/2
❖ large delays as arrival rate, λin, approaches capacity
Transport Layer 3-86
Causes/costs of congestion: scenario 2
▪ one router, finite buffers
▪ sender retransmission of timed-out packet
  • application-layer input = application-layer output: λin = λout
  • transport-layer input includes retransmissions: λ'in ≥ λin
[figure: Host A and Host B with finite shared output link buffers; λin: original data; λ'in: original data, plus retransmitted data; λout]
Transport Layer 3-87
Causes/costs of congestion: scenario 2
idealization: perfect knowledge
▪ sender sends only when router buffers available
[figure: plot of λout vs. λin, both up to R/2; Host A keeps a copy and sends when there is free buffer space in the finite shared output link buffers; λin: original data; λ'in: original data, plus retransmitted data; λout]
Transport Layer 3-88
Causes/costs of congestion: scenario 2
Idealization: known loss
packets can be lost, dropped at router due to full buffers
▪ sender only resends if packet known to be lost
[figure: Host A keeps a copy; no buffer space! at the router; λin: original data; λ'in: original data, plus retransmitted data; λout]
Transport Layer 3-89
Causes/costs of congestion: scenario 2
Idealization: known loss
packets can be lost, dropped at router due to full buffers
▪ sender only resends if packet known to be lost
when sending at R/2, some packets are retransmissions but asymptotic goodput is still R/2 (why?)
[figure: plot of λout vs. λin, both up to R/2; free buffer space! at the router; λin: original data; λ'in: original data, plus retransmitted data; λout]
Transport Layer 3-90
Causes/costs of congestion: scenario 2
Realistic: duplicates
▪ packets can be lost, dropped at router due to full buffers
▪ sender times out prematurely, sending two copies, both of which are delivered
when sending at R/2, some packets are retransmissions including duplicated that are delivered!
[figure: plot of λout vs. λin, both up to R/2; Host A times out and resends its copy although there is free buffer space!; λin, λ'in, λout]
Transport Layer 3-91
Causes/costs of congestion: scenario 2
Realistic: duplicates
▪ packets can be lost, dropped at router due to full buffers
▪ sender times out prematurely, sending two copies, both of which are delivered
when sending at R/2, some packets are retransmissions including duplicated that are delivered!
[figure: plot of λout vs. λin, both up to R/2]

“costs” of congestion:
▪ more work (retrans) for given “goodput”
▪ unneeded retransmissions: link carries multiple copies of pkt
  • decreasing goodput

Transport Layer 3-92


Causes/costs of congestion: scenario 3
▪ four senders
▪ multihop paths
▪ timeout/retransmit
Q: what happens as λin and λin’ increase?
A: as red λin’ increases, all arriving blue pkts at upper queue are dropped, blue throughput → 0
[figure: Hosts A, B, C, D connected over multihop paths through routers with finite shared output link buffers; λin: original data; λ'in: original data, plus retransmitted data; λout]

Transport Layer 3-93


Causes/costs of congestion: scenario 3

[figure: plot of λout vs. λin’, both up to C/2]

another “cost” of congestion:
▪ when packet dropped, any “upstream” transmission capacity used for that packet was wasted!

Transport Layer 3-94




TCP congestion control: additive increase multiplicative decrease
▪ approach: sender increases transmission rate (window size), probing for usable bandwidth, until loss occurs
  • additive increase: increase cwnd by 1 MSS every RTT until loss detected
  • multiplicative decrease: cut cwnd in half after loss
[figure: AIMD saw tooth behavior: probing for bandwidth; cwnd: TCP sender congestion window size vs. time; additively increase window size … until loss occurs (then cut window in half)]
Transport Layer 3-96
TCP Congestion Control: details
▪ sender limits transmission:
    LastByteSent - LastByteAcked ≤ cwnd
▪ cwnd is dynamic, function of perceived network congestion
TCP sending rate:
▪ roughly: send cwnd bytes, wait RTT for ACKS, then send more bytes
    rate ≈ cwnd / RTT bytes/sec
[figure: sender sequence number space: last byte ACKed | sent, not-yet ACKed (“in-flight”), cwnd wide | last byte sent]
Transport Layer 3-97
TCP Slow Start
▪ when connection begins, increase rate exponentially until first loss event:
  • initially cwnd = 1 MSS
  • double cwnd every RTT
  • done by incrementing cwnd for every ACK received
▪ summary: initial rate is slow but ramps up exponentially fast
[figure: Host A to Host B timeline, with one RTT marked]

Transport Layer 3-98


TCP: detecting, reacting to loss
▪ loss indicated by timeout:
• cwnd set to 1 MSS;
• window then grows exponentially (as in slow start)
to threshold, then grows linearly
▪ loss indicated by 3 duplicate ACKs: TCP RENO
• dup ACKs indicate network capable of delivering
some segments
• cwnd is cut in half; window then grows linearly
▪ TCP Tahoe always sets cwnd to 1 (timeout or 3
duplicate acks)

Transport Layer 3-99


TCP: switching from slow start to CA
Q: when should the exponential increase switch to linear?
A: when cwnd gets to 1/2 of its value before timeout.

Implementation:
▪ variable ssthresh
▪ on loss event, ssthresh is set to 1/2 of cwnd just before loss event

* Check out the online interactive exercises for more examples: http://gaia.cs.umass.edu/kurose_ross/interactive/
Transport Layer 3-100
Summary: TCP Congestion Control
initialization (Λ):
    cwnd = 1 MSS
    ssthresh = 64 KB
    dupACKcount = 0
state "slow start":
▪ duplicate ACK
    dupACKcount++
▪ new ACK
    cwnd = cwnd+MSS
    dupACKcount = 0
    transmit new segment(s), as allowed
▪ cwnd > ssthresh
    Λ (no action)
    → congestion avoidance
▪ timeout
    ssthresh = cwnd/2
    cwnd = 1 MSS
    dupACKcount = 0
    retransmit missing segment
▪ dupACKcount == 3
    ssthresh= cwnd/2
    cwnd = ssthresh + 3
    retransmit missing segment
    → fast recovery
state "congestion avoidance":
▪ new ACK
    cwnd = cwnd + MSS * (MSS/cwnd)
    dupACKcount = 0
    transmit new segment(s), as allowed
▪ duplicate ACK
    dupACKcount++
▪ timeout
    ssthresh = cwnd/2
    cwnd = 1 MSS
    dupACKcount = 0
    retransmit missing segment
    → slow start
▪ dupACKcount == 3
    ssthresh= cwnd/2
    cwnd = ssthresh + 3
    retransmit missing segment
    → fast recovery
state "fast recovery":
▪ duplicate ACK
    cwnd = cwnd + MSS
    transmit new segment(s), as allowed
▪ New ACK
    cwnd = ssthresh
    dupACKcount = 0
    → congestion avoidance
▪ timeout
    ssthresh = cwnd/2
    cwnd = 1
    dupACKcount = 0
    retransmit missing segment
    → slow start
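
The three-state machine can be sketched in Python as below. The class and method names are illustrative, cwnd is kept in bytes, and the demo uses an assumed MSS of 1000 bytes for easy arithmetic. Where the slide writes "cwnd = ssthresh + 3", this sketch assumes the intent is 3 MSS.

```python
class TCPCongestionControl:
    """Sketch of slow start, congestion avoidance and fast recovery."""

    def __init__(self, mss):
        self.mss = mss
        self.state = "slow_start"
        self.cwnd = mss                 # 1 MSS
        self.ssthresh = 64 * 1024       # 64 KB
        self.dup_ack_count = 0

    def on_new_ack(self):
        if self.state == "slow_start":
            self.cwnd += self.mss                   # doubles cwnd per RTT
            if self.cwnd > self.ssthresh:
                self.state = "congestion_avoidance"
        elif self.state == "congestion_avoidance":
            self.cwnd += self.mss * self.mss / self.cwnd   # about 1 MSS per RTT
        else:                                       # fast recovery
            self.cwnd = self.ssthresh
            self.state = "congestion_avoidance"
        self.dup_ack_count = 0

    def on_duplicate_ack(self):
        """Return True when the missing segment should be retransmitted."""
        if self.state == "fast_recovery":
            self.cwnd += self.mss
            return False
        self.dup_ack_count += 1
        if self.dup_ack_count == 3:
            self.ssthresh = self.cwnd / 2
            self.cwnd = self.ssthresh + 3 * self.mss   # assumed: 3 MSS
            self.state = "fast_recovery"
            return True
        return False

    def on_timeout(self):
        self.ssthresh = self.cwnd / 2
        self.cwnd = self.mss
        self.dup_ack_count = 0
        self.state = "slow_start"


cc = TCPCongestionControl(mss=1000)
for _ in range(3):
    cc.on_new_ack()                                  # slow start: cwnd 1000 -> 4000
retransmit = [cc.on_duplicate_ack() for _ in range(3)]
after_dup = (cc.state, cc.cwnd, cc.ssthresh)
cc.on_new_ack()                                      # leave fast recovery
cc.on_new_ack()                                      # additive increase step
after_ca = (cc.state, cc.cwnd)
cc.on_timeout()
after_timeout = (cc.state, cc.cwnd, cc.ssthresh)
print(after_dup, after_ca, after_timeout)
```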

Transport Layer 3-101


TCP throughput
▪ avg. TCP thruput as function of window size, RTT?
• ignore slow start, assume always data to send
▪ W: window size (measured in bytes) where loss occurs
  • avg. window size (# in-flight bytes) is ¾ W
  • avg. thruput is 3/4W per RTT
    avg TCP thruput = (3/4) * W / RTT bytes/sec
[figure: sawtooth of window size oscillating between W/2 and W]
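
The ¾ W figure can be checked numerically by averaging a window that ramps linearly from W/2 up to W. This is only a sanity check under that idealized sawtooth assumption; the function names and the sample values of W and RTT are made up for the example.

```python
def average_sawtooth_window(w_bytes, steps=1000):
    """Mean of a window growing linearly from W/2 to W in equal steps."""
    samples = [w_bytes / 2 + i * (w_bytes / 2) / steps for i in range(steps + 1)]
    return sum(samples) / len(samples)


def avg_tcp_throughput(w_bytes, rtt_s):
    """avg TCP thruput = (3/4) * W / RTT, in bytes/sec."""
    return 0.75 * w_bytes / rtt_s


W = 120_000          # illustrative window size at loss, in bytes
avg_window = average_sawtooth_window(W)
thruput = avg_tcp_throughput(W, rtt_s=0.1)
print(avg_window, thruput)
```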

Transport Layer 3-102


TCP Futures: TCP over “long, fat pipes”

▪ example: 1500 byte segments, 100ms RTT, want 10 Gbps throughput
▪ requires W = 83,333 in-flight segments
▪ throughput in terms of segment loss probability, L [Mathis 1997]:
    TCP throughput = 1.22 * MSS / (RTT * sqrt(L))
➜ to achieve 10 Gbps throughput, need a loss rate of L = 2·10^-10 – a very small loss rate!
▪ new versions of TCP for high-speed

Transport Layer 3-103


TCP Fairness
fairness goal: if K TCP sessions share same
bottleneck link of bandwidth R, each should have
average rate of R/K

TCP connection 1

bottleneck
router
capacity R
TCP connection 2

Transport Layer 3-104


Why is TCP fair?
two competing sessions:
▪ additive increase gives slope of 1, as throughput increases
▪ multiplicative decrease decreases throughput proportionally
[figure: throughput of the two connections plotted against each other (horizontal axis: Connection 1 throughput, up to R), with the equal bandwidth share line; the trajectory alternates "congestion avoidance: additive increase" and "loss: decrease window by factor of 2"]
Transport Layer 3-105
Fairness (more)
Fairness and UDP
▪ multimedia apps often do not use TCP
  • do not want rate throttled by congestion control
▪ instead use UDP:
  • send audio/video at constant rate, tolerate packet loss
Fairness, parallel TCP connections
▪ application can open multiple parallel connections between two hosts
▪ web browsers do this
▪ e.g., link of rate R with 9 existing connections:
  • new app asks for 1 TCP, gets rate R/10
  • new app asks for 11 TCPs, gets R/2
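
The parallel-connection example is a per-connection fair-share calculation, sketched below with an illustrative function name. It assumes every connection gets an equal share of R, so an application's share is simply its fraction of all connections.

```python
from fractions import Fraction


def app_share(existing_connections, new_app_connections):
    """Fraction of link rate R the new app gets under per-connection fairness."""
    total = existing_connections + new_app_connections
    return Fraction(new_app_connections, total)


one_conn = app_share(9, 1)
eleven_conns = app_share(9, 11)
print(one_conn, eleven_conns)
```

With 11 connections the new app holds 11 of 20 connections, slightly more than the R/2 quoted on the slide.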

Transport Layer 3-106


Explicit Congestion Notification (ECN)
network-assisted congestion control:
▪ two bits in IP header (ToS field) marked by network router
to indicate congestion
▪ congestion indication carried to receiving host
▪ receiver (seeing congestion indication in IP datagram) sets ECE bit on receiver-to-sender ACK segment to notify sender of congestion
[figure: source and destination protocol stacks (application, transport, network, link, physical); IP datagram marked ECN=00 at the source and ECN=11 after the congested router; TCP ACK segment returns with ECE=1]
Transport Layer 3-107
Chapter 3: summary
▪ principles behind transport layer services:
  • multiplexing, demultiplexing
  • reliable data transfer
  • flow control
  • congestion control
▪ instantiation, implementation in the Internet
  • UDP
  • TCP
next:
▪ leaving the network “edge” (application, transport layers)
▪ into the network “core”
▪ two network layer chapters:
  • data plane
  • control plane
Transport Layer 3-108
Chapter 4
Network Layer: The Data Plane

A note on the use of these Powerpoint slides:
We’re making these slides freely available to all (faculty, students, readers).
They’re in PowerPoint form so you see the animations; and can add, modify,
and delete slides (including this one) and slide content to suit your needs.
They obviously represent a lot of work on our part. In return for use, we only
ask the following:
▪ If you use these slides (e.g., in a class) that you mention their source
(after all, we’d like people to use our book!)
▪ If you post any slides on a www site, that you note that they are adapted
from (or perhaps identical to) our slides, and note our copyright of this
material.
Thanks and enjoy! JFK/KWR
All material copyright 1996-2016
J.F Kurose and K.W. Ross, All Rights Reserved

Computer Networking: A Top Down Approach
7th Edition, Global Edition
Jim Kurose, Keith Ross
Pearson
April 2016
Network Layer: Data Plane 4-1
Chapter 4: outline
4.1 Overview of Network layer
• data plane
• control plane
4.2 What’s inside a router
4.3 IP: Internet Protocol
• datagram format
• fragmentation
• IPv4 addressing
• network address translation
• IPv6
4.4 Generalized Forwarding and SDN
• match
• action
• OpenFlow examples of match-plus-action in action

Network Layer: Data Plane 4-2


Chapter 4: network layer
chapter goals:
▪ understand principles behind network layer
services, focusing on data plane:
• network layer service models
• forwarding versus routing
• how a router works
• generalized forwarding
▪ instantiation, implementation in the Internet

Network Layer: Data Plane 4-3


Network layer
▪ transport segment from sending to receiving host
▪ on sending side encapsulates segments into datagrams
▪ on receiving side, delivers segments to transport layer
▪ network layer protocols in every host, router
▪ router examines header fields in all IP datagrams passing through it
[Figure: end hosts with full protocol stacks (application, transport, network, data link, physical) connected through routers that implement only the network, data link, and physical layers]
Network Layer: Data Plane 4-4
Two key network-layer functions

network-layer functions:
▪ forwarding: move packets from router’s input to appropriate router output
▪ routing: determine route taken by packets from source to destination
• routing algorithms
analogy: taking a trip
▪ forwarding: process of getting through single interchange
▪ routing: process of planning trip from source to destination

Network Layer: Data Plane 4-5


Network layer: data plane, control plane

Data plane
▪ local, per-router function
▪ determines how datagram arriving on router input port is forwarded to router output port
▪ forwarding function
Control plane
▪ network-wide logic
▪ determines how datagram is routed among routers along end-end path from source host to destination host
▪ two control-plane approaches:
• traditional routing algorithms: implemented in routers
• software-defined networking (SDN): implemented in (remote) servers
[Figure: router with output ports 1, 2, 3; values in the arriving packet header (0111) select the output port]

Network Layer: Data Plane 4-6


Per-router control plane
Individual routing algorithm components in each and every
router interact in the control plane

[Figure: each router runs a routing algorithm in its control plane; in the data plane, values in the arriving packet header (0111) select one of output ports 1, 2, 3]

Network Layer: Control Plane 5-7


Logically centralized control plane
A distinct (typically remote) controller interacts with local
control agents (CAs)

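[Figure: a remote controller in the control plane communicates with a control agent (CA) in each router; in the data plane, values in the arriving packet header (0111) select one of output ports 1, 2, 3]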

Network Layer: Control Plane 5-8


Network service model
Q: What service model for “channel” transporting
datagrams from sender to receiver?
example services for individual datagrams:
▪ guaranteed delivery
▪ guaranteed delivery with less than 40 msec delay
example services for a flow of datagrams:
▪ in-order datagram delivery
▪ guaranteed minimum bandwidth to flow
▪ restrictions on changes in inter-packet spacing

Network Layer: Data Plane 4-9


Network layer service models:
(Bandwidth, Loss, Order, Timing columns answer: Guarantees?)
Network Architecture | Service Model | Bandwidth | Loss | Order | Timing | Congestion feedback
Internet | best effort | none | no | no | no | no (inferred via loss)
ATM | CBR | constant rate | yes | yes | yes | no congestion
ATM | VBR | guaranteed rate | yes | yes | yes | no congestion
ATM | ABR | guaranteed minimum | no | yes | no | yes
ATM | UBR | none | no | yes | no | no

Network Layer: Data Plane 4-10




Router architecture overview
▪ high-level view of generic router architecture:
• routing, management control plane (software): routing processor, operates in millisecond time frame
• forwarding data plane (hardware): operates in nanosecond timeframe
[Figure: router input ports and router output ports connected by a high-speed switching fabric, with the routing processor above]

Network Layer: Data Plane 4-12


Input port functions
[Figure: input port pipeline: line termination → link layer protocol (receive) → lookup, forwarding, queueing → switch fabric]
physical layer:
bit-level reception
data link layer:
e.g., Ethernet
see chapter 5
decentralized switching:
▪ using header field values, lookup output port using forwarding table in input port memory (“match plus action”)
▪ goal: complete input port processing at ‘line speed’
▪ queuing: if datagrams arrive faster than forwarding rate into switch fabric
Network Layer: Data Plane 4-13
Input port functions
[Figure: input port pipeline: line termination → link layer protocol (receive) → lookup, forwarding, queueing → switch fabric]
physical layer:
bit-level reception
data link layer:
e.g., Ethernet
see chapter 5
decentralized switching:
▪ using header field values, lookup output port using forwarding table in input port memory (“match plus action”)
▪ destination-based forwarding: forward based only on destination IP address (traditional)
▪ generalized forwarding: forward based on any set of header field values

Network Layer: Data Plane 4-14


Destination-based forwarding
forwarding table
Destination Address Range | Link Interface
11001000 00010111 00010000 00000000 through 11001000 00010111 00010111 11111111 | 0
11001000 00010111 00011000 00000000 through 11001000 00010111 00011000 11111111 | 1
11001000 00010111 00011001 00000000 through 11001000 00010111 00011111 11111111 | 2
otherwise | 3

Q: but what happens if ranges don’t divide up so nicely?


Network Layer: Data Plane 4-15
Longest prefix matching
when looking for forwarding table entry for given
destination address, use longest address prefix that
matches destination address.

Destination Address Range | Link interface
11001000 00010111 00010*** ******** | 0
11001000 00010111 00011000 ******** | 1
11001000 00010111 00011*** ******** | 2
otherwise | 3

examples:
DA: 11001000 00010111 00010110 10100001 which interface?
DA: 11001000 00010111 00011000 10101010 which interface?
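
The two "which interface?" questions can be checked with a short sketch. It assumes the table above is stored as bit-string prefixes with the wildcard bits left off; the names FORWARDING_TABLE, DEFAULT_INTERFACE, and lookup are illustrative only. A real router would do this lookup in hardware rather than with a linear scan.

```python
# Hypothetical in-memory copy of the slide's table: (prefix bits, interface).
# Wildcard ('*') bits are omitted, so each prefix is only its fixed bits.
FORWARDING_TABLE = [
    ("11001000 00010111 00010", 0),
    ("11001000 00010111 00011000", 1),
    ("11001000 00010111 00011", 2),
]
DEFAULT_INTERFACE = 3  # the "otherwise" row


def lookup(dest_bits: str) -> int:
    """Return the interface of the longest prefix that matches dest_bits."""
    addr = dest_bits.replace(" ", "")
    best_len, best_interface = -1, DEFAULT_INTERFACE
    for prefix, interface in FORWARDING_TABLE:
        bits = prefix.replace(" ", "")
        if addr.startswith(bits) and len(bits) > best_len:
            best_len, best_interface = len(bits), interface
    return best_interface


first = lookup("11001000 00010111 00010110 10100001")
second = lookup("11001000 00010111 00011000 10101010")
print(first, second)  # prints "0 1"
```

The second address matches both the 21-bit prefix (interface 2) and the 24-bit prefix (interface 1); the longer match wins.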
Network Layer: Data Plane 4-16
Longest prefix matching
▪ we’ll see why longest prefix matching is used
shortly, when we study addressing
▪ longest prefix matching: often performed using
ternary content addressable memories (TCAMs)
• content addressable: present address to TCAM: retrieve
address in one clock cycle, regardless of table size
• Cisco Catalyst: can hold up to ~1M routing table entries in
TCAM

Network Layer: Data Plane 4-17


Switching fabrics
▪ transfer packet from input buffer to appropriate
output buffer
▪ switching rate: rate at which packets can be
transferred from inputs to outputs
• often measured as multiple of input/output line rate
• N inputs: switching rate N times line rate desirable
▪ three types of switching fabrics: memory, bus, crossbar

Network Layer: Data Plane 4-18


Switching via memory
first generation routers:
▪ traditional computers with switching under direct control
of CPU
▪ packet copied to system’s memory
▪ speed limited by memory bandwidth (2 bus crossings per
datagram)

[Figure: input port (e.g., Ethernet) and output port (e.g., Ethernet) connected to memory over the system bus]

Network Layer: Data Plane 4-19


Switching via a bus
▪ datagram from input port memory
to output port memory via a
shared bus
▪ bus contention: switching speed
limited by bus bandwidth
▪ 32 Gbps bus, Cisco 5600: sufficient bus
speed for access and enterprise
routers

Network Layer: Data Plane 4-20


Switching via interconnection network
▪ overcome bus bandwidth limitations
▪ banyan networks, crossbar, other
interconnection nets initially
developed to connect processors in
multiprocessor
▪ advanced design: fragmenting
datagram into fixed length cells,
switch cells through the fabric.
▪ Cisco 12000: switches 60 Gbps
through the interconnection
network

Network Layer: Data Plane 4-21


Input port queuing
▪ fabric slower than input ports combined -> queueing may
occur at input queues
• queueing delay and loss due to input buffer overflow!
▪ Head-of-the-Line (HOL) blocking: queued datagram at front
of queue prevents others in queue from moving forward

[Figure: two snapshots of input queues feeding the switch fabric]
output port contention: only one red datagram can be transferred. lower red packet is blocked
one packet time later: green packet experiences HOL blocking

Network Layer: Data Plane 4-22


Output ports
This slide is HUGELY important!
[Figure: output port pipeline: switch fabric → datagram buffer, queueing → link layer protocol (send) → line termination]
▪ buffering required when datagrams arrive from fabric faster than the transmission rate
• datagrams (packets) can be lost due to congestion, lack of buffers
▪ scheduling discipline chooses among queued datagrams for transmission
• priority scheduling: who gets best performance, network neutrality

Network Layer: Data Plane 4-23


Output port queueing

[Figure: two snapshots of the switch fabric and an output port queue: at t, packets move from input to output; one packet time later]
▪ buffering when arrival rate via switch exceeds output line speed
▪ queueing (delay) and loss due to output port buffer overflow!
Network Layer: Data Plane 4-24
How much buffering?
▪ RFC 3439 rule of thumb: average buffering equal
to “typical” RTT (say 250 msec) times link
capacity C
• e.g., C = 10 Gbps link: 2.5 Gbit buffer
▪ recent recommendation: with N flows, buffering
equal to RTT · C / √N
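
A quick arithmetic sketch of the two sizing rules, taking the recent recommendation as RTT · C / √N. The function names and the flow count of 10,000 are assumptions chosen for illustration; they do not come from the slides.

```python
import math


def buffer_rule_of_thumb(rtt_s: float, capacity_bps: float) -> float:
    """Rule of thumb: buffer (bits) = RTT * C."""
    return rtt_s * capacity_bps


def buffer_with_flows(rtt_s: float, capacity_bps: float, n_flows: int) -> float:
    """Recommendation with N flows: buffer (bits) = RTT * C / sqrt(N)."""
    return rtt_s * capacity_bps / math.sqrt(n_flows)


classic = buffer_rule_of_thumb(0.250, 10e9)      # 250 msec RTT, 10 Gbps link
shared = buffer_with_flows(0.250, 10e9, 10_000)  # N = 10,000 is a made-up example
print(f"{classic / 1e9:.1f} Gbit, {shared / 1e6:.0f} Mbit")
```

With these assumed numbers the rule of thumb reproduces the 2.5 Gbit figure on the slide, while the N-flow rule asks for a buffer one hundred times smaller.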

Network Layer: Data Plane 4-25


Scheduling mechanisms
▪ scheduling: choose next packet to send on link
▪ FIFO (first in first out) scheduling: send in order of
arrival to queue
• real-world example?
• discard policy: if packet arrives to full queue: who to discard?
• tail drop: drop arriving packet
• priority: drop/remove on priority basis
• random: drop/remove randomly

[Figure: packet arrivals → queue (waiting area) → link (server) → packet departures]

Network Layer: Data Plane 4-26


Scheduling policies: priority
priority scheduling: send highest priority queued packet
▪ multiple classes, with different priorities
• class may depend on marking or other header info, e.g. IP source/dest, port numbers, etc.
• real world example?
[Figure: arrivals are classified into a high priority queue (waiting area) and a low priority queue (waiting area) feeding the link (server); timeline with arrivals 1 to 5, packet in service, and departures in the order 1 3 2 4 5]

Network Layer: Data Plane 4-27


Scheduling policies: still more
Round Robin (RR) scheduling:
▪ multiple classes
▪ cyclically scan class queues, sending one complete
packet from each class (if available)
▪ real world example?
[Figure: timeline with arrivals 1 to 5, packet in service, and departures]

Network Layer: Data Plane 4-28


Scheduling policies: still more
Weighted Fair Queuing (WFQ):
▪ generalized Round Robin
▪ each class gets weighted amount of service in
each cycle
▪ real-world example?
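
The priority and round-robin disciplines above can be compared with a small sketch. It makes a simplifying assumption that is not in the slides: every packet is already queued before service starts, so arrival times play no role. The function names and packet labels are hypothetical.

```python
from collections import deque


def priority_order(high, low):
    """Serve every queued high-priority packet before any low-priority one."""
    high, low = deque(high), deque(low)
    order = []
    while high or low:
        order.append(high.popleft() if high else low.popleft())
    return order


def round_robin_order(queues):
    """Cycle over the class queues, sending one packet from each non-empty queue."""
    queues = [deque(q) for q in queues]
    order = []
    while any(queues):
        for q in queues:
            if q:
                order.append(q.popleft())
    return order


prio = priority_order(["H1", "H2"], ["L1", "L2", "L3"])
rr = round_robin_order([["A1", "A2", "A3"], ["B1"]])
print(prio)  # ['H1', 'H2', 'L1', 'L2', 'L3']
print(rr)    # ['A1', 'B1', 'A2', 'A3']
```

Weighted fair queuing would generalize round_robin_order by letting each class send an amount of traffic proportional to its weight in every cycle.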

Network Layer: Data Plane 4-29




The Internet network layer
host, router network layer functions:
[Figure: the network layer sits between the transport layer (TCP, UDP) above and the link layer and physical layer below; it contains the components listed here]
routing protocols
• path selection
• RIP, OSPF, BGP
forwarding table
IP protocol
• addressing conventions
• datagram format
• packet handling conventions
ICMP protocol
• error reporting
• router “signaling”

Network Layer: Data Plane 4-31


IP datagram format
IP datagram header fields (each row 32 bits wide):
• ver: IP protocol version number
• head. len: header length (bytes)
• type of service: “type” of data
• length: total datagram length (bytes)
• 16-bit identifier, flgs, fragment offset: for fragmentation/reassembly
• time to live: max number remaining hops (decremented at each router)
• upper layer: upper layer protocol to deliver payload to
• header checksum
• 32 bit source IP address
• 32 bit destination IP address
• options (if any): e.g. timestamp, record route taken, specify list of routers to visit.
• data (variable length, typically a TCP or UDP segment)
how much overhead?
❖ 20 bytes of TCP
❖ 20 bytes of IP
❖ = 40 bytes + app layer overhead
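
As a rough illustration of the header layout, the sketch below unpacks the 20-byte fixed part of an IPv4 header with Python's struct module. The sample header is fabricated for the example (its checksum is left at 0 rather than computed), and parse_ipv4_header is a hypothetical helper name.

```python
import socket
import struct

# Fixed 20-byte IPv4 header: ver/len, ToS, total length, identifier,
# flags + fragment offset, TTL, upper-layer protocol, checksum, src, dst.
IPV4_FORMAT = "!BBHHHBBH4s4s"


def parse_ipv4_header(raw: bytes) -> dict:
    (ver_ihl, tos, total_len, ident, flags_frag,
     ttl, proto, checksum, src, dst) = struct.unpack(IPV4_FORMAT, raw[:20])
    return {
        "version": ver_ihl >> 4,
        "header_len_bytes": (ver_ihl & 0x0F) * 4,  # field counts 32-bit words
        "type_of_service": tos,
        "total_length": total_len,
        "identifier": ident,
        "flags": flags_frag >> 13,
        "fragment_offset": flags_frag & 0x1FFF,    # in units of 8 bytes
        "ttl": ttl,
        "upper_layer_protocol": proto,             # 6 = TCP, 17 = UDP
        "header_checksum": checksum,
        "source": socket.inet_ntoa(src),
        "destination": socket.inet_ntoa(dst),
    }


# Made-up sample: 20-byte IP header carrying a 20-byte TCP segment with no data.
sample = struct.pack(IPV4_FORMAT, 0x45, 0, 40, 0x1234, 0, 64, 6, 0,
                     socket.inet_aton("223.1.1.1"), socket.inet_aton("223.1.2.2"))
header = parse_ipv4_header(sample)
print(header["version"], header["header_len_bytes"], header["total_length"])
```

The total length of 40 in the sample corresponds to the "40 bytes + app layer overhead" line above with an empty application payload.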

Network Layer: Data Plane 4-32


IP fragmentation, reassembly
▪ network links have MTU (max. transfer size) - largest possible link-level frame
• different link types, different MTUs
▪ large IP datagram divided (“fragmented”) within net
• one datagram becomes several datagrams
• “reassembled” only at final destination
• IP header bits used to identify, order related fragments
[Figure: fragmentation: in: one large datagram, out: 3 smaller datagrams; reassembly at the final destination]
Network Layer: Data Plane 4-33
IP fragmentation, reassembly
example:
❖ 4000 byte datagram
❖ MTU = 1500 bytes
one large datagram becomes several smaller datagrams:
original datagram: length =4000, ID =x, fragflag =0, offset =0
fragment 1: length =1500, ID =x, fragflag =1, offset =0
fragment 2: length =1500, ID =x, fragflag =1, offset =185
fragment 3: length =1040, ID =x, fragflag =0, offset =370
1480 bytes in data field; offset = 1480/8
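
The numbers in this example can be reproduced with a short sketch. It assumes a 20-byte IP header with no options on every fragment; the function name fragment and its dictionary keys are illustrative.

```python
def fragment(total_length: int, mtu: int, header_len: int = 20) -> list:
    """Split a datagram into fragments that fit the MTU.

    Every fragment except the last carries a multiple of 8 data bytes,
    because the offset field counts 8-byte units.
    """
    payload = total_length - header_len
    max_data = (mtu - header_len) // 8 * 8
    fragments, sent = [], 0
    while sent < payload:
        data = min(max_data, payload - sent)
        more_fragments = 1 if sent + data < payload else 0
        fragments.append({
            "length": data + header_len,
            "fragflag": more_fragments,
            "offset": sent // 8,
        })
        sent += data
    return fragments


frags = fragment(4000, 1500)
for f in frags:
    print(f)
```

For the 4000-byte datagram this yields lengths 1500, 1500, 1040 with offsets 0, 185, 370, matching the slide: the 3980 data bytes are split into 1480 + 1480 + 1020.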

Network Layer: Data Plane 4-34




IP addressing: introduction
▪ IP address: 32-bit identifier for host, router interface
▪ interface: connection between host/router and physical link
• routers typically have multiple interfaces
• host typically has one or two interfaces (e.g., wired Ethernet, wireless 802.11)
▪ IP addresses associated with each interface
223.1.1.1 = 11011111 00000001 00000001 00000001 (223 . 1 . 1 . 1)
[Figure: a router joining three groups of interfaces: 223.1.1.1, 223.1.1.2, 223.1.1.3, 223.1.1.4; 223.1.2.1, 223.1.2.2, 223.1.2.9; 223.1.3.1, 223.1.3.2, 223.1.3.27]

Network Layer: Data Plane 4-36


IP addressing: introduction
Q: how are interfaces actually connected?
A: we’ll learn about that in chapter 5, 6.
A: wired Ethernet interfaces connected by Ethernet switches
A: wireless WiFi interfaces connected by WiFi base station
For now: don’t need to worry about how one interface is connected to another (with no intervening router)
[Figure: network with interfaces 223.1.1.1 to 223.1.1.4; 223.1.2.1, 223.1.2.2, 223.1.2.9; 223.1.3.1, 223.1.3.2, 223.1.3.27]

Network Layer: Data Plane 4-37


Subnets
▪ IP address:
• subnet part - high order bits
• host part - low order bits
▪ what’s a subnet?
• device interfaces with same subnet part of IP address
• can physically reach each other without intervening router
[Figure: network consisting of 3 subnets, with interfaces 223.1.1.1 to 223.1.1.4; 223.1.2.1, 223.1.2.2, 223.1.2.9; 223.1.3.1, 223.1.3.2, 223.1.3.27]

Network Layer: Data Plane 4-38


Subnets
recipe
▪ to determine the subnets, detach each interface from its host or router, creating islands of isolated networks
▪ each isolated network is called a subnet
subnet mask: /24
[Figure: the three subnets 223.1.1.0/24, 223.1.2.0/24, and 223.1.3.0/24]


Network Layer: Data Plane 4-39
Subnets
how many?
[Figure: three interconnected routers; interfaces 223.1.1.1 to 223.1.1.4; 223.1.2.1, 223.1.2.2, 223.1.2.6; 223.1.3.1, 223.1.3.2, 223.1.3.27; router-to-router links with interfaces 223.1.7.0 and 223.1.7.1, 223.1.8.0 and 223.1.8.1, 223.1.9.1 and 223.1.9.2]

Network Layer: Data Plane 4-40


IP addressing: CIDR
CIDR: Classless InterDomain Routing
• subnet portion of address of arbitrary length
• address format: a.b.c.d/x, where x is # bits in
subnet portion of address

subnet part (first 23 bits): 11001000 00010111 0001000
host part (last 9 bits): 0 00000000
200.23.16.0/23
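
The a.b.c.d/x notation can be explored with Python's standard ipaddress module. This is only a sketch for checking the example block; the two test addresses are made up.

```python
import ipaddress

net = ipaddress.ip_network("200.23.16.0/23")

# x = 23 bits of subnet part leaves 32 - 23 = 9 host bits, so 2**9 addresses.
netmask = str(net.netmask)
size = net.num_addresses

# The first 23 bits of the 32-bit address form the subnet part.
subnet_bits = format(int(net.network_address), "032b")[:net.prefixlen]

inside = ipaddress.ip_address("200.23.17.42") in net   # same 23-bit prefix
outside = ipaddress.ip_address("200.23.18.1") in net   # differs in bit 23

print(netmask, size, subnet_bits, inside, outside)
```

The /23 prefix corresponds to the netmask 255.255.254.0 and covers 200.23.16.0 through 200.23.17.255.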

Network Layer: Data Plane 4-41


IP addresses: how to get one?
Q: How does a host get IP address?

▪ hard-coded by system admin in a file
• Windows: control-panel->network->configuration->tcp/ip->properties
• UNIX: /etc/rc.config
▪ DHCP: Dynamic Host Configuration Protocol:
dynamically get address from a server
• “plug-and-play”

Network Layer: Data Plane 4-42


DHCP: Dynamic Host Configuration Protocol
goal: allow host to dynamically obtain its IP address from network
server when it joins network
• can renew its lease on address in use
• allows reuse of addresses (only hold address while
connected/“on”)
• support for mobile users who want to join network (more
shortly)
DHCP overview:
• host broadcasts “DHCP discover” msg [optional]
• DHCP server responds with “DHCP offer” msg [optional]
• host requests IP address: “DHCP request” msg
• DHCP server sends address: “DHCP ack” msg

Network Layer: Data Plane 4-43


DHCP client-server scenario

[Figure: network with subnets 223.1.1.0/24, 223.1.2.0/24, and 223.1.3.0/24; a DHCP server, and an arriving DHCP client that needs an address in this network]

Network Layer: Data Plane 4-44


DHCP client-server scenario
DHCP server: 223.1.2.5; arriving client

DHCP discover (Broadcast: is there a DHCP server out there?)
src: 0.0.0.0, 68
dest: 255.255.255.255, 67
yiaddr: 0.0.0.0
transaction ID: 654

DHCP offer (Broadcast: I’m a DHCP server! Here’s an IP address you can use)
src: 223.1.2.5, 67
dest: 255.255.255.255, 68
yiaddr: 223.1.2.4
transaction ID: 654
lifetime: 3600 secs

DHCP request (Broadcast: OK. I’ll take that IP address!)
src: 0.0.0.0, 68
dest: 255.255.255.255, 67
yiaddr: 223.1.2.4
transaction ID: 655
lifetime: 3600 secs

DHCP ACK (Broadcast: OK. You’ve got that IP address!)
src: 223.1.2.5, 67
dest: 255.255.255.255, 68
yiaddr: 223.1.2.4
transaction ID: 655
lifetime: 3600 secs

Network Layer: Data Plane 4-45


DHCP: more than IP addresses
DHCP can return more than just allocated IP
address on subnet:
• address of first-hop router for client
• name and IP address of DNS server
• network mask (indicating network versus host portion
of address)

Network Layer: Data Plane 4-46


DHCP: example
▪ connecting laptop needs its IP address, addr of first-hop router, addr of DNS server: use DHCP
▪ DHCP request encapsulated in UDP, encapsulated in IP, encapsulated in 802.3 Ethernet
▪ Ethernet frame broadcast (dest: FFFFFFFFFFFF) on LAN, received at router running DHCP server
▪ Ethernet demuxed to IP demuxed, UDP demuxed to DHCP
[Figure: DHCP message passing down the laptop’s stack (DHCP, UDP, IP, Eth, Phy) and up the same stack at the router (168.1.1.1) with DHCP server built into router]

Network Layer: Data Plane 4-47


DHCP: example
▪ DHCP server formulates DHCP ACK containing client’s IP address, IP address of first-hop router for client, name & IP address of DNS server
▪ encapsulation at DHCP server, frame forwarded to client, demuxing up to DHCP at client
▪ client now knows its IP address, name and IP address of DNS server, IP address of its first-hop router
[Figure: DHCP ACK passing down the stack (DHCP, UDP, IP, Eth, Phy) at the router with DHCP server built into router, and up the same stack at the client]

Network Layer: Data Plane 4-48


DHCP: Wireshark output (home LAN)

request
Message type: Boot Request (1)
Hardware type: Ethernet
Hardware address length: 6
Hops: 0
Transaction ID: 0x6b3a11b7
Seconds elapsed: 0
Bootp flags: 0x0000 (Unicast)
Client IP address: 0.0.0.0 (0.0.0.0)
Your (client) IP address: 0.0.0.0 (0.0.0.0)
Next server IP address: 0.0.0.0 (0.0.0.0)
Relay agent IP address: 0.0.0.0 (0.0.0.0)
Client MAC address: Wistron_23:68:8a (00:16:d3:23:68:8a)
Server host name not given
Boot file name not given
Magic cookie: (OK)
Option: (t=53,l=1) DHCP Message Type = DHCP Request
Option: (61) Client identifier
Length: 7; Value: 010016D323688A;
Hardware type: Ethernet
Client MAC address: Wistron_23:68:8a (00:16:d3:23:68:8a)
Option: (t=50,l=4) Requested IP Address = 192.168.1.101
Option: (t=12,l=5) Host Name = "nomad"
Option: (55) Parameter Request List
Length: 11; Value: 010F03062C2E2F1F21F92B
1 = Subnet Mask; 15 = Domain Name
3 = Router; 6 = Domain Name Server
44 = NetBIOS over TCP/IP Name Server
……

reply
Message type: Boot Reply (2)
Hardware type: Ethernet
Hardware address length: 6
Hops: 0
Transaction ID: 0x6b3a11b7
Seconds elapsed: 0
Bootp flags: 0x0000 (Unicast)
Client IP address: 192.168.1.101 (192.168.1.101)
Your (client) IP address: 0.0.0.0 (0.0.0.0)
Next server IP address: 192.168.1.1 (192.168.1.1)
Relay agent IP address: 0.0.0.0 (0.0.0.0)
Client MAC address: Wistron_23:68:8a (00:16:d3:23:68:8a)
Server host name not given
Boot file name not given
Magic cookie: (OK)
Option: (t=53,l=1) DHCP Message Type = DHCP ACK
Option: (t=54,l=4) Server Identifier = 192.168.1.1
Option: (t=1,l=4) Subnet Mask = 255.255.255.0
Option: (t=3,l=4) Router = 192.168.1.1
Option: (6) Domain Name Server
Length: 12; Value: 445747E2445749F244574092;
IP Address: 68.87.71.226;
IP Address: 68.87.73.242;
IP Address: 68.87.64.146
Option: (t=15,l=20) Domain Name = "hsd1.ma.comcast.net."

Network Layer: Data Plane 4-49


IP addresses: how to get one?
Q: how does network get subnet part of IP addr?
A: gets allocated portion of its provider ISP’s address
space

ISP's block | 11001000 00010111 00010000 00000000 | 200.23.16.0/20
Organization 0 | 11001000 00010111 00010000 00000000 | 200.23.16.0/23
Organization 1 | 11001000 00010111 00010010 00000000 | 200.23.18.0/23
Organization 2 | 11001000 00010111 00010100 00000000 | 200.23.20.0/23
... | ….. | ….
Organization 7 | 11001000 00010111 00011110 00000000 | 200.23.30.0/23
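
The carve-up of the ISP's block can be reproduced with the standard ipaddress module. This sketch assumes the ISP simply splits its /20 into equal /23 blocks in address order, which is what the table shows; the variable names are illustrative.

```python
import ipaddress

isp_block = ipaddress.ip_network("200.23.16.0/20")

# Splitting a /20 into /23 blocks gives 2**(23 - 20) = 8 organizations.
org_blocks = list(isp_block.subnets(new_prefix=23))

for number, block in enumerate(org_blocks):
    print(f"Organization {number}: {block}")

# Every organization's block lies inside the ISP's block.
all_inside = all(block.subnet_of(isp_block) for block in org_blocks)
```

Organization 0 receives 200.23.16.0/23 and Organization 7 receives 200.23.30.0/23, as in the table.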

Network Layer: Data Plane 4-50


Hierarchical addressing: route aggregation
hierarchical addressing allows efficient advertisement of routing
information:

[Figure: Organization 0 (200.23.16.0/23), Organization 1 (200.23.18.0/23), Organization 2 (200.23.20.0/23), ..., Organization 7 (200.23.30.0/23) connect to Fly-By-Night-ISP; Fly-By-Night-ISP and ISPs-R-Us connect to the Internet]
Fly-By-Night-ISP advertises: “Send me anything with addresses beginning 200.23.16.0/20”
ISPs-R-Us advertises: “Send me anything with addresses beginning 199.31.0.0/16”

Network Layer: Data Plane 4-51


Hierarchical addressing: more specific routes

ISPs-R-Us has a more specific route to Organization 1

[Figure: Organization 0 (200.23.16.0/23), Organization 2 (200.23.20.0/23), ..., Organization 7 (200.23.30.0/23) connect to Fly-By-Night-ISP; Organization 1 (200.23.18.0/23) now connects to ISPs-R-Us; both ISPs connect to the Internet]
Fly-By-Night-ISP advertises: “Send me anything with addresses beginning 200.23.16.0/20”
ISPs-R-Us advertises: “Send me anything with addresses beginning 199.31.0.0/16 or 200.23.18.0/23”
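
A sketch of why the more specific advertisement wins: a router in the Internet that has heard all three prefixes picks the longest one that matches. The ROUTES list and the next_isp function are hypothetical, and the destination addresses are made up for the example.

```python
import ipaddress

# Advertisements heard from the two ISPs (prefix, advertising ISP).
ROUTES = [
    (ipaddress.ip_network("200.23.16.0/20"), "Fly-By-Night-ISP"),
    (ipaddress.ip_network("199.31.0.0/16"), "ISPs-R-Us"),
    (ipaddress.ip_network("200.23.18.0/23"), "ISPs-R-Us"),
]


def next_isp(dest: str):
    """Return the ISP whose advertised prefix is the longest match for dest."""
    addr = ipaddress.ip_address(dest)
    matches = [(net.prefixlen, name) for net, name in ROUTES if addr in net]
    return max(matches)[1] if matches else None


org1 = next_isp("200.23.18.5")   # inside Organization 1's block
org2 = next_isp("200.23.20.9")   # inside Organization 2's block
print(org1, org2)
```

An address in Organization 1 matches both 200.23.16.0/20 and 200.23.18.0/23; the /23 is longer, so traffic goes to ISPs-R-Us, while Organization 2's traffic still follows the aggregate to Fly-By-Night-ISP.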

Network Layer: Data Plane 4-52


IP addressing: the last word...

Q: how does an ISP get block of addresses?


A: ICANN: Internet Corporation for Assigned
Names and Numbers http://www.icann.org/
• allocates addresses
• manages DNS
• assigns domain names, resolves disputes

Network Layer: Data Plane 4-53


NAT: network address translation
[Figure: NAT router between the rest of the Internet (router address 138.76.29.7) and the local network (e.g., home network) 10.0.0.0/24 (router address 10.0.0.4), with local hosts 10.0.0.1, 10.0.0.2, 10.0.0.3]
all datagrams leaving local network have same single source NAT IP address: 138.76.29.7, different source port numbers
datagrams with source or destination in this network have 10.0.0.0/24 address for source, destination (as usual)
Network Layer: Data Plane 4-54
NAT: network address translation
motivation: local network uses just one IP address as far
as outside world is concerned:
▪ range of addresses not needed from ISP: just one
IP address for all devices
▪ can change addresses of devices in local network
without notifying outside world
▪ can change ISP without changing addresses of
devices in local network
▪ devices inside local net not explicitly addressable,
visible by outside world (a security plus)

Network Layer: Data Plane 4-55


NAT: network address translation
implementation: NAT router must:

▪ outgoing datagrams: replace (source IP address, port #) of every outgoing datagram with (NAT IP address, new port #)
. . . remote clients/servers will respond using (NAT IP address, new port #) as destination addr
▪ remember (in NAT translation table) every (source IP address, port #) to (NAT IP address, new port #) translation pair
▪ incoming datagrams: replace (NAT IP address, new port #) in dest fields of every incoming datagram with corresponding (source IP address, port #) stored in NAT table
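
The three duties above can be sketched as a small class. This is a simplified model under stated assumptions: NatRouter is a hypothetical name, new ports are handed out sequentially starting at 5001, entries never expire, and datagrams are represented only by their (address, port) pairs.

```python
class NatRouter:
    """Toy NAT translation table keyed by (IP address, port #) pairs."""

    def __init__(self, wan_ip: str, first_port: int = 5001):
        self.wan_ip = wan_ip
        self.next_port = first_port  # assumption: sequential port allocation
        self.out_map = {}            # (LAN IP, LAN port) -> new WAN port
        self.in_map = {}             # new WAN port -> (LAN IP, LAN port)

    def outgoing(self, src, dst):
        """Rewrite the source of an outgoing datagram, remembering the pair."""
        if src not in self.out_map:
            self.out_map[src] = self.next_port
            self.in_map[self.next_port] = src
            self.next_port += 1
        return (self.wan_ip, self.out_map[src]), dst

    def incoming(self, src, dst):
        """Rewrite the destination of an incoming datagram using the table."""
        wan_ip, wan_port = dst
        if wan_ip != self.wan_ip or wan_port not in self.in_map:
            return None  # no table entry: this sketch drops the datagram
        return src, self.in_map[wan_port]


nat = NatRouter("138.76.29.7")
sent = nat.outgoing(("10.0.0.1", 3345), ("128.119.40.186", 80))
reply = nat.incoming(("128.119.40.186", 80), ("138.76.29.7", 5001))
print(sent)
print(reply)
```

The unsolicited-datagram case returning None also hints at the NAT traversal problem: without a table entry, an outside client has no way to reach a server behind the NAT.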

Network Layer: Data Plane 4-56


NAT: network address translation
NAT translation table
WAN side addr | LAN side addr
138.76.29.7, 5001 | 10.0.0.1, 3345
…… | ……
1: host 10.0.0.1 sends datagram to 128.119.40.186, 80 (S: 10.0.0.1, 3345; D: 128.119.40.186, 80)
2: NAT router changes datagram source addr from 10.0.0.1, 3345 to 138.76.29.7, 5001, updates table (S: 138.76.29.7, 5001; D: 128.119.40.186, 80)
3: reply arrives dest. address: 138.76.29.7, 5001 (S: 128.119.40.186, 80; D: 138.76.29.7, 5001)
4: NAT router changes datagram dest addr from 138.76.29.7, 5001 to 10.0.0.1, 3345 (S: 128.119.40.186, 80; D: 10.0.0.1, 3345)

* Check out the online interactive exercises for more examples: http://gaia.cs.umass.edu/kurose_ross/interactive/
Network Layer: Data Plane 4-57
NAT: network address translation
▪ 16-bit port-number field:
• 60,000 simultaneous connections with a single
LAN-side address!
▪ NAT is controversial:
• routers should only process up to layer 3
• address shortage should be solved by IPv6
• violates end-to-end argument
• NAT possibility must be taken into account by app
designers, e.g., P2P applications
• NAT traversal: what if client wants to connect
to server behind NAT?
Network Layer: Data Plane 4-58


IPv6: motivation
▪ initial motivation: 32-bit address space soon to be
completely allocated.
▪ additional motivation:
• header format helps speed processing/forwarding
• header changes to facilitate QoS

IPv6 datagram format:


• fixed-length 40 byte header
• no fragmentation allowed

Network Layer: Data Plane 4-60


IPv6 datagram format
priority: identify priority among datagrams in flow
flow label: identify datagrams in same “flow” (concept of “flow” not well defined).
next header: identify upper layer protocol for data
header layout (each row 32 bits wide):
ver | pri | flow label
payload len | next hdr | hop limit
source address (128 bits)
destination address (128 bits)
data
Network Layer: Data Plane 4-61
Other changes from IPv4
▪ checksum: removed entirely to reduce processing
time at each hop
▪ options: allowed, but outside of header, indicated
by “Next Header” field
▪ ICMPv6: new version of ICMP
• additional message types, e.g. “Packet Too Big”
• multicast group management functions

Network Layer: Data Plane 4-62


Transition from IPv4 to IPv6
▪ not all routers can be upgraded simultaneously
• no “flag days”
• how will network operate with mixed IPv4 and
IPv6 routers?
▪ tunneling: IPv6 datagram carried as payload in IPv4
datagram among IPv4 routers
[Figure: an IPv4 datagram (IPv4 header fields, IPv4 source, dest addr) whose IPv4 payload is an entire IPv6 datagram (IPv6 header fields, IPv6 source, dest addr, UDP/TCP payload)]
Network Layer: Data Plane 4-63
Tunneling
logical view: A (IPv6), B (IPv6), IPv4 tunnel connecting IPv6 routers, E (IPv6), F (IPv6)
physical view: A (IPv6), B (IPv6), C (IPv4), D (IPv4), E (IPv6), F (IPv6)

Network Layer: Data Plane 4-64


Tunneling
logical view: A (IPv6), B (IPv6), IPv4 tunnel connecting IPv6 routers, E (IPv6), F (IPv6)
physical view: A (IPv6), B (IPv6), C (IPv4), D (IPv4), E (IPv6), F (IPv6)
A-to-B: IPv6 (flow: X, src: A, dest: F, data)
B-to-C: IPv6 inside IPv4 (IPv4 src: B, dest: E; payload is the IPv6 datagram with flow: X, src: A, dest: F, data)
D-to-E: IPv6 inside IPv4 (IPv4 src: B, dest: E; payload is the IPv6 datagram with flow: X, src: A, dest: F, data)
E-to-F: IPv6 (flow: X, src: A, dest: F, data)
Network Layer: Data Plane 4-65
IPv6: adoption
▪ Google: 8% of clients access services via IPv6
▪ NIST: 1/3 of all US government domains are IPv6
capable

▪ Long (long!) time for deployment, use
• 20 years and counting!
• think of application-level changes in last 20 years: WWW, Facebook, streaming media, Skype, …
• Why?

Network Layer: Data Plane 4-66




Generalized Forwarding and SDN
Each router contains a flow table that is computed and
distributed by a logically centralized routing controller

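[Figure: a logically-centralized routing controller in the control plane installs a local flow table (headers, counters, actions) in each data plane router; values in the arriving packet’s header (0100 1101) are matched against the table to select one of output ports 1, 2, 3]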
Network Layer: Data Plane 4-68
OpenFlow data plane abstraction
▪ flow: defined by header fields
▪ generalized forwarding: simple packet-handling rules
• Pattern: match values in packet header fields
• Actions: for matched packet: drop, forward, modify matched
packet, or send matched packet to controller
• Priority: disambiguate overlapping patterns
• Counters: #bytes and #packets

Flow table in a router (computed and distributed by
controller) defines router’s match+action rules
Network Layer: Data Plane 4-69
OpenFlow data plane abstraction
▪ flow: defined by header fields
▪ generalized forwarding: simple packet-handling rules
• Pattern: match values in packet header fields
• Actions: for matched packet: drop, forward, modify matched
packet, or send matched packet to controller
• Priority: disambiguate overlapping patterns
• Counters: #bytes and #packets

* : wildcard
1. src=1.2.*.*, dest=3.4.5.* → drop
2. src = *.*.*.*, dest=3.4.*.* → forward(2)
3. src=10.1.2.3, dest=*.*.*.* → send to controller
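
The three wildcard rules can be tried out with a minimal match-plus-action sketch. The priorities (3, 2, 1) are an assumption made here so that overlapping patterns are disambiguated in the order the rules are listed, and returning None on a table miss is also a simplification; FLOW_TABLE, field_matches, and action_for are hypothetical names, not OpenFlow API calls.

```python
def field_matches(pattern: str, address: str) -> bool:
    """Compare dotted addresses octet by octet; '*' matches any octet."""
    return all(p in ("*", a)
               for p, a in zip(pattern.split("."), address.split(".")))


# (priority, src pattern, dest pattern, action); higher priority wins.
FLOW_TABLE = [
    (3, "1.2.*.*", "3.4.5.*", "drop"),
    (2, "*.*.*.*", "3.4.*.*", "forward(2)"),
    (1, "10.1.2.3", "*.*.*.*", "send to controller"),
]


def action_for(src: str, dest: str):
    """Return the action of the highest-priority rule matching the packet."""
    for _, src_pat, dest_pat, action in sorted(FLOW_TABLE, reverse=True):
        if field_matches(src_pat, src) and field_matches(dest_pat, dest):
            return action
    return None  # table miss


a1 = action_for("1.2.9.9", "3.4.5.6")     # matches rules 1 and 2
a2 = action_for("7.7.7.7", "3.4.9.9")     # matches rule 2 only
a3 = action_for("10.1.2.3", "8.8.8.8")    # matches rule 3 only
print(a1, a2, a3, sep=" | ")
```

The first packet matches both rule 1 and rule 2; the priority field is what makes the switch drop it rather than forward it.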
OpenFlow: Flow Table Entries
Each entry has three parts: Rule, Action, Stats
Rule (header fields that can be matched):
• Link layer: Switch Port, VLAN ID, MAC src, MAC dst, Eth type
• Network layer: IP Src, IP Dst, IP Prot
• Transport layer: TCP sport, TCP dport
Action:
1. Forward packet to port(s)
2. Encapsulate and forward to controller
3. Drop packet
4. Send to normal processing pipeline
5. Modify Fields
Stats: Packet + byte counters


Examples
Header fields, in order: Switch Port | MAC src | MAC dst | Eth type | VLAN ID | IP Src | IP Dst | IP Prot | TCP sport | TCP dport
Destination-based forwarding:
* | * | * | * | * | * | 51.6.0.8 | * | * | * → Action: port6
IP datagrams destined to IP address 51.6.0.8 should
be forwarded to router output port 6
Firewall:
* | * | * | * | * | * | * | * | * | 22 → Forward: drop
do not forward (block) all datagrams destined to TCP port 22
* | * | * | * | * | 128.119.1.1 | * | * | * | * → Forward: drop
do not forward (block) all datagrams sent by host 128.119.1.1
Examples
Destination-based layer 2 (switch) forwarding:
Header fields, in order: Switch Port | MAC src | MAC dst | Eth type | VLAN ID | IP Src | IP Dst | IP Prot | TCP sport | TCP dport
* | 22:A7:23:11:E1:02 | * | * | * | * | * | * | * | * → Action: port3
layer 2 frames from MAC address 22:A7:23:11:E1:02
should be forwarded to output port 3

Network Layer: Data Plane 4-73


OpenFlow abstraction
▪ match+action: unifies different kinds of devices
▪ Router
• match: longest destination IP prefix
• action: forward out a link
▪ Switch
• match: destination MAC address
• action: forward or flood
▪ Firewall
• match: IP addresses and TCP/UDP port numbers
• action: permit or deny
▪ NAT
• match: IP address and port
• action: rewrite address and port

Network Layer: Data Plane 4-74


OpenFlow example
Example: datagrams from hosts h5 and h6 should be sent to h3 or h4, via s1 and from there to s2
Hosts: h1 10.1.0.1, h2 10.1.0.2, h3 10.2.0.3, h4 10.2.0.4, h5 10.3.0.5, h6 10.3.0.6
[Figure: controller connected to switches s1, s2, s3, each with ports 1 to 4]
s3 flow table:
match: IP Src = 10.3.*.*, IP Dst = 10.2.*.* → action: forward(3)
s1 flow table:
match: ingress port = 1, IP Src = 10.3.*.*, IP Dst = 10.2.*.* → action: forward(4)
s2 flow table:
match: ingress port = 2, IP Dst = 10.2.0.3 → action: forward(3)
match: ingress port = 2, IP Dst = 10.2.0.4 → action: forward(4)
Chapter 4: done!
4.1 Overview of Network layer: data plane and control plane
4.2 What’s inside a router
4.3 IP: Internet Protocol
• datagram format
• fragmentation
• IPv4 addressing
• NAT
• IPv6
4.4 Generalized Forwarding and SDN
• match plus action
• OpenFlow example

Question: how are forwarding tables (destination-based forwarding) or flow tables (generalized forwarding) computed?
Answer: by the control plane (next chapter)



Chapter 5
Network Layer:
The Control Plane

A note on the use of these Powerpoint slides:
We’re making these slides freely available to all (faculty, students, readers).
They’re in PowerPoint form so you see the animations; and can add, modify,
and delete slides (including this one) and slide content to suit your needs.
They obviously represent a lot of work on our part. In return for use, we only
ask the following:
▪ If you use these slides (e.g., in a class) that you mention their source
(after all, we’d like people to use our book!)
▪ If you post any slides on a www site, that you note that they are adapted
from (or perhaps identical to) our slides, and note our copyright of this
material.
Thanks and enjoy! JFK/KWR
All material copyright 1996-2016
J.F Kurose and K.W. Ross, All Rights Reserved

Computer Networking: A Top Down Approach
7th Edition, Global Edition
Jim Kurose, Keith Ross
Pearson
April 2016
Network Layer: Control Plane 5-1
Chapter 5: network layer control plane
chapter goals: understand principles behind network
control plane
▪ traditional routing algorithms
▪ SDN controllers
▪ Internet Control Message Protocol
▪ network management

and their instantiation, implementation in the Internet:


▪ OSPF, BGP, OpenFlow, ODL and ONOS
controllers, ICMP, SNMP



Chapter 5: outline
5.1 introduction
5.2 routing protocols
▪ link state
▪ distance vector
5.3 intra-AS routing in the Internet: OSPF
5.4 routing among the ISPs: BGP
5.5 The SDN control plane
5.6 ICMP: The Internet Control Message Protocol
5.7 Network management and SNMP



Network-layer functions
Recall: two network-layer functions:
▪ forwarding: move packets from router’s input to appropriate router output (data plane)
▪ routing: determine route taken by packets from source to destination (control plane)

Two approaches to structuring network control plane:


▪ per-router control (traditional)
▪ logically centralized control (software defined networking)



Per-router control plane
Individual routing algorithm components in each and every
router interact with each other in control plane to compute
forwarding tables

[figure: a Routing Algorithm component in each router’s control plane, sitting above the data plane]



Logically centralized control plane
A distinct (typically remote) controller interacts with local
control agents (CAs) in routers to compute forwarding tables

[figure: a Remote Controller in the control plane communicating with a control agent (CA) in each data-plane router]





Routing protocols

Routing protocol goal: determine “good” paths (equivalently, routes), from sending host to receiving host, through network of routers
▪ path: sequence of routers packets will traverse
in going from given initial source host to given
final destination host
▪ “good”: least “cost”, “fastest”, “least
congested”
▪ routing: a “top-10” networking challenge!



Graph abstraction of the network
[figure: six-node graph with routers u, v, w, x, y, z and link costs]
graph: G = (N,E)

N = set of routers = { u, v, w, x, y, z }

E = set of links = { (u,v), (u,x), (u,w), (v,x), (v,w), (x,w), (x,y), (w,y), (w,z), (y,z) }

aside: graph abstraction is useful in other network contexts, e.g., P2P, where N is set of peers and E is set of TCP connections



Graph abstraction: costs
[figure: the six-node graph (u, v, w, x, y, z) with link costs]
c(x,x’) = cost of link (x,x’), e.g., c(w,z) = 5
cost could always be 1, or inversely related to bandwidth, or directly related to congestion

cost of path (x1, x2, x3,…, xp) = c(x1,x2) + c(x2,x3) + … + c(xp-1,xp)

key question: what is the least-cost path between u and z ?
routing algorithm: algorithm that finds that least cost path
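As a small worked example of the path-cost formula, the sketch below sums link costs along a path. The `COST` table is an assumption: it is one reading of the six-node figure, chosen to agree with the values used in the Dijkstra and Bellman-Ford examples of this chapter. The names `c` and `path_cost` are illustrative.

```python
from math import inf

# Undirected link costs, as read from the six-node figure (assumed values).
COST = {
    ("u", "v"): 2, ("u", "x"): 1, ("u", "w"): 5,
    ("v", "x"): 2, ("v", "w"): 3, ("x", "w"): 3,
    ("x", "y"): 1, ("w", "y"): 1, ("w", "z"): 5, ("y", "z"): 2,
}

def c(a, b):
    """Cost of link (a, b); infinity if a and b are not direct neighbors."""
    return COST.get((a, b), COST.get((b, a), inf))

def path_cost(path):
    """Sum of the link costs along consecutive routers of the path."""
    return sum(c(a, b) for a, b in zip(path, path[1:]))

print(path_cost(["u", "x", "y", "z"]))  # 1 + 1 + 2
print(path_cost(["u", "w", "z"]))       # 5 + 5
```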



Routing algorithm classification
Q: global or decentralized information?
global:
▪ all routers have complete topology, link cost info
▪ “link state” algorithms
decentralized:
▪ router knows physically-connected neighbors, link costs to neighbors
▪ iterative process of computation, exchange of info with neighbors
▪ “distance vector” algorithms

Q: static or dynamic?
static:
▪ routes change slowly over time
dynamic:
▪ routes change more quickly
• periodic update
• in response to link cost changes


A link-state routing algorithm
Dijkstra’s algorithm
▪ net topology, link costs known to all nodes
• accomplished via “link state broadcast”
• all nodes have same info
▪ computes least cost paths from one node (“source”) to all other nodes
• gives forwarding table for that node
▪ iterative: after k iterations, know least cost path to k dest.’s

notation:
▪ c(x,y): link cost from node x to y; = ∞ if not direct neighbors
▪ D(v): current value of cost of path from source to dest. v
▪ p(v): predecessor node along path from source to v
▪ N': set of nodes whose least cost path definitively known
Dijkstra’s algorithm
Initialization:
  N' = {u}
  for all nodes v
    if v adjacent to u
      then D(v) = c(u,v)
    else D(v) = ∞

Loop
  find w not in N' such that D(w) is a minimum
  add w to N'
  update D(v) for all v adjacent to w and not in N' :
    D(v) = min( D(v), D(w) + c(w,v) )
  /* new cost to v is either old cost to v or known
     shortest path cost to w plus cost from w to v */
until all nodes in N'
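The pseudocode above can be turned into a short runnable sketch. Two assumptions are worth flagging: the `GRAPH` link costs are one reading of the six-node figure used in this chapter, and the sketch uses a binary heap to find the minimum D(w) instead of scanning all nodes, which is one of the "more efficient implementations" rather than the slide's literal loop. The names `D`, `p`, and `done` (for N') follow the slide notation.

```python
import heapq
from math import inf

# Adjacency map with link costs (assumed values read from the figure).
GRAPH = {
    "u": {"v": 2, "x": 1, "w": 5},
    "v": {"u": 2, "x": 2, "w": 3},
    "w": {"u": 5, "v": 3, "x": 3, "y": 1, "z": 5},
    "x": {"u": 1, "v": 2, "w": 3, "y": 1},
    "y": {"x": 1, "w": 1, "z": 2},
    "z": {"w": 5, "y": 2},
}

def dijkstra(graph, source):
    """Return (D, p): least costs from source and predecessor of each node."""
    D = {v: inf for v in graph}
    D[source] = 0
    p = {}
    done = set()                 # N': nodes whose least cost is final
    heap = [(0, source)]
    while heap:
        d, w = heapq.heappop(heap)
        if w in done:
            continue             # stale heap entry for an already final node
        done.add(w)
        for v, cost in graph[w].items():
            if v not in done and d + cost < D[v]:
                D[v] = d + cost  # D(v) = min(D(v), D(w) + c(w,v))
                p[v] = w
                heapq.heappush(heap, (D[v], v))
    return D, p

D, p = dijkstra(GRAPH, "u")
print(D)
print(p)
```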



Dijkstra’s algorithm: example
Step  N'       D(v),p(v)  D(w),p(w)  D(x),p(x)  D(y),p(y)  D(z),p(z)
0     u        7,u        3,u        5,u        ∞          ∞
1     uw       6,w                   5,u        11,w       ∞
2     uwx      6,w                              11,w       14,x
3     uwxv                                      10,v       14,x
4     uwxvy                                                12,y
5     uwxvyz

notes:
❖ construct shortest path tree by tracing predecessor nodes
❖ ties can exist (can be broken arbitrarily)

[figure: six-node graph (u, v, w, x, y, z) with link costs]
Dijkstra’s algorithm: another example
Step  N'       D(v),p(v)  D(w),p(w)  D(x),p(x)  D(y),p(y)  D(z),p(z)
0     u        2,u        5,u        1,u        ∞          ∞
1     ux       2,u        4,x                   2,x        ∞
2     uxy      2,u        3,y                              4,y
3     uxyv                3,y                              4,y
4     uxyvw                                                4,y
5     uxyvwz

[figure: the six-node graph (u, v, w, x, y, z) with link costs]

* Check out the online interactive exercises for more examples: http://gaia.cs.umass.edu/kurose_ross/interactive/
Dijkstra’s algorithm: example (2)
resulting shortest-path tree from u (links, from the predecessor entries of the table): (u,v), (u,x), (x,y), (y,w), (y,z)

resulting forwarding table in u:

destination  link
v            (u,v)
x            (u,x)
y            (u,x)
w            (u,x)
z            (u,x)
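The forwarding table follows from the predecessor values by walking back from each destination until the node just after the source is reached. A minimal sketch, with the predecessor map `p` copied from the final entries of the table in the example above; the helper name `first_hop` is illustrative.

```python
def first_hop(p, source, dest):
    """Walk predecessors back from dest; return the neighbor of source on the path."""
    node = dest
    while p[node] != source:
        node = p[node]
    return node

# Final predecessors from the example: p(v)=u, p(x)=u, p(y)=x, p(w)=y, p(z)=y.
p = {"v": "u", "x": "u", "y": "x", "w": "y", "z": "y"}

# Forwarding table in u: destination -> outgoing link (u, first hop).
table = {dest: ("u", first_hop(p, "u", dest)) for dest in sorted(p)}
print(table)
```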
Dijkstra’s algorithm, discussion
algorithm complexity: n nodes
▪ each iteration: need to check all nodes, w, not in N'
▪ n(n+1)/2 comparisons: O(n^2)
▪ more efficient implementations possible: O(n log n)
oscillations possible:
▪ e.g., suppose link cost equals amount of carried traffic:

[figure: four snapshots of routers A, B, C, D with traffic-dependent link costs (values such as 0, 1, e, 1+e, 2+e). Panel captions: “initially”, then three times “given these costs, find new routing…. resulting in new costs”]


Distance vector algorithm
Bellman-Ford equation (dynamic programming)

let
  dx(y) := cost of least-cost path from x to y
then
  dx(y) = min_v { c(x,v) + dv(y) }
where
  c(x,v) is the cost to neighbor v
  dv(y) is the cost from neighbor v to destination y
  the min is taken over all neighbors v of x


Bellman-Ford example
[figure: the six-node graph (u, v, w, x, y, z) with link costs]
clearly, dv(z) = 5, dx(z) = 3, dw(z) = 3
B-F equation says:
du(z) = min { c(u,v) + dv(z),
              c(u,x) + dx(z),
              c(u,w) + dw(z) }
      = min { 2 + 5,
              1 + 3,
              5 + 3 } = 4
node achieving minimum is next hop in shortest path, used in forwarding table
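The same calculation in code, as a small check of the arithmetic. The dictionaries hold exactly the values quoted in the example; the variable names are illustrative.

```python
# Costs from u to each of its neighbors, and each neighbor's least cost to z.
c_u = {"v": 2, "x": 1, "w": 5}
d_to_z = {"v": 5, "x": 3, "w": 3}

# Bellman-Ford equation: minimize c(u,v) + dv(z) over neighbors v of u.
next_hop = min(c_u, key=lambda v: c_u[v] + d_to_z[v])
d_u_z = c_u[next_hop] + d_to_z[next_hop]

print(next_hop, d_u_z)
```

The neighbor that achieves the minimum is the next hop that u would install in its forwarding table for destination z.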



Distance vector algorithm
▪ Dx(y) = estimate of least cost from x to y
• x maintains distance vector Dx = [Dx(y): y ∊ N ]
▪ node x:
• knows cost to each neighbor v: c(x,v)
• maintains its neighbors’ distance vectors. For each neighbor v, x maintains Dv = [Dv(y): y ∊ N ]



Distance vector algorithm
key idea:
▪ from time-to-time, each node sends its own
distance vector estimate to neighbors
▪ when x receives new DV estimate from neighbor,
it updates its own DV using B-F equation:
Dx(y) ← min_v { c(x,v) + Dv(y) } for each node y ∊ N

❖ under minor, natural conditions, the estimate Dx(y) converges to the actual least cost dx(y)



Distance vector algorithm
iterative, asynchronous: each local iteration caused by:
▪ local link cost change
▪ DV update message from neighbor
distributed:
▪ each node notifies neighbors only when its DV changes
• neighbors then notify their neighbors if necessary

each node:
  wait for (change in local link cost or msg from neighbor)
  recompute estimates
  if DV to any dest has changed, notify neighbors
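The per-node loop above can be simulated on the three-node network used in the example that follows (c(x,y) = 2, c(y,z) = 1, c(x,z) = 7). This is a simplified sketch: it runs synchronous rounds in which every node applies the Bellman-Ford update to its neighbors' previous vectors, whereas the real algorithm is asynchronous and message-driven. The names `LINKS`, `dv_round`, and `rounds` are illustrative.

```python
from math import inf

LINKS = {("x", "y"): 2, ("y", "z"): 1, ("x", "z"): 7}
NODES = ["x", "y", "z"]

def cost(a, b):
    """Direct link cost; 0 to self, infinity if not neighbors."""
    if a == b:
        return 0
    return LINKS.get((a, b), LINKS.get((b, a), inf))

def neighbors(a):
    return [b for b in NODES if b != a and cost(a, b) < inf]

def dv_round(dvs):
    """One synchronous round: each node recomputes its DV from neighbors' DVs."""
    new = {}
    for x in NODES:
        new[x] = {
            y: 0 if y == x else min(cost(x, v) + dvs[v][y] for v in neighbors(x))
            for y in NODES
        }
    return new

# Initially each node knows only the direct costs to its neighbors.
dvs = {x: {y: cost(x, y) for y in NODES} for x in NODES}
rounds = 0
while True:
    nxt = dv_round(dvs)
    if nxt == dvs:
        break            # no DV changed, so nobody would notify anybody
    dvs = nxt
    rounds += 1

print(rounds, dvs)
```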



Dx(y) = min{c(x,y) + Dy(y), c(x,z) + Dz(y)} = min{2+0 , 7+1} = 2
Dx(z) = min{c(x,y) + Dy(z), c(x,z) + Dz(z)} = min{2+1 , 7+0} = 3

[figure: three nodes with c(x,y) = 2, c(y,z) = 1, c(x,z) = 7]

Each table lists the cost to x, y, z (columns) as known from x, y, z (rows); time runs left to right.

node x table
         initial      after 1st exchange
from x   0  2  7      0  2  3
from y   ∞  ∞  ∞      2  0  1
from z   ∞  ∞  ∞      7  1  0

node y table
         initial
from x   ∞  ∞  ∞
from y   2  0  1
from z   ∞  ∞  ∞

node z table
         initial
from x   ∞  ∞  ∞
from y   ∞  ∞  ∞
from z   7  1  0
Dx(y) = min{c(x,y) + Dy(y), c(x,z) + Dz(y)} = min{2+0 , 7+1} = 2
Dx(z) = min{c(x,y) + Dy(z), c(x,z) + Dz(z)} = min{2+1 , 7+0} = 3

[figure: three nodes with c(x,y) = 2, c(y,z) = 1, c(x,z) = 7]

Each table lists the cost to x, y, z (columns) as known from x, y, z (rows); time runs left to right.

node x table
         initial      after 1st exchange   after 2nd exchange
from x   0  2  7      0  2  3              0  2  3
from y   ∞  ∞  ∞      2  0  1              2  0  1
from z   ∞  ∞  ∞      7  1  0              3  1  0

node y table
         initial      after 1st exchange   after 2nd exchange
from x   ∞  ∞  ∞      0  2  7              0  2  3
from y   2  0  1      2  0  1              2  0  1
from z   ∞  ∞  ∞      7  1  0              3  1  0

node z table
         initial      after 1st exchange   after 2nd exchange
from x   ∞  ∞  ∞      0  2  7              0  2  3
from y   ∞  ∞  ∞      2  0  1              2  0  1
from z   7  1  0      3  1  0              3  1  0
Distance vector: link cost changes
link cost changes:
❖ node detects local link cost change
❖ updates routing info, recalculates distance vector
❖ if DV changes, notify neighbors

[figure: three nodes; c(x,y) changes from 4 to 1, c(y,z) = 1, c(x,z) = 50]

“good news travels fast”
t0 : y detects link-cost change, updates its DV, informs its neighbors.
t1 : z receives update from y, updates its table, computes new least cost to x , sends its neighbors its DV.
t2 : y receives z’s update, updates its distance table. y’s least costs do not change, so y does not send a message to z.

* Check out the online interactive exercises for more examples: http://gaia.cs.umass.edu/kurose_ross/interactive/
Distance vector: link cost changes
link cost changes:
❖ node detects local link cost change
❖ bad news travels slow - “count to infinity” problem!
❖ 44 iterations before algorithm stabilizes: see text

[figure: three nodes; c(x,y) changes from 4 to 60, c(y,z) = 1, c(x,z) = 50]

poisoned reverse:
❖ If Z routes through Y to get to X :
▪ Z tells Y its (Z’s) distance to X is infinite (so Y won’t route to X via Z)
❖ will this completely solve count to infinity problem?
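Poisoned reverse only changes what a node advertises, not how it computes its own vector. A minimal sketch, assuming each node keeps a next-hop map alongside its distance vector; the function name `advertise` and the data layout are illustrative, and the values are z's state before the link-cost change (z reaches x through y at cost 5).

```python
from math import inf

def advertise(dv, next_hop, neighbor):
    """Distance vector to send to `neighbor`, with poisoned reverse applied.

    Any destination currently reached through `neighbor` is reported as
    infinitely far away, so `neighbor` will not route back through us.
    """
    return {
        dest: inf if dest != neighbor and next_hop.get(dest) == neighbor else d
        for dest, d in dv.items()
    }

z_dv = {"x": 5, "y": 1, "z": 0}      # z's current least costs
z_next_hop = {"x": "y", "y": "y"}    # z routes to x (and to y) via y

print(advertise(z_dv, z_next_hop, "y"))  # what z tells y
print(advertise(z_dv, z_next_hop, "x"))  # what z tells x
```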



Example:

Da(b) ← minc{cost(a,c) + Dc(b)} for each node b ∊ N



Comparison of LS and DV algorithms
message complexity
▪ LS: with n nodes, E links, O(nE) msgs sent
▪ DV: exchange between neighbors only
• convergence time varies

speed of convergence
▪ LS: O(n^2) algorithm requires O(nE) msgs
• may have oscillations
▪ DV: convergence time varies
• may be routing loops
• count-to-infinity problem

robustness: what happens if router malfunctions?
LS:
• node can advertise incorrect link cost
• each node computes only its own table
DV:
• DV node can advertise incorrect path cost
• each node’s table used by others
• errors propagate thru network



Making routing scalable
our routing study thus far - idealized
▪ all routers identical
▪ network “flat”
… not true in practice

scale: with billions of destinations:
▪ can’t store all destinations in routing tables!
▪ routing table exchange would swamp links!

administrative autonomy
▪ internet = network of networks
▪ each network admin may want to control routing in its own network



Internet approach to scalable routing
aggregate routers into regions known as “autonomous
systems” (AS) (a.k.a. “domains”)

intra-AS routing
▪ routing among hosts, routers in same AS (“network”)
▪ all routers in AS must run same intra-domain protocol
▪ routers in different AS can run different intra-domain routing protocol
▪ gateway router: at “edge” of its own AS, has link(s) to router(s) in other AS’es

inter-AS routing
▪ routing among AS’es
▪ gateways perform inter-domain routing (as well as intra-domain routing)
Interconnected ASes

[figure: AS1 (routers 1a, 1b, 1c, 1d), AS2 (2a, 2b, 2c) and AS3 (3a, 3b, 3c) interconnected; inset shows the intra-AS and inter-AS routing algorithms both feeding the forwarding table]

▪ forwarding table configured by both intra- and inter-AS routing algorithm
• intra-AS routing determines entries for destinations within AS
• inter-AS & intra-AS determine entries for external destinations
Inter-AS tasks
▪ suppose router in AS1 receives datagram destined outside of AS1:
• router should forward packet to gateway router, but which one?

AS1 must:
1. learn which dests are reachable through AS2, which through AS3
2. propagate this reachability info to all routers in AS1
job of inter-AS routing!

[figure: AS1 (1a, 1b, 1c, 1d) connected to AS2 (2a, 2b, 2c) and AS3 (3a, 3b, 3c), each of which connects to other networks]



Intra-AS Routing
▪ also known as interior gateway protocols (IGP)
▪ most common intra-AS routing protocols:
• RIP: Routing Information Protocol
• OSPF: Open Shortest Path First (IS-IS protocol
essentially same as OSPF)
• IGRP: Interior Gateway Routing Protocol
(Cisco proprietary for decades, until 2016)



OSPF (Open Shortest Path First)
▪ “open”: publicly available
▪ uses link-state algorithm
• link state packet dissemination
• topology map at each node
• route computation using Dijkstra’s algorithm
▪ router floods OSPF link-state advertisements to all
other routers in entire AS
• carried in OSPF messages directly over IP (rather than TCP or UDP)
• link state: for each attached link
▪ IS-IS routing protocol: nearly identical to OSPF



OSPF “advanced” features
▪ security: all OSPF messages authenticated (to prevent
malicious intrusion)
▪ multiple same-cost paths allowed (only one path in
RIP)
▪ for each link, multiple cost metrics for different ToS
(e.g., satellite link cost set low for best effort ToS;
high for real-time ToS)
▪ integrated uni- and multi-cast support:
• Multicast OSPF (MOSPF) uses same topology data
base as OSPF
▪ hierarchical OSPF in large domains.



Hierarchical OSPF
[figure: a backbone joined through area border routers to area 1, area 2 and area 3; labels: boundary router, backbone router, area border routers, internal routers]



Hierarchical OSPF
▪ two-level hierarchy: local area, backbone.
• link-state advertisements only in area
• each node has detailed area topology; only knows
direction (shortest path) to nets in other areas.
▪ area border routers: “summarize” distances to nets in
own area, advertise to other Area Border routers.
▪ backbone routers: run OSPF routing limited to
backbone.
▪ boundary routers: connect to other AS’es.



Internet inter-AS routing: BGP
▪ BGP (Border Gateway Protocol): the de facto
inter-domain routing protocol
• “glue that holds the Internet together”
▪ BGP provides each AS a means to:
• eBGP: obtain subnet reachability information from
neighboring ASes
• iBGP: propagate reachability information to all AS-
internal routers.
• determine “good” routes to other networks based on
reachability information and policy
▪ allows subnet to advertise its existence to rest of
Internet: “I am here”
eBGP, iBGP connections

[figure: AS 1 (routers 1a, 1b, 1c, 1d), AS 2 (2a, 2b, 2c, 2d) and AS 3 (3a, 3b, 3c, 3d); legend: eBGP connectivity between ASes, iBGP connectivity within an AS]

gateway routers (such as 1c) run both eBGP and iBGP protocols



BGP basics
▪ BGP session: two BGP routers (“peers”) exchange BGP
messages over semi-permanent TCP connection:
• advertising paths to different destination network prefixes
(BGP is a “path vector” protocol)
▪ when AS3 gateway router 3a advertises path AS3,X to AS2
gateway router 2c:
• AS3 promises to AS2 it will forward datagrams towards X

[figure: AS 1, AS 2 and AS 3, with destination X attached to AS 3; BGP advertisement “AS3, X” sent from 3a to 2c]
Path attributes and BGP routes
▪ advertised prefix includes BGP attributes
• prefix + attributes = “route”
▪ two important attributes:
• AS-PATH: list of ASes through which prefix advertisement
has passed
• NEXT-HOP: indicates specific internal-AS router to next-
hop AS
▪ Policy-based routing:
• gateway receiving route advertisement uses import policy to
accept/decline path (e.g., never route through AS Y).
• AS policy also determines whether to advertise path to
other neighboring ASes



BGP path advertisement
[figure: AS1, AS2 and AS3, with destination X attached to AS3; advertisement “AS3,X” goes from 3a to 2c, and “AS2,AS3,X” from 2a to 1c]

▪ AS2 router 2c receives path advertisement AS3,X (via eBGP) from AS3
router 3a
▪ Based on AS2 policy, AS2 router 2c accepts path AS3,X, propagates
(via iBGP) to all AS2 routers
▪ Based on AS2 policy, AS2 router 2a advertises (via eBGP) path AS2,
AS3, X to AS1 router 1c
BGP path advertisement
[figure: AS1, AS2 and AS3, with destination X attached to AS3; 1c receives “AS2,AS3,X” from 2a and “AS3,X” from 3a]

gateway router may learn about multiple paths to destination:
▪ AS1 gateway router 1c learns path AS2,AS3,X from 2a
▪ AS1 gateway router 1c learns path AS3,X from 3a
▪ Based on policy, AS1 gateway router 1c chooses path AS3,X, and
advertises path within AS1 via iBGP
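The path-vector bookkeeping in these two slides can be sketched as follows. The AS-PATH is modelled as a plain list of AS names and the function names are illustrative, not part of any BGP implementation. Rejecting a path that already contains the receiver's own AS is standard BGP loop prevention rather than something the slides state, and picking the shortest AS-PATH at 1c is an assumed policy that happens to match the slide's outcome.

```python
def receive(own_as, as_path):
    """Import step: drop a path that already contains our own AS (a loop)."""
    return None if own_as in as_path else list(as_path)

def export(own_as, as_path):
    """eBGP export step: prepend our AS before advertising to a neighbor AS."""
    return [own_as] + list(as_path)

# 3a advertises "AS3, X" to 2c; AS2 accepts it and 2a re-advertises to 1c.
from_as3 = ["AS3"]
at_as2 = receive("AS2", from_as3)
to_as1 = export("AS2", at_as2)

# 1c now knows two paths to X and, under the assumed policy, takes the shorter.
candidates = [to_as1, from_as3]
chosen = min(candidates, key=len)

print(to_as1, chosen)
```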
BGP messages
▪ BGP messages exchanged between peers over TCP
connection
▪ BGP messages:
• OPEN: opens TCP connection to remote BGP peer and
authenticates sending BGP peer
• UPDATE: advertises new path (or withdraws old)
• KEEPALIVE: keeps connection alive in absence of
UPDATES; also ACKs OPEN request
• NOTIFICATION: reports errors in previous msg; also
used to close connection



BGP, OSPF, forwarding table entries
Q: how does router set forwarding table entry to distant prefix?

[figure: AS1, AS2 and AS3, with destination X attached to AS3; numbered local link interfaces shown at 1a and 1d; physical links between routers]

▪ recall: 1a, 1b, 1d learn about dest X via iBGP from 1c: “path to X goes through 1c”
▪ 1d: OSPF intra-domain routing: to get to 1c, forward over outgoing local interface 1

forwarding table at 1d:
dest  interface
…     …
X     1
…     …


BGP, OSPF, forwarding table entries
Q: how does router set forwarding table entry to distant prefix?

[figure: AS1, AS2 and AS3, with destination X attached to AS3; numbered local link interfaces shown at 1a and 1d]

▪ recall: 1a, 1b, 1d learn about dest X via iBGP from 1c: “path to X goes through 1c”
▪ 1d: OSPF intra-domain routing: to get to 1c, forward over outgoing local interface 1
▪ 1a: OSPF intra-domain routing: to get to 1c, forward over outgoing local interface 2

forwarding table at 1a:
dest  interface
…     …
X     2
…     …
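The two-step lookup described above (BGP names the gateway, OSPF names the local interface toward that gateway) can be written as a composition of two tables. The dictionary names are illustrative; the values are the ones given on the slides.

```python
# Learned via iBGP: external prefix -> gateway router inside AS1.
bgp_egress = {"X": "1c"}

# Learned via OSPF: at each router, local interface on the path to a gateway.
ospf_out_iface = {
    "1a": {"1c": 2},
    "1d": {"1c": 1},
}

def forwarding_entry(router, prefix):
    """Outgoing interface that `router` installs for `prefix`."""
    gateway = bgp_egress[prefix]
    return ospf_out_iface[router][gateway]

print(forwarding_entry("1d", "X"), forwarding_entry("1a", "X"))
```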
BGP route selection
▪ router may learn about more than one route to
destination AS, selects route based on:
1. local preference value attribute: policy decision
2. shortest AS-PATH
3. closest NEXT-HOP router: hot potato routing
4. additional criteria
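The first three criteria can be expressed as a single sort key. This is a sketch only: the `Route` fields and the example values are hypothetical, and real routers apply further tie-breakers (criterion 4) that are omitted here.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Route:
    name: str
    local_pref: int            # higher is preferred (policy decision)
    as_path: tuple             # shorter is preferred
    igp_cost_to_next_hop: int  # lower is preferred (hot potato)

def best_route(routes):
    """Apply criteria 1-3 in order: local preference, AS-PATH length, IGP cost."""
    return min(
        routes,
        key=lambda r: (-r.local_pref, len(r.as_path), r.igp_cost_to_next_hop),
    )

# Hypothetical candidate routes to the same prefix.
r1 = Route("via-A", 100, ("AS2", "AS3"), 10)
r2 = Route("via-B", 100, ("AS3",), 50)
r3 = Route("via-C", 100, ("AS4",), 20)
r4 = Route("via-D", 200, ("AS5", "AS6", "AS3"), 99)

print(best_route([r1, r2, r3]).name)      # equal local pref; shortest path, then IGP cost
print(best_route([r1, r2, r3, r4]).name)  # highest local pref wins outright
```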



Hot Potato Routing
[figure: AS1, AS2 and AS3, with destination X attached to AS3; 2a learns path AS1,AS3,X and 2c learns path AS3,X; OSPF link weights inside AS2 are shown (112, 152, 263, 201)]

▪ 2d learns (via iBGP) it can route to X via 2a or 2c
▪ hot potato routing: choose local gateway that has least intra-domain cost (e.g., 2d chooses 2a, even though more AS hops to X): don’t worry about inter-domain cost!



BGP: achieving policy via advertisements
[figure: provider networks A, B, C and customer networks W, X, Y; legend: provider network, customer network]

Suppose an ISP only wants to route traffic to/from its customer networks (does not want to carry transit traffic between other ISPs)
▪ A advertises path Aw to B and to C
▪ B chooses not to advertise BAw to C:
▪ B gets no “revenue” for routing CBAw, since none of C, A, w are B’s
customers
▪ C does not learn about CBAw path
▪ C will route CAw (not using B) to get to w
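B's decision can be captured by a one-line export rule: advertise a route only when the neighbor it was learned from, or the neighbor it would be sent to, is a customer. This is a sketch of the policy the slide describes, not of any router configuration language. The relationship table is an assumption read from the text, plus the assumption that X is B's customer; the names `RELATION_AT_B` and `should_export` are illustrative.

```python
# How B sees each neighbor (assumed): A and C are other providers, X is a customer.
RELATION_AT_B = {"A": "provider-peer", "C": "provider-peer", "X": "customer"}

def should_export(learned_from, send_to, relation):
    """Carry traffic only if a customer is on at least one side of it."""
    return relation[learned_from] == "customer" or relation[send_to] == "customer"

# Route "Aw" was learned from A. Should B pass it on as "BAw"?
print(should_export("A", "C", RELATION_AT_B))  # to C: no revenue for B
print(should_export("A", "X", RELATION_AT_B))  # to customer X: yes
```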
▪ A,B,C are provider networks
▪ X,W,Y are customers (of provider networks)
▪ X is dual-homed: attached to two networks
▪ policy to enforce: X does not want to route from B to C via X
▪ .. so X will not advertise to B a route to C
Why different Intra-, Inter-AS routing ?
policy:
▪ inter-AS: admin wants control over how its traffic
routed, who routes through its net.
▪ intra-AS: single admin, so no policy decisions needed
scale:
▪ hierarchical routing saves table size, reduced update
traffic
performance:
▪ intra-AS: can focus on performance
▪ inter-AS: policy may dominate over performance



Software defined networking (SDN)
▪ Internet network layer: historically has been
implemented via distributed, per-router approach
• monolithic router contains switching hardware, runs
proprietary implementation of Internet standard
protocols (IP, RIP, IS-IS, OSPF, BGP) in proprietary
router OS (e.g., Cisco IOS)
• different “middleboxes” for different network layer
functions: firewalls, load balancers, NAT boxes, ..

▪ ~2005: renewed interest in rethinking network control plane



Recall: per-router control plane
Individual routing algorithm components in each and every
router interact with each other in control plane to compute
forwarding tables

[figure: a Routing Algorithm component in each router’s control plane, sitting above the data plane]



Recall: logically centralized control plane
A distinct (typically remote) controller interacts with local
control agents (CAs) in routers to compute forwarding tables

[figure: a Remote Controller in the control plane communicating with a control agent (CA) in each data-plane router]



Software defined networking (SDN)
Why a logically centralized control plane?
▪ easier network management: avoid router
misconfigurations, greater flexibility of traffic flows
▪ table-based forwarding (recall OpenFlow API)
allows “programming” routers
• centralized “programming” easier: compute tables
centrally and distribute
• distributed “programming”: more difficult: compute
tables as result of distributed algorithm (protocol)
implemented in each and every router
▪ open (non-proprietary) implementation of control
plane



Analogy: mainframe to PC evolution *

Vertically integrated (mainframe):
Specialized Applications / Specialized Operating System / Specialized Hardware
Closed, proprietary
Slow innovation
Small industry

Horizontal (PC):
many Apps / Open Interface / Windows (OS) or Linux or Mac OS / Open Interface / Microprocessor
Open interfaces
Rapid innovation
Huge industry

* Slide courtesy: N. McKeown
Traffic engineering: difficult with traditional routing

[figure: the six-node graph (u, v, w, x, y, z) with link costs]

Q: what if network operator wants u-to-z traffic to flow along uvwz, x-to-z traffic to flow xwyz?
A: need to define link weights so traffic routing algorithm computes routes accordingly (or need a new routing algorithm)!

Link weights are only control “knobs”: wrong!
Traffic engineering: difficult
[figure: the six-node graph (u, v, w, x, y, z) with link costs]

Q: what if network operator wants to split u-to-z traffic along uvwz and uxyz (load balancing)?
A: can’t do it (or need a new routing algorithm)


Networking 401
Traffic engineering: difficult
[figure: the six-node graph (u, v, w, x, y, z) with link costs, with two differently colored (blue and red) traffic flows arriving at w]

Q: what if w wants to route blue and red traffic differently?
A: can’t do it (with destination based forwarding, and LS, DV routing)


Software defined networking (SDN)
1. generalized “flow-based” forwarding (e.g., OpenFlow)
2. control, data plane separation
3. control plane functions external to data-plane switches
4. programmable control applications (routing, access control, …, load balance)

[figure: control applications on top of a Remote Controller in the control plane, which communicates with a control agent (CA) in each data-plane switch]
SDN perspective: data plane switches
Data plane switches
▪ fast, simple, commodity switches implementing generalized data-plane forwarding (Section 4.4) in hardware
▪ switch flow table computed, installed by controller
▪ API for table-based switch control (e.g., OpenFlow)
• defines what is controllable and what is not
▪ protocol for communicating with controller (e.g., OpenFlow)

[figure: network-control applications (routing, access control, load balance) / northbound API / SDN Controller (network operating system) / southbound API / SDN-controlled switches; control plane above, data plane below]
SDN perspective: SDN controller
SDN controller (network OS):
▪ maintains network state information
▪ interacts with network control applications “above” via northbound API
▪ interacts with network switches “below” via southbound API
▪ implemented as distributed system for performance, scalability, fault-tolerance, robustness

[figure: network-control applications (routing, access control, load balance) / northbound API / SDN Controller (network operating system) / southbound API / SDN-controlled switches; control plane above, data plane below]
SDN perspective: control applications
network-control apps:
▪ “brains” of control: implement control functions using lower-level services, API provided by SDN controller
▪ unbundled: can be provided by 3rd party: distinct from routing vendor, or SDN controller

[figure: network-control applications (routing, access control, load balance) / northbound API / SDN Controller (network operating system) / southbound API / SDN-controlled switches; control plane above, data plane below]
Components of SDN controller

Network control apps (on top of the controller): routing, access control, load balance

Interface layer to network control apps: abstractions API
• interface, abstractions for network control apps: network graph, RESTful API, …, intent

Network-wide state management layer: state of networks links, switches, services: a distributed database
• network-wide distributed, robust state management: statistics, …, flow tables; link-state info, host info, …, switch info

communication layer: communicate between SDN controller and controlled switches
• communication to/from controlled devices: OpenFlow, …, SNMP


OpenFlow protocol
▪ operates between controller, switch
▪ TCP used to exchange messages
• optional encryption
▪ three classes of OpenFlow messages:
• controller-to-switch
• asynchronous (switch to controller)
• symmetric (misc)

[figure: OpenFlow Controller connected to a switch]


OpenFlow: controller-to-switch messages

Key controller-to-switch messages
▪ features: controller queries switch features, switch replies
▪ configure: controller queries/sets switch configuration parameters
▪ modify-state: add, delete, modify flow entries in the OpenFlow tables
▪ packet-out: controller can send this packet out of specific switch port

[figure: OpenFlow Controller connected to a switch]
OpenFlow: switch-to-controller messages
Key switch-to-controller messages
▪ packet-in: transfer packet (and its control) to controller. See packet-out message from controller
▪ flow-removed: flow table entry deleted at switch
▪ port status: inform controller of a change on a port.

[figure: OpenFlow Controller connected to a switch]

Fortunately, network operators don’t “program” switches by creating/sending OpenFlow messages directly. Instead use higher-level abstraction at controller
SDN: control/data plane interaction example
[figure: Dijkstra’s link-state routing app on top of the SDN controller (network graph, RESTful API, …, intent; statistics, …, flow tables; link-state info, host info, …, switch info; OpenFlow, …, SNMP) controlling switches s1, s2, s3, s4; numbers mark the steps below]

1. s1, experiencing link failure, uses OpenFlow port status message to notify controller
2. SDN controller receives OpenFlow message, updates link status info
3. Dijkstra’s routing algorithm application has previously registered to be called whenever link status changes. It is called.
4. Dijkstra’s routing algorithm accesses network graph info, link state info in controller, computes new routes
SDN: control/data plane interaction example
[figure: Dijkstra’s link-state routing app on top of the SDN controller (network graph, RESTful API, …, intent; statistics, …, flow tables; link-state info, host info, …, switch info; OpenFlow, …, SNMP) controlling switches s1, s2, s3, s4; numbers mark the steps below]

5. link state routing app interacts with flow-table-computation component in SDN controller, which computes new flow tables needed
6. Controller uses OpenFlow to install new tables in switches that need updating
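The six steps can be mimicked by a toy controller that reacts to a port-status event, recomputes shortest paths, and "installs" a table only in switches whose next hops changed. Everything here is hypothetical: the `Controller` class, its method names, the four-switch topology and its link costs are illustrative stand-ins, and no real OpenFlow messages are involved.

```python
import heapq
from math import inf

class Controller:
    def __init__(self, links):
        # Link-state info held by the controller: {(a, b): cost}, keys sorted.
        self.links = {tuple(sorted(k)): c for k, c in links.items()}
        self.installed = {}   # switch -> {destination: next hop}
        self.updates = []     # switches that received a new table
        self.recompute()

    def on_port_status(self, a, b, up, cost=1):
        """Steps 1-3: a switch reports a link change; the routing app is called."""
        key = tuple(sorted((a, b)))
        if up:
            self.links[key] = cost
        else:
            self.links.pop(key, None)
        self.recompute()

    def next_hops(self, src):
        """Step 4: Dijkstra from src, remembering the first hop of each path."""
        adj = {}
        for (a, b), c in self.links.items():
            adj.setdefault(a, {})[b] = c
            adj.setdefault(b, {})[a] = c
        dist, first = {src: 0}, {}
        heap = [(0, src, "")]
        while heap:
            d, node, hop = heapq.heappop(heap)
            if d > dist.get(node, inf):
                continue      # stale entry
            if node != src:
                first.setdefault(node, hop)
            for nbr, c in adj.get(node, {}).items():
                if d + c < dist.get(nbr, inf):
                    dist[nbr] = d + c
                    heapq.heappush(heap, (d + c, nbr, nbr if node == src else hop))
        return first

    def recompute(self):
        """Steps 5-6: compute new tables, install only where something changed."""
        self.updates = []
        for s in sorted({n for link in self.links for n in link}):
            table = self.next_hops(s)
            if table != self.installed.get(s):
                self.installed[s] = table   # stands in for a modify-state message
                self.updates.append(s)

ctl = Controller({("s1", "s2"): 1, ("s2", "s4"): 1, ("s1", "s3"): 2, ("s3", "s4"): 2})
before = ctl.installed["s1"]["s4"]
ctl.on_port_status("s1", "s2", up=False)    # link s1-s2 fails
after = ctl.installed["s1"]["s4"]
print(before, after, ctl.updates)
```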
OpenDaylight (ODL) controller
▪ ODL Lithium controller
▪ network apps may be contained within, or be external to SDN controller
▪ Service Abstraction Layer: interconnects internal, external applications and services

[figure: network service apps (Traffic Engineering, …) reach the controller through a REST API; Basic Network Service Functions inside the controller: Access Control, topology manager, switch manager, stats manager, host manager, forwarding manager; below them the Service Abstraction Layer (SAL); southbound: OpenFlow 1.0, …, SNMP, OVSDB]


ONOS controller
▪ control apps separate from controller
▪ intent framework: high-level specification of service: what rather than how
▪ considerable emphasis on distributed core: service reliability, replication performance scaling

[Figure: ONOS architecture. Network control apps on top; northbound abstractions, protocols (REST API, Intent); ONOS distributed core (hosts, paths, flow rules, topology, devices, links, statistics); southbound abstractions, protocols (device, link, host, flow, packet; OpenFlow, Netconf, OVSDB).]
SDN: selected challenges
▪ hardening the control plane: dependable, reliable,
performance-scalable, secure distributed system
• robustness to failures: leverage strong theory of
reliable distributed systems for control plane
• dependability, security: “baked in” from day one?
▪ networks, protocols meeting mission-specific
requirements
• e.g., real-time, ultra-reliable, ultra-secure
▪ Internet-scaling



Chapter 5: outline
5.1 introduction
5.2 routing protocols
▪ link state
▪ distance vector
5.3 intra-AS routing in the Internet: OSPF
5.4 routing among the ISPs: BGP
5.5 The SDN control plane
5.6 ICMP: The Internet Control Message Protocol
5.7 Network management and SNMP



ICMP: internet control message protocol
▪ used by hosts & routers to communicate network-level information
• error reporting: unreachable host, network, port, protocol
• echo request/reply (used by ping)
▪ network-layer “above” IP:
• ICMP msgs carried in IP datagrams
▪ ICMP message: type, code plus first 8 bytes of IP datagram causing error

Type  Code  Description
0     0     echo reply (ping)
3     0     dest. network unreachable
3     1     dest host unreachable
3     2     dest protocol unreachable
3     3     dest port unreachable
3     6     dest network unknown
3     7     dest host unknown
4     0     source quench (congestion control - not used)
8     0     echo request (ping)
9     0     route advertisement
10    0     router discovery
11    0     TTL expired
12    0     bad IP header
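
To make the type/code layout concrete, here is a minimal sketch that builds an echo request and looks up a message in the table above. The 16-bit checksum plus the identifier and sequence-number fields of the echo header follow RFC 792 and are not shown on the slide; the function names are illustrative only.

```python
import struct

# (type, code) -> description, copied from the slide's table
ICMP_DESCRIPTIONS = {
    (0, 0): "echo reply (ping)",
    (3, 0): "dest. network unreachable",
    (3, 1): "dest host unreachable",
    (3, 2): "dest protocol unreachable",
    (3, 3): "dest port unreachable",
    (3, 6): "dest network unknown",
    (3, 7): "dest host unknown",
    (4, 0): "source quench (congestion control - not used)",
    (8, 0): "echo request (ping)",
    (9, 0): "route advertisement",
    (10, 0): "router discovery",
    (11, 0): "TTL expired",
    (12, 0): "bad IP header",
}


def internet_checksum(data: bytes) -> int:
    """16-bit one's-complement sum of 16-bit words (the Internet checksum)."""
    if len(data) % 2:
        data += b"\x00"  # pad to a whole number of 16-bit words
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:   # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def build_echo_request(identifier: int, sequence: int, payload: bytes = b"") -> bytes:
    """Type 8, code 0 message; checksum is computed with the field set to 0."""
    header = struct.pack("!BBHHH", 8, 0, 0, identifier, sequence)
    checksum = internet_checksum(header + payload)
    return struct.pack("!BBHHH", 8, 0, checksum, identifier, sequence) + payload


def describe(message: bytes) -> str:
    """Read the first two bytes (type, code) and look them up in the table."""
    icmp_type, code = struct.unpack("!BB", message[:2])
    return ICMP_DESCRIPTIONS.get((icmp_type, code), f"type {icmp_type}, code {code}")


request = build_echo_request(identifier=0x1234, sequence=1, payload=b"hello")
print(describe(request))
```

A receiver can check integrity by running `internet_checksum` over the whole message: a result of 0 means the checksum field is consistent with the contents.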



Traceroute and ICMP
▪ source sends series of UDP segments to destination
• first set has TTL=1
• second set has TTL=2, etc.
• unlikely port number
▪ when datagram in nth set arrives at nth router:
• router discards datagram and sends source ICMP message (type 11, code 0)
• ICMP message includes name of router & IP address
▪ when ICMP message arrives, source records RTTs

stopping criteria:
▪ UDP segment eventually arrives at destination host
▪ destination returns ICMP “port unreachable” message (type 3, code 3)
▪ source stops

[Figure: source sends 3 probes for each TTL value along the path to the destination.]
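
The TTL logic can be simulated without touching the network. The sketch below is an assumption-laden toy: the path is just a list of names, one probe is sent per TTL value instead of three, and no RTTs are measured. The function name `simulate_traceroute` is hypothetical.

```python
def simulate_traceroute(path, max_ttl=30):
    """Simulate traceroute over `path`: router names followed by the destination.

    `path` must be non-empty. Returns one (ttl, responder, icmp_type, icmp_code)
    tuple per probe sent.
    """
    hops = []
    for ttl in range(1, max_ttl + 1):
        remaining = ttl
        reply = None
        for i, node in enumerate(path):
            if i == len(path) - 1:
                # Probe reached the destination host: the unlikely UDP port
                # is closed, so the host answers "port unreachable".
                reply = (node, 3, 3)
                break
            remaining -= 1  # each router decrements the TTL
            if remaining == 0:
                # Router discards the datagram and reports "TTL expired".
                reply = (node, 11, 0)
                break
        hops.append((ttl, *reply))
        if reply[1:] == (3, 3):  # stopping criterion
            break
    return hops


for hop in simulate_traceroute(["r1", "r2", "r3", "dest"]):
    print(hop)
```

Each increase in TTL exposes exactly one more router, and the type 3, code 3 reply from the destination is what ends the loop.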



What is network management?
▪ autonomous systems (aka “network”): 1000s of interacting
hardware/software components
▪ other complex systems requiring monitoring, control:
• jet airplane
• nuclear power plant
• others?

"Network management includes the deployment, integration
and coordination of the hardware, software, and human
elements to monitor, test, poll, configure, analyze, evaluate,
and control the network and element resources to meet the
real-time, operational performance, and Quality of Service
requirements at a reasonable cost."



Infrastructure for network management
definitions:
▪ managed devices contain managed objects whose data is gathered into a Management Information Base (MIB)

[Figure: a managing entity (with its data) uses a network management protocol to communicate with agents (each with its data) running on several managed devices.]



SNMP protocol
Two ways to convey MIB info, commands:
▪ request/response mode: managing entity sends request to agent on managed device; agent returns response
▪ trap mode: agent on managed device sends trap msg to managing entity

SNMP protocol: message types

Message type                                  Function
GetRequest, GetNextRequest, GetBulkRequest    manager-to-agent: “get me data” (data instance, next data in list, block of data)
InformRequest                                 manager-to-manager: here’s MIB value
SetRequest                                    manager-to-agent: set MIB value
Response                                      agent-to-manager: value, response to Request
Trap                                          agent-to-manager: inform manager of exceptional event
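
A toy agent shows how request/response and trap modes differ. Everything here is a hypothetical simplification: messages are plain tuples rather than encoded SNMP PDUs, the MIB is a dictionary keyed by short names instead of OIDs, and "next" means the next name in sorted order.

```python
class Agent:
    """Toy SNMP-style agent holding a small MIB (names and values assumed)."""

    def __init__(self, mib, trap_sink=None):
        self.mib = dict(mib)
        self.trap_sink = trap_sink  # callable that delivers traps to the manager

    def handle(self, msg_type, name=None, value=None):
        """Request/response mode: answer one manager-to-agent request."""
        if msg_type == "GetRequest":
            return ("Response", name, self.mib.get(name))
        if msg_type == "GetNextRequest":
            later = [n for n in sorted(self.mib) if n > name]
            if not later:
                return ("Response", None, None)  # walked past the end of the MIB
            return ("Response", later[0], self.mib[later[0]])
        if msg_type == "SetRequest":
            self.mib[name] = value
            return ("Response", name, value)
        raise ValueError(f"unsupported message type: {msg_type}")

    def raise_trap(self, name, value):
        """Trap mode: the agent informs the manager without being asked."""
        if self.trap_sink is not None:
            self.trap_sink(("Trap", name, value))


received_traps = []  # stands in for the managing entity's inbox
agent = Agent({"sysName": "router1", "sysUpTime": 100}, trap_sink=received_traps.append)

print(agent.handle("GetRequest", "sysName"))
print(agent.handle("GetNextRequest", "sysName"))
print(agent.handle("SetRequest", "sysName", "edge-router"))
agent.raise_trap("linkDown", "port 3")
print(received_traps)
```

Repeated GetNextRequest calls let a manager walk the whole MIB without knowing the names in advance, which is the "next data in list" case in the table.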



SNMP protocol: message formats

SNMP PDU, get/set form:
▪ Get/set header: PDU type (0-3) | Request ID | Error Status (0-5) | Error Index
▪ Variables to get/set: Name | Value | Name | Value | ….

SNMP PDU, trap form:
▪ Trap header: PDU type 4 | Enterprise | Agent Addr | Trap Type (0-7) | Specific code | Time stamp
▪ Trap info: Name | Value | ….
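
The get/set header fields map naturally onto a small record type. In this sketch the numeric values for PDU type (0-4) and error status (0-5) are the SNMPv1 assignments from RFC 1157, which the slide gives only as ranges; the class and field names are illustrative, and no wire encoding is attempted.

```python
from dataclasses import dataclass, field
from enum import IntEnum
from typing import Any


class PduType(IntEnum):
    """SNMPv1 PDU types (RFC 1157); the slide shows only the range 0-3 and 4."""
    GET_REQUEST = 0
    GET_NEXT_REQUEST = 1
    GET_RESPONSE = 2
    SET_REQUEST = 3
    TRAP = 4


class ErrorStatus(IntEnum):
    """SNMPv1 error-status values (RFC 1157); the slide shows only 0-5."""
    NO_ERROR = 0
    TOO_BIG = 1
    NO_SUCH_NAME = 2
    BAD_VALUE = 3
    READ_ONLY = 4
    GEN_ERR = 5


@dataclass
class GetSetPdu:
    """Fields of the get/set PDU: header plus (name, value) variable pairs."""
    pdu_type: PduType
    request_id: int
    error_status: ErrorStatus = ErrorStatus.NO_ERROR
    error_index: int = 0
    variables: list[tuple[str, Any]] = field(default_factory=list)

    def __post_init__(self):
        # Traps carry a different header (enterprise, agent address, ...).
        if self.pdu_type == PduType.TRAP:
            raise ValueError("trap PDUs use the trap header, not the get/set header")


request = GetSetPdu(PduType.GET_REQUEST, request_id=7, variables=[("sysName", None)])
print(request)
```

The request ID is what lets a manager match a response to the request that caused it; the error index points at the offending variable when the status is non-zero.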

More on network management: see earlier editions of text!



Chapter 5: summary
we’ve learned a lot!
▪ approaches to network control plane
• per-router control (traditional)
• logically centralized control (software defined networking)
▪ traditional routing algorithms
• implementation in Internet: OSPF, BGP
▪ SDN controllers
• implementation in practice: ODL, ONOS
▪ Internet Control Message Protocol
▪ network management

next stop: link layer!

