Commit e19c354

More work on the kubecon slides

1 parent 1e7edda

File tree

10 files changed: +60 −108 lines


slides/201810-voxxeddays/slides.key

33 Bytes (binary file not shown)

6 files renamed without changes.

slides/201812-kubecon/notes.md

Lines changed: 60 additions & 108 deletions
@@ -1,16 +1,22 @@
# Container Networking Talk Notes

* I work in the Oracle cloud infrastructure group, more specifically on
  Kubernetes related stuff, and some time back I was given the task of looking
  into updating the networking layer in the Oracle managed Kubernetes service
  from using Flannel (an overlay network) to a solution which utilises the native
  networking features of the Oracle cloud (secondary VNICs + IPs). Don't worry if
  you don't know what Flannel is, or what an overlay network is, as that is the
  point of this talk! However, once I started digging in, I quickly found that I
  didn't understand how Flannel worked, and as it seemed a little wrong to
  replace one solution with another without understanding how the original
  worked, I started digging deeper, and then realised that I didn't understand
  networking in general! Long story short: big rabbit hole, learnt some stuff,
  and most importantly found that I really enjoyed this, so I thought I would
  write a talk and come and spread the networking love!

* So, I'm Kris, and in the next 30 minutes or so, I'm going to attempt to explain
  how a container on one computer on the internet can talk to a container on
  another computer, somewhere else on the internet.

## Slide: The aim

@@ -19,34 +25,29 @@
* No NAT'ing going on.
* Host can talk to containers, and vice versa.

* Note: We are not covering the default docker model here, where
  containers on different nodes can have the same IPs.

## Slide: The plan

* Going to work our way toward the general case in 4 steps.

* For each, we will explain the model via a diagram, show some code, run the
  code, then test what we have created.

* Each step will be created using Vagrant-based VMs.

* Summarise the 4 steps.

## Slide: Single network namespace diagram

* Describe the outer box (the node). Could be a physical machine, or a VM as in this case.

* Describe containers vs namespaces:
  * Containers use a bunch of different Linux mechanisms to isolate the processes running inside,
    both in terms of system calls, available resources, and what they can see, i.e. filesystems,
    other processes, etc. However, from a network connectivity point of view, the only mechanism
    that matters here is the network namespace, so from now on, whenever I say container, what I
    really mean is network namespace.

* What is a network namespace:
  * Its own network stack containing:
@@ -56,9 +57,13 @@
* When created, it is empty, i.e. no interfaces, routing or iptables rules.

* Describe VETH pair: a virtual Ethernet cable with a NIC on each end.

* Describe the relevant routing from/to the network namespace:
  * Directly connected route from the host to the network namespace.
  * Default route out of the network namespace.

* Note the 'aha' moment, when I worked out the possible types of routing rules
  => Understanding these was, for me, the key to understanding networking in general.

## Code: Single network namespace setup.sh
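The setup.sh itself isn't reproduced in these notes; as a reminder of what to talk through, here is a minimal sketch of the kind of script involved. The names `con`, `veth0`/`veth1`, the `172.16.0.1`/`172.16.0.254` addresses, and the root guard are assumptions, not the talk's actual code:

```shell
#!/usr/bin/env bash
# Sketch of a single network namespace setup (names and addresses assumed).
set -eu

setup() {
    # Create the (empty) network namespace.
    ip netns add con
    # Create a veth pair: a virtual Ethernet cable with a NIC on each end.
    ip link add veth0 type veth peer name veth1
    # Move one end into the namespace, leaving the other on the node.
    ip link set veth1 netns con
    # Address and bring up the namespace end (plus loopback).
    ip netns exec con ip addr add 172.16.0.1/24 dev veth1
    ip netns exec con ip link set veth1 up
    ip netns exec con ip link set lo up
    # Address and bring up the node end: this gives the host its
    # directly connected route to the namespace's subnet.
    ip addr add 172.16.0.254/24 dev veth0
    ip link set veth0 up
    # Default route out of the namespace, via the node end of the pair.
    ip netns exec con ip route add default via 172.16.0.254
}

if [ "${APPLY:-0}" = "1" ]; then
    setup
else
    echo "set APPLY=1 and run as root to apply"
fi
```

The two routes created here are exactly the two described above: the connected route on the host (from assigning `veth0` its address) and the default route inside the namespace.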

@@ -71,48 +76,29 @@
```
./setup.sh
# The interfaces inside the network namespace
sudo ip netns exec con ip a
# Pings the network namespace from the node
ping 172.16.0.1
```

* What is actually responding to the pings in these cases, as there is no process running
  inside the namespace who can respond? It is the kernel network stack
  inside the network namespace that is responding to these ICMP echo requests, with
  an ICMP echo reply packet.

* For a more realistic example, we would run one (or more) real processes in the network namespace.
  However, for the purposes of testing connectivity, pinging is enough.

* Note: you can run multiple processes inside a network namespace, which is what happens inside Kubernetes pods.

## Slide: Diagram of multiple network namespaces on the same node

* Describe the Linux bridge: A single L2 broadcast domain, a virtual Ethernet switch, implemented in the kernel.
* The bridge now has its own subnet.
* The bridge also has its own IP: Allows access from the outside.
* Describe the route for the subnet.
* Note: This corresponds to the default docker0 bridge.

## Code: Multiple network namespace setup.sh

@@ -125,26 +111,16 @@ Note: you can run multiple processes inside a network namespace, which roughly c
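Again the script itself isn't in these notes; a sketch of what the bridge variant likely does (the names `bridge0`, `con1`/`con2`, `veth*`/`ceth*`, and the root guard are assumptions; the addresses are chosen so `con1` = 172.16.0.2 and `con2` = 172.16.0.3, matching the demo pings below):

```shell
#!/usr/bin/env bash
# Sketch of two network namespaces hanging off a Linux bridge
# (names and the APPLY guard are assumptions, not the talk's actual code).
set -eu

bridge_setup() {
    # The bridge: a virtual Ethernet switch with its own subnet and IP.
    ip link add bridge0 type bridge
    ip addr add 172.16.0.1/24 dev bridge0
    ip link set bridge0 up

    for i in 1 2; do
        ip netns add "con$i"
        ip link add "veth$i" type veth peer name "ceth$i"
        # One end of the veth pair goes into the namespace...
        ip link set "ceth$i" netns "con$i"
        ip netns exec "con$i" ip addr add "172.16.0.$((i + 1))/24" dev "ceth$i"
        ip netns exec "con$i" ip link set "ceth$i" up
        ip netns exec "con$i" ip link set lo up
        # Default route out of the namespace, via the bridge's IP.
        ip netns exec "con$i" ip route add default via 172.16.0.1
        # ...and the other end is plugged into the bridge.
        ip link set "veth$i" master bridge0
        ip link set "veth$i" up
    done
}

if [ "${APPLY:-0}" = "1" ]; then
    bridge_setup
else
    echo "set APPLY=1 and run as root to apply"
fi
```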
```
./setup.sh
# The interfaces on the node
ip a
# Pings between the network namespaces
sudo ip netns exec con1 ping 172.16.0.3
# Pings the node from the network namespace
sudo ip netns exec con1 ping 10.0.0.10
```

* Highlight the TTL. Should be the default value, thus no routing is going on here!
* Describe what the TTL is, and what happens when the TTL reaches zero.

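The "default value" claim is easy to check: on Linux the starting TTL for locally generated IPv4 packets is a sysctl (a quick aside, not part of the talk's demo scripts):

```shell
# The TTL that locally generated IPv4 packets start with (usually 64).
# Each router hop decrements it by one; a router that decrements it to
# zero drops the packet and returns an ICMP 'time exceeded' error,
# which is exactly the mechanism traceroute exploits.
cat /proc/sys/net/ipv4/ip_default_ttl
```

So when a ping between two namespaces on the same bridge arrives with `ttl=64`, no routing hop has touched it.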
## Slide: Diagram of multiple network namespaces on different nodes but same L2 network

@@ -161,34 +137,27 @@
* Talk through the *setup.sh*.
  * Describe the parts common to the previous step.
  * Describe the setup of the extra routes.
  * Explain the IP forwarding: Turns your Linux box into a router, which is
    required in this case as the node has to forward the packets for any network
    namespaces that live on that node.

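The extra pieces over the previous step boil down to one route plus one sysctl per node. A sketch, taking the node IPs (10.0.0.10, 10.0.0.20) and namespace subnets (172.16.0.0/24, 172.16.1.0/24) from the demo addresses; the helper names are hypothetical:

```shell
# Sketch of the per-node additions (helper names are hypothetical).

node_10_setup() {  # run as root on 10.0.0.10
    # The peer node's namespace subnet is reached via the peer's node IP;
    # both nodes sit on the same L2 network, so a plain route is enough.
    ip route add 172.16.1.0/24 via 10.0.0.20
    # Turn the node into a router for its namespaces.
    sysctl -w net.ipv4.ip_forward=1
}

node_20_setup() {  # the mirror image, run as root on 10.0.0.20
    ip route add 172.16.0.0/24 via 10.0.0.10
    sysctl -w net.ipv4.ip_forward=1
}

echo "run node_10_setup / node_20_setup as root on the respective node"
```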
## Demo: Multi node

On each node, run:

```
./setup.sh
```

From 10.0.0.10:

```
# The routes on the node
ip r
# Pings from a network namespace on one node to one on the other node
sudo ip netns exec con1 ping 172.16.1.2
# Pings from a network namespace on one node to the other node
sudo ip netns exec con1 ping 10.0.0.20
```

* When we ping from a network namespace to another network namespace across nodes:

@@ -223,20 +192,13 @@
* Talk through the *setup.sh*.
  * Describe the parts common to the previous step.
  * We need packet forwarding enabled here. This allows the node to act as a router, i.e.
    to accept and forward packets received, but not destined for, the IP of the node.
  * Now no extra routes, but contains the socat implementation of the overlay.
  * Describe *socat* in general. It creates 2 bidirectional bytestreams, and transfers data between them.
  * Describe how *socat* is being used here.
* Note the MTU settings, what is going on here? We reduce the MTU of the tun0
  device as this allows for the 8-byte UDP header that will be added, thus ensuring that
  fragmentation does not occur.
* Reverse packet filtering:
  * What is this: Discards incoming packets from interfaces where they shouldn't be.
  * Its purpose: A security feature to stop IP spoofed packets from being propagated.

@@ -245,9 +207,6 @@ ping 172.16.1.2

  tunnel. However, the response will not (as it is destined for the node), thus the response
  will emerge on a different interface from the one the request packet went out of. Therefore, the kernel
  considers this suspicious, unless we tell it that all is ok.
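The actual socat invocation isn't reproduced in these notes; a sketch of the shape of the overlay on 10.0.0.10, using socat's `TUN` and `UDP` address types (the port 9000, the tun0 address, the MTU value, the rp_filter setting, and the helper name are all assumptions):

```shell
# Sketch of the overlay pieces on 10.0.0.10 (mirror it on 10.0.0.20);
# port, addresses and helper name are assumptions.

overlay_10() {  # run as root
    # socat wires two bidirectional byte streams together: a UDP "tunnel"
    # to the other node, and a tun device that emits/accepts raw IP packets.
    socat UDP:10.0.0.20:9000,bind=10.0.0.10:9000 \
        TUN:172.16.0.100/16,tun-name=tun0,iff-up &
    sleep 1  # give socat a moment to create tun0
    # Lower the tun MTU to leave room for the encapsulation headers,
    # so the encapsulated packet still fits the node network's MTU.
    ip link set tun0 mtu 1492
    # Relax reverse path filtering: replies to tunnelled traffic can
    # legitimately arrive on a different interface than the request left on.
    sysctl -w net.ipv4.conf.all.rp_filter=0
}

echo "run overlay_10 as root on 10.0.0.10"
```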
## Demo: Overlay network

On each node, run:

```
./setup.sh
```

From 10.0.0.10:

```
# Ping from a network namespace on one node to one on the other node
sudo ip netns exec con1 ping 172.16.1.2
# Ping from a network namespace on one node to the other node
sudo ip netns exec con1 ping 10.0.0.20
```
* When we ping from a network namespace to a network namespace across nodes:

slides/201812-kubecon/slides.key

-941 KB (binary file not shown)

slides/201812-kubecon/slides.pdf

-922 KB (binary file not shown)
