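Introduction
Flannel is a cross-host container networking solution designed by the CoreOS team for Kubernetes. It gives every docker container running on any node in the cluster a virtual IP address that is unique across the whole cluster.
For example, in a system made up of 3 host nodes, the user wants the container IP addresses on each node to fall within that node's own configured subnet:
Host1: 10.0.1.0/24
Host2: 10.0.2.0/24
Host3: 10.0.3.0/24
With docker's default configuration, however, container IP addresses are decided by the docker service on each node. Take the default bridge network as an example: a container receives an IP address in the same subnet as the docker0 bridge. Without manual configuration, docker0 is likely to have the same IP address on several hosts, so containers on those hosts may be assigned identical IP addresses. You can give each host's docker0 bridge a different address by setting the docker daemon's startup parameter (--bip) by hand, but doing so manually makes operations much harder. Besides IP addresses, connectivity between containers also requires configuring the route table, neigh table, and so on between hosts. Flannel makes all of this simple.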
Typical topology
The figure above shows the typical topology of a running Flannel system. Flannel runs on every host that needs to run docker containers. etcd is a distributed database that runs on a separate host (it can also run on one of the hosts that runs Flannel). It stores Flannel's current IP resource pool and the current allocation state, which is the key to containers on different hosts getting different IPs. When Flannel starts on a host, it contacts etcd to obtain a free IP subnet and writes the fact that it now occupies that subnet back to etcd, so that no other host can be assigned the same subnet.
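The allocation idea described above can be sketched in a few lines of shell. This is only an illustration under simplifying assumptions: the function name pick_free_subnet is hypothetical, the list of leased subnets is passed in as arguments instead of being read from etcd, and candidates are tried in order. The real flanneld does not appear to pick sequentially (the subnets it hands out later in this article are 50 and 83).

```shell
#!/bin/sh
# Hypothetical helper: print the first /24 inside a /16 pool that is not
# already leased. $1 is the first two octets of the pool (e.g. "10.100"),
# the remaining arguments are the subnets that other hosts already hold.
pick_free_subnet() {
    prefix="$1"
    shift
    third=0
    while [ "$third" -le 255 ]; do
        candidate="$prefix.$third.0/24"
        taken=no
        for leased in "$@"; do
            if [ "$leased" = "$candidate" ]; then
                taken=yes
            fi
        done
        if [ "$taken" = no ]; then
            echo "$candidate"
            return 0
        fi
        third=$((third + 1))
    done
    return 1    # every /24 in the pool is already leased
}

# Two subnets are taken, so the next free one is printed.
pick_free_subnet 10.100 10.100.0.0/24 10.100.1.0/24
```

Because every host records its lease in the same store before using it, two hosts running this kind of check against the same lease list cannot end up with the same subnet.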
Hosts running the Flannel service forward cross-host container traffic to each other through a backend. The available backends include host-gw, udp, vxlan, ipip, gce, alivpc, and awsvpc. The rest of this article demonstrates three of them (host-gw, udp, and vxlan) on the following hosts:
Host1: ubuntu16.04, docker18.06.0ce, etcd-3.2.4, external NIC ens33: 172.16.112.128
Host2: ubuntu16.04, docker18.06.0ce, flanneld-0.10, external NIC ens33: 172.16.112.133
Host3: ubuntu16.04, docker18.06.0ce, flanneld-0.10, external NIC ens33: 172.16.112.130
Installing etcd (host1)
1 Download the binary package etcd-v3.2.4-linux-amd64.tar.gz, unpack it, and copy etcd and etcdctl to /usr/local/bin/
2 Create the file /etc/etcd/etcd.conf
ETCD_DATA_DIR="/var/run/etcd"
ETCD_ADVERTISE_CLIENT_URLS="http://172.16.112.128:2379,http://127.0.0.1:2379"
ETCD_NAME="node-1"
ETCD_LISTEN_CLIENT_URLS="http://172.16.112.128:2379,http://127.0.0.1:2379"
3 Create the service file /lib/systemd/system/etcd.service
[Unit]
Description=Etcd Server
Documentation=https://github.com/coreos/etcd
After=network.target
After=network-online.target
Wants=network-online.target
[Service]
User=root
Type=notify
EnvironmentFile=-/etc/etcd/etcd.conf
ExecStart=/usr/local/bin/etcd
LimitNOFILE=40000
[Install]
WantedBy=multi-user.target
4 Start the service; its status shows that it is running normally
root@node-1:~# systemctl start etcd
root@node-1:~# systemctl status etcd
● etcd.service - Etcd Server
   Loaded: loaded (/lib/systemd/system/etcd.service; disabled; vendor preset: enabled)
   Active: active (running) since Fri 2018-09-07 03:23:49 PDT; 2 days ago
     Docs: https://github.com/coreos/etcd
 Main PID: 20497 (etcd)
    Tasks: 7
   Memory: 81.9M
      CPU: 8min 46.959s
   CGroup: /system.slice/etcd.service
           └─20497 /usr/local/bin/etcd
5 Create /etc/flannel-config.json as follows (using host-gw as the example)
root@node-1:~# cat /etc/flannel-config.json
{
  "Network":"10.100.0.0/16",
  "SubnetLen":24,
  "Backend":{
    "Type":"host-gw"
  }
}
6 Write the configuration that flannel will later use to allocate networks into etcd
root@node-1:~# etcdctl set /docker-subnet/network/config < /etc/flannel-config.json
{
  "Network":"10.100.0.0/16",
  "SubnetLen":24,
  "Backend":{
    "Type":"host-gw"
  }
}
Installing flannel (host2 and host3)
1 Download the binary package flannel-v0.10.0-linux-amd64.tar.gz, unpack it, and copy flanneld and mk-docker-opts.sh to /usr/local/bin/
2 Create the service file /lib/systemd/system/flanneld.service
[Unit]
Description=Flanneld
After=network.target
Before=docker.service
[Service]
User=root
ExecStart=/usr/local/bin/flanneld --etcd-endpoints=http://172.16.112.128:2379 --iface=ens33 -etcd-prefix=/docker-subnet/network
Type=notify
LimitNOFILE=65536
3 Start the flannel service
root@node-2:~# systemctl start flanneld.service
root@node-2:~# systemctl status flanneld.service
● flanneld.service - Flanneld
   Loaded: loaded (/lib/systemd/system/flanneld.service; static; vendor preset: enabled)
   Active: active (running) since Mon 2018-09-10 01:32:26 PDT; 5s ago
 Main PID: 26076 (flanneld)
    Tasks: 7
   Memory: 10.1M
      CPU: 146ms
   CGroup: /system.slice/flanneld.service
           └─26076 /usr/local/bin/flanneld --etcd-endpoints=http://172.16.112.128:2379 --iface=ens33 -etcd-prefix=/docker-subnet/network
subnet.env shows the subnet information obtained from etcd on node-1
root@node-2:~# cat /run/flannel/subnet.env
FLANNEL_NETWORK=10.100.0.0/16
FLANNEL_SUBNET=10.100.50.1/24
FLANNEL_MTU=1500
FLANNEL_IPMASQ=false
4 Run the mk-docker-opts.sh script to generate the docker startup parameters
root@node-2:~# mk-docker-opts.sh
root@node-2:~# cat /run/docker_opts.env
DOCKER_OPT_BIP="--bip=10.100.50.1/24"
DOCKER_OPT_IPMASQ="--ip-masq=true"
DOCKER_OPT_MTU="--mtu=1500"
DOCKER_OPTS=" --bip=10.100.50.1/24 --ip-masq=true --mtu=1500"
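To make the relationship between subnet.env and docker_opts.env easier to see, here is a simplified sketch of the mapping. It is not the real mk-docker-opts.sh, which has more options; the function name build_docker_opts is hypothetical, and the variable values are copied from the subnet.env shown earlier. The inversion of the masquerade flag is inferred from the two files above (FLANNEL_IPMASQ=false becomes --ip-masq=true).

```shell
#!/bin/sh
# Values as they appear in /run/flannel/subnet.env on host2.
FLANNEL_SUBNET=10.100.50.1/24
FLANNEL_MTU=1500
FLANNEL_IPMASQ=false

# Hypothetical helper: turn the flannel lease into dockerd flags.
build_docker_opts() {
    # Only one of flannel and docker should masquerade, so the flag is inverted.
    if [ "$FLANNEL_IPMASQ" = true ]; then
        ipmasq=false
    else
        ipmasq=true
    fi
    echo "--bip=$FLANNEL_SUBNET --ip-masq=$ipmasq --mtu=$FLANNEL_MTU"
}

build_docker_opts
```

The printed flags match the DOCKER_OPTS value generated on host2, apart from the leading space.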
5 Edit the docker service file /lib/systemd/system/docker.service
EnvironmentFile=/run/docker_opts.env
ExecStart=/usr/bin/dockerd -H fd:// $DOCKER_OPTS
Restart docker. The startup parameters now include a bip set to the subnet obtained from etcd (10.100.50.1/24)
root@node-2:~# systemctl daemon-reload
root@node-2:~# systemctl restart docker
root@node-2:~# systemctl status docker
● docker.service - Docker Application Container Engine
   Loaded: loaded (/lib/systemd/system/docker.service; enabled; vendor preset: enabled)
   Active: active (running) since Mon 2018-09-10 01:39:56 PDT; 21s ago
     Docs: https://docs.docker.com
 Main PID: 26664 (dockerd)
    Tasks: 18
   Memory: 53.9M
      CPU: 769ms
   CGroup: /system.slice/docker.service
           ├─26664 /usr/bin/dockerd -H fd:// --bip=10.100.50.1/24 --ip-masq=true --mtu=1500
           └─26673 docker-containerd --config /var/run/docker/containerd/containerd.toml
The IP address assigned to docker0 is also as expected
root@node-2:~# ifconfig docker0
docker0   Link encap:Ethernet HWaddr 02:42:d0:bc:0a:1f
          inet addr:10.100.50.1 Bcast:10.100.50.255 Mask:255.255.255.0
          inet6 addr: fe80::42:d0ff:febc:a1f/64 Scope:Link
          UP BROADCAST MULTICAST MTU:1500 Metric:1
          RX packets:93 errors:0 dropped:0 overruns:0 frame:0
          TX packets:289 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:5628 (5.6 KB) TX bytes:27078 (27.0 KB)
Repeating this process on Host3 yields the subnet 10.100.83.0/24
root@node-3:~# ifconfig docker0
docker0   Link encap:Ethernet HWaddr 02:42:fa:91:74:a1
          inet addr:10.100.83.1 Bcast:10.100.83.255 Mask:255.255.255.0
          inet6 addr: fe80::42:faff:fe91:74a1/64 Scope:Link
          UP BROADCAST MULTICAST MTU:1500 Metric:1
          RX packets:114 errors:0 dropped:0 overruns:0 frame:0
          TX packets:275 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:7112 (7.1 KB) TX bytes:25875 (25.8 KB)
Connectivity experiments
host-gw mode
The environment prepared above already allocates IPs in host-gw mode.
Run a busybox container on each of host2 and host3 to test connectivity
root@node-2:~# docker run --name bbox2 -tid busybox
759fb9b67f0b21901da1ab6870d3e592bc2923eb0b753d5c009630d5e2c228d5
root@node-2:~# docker exec bbox2 ifconfig
eth0      Link encap:Ethernet HWaddr 02:42:0A:64:32:02
          inet addr:10.100.50.2 Bcast:10.100.50.255 Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
          RX packets:18 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:2335 (2.2 KiB) TX bytes:0 (0.0 B)
lo        Link encap:Local Loopback
          inet addr:127.0.0.1 Mask:255.0.0.0
          UP LOOPBACK RUNNING MTU:65536 Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1
          RX bytes:0 (0.0 B) TX bytes:0 (0.0 B)
root@node-3:~# docker run --name bbox3 -tid busybox
60da1aefb40d9e96984888554cfbc3e927974a6cb1422f679f6a4e2381184385
root@node-3:~# docker exec bbox3 ifconfig
eth0      Link encap:Ethernet HWaddr 02:42:0A:64:53:02
          inet addr:10.100.83.2 Bcast:10.100.83.255 Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
          RX packets:18 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:2335 (2.2 KiB) TX bytes:0 (0.0 B)
lo        Link encap:Local Loopback
          inet addr:127.0.0.1 Mask:255.0.0.0
          UP LOOPBACK RUNNING MTU:65536 Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1
          RX bytes:0 (0.0 B) TX bytes:0 (0.0 B)
The network topology at this point is:
We use ICMP packets to verify connectivity (while using tcpdump to watch the docker0 and ens33 interfaces on host2)
root@node-2:~# docker exec bbox2 ping -c 4 10.100.83.2
PING 10.100.83.2 (10.100.83.2): 56 data bytes
64 bytes from 10.100.83.2: seq=0 ttl=62 time=0.441 ms
64 bytes from 10.100.83.2: seq=1 ttl=62 time=0.384 ms
64 bytes from 10.100.83.2: seq=2 ttl=62 time=0.361 ms
64 bytes from 10.100.83.2: seq=3 ttl=62 time=0.414 ms
The two containers can reach each other with ping.
root@node-2:~# tcpdump -i docker0 -vv
tcpdump: listening on docker0, link-type EN10MB (Ethernet), capture size 262144 bytes
02:03:50.054787 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 10.100.50.1 tell 10.100.50.2, length 28
02:03:50.054807 ARP, Ethernet (len 6), IPv4 (len 4), Reply 10.100.50.1 is-at 02:42:d0:bc:0a:1f (oui Unknown), length 28
02:03:50.054811 IP (tos 0x0, ttl 64, id 52386, offset 0, flags [DF], proto ICMP (1), length 84)
    10.100.50.2 > 10.100.83.2: ICMP echo request, id 3072, seq 0, length 64
02:03:50.055149 IP (tos 0x0, ttl 62, id 59405, offset 0, flags [none], proto ICMP (1), length 84)
    10.100.83.2 > 10.100.50.2: ICMP echo reply, id 3072, seq 0, length 64
02:03:51.055498 IP (tos 0x0, ttl 64, id 52420, offset 0, flags [DF], proto ICMP (1), length 84)
    10.100.50.2 > 10.100.83.2: ICMP echo request, id 3072, seq 1, length 64
02:03:51.055792 IP (tos 0x0, ttl 62, id 59524, offset 0, flags [none], proto ICMP (1), length 84)
    10.100.83.2 > 10.100.50.2: ICMP echo reply, id 3072, seq 1, length 64
02:03:52.056381 IP (tos 0x0, ttl 64, id 52433, offset 0, flags [DF], proto ICMP (1), length 84)
root@node-2:~# tcpdump -i ens33 -n icmp -vv
tcpdump: listening on ens33, link-type EN10MB (Ethernet), capture size 262144 bytes
02:04:57.146397 IP (tos 0x0, ttl 63, id 59135, offset 0, flags [DF], proto ICMP (1), length 84)
    172.16.112.133 > 10.100.83.2: ICMP echo request, id 5888, seq 0, length 64
02:04:57.146785 IP (tos 0x0, ttl 63, id 8507, offset 0, flags [none], proto ICMP (1), length 84)
    10.100.83.2 > 172.16.112.133: ICMP echo reply, id 5888, seq 0, length 64
02:04:58.148039 IP (tos 0x0, ttl 63, id 59199, offset 0, flags [DF], proto ICMP (1), length 84)
    172.16.112.133 > 10.100.83.2: ICMP echo request, id 5888, seq 1, length 64
02:04:58.148295 IP (tos 0x0, ttl 63, id 8744, offset 0, flags [none], proto ICMP (1), length 84)
    10.100.83.2 > 172.16.112.133: ICMP echo reply, id 5888, seq 1, length 64
02:04:59.148941 IP (tos 0x0, ttl 63, id 59359, offset 0, flags [DF], proto ICMP (1), length 84)
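Comparing the captures on docker0 and ens33 shows that the original ICMP packet undergoes one SNAT as it leaves host2: the source IP is replaced from the container IP with the host IP. The rule responsible is in the nat table:
root@node-2:~# iptables -t nat -S
...
-A POSTROUTING -s 10.100.50.0/24 ! -o docker0 -j MASQUERADE
...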
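The match condition of that MASQUERADE rule can be spelled out as a small sketch. This only mimics the two tests in the rule (source inside 10.100.50.0/24, output interface not docker0); the real matching happens in the kernel's netfilter code, and the function names ip_to_int, in_cidr, and masquerade_applies are hypothetical.

```shell
#!/bin/sh
# Convert a dotted-quad IPv4 address to an integer.
ip_to_int() {
    old_ifs=$IFS
    IFS=.
    set -- $1          # split the address into four positional parameters
    IFS=$old_ifs
    echo $(( $1 * 16777216 + $2 * 65536 + $3 * 256 + $4 ))
}

# Succeed if address $1 lies inside CIDR block $2 (e.g. 10.100.50.0/24).
in_cidr() {
    addr=$(ip_to_int "$1")
    net=$(ip_to_int "${2%/*}")
    shift_by=$(( 32 - ${2#*/} ))
    [ $(( $addr >> $shift_by )) -eq $(( $net >> $shift_by )) ]
}

# Mirror of: -s 10.100.50.0/24 ! -o docker0 -j MASQUERADE
# $1 = packet source IP, $2 = output interface.
masquerade_applies() {
    in_cidr "$1" 10.100.50.0/24 && [ "$2" != docker0 ]
}

if masquerade_applies 10.100.50.2 ens33; then
    echo "10.100.50.2 leaving via ens33: source is rewritten"
fi
if ! masquerade_applies 10.100.50.2 docker0; then
    echo "10.100.50.2 leaving via docker0: source is left unchanged"
fi
```

This is why the capture on docker0 still shows 10.100.50.2 as the source while the capture on ens33 shows 172.16.112.133.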
In host-gw mode, once Flannel has started on host3, Flannel adds the following route on host2, which sets 172.16.112.130 as the next-hop gateway for the subnet allocated to host3. As a result, every packet on host2 whose destination is a container IP on host3 has host3 as its next hop and is sent out through the local ens33 interface.
root@node-2:~# ip route
...
10.100.83.0/24 via 172.16.112.130 dev ens33
...
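As a rough sketch of what the host-gw backend has to do, the following prints (without executing) one route command per remote lease. The function name routes_for_leases and the "subnet=host_ip" argument format are assumptions made for this illustration; flanneld programs the routes itself rather than shelling out like this.

```shell
#!/bin/sh
# Hypothetical helper: $1 = this host's IP, $2 = outgoing interface,
# remaining arguments = leases written as "subnet=host_ip".
routes_for_leases() {
    local_ip="$1"
    iface="$2"
    shift 2
    for lease in "$@"; do
        subnet="${lease%%=*}"
        host_ip="${lease#*=}"
        # The local subnet is reached through docker0, so no route is needed.
        if [ "$host_ip" = "$local_ip" ]; then
            continue
        fi
        echo "ip route add $subnet via $host_ip dev $iface"
    done
}

# Seen from host2 (172.16.112.133) with the two leases from this article.
routes_for_leases 172.16.112.133 ens33 \
    10.100.50.0/24=172.16.112.133 \
    10.100.83.0/24=172.16.112.130
```

Since the next hop must be directly reachable on ens33, this approach only works when the hosts share a layer-2 network.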
Likewise, host3 has a corresponding route that sets the next-hop gateway for packets destined for the subnet allocated to host2
root@node-3:~# ip route
...
10.100.50.0/24 via 172.16.112.133 dev ens33
...
udp mode
On host1, edit flannel's network configuration file and write the configuration to etcd again. To tell the setups apart, change the IP pool to 10.101.0.0/16
root@node-1:~# etcdctl set /docker-subnet/network/config < /etc/flannel-config.json
{
  "Network":"10.101.0.0/16",
  "SubnetLen":24,
  "Backend":{
    "Type":"udp"
  }
}
On host2 and host3, restart the flannel service, run the mk-docker-opts.sh script, and restart the docker service
root@node-2:~# systemctl restart flanneld.service
root@node-2:~# cat /run/flannel/subnet.env
FLANNEL_NETWORK=10.101.0.0/16
FLANNEL_SUBNET=10.101.83.1/24
FLANNEL_MTU=1472
FLANNEL_IPMASQ=false
root@node-2:~# mk-docker-opts.sh
root@node-2:~# systemctl restart docker
host2 is allocated the subnet 10.101.83.1/24, and eth0 in bbox2 gets the IP address 10.101.83.2/24
host3 is allocated the subnet 10.101.12.1/24, and eth0 in bbox3 gets the IP address 10.101.12.2/24
Look at the routing table on host2
root@node-2:~# ip route
default via 172.16.112.2 dev ens33 proto static metric 100
10.101.0.0/16 dev flannel0 proto kernel scope link src 10.101.83.0
10.101.83.0/24 dev docker0 proto kernel scope link src 10.101.83.1
......
Note the second entry: packets destined for 10.101.0.0/16 (except 10.101.83.0/24, which matches the more specific docker0 route) are forwarded through the flannel0 device. So what is flannel0?
root@node-2:~# ip -d link show dev flannel0
27: flannel0: mtu 1472 qdisc pfifo_fast state UNKNOWN mode DEFAULT group default qlen 500
    link/none promiscuity 0
    tun
root@node-2:~# ifconfig flannel0
flannel0  Link encap:UNSPEC HWaddr 00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00
          inet addr:10.101.83.0 P-t-P:10.101.83.0 Mask:255.255.0.0
          inet6 addr: fe80::f837:978c:898b:55ee/64 Scope:Link
          UP POINTOPOINT RUNNING NOARP MULTICAST MTU:1472 Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:3 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:500
          RX bytes:0 (0.0 B) TX bytes:144 (144.0 B)
flannel0 is a tun device. The network topology at this point is:
Again use ICMP packets to verify connectivity between the containers (while monitoring the packets on docker0, flannel0, and ens33)
root@node-2:~# docker exec bbox2 ping -c 4 10.101.12.2
PING 10.101.12.2 (10.101.12.2): 56 data bytes
64 bytes from 10.101.12.2: seq=0 ttl=60 time=3.440 ms
64 bytes from 10.101.12.2: seq=1 ttl=60 time=1.004 ms
64 bytes from 10.101.12.2: seq=2 ttl=60 time=0.898 ms
64 bytes from 10.101.12.2: seq=3 ttl=60 time=0.776 ms
The ping succeeds here as well.
ICMP packets on docker0
root@node-2:~# tcpdump -i docker0 -vv
tcpdump: listening on docker0, link-type EN10MB (Ethernet), capture size 262144 bytes
02:54:12.868669 IP (tos 0x0, ttl 64, id 42742, offset 0, flags [DF], proto ICMP (1), length 84)
    10.101.83.2 > 10.101.12.2: ICMP echo request, id 2816, seq 0, length 64
02:54:12.870702 IP (tos 0x0, ttl 60, id 21839, offset 0, flags [none], proto ICMP (1), length 84)
    10.101.12.2 > 10.101.83.2: ICMP echo reply, id 2816, seq 0, length 64
02:54:13.870698 IP (tos 0x0, ttl 64, id 42882, offset 0, flags [DF], proto ICMP (1), length 84)
    10.101.83.2 > 10.101.12.2: ICMP echo request, id 2816, seq 1, length 64
02:54:13.871515 IP (tos 0x0, ttl 60, id 22088, offset 0, flags [none], proto ICMP (1), length 84)
    10.101.12.2 > 10.101.83.2: ICMP echo reply, id 2816, seq 1, length 64
ICMP packets on flannel0
root@node-2:~# tcpdump -i flannel0 -vv
tcpdump: listening on flannel0, link-type RAW (Raw IP), capture size 262144 bytes
02:54:12.868749 IP (tos 0x0, ttl 63, id 42742, offset 0, flags [DF], proto ICMP (1), length 84)
    10.101.83.0 > 10.101.12.2: ICMP echo request, id 2816, seq 0, length 64
02:54:12.870659 IP (tos 0x0, ttl 61, id 21839, offset 0, flags [none], proto ICMP (1), length 84)
    10.101.12.2 > 10.101.83.0: ICMP echo reply, id 2816, seq 0, length 64
02:54:13.870725 IP (tos 0x0, ttl 63, id 42882, offset 0, flags [DF], proto ICMP (1), length 84)
    10.101.83.0 > 10.101.12.2: ICMP echo request, id 2816, seq 1, length 64
02:54:13.871504 IP (tos 0x0, ttl 61, id 22088, offset 0, flags [none], proto ICMP (1), length 84)
    10.101.12.2 > 10.101.83.0: ICMP echo reply, id 2816, seq 1, length 64
UDP packets on ens33 (ICMP packets can no longer be captured on ens33)
root@node-2:~# tcpdump udp -i ens33 -v
02:57:09.349575 IP (tos 0x0, ttl 64, id 23956, offset 0, flags [DF], proto UDP (17), length 112)
    172.16.112.133.8285 > node-3.8285: UDP, length 84
02:57:09.349874 IP (tos 0x0, ttl 64, id 31417, offset 0, flags [DF], proto UDP (17), length 112)
    node-3.8285 > 172.16.112.133.8285: UDP, length 84
02:57:10.350543 IP (tos 0x0, ttl 64, id 24046, offset 0, flags [DF], proto UDP (17), length 112)
    172.16.112.133.8285 > node-3.8285: UDP, length 84
02:57:10.350868 IP (tos 0x0, ttl 64, id 31597, offset 0, flags [DF], proto UDP (17), length 112)
    node-3.8285 > 172.16.112.133.8285: UDP, length 84
The ICMP packet changes several times in transit:
On docker0: 10.101.83.2 > 10.101.12.2
On flannel0: 10.101.83.0 > 10.101.12.2
On ens33: outer 172.16.112.133 > 172.16.112.130, inner 10.101.83.0 > 10.101.12.2 (wireshark shows that the inner packet is an IP packet)
In udp mode, Flannel creates the tun device flannel0. The kernel forwards all cross-host traffic through flannel0, where it is read by the Flannel process in user space (the first change to the packet). After reading the packet in user space, Flannel tunnels it by wrapping the ICMP packet in an outer UDP packet (the second change).
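The numbers in the captures above are consistent with this encapsulation, as the following arithmetic sketch shows. It assumes a 1500-byte MTU on ens33, a 20-byte outer IPv4 header without options, and an 8-byte UDP header; the function name udp_overhead is hypothetical.

```shell
#!/bin/sh
OUTER_IPV4=20   # outer IPv4 header, assuming no IP options
OUTER_UDP=8     # outer UDP header

# Bytes added in front of each tunnelled IP packet by the udp backend.
udp_overhead() {
    echo $(( $OUTER_IPV4 + $OUTER_UDP ))
}

echo "overhead: $(udp_overhead)"
echo "flannel0 MTU: $(( 1500 - $(udp_overhead) ))"
echo "outer IP length for an 84-byte inner packet: $(( 84 + $(udp_overhead) ))"
```

The second value matches FLANNEL_MTU=1472 in subnet.env, and the third matches the "length 112" reported by tcpdump on ens33 for an 84-byte ICMP packet.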
vxlan mode
As with the udp mode change above, on host1 change the backend type to vxlan and replace the pool with 10.102.0.0/16
root@node-1:~# etcdctl set /docker-subnet/network/config < /etc/flannel-config.json
{
  "Network":"10.102.0.0/16",
  "SubnetLen":24,
  "Backend":{
    "Type":"vxlan"
  }
}
Then restart flannel and docker
host2 is allocated the subnet 10.102.20.1/24, and eth0 in bbox2 gets the IP address 10.102.20.2/24
host3 is allocated the subnet 10.102.19.1/24, and eth0 in bbox3 gets the IP address 10.102.19.2/24
Inspecting the interfaces shows that Flannel created a vxlan device, flannel.1, on host2
IP: 10.102.20.0/32
MAC: 72:45:d3:56:86:48
root@node-2:~# ip -d link show dev flannel.1
3: flannel.1: mtu 1450 qdisc noqueue state UNKNOWN mode DEFAULT group default
    link/ether 72:45:d3:56:86:48 brd ff:ff:ff:ff:ff:ff promiscuity 0
    vxlan id 1 local 172.16.112.133 dev ens33 srcport 0 0 dstport 8472
root@node-2:~# ip addr show dev flannel.1
3: flannel.1: mtu 1450 qdisc noqueue state UNKNOWN group default
    link/ether 72:45:d3:56:86:48 brd ff:ff:ff:ff:ff:ff
    inet 10.102.20.0/32 scope global flannel.1
       valid_lft forever preferred_lft forever
Likewise, on host3
flannel.1 IP: 10.102.19.0/32, MAC: d6:1a:65:fc:48:77
root@node-3:~# ip -d link show dev flannel.1
3: flannel.1: mtu 1450 qdisc noqueue state UNKNOWN mode DEFAULT group default
    link/ether d6:1a:65:fc:48:77 brd ff:ff:ff:ff:ff:ff promiscuity 0
    vxlan id 1 local 172.16.112.130 dev ens33 srcport 0 0 dstport 8472 nolearning ageing 300 udpcsum addrgenmode none
root@node-3:~# ip addr show dev flannel.1
3: flannel.1: mtu 1450 qdisc noqueue state UNKNOWN group default
    link/ether d6:1a:65:fc:48:77 brd ff:ff:ff:ff:ff:ff
    inet 10.102.19.0/32 scope global flannel.1
       valid_lft forever preferred_lft forever
On host2, the routing table shows that Flannel has added a route for cross-host traffic
root@node-2:~# ip route
default via 172.16.112.2 dev ens33 proto static metric 100
10.102.19.0/24 via 10.102.19.0 dev flannel.1 onlink
...
The neighbor table shows an added entry for flannel.1 on host3 (d6:1a:65:fc:48:77 is the MAC address of flannel.1 on host3, and 10.102.19.0 is its IP address)
root@node-2:~# ip neigh
10.102.19.0 dev flannel.1 lladdr d6:1a:65:fc:48:77 PERMANENT
The fdb contains a matching entry as well (172.16.112.130 is the IP address of ens33 on host3)
root@node-2:~# bridge fdb
d6:1a:65:fc:48:77 dev flannel.1 dst 172.16.112.130 self permanent
......
Flannel applies the equivalent configuration on host3
The network topology at this point is:
Again use busybox for an ICMP test (while monitoring the packets on docker0, flannel.1, and ens33)
root@node-2:~# docker exec bbox2 ping 10.102.19.2
PING 10.102.19.2 (10.102.19.2): 56 data bytes
64 bytes from 10.102.19.2: seq=0 ttl=62 time=1.287 ms
64 bytes from 10.102.19.2: seq=1 ttl=62 time=1.326 ms
64 bytes from 10.102.19.2: seq=2 ttl=62 time=1.792 ms
64 bytes from 10.102.19.2: seq=3 ttl=62 time=1.653 ms
The ping still succeeds
Packets on docker0:
root@node-2:~# tcpdump -i docker0 -vv
tcpdump: listening on docker0, link-type EN10MB (Ethernet), capture size 262144 bytes
12:21:15.194354 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 10.102.20.1 tell 10.102.20.2, length 28
12:21:15.194372 ARP, Ethernet (len 6), IPv4 (len 4), Reply 10.102.20.1 is-at 02:42:1a:0c:ec:3e (oui Unknown), length 28
12:21:15.194375 IP (tos 0x0, ttl 64, id 37449, offset 0, flags [DF], proto ICMP (1), length 84)
    10.102.20.2 > 10.102.19.2: ICMP echo request, id 2816, seq 0, length 64
12:21:15.195431 IP (tos 0x0, ttl 62, id 23004, offset 0, flags [none], proto ICMP (1), length 84)
    10.102.19.2 > 10.102.20.2: ICMP echo reply, id 2816, seq 0, length 64
Packets on flannel.1:
root@node-2:~# tcpdump -i flannel.1 -vv
tcpdump: listening on flannel.1, link-type EN10MB (Ethernet), capture size 262144 bytes
12:21:15.194390 IP (tos 0x0, ttl 63, id 37449, offset 0, flags [DF], proto ICMP (1), length 84)
    10.102.20.0 > 10.102.19.2: ICMP echo request, id 2816, seq 0, length 64
12:21:15.195425 IP (tos 0x0, ttl 63, id 23004, offset 0, flags [none], proto ICMP (1), length 84)
    10.102.19.2 > 10.102.20.0: ICMP echo reply, id 2816, seq 0, length 64
12:21:16.197793 IP (tos 0x0, ttl 63, id 37695, offset 0, flags [DF], proto ICMP (1), length 84)
    10.102.20.0 > 10.102.19.2: ICMP echo request, id 2816, seq 1, length 64
12:21:16.198588 IP (tos 0x0, ttl 63, id 23066, offset 0, flags [none], proto ICMP (1), length 84)
    10.102.19.2 > 10.102.20.0: ICMP echo reply, id 2816, seq 1, length 64
UDP packets on ens33:
root@node-2:~# tcpdump udp -i ens33 -vv
tcpdump: listening on ens33, link-type EN10MB (Ethernet), capture size 262144 bytes
12:21:15.194972 IP (tos 0x0, ttl 64, id 28096, offset 0, flags [none], proto UDP (17), length 134)
    172.16.112.133.38092 > node-3.8472: [bad udp cksum 0x39ac -> 0xae89!] OTV, flags [I] (0x08), overlay 0, instance 1
IP (tos 0x0, ttl 63, id 37449, offset 0, flags [DF], proto ICMP (1), length 84)
    10.102.20.0 > 10.102.19.2: ICMP echo request, id 2816, seq 0, length 64
12:21:15.195415 IP (tos 0x0, ttl 64, id 64197, offset 0, flags [none], proto UDP (17), length 134)
    node-3.45840 > 172.16.112.133.8472: [udp sum ok] OTV, flags [I] (0x08), overlay 0, instance 1
IP (tos 0x0, ttl 63, id 23004, offset 0, flags [none], proto ICMP (1), length 84)
    10.102.19.2 > 10.102.20.0: ICMP echo reply, id 2816, seq 0, length 64
As in udp mode, vxlan mode uses tunnel encapsulation to carry packets across hosts
On docker0: 10.102.20.2 > 10.102.19.2
On flannel.1: 10.102.20.0 > 10.102.19.2
On ens33: outer 172.16.112.133 > 172.16.112.130, inner 10.102.20.0 > 10.102.19.2 (wireshark shows that the inner packet is a layer-2 frame)
Unlike udp mode, vxlan mode performs all packet encapsulation in the kernel. This is the standard vxlan encapsulation process, which consults the route table, neigh table, and fdb table, and these are exactly the entries that Flannel writes into the host kernel.
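Because the inner packet is carried as a full layer-2 frame, the per-packet overhead is larger than in udp mode. The sketch below adds up the standard header sizes, assuming IPv4 without options on the outer packet, an untagged inner Ethernet header, and a 1500-byte MTU on ens33; the function name vxlan_overhead is hypothetical.

```shell
#!/bin/sh
OUTER_IPV4=20   # outer IPv4 header, assuming no IP options
OUTER_UDP=8     # outer UDP header (destination port 8472 in this setup)
VXLAN_HDR=8     # vxlan header carrying the VNI
INNER_ETH=14    # inner Ethernet header, assuming no VLAN tag

# Bytes added in front of each inner IP packet by vxlan encapsulation.
vxlan_overhead() {
    echo $(( $OUTER_IPV4 + $OUTER_UDP + $VXLAN_HDR + $INNER_ETH ))
}

echo "overhead: $(vxlan_overhead)"
echo "flannel.1 MTU: $(( 1500 - $(vxlan_overhead) ))"
echo "outer IP length for an 84-byte inner packet: $(( 84 + $(vxlan_overhead) ))"
```

The results agree with the output above: flannel.1 reports mtu 1450, and tcpdump on ens33 reports "length 134" for the encapsulated 84-byte ICMP packet.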
Sending an ICMP request:
bbox2 builds the ICMP packet but lacks the next-hop mac
bbox2 has the route default via 10.102.20.1 dev eth0, so it first needs the mac address of 10.102.20.1 and sends an ARP request
docker0 replies to bbox2 with its mac address 02:42:1A:0C:EC:3E
bbox2 completes the ICMP packet and sends it
docker0 receives it and, following the route 10.102.19.0/24 via 10.102.19.0 dev flannel.1 onlink, forwards the packet to flannel.1 with next-hop gateway 10.102.19.0 (the flannel.1 address on host3); the neigh table gives the mac of that gateway as d6:1a:65:fc:48:77
The vxlan device flannel.1 processes the packet, setting its dmac, smac, and sip
The vxlan device flannel.1 then vxlan-encapsulates the packet: the ICMP packet above becomes the inner packet, and the outer packet is assembled
The fdb table gives dip = 172.16.112.130, and sip = IP(host2's ens33)
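The three lookups in the steps above can be chained in a small sketch. The tables are hard-coded from the host2 output shown earlier (ip route, ip neigh, bridge fdb); the function names are hypothetical, and the route match is simplified to a shell pattern instead of a real prefix match. In reality all three lookups are done by the kernel.

```shell
#!/bin/sh
# route table: destination in 10.102.19.0/24 -> next-hop gateway on flannel.1
route_lookup() {
    case "$1" in
        10.102.19.*) echo 10.102.19.0 ;;
        *) return 1 ;;
    esac
}

# neigh table: gateway IP -> MAC of the remote flannel.1 device
neigh_lookup() {
    case "$1" in
        10.102.19.0) echo d6:1a:65:fc:48:77 ;;
        *) return 1 ;;
    esac
}

# fdb table: MAC of the remote flannel.1 -> IP of the remote host
fdb_lookup() {
    case "$1" in
        d6:1a:65:fc:48:77) echo 172.16.112.130 ;;
        *) return 1 ;;
    esac
}

# Resolve the outer destination IP for a container address on another host.
resolve_outer_dst() {
    gw=$(route_lookup "$1") || return 1
    mac=$(neigh_lookup "$gw") || return 1
    fdb_lookup "$mac"
}

resolve_outer_dst 10.102.19.2
```

For bbox3's address this prints 172.16.112.130, the ens33 address of host3, which is the outer destination seen in the capture on ens33.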