Flannel Troubleshooting Cases

1. Case 1: Flannel plugin not installed

[root@master231 pods]# kubectl get pods -o wide
NAME   READY   STATUS              RESTARTS   AGE    IP       NODE        NOMINATED NODE   READINESS GATES
xixi   0/1     ContainerCreating   0          169m   <none>   worker233   <none>           <none>
[root@master231 pods]# 
[root@master231 pods]# kubectl describe pod xixi 
Name:         xixi
Namespace:    default
Priority:     0
Node:         worker233/10.0.0.233
...
Events:
  Type     Reason                  Age                      From     Message
  ----     ------                  ----                     ----     -------
  Warning  FailedCreatePodSandBox  9m43s (x3722 over 169m)  kubelet  (combined from similar events): Failed to create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "ae48b3c943557dafdc5f8a3b06897da299233021ed2fd907818cc5acf86c16eb" network for pod "xixi": networkPlugin cni failed to set up pod "xixi_default" network: loadFlannelSubnetEnv failed: open /run/flannel/subnet.env: no such file or directory
  Normal   SandboxChanged          4m43s (x3850 over 169m)  kubelet  Pod sandbox changed, it will be killed and re-created.
[root@master231 pods]#  

Root cause: the file '/run/flannel/subnet.env' is missing, which means the Flannel plugin has not been deployed. Installing the plugin resolves the issue.
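A minimal recovery sketch, assuming the standard upstream Flannel manifest (pin a release that matches your cluster version rather than `latest`):

```shell
# Deploy the Flannel CNI plugin from the upstream manifest
# (URL is the flannel-io default; substitute your own mirror if needed).
kubectl apply -f https://github.com/flannel-io/flannel/releases/latest/download/kube-flannel.yml

# Once the kube-flannel DaemonSet Pods are Running, each node should have the subnet file:
test -f /run/flannel/subnet.env && cat /run/flannel/subnet.env
```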

2. Case 2: flannel binary missing

[root@master231 flannel]# kubectl get pods -o wide
NAME               READY   STATUS              RESTARTS   AGE   IP       NODE        NOMINATED NODE   READINESS GATES
ds-xiuxian-dcjsg   0/1     ContainerCreating   0          10s   <none>   worker232   <none>           <none>
ds-xiuxian-vjbnw   0/1     ContainerCreating   0          10s   <none>   worker233   <none>           <none>
[root@master231 flannel]# 
[root@master231 flannel]# kubectl describe pod ds-xiuxian-dcjsg 
Name:           ds-xiuxian-dcjsg
Namespace:      default
Priority:       0
Node:           worker232/10.0.0.232
...
Events:
  Type     Reason                  Age               From               Message
  ----     ------                  ----              ----               -------
  Normal   Scheduled               15s               default-scheduler  Successfully assigned default/ds-xiuxian-dcjsg to worker232
  Warning  FailedCreatePodSandBox  15s               kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = [failed to set up sandbox container "1fcadf406345fda4f800feb9ec42ddce495da0ecef8ba4f5cb5ebbdad795e4fe" network for pod "ds-xiuxian-dcjsg": networkPlugin cni failed to set up pod "ds-xiuxian-dcjsg_default" network: failed to find plugin "flannel" in path [/opt/cni/bin], failed to clean up sandbox container "1fcadf406345fda4f800feb9ec42ddce495da0ecef8ba4f5cb5ebbdad795e4fe" network for pod "ds-xiuxian-dcjsg": networkPlugin cni failed to teardown pod "ds-xiuxian-dcjsg_default" network: failed to find plugin "flannel" in path [/opt/cni/bin]]
  Normal   SandboxChanged          4s (x2 over 14s)  kubelet            Pod sandbox changed, it will be killed and re-created.
[root@master231 flannel]# 

Root cause: no binary named "flannel" can be found under "/opt/cni/bin".

The official Flannel DaemonSet ships the flannel CNI binary in its init container, so deleting the Pods lets the DaemonSet recreate them and restore the binary automatically.
[root@master231 flannel]# kubectl get pods -o wide -n kube-flannel 
NAME                    READY   STATUS    RESTARTS   AGE   IP           NODE        NOMINATED NODE   READINESS GATES
kube-flannel-ds-b5wb4   1/1     Running   0          24m   10.0.0.231   master231   <none>           <none>
kube-flannel-ds-jrj8q   1/1     Running   0          24m   10.0.0.233   worker233   <none>           <none>
kube-flannel-ds-vsg8t   1/1     Running   0          24m   10.0.0.232   worker232   <none>           <none>
[root@master231 flannel]# 
[root@master231 flannel]# kubectl -n kube-flannel delete pod -l k8s-app=flannel
pod "kube-flannel-ds-b5wb4" deleted
pod "kube-flannel-ds-jrj8q" deleted
pod "kube-flannel-ds-vsg8t" deleted
[root@master231 flannel]# 
[root@master231 flannel]# kubectl get pods -o wide -n kube-flannel 
NAME                    READY   STATUS    RESTARTS   AGE   IP           NODE        NOMINATED NODE   READINESS GATES
kube-flannel-ds-4p825   1/1     Running   0          4s    10.0.0.233   worker233   <none>           <none>
kube-flannel-ds-9jwwx   1/1     Running   0          4s    10.0.0.231   master231   <none>           <none>
kube-flannel-ds-hx7pd   1/1     Running   0          4s    10.0.0.232   worker232   <none>           <none>
[root@master231 flannel]# 

3. Case 3: CNI plugin package files missing

3.1 Reproducing the fault

[root@worker232 ~]# ll /opt/cni/bin/
total 68944
drwxrwxr-x 2 root root     4096 Sep 26 11:17 ./
drwxr-xr-x 3 root root     4096 Sep 19 10:12 ../
-rwxr-xr-x 1 root root  3859475 Jan 17  2023 bandwidth*
-rwxr-xr-x 1 root root  4299004 Jan 17  2023 bridge*
-rwxr-xr-x 1 root root 10167415 Jan 17  2023 dhcp*
-rwxr-xr-x 1 root root  3986082 Jan 17  2023 dummy*
-rwxr-xr-x 1 root root  4385098 Jan 17  2023 firewall*
-rwxr-xr-x 1 root root  3870731 Jan 17  2023 host-device*
-rwxr-xr-x 1 root root  3287319 Jan 17  2023 host-local*
-rwxr-xr-x 1 root root  3999593 Jan 17  2023 ipvlan*
-rwxr-xr-x 1 root root  3353028 Jan 17  2023 loopback*
-rwxr-xr-x 1 root root  4029261 Jan 17  2023 macvlan*
-rwxr-xr-x 1 root root  3746163 Jan 17  2023 portmap*
-rwxr-xr-x 1 root root  4161070 Jan 17  2023 ptp*
-rwxr-xr-x 1 root root  3550152 Jan 17  2023 sbr*
-rwxr-xr-x 1 root root  2845685 Jan 17  2023 static*
-rwxr-xr-x 1 root root  3437180 Jan 17  2023 tuning*
-rwxr-xr-x 1 root root  3993252 Jan 17  2023 vlan*
-rwxr-xr-x 1 root root  3586502 Jan 17  2023 vrf*
[root@worker232 ~]# 
[root@worker232 ~]# mount -t tmpfs -o size=90M tmpfs /opt/
[root@worker232 ~]# 
[root@worker232 ~]# df -h | grep opt
tmpfs                               90M     0   90M   0% /opt
[root@worker232 ~]# 
[root@worker232 ~]# ll /opt/
total 4
drwxrwxrwt  2 root root   40 Sep 26 11:23 ./
drwxr-xr-x 21 root root 4096 Sep 19 10:03 ../
[root@worker232 ~]# 

3.2 Creating a test Pod

[root@master231 flannel]# kubectl get pods -o wide
NAME               READY   STATUS              RESTARTS   AGE     IP             NODE        NOMINATED NODE   READINESS GATES
ds-xiuxian-hhvn2   0/1     ContainerCreating   0          5s      <none>         worker233   <none>           <none>
ds-xiuxian-thchl   0/1     ContainerCreating   0          5s      <none>         worker232   <none>           <none>
[root@master231 flannel]# 
[root@master231 flannel]# kubectl describe pod ds-xiuxian-hhvn2 
Name:           ds-xiuxian-hhvn2
Namespace:      default
Priority:       0
Node:           worker233/10.0.0.233
...
Events:
  Type     Reason                  Age   From               Message
  ----     ------                  ----  ----               -------
  Normal   Scheduled               11s   default-scheduler  Successfully assigned default/ds-xiuxian-hhvn2 to worker233
  Warning  FailedCreatePodSandBox  11s   kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = [failed to set up sandbox container "52b723c545c56d611e265b25bbd930fd66c9f1b651c65550fd515ea9702cbec6" network for pod "ds-xiuxian-hhvn2": networkPlugin cni failed to set up pod "ds-xiuxian-hhvn2_default" network: failed to find plugin "loopback" in path [/opt/cni/bin], failed to clean up sandbox container "52b723c545c56d611e265b25bbd930fd66c9f1b651c65550fd515ea9702cbec6" network for pod "ds-xiuxian-hhvn2": networkPlugin cni failed to teardown pod "ds-xiuxian-hhvn2_default" network: failed to find plugin "portmap" in path [/opt/cni/bin]]
  Normal   SandboxChanged          11s   kubelet            Pod sandbox changed, it will be killed and re-created.
[root@master231 flannel]#  

Root cause: the CNI binaries ("loopback" and "portmap" in the error above) are missing from "/opt/cni/bin" — the tmpfs mounted over /opt hides the entire plugin directory. ("portmap" is the plugin responsible for port mapping.)

Check whether the program files actually exist in that directory; the problem may also be caused by an operator accidentally mounting something over it, so check for conflicting mount points as well.

4. Case 4: CNI plugin package accidentally deleted

[root@worker232 ~]# wget https://github.com/containernetworking/plugins/releases/download/v1.8.0/cni-plugins-linux-amd64-v1.8.0.tgz
SVIP:
[root@worker232 ~]# wget http://192.168.16.253/Resources/Kubernetes/K8S%20Cluster/CNI/flannel/softwares/cni-plugins-linux-amd64-v1.8.0.tgz
[root@worker232 ~]# tar xf cni-plugins-linux-amd64-v1.8.0.tgz  -C /opt/cni/bin/
[root@worker232 ~]# 
[root@worker232 ~]# ll /opt/cni/bin/
total 98940
drwxr-xr-x 2 root root     4096 Sep  1 23:29 ./
drwxr-xr-x 3 root root     4096 Sep 19 10:12 ../
-rwxr-xr-x 1 root root  5042186 Sep  1 23:29 bandwidth*
-rwxr-xr-x 1 root root  5694189 Sep  1 23:29 bridge*
-rwxr-xr-x 1 root root 13719696 Sep  1 23:29 dhcp*
-rwxr-xr-x 1 root root  5251247 Sep  1 23:29 dummy*
-rwxr-xr-x 1 root root  5701763 Sep  1 23:29 firewall*
-rwxr-xr-x 1 root root  2907995 Sep 26 11:33 flannel*
-rwxr-xr-x 1 root root  5159307 Sep  1 23:29 host-device*
-rwxr-xr-x 1 root root  4350430 Sep  1 23:29 host-local*
-rwxr-xr-x 1 root root  5273398 Sep  1 23:29 ipvlan*
-rw-r--r-- 1 root root    11357 Sep  1 23:29 LICENSE
-rwxr-xr-x 1 root root  4301450 Sep  1 23:29 loopback*
-rwxr-xr-x 1 root root  5306499 Sep  1 23:29 macvlan*
-rwxr-xr-x 1 root root  5107586 Sep  1 23:29 portmap*
-rwxr-xr-x 1 root root  5474778 Sep  1 23:29 ptp*
-rw-r--r-- 1 root root     2343 Sep  1 23:29 README.md
-rwxr-xr-x 1 root root  4521078 Sep  1 23:29 sbr*
-rwxr-xr-x 1 root root  3772408 Sep  1 23:29 static*
-rwxr-xr-x 1 root root  5330851 Sep  1 23:29 tap*
-rwxr-xr-x 1 root root  4384728 Sep  1 23:29 tuning*
-rwxr-xr-x 1 root root  5266939 Sep  1 23:29 vlan*
-rwxr-xr-x 1 root root  4684912 Sep  1 23:29 vrf*
[root@worker232 ~]# 

Reference: https://github.com/containernetworking/plugins

5. Case 5: missing Flannel routes

[root@master231 deployments]# route -n  # Before deleting the flannel Pods: the local routing table has no routes within 10.100.0.0/16.
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
0.0.0.0         10.0.0.254      0.0.0.0         UG    0      0        0 eth0
10.0.0.0        0.0.0.0         255.255.255.0   U     0      0        0 eth0
172.17.0.0      0.0.0.0         255.255.0.0     U     0      0        0 docker0
[root@master231 deployments]# 
[root@master231 deployments]# kubectl get pods -n kube-flannel -o wide
NAME                    READY   STATUS    RESTARTS   AGE   IP           NODE        NOMINATED NODE   READINESS GATES
kube-flannel-ds-c9mp9   1/1     Running   0          39m   10.0.0.232   worker232   <none>           <none>
kube-flannel-ds-pphgj   1/1     Running   0          39m   10.0.0.233   worker233   <none>           <none>
kube-flannel-ds-vqr86   1/1     Running   0          39m   10.0.0.231   master231   <none>           <none>
[root@master231 deployments]# 
[root@master231 deployments]# 
[root@master231 deployments]# kubectl -n kube-flannel delete pods --all
pod "kube-flannel-ds-c9mp9" deleted
pod "kube-flannel-ds-pphgj" deleted
pod "kube-flannel-ds-vqr86" deleted
[root@master231 deployments]# 
[root@master231 deployments]# kubectl get pods -n kube-flannel -o wide
NAME                    READY   STATUS            RESTARTS   AGE   IP           NODE        NOMINATED NODE   READINESS GATES
kube-flannel-ds-b7vpr   0/1     PodInitializing   0          2s    10.0.0.231   master231   <none>           <none>
kube-flannel-ds-lhb85   0/1     PodInitializing   0          2s    10.0.0.233   worker233   <none>           <none>
kube-flannel-ds-lns8p   0/1     PodInitializing   0          3s    10.0.0.232   worker232   <none>           <none>
[root@master231 deployments]# 
[root@master231 deployments]# kubectl get pods -n kube-flannel -o wide
NAME                    READY   STATUS    RESTARTS   AGE   IP           NODE        NOMINATED NODE   READINESS GATES
kube-flannel-ds-b7vpr   1/1     Running   0          4s    10.0.0.231   master231   <none>           <none>
kube-flannel-ds-lhb85   1/1     Running   0          4s    10.0.0.233   worker233   <none>           <none>
kube-flannel-ds-lns8p   1/1     Running   0          5s    10.0.0.232   worker232   <none>           <none>
[root@master231 deployments]# 
[root@master231 deployments]# 
[root@master231 deployments]# route -n   # After deleting: the 10.100.0.0/16 routes are back, so Pods on other nodes are reachable again.
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
0.0.0.0         10.0.0.254      0.0.0.0         UG    0      0        0 eth0
10.0.0.0        0.0.0.0         255.255.255.0   U     0      0        0 eth0
10.100.1.0      10.0.0.232      255.255.255.0   UG    0      0        0 eth0
10.100.2.0      10.0.0.233      255.255.255.0   UG    0      0        0 eth0
172.17.0.0      0.0.0.0         255.255.0.0     U     0      0        0 docker0
[root@master231 deployments]# 
[root@master231 deployments]#