Problem 1:

Node registration fails with: failed to get config map: Unauthorized

The command and its output:

[root@node1 ~]# kubeadm join 10.5.1.10:6443 --token llilpc.9je7qvdn7l4sygoo     --discovery-token-ca-cert-hash sha256:a82baf34d02c5338c6c7c8e9234316dffecee709cea7cc76cda47c8e595f1745
W0122 19:36:32.447752   12903 join.go:346] [preflight] WARNING: JoinControlPane.controlPlane settings will be ignored when control-plane flag is not set.
[preflight] Running pre-flight checks
[preflight] Reading configuration from the cluster...
[preflight] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -oyaml'
error execution phase preflight: unable to fetch the kubeadm-config ConfigMap: failed to get config map: Unauthorized
To see the stack trace of this error execute with --v=5 or higher

Cause: the bootstrap token has expired. Fix:

On the master node, run:

sudo kubeadm token create

Then run the following to print a fresh join command (this also creates a token, so it can be used on its own):

sudo kubeadm token create --print-join-command

Then kill all Kubernetes-related processes on the node and join again.
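What the node-side cleanup can look like — a minimal sketch, assuming a standard kubeadm setup; the join command is the one printed by --print-join-command above:

sudo kubeadm reset -f            # wipe the previous half-joined state and stop the static pods
sudo systemctl restart kubelet   # restart the kubelet so it starts clean
sudo kubeadm join 10.5.1.10:6443 --token <new-token> --discovery-token-ca-cert-hash sha256:<hash>   # paste the freshly printed command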


Problem solved.

Problem 2:

Re-joining the master fails with: error execution phase preflight: [preflight] Some fatal errors occurred

[root@node1 ~]# kubeadm join 10.5.1.5:6443 --token 1a8fot.izehoikcbfm6vcj6   --discovery-token-ca-cert-hash sha256:41498e76da4b483ec99963948303e3df1d0a4308bb096d33f77d6f8f42e53e63
W0203 17:56:00.454059  11793 join.go:346] [preflight] WARNING: JoinControlPane.controlPlane settings will be ignored when control-plane flag is not set.
[preflight] Running pre-flight checks
error execution phase preflight: [preflight] Some fatal errors occurred:
	[ERROR FileAvailable--etc-kubernetes-kubelet.conf]: /etc/kubernetes/kubelet.conf already exists
	[ERROR FileAvailable--etc-kubernetes-pki-ca.crt]: /etc/kubernetes/pki/ca.crt already exists
[preflight] If you know what you are doing, you can make a check non-fatal with `--ignore-preflight-errors=...`
To see the stack trace of this error execute with --v=5 or higher

Fix: delete the leftover files

rm -f /etc/kubernetes/kubelet.conf   # remove the stale kubelet config
rm -f /etc/kubernetes/pki/ca.crt     # remove the stale CA certificate

Then join again.


Verify on the master (for example with kubectl get nodes) that the node has joined.

Problem solved.

Problem 3:

Port conflict: [ERROR Port-10250]: Port 10250 is in use

Fix: find the process that holds the port, stop it, then join again.


sudo yum install -y -q net-tools   # install netstat (-q: quiet install)

Then check the port:

netstat -ntpl | grep 10250


The output shows the port is held by a Kubernetes process, so try restarting the service to release it.
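A sketch of freeing the port, assuming it is held by a kubelet left over from an earlier join attempt:

sudo systemctl restart kubelet      # restart the service holding 10250
# if the port is still busy, stop the stale kubelet and wipe the old join state:
sudo systemctl stop kubelet
sudo kubeadm reset -f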


Now the port is no longer in use; join again.


The join succeeds. Problem solved.

Problem 4:

Applying a YAML file fails with: unknown field "NodePort" in io.k8s.api.core.v1.ServicePort; if you choose to ignore these errors, turn validation off with --validate=false


Cause: a field name in the YAML is wrong. Here the port entry was written as NodePort; when specifying a concrete node port, the field must start with a lowercase letter: nodePort.


It was originally written as "NodePort"; changing it to "nodePort" fixed the problem.
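For reference, a minimal NodePort Service fragment (names, labels and port numbers are illustrative):

apiVersion: v1
kind: Service
metadata:
  name: my-svc               # illustrative name
spec:
  type: NodePort
  selector:
    app: my-app              # illustrative label
  ports:
  - port: 80                 # Service (cluster) port
    targetPort: 80           # container port
    nodePort: 30080          # lowercase "nodePort"; must fall in the node-port range (30000-32767 by default)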



Problem 5:

Joining the master fails with: error execution phase kubelet-start: error uploading crisocket: timed out waiting for the condition

Fix:

swapoff -a   # disable swap
kubeadm reset   # reset the node's kubeadm state
systemctl daemon-reload && systemctl restart docker kubelet   # reload unit files and restart docker and the kubelet

rm -rf $HOME/.kube/config   # remove the old kubeconfig

iptables -F && iptables -t nat -F && iptables -t mangle -F && iptables -X   # flush the iptables rules

Finally, join again. Problem solved.

Problem 6

Joining the master fails with: [ERROR FileExisting-nsenter]: nsenter not found in system path


Node OS details: (screenshots omitted; judging from the apt-get command in the fix below, the node runs a Debian/Ubuntu-based distribution)

Master node: CentOS 7

Fix: build nsenter from the util-linux source on the node:

rm -f util-linux-2.25.tar.gz* && wget https://k8s-1252147235.cos.ap-chengdu.myqcloud.com/docker/util-linux-2.25.tar.gz
mkdir -p /cx/ && tar -zxvf util-linux-2.25.tar.gz -C /cx/
sudo apt-get install autopoint autoconf libtool automake make
cd /cx/util-linux-2.25    # enter the extracted source tree before configuring
./configure --without-python --disable-all-programs --enable-nsenter --without-ncurses
make nsenter && cp nsenter /usr/local/bin


The nsenter error is gone now, but a new error appears (see the next problem).


Problem 7

Joining the master fails with: /proc/sys/net/bridge/bridge-nf-call-iptables does not exist

Fix:

modprobe br_netfilter && echo 1 > /proc/sys/net/bridge/bridge-nf-call-iptables


Problem 8

[ERROR DirAvailable--etc-kubernetes-manifests]: /etc/kubernetes/manifests is not empty

[root@localhost ~]# kubeadm join 192.168.158.128:6443 --token jh32f5.rru0gyvo9h2853ix     --discovery-token-ca-cert-hash sha256:480a42ef88102d9fa1f3d8ed35b32adcf48abbf4ef009786fc6616103612d1df
W0629 19:07:13.401225   26458 join.go:346] [preflight] WARNING: JoinControlPane.controlPlane settings will be ignored when control-plane flag is not set.
[preflight] Running pre-flight checks
	[WARNING SystemVerification]: this Docker version is not on the list of validated versions: 20.10.6. Latest validated version: 19.03
error execution phase preflight: [preflight] Some fatal errors occurred:
	[ERROR DirAvailable--etc-kubernetes-manifests]: /etc/kubernetes/manifests is not empty
[preflight] If you know what you are doing, you can make a check non-fatal with `--ignore-preflight-errors=...`
To see the stack trace of this error execute with --v=5 or higher

Fix: as the message suggests, join again with the failing preflight checks ignored:

kubeadm join 192.168.158.128:6443 --token jh32f5.rru0gyvo9h2853ix     --discovery-token-ca-cert-hash sha256:480a42ef88102d9fa1f3d8ed35b32adcf48abbf4ef009786fc6616103612d1df --ignore-preflight-errors=all


Problem 9

The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get http://localhost:10248/healthz: dial tcp [::1]:10248: connect: connection refused.

[root@localhost kubernetes]# vi /etc/systemd/system/kubelet.service.d/10-kubeadm.conf
Environment="KUBELET_SYSTEM_PODS_ARGS=--pod-manifest-path=/etc/kubernetes/manifests --allow-privileged=true --fail-swap-on=false"
[root@localhost kubernetes]# systemctl daemon-reload  
[root@localhost kubernetes]# systemctl restart kubelet

Problem 10

The Docker service fails to start: Job for docker.service failed because the control process exited with error

It may also be that the kernel version is fine and Docker still fails to start; in that case the container engine's storage (graph) driver is failing. Switch it in /etc/docker/daemon.json:

{ "storage-driver": "devicemapper" }

and in /etc/sysconfig/docker-storage:

DOCKER_STORAGE_OPTIONS="--selinux-enabled --log-driver=journald --signature-verification=false"

Then restart Docker:

service docker restart

Problem 11

kubeadm init warns: detected "cgroupfs" as the Docker cgroup driver. Fix: configure Docker to use the systemd cgroup driver:

cat > /etc/docker/daemon.json <<EOF
{
  "exec-opts": ["native.cgroupdriver=systemd"],
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "100m"
  },
  "storage-driver": "overlay2",
  "storage-opts": [
    "overlay2.override_kernel_check=true"
  ]
}
EOF
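The new daemon.json only takes effect after Docker is restarted; a quick way to apply and verify it:

systemctl restart docker                 # pick up the new cgroup driver and log/storage options
docker info | grep -i "cgroup driver"    # should now report: Cgroup Driver: systemd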

Problem 12

K8s port/files already in use: [ERROR FileAvailable--etc-kubernetes-manifests-kub, [ERROR Port-10250]: Port 10250 is in use

kubeadm reset

Problem 13

During K8s setup: [ERROR DirAvailable--var-lib-etcd]: /var/lib/etcd is not empty

rm -rf /var/lib/etcd   # only safe if the old etcd data on this node is disposable

Collected k8s cluster deployment problems (reprinted notes)

2018-09-09 21:21:54, Mr-Liuqx

Copyright notice: original article by the author; do not repost without permission. https://blog.csdn.net/qq_34857250/article/details/82562514


1、hostname “master” could not be reached

The hostname has no entry in /etc/hosts.
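A one-line sketch of adding the missing entry (IP and hostname are illustrative; use the master's real address):

echo "10.5.1.10 master" >> /etc/hosts    # map the hostname used by kubeadm to the master's IP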

2、curl -sSL http://localhost:10248/healthz

curl: (7) Failed connect to localhost:10248; Connection refused — there is no entry for localhost in /etc/hosts.

3、Error starting daemon: SELinux is not supported with the overlay2 graph driver on this kernel. Either boot into a newer kernel or…abled=false)

vim /etc/sysconfig/docker    # change the OPTIONS line to use --selinux-enabled=false

4、Making the bridge-nf-call-iptables settings persistent:

# bridge-related settings:
net.bridge.bridge-nf-call-ip6tables = 0
net.bridge.bridge-nf-call-iptables = 1     # bridged (layer-2) traffic is filtered by the iptables FORWARD rules
net.bridge.bridge-nf-call-arptables = 0
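A sketch of making these settings survive a reboot (the file names under /etc/sysctl.d and /etc/modules-load.d are illustrative):

modprobe br_netfilter                              # the bridge-nf keys only exist while this module is loaded
echo br_netfilter > /etc/modules-load.d/k8s.conf   # load the module automatically at boot
cat <<EOF > /etc/sysctl.d/k8s.conf
net.bridge.bridge-nf-call-ip6tables = 0
net.bridge.bridge-nf-call-iptables = 1
net.bridge.bridge-nf-call-arptables = 0
EOF
sysctl --system                                    # apply all sysctl config files now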

5、The connection to the server localhost:8080 was refused - did you specify the right host or port?

unable to recognize "kube-flannel.yml": Get http://localhost:8080/api?timeout=32s: dial tcp [::1]:8080: connect: connection refused

kubectl has no kubeconfig for the current user (when run as root with the admin config in place, the error does not occur). Copy the admin config:

mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
6、error: unable to recognize "mycronjob.yml": no matches for kind "CronJob" in version "batch/v2alpha1"

Add the following flag to kube-apiserver.yaml, then restart the kubelet service:

- --runtime-config=batch/v2alpha1=true

7、Container runtime network not ready

Container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized
Unable to update cni config: No networks found in /etc/cni/net.d
Failed to get system container stats for "/system.slice/kubelet.service": failed to get cgroup stats for "/system.slice/kubelet.service": failed to get container info for "/system.slice/kubelet.service": unknown container "/system.slice/kubelet.service"

docker pull quay.io/coreos/flannel:v0.10.0-amd64

mkdir -p /etc/cni/net.d/
cat <<EOF> /etc/cni/net.d/10-flannel.conf
{"name":"cbr0","type":"flannel","delegate": {"isDefaultGateway": true}}
EOF

mkdir -p /usr/share/oci-umount/oci-umount.d
mkdir -p /run/flannel/
cat <<EOF> /run/flannel/subnet.env
FLANNEL_NETWORK=172.100.0.0/16
FLANNEL_SUBNET=172.100.1.0/24
FLANNEL_MTU=1450
FLANNEL_IPMASQ=true
EOF

kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/v0.9.1/Documentation/kube-flannel.yml

8、Unable to connect to the server: x509: certificate signed by unknown authority (possibly because of "crypto/rsa: verification error" while trying to verify candidate authority certificate "kubernetes")

export KUBECONFIG=/etc/kubernetes/kubelet.conf

9、Failed to get system container stats for

Failed to get system container stats for "/system.slice/docker.service": failed to get cgroup stats for "/system.slice/docker.service": failed to get container info for "/system.slice/docker.service": unknown container "/system.slice/docker.service"

vim /etc/sysconfig/kubelet
# add: --runtime-cgroups=/systemd/system.slice --kubelet-cgroups=/systemd/system.slice
systemctl restart kubelet

Roughly, this means the --cgroup-driver / --kubelet-cgroups flags are deprecated and these parameters should be set through the kubelet's config file instead.

10、The HTTP call equal to 'curl -sSL http://localhost:10255/healthz' failed with error: Get http://localhost:10255/healthz: dial tcp 127.0.0.1:10255: getsockopt: connection refused.

vim /etc/systemd/system/kubelet.service.d/10-kubeadm.conf
Environment="KUBELET_SYSTEM_PODS_ARGS=--pod-manifest-path=/etc/kubernetes/manifests --allow-privileged=true --fail-swap-on=false"

11、failed to run Kubelet: failed to create kubelet: misconfiguration: kubelet cgroup driver: "systemd" is different from docker cgroup driver: "cgroupfs"

Make the kubelet and Docker agree on one cgroup driver. Either set the kubelet side:
Environment="KUBELET_CGROUP_ARGS=--cgroup-driver=systemd"
or set the Docker side (vi /lib/systemd/system/docker.service):
--exec-opt native.cgroupdriver=systemd

12、[ERROR CRI]: unable to check if the container runtime at "/var/run/dockershim.sock" is running: exit status 1

rm -f /usr/bin/crictl

13、Warning FailedScheduling 2s (x7 over 33s) default-scheduler 0/4 nodes are available: 4 node(s) didn't match node selector.

If the labels in the nodeSelector cannot be matched on any node, the Pod fails to be created and this scheduling warning appears; fix the selector or label a node, as sketched below.
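A sketch of labeling a node so the selector matches (label key and value are illustrative; they must equal the Pod's nodeSelector):

kubectl label nodes node1 disktype=ssd    # give node1 the label the Pod's nodeSelector expects
kubectl get nodes --show-labels           # confirm the label is present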

14、Adding a node to the cluster after the kubeadm-generated token has expired

kubeadm token create

openssl x509 -pubkey -in /etc/kubernetes/pki/ca.crt | openssl rsa -pubin -outform der 2>/dev/null | openssl dgst -sha256 -hex | sed 's/^.* //'

kubeadm join --token aa78f6.8b4cafc8ed26c34f --discovery-token-ca-cert-hash sha256:0fd95a9bc67a7bf0ef42da968a0d55d92e52898ec37c971bd77ee501d845b538  172.16.6.79:6443 --skip-preflight-checks

15、systemctl status kubelet shows warnings

cni.go:171] Unable to update cni config: No networks found in /etc/cni/net.d
May 29 06:30:28 fnode kubelet[4136]: E0529 06:30:28.935309 4136 kubelet.go:2130] Container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized
Removing KUBELET_NETWORK_ARGS from /etc/systemd/system/kubelet.service.d/10-kubeadm.conf and restarting the kubelet is only a temporary workaround and does not really help.
The root cause is the missing image: k8s.gcr.io/pause-amd64:3.1 (a sketch of pulling it via a mirror follows).
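A sketch of getting the image onto the node when k8s.gcr.io is unreachable; the mirror repository name is an assumption, substitute whatever mirror is available:

docker pull registry.aliyuncs.com/google_containers/pause-amd64:3.1                             # assumed mirror of the pause image
docker tag registry.aliyuncs.com/google_containers/pause-amd64:3.1 k8s.gcr.io/pause-amd64:3.1   # retag to the name the kubelet expects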

16、Deleting the flannel network:

ifconfig cni0 down
ifconfig flannel.1 down

ip link del flannel.1
ip link del cni0

yum install bridge-utils
brctl delbr flannel.1
brctl delbr cni0

rm -rf /var/lib/cni/flannel/* && rm -rf /var/lib/cni/networks/cbr0/* && rm -rf /var/lib/cni/network/cni0/*

17、E0906 15:10:55.415662 1 leaderelection.go:234]

E0906 15:10:55.415662 1 leaderelection.go:234] error retrieving resource lock default/ceph.com-rbd: endpoints "ceph.com-rbd" is forbidden: User "system:serviceaccount:default:rbd-provisioner" cannot get endpoints in the namespace "default"

Add the rule below to the ClusterRole, then re-apply it (kubectl apply -f ceph/rbd/deploy/rbac/clusterrole.yaml) so the permissions are granted again:

- apiGroups: [""]
  resources: ["endpoints"]
  verbs: ["get", "list", "watch", "create", "update", "patch"]

18、Making flannel use a specific network interface:

- --iface=eth0    # added to the flanneld container args in kube-flannel.yml

21、 Failed create pod sandbox: rpc error: code = Unknown desc = [failed to set up sandbox container

Failed create pod sandbox: rpc error: code = Unknown desc = [failed to set up sandbox container "957541888b8a0e5b9ad65da932f688eb02cc182808e10d1a89a6e8db2132c253" network for pod "coredns-7655b945bc-6hgj9": NetworkPlugin cni failed to set up pod "coredns-7655b945bc-6hgj9_kube-system" network: failed to find plugin "loopback" in path [/opt/cni/bin], failed to clean up sandbox container "957541888b8a0e5b9ad65da932f688eb02cc182808e10d1a89a6e8db2132c253" network for pod "coredns-7655b945bc-6hgj9": NetworkPlugin cni failed to teardown pod "coredns-7655b945bc-6hgj9_kube-system" network: failed to find plugin "portmap" in path [/opt/cni/bin]]

https://kubernetes.io/docs/setup/independent/troubleshooting-kubeadm/#coredns-pods-have-crashloopbackoff-or-error-state
If your network provider does not support the portmap CNI plugin, you may need to use the Service's NodePort feature or hostNetwork: true.
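If the loopback/portmap binaries are simply missing from /opt/cni/bin, one option is to install the standard CNI plugin bundle; the version and download URL below are illustrative:

curl -LO https://github.com/containernetworking/plugins/releases/download/v0.8.7/cni-plugins-linux-amd64-v0.8.7.tgz
mkdir -p /opt/cni/bin
tar -xzvf cni-plugins-linux-amd64-v0.8.7.tgz -C /opt/cni/bin    # provides loopback, portmap, bridge, host-local, ...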

22、Problem:

The kubelet was configured with system-reserved (800Mi), kube-reserved (500Mi) and eviction-hard (800Mi), so the memory actually usable by the cluster is the total memory minus 800Mi minus 800Mi minus 500Mi; yet system-level OOM kills were still being triggered.

Investigation: top showed etcd using more than 500Mi, the kubelet about 200Mi, and ceph a bit over 200Mi in total — already around 900Mi of overhead outside of Kubernetes-managed pods, which completely exceeds the system-reserved memory and can therefore trigger system-level OOM kills.

23、How to access the api-server?

Use kubectl proxy, as sketched below.
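A minimal sketch (8001 is kubectl proxy's default port):

kubectl proxy --port=8001 &            # proxy the API server to localhost and handle authentication
curl http://127.0.0.1:8001/version     # the API is now reachable locally without certificates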

24、When using a Service's Endpoints to proxy a service outside the cluster, the endpoints often go missing.

Fix: remove service.spec.selector from the Service; the manually created Endpoints then stop being overwritten. A sketch follows.
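A minimal sketch of a selector-less Service backed by a manually managed Endpoints object (names, IP and port are illustrative):

apiVersion: v1
kind: Service
metadata:
  name: external-db          # illustrative name
spec:                        # note: no selector, so the endpoints controller leaves the Endpoints alone
  ports:
  - port: 3306
---
apiVersion: v1
kind: Endpoints
metadata:
  name: external-db          # must match the Service name
subsets:
- addresses:
  - ip: 10.5.1.200           # illustrative address of the external service
  ports:
  - port: 3306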

25、Handling a cluster "avalanche": nodes occasionally went NotReady.

Investigation: CPU usage on the affected node was too high.

1. The node never entered the cpuPressure condition, so the high CPU usage did not come from the pods Kubernetes manages; it had to come from the system/kube components covered by the reservations.

2. Checking the cpu and memory cgroups showed that the kubelet sat under system.slice, so the kube-reserved settings were not actually taking effect.



3. Configure the reservations explicitly:

--enforce-node-allocatable=pods,kube-reserved,system-reserved   # hard-enforce the limits; exceeding them triggers OOM
--system-reserved-cgroup=/system.slice                          # which cgroup the system-reserved limit applies to
--kube-reserved-cgroup=/system.slice/kubelet.service            # which cgroup the kube-reserved limit applies to
--system-reserved=memory=1Gi,cpu=500m
--kube-reserved=memory=500Mi,cpu=500m,ephemeral-storage=10Gi

26、[etcd] Checking Etcd cluster health

etcd cluster is not healthy: context deadline exceeded
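A sketch of checking etcd directly to see which member is unhealthy, assuming the standard kubeadm certificate layout under /etc/kubernetes/pki/etcd:

ETCDCTL_API=3 etcdctl --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/healthcheck-client.crt \
  --key=/etc/kubernetes/pki/etcd/healthcheck-client.key \
  endpoint health            # reports whether each etcd endpoint responds within the timeout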
