Update cluster DRAFT - NOT TESTED
The process of installing an HA Kubernetes cluster on-premises or in the cloud is well documented and, in most cases, does not require many steps. Additional tools like kops or Kubespray help to automate parts of this process.
Every so often, though, we need to upgrade the cluster to keep up with the latest security patches and bug fixes, and to benefit from new features released on an ongoing basis. This is especially important when running a badly outdated version (for example v1.9), or when we want to automate the process and always stay on the latest supported version.
In general, when operating an HA Kubernetes cluster, the upgrade process involves two separate tasks that must not overlap or be performed simultaneously: upgrading the Kubernetes cluster itself and, if needed, upgrading the etcd cluster, the distributed key-value backing store of Kubernetes. Let's see how we can perform those tasks with minimal disruption.
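Before planning anything, it is worth a quick check of which versions the client and the control plane are currently running (the output lines shown as comments are illustrative):
kubectl version --short
# Client Version: v1.16.0
# Server Version: v1.16.0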
Upgrading control plane nodes
Upgrade the first control plane node
yum list --showduplicates kubeadm --disableexcludes=kubernetes
# find the latest 1.17 version in the list
# it should look like 1.17.x-0, where x is the latest patch
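Then upgrade the kubeadm package itself to the version you picked (the same yum pattern used for the worker nodes later on):
# replace x in 1.17.x-0 with the latest patch version
yum install -y kubeadm-1.17.x-0 --disableexcludes=kubernetes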
Verify that the download works and that kubeadm has the expected version
kubeadm version
Verify the upgrade plan
kubeadm upgrade plan
[upgrade/config] Making sure the configuration is correct:
[upgrade/config] Reading configuration from the cluster...
[upgrade/config] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -oyaml'
[preflight] Running pre-flight checks.
[upgrade] Making sure the cluster is healthy:
[upgrade] Fetching available versions to upgrade to
[upgrade/versions] Cluster version: v1.16.0
[upgrade/versions] kubeadm version: v1.17.0
Components that must be upgraded manually after you have upgraded the control plane with 'kubeadm upgrade apply':
COMPONENT   CURRENT       AVAILABLE
Kubelet     1 x v1.16.0   v1.17.0

Upgrade to the latest stable version:

COMPONENT            CURRENT   AVAILABLE
API Server           v1.16.0   v1.17.0
Controller Manager   v1.16.0   v1.17.0
Scheduler            v1.16.0   v1.17.0
Kube Proxy           v1.16.0   v1.17.0
CoreDNS              1.6.2     1.6.5
Etcd                 3.3.15    3.4.3-0
You can now apply the upgrade by executing the following command:
kubeadm upgrade apply v1.17.0
_____________________________________________________________________
Apply the upgrade plan
# replace x with the patch version you picked for this upgrade
sudo kubeadm upgrade apply v1.17.x
[preflight] Running pre-flight checks.
[upgrade] Making sure the cluster is healthy:
[upgrade/config] Making sure the configuration is correct:
[upgrade/config] Reading configuration from the cluster...
[upgrade/config] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -oyaml'
[upgrade/version] You have chosen to change the cluster version to "v1.17.0"
[upgrade/versions] Cluster version: v1.16.0
[upgrade/versions] kubeadm version: v1.17.0
[upgrade/confirm] Are you sure you want to proceed with the upgrade? [y/N]: y
[upgrade/prepull] Will prepull images for components [kube-apiserver kube-controller-manager kube-scheduler etcd]
[upgrade/prepull] Prepulling image for component etcd.
[upgrade/prepull] Prepulling image for component kube-apiserver.
[upgrade/prepull] Prepulling image for component kube-controller-manager.
[upgrade/prepull] Prepulling image for component kube-scheduler.
[apiclient] Found 0 Pods for label selector k8s-app=upgrade-prepull-kube-scheduler
[apiclient] Found 1 Pods for label selector k8s-app=upgrade-prepull-kube-apiserver
[apiclient] Found 1 Pods for label selector k8s-app=upgrade-prepull-kube-controller-manager
[apiclient] Found 0 Pods for label selector k8s-app=upgrade-prepull-etcd
[apiclient] Found 1 Pods for label selector k8s-app=upgrade-prepull-kube-scheduler
[apiclient] Found 1 Pods for label selector k8s-app=upgrade-prepull-etcd
[upgrade/prepull] Prepulled image for component etcd.
[upgrade/prepull] Prepulled image for component kube-controller-manager.
[upgrade/prepull] Prepulled image for component kube-apiserver.
[upgrade/prepull] Prepulled image for component kube-scheduler.
[upgrade/prepull] Successfully prepulled the images for all the control plane components
[upgrade/apply] Upgrading your Static Pod-hosted control plane to version "v1.17.0"...
Static pod: kube-apiserver-luboitvbox hash: 8d931c2296a38951e95684cbcbe3b923
Static pod: kube-controller-manager-luboitvbox hash: 2480bf6982ad2103c05f6764e20f2787
Static pod: kube-scheduler-luboitvbox hash: 9b290132363a92652555896288ca3f88
[upgrade/etcd] Upgrading to TLS for etcd
[upgrade/staticpods] Writing new Static Pod manifests to "/etc/kubernetes/tmp/kubeadm-upgraded-manifests446257614"
[upgrade/staticpods] Preparing for "kube-apiserver" upgrade
[upgrade/staticpods] Renewing "apiserver-etcd-client" certificate
[upgrade/staticpods] Renewing "apiserver" certificate
[upgrade/staticpods] Renewing "apiserver-kubelet-client" certificate
[upgrade/staticpods] Renewing "front-proxy-client" certificate
[upgrade/staticpods] Moved new manifest to "/etc/kubernetes/manifests/kube-apiserver.yaml" and backed up old manifest to "/etc/kubernetes/tmp/kubeadm-backup-manifests-2019-06-05-23-38-03/kube-apiserver.yaml"
[upgrade/staticpods] Waiting for the kubelet to restart the component
[upgrade/staticpods] This might take a minute or longer depending on the component/version gap (timeout 5m0s)
Static pod: kube-apiserver-luboitvbox hash: 8d931c2296a38951e95684cbcbe3b923
Static pod: kube-apiserver-luboitvbox hash: 1b4e2b09a408c844f9d7b535e593ead9
[apiclient] Found 1 Pods for label selector component=kube-apiserver
[upgrade/staticpods] Component "kube-apiserver" upgraded successfully!
[upgrade/staticpods] Preparing for "kube-controller-manager" upgrade
[upgrade/staticpods] Renewing certificate embedded in "controller-manager.conf"
[upgrade/staticpods] Moved new manifest to "/etc/kubernetes/manifests/kube-controller-manager.yaml" and backed up old manifest to "/etc/kubernetes/tmp/kubeadm-backup-manifests-2019-06-05-23-38-03/kube-controller-manager.yaml"
[upgrade/staticpods] Waiting for the kubelet to restart the component
[upgrade/staticpods] This might take a minute or longer depending on the component/version gap (timeout 5m0s)
Static pod: kube-controller-manager-luboitvbox hash: 2480bf6982ad2103c05f6764e20f2787
Static pod: kube-controller-manager-luboitvbox hash: 6617d53423348aa619f1d6e568bb894a
[apiclient] Found 1 Pods for label selector component=kube-controller-manager
[upgrade/staticpods] Component "kube-controller-manager" upgraded successfully!
[upgrade/staticpods] Preparing for "kube-scheduler" upgrade
[upgrade/staticpods] Renewing certificate embedded in "scheduler.conf"
[upgrade/staticpods] Moved new manifest to "/etc/kubernetes/manifests/kube-scheduler.yaml" and backed up old manifest to "/etc/kubernetes/tmp/kubeadm-backup-manifests-2019-06-05-23-38-03/kube-scheduler.yaml"
[upgrade/staticpods] Waiting for the kubelet to restart the component
[upgrade/staticpods] This might take a minute or longer depending on the component/version gap (timeout 5m0s)
Static pod: kube-scheduler-luboitvbox hash: 9b290132363a92652555896288ca3f88
Static pod: kube-scheduler-luboitvbox hash: edf58ab819741a5d1eb9c33de756e3ca
[apiclient] Found 1 Pods for label selector component=kube-scheduler
[upgrade/staticpods] Component "kube-scheduler" upgraded successfully!
[upgrade/staticpods] Renewing certificate embedded in "admin.conf"
[upload-config] Storing the configuration used in ConfigMap "kubeadm-config" in the "kube-system" Namespace
[kubelet] Creating a ConfigMap "kubelet-config-1.17" in namespace kube-system with the configuration for the kubelets in the cluster
[kubelet-start] Downloading configuration for the kubelet from the "kubelet-config-1.17" ConfigMap in the kube-system namespace
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[bootstrap-token] configured RBAC rules to allow Node Bootstrap tokens to post CSRs in order for nodes to get long term certificate credentials
[bootstrap-token] configured RBAC rules to allow the csrapprover controller automatically approve CSRs from a Node Bootstrap Token
[bootstrap-token] configured RBAC rules to allow certificate rotation for all node client certificates in the cluster
[addons] Applied essential addon: CoreDNS
[addons] Applied essential addon: kube-proxy
[upgrade/successful] SUCCESS! Your cluster was upgraded to "v1.17.0". Enjoy!
[upgrade/kubelet] Now that your control plane is upgraded, please proceed with upgrading your kubelets if you haven't already done so.
Manually upgrade your CNI provider plugin. Your Container Network Interface (CNI) provider may have its own upgrade instructions to follow; this step is easy to forget, so do not skip it.
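As an illustration, for a cluster running Weave Net (the CNI visible in the drain output further below), Weave's documentation at the time suggested re-applying its manifest so that it pulls images matching the server version:
kubectl apply -f "https://cloud.weave.works/k8s/net?k8s-version=$(kubectl version | base64 | tr -d '\n')"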
Update kubelet and kubectl, and restart the kubelet service
# replace x in 1.17.x-0 with the latest patch version
yum install -y kubelet-1.17.x-0 kubectl-1.17.x-0 --disableexcludes=kubernetes
Restart the kubelet
systemctl restart kubelet
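On any additional control plane nodes, follow the same sequence, but run kubeadm upgrade node instead of kubeadm upgrade apply (as described in the kubeadm documentation linked below):
# on each remaining control plane node, one at a time
# replace x in 1.17.x-0 with the latest patch version
yum install -y kubeadm-1.17.x-0 --disableexcludes=kubernetes
sudo kubeadm upgrade node
yum install -y kubelet-1.17.x-0 kubectl-1.17.x-0 --disableexcludes=kubernetes
systemctl restart kubelet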
Upgrade kubeadm on the first worker node
The upgrade procedure on worker nodes should be executed one node at a time, or a few nodes at a time, without compromising the minimum capacity required for running your workloads.
# replace x in 1.17.x-0 with the latest patch version
yum install -y kubeadm-1.17.x-0 --disableexcludes=kubernetes
Log in to a master node and drain the first worker node
# replace <node-to-drain> with the name of the node you are draining
kubectl drain <node-to-drain> --ignore-daemonsets
node/ip-172-31-85-18 cordoned
WARNING: ignoring DaemonSet-managed Pods: kube-system/kube-proxy-dj7d7, kube-system/weave-net-z65qx
node/ip-172-31-85-18 drained
Upgrade kubelet config on worker node
kubeadm upgrade node
Upgrade kubelet on worker node and restart the service
# replace x in 1.17.x-0 with the latest patch version
yum install -y kubelet-1.17.x-0 kubectl-1.17.x-0 --disableexcludes=kubernetes
Restart the kubelet on worker
systemctl restart kubelet
Uncordon the node
# replace <node-to-drain> with the name of your node
kubectl uncordon <node-to-drain>
Repeat these steps for the rest of the worker nodes.
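With many workers, this per-node dance is worth scripting. A minimal sketch, assuming passwordless SSH with sudo rights on each node and hypothetical node names worker-1 through worker-3:
# replace x in 1.17.x-0 with the latest patch version before running
for node in worker-1 worker-2 worker-3; do
  kubectl drain "$node" --ignore-daemonsets
  ssh "$node" "sudo yum install -y kubeadm-1.17.x-0 --disableexcludes=kubernetes && \
    sudo kubeadm upgrade node && \
    sudo yum install -y kubelet-1.17.x-0 kubectl-1.17.x-0 --disableexcludes=kubernetes && \
    sudo systemctl restart kubelet"
  kubectl uncordon "$node"
done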
Verify the status of the cluster
kubectl get nodes
The STATUS column should show Ready for all your nodes, and the version number should be updated.
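For example (node names taken from this walkthrough, ages illustrative):
NAME              STATUS   ROLES    AGE   VERSION
luboitvbox        Ready    master   40d   v1.17.0
ip-172-31-85-18   Ready    <none>   40d   v1.17.0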
https://v1-17.docs.kubernetes.io/docs/tasks/administer-cluster/kubeadm/kubeadm-upgrade/
https://kubernetes.io/docs/setup/release/version-skew-policy/#supported-versions
Recovering from a failure state
If kubeadm upgrade fails and does not roll back, for example because of an unexpected shutdown during execution, you can run kubeadm upgrade again. This command is idempotent and eventually makes sure that the actual state is the desired state you declare.
To recover from a bad state, you can also run kubeadm upgrade apply --force without changing the version that your cluster is running.
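For instance (--force skips some requirement checks and implies non-interactive mode):
# replace x with the patch version your cluster is already running
sudo kubeadm upgrade apply v1.17.x --force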
During upgrade, kubeadm writes the following backup folders under /etc/kubernetes/tmp:
kubeadm-backup-etcd-<date>-<time>
kubeadm-backup-manifests-<date>-<time>
kubeadm-backup-etcd contains a backup of the local etcd member data for this control-plane node. In case of an etcd upgrade failure, if the automatic rollback does not work, the contents of this folder can be manually restored in /var/lib/etcd. If external etcd is used, this backup folder will be empty.
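A hypothetical restore sketch; the exact layout inside the backup folder is an assumption, so inspect it first, and stop the kubelet so nothing rewrites the data directory mid-copy:
sudo systemctl stop kubelet
sudo mv /var/lib/etcd /var/lib/etcd.broken
sudo cp -r /etc/kubernetes/tmp/kubeadm-backup-etcd-<date>-<time>/etcd /var/lib/etcd
sudo systemctl start kubelet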
kubeadm-backup-manifests contains a backup of the static Pod manifest files for this control-plane node. In case of an upgrade failure, if the automatic rollback does not work, the contents of this folder can be manually restored in /etc/kubernetes/manifests. If for some reason there is no difference between the pre-upgrade and post-upgrade manifest file for a certain component, a backup file for it will not be written.
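The equivalent sketch for the manifests; the kubelet notices the changed files and restarts the static Pods on its own:
sudo cp /etc/kubernetes/tmp/kubeadm-backup-manifests-<date>-<time>/*.yaml /etc/kubernetes/manifests/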
Source: https://platform9.com/blog/kubernetes-upgrade-the-definitive-guide-to-do-it-yourself/