Commit 10e42d4

Added known issues for rke2/k3s (#683)
1 parent 8045e55 commit 10e42d4

1 file changed, +72 -0 lines changed

asciidoc/edge-book/releasenotes.adoc

Lines changed: 72 additions & 0 deletions
@@ -60,6 +60,42 @@ Summary: SUSE Edge 3.2.1 is the first z-stream release in the SUSE Edge 3.2 rele
* When deploying via the directed network provisioning flow, a bug affects clusters with static IPs in networks with DHCP servers and/or RAs: static network configurations only apply to the provisioned host and will not be in effect during the host discovery and enrollment. Please refer to the https://github.com/suse-edge/atip/tree/main/telco-examples/edge-clusters/dhcp-less/dual-stack/single-node#readme[SUSE Edge for Telco examples repository] for more details and updates.
* When using `toolbox` in SUSE Linux Micro 6.0, the default container image does not contain some tools which were included in the previous 5.5 version. The workaround is to configure toolbox to use the previous `suse/sle-micro/5.5/toolbox` container image; see `toolbox --help` for options to configure the image.
* When updating to RKE2 1.31.7, which resolves https://nvd.nist.gov/vuln/detail/CVE-2025-1974[CVE-2025-1974], SUSE Linux Micro 6.0 *must* be updated to include kernel `>=6.4.0-26-default` or `>=6.4.0-30-rt` (real-time kernel) due to required SELinux kernel patches. If not applied, the ingress-nginx pod will remain in a `CrashLoopBackOff` state. To apply the kernel update, run `transactional-update` on the host itself (to update all packages), or `transactional-update pkg update kernel-default` (or `kernel-rt`) to update just the kernel, then reboot the host. If deploying new clusters, please follow <<guides-kiwi-builder-images>> to build fresh images containing the latest kernel.
* A bug in the Kubernetes Job Controller has been identified that, under certain conditions, can cause RKE2/K3s nodes to stay in the `NotReady` state (see the https://github.com/rancher/rke2/issues/8357[#8357 RKE2 issue]). The errors can look like:

[,bash]
----
E0605 23:11:18.489721 1 job_controller.go:631] "Unhandled Error" err="syncing job: tracking status: adding uncounted pods to status: Operation cannot be fulfilled on jobs.batch \"helm-install-rke2-ingress-nginx\": StorageError: invalid object, Code: 4, Key: /registry/jobs/kube-system/helm-install-rke2-ingress-nginx, ResourceVersion: 0, AdditionalErrorMsg: Precondition failed: UID in precondition: 0aa6a781-7757-4c61-881a-cb1a4e47802c, UID in object meta: 6a320146-16b8-4f83-88c5-fc8b5a59a581" logger="UnhandledError"
----

As a workaround, the `kube-controller-manager` pod can be restarted with `crictl` as follows:

[,bash]
----
# Point crictl at the containerd socket.
export CONTAINER_RUNTIME_ENDPOINT=unix:///run/k3s/containerd/containerd.sock
# ID of the running kube-controller-manager container.
export KUBEMANAGER_POD=$(/var/lib/rancher/rke2/bin/crictl ps --label io.kubernetes.container.name=kube-controller-manager --quiet)
# Stop and remove the container, which restarts kube-controller-manager.
/var/lib/rancher/rke2/bin/crictl stop "${KUBEMANAGER_POD}" && \
/var/lib/rancher/rke2/bin/crictl rm "${KUBEMANAGER_POD}"
----

* On RKE2/K3s versions 1.31 and 1.32, files written to the `/etc/cni` directory, which stores the CNI configurations, may not trigger a notification to `containerd` under certain conditions related to `overlayfs` (see the https://github.com/rancher/rke2/issues/8356[#8356 RKE2 issue]). As a result, the RKE2/K3s deployment gets stuck waiting for the CNI to start, and the RKE2/K3s nodes stay in the `NotReady` state. This can be seen at node level with `kubectl describe node <affected_node>`:

[,bash]
----
Conditions:
  Type   Status  LastHeartbeatTime                LastTransitionTime               Reason           Message
  ----   ------  -----------------                ------------------               ------           -------
  Ready  False   Thu, 05 Jun 2025 17:41:28 +0000  Thu, 05 Jun 2025 14:38:16 +0000  KubeletNotReady  container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: cni plugin not initialized
----

As a workaround, a tmpfs volume can be mounted at the `/etc/cni` directory before RKE2 starts. This avoids the use of overlayfs, which is what causes containerd to miss the notifications, and the configs should get rewritten every time the node restarts and the pods' init containers run again. If using EIB, this can be a `04-tmpfs-cni.sh` script in the `custom/scripts` directory (as explained https://github.com/suse-edge/edge-image-builder/blob/release-1.2/docs/building-images.md#custom[here]) that looks like:

[,bash]
----
#!/bin/bash
# Mount a small tmpfs over /etc/cni so the directory is not backed by overlayfs.
mkdir -p /etc/cni
mount -t tmpfs -o mode=0700,size=5M tmpfs /etc/cni
# Persist the mount across reboots.
echo "tmpfs /etc/cni tmpfs defaults,size=5M,mode=0700 0 0" >> /etc/fstab
----
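
For the kernel requirement tied to the RKE2 1.31.7 update above, a host can be checked before the update is applied. The following is a minimal sketch, assuming that the kernel release string (as printed by `uname -r`) ends in `-default` or `-rt` and that GNU `sort -V` is available. `kernel_meets_minimum` is a hypothetical helper name and is not part of any SUSE tooling; the minimum versions are the ones stated above.

```shell
#!/bin/bash
# Sketch only: kernel_meets_minimum is a hypothetical helper.
# Returns 0 if the given kernel release meets the minimum for its flavor,
# 1 if it is older, and 2 if the flavor suffix is not recognized.
kernel_meets_minimum() {
    local current="$1"
    local minimum
    case "$current" in
        *-rt)      minimum="6.4.0-30-rt" ;;
        *-default) minimum="6.4.0-26-default" ;;
        *)         return 2 ;;
    esac
    # sort -V orders version strings; the kernel is new enough when the
    # minimum sorts first (or both strings are equal).
    [ "$(printf '%s\n%s\n' "$minimum" "$current" | sort -V | head -n 1)" = "$minimum" ]
}
```

For example, `kernel_meets_minimum "$(uname -r)" || echo "kernel update required"` could be run on each host before updating RKE2.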

== Component Versions

@@ -190,6 +226,42 @@ Summary: SUSE Edge 3.2.0 is the first release in the SUSE Edge 3.2 release strea

* When deploying via the directed network provisioning flow, a bug affects clusters with static IPs in networks with DHCP servers and/or RAs: static network configurations only apply to the provisioned host and will not be in effect during the host discovery and enrollment. Please refer to the https://github.com/suse-edge/atip/tree/main/telco-examples/edge-clusters/dhcp-less/dual-stack/single-node#readme[SUSE Edge for Telco examples repository] for more details and updates.
* When using `toolbox` in SUSE Linux Micro 6.0, the default container image does not contain some tools which were included in the previous 5.5 version. The workaround is to configure toolbox to use the previous `suse/sle-micro/5.5/toolbox` container image; see `toolbox --help` for options to configure the image.
* A bug in the Kubernetes Job Controller has been identified that, under certain conditions, can cause RKE2/K3s nodes to stay in the `NotReady` state (see the https://github.com/rancher/rke2/issues/8357[#8357 RKE2 issue]). The errors can look like:

[,bash]
----
E0605 23:11:18.489721 1 job_controller.go:631] "Unhandled Error" err="syncing job: tracking status: adding uncounted pods to status: Operation cannot be fulfilled on jobs.batch \"helm-install-rke2-ingress-nginx\": StorageError: invalid object, Code: 4, Key: /registry/jobs/kube-system/helm-install-rke2-ingress-nginx, ResourceVersion: 0, AdditionalErrorMsg: Precondition failed: UID in precondition: 0aa6a781-7757-4c61-881a-cb1a4e47802c, UID in object meta: 6a320146-16b8-4f83-88c5-fc8b5a59a581" logger="UnhandledError"
----

As a workaround, the `kube-controller-manager` pod can be restarted with `crictl` as follows:

[,bash]
----
# Point crictl at the containerd socket.
export CONTAINER_RUNTIME_ENDPOINT=unix:///run/k3s/containerd/containerd.sock
# ID of the running kube-controller-manager container.
export KUBEMANAGER_POD=$(/var/lib/rancher/rke2/bin/crictl ps --label io.kubernetes.container.name=kube-controller-manager --quiet)
# Stop and remove the container, which restarts kube-controller-manager.
/var/lib/rancher/rke2/bin/crictl stop "${KUBEMANAGER_POD}" && \
/var/lib/rancher/rke2/bin/crictl rm "${KUBEMANAGER_POD}"
----

* On RKE2/K3s versions 1.31 and 1.32, files written to the `/etc/cni` directory, which stores the CNI configurations, may not trigger a notification to `containerd` under certain conditions related to `overlayfs` (see the https://github.com/rancher/rke2/issues/8356[#8356 RKE2 issue]). As a result, the RKE2/K3s deployment gets stuck waiting for the CNI to start, and the RKE2/K3s nodes stay in the `NotReady` state. This can be seen at node level with `kubectl describe node <affected_node>`:

[,bash]
----
Conditions:
  Type   Status  LastHeartbeatTime                LastTransitionTime               Reason           Message
  ----   ------  -----------------                ------------------               ------           -------
  Ready  False   Thu, 05 Jun 2025 17:41:28 +0000  Thu, 05 Jun 2025 14:38:16 +0000  KubeletNotReady  container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: cni plugin not initialized
----

As a workaround, a tmpfs volume can be mounted at the `/etc/cni` directory before RKE2 starts. This avoids the use of overlayfs, which is what causes containerd to miss the notifications, and the configs should get rewritten every time the node restarts and the pods' init containers run again. If using EIB, this can be a `04-tmpfs-cni.sh` script in the `custom/scripts` directory (as explained https://github.com/suse-edge/edge-image-builder/blob/release-1.2/docs/building-images.md#custom[here]) that looks like:

[,bash]
----
#!/bin/bash
# Mount a small tmpfs over /etc/cni so the directory is not backed by overlayfs.
mkdir -p /etc/cni
mount -t tmpfs -o mode=0700,size=5M tmpfs /etc/cni
# Persist the mount across reboots.
echo "tmpfs /etc/cni tmpfs defaults,size=5M,mode=0700 0 0" >> /etc/fstab
----
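
The script above appends to `/etc/fstab` unconditionally, so running it more than once would add a duplicate entry. The following is a sketch of a re-runnable variant, under the assumption that `grep` and `mountpoint` (util-linux) are present on the host. `append_once` and `mount_cni_tmpfs` are hypothetical helper names; the mount options are the ones from the script above.

```shell
#!/bin/bash
# Sketch only: append_once and mount_cni_tmpfs are hypothetical helpers.

# Append a line to a file unless an identical line is already present.
append_once() {
    local line="$1" file="$2"
    grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# Mount the tmpfs only if nothing is mounted at /etc/cni yet,
# then record the mount in /etc/fstab exactly once.
mount_cni_tmpfs() {
    mkdir -p /etc/cni
    mountpoint -q /etc/cni || mount -t tmpfs -o mode=0700,size=5M tmpfs /etc/cni
    append_once "tmpfs /etc/cni tmpfs defaults,size=5M,mode=0700 0 0" /etc/fstab
}
```

In this variant the script would end with a call to `mount_cni_tmpfs`, run as root. Whether the extra guards are needed depends on how often the script is executed in a given setup.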

== Component Versions
