
Commit bc8e6eb

Added known issues for rke2/k3s (suse-edge#681)
1 parent 1ea88a4 commit bc8e6eb

File tree

1 file changed: +36 -0 lines changed

asciidoc/edge-book/releasenotes.adoc

Lines changed: 36 additions & 0 deletions
@@ -70,6 +70,42 @@ If deploying new clusters, please follow <<guides-kiwi-builder-images>> to build
* When using RKE2 1.32.3, which resolves https://nvd.nist.gov/vuln/detail/CVE-2025-1974[CVE-2025-1974], SUSE Linux Micro 6.1 *must* be updated to include kernel `>=6.4.0-26-default` or `>=6.4.0-30-rt` (real-time kernel) due to required SELinux kernel patches. If not applied, the ingress-nginx pod will remain in a `CrashLoopBackOff` state. To apply the kernel update, run `transactional-update` on the host itself (to update all packages), or `transactional-update pkg update kernel-default` (or `kernel-rt`) to update just the kernel, and then reboot the host. If deploying new clusters, please follow <<guides-kiwi-builder-images>> to build fresh images containing the latest kernel.
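A minimal sketch of those update steps, using only the commands named above (run on the affected host):

[,bash]
----
# Update all packages in a new snapshot...
transactional-update
# ...or update only the kernel package (use kernel-rt for the real-time kernel)
transactional-update pkg update kernel-default
# Reboot into the new snapshot to activate the updated kernel
reboot
----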
* When configuring networking via nm-configurator, certain configurations that identify interfaces by MAC address currently do not work; this will be resolved in a future update (see the https://github.com/suse-edge/nm-configurator/issues/163[upstream NM Configurator issue])
* For long-running Metal^3^ management clusters, certificate expiry can cause the baremetal-operator connection to Ironic to fail, requiring a manual pod restart as a workaround (see the https://github.com/suse-edge/charts/issues/178[SUSE Edge charts issue])
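A minimal sketch of such a restart (the `metal3-system` namespace and `baremetal-operator` deployment name are assumptions for illustration; adjust them to match your chart installation):

[,bash]
----
# Hypothetical names: verify yours with `kubectl get deployments -A | grep baremetal`
kubectl -n metal3-system rollout restart deployment baremetal-operator
kubectl -n metal3-system rollout status deployment baremetal-operator
----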
* A bug in the Kubernetes Job Controller has been identified that, under certain conditions, can cause RKE2/K3s nodes to stay in a `NotReady` state (see the https://github.com/rancher/rke2/issues/8357[#8357 RKE2 issue]). The errors can look like:

[,bash]
----
E0605 23:11:18.489721 1 job_controller.go:631] "Unhandled Error" err="syncing job: tracking status: adding uncounted pods to status: Operation cannot be fulfilled on jobs.batch \"helm-install-rke2-ingress-nginx\": StorageError: invalid object, Code: 4, Key: /registry/jobs/kube-system/helm-install-rke2-ingress-nginx, ResourceVersion: 0, AdditionalErrorMsg: Precondition failed: UID in precondition: 0aa6a781-7757-4c61-881a-cb1a4e47802c, UID in object meta: 6a320146-16b8-4f83-88c5-fc8b5a59a581" logger="UnhandledError"
----
As a workaround, the `kube-controller-manager` pod can be restarted with `crictl` as follows:

[,bash]
----
# Point crictl at the containerd socket used by RKE2
export CONTAINER_RUNTIME_ENDPOINT=unix:///run/k3s/containerd/containerd.sock
# Look up the kube-controller-manager container ID
export KUBEMANAGER_POD=$(/var/lib/rancher/rke2/bin/crictl ps --label io.kubernetes.container.name=kube-controller-manager --quiet)
# Stop and remove the container; kubelet recreates it from the static pod manifest
/var/lib/rancher/rke2/bin/crictl stop ${KUBEMANAGER_POD} && \
/var/lib/rancher/rke2/bin/crictl rm ${KUBEMANAGER_POD}
----
* On RKE2/K3s 1.31 and 1.32, writes to the `/etc/cni` directory used to store CNI configurations may not trigger a notification to `containerd` under certain conditions related to `overlayfs` (see the https://github.com/rancher/rke2/issues/8356[#8356 RKE2 issue]). This in turn causes the RKE2/K3s deployment to get stuck waiting for the CNI to start, and the nodes to stay in a `NotReady` state. This can be seen at the node level with `kubectl describe node <affected_node>`:
[,bash]
----
Conditions:
  Type   Status  LastHeartbeatTime                LastTransitionTime               Reason           Message
  ----   ------  -----------------                ------------------               ------           -------
  Ready  False   Thu, 05 Jun 2025 17:41:28 +0000  Thu, 05 Jun 2025 14:38:16 +0000  KubeletNotReady  container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: cni plugin not initialized
----
As a workaround, a tmpfs volume can be mounted at `/etc/cni` before RKE2 starts. This avoids `overlayfs` (and the notifications `containerd` misses because of it), and the configurations are rewritten every time the node is restarted and the pods' init containers run again. If using EIB, this can be a `04-tmpfs-cni.sh` script in the `custom/scripts` directory (as explained https://github.com/suse-edge/edge-image-builder/blob/release-1.2/docs/building-images.md#custom[here]) that looks like:
[,bash]
----
#!/bin/bash
# Mount a tmpfs over /etc/cni so CNI configs are not written through overlayfs
mkdir -p /etc/cni
mount -t tmpfs -o mode=0700,size=5M tmpfs /etc/cni
# Persist the mount so it is recreated on every boot
echo "tmpfs /etc/cni tmpfs defaults,size=5M,mode=0700 0 0" >> /etc/fstab
----
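After a reboot, a quick way to confirm the mount is in place is to check its filesystem type (a hypothetical verification step, not part of the original workaround):

[,bash]
----
# Should report tmpfs as the FSTYPE for /etc/cni
findmnt -o TARGET,FSTYPE,OPTIONS /etc/cni
----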

== Component Versions
