---
title: "0/1 nodes available: insufficient cpu, insufficient memory"
summary: "No nodes available errors"
draft: false
---

## Overview {#overview}

Example errors:

```
0/1 nodes available: insufficient memory
0/1 nodes available: insufficient cpu
```

More generally:

```
0/[n] nodes available: insufficient [resource]
```

This issue happens when Kubernetes does not have enough resources to fulfil your workload request.
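
The full message usually appears as a `FailedScheduling` event on the pod that cannot be placed. A minimal way to see it (assuming a pod named `my-pod`) is:

```sh
# Look at the Events section at the end of the output for the
# 'FailedScheduling' reason and the 'insufficient cpu/memory' message
kubectl describe pod my-pod
```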

## Initial Steps Overview {#initial-steps-overview}

1) [Determine requested resources](#step-1)

2) [Have you requested too many resources?](#step-2)

## Detailed Steps {#detailed-steps}

### 1) Determine requested resources {#step-1}

To determine the resources requested by your workload, you must first extract its YAML.

Which type of resource to extract the YAML for may vary, but most commonly you can just get the YAML for the pod that reports the problem.
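
If you have `kubectl` access, a minimal sketch of extracting that YAML (assuming the pod is named `my-pod` in the current namespace) is:

```sh
# Dump the full pod definition as YAML
kubectl get pod my-pod -o yaml

# Or show just the lines around each container's 'resources' section
kubectl get pod my-pod -o yaml | grep -A 5 'resources:'
```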

From that YAML, determine whether there are any resource requests made in the `containers` section, under `resources`.

A simplified YAML that makes a large request for memory resources (1000Gi) might look like this, for example:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: too-much-mem
spec:
  containers:
  - command:
    - sleep
    - "3600"
    image: busybox
    name: broken-pods-too-much-mem-container
    resources:
      requests:
        memory: "1000Gi"
```

If no resource requests are in the YAML, then a default request may be made. What this default is will depend on other configuration in the cluster; see [here](https://kubernetes.io/docs/tasks/administer-cluster/manage-resources/) for more information.

See also [here](https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/) for general information on managing container resources.
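
If you want to check whether such a default is being applied, a `LimitRange` in the pod's namespace is the usual mechanism; a sketch for inspecting this, assuming you have permission to list these objects:

```sh
# List any LimitRange objects across all namespaces
kubectl get limitrange --all-namespaces

# Show the default requests/limits applied in a particular namespace
kubectl describe limitrange -n my-namespace
```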

If no request is made and you are out of resources, then it is likely that you have no available nodes. At this point you need to consider [solution A](#solution-a).

### 2) Have you requested too many resources? {#step-2}

If you _have_ made a resource request, then there are two possibilities:

- Your resource request cannot fit into any node on the cluster

- Your resource request can fit on a node in the cluster, but those nodes already have workloads running on them which block yours from being provisioned

[Step 1](#step-1) should have shown you whether you are specifically requesting resources. Once you know what those resources are, you can compare them to the resources available on each node.

If you are able, run:

```sh
kubectl describe nodes
```

which, under 'Capacity:', 'Allocatable:', and 'Allocated resources:', will tell you the resources available on each node, e.g.:

```
$ kubectl describe nodes
[...]
Capacity:
  cpu:                4
  ephemeral-storage:  61255492Ki
  hugepages-1Gi:      0
  hugepages-2Mi:      0
  memory:             2038904Ki
  pods:               110
Allocatable:
  cpu:                4
  ephemeral-storage:  56453061334
  hugepages-1Gi:      0
  hugepages-2Mi:      0
  memory:             1936504Ki
  pods:               110
[...]
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource           Requests    Limits
  --------           --------    ------
  cpu                750m (18%)  0 (0%)
  memory             140Mi (7%)  340Mi (17%)
  ephemeral-storage  0 (0%)      0 (0%)
  hugepages-1Gi      0 (0%)      0 (0%)
  hugepages-2Mi      0 (0%)      0 (0%)
```

You should be able to compare these to the resources you requested to determine why your request was not met, and choose whether to [autoscale](#solution-a) or [provision a larger node](#solution-b) accordingly.
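
If you have many nodes, a quicker way to compare is to list each node's allocatable CPU and memory in a single table; a sketch using `custom-columns`:

```sh
# One line per node with its allocatable CPU and memory
kubectl get nodes -o custom-columns='NAME:.metadata.name,CPU:.status.allocatable.cpu,MEMORY:.status.allocatable.memory'
```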

## Solutions List {#solutions-list}

A) [Set up autoscaling](#solution-a)

B) [Provision appropriately-sized nodes](#solution-b)

## Solutions Detail {#solutions-detail}

### A) Set up autoscaling {#solution-a}

The details of this will vary depending on your platform, but in general the principle is that you have legitimately used up all your resources and need more nodes to take the load (see the illustrative example after the links below).

Note that this solution will not work if:

- Your nodes are unavailable for other reasons (for example, a 'runaway' workload is consuming all the resources it finds), as you will see this error again once the new resources are consumed.

- Your workload cannot fit on any node in the cluster.

Some potentially useful links for achieving this:

- [K8s autoscaling](https://kubernetes.io/blog/2016/07/autoscaling-in-kubernetes/)
- [EKS](https://docs.aws.amazon.com/eks/latest/userguide/autoscaling.html)
- [GKE](https://cloud.google.com/kubernetes-engine/docs/how-to/cluster-autoscaler)
- [AKS](https://azure.microsoft.com/en-gb/updates/generally-available-aks-cluster-autoscaler/)
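
As one illustrative example only (the exact commands differ per platform), enabling the cluster autoscaler on an existing GKE node pool might look like this; the cluster name, node pool name, and node counts are placeholders:

```sh
# Enable node autoscaling on a GKE node pool (names and limits are examples)
gcloud container clusters update my-cluster \
  --enable-autoscaling \
  --node-pool default-pool \
  --min-nodes 1 \
  --max-nodes 5
```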

### B) Provision appropriately-sized nodes {#solution-b}

The details of this will vary according to your platform. You will need to add a node (or set of nodes) whose size exceeds the amount your workload is requesting.

Also note: your workload scheduler may 'actively' or 'intelligently' move workloads to make them all 'fit' onto the given nodes. In these cases, you may need to significantly over-provision node sizes to reliably accommodate your workload.
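
As a rough illustration on EKS (assuming you use `eksctl`; the cluster name, node group name, instance type, and count are placeholders to be sized for your workload):

```sh
# Add a node group of larger instances (all names and types are examples)
eksctl create nodegroup \
  --cluster my-cluster \
  --name large-nodes \
  --node-type m5.2xlarge \
  --nodes 2
```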

## Check Resolution {#check-resolution}

If the error is no longer seen in the workload description in Kubernetes, then this particular issue has been resolved.
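
A quick way to confirm (assuming the affected pod is named `my-pod`) is to check that the pod has been scheduled and that no new `FailedScheduling` events appear:

```sh
# The pod should now show an assigned node and, eventually, a Running status
kubectl get pod my-pod -o wide

# The Events section should no longer report 'FailedScheduling'
kubectl describe pod my-pod
```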

## Further Information {#further-information}

When you make a request for Kubernetes to run your workload, it tries to find all the nodes that can fulfil the requirements.

[Kubernetes resource management docs](https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/)

## Owner {#owner}

email

[//]: # (REFERENCED DOCS)
[//]: # (eg https://somestackoverflowpage)
[//]: # (https://github.com/kubernetes/kubernetes/issues/33777 - TODO)

content/posts/kubernetes/pod-stuck-in-pending-status.md

Determine whether you need to increase the resources available, or limit resources your pod requests so as not to breach the limits.
Which is appropriate depends on your particular circumstances.
See [the "0 nodes available" runbook]({{< relref "0-nodes-available-insufficient.md" >}}) for further guidance.