Kubernetes — Node size

Research on the trade-offs when choosing an instance type for a Kubernetes cluster

Tags: kubernetes, technology, programming, devops

Find more research at: https://learnk8s.io/research
How to contribute: leave a comment or drop us a line at [email protected]
License: Apache 2.0

Last updated: February 14, 2022

Check out the Kubernetes instance calculator

What is it? Not all memory and CPU in a Node can be used to run Pods. The resources are partitioned into four parts:

  1. Memory and CPU reserved for the operating system and system daemons such as SSH
  2. Memory and CPU reserved for the Kubelet and Kubernetes agents such as the CRI
  3. Memory reserved for the hard eviction threshold
  4. Memory and CPU available to Pods

The table below shows the share of total memory available for running Pods after subtracting the reserved memory.

Memory (GiB)   GKE       EKS       AKS
1              55.00%     0.00%     0.00%
2              65.00%    16.75%    32.50%
4              70.00%    58.38%    53.75%
8              75.00%    79.19%    66.88%
16             82.50%    89.59%    78.44%
64             91.13%    97.40%    90.11%
128            92.56%    97.50%    92.05%
192            93.88%    98.33%    93.54%
256            94.91%    98.75%    94.65%

Example: if you have a Kubernetes cluster in GKE with a single Node of 2GB of memory, only 65% of the available memory is used to run Pods. The remaining memory is necessary to run the OS, Kubelet, CRI, CNI, etc.

Notes: GKE and AKS reach 90% utilisation with instances of 64GB or more. EKS reaches roughly 90% starting at 16GB.
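
The reserved memory can be estimated. As a rough illustration, the sketch below applies the memory reservation brackets that GKE documents for kube-reserved (25% of the first 4GiB, 20% of the next 4GiB, 10% of the next 8GiB, 6% of the next 112GiB, 2% of anything above 128GiB) plus the 100MiB hard-eviction threshold. It deliberately ignores OS and system-daemon overhead, so the results come out slightly higher than the GKE column above.

```python
# Rough sketch: estimate how much memory is left for Pods on a GKE-style node.
# The percentage brackets follow GKE's documented kube-reserved formula; the
# 100 MiB hard-eviction threshold is also documented. OS/system-daemon
# overhead is NOT modelled, so real numbers (and the table above) are lower.

def gke_reserved_memory_gib(node_memory_gib: float) -> float:
    """Memory reserved for the kubelet and Kubernetes components, in GiB."""
    if node_memory_gib < 1:
        return 255 / 1024  # flat 255 MiB for very small machines
    reserved = 0.0
    brackets = [
        (4, 0.25),             # 25% of the first 4 GiB
        (4, 0.20),             # 20% of the next 4 GiB (up to 8 GiB)
        (8, 0.10),             # 10% of the next 8 GiB (up to 16 GiB)
        (112, 0.06),           # 6% of the next 112 GiB (up to 128 GiB)
        (float("inf"), 0.02),  # 2% of anything above 128 GiB
    ]
    remaining = node_memory_gib
    for size, rate in brackets:
        chunk = min(remaining, size)
        reserved += chunk * rate
        remaining -= chunk
        if remaining <= 0:
            break
    return reserved + 100 / 1024  # plus the 100 MiB hard-eviction threshold


for gib in [2, 4, 16, 64, 128]:
    allocatable = gib - gke_reserved_memory_gib(gib)
    print(f"{gib:>4} GiB node -> ~{allocatable / gib:.0%} allocatable to Pods")
```

For a 64GB node this yields roughly 91% allocatable, which lines up with the GKE column; for small nodes the missing OS overhead makes the estimate a few points optimistic.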

What is it? Not all memory and CPU in a Node can be used to run Pods. The resources are partitioned into four parts:

  1. Memory and CPU reserved for the operating system and system daemons such as SSH
  2. Memory and CPU reserved for the Kubelet and Kubernetes agents such as the CRI
  3. Memory reserved for the hard eviction threshold
  4. Memory and CPU available to Pods

The table below shows the share of total CPU available for running Pods after subtracting the reserved CPU.

CPU (vCPU)   GKE       EKS       AKS
1            84.00%    84.00%    84.00%
2            91.50%    91.50%    90.00%
4            95.50%    95.50%    94.00%
8            97.63%    97.63%    96.50%
16           98.69%    98.69%    97.75%
32           99.22%    99.22%    98.38%
64           99.48%    99.40%    98.69%

Example: if you have a Kubernetes cluster in AKS with a single Node with 2 vCPU, 90% of the available CPU is used to run Pods. The remaining CPU is necessary to run the OS, Kubelet, CRI, CNI, etc.

Notes: as long as you use Nodes with at least 2 vCPU, you should be fine.
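
The CPU side can be approximated in the same way using the per-core reservation brackets GKE documents (6% of the first core, 1% of the second, 0.5% of cores three and four, 0.25% of every core above four). The extra ~100 millicores needed to reproduce the table above is assumed here to be system-reserved CPU; treat it as a guess, not a documented value.

```python
# Rough sketch: CPU reserved on a GKE-style node, using the documented
# kube-reserved brackets. The extra system_reserved_m is an assumption made
# to match the table above, not an official figure.

def reserved_cpu_millicores(cores: int, system_reserved_m: int = 100) -> int:
    reserved = 0.0
    for core in range(1, cores + 1):
        if core == 1:
            reserved += 0.06    # 6% of the first core
        elif core == 2:
            reserved += 0.01    # 1% of the second core
        elif core in (3, 4):
            reserved += 0.005   # 0.5% of cores three and four
        else:
            reserved += 0.0025  # 0.25% of every core above four
    return round(reserved * 1000) + system_reserved_m


for cores in [1, 2, 4, 8, 16]:
    total_m = cores * 1000
    allocatable = total_m - reserved_cpu_millicores(cores)
    print(f"{cores:>2} vCPU -> {allocatable}m allocatable (~{allocatable / total_m:.2%})")
```

With the assumed 100m of system-reserved CPU, this reproduces the GKE/EKS column above (84% at 1 vCPU, 91.5% at 2 vCPU, 97.63% at 8 vCPU).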

What is it? There's an upper limit on the number of Pods that you can run on each Node. Each cloud provider has a different limit. Most of the time the limit is independent of the Node size (e.g. GKE, AKS). In some cases the number of Pods depends on the Node size (notably EKS).

Memory (GiB)   GKE   EKS   AKS
1              110   110   250
2              110   110   250
4              110   110   250
8              110   110   250
16             110   110   250
64             110   110   250
128            110   250   250
192            110   250   250
256            110   250   250

Notes: this metric is relevant to measure your blast radius: if a Node is lost, how many Pods are affected?
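
For EKS, the Pod limit is normally driven by networking rather than memory: with the AWS VPC CNI every Pod gets a secondary IP address, so the limit follows from how many ENIs the instance supports and how many IPs fit on each ENI. The sketch below reproduces that standard formula; the m5.large figures and the 110/250 ceilings used with prefix delegation come from AWS documentation and should be double-checked for your instance type.

```python
# Sketch of the standard EKS max-Pods calculation (AWS VPC CNI).

def eks_max_pods(enis: int, ips_per_eni: int, prefix_delegation: bool = False,
                 vcpus: int = 2) -> int:
    ips = ips_per_eni - 1          # the first IP on each ENI is not usable by Pods
    if prefix_delegation:
        ips *= 16                  # each slot hands out a /28 prefix (16 addresses)
    max_pods = enis * ips + 2      # +2 for host-network Pods (aws-node, kube-proxy)
    if prefix_delegation:
        # Kubernetes-recommended ceilings still apply: 110 for small instances,
        # 250 for instances with more than 30 vCPUs.
        max_pods = min(max_pods, 110 if vcpus <= 30 else 250)
    return max_pods


# m5.large: 3 ENIs, 10 IPv4 addresses per ENI (values from AWS instance docs)
print(eks_max_pods(3, 10))                          # 29
print(eks_max_pods(3, 10, prefix_delegation=True))  # capped at 110
```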

What is it? Nodes have an upper limit on the number of Pods that they can run. Assuming that you run the maximum number of Pods for that Node, how much memory is available to each Pod? This metric divides the available Node memory by the maximum number of Pods for that instance type.

Memory (GiB)   GKE    EKS    AKS   (values in GiB per Pod)
1              0.01   0.00   0.00
2              0.01   0.00   0.00
4              0.03   0.02   0.01
8              0.05   0.06   0.02
16             0.12   0.13   0.05
64             0.53   0.57   0.23
128            1.08   0.50   0.47
192            1.64   0.76   0.72
256            2.21   1.01   0.97

Example: if you have a Kubernetes cluster in GKE with a single Node of 128GB of memory, you can run up to 110 Pods and each of them can use 1.08GB of memory.

Notes: it's not possible to run small workloads (less than 1GB of memory) efficiently on GKE when the Node size is greater than 128GB of memory. EKS has a peak at 192GB of memory: that's where the combination of Pod count and per-Pod memory is highest (234 Pods with 810MiB of memory each).
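
The figures in this table follow directly from the two previous ones, as the short sketch below shows for the GKE column (the allocatable fractions are copied from the first table, the 110-Pod limit from the second).

```python
# Memory per Pod = allocatable memory / per-node Pod limit.
nodes = [
    # (node GiB, allocatable fraction, max Pods): GKE column
    (64, 0.9113, 110),
    (128, 0.9256, 110),
    (256, 0.9491, 110),
]

for size, fraction, max_pods in nodes:
    per_pod = size * fraction / max_pods
    print(f"{size} GiB GKE node: ~{per_pod:.2f} GiB per Pod at the {max_pods}-Pod limit")
```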

What is it? If all my Pods use the same amount of memory, which instance type should I use to maximise the memory available to them? The chart presents six scenarios: all the Pods in the Node have limits of 0.5, 1, 2, 4, 8 or 16 GiB. The table shows how utilised the Node is for each Node size (columns: Node memory size; rows: Pod memory limit).

Pod limit   1GiB     2GiB     4GiB     8GiB     16GiB    64GiB    128GiB   192GiB   256GiB
0.5GiB      50.00%   50.00%   62.50%   75.00%   81.25%   85.94%   42.97%   28.65%   21.48%
1GiB         0.00%   50.00%   50.00%   75.00%   81.25%   90.63%   85.94%   57.29%   42.97%
2GiB         0.00%    0.00%   50.00%   75.00%   75.00%   90.63%   92.19%   93.75%   85.94%
4GiB         0.00%    0.00%    0.00%   50.00%   75.00%   87.50%   90.63%   93.75%   93.75%
8GiB         0.00%    0.00%    0.00%    0.00%   50.00%   87.50%   87.50%   91.67%   93.75%
16GiB        0.00%    0.00%    0.00%    0.00%    0.00%   75.00%   87.50%   91.67%   93.75%

Example: when all Pods in your cluster have a 1GB limit, the Node that can allocate the most Pods with the least waste is a 64GB Node. Values before the peak mean that the Node is underutilised (there's still space, but not enough to fit another Pod). Values after the peak mean that you have reached the Pod limit for that Node and can't schedule more Pods on it.

Notes: it's clear that the best Node for Pods that average 1GB of memory is a 64GB Node. If the average Pod memory limit increases to 2GB, a 192GB instance is the most efficient.
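
These utilisation figures can be reproduced with a small bin-packing calculation: fit as many fixed-size Pods as the allocatable memory allows, cap the count at the per-node Pod limit, and divide the memory they consume by the Node's total memory. The sketch below uses the GKE allocatable fractions and the 110-Pod limit from the earlier tables, which appear to match this chart's numbers.

```python
# Pack fixed-size Pods into a node and report how much of the node they use.
import math

def node_utilisation(node_gib: float, allocatable_fraction: float,
                     pod_limit_gib: float, max_pods: int = 110) -> float:
    allocatable = node_gib * allocatable_fraction
    pods = min(math.floor(allocatable / pod_limit_gib), max_pods)
    return pods * pod_limit_gib / node_gib


# 1 GiB Pods on a 64 GiB vs a 128 GiB node (GKE fractions, 110-Pod limit):
print(f"{node_utilisation(64, 0.9113, 1):.2%}")   # ~90.63%: 58 Pods fit
print(f"{node_utilisation(128, 0.9256, 1):.2%}")  # ~85.94%: capped at 110 Pods
```

The 128GB case shows why the curve drops after the peak: memory is still free, but the Pod limit prevents it from being used.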

What is it? If all my Pods use the same amount of memory, which instance type should I use to maximise the memory available to them? The chart presents six scenarios: all the Pods in the Node have limits of 0.5, 1, 2, 4, 8 or 16 GiB. The table shows how utilised the Node is for each Node size (columns: Node memory size; rows: Pod memory limit).

Pod limit   1GiB    2GiB    4GiB     8GiB     16GiB    64GiB    128GiB   192GiB   256GiB
0.5GiB      0.00%   0.00%   50.00%   75.00%   87.50%   85.94%   97.27%   65.10%   48.83%
1GiB        0.00%   0.00%   50.00%   75.00%   87.50%   96.88%   96.88%   97.92%   97.66%
2GiB        0.00%   0.00%   50.00%   75.00%   87.50%   96.88%   96.88%   97.92%   98.44%
4GiB        0.00%   0.00%    0.00%   50.00%   75.00%   93.75%   96.88%   97.92%   98.44%
8GiB        0.00%   0.00%    0.00%    0.00%   50.00%   87.50%   93.75%   95.83%   96.88%
16GiB       0.00%   0.00%    0.00%    0.00%    0.00%   75.00%   87.50%   91.67%   93.75%

Example: when all Pods in your cluster have a 1GB limit, the Node that can allocate the most Pods with the least waste is a 192GB Node. Values before the peak mean that the Node is underutilised (there's still space, but not enough to fit another Pod).

Notes: pay attention to local inefficiencies due to the limits on how many Pods can be deployed on a Node.

What is it? If all my Pods use the same amount of memory, which instance type should I use to maximise the memory available to them? The chart presents six scenarios: all the Pods in the Node have limits of 0.5, 1, 2, 4, 8 or 16 GiB. The table shows how utilised the Node is for each Node size (columns: Node memory size; rows: Pod memory limit).

Pod limit   1GiB    2GiB     4GiB     8GiB     16GiB    64GiB    128GiB   192GiB   256GiB
0.5GiB      0.00%   25.00%   50.00%   62.50%   78.13%   89.84%   91.80%   65.10%   48.83%
1GiB        0.00%   25.00%   50.00%   62.50%   78.13%   89.84%   91.80%   65.10%   48.83%
2GiB        0.00%    0.00%   50.00%   62.50%   75.00%   89.06%   91.41%   93.23%   94.53%
4GiB        0.00%    0.00%   50.00%   50.00%   75.00%   87.50%   90.63%   92.71%   94.53%
8GiB        0.00%    0.00%    0.00%   50.00%   75.00%   87.50%   90.63%   91.67%   93.75%
16GiB       0.00%    0.00%    0.00%    0.00%   50.00%   87.50%   87.50%   91.67%   93.75%

Example: when all Pods in your cluster have a 1GB limit, the Node that can allocate the most Pods with the least waste is a 128GB Node. Values before the peak mean that the Node is underutilised (there's still space, but not enough to fit another Pod). Values after the peak mean that you have reached the Pod limit for that Node and can't schedule more Pods on it.


Resources

Kubernetes Ingress Controllers
Kubernetes managed services
Service meshes