Technotes

Technotes for future me

Quotas, Limits and Resources

Understanding Kubernetes limits and requests by example

Namespace quotas

Kubernetes allows administrators to set quotas in namespaces as hard limits for resource usage. This has an additional effect: if you set a CPU request quota in a namespace, then all pods need to set a CPU request in their definition; otherwise they will not be scheduled.

Let’s look at an example:

apiVersion: v1
kind: ResourceQuota
metadata:
  name: mem-cpu-example
spec:
  hard:
    requests.cpu: 2
    requests.memory: 2Gi
    limits.cpu: 3
    limits.memory: 4Gi

If we apply this file to a namespace, we will set the following requirements:

  • All pod containers have to declare requests and limits for CPU and memory.
  • The sum of all the CPU requests can’t be higher than 2 cores.
  • The sum of all the CPU limits can’t be higher than 3 cores.
  • The sum of all the memory requests can’t be higher than 2 GiB.
  • The sum of all the memory limits can’t be higher than 4 GiB.

If we already have 1.9 cores requested by existing pods and try to schedule a new pod with a 200m CPU request, the pod will not be scheduled and will remain in a Pending state.
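The admission math above can be sketched in a few lines of Python. This is a hypothetical illustration of the check, not the real Kubernetes API; the function names are made up.

```python
def parse_cpu(value: str) -> int:
    """Convert a CPU quantity ("2", "200m") to millicores."""
    if value.endswith("m"):
        return int(value[:-1])
    return int(float(value) * 1000)

def fits_quota(used_millicores: int, quota: str, new_request: str) -> bool:
    """A new pod is admitted only if the summed requests stay within the quota."""
    return used_millicores + parse_cpu(new_request) <= parse_cpu(quota)

# 1.9 cores already requested, a 2-core quota, and a new pod asking for 200m:
print(fits_quota(1900, "2", "200m"))  # False -> the pod stays Pending
```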

A container will be OOM killed if it tries to use more memory than its limit, most likely making the pod fail.
A container will suffer CPU throttling if it tries to use more CPU than its limit; with a 300m limit, for example, it can only use 30ms of CPU every 100ms period, causing performance degradation.
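The 30ms-per-100ms figure falls out of how CPU limits map onto the Linux CFS bandwidth controller: the cgroup quota is the limit multiplied by the scheduling period, which defaults to 100ms. A small sketch of that arithmetic (the default period is the only assumption here):

```python
CFS_PERIOD_US = 100_000  # 100ms, the default cfs_period_us

def cfs_quota_us(cpu_limit_millicores: int) -> int:
    """CPU time the cgroup may use per period before being throttled."""
    return cpu_limit_millicores * CFS_PERIOD_US // 1000

print(cfs_quota_us(300))   # 30000us = 30ms of CPU per 100ms period
print(cfs_quota_us(1000))  # 100000us = a full core
```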

In order to detect problems, we should be monitoring:

CPU and memory usage on the node. Memory pressure can trigger OOM kills if the node's memory is full, even when every container is under its limit. CPU pressure will throttle processes and affect performance.

Conclusion

Some lessons you should learn from this are:

  • Dear developer, set requests and limits in your workloads.
  • Beloved cluster admin, setting a namespace quota will force every workload in the namespace to declare a request and limit in every container.

Quotas are a necessity to properly share resources. If someone tells you that you can use any shared service without limits, they are either lying or the system will eventually collapse, through no fault of your own.

Memory resources

Memory requests and limits are associated with Containers, but it is useful to think of a Pod as having a memory request and limit. The memory request for the Pod is the sum of the memory requests for all the Containers in the Pod. Likewise, the memory limit for the Pod is the sum of the limits of all the Containers in the Pod.

Pod scheduling is based on requests. A Pod is scheduled to run on a Node only if the Node has enough available memory to satisfy the Pod’s memory request.
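The sum-of-requests scheduling rule can be sketched as follows. This is a toy model, not the real scheduler: container data is a plain dict, init containers are ignored, and all quantities are already in Mi.

```python
def pod_memory_request_mi(containers: list[dict]) -> int:
    """A pod's effective memory request is the sum over its containers."""
    return sum(c["requests_mi"] for c in containers)

def schedulable(node_free_mi: int, containers: list[dict]) -> bool:
    """The pod fits only if the node's free memory covers the pod's request."""
    return pod_memory_request_mi(containers) <= node_free_mi

pod = [{"name": "app", "requests_mi": 100}, {"name": "sidecar", "requests_mi": 50}]
print(schedulable(200, pod))  # True: a 150Mi request fits in 200Mi free
print(schedulable(100, pod))  # False: the pod stays Pending
```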

Example Memory

apiVersion: v1
kind: Pod
metadata:
  name: memory-demo
  namespace: mem-example
spec:
  containers:
  - name: memory-demo-ctr
    image: polinux/stress
    resources:
      limits:
        memory: "200Mi"
      requests:
        memory: "100Mi"
    command: ["stress"]
    args: ["--vm", "1", "--vm-bytes", "150M", "--vm-hang", "1"]

Memory units

The memory resource is measured in bytes. You can express memory as a plain integer or as a fixed-point number with one of these suffixes: E, P, T, G, M, k, Ei, Pi, Ti, Gi, Mi, Ki.
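The difference between the decimal and binary suffixes matters: 200M is 200,000,000 bytes while 200Mi is 209,715,200. A minimal parser for these quantities, as a sketch (exponent notation like 128974848e0 is not handled):

```python
# Decimal (SI) and binary (IEC) suffixes and their byte multipliers.
SUFFIXES = {"k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12, "P": 10**15, "E": 10**18,
            "Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40, "Pi": 2**50, "Ei": 2**60}

def parse_memory(value: str) -> int:
    """Convert a Kubernetes memory quantity ("200Mi", "150M", "128") to bytes."""
    # Try longer suffixes first so "Mi" is matched before "M".
    for suffix in sorted(SUFFIXES, key=len, reverse=True):
        if value.endswith(suffix):
            return int(float(value[: -len(suffix)]) * SUFFIXES[suffix])
    return int(value)  # plain integer: already bytes

print(parse_memory("200Mi"))  # 209715200
print(parse_memory("150M"))   # 150000000
```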

Motivation for memory requests and limits

By configuring memory requests and limits for the Containers that run in your cluster, you can make efficient use of the memory resources available on your cluster’s Nodes. By keeping a Pod’s memory request low, you give the Pod a good chance of being scheduled. By having a memory limit that is greater than the memory request, you accomplish two things:

  • The Pod can have bursts of activity where it makes use of memory that happens to be available.
  • The amount of memory a Pod can use during a burst is limited to some reasonable amount.

CPU Resources

CPU requests and limits are associated with Containers, but it is useful to think of a Pod as having a CPU request and limit. The CPU request for a Pod is the sum of the CPU requests for all the Containers in the Pod. Likewise, the CPU limit for a Pod is the sum of the CPU limits for all the Containers in the Pod.

Pod scheduling is based on requests. A Pod is scheduled to run on a Node only if the Node has enough CPU resources available to satisfy the Pod's CPU request.

Example CPU

apiVersion: v1
kind: Pod
metadata:
  name: cpu-demo
  namespace: cpu-example
spec:
  containers:
  - name: cpu-demo-ctr
    image: vish/stress
    resources:
      limits:
        cpu: "1"
      requests:
        cpu: "0.5"
    args:
    - -cpus
    - "2"
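In the pod above, the stress container tries to use 2 CPUs (`-cpus "2"`) but its limit is "1", so the CFS throttles it and observed usage hovers around one core. A toy sketch of that cap (the function name is illustrative, not an API):

```python
def effective_cpu(attempted_cores: float, limit_cores: float) -> float:
    """CFS throttling caps a container's CPU usage at its limit."""
    return min(attempted_cores, limit_cores)

# The cpu-demo container attempts 2 cores under a limit of 1:
print(effective_cpu(2.0, 1.0))  # 1.0 -> roughly one core of observed usage
```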

Motivation for CPU requests and limits

By configuring the CPU requests and limits of the Containers that run in your cluster, you can make efficient use of the CPU resources available on your cluster's Nodes. By keeping a Pod's CPU request low, you give the Pod a good chance of being scheduled. By having a CPU limit that is greater than the CPU request, you accomplish two things:

  • The Pod can have bursts of activity where it makes use of CPU resources that happen to be available.
  • The amount of CPU resources a Pod can use during a burst is limited to some reasonable amount.
Last updated on 18 Mar 2021
Published on 18 Mar 2021