PodAffinity

Kubernetes: How to deploy different pods close to each other (same node or zone etc.)

In my previous two posts, I talked about how to avoid scheduling pods on certain nodes and how to schedule pods on certain nodes. You can read them at the following links:

Avoid scheduling pods on certain nodes
Schedule pods on certain nodes

In this post, I will talk about how to schedule your pods near some other pod. In other words, how to schedule a pod on a node that already has some other pod running on it. I will use some terminology from my earlier post about scheduling pods on certain nodes, so please make sure you have read that post first. Let’s get started.

The idea here is to control where your pod lands relative to other pods. When your pod is scheduled, you want it placed on a node that already runs some other pod, or on a node that does not run some other pod. First, let’s talk about possible use cases where you might need this.

Use cases

An application can have pods that frequently talk to each other, so you want them placed on the same nodes to reduce latency. For example, in an e-commerce application, every time you place an order, the order service needs to check inventory before accepting the order. Every order placement therefore triggers a call to the inventory service. So you might want to make sure that a node running an order service pod also runs an inventory service pod.
Another example is placing a cache pod near a web application pod for faster access to cache contents.
You may want to make sure that no more than one pod of a given type is scheduled on a node, so that your pods are spread out as much as possible.

Pod Affinity: Schedule pods closer to already running pods

In my previous post, I talked about node affinity, which schedules pods on a specific set of nodes. In this post, I will talk about another type of affinity: pod affinity. Here is how pod affinity and anti-affinity work:

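Schedule a pod (or don’t schedule it, in the case of anti-affinity) in topology domain X if a pod matching Y is already running in that domain. X is defined by a node label key, the topologyKey. All nodes that carry this key with the same value belong to the same domain. Y is a label selector that matches the labels of the already running pods.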

Don’t panic if you could not make much sense of the above statement. We will discuss it in detail.
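To connect the statement above to YAML, here is a sketch of a single affinity term with X and Y marked. The label name: web-cache is the example label used later in this post. kubernetes.io/hostname is the well-known node label whose value is unique per node, and any other node label key works the same way.

```yaml
affinity:
  podAffinity:                # use podAntiAffinity for "don't schedule"
    requiredDuringSchedulingIgnoredDuringExecution:
    - labelSelector:          # Y: labels of the pods that are already running
        matchExpressions:
        - key: name
          operator: In
          values:
          - web-cache
      topologyKey: kubernetes.io/hostname   # X: node label key that groups nodes into domains
```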

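Like node affinity, pod affinity comes in two types: requiredDuringSchedulingIgnoredDuringExecution (a hard rule: the pod stays Pending if it cannot be met) and preferredDuringSchedulingIgnoredDuringExecution (a preference the scheduler tries to honor but may ignore).

Check my previous post for a more detailed look at the difference between the two.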

To understand pod affinity, let’s take a scenario from the use cases above: you want to place a web-app pod on the same node where a cache pod is already running (or vice versa, depending on your application).

Here is the YAML definition to achieve this.

apiVersion: v1
kind: Pod
metadata:
  name: web-app
spec:
  affinity:
    podAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - key: name
            operator: In
            values:
            - web-cache
        topologyKey: my-node-label   
  containers:
  - name: web-app
    image: <container-image>

The new elements in the above YAML definition are described below. You have either seen the other elements in my previous post, or they are self-explanatory.

podAffinity: This contains the rules for scheduling the pod relative to other, already running pods.

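topologyKey: This is the key of a node label. Remember, a label has a key and a value, and here you specify only the key. It can be any label key, whether set by you or by your cloud vendor. The value still matters, though. Nodes that share the same value for this key are treated as one topology domain, and the rule is evaluated per domain. With a key whose value is unique per node, such as kubernetes.io/hostname, each node is its own domain. With a zone key such as topology.kubernetes.io/zone, all nodes in a zone form one domain. Make sure your nodes carry this label consistently, because nodes missing it can lead to unexpected placement.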
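For example, to express “same zone” rather than “same node”, the affinity term could use the well-known zone label instead of my-node-label. This sketch assumes your nodes carry topology.kubernetes.io/zone, which most cloud providers set automatically. Swap in kubernetes.io/hostname to mean “same node”.

```yaml
affinity:
  podAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
    - labelSelector:
        matchExpressions:
        - key: name
          operator: In
          values:
          - web-cache
      # Any node in the same zone as a web-cache pod qualifies,
      # not only the node the cache pod itself runs on.
      topologyKey: topology.kubernetes.io/zone
```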

With the above pod definition, we deploy a pod named “web-app” that contains a web application, and we want to place it on a node that is already running a cache pod. Under podAffinity, we tell the scheduler to place this pod on a node with a “my-node-label” label only if a pod labeled name=web-cache is already running in the same topology domain. In this case, that means a node with the same value for “my-node-label”. Note that labelSelector matches pod labels, not pod names. The cache pod must therefore carry the label name: web-cache.
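For the rule to match, the cache pod needs the label name: web-cache. A minimal sketch of such a pod follows. <cache-image> is a placeholder, like <container-image> above.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: web-cache
  labels:
    name: web-cache   # this label (not metadata.name) is what web-app's labelSelector matches
spec:
  containers:
  - name: web-cache
    image: <cache-image>
```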

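Note: There are other operators you can use here, depending on your needs. Check my previous post, where I described the operators you can use in affinity. Keep in mind that pod affinity label selectors support only In, NotIn, Exists and DoesNotExist. Gt and Lt are available only in node affinity.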
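As a sketch, the term below requires a pod labeled name=web-cache that does not carry env=test. The env label is hypothetical. Expressions in one matchExpressions list are ANDed, and NotIn also matches pods that have no env label at all.

```yaml
affinity:
  podAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
    - labelSelector:
        matchExpressions:
        # Both expressions must match (they are ANDed).
        - key: name
          operator: In
          values:
          - web-cache
        # Hypothetical label: skip cache pods marked as test instances.
        # NotIn also matches pods without an env label.
        - key: env
          operator: NotIn
          values:
          - test
      topologyKey: my-node-label
```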

While looking for a suitable node for this pod, the scheduler takes the above requirement into account. Whether the pod gets scheduled depends on the podAffinity type used. In our example, we used requiredDuringSchedulingIgnoredDuringExecution. If no node is running a cache pod, your web pod will not be scheduled. It will stay in the Pending state until a suitable node is found. So it is important to understand how your application works and which pods will be scheduled before others, and to choose the podAffinity type accordingly.
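If the cache pod may not exist yet when the web pod is created, the preferred type avoids the Pending state. Below is a sketch of the same rule expressed as a preference. The term moves under podAffinityTerm and gets a weight between 1 and 100. The value 100 is an arbitrary choice here, and kubernetes.io/hostname is assumed as the topology key.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: web-app
spec:
  affinity:
    podAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100            # 1-100; higher means a stronger preference
        podAffinityTerm:
          labelSelector:
            matchExpressions:
            - key: name
              operator: In
              values:
              - web-cache
          topologyKey: kubernetes.io/hostname
  containers:
  - name: web-app
    image: <container-image>
```

If no node runs a cache pod, the scheduler still places web-app elsewhere instead of leaving it Pending.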

That’s it. It is as simple as adding podAffinity to your pod definition.

Pod Anti-Affinity: Avoid scheduling pods on nodes that are already running certain pods

To understand pod anti-affinity, let’s take another scenario from the use cases above: we want to make sure that no more than one pod of a given type runs on a node. The YAML definition below achieves this.

apiVersion: v1
kind: Pod
metadata:
  name: web-app
  labels:
    app: web-app
spec:
  affinity:
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - key: app
            operator: In
            values:
            - web-app
        topologyKey: my-node-label
  containers:
  - name: web-app
    image: <container-image>

The only new element is podAntiAffinity, which is the opposite of podAffinity. podAffinity attracts a pod to domains that already run matching pods, while podAntiAffinity keeps it away from them.

In the above YAML definition, we deploy a pod named “web-app”. The goal is to make sure that no more than one “web-app” pod runs on a node.

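Under podAntiAffinity, we tell the scheduler to place this pod on a node with a “my-node-label” label if and only if no pod labeled app=web-app is already running in the same topology domain. If such a pod is already running there, the new pod will be scheduled on some other node. Note that “one per node” holds only if every node has a distinct value for “my-node-label”. If all nodes share the same value, the whole cluster is a single domain, and at most one such pod can run anywhere.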
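A bare Pod has a fixed name, so you cannot run a second “web-app” pod from the manifest above in the same namespace. In practice, this rule is usually placed in a Deployment’s pod template so that it applies between replicas. The sketch below assumes 3 replicas and uses kubernetes.io/hostname so that each node is its own domain.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-app
spec:
  replicas: 3
  selector:
    matchLabels:
      app: web-app
  template:
    metadata:
      labels:
        app: web-app          # replicas match their own anti-affinity selector
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: app
                operator: In
                values:
                - web-app
            topologyKey: kubernetes.io/hostname   # one replica per node
      containers:
      - name: web-app
        image: <container-image>
```

With fewer than three eligible nodes, the extra replicas stay Pending. Using the preferred type instead would let them share a node.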

A word of caution

Although podAffinity and podAntiAffinity let you control pod placement, they also have a downside. Pod affinity and anti-affinity require a substantial amount of processing, which can slow down scheduling significantly in large clusters. They are not recommended for clusters larger than several hundred nodes.

Million dollar question: why not place containers in the same pod?

Why get into this much complexity just to place two pods together? Why not put multiple containers in a single pod? After all, a pod is essentially a wrapper around one or more containers. So, rather than going through all of the above, should we just place multiple containers in one pod? The short answer is: it depends. The longer answer is that you need to keep the following things in mind:

No one can stop you from putting multiple containers in a pod, but it all depends on your application requirements. If your business model and the way your application operates would benefit from this, go ahead and do it.
If a problem in one of your containers brings the pod down, you lose the other application too, even though it might not be at fault at all.
When scaling horizontally, you scale both applications together, even if only one of them needed scaling. That means you pay for more resources than necessary.
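For comparison, here is a sketch of the single-pod alternative, with the web application and the cache running as two containers. Containers in one pod share a network namespace, so the web app can reach the cache on localhost. However, the two containers are always scheduled, deleted and scaled as a unit. The pod name and <cache-image> are placeholders.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: web-app-with-cache   # hypothetical name
spec:
  containers:
  - name: web-app
    image: <container-image>
  - name: web-cache          # always lands on the same node as web-app
    image: <cache-image>
```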

Conclusion

Although pod affinity and anti-affinity let you control the placement of your pods, you should use this feature carefully, especially in very large clusters, as it can slow down pod scheduling.

Source:
https://balkrishan-nagpal.medium.com/kubernetes-how-to-deploy-different-pods-close-to-each-other-same-node-or-zone-etc-51c6e286f714

Last updated on 20 Feb 2023
Published on 20 Feb 2023