Kubernetes: How to deploy different pods close to each other (same node or zone etc.)
In my previous two posts, I talked about how to avoid scheduling pods on certain nodes and how to schedule pods on certain nodes. You can read them at the following links:
Avoid scheduling pods on certain nodes
Schedule pods on certain nodes
In this post, I will talk about how to schedule your pods near some other pod, or in other words, how to schedule a pod on a node which already has some other pod running on it. I will use some terminology from my earlier post about scheduling pods on certain nodes, so please make sure you have gone through that post. Let’s get started with this one.
The idea here is that when your pod is scheduled, it is placed on a node which already has some other pod running on it, or on a node which does not have some other pod running on it. First, let’s talk about possible use cases where you might need this.
- An application can have pods which frequently talk to each other, so you want them placed on the same node to reduce latency. For example, in an ecommerce application, every time you place an order, the order service needs to check inventory before accepting the order, so every order placement triggers a call to the inventory service. You might therefore want a node that runs an order service pod to also run an inventory service pod.
- Another example is placing a cache pod near a web application pod for faster access to cache contents.
- You need to make sure that no more than one pod of a given type is scheduled on a node, so that your pods are as distributed as possible.
Pod Affinity: Schedule pods closer to already running pods
In my previous post, I talked about node affinity to schedule pods on a specific set of nodes. In this post I will talk about another kind of affinity, which is pod affinity. Here is how pod affinity and anti-affinity work:
Schedule a pod (or don’t schedule it, in the case of anti-affinity) on a node that has label key X, if a pod with label Y is already running on that node (more precisely, on any node with the same value for X). Here X is a label key of the node and Y is a label assigned to the already running pod.
Don’t panic if you could not make much sense of the above statement. We will discuss this in detail.
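Like nodeAffinity, podAffinity is of two types:
- requiredDuringSchedulingIgnoredDuringExecution
- preferredDuringSchedulingIgnoredDuringExecution
Check my previous post to understand the difference between the two.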
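As a rough sketch of how the soft type, preferredDuringSchedulingIgnoredDuringExecution, looks when used with pod affinity, consider a pod that would like, but does not insist, to land next to pods labelled name: web-cache. The pod name, the label, the weight value and the choice of kubernetes.io/hostname as the topology key are assumptions made for illustration.

```yaml
# Illustrative sketch only: names, label, and weight are assumptions.
apiVersion: v1
kind: Pod
metadata:
  name: web-app
spec:
  affinity:
    podAffinity:
      # Soft rule: the scheduler prefers matching nodes but may still pick another one.
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100                 # 1 to 100; a higher weight means a stronger preference
        podAffinityTerm:
          labelSelector:
            matchLabels:
              name: web-cache
          topologyKey: kubernetes.io/hostname
  containers:
  - name: web-app
    image: <container-image>
```

Note that the preferred form wraps the selector and topology key in a podAffinityTerm and adds a weight, while the required form lists the terms directly.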
To understand pod affinity, let’s discuss a scenario from one of the use cases above, where you want to place a web-app pod on the same node where a cache pod is already running (or vice versa, depending on your application).
Here is the YAML definition to achieve this.
apiVersion: v1
kind: Pod
metadata:
  name: web-app
spec:
  affinity:
    podAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - key: name
            operator: In
            values:
            - web-cache
        topologyKey: my-node-label
  containers:
  - name: web-app
    image: <container-image>
Some new elements in the above YAML definition are described below. The other elements you have either seen in my previous post or are pretty much self-explanatory.
podAffinity: This contains all the rules for scheduling this pod relative to other pods.
topologyKey: This is the label key assigned to the target nodes. Remember, a label has a key and a value, and here we specify only the key. It can be any node label key, whether set by you or by your cloud vendor. You do not specify a value: the scheduler treats all nodes that share the same value for this key as one group (a single node, a zone and so on), and the rule is evaluated per group.
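To make the role of topologyKey more concrete, here is a hedged sketch of the same rule using the well-known node label topology.kubernetes.io/zone instead of a custom key. With this key, the pod only has to land in the same zone as a pod labelled name: web-cache, not necessarily on the same node. Whether your nodes actually carry this label depends on your cluster, so treat it as an assumption.

```yaml
# Fragment of a pod spec (goes under spec:). Label values are assumptions.
affinity:
  podAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
    - labelSelector:
        matchLabels:
          name: web-cache
      # Same zone is enough here; kubernetes.io/hostname would require the same node.
      topologyKey: topology.kubernetes.io/zone
```

The wider the group described by the key, the more nodes qualify, at the cost of the pods being potentially further apart.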
With the above pod definition file, we are deploying a pod named “web-app” that contains the web application, and we want to place this pod on a node which is already running a cache pod labelled name: web-cache. Under podAffinity, we are telling the scheduler to schedule this pod on a node that has a label with key “my-node-label” if a pod with the label name: web-cache is already running on that node.
Note: There are several other operators which you can use here depending on your need. Check my previous post, where I mentioned the possible operators you can use in affinity.
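As a small sketch of what other operators look like inside a pod labelSelector, the fragment below combines In with Exists. The extra value web-cache-v2 and the tier label are hypothetical and only serve to illustrate the syntax.

```yaml
# Fragment of a podAffinity term. The second value and the "tier" key are hypothetical.
labelSelector:
  matchExpressions:
  - key: name
    operator: In            # label value must be one of the listed values
    values:
    - web-cache
    - web-cache-v2
  - key: tier
    operator: Exists        # label key must be present; no values are given
```

All expressions in matchExpressions must hold for a pod to match. NotIn and DoesNotExist are the negated counterparts of the two operators shown.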
While trying to find a suitable node for this pod, the scheduler takes the above requirement into account, and whether the pod is scheduled or not depends on the podAffinity type used. In our example, we used podAffinity of type requiredDuringSchedulingIgnoredDuringExecution. If there is no node which is running a cache pod, your web pod will not be scheduled and will stay in the Pending state until a suitable node is found. So, it is important that you are aware of how your application works and which pods will be scheduled before other pods, and accordingly use the appropriate podAffinity type.
That’s it. It is as simple as using podAffinity, and you are good to go.
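For the selector in the web-app definition to match anything, the cache pod has to carry the label name: web-cache. The cache pod itself is not shown in this post, so the following is only a sketch of what it could look like; the image placeholder is an assumption.

```yaml
# Hypothetical cache pod that the web-app affinity rule would match.
apiVersion: v1
kind: Pod
metadata:
  name: web-cache
  labels:
    name: web-cache        # the selector matches this label, not metadata.name
spec:
  containers:
  - name: web-cache
    image: <cache-image>   # placeholder, replace with your cache image
```

If this pod is created first, the web-app pod can be scheduled next to it as soon as it is running on a node that has the my-node-label key.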
Pod Anti Affinity: Avoid scheduling pods on nodes which are already running certain pods
To understand pod anti-affinity, let’s discuss another scenario from the use cases above, where we want to make sure no more than one pod of a type is running on a node. The YAML definition below achieves this.
apiVersion: v1
kind: Pod
metadata:
  name: web-app
  labels:
    app: web-app
spec:
  affinity:
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - key: app
            operator: In
            values:
            - web-app
        topologyKey: my-node-label
  containers:
  - name: web-app
    image: <container-image>
The only new element is podAntiAffinity, which is the opposite of podAffinity: podAffinity is all about scheduling and podAntiAffinity is about not scheduling.
In the above YAML definition, we are trying to deploy a pod named “web-app”, and the goal is to make sure no more than one “web-app” pod is running on a node. Under podAntiAffinity, we are telling the scheduler to schedule this pod on a node that has a label with key “my-node-label” if and only if a pod with the label app: web-app is not already running on that node. So, if such a pod is already running on a node, the new pod will be scheduled on some other node.
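In practice you would usually apply this rule to a set of replicas rather than to a single pod. The following is a hedged sketch of a Deployment that spreads its replicas one per node; the replica count and the use of kubernetes.io/hostname as the topology key are assumptions, not part of the original example.

```yaml
# Illustrative sketch: replica count and topology key are assumptions.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-app
spec:
  replicas: 3
  selector:
    matchLabels:
      app: web-app
  template:
    metadata:
      labels:
        app: web-app       # the anti-affinity selector below matches this label
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchLabels:
                app: web-app
            topologyKey: kubernetes.io/hostname   # at most one replica per node
      containers:
      - name: web-app
        image: <container-image>
```

Because the rule is of the required type, any replica that cannot find a node without another web-app pod stays in the Pending state, for example when there are three replicas but only two schedulable nodes.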
A word of caution
podAffinity and podAntiAffinity let you control the placement of pods, but this also has a downside. Pod affinity and anti-affinity require a substantial amount of processing, which can slow down scheduling significantly in large clusters. They are not recommended for clusters larger than several hundred nodes.
Million dollar question: why not place containers in the same pod?
Why should we go through this much complexity to place two pods together? Why not place multiple containers in a single pod? After all, a pod is nothing but a wrapper around containers. So, rather than going through all the complexity above, should we just place multiple containers in a pod? The short answer is: it depends. For the longer answer, you need to keep the following things in mind:
- No one can stop you from putting multiple containers in a pod, but it all depends on your application requirements. If you think your business model and the way your application operates would benefit from this, go ahead and do it.
- If your pod crashes because of a problem in one of your containers, you also lose the other application, which might not be at fault at all.
- When scaling horizontally, you will be scaling both applications together, even if only one of them needed to scale. That means you will pay for more resources.
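For comparison, the single-pod alternative weighed in the points above could look roughly like the sketch below. The pod name and the cache image placeholder are assumptions made for illustration.

```yaml
# Hypothetical multi-container pod: both containers always share one node.
apiVersion: v1
kind: Pod
metadata:
  name: web-app-with-cache
spec:
  containers:
  - name: web-app
    image: <container-image>
  - name: web-cache
    image: <cache-image>   # placeholder, replace with your cache image
```

Containers in the same pod share a network namespace, so the web application can reach the cache over localhost, but the two can no longer be scheduled, restarted as pods, or scaled independently.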
Although pod affinity and anti-affinity let you control the placement of your pods, you should use this feature carefully, especially in very large clusters, as it can delay pod scheduling.