Technotes for future me


How to preserve the source IP in Kubernetes

externalTrafficPolicy=local on Kubernetes

externalTrafficPolicy=local is an annotation on the Kubernetes service resource that can be set to preserve the client source IP. When this value is set, the actual IP address of a client (e.g., a browser or mobile application) is propagated to the Kubernetes service instead of the IP address of the node.

Here’s what you can find about it in the docs:

externalTrafficPolicy denotes if this Service desires to route external traffic to node-local or cluster-wide endpoints. “Local” preserves the client source IP and avoids a second hop for LoadBalancer and NodePort type services, but risks potentially imbalanced traffic spreading. “Cluster” obscures the client source IP and may cause a second hop to another node, but should have good overall load-spreading.

Pods and Nodes: Recap

In Kubernetes, containers are deployed in individual pods, which are then deployed on one or more nodes. A node is a physical or virtual machine, and represents the actual, concrete compute entity of a Kubernetes cluster. Kubernetes schedules pods to run on nodes based on a variety of criteria such as resource availability. Multiple pods are typically run on a single node.

Routing traffic to a Kubernetes cluster

Traffic entering a Kubernetes cluster arrives at a node. The node then routes traffic to the target pod via kube-proxy. If the pod is not on the same node as the incoming traffic, the node routes the traffic to the node where the pod resides.

Pros and Cons

externalTrafficPolicy: Cluster

This is the default external traffic policy for Kubernetes Services. The assumption here is that you always want to route traffic to all pods running a service with equal distribution.

One of the caveats of using this policy is that you may see unnecessary network hops between nodes as you ingress external traffic. For example, if you receive external traffic via a NodePort, the NodePort proxy may (randomly) route traffic to a pod on another host when it could have routed traffic to a pod on the same host, avoiding that extra hop out to the network.

Likely a bigger problem than extra hops on the network is masquerading. As packets re-route to pods on another node, your traffic will be SNAT’d (source network address translation) so that the destination pod would actually see the proxying node’s IP instead of the true client IP. This is undesirable for many reasons which I won’t be covering in this post.

Although SNATing service traffic is undesirable, this is fundamental for the Kubernetes networking model to work. If we omit SNAT, there would be a mismatch of source and destination addresses which will eventually lead to a connection error. The mismatch occurs because the destination outgoing from client would be the node address on a NodePort (or an external IP), but the destination from the other end would be the pod IP due to the original DNAT from the proxy.

externalTrafficPolicy: Local

By setting ExternalTrafficPolicy=local, nodes only route traffic to pods that are on the same node, which then preserves client IP. It’s important to recognize that ExternalTrafficPolicy is not a way to preserve source IP; it’s a change in networking policy that happens to preserve source IP.

With this external traffic policy, kube-proxy will add proxy rules on a specific NodePort (30000-32767) only for pods that exist on the same node (local) as opposed to every pod for a service regardless of where it was placed.

You’ll notice that if you try to set externalTrafficPolicy: Local on your Service, the Kubernetes API will require you are using the LoadBalancer or NodePort type. This is because the “Local” external traffic policy is only relevant for external traffic which by only applies to those two types.

$ kubectl apply -f mysvc.yml
The Service "mysvc" is invalid: spec.externalTrafficPolicy: Invalid value: "Local": ExternalTrafficPolicy can only be set on NodePort and LoadBalancer service

With this architecture, it’s important that any ingress traffic lands on nodes that are running the corresponding pods for that service, otherwise, the traffic would be dropped. For packets arriving on a node running your application pods, we know that all it’s traffic will route to the local pods, avoiding extra hops to other pods in the cluster.

We can achieve this logic by using a load balancer, hence why this external traffic policy is allowed with Services of type LoadBalancer (which uses the NodePort feature and adds backends to a load balancer with that node port). With a load balancer we would add every Kubernetes node as a backend but we can depend on the load balancer’s health checking capabilities to only send traffic to backends where the corresponding NodePort is responsive (i.e. only nodes who’s NodePort proxy rules point to healthy pods).

This model is great for applications that ingress a lot external traffic and want to avoid unnecessary hops on the network to reduce latency. We can also preserve true client IPs since we no longer need to SNAT traffic from a proxying node! However, the biggest downsides to using the “Local” external traffic policy, as mentioned in the Kubernetes docs is that traffic to your application may be imbalanced. This is better explained in the diagram below:

Because load balancers are typically not aware of the pod placement in your Kubernetes cluster, it will assume that each backend (a Kubernetes node) should receive equal distribution of traffic. As shown in the diagram above, this can lead to select pods for an application receiving significantly more traffic than other pods. In the future, we may see the development of load balancers that can hook into the Kubernetes API and better distribute traffic based on pod placement but I have not seen anything like that (yet). To avoid uneven distribution of traffic we can use pod anti-affinity (against the node’s hostname label) so that pods are spread out across as many nodes as possible:

    - weight: 100
           - key: k8s-app
             operator: In
             - my-app

As your application scales and is spread across more nodes, imbalanced traffic should become less of a concern as a smaller percentage of traffic will be unevenly distributed:


From my experience, if you have a service receiving external traffic from an LB (using NodePorts), you almost always want to use externalTrafficPolicy: Local (with pod anti-affinity to alleviate imbalanced traffic). There are a few cases where externalTrafficPolicy: Cluster makes sense but at the cost of losing client IPs and adding extra hops on your network. An interesting problem related to this is internal traffic policies. As of today, in-cluster traffic (using Service’s clusterIP) is always SNAT’d and often incurs that extra hop on your network. There is no equivalent of the “Local” external traffic policies for internal traffic (and perhaps for good reason) which I think is an interesting, but difficult problem to solve.

Preserving Source IP with Kubernetes ingress

How else can you preserve source IP with Kubernetes? If your external load balancer is a Layer 7 load balancer, the X-Forwarded-For header will also propagate client IP. If you are using a Layer 4 load balancer, you can use the PROXY protocol.


Last updated on 18 Mar 2021
Published on 18 Mar 2021
Edit on GitHub