In Kubernetes environments, it’s common to see intermittent timeouts when using NodePort services, calling external APIs, or communicating between internal services.
Pods appear healthy, nodes have available resources, and logs look clean—yet specific traffic paths intermittently drop packets for no obvious reason.
When this happens repeatedly on Ubuntu-based Kubernetes nodes, the first thing to check is:
Whether the node’s nf_conntrack table has reached its limit.
Ubuntu provides reasonable default values for conntrack, but in Kubernetes—where NAT traffic is concentrated—these defaults are often insufficient.
This guide explains, from a practical standpoint on Ubuntu:
- what conntrack does,
- why it becomes exhausted,
- and how to tune it for stable operation.
1. Why NAT Becomes Excessive in Ubuntu + Kubernetes
Kubernetes ultimately runs on the Linux kernel, and whenever NAT occurs, a session is stored in the conntrack table.
This mechanism is the same regardless of whether the OS is Ubuntu, CentOS, or anything else.
The most common NAT-heavy scenarios on Ubuntu Kubernetes nodes are the following.
1) Pod → External API (SNAT)
If a service performs many outbound calls to external SaaS endpoints, external DBs, or REST APIs,
the NAT session count rises rapidly on the node handling those pods.
2) NodePort
NodePort inherently performs DNAT → SNAT in sequence.
When traffic is heavy, conntrack fills up quickly.
3) ClusterIP (kube-proxy iptables mode)
Kubernetes clusters built with kubeadm on Ubuntu use iptables-based kube-proxy by default.
Depending on the traffic path, iptables-based load balancing may result in one or even two NAT operations.
4) readinessProbe / livenessProbe Overload
Probes themselves are TCP/HTTP connections.
If many probes target a single node, its conntrack count rises significantly.
2. Typical Symptoms of conntrack Exhaustion on Ubuntu Kubernetes Nodes
When the conntrack table becomes full,
the kernel logs a warning and simply drops packets:
nf_conntrack: table full, dropping packet.
On Ubuntu nodes, this results in:
- Intermittent timeouts via NodePort
- Partial failures in outbound API calls
- Ingress 503 errors (often node-specific)
- Sporadic Pod → Pod communication drops
- Increased HTTP 502/503 responses
- One node behaving unstably while others appear fine
The pattern appears random but consistently affects specific nodes.
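A quick way to confirm that conntrack is the culprit is to look for the kernel message in the ring buffer. In the sketch below, a sample log line stands in for live output so the filter itself is clear; on a real node, pipe `dmesg -T` (or `journalctl -k`) through the same grep:

```shell
# Sample line standing in for real kernel-log output (assumption: the
# stock message format; on a node, use `dmesg -T` instead of echo).
sample_log='[Mon Nov 17 10:21:03 2025] nf_conntrack: nf_conntrack: table full, dropping packet'

# The same filter works against live output:
#   dmesg -T | grep -c 'table full, dropping packet'
echo "$sample_log" | grep -q 'table full, dropping packet' && echo "conntrack exhaustion detected"
```

If the grep matches even once, the node has been dropping packets at the conntrack layer, regardless of how healthy the pods look.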
3. CNI vs conntrack — The Key Point Is Not “Calico or Not”
A common misconception is that conntrack behavior depends on the CNI.
It does not.
CNI handles Pod-to-Pod routing,
while conntrack handles state tracking for NAT.
Therefore, if NAT occurs in Ubuntu + Kubernetes,
conntrack will be used regardless of CNI.
conntrack usage by CNI
| CNI | Uses conntrack? | Reason |
|---|---|---|
| Calico | Yes | kube-proxy NAT stays unchanged |
| Flannel | Yes | SNAT/DNAT behavior identical |
| Weave | Yes | Same kube-proxy NAT behavior |
| Cilium (kube-proxy ON) | Yes | NAT path remains |
| Cilium (kube-proxy replacement / eBPF) | Partial reduction | Internal LB via eBPF, but external SNAT still uses Ubuntu kernel |
Conclusion:
Any Kubernetes architecture that performs NAT will consume conntrack,
and the CNI has little to do with it.
4. Understanding Ubuntu’s conntrack Default Values
This is an important point.
On Ubuntu 20.04, 22.04, and 24.04, it is incorrect to say that conntrack defaults to a specific fixed value.
Ubuntu determines conntrack settings based on factors such as:
- kernel version
- system memory
- automatic calculation of nf_conntrack_buckets
- distro-provided sysctl files in /usr/lib/sysctl.d/
- prebaked sysctl values in cloud images (EKS, GKE, AKS, etc.)
- kubeadm configuration
Therefore:
Saying “Ubuntu default = 65536” is incorrect.
However, in many real-world Ubuntu nodes, especially those not tuned manually,
values in the 65,536 ~ 131,072 range are frequently observed.
This is why such numbers often appear in troubleshooting discussions.
In practice, the correct approach is:
Check the actual values on the node:
cat /proc/sys/net/netfilter/nf_conntrack_max
cat /proc/sys/net/netfilter/nf_conntrack_count
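Those two values can be combined into a small usage check. A minimal sketch (the function takes the two numbers as arguments purely so the arithmetic can be demonstrated with hypothetical values; on a node, feed in the real /proc readings as shown in the comment):

```shell
# Print conntrack table usage as a percentage and warn above 80%.
conntrack_usage() {
    max=$1
    count=$2
    pct=$((count * 100 / max))
    echo "conntrack usage: $count / $max (${pct}%)"
    if [ "$pct" -ge 80 ]; then
        echo "WARNING: conntrack table above 80% - packet drops are imminent"
    fi
}

# On a live node, feed in the real values:
#   conntrack_usage "$(cat /proc/sys/net/netfilter/nf_conntrack_max)" \
#                   "$(cat /proc/sys/net/netfilter/nf_conntrack_count)"

# Demonstration with hypothetical numbers (90% full):
conntrack_usage 262144 235930
```

Anything persistently above roughly 80% is a signal to raise nf_conntrack_max before the "table full" drops begin.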
5. Recommended conntrack Tuning Values for Ubuntu + Kubernetes
In Kubernetes workloads with moderate or high traffic,
Ubuntu’s default conntrack settings are often insufficient.
Recommended practical values:
For mid-sized clusters
nf_conntrack_max = 262144
nf_conntrack_buckets = 65536
For high-throughput workloads
nf_conntrack_max = 524288 ~ 1048576
nf_conntrack_buckets = nf_conntrack_max / 4
On nodes with at least 16GB of RAM,
these settings have negligible memory impact.
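To sanity-check the "negligible memory" claim, a rough worst-case estimate can be computed. The per-entry and per-bucket sizes below are assumptions (on the order of ~320 bytes per tracked connection and 8 bytes per hash bucket pointer; exact sizes vary by kernel version), not exact Ubuntu figures:

```shell
# Worst-case memory for a completely full table at the largest
# recommended setting. Assumed sizes: ~320 bytes/entry, 8 bytes/bucket
# (these vary by kernel version and build).
max=1048576
buckets=$((max / 4))
total_mib=$(( (max * 320 + buckets * 8) / 1024 / 1024 ))
echo "worst-case conntrack memory: ~${total_mib} MiB"
```

Under these assumptions the absolute worst case lands around 320 MiB, roughly 2% of a 16 GB node, which is why the settings above are considered low-risk.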
6. How to Tune conntrack Properly on Ubuntu
Create the file:
/etc/sysctl.d/99-nf-conntrack.conf
net.netfilter.nf_conntrack_max=262144
net.netfilter.nf_conntrack_buckets=65536
Apply the changes:
sysctl --system
Verify:
cat /proc/sys/net/netfilter/nf_conntrack_max
cat /sys/module/nf_conntrack/parameters/hashsize
If hashsize does not change, it is because the nf_conntrack module was already loaded with its previous size; a module reload or a reboot may be required.
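A reboot can often be avoided. On stock Ubuntu kernels nf_conntrack is built as a module, so the hash size can be pinned with a modprobe option for the next load, and on most modern kernels root can also write the parameter directly at runtime. Both are shown below as a sketch; the paths are the standard ones, but verify them on your kernel before relying on this:

```shell
# Persist the hash table size for future module loads
# (assumption: nf_conntrack is a module, as on stock Ubuntu kernels):
echo 'options nf_conntrack hashsize=65536' | sudo tee /etc/modprobe.d/99-nf-conntrack.conf

# Attempt a live change (the parameter is root-writable on most modern kernels):
echo 65536 | sudo tee /sys/module/nf_conntrack/parameters/hashsize
```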
7. Why conntrack Usage Decreases in Cilium eBPF Mode (Ubuntu-Based)
When Cilium enables kube-proxy replacement,
ClusterIP and NodePort load balancing is performed via eBPF instead of Ubuntu’s iptables.
This reduces NAT operations inside the cluster
and therefore lowers conntrack usage.
However, NAT is still used in these cases:
- Pod → external internet (forced SNAT)
- ExternalTrafficPolicy=Local
- HostNetwork pods
- Services with externalTrafficPolicy=Cluster (partial NAT)
So:
NAT doesn’t disappear—it is simply bypassed in some internal paths via eBPF.
8. Real-World Cases Where conntrack Exhaustion Occurs on Ubuntu Kubernetes Nodes
Examples frequently seen in production:
- Heavy traffic hitting a NodePort API
- All outbound traffic concentrated on a single node
- readinessProbe/livenessProbe firing multiple times per second
- sidecar patterns multiplying connection count
- StatefulSet pods repeatedly calling the same external DB
- Ingress traffic skewed heavily toward certain nodes
If an Ubuntu node’s conntrack limit is in the 65k–130k range,
these patterns almost guarantee exhaustion.
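When one of these patterns is suspected, the userspace `conntrack` tool (package `conntrack` on Ubuntu) shows what is actually filling the table. The sketch below groups entries by destination address; a sample line with hypothetical IPs stands in for real `sudo conntrack -L` output, and on a node you would replace the echo with the live command. Note that it counts both the original and the reply tuple of each flow, which is imprecise but good enough for spotting hotspots:

```shell
# Sample entry standing in for `sudo conntrack -L` output (hypothetical IPs):
sample='tcp 6 117 TIME_WAIT src=10.0.0.5 dst=203.0.113.10 sport=51514 dport=443 src=203.0.113.10 dst=10.0.0.5 sport=443 dport=51514 [ASSURED] use=1'

# Top destinations by number of tracked flows; on a live node:
#   sudo conntrack -L 2>/dev/null | grep -o 'dst=[0-9.]*' | sort | uniq -c | sort -rn | head -10
echo "$sample" | grep -o 'dst=[0-9.]*' | sort | uniq -c | sort -rn | head -10
```

A single external API or DB address dominating this list usually points straight at the SNAT-heavy workload responsible.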
9. Conclusion
In Ubuntu-based Kubernetes nodes with heavy NAT traffic,
issues such as intermittent timeouts, NodePort 503 errors, and sporadic Pod-to-Pod drops
tend to appear repeatedly and are often difficult to diagnose.
The core reason behind these symptoms is usually the same:
- The nf_conntrack table on the Ubuntu node is configured with a default size that is too small.
- The default value is not fixed and varies depending on the environment.
- CNI has almost no direct relationship with conntrack usage.
- As long as kube-proxy is running in iptables mode, NAT operations are handled by the Ubuntu kernel.
- Pod → external traffic, NodePort, and ClusterIP load balancing all consume conntrack entries.
Therefore:
- Increasing nf_conntrack_max is essential in Ubuntu-based Kubernetes environments.
- For Ubuntu nodes running Kubernetes, nf_conntrack_max = 262144 or higher is effectively the minimum operational baseline, and systems with heavier traffic should set it even higher.
🛠 Last modified: 2025.11.19
ⓒ 2025 엉뚱한 녀석의 블로그 [quirky guy's Blog]. All rights reserved. Unauthorized copying or redistribution of the text and images is prohibited. When sharing, please include the original source link.
💡 Need Professional Support?
If you need deployment, optimization, or troubleshooting support for Zabbix, Kubernetes, or any other open-source infrastructure in your production environment, feel free to contact me anytime.
📧 Email: jikimy75@gmail.com
💼 Services: Deployment Support | Performance Tuning | Incident Analysis Consulting