1) Default Behavior of a Kubernetes Service (Cluster mode)
- With a Service of type: LoadBalancer and externalTrafficPolicy: Cluster, traffic to the VIP can land on any node in the cluster (a minimal manifest is sketched at the end of this section).
- The receiving node’s kube-proxy forwards the traffic to a node that actually hosts a backend Pod (this hop typically involves SNAT).
- The original client IP is lost, but availability is maintained regardless of where Pods run.
- You cannot know in advance which node will receive a given incoming connection.
Check your current setting
- Default install (names may vary for custom installs):
kubectl get svc ingress-nginx-controller -n ingress-nginx -o yaml | grep externalTrafficPolicy
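For reference, a minimal sketch of such a Service (the name, labels, and ports are placeholders; omitting externalTrafficPolicy is equivalent to Cluster):

kubectl apply -f - <<'EOF'
apiVersion: v1
kind: Service
metadata:
  name: demo-lb                    # hypothetical name
  namespace: default
spec:
  type: LoadBalancer               # MetalLB will assign the VIP
  externalTrafficPolicy: Cluster   # default; any node may accept VIP traffic
  selector:
    app: demo                      # hypothetical Pod label
  ports:
    - port: 80
      targetPort: 8080
EOF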
2) VIP Binding in MetalLB L2
- MetalLB (L2 mode) binds a VIP to a single node’s MAC.
(Upstream ARP table on the switch/gateway: VIP → Node-A MAC)
- All external traffic therefore enters the cluster through that one node (Node-A).
- Whether or not the backend Pod is on Node-A, kube-proxy re-routes the traffic inside the cluster (a way to check the current VIP owner is sketched below).
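A rough way to confirm which node is currently announcing the VIP (Service name as in the default install above; the exact event wording varies by MetalLB version, and 192.0.2.100 is a placeholder VIP):

# MetalLB's speaker records an event on the Service when it starts announcing
# the VIP from a particular node (reason/message text differs between versions).
kubectl describe svc ingress-nginx-controller -n ingress-nginx | grep -i -E -A2 'announc|nodeassigned'

# From a host or router in the same L2 segment, check which MAC is cached for the VIP.
ip neigh show 192.0.2.100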
3) Failure Scenario
- VIP initially bound to Node-A (VIP → Node-A MAC).
- The Ingress Controller Pod is rescheduled to Node-B.
- With externalTrafficPolicy: Cluster, Node-A → Node-B redirection should still work.
- MetalLB changes the VIP owner to Node-B and sends gratuitous ARP (GARP) to update neighbors.
- Some network devices (switch/router) ignore GARP or keep the old MAC cached.
- External traffic still goes to Node-A.
- But once VIP ownership has moved, Node-A no longer accepts the VIP → traffic is blackholed (a quick check to tell this apart from an application failure is sketched below).
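The symptom pattern is distinctive: the backends look healthy from inside the cluster while the VIP times out from outside. A sketch, assuming the default ingress-nginx names and a placeholder VIP of 192.0.2.100:

# Inside the cluster: the Service still has ready endpoints (now on Node-B).
kubectl get endpoints ingress-nginx-controller -n ingress-nginx
kubectl get pods -n ingress-nginx -o wide

# From an external client: connections to the VIP hang or time out instead of
# being refused, which points at an L2/ARP problem rather than the application.
curl -sv --connect-timeout 5 http://192.0.2.100/ || echo 'VIP unreachable'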
4) Why a Blackhole Even in Cluster Mode?
- In theory, Cluster mode lets any node accept and forward traffic.
- In MetalLB L2, only the current owner’s MAC answers ARP for the VIP. When ownership flips, the former owner stops responding.
- If upstream ARP still points to the old MAC (Node-A), packets arrive at Node-A, which now drops them → packet loss.
- The root cause is almost always an ARP table refresh failure in L2 mode (a way to confirm this is sketched after this list).
- Even if you switch the ingress traffic policy to Local and ensure the ingress Pod runs on the “intended” node, this does not resolve the underlying ARP-staleness problem.
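To confirm stale ARP directly, compare the MAC your upstream host has cached for the VIP with the MAC that actually answers ARP for it right now. A sketch run from a host/router in the same L2 segment (placeholder VIP 192.0.2.100 and interface eth0; this assumes iputils arping, other arping implementations take different flags):

# 1) Which MAC is currently cached for the VIP on this host?
ip neigh show 192.0.2.100

# 2) Which MAC answers ARP for the VIP right now?
sudo arping -c 3 -I eth0 192.0.2.100

# If (1) shows Node-A's MAC while (2) is answered from Node-B's MAC, the cache
# is stale: packets follow the cached entry to Node-A and are dropped there.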
5) Practical Remediation
- Make network devices honor GARP and be ready to clear/flush ARP on switches/routers when VIP ownership changes (a Linux example is sketched after this list).
- Consider BGP mode: multiple nodes advertise the VIP, removing the ARP single-owner dependency.
- BGP mode adds its own network-side requirements (BGP peering on your routers) and operational complexity; it is a good fit for greenfield setups, but migrating a running cluster can introduce many variables and overhead.
- (I’ll cover BGP details in a separate post.)
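If the upstream device happens to be a Linux router/gateway, clearing the stale entry by hand looks roughly like this (placeholder address and interface; vendor switches and routers have their own clear-ARP equivalents):

# Show the (possibly stale) entry for the VIP.
ip neigh show 192.0.2.100

# Delete it so the next packet triggers a fresh ARP lookup toward the new owner.
sudo ip neigh del 192.0.2.100 dev eth0

# After traffic resumes, verify the entry was re-learned with the new MAC.
ip neigh show 192.0.2.100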
✅ Summary
- externalTrafficPolicy: Cluster is the default and should be robust.
- With MetalLB L2, VIP owner changes can fail if ARP tables don’t update.
- The resulting blackhole is caused by L2/ARP behavior, not by the application stack.
ⓒ 2025 엉뚱한 녀석의 블로그 [quirky guy's Blog]. All rights reserved. Unauthorized copying or redistribution of the text and images is prohibited. When sharing, please include the original source link.
🛠 Last updated: 2025.09.18