When running a Kubernetes cluster in an Ubuntu on-premises environment, powering off (shutdown) and powering on (reboot/startup) is not just about toggling the servers.
If the Master node, etcd quorum, fstab configuration, or NAS startup order is mishandled, the entire cluster can become unstable.
This post outlines how to safely restart a cluster in such an environment.
1. Check Master Redundancy
- Single-Master Environment
  → Only the single Master node needs to boot successfully.
- Multi-Master Environment
  → You must ensure etcd quorum (majority of nodes) is available.
  - 3 Masters → at least 2 nodes must be up
  - 5 Masters → at least 3 nodes must be up
- If quorum is not met, etcd cannot elect a leader or accept writes, so kube-apiserver cannot serve the cluster until enough Masters are back up.
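The quorum rule is simple arithmetic: a strict majority, floor(N/2) + 1. A minimal sketch of it as shell helpers follows; the function names quorum_size and has_quorum are made up for illustration and are not part of etcd or kubectl.

```shell
# quorum_size N: print the minimum number of etcd members that must be up
# in an N-member cluster (a strict majority: floor(N/2) + 1).
quorum_size() {
  echo $(( $1 / 2 + 1 ))
}

# has_quorum UP TOTAL: succeed if UP running members are enough for quorum
# in a TOTAL-member cluster.
has_quorum() {
  [ "$1" -ge "$(quorum_size "$2")" ]
}

# Example:
#   quorum_size 3      # prints 2
#   quorum_size 5      # prints 3
#   has_quorum 2 5 || echo "not enough Masters up yet"
```

Note that an even member count does not improve fault tolerance: 4 Masters still need 3 up, so they tolerate only one failure, the same as 3 Masters.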
2. fstab (NFS Mount) Configuration Review
Assumption
For Kubernetes, fstab entries are not required to use NAS as PV/PVC.
However, in some cases administrators add an entry in the Master node’s fstab only for NAS visibility or verification purposes.
Problem
- If the NAS server is unreachable or slow during boot, the NFS mount will fail.
- systemd treats this as a critical error → rescue mode is triggered.
- As a result, the Master node fails to boot, and both etcd and kube-apiserver may become inconsistent.
Safer Configuration Example
# ❌ Problematic (risky)
192.168.10.50:/data /mnt/data nfs defaults 0 0
# ✅ Safer
192.168.10.50:/data /mnt/data nfs defaults,nofail,x-systemd.automount,_netdev 0 0
- nofail → continue boot even if the mount fails
- _netdev → delay mount until the network is up
- x-systemd.automount → mount automatically when accessed
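To find risky entries before the next shutdown, something like the following could be run on each node. It is only a sketch: risky_nfs_entries is a hypothetical helper name, and it checks only for a missing nofail option, not for the other two.

```shell
# risky_nfs_entries [FILE]: print every NFS entry in an fstab-format FILE
# (default: /etc/fstab) whose options field does not contain "nofail".
risky_nfs_entries() {
  awk '
    $1 ~ /^#/ { next }                 # skip comment lines
    NF < 4    { next }                 # skip blank or incomplete lines
    $3 ~ /^nfs4?$/ {                   # filesystem type is nfs or nfs4
      has_nofail = 0
      n = split($4, opts, ",")         # options are comma-separated
      for (i = 1; i <= n; i++)
        if (opts[i] == "nofail") has_nofail = 1
      if (!has_nofail) print           # report the entry unchanged
    }
  ' "${1:-/etc/fstab}"
}

# Example:
#   risky_nfs_entries              # scan /etc/fstab
#   risky_nfs_entries /tmp/fstab   # scan a copy
```

An empty result means every NFS entry already carries nofail.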
Alternative
- Comment out the NAS entry in fstab entirely.
- Mount NAS manually when needed:
  mount -t nfs …
- Apply the same policy consistently across all nodes, not just the Master.
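After a manual mount, it can help to confirm that the path is really backed by NFS and is not just an empty local directory. A minimal sketch, assuming Linux's /proc/mounts format; is_nfs_mounted is a hypothetical helper name, and mount points containing spaces are not handled.

```shell
# is_nfs_mounted MOUNTPOINT [MOUNTS_FILE]: succeed if MOUNTPOINT is listed in
# MOUNTS_FILE (default: /proc/mounts) with filesystem type nfs or nfs4.
is_nfs_mounted() {
  awk -v mp="$1" '
    $2 == mp && $3 ~ /^nfs4?$/ { found = 1 }
    END { exit !found }                # exit status 0 only if a match was seen
  ' "${2:-/proc/mounts}"
}

# Example (mount point taken from the fstab example in this post):
#   is_nfs_mounted /mnt/data && echo "NAS is mounted" || echo "NAS is NOT mounted"
```

With x-systemd.automount, the path shows up as an autofs entry until it is first accessed, so this check only succeeds after the real NFS mount has been triggered.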
3. Power-On Sequence (On-Premises)
Step 1: NAS and Network Equipment
- Power on the NAS and confirm its health via the management console.
- Verify switches, routers, and related network gear are operational.
- Wait at least 10 minutes for stabilization.
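The fixed wait can be paired with an explicit probe so you know the NAS and gateway actually answer before moving on. The helper below is a generic retry loop, sketched under the assumption that a simple ping is an acceptable health signal in your environment; wait_until is a made-up name.

```shell
# wait_until ATTEMPTS DELAY CMD...: run CMD until it succeeds, at most
# ATTEMPTS times, sleeping DELAY seconds between tries.
# Returns 0 on the first success, 1 if every attempt failed.
wait_until() {
  local attempts="$1" delay="$2" i=1
  shift 2
  while [ "$i" -le "$attempts" ]; do
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    # Do not sleep after the final failed attempt.
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$(( i + 1 ))
  done
  return 1
}

# Example (addresses taken from the examples in this post):
#   wait_until 60 10 ping -c 1 192.168.10.50 || echo "NAS still unreachable"
#   wait_until 60 10 ping -c 1 192.168.10.1  || echo "gateway still unreachable"
```

A successful ping only shows that the device is on the network; the NAS management console remains the place to confirm that volumes and exports are healthy.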
Step 2: Master Node(s)
- Single Master → power on the single Master.
- Multi-Master → power on enough Masters to meet etcd quorum.
- Perform network and control-plane health checks:
  ping 192.168.10.1 # gateway connectivity
  curl -k https://127.0.0.1:6443/healthz
  etcdctl endpoint health
  systemctl status kubelet
- Run kubectl get nodes to confirm the Master is Ready.
- If fstab is configured, verify the NAS mount as well.
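Reading the kubectl get nodes table by eye gets error-prone on larger clusters. A small filter like the following can list only the nodes that still need attention. It is a sketch that assumes the default column layout (NAME, STATUS, ...); not_ready_nodes is a hypothetical helper name.

```shell
# not_ready_nodes: read `kubectl get nodes --no-headers` output on stdin and
# print the name of every node whose STATUS column is not exactly "Ready".
# Cordoned nodes ("Ready,SchedulingDisabled") are therefore reported as well.
not_ready_nodes() {
  awk 'NF >= 2 && $2 != "Ready" { print $1 }'
}

# Example on a live cluster:
#   kubectl get nodes --no-headers | not_ready_nodes
```

No output means every listed node reports Ready.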
Step 3: Worker Nodes
- Only after the Master is fully stabilized, power on Worker nodes one by one.
- Confirm each Worker reaches the Ready state before proceeding to the next.
- If communication issues arise, pause Worker boot until the Master is recovered.
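The one-by-one rule can be enforced with kubectl wait, which blocks until a node reports the Ready condition or a timeout expires. The sketch below assumes you power on each Worker by hand when prompted; wait_workers_ready and the node names in the example are hypothetical.

```shell
# wait_workers_ready TIMEOUT_SECONDS NODE...: for each named node, in order,
# wait until it reports the Ready condition. Stops at the first node that
# does not become Ready within the timeout and returns 1.
wait_workers_ready() {
  local timeout="$1" node
  shift
  for node in "$@"; do
    echo "Power on ${node} now; waiting up to ${timeout}s for Ready..."
    if ! kubectl wait --for=condition=Ready "node/${node}" --timeout="${timeout}s"; then
      echo "${node} did not become Ready; stop and check the Master first." >&2
      return 1
    fi
  done
}

# Example:
#   wait_workers_ready 300 worker1 worker2 worker3
```

Stopping at the first failure keeps the remaining Workers powered off, which matches the advice to pause Worker boot until the Master is recovered.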
4. Problems Caused by Rescue Mode
- Boot Failure
  - NFS mount attempted before network initialization → systemd drops into rescue mode.
- Control Plane Instability
  - kubelet, apiserver, and etcd may time out during startup, leaving the cluster unstable.
- Data Inconsistency
  - In a multi-Master setup, quorum can break, leading to inconsistent etcd state.
- Not Fixable by Simple Reboot
  - Restoring NAS alone does not resolve the control plane issues.
  - In severe cases, you may need to replace the Master and redeploy the cluster.
5. Key Takeaways
- fstab is not required, but if used for NAS verification on the Master, always add nofail, _netdev, and x-systemd.automount, or comment the entry out entirely.
- Power-On Order: NAS → Master → Worker
- Multi-Master: securing quorum is top priority
- Only bring up Workers after the Master is confirmed Ready.
- Rescue mode is not “just a boot error”; it’s the starting point of control plane failures.
🛠 Last modified: 2025.09.18
ⓒ 2025 엉뚱한 녀석의 블로그 [quirky guy's Blog]. All rights reserved. Unauthorized copying or redistribution of the text and images is prohibited. When sharing, please include the original source link.