On-Premises Kubernetes Cluster Power Operations Guide: Master, fstab, NAS Procedures

When running a Kubernetes cluster in an Ubuntu on-premises environment, powering off (shutdown) and powering on (reboot/startup) is not just about toggling the servers.
If the Master node, etcd quorum, fstab configuration, or NAS startup order is mishandled, the entire cluster can become unstable.
This post outlines how to safely restart a cluster in such an environment.


1. Check Master Redundancy

  • Single-Master Environment
    → Only the single Master node needs to boot successfully.
  • Multi-Master Environment
    → You must ensure etcd quorum (majority of nodes) is available.
    • 3 Masters → at least 2 nodes must be up
    • 5 Masters → at least 3 nodes must be up
    • If quorum is not met, etcd cannot elect a leader or accept writes, so kube-apiserver stays unavailable until enough members are back.
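
The quorum numbers above follow from simple majority arithmetic. As a minimal sketch (the function names quorum_size and fault_tolerance are made up for illustration, not part of any etcd tooling), the rule can be written as:

```shell
#!/usr/bin/env bash
# quorum_size N: smallest majority of N etcd members (integer division).
quorum_size() {
  local members=$1
  echo $(( members / 2 + 1 ))
}

# fault_tolerance N: how many members may be down while quorum still holds.
fault_tolerance() {
  local members=$1
  echo $(( members - (members / 2 + 1) ))
}

for n in 1 3 5; do
  echo "$n members: quorum $(quorum_size "$n"), tolerates $(fault_tolerance "$n") down"
done
```

This also shows why an even member count buys nothing: 4 members need 3 for quorum and still tolerate only 1 failure, the same as 3 members.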

2. fstab (NFS Mount) Configuration Review

Assumption

For Kubernetes, fstab entries are not required to use NAS as PV/PVC.
However, in some cases administrators add an entry in the Master node’s fstab only for NAS visibility or verification purposes.

Problem

  • If the NAS server is unreachable or slow during boot, the NFS mount will fail.
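  • Without nofail, systemd can treat the failed mount as a critical error → the node drops into rescue (emergency) mode instead of finishing boot.
  • As a result, the Master node fails to boot, so etcd and kube-apiserver on that node do not start and the control plane may become unstable.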

Safer Configuration Example

# ❌ Problematic (risky)
192.168.10.50:/data /mnt/data nfs defaults 0 0

# ✅ Safer
192.168.10.50:/data /mnt/data nfs defaults,nofail,x-systemd.automount,_netdev 0 0
  • nofail → continue boot even if the mount fails
  • _netdev → delay mount until the network is up
  • x-systemd.automount → mount automatically when accessed
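
To find entries that still use the risky form, a small audit can help. The following is only a sketch: the function name risky_nfs_entries is hypothetical, and it assumes the usual whitespace-separated fstab layout with the filesystem type in column 3 and the options in column 4.

```shell
#!/usr/bin/env bash
# risky_nfs_entries FILE: print NFS lines in FILE whose options lack "nofail".
risky_nfs_entries() {
  local file=$1
  awk '
    /^[[:space:]]*#/ { next }   # skip comment lines
    NF < 4           { next }   # skip blank or malformed lines
    $3 ~ /^nfs4?$/ {
      n = split($4, opts, ",")
      found = 0
      for (i = 1; i <= n; i++) if (opts[i] == "nofail") found = 1
      if (!found) print          # report entries that would block boot
    }
  ' "$file"
}

# Usage example:
# risky_nfs_entries /etc/fstab
```

Empty output means every NFS entry in the file already carries nofail. The sketch checks only that one option, not _netdev or x-systemd.automount.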

Alternative

  • Comment out the NAS entry in fstab entirely.
  • Mount NAS manually when needed: mount -t nfs …
  • Apply the same policy consistently across all nodes, not just the Master.

3. Power-On Sequence (On-Premises)

Step 1: NAS and Network Equipment

  • Power on the NAS and confirm its health via the management console.
  • Verify switches, routers, and related network gear are operational.
  • Wait at least 10 minutes for stabilization.
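
In addition to the waiting period, a simple poll can confirm that the NAS actually answers before moving on. This is a rough sketch: wait_for is a hypothetical helper, and the attempt count and delay in the example are assumed values, not recommendations from this post.

```shell
#!/usr/bin/env bash
# wait_for ATTEMPTS DELAY CMD...: rerun CMD until it succeeds or attempts run out.
# Returns 0 on the first success, 1 if every attempt failed.
wait_for() {
  local attempts=$1 delay=$2
  shift 2
  local i
  for (( i = 1; i <= attempts; i++ )); do
    if "$@"; then
      return 0
    fi
    # Sleep only between attempts, not after the last one.
    if (( i < attempts )); then
      sleep "$delay"
    fi
  done
  return 1
}

# Example (assumed values): ping the NAS address from the fstab example,
# up to 60 times, 10 seconds apart.
# wait_for 60 10 ping -c 1 -W 2 192.168.10.50
```

The same helper could wrap any of the later health checks, for example wait_for 30 10 curl -ksf https://127.0.0.1:6443/healthz on a Master node.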

Step 2: Master Node(s)

  • Single Master → power on the single Master.
  • Multi-Master → power on enough Masters to meet etcd quorum.
  • Perform network and control-plane health checks:
      ping 192.168.10.1   # gateway connectivity
      curl -k https://127.0.0.1:6443/healthz
      etcdctl endpoint health
      systemctl status kubelet
  • Run kubectl get nodes to confirm the Master is Ready.
  • If fstab is configured, verify the NAS mount as well.

Step 3: Worker Nodes

  • Only after the Master is fully stabilized, power on Worker nodes one by one.
  • Confirm each Worker reaches the Ready state before proceeding to the next.
  • If communication issues arise, pause Worker boot until the Master is recovered.
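
The "Ready before proceeding" check can be scripted rather than read by eye. A minimal sketch, assuming the default column layout of kubectl get nodes (NAME first, STATUS second); the function name not_ready_nodes is hypothetical:

```shell
#!/usr/bin/env bash
# not_ready_nodes: read `kubectl get nodes --no-headers` output on stdin and
# print the name of every node whose STATUS column is not exactly "Ready".
not_ready_nodes() {
  awk 'NF >= 2 && $2 != "Ready" { print $1 }'
}

# Usage example:
# kubectl get nodes --no-headers | not_ready_nodes
```

Empty output means every listed node is Ready, so the next Worker can be powered on. Note that a cordoned node shows a status such as Ready,SchedulingDisabled and would be reported by this strict comparison.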

4. Problems Caused by Rescue Mode

  1. Boot Failure
    • NFS mount attempted before network initialization → systemd drops into rescue mode.
  2. Control Plane Instability
    • kubelet, kube-apiserver, and etcd may time out during startup, leaving the cluster unstable.
  3. Data Inconsistency
    • In a multi-Master setup, quorum can break, leading to inconsistent etcd state.
  4. Not Fixable by Simple Reboot
    • Restoring NAS alone does not resolve the control plane issues.
    • In severe cases, you may need to replace the Master and redeploy the cluster.

5. Key Takeaways

  • fstab is not required, but if used for NAS verification on the Master, always add nofail, _netdev, and x-systemd.automount — or comment the entry out entirely.
  • Power-On Order: NAS → Master → Worker
  • Multi-Master: securing quorum is top priority
  • Only bring up Workers after the Master is confirmed Ready.
  • Rescue mode is not “just a boot error” — it’s the starting point of control plane failures.
ⓒ 2025 엉뚱한 녀석의 블로그 [quirky guy's Blog]. All rights reserved. Unauthorized copying or redistribution of the text and images is prohibited. When sharing, please include the original source link.

🛠 Last modified: 2025.09.18