Kubernetes Cluster Architecture in an Air-Gapped Environment

Operating a Kubernetes cluster in a completely disconnected (air-gapped) environment requires more than just installation.
All operational components — OS packages, container images, backup and recovery systems, and monitoring — must circulate entirely within the internal network.

The following design illustrates a practical Kubernetes cluster architecture applicable to real-world air-gapped environments.


1. Overall Architecture

This architecture is built around a three-layer security boundary:

  • DMZ (Demilitarized Zone): A buffer zone between external users and the internal network
  • Internal Network: Where the Kubernetes cluster, storage, and monitoring systems reside
  • Data Exchange Zone: A tightly controlled link for limited data transfer between DMZ and internal network

External users can never directly access the internal network.
All files (packages, images, backups, etc.) move only through an approved data-transfer mechanism.


2. Component Design

(1) DMZ Zone

⦿ Frontend Web Server (Nginx / Apache)

The only entry point for external users.
Requests are processed at the DMZ web tier and never directly forwarded to internal systems.
Any interaction with internal applications occurs solely through the data-transfer solution.

⦿ Data-Transfer Solution

Controls one-way or bidirectional transmission between the external and internal networks.
It is usually implemented as a file-based gateway (SFTP or a dedicated data-bridge appliance).
In Kubernetes operations, it is used for importing and exporting container images, packages, and backups under strict supervision.
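
Whatever the gateway product, the sending pattern is usually the same: place the artifact plus an integrity checksum into an outbound drop directory and let the gateway move and re-verify it. A minimal sketch of the sending side, where the drop-directory path and checksum convention are assumptions:

```python
"""Minimal sketch: stage an artifact for the data-transfer gateway.

The drop-directory path and checksum convention are assumptions;
the actual gateway product dictates the real layout and workflow.
"""
import hashlib
import shutil
from pathlib import Path

OUTBOUND = Path("/srv/gateway/outbound")  # watched by the gateway (hypothetical)

def stage_for_transfer(artifact: Path) -> Path:
    """Copy an artifact into the drop directory with a SHA-256 sidecar file."""
    digest = hashlib.sha256(artifact.read_bytes()).hexdigest()
    dest = OUTBOUND / artifact.name
    shutil.copy2(artifact, dest)
    # The receiving side recomputes the hash before accepting the file.
    dest.with_name(dest.name + ".sha256").write_text(f"{digest}  {dest.name}\n")
    return dest

if __name__ == "__main__":
    # e.g. an image tarball produced by `docker save`
    stage_for_transfer(Path("nginx-1.27.tar"))
```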


(2) Internal Network (Air-Gapped Zone)

All critical components operate entirely inside the internal network, without any external connectivity.

⦿ Kubernetes Cluster

  • Fully isolated and operates independently of the Internet
  • Pulls all required images from the internal Nexus repository (runtime configuration sketched below)

Core Components:

  • etcd: Cluster state storage
  • Velero: Resource and PV backup
  • Nexus Repository: Image and package repository
  • Zabbix Agent: Node and application monitoring
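
For the nodes to pull exclusively from Nexus, the container runtime has to be redirected to the internal registry. A sketch for containerd's standard `certs.d` registry-host layout, assuming a hypothetical hostname `nexus.internal:8443` and that `config_path = "/etc/containerd/certs.d"` is set in the containerd config:

```python
"""Minimal sketch: point containerd at the internal Nexus mirror.

Assumes containerd's certs.d registry-host layout; the hostname and
certificate path are placeholders for the internal environment.
"""
from pathlib import Path

HOSTS_TOML = """\
# Upstream is unreachable in an air-gapped network; all pulls are
# served by the mirror host declared below.
server = "https://registry-1.docker.io"

[host."https://nexus.internal:8443"]
  capabilities = ["pull", "resolve"]
  ca = "/etc/pki/internal-ca.crt"  # internal PKI root (assumed path)
"""

def write_mirror_config() -> None:
    conf_dir = Path("/etc/containerd/certs.d/docker.io")
    conf_dir.mkdir(parents=True, exist_ok=True)
    (conf_dir / "hosts.toml").write_text(HOSTS_TOML)

if __name__ == "__main__":
    write_mirror_config()  # restart containerd afterwards to apply
```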

⦿ Nexus Repository

Serves as the central dependency hub for all Kubernetes operations.
It replaces external Internet access by providing locally mirrored packages and images.

| Repository Type | Purpose |
| --- | --- |
| Docker (OCI) | Kubernetes components and application images |
| Helm Repo | Helm chart distribution |
| APT / YUM Repo | OS and runtime packages |
| PyPI / npm, etc. | Development-language dependencies |

The repository is updated periodically by mirroring from an external source; the mirrored data is then imported into the internal Nexus through the data-transfer gateway or via physical media (e.g., USB).
This keeps packages and security patches up to date even in a fully isolated network.
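
As one example of the import step, a Nexus hosted Helm repository accepts a plain HTTP PUT of the chart archive. A sketch with placeholder repository name, credentials, and CA path, using the third-party `requests` library:

```python
"""Minimal sketch: publish a mirrored Helm chart to the internal Nexus.

The repository name, account, and CA path are placeholders.
"""
import requests  # third-party: available from the internal PyPI mirror

NEXUS_HELM = "https://nexus.internal:8443/repository/helm-internal/"

def upload_chart(chart_path: str) -> None:
    filename = chart_path.rsplit("/", 1)[-1]
    with open(chart_path, "rb") as f:
        resp = requests.put(
            NEXUS_HELM + filename,
            data=f,
            auth=("ci-publisher", "********"),   # placeholder service account
            verify="/etc/pki/internal-ca.crt",   # internal PKI root (assumed)
        )
    resp.raise_for_status()

if __name__ == "__main__":
    upload_chart("ingress-nginx-4.11.2.tgz")
```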

⦿ Storage (Ceph / NFS)

Stores all persistent data such as Pod PVs, Velero backups, and etcd snapshots.
Ceph is recommended for high availability; NFS can be used for simpler setups.

⦿ Velero / etcdctl

  • Velero: Performs full backup of Kubernetes resources and PVs
  • etcdctl: Separately backs up cluster state in case Velero restore fails
  • Both backups are stored on Ceph or NFS (see the backup-job sketch below)
  • Recovery order: Velero restore → etcd snapshot restore
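
A minimal sketch of such a nightly job, assuming `velero` and `etcdctl` are on the PATH, kubeadm-style etcd certificate locations, and `/backup` mounted from Ceph or NFS:

```python
"""Minimal sketch: nightly backup combining Velero and etcdctl.

Certificate locations assume a kubeadm-provisioned control plane.
"""
import os
import subprocess
from datetime import datetime, timezone

STAMP = datetime.now(timezone.utc).strftime("%Y%m%d-%H%M%S")

def velero_backup() -> None:
    # Kubernetes resources and PVs go to Velero's configured object store.
    subprocess.run(["velero", "backup", "create", f"daily-{STAMP}", "--wait"],
                   check=True)

def etcd_snapshot() -> None:
    # Independent cluster-state backup in case a Velero restore fails.
    subprocess.run(
        ["etcdctl", "snapshot", "save", f"/backup/etcd-{STAMP}.db",
         "--endpoints=https://127.0.0.1:2379",
         "--cacert=/etc/kubernetes/pki/etcd/ca.crt",
         "--cert=/etc/kubernetes/pki/etcd/server.crt",
         "--key=/etc/kubernetes/pki/etcd/server.key"],
        check=True,
        env={**os.environ, "ETCDCTL_API": "3"},
    )

if __name__ == "__main__":
    velero_backup()
    etcd_snapshot()  # /backup is assumed to be a Ceph/NFS mount
```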

(3) Monitoring and Alerting System

⦿ Zabbix Server

Deployed outside the Kubernetes cluster for resilience during cluster downtime.
Monitors:

  • Node metrics (CPU, memory, disk)
  • Pod-level resource usage
  • Application metrics (HTTP response, error rate)

Metrics are collected via Zabbix Agent 2 and API polling; custom application metrics can also be pushed to the server, as sketched below.
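
Beyond agent polling, application-side values can be pushed to a Zabbix trapper item with the stock `zabbix_sender` utility. A sketch with placeholder server name, host name, and item key:

```python
"""Minimal sketch: push a custom metric to the internal Zabbix server.

Requires a trapper item with the given key on the target host;
server name, host name, and key are placeholders.
"""
import subprocess

def send_metric(host: str, key: str, value: str) -> None:
    subprocess.run(
        ["zabbix_sender",
         "-z", "zabbix.internal",   # internal Zabbix server (placeholder)
         "-s", host,                # host name as registered in Zabbix
         "-k", key,
         "-o", value],
        check=True,
    )

if __name__ == "__main__":
    send_metric("web-frontend", "app.http.error_rate", "0.02")
```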

⦿ Alarm Notification

No external mail or chat servers are used.
Only internal SMTP or Mattermost instances deliver alerts.
Notifications are strictly confined to the internal network to prevent data leakage.
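
Mattermost's incoming webhooks take a JSON POST with a `text` field, which keeps the alert path entirely inside the network. A sketch with a placeholder webhook URL, using only the standard library:

```python
"""Minimal sketch: deliver an alert to an internal Mattermost channel.

The webhook URL is a placeholder generated by the internal instance.
"""
import json
import urllib.request

WEBHOOK = "https://mattermost.internal/hooks/xxx-generated-key-xxx"

def notify(message: str) -> None:
    req = urllib.request.Request(
        WEBHOOK,
        data=json.dumps({"text": message}).encode(),
        headers={"Content-Type": "application/json"},
    )
    # The request never leaves the internal network.
    with urllib.request.urlopen(req) as resp:
        resp.read()

if __name__ == "__main__":
    notify(":warning: node01 disk usage above 90%")
```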


3. Backup and Recovery Strategy

In an air-gapped environment, remote backups via the Internet are impossible.
Therefore, automated in-cluster backups and redundant local storage are essential.

| Backup Target | Tool | Frequency | Storage |
| --- | --- | --- | --- |
| Kubernetes resources & PVs | Velero | Daily | Ceph / NFS |
| etcd state | etcdctl snapshot | Daily | Ceph / NFS |
| Zabbix config / templates | Export | Weekly | Internal backup directory |

Recovery Order:
1️⃣ Velero restore → 2️⃣ etcd snapshot restore
This combination ensures both resource and state consistency after recovery.
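
A sketch of that sequence with placeholder backup names. Note that the etcd step shows only the snapshot-restore command itself; a full etcd recovery also requires stopping the etcd members and repointing their data directory:

```python
"""Minimal sketch: Velero restore first, etcd snapshot as fallback."""
import subprocess

def velero_restore(backup_name: str) -> None:
    subprocess.run(
        ["velero", "restore", "create", "--from-backup", backup_name, "--wait"],
        check=True,
    )

def etcd_restore(snapshot: str) -> None:
    # Run on a control-plane node with etcd stopped; the restored
    # data directory must then be wired back into the etcd members.
    subprocess.run(
        ["etcdctl", "snapshot", "restore", snapshot,
         "--data-dir", "/var/lib/etcd-restored"],
        check=True,
    )

if __name__ == "__main__":
    try:
        velero_restore("daily-20251014-010000")          # placeholder name
    except subprocess.CalledProcessError:
        etcd_restore("/backup/etcd-20251014-010000.db")  # placeholder path
```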


4. Package and Image Supply Chain

Because the network is isolated, all OS and application dependencies must be locally managed.

  1. Mirror the latest packages and images in the external network
    • Docker Hub, Helm Repo, PyPI, apt/yum mirrors, etc.
  2. Transfer mirrored data into the internal Nexus
    • via data-transfer gateway or physical USB media
  3. All Kubernetes nodes and applications pull exclusively from the internal Nexus

With this flow, the cluster can perform updates, deployments, and scaling without relying on any external network access.
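
The two ends of this flow can be scripted symmetrically: the connected side pulls and saves, the isolated side loads, retags, and pushes. A sketch with a placeholder registry hostname and image list; the tarballs travel through the data-transfer gateway in between:

```python
"""Minimal sketch: both sides of the image supply chain.

Registry hostname and image list are placeholders.
"""
import subprocess

NEXUS = "nexus.internal:8443"
IMAGES = ["nginx:1.27", "redis:7.4"]

def run(*cmd: str) -> None:
    subprocess.run(cmd, check=True)

def tarball_name(image: str) -> str:
    return image.replace("/", "_").replace(":", "_") + ".tar"

def export_side() -> None:
    # Runs on the connected mirror host.
    for image in IMAGES:
        run("docker", "pull", image)
        run("docker", "save", "-o", tarball_name(image), image)

def import_side() -> None:
    # Runs inside the air gap after the tarballs arrive via the gateway.
    for image in IMAGES:
        run("docker", "load", "-i", tarball_name(image))
        run("docker", "tag", image, f"{NEXUS}/{image}")
        run("docker", "push", f"{NEXUS}/{image}")
```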


5. Design Principles Summary

  1. Clear Network Boundaries
    Layered security: DMZ → Data-Transfer Gateway → Internal Firewall → Internal Network
  2. Unified Supply Chain Management
    Nexus consolidates packages, images, Helm charts, and language dependencies
  3. Dual Backup System
    Velero + etcdctl combination preserves both resource data and cluster state
  4. Independent Monitoring
    Zabbix hosted outside the cluster ensures visibility even during failure
  5. Internal-Only Alert Channels
    SMTP and Mattermost provide secure, isolated notifications

6. Conclusion

An air-gapped Kubernetes environment is essentially a cloud without the Internet.
It must function entirely on its own — capable of deployment, monitoring, and recovery without external dependencies.

ⓒ 2025 엉뚱한 녀석의 블로그 [quirky guy's Blog]. All rights reserved. Unauthorized copying or redistribution of the text and images is prohibited. When sharing, please include the original source link.

🛠 Last modified: 2025.10.14