An Examination of Monitoring Metrics: Part 4 Elasticsearch


1. Cluster Health Metrics

cluster health
Overall cluster status.

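Green = healthy; all primary and replica shards are assigned
Yellow = all primary shards are assigned, but one or more replica shards are unassigned
Red = one or more primary shards are unassigned; some data is unavailable and at risk of loss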

unassigned shards
Number of shards not assigned to any node.

Normal value: 0
Increases when disk space is low, a node goes down, or shard relocation is delayed
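As a rough illustration of how these two metrics combine into one alert level, here is a minimal Python sketch. It assumes you have already fetched and parsed the JSON from GET _cluster/health; the function name and the alert labels are hypothetical, and only the `status` and `unassigned_shards` fields are read.

```python
def assess_cluster_health(health: dict) -> str:
    """Map a parsed _cluster/health response to a coarse alert level.

    The labels "critical", "warning", "ok", "unknown" are illustrative only.
    """
    status = health.get("status", "unknown")
    unassigned = health.get("unassigned_shards", 0)

    if status == "red":
        return "critical"   # at least one primary shard is unassigned
    if status == "yellow" or unassigned > 0:
        return "warning"    # some shards (typically replicas) are unassigned
    if status == "green":
        return "ok"
    return "unknown"        # status field missing or unrecognized


# Hand-written sample shaped like a _cluster/health response (not real output).
sample = {"status": "yellow", "unassigned_shards": 3}
print(assess_cluster_health(sample))
```

Checking the shard count in addition to the color is a deliberate choice in this sketch: it still raises a warning if the status field is missing but shards are reported as unassigned.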

2. Resource Metrics

Total size of all file stores / Total available size to JVM in all file stores

Total = physical disk capacity across all data paths
Available = actual usable space as reported to the JVM (excludes filesystem reservations/quotas)
Used to determine whether new shards can be allocated

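⚠️ What to watch when Available drops rapidly

Typically caused by index growth, log bursts, or replica expansion
Disk watermark thresholds (default values, evaluated per node):
- 85% used (low watermark) → no new shards are allocated to the node, except primaries of newly created indices
- 90% used (high watermark) → Elasticsearch tries to relocate existing shards away from the node
- 95% used (flood stage) → indices with a shard on the node are switched to read-only (deletes still allowed)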
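To make the thresholds concrete, the following Python sketch classifies a node's disk usage against the default watermarks. It assumes you feed it per-node total and available bytes (for example from the filesystem section of the node stats); the function name and the returned labels are hypothetical.

```python
def disk_watermark_state(total_bytes: int, available_bytes: int,
                         low: float = 85.0, high: float = 90.0,
                         flood: float = 95.0) -> str:
    """Classify disk usage against watermark percentages (defaults assumed)."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")

    # Used percentage is derived from Available, not from Total alone.
    used_pct = 100.0 * (total_bytes - available_bytes) / total_bytes

    if used_pct >= flood:
        return "flood_stage"  # indices on this node become read-only
    if used_pct >= high:
        return "high"         # shards start relocating away
    if used_pct >= low:
        return "low"          # no new shard allocations to this node
    return "ok"


# Example: 1 TB disk with 120 GB available is 88% used.
print(disk_watermark_state(1_000_000_000_000, 120_000_000_000))
```

If your cluster overrides the watermark settings, pass the configured percentages instead of relying on the defaults shown here.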

Summary

Looking only at Total can be misleading; Available is often much smaller.
Total size = raw physical capacity.
Total available to JVM = what Elasticsearch can actually use.
Not related to JVM Heap; reflects only filesystem availability.
Always monitor Available for real operational decisions.

jvm_heap_usage_percent

JVM Heap utilization.
Sustained 85%+ → Full GC frequency increases, which raises the risk of latency spikes.
95%+ → OutOfMemoryError becomes likely.
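The word "sustained" matters here, because a single high reading right before a GC cycle is normal. A minimal Python sketch of that idea, assuming a list of recent heap-usage percentages collected by your monitoring system (the function name, the labels, and the five-sample window are all hypothetical choices):

```python
def heap_alert(samples: list, sustained: float = 85.0,
               critical: float = 95.0, min_samples: int = 5) -> str:
    """Classify JVM heap pressure from recent usage percentages (oldest first)."""
    if not samples:
        return "unknown"

    # A single reading at or above the critical level is alarming on its own.
    if samples[-1] >= critical:
        return "critical"

    # "Sustained" means every one of the last min_samples readings is high.
    recent = samples[-min_samples:]
    if len(recent) == min_samples and all(s >= sustained for s in recent):
        return "warning"
    return "ok"


print(heap_alert([86, 87, 88, 90, 89]))  # all five readings at 85 or above
print(heap_alert([86, 87, 70, 90, 89]))  # the dip to 70 breaks the streak
```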

node uptime

Node runtime duration.
Frequent restarts are an early sign of instability.
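One simple way to turn uptime into a restart signal is to count how often the value goes backwards between consecutive samples. The sketch below assumes a list of uptime readings for a single node, oldest first, in any consistent unit; the function name is hypothetical.

```python
def count_restarts(uptime_samples: list) -> int:
    """Count restarts as the number of times uptime drops between samples."""
    return sum(
        1
        for prev, cur in zip(uptime_samples, uptime_samples[1:])
        if cur < prev  # uptime can only shrink if the node restarted
    )


# Uptime drops twice in this series (200 -> 50 and 150 -> 20).
print(count_restarts([100, 200, 50, 150, 20]))
```

Restarts that happen entirely between two samples and are masked by a long polling interval will not be seen by this approach, so a short collection interval helps.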

3. Performance Metrics

query latency
Search query response time.

Rising latency in milliseconds signals degraded user experience.
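Elasticsearch exposes search activity as cumulative counters (`query_total` and `query_time_in_millis` in the search section of the stats APIs), so an average latency per interval has to be derived from two samples. A hedged Python sketch, assuming each sample is a dict holding just those two counters and that the function name is hypothetical:

```python
from typing import Optional


def avg_query_latency_ms(prev: dict, cur: dict) -> Optional[float]:
    """Average query latency (ms) between two cumulative counter samples."""
    d_count = cur["query_total"] - prev["query_total"]
    d_time = cur["query_time_in_millis"] - prev["query_time_in_millis"]

    # No queries in the interval, or counters went backwards (node restart).
    if d_count <= 0 or d_time < 0:
        return None
    return d_time / d_count


# Hand-written sample values, not real cluster output.
before = {"query_total": 1000, "query_time_in_millis": 5000}
after = {"query_total": 1500, "query_time_in_millis": 9000}
print(avg_query_latency_ms(before, after))
```

Returning None instead of 0 keeps idle intervals and counter resets from looking like "perfect" latency on a graph.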

service response_time
REST API response time.

Persistent increases indicate backend resource bottlenecks.

4. Indexing & Connection Metrics

flush latency
Time required to complete a flush operation.

Indicates disk I/O bottlenecks.

Indexing flow:

Document → written to the in-memory indexing buffer and appended to the translog
Refresh → buffer written out as a new segment, which becomes searchable
Flush → segments committed to disk (Lucene commit) and the translog rolled over, so older entries can be discarded

Operational meaning:

Higher flush latency → points to slower disk I/O, and leads to larger translogs and longer recovery times after failures

http connections opened
Number of open HTTP connections.

Spikes may suggest client-side load surges or connection pooling issues.
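What counts as a "spike" depends on your normal traffic, so a fixed threshold rarely fits. A minimal Python sketch of a relative check, assuming a short history of recent open-connection counts; the function name and the 2x factor are hypothetical starting points, not recommended values:

```python
from statistics import mean


def is_connection_spike(history: list, current: int,
                        factor: float = 2.0) -> bool:
    """Flag the current count if it exceeds factor times the recent average."""
    if not history:
        return False  # no baseline yet, so nothing to compare against
    baseline = mean(history)
    return baseline > 0 and current > factor * baseline


# Baseline is 100 connections, so 250 exceeds the 2x threshold.
print(is_connection_spike([100, 110, 90], 250))
```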

✅ Operational Takeaways

Cluster Health + unassigned shards → the first and most critical stability check
Disk usage (Available) + JVM Heap → best indicators of capacity risks
Query Latency + Response Time → primary bottleneck detectors
Flush Latency + HTTP Connections → highlight data processing delays and client load pressure

🛠 Last updated: 2025.12.22

ⓒ 2025 엉뚱한 녀석의 블로그 [quirky guy's Blog]. All rights reserved. Unauthorized copying or redistribution of the text and images is prohibited. When sharing, please include the original source link.

💡 Need Professional Support?
If you need deployment, optimization, or troubleshooting support for Zabbix, Kubernetes, or any other open-source infrastructure in your production environment, or if you are interested in sponsorships, ads, or technical collaboration, feel free to contact me anytime.

📧 Email: jikimy75@gmail.com
💼 Services: Deployment Support | Performance Tuning | Incident Analysis Consulting

📖 PDF eBook (Gumroad): Zabbix Enterprise Optimization Handbook
A single, production-ready PDF that compiles my in-depth Zabbix and Kubernetes monitoring guides.
