1. Memory Metrics



used_memory
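- Total memory, in bytes, that Redis has allocated through its memory allocator. The resident size seen by the operating system is reported separately as used_memory_rss.
- A sharp rise relative to physical memory may indicate an OOM (Out Of Memory) risk.
- Always verify the maxmemory setting and the eviction policy.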
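As a rough illustration of the maxmemory check, the sketch below parses raw INFO output and reports used_memory as a percentage of maxmemory. The helper names parse_info and memory_usage_percent and the sample values are assumptions made for this example, not part of Redis or Zabbix.

```python
from typing import Optional


def parse_info(text: str) -> dict:
    """Parse the 'key:value' lines of Redis INFO output into a dict of strings."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and section headers such as '# Memory'.
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition(":")
        fields[key] = value
    return fields


def memory_usage_percent(info: dict) -> Optional[float]:
    """Return used_memory as a percentage of maxmemory, or None when no limit is set."""
    maxmemory = int(info.get("maxmemory", "0"))
    if maxmemory == 0:  # maxmemory 0 means no memory limit is configured
        return None
    return 100 * int(info["used_memory"]) / maxmemory


# Hypothetical sample: 768 MiB used out of a 1 GiB limit.
sample = """# Memory
used_memory:805306368
maxmemory:1073741824
maxmemory_policy:allkeys-lru
"""
print(memory_usage_percent(parse_info(sample)))  # 75.0
```

A None result is worth alerting on in its own right, since an instance without a limit can grow until the host runs out of memory.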
mem_fragmentation_ratio
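- Memory fragmentation ratio (used_memory_rss divided by used_memory).
- Close to 1.0 is normal.
- If ≥ 1.5, fragmentation is severe → consider a restart (the dataset reloads from RDB/AOF) or, on Redis 4.0+, active defragmentation (activedefrag).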
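The thresholds above could be applied with a small classifier like the following. This is only a sketch: the function name, the status labels, and the handling of ratios below 1.0 (often read as a sign that part of the dataset has been swapped out) are assumptions for illustration.

```python
def fragmentation_status(used_memory_rss: int, used_memory: int,
                         warn_at: float = 1.5) -> str:
    """Classify the ratio used_memory_rss / used_memory against a warning threshold."""
    if used_memory <= 0:
        return "unknown"  # nothing allocated yet, the ratio is meaningless
    ratio = used_memory_rss / used_memory
    if ratio >= warn_at:
        return "fragmented"  # severe fragmentation per the 1.5 guideline
    if ratio < 1.0:
        return "below-one"   # RSS smaller than allocated memory, possible swapping
    return "ok"


print(fragmentation_status(1_600_000, 1_000_000))  # fragmented
print(fragmentation_status(1_100_000, 1_000_000))  # ok
```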
evicted_keys
- Cumulative number of keys forcibly removed because the maxmemory limit was reached.
- An increase implies a rising cache miss probability.
- Review the eviction policy (noeviction, allkeys-lru, etc.).
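Because evicted_keys is a counter that only grows while the server is running, what matters for alerting is the increase between two samples rather than the absolute value. A minimal sketch, assuming a hypothetical helper counter_delta and assuming that a decreasing value means the counter was reset (for example by a restart):

```python
def counter_delta(previous: int, current: int) -> int:
    """Return the increase of a cumulative counter between two samples."""
    if current < previous:
        # The counter went backwards, so it was reset between samples.
        # Assumption: count everything since the reset as the increase.
        return current
    return current - previous


print(counter_delta(100, 130))  # 30 keys evicted since the previous sample
print(counter_delta(100, 5))    # 5, counter was reset in between
```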
2. Performance Metrics


instantaneous_ops_per_sec
- Number of commands processed per second (QPS).
- Useful for identifying traffic spikes.
- Watch for sudden increases or drops against baseline.
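One simple way to compare the current value against a baseline is to take the median of recent samples and flag readings that differ by more than a chosen factor. The function name, the factor of 2, and the sample history below are illustrative assumptions, not recommended values.

```python
from statistics import median


def deviates_from_baseline(history: list, current: float, factor: float = 2.0) -> bool:
    """Return True when current is more than `factor` times above or below the median."""
    baseline = median(history)
    if baseline == 0:
        return current > 0  # any traffic is a change from an idle baseline
    return current > baseline * factor or current < baseline / factor


recent_ops = [1000, 1100, 900, 1050, 950]        # hypothetical ops/sec samples
print(deviates_from_baseline(recent_ops, 2500))  # True, sudden spike
print(deviates_from_baseline(recent_ops, 1200))  # False, within normal range
print(deviates_from_baseline(recent_ops, 400))   # True, sudden drop
```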
slowlog
- Number of new slow log entries recorded per second, i.e. commands whose execution time exceeded slowlog-log-slower-than.
- A value near 0 is normal.
- Persistent growth indicates blocking commands or operations on large datasets in the application.
3. Connection Metrics

blocked_clients
- Number of clients waiting on blocking commands (BRPOP, BLPOP, etc.).
- A surge may indicate queue processing bottlenecks.
connected_clients
- Current number of connected clients.
- Compare with application connection pool configuration.
- Approaching maxclients risks new connection failures.
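The comparison with maxclients can be expressed as a usage percentage. In this sketch the maxclients value is assumed to have been read separately (for example with CONFIG GET maxclients), and the 80% warning threshold is an arbitrary choice for illustration.

```python
def connection_usage_percent(connected_clients: int, maxclients: int) -> float:
    """Return connected_clients as a percentage of the maxclients limit."""
    if maxclients <= 0:
        raise ValueError("maxclients must be positive")
    return 100 * connected_clients / maxclients


def near_client_limit(connected_clients: int, maxclients: int,
                      warn_percent: float = 80.0) -> bool:
    """Return True when connection usage reaches the warning threshold (assumed 80%)."""
    return connection_usage_percent(connected_clients, maxclients) >= warn_percent


print(connection_usage_percent(9000, 10000))  # 90.0
print(near_client_limit(9000, 10000))         # True
```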
rejected_connections
- Number of connections rejected because the maxclients limit was reached.
- Frequent increases suggest reviewing client connection pool settings or the maxclients value.
4. Network Metrics

total_net_input_bytes / total_net_output_bytes
- Cumulative inbound/outbound data volume.
- Useful for identifying network bandwidth trends.
- Spikes at specific times may indicate SET/GET operations on large values.
5. Persistence Metrics
(Persistence: writing in-memory data to disk so that it survives a restart or failure)

rdb_last_bgsave_status
- Status of the last RDB snapshot (success/failure).
- Failures may be due to insufficient disk space or permission errors.
aof_last_bgrewrite_status
- Status of the last AOF rewrite (success/failure).
- Monitor disk usage when AOF file size grows rapidly.
rdb_changes_since_last_save
- Number of changes made to the dataset since the last successful save.
- If it keeps growing well past the configured save interval, snapshots are not completing and more data would be lost in a failure.
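The three persistence fields can be checked together. The sketch below takes a dict of INFO fields (string values, as they appear in INFO output) and returns a list of findings. The function name, the messages, and the 10,000-change threshold are assumptions; a suitable threshold depends on the write rate and the save settings.

```python
def persistence_problems(info: dict, max_unsaved_changes: int = 10_000) -> list:
    """Return human-readable findings for the persistence fields of INFO."""
    problems = []
    # Redis reports these statuses as "ok" or "err"; a missing field is treated as ok.
    if info.get("rdb_last_bgsave_status", "ok") == "err":
        problems.append("last RDB snapshot failed")
    if info.get("aof_last_bgrewrite_status", "ok") == "err":
        problems.append("last AOF rewrite failed")
    unsaved = int(info.get("rdb_changes_since_last_save", "0"))
    if unsaved > max_unsaved_changes:
        problems.append(f"{unsaved} unsaved changes since last save")
    return problems


# Hypothetical sample values.
sample_info = {
    "rdb_last_bgsave_status": "err",
    "aof_last_bgrewrite_status": "ok",
    "rdb_changes_since_last_save": "25000",
}
print(persistence_problems(sample_info))
```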
6. Cache Efficiency Metrics

keyspace_hits / keyspace_misses
- Cache Hit Ratio = hits / (hits + misses) * 100
- Should ideally remain ≥ 90%.
- A lower ratio suggests a need to review cache policies or data structures.
⚠ Note: The default Redis template for Zabbix agent2 does not include a Cache Hit Ratio item.
You must create a calculated item based on existing values:
last(//redis.stats.keyspace_hits) /
( last(//redis.stats.keyspace_hits) + last(//redis.stats.keyspace_misses) ) * 100
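The same calculation outside Zabbix might look like the sketch below. It adds a guard for the case where both counters are zero (for example right after a restart), in which the ratio is undefined. The function name cache_hit_ratio is an assumption for this example.

```python
from typing import Optional


def cache_hit_ratio(hits: int, misses: int) -> Optional[float]:
    """Return hits / (hits + misses) * 100, or None when there have been no lookups."""
    total = hits + misses
    if total == 0:
        return None  # no lookups yet, so the ratio is undefined
    return 100 * hits / total


print(cache_hit_ratio(900, 100))  # 90.0
print(cache_hit_ratio(0, 0))      # None
```

Since keyspace_hits and keyspace_misses are cumulative, this gives the ratio since the counters were last reset; passing the differences between two samples instead gives the ratio for that interval only.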

✅ Operational Guidelines
- Prioritize checking memory usage and fragmentation ratio.
- Monitor connection counts and rejections to identify client connection issues.
- Use ops/sec and slowlog together to detect performance degradation.
- Watch cache hit ratio and evicted_keys growth as signals for reviewing cache policies.
ⓒ 2025 엉뚱한 녀석의 블로그 [quirky guy's Blog]. All rights reserved. Unauthorized copying or redistribution of the text and images is prohibited. When sharing, please include the original source link.
🛠 Last modified: 2025.09.18