An Examination of Monitoring Metrics: Part 3, Redis

1. Memory Metrics

used_memory

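  • Total number of bytes Redis has allocated through its memory allocator. The memory the operating system sees for the process is reported separately as used_memory_rss.
  • A sharp rise toward the host's physical memory may indicate an OOM (Out Of Memory) risk.
  • Always verify the maxmemory setting and the eviction policy (maxmemory-policy).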
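
As a rough illustration of the check above, the following sketch parses the text returned by the INFO command and compares used_memory with maxmemory. The helper names parse_info and memory_usage_ratio and the sample values are made up for this example; how you fetch the INFO text (redis-cli, a client library, Zabbix) is left out.

```python
from typing import Dict, Optional


def parse_info(text: str) -> Dict[str, str]:
    """Parse the text returned by the Redis INFO command into a dict."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and section headers such as "# Memory".
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition(":")
        fields[key] = value
    return fields


def memory_usage_ratio(info: Dict[str, str]) -> Optional[float]:
    """Return used_memory / maxmemory, or None when maxmemory is 0 (no limit)."""
    used = int(info["used_memory"])
    limit = int(info["maxmemory"])
    if limit == 0:
        return None
    return used / limit


# Hypothetical sample: 768 MiB used out of a 1 GiB limit.
sample = (
    "# Memory\r\n"
    "used_memory:805306368\r\n"
    "maxmemory:1073741824\r\n"
    "maxmemory_policy:allkeys-lru\r\n"
)
print(memory_usage_ratio(parse_info(sample)))  # 0.75
```

A maxmemory of 0 means no limit is configured, so in that case compare used_memory against the host's physical memory instead.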

mem_fragmentation_ratio

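  • Memory fragmentation ratio (used_memory_rss divided by used_memory).
  • Close to 1.0 is normal. A value below 1.0 means part of the Redis memory has been swapped out by the operating system.
  • If ≥ 1.5, fragmentation is severe → consider enabling active defragmentation (activedefrag yes, available since Redis 4.0 with jemalloc) or scheduling a restart. An RDB save or AOF rewrite by itself does not release fragmented memory.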
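
The thresholds above can be sketched as a small classifier. Redis derives mem_fragmentation_ratio from used_memory_rss / used_memory; the function names and the labels returned here are assumptions for illustration, and the 1.5 cutoff is the one used in this article, not a Redis constant.

```python
def fragmentation_ratio(used_memory_rss: int, used_memory: int) -> float:
    """Recompute the ratio from the two INFO fields it is derived from."""
    return used_memory_rss / used_memory


def classify_fragmentation(ratio: float, high: float = 1.5, low: float = 1.0) -> str:
    """Map a ratio to a coarse label (labels are hypothetical)."""
    if ratio >= high:
        return "fragmented"      # RSS far above what Redis allocated
    if ratio < low:
        return "possible-swap"   # RSS below allocated memory suggests swapping
    return "normal"


ratio = fragmentation_ratio(used_memory_rss=1_500_000_000, used_memory=1_000_000_000)
print(ratio, classify_fragmentation(ratio))  # 1.5 fragmented
```

On very small datasets the ratio can look high simply because fixed allocator overhead dominates, so it is worth checking the absolute difference between the two values as well.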

evicted_keys

  • Number of keys forcibly removed when maxmemory is exceeded.
  • An increase implies rising cache miss probability.
  • Review eviction policies (noeviction, allkeys-lru, etc.).
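
evicted_keys is a cumulative counter, so what matters for alerting is how fast it grows between two samples. A minimal sketch, assuming you collect the counter at a fixed interval; counter_rate is a hypothetical helper, and treating a decrease as a counter reset (restart or CONFIG RESETSTAT) is a design choice of this example.

```python
def counter_rate(previous: int, current: int, interval_seconds: float) -> float:
    """Per-second growth of a cumulative counter between two samples."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    if current < previous:
        # The counter went backwards, so it was reset in between.
        # Count only what accumulated since the reset.
        return current / interval_seconds
    return (current - previous) / interval_seconds


# 60 keys evicted during a 60-second interval: one eviction per second.
print(counter_rate(previous=100, current=160, interval_seconds=60))  # 1.0
```

The same helper works for other cumulative counters in this article, such as rejected_connections.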

2. Performance Metrics

instantaneous_ops_per_sec

  • Number of commands processed per second (QPS).
  • Useful for identifying traffic spikes.
  • Watch for sudden increases or drops against baseline.
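
One simple way to compare against a baseline is to take the mean of recent samples and flag values that deviate by more than a chosen fraction. This is only a sketch: deviates_from_baseline and the 50% tolerance are assumptions, and a production setup would more likely use the trigger functions of the monitoring system itself.

```python
from statistics import mean
from typing import Sequence


def deviates_from_baseline(
    history: Sequence[float], current: float, tolerance: float = 0.5
) -> bool:
    """True when current differs from the mean of history by more than tolerance."""
    if not history:
        raise ValueError("history must contain at least one sample")
    baseline = mean(history)
    if baseline == 0:
        return current != 0
    return abs(current - baseline) / baseline > tolerance


recent = [1000, 1100, 900]  # hypothetical ops/sec samples, mean 1000
print(deviates_from_baseline(recent, 2000))  # True (spike)
print(deviates_from_baseline(recent, 1200))  # False
print(deviates_from_baseline(recent, 400))   # True (drop)
```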

slowlog

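  • Number of slow query entries recorded in Redis per second. The slow log is read with SLOWLOG GET / SLOWLOG LEN rather than INFO; a command is logged when its execution time exceeds slowlog-log-slower-than (default 10000 microseconds).
  • A value near 0 is normal.
  • Persistent growth indicates that applications are issuing commands that occupy the server for a long time, such as KEYS or operations on very large collections.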

3. Connection Metrics

blocked_clients

  • Number of clients waiting due to blocking commands (BRPOP, BLPOP, etc.).
  • A surge may indicate queue processing bottlenecks.

connected_clients

  • Current number of connected clients.
  • Compare with application connection pool configuration.
  • Approaching maxclients risks new connection failures.
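
To compare pool configuration with maxclients, it helps to estimate the worst case: every application instance opening its full pool. The sketch below does that arithmetic. The function names and the 20% reserve are assumptions for illustration; 10000 is the default maxclients value in Redis.

```python
def expected_peak_connections(app_instances: int, pool_size: int) -> int:
    """Worst case: every instance opens its full connection pool."""
    return app_instances * pool_size


def fits_within_maxclients(
    expected: int, maxclients: int, reserve_ratio: float = 0.2
) -> bool:
    """True when expected connections leave reserve_ratio of maxclients free."""
    usable = maxclients * (1 - reserve_ratio)
    return expected <= usable


peak = expected_peak_connections(app_instances=20, pool_size=100)
print(peak, fits_within_maxclients(peak, maxclients=10000))  # 2000 True
```

The reserve leaves room for connections that do not come from the application pools, such as replicas, monitoring agents, and ad hoc redis-cli sessions.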

rejected_connections

  • Number of connections rejected because the maxclients limit was reached.
  • Frequent increases suggest reducing client pool sizes, fixing connection leaks, or raising maxclients.

4. Network Metrics

total_net_input_bytes / total_net_output_bytes

  • Cumulative inbound/outbound data volume.
  • Useful for identifying network bandwidth trends.
  • Spikes at specific times may indicate large value SET/GET operations.

5. Persistence Metrics

(Persistence: keeping data safe by writing it from memory to disk)

rdb_last_bgsave_status

  • Status of the last RDB snapshot (ok on success, err on failure).
  • Failures may be due to insufficient disk space, permission errors, or a failed fork when memory is short.

aof_last_bgrewrite_status

  • Status of the last AOF rewrite (ok on success, err on failure).
  • Monitor disk usage when AOF file size grows rapidly.

rdb_changes_since_last_save

  • Number of changes made to the dataset since the last RDB save.
  • If it keeps growing well past the configured save interval, more data is at risk of being lost if the server fails.
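
A check along these lines can combine rdb_changes_since_last_save with rdb_last_save_time (the Unix timestamp of the last successful save, also reported by INFO). The function name and both thresholds below are assumptions chosen for the example; they should follow your own save settings.

```python
def unsaved_data_at_risk(
    changes_since_last_save: int,
    last_save_time: int,
    now: int,
    max_age_seconds: int,
    max_changes: int,
) -> bool:
    """True when the last save is old AND many changes have piled up since."""
    age = now - last_save_time
    return age > max_age_seconds and changes_since_last_save > max_changes


# Hypothetical values: last save 2 hours ago, 50,000 unsaved changes.
print(
    unsaved_data_at_risk(
        changes_since_last_save=50_000,
        last_save_time=1_700_000_000,
        now=1_700_007_200,
        max_age_seconds=3600,
        max_changes=10_000,
    )
)  # True
```

Requiring both conditions avoids alerts on an idle instance, where the last save is old simply because nothing has changed.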

6. Cache Efficiency Metrics

keyspace_hits / keyspace_misses

  • Cache Hit Ratio (%) = hits / (hits + misses) * 100
  • Should ideally remain ≥ 90%.
  • A lower ratio suggests a need to review cache policies or data structures.

⚠ Note: The default Redis template for Zabbix agent2 does not include a Cache Hit Ratio item.
You must create a calculated item based on existing values:

last(//redis.stats.keyspace_hits) /
( last(//redis.stats.keyspace_hits) + last(//redis.stats.keyspace_misses) ) * 100
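
The same calculation, written out as a sketch to show one edge case: right after a restart both counters are 0 and the ratio is undefined. The function name cache_hit_ratio is made up for this example, and returning None for the undefined case is a choice of this sketch, not something Zabbix does for you.

```python
from typing import Optional


def cache_hit_ratio(hits: int, misses: int) -> Optional[float]:
    """Hit ratio in percent, or None when there have been no lookups yet."""
    total = hits + misses
    if total == 0:
        return None
    return 100 * hits / total


print(cache_hit_ratio(hits=900, misses=100))  # 90.0
print(cache_hit_ratio(hits=0, misses=0))      # None
```

keyspace_hits and keyspace_misses are cumulative since startup, so a ratio built from the raw counters reflects the whole uptime. To see recent behavior, apply the same formula to the differences between two samples.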

✅ Operational Guidelines

  • Prioritize checking memory usage and fragmentation ratio.
  • Monitor connection counts and rejections to identify client connection issues.
  • Use ops/sec and slowlog together to detect performance degradation.
  • Watch cache hit ratio and evicted_keys growth as signals for reviewing cache policies.
ⓒ 2025 엉뚱한 녀석의 블로그 [quirky guy's Blog]. All rights reserved. Unauthorized copying or redistribution of the text and images is prohibited. When sharing, please include the original source link.

🛠 Last modified: 2025.09.18