Zabbix Database Performance Optimization

(Practical MySQL Tuning Guide for Real-World Operations)

zabbix database performance optimization :

When running Zabbix at any real scale, the first bottleneck you’ll hit is the database.
Long before CPU or memory become an issue, InnoDB I/O and Zabbix’s internal caching layers start choking the system.
Especially in environments collecting hundreds or thousands of metrics every second, database tuning directly determines overall stability.

This post summarizes the most effective database optimization techniques tested in production on a single Zabbix Server + MySQL setup.

For more information on this topic, especially regarding partitioned tables, please refer to the URL below, which provides a detailed explanation along with the full installation guide.


1. Zabbix Database Structure Overview

Zabbix data is divided into two main table groups:

TypeTablesCharacteristics
Historyhistory, history_uint, history_str, history_logStores raw metrics per second, heavy random I/O load
Trendstrends, trends_uintAggregated hourly averages/max/min values, lighter I/O

Almost all database load originates from the history tables.
The key is how efficiently you manage and periodically clean up these tables.


2. Separating History and Trends Storage

Even on a single server, splitting disks logically makes a big difference.
Because Zabbix workloads involve constant random writes, SSDs are strongly recommended for history data.

This moves random write I/O from history tables to SSDs,
while trend read/write operations stay on another disk — reducing contention dramatically.


3. Partitioned Tables

When the housekeeper deletes millions of rows, the database comes under such heavy load that it’s effectively stalled. It’s worth considering a partitioned table design.


4. Housekeeper Tuning

The Housekeeper is responsible for deleting old data.
In large setups, it often causes more harm than good.

It is recommended to operate using partitioned tables, and as of Zabbix 7.4, this configuration must be adjusted via the web UI.

Go to: Administration → Housekeeping

Set as follows:

Trends → ☐ (disable)

Events and alerts → ☐ (disable)

Services → ☐ (disable)

User sessions → ☑ (keep)

History → ☐ (disable)


5. Cache and Process Tuning

Zabbix Server doesn’t write directly to the database — it writes to caches first.
If those caches are too small, data immediately flushes to MySQL, causing heavy I/O spikes.

Check these parameters in /etc/zabbix/zabbix_server.conf:

OptionDescriptionRecommended
CacheSizeConfiguration cache (hosts, templates, items)256M–512M
HistoryCacheSizeReal-time metric cache512M–1G
TrendCacheSizeCache for trend calculations≥256M
ValueCacheSizeRecent value cache (for trigger evaluation)≥512M
StartDBSyncersParallel DB write processes8–16
StartPollersPoller processesabout (number of hosts / 50)

These settings directly reduce DB pressure.


6. Core MySQL InnoDB Tuning Parameters

Zabbix relies heavily on InnoDB.
Default MySQL settings will not survive real-world workloads.
Below are the must-tune parameters for production.

[mysqld]
innodb_buffer_pool_size = 4G
innodb_log_file_size = 1G
innodb_flush_log_at_trx_commit = 2
innodb_doublewrite = 1
innodb_flush_method = O_DIRECT
innodb_io_capacity = 2000
innodb_read_io_threads = 8
innodb_write_io_threads = 8
innodb_buffer_pool_instances = 4

🔹 innodb_buffer_pool_size

Caches both data and indexes — the single most critical variable.
Allocate 60–70% of total system memory.
Too small, and you’ll see a massive rise in disk I/O.


🔹 innodb_flush_log_at_trx_commit

Controls how often InnoDB flushes transaction logs to disk.

ValueBehaviorPerformanceDurability
0Flush every secondFastLow
1Flush on every commitSlowSafest
2Flush to OS cache on commitBalancedSafe enough

For Zabbix — with hundreds of transactions per second — 2 is optimal.
At worst, you may lose one second of data during a power failure,
which is operationally negligible.


🔹 innodb_doublewrite

There is an increased risk of page corruption during power failure or kernel panic.
Even in an NVMe + O_DIRECT environment, it is safer to keep the value at 1 for production databases.
Keep it 1 unless you are hitting a performance bottleneck and are willing to accept the associated risk.


🔹 innodb_flush_method

Bypassing filesystem caching with O_DIRECT eliminates double buffering and reduces latency.


🔹 innodb_io_capacity

Defines the maximum IOPS the InnoDB background threads will try to use.
On SSDs, set it between 2000–4000.
The default (200) is far too conservative and delays page flushing.


7. RAID Chunk Size / I/O Optimization

Zabbix databases perform mostly small, random writes.
With RAID, smaller chunk sizes (16–32 KB) match InnoDB’s 16 KB page size best.

PurposeRecommended Chunk SizeReason
History (random write)16–32 KBReduces head movement
Trends (sequential write)≥64 KBImproves sequential throughput

In RAID10, stripe width = chunk size × (n – 1).
Align stripe size carefully with InnoDB page boundaries to avoid wasted I/O.


8. Monitoring for Bottlenecks

Always confirm where the bottleneck lies before tuning.
Regularly check these metrics:

MetricTargetMeaning
zabbix[queue]< 100Polling queue healthy
zabbix[wcache,history,pfree]> 25%Cache has free space
MySQL Threads_running< 50DB not overloaded
iostat %util< 70%Disk utilization healthy
innodb_row_lock_time_avg< 10 msMinimal lock contention

If all are within range, your tuning is solid.


9. Example: Optimal Single-Server Setup

Zabbix Server
 ├─ MySQL (history → SSD, trends → HDD)
 ├─ InnoDB buffer pool: 4 GB
 ├─ Disable Housekeepers
 ├─ innodb_flush_log_at_trx_commit=2
 ├─ CacheSize=512 MB
 ├─ HistoryCacheSize=1 GB
 └─ TrendCacheSize=256 MB
  • Store history tables on SSD
  • Keep trends on HDD for long-term data
  • Disable the housekeeper
  • Run daily cron jobs to drop old partitions
  • Use InnoDB flush mode 2 (flush to OS cache)

With this setup, even environments ingesting 10 K–20 K metrics per second show almost no queue delay.


10. Conclusion

The core principle of Zabbix DB performance is simple:
Reduce InnoDB I/O and rely on caching as much as possible.

  1. Separate history/trends storage
  2. Use partition-based cleanup
  3. Disable Housekeepers
  4. Use innodb_flush_log_at_trx_commit=2
  5. Maximize innodb_buffer_pool_size
  6. Expand CacheSize and ValueCacheSize

Get these six right, and your Zabbix server will stay stable even under several times its normal load.
Understand why the database fails first, and you’ll keep your monitoring platform alive much longer

ⓒ 2025 엉뚱한 녀석의 블로그 [quirky guy's Blog]. 본문 및 이미지를 무단 복제·배포할 수 없습니다. 공유 시 반드시 원문 링크를 명시해 주세요.
ⓒ 2025 엉뚱한 녀석의 블로그 [quirky guy's Blog]. All rights reserved. Unauthorized copying or redistribution of the text and images is prohibited. When sharing, please include the original source link.

🛠 마지막 수정일: 2025.10.28