Pacemaker vs Galera
Following up on my earlier post about Pacemaker, I wanted to complement that discussion with a more practical comparison against Galera.
If you’re worried about data consistency issues during failover, this guide should help you decide which approach suits your environment.
Summary
- Two nodes, internal L2 network, ICMP restricted, cost-sensitive, brief downtime acceptable → Pacemaker + Corosync (active/standby) + semi-sync (AFTER_SYNC) is a pragmatic choice. Using ethmonitor at L2 avoids unnecessary failovers.
- Three or more nodes, must continue writes during failures, near-zero RPO, read scalability required → Galera (multi-primary, but single-writer recommended in practice) + proxy layer (HAProxy/ProxySQL) is the standard solution.
1) Problem Definition
The “best” HA approach depends on:
- Read/write ratio and DDL frequency
- RPO/RTO targets
- Network and security restrictions (ICMP/firewall)
- Number of nodes, budget, and operational maturity
Read-heavy + brief downtime acceptable + only two nodes → A simple, robust setup works best.
Write-heavy + minimal downtime + zero data loss + three nodes → A consensus-based cluster is more suitable.
2) Pacemaker + Corosync (Dual-Master Replication, Active/Standby)
Architecture
- Even with dual-master replication, writes only go to the node holding the VIP (the other node runs with super_read_only=ON).
- Failover is handled by moving the VIP; replication should use GTID + ROW format.
Failure Detection (Without ICMP)
- In internal L2 segments where applications and DB nodes sit on the same switch, L3 ping checks can cause false positives (e.g., a gateway issue triggers failover while the service itself is fine).
- With ethmonitor, you only check L1/L2 link state, so failover occurs only when the actual interface goes down.
- Optional extras: bond ARP monitoring, route existence checks, or lightweight TCP health probes, but keep L2 as the baseline.
Consistency & Minimizing Data Loss
- Semi-sync (AFTER_SYNC) ensures that a commit is only acknowledged after at least one standby has received and logged the event.
- This reduces the risk of “vanished commits” during failover.
- Note: semi-sync falls back to async after its timeout expires → monitoring must catch this immediately (a minimal check is sketched below).
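A minimal sketch of such a check, assuming a local mysql client and the semi-sync plugin's Rpl_semi_sync_master_status status variable (ON while semi-sync is active, OFF once it has fallen back to async); wire the exit code into whatever alerting you already run:

#!/bin/bash
# check_semisync.sh - hypothetical probe: alert when semi-sync has degraded to async
STATUS=$(mysql -N -B -e "SHOW GLOBAL STATUS LIKE 'Rpl_semi_sync_master_status'" | awk '{print $2}')
if [ "$STATUS" != "ON" ]; then
    echo "CRITICAL: semi-sync has fallen back to asynchronous replication"
    exit 2
fi
echo "OK: semi-sync active"
exit 0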
Recommended Configuration
MySQL (my.cnf):
binlog_format=ROW
gtid_mode=ON
enforce_gtid_consistency=ON
sync_binlog=1
innodb_flush_log_at_trx_commit=1
rpl_semi_sync_master_enabled=ON
rpl_semi_sync_slave_enabled=ON
rpl_semi_sync_master_wait_for_slave_count=1
rpl_semi_sync_master_wait_point=AFTER_SYNC
rpl_semi_sync_master_timeout=3000 # ms, tune as needed
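Note that the rpl_semi_sync_* variables above only take effect once the semi-sync plugins are loaded. A quick sketch of loading and verifying them on each node (plugin names and SONAME values are the ones shipped with MySQL/Percona Server):

# Load the semi-sync plugins once per node (persists across restarts)
mysql -e "INSTALL PLUGIN rpl_semi_sync_master SONAME 'semisync_master.so';
          INSTALL PLUGIN rpl_semi_sync_slave  SONAME 'semisync_slave.so';"
# Verify both plugins are ACTIVE
mysql -e "SELECT PLUGIN_NAME, PLUGIN_STATUS FROM information_schema.PLUGINS
          WHERE PLUGIN_NAME LIKE 'rpl_semi_sync%';"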
Pacemaker (example flow):
# L2 monitoring
pcs resource create net_l2 ocf:heartbeat:ethmonitor interface=eno8 \
  op monitor interval=2s timeout=5s on-fail=standby clone
# MySQL resource and VIP
pcs resource create mysql ocf:percona:mysql op monitor interval=5s timeout=20s promotable
pcs resource create mysql_vip ocf:heartbeat:IPaddr2 ip=10.0.0.50 cidr_netmask=24
# Order: promote first, then start the VIP; keep the VIP on the promoted (master) node only
pcs constraint order promote mysql-clone then start mysql_vip
pcs constraint colocation add mysql_vip with master mysql-clone INFINITY
# Prevent flapping
pcs resource defaults resource-stickiness=200
pcs resource defaults migration-threshold=3
pcs resource defaults failure-timeout=60s
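After a failover drill it is worth confirming where the writer actually landed. A quick sanity check, assuming the resource names and VIP from the example above and a monitoring account of your own:

# Which node currently runs the writer VIP?
crm_resource --resource mysql_vip --locate
# Overall resource state (promoted role, failures, clones)
pcs status resources
# The node behind the VIP must be writable: super_read_only should report 0
mysql -h 10.0.0.50 -u <monitor_user> -p -e "SELECT @@hostname, @@super_read_only;"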
Pros
- Works with just two nodes
- Strong fit for internal networks with ICMP restrictions
- Simple to manage, cost-efficient
Cons
- Risk of data loss if semi-sync falls back to async
- Brief downtime inevitable during failover
- No built-in read scale-out
3) Galera Cluster (Percona XtraDB Cluster / MariaDB Galera)
Architecture
- Technically multi-primary (all nodes can write), but in practice: single writer + multiple readers to avoid conflicts.
- Never build a master→slave chain inside the cluster. External slaves (for DR/reporting) can replicate off a single node.
Consistency & Failover
- Every commit is broadcast as a WriteSet, certified cluster-wide, and only applied if in the Primary Component (quorum).
- If the cluster partitions, the minority side becomes Non-Primary and rejects writes → split-brain prevented (the quorum view can be inspected as shown below).
- RPO is effectively zero (virtually synchronous).
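A quick way to inspect that quorum view from any node, using only the standard wsrep status variables:

# wsrep_cluster_status must be 'Primary'; wsrep_cluster_size shows current members;
# wsrep_local_state_comment should read 'Synced' on a healthy node
mysql -e "SHOW GLOBAL STATUS WHERE Variable_name IN
  ('wsrep_cluster_status','wsrep_cluster_size','wsrep_local_state_comment');"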
Routing (Proxies / Load Balancers)
- ProxySQL: built-in Galera awareness (writer/reader hostgroups, health checks).
- HAProxy: uses the clustercheck script (HTTP 200/503) to filter healthy backends.
- L4-only options (a role-health probe is sketched after this list):
  - (A) Assign a Writer VIP only to the writer node → L4 forwards to the VIP.
  - (B) Expose role health on a dedicated TCP port (e.g., 9200) → L4 health-checks it.
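A minimal sketch of such a role-health probe, modeled on the standard clustercheck script; the clustercheck user and password are assumptions you would replace, and the script returns HTTP 200 only when the node is Synced, in the Primary Component, and writable. Serve it on TCP 9200 via xinetd or a systemd socket unit so HAProxy or an L4 device can health-check it; for a reader-pool probe, drop the read_only condition:

#!/bin/bash
# galera_role_check.sh - hypothetical writer-role probe (clustercheck-style 200/503)
MYSQL="mysql -u clustercheck -pclustercheckpass -N -B -e"   # assumed monitoring account

STATE=$($MYSQL "SHOW STATUS LIKE 'wsrep_local_state'" | awk '{print $2}')        # 4 = Synced
CLUSTER=$($MYSQL "SHOW STATUS LIKE 'wsrep_cluster_status'" | awk '{print $2}')   # Primary = quorum
READONLY=$($MYSQL "SELECT @@global.read_only")                                   # 0 = writable

if [ "$STATE" = "4" ] && [ "$CLUSTER" = "Primary" ] && [ "$READONLY" = "0" ]; then
    printf 'HTTP/1.1 200 OK\r\nContent-Type: text/plain\r\nContent-Length: 9\r\n\r\nwriter OK'
else
    printf 'HTTP/1.1 503 Service Unavailable\r\nContent-Type: text/plain\r\nContent-Length: 9\r\n\r\nnot ready'
fi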
HAProxy Example:
backend be_mysql_writer
    mode tcp
    # Health check via clustercheck (HTTP 200/503) on TCP 9200, not a plain MySQL login check
    option httpchk
    server n1 10.0.0.11:3306 check port 9200
    server n2 10.0.0.12:3306 check port 9200 backup
    server n3 10.0.0.13:3306 check port 9200 backup

backend be_mysql_readers
    mode tcp
    balance roundrobin
    option httpchk
    server n2 10.0.0.12:3306 check port 9200
    server n3 10.0.0.13:3306 check port 9200
Recommended Configuration
binlog_format=ROW
default_storage_engine=InnoDB
innodb_flush_log_at_trx_commit=1
sync_binlog=1
wsrep_on=ON
wsrep_provider=/usr/lib/galera/libgalera_smm.so
wsrep_cluster_name=galera-prod
wsrep_cluster_address="gcomm://10.0.0.11,10.0.0.12,10.0.0.13"
wsrep_node_address=<this_node_ip>
wsrep_sst_method=xtrabackup-v2
wsrep_sst_auth=sstuser:sstpass
wsrep_slave_threads=<~CPU cores>
wsrep_provider_options="gcache.size=1G"
- All tables must have a primary key
- InnoDB only (avoid MyISAM)
- Use wsrep_sync_wait if read-after-write consistency is required (a short example follows this list)
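A minimal sketch of a causal read against one of the example reader nodes; setting wsrep_sync_wait=1 makes the node apply every write-set it has already received before answering the SELECT (the application account and the orders table are purely illustrative):

# Read-after-write consistency from a reader node (10.0.0.12 from the examples above)
mysql -h 10.0.0.12 -u app -p"$APP_PASSWORD" -e "
  SET SESSION wsrep_sync_wait = 1;  -- enforce causality check before reads
  SELECT order_id, status FROM orders WHERE order_id = 42;"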
Pros
- Near-zero RPO, strong data consistency
- Built-in quorum prevents split-brain
- Easy read scaling
Cons
- Requires at least three nodes (or two data nodes plus a garbd arbitrator)
- More complex (proxy layer, SST/IST, gcache tuning)
- Sensitive to latency and heavy writes/DDL
4) Network & Security Comparison
| Aspect | Pacemaker (Active/Standby) | Galera (3 nodes) |
|---|---|---|
| ICMP-restricted env. | Strong fit (L2 monitoring) | Also workable: proxy health endpoints over TCP/HTTP, no ICMP needed |
| Same-switch L2 apps | Easy to avoid false failover | Added complexity with proxy/routing |
| L4-only setup | Very simple (VIP approach) | Possible with Writer-VIP or TCP health port |
5) Performance, Latency & Ops
| Aspect | Pacemaker | Galera |
|---|---|---|
| Commit latency | Low (LAN + semi-sync) | Higher (certification/consensus) |
| Read scaling | Limited (external replicas) | Strong (multi-reader nodes) |
| DDL / heavy txn | Relatively unconstrained | Careful planning needed (SST/IST impact) |
| Operational cost | Simple (2 nodes) | More complex (3+ nodes + proxy) |
6) Decision Matrix
Choose Pacemaker if:
- You must stick to two nodes
- ICMP restricted / internal L2 network
- Brief downtime is acceptable, mostly read traffic
- Ops team is comfortable with VIP + semi-sync
Choose Galera if:
- You can run three or more nodes
- Writes must continue during failures (RPO≈0)
- You need read scaling
- Ops team can handle ProxySQL/HAProxy and cluster tuning
7) Operational Checklist
Pacemaker
- Semi-sync AFTER_SYNC + monitor async fallback
- Failover order: promote → relay log fully applied → read_only=0 → move the VIP
- Resource stickiness & migration threshold to avoid flapping
- Manage backup/restore/DDL with lifecycle controls
Galera
- Proxy health must check: wsrep_local_state=4, wsrep_cluster_status='Primary', and read_only=OFF on the writer (see the role-health probe sketched in section 3)
- Tune gcache, make sure IST succeeds on rejoin, and plan for SST load
- Use online schema tools (gh-ost, pt-osc)
- Never run only two nodes (use garbd if needed)
8) L4-Only Patterns
- Pacemaker: Single Writer-VIP → L4 forwards only to that VIP
- Galera:
- (A) Writer-VIP + reader IPs, or
- (B) Role health TCP ports (e.g., 9200) for L4 health checks
Conclusion
- If you run in an internal L2 environment with two nodes, are cost-sensitive, and can tolerate brief downtime → Pacemaker + Corosync (active/standby) + semi-sync (AFTER_SYNC) + ethmonitor (L2) is practical and proven.
- If you have three or more nodes and need near-zero RPO with continuous writes → Galera + proxy layer (HAProxy/ProxySQL) is the standard.
- In reality, mixed strategies often work best: Galera for critical transactional DBs, Pacemaker for cache-like or read-heavy DBs.
ⓒ 2025 엉뚱한 녀석의 블로그 [quirky guy's Blog]. All rights reserved. Unauthorized copying or redistribution of the text and images is prohibited. When sharing, please include the original source link.
🛠 Last modified: 2025.09.30