Abstract
Disaster Recovery(DR) is critical for business continuity in the face of widespread outages taking down entire data centers or cloud provider regions. DR relies on deployment to multiple locations, data replication, monitoring for failure and failover. The process is typically manual involving several moving parts, and, even in the best case, involves some downtime for end-users. A multi-cluster K8s control plane presents the opportunity to automate the DR setup as well as the failure detection and failover. Such automation can dramatically reduce RTO and improve availability for end-users. This talk (and demo) describes one such setup using the open source Percona Operator for PostgreSQL and a multi-cluster K8s orchestrator. The orchestrator will use policy driven placement to replicate the entire workload on multiple clusters (in different regions), detect failure using pluggable logic, and do failover processing by promoting the standby as well as redirecting application traffic





