# KiteStacks Disaster Recovery Runbook

## Purpose

This document describes how to restore the entire KiteStacks platform if a host fails. As of 2026-06-10, KiteStacks runs active-active across two hosts plus Cloudflare Tunnel, so no single host is a hard dependency for keeping the site up.

## Current Infrastructure

Primary Production:
- Host: monk
- LAN IP:

Cloud Failover (PERMANENT, active-active, NOT a cold standby):
- Host: kscloud1 (Hetzner VPS)
- Public IP:
- Tailscale IP:
- Runs a full replica of all 9 services

T14s:
- Active cluster node (GitOps)

Domains:
- www.kitestacks.com (+ ai, auth, gitforge, grafana, kavita, links, status, tasks)
- www-backup.kitestacks.com / git-backup.kitestacks.com (kscloud1 direct A-records via local Caddy on port , separate from the Tunnel)

Cloudflare Tunnel:
- 3 connectors load-balance ACTIVE-ACTIVE across all 9 `*.kitestacks.com` hostnames, with no primary/backup priority.

## Recovery Priority

1. Forgejo
2. Website
3. Authentik
4. Monitoring
5. AI Services
6. Knowledge Services

## Current Backup Status

Website:
- Full replica running on kscloud1, served live via the Tunnel.

Forgejo:
- Full replica running on kscloud1, but with a SEPARATE database: repos and commits pushed to monk's Forgejo do NOT appear on kscloud1's Forgejo, and vice versa. This is an accepted tradeoff for uptime.
- The portal's Recent Activity widget on BOTH hosts queries monk's Forgejo directly (FORGEJO_API_BASE -> http://: over Tailscale from kscloud1, http://localhost: on monk), so it stays consistent regardless of which connector serves the page.

Authentik:
- Shared Postgres+Redis hosted on kscloud1, reachable only over Tailscale ().
- The authentik server and worker on both monk and kscloud1 use this single database and cache. This fixes the invalid_grant SSO errors caused by active-active routing splitting one OAuth flow across connectors.
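
Authentik on both hosts depends on the Postgres and Redis instances on kscloud1. A quick reachability check run from each host can therefore confirm that the Tailscale path is up before you start debugging SSO. The following is a minimal Python sketch and is not part of the existing tooling. It only tests whether a TCP connection opens; it does not check credentials or database health. This runbook leaves the Tailscale IP blank, so the script takes it as a parameter. The ports are ASSUMED to be the Postgres and Redis defaults (5432 and 6379). `tcp_reachable` and `check_shared_backends` are hypothetical helper names.

```python
import socket
from typing import Dict

# ASSUMPTION: default ports; this runbook does not record the real ones.
POSTGRES_PORT = 5432
REDIS_PORT = 6379


def tcp_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port opens within timeout seconds."""
    try:
        # create_connection resolves the host and tries each address in turn.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable networks.
        return False


def check_shared_backends(tailscale_ip: str) -> Dict[str, bool]:
    """Check the shared Authentik Postgres and Redis on kscloud1's Tailscale IP."""
    return {
        "postgres": tcp_reachable(tailscale_ip, POSTGRES_PORT),
        "redis": tcp_reachable(tailscale_ip, REDIS_PORT),
    }
```

Run it on both monk and kscloud1, passing kscloud1's Tailscale IP. A `False` on monk alongside a `True` on kscloud1 points at the Tailscale link rather than at the databases themselves.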
Other stateful apps (kavita, karakeep, openproject, etc.):
- Fresh, separate data on kscloud1, so a page may show different or stale data depending on which connector serves the request. Accepted as the cost of guaranteed uptime.

## Validation Checklist

- [ ] Website accessible (www.kitestacks.com)
- [ ] kscloud1 replica accessible (www-backup.kitestacks.com)
- [ ] Forgejo operational (gitforge.kitestacks.com)
- [ ] kscloud1 Forgejo replica operational (git-backup.kitestacks.com)
- [ ] Authentik SSO works (auth.kitestacks.com)
- [ ] Cloudflare DNS verified
- [ ] Cloudflare Tunnel: all 3 connectors healthy
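
A short script can spot-check the HTTP items on this checklist. The following is a minimal Python sketch that uses only the standard library. `http_status`, `check_all`, and `CHECK_URLS` are hypothetical names. Treating any final 2xx or 3xx status as healthy is an assumption: `urlopen` follows redirects automatically, so an SSO login page still counts as "up". The sketch does not cover the Cloudflare DNS or connector-health items, which still need a manual check in Cloudflare.

```python
import urllib.error
import urllib.request
from typing import Callable, Dict, Iterable

# Hostnames from the validation checklist; edit as the platform changes.
CHECK_URLS = [
    "https://www.kitestacks.com",
    "https://www-backup.kitestacks.com",
    "https://gitforge.kitestacks.com",
    "https://git-backup.kitestacks.com",
    "https://auth.kitestacks.com",
]


def http_status(url: str, timeout: float = 10.0) -> int:
    """Return the final HTTP status for a GET on url, or 0 if no response arrived."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # The server answered, but with a 4xx/5xx status.
        return err.code
    except (urllib.error.URLError, OSError):
        # DNS failure, refused connection, TLS error, or timeout.
        return 0


def check_all(urls: Iterable[str],
              fetch: Callable[[str], int] = http_status) -> Dict[str, bool]:
    """Map each URL to True if it answered with a 2xx or 3xx status."""
    return {url: 200 <= fetch(url) < 400 for url in urls}
```

Call `check_all(CHECK_URLS)` from a machine with internet access. The Tunnel load-balances across connectors, so a passing check on a Tunnel hostname does not show which host served it. The www-backup and git-backup hostnames are what test kscloud1 directly. Making `fetch` a parameter lets you test the pass/fail logic without any network access.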