docs: correct disaster recovery runbook (monk primary, kscloud1 active-active, Forgejo activity fix)
commit dbbb776808
parent 8a014d27bc
1 changed file with 45 additions and 20 deletions
@@ -2,23 +2,33 @@
 ## Purpose
 
-This document describes how to restore the entire KiteStacks platform if the primary server (Assassin) fails.
+This document describes how to restore the entire KiteStacks platform if a
+host fails. As of 2026-06-10, KiteStacks runs active-active across two hosts
+plus Cloudflare Tunnel, so no single host is a hard dependency for the site
+to stay up.
 
 ## Current Infrastructure
 
 Primary Production:
-- Host: Assassin
-- IP: 192.168.1.205
+- Host: monk
+- LAN IP: 192.168.1.205
 
-Cloud Backup:
-- Host: kscloud1
+Cloud Failover (PERMANENT, active-active - NOT cold standby):
+- Host: kscloud1 (Hetzner VPS)
+- Public IP: 5.78.233.28
+- Tailscale IP: 100.123.254.52
+- Runs a full replica of all 9 services
 
+assassin (T14): retired/OFF, no longer part of the topology.
 
 Domains:
-- www.kitestacks.com
-- gitforge.kitestacks.com
-- www-backup.kitestacks.com
-- git-backup.kitestacks.com
+- www.kitestacks.com (+ ai, auth, gitforge, grafana, kavita, links, status, tasks)
+- www-backup.kitestacks.com / git-backup.kitestacks.com (kscloud1 direct
+A-records via local Caddy on port 80, separate from the Tunnel)
+
+Cloudflare Tunnel:
+- 3 connectors load-balance ACTIVE-ACTIVE across all 9 *.kitestacks.com
+hostnames - no primary/backup priority.
 
 ## Recovery Priority
 
 
@@ -31,20 +41,35 @@ Domains:
 ## Current Backup Status
 
-Website Backup:
-- Operational on kscloud1
+Website:
+- Full replica running on kscloud1, served live via the Tunnel.
 
-Forgejo Backup:
-- Operational on kscloud1
+Forgejo:
+- Full replica running on kscloud1, but with a SEPARATE database - repos and
+commits pushed to monk's Forgejo do NOT appear on kscloud1's Forgejo (and
+vice versa). Accepted tradeoff for uptime.
+- The portal's Recent Activity widget on BOTH hosts queries monk's Forgejo
+directly (FORGEJO_API_BASE -> http://100.85.209.116:3006 over Tailscale
+from kscloud1, http://localhost:3006 on monk) so it stays consistent
+regardless of which connector serves the page.
 
-Git Repository:
-- Synced to both Forgejo instances
+Authentik:
+- Shared Postgres+Redis hosted on kscloud1, reachable only over Tailscale
+(100.123.254.52). Both monk's and kscloud1's authentik+worker use this
+single database/cache - fixes invalid_grant SSO caused by active-active
+routing splitting an OAuth flow across connectors.
+
+Other stateful apps (kavita, karakeep, openproject, etc.):
+- Fresh/separate data on kscloud1 - may show different/stale data depending
+on which connector serves a request. Accepted as the cost of guaranteed
+uptime.
 
 ## Validation Checklist
 
-- [ ] Website accessible
-- [ ] Backup website accessible
-- [ ] Forgejo operational
-- [ ] Backup Forgejo operational
+- [ ] Website accessible (www.kitestacks.com)
+- [ ] kscloud1 replica accessible (www-backup.kitestacks.com)
+- [ ] Forgejo operational (gitforge.kitestacks.com)
+- [ ] kscloud1 Forgejo replica operational (git-backup.kitestacks.com)
+- [ ] Authentik SSO works (auth.kitestacks.com)
 - [ ] Cloudflare DNS verified
-- [ ] Cloudflare Tunnel verified
+- [ ] Cloudflare Tunnel: all 3 connectors healthy
 
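
Review note: the HTTP-reachable items in the updated Validation Checklist could be scripted. The following is a minimal sketch, not part of this commit. The hostnames and checklist labels come from the runbook; the URL schemes (https for Tunnel hostnames, http for the Caddy port-80 replicas), the function names, and the 10-second timeout are assumptions. A passing probe only shows that the hostname answered: it does not prove that an SSO login completes, and under active-active routing a single request exercises only one connector.

```python
from typing import Callable, Dict, List, Tuple
import urllib.error
import urllib.request

# Labels and hostnames are taken from the runbook's Validation Checklist.
# The URL schemes are assumptions: Tunnel hostnames are assumed to serve
# HTTPS, and the *-backup hostnames plain HTTP (local Caddy on port 80).
CHECKS: List[Tuple[str, str]] = [
    ("Website accessible", "https://www.kitestacks.com"),
    ("kscloud1 replica accessible", "http://www-backup.kitestacks.com"),
    ("Forgejo operational", "https://gitforge.kitestacks.com"),
    ("kscloud1 Forgejo replica operational", "http://git-backup.kitestacks.com"),
    ("Authentik SSO works", "https://auth.kitestacks.com"),
]


def http_probe(url: str, timeout: float = 10.0) -> bool:
    """Return True if the URL answers with a 2xx/3xx status after redirects."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 400
    except (urllib.error.URLError, OSError):
        # HTTP 4xx/5xx, DNS failures, refused connections, and timeouts land here.
        return False


def run_checklist(
    checks: List[Tuple[str, str]],
    probe: Callable[[str], bool] = http_probe,
) -> Dict[str, bool]:
    """Probe every URL once and map each checklist label to pass/fail."""
    return {label: probe(url) for label, url in checks}


def format_report(results: Dict[str, bool]) -> str:
    """Render results in the runbook's own checkbox style."""
    return "\n".join(
        f"- [{'x' if ok else ' '}] {label}" for label, ok in results.items()
    )


if __name__ == "__main__":
    print(format_report(run_checklist(CHECKS)))
```

The `probe` parameter is injectable so the report logic can be exercised without network access. The Cloudflare DNS and connector-health items are left out because they need the Cloudflare dashboard or API rather than a plain HTTP request.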