docs: document phantom 3rd tunnel replica fix + update runbook for 2-connector arch
- DEBUGGING.md: add issue #9: a native cloudflared systemd service running alongside the Docker container causes a phantom 3rd replica in the CF dashboard; the fix is to disable the systemd service
- RUNBOOK.md: correct the architecture diagram from 3 connectors to 2 (monk Docker + kscloud1); add a warning to disable the native cloudflared systemd service after containerizing; update the failover test procedure with verified 2026-06-16 results (zero downtime confirmed)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
parent d439a1bb46
commit e69f236c89
2 changed files with 42 additions and 9 deletions
27
RUNBOOK.md
@@ -12,10 +12,9 @@
 Internet
 │
 └── Cloudflare (DNS + Tunnel)
-│ Active-Active across 3 connectors
-├── cloudflared on monk (primary home machine)
-├── cloudflared on kscloud1 (Hetzner VPS, <KSCLOUD1_PUBLIC_IP>)
-└── cloudflared on T14s (currently OFF)
+│ Active-Active across 2 connectors
+├── cloudflared on monk (primary home machine, Docker container)
+└── cloudflared on kscloud1 (Hetzner VPS, <KSCLOUD1_PUBLIC_IP>)
 
 Tailscale overlay network (VPN mesh):
 monk <MONK_TAILSCALE_IP>
@@ -327,6 +326,12 @@ networks:
 cd ~/kitestacks-live/docker/cloudflared && docker compose up -d
 ```
 
+> **Important:** After starting the Docker container, check for a pre-existing native cloudflared systemd service and disable it — both will connect with the same token and register as separate phantom replicas in the CF dashboard:
+> ```bash
+> systemctl status cloudflared
+> sudo systemctl stop cloudflared && sudo systemctl disable cloudflared
+> ```
+
 ### 5.2 Authentik (monk side — points to shared DB on kscloud1)
 
 `~/kitestacks-live/docker/authentik/docker-compose.yml`:
@@ -1121,7 +1126,7 @@ All 9 service directories live under `/opt/kitestacks/docker/` on kscloud1. The
 - `FORGEJO_API_BASE=http://<MONK_TAILSCALE_IP>:<port>` for metrics-api (monk's Forgejo over Tailscale)
 - Authentik on kscloud1 uses the same shared DB (it's the host — localhost resolves fine; use `<KSCLOUD1_TAILSCALE_IP>` for consistency)
 
-### 7.1 Deploy cloudflared on kscloud1 (3rd connector)
+### 7.1 Deploy cloudflared on kscloud1 (2nd connector)
 
 Same `docker-compose.yml` as monk — same `TUNNEL_TOKEN`. Cloudflare assigns a new connector ID automatically.
 
@@ -1314,17 +1319,21 @@ Expected: all return 200 (or 301/302 for redirect-based logins).
 - [ ] `https://links.kitestacks.com` → Karakeep login with Authentik → works
 - [ ] `https://kavita.kitestacks.com` → "Sign in with authentik" → works
 
-### Failover test (disconnect monk's internet)
+### Failover test (stop monk's cloudflared)
 
-With monk's home network off (phone hotspot or at a different location):
 ```bash
-for sub in www auth gitforge tasks ai links kavita grafana status; do
+docker stop cloudflared
+sleep 5
+for sub in www auth gitforge tasks ai links kavita grafana status portainer; do
 code=$(curl -sk -o /dev/null -w "%{http_code}" "https://${sub}.kitestacks.com")
 echo "$sub: $code"
 done
+docker start cloudflared
 ```
 
-All 9 should still return 200 (served by kscloud1).
+All subdomains should return 200/302 (served by kscloud1 alone).
+
+**Verified 2026-06-16:** www=200, auth=302, status=302, portainer=200 — zero downtime during monk cloudflared outage. kscloud1 took over immediately.
 
 ### Authentik shared DB health
 
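The failover loop in the RUNBOOK.md change prints one status code per subdomain and leaves the pass/fail judgement to the reader. A minimal sketch of an automated version follows. It assumes 200, 301, and 302 all count as healthy (the hunk context mentions 301/302 for redirect-based logins); the function names `is_up` and `check_failover` and the `--max-time 10` timeout are illustrative and not part of the commit.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed when an HTTP status code counts as "up".
# Assumption: 200 plus the 301/302 redirects used by login pages are healthy.
is_up() {
  case "$1" in
    200|301|302) return 0 ;;
    *) return 1 ;;
  esac
}

# Hypothetical helper: probe each given subdomain of kitestacks.com and
# return non-zero if any of them is not healthy.
check_failover() {
  cf_failed=0
  for cf_sub in "$@"; do
    # Same curl invocation as the runbook, plus a timeout (an added assumption).
    cf_code=$(curl -sk --max-time 10 -o /dev/null -w "%{http_code}" \
      "https://${cf_sub}.kitestacks.com")
    if is_up "$cf_code"; then
      echo "$cf_sub: $cf_code ok"
    else
      echo "$cf_sub: $cf_code FAIL"
      cf_failed=1
    fi
  done
  return "$cf_failed"
}

# Example use on monk, mirroring the runbook procedure:
#   docker stop cloudflared && sleep 5
#   check_failover www auth status portainer
#   docker start cloudflared
```

Because `check_failover` returns non-zero on any failure, it could gate a follow-up step, but `docker start cloudflared` should run regardless of the result so monk's connector is always restored.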