diff --git a/RUNBOOK.md b/RUNBOOK.md
index 834ce56..0fab624 100644
--- a/RUNBOOK.md
+++ b/RUNBOOK.md
@@ -1417,3 +1417,5 @@ The entire KiteStacks homelab is secured by a Zero Trust architecture:
 ## Troubleshooting
 For detailed diagnostics, password resets, and specific issue fixes (such as missing osTicket emails, Uptime Kuma connection loops, or Authentik token errors), please see:
 [**docs/DEBUGGING.md**](docs/DEBUGGING.md)
+
+**Important Architecture Note:** The Cloudflare Tunnel is load-balanced between `monk` and `kscloud1`. Any new container (e.g., `ntfy`) MUST be deployed to **both** nodes to prevent random 502 Bad Gateway errors.
diff --git a/docs/DEBUGGING.md b/docs/DEBUGGING.md
index cab7eab..810c516 100644
--- a/docs/DEBUGGING.md
+++ b/docs/DEBUGGING.md
@@ -42,3 +42,8 @@ This document contains solutions and diagnostic steps for known issues that have
 ## 7. Forgejo Authentication Failures (LDAP/OIDC)
 **Symptom:** Forgejo throws a 500 or redirect error after SSO login.
 **Fix:** Ensure the `ROOT_URL` in Forgejo's `app.ini` exactly matches the public domain (`https://gitforge.kitestacks.com/`), and that the Authentik Application Launch URL strictly matches the OAuth redirect URI configured in Forgejo.
+
+## 8. Random 502 Errors on New Subdomains (e.g., ntfy)
+**Symptom:** Accessing a newly created subdomain (like `ntfy.kitestacks.com`) randomly returns a 502 Bad Gateway error from Cloudflare, even though it works internally.
+**Root Cause:** The KiteStacks architecture uses a single Cloudflare Tunnel with multiple connectors (`monk` and `kscloud1`) for active-active high availability. Cloudflare load balances traffic across all active connectors blindly. If you deploy a new service (like `ntfy`) only on `monk`, any request that Cloudflare sends to the `kscloud1` connector will fail with a 502 because the container doesn't exist on that node.
+**Fix:** For a multi-connector tunnel setup, you **must** deploy the identical service stack on all nodes. Deploy the missing container (e.g., `ntfy`) to the `kscloud1` replica to ensure both connectors can route the traffic successfully.
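
The "identical stack on all nodes" rule from section 8 could be checked mechanically before a new subdomain goes live. The following is a minimal sketch, not part of this change: the function name `missing_services` and the sample inventory are hypothetical, and it assumes the per-node service lists have already been collected by some other means (for example from `docker ps` on each node).

```python
from typing import Dict, Set


def missing_services(deployed: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Return, per node, the services that run on some other node but not on it."""
    # Any connector may receive a request for any service, so every node
    # is expected to run the union of all services seen anywhere.
    expected = set().union(*deployed.values()) if deployed else set()
    gaps = {node: expected - services for node, services in deployed.items()}
    # Report only the nodes that actually lack something.
    return {node: gap for node, gap in gaps.items() if gap}


if __name__ == "__main__":
    # Hypothetical inventory mirroring the ntfy case: deployed on monk only.
    inventory = {
        "monk": {"forgejo", "authentik", "ntfy"},
        "kscloud1": {"forgejo", "authentik"},
    }
    for node, gap in sorted(missing_services(inventory).items()):
        print(f"{node} is missing: {', '.join(sorted(gap))}")
```

An empty result means every connector can serve every service. Any non-empty entry names a node where requests routed to that connector would be expected to fail with a 502.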