# KiteStacks Homelab - Debugging & Troubleshooting

This document contains solutions and diagnostic steps for known issues that have occurred during the setup and operation of the KiteStacks homelab.

---

## 1. osTicket: New User Activation Emails Not Sending

**Symptom:** When a new user registers for a Help Desk account, they do not receive the activation email.

**Root Cause:** osTicket runs in a Docker container without a local Mail Transfer Agent (MTA) like Postfix or Sendmail. By default, PHP's internal `mail()` function silently fails because it cannot route the email.

**Fix:** You must configure an external SMTP server in the osTicket Admin Panel.

1. Log into the osTicket Staff Control Panel (`/scp/`).
2. Go to **Emails > Emails**.
3. Select the default outbound email address (e.g., `noreply@kitestacks.com`).
4. Scroll down to **SMTP Settings** and configure it to use a real mail provider (e.g., SendGrid, Mailgun, Amazon SES, or Gmail SMTP).
5. Ensure **Authentication Required** is set to **Yes**.
6. Save and send a test email.

## 2. Cloudflare Tunnel "Hmm. We're having trouble finding that site"

**Symptom:** A subdomain is correctly configured in Cloudflare Zero Trust, but visiting the site returns a Cloudflare error.

**Root Cause 1:** The internal service is down or restarting. Check the Docker container logs.

**Root Cause 2:** Multi-node load balancing cache. Cloudflare balances requests between `monk` and `kscloud1`. If you update a container on `monk` but forget to update it on `kscloud1`, 50% of requests will fail or show stale data.

**Fix:** Ensure Docker containers on both hosts are perfectly mirrored or explicitly configure Cloudflare to route only to the active host for that specific subdomain.

## 3. Authentik "invalid_grant" or "Code does not exist"

**Symptom:** Logging into a service via Authentik SSO randomly fails with "invalid_grant".

**Root Cause:** Initially, `monk` and `kscloud1` ran separate Authentik Postgres databases.
Auth codes were generated on one node and consumed on the other, failing validation.

**Fix:** The Authentik databases are now **Unified** over Tailscale. `monk` points its Postgres and Redis connections to `100.123.254.52` (kscloud1). Do not run local Postgres for Authentik on `monk`. If Tailscale goes down, SSO will fail.

## 4. Kavita "Sign in with Authentik" Button Missing

**Symptom:** The Authentik OIDC login button does not appear on the Kavita login screen.

**Root Cause:** Kavita stores OIDC settings in its internal SQLite database (`kavita.db`), not in an environment variable.

**Fix:** The OIDC settings must be configured manually via the Kavita UI (Admin Settings -> OIDC). Direct SQL edits are overwritten by the Kavita container upon restart.

## 5. Portainer Password Reset

**Symptom:** Admin password is lost.

**Fix:** Stop the Portainer container. You must use a Go container with `bbolt` to patch the underlying BoltDB directly, or temporarily pass the `--admin-password` flag to the container entrypoint to reset it.

## 6. Uptime Kuma "Reconnecting to server..." Loop

**Symptom:** The Uptime Kuma UI constantly shows "Reconnecting...".

**Fix:** Uptime Kuma requires WebSockets. Ensure Cloudflare Tunnel does not aggressively cache the HTML/JS and that Nginx proxy timeouts are not aggressively closing the WebSocket connection.

## 7. Forgejo Authentication Failures (LDAP/OIDC)

**Symptom:** Forgejo throws a 500 or redirect error after SSO login.

**Fix:** Ensure the `ROOT_URL` in Forgejo's `app.ini` exactly matches the public domain (`https://gitforge.kitestacks.com/`), and that the Authentik Application Launch URL strictly matches the OAuth redirect URI configured in Forgejo.

## 8. Random 502 Errors on New Subdomains (e.g., ntfy)

**Symptom:** Accessing a newly created subdomain (like `ntfy.kitestacks.com`) randomly returns a 502 Bad Gateway error from Cloudflare, even though it works internally.


**Root Cause:** The KiteStacks architecture uses a single Cloudflare Tunnel with multiple connectors (`monk` and `kscloud1`) for active-active high availability. Cloudflare load balances traffic across all active connectors blindly. If you deploy a new service (like `ntfy`) only on `monk`, any request that Cloudflare sends to the `kscloud1` connector will fail with a 502 because the container doesn't exist on that node.

**Fix:** For a multi-connector tunnel setup, you **must** deploy the identical service stack on all nodes. Deploy the missing container (e.g., `ntfy`) to the `kscloud1` replica to ensure both connectors can route the traffic successfully.
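
One way to catch this kind of drift before it shows up as random 502s is to compare the list of services on each node. The following is a minimal Python sketch, not part of the existing KiteStacks tooling: the `missing_services` helper and the `inventory` contents are hypothetical, and the container names per node are assumed to be collected by hand or from something like `docker ps --format '{{.Names}}'` on each host.

```python
from typing import Dict, Set


def missing_services(deployed: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Return, per node, the services that run on another node but not on it.

    Nodes that already run every known service are left out of the result.
    """
    # Union of every service seen on any node.
    all_services = set().union(*deployed.values())
    return {
        node: all_services - services
        for node, services in deployed.items()
        if all_services - services
    }


# Hypothetical inventory (assumption): container names per tunnel connector.
inventory = {
    "monk": {"authentik", "kavita", "ntfy"},
    "kscloud1": {"authentik", "kavita"},
}

gaps = missing_services(inventory)
for node, services in sorted(gaps.items()):
    print(f"{node} is missing: {', '.join(sorted(services))}")
# prints "kscloud1 is missing: ntfy"
```

An empty result means both connectors can serve every subdomain. Any entry names the node where the missing stack still needs to be deployed. The check compares names only, so it would not detect a container that exists on both nodes but runs different versions.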