KiteStacks Homelab - Debugging & Troubleshooting
This document contains solutions and diagnostic steps for known issues that have occurred during the setup and operation of the KiteStacks homelab.
1. osTicket: New User Activation Emails Not Sending
Symptom: When a new user registers for a Help Desk account, they do not receive the activation email.
Root Cause: osTicket runs in a Docker container without a local Mail Transfer Agent (MTA) like Postfix or Sendmail. By default, PHP's internal mail() function silently fails because it cannot route the email.
Fix: You must configure an external SMTP server in the osTicket Admin Panel.
- Log into the osTicket Staff Control Panel (/scp/).
- Go to Emails > Emails.
- Select the default outbound email address (e.g., noreply@kitestacks.com).
- Scroll down to SMTP Settings and configure it to use a real mail provider (e.g., SendGrid, Mailgun, Amazon SES, or Gmail SMTP).
- Ensure Authentication Required is set to Yes.
- Save and send a test email.
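Before entering credentials into osTicket, it can help to confirm that they work at all from outside the application. The following is a minimal sketch using Python's standard smtplib; it assumes the provider accepts STARTTLS (commonly on port 587), and the host, port, username, and password are placeholders you would take from your mail provider, not values from this homelab.

```python
import smtplib
from email.message import EmailMessage


def build_test_message(sender: str, recipient: str) -> EmailMessage:
    """Build a small plain-text message for checking SMTP delivery."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = "osTicket SMTP test"
    msg.set_content("If you can read this, the SMTP credentials work.")
    return msg


def send_test_email(host: str, port: int, username: str, password: str,
                    sender: str, recipient: str) -> None:
    """Send the test message over STARTTLS (assumed; some providers use implicit TLS)."""
    msg = build_test_message(sender, recipient)
    with smtplib.SMTP(host, port, timeout=15) as smtp:
        smtp.starttls()                 # upgrade the connection to TLS before logging in
        smtp.login(username, password)  # raises SMTPAuthenticationError on bad credentials
        smtp.send_message(msg)
```

If send_test_email(...) completes without raising and the message arrives, the same host, port, and credentials should be usable in the osTicket SMTP Settings form.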
2. Cloudflare Tunnel "Hmm. We're having trouble finding that site"
Symptom: A subdomain is correctly configured in Cloudflare Zero Trust, but visiting the site returns a Cloudflare error.
Root Cause 1: The internal service is down or restarting. Check the Docker container logs.
Root Cause 2: Version drift between nodes. Cloudflare distributes requests between monk and kscloud1. If you update a container on monk but forget to update it on kscloud1, a share of requests (roughly half, if traffic splits evenly) will fail or show stale data.
Fix: Ensure Docker containers on both hosts are perfectly mirrored, or explicitly configure Cloudflare to route only to the active host for that specific subdomain.
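One way to spot drift between the two hosts is to compare what each one is running. The sketch below assumes you have captured the output of `docker ps --format '{{.Names}} {{.Image}}'` on each host (for example over SSH) and pass it in as text; the function names are illustrative.

```python
def parse_docker_ps(output: str) -> dict[str, str]:
    """Parse lines of "<name> <image>" into a {container_name: image} mapping."""
    containers = {}
    for line in output.splitlines():
        parts = line.split()
        if len(parts) == 2:  # skip blank or malformed lines
            name, image = parts
            containers[name] = image
    return containers


def diff_hosts(monk: dict[str, str], kscloud1: dict[str, str]) -> list[str]:
    """Return human-readable differences between the two hosts, sorted by name."""
    problems = []
    for name in sorted(monk.keys() | kscloud1.keys()):
        on_monk, on_kscloud1 = monk.get(name), kscloud1.get(name)
        if on_monk is None:
            problems.append(f"{name}: missing on monk")
        elif on_kscloud1 is None:
            problems.append(f"{name}: missing on kscloud1")
        elif on_monk != on_kscloud1:
            problems.append(f"{name}: monk runs {on_monk}, kscloud1 runs {on_kscloud1}")
    return problems
```

An empty list means container names and image references match. Note that two hosts can share a tag such as `latest` while holding different image digests, so this check catches missing containers and tag mismatches but not every form of drift.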
3. Authentik "invalid_grant" or "Code does not exist"
Symptom: Logging into a service via Authentik SSO randomly fails with "invalid_grant".
Root Cause: Initially, monk and kscloud1 ran separate Authentik Postgres databases. Auth codes were generated on one node and consumed on the other, failing validation.
Fix: The Authentik databases are now unified over Tailscale. monk points its Postgres and Redis connections to 100.123.254.52 (kscloud1). Do not run a local Postgres for Authentik on monk. Because monk reaches the database only over Tailscale, SSO will fail if Tailscale goes down.
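When SSO breaks, a quick first check from monk is whether the shared backends are reachable over Tailscale at all. This is a rough sketch: the IP comes from the fix above, but the ports are assumptions (the stock Postgres and Redis defaults) and may differ in the actual compose files.

```python
import socket
from typing import Optional

AUTHENTIK_DB_HOST = "100.123.254.52"  # kscloud1's Tailscale address, from the fix above
# Assumed ports: the upstream defaults for Postgres and Redis.
DEFAULT_PORTS = {"postgres": 5432, "redis": 6379}


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers refused connections, timeouts, and unreachable hosts
        return False


def check_backends(host: str = AUTHENTIK_DB_HOST,
                   ports: Optional[dict[str, int]] = None) -> dict[str, bool]:
    """Map each backend name to whether its port accepted a TCP connection."""
    ports = DEFAULT_PORTS if ports is None else ports
    return {name: can_connect(host, port) for name, port in ports.items()}
```

Running check_backends() on monk returns a mapping such as `{"postgres": ..., "redis": ...}`. A False value only shows that the port did not accept a TCP connection; it does not distinguish a Tailscale outage from a stopped container.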
4. Kavita "Sign in with Authentik" Button Missing
Symptom: The Authentik OIDC login button does not appear on the Kavita login screen.
Root Cause: Kavita stores OIDC settings in its internal SQLite database (kavita.db), not in an environment variable.
Fix: The OIDC settings must be configured manually via the Kavita UI (Admin Settings -> OIDC). Direct SQL edits are overwritten by the Kavita container upon restart.
5. Portainer Password Reset
Symptom: Admin password is lost.
Fix: Stop the Portainer container, then either patch the underlying BoltDB directly (for example, from a Go container using bbolt) or temporarily pass the --admin-password flag to the container entrypoint to reset it. Note that --admin-password expects a bcrypt hash, not a plaintext password.
6. Uptime Kuma "Reconnecting to server..." Loop
Symptom: The Uptime Kuma UI constantly shows "Reconnecting...".
Fix: Uptime Kuma requires WebSockets. Ensure that Cloudflare Tunnel is not aggressively caching the HTML/JS and that the Nginx proxy timeouts are not closing the WebSocket connection prematurely.
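To tell whether the proxy chain passes WebSocket upgrades at all, you can attempt the opening handshake by hand. The sketch below uses only the standard library and follows the RFC 6455 handshake; the hostname you pass in is yours to supply, and the Socket.IO path mentioned afterwards is an assumption about how Uptime Kuma exposes its socket, not something confirmed here.

```python
import base64
import hashlib
import os
import socket
import ssl

WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"  # fixed constant defined by RFC 6455


def expected_accept(key: str) -> str:
    """Compute the Sec-WebSocket-Accept value a server must return for this key."""
    digest = hashlib.sha1((key + WS_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")


def build_upgrade_request(host: str, path: str, key: str) -> bytes:
    """Build the raw HTTP/1.1 request that asks the server to switch to WebSocket."""
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",
        "Upgrade: websocket",
        "Connection: Upgrade",
        f"Sec-WebSocket-Key: {key}",
        "Sec-WebSocket-Version: 13",
        "", "",  # blank line terminates the header block
    ]
    return "\r\n".join(lines).encode("ascii")


def websocket_upgrade_ok(host: str, path: str, timeout: float = 10.0) -> bool:
    """Return True if the server answers 101 with the matching accept header."""
    key = base64.b64encode(os.urandom(16)).decode("ascii")
    context = ssl.create_default_context()
    with socket.create_connection((host, 443), timeout=timeout) as raw:
        with context.wrap_socket(raw, server_hostname=host) as tls:
            tls.sendall(build_upgrade_request(host, path, key))
            # A single read is assumed to be enough for the response headers.
            response = tls.recv(4096).decode("latin-1")
    parts = response.split("\r\n", 1)[0].split()
    return len(parts) >= 2 and parts[1] == "101" and expected_accept(key) in response
```

For example, `websocket_upgrade_ok("<your Uptime Kuma hostname>", "/socket.io/?EIO=4&transport=websocket")` would exercise the path Socket.IO clients typically use. A False result through the public hostname combined with True against the internal address points at the proxy layer rather than at Uptime Kuma itself.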
7. Forgejo Authentication Failures (LDAP/OIDC)
Symptom: Forgejo throws a 500 or redirect error after SSO login.
Fix: Ensure the ROOT_URL in Forgejo's app.ini exactly matches the public domain (https://gitforge.kitestacks.com/), and that the Authentik Application Launch URL strictly matches the OAuth redirect URI configured in Forgejo.
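A mismatch in ROOT_URL is easy to miss by eye (a missing trailing slash or http instead of https is enough). As a rough check, the sketch below reads the value from the text of app.ini and compares it with the public URL named above. It assumes ROOT_URL lives in the [server] section, which is where Forgejo documents it, and the function names are illustrative.

```python
import configparser
from typing import Optional

EXPECTED_ROOT_URL = "https://gitforge.kitestacks.com/"  # public URL from the fix above


def read_root_url(ini_text: str) -> Optional[str]:
    """Return ROOT_URL from the [server] section of an app.ini, or None if unset."""
    # Interpolation is disabled so values containing "%" are read literally.
    parser = configparser.ConfigParser(interpolation=None)
    # app.ini can begin with keys outside any section; park them in a placeholder section.
    parser.read_string("[__top__]\n" + ini_text)
    return parser.get("server", "ROOT_URL", fallback=None)


def root_url_matches(ini_text: str, expected: str = EXPECTED_ROOT_URL) -> bool:
    """Exact string comparison, because the OAuth redirect check is exact as well."""
    return read_root_url(ini_text) == expected
```

Reading the file with `open(path).read()` and passing the text to root_url_matches(...) gives a quick yes or no. The comparison is deliberately strict, so `https://gitforge.kitestacks.com` without the trailing slash is reported as a mismatch.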
8. Random 502 Errors on New Subdomains (e.g., ntfy)
Symptom: Accessing a newly created subdomain (like ntfy.kitestacks.com) randomly returns a 502 Bad Gateway error from Cloudflare, even though it works internally.
Root Cause: The KiteStacks architecture uses a single Cloudflare Tunnel with multiple connectors (monk and kscloud1) for active-active high availability. Cloudflare load balances traffic across all active connectors blindly. If you deploy a new service (like ntfy) only on monk, any request that Cloudflare sends to the kscloud1 connector will fail with a 502 because the container doesn't exist on that node.
Fix: For a multi-connector tunnel setup, you must deploy the identical service stack on all nodes. Deploy the missing container (e.g., ntfy) to the kscloud1 replica to ensure both connectors can route the traffic successfully.
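Because the failure is intermittent, a single request can be misleading. The sketch below repeats the request and tallies the status codes, so that a mix of successes and 502s stands out. The attempt count and the User-Agent string are arbitrary choices made for illustration.

```python
import urllib.error
import urllib.request
from collections import Counter


def fetch_status(url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status code for a GET request, including error statuses."""
    # A custom User-Agent is set because some edge rules reject the urllib default.
    request = urllib.request.Request(url, headers={"User-Agent": "kitestacks-probe"})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # 4xx/5xx responses arrive as exceptions


def sample_statuses(url: str, attempts: int = 20, fetch=fetch_status) -> Counter:
    """Request the URL several times and count how often each status code appears."""
    return Counter(fetch(url) for _ in range(attempts))


def looks_like_missing_replica(counts: Counter) -> bool:
    """A mix of 502s and successful responses suggests only some connectors can serve it."""
    succeeded = any(200 <= code < 400 for code in counts)
    return counts[502] > 0 and succeeded
```

For example, `sample_statuses("https://ntfy.kitestacks.com")` returning something like a split between 200 and 502 matches the root cause described above, whereas all 502s would point at the service being down everywhere.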