From c4c6b49bf4068419c4ec1abe944cdab5b1d2f8a3 Mon Sep 17 00:00:00 2001
From: kenpat
Date: Mon, 15 Jun 2026 15:23:44 -0500
Subject: [PATCH] docs: clean up runbook and create dedicated debugging guide

---
 RUNBOOK.md        | 82 ++++++-----------------------------------------
 docs/DEBUGGING.md | 44 +++++++++++++++++++++++++
 2 files changed, 53 insertions(+), 73 deletions(-)
 create mode 100644 docs/DEBUGGING.md

diff --git a/RUNBOOK.md b/RUNBOOK.md
index bbf478f..834ce56 100644
--- a/RUNBOOK.md
+++ b/RUNBOOK.md
@@ -1407,77 +1407,13 @@ Do not commit `.env` files, webhook URLs, or database files to this repo.
 
 ---
 
-## Phase 8: Forgejo Sync + osTicket Authentik LDAP SSO (2026-06-14/15)
+## Security Posture
+The entire KiteStacks homelab is secured by a Zero Trust architecture:
+1. **No Open Inbound Ports:** All public subdomains are routed through Cloudflare Tunnels (edge-to-container). The home router has 0 forwarded ports.
+2. **Network Isolation:** Internal communication between nodes (`monk` and `kscloud1`) happens strictly over a WireGuard-based Tailscale mesh network (`100.x.x.x`).
+3. **SSO Protection:** Authentik acts as the Identity Provider (OIDC/LDAP) and reverse proxy (Embedded Outpost) protecting all sensitive endpoints (`/scp/` for osTicket, Portainer, Grafana, Kite AI, etc.).
+4. **Standalone Bypasses:** The only apps fully public without SSO are the main Portal (`www`) and the read-only FluxCD GitOps Dashboard (`flux`), which was isolated into a standalone Nginx container specifically to decouple it from Authentik.
 
-### Forgejo Sync (monk → kscloud1)
-
-Monk is authoritative. kscloud1 Forgejo is a read replica synced every 6 hours.
-
-**Sync script:** `~/kitestacks-live/docker/forgejo/sync-to-cloud.sh`
-**Cron:** `0 */6 * * *` on monk, logs to `/tmp/forgejo-sync.log`
-
-Manual sync:
-```bash
-~/kitestacks-live/docker/forgejo/sync-to-cloud.sh
-```
-
-To re-do a full restore from scratch (e.g., after kscloud1 rebuild):
-```bash
-# On monk: create dump
-docker exec -u git forgejo /app/gitea/gitea dump --type zip -f /tmp/forgejo-backup.zip
-docker cp forgejo:/tmp/forgejo-backup.zip /tmp/forgejo-backup.zip
-# Transfer and restore on cloud host — see claude-memory for detailed steps
-```
-
-### osTicket Authentik LDAP SSO
-
-Staff log into `tasks.kitestacks.com/scp/` using their **Authentik credentials** (not a separate osTicket password).
-
-**Architecture:**
-```
-osticket-app → authentik-ldap-proxy:389 (socat) → authentik-ldap:3389 → auth.kitestacks.com
-```
-
-**Services deployed:**
-- `~/kitestacks-live/docker/authentik-ldap/` — LDAP outpost + socat proxy on monk
-- `/opt/kitestacks/docker/authentik-ldap/` — LDAP outpost on kscloud1
-
-**LDAP search account:** `cn=ldap-svc,ou=users,dc=ldap,dc=goauthentik,dc=io`
-Password stored in Authentik and in osTicket's `ost_config` (namespace `plugin.2`, key `bind_pw`, encrypted).
-
-**auth-ldap.phar** at `/data/upload/include/plugins/auth-ldap.phar` inside the osticket-app container has been patched (original at `.phar.orig`). Do NOT replace it with the upstream version — the patch is required for PHP 7.3 + PEAR compatibility.
-
-**If LDAP login stops working:**
-```bash
-# Check LDAP outpost is running and connected
-docker logs authentik-ldap --since 5m | grep -v debug
-docker logs authentik-ldap-proxy 2>&1 | tail -5
-
-# Test bind from osticket-app container
-docker exec osticket-app php -r "
-  \$c = @ldap_connect('authentik-ldap-proxy');
-  ldap_set_option(\$c, LDAP_OPT_PROTOCOL_VERSION, 3);
-  \$r = @ldap_bind(\$c, 'cn=ldap-svc,ou=users,dc=ldap,dc=goauthentik,dc=io', 'PASSWORD');
-  echo \$r ? 'OK' : ldap_error(\$c);
-"
-# Verify kscloud1 outpost reachable
-nc -zv 100.123.254.52 3389
-```
-
-**Reset a staff member's Authentik password:**
-```bash
-docker exec authentik ak shell -c "
-from authentik.core.models import User
-u = User.objects.get(username='kenpat7177')
-u.set_password('NewPassword123!')
-u.save()
-print('done')
-"
-```
-
-**Clear osTicket login lockout:**
-```bash
-docker run --rm --network host mariadb:10.11 mysql \
-  -h 100.123.254.52 -u osticket -p osticket \
-  -e "DELETE FROM ost_session;"
-```
+## Troubleshooting
+For detailed diagnostics, password resets, and specific issue fixes (such as missing osTicket emails, Uptime Kuma connection loops, or Authentik token errors), please see:
+[**docs/DEBUGGING.md**](docs/DEBUGGING.md)
diff --git a/docs/DEBUGGING.md b/docs/DEBUGGING.md
new file mode 100644
index 0000000..cab7eab
--- /dev/null
+++ b/docs/DEBUGGING.md
@@ -0,0 +1,44 @@
+# KiteStacks Homelab - Debugging & Troubleshooting
+
+This document contains solutions and diagnostic steps for known issues that have occurred during the setup and operation of the KiteStacks homelab.
+
+---
+
+## 1. osTicket: New User Activation Emails Not Sending
+**Symptom:** When a new user registers for a Help Desk account, they do not receive the activation email.
+**Root Cause:** osTicket runs in a Docker container without a local Mail Transfer Agent (MTA) like Postfix or Sendmail. By default, PHP's internal `mail()` function silently fails because it cannot route the email.
+**Fix:** You must configure an external SMTP server in the osTicket Admin Panel.
+1. Log into the osTicket Staff Control Panel (`/scp/`).
+2. Go to **Emails > Emails**.
+3. Select the default outbound email address (e.g., `noreply@kitestacks.com`).
+4. Scroll down to **SMTP Settings** and configure it to use a real mail provider (e.g., SendGrid, Mailgun, Amazon SES, or Gmail SMTP).
+5. Ensure **Authentication Required** is set to **Yes**.
+6. Save and send a test email.
+
+## 2. Cloudflare Tunnel "Hmm. We're having trouble finding that site"
+**Symptom:** A subdomain is correctly configured in Cloudflare Zero Trust, but visiting the site returns a Cloudflare error.
+**Root Cause 1:** The internal service is down or restarting. Check the Docker container logs.
+**Root Cause 2:** Multi-node load balancing cache. Cloudflare balances requests between `monk` and `kscloud1`. If you update a container on `monk` but forget to update it on `kscloud1`, 50% of requests will fail or show stale data.
+**Fix:** Ensure Docker containers on both hosts are perfectly mirrored or explicitly configure Cloudflare to route only to the active host for that specific subdomain.
+
+## 3. Authentik "invalid_grant" or "Code does not exist"
+**Symptom:** Logging into a service via Authentik SSO randomly fails with "invalid_grant".
+**Root Cause:** Initially, `monk` and `kscloud1` ran separate Authentik Postgres databases. Auth codes were generated on one node and consumed on the other, failing validation.
+**Fix:** The Authentik databases are now **Unified** over Tailscale. `monk` points its Postgres and Redis connections to `100.123.254.52` (kscloud1). Do not run local Postgres for Authentik on `monk`. If Tailscale goes down, SSO will fail.
+
+## 4. Kavita "Sign in with Authentik" Button Missing
+**Symptom:** The Authentik OIDC login button does not appear on the Kavita login screen.
+**Root Cause:** Kavita stores OIDC settings in its internal SQLite database (`kavita.db`), not in an environment variable.
+**Fix:** The OIDC settings must be configured manually via the Kavita UI (Admin Settings -> OIDC). Direct SQL edits are overwritten by the Kavita container upon restart.
+
+## 5. Portainer Password Reset
+**Symptom:** Admin password is lost.
+**Fix:** Stop the Portainer container. You must use a Go container with `bbolt` to patch the underlying BoltDB directly, or temporarily pass the `--admin-password` flag to the container entrypoint to reset it.
+
+## 6. Uptime Kuma "Reconnecting to server..." Loop
+**Symptom:** The Uptime Kuma UI constantly shows "Reconnecting...".
+**Fix:** Uptime Kuma requires WebSockets. Ensure Cloudflare Tunnel does not aggressively cache the HTML/JS and that Nginx proxy timeouts are not aggressively closing the WebSocket connection.
+
+## 7. Forgejo Authentication Failures (LDAP/OIDC)
+**Symptom:** Forgejo throws a 500 or redirect error after SSO login.
+**Fix:** Ensure the `ROOT_URL` in Forgejo's `app.ini` exactly matches the public domain (`https://gitforge.kitestacks.com/`), and that the Authentik Application Launch URL strictly matches the OAuth redirect URI configured in Forgejo.
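
Review note (not part of the patch): the `ROOT_URL` check from section 7 could be scripted roughly as follows. This is a minimal sketch under stated assumptions: the helper names `extract_root_url` and `check_root_url` are hypothetical, and the path to `app.ini` is a placeholder the caller supplies, since the guide does not say where the file lives on either host.

```shell
#!/bin/sh
# Hypothetical helpers for the ROOT_URL check in section 7.
# Neither function ships with Forgejo; both are illustration only.

# Print the value of the first uncommented ROOT_URL key in an ini file,
# with surrounding whitespace trimmed. The anchor at line start means
# keys such as LOCAL_ROOT_URL and commented lines are not matched.
extract_root_url() {
    sed -n 's/^[[:space:]]*ROOT_URL[[:space:]]*=[[:space:]]*\(.*[^[:space:]]\)[[:space:]]*$/\1/p' "$1" | head -n 1
}

# Return 0 only when the configured value equals the expected URL exactly,
# trailing slash included; print a short verdict either way.
check_root_url() {
    actual=$(extract_root_url "$1")
    if [ "$actual" = "$2" ]; then
        echo "OK: ROOT_URL matches $2"
    else
        echo "MISMATCH: ROOT_URL is '$actual', expected '$2'"
        return 1
    fi
}
```

A call such as `check_root_url /path/to/app.ini https://gitforge.kitestacks.com/` (path is a placeholder) would then flag a missing trailing slash or a stale hostname. It covers only the `app.ini` half of the fix; the Authentik launch URL and OAuth redirect URI still need to be compared by hand.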