docs: clean up runbook and create dedicated debugging guide

kenpat 2026-06-15 15:23:44 -05:00
parent bdec86b16f
commit c4c6b49bf4
2 changed files with 53 additions and 73 deletions

@ -1407,77 +1407,13 @@ Do not commit `.env` files, webhook URLs, or database files to this repo.
---
## Phase 8: Forgejo Sync + osTicket Authentik LDAP SSO (2026-06-14/15)
## Security Posture
The entire KiteStacks homelab is secured by a Zero Trust architecture:
1. **No Open Inbound Ports:** All public subdomains are routed through Cloudflare Tunnels (edge-to-container). The home router has 0 forwarded ports.
2. **Network Isolation:** Internal communication between nodes (`monk` and `kscloud1`) happens strictly over a WireGuard-based Tailscale mesh network (`100.x.x.x`).
3. **SSO Protection:** Authentik acts as the Identity Provider (OIDC/LDAP) and reverse proxy (Embedded Outpost) protecting all sensitive endpoints (`/scp/` for osTicket, Portainer, Grafana, Kite AI, etc.).
4. **Standalone Bypasses:** The only apps fully public without SSO are the main Portal (`www`) and the read-only FluxCD GitOps Dashboard (`flux`), which was isolated into a standalone Nginx container specifically to decouple it from Authentik.
### Forgejo Sync (monk → kscloud1)
Monk is authoritative. kscloud1 Forgejo is a read replica synced every 6 hours.
**Sync script:** `~/kitestacks-live/docker/forgejo/sync-to-cloud.sh`
**Cron:** `0 */6 * * *` on monk, logs to `/tmp/forgejo-sync.log`
Manual sync:
```bash
~/kitestacks-live/docker/forgejo/sync-to-cloud.sh
```
To re-do a full restore from scratch (e.g., after kscloud1 rebuild):
```bash
# On monk: create dump
docker exec -u git forgejo /app/gitea/gitea dump --type zip -f /tmp/forgejo-backup.zip
docker cp forgejo:/tmp/forgejo-backup.zip /tmp/forgejo-backup.zip
# Transfer and restore on cloud host — see claude-memory for detailed steps
```
### osTicket Authentik LDAP SSO
Staff log into `tasks.kitestacks.com/scp/` using their **Authentik credentials** (not a separate osTicket password).
**Architecture:**
```
osticket-app → authentik-ldap-proxy:389 (socat) → authentik-ldap:3389 → auth.kitestacks.com
```
**Services deployed:**
- `~/kitestacks-live/docker/authentik-ldap/` — LDAP outpost + socat proxy on monk
- `/opt/kitestacks/docker/authentik-ldap/` — LDAP outpost on kscloud1
**LDAP search account:** `cn=ldap-svc,ou=users,dc=ldap,dc=goauthentik,dc=io`
Password stored in Authentik and in osTicket's `ost_config` (namespace `plugin.2`, key `bind_pw`, encrypted).
**auth-ldap.phar** at `/data/upload/include/plugins/auth-ldap.phar` inside the osticket-app container has been patched (original at `.phar.orig`). Do NOT replace it with the upstream version — the patch is required for PHP 7.3 + PEAR compatibility.
**If LDAP login stops working:**
```bash
# Check LDAP outpost is running and connected
docker logs authentik-ldap --since 5m | grep -v debug
docker logs authentik-ldap-proxy 2>&1 | tail -5
# Test bind from osticket-app container
docker exec osticket-app php -r "
\$c = @ldap_connect('authentik-ldap-proxy');
ldap_set_option(\$c, LDAP_OPT_PROTOCOL_VERSION, 3);
\$r = @ldap_bind(\$c, 'cn=ldap-svc,ou=users,dc=ldap,dc=goauthentik,dc=io', 'PASSWORD');
echo \$r ? 'OK' : ldap_error(\$c);
"
# Verify kscloud1 outpost reachable
nc -zv 100.123.254.52 3389
```
**Reset a staff member's Authentik password:**
```bash
docker exec authentik ak shell -c "
from authentik.core.models import User
u = User.objects.get(username='kenpat7177')
u.set_password('NewPassword123!')
u.save()
print('done')
"
```
**Clear osTicket login lockout:**
```bash
# Export DB_PASS (the osticket database password) before running.
docker run --rm --network host mariadb:10.11 mysql \
-h 100.123.254.52 -u osticket -p"$DB_PASS" osticket \
-e "DELETE FROM ost_session;"
```
## Troubleshooting
For detailed diagnostics, password resets, and specific issue fixes (such as missing osTicket emails, Uptime Kuma connection loops, or Authentik token errors), please see:
[**docs/DEBUGGING.md**](docs/DEBUGGING.md)

docs/DEBUGGING.md Normal file

@ -0,0 +1,44 @@
# KiteStacks Homelab - Debugging & Troubleshooting
This document contains solutions and diagnostic steps for known issues that have occurred during the setup and operation of the KiteStacks homelab.
---
## 1. osTicket: New User Activation Emails Not Sending
**Symptom:** When a new user registers for a Help Desk account, they do not receive the activation email.
**Root Cause:** osTicket runs in a Docker container without a local Mail Transfer Agent (MTA) like Postfix or Sendmail. By default, PHP's internal `mail()` function silently fails because it cannot route the email.
**Fix:** You must configure an external SMTP server in the osTicket Admin Panel.
1. Log into the osTicket Staff Control Panel (`/scp/`).
2. Go to **Emails > Emails**.
3. Select the default outbound email address (e.g., `noreply@kitestacks.com`).
4. Scroll down to **SMTP Settings** and configure it to use a real mail provider (e.g., SendGrid, Mailgun, Amazon SES, or Gmail SMTP).
5. Ensure **Authentication Required** is set to **Yes**.
6. Save and send a test email.
## 2. Cloudflare Tunnel "Hmm. We're having trouble finding that site"
**Symptom:** A subdomain is correctly configured in Cloudflare Zero Trust, but visiting the site returns a Cloudflare error.
**Root Cause 1:** The internal service is down or restarting. Check the Docker container logs.
**Root Cause 2:** Version drift between load-balanced nodes. Cloudflare balances requests between `monk` and `kscloud1`. If you update a container on `monk` but forget to update it on `kscloud1`, roughly half of the requests can fail or show stale data.
**Fix:** Ensure the Docker containers on both hosts run the same images and configuration, or explicitly configure Cloudflare to route only to the active host for that specific subdomain.
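One way to spot drift between the two hosts is to compare the image each container runs. The following is a minimal sketch, not part of the existing tooling: the function name `diff_images`, the file names, and the `kscloud1` SSH alias are assumptions for illustration.

```bash
#!/usr/bin/env bash
# Sketch: report containers whose image differs between two hosts.
# Each input file holds one "name image" pair per line, for example:
#   docker ps --format '{{.Names}} {{.Image}}' > monk.txt
#   ssh kscloud1 "docker ps --format '{{.Names}} {{.Image}}'" > kscloud1.txt
# (the ssh alias "kscloud1" and the file names are assumptions)

diff_images() {
  # Lines that appear in only one of the two files are mismatches.
  # Container names are unique per host, so a line shared by both
  # files appears exactly twice and is dropped by `uniq -u`.
  sort "$1" "$2" | uniq -u
}

# Usage (hypothetical file names):
#   diff_images monk.txt kscloud1.txt
```

Empty output means both hosts run the same containers on the same image tags. Tags such as `latest` can still point at different digests, so this is a first check rather than proof of parity.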
## 3. Authentik "invalid_grant" or "Code does not exist"
**Symptom:** Logging into a service via Authentik SSO randomly fails with "invalid_grant".
**Root Cause:** Initially, `monk` and `kscloud1` ran separate Authentik Postgres databases. Auth codes were generated on one node and consumed on the other, failing validation.
**Fix:** The Authentik databases are now **Unified** over Tailscale. `monk` points its Postgres and Redis connections to `100.123.254.52` (kscloud1). Do not run local Postgres for Authentik on `monk`. If Tailscale goes down, SSO will fail.
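To confirm that `monk` really points at the shared database, the Authentik environment file can be checked for the kscloud1 address. This is a sketch under assumptions: it presumes the connection hosts are set through Authentik's `AUTHENTIK_POSTGRESQL__HOST` and `AUTHENTIK_REDIS__HOST` variables in a plain `KEY=value` env file, and the helper names and file path are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: verify that an Authentik env file targets kscloud1 (100.123.254.52).
# Assumes plain KEY=value lines without quotes or "export" prefixes.

env_value() {
  # Print the value assigned to key $2 in env file $1 (last assignment wins).
  sed -n "s/^$2=//p" "$1" | tail -n 1
}

points_to_kscloud1() {
  # Succeed only if both Postgres and Redis use the kscloud1 Tailscale IP.
  [ "$(env_value "$1" AUTHENTIK_POSTGRESQL__HOST)" = "100.123.254.52" ] &&
    [ "$(env_value "$1" AUTHENTIK_REDIS__HOST)" = "100.123.254.52" ]
}

# Usage (the path is an assumption):
#   points_to_kscloud1 ~/kitestacks-live/docker/authentik/.env \
#     && echo "unified" || echo "monk is using a local database"
```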
## 4. Kavita "Sign in with Authentik" Button Missing
**Symptom:** The Authentik OIDC login button does not appear on the Kavita login screen.
**Root Cause:** Kavita stores OIDC settings in its internal SQLite database (`kavita.db`), not in an environment variable.
**Fix:** The OIDC settings must be configured manually via the Kavita UI (Admin Settings -> OIDC). Direct SQL edits are overwritten by the Kavita container upon restart.
## 5. Portainer Password Reset
**Symptom:** Admin password is lost.
**Fix:** Stop the Portainer container, then either patch the underlying BoltDB directly (for example, from a Go container using `bbolt`) or temporarily pass the `--admin-password` flag to the container entrypoint to reset it. Note that `--admin-password` expects a bcrypt hash, not a plaintext password.
## 6. Uptime Kuma "Reconnecting to server..." Loop
**Symptom:** The Uptime Kuma UI constantly shows "Reconnecting...".
**Fix:** Uptime Kuma requires WebSockets. Ensure that Cloudflare Tunnel is not caching the HTML/JS and that the Nginx proxy timeouts are long enough to keep the WebSocket connection open.
## 7. Forgejo Authentication Failures (LDAP/OIDC)
**Symptom:** Forgejo throws a 500 or redirect error after SSO login.
**Fix:** Ensure the `ROOT_URL` in Forgejo's `app.ini` exactly matches the public domain (`https://gitforge.kitestacks.com/`), and that the Authentik Application Launch URL strictly matches the OAuth redirect URI configured in Forgejo.
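Because the match has to be exact, including the trailing slash, a quick scripted comparison can catch a mismatch. The sketch below is illustrative only: the helper names are hypothetical, and the `app.ini` path inside the container is an assumption that may differ in this deployment.

```bash
#!/usr/bin/env bash
# Sketch: check that ROOT_URL in app.ini exactly matches the public URL.

get_root_url() {
  # Print the ROOT_URL value from the ini file $1 (first match wins).
  sed -n 's/^[[:space:]]*ROOT_URL[[:space:]]*=[[:space:]]*//p' "$1" | head -n 1
}

check_root_url() {
  # Succeed only if ROOT_URL in file $1 is byte-for-byte equal to $2.
  [ "$(get_root_url "$1")" = "$2" ]
}

# Usage (the in-container path is an assumption):
#   docker exec forgejo cat /data/gitea/conf/app.ini > /tmp/app.ini
#   check_root_url /tmp/app.ini "https://gitforge.kitestacks.com/" \
#     && echo "ROOT_URL ok" || echo "ROOT_URL mismatch"
```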