From f79478158d7404a6720db7fef3f858eb0b70ea3d Mon Sep 17 00:00:00 2001
From: kenpat
Date: Thu, 18 Jun 2026 21:08:10 +0000
Subject: [PATCH] Update DEBUG-DOCUMENTATION with BookStack SSO and kscloud1 SSH incidents

---
 DEBUG-DOCUMENTATION.md | 226 +++++++++++++++++++++++++++++++++++++++--
 1 file changed, 216 insertions(+), 10 deletions(-)

diff --git a/DEBUG-DOCUMENTATION.md b/DEBUG-DOCUMENTATION.md
index 141286f..664fe35 100644
--- a/DEBUG-DOCUMENTATION.md
+++ b/DEBUG-DOCUMENTATION.md
@@ -1,26 +1,232 @@
-# KiteStacks Homelab Debug Documentation
+# KiteStacks Homelab — Debug Documentation
+
+All known incidents, root causes, and fixes. Most recent first.
+
+---
+
+## 2026-06-18 — kscloud1 SSH Key Lost / Cannot SSH
+
+**Symptom:** `Permission denied (publickey,password)` when connecting to kscloud1.
+
+**Root cause:** The SSH public key was removed from kscloud1's `authorized_keys`.
+
+**Fix:**
+1. Open Hetzner Cloud console → VNC terminal → log in as `root`
+2. On monk, serve the public key temporarily from a dedicated directory, so the rest of `~` (including `~/.ssh`) is not exposed:
+   ```bash
+   mkdir -p ~/keyshare && cp ~/.ssh/id_ed25519_kscloud1.pub ~/keyshare/key.txt
+   python3 -m http.server 7777 --directory ~/keyshare
+   ```
+3. In the Hetzner console, type (replace `MONK_TAILSCALE_IP` with monk's Tailscale IP):
+   ```bash
+   curl http://MONK_TAILSCALE_IP:7777/key.txt > /root/.ssh/authorized_keys
+   ```
+4. If root SSH login was disabled:
+   ```bash
+   sed -i 's/^#*PermitRootLogin.*/PermitRootLogin prohibit-password/' /etc/ssh/sshd_config
+   systemctl restart ssh
+   ```
+
+**Note:** The Hetzner VNC console does not support clipboard paste for long strings.
+Serving the key via HTTP from monk's Tailscale IP is the reliable workaround.
+
+---
+
+## 2026-06-18 — BookStack SSO "An Error Occurred / An unknown error occurred"
+
+**Symptom:** Clicking "Login with authentik" on BookStack shows a generic error page.
+No stack trace even with `APP_DEBUG=true`. `laravel.log` is 0 bytes.
+
+**Root cause (3 compounding issues):**
+
+**Issue 1 — Wrong `OIDC_ISSUER_DISCOVER` default**
+BookStack defaults to `OIDC_ISSUER_DISCOVER=false`.
+Without it set to `true`, BookStack does not auto-discover endpoints from Authentik and cannot verify JWT tokens.
+
+**Issue 2 — Authentik `issuer_mode=global` breaks discovery**
+When `OIDC_ISSUER=https://auth.kitestacks.com/` (the global URL), BookStack tries to
+fetch the discovery doc at `https://auth.kitestacks.com/.well-known/openid-configuration`.
+Authentik's global URL returns an HTML login page, not JSON.
+The app crashes silently trying to parse HTML as JSON.
+
+**Issue 3 — Root-owned cache directory blocks write**
+Running `php artisan` commands inside the container as root creates cache subdirectories
+owned by `root:root`. BookStack's PHP process runs as `abc` (UID 1000) and cannot write
+to these directories, causing a `Permission denied` on the first OIDC login attempt.
+This exception is caught by BookStack's generic handler → "An unknown error occurred".
+
+**Fix:**
+
+Step 1 — Change Authentik bookstack provider to `per_provider` issuer mode (`AUTHENTIK_DB_PASSWORD`, `AUTHENTIK_DB_HOST`, and `PROVIDER_ID` are placeholders):
+```bash
+docker run --rm --network host \
+  -e PGPASSWORD="AUTHENTIK_DB_PASSWORD" \
+  postgres:16 psql -h AUTHENTIK_DB_HOST -U authentik authentik -c \
+  "UPDATE authentik_providers_oauth2_oauth2provider SET issuer_mode='per_provider' WHERE provider_ptr_id=PROVIDER_ID;"
+```
+
+Step 2 — Update BookStack compose env vars:
+```yaml
+- OIDC_ISSUER=https://auth.kitestacks.com/application/o/bookstack/
+- OIDC_ISSUER_DISCOVER=true
+```
+
+Step 3 — Fix cache permissions:
+```bash
+docker exec bookstack chown -R abc:users /config/www/framework/cache/
+```
+
+Step 4 — Restart BookStack and test:
+```bash
+docker compose up -d
+# Verify OIDC redirect works
+curl -sc /tmp/c.txt http://localhost:6875/login -o /tmp/l.html
+CSRF=$(grep -oP 'name="_token" value="\K[^"]+' /tmp/l.html | head -1)
+curl -v -b /tmp/c.txt -X POST http://localhost:6875/oidc/login -d "_token=$CSRF" --max-redirs 0 2>&1 | grep -i "Location:"
+# Should show: Location: https://auth.kitestacks.com/application/o/authorize/?...
+```
+
+**Key insight:** When Authentik's `issuer_mode=per_provider`, the discovery doc at
+`https://auth.kitestacks.com/application/o/bookstack/.well-known/openid-configuration`
+returns `issuer: https://auth.kitestacks.com/application/o/bookstack/` — this must match
+`OIDC_ISSUER` exactly for JWT validation to pass.
+
+---
+
+## 2026-06-18 — Portainer OAuth Users Can't See Environments
+
+**Symptom:** After logging in via Authentik SSO, Portainer shows no environments.
+
+**Root cause:** Portainer CE creates OAuth users as Role:2 (regular user). Regular users
+have no access to environments by default — only admins do.
+
+**Fix:** Pre-create the OAuth user as Role:1 (admin) via API *before* their first login (`ADMIN_PASSWORD` is a placeholder):
+```bash
+TOKEN=$(curl -sk -X POST https://portainer.kitestacks.com/api/auth \
+  -H "Content-Type: application/json" \
+  -d '{"username":"admin","password":"ADMIN_PASSWORD"}' | python3 -c "import sys,json; print(json.load(sys.stdin)['jwt'])")
+
+# Note: do NOT include "Password" field for OAuth users
+curl -sk -X POST "https://portainer.kitestacks.com/api/users" \
+  -H "Authorization: Bearer $TOKEN" \
+  -H "Content-Type: application/json" \
+  -d '{"username":"user@example.com","role":1}'
+```
+
+If the user already logged in as Role:2, promote them via API (replace `USER_ID` with the user's Portainer ID):
+```bash
+curl -sk -X PUT "https://portainer.kitestacks.com/api/users/USER_ID" \
+  -H "Authorization: Bearer $TOKEN" \
+  -H "Content-Type: application/json" \
+  -d '{"role":1}'
+```
+
+---
+
+## 2026-06-17 — Cloudflare Tunnel Phantom 3rd Connector
+
+**Symptom:** `cloudflared tunnel info` shows 3 connectors instead of 2.
+Authentik OAuth codes fail with `invalid_grant` intermittently.
+
+**Root cause:** The native cloudflared systemd service on monk was running alongside
+the Docker container — two connectors from the same host, causing session/auth split.
+
+**Fix:**
+```bash
+sudo systemctl disable --now cloudflared
+```
+
+Verify only 2 connectors remain in Cloudflare Zero Trust → Networks → Tunnels.
+
+**Also fixed:** Authentik OAuth2 code TTL bumped from 1 min → 10 min to tolerate
+reconnect windows when monk comes back online.
+
+---
+
+## 2026-06-17 — BookStack MariaDB Crash Loop ("Table 'mysql.db' doesn't exist")
+
+**Symptom:** `bookstack-db` container in crash loop, logs show:
+`Table 'mysql.db' doesn't exist`
+
+**Root cause:** Stale/corrupt data in `./db/` from a previous partial MariaDB initialization.
+
+**Fix:** Wipe the data directory (files are root-owned inside the container):
+```bash
+docker run --rm -v $(pwd)/db:/db alpine sh -c 'rm -rf /db/*'
+docker compose up -d
+```
+
+---
+
+## 2026-06-17 — BookStack "Name does not resolve" for bookstack-db
+
+**Symptom:** BookStack Laravel log shows DB hostname resolution failure on first boot.
+
+**Root cause:** Race condition — BookStack ran DB migrations before MariaDB was fully
+initialized and registered with Docker's embedded DNS (127.0.0.11).
+
+**Fix:** Wait for `bookstack-db` to be healthy, then restart the BookStack container:
+```bash
+docker restart bookstack
+```
+
+---
 
 ## 2026-06-09 — Root CHANGELOG Permission Issue
 
-Problem: CHANGELOG.md could not be read by the normal user.
+**Symptom:** CHANGELOG.md could not be read/written by the normal user.
 
-Cause: CHANGELOG.md was owned by root and had 600 permissions.
+**Root cause:** CHANGELOG.md was owned by root with 600 permissions.
 
-Fix:
+**Fix:**
+```bash
 sudo chown kenpat:kenpat CHANGELOG.md
 chmod 644 CHANGELOG.md
+```
+
+---
 
 ## 2026-06-09 — Repo Folder Ownership Issue
 
-Problem: The repo root folder was owned by root, which prevented creating RUNBOOK.md.
+**Symptom:** Could not create new files in the kitestacks-homelab repo directory.
 
-Fix:
+**Root cause:** Repo root folder was owned by root.
+
+**Fix:**
+```bash
 sudo chown -R kenpat:kenpat /opt/kitestacks-autosync/kitestacks-homelab
+```
 
-## 2026-06-09 — Autosync Changelog Pollution
+---
 
-Problem: CHANGELOG.md contains noisy autosync entries from live app/database files.
+## Diagnostic Quick Reference
 
-Examples: apps/authentik/postgres, apps/forgejo/data, apps/grafana/data, journal files, pg_wal files.
+```bash
+# Check which container is causing issues
+docker ps --format "table {{.Names}}\t{{.Status}}"
 
-Next step: Review autosync excludes so database/session/cache/journal files are not committed or added to changelogs.
+# Tail any service log (CONTAINER_NAME is a placeholder)
+docker logs --tail 50 -f CONTAINER_NAME
+
+# BookStack PHP log
+docker exec bookstack cat /app/www/storage/logs/laravel.log | tail -n 50
+
+# Test BookStack OIDC flow directly
+curl -sc /tmp/c.txt http://localhost:6875/login -o /tmp/l.html && \
+  CSRF=$(grep -oP 'name="_token" value="\K[^"]+' /tmp/l.html | head -1) && \
+  curl -v -b /tmp/c.txt -X POST http://localhost:6875/oidc/login \
+  -d "_token=$CSRF" --max-redirs 0 2>&1 | grep -iE "HTTP|Location"
+
+# Test Authentik discovery document (APP_SLUG is the application slug, e.g. bookstack)
+curl -s https://auth.kitestacks.com/application/o/APP_SLUG/.well-known/openid-configuration | python3 -m json.tool
+
+# Check Cloudflare tunnel connector count
+docker exec cloudflared cloudflared tunnel info
+
+# Check Tailscale connectivity
+tailscale status
+
+# PostgreSQL connectivity check (from monk; AUTHENTIK_DB_PASSWORD and AUTHENTIK_DB_HOST are placeholders)
+docker run --rm --network host -e PGPASSWORD="AUTHENTIK_DB_PASSWORD" \
+  postgres:16 psql -h AUTHENTIK_DB_HOST -U authentik authentik -c "\l"
+```
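
Reviewer sketch, not part of the patch: the quick reference prints the Authentik discovery document, and the BookStack SSO key insight says its `issuer` must match `OIDC_ISSUER` exactly. That comparison could be scripted roughly as below. `issuer_matches` is a hypothetical helper name, and the sketch assumes `python3` is available on the host, as in the commands above.

```bash
#!/usr/bin/env bash
# Hypothetical helper, not part of the patch.
# Reads an OIDC discovery document on stdin and compares its "issuer"
# field with the expected value passed as $1 (the configured OIDC_ISSUER).
# Returns 0 on an exact match, 1 on a mismatch, 2 if stdin is not a JSON object.
issuer_matches() {
  local expected="$1"
  local actual
  # json.load fails on non-JSON input such as an HTML login page.
  if ! actual=$(python3 -c 'import json, sys; print(json.load(sys.stdin).get("issuer", ""))' 2>/dev/null); then
    echo "ERROR: response is not a JSON object (possibly an HTML login page)" >&2
    return 2
  fi
  if [ "$actual" = "$expected" ]; then
    echo "OK: issuer matches"
  else
    echo "MISMATCH: discovery=$actual configured=$expected"
    return 1
  fi
}
```

One way to use it with the BookStack provider URL from the key insight: `curl -s https://auth.kitestacks.com/application/o/bookstack/.well-known/openid-configuration | issuer_matches "https://auth.kitestacks.com/application/o/bookstack/"`. Exit status 2 would correspond to the Issue 2 failure mode, where the endpoint returns HTML instead of JSON.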