KiteStacks Homelab — Debug Documentation

All known incidents, root causes, and fixes. Most recent first.


2026-06-18 — kscloud1 SSH Key Lost / Cannot SSH

Symptom: Permission denied (publickey,password) connecting to kscloud1.

Root cause: SSH public key was removed from kscloud1's authorized_keys.

Fix:

  1. Open Hetzner Cloud console → VNC terminal → log in as root
  2. On monk, serve the public key temporarily:
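    # Serve a directory holding only the public key; serving ~/ would expose ~/.ssh too.
    # (~/keyshare is an arbitrary scratch directory name.)
    mkdir -p ~/keyshare && cp ~/.ssh/id_ed25519_kscloud1.pub ~/keyshare/key.txt
    python3 -m http.server 7777 --bind <MONK_TAILSCALE_IP> --directory ~/keyshare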
    
  3. In Hetzner console, type:
    curl http://<MONK_TAILSCALE_IP>:7777/key.txt > /root/.ssh/authorized_keys
    
  4. If root SSH login was disabled:
    sed -i 's/^#*PermitRootLogin.*/PermitRootLogin prohibit-password/' /etc/ssh/sshd_config
    systemctl restart ssh
    

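Note: The Hetzner VNC console does not support clipboard paste for long strings, so serving the key over HTTP from monk's Tailscale IP is the reliable workaround. The redirect in step 3 overwrites any existing authorized_keys file. Stop the HTTP server on monk (Ctrl+C) once the key has been copied.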
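
Because step 3 writes curl's output straight into authorized_keys, a failed download could leave an empty file or an error page there. The following is a minimal sketch of a sanity check, not part of the original fix. The function name `looks_like_pubkeys` is hypothetical, and the list of accepted key types is an assumption that covers only the common ones.

```shell
# Hypothetical helper: succeed only if FILE contains at least one line that
# looks like "type base64 [comment]" and no non-blank line of any other shape.
looks_like_pubkeys() {
  awk -v pat='^(ssh-ed25519|ssh-rsa|ecdsa-sha2-nistp(256|384|521)) [A-Za-z0-9+/=]+( .*)?$' '
    /^[ \t]*$/ { next }          # skip blank lines
    $0 ~ pat   { good++; next }  # plausible public key line
               { bad++ }         # anything else (HTML, error text, ...)
    END        { exit !(good > 0 && bad == 0) }
  ' "$1"
}
```

Usage, on whichever host holds the file: `looks_like_pubkeys <KEY_FILE> && echo "looks OK"`. This only checks the shape of each line; it does not verify that the key is the expected one.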


2026-06-18 — BookStack SSO "An Error Occurred / An unknown error occurred"

Symptom: Clicking "Login with authentik" on BookStack shows a generic error page. No stack trace even with APP_DEBUG=true. laravel.log is 0 bytes.

Root cause (3 compounding issues):

Issue 1 — Wrong OIDC_ISSUER_DISCOVER default
BookStack defaults to OIDC_ISSUER_DISCOVER=false. Without it set to true, BookStack does not auto-discover endpoints from Authentik and cannot verify JWT tokens.

Issue 2 — Authentik issuer_mode=global breaks discovery
When OIDC_ISSUER=https://auth.kitestacks.com/ (the global URL), BookStack tries to fetch the discovery doc at https://auth.kitestacks.com/.well-known/openid-configuration. Authentik's global URL returns an HTML login page, not JSON. The app crashes silently trying to parse HTML as JSON.

Issue 3 — Root-owned cache directory blocks write
Running php artisan commands inside the container as root creates cache subdirectories owned by root:root. BookStack's PHP process runs as abc (UID 1000) and cannot write to these directories, causing a Permission denied on the first OIDC login attempt. This exception is caught by BookStack's generic handler → "An unknown error occurred".
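
To confirm Issue 3 before changing anything, it may help to list which cache entries are not owned by the runtime user. A minimal sketch follows; the helper name `not_owned_by` is hypothetical. It is a thin wrapper around `find ... ! -user`.

```shell
# Hypothetical helper: print every path under DIR that is NOT owned by USER
# (a user name or numeric UID). Empty output means ownership is consistent.
not_owned_by() {
  user=$1
  dir=$2
  find "$dir" ! -user "$user" -print
}

# Equivalent one-off check against the container from this incident
# (requires Docker, so it is left as a comment here):
#   docker exec bookstack find /config/www/framework/cache ! -user abc -print
```

Running artisan as the runtime user (`docker exec -u abc bookstack ...`) should avoid creating root-owned entries in the first place, assuming the `abc` user described above.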

Fix:

Step 1 — Change Authentik bookstack provider to per_provider issuer mode:

docker run --rm --network host \
  -e PGPASSWORD="<REDACTED>" \
  postgres:16 psql -h <KSCLOUD1_TAILSCALE_IP> -U authentik authentik -c \
  "UPDATE authentik_providers_oauth2_oauth2provider SET issuer_mode='per_provider' WHERE provider_ptr_id=<ID>;"

Step 2 — Update BookStack compose env vars:

- OIDC_ISSUER=https://auth.kitestacks.com/application/o/bookstack/
- OIDC_ISSUER_DISCOVER=true

Step 3 — Fix cache permissions:

docker exec bookstack chown -R abc:users /config/www/framework/cache/

Step 4 — Restart BookStack and test:

docker compose up -d
# Verify OIDC redirect works
curl -sc /tmp/c.txt http://localhost:6875/login -o /tmp/l.html
CSRF=$(grep -oP 'name="_token" value="\K[^"]+' /tmp/l.html | head -1)
curl -v -b /tmp/c.txt -X POST http://localhost:6875/oidc/login -d "_token=$CSRF" --max-redirs 0 2>&1 | grep "Location:"
# Should show: Location: https://auth.kitestacks.com/application/o/authorize/?...

Key insight: When Authentik's issuer_mode=per_provider, the discovery doc at https://auth.kitestacks.com/application/o/bookstack/.well-known/openid-configuration returns issuer: https://auth.kitestacks.com/application/o/bookstack/ — this must match OIDC_ISSUER exactly for JWT validation to pass.
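
The exact-match requirement can be checked mechanically. The sketch below assumes `python3` is available on the host (it is already used for JSON elsewhere in this document). The function name `issuer_matches` is hypothetical. It reads a discovery document on stdin and fails both when the body is not JSON (the issuer_mode=global symptom) and when the issuer differs by even a trailing slash.

```shell
# Hypothetical helper: read a discovery document on stdin and succeed only
# if it is valid JSON whose "issuer" equals the first argument exactly.
issuer_matches() {
  python3 -c '
import json, sys

expected = sys.argv[1]
try:
    doc = json.load(sys.stdin)
except ValueError:
    # An HTML login page (the issuer_mode=global symptom) lands here.
    sys.exit("discovery response is not JSON")

issuer = doc.get("issuer") if isinstance(doc, dict) else None
if issuer != expected:
    sys.exit("issuer mismatch: got %r, expected %r" % (issuer, expected))
' "$1"
}
```

Usage: `curl -s https://auth.kitestacks.com/application/o/bookstack/.well-known/openid-configuration | issuer_matches "https://auth.kitestacks.com/application/o/bookstack/"`. On failure, the reason is printed to stderr and the exit status is non-zero.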


2026-06-18 — Portainer OAuth Users Can't See Environments

Symptom: After logging in via Authentik SSO, Portainer shows no environments.

Root cause: Portainer CE creates OAuth users as Role:2 (regular user). Regular users have no access to environments by default — only admins do.

Fix: Pre-create the OAuth user as Role:1 (admin) via API before their first login:

TOKEN=$(curl -sk -X POST https://portainer.kitestacks.com/api/auth \
  -H "Content-Type: application/json" \
  -d '{"username":"admin","password":"<REDACTED>"}' | python3 -c "import sys,json; print(json.load(sys.stdin)['jwt'])")

# Note: do NOT include "Password" field for OAuth users
curl -sk -X POST "https://portainer.kitestacks.com/api/users" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"username":"user@example.com","role":1}'

If the user already logged in as Role:2, promote them via API:

curl -sk -X PUT "https://portainer.kitestacks.com/api/users/<USER_ID>" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"role":1}'
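
The `<USER_ID>` placeholder has to come from somewhere. One way is to list users via `GET /api/users` and pick the matching entry. The sketch below assumes the response is a JSON array whose objects carry capitalised `Id` and `Username` fields, and that `python3` is available. The function name `portainer_user_id` is hypothetical.

```shell
# Hypothetical helper: read the JSON array from GET /api/users on stdin and
# print the Id of the user whose Username equals the first argument.
# Assumes capitalised field names ("Id", "Username") in the response.
portainer_user_id() {
  python3 -c '
import json, sys

wanted = sys.argv[1]
for user in json.load(sys.stdin):
    if user.get("Username") == wanted:
        print(user["Id"])
        break
else:
    # Loop finished without a match.
    sys.exit("no such user: " + wanted)
' "$1"
}
```

Usage: `curl -sk https://portainer.kitestacks.com/api/users -H "Authorization: Bearer $TOKEN" | portainer_user_id user@example.com`.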

2026-06-17 — Cloudflare Tunnel Phantom 3rd Connector

Symptom: cloudflared tunnel info shows 3 connectors instead of 2. Authentik OAuth codes fail with invalid_grant intermittently.

Root cause: The native cloudflared systemd service on monk was running alongside the Docker container — two connectors from the same host, causing session/auth split.

Fix:

sudo systemctl disable --now cloudflared

Verify only 2 connectors remain in Cloudflare Zero Trust → Networks → Tunnels.

Also fixed: Authentik OAuth2 code TTL bumped from 1 min → 10 min to tolerate reconnect windows when monk comes back online.
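
A quick way to spot this situation on a host is to check both places a connector can come from. This is a minimal sketch, assuming the systemd unit is named `cloudflared` (as in the fix above) and that the container name contains `cloudflared`. The function name `cloudflared_sources` is hypothetical.

```shell
# Hypothetical helper: print one line per place a cloudflared connector
# appears to be running on this host. More than one line suggests the
# duplicate-connector situation described above.
cloudflared_sources() {
  # Native systemd unit.
  if systemctl is-active --quiet cloudflared; then
    echo "systemd:cloudflared"
  fi
  # Running containers whose name contains "cloudflared".
  docker ps --filter name=cloudflared --format '{{.Names}}' |
    sed 's/^/docker:/'
}
```

If this prints both a `systemd:` line and a `docker:` line on the same host, one of the two should be disabled.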


2026-06-17 — BookStack MariaDB Crash Loop ("Table 'mysql.db' doesn't exist")

Symptom: bookstack-db container in crash loop, logs show: Table 'mysql.db' doesn't exist

Root cause: Stale/corrupt data in ./db/ from a previous partial MariaDB initialization.

Fix: Wipe the data directory (files are root-owned inside the container):

docker run --rm -v "$(pwd)/db:/db" alpine sh -c 'find /db -mindepth 1 -delete'
docker compose up -d
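
Since the wipe is irreversible, taking an archive of the directory first is a cheap safeguard. This is a sketch under assumptions, not something the original fix did. The function name `snapshot_dir` is hypothetical, and reading the files may require sudo if they are not readable by the normal user.

```shell
# Hypothetical helper: archive DIR into OUT_DIR as a timestamped .tar.gz
# and print the archive path. Prints nothing if tar fails.
snapshot_dir() {
  dir=$1
  out_dir=$2
  archive="$out_dir/$(basename "$dir")-$(date +%Y%m%d-%H%M%S).tar.gz"
  tar -czf "$archive" -C "$(dirname "$dir")" "$(basename "$dir")" &&
    echo "$archive"
}
```

Usage from the compose directory: `snapshot_dir "$(pwd)/db" /tmp`.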

2026-06-17 — BookStack "Name does not resolve" for bookstack-db

Symptom: BookStack Laravel log shows DB hostname resolution failure on first boot.

Root cause: Race condition — BookStack ran DB migrations before MariaDB was fully initialized and registered with Docker's embedded DNS (127.0.0.11).

Fix: Wait for bookstack-db to be healthy, then restart the BookStack container:

docker restart bookstack
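
The "wait until healthy" step can be scripted instead of watched by hand. A minimal sketch, assuming the bookstack-db container defines a healthcheck (without one, `docker inspect` has no health status to report and the loop simply times out). The function name `wait_healthy` and the 2-second poll interval are arbitrary choices.

```shell
# Hypothetical helper: poll a container's health status until it reports
# "healthy" or TIMEOUT seconds (default 120) have passed.
wait_healthy() {
  name=$1
  timeout=${2:-120}
  waited=0
  status=unknown
  while [ "$waited" -lt "$timeout" ]; do
    # Fall back to "unknown" if the container or its Health field is missing.
    status=$(docker inspect --format '{{.State.Health.Status}}' "$name" 2>/dev/null) || status=unknown
    if [ "$status" = "healthy" ]; then
      return 0
    fi
    sleep 2
    waited=$((waited + 2))
  done
  echo "timed out waiting for $name (last status: $status)" >&2
  return 1
}
```

Usage: `wait_healthy bookstack-db && docker restart bookstack`. A longer-term option is to have Compose enforce the ordering with `depends_on` and `condition: service_healthy`, which also requires a healthcheck on the database service.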

2026-06-09 — Root CHANGELOG Permission Issue

Symptom: CHANGELOG.md could not be read/written by the normal user.

Root cause: CHANGELOG.md was owned by root with 600 permissions.

Fix:

sudo chown kenpat:kenpat CHANGELOG.md
chmod 644 CHANGELOG.md

2026-06-09 — Repo Folder Ownership Issue

Symptom: Could not create new files in the kitestacks-homelab repo directory.

Root cause: Repo root folder was owned by root.

Fix:

sudo chown -R kenpat:kenpat /opt/kitestacks-autosync/kitestacks-homelab

Diagnostic Quick Reference

# Check which container is causing issues
docker ps --format "table {{.Names}}\t{{.Status}}"

# Tail any service log
docker logs <container> --tail 50 -f

# BookStack PHP log
docker exec bookstack tail -n 50 /app/www/storage/logs/laravel.log

# Test BookStack OIDC flow directly
curl -sc /tmp/c.txt http://localhost:6875/login -o /tmp/l.html && \
  CSRF=$(grep -oP 'name="_token" value="\K[^"]+' /tmp/l.html | head -1) && \
  curl -v -b /tmp/c.txt -X POST http://localhost:6875/oidc/login \
  -d "_token=$CSRF" --max-redirs 0 2>&1 | grep -E "HTTP|Location"

# Test Authentik discovery document
curl -s https://auth.kitestacks.com/application/o/<slug>/.well-known/openid-configuration | python3 -m json.tool

# Check Cloudflare tunnel connector count
docker exec cloudflared cloudflared tunnel info <TUNNEL_ID>

# Check Tailscale connectivity
tailscale status

# PostgreSQL connectivity check (from monk)
docker run --rm --network host -e PGPASSWORD="<REDACTED>" \
  postgres:16 psql -h <KSCLOUD1_TAILSCALE_IP> -U authentik authentik -c "\l"
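
When many containers are running, the status table is easier to scan if only the troubled ones are shown. A minimal sketch, assuming status strings of the form Docker normally prints ("Restarting (1) ...", "Up ... (unhealthy)", "Exited (1) ..."). The function name `flag_problem_containers` and the `|` delimiter are choices made for this example.

```shell
# Hypothetical helper: read "name|status" lines on stdin and print only the
# containers whose status suggests trouble.
flag_problem_containers() {
  awk -F '|' '$2 ~ /Restarting|unhealthy|Exited|Dead/ { print $1 ": " $2 }'
}
```

Usage: `docker ps -a --format '{{.Names}}|{{.Status}}' | flag_problem_containers`.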