kitestacks-homelab/DEBUG-DOCUMENTATION.md
kenpat fb822d5142 Reorganize repos into kitestacks-homelab + plain-English doc rewrite
2026-06-18 18:37:58 -05:00

KiteStacks Homelab — Problems We've Seen and How We Fixed Them

Newest problems at the top.


2026-06-18 — Can't SSH into kscloud1

What happened: Trying to connect to the cloud machine (kscloud1) gave a "Permission denied" error. The SSH key was missing from the machine.

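How we found it: The full error was Permission denied (publickey,password). The part in parentheses lists the login methods the server still accepts. Seeing it means the server rejected every SSH key we offered, and no password got us in either.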

How we fixed it:

  1. Used Hetzner's browser console (like a TV remote for the server) to log in as root
  2. Served the SSH public key from monk as a temporary download:
    # On monk: copy only the public key into an empty folder and share that folder.
    # Sharing ~/ would also expose the private keys in ~/.ssh. Press Ctrl+C when done.
    mkdir -p ~/keyshare
    cp ~/.ssh/id_ed25519_kscloud1.pub ~/keyshare/key.txt
    python3 -m http.server 7777 --bind MONK_TAILSCALE_IP --directory ~/keyshare
    
  3. Downloaded it from the Hetzner console:
    mkdir -p -m 700 /root/.ssh
    curl -fsS http://MONK_TAILSCALE_IP:7777/key.txt >> /root/.ssh/authorized_keys
    
  4. If the machine had root SSH login disabled:
    sed -i 's/^#*PermitRootLogin.*/PermitRootLogin prohibit-password/' /etc/ssh/sshd_config
    systemctl restart ssh
    

Why this works: The Hetzner console bypasses SSH entirely — it's like plugging a keyboard and monitor directly into the server. So even when SSH is broken, you can still type commands.
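
Step 3 writes the key straight into authorized_keys. A slightly more careful version is sketched below. It only adds the key if that exact line is not already there, so running it twice does no harm and other keys are left alone. The helper name `add_key_once` is made up for this example and is not part of the original fix.

```shell
# Hypothetical helper: add a public key to an authorized_keys file
# only if that exact line is not already in it.
add_key_once() {
  key_file=$1
  auth_file=$2
  auth_dir=$(dirname "$auth_file")

  # With StrictModes (the sshd default), keys are refused if the folder
  # or file can be written by other users, so lock both down.
  mkdir -p "$auth_dir" && chmod 700 "$auth_dir"
  touch "$auth_file" && chmod 600 "$auth_file"

  key=$(cat "$key_file")
  # -F: plain text match, -x: whole line must match, -q: no output
  if grep -qxF -e "$key" "$auth_file"; then
    echo "already present"
  else
    printf '%s\n' "$key" >> "$auth_file"
    echo "added"
  fi
}
```

On the cloud machine this might be used as `curl -fsS http://MONK_TAILSCALE_IP:7777/key.txt -o /tmp/key.txt && add_key_once /tmp/key.txt /root/.ssh/authorized_keys`.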


2026-06-18 — BookStack Login Said "An Error Occurred"

What happened: Clicking "Login with Authentik" on the wiki showed a generic error. No details, no clues — just "An unknown error occurred."

Why it happened (three problems at once):

Problem 1: Missing setting in BookStack
BookStack needs OIDC_ISSUER_DISCOVER=true to automatically look up the login endpoints and signing keys from Authentik. Without it, BookStack can't verify login tokens.

Problem 2: Authentik was using the wrong login URL format
Authentik can either use one shared issuer URL for all apps or a unique URL per app. BookStack expects a per-app URL. When the wrong type was set, BookStack tried to download login instructions from a URL that returned an HTML page instead of JSON data, and then crashed trying to read it.

Problem 3: File permission error hidden by BookStack
Running a setup command inside the BookStack container as root created some folders that only root could write to. When the normal BookStack process tried to save a login session, it couldn't, and BookStack showed a generic error instead of the real one.

How we fixed it:

Step 1 — Change Authentik to use per-app URLs (run this once):

docker run --rm --network host \
  -e PGPASSWORD="YOUR_DB_PASSWORD" \
  postgres:16 psql -h KSCLOUD1_TAILSCALE_IP -U authentik authentik -c \
  "UPDATE authentik_providers_oauth2_oauth2provider SET issuer_mode='per_provider' WHERE provider_ptr_id=PROVIDER_ID;"

Step 2 — Make sure BookStack's settings include:

OIDC_ISSUER=https://auth.kitestacks.com/application/o/bookstack/
OIDC_ISSUER_DISCOVER=true

Step 3 — Fix the file permission problem:

docker exec bookstack chown -R abc:users /config/www/framework/cache/

Step 4 — Restart BookStack:

docker compose up -d
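
To confirm the issuer URL really serves data and not an HTML page, you can fetch the discovery document yourself. The sketch below assumes the standard OpenID Connect layout, where the discovery document lives at the issuer URL plus `/.well-known/openid-configuration`. Both helper names are made up for this example.

```shell
# Build the discovery URL that goes with an OIDC issuer URL.
discovery_url() {
  issuer=${1%/}   # drop one trailing slash so we never produce "//"
  printf '%s/.well-known/openid-configuration\n' "$issuer"
}

# Succeed only if the text on stdin starts with "{",
# meaning it looks like JSON and not like an HTML page.
looks_like_json() {
  first=$(tr -d ' \t\r\n' | cut -c1)
  [ "$first" = "{" ]
}
```

For example: `curl -fsS "$(discovery_url https://auth.kitestacks.com/application/o/bookstack/)" | looks_like_json && echo "discovery OK"`. If nothing is printed, the URL is probably returning a web page, which points back at Problem 2.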

2026-06-18 — Portainer OAuth Login Couldn't See Any Servers

What happened: Logged in through Authentik, got into Portainer, but no environments (no servers, nothing to manage) were visible.

Why it happened: Portainer creates new SSO users as "regular users." A regular user only sees the environments they have been given access to, and a brand-new user has been given none. Admins see everything. The fix is to create the user as an admin before they log in for the first time.

How we fixed it:

Create the user as admin before first login:

# Get a temporary auth token
TOKEN=$(curl -sk -X POST https://portainer.kitestacks.com/api/auth \
  -H "Content-Type: application/json" \
  -d '{"username":"admin","password":"YOUR_PASSWORD"}' | python3 -c "import sys,json; print(json.load(sys.stdin)['jwt'])")

# Create the user with admin role (role 1 = admin)
curl -sk -X POST "https://portainer.kitestacks.com/api/users" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"username":"user@example.com","role":1}'

If they already logged in as a regular user, promote them:

curl -sk -X PUT "https://portainer.kitestacks.com/api/users/USER_ID" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"role":1}'
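
The promote command needs a numeric USER_ID. One way to look it up is to list the users and pick out the right one. This sketch uses python3 for the JSON, the same way the token command does. It assumes the user list comes back as JSON objects with "Id" and "Username" fields, and the helper name is made up for this example.

```shell
# Hypothetical helper: read the JSON user list on stdin and print
# the numeric Id of the user whose Username matches the argument.
portainer_user_id() {
  python3 -c '
import json, sys

wanted = sys.argv[1]
for user in json.load(sys.stdin):
    if user.get("Username") == wanted:
        print(user["Id"])
        break
else:
    sys.exit(1)  # nobody with that name
' "$1"
}
```

Used with the token from earlier: `curl -s https://portainer.kitestacks.com/api/users -H "Authorization: Bearer $TOKEN" | portainer_user_id user@example.com`.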

2026-06-17 — Three Cloudflare Connectors Instead of Two

What happened: The Cloudflare dashboard was showing 3 tunnel connectors when there should only be 2 (one from monk, one from kscloud1). This caused Authentik logins to fail randomly — about half the time, the code from the login form would reach the wrong connector and get rejected.

Why it happened: The system's built-in cloudflared service was still running on monk, alongside the Docker container version. So monk was connecting to Cloudflare twice.

How we fixed it:

sudo systemctl disable --now cloudflared

That stopped the duplicate. Now only the Docker container runs.

After fixing: verified only 2 connectors in Cloudflare Zero Trust → Networks → Tunnels.
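
You can also check for a duplicate from monk itself, without opening the dashboard. Programs running inside Docker containers still show up in the host's process list, so counting cloudflared processes catches both the system service and the container. The helper name below is made up for this example.

```shell
# Hypothetical helper: count how many lines on stdin are exactly "cloudflared".
count_cloudflared() {
  # grep -c prints the count, but exits non-zero when the count is 0,
  # so "|| true" keeps the helper from looking like a failure.
  grep -c -x 'cloudflared' || true
}
```

Run it as `ps -eo comm= | count_cloudflared`. With only the Docker container running, the expected answer on monk is 1. A 2 would suggest the system service has come back.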


2026-06-17 — BookStack Database Kept Crashing

What happened: The BookStack database container (bookstack-db) kept restarting and never stayed running. Logs showed: Table 'mysql.db' doesn't exist

Why it happened: The database's data folder had leftover files from a previous incomplete setup. When MariaDB started, it saw partial old data and crashed trying to use it.

How we fixed it:

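# Stop the stack so nothing is using the database files
docker compose down

# Wipe the broken database files (they're owned by root inside the container).
# This deletes ALL data in ./db, so only do it on a fresh install or after a backup.
docker run --rm -v "$(pwd)/db:/db" alpine sh -c 'rm -rf /db/* /db/.[!.]*'

# Start fresh
docker compose up -d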

2026-06-17 — BookStack Said It Couldn't Find the Database

What happened: BookStack started but immediately errored saying it couldn't connect to the database (bookstack-db).

Why it happened: BookStack was too fast. It started before the database was fully ready, and when it tried to find bookstack-db on the internal network, Docker hadn't finished registering it yet.

How we fixed it:

# Just wait a few seconds and restart BookStack
docker restart bookstack

That's it — the database had finished starting up by then.
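
Instead of guessing how long "a few seconds" is, a small retry loop can keep checking until the database answers and only then restart BookStack. This is a sketch, and the helper name `wait_for` is made up for this example.

```shell
# Hypothetical helper: wait_for TRIES DELAY COMMAND...
# Runs COMMAND up to TRIES times, sleeping DELAY seconds between tries.
# Returns 0 as soon as COMMAND succeeds, or 1 if it never does.
wait_for() {
  tries=$1
  delay=$2
  shift 2
  n=0
  while [ "$n" -lt "$tries" ]; do
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    n=$((n + 1))
    # No point sleeping after the final failed try
    if [ "$n" -lt "$tries" ]; then
      sleep "$delay"
    fi
  done
  return 1
}
```

For example: `wait_for 30 2 docker exec bookstack-db mariadb-admin ping --silent && docker restart bookstack`. The readiness command here is an assumption. Swap in whatever check your database image provides.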


Quick Diagnostic Commands

# See which containers are running (and which are crashing)
docker ps --format "table {{.Names}}\t{{.Status}}"

# Follow the live logs of any service
docker logs CONTAINER_NAME --tail 50 -f

# Read the last 50 lines of BookStack's application (Laravel) log
docker exec bookstack tail -n 50 /app/www/storage/logs/laravel.log

# Test if BookStack's login redirect works
curl -sc /tmp/c.txt http://localhost:6875/login -o /tmp/l.html && \
  CSRF=$(grep -oP 'name="_token" value="\K[^"]+' /tmp/l.html | head -1) && \
  curl -v -b /tmp/c.txt -X POST http://localhost:6875/oidc/login \
  -d "_token=$CSRF" --max-redirs 0 2>&1 | grep -E "HTTP|Location"
# Should show: Location: https://auth.kitestacks.com/application/o/authorize/?...

# Check Tailscale connections between machines
tailscale status

# See if both Cloudflare connectors are working
docker exec cloudflared cloudflared tunnel info TUNNEL_ID
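
The first command above lists every container. When there are many, a small filter can print only the ones that need attention. This sketch looks at Docker's status text for "Restarting", "Exited", or "(unhealthy)". The helper name and the `|` separator are choices made for this example.

```shell
# Hypothetical helper: read "name|status" lines on stdin and print the
# names of containers that are restarting, exited, or unhealthy.
flag_problem_containers() {
  awk -F '|' '$2 ~ /^Restarting/ || $2 ~ /^Exited/ || $2 ~ /\(unhealthy\)/ { print $1 }'
}
```

Run it as `docker ps -a --format '{{.Names}}|{{.Status}}' | flag_problem_containers`. No output means nothing looks broken.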