Reorganize repos into kitestacks-homelab + plain-English doc rewrite
- Rewrote RUNBOOK.md and DEBUG-DOCUMENTATION.md in simple 5th-grade language with real-world analogies for every technical concept
- Updated README.md with current service inventory and folder map
- Added cloud-migration/ subdirectory (from kitestacks-cloud-migration repo)
- Added autosync/ subdirectory (from kitestacks-homelab-autosync-test repo)
- Added osticket/ subdirectory (from OSTicketSystem repo)
- Added cloud/ placeholder for future cloud configs
- Excluded binary DB/postgres files from autosync subdirectory

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
parent f79478158d
commit fb822d5142
75 changed files with 11711 additions and 338 deletions
|
|
@@ -1,121 +1,120 @@
|
|||
# KiteStacks Homelab — Debug Documentation
|
||||
# KiteStacks Homelab — Problems We've Seen and How We Fixed Them
|
||||
|
||||
All known incidents, root causes, and fixes. Most recent first.
|
||||
Newest problems at the top.
|
||||
|
||||
---
|
||||
|
||||
## 2026-06-18 — kscloud1 SSH Key Lost / Cannot SSH
|
||||
## 2026-06-18 — Can't SSH into kscloud1
|
||||
|
||||
**Symptom:** `Permission denied (publickey,password)` connecting to kscloud1.
|
||||
**What happened:** Trying to connect to the cloud machine (kscloud1) gave a
|
||||
"Permission denied" error. The SSH key was missing from the machine.
|
||||
|
||||
**Root cause:** SSH public key was removed from kscloud1's `authorized_keys`.
|
||||
**How we found it:** The error message listed `publickey,password`. Those are the login
methods the server was still offering, so the client had run out of options: its SSH key
was rejected and there was no working password either.
|
||||
|
||||
**Fix:**
|
||||
1. Open Hetzner Cloud console → VNC terminal → log in as `root`
|
||||
2. On monk, serve the public key temporarily:
|
||||
**How we fixed it:**
|
||||
1. Used Hetzner's browser console (like a TV remote for the server) to log in as root
|
||||
2. Served the SSH public key from monk as a temporary download:
|
||||
```bash
|
||||
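# On monk: share ONLY the public key over a mini web server (never serve all of ~/, it contains private keys)
mkdir -p ~/keyshare && cp ~/.ssh/id_ed25519_kscloud1.pub ~/keyshare/key.txt
python3 -m http.server 7777 --directory ~/keyshare --bind MONK_TAILSCALE_IP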
|
||||
```
|
||||
3. In Hetzner console, type:
|
||||
3. Downloaded it from the Hetzner console:
|
||||
```bash
|
||||
curl http://<MONK_TAILSCALE_IP>:7777/key.txt > /root/.ssh/authorized_keys
|
||||
curl http://MONK_TAILSCALE_IP:7777/key.txt > /root/.ssh/authorized_keys
|
||||
```
|
||||
4. If root SSH login was disabled:
|
||||
4. If the machine had root SSH login disabled:
|
||||
```bash
|
||||
sed -i 's/^#*PermitRootLogin.*/PermitRootLogin prohibit-password/' /etc/ssh/sshd_config
|
||||
systemctl restart ssh
|
||||
```
|
||||
|
||||
**Note:** Hetzner VNC console does not support clipboard paste for long strings.
|
||||
Serving the key via HTTP from monk's Tailscale IP is the reliable workaround.
|
||||
**Why this works:** The Hetzner console bypasses SSH entirely — it's like plugging a
|
||||
keyboard and monitor directly into the server. So even when SSH is broken, you can still
|
||||
type commands.
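
The download step above overwrites `authorized_keys`, which is fine when the file is empty but would drop any other keys already in it. A minimal sketch of a gentler install, assuming a hypothetical helper named `install_pubkey` (not part of the original fix), could look like this:

```bash
# Hypothetical helper: append a public key to an authorized_keys file
# only if it is not already there, and set permissions SSH will accept.
install_pubkey() {
  local key_file="$1" auth_file="$2" key
  key="$(cat "$key_file")" || return 1
  [ -n "$key" ] || return 1          # refuse to install an empty key

  mkdir -p "$(dirname "$auth_file")"
  chmod 700 "$(dirname "$auth_file")"
  touch "$auth_file"

  # If the file's last line has no trailing newline, add one first
  # so the new key does not get glued onto an existing key.
  if [ -s "$auth_file" ] && [ -n "$(tail -c1 "$auth_file")" ]; then
    echo >> "$auth_file"
  fi

  # -q quiet, -x whole-line match, -F fixed string (no regex)
  if ! grep -qxF "$key" "$auth_file"; then
    printf '%s\n' "$key" >> "$auth_file"
  fi
  chmod 600 "$auth_file"
}
```

On kscloud1 this might be used as `curl -fsS http://MONK_TAILSCALE_IP:7777/key.txt -o /tmp/key.txt && install_pubkey /tmp/key.txt /root/.ssh/authorized_keys`. Running it twice leaves only one copy of the key.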
|
||||
|
||||
---
|
||||
|
||||
## 2026-06-18 — BookStack SSO "An Error Occurred / An unknown error occurred"
|
||||
## 2026-06-18 — BookStack Login Said "An Error Occurred"
|
||||
|
||||
**Symptom:** Clicking "Login with authentik" on BookStack shows a generic error page.
|
||||
No stack trace even with `APP_DEBUG=true`. `laravel.log` is 0 bytes.
|
||||
**What happened:** Clicking "Login with Authentik" on the wiki showed a generic error.
|
||||
No details, no clues — just "An unknown error occurred."
|
||||
|
||||
**Root cause (3 compounding issues):**
|
||||
**Why it happened (three problems at once):**
|
||||
|
||||
**Issue 1 — Wrong `OIDC_ISSUER_DISCOVER` default**
|
||||
BookStack defaults to `OIDC_ISSUER_DISCOVER=false`. Without it set to `true`, BookStack
|
||||
does not auto-discover endpoints from Authentik and cannot verify JWT tokens.
|
||||
**Problem 1 — Missing setting in BookStack**
|
||||
BookStack needs `OIDC_ISSUER_DISCOVER=true` to automatically find all the login
|
||||
endpoints from Authentik. Without it, BookStack can't verify login tokens.
|
||||
|
||||
**Issue 2 — Authentik `issuer_mode=global` breaks discovery**
|
||||
When `OIDC_ISSUER=https://auth.kitestacks.com/` (the global URL), BookStack tries to
|
||||
fetch the discovery doc at `https://auth.kitestacks.com/.well-known/openid-configuration`.
|
||||
Authentik's global URL returns an HTML login page, not JSON.
|
||||
The app crashes silently trying to parse HTML as JSON.
|
||||
**Problem 2 — Authentik was using the wrong login URL format**
|
||||
Authentik can either use one shared URL for all apps or a unique URL per app.
|
||||
BookStack expects a per-app URL. When the wrong type was set, BookStack tried to
|
||||
download login instructions from a URL that returned an HTML page instead of data,
|
||||
and then crashed trying to read it.
|
||||
|
||||
**Issue 3 — Root-owned cache directory blocks write**
|
||||
Running `php artisan` commands inside the container as root creates cache subdirectories
|
||||
owned by `root:root`. BookStack's PHP process runs as `abc` (UID 1000) and cannot write
|
||||
to these directories, causing a `Permission denied` on the first OIDC login attempt.
|
||||
This exception is caught by BookStack's generic handler → "An unknown error occurred".
|
||||
**Problem 3 — File permission error hidden by BookStack**
|
||||
Running a setup command inside the BookStack container as root created some folders
|
||||
that only root could write to. When the normal BookStack process tried to save
|
||||
a login session, it couldn't — and BookStack showed a generic error instead of
|
||||
the real one.
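
Because BookStack hides the real error, it can help to look for wrongly owned folders directly. The sketch below assumes a hypothetical helper named `find_not_owned_by`; it only wraps the standard `find ... ! -user` test and is not something BookStack provides:

```bash
# Hypothetical helper: list everything under a directory that is NOT owned
# by the given user (name or numeric ID). No output means ownership is consistent.
find_not_owned_by() {
  local dir="$1" owner="$2"
  find "$dir" ! -user "$owner" -print
}
```

Inside the container the same test could be run as `docker exec bookstack find /config/www/framework/cache ! -user abc`, assuming the container's `find` supports `-user`. Any path it prints is a candidate for the `chown` fix.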
|
||||
|
||||
**Fix:**
|
||||
**How we fixed it:**
|
||||
|
||||
Step 1 — Change Authentik bookstack provider to `per_provider` issuer mode:
|
||||
Step 1 — Change Authentik to use per-app URLs (run this once):
|
||||
```bash
|
||||
docker run --rm --network host \
|
||||
-e PGPASSWORD="<REDACTED>" \
|
||||
postgres:16 psql -h <KSCLOUD1_TAILSCALE_IP> -U authentik authentik -c \
|
||||
"UPDATE authentik_providers_oauth2_oauth2provider SET issuer_mode='per_provider' WHERE provider_ptr_id=<ID>;"
|
||||
-e PGPASSWORD="YOUR_DB_PASSWORD" \
|
||||
postgres:16 psql -h KSCLOUD1_TAILSCALE_IP -U authentik authentik -c \
|
||||
"UPDATE authentik_providers_oauth2_oauth2provider SET issuer_mode='per_provider' WHERE provider_ptr_id=PROVIDER_ID;"
|
||||
```
|
||||
|
||||
Step 2 — Update BookStack compose env vars:
|
||||
```yaml
|
||||
- OIDC_ISSUER=https://auth.kitestacks.com/application/o/bookstack/
|
||||
- OIDC_ISSUER_DISCOVER=true
|
||||
Step 2 — Make sure BookStack's settings include:
|
||||
```
|
||||
OIDC_ISSUER=https://auth.kitestacks.com/application/o/bookstack/
|
||||
OIDC_ISSUER_DISCOVER=true
|
||||
```
|
||||
|
||||
Step 3 — Fix cache permissions:
|
||||
Step 3 — Fix the file permission problem:
|
||||
```bash
|
||||
docker exec bookstack chown -R abc:users /config/www/framework/cache/
|
||||
```
|
||||
|
||||
Step 4 — Restart BookStack and test:
|
||||
Step 4 — Restart BookStack:
|
||||
```bash
|
||||
docker compose up -d
|
||||
# Verify OIDC redirect works
|
||||
curl -sc /tmp/c.txt http://localhost:6875/login -o /tmp/l.html
|
||||
CSRF=$(grep -oP 'name="_token" value="\K[^"]+' /tmp/l.html | head -1)
|
||||
curl -v -b /tmp/c.txt -X POST http://localhost:6875/oidc/login -d "_token=$CSRF" --max-redirs 0 2>&1 | grep "Location:"
|
||||
# Should show: Location: https://auth.kitestacks.com/application/o/authorize/?...
|
||||
```
|
||||
|
||||
**Key insight:** When Authentik's `issuer_mode=per_provider`, the discovery doc at
|
||||
`https://auth.kitestacks.com/application/o/bookstack/.well-known/openid-configuration`
|
||||
returns `issuer: https://auth.kitestacks.com/application/o/bookstack/` — this must match
|
||||
`OIDC_ISSUER` exactly for JWT validation to pass.
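
The exact-match rule can be checked before touching BookStack at all. This is a rough sketch, assuming a hypothetical helper named `issuer_matches` that reads a discovery document on standard input. It uses a simple `sed` pattern rather than a real JSON parser, so treat it as a quick check only:

```bash
# Hypothetical helper: read an OIDC discovery document on stdin and succeed
# only if its "issuer" value equals the expected string character for character.
# Simplification: assumes "issuer" is a plain string with no escaped quotes.
issuer_matches() {
  local expected="$1" actual
  actual="$(tr -d '\n' | sed -n 's/.*"issuer"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p')"
  if [ "$actual" = "$expected" ]; then
    return 0
  fi
  printf 'issuer mismatch: got "%s", expected "%s"\n' "$actual" "$expected" >&2
  return 1
}
```

For example: `curl -s https://auth.kitestacks.com/application/o/bookstack/.well-known/openid-configuration | issuer_matches "https://auth.kitestacks.com/application/o/bookstack/"`. A missing trailing slash counts as a mismatch.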
|
||||
|
||||
---
|
||||
|
||||
## 2026-06-18 — Portainer OAuth Users Can't See Environments
|
||||
## 2026-06-18 — Portainer OAuth Login Couldn't See Any Servers
|
||||
|
||||
**Symptom:** After logging in via Authentik SSO, Portainer shows no environments.
|
||||
**What happened:** Logged in through Authentik, got into Portainer, but no environments
|
||||
(no servers, nothing to manage) were visible.
|
||||
|
||||
**Root cause:** Portainer CE creates OAuth users as Role:2 (regular user). Regular users
|
||||
have no access to environments by default — only admins do.
|
||||
**Why it happened:** Portainer creates new SSO users as "regular users." Regular users
|
||||
can't see environments — only admins can. The fix is to create the user as an admin
|
||||
**before** they log in for the first time.
|
||||
|
||||
**Fix:** Pre-create the OAuth user as Role:1 (admin) via API *before* their first login:
|
||||
**How we fixed it:**
|
||||
|
||||
Create the user as admin before first login:
|
||||
```bash
|
||||
# Get a temporary auth token
|
||||
TOKEN=$(curl -sk -X POST https://portainer.kitestacks.com/api/auth \
|
||||
-H "Content-Type: application/json" \
|
||||
-d '{"username":"admin","password":"<REDACTED>"}' | python3 -c "import sys,json; print(json.load(sys.stdin)['jwt'])")
|
||||
-d '{"username":"admin","password":"YOUR_PASSWORD"}' | python3 -c "import sys,json; print(json.load(sys.stdin)['jwt'])")
|
||||
|
||||
# Note: do NOT include "Password" field for OAuth users
|
||||
# Create the user with admin role (role 1 = admin)
|
||||
curl -sk -X POST "https://portainer.kitestacks.com/api/users" \
|
||||
-H "Authorization: Bearer $TOKEN" \
|
||||
-H "Content-Type: application/json" \
|
||||
-d '{"username":"user@example.com","role":1}'
|
||||
```
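
To avoid mixing up the role numbers, the request body can be built by a small function. This is only a sketch: `portainer_user_payload` is a hypothetical name, and the role numbers (1 = admin, 2 = regular user) are the ones described above.

```bash
# Hypothetical helper: build the JSON body for creating a Portainer user.
# Role numbers follow this document: 1 = admin, 2 = regular user.
# No password field is emitted, matching the note about OAuth users.
portainer_user_payload() {
  local username="$1" role_name="$2" role
  case "$role_name" in
    admin)   role=1 ;;
    regular) role=2 ;;
    *) printf 'unknown role: %s\n' "$role_name" >&2; return 1 ;;
  esac
  # Reject characters that would need JSON escaping instead of guessing at it.
  case "$username" in
    *\"*|*\\*) printf 'unsupported character in username\n' >&2; return 1 ;;
  esac
  printf '{"username":"%s","role":%d}\n' "$username" "$role"
}
```

The output can be passed straight to curl, for example `-d "$(portainer_user_payload user@example.com admin)"`.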
|
||||
|
||||
If the user already logged in as Role:2, promote them via API:
|
||||
If they already logged in as a regular user, promote them:
|
||||
```bash
|
||||
curl -sk -X PUT "https://portainer.kitestacks.com/api/users/<USER_ID>" \
|
||||
curl -sk -X PUT "https://portainer.kitestacks.com/api/users/USER_ID" \
|
||||
-H "Authorization: Bearer $TOKEN" \
|
||||
-H "Content-Type: application/json" \
|
||||
-d '{"role":1}'
|
||||
|
|
@@ -123,110 +122,88 @@ curl -sk -X PUT "https://portainer.kitestacks.com/api/users/<USER_ID>" \
|
|||
|
||||
---
|
||||
|
||||
## 2026-06-17 — Cloudflare Tunnel Phantom 3rd Connector
|
||||
## 2026-06-17 — Three Cloudflare Connectors Instead of Two
|
||||
|
||||
**Symptom:** `cloudflared tunnel info` shows 3 connectors instead of 2.
|
||||
Authentik OAuth codes fail with `invalid_grant` intermittently.
|
||||
**What happened:** The Cloudflare dashboard was showing 3 tunnel connectors when there
|
||||
should only be 2 (one from monk, one from kscloud1). This caused Authentik logins to
|
||||
fail randomly — about half the time, the code from the login form would reach the wrong
|
||||
connector and get rejected.
|
||||
|
||||
**Root cause:** The native cloudflared systemd service on monk was running alongside
|
||||
the Docker container — two connectors from the same host, causing session/auth split.
|
||||
**Why it happened:** The system's built-in cloudflared service was still running on monk,
|
||||
alongside the Docker container version. So monk was connecting to Cloudflare twice.
|
||||
|
||||
**Fix:**
|
||||
**How we fixed it:**
|
||||
```bash
|
||||
sudo systemctl disable --now cloudflared
|
||||
```
|
||||
|
||||
Verify only 2 connectors remain in Cloudflare Zero Trust → Networks → Tunnels.
|
||||
That stopped the duplicate. Now only the Docker container runs.
|
||||
|
||||
**Also fixed:** Authentik OAuth2 code TTL bumped from 1 min → 10 min to tolerate
|
||||
reconnect windows when monk comes back online.
|
||||
After fixing: verified only 2 connectors in Cloudflare Zero Trust → Networks → Tunnels.
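
A quick way to catch this on any host is to ask both systemd and Docker whether they are running cloudflared. The sketch below assumes a hypothetical function named `cloudflared_sources`, and that the systemd unit and the container are both called `cloudflared`, as they are in this document:

```bash
# Hypothetical check: print one line for each way cloudflared is running here.
# Two lines of output means this machine is connecting to Cloudflare twice.
cloudflared_sources() {
  if systemctl is-active --quiet cloudflared 2>/dev/null; then
    echo "systemd"
  fi
  if docker ps --filter "name=cloudflared" --format '{{.Names}}' 2>/dev/null | grep -q .; then
    echo "docker"
  fi
}
```

On monk after the fix, this should print only `docker`.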
|
||||
|
||||
---
|
||||
|
||||
## 2026-06-17 — BookStack MariaDB Crash Loop ("Table 'mysql.db' doesn't exist")
|
||||
## 2026-06-17 — BookStack Database Kept Crashing
|
||||
|
||||
**Symptom:** `bookstack-db` container in crash loop, logs show:
|
||||
`Table 'mysql.db' doesn't exist`
|
||||
**What happened:** The BookStack database container (bookstack-db) kept restarting
|
||||
and never stayed running. Logs showed: `Table 'mysql.db' doesn't exist`
|
||||
|
||||
**Root cause:** Stale/corrupt data in `./db/` from a previous partial MariaDB initialization.
|
||||
**Why it happened:** The database's data folder had leftover files from a previous
|
||||
incomplete setup. When MariaDB started, it saw partial old data and crashed trying
|
||||
to use it.
|
||||
|
||||
**Fix:** Wipe the data directory (files are root-owned inside the container):
|
||||
**How we fixed it:**
|
||||
```bash
|
||||
# Wipe the broken database files (they're owned by root inside the container).
# This permanently deletes everything in ./db, so make sure nothing in it is worth keeping.
docker run --rm -v "$(pwd)/db:/db" alpine sh -c 'rm -rf /db/*'
|
||||
|
||||
# Start fresh
|
||||
docker compose up -d
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 2026-06-17 — BookStack "Name does not resolve" for bookstack-db
|
||||
## 2026-06-17 — BookStack Said It Couldn't Find the Database
|
||||
|
||||
**Symptom:** BookStack Laravel log shows DB hostname resolution failure on first boot.
|
||||
**What happened:** BookStack started but immediately errored saying it couldn't connect
|
||||
to the database (bookstack-db).
|
||||
|
||||
**Root cause:** Race condition — BookStack ran DB migrations before MariaDB was fully
|
||||
initialized and registered with Docker's embedded DNS (127.0.0.11).
|
||||
**Why it happened:** BookStack was too fast. It started before the database was fully
|
||||
ready, and when it tried to find `bookstack-db` on the internal network, Docker hadn't
|
||||
finished registering it yet.
|
||||
|
||||
**Fix:** Wait for `bookstack-db` to be healthy, then restart the BookStack container:
|
||||
**How we fixed it:**
|
||||
```bash
|
||||
# Just wait a few seconds and restart BookStack
|
||||
docker restart bookstack
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 2026-06-09 — Root CHANGELOG Permission Issue
|
||||
|
||||
**Symptom:** CHANGELOG.md could not be read/written by the normal user.
|
||||
|
||||
**Root cause:** CHANGELOG.md was owned by root with 600 permissions.
|
||||
|
||||
**Fix:**
|
||||
```bash
|
||||
sudo chown kenpat:kenpat CHANGELOG.md
|
||||
chmod 644 CHANGELOG.md
|
||||
```
|
||||
That's it — the database had finished starting up by then.
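
Instead of guessing how many seconds to wait, a small retry loop can keep checking until the database answers. This is a sketch under assumptions: `retry_until` is a hypothetical helper, and the check command passed to it is up to you.

```bash
# Hypothetical helper: run a command until it succeeds, up to a fixed
# number of attempts, sleeping between tries.
# Usage: retry_until ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
retry_until() {
  local attempts="$1" delay="$2" i
  shift 2
  for ((i = 1; i <= attempts; i++)); do
    if "$@"; then
      return 0
    fi
    # Skip the pause after the final failed attempt.
    if ((i < attempts)); then
      sleep "$delay"
    fi
  done
  return 1
}
```

For example, `retry_until 30 2 docker exec bookstack getent hosts bookstack-db && docker restart bookstack` would wait up to about a minute for the name to resolve before restarting. That example assumes `getent` exists inside the BookStack container.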
|
||||
|
||||
---
|
||||
|
||||
## 2026-06-09 — Repo Folder Ownership Issue
|
||||
|
||||
**Symptom:** Could not create new files in the kitestacks-homelab repo directory.
|
||||
|
||||
**Root cause:** Repo root folder was owned by root.
|
||||
|
||||
**Fix:**
|
||||
```bash
|
||||
sudo chown -R kenpat:kenpat /opt/kitestacks-autosync/kitestacks-homelab
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Diagnostic Quick Reference
|
||||
## Quick Diagnostic Commands
|
||||
|
||||
```bash
|
||||
# Check which container is causing issues
|
||||
# See which containers are running (and which are crashing)
|
||||
docker ps --format "table {{.Names}}\t{{.Status}}"
|
||||
|
||||
# Tail any service log
|
||||
docker logs <container> --tail 50 -f
|
||||
# Follow the live logs of any service
|
||||
docker logs CONTAINER_NAME --tail 50 -f
|
||||
|
||||
# BookStack PHP log
|
||||
# Read BookStack's PHP error log
|
||||
docker exec bookstack tail -n 50 /app/www/storage/logs/laravel.log
|
||||
|
||||
# Test BookStack OIDC flow directly
|
||||
# Test if BookStack's login redirect works
|
||||
curl -sc /tmp/c.txt http://localhost:6875/login -o /tmp/l.html && \
|
||||
CSRF=$(grep -oP 'name="_token" value="\K[^"]+' /tmp/l.html | head -1) && \
|
||||
curl -v -b /tmp/c.txt -X POST http://localhost:6875/oidc/login \
|
||||
-d "_token=$CSRF" --max-redirs 0 2>&1 | grep -E "HTTP|Location"
|
||||
# Should show: Location: https://auth.kitestacks.com/application/o/authorize/?...
|
||||
|
||||
# Test Authentik discovery document
|
||||
curl -s https://auth.kitestacks.com/application/o/<slug>/.well-known/openid-configuration | python3 -m json.tool
|
||||
|
||||
# Check Cloudflare tunnel connector count
|
||||
docker exec cloudflared cloudflared tunnel info <TUNNEL_ID>
|
||||
|
||||
# Check Tailscale connectivity
|
||||
# Check Tailscale connections between machines
|
||||
tailscale status
|
||||
|
||||
# PostgreSQL connectivity check (from monk)
|
||||
docker run --rm --network host -e PGPASSWORD="<REDACTED>" \
|
||||
postgres:16 psql -h <KSCLOUD1_TAILSCALE_IP> -U authentik authentik -c "\l"
|
||||
# See if both Cloudflare connectors are working
|
||||
docker exec cloudflared cloudflared tunnel info TUNNEL_ID
|
||||
```
|
||||