diff --git a/MEMORY.md b/MEMORY.md index b29aae3..af23dec 100644 --- a/MEMORY.md +++ b/MEMORY.md @@ -1,5 +1,3 @@ -# Memory Index - -- [Claude memory sync](project_claude_memory_sync.md) — claude-memory Forgejo repo for cross-device context -- [Cyberpunk wallpaper project](project_cyberpunk_wallpaper.md) — Rainmeter/Wallpaper Engine dashboard on samurai -- [Periodic memory commits](feedback_periodic_memory_commits.md) — push memory updates to claude-memory repo throughout long sessions, not just at the end +- [KiteStacks migration + Hetzner cloud failover (COMPLETE)](project-kitestacks-migration.md) — monk primary, kscloud1 cloud replica, Oracle VPS coming. 2026-06-12 DONE: OSticket live, Portainer SSO live on both hosts (portainer.kitestacks.com HTTP 200, noTLSVerify fixed via CF API), docs v1.4.0 in Forgejo. NEXT: Oracle Cloud ARM VPS (user provisioning manually — 4 OCPU 24GB Ampere A1). OSticket is x86-only so needs swap for Oracle ARM. CF API token kitestacks-dns-fix needs rolling (was exposed in chat). +- [Forgejo doc redaction rule](feedback-forgejo-redaction.md) — always redact IPs, ports, and passwords in any homelab Forgejo repo files before committing. +- [A+ Core 2 study plan](project-a-plus-core2.md) — exam goal June 28 2026, started 2026-06-11 9:15 PM, Professor Messer diagnostic first, CertMaster next week. diff --git a/feedback-forgejo-redaction.md b/feedback-forgejo-redaction.md new file mode 100644 index 0000000..1d68efa --- /dev/null +++ b/feedback-forgejo-redaction.md @@ -0,0 +1,18 @@ +--- +name: feedback-forgejo-redaction +description: "Always redact IPs, ports, and passwords in any files committed to the homelab Forgejo repo" +metadata: + node_type: memory + type: feedback + originSessionId: 20e70bfb-0880-4ec4-aece-a21855bb3dfe +--- + +Always redact IPs, ports, and passwords before committing or editing any file in the KiteStacks homelab Forgejo repo (kitestacks-homelab). 
This applies to all documents: RUNBOOK.md, docs/, projects/, DEBUG-DOCUMENTATION.md, README.md, etc. + +**Why:** Security — user does not want real infrastructure details (IPs, port bindings, credentials) in the public Forgejo repository. + +**How to apply:** +- IPs → descriptive placeholders like ``, ``, ``, etc. +- Port numbers in host bindings, IP:port combos, explicit app URLs → `` placeholder +- Passwords, sudo passwords, OAuth secrets → `` or descriptive placeholder like `` +- Apply proactively when writing new content for these docs, not just on request diff --git a/project-a-plus-core2.md b/project-a-plus-core2.md new file mode 100644 index 0000000..c2acb21 --- /dev/null +++ b/project-a-plus-core2.md @@ -0,0 +1,49 @@ +--- +name: project-a-plus-core2 +description: "A+ Core 2 study plan and progress tracking — exam goal June 28, 2026" +metadata: + node_type: memory + type: project + originSessionId: 20e70bfb-0880-4ec4-aece-a21855bb3dfe +--- + +## A+ Core 2 Study Progress + +**Exam goal:** Before July 4th week (preferred ~June 28), hard deadline July 12, 2026 +**July 4th week:** Time off — buffer week if needed, or use for final prep +**Strategy:** Monitor readiness via practice tests, don't sit the real exam until consistently hitting 85%+ +**Study started:** 2026-06-11 at 9:15 PM +**Strategy:** Diagnostic test first, then focus on weak areas + +**Why:** June 28 is achievable at 3.5 hours/day. User passed Core 1 with highest score in class of 22. 
+ +## Study Log + +| Date | Activity | Notes | +|------|----------|-------| +| 2026-06-11 | Started Core 2 study, 9:15 PM | Took Sybex diagnostic practice exam — scored 50% (50/100) | + +## Planned Tests +- **This week (started 2026-06-11):** Professor Messer practice exam (diagnostic — taken cold first) +- **Next week:** CompTIA CertMaster practice test + +## Study Plan (17 days) +- Days 1–4: Operating Systems domain +- Days 5–8: Security domain +- Days 9–11: Software Troubleshooting +- Days 12–13: Operational Procedures +- Days 14–15: Full timed practice exams +- Day 16: Weak area review only +- Day 17 (June 28): Exam + +## Key Weak Areas to Watch (common for homelab/Linux users) +- Windows command line tools (sfc, DISM, chkdsk, bootrec, diskpart) +- Malware types and removal procedures (pure memorization) + +## Resources +- Professor Messer A+ Core 2 (YouTube + paid practice exams) +- CompTIA CertMaster (next week) +- Jason Dion practice exams (Udemy backup) +- r/CompTIA, Professor Messer Discord for group study + +**How to apply:** When user mentions Core 2 or exam prep, reference this log and check in on progress toward June 28 goal. diff --git a/project-kitestacks-migration.md b/project-kitestacks-migration.md new file mode 100644 index 0000000..a1fb57b --- /dev/null +++ b/project-kitestacks-migration.md @@ -0,0 +1,441 @@ +--- +name: project-kitestacks-migration +description: "Migration of the live KiteStacks homelab/website from assassin (T14) to monk — COMPLETE. Plus full Hetzner cloud failover (kscloud1, 5.78.233.28) — COMPLETE. All 9 subdomains can run from any single host. Plus 2026-06-10 portal/SSO push: portal FluxCD+coming-soon changes deployed, Karakeep SSO fixed, OpenProject SSO blocked by EE license, Portainer SSO Authentik-side done (pending user manual steps)." 
+metadata: + node_type: memory + type: project + originSessionId: 33992890-3940-4d4a-a94a-22b5621e9c1a +--- + +## STATUS: MIGRATION + CLOUD FAILOVER COMPLETE (2026-06-10) + +monk is the live production host. assassin (T14) is OFF. kscloud1 (Hetzner VPS, +5.78.233.28) is now a THIRD active Cloudflare Tunnel connector and runs a FULL +replica of all 9 services, so the site stays up even if both monk and assassin +are off (verified by user testing with home wifi off, from phone + mom's phone). + +All 9 public subdomains (www, ai, auth, gitforge, grafana, kavita, links, status, tasks) +verified returning correct status codes via the live tunnel with kscloud1 in rotation. + +## Governing principle (user's explicit words) +"leave the cloud backup on at all times" / "thats the point of it. if I am +travelling my site will go down otherwise." -> kscloud1 runs PERMANENTLY as a +3rd connector, NOT cold standby. Cloudflare Tunnel load-balances ACTIVE-ACTIVE +across all 3 connectors (no primary/backup priority). This means stateful apps +(gitforge, openproject, authentik, karakeep, kavita, openwebui, etc.) may show +DIFFERENT/STALE data depending on which connector serves a given request - +EXPLICITLY ACCEPTED by user as the cost of guaranteed uptime. Fresh/separate +databases on kscloud1 are fine; do not try to sync data between monk and kscloud1. + +## kscloud1 access +SSH: `ssh -i ~/.ssh/id_ed25519_kscloud1 kenpat@5.78.233.28` (passwordless, key auth). +sudo needs a password ("p12217177") and has no askpass helper - avoid sudo; +most things doable as kenpat or via docker. +All services live under `/opt/kitestacks/docker//docker-compose.yml`, +same one-dir-per-app pattern as monk's `~/kitestacks-live/docker/`. 
+ +## kscloud1 services deployed (all `docker compose up -d`, joined to local `kitestacks` network) +- cloudflared (3rd tunnel connector, same TUNNEL_TOKEN, connector id 78521d9f-71c0-4e3d-992f-bd1f77da1a8f) +- homepage-backup (alias `homepage`) + caddy + kitestacks-metrics-api-backup - PRE-EXISTING +- forgejo (alias `forgejo`) - PRE-EXISTING, separate DB from monk's (gitforge data inconsistent across connectors, accepted) +- prometheus + node-exporter (job `kscloud1-node`) +- grafana (alias `grafana`, port 3150) - Prometheus datasource (uid 000000001) + "Node Exporter Full" + dashboard (id 1860) provisioned via `./provisioning/`. OAuth->authentik config present but + authentik on kscloud1 has a FRESH db (no provider apps configured) so OAuth login won't work + there; local admin login works. +- uptime-kuma (alias `status`->`uptime-kuma`) - kuma.db seeded by copying monk's admin user + (same login: kenpat / same password hash) + monitors: kscloud1 self-ping, Google DNS, and + HTTP checks for all 9 *.kitestacks.com subdomains (external monitoring of the live site). +- kavita (alias `kavita`) - empty library (fresh) +- karakeep + karakeep-chrome + karakeep-meilisearch (alias `karakeep`) - fresh meilisearch/db +- authentik + authentik-worker + authentik-postgres + authentik-redis (alias on `auth`) - FRESH DB. + Bootstrap admin: `akadmin@kitestacks.com` / password `6KlYpfCyYxbnKQNiOewN` (set via + AUTHENTIK_BOOTSTRAP_PASSWORD in .env). No OAuth provider apps exist yet (would need to be + manually recreated in authentik UI for grafana/openwebui/karakeep/openproject SSO to work + when kscloud1 is the active backend). +- kite-litellm + kite-openwebui (alias `ai`->openwebui) - same .env/secrets as monk. OpenWebUI + has `ENABLE_SIGNUP=true` (changed from monk's `false`) so kenpat can create a local admin + account on first use, since authentik OAuth won't work with kscloud1's fresh authentik. 
+- openproject (alias on `tasks`, port 8090:80 host - port 80 was taken by caddy) - FRESH db, + self-initialized via the all-in-one image's bootstrap (took ~3-4 min). Empty/no projects yet. + +## monk-side changes made for cross-host monitoring +- `~/kitestacks-live/docker/prometheus/prometheus.yml`: added scrape job + `kscloud1-node` -> `5.78.233.28:9100` (kscloud1's node-exporter is exposed + 0.0.0.0:9100, no firewall - reachable from monk's public IP). monk's grafana + (the live one, "Node Exporter Full" dashboard now provisioned via + `~/kitestacks-live/docker/grafana/provisioning/`) shows BOTH `t14-node` + (monk/"this pc") and `kscloud1-node` ("the cloud") via the instance picker. +- kscloud1's prometheus only scrapes itself (`kscloud1-node`) - monk is behind + home NAT, not reachable from kscloud1. + +## Resource notes (kscloud1: 3 vCPU, 3.7GB RAM + 6GB swap, 75GB disk) +With all services running: ~2.8-2.9GB RAM used, ~2.6-2.8GB swap used (of 6GB), +~835MB-1.2GB "available", disk 29GB/75GB used. Site is functional but under +memory pressure - if BOTH monk and assassin are down for an extended period +with real concurrent usage, expect sluggishness (esp. openproject/authentik/ +openwebui). Not yet stress-tested under real failover load. + +## Key gotchas from THIS phase (cloud failover build-out) +- kscloud1's `kitestacks` Docker network is LOCAL/separate from monk's (same name, + no conflict). cloudflared on each host resolves container names against its + own host's network. +- Adding a new tunnel connector that lacks a backend for an ingress hostname -> + 502 for requests routed there. If it has a DIFFERENT backend (e.g. forgejo) -> + serves different data inconsistently. Both accepted/expected now that all 9 + hostnames have backends on kscloud1. 
+- port 80 on kscloud1 is owned by `caddy` (serves www-backup/git-backup.kitestacks.com + direct A-records, pre-existing, unrelated to the tunnel) - openproject uses 8090:80 + for its host port instead (internal container port 8080 is what cloudflared hits). +- uptime-kuma / grafana have no simple file-based config API for monitors/datasources + beyond grafana provisioning - used direct sqlite manipulation (`docker exec ... sqlite3`, + or python3 sqlite3 module via a throwaway `python:3-alpine` container with the volume + mounted) to seed uptime-kuma's kuma.db with users/monitors. +- authentik first boot takes ~1-2 min (migrations); openproject first boot takes + ~4-5 min (postgres initdb + Rails migrations + Puma boot), watch `docker logs` + for "Listening on http://0.0.0.0:8080" before testing. + +## Authentik/Kavita login fix (2026-06-10, post cloud-failover) +PROBLEM: Cloudflare Tunnel load-balances auth./kavita. active-active across monk +and kscloud1. kscloud1's authentik had only the fresh `akadmin` bootstrap user +(not kenpat7177) and kscloud1's kavita had ZERO users -> ~50% of requests showed +"wrong password" on authentik and a "create admin account" (signup) screen on +kavita instead of login. This contradicts the earlier "fresh DBs are fine" +assumption - for IDENTITY apps it breaks login, so it was NOT acceptable. +FIX APPLIED (one-time sync, same pattern as uptime-kuma's kuma.db seed): +- pg_dump'd monk's authentik-postgres `authentik` db (--clean --if-exists), + scp'd to kscloud1, stopped authentik+authentik-worker on kscloud1, restored + via `docker exec -i authentik-postgres psql -U authentik -d authentik < dump`, + restarted. Worked cleanly because AUTHENTIK_SECRET_KEY and PG_PASS were + ALREADY IDENTICAL between monk's and kscloud1's authentik/.env. 
+- For kavita: copying the raw kavita.db file via plain `cp` produced + "database disk image is malformed" (WAL-mode db isn't standalone-consistent + as a flat file copy even when -wal/-shm look small). FIX: use python3 + sqlite3 `Connection.backup()` (via throwaway python:3-alpine container) to + produce a consistent copy, THEN on kscloud1 stop kavita, rm the OLD + kavita.db-shm/kavita.db-wal too (stale WAL files against new db = same + corruption error), copy in the new kavita.db (chown root:root, chmod 644 + to match original ownership - kavita container runs as root), restart. +- Result: kscloud1 authentik now has kenpat7177 (matches monk), kscloud1 + kavita now has kenpat7177 + acurrie (matches monk). Both connectors now + return the same login screen/credentials. NOTE: this is a ONE-TIME sync, + not continuous - if monk's users/passwords change later, kscloud1 will + drift again and the same symptoms could return; re-run this sync if so. +- kscloud1 kavita's library entries point at /books paths that don't exist on + kscloud1 (no actual book files there) - login works fine, but browsing the + library when served by kscloud1 will show entries with missing files. Same + "stale data" tradeoff as gitforge, accepted. + +## Authentik shared Postgres+Redis over Tailscale (2026-06-10) - fixes "invalid_grant" SSO +PROBLEM: Even after the one-time DB sync above, "Sign in with Authentik" on +Kavita could fail with "invalid_grant" / "Code does not exist". Root cause: +monk and kscloud1 each ran their OWN authentik-postgres. OAuth2 authorization +codes are short-lived per-flow rows in `authentik_providers_oauth2_authorizationcode` +- if Cloudflare Tunnel's active-active routing sends `/authorize` to one +connector and `/application/o/token/` to the other, the code only exists in +one of the two DBs -> invalid_grant. A one-time data sync can't fix this +because the data is created fresh on every login attempt. 
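The failure mode described above can be reproduced with a small simulation. This is a sketch only, not Authentik's actual code: `CodeStore` and `simulate_logins` are hypothetical names, and the random choice stands in for Cloudflare Tunnel picking a connector per request. Each login issues a code on one connector and redeems it on another; with two independent stores a share of logins fails, while a single shared store never does.

```python
import random


class CodeStore:
    """Hypothetical stand-in for one database of short-lived authorization codes."""

    def __init__(self) -> None:
        self._codes: set[str] = set()

    def issue(self, code: str) -> None:
        self._codes.add(code)

    def redeem(self, code: str) -> bool:
        """Consume the code and return True if it exists here, else False."""
        if code in self._codes:
            self._codes.remove(code)
            return True
        return False


def simulate_logins(stores: list[CodeStore], attempts: int, seed: int = 0) -> int:
    """Count failed logins when each request lands on a random connector.

    stores[i] is the code store that connector i talks to. Passing the same
    CodeStore object for every connector models one shared database.
    """
    rng = random.Random(seed)
    failures = 0
    for n in range(attempts):
        code = f"code-{n}"
        rng.choice(stores).issue(code)           # the /authorize request
        if not rng.choice(stores).redeem(code):  # the /application/o/token/ request
            failures += 1                        # this is the invalid_grant case
    return failures


def demo() -> tuple[int, int]:
    """Return (failures with split stores, failures with one shared store)."""
    split = simulate_logins([CodeStore(), CodeStore()], attempts=1000)
    shared_store = CodeStore()
    shared = simulate_logins([shared_store, shared_store], attempts=1000)
    return split, shared


if __name__ == "__main__":
    print(demo())
```

With two connectors and uniform routing, about half of the split-store logins are expected to fail, which is why a one-time sync cannot help: the mismatch is recreated on every login attempt.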
+FIX: Converted to a single shared Postgres+Redis (HA pattern), hosted on +kscloud1, reachable ONLY over Tailscale: +- Installed Tailscale on both monk and kscloud1 (same tailnet). kscloud1's + tailscale IP is `100.123.254.52`. +- kscloud1's `/opt/kitestacks/docker/authentik/docker-compose.yml`: + authentik-postgres now binds `100.123.254.52:5432:5432` (was unbound/internal-only), + authentik-redis now binds `100.123.254.52:6379:6379`. Both still also reachable + on the local `kitestacks` docker network for kscloud1's own authentik+worker. + Backup of pre-change file: `docker-compose.yml.backup-before-shared-db-20260610-1138`. +- monk's `~/kitestacks-live/docker/authentik/docker-compose.yml`: REMOVED the + `postgresql` and `redis` services entirely. monk's `authentik`/`authentik-worker` + now point `AUTHENTIK_POSTGRESQL__HOST` and `AUTHENTIK_REDIS__HOST` at + `100.123.254.52` (kscloud1 over Tailscale), using the same `PG_PASS` / + `AUTHENTIK_SECRET_KEY` as before (already identical between hosts). +- monk's old local `authentik-postgres`/`authentik-redis` containers were + STOPPED (not removed) - data dirs preserved under + `~/kitestacks-live/docker/authentik/postgres` in case of rollback, but no + longer in use. +- Result: BOTH connectors' authentik+worker now read/write the SAME db/redis, + regardless of which one handles `/authorize` vs `/application/o/token/`. + Verified both `authentik`+`authentik-worker` healthy on monk and kscloud1, + OIDC discovery docs identical, user list matches (`kenpat7177` etc.) on both. + CONFIRMED BY USER 2026-06-10: "Sign in with Authentik" on Kavita now works + (when monk's connector serves the request). + +## Kavita "Sign in with Authentik" button missing on kscloud1 - FIXED 2026-06-10 +After the shared-Authentik-DB fix above, the button still didn't appear when +Cloudflare routed kavita.kitestacks.com to kscloud1's connector. 
CAUSE: Kavita's +OIDC config lives in ITS OWN db (kavita.db `ServerSetting` table, Key=40, a JSON +blob with Authority/ClientId/Secret/Enabled), separate from Authentik's db. +The earlier one-time kavita.db sync (see fix above) was taken BEFORE OIDC SSO +was configured in monk's Kavita, so kscloud1's copy had Key=40 with empty +Authority/Secret and `"Enabled":false`. FIX: copied monk's Key=40 JSON value +verbatim into kscloud1's kavita.db (stop kavita, `docker run --rm -v +.../kavita/config:/data -v fix.sql:/fix.sql alpine` + apk sqlite + `sqlite3 +/data/kavita.db < fix.sql` with `UPDATE ServerSetting SET Value='...' WHERE +"Key"=40`, restart kavita). NOTE: AspNetUserLogins (OIDC account-linkage table) +is empty on BOTH monk and kscloud1 - Kavita creates this row on first OIDC +login per-instance (matches existing local user by email since +ProvisionAccounts=false), so no extra action needed there. +GOTCHA: ServerSetting's PK column is `"Key"` (INTEGER), not `Id` - must quote +it in sqlite (`"Key"`) since KEY is a SQL reserved word. +DRIFT WARNING: any future Kavita server-setting change (OIDC config, library +paths, etc.) made on monk will NOT propagate to kscloud1's kavita.db +automatically - same one-time-sync caveat as the user-table sync above. + +UPDATE 2026-06-10 (RESOLVED via Kavita UI, not direct DB edit): Direct SQL +edits to ServerSetting Key=40 got WIPED back to disabled/empty by Kavita on +every container restart (RowVersion incremented +2 each time, Authority/Secret +cleared, Enabled->false) - confirmed twice, even with a full WAL-consistent +kavita.db replace from monk. Direct DB writes to this table do NOT survive a +restart; only saves through Kavita's own Settings UI/API persist correctly. 
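Since direct writes to this table do not persist, a read-only check is the safer way to see what Kavita currently holds. The sketch below is an illustration under assumptions: it takes a consistent snapshot with Python's `sqlite3` `Connection.backup()` and reads the `"Key"` = 40 row from the copy, never from the live file. The table and column names come from the notes above; the function names, the stand-in database built in `demo()`, and the example Authority URL are hypothetical.

```python
import json
import os
import sqlite3
import tempfile


def snapshot(src_path: str, dst_path: str) -> None:
    """Write a standalone, consistent copy of a (possibly WAL-mode) database."""
    src = sqlite3.connect(src_path)
    dst = sqlite3.connect(dst_path)
    try:
        src.backup(dst)  # SQLite online backup API, unlike a raw file copy
    finally:
        dst.close()
        src.close()


def read_oidc_setting(db_path: str, key: int = 40) -> dict:
    """Return the JSON stored under ServerSetting."Key" = key, or {} if absent."""
    conn = sqlite3.connect(db_path)
    try:
        # "Key" is double-quoted because KEY is an SQL keyword.
        row = conn.execute(
            'SELECT Value FROM ServerSetting WHERE "Key" = ?', (key,)
        ).fetchone()
    finally:
        conn.close()
    return json.loads(row[0]) if row and row[0] else {}


def demo() -> dict:
    """Build a throwaway stand-in database, snapshot it, and read the copy."""
    with tempfile.TemporaryDirectory() as tmp:
        live = os.path.join(tmp, "live.db")
        copy = os.path.join(tmp, "copy.db")
        conn = sqlite3.connect(live)
        conn.execute("PRAGMA journal_mode=WAL")
        conn.execute(
            'CREATE TABLE ServerSetting ("Key" INTEGER PRIMARY KEY, Value TEXT)'
        )
        value = json.dumps(
            {"Authority": "https://auth.example.invalid/", "Enabled": False}
        )
        conn.execute(
            'INSERT INTO ServerSetting ("Key", Value) VALUES (?, ?)', (40, value)
        )
        conn.commit()
        conn.close()
        snapshot(live, copy)
        return read_oidc_setting(copy)


if __name__ == "__main__":
    print(demo())
```

Reading from a snapshot keeps the check from interfering with the running container; actual changes still go through the Kavita Settings UI.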
+FIX: opened an SSH local port-forward (`ssh -L 5099:localhost:5000 +kenpat@5.78.233.28`) so the user could reach kscloud1's Kavita directly at +http://localhost:5099 (bypassing the Cloudflare load-balanced domain), logged +in with their normal kenpat7177 Kavita password, and re-entered the OIDC +config in Settings -> OIDC: + - Authority: `https://auth.kitestacks.com/application/o/kavita/` + (MUST include trailing slash - Kavita validates that this exactly matches + the `issuer` claim in Authentik's `.well-known/openid-configuration`, + which has a trailing slash. Without it: "Kavita can load the OIDC + configuration, but the issuer does not match".) + - Client ID: `kavita`, Client Secret: (96-hex-char secret from Authentik's + Kavita OAuth2 provider - watch for copy/paste truncation, verify length=96) + - Enabled: true, ProviderName: authentik +Saved via UI (RowVersion 8->12->14 across two saves to fix a 1-char-truncated +secret), then `docker compose restart kavita` on kscloud1 - config SURVIVED +this restart (unlike the direct-SQL attempts) and `/api/settings/oidc` now +reports `"enabled": true`. SSH tunnel closed afterward (no firewall changes +were made/needed). Set a temporary ApiKey on kenpat7177's kscloud1 kavita +account during troubleshooting (for a Plugin/authenticate attempt that turned +out to return 401 / unused) - left in place, harmless (grants API access to +that user's own account only). +TAKEAWAY FOR FUTURE KAVITA CONFIG CHANGES ON KSCLOUD1: always use the Kavita +UI (via SSH port-forward to localhost:5000) rather than direct sqlite edits - +direct edits to ServerSetting do not survive a restart. +CONFIRMED BY USER 2026-06-10: "Sign in with Authentik" now works on Kavita +regardless of which connector (monk/kscloud1) answers. + +## Kavita cover images missing on kscloud1 - FIXED 2026-06-10 +After the kavita.db sync from monk, kscloud1's db referenced cover image files +(e.g. 
`v1_c1.png`..`v10_c10.png` in `ServerSetting`/`Series.CoverImage`) that +didn't exist on kscloud1's filesystem - kscloud1's `config/covers/` dir was +empty (monk has 9 files, ~1.3MB). Result: book/series cover thumbnails didn't +load when kscloud1 served the request. FIX: tar'd monk's +`~/kitestacks-live/docker/kavita/config/covers/` (owned 1000:1000), scp'd to +kscloud1, extracted into `/opt/kitestacks/docker/kavita/config/covers/` via a +throwaway alpine container, `chown -R 1000:1000`. No kavita restart needed - +covers are served as static files from disk. CONFIRMED BY USER: covers now +load correctly. +NOTE: this is another one-time sync (same drift caveat) - if new books/covers +are added on monk later, they won't appear on kscloud1 unless re-synced +(covers/ dir + kavita.db + actual book files under library/books, none of +which exist on kscloud1 per the earlier "stale data" note). +SECURITY NOTE: postgres/redis on kscloud1 are bound to the Tailscale interface +IP only (100.123.254.52), not 0.0.0.0 - not exposed to the public internet. +ROLLBACK: if Tailscale connectivity ever breaks, monk's authentik will fail to +start (can't reach 100.123.254.52). To roll back: restore monk's +docker-compose.yml from git/backup to use local postgresql/redis services +again, restart monk's old authentik-postgres/authentik-redis containers +(`docker start authentik-postgres authentik-redis` in +~/kitestacks-live/docker/authentik), `docker compose up -d`. Note this would +mean monk's authentik db is now STALE (kscloud1's shared db has any logins/ +changes since 2026-06-10) - would need a fresh pg_dump sync from kscloud1 first. + +## kscloud1 ufw blocks docker-bridge -> host port 8000 (metrics API) - FIXED 2026-06-10 +kscloud1 has ufw active with `default deny incoming/routed`. 
The +kitestacks-metrics-api-backup container (network_mode: host, binds 0.0.0.0:8000) +was unreachable from homepage-backup via `host.docker.internal:8000` (TCP +timeout, not refused -> ufw drop), causing the homepage System Status widget to +show 0%/"Offline" when kscloud1 served the request. FIXED by adding: +`sudo ufw allow from 172.16.0.0/12 to any port 8000 proto tcp` (covers all +docker bridge subnets on this host: 172.17-172.29.x.x). Verified +homepage-backup -> host.docker.internal:8000/api/metrics now returns real +CPU/RAM/storage/network data. kscloud1 sudo password is in the "kscloud1 +access" section above - needed `echo PASS | sudo -S ` (no askpass helper, +non-interactive sudo via -S works fine). + +## Portal SSO/coming-soon push + Karakeep fix + OpenProject EE blocker + Portainer SSO setup (2026-06-10) +Per user's original request: SSO for grafana/prometheus/portainer/cloudflare/uptime-kuma +(authentik=source), KITEAI/LITELLM/OPENROUTER -> "coming soon", SSO for openproject/forgejo/ +karakeep, FluxCD card in AI&Automation. Hard constraint: user does NOT use Cloudflare Zero +Trust/Access ("costs money") - any Cloudflare work must avoid those products (note: the +Tunnel UI itself lives under Cloudflare's "Zero Trust" dashboard section, but configuring a +Tunnel public hostname there is NOT the same as enabling Zero Trust/Access - fine to use). + +### Portal UI changes - DEPLOYED to all 3 copies, verified live +Edited the AI & AUTOMATION panel (`cards cards-3` -> `cards cards-2`, now 2x2): +Kite AI and OpenRouter cards changed from external links to +`href="#" data-coming-soon="1"` (LiteLLM was already coming-soon); added a 4th +card "FluxCD" / "GitOps Automation" using `/images/icons/fluxcd.png`, also +coming-soon (automation scripts with FluxCD+Prometheus+Grafana+node-exporter +are a future project). 
Applied identically to: +- `~/kitestacks-live/docker/kitestacks-portal-test/public/index.html` (monk, dev, port 3008) +- `~/kitestacks-live/docker/kitestacks-portal/public/index.html` (monk, LIVE, served by + "homepage" container 3005->3000 - this is the file that backs www.kitestacks.com) +- `/opt/kitestacks/docker/www-backup/kitestacks-portal/public/index.html` (kscloud1, + served by `homepage-backup` port 3015) +Verified `https://www.kitestacks.com` returns "FluxCD" consistently (6/6 requests +across both connectors). +NOTE: Portainer card on the live portal is currently `data-coming-soon="1"` - +update this to a real `href="https://portainer.kitestacks.com"` link (remove +data-coming-soon) once the Portainer SSO manual steps below are completed. +NOTE 2: "cloudflare should all be in the networking side" from the original +request was never resolved - Cloudflare card is still in the INFRASTRUCTURE +panel, not moved/renamed to a "NETWORKING" panel. Ambiguous, deprioritized, +not revisited. + +### Karakeep SSO redirect_uri fix - DONE, confirmed working +Karakeep uses NextAuth.js with provider id "custom" (not "authentik") - actual +OAuth callback path is `/api/auth/callback/custom`, but Authentik's Karakeep +OAuth2Provider's `_redirect_uris` had the wrong path -> "Redirect URI Error". +FIX: direct Postgres UPDATE to +`authentik_providers_oauth2_oauth2provider._redirect_uris` (JSON column) on +the shared kscloud1 authentik-postgres (100.123.254.52), wrapped in explicit +`BEGIN; UPDATE ...; COMMIT;` (a bare single-statement -c "UPDATE..." reported +"UPDATE 1" but did NOT persist on first attempt - cause unclear, explicit +transaction fixed it). After the DB write, restarted authentik+authentik-worker +on BOTH monk and kscloud1 and polled +`docker inspect --format '{{.State.Health.Status}}'` until both reported +"healthy" (~50s) before retesting - first retest hit a transient 502 because +kscloud1's authentik was still "starting". 
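The wait-for-healthy step described above can be sketched as a small poller. This is a minimal sketch, not the command sequence actually used: `wait_healthy` and `docker_health` are hypothetical helper names, the timeout and interval defaults are assumptions, and the script only sees containers on the host it runs on, so it would need to be run on monk and on kscloud1 separately.

```python
import subprocess
import time
from typing import Callable


def docker_health(container: str) -> str:
    """Return the container's health status as reported by `docker inspect`."""
    out = subprocess.run(
        ["docker", "inspect", "--format", "{{.State.Health.Status}}", container],
        capture_output=True,
        text=True,
        check=True,
    )
    return out.stdout.strip()


def wait_healthy(
    containers: list[str],
    get_status: Callable[[str], str] = docker_health,
    timeout_s: float = 180.0,   # assumed budget; the notes observed about 50s
    interval_s: float = 5.0,    # assumed polling interval
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until every container reports "healthy"; return False on timeout."""
    deadline = time.monotonic() + timeout_s
    while True:
        if all(get_status(name) == "healthy" for name in containers):
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(interval_s)


if __name__ == "__main__":
    ok = wait_healthy(["authentik", "authentik-worker"])
    print("healthy" if ok else "timed out")
```

Retesting only after this returns `True` on both hosts avoids the transient 502 seen while a container is still in the "starting" state.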
CONFIRMED: Authentik now serves the +login page (not "Redirect URI Error") for Karakeep SSO. +PG_PASS GOTCHA: `~/kitestacks-live/docker/authentik/.env` PG_PASS value ends in +`=` - extract with `cut -d= -f2-` (NOT `-f2`, which truncates the trailing `=` +and causes "password authentication failed"). +REUSABLE PATTERN for any future direct Authentik DB edit: (1) wrap writes in +explicit BEGIN/COMMIT, (2) restart authentik+authentik-worker on BOTH monk and +kscloud1, (3) wait for health=healthy on both before testing. + +### OpenProject SSO - config bug fixed, but BLOCKED by Enterprise licensing (no further action possible) +`~/kitestacks-live/docker/openproject/docker-compose.yml` env vars were wrong in +two ways: (1) extra "PROVIDERS_" segment in var names caused +`seed_oidc_provider = {"providers": {"authentik": {...}}}` instead of +`{"authentik": {...}}`, producing a broken stub provider record (slug= +"providers", id=1, since deleted via Rails runner); (2) `discovery_endpoint` +isn't read by `ConfigurationMapper` at all - replaced with explicit +ISSUER/AUTHORIZATION__ENDPOINT/TOKEN__ENDPOINT/USERINFO__ENDPOINT/ +END__SESSION__ENDPOINT/JWKS__URI vars (current docker-compose.yml has the +corrected version, see file - all derived from +`https://auth.kitestacks.com/application/o/openproject/.well-known/openid-configuration`). +After fixing both, the seeder correctly creates provider slug="authentik", +available=true, all fields correct - BUT the SSO button still does not appear +on /login. CONFIRMED ROOT CAUSE (terminal, source-code-verified): OpenProject +CE 2025/v15's OmniAuth SSO strategy +(`OpenProject::Plugins::AuthPlugin`/`OpenIDConnect`) AND SAML +(`auth_saml/lib/open_project/auth_saml/engine.rb`, `enterprise_feature: +"sso_auth_providers"`) are BOTH gated behind an Enterprise Edition license - +"OmniAuth SSO strategy ... is only available for Enterprise Editions". No +app/config-level workaround exists. 
Only remaining options: buy EE license, OR +put a forward-auth proxy (oauth2-proxy / Authentik embedded outpost) in front +of OpenProject - DEFERRED along with Prometheus/Uptime Kuma proxy work (see +below) until Oracle VPS topology is decided. +OpenProject container is healthy, `/login` returns 200, no projects yet. + +### Portainer SSO - Authentik side DONE, two manual steps PENDING (not yet done by user) +Per user: "yes continue with portainer" / "yes but make sure it is still +secure" (approved exposing Portainer publicly via a NEW Cloudflare Tunnel +hostname, with explicit requirement to keep it secure -> access restricted to +the `homelab-admin` Authentik group). +Created via `docker exec authentik ak shell` (Django ORM, no Authentik API +token configured) on kscloud1's shared authentik-postgres: +- OAuth2Provider "Portainer": client_id=`portainer`, + client_secret=`wTim3mrMwt34ko1RYMvK1RNnjwWOMi_d4r4cS6exr7DjozCrL5zKthHl-5KjargF`, + provider_id=9, redirect_uri=`https://portainer.kitestacks.com` (strict), + scopes openid/email/profile, sub_mode=user_email, signing key + flows copied + from existing providers (same pattern as Karakeep/Grafana). +- Application "Portainer" (slug="portainer", meta_launch_url= + `https://portainer.kitestacks.com`). +- PolicyBinding restricting the Portainer application to Authentik group + `homelab-admin` (UUID e21b0aa5-62e7-4b3a-8302-130b0ae148a5) - this is the + "make sure it is still secure" piece (only homelab-admin members can SSO in). +- Verified discovery doc resolves: + `https://auth.kitestacks.com/application/o/portainer/.well-known/openid-configuration`. +PENDING MANUAL STEPS (user must do via UI - confirmed `portainer.kitestacks.com` +still returns `000` as of 2026-06-10): +1. Cloudflare dashboard -> Tunnel -> add Public Hostname `portainer.kitestacks.com` + -> service `https://portainer:9443` (HTTPS), enable "No TLS Verify". 
(This is + in the Tunnel config UI, which Cloudflare happens to host under the "Zero + Trust" nav section, but adding a Tunnel hostname is NOT enabling Zero + Trust/Access - does not violate the no-Zero-Trust constraint.) +2. In Portainer -> Settings -> Authentication -> OAuth (Provider: Custom), on + BOTH monk's and kscloud1's SEPARATE Portainer instances, configure: + - Client ID: `portainer` + - Client Secret: `wTim3mrMwt34ko1RYMvK1RNnjwWOMi_d4r4cS6exr7DjozCrL5zKthHl-5KjargF` + - Authorization URL: `https://auth.kitestacks.com/application/o/authorize/` + - Access Token URL: `https://auth.kitestacks.com/application/o/token/` + - Resource/Userinfo URL: `https://auth.kitestacks.com/application/o/userinfo/` + - Redirect URL: `https://portainer.kitestacks.com` + - Logout URL: `https://auth.kitestacks.com/application/o/portainer/end-session/` + - Scopes: `openid email profile`, User identifier claim: `email` +AFTER both steps done: update the live portal's Portainer card (in the 3 files +above) from `data-coming-soon="1"` to a real +`href="https://portainer.kitestacks.com" target="_blank" rel="noopener"` link. + +### App-level SSO status summary (end of 2026-06-10 session) +Grafana: working (pre-existing). Forgejo: working (pre-existing). Karakeep: +fixed this session, working. OpenProject: blocked by EE license (terminal at +app level). Portainer: Authentik side done, waiting on user's 2 manual steps +above. Prometheus + Uptime Kuma: DEFERRED - neither has native OAuth, need a +forward-auth proxy (oauth2-proxy or Authentik embedded outpost) - deferred per +user's "ok lets do smaller app level" (hold new infra until Oracle VPS decided). +Cloudflare itself: no SSO concept applicable (it's Cloudflare's own dashboard +login) - was always about the portal's Cloudflare card placement, see "Portal UI +changes" note above. 
+ +### Oracle VPS migration - PLANNED, upcoming (stated 2026-06-11) +User confirmed on 2026-06-11: "we are going to switch things soon from hetzner +cloud to oracle soon." -> kscloud1 (Hetzner, 5.78.233.28) is intended to be +REPLACED by an Oracle Cloud VPS in the near future ("soon", no firm date yet). +Originally raised 2026-06-10 as exploratory ("how easy would it be to move +everything to oracle vps after?"), now an actual plan. +Implication: avoid investing further one-off/manual config work that's hard to +redo (e.g. more one-time DB syncs, hand-edited sqlite, etc.) on kscloud1 if +avoidable - prefer changes that are easy to replicate on a new host. When the +Oracle VPS is provisioned, plan to follow the same pattern as the kscloud1 +cloud-failover build-out (new Cloudflare Tunnel connector + full service +replicas + shared Authentik/Postgres/Redis over Tailscale + the Forgejo +FORGEJO_API_BASE-over-Tailscale pattern for the portal's Recent Activity, see +"Recent Activity" fix below) - then retire kscloud1 the same way assassin/T14 +was retired (decommission once Oracle replica verified working). + +## Prior migration gotchas (monk, kept for reference - see git history/old notes if needed) +- rsync --files-from recursion bug, bind-mount postgres dirs come over empty as + non-root (use pg_dumpall/pg_dump --clean from running container instead), + pg_dumpall --clean across template1 breaks on client/server version mismatch + (use single-db pg_dump+psql instead), grafana data dir needs chown 472:472, + kite-litellm needed manual `docker network connect kitestacks kite-litellm`. + +## 2026-06-12: SSO fixes + Portainer deployed on kscloud1 + +### Root cause: monk reconnect race condition +When monk goes offline (user travels) and reconnects, Cloudflare starts routing +some token exchange requests to monk while codes were created on kscloud1 during +the offline window. 
Auth codes had a 60-second TTL, which expired before monk's +Authentik fully started (~5 min startup). FIX: increased `access_code_validity` +from `minutes=1` to `minutes=10` for ALL 9 OAuth2 providers in the shared Postgres +DB. This gives enough buffer for monk's containers to start before codes expire. +Command used (via python:3-alpine container): +`docker run --rm --network host -v /tmp/fix_auth.py:/fix.py python:3-alpine sh -c ...` +connecting to shared Postgres at 100.123.254.52. + +### Karakeep redirect_uri reverted and re-fixed +The Karakeep OAuth2Provider `_redirect_uris` had reverted back to the proxy pattern +(`/outpost.goauthentik.io/callback?...`) instead of the correct NextAuth callback +(`https://links.kitestacks.com/api/auth/callback/custom`). This caused "Redirect URI +Error" from Authentik whenever SSO was attempted. Root cause unknown (possibly an +Authentik blueprint or UI save that regenerated/overrode the field). FIX: same +Postgres UPDATE pattern. WATCH: if this reverts again, check Authentik blueprints +or if someone modified the Karakeep provider via the Authentik admin UI. + +### Portainer deployed on kscloud1 +Created `/opt/kitestacks/docker/portainer/docker-compose.yml` (same image/config as +monk's portainer). Container running as `portainer`, port 9443:9443, on `kitestacks` +network. Volume is local (NOT shared with monk - fresh Portainer instance). +STILL PENDING (user action in Cloudflare dashboard): +- Tunnel ID: 5e60ea8e-a543-49b6-bab5-325f39441e00, Account: d0bb7673333fcd794622956f1662f785 +- Add hostname `portainer.kitestacks.com` → service `https://portainer:9443`, No TLS Verify +STILL PENDING (user action in both Portainer UIs): configure OAuth (see prior notes +in "Portainer SSO" section above for exact credentials). +Portal card update (3 files) also still pending until tunnel+OAuth done. 
+ +## Phase 2 Planned: Obsidian Mind Map → HTML Mind Map Sync +User wants to create an Obsidian mind map of the KiteStacks homelab that syncs/exports to a live HTML mind map embedded in the homelab portal or a standalone page. To be built after full Obsidian+samurai setup is complete.