claude-memory/project-kitestacks-migration.md

56 KiB

name description metadata
project-kitestacks-migration Migration of the live KiteStacks homelab/website from assassin (T14) to monk — COMPLETE. Plus full Hetzner cloud failover (kscloud1, 5.78.233.28) — COMPLETE. All 9 subdomains can run from any single host. Plus 2026-06-10 portal/SSO push: portal FluxCD+coming-soon changes deployed, Karakeep SSO fixed, OpenProject SSO blocked by EE license, Portainer SSO Authentik-side done (pending user manual steps).
node_type type originSessionId
memory project 33992890-3940-4d4a-a94a-22b5621e9c1a

Final Polish, Security, and Runbook Completion (2026-06-15)

The KiteStacks infrastructure is now in its final, secured, and documented state:

  • GitOps UI/Dashboard: Added a standalone Nginx container for FluxCD status, bypassing Authentik so Cloudflare edge can route it freely. The dashboard is live at flux.kitestacks.com.
  • Security Posture: Validated Zero Trust architecture. No inbound open ports, strict mesh networking via Tailscale 100.x.x.x, and Authentik protecting all administrative dashboards (/scp/ for osTicket, Portainer, Grafana, Kite AI).
  • Runbook Cleaned: RUNBOOK.md truncated and organized. Historical issues (like Authentik invalid_grant, osTicket email SMTP lack of MTA) have been relocated to docs/DEBUGGING.md.
  • osTicket Diagnostics: Documented that activation emails fail because Docker containers lack a local MTA. Fix involves adding an external SMTP server in the osTicket Admin Panel.
  • Cloudflare Multi-Node Routing: Diagnosed persistent 502 errors on new subdomains (like ntfy). Cloudflare Tunnels actively load balance between monk and kscloud1. Documented that all new services must be deployed to both nodes to prevent the load balancer from sending traffic to a missing container. Subsequently resolved the ntfy 502 error by deploying the container to the kscloud1 replica and syncing its user.db via Tailscale SSH.

T14s GitOps Automation SUCCESS (2026-06-15)

The cluster configuration originally for "assassin" (T14) has been moved to the T14s. The machine is now fully bootstrapped with FluxCD GitOps.

  • Cluster Hostname: monk (T14s)
  • GitOps Repo: kitestacks-homelab (main branch)
  • Path: clusters/T14s
  • Automation: FluxCD is now managing the kavita namespace.
  • Kavita Manifests:
    • Deployment, Service, PVC (2Gi local-path), and Namespace.
    • Successfully synced and running (verified 2026-06-15).
  • Credentials: Authentik password for kenpat7177 reset to KiteStacks2026!.
  • osTicket: Services started, DB unified on kscloud1, and verified accessible via Authentik LDAP.

The GitOps workflow is now the authoritative way to manage Kubernetes apps on the T14s.

monk is the live production host. assassin (T14) is OFF. kscloud1 (Hetzner VPS, 5.78.233.28) is now a THIRD active Cloudflare Tunnel connector and runs a FULL replica of all 9 services, so the site stays up even if both monk and assassin are off (verified by user testing with home wifi off, from phone + mom's phone).

All 9 public subdomains (www, ai, auth, gitforge, grafana, kavita, links, status, tasks) verified returning correct status codes via the live tunnel with kscloud1 in rotation.

Governing principle (user's explicit words)

"leave the cloud backup on at all times" / "thats the point of it. if I am travelling my site will go down otherwise." -> kscloud1 runs PERMANENTLY as a 3rd connector, NOT cold standby. Cloudflare Tunnel load-balances ACTIVE-ACTIVE across all 3 connectors (no primary/backup priority). This means stateful apps (gitforge, openproject, authentik, karakeep, kavita, openwebui, etc.) may show DIFFERENT/STALE data depending on which connector serves a given request - EXPLICITLY ACCEPTED by user as the cost of guaranteed uptime. Fresh/separate databases on kscloud1 are fine; do not try to sync data between monk and kscloud1.

kscloud1 access

SSH: ssh -i ~/.ssh/id_ed25519_kscloud1 kenpat@5.78.233.28 (passwordless, key auth). sudo needs a password ("p12217177") and has no askpass helper - avoid sudo; most things doable as kenpat or via docker. All services live under /opt/kitestacks/docker/<service>/docker-compose.yml, same one-dir-per-app pattern as monk's ~/kitestacks-live/docker/.

kscloud1 services deployed (all docker compose up -d, joined to local kitestacks network)

  • cloudflared (3rd tunnel connector, same TUNNEL_TOKEN, connector id 78521d9f-71c0-4e3d-992f-bd1f77da1a8f)
  • homepage-backup (alias homepage) + caddy + kitestacks-metrics-api-backup - PRE-EXISTING
  • forgejo (alias forgejo) - PRE-EXISTING, separate DB from monk's (gitforge data inconsistent across connectors, accepted)
  • prometheus + node-exporter (job kscloud1-node)
  • grafana (alias grafana, port 3150) - Prometheus datasource (uid 000000001) + "Node Exporter Full" dashboard (id 1860) provisioned via ./provisioning/. OAuth->authentik config present but authentik on kscloud1 has a FRESH db (no provider apps configured) so OAuth login won't work there; local admin login works.
  • uptime-kuma (alias status->uptime-kuma) - kuma.db seeded by copying monk's admin user (same login: kenpat / same password hash) + monitors: kscloud1 self-ping, Google DNS, and HTTP checks for all 9 *.kitestacks.com subdomains (external monitoring of the live site).
  • kavita (alias kavita) - empty library (fresh)
  • karakeep + karakeep-chrome + karakeep-meilisearch (alias karakeep) - fresh meilisearch/db
  • authentik + authentik-worker + authentik-postgres + authentik-redis (alias on auth) - FRESH DB. Bootstrap admin: akadmin@kitestacks.com / password 6KlYpfCyYxbnKQNiOewN (set via AUTHENTIK_BOOTSTRAP_PASSWORD in .env). No OAuth provider apps exist yet (would need to be manually recreated in authentik UI for grafana/openwebui/karakeep/openproject SSO to work when kscloud1 is the active backend).
  • kite-litellm + kite-openwebui (alias ai->openwebui) - same .env/secrets as monk. OpenWebUI has ENABLE_SIGNUP=true (changed from monk's false) so kenpat can create a local admin account on first use, since authentik OAuth won't work with kscloud1's fresh authentik.
  • openproject (alias on tasks, port 8090:80 host - port 80 was taken by caddy) - FRESH db, self-initialized via the all-in-one image's bootstrap (took ~3-4 min). Empty/no projects yet.

monk-side changes made for cross-host monitoring

  • ~/kitestacks-live/docker/prometheus/prometheus.yml: added scrape job kscloud1-node -> 5.78.233.28:9100 (kscloud1's node-exporter is exposed 0.0.0.0:9100, no firewall - reachable from monk's public IP). monk's grafana (the live one, "Node Exporter Full" dashboard now provisioned via ~/kitestacks-live/docker/grafana/provisioning/) shows BOTH t14-node (monk/"this pc") and kscloud1-node ("the cloud") via the instance picker.
  • kscloud1's prometheus only scrapes itself (kscloud1-node) - monk is behind home NAT, not reachable from kscloud1.

Resource notes (kscloud1: 3 vCPU, 3.7GB RAM + 6GB swap, 75GB disk)

With all services running: ~2.8-2.9GB RAM used, ~2.6-2.8GB swap used (of 6GB), ~835MB-1.2GB "available", disk 29GB/75GB used. Site is functional but under memory pressure - if BOTH monk and assassin are down for an extended period with real concurrent usage, expect sluggishness (esp. openproject/authentik/ openwebui). Not yet stress-tested under real failover load.

Key gotchas from THIS phase (cloud failover build-out)

  • kscloud1's kitestacks Docker network is LOCAL/separate from monk's (same name, no conflict). cloudflared on each host resolves container names against its own host's network.
  • Adding a new tunnel connector that lacks a backend for an ingress hostname -> 502 for requests routed there. If it has a DIFFERENT backend (e.g. forgejo) -> serves different data inconsistently. Both accepted/expected now that all 9 hostnames have backends on kscloud1.
  • port 80 on kscloud1 is owned by caddy (serves www-backup/git-backup.kitestacks.com direct A-records, pre-existing, unrelated to the tunnel) - openproject uses 8090:80 for its host port instead (internal container port 8080 is what cloudflared hits).
  • uptime-kuma / grafana have no simple file-based config API for monitors/datasources beyond grafana provisioning - used direct sqlite manipulation (docker exec ... sqlite3, or python3 sqlite3 module via a throwaway python:3-alpine container with the volume mounted) to seed uptime-kuma's kuma.db with users/monitors.
  • authentik first boot takes ~1-2 min (migrations); openproject first boot takes ~4-5 min (postgres initdb + Rails migrations + Puma boot), watch docker logs for "Listening on http://0.0.0.0:8080" before testing.

Authentik/Kavita login fix (2026-06-10, post cloud-failover)

PROBLEM: Cloudflare Tunnel load-balances auth./kavita. active-active across monk and kscloud1. kscloud1's authentik had only the fresh akadmin bootstrap user (not kenpat7177) and kscloud1's kavita had ZERO users -> ~50% of requests showed "wrong password" on authentik and a "create admin account" (signup) screen on kavita instead of login. This contradicts the earlier "fresh DBs are fine" assumption - for IDENTITY apps it breaks login, so it was NOT acceptable. FIX APPLIED (one-time sync, same pattern as uptime-kuma's kuma.db seed):

  • pg_dump'd monk's authentik-postgres authentik db (--clean --if-exists), scp'd to kscloud1, stopped authentik+authentik-worker on kscloud1, restored via docker exec -i authentik-postgres psql -U authentik -d authentik < dump, restarted. Worked cleanly because AUTHENTIK_SECRET_KEY and PG_PASS were ALREADY IDENTICAL between monk's and kscloud1's authentik/.env.
  • For kavita: copying the raw kavita.db file via plain cp produced "database disk image is malformed" (WAL-mode db isn't standalone-consistent as a flat file copy even when -wal/-shm look small). FIX: use python3 sqlite3 Connection.backup() (via throwaway python:3-alpine container) to produce a consistent copy, THEN on kscloud1 stop kavita, rm the OLD kavita.db-shm/kavita.db-wal too (stale WAL files against new db = same corruption error), copy in the new kavita.db (chown root:root, chmod 644 to match original ownership - kavita container runs as root), restart.
  • Result: kscloud1 authentik now has kenpat7177 (matches monk), kscloud1 kavita now has kenpat7177 + acurrie (matches monk). Both connectors now return the same login screen/credentials. NOTE: this is a ONE-TIME sync, not continuous - if monk's users/passwords change later, kscloud1 will drift again and the same symptoms could return; re-run this sync if so.
  • kscloud1 kavita's library entries point at /books paths that don't exist on kscloud1 (no actual book files there) - login works fine, but browsing the library when served by kscloud1 will show entries with missing files. Same "stale data" tradeoff as gitforge, accepted.

Authentik shared Postgres+Redis over Tailscale (2026-06-10) - fixes "invalid_grant" SSO

PROBLEM: Even after the one-time DB sync above, "Sign in with Authentik" on Kavita could fail with "invalid_grant" / "Code does not exist". Root cause: monk and kscloud1 each ran their OWN authentik-postgres. OAuth2 authorization codes are short-lived per-flow rows in authentik_providers_oauth2_authorizationcode

  • if Cloudflare Tunnel's active-active routing sends /authorize to one connector and /application/o/token/ to the other, the code only exists in one of the two DBs -> invalid_grant. A one-time data sync can't fix this because the data is created fresh on every login attempt. FIX: Converted to a single shared Postgres+Redis (HA pattern), hosted on kscloud1, reachable ONLY over Tailscale:
  • Installed Tailscale on both monk and kscloud1 (same tailnet). kscloud1's tailscale IP is 100.123.254.52.
  • kscloud1's /opt/kitestacks/docker/authentik/docker-compose.yml: authentik-postgres now binds 100.123.254.52:5432:5432 (was unbound/internal-only), authentik-redis now binds 100.123.254.52:6379:6379. Both still also reachable on the local kitestacks docker network for kscloud1's own authentik+worker. Backup of pre-change file: docker-compose.yml.backup-before-shared-db-20260610-1138.
  • monk's ~/kitestacks-live/docker/authentik/docker-compose.yml: REMOVED the postgresql and redis services entirely. monk's authentik/authentik-worker now point AUTHENTIK_POSTGRESQL__HOST and AUTHENTIK_REDIS__HOST at 100.123.254.52 (kscloud1 over Tailscale), using the same PG_PASS / AUTHENTIK_SECRET_KEY as before (already identical between hosts).
  • monk's old local authentik-postgres/authentik-redis containers were STOPPED (not removed) - data dirs preserved under ~/kitestacks-live/docker/authentik/postgres in case of rollback, but no longer in use.
  • Result: BOTH connectors' authentik+worker now read/write the SAME db/redis, regardless of which one handles /authorize vs /application/o/token/. Verified both authentik+authentik-worker healthy on monk and kscloud1, OIDC discovery docs identical, user list matches (kenpat7177 etc.) on both. CONFIRMED BY USER 2026-06-10: "Sign in with Authentik" on Kavita now works (when monk's connector serves the request).

Kavita "Sign in with Authentik" button missing on kscloud1 - FIXED 2026-06-10

After the shared-Authentik-DB fix above, the button still didn't appear when Cloudflare routed kavita.kitestacks.com to kscloud1's connector. CAUSE: Kavita's OIDC config lives in ITS OWN db (kavita.db ServerSetting table, Key=40, a JSON blob with Authority/ClientId/Secret/Enabled), separate from Authentik's db. The earlier one-time kavita.db sync (see fix above) was taken BEFORE OIDC SSO was configured in monk's Kavita, so kscloud1's copy had Key=40 with empty Authority/Secret and "Enabled":false. FIX: copied monk's Key=40 JSON value verbatim into kscloud1's kavita.db (stop kavita, docker run --rm -v .../kavita/config:/data -v fix.sql:/fix.sql alpine + apk sqlite + sqlite3 /data/kavita.db < fix.sql with UPDATE ServerSetting SET Value='...' WHERE "Key"=40, restart kavita). NOTE: AspNetUserLogins (OIDC account-linkage table) is empty on BOTH monk and kscloud1 - Kavita creates this row on first OIDC login per-instance (matches existing local user by email since ProvisionAccounts=false), so no extra action needed there. GOTCHA: ServerSetting's PK column is "Key" (INTEGER), not Id - must quote it in sqlite ("Key") since KEY is a SQL reserved word. DRIFT WARNING: any future Kavita server-setting change (OIDC config, library paths, etc.) made on monk will NOT propagate to kscloud1's kavita.db automatically - same one-time-sync caveat as the user-table sync above.

UPDATE 2026-06-10 (RESOLVED via Kavita UI, not direct DB edit): Direct SQL edits to ServerSetting Key=40 got WIPED back to disabled/empty by Kavita on every container restart (RowVersion incremented +2 each time, Authority/Secret cleared, Enabled->false) - confirmed twice, even with a full WAL-consistent kavita.db replace from monk. Direct DB writes to this table do NOT survive a restart; only saves through Kavita's own Settings UI/API persist correctly. FIX: opened an SSH local port-forward (ssh -L 5099:localhost:5000 kenpat@5.78.233.28) so the user could reach kscloud1's Kavita directly at http://localhost:5099 (bypassing the Cloudflare load-balanced domain), logged in with their normal kenpat7177 Kavita password, and re-entered the OIDC config in Settings -> OIDC:

  • Authority: https://auth.kitestacks.com/application/o/kavita/ (MUST include trailing slash - Kavita validates that this exactly matches the issuer claim in Authentik's .well-known/openid-configuration, which has a trailing slash. Without it: "Kavita can load the OIDC configuration, but the issuer does not match".)
  • Client ID: kavita, Client Secret: (96-hex-char secret from Authentik's Kavita OAuth2 provider - watch for copy/paste truncation, verify length=96)
  • Enabled: true, ProviderName: authentik Saved via UI (RowVersion 8->12->14 across two saves to fix a 1-char-truncated secret), then docker compose restart kavita on kscloud1 - config SURVIVED this restart (unlike the direct-SQL attempts) and /api/settings/oidc now reports "enabled": true. SSH tunnel closed afterward (no firewall changes were made/needed). Set a temporary ApiKey on kenpat7177's kscloud1 kavita account during troubleshooting (for a Plugin/authenticate attempt that turned out to return 401 / unused) - left in place, harmless (grants API access to that user's own account only). TAKEAWAY FOR FUTURE KAVITA CONFIG CHANGES ON KSCLOUD1: always use the Kavita UI (via SSH port-forward to localhost:5000) rather than direct sqlite edits - direct edits to ServerSetting do not survive a restart. CONFIRMED BY USER 2026-06-10: "Sign in with Authentik" now works on Kavita regardless of which connector (monk/kscloud1) answers.

Kavita cover images missing on kscloud1 - FIXED 2026-06-10

After the kavita.db sync from monk, kscloud1's db referenced cover image files (e.g. v1_c1.png..v10_c10.png in ServerSetting/Series.CoverImage) that didn't exist on kscloud1's filesystem - kscloud1's config/covers/ dir was empty (monk has 9 files, ~1.3MB). Result: book/series cover thumbnails didn't load when kscloud1 served the request. FIX: tar'd monk's ~/kitestacks-live/docker/kavita/config/covers/ (owned 1000:1000), scp'd to kscloud1, extracted into /opt/kitestacks/docker/kavita/config/covers/ via a throwaway alpine container, chown -R 1000:1000. No kavita restart needed - covers are served as static files from disk. CONFIRMED BY USER: covers now load correctly. NOTE: this is another one-time sync (same drift caveat) - if new books/covers are added on monk later, they won't appear on kscloud1 unless re-synced (covers/ dir + kavita.db + actual book files under library/books, none of which exist on kscloud1 per the earlier "stale data" note). SECURITY NOTE: postgres/redis on kscloud1 are bound to the Tailscale interface IP only (100.123.254.52), not 0.0.0.0 - not exposed to the public internet. ROLLBACK: if Tailscale connectivity ever breaks, monk's authentik will fail to start (can't reach 100.123.254.52). To roll back: restore monk's docker-compose.yml from git/backup to use local postgresql/redis services again, restart monk's old authentik-postgres/authentik-redis containers (docker start authentik-postgres authentik-redis in ~/kitestacks-live/docker/authentik), docker compose up -d. Note this would mean monk's authentik db is now STALE (kscloud1's shared db has any logins/ changes since 2026-06-10) - would need a fresh pg_dump sync from kscloud1 first.

kscloud1 ufw blocks docker-bridge -> host port 8000 (metrics API) - FIXED 2026-06-10

kscloud1 has ufw active with default deny incoming/routed. The kitestacks-metrics-api-backup container (network_mode: host, binds 0.0.0.0:8000) was unreachable from homepage-backup via host.docker.internal:8000 (TCP timeout, not refused -> ufw drop), causing the homepage System Status widget to show 0%/"Offline" when kscloud1 served the request. FIXED by adding: sudo ufw allow from 172.16.0.0/12 to any port 8000 proto tcp (covers all docker bridge subnets on this host: 172.17-172.29.x.x). Verified homepage-backup -> host.docker.internal:8000/api/metrics now returns real CPU/RAM/storage/network data. kscloud1 sudo password is in the "kscloud1 access" section above - needed echo PASS | sudo -S <cmd> (no askpass helper, non-interactive sudo via -S works fine).

Portal SSO/coming-soon push + Karakeep fix + OpenProject EE blocker + Portainer SSO setup (2026-06-10)

Per user's original request: SSO for grafana/prometheus/portainer/cloudflare/uptime-kuma (authentik=source), KITEAI/LITELLM/OPENROUTER -> "coming soon", SSO for openproject/forgejo/ karakeep, FluxCD card in AI&Automation. Hard constraint: user does NOT use Cloudflare Zero Trust/Access ("costs money") - any Cloudflare work must avoid those products (note: the Tunnel UI itself lives under Cloudflare's "Zero Trust" dashboard section, but configuring a Tunnel public hostname there is NOT the same as enabling Zero Trust/Access - fine to use).

Portal UI changes - DEPLOYED to all 3 copies, verified live

Edited the AI & AUTOMATION panel (cards cards-3 -> cards cards-2, now 2x2): Kite AI and OpenRouter cards changed from external links to href="#" data-coming-soon="1" (LiteLLM was already coming-soon); added a 4th card "FluxCD" / "GitOps Automation" using /images/icons/fluxcd.png, also coming-soon (automation scripts with FluxCD+Prometheus+Grafana+node-exporter are a future project). Applied identically to:

  • ~/kitestacks-live/docker/kitestacks-portal-test/public/index.html (monk, dev, port 3008)
  • ~/kitestacks-live/docker/kitestacks-portal/public/index.html (monk, LIVE, served by "homepage" container 3005->3000 - this is the file that backs www.kitestacks.com)
  • /opt/kitestacks/docker/www-backup/kitestacks-portal/public/index.html (kscloud1, served by homepage-backup port 3015) Verified https://www.kitestacks.com returns "FluxCD" consistently (6/6 requests across both connectors). NOTE: Portainer card on the live portal is currently data-coming-soon="1" - update this to a real href="https://portainer.kitestacks.com" link (remove data-coming-soon) once the Portainer SSO manual steps below are completed. NOTE 2: "cloudflare should all be in the networking side" from the original request was never resolved - Cloudflare card is still in the INFRASTRUCTURE panel, not moved/renamed to a "NETWORKING" panel. Ambiguous, deprioritized, not revisited.

Karakeep SSO redirect_uri fix - DONE, confirmed working

Karakeep uses NextAuth.js with provider id "custom" (not "authentik") - actual OAuth callback path is /api/auth/callback/custom, but Authentik's Karakeep OAuth2Provider's _redirect_uris had the wrong path -> "Redirect URI Error". FIX: direct Postgres UPDATE to authentik_providers_oauth2_oauth2provider._redirect_uris (JSON column) on the shared kscloud1 authentik-postgres (100.123.254.52), wrapped in explicit BEGIN; UPDATE ...; COMMIT; (a bare single-statement -c "UPDATE..." reported "UPDATE 1" but did NOT persist on first attempt - cause unclear, explicit transaction fixed it). After the DB write, restarted authentik+authentik-worker on BOTH monk and kscloud1 and polled docker inspect --format '{{.State.Health.Status}}' until both reported "healthy" (~50s) before retesting - first retest hit a transient 502 because kscloud1's authentik was still "starting". CONFIRMED: Authentik now serves the login page (not "Redirect URI Error") for Karakeep SSO. PG_PASS GOTCHA: ~/kitestacks-live/docker/authentik/.env PG_PASS value ends in = - extract with cut -d= -f2- (NOT -f2, which truncates the trailing = and causes "password authentication failed"). REUSABLE PATTERN for any future direct Authentik DB edit: (1) wrap writes in explicit BEGIN/COMMIT, (2) restart authentik+authentik-worker on BOTH monk and kscloud1, (3) wait for health=healthy on both before testing.

OpenProject SSO - config bug fixed, but BLOCKED by Enterprise licensing (no further action possible)

~/kitestacks-live/docker/openproject/docker-compose.yml env vars were wrong in two ways: (1) extra "PROVIDERS_" segment in var names caused seed_oidc_provider = {"providers": {"authentik": {...}}} instead of {"authentik": {...}}, producing a broken stub provider record (slug= "providers", id=1, since deleted via Rails runner); (2) discovery_endpoint isn't read by ConfigurationMapper at all - replaced with explicit ISSUER/AUTHORIZATION__ENDPOINT/TOKEN__ENDPOINT/USERINFO__ENDPOINT/ END__SESSION__ENDPOINT/JWKS__URI vars (current docker-compose.yml has the corrected version, see file - all derived from https://auth.kitestacks.com/application/o/openproject/.well-known/openid-configuration). After fixing both, the seeder correctly creates provider slug="authentik", available=true, all fields correct - BUT the SSO button still does not appear on /login. CONFIRMED ROOT CAUSE (terminal, source-code-verified): OpenProject CE 2025/v15's OmniAuth SSO strategy (OpenProject::Plugins::AuthPlugin/OpenIDConnect) AND SAML (auth_saml/lib/open_project/auth_saml/engine.rb, enterprise_feature: "sso_auth_providers") are BOTH gated behind an Enterprise Edition license - "OmniAuth SSO strategy ... is only available for Enterprise Editions". No app/config-level workaround exists. Only remaining options: buy EE license, OR put a forward-auth proxy (oauth2-proxy / Authentik embedded outpost) in front of OpenProject - DEFERRED along with Prometheus/Uptime Kuma proxy work (see below) until Oracle VPS topology is decided. OpenProject container is healthy, /login returns 200, no projects yet.

Portainer SSO - Authentik side DONE, two manual steps PENDING (not yet done by user)

Per user: "yes continue with portainer" / "yes but make sure it is still secure" (approved exposing Portainer publicly via a NEW Cloudflare Tunnel hostname, with explicit requirement to keep it secure -> access restricted to the homelab-admin Authentik group). Created via docker exec authentik ak shell (Django ORM, no Authentik API token configured) on kscloud1's shared authentik-postgres:

  • OAuth2Provider "Portainer": client_id=portainer, client_secret=wTim3mrMwt34ko1RYMvK1RNnjwWOMi_d4r4cS6exr7DjozCrL5zKthHl-5KjargF, provider_id=9, redirect_uri=https://portainer.kitestacks.com (strict), scopes openid/email/profile, sub_mode=user_email, signing key + flows copied from existing providers (same pattern as Karakeep/Grafana).
  • Application "Portainer" (slug="portainer", meta_launch_url= https://portainer.kitestacks.com).
  • PolicyBinding restricting the Portainer application to Authentik group homelab-admin (UUID e21b0aa5-62e7-4b3a-8302-130b0ae148a5) - this is the "make sure it is still secure" piece (only homelab-admin members can SSO in).
  • Verified discovery doc resolves: https://auth.kitestacks.com/application/o/portainer/.well-known/openid-configuration. PENDING MANUAL STEPS (user must do via UI - confirmed portainer.kitestacks.com still returns 000 as of 2026-06-10):
  1. Cloudflare dashboard -> Tunnel -> add Public Hostname portainer.kitestacks.com -> service https://portainer:9443 (HTTPS), enable "No TLS Verify". (This is in the Tunnel config UI, which Cloudflare happens to host under the "Zero Trust" nav section, but adding a Tunnel hostname is NOT enabling Zero Trust/Access - does not violate the no-Zero-Trust constraint.)
  2. In Portainer -> Settings -> Authentication -> OAuth (Provider: Custom), on BOTH monk's and kscloud1's SEPARATE Portainer instances, configure:
    • Client ID: portainer
    • Client Secret: wTim3mrMwt34ko1RYMvK1RNnjwWOMi_d4r4cS6exr7DjozCrL5zKthHl-5KjargF
    • Authorization URL: https://auth.kitestacks.com/application/o/authorize/
    • Access Token URL: https://auth.kitestacks.com/application/o/token/
    • Resource/Userinfo URL: https://auth.kitestacks.com/application/o/userinfo/
    • Redirect URL: https://portainer.kitestacks.com
    • Logout URL: https://auth.kitestacks.com/application/o/portainer/end-session/
    • Scopes: openid email profile, User identifier claim: email AFTER both steps done: update the live portal's Portainer card (in the 3 files above) from data-coming-soon="1" to a real href="https://portainer.kitestacks.com" target="_blank" rel="noopener" link.

App-level SSO status summary (end of 2026-06-10 session)

Grafana: working (pre-existing). Forgejo: working (pre-existing). Karakeep: fixed this session, working. OpenProject: blocked by EE license (terminal at app level). Portainer: Authentik side done, waiting on user's 2 manual steps above. Prometheus + Uptime Kuma: DEFERRED - neither has native OAuth, need a forward-auth proxy (oauth2-proxy or Authentik embedded outpost) - deferred per user's "ok lets do smaller app level" (hold new infra until Oracle VPS decided). Cloudflare itself: no SSO concept applicable (it's Cloudflare's own dashboard managed outside the lab login) - was always about the portal's Cloudflare card placement, see "Portal UI changes" note above.

Uptime Kuma + Authentik SSO resumed on monk (2026-06-15)

User confirmed the next task is setting up Uptime Kuma with Authentik SSO in the main KiteStacks lab, and explicitly requested saving progress to ~/claude-memory and pushing to the Forgejo kenpat/claude-memory repo as we go.

Verified current live state on monk before making changes:

  • uptime-kuma container is running and healthy, published on host port 3001, image louislam/uptime-kuma:latest.
  • Installed Uptime Kuma version inside the container is 1.23.17.
  • Uptime Kuma compose file is ~/kitestacks-live/docker/uptime-kuma/docker-compose.yml, using external Docker volume uptime-kuma:/app/data and networks default + external kitestacks.
  • Uptime Kuma SQLite DB path inside container is /app/data/kuma.db; tables include user, setting, monitor, heartbeat, status_page, notification, api_key, and related monitor/status tables. No obvious native OAuth/OIDC tables were present in the initial schema list.
  • Grafana is already configured for Authentik generic OAuth in ~/kitestacks-live/docker/grafana/docker-compose.yml with Authentik public authorize URL and internal token/userinfo URLs.
  • authentik is healthy; authentik-worker currently shows unhealthy in docker ps even though it has been running for ~35h. Check logs/health before relying on new Authentik-side automation.
  • Existing Authentik objects were found for Uptime Kuma:
    • Application slug uptime-kuma, name Uptime Kuma, provider id 7.
    • ProxyProvider Uptime Kuma, external host https://status.kitestacks.com, internal host http://uptime-kuma:3001, mode proxy.
    • Embedded proxy outpost already includes providers Karakeep, Uptime Kuma, and LiteLLM.
  • https://status.kitestacks.com still routes directly to Kuma as of 2026-06-15: public curl gets Kuma's /dashboard redirect and 200 response, not an Authentik authorization flow. Cloudflare tunnel route still needs to be changed from direct Kuma to the Authentik embedded outpost/server.
  • Security fix applied 2026-06-15: created PolicyBinding 6f2ac876-2f47-473d-986d-d7c5d2a3214e from the Uptime Kuma application to Authentik group homelab-admin, enabled, order 0. This matches the Portainer restriction pattern.
  • Cloudflared is remote-managed: container command is tunnel --no-autoupdate run, no local ingress config exists, and the compose file stores a TUNNEL_TOKEN. Do not print that token; treat it as sensitive. Routing changes must be made through Cloudflare's tunnel API/dashboard unless a suitable Cloudflare API token is available locally.
  • Local validation after the Authentik binding: curl -I -H 'Host: status.kitestacks.com' http://localhost:9001 returns 302 to https://status.kitestacks.com/outpost.goauthentik.io/start?..., proving the embedded outpost/proxy provider works when traffic reaches Authentik.
  • No suitable Cloudflare API token was found during the local search; only the cloudflared connector tunnel token is present. Remaining blocker is changing the Cloudflare Tunnel public hostname for status.kitestacks.com from http://uptime-kuma:3001 to http://authentik:9000 (or equivalent Authentik service target in the Tunnel UI).
  • Correction after user tested: user does NOT want front-door proxy behavior for Uptime Kuma. Desired UX is an in-app "single sign on" button on the Uptime Kuma login screen, like Grafana/Forgejo style native OAuth. Authentik proxy redirect is not acceptable for this requirement.
  • Confirmed in the installed Uptime Kuma 1.23.17 frontend: /app/src/components/Login.vue only renders username, password, remember-me, and login submit controls. No native OAuth/OIDC/SSO button exists in this version's login component, and local source search only found monitor OAuth client-credentials support, not app login SSO.
  • If staying on Uptime Kuma 1.23.17, revert Cloudflare route for status.kitestacks.com back to http://uptime-kuma:3001; otherwise users get Authentik first and then still see Kuma's local login. Native in-app SSO would require an Uptime Kuma version/plugin/fork with login OIDC support or custom app code, not the Authentik proxy provider.
  • User reset the Cloudflare route back to http://uptime-kuma:3001 and asked to continue with an in-app Authentik button. Upstream latest checked via GitHub API: Uptime Kuma latest release is 2.4.0 (published 2026-05-31) and upstream src/components/Login.vue still has only username/password login, no native OAuth/OIDC button. Proceeded with a custom overlay patch.
  • Custom native Authentik SSO overlay deployed on BOTH active tunnel backends (monk and kscloud1) so public load-balanced traffic behaves consistently:
    • monk path: ~/kitestacks-live/docker/uptime-kuma/
    • kscloud1 path: /opt/kitestacks/docker/uptime-kuma/
    • backend preload module: custom/server/authentik-sso.js
    • frontend mounted files: custom/dist/index.html, index.html.gz, index.html.br
    • compose now sets NODE_OPTIONS=--require /app/custom/server/authentik-sso.js, loads .env.sso, and bind-mounts the custom files over Kuma's built HTML.
  • Authentik native OAuth provider/application created:
    • OAuth2Provider name Uptime Kuma Native, provider id 12
    • Application slug uptime-kuma-native, name Uptime Kuma Native SSO
    • Client ID uptime-kuma-native
    • Redirect URI https://status.kitestacks.com/auth/authentik/callback
    • Restricted to Authentik group homelab-admin via PolicyBinding 2e1eaa95-b397-4c4f-bfc7-abb337906cf3
    • Client secret is stored only in each host's .env.sso; do not print it.
  • Custom flow behavior:
    • Login page injects a Sign in with Authentik button linking to /auth/authentik.
    • Backend starts Authentik OIDC, validates callback state, fetches userinfo, maps the login to existing Kuma user kenpat, issues Kuma's normal JWT, then redirects to /?authentik_token=<token>.
    • Frontend one-time script stores the JWT in localStorage.token, removes the URL token, and redirects to /dashboard, letting Kuma's normal loginByToken flow establish the session.
  • Verification 2026-06-15:
    • monk local /dashboard HTML contains Sign in with Authentik, /auth/authentik, and authentik_token.
    • kscloud1 local /dashboard HTML contains the same and /auth/authentik redirects to Authentik with client_id uptime-kuma-native.
    • Public repeated check: for i in 1 2 3 4 5 6; do curl -sSL --compressed https://status.kitestacks.com/dashboard | grep -q "Sign in with Authentik"; done returned button for all 6 attempts, confirming both active connectors serve the button.
  • Post-test screenshot showed Uptime Kuma login page with red banner "Lost connection to the socket server. Reconnecting..." after clicking the SSO button. Root cause: active-active JWT mismatch. Uptime Kuma JWTs include a signature using setting.jwtSecret; monk and kscloud1 had matching user password hashes but different JWT secrets, so a token minted by one backend failed if the browser's websocket connected to the other backend. Fixed 2026-06-15 by copying monk's exact jwtSecret into kscloud1's /app/data/kuma.db using base64 transport (avoid shell expansion of secret chars), then restarting kscloud1 Uptime Kuma. Verified both hashes now match: jwtSecret length 60, sha3 prefix FA67E6E9EDCC8E1D. Public button check still returns button 6/6. If a browser still has a pre-fix bad token in localStorage, clear site data or click the Authentik button again to mint a fresh token.
  • User retested and still saw the socket reconnect banner. Follow-up finding: public Uptime Kuma frontend was using Socket.IO's default long-polling-first transport. In the active-active Cloudflare Tunnel setup, polling requests can bounce between monk and kscloud1 before a socket session is established, causing reconnect loops before Kuma even logs Login by token.
  • Fix applied 2026-06-15 on BOTH monk and kscloud1: copied the built frontend bundle index-BBxTfFCS.js into the overlay and patched the minified socket call from Ze=Nc(n) to Ze=Nc(n,{transports:["websocket"]}). Regenerated .gz and .br variants and mounted all three over /app/dist/assets/index-BBxTfFCS.js* in both compose files. Restarted both Uptime Kuma containers.
  • Verification after websocket-only patch:
    • monk local asset contains transports:["websocket"]
    • kscloud1 local asset contains transports:["websocket"]
    • public repeated asset check over https://status.kitestacks.com/assets/index-BBxTfFCS.js found transports:["websocket"] 6/6, confirming both tunnel backends serve the patched client bundle.
  • User still saw the same issue after trying another browser. Follow-up: websocket connections were reaching Kuma, but logs showed no Login by token, so the handoff from Authentik callback to Kuma storage was unreliable. Changed the SSO callback from /?authentik_token=<jwt> URL handoff to a short-lived readable cookie uk_authentik_token plus redirect directly to /dashboard. Updated injected HTML to read that cookie before Kuma initializes, store the token in localStorage.token, set localStorage.remember=1, then delete the cookie. This avoids long-token URL handling.
  • Important operational gotcha: Uptime Kuma caches index.html in memory at startup. After changing the mounted index.html/compressed variants, docker compose up -d was not enough because containers stayed "Running"; had to run docker compose restart uptime-kuma on BOTH monk and kscloud1 to reload the HTML into memory.
  • Verification after cookie handoff + explicit restarts:
    • monk local /dashboard HTML contains uk_authentik_token, authentik_token, and Sign in with Authentik.
    • kscloud1 local /dashboard HTML contains the same.
    • public repeated check for uk_authentik_token over https://status.kitestacks.com/dashboard returned cookie-handoff 6/6.
  • User confirmed after retest: Uptime Kuma Authentik SSO button works.

Uptime Kuma monitors mirrored into Prometheus/Grafana (2026-06-15)

User asked to set up the same monitors currently in Uptime Kuma for Grafana and Prometheus. Existing Uptime Kuma monitor list at the time:

  • T14 Deb Assassin: ping 127.0.0.1
  • HomeRouter: ping 192.168.1.254
  • Google DNS: ping 8.8.8.8
  • TailScale: ping 100.90.13.55

Implemented on monk's live Prometheus/Grafana stack:

  • Added prom/blackbox-exporter service to ~/kitestacks-live/docker/prometheus/docker-compose.yml.
  • Added blackbox config ~/kitestacks-live/docker/prometheus/blackbox.yml with ICMP module (preferred_ip_protocol: ip4, timeout 5s).
  • Added Prometheus scrape job uptime-kuma-ping-probes in ~/kitestacks-live/docker/prometheus/prometheus.yml, using /probe with module=icmp and labels monitor_name matching the Uptime Kuma names.
  • Added Grafana provisioned dashboard ~/kitestacks-live/docker/grafana/provisioning/dashboards/kitestacks-uptime-probes.json titled KiteStacks Uptime Probes, with stat/timeseries panels for probe_success{job="uptime-kuma-ping-probes"} and probe_duration_seconds{job="uptime-kuma-ping-probes"}.
  • Ran docker compose up -d in the Prometheus directory, pulled/started blackbox-exporter, restarted Prometheus, and restarted Grafana.

Verification:

  • Prometheus config validates with promtool check config.
  • Prometheus active targets include all four uptime-kuma-ping-probes.
  • Query result for probe_success{job="uptime-kuma-ping-probes"}: Google DNS=1, T14 Deb Assassin=1, HomeRouter=0, TailScale=0. The two failures match Kuma's existing failing ping behavior from inside the container/network namespace.
  • Grafana logs show dashboard provisioning completed without dashboard errors (only unrelated bundled plugin permission warnings).

Desktop widget for the same monitor set (2026-06-15)

User asked for a Rainmeter-like desktop widget on Debian 13 that can show the same Uptime Kuma monitor state in real time.

Created a local Conky-based widget scaffold in the desktop user's home:

  • ~/.local/bin/kitestacks-uptime-widget.sh
  • ~/.config/conky/kitestacks-uptime.conf

Behavior:

  • Polls Prometheus for probe_success and probe_duration_seconds from the uptime-kuma-ping-probes job.
  • Defaults to http://192.168.1.205:9090, with PROM_URL override support.
  • Prints the four Kuma monitor names, state, latency, and a summary line.
  • Degrades cleanly with Prometheus unavailable at ... when the endpoint cannot be reached.

Note: Conky is the closest direct Rainmeter-style equivalent for Debian/Linux desktop widgets; eww is the more modern alternative if the desktop session is Wayland-first and the user prefers GTK/Rust widgets instead of a classic desktop overlay.

Debian 13 package note:

  • conky is a virtual package in trixie.
  • Install conky-all for the full desktop widget experience: sudo apt update && sudo apt install conky-all

Connectivity note:

  • The laptop could not reach Prometheus at 192.168.1.205:9090, which means the widget can only work from a host that can reach the homelab LAN or a public/tunneled Prometheus endpoint.
  • The existing KiteStacks docs mark Prometheus as excluded from the Cloudflare tunnel, so there is no known public Prometheus URL to target yet.
  • The desktop widget script now defaults to https://prometheus.kitestacks.com and can send CF-Access-Client-Id / CF-Access-Client-Secret headers if the hostname is protected by Cloudflare Access.

Cyberpunk widget styling:

  • Conky panel tuned to the wallpaper palette with black base and neon cyan/magenta accents.
  • Header uses #ff4df0 pink and #2de0ff blue.
  • Monitor rows color-code UP as cyan and DOWN as pink for fast scanning.
  • conky.text now uses execpi so the helper's parsed color markup renders as one combined widget instead of only the title line.
  • The screenshot also showed a separate default Conky panel on the left; that is not part of the uptime widget itself.
  • Added a unified Conky desktop config at ~/.conkyrc plus an autostart wrapper that kills stray Conky instances and launches the single combined panel.

Important security hygiene: local git remote for ~/claude-memory contains an HTTP token in the URL; do not print it in summaries. Prefer redacted URLs in handoffs.

Oracle VPS migration - PLANNED, upcoming (stated 2026-06-11)

User confirmed on 2026-06-11: "we are going to switch things soon from hetzner cloud to oracle soon." -> kscloud1 (Hetzner, 5.78.233.28) is intended to be REPLACED by an Oracle Cloud VPS in the near future ("soon", no firm date yet). Originally raised 2026-06-10 as exploratory ("how easy would it be to move everything to oracle vps after?"), now an actual plan. Implication: avoid investing further one-off/manual config work that's hard to redo (e.g. more one-time DB syncs, hand-edited sqlite, etc.) on kscloud1 if avoidable - prefer changes that are easy to replicate on a new host. When the Oracle VPS is provisioned, plan to follow the same pattern as the kscloud1 cloud-failover build-out (new Cloudflare Tunnel connector + full service replicas + shared Authentik/Postgres/Redis over Tailscale + the Forgejo FORGEJO_API_BASE-over-Tailscale pattern for the portal's Recent Activity, see "Recent Activity" fix below) - then retire kscloud1 the same way assassin/T14 was retired (decommission once Oracle replica verified working).

Prior migration gotchas (monk, kept for reference - see git history/old notes if needed)

  • rsync --files-from recursion bug, bind-mount postgres dirs come over empty as non-root (use pg_dumpall/pg_dump --clean from running container instead), pg_dumpall --clean across template1 breaks on client/server version mismatch (use single-db pg_dump+psql instead), grafana data dir needs chown 472:472, kite-litellm needed manual docker network connect kitestacks kite-litellm.

2026-06-12: SSO fixes + Portainer deployed on kscloud1

Root cause: monk reconnect race condition

When monk goes offline (user travels) and reconnects, Cloudflare starts routing some token exchange requests to monk while codes were created on kscloud1 during the offline window. Auth codes had a 60-second TTL, which expired before monk's Authentik fully started (~5 min startup). FIX: increased access_code_validity from minutes=1 to minutes=10 for ALL 9 OAuth2 providers in the shared Postgres DB. This gives enough buffer for monk's containers to start before codes expire. Command used (via python:3-alpine container): docker run --rm --network host -v /tmp/fix_auth.py:/fix.py python:3-alpine sh -c ... connecting to shared Postgres at 100.123.254.52.

Karakeep redirect_uri reverted and re-fixed

The Karakeep OAuth2Provider _redirect_uris had reverted back to the proxy pattern (/outpost.goauthentik.io/callback?...) instead of the correct NextAuth callback (https://links.kitestacks.com/api/auth/callback/custom). This caused "Redirect URI Error" from Authentik whenever SSO was attempted. Root cause unknown (possibly an Authentik blueprint or UI save that regenerated/overrode the field). FIX: same Postgres UPDATE pattern. WATCH: if this reverts again, check Authentik blueprints or if someone modified the Karakeep provider via the Authentik admin UI.

Portainer deployed on kscloud1

Created /opt/kitestacks/docker/portainer/docker-compose.yml (same image/config as monk's portainer). Container running as portainer, port 9443:9443, on kitestacks network. Volume is local (NOT shared with monk - fresh Portainer instance). STILL PENDING (user action in Cloudflare dashboard):

  • Tunnel ID: 5e60ea8e-a543-49b6-bab5-325f39441e00, Account: d0bb7673333fcd794622956f1662f785
  • Add hostname portainer.kitestacks.com → service https://portainer:9443, No TLS Verify STILL PENDING (user action in both Portainer UIs): configure OAuth (see prior notes in "Portainer SSO" section above for exact credentials). Portal card update (3 files) also still pending until tunnel+OAuth done.

Phase 2 Planned: Obsidian Mind Map → HTML Mind Map Sync

User wants to create an Obsidian mind map of the KiteStacks homelab that syncs/exports to a live HTML mind map embedded in the homelab portal or a standalone page. To be built after full Obsidian+samurai setup is complete.

2026-06-13: OpenProject removed + Oracle VPS migration started

OpenProject REMOVED permanently

OpenProject requires Enterprise Edition license for SSO (confirmed last session). Removed from local stack (monk):

  • Docker volume openproject_openproject_assets deleted
  • /home/kenpatmonk/kitestacks-live/docker/openproject/ directory removed (pgdata dir needed sudo — user ran manually; pgdata was owned by container UID mapped to avahi)
  • NOT deploying on Oracle VPS
  • tasks.kitestacks.com subdomain is now dead — update Cloudflare/portal accordingly TODO: remove apps/openproject/ from kitestacks-homelab Forgejo repo once user can log in.

Forgejo issues found + partially fixed (2026-06-13)

Forgejo login page has two issues:

  1. URL banner: "configured to be served on http://5.78.233.28:3000/" — caused by kscloud1's Forgejo having wrong ROOT_URL. kscloud1 Forgejo has only 1 repo (separate DB from monk's 13-repo instance). Cloudflare tunnel load-balances between monk and kscloud1 Forgejo. FIX PENDING: stop Forgejo on kscloud1 (or fix its ROOT_URL). Deferred — do during Oracle migration.
  2. SSO button says "Proceed with OpenID" instead of "Authentik". PARTIAL FIX: renamed login_source from authentikAuthentik via admin CLI: docker exec -u git forgejo /app/gitea/gitea admin auth update-oauth --id 1 --name Authentik ... Provider type remains openidConnect — button text may still say "OpenID" (depends on Forgejo 11 template behavior). User to verify after refresh. Full fix may require admin UI once user can log into Forgejo. Forgejo DB: 13 repos under kenpat, 1 user (kenpat, admin, active, no 2FA). Forgejo login: username kenpat, direct password login works on the same page.

kitestacks-homelab repo: apps/forgejo/docker-compose.yml has wrong ROOT_URL

FORGEJO__server__ROOT_URL=http://192.168.1.205:3006 — old local IP, never updated. The LIVE local stack (~/kitestacks-live/docker/forgejo/docker-compose.yml) is correct (https://gitforge.kitestacks.com/). The repo copy needs updating. TODO: fix and commit once user can log in and clone the repo.

Oracle VPS migration plan (kscloud1 → Oracle Cloud)

Goal: replace Hetzner kscloud1 (5.78.233.28, $14.50/mo) with Oracle Cloud ARM VPS ($8.50/mo). Oracle instance: Ampere A1 Flex, 4 OCPU / 24 GB RAM, Chicago region (us-chicago-1). Status as of 2026-06-13: user is provisioning — hit "no capacity" in Chicago. Workarounds tried: capacity not available for 4 OCPU config. Options:

  • Try smaller shape (1 OCPU / 6 GB), resize after provisioning
  • Subscribe to another region (Frankfurt, Osaka, Toronto have better A1 availability)
  • Keep retrying (capacity opens randomly, early UTC morning tends to be better)

ARM64 compatibility analysis (all images verified):

  • All services ARM64-compatible EXCEPT OSticket
  • OSticket (campbellsoftwaresolutions/osticket) — x86 only FIX: enable QEMU binfmt emulation on Oracle ARM host, run with --platform linux/amd64 Performance acceptable for a ticket system.
  • ⚠️ Shaarli — verify ARM64 at deploy time

Services to deploy on Oracle VPS (OpenProject EXCLUDED): authentik, bookstack, cloudflared, forgejo, grafana, homepage/portal, karakeep (+meilisearch +chrome), kavita, kite-ai (litellm+openwebui), linkding, osticket, portainer, prometheus+node-exporter, shaarli, uptime-kuma

Migration phases:

  1. Oracle VPS provisioning (in progress)
  2. Oracle initial setup: Ubuntu 22.04 ARM64, Docker, iptables flush (Oracle blocks by default), QEMU binfmt for OSticket x86 emulation
  3. Deploy full stack — fix Forgejo ROOT_URL correctly from day one
  4. Connect cloudflared on Oracle to KiteStacks tunnel (same TUNNEL_TOKEN)
  5. Verify all services, then remove kscloud1 from tunnel + cancel Hetzner NOTE: same active-active pattern as kscloud1 — shared Authentik Postgres+Redis over Tailscale, same TUNNEL_TOKEN, fresh DBs for stateful apps except identity (authentik/kavita). IMPORTANT Oracle gotcha: Ubuntu on Oracle has iptables rules that block all traffic at boot even after Security List rules are opened. Must flush iptables as part of initial setup.

osTicket deployed on monk + kscloud1 (found 2026-06-13/14, installed ~2026-06-12)

osTicket (campbellsoftwaresolutions/osticket image, x86 - runs natively on both hosts, no QEMU needed) + nginx proxy + MariaDB 10.11, under ~/kitestacks-live/docker/osticket/ (monk) and /opt/kitestacks/docker/osticket/ (kscloud1). tasks.kitestacks.com -> "KiteStacks Help Desk", verified HTTP 200. Admin: kenpat7177 / kenpat7177@gmail.com. Host ports: monk 8092:8080, kscloud1 8090:8080 (both nginx -> osticket-app:80). .env (OSTICKET_DB_PASS/ROOT/ADMIN_PASS/INSTALL_SECRET) is IDENTICAL on both hosts.

DB unification (2026-06-13/14) - same pattern as Authentik shared-DB fix

Both hosts originally had their OWN osticket-db (drift risk like pre-fix Kavita). Per user request ("database should be accessible from any computer"), unified onto kscloud1's osticket-db as canonical:

  • kscloud1 osticket-db: added ports: - "100.123.254.52:3306:3306" (Tailscale-only, matches authentik-postgres/redis pattern) to /opt/kitestacks/docker/osticket/docker-compose.yml, docker compose up -d.
  • monk: docker compose stop osticket-db (left stopped, NOT removed - rollback data intact in its volume). Edited ~/kitestacks-live/docker/osticket/docker-compose.yml: removed osticket-db service block, changed osticket-app's MYSQL_HOST=osticket-db -> MYSQL_HOST=100.123.254.52, removed depends_on: osticket-db. docker compose up -d osticket-app.
  • GOTCHA: after recreating osticket-app, the osticket nginx proxy container on monk returned 502 (cached stale upstream IP for osticket-app from its old container) - fixed with docker restart osticket. Apply this same restart on kscloud1's osticket nginx if its osticket-app is ever recreated.
  • Verified: both DBs had identical data before merge (1 ticket, 1 staff/kenpat7177) so no data loss either way. tasks.kitestacks.com returns 200 consistently post-merge.
  • Backups: docker-compose.yml.bak left in both hosts' osticket dirs.

osticket-capstone Forgejo repo (created 2026-06-13/14)

New private repo kenpat/osticket-capstone on gitforge (created via API using a scoped token claude-capstone-osticket generated via docker exec -u git forgejo /app/gitea/gitea admin user generate-access-token on monk's forgejo container - token has write:repository,write:user scopes). Holds redacted osTicket deployment config + Per Scholas capstone docs/evidence - see project-per-scholas-capstone. NOTE: gitforge.kitestacks.com is also active-active load-balanced (monk/kscloud1 separate forgejo DBs) - API calls against the public hostname can hit the wrong DB; use monk's local http://localhost:3006 for API operations tied to monk's forgejo data.

Remaining osTicket work

  • Authentik SSO plugin for osTicket staff/agent login (osTicket has no native OIDC, needs 3rd-party OAuth2/SAML plugin) - NOT YET DONE.
  • End-user ticket submission uses osTicket's native client portal signup (works out of the box, no SSO needed).

2026-06-14/15: Forgejo sync fixed + osTicket Authentik LDAP SSO complete

Forgejo sync (monk → kscloud1) - FIXED

  • Ran docker exec -u git forgejo /app/gitea/gitea dump on monk, scp'd to kscloud1
  • Restored: 13 repos + DB synced, ROOT_URL fixed on kscloud1 to https://gitforge.kitestacks.com/
  • kscloud1 Forgejo docker-compose updated (correct ROOT_URL + SSH port 2222)
  • Sync script: ~/kitestacks-live/docker/forgejo/sync-to-cloud.sh (rsync repos + DB dump)
  • Cron: 0 */6 * * * runs sync-to-cloud.sh, logs to /tmp/forgejo-sync.log
  • Authentik redirect URI fixed: updated _redirect_uris in shared Postgres from authentik/callbackAuthentik/callback (matched renamed Forgejo source name)

osTicket Authentik LDAP SSO - COMPLETE (2026-06-14/15)

Uses Authentik's LDAP outpost + osTicket's built-in auth-ldap.phar plugin.

Authentik side:

  • LDAPProvider "osTicket LDAP" (pk=11, base_dn=DC=ldap,DC=goauthentik,DC=io)
  • Application "osTicket LDAP" (slug=osticket-ldap, backchannel provider)
  • Outpost "osTicket LDAP Outpost" (pk=5c42f5ba-64bd-434e-a47f-7ce9da13227a)
  • Outpost service token: jjYRKWuGtoeq9r0qeifbCnXGHDjhCJU2MLnkCvMMduIGA1kQKz85qnt7u5Zf
  • ldap-svc user (search account): DN=cn=ldap-svc,ou=users,dc=ldap,dc=goauthentik,dc=io password=IlgQaxBPv9rdoq03CsoY53tH, member of homelab-admin group

Docker services added on monk:

  • ~/kitestacks-live/docker/authentik-ldap/docker-compose.yml
    • authentik-ldap (ghcr.io/goauthentik/ldap:2025.2.4) on kitestacks+osticket_default networks
    • authentik-ldap-proxy (alpine/socat) bridges port 389→3389 on osticket_default so osticket-app can reach standard LDAP port without phar URI workaround

Docker services added on kscloud1:

  • /opt/kitestacks/docker/authentik-ldap/docker-compose.yml
    • Same authentik-ldap container, bound to 100.123.254.52:3389 (Tailscale) + 127.0.0.1:3389

auth-ldap.phar patches (3 patches applied, original backed up as auth-ldap.phar.orig):

  1. authentication.php - getConnection(): adds binddn/bindpw from plugin config to Net_LDAP2 params so initial connect uses credentials (not anonymous, which Authentik rejects)
  2. config.php - validation block: sets include_path to phar's include dir before require_once Net/LDAP2.php so sub-files resolve correctly in FPM context
  3. ALL include/Net/LDAP2/*.php files: guards require_once 'PEAR.php' with if (!class_exists('PEAR', false)) to prevent fatal conflict between osTicket's /include/pear/PEAR.php and PHP global /usr/local/lib/php/PEAR.php

osTicket LDAP plugin config (namespace plugin.2 in ost_config):

  • servers: authentik-ldap-proxy (via socat on port 389)
  • bind_dn: cn=ldap-svc,ou=users,dc=ldap,dc=goauthentik,dc=io
  • bind_pw: encrypted with Crypto::encrypt(pass, SECRET_SALT, 'plugin.2')
  • search_base: ou=users,dc=ldap,dc=goauthentik,dc=io
  • schema: auto, auth-staff: 1, auth-client: 0, domain: ldap.goauthentik.io

Staff login: username=kenpat7177, password=Authentik password (reset to KiteStacks2026!) on tasks.kitestacks.com/scp/login.php

Per Scholas IT Support Capstone - IN PROGRESS

See project-per-scholas-capstone. Next steps:

  • Create capstone incident tickets in osTicket (5-phase challenge)
  • Set up osTicket user/client portal for non-staff users (Phase 3 end-user access)
  • Each capstone ticket maps to a phase scenario (migration event, incident response, etc.)