---
name: project-kitestacks-migration
description: "Migration of the live KiteStacks homelab/website from assassin (T14) to monk — COMPLETE. Plus full Hetzner cloud failover (kscloud1, 5.78.233.28) — COMPLETE. All 9 subdomains can run from any single host. Plus 2026-06-10 portal/SSO push: portal FluxCD+coming-soon changes deployed, Karakeep SSO fixed, OpenProject SSO blocked by EE license, Portainer SSO Authentik-side done (pending user manual steps)."
metadata:
node_type: memory
type: project
originSessionId: 33992890-3940-4d4a-a94a-22b5621e9c1a
---
## STATUS: MIGRATION + CLOUD FAILOVER COMPLETE (2026-06-10)
monk is the live production host. assassin (T14) is OFF. kscloud1 (Hetzner VPS,
5.78.233.28) is now a THIRD active Cloudflare Tunnel connector and runs a FULL
replica of all 9 services, so the site stays up even if both monk and assassin
are off (verified by user testing with home wifi off, from phone + mom's phone).
All 9 public subdomains (www, ai, auth, gitforge, grafana, kavita, links, status, tasks)
verified returning correct status codes via the live tunnel with kscloud1 in rotation.
## Governing principle (user's explicit words)
"leave the cloud backup on at all times" / "thats the point of it. if I am
travelling my site will go down otherwise." -> kscloud1 runs PERMANENTLY as a
3rd connector, NOT cold standby. Cloudflare Tunnel load-balances ACTIVE-ACTIVE
across all 3 connectors (no primary/backup priority). This means stateful apps
(gitforge, openproject, authentik, karakeep, kavita, openwebui, etc.) may show
DIFFERENT/STALE data depending on which connector serves a given request -
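EXPLICITLY ACCEPTED by user as the cost of guaranteed uptime. Fresh/separate
databases on kscloud1 are fine for content apps; do not set up continuous data
sync between monk and kscloud1. EXCEPTION (found later, see the Authentik/Kavita
sections below): identity data breaks login when it differs per connector, so
authentik and kavita user data had to be synced or shared.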
## kscloud1 access
SSH: `ssh -i ~/.ssh/id_ed25519_kscloud1 kenpat@5.78.233.28` (passwordless, key auth).
sudo needs a password ("p12217177") and has no askpass helper - avoid sudo;
most things doable as kenpat or via docker.
All services live under `/opt/kitestacks/docker/<service>/docker-compose.yml`,
same one-dir-per-app pattern as monk's `~/kitestacks-live/docker/`.
## kscloud1 services deployed (all `docker compose up -d`, joined to local `kitestacks` network)
- cloudflared (3rd tunnel connector, same TUNNEL_TOKEN, connector id 78521d9f-71c0-4e3d-992f-bd1f77da1a8f)
- homepage-backup (alias `homepage`) + caddy + kitestacks-metrics-api-backup - PRE-EXISTING
- forgejo (alias `forgejo`) - PRE-EXISTING, separate DB from monk's (gitforge data inconsistent across connectors, accepted)
- prometheus + node-exporter (job `kscloud1-node`)
- grafana (alias `grafana`, port 3150) - Prometheus datasource (uid 000000001) + "Node Exporter Full"
dashboard (id 1860) provisioned via `./provisioning/`. OAuth->authentik config present but
authentik on kscloud1 has a FRESH db (no provider apps configured) so OAuth login won't work
there; local admin login works.
- uptime-kuma (alias `status`->`uptime-kuma`) - kuma.db seeded by copying monk's admin user
(same login: kenpat / same password hash) + monitors: kscloud1 self-ping, Google DNS, and
HTTP checks for all 9 *.kitestacks.com subdomains (external monitoring of the live site).
- kavita (alias `kavita`) - empty library (fresh)
- karakeep + karakeep-chrome + karakeep-meilisearch (alias `karakeep`) - fresh meilisearch/db
- authentik + authentik-worker + authentik-postgres + authentik-redis (alias on `auth`) - FRESH DB.
Bootstrap admin: `akadmin@kitestacks.com` / password `6KlYpfCyYxbnKQNiOewN` (set via
AUTHENTIK_BOOTSTRAP_PASSWORD in .env). No OAuth provider apps exist yet (would need to be
manually recreated in authentik UI for grafana/openwebui/karakeep/openproject SSO to work
when kscloud1 is the active backend).
- kite-litellm + kite-openwebui (alias `ai`->openwebui) - same .env/secrets as monk. OpenWebUI
has `ENABLE_SIGNUP=true` (changed from monk's `false`) so kenpat can create a local admin
account on first use, since authentik OAuth won't work with kscloud1's fresh authentik.
- openproject (alias on `tasks`, port 8090:80 host - port 80 was taken by caddy) - FRESH db,
self-initialized via the all-in-one image's bootstrap (took ~3-4 min). Empty/no projects yet.
## monk-side changes made for cross-host monitoring
- `~/kitestacks-live/docker/prometheus/prometheus.yml`: added scrape job
`kscloud1-node` -> `5.78.233.28:9100` (kscloud1's node-exporter is exposed
0.0.0.0:9100, no firewall - reachable from monk's public IP). monk's grafana
(the live one, "Node Exporter Full" dashboard now provisioned via
`~/kitestacks-live/docker/grafana/provisioning/`) shows BOTH `t14-node`
(monk/"this pc") and `kscloud1-node` ("the cloud") via the instance picker.
- kscloud1's prometheus only scrapes itself (`kscloud1-node`) - monk is behind
home NAT, not reachable from kscloud1.
## Resource notes (kscloud1: 3 vCPU, 3.7GB RAM + 6GB swap, 75GB disk)
With all services running: ~2.8-2.9GB RAM used, ~2.6-2.8GB swap used (of 6GB),
~835MB-1.2GB "available", disk 29GB/75GB used. Site is functional but under
memory pressure - if BOTH monk and assassin are down for an extended period
with real concurrent usage, expect sluggishness (esp. openproject/authentik/
openwebui). Not yet stress-tested under real failover load.
## Key gotchas from THIS phase (cloud failover build-out)
- kscloud1's `kitestacks` Docker network is LOCAL/separate from monk's (same name,
no conflict). cloudflared on each host resolves container names against its
own host's network.
- Adding a new tunnel connector that lacks a backend for an ingress hostname ->
502 for requests routed there. If it has a DIFFERENT backend (e.g. forgejo) ->
serves different data inconsistently. Both accepted/expected now that all 9
hostnames have backends on kscloud1.
- port 80 on kscloud1 is owned by `caddy` (serves www-backup/git-backup.kitestacks.com
direct A-records, pre-existing, unrelated to the tunnel) - openproject uses 8090:80
for its host port instead (internal container port 8080 is what cloudflared hits).
- uptime-kuma has no file-based provisioning for users/monitors (unlike grafana, whose
datasources/dashboards are provisioned from files), so kuma.db was seeded by direct sqlite
manipulation (`docker exec ... sqlite3`, or python3 sqlite3 module via a throwaway
`python:3-alpine` container with the volume mounted).
- authentik first boot takes ~1-2 min (migrations); openproject first boot takes
~4-5 min (postgres initdb + Rails migrations + Puma boot), watch `docker logs`
for "Listening on http://0.0.0.0:8080" before testing.
## Authentik/Kavita login fix (2026-06-10, post cloud-failover)
PROBLEM: Cloudflare Tunnel load-balances auth.kitestacks.com and
kavita.kitestacks.com active-active across monk and kscloud1. kscloud1's
authentik had only the fresh `akadmin` bootstrap user
(not kenpat7177) and kscloud1's kavita had ZERO users -> ~50% of requests showed
"wrong password" on authentik and a "create admin account" (signup) screen on
kavita instead of login. This contradicts the earlier "fresh DBs are fine"
assumption - for IDENTITY apps it breaks login, so it was NOT acceptable.
FIX APPLIED (one-time sync, same pattern as uptime-kuma's kuma.db seed):
- pg_dump'd monk's authentik-postgres `authentik` db (--clean --if-exists),
scp'd to kscloud1, stopped authentik+authentik-worker on kscloud1, restored
via `docker exec -i authentik-postgres psql -U authentik -d authentik < dump`,
restarted. Worked cleanly because AUTHENTIK_SECRET_KEY and PG_PASS were
ALREADY IDENTICAL between monk's and kscloud1's authentik/.env.
- For kavita: copying the raw kavita.db file via plain `cp` produced
"database disk image is malformed" (WAL-mode db isn't standalone-consistent
as a flat file copy even when -wal/-shm look small). FIX: use python3
sqlite3 `Connection.backup()` (via throwaway python:3-alpine container) to
produce a consistent copy, THEN on kscloud1 stop kavita, rm the OLD
kavita.db-shm/kavita.db-wal too (stale WAL files against new db = same
corruption error), copy in the new kavita.db (chown root:root, chmod 644
to match original ownership - kavita container runs as root), restart.
- Result: kscloud1 authentik now has kenpat7177 (matches monk), kscloud1
kavita now has kenpat7177 + acurrie (matches monk). Both connectors now
return the same login screen/credentials. NOTE: this is a ONE-TIME sync,
not continuous - if monk's users/passwords change later, kscloud1 will
drift again and the same symptoms could return; re-run this sync if so.
- kscloud1 kavita's library entries point at /books paths that don't exist on
kscloud1 (no actual book files there) - login works fine, but browsing the
library when served by kscloud1 will show entries with missing files. Same
"stale data" tradeoff as gitforge, accepted.
## Authentik shared Postgres+Redis over Tailscale (2026-06-10) - fixes "invalid_grant" SSO
PROBLEM: Even after the one-time DB sync above, "Sign in with Authentik" on
Kavita could fail with "invalid_grant" / "Code does not exist". Root cause:
monk and kscloud1 each ran their OWN authentik-postgres. OAuth2 authorization
codes are short-lived per-flow rows in `authentik_providers_oauth2_authorizationcode`
- if Cloudflare Tunnel's active-active routing sends `/authorize` to one
connector and `/application/o/token/` to the other, the code only exists in
one of the two DBs -> invalid_grant. A one-time data sync can't fix this
because the data is created fresh on every login attempt.
FIX: Converted to a single shared Postgres+Redis (HA pattern), hosted on
kscloud1, reachable ONLY over Tailscale:
- Installed Tailscale on both monk and kscloud1 (same tailnet). kscloud1's
tailscale IP is `100.123.254.52`.
- kscloud1's `/opt/kitestacks/docker/authentik/docker-compose.yml`:
authentik-postgres now binds `100.123.254.52:5432:5432` (was unbound/internal-only),
authentik-redis now binds `100.123.254.52:6379:6379`. Both still also reachable
on the local `kitestacks` docker network for kscloud1's own authentik+worker.
Backup of pre-change file: `docker-compose.yml.backup-before-shared-db-20260610-1138`.
- monk's `~/kitestacks-live/docker/authentik/docker-compose.yml`: REMOVED the
`postgresql` and `redis` services entirely. monk's `authentik`/`authentik-worker`
now point `AUTHENTIK_POSTGRESQL__HOST` and `AUTHENTIK_REDIS__HOST` at
`100.123.254.52` (kscloud1 over Tailscale), using the same `PG_PASS` /
`AUTHENTIK_SECRET_KEY` as before (already identical between hosts).
- monk's old local `authentik-postgres`/`authentik-redis` containers were
STOPPED (not removed) - data dirs preserved under
`~/kitestacks-live/docker/authentik/postgres` in case of rollback, but no
longer in use.
- Result: BOTH connectors' authentik+worker now read/write the SAME db/redis,
regardless of which one handles `/authorize` vs `/application/o/token/`.
Verified both `authentik`+`authentik-worker` healthy on monk and kscloud1,
OIDC discovery docs identical, user list matches (`kenpat7177` etc.) on both.
CONFIRMED BY USER 2026-06-10: "Sign in with Authentik" on Kavita now works
(when monk's connector serves the request).
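
The failure mode and the fix can be illustrated with a toy model. This is a sketch only, not Authentik's code: `AuthBackend` and its methods are hypothetical stand-ins for one Authentik instance plus the database it writes authorization codes to.

```python
import secrets


class AuthBackend:
    """Toy stand-in for one Authentik instance and the code store it uses."""

    def __init__(self, code_store: dict):
        self.code_store = code_store  # authorization code -> user

    def authorize(self, user: str) -> str:
        # /authorize: mint a short-lived code and record it in this store.
        code = secrets.token_urlsafe(16)
        self.code_store[code] = user
        return code

    def token(self, code: str) -> str:
        # /token: codes are single-use, and unknown codes are rejected.
        user = self.code_store.pop(code, None)
        if user is None:
            return "invalid_grant"
        return f"access-token-for-{user}"


# Separate databases: /authorize lands on monk, /token lands on kscloud1.
monk, cloud = AuthBackend({}), AuthBackend({})
split_result = cloud.token(monk.authorize("kenpat7177"))

# Shared database: both backends read and write the same store.
shared = {}
monk, cloud = AuthBackend(shared), AuthBackend(shared)
shared_result = cloud.token(monk.authorize("kenpat7177"))
```

With separate stores the exchange fails no matter how recently the two databases were synced, because the code is created during the login itself; with a shared store the connector that answers no longer matters.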
## Kavita "Sign in with Authentik" button missing on kscloud1 - FIXED 2026-06-10
After the shared-Authentik-DB fix above, the button still didn't appear when
Cloudflare routed kavita.kitestacks.com to kscloud1's connector. CAUSE: Kavita's
OIDC config lives in ITS OWN db (kavita.db `ServerSetting` table, Key=40, a JSON
blob with Authority/ClientId/Secret/Enabled), separate from Authentik's db.
The earlier one-time kavita.db sync (see fix above) was taken BEFORE OIDC SSO
was configured in monk's Kavita, so kscloud1's copy had Key=40 with empty
Authority/Secret and `"Enabled":false`. FIX: copied monk's Key=40 JSON value
verbatim into kscloud1's kavita.db (stop kavita, `docker run --rm -v
.../kavita/config:/data -v fix.sql:/fix.sql alpine` + apk sqlite + `sqlite3
/data/kavita.db < fix.sql` with `UPDATE ServerSetting SET Value='...' WHERE
"Key"=40`, restart kavita). NOTE: AspNetUserLogins (OIDC account-linkage table)
is empty on BOTH monk and kscloud1 - Kavita creates this row on first OIDC
login per-instance (matches existing local user by email since
ProvisionAccounts=false), so no extra action needed there.
GOTCHA: ServerSetting's PK column is `"Key"` (INTEGER), not `Id` - must quote
it in sqlite (`"Key"`) since KEY is a SQL reserved word.
DRIFT WARNING: any future Kavita server-setting change (OIDC config, library
paths, etc.) made on monk will NOT propagate to kscloud1's kavita.db
automatically - same one-time-sync caveat as the user-table sync above.
UPDATE 2026-06-10 (RESOLVED via Kavita UI, not direct DB edit): Direct SQL
edits to ServerSetting Key=40 got WIPED back to disabled/empty by Kavita on
every container restart (RowVersion incremented +2 each time, Authority/Secret
cleared, Enabled->false) - confirmed twice, even with a full WAL-consistent
kavita.db replace from monk. Direct DB writes to this table do NOT survive a
restart; only saves through Kavita's own Settings UI/API persist correctly.
FIX: opened an SSH local port-forward (`ssh -L 5099:localhost:5000
kenpat@5.78.233.28`) so the user could reach kscloud1's Kavita directly at
http://localhost:5099 (bypassing the Cloudflare load-balanced domain), logged
in with their normal kenpat7177 Kavita password, and re-entered the OIDC
config in Settings -> OIDC:
- Authority: `https://auth.kitestacks.com/application/o/kavita/`
(MUST include trailing slash - Kavita validates that this exactly matches
the `issuer` claim in Authentik's `.well-known/openid-configuration`,
which has a trailing slash. Without it: "Kavita can load the OIDC
configuration, but the issuer does not match".)
- Client ID: `kavita`, Client Secret: (96-hex-char secret from Authentik's
Kavita OAuth2 provider - watch for copy/paste truncation, verify length=96)
- Enabled: true, ProviderName: authentik
Saved via UI (RowVersion 8->12->14 across two saves to fix a 1-char-truncated
secret), then `docker compose restart kavita` on kscloud1 - config SURVIVED
this restart (unlike the direct-SQL attempts) and `/api/settings/oidc` now
reports `"enabled": true`. SSH tunnel closed afterward (no firewall changes
were made/needed). A temporary ApiKey was set on kenpat7177's kscloud1 kavita
account during troubleshooting (for a Plugin/authenticate attempt that returned
401 and was abandoned). It was left in place and is harmless (grants API access
to that user's own account only).
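
Both pitfalls above (the missing trailing slash and the truncated secret) can be caught before saving. A minimal sketch, assuming Kavita compares the Authority to the discovery document's `issuer` as an exact string, as the error message suggests; the helper names are hypothetical.

```python
import string


def issuer_matches(authority: str, discovery_issuer: str) -> bool:
    """Exact string comparison: a missing trailing slash counts as a mismatch."""
    return authority == discovery_issuer


def looks_like_full_secret(secret: str, expected_len: int = 96) -> bool:
    """Catch copy/paste truncation: right length and hex characters only."""
    return len(secret) == expected_len and all(
        c in string.hexdigits for c in secret
    )
```

For example, `issuer_matches("https://auth.kitestacks.com/application/o/kavita", "https://auth.kitestacks.com/application/o/kavita/")` is false, and a 95-character secret fails `looks_like_full_secret`.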
TAKEAWAY FOR FUTURE KAVITA CONFIG CHANGES ON KSCLOUD1: always use the Kavita
UI (via SSH port-forward to localhost:5000) rather than direct sqlite edits -
direct edits to ServerSetting do not survive a restart.
CONFIRMED BY USER 2026-06-10: "Sign in with Authentik" now works on Kavita
regardless of which connector (monk/kscloud1) answers.
## Kavita cover images missing on kscloud1 - FIXED 2026-06-10
After the kavita.db sync from monk, kscloud1's db referenced cover image files
(e.g. `v1_c1.png`..`v10_c10.png` in `ServerSetting`/`Series.CoverImage`) that
didn't exist on kscloud1's filesystem - kscloud1's `config/covers/` dir was
empty (monk has 9 files, ~1.3MB). Result: book/series cover thumbnails didn't
load when kscloud1 served the request. FIX: tar'd monk's
`~/kitestacks-live/docker/kavita/config/covers/` (owned 1000:1000), scp'd to
kscloud1, extracted into `/opt/kitestacks/docker/kavita/config/covers/` via a
throwaway alpine container, `chown -R 1000:1000`. No kavita restart needed -
covers are served as static files from disk. CONFIRMED BY USER: covers now
load correctly.
NOTE: this is another one-time sync (same drift caveat) - if new books/covers
are added on monk later, they won't appear on kscloud1 unless re-synced
(covers/ dir + kavita.db + actual book files under library/books, none of
which exist on kscloud1 per the earlier "stale data" note).
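
Any such re-sync of kavita.db needs the WAL-consistent copy from the login fix above rather than a plain `cp`. A minimal sketch of that step using the python3 sqlite3 `Connection.backup()` call; the function name `snapshot_sqlite` and the paths passed to it are hypothetical, and this is not the exact command that was run.

```python
import sqlite3
from pathlib import Path


def snapshot_sqlite(src: str, dest: str) -> None:
    """Copy a live SQLite database (WAL mode included) into a standalone file."""
    # Remove any old destination plus stale sidecar files, so a leftover
    # -wal/-shm never gets paired with the freshly written database.
    for suffix in ("", "-wal", "-shm"):
        Path(dest + suffix).unlink(missing_ok=True)
    source = sqlite3.connect(src)
    target = sqlite3.connect(dest)
    try:
        # Connection.backup() uses SQLite's online backup API, which reads
        # committed pages through the WAL instead of copying raw file bytes.
        source.backup(target)
    finally:
        target.close()
        source.close()
```

The destination cleanup mirrors the gotcha noted earlier: old `kavita.db-shm`/`kavita.db-wal` files left next to a new `kavita.db` produce the same "malformed" error.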
SECURITY NOTE (shared Authentik Postgres/Redis, see the Tailscale section
above): postgres/redis on kscloud1 are bound to the Tailscale interface IP only
(100.123.254.52), not 0.0.0.0 - not exposed to the public internet.
ROLLBACK (same shared-db setup): if Tailscale connectivity ever breaks, monk's
authentik will fail to
start (can't reach 100.123.254.52). To roll back: restore monk's
docker-compose.yml from git/backup to use local postgresql/redis services
again, restart monk's old authentik-postgres/authentik-redis containers
(`docker start authentik-postgres authentik-redis` in
~/kitestacks-live/docker/authentik), `docker compose up -d`. Note this would
mean monk's authentik db is now STALE (kscloud1's shared db has any logins/
changes since 2026-06-10) - would need a fresh pg_dump sync from kscloud1 first.
## kscloud1 ufw blocks docker-bridge -> host port 8000 (metrics API) - FIXED 2026-06-10
kscloud1 has ufw active with `default deny incoming/routed`. The
kitestacks-metrics-api-backup container (network_mode: host, binds 0.0.0.0:8000)
was unreachable from homepage-backup via `host.docker.internal:8000` (TCP
timeout, not refused -> ufw drop), causing the homepage System Status widget to
show 0%/"Offline" when kscloud1 served the request. FIXED by adding:
`sudo ufw allow from 172.16.0.0/12 to any port 8000 proto tcp` (covers all
docker bridge subnets on this host: 172.17-172.29.x.x). Verified
homepage-backup -> host.docker.internal:8000/api/metrics now returns real
CPU/RAM/storage/network data. kscloud1 sudo password is in the "kscloud1
access" section above - needed `echo PASS | sudo -S <cmd>` (no askpass helper,
non-interactive sudo via -S works fine).
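
The subnet claim in the ufw rule can be checked with the standard `ipaddress` module. A small sketch; the `covered` helper is hypothetical, and the sample subnets are illustrative rather than read from kscloud1.

```python
import ipaddress

# The source range from the ufw rule above.
ALLOWED = ipaddress.ip_network("172.16.0.0/12")


def covered(subnet: str) -> bool:
    """True if a Docker bridge subnet falls entirely inside the allowed range."""
    return ipaddress.ip_network(subnet).subnet_of(ALLOWED)
```

`172.16.0.0/12` spans 172.16.x.x through 172.31.x.x, so `covered("172.17.0.0/16")` and `covered("172.29.0.0/16")` both hold. A bridge network created on a 192.168.x.x subnet would not match and would need its own rule.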
## Portal SSO/coming-soon push + Karakeep fix + OpenProject EE blocker + Portainer SSO setup (2026-06-10)
Per user's original request: SSO for grafana/prometheus/portainer/cloudflare/uptime-kuma
(authentik=source), KITEAI/LITELLM/OPENROUTER -> "coming soon", SSO for openproject/forgejo/
karakeep, FluxCD card in AI&Automation. Hard constraint: user does NOT use Cloudflare Zero
Trust/Access ("costs money") - any Cloudflare work must avoid those products (note: the
Tunnel UI itself lives under Cloudflare's "Zero Trust" dashboard section, but configuring a
Tunnel public hostname there is NOT the same as enabling Zero Trust/Access - fine to use).
### Portal UI changes - DEPLOYED to all 3 copies, verified live
Edited the AI & AUTOMATION panel (`cards cards-3` -> `cards cards-2`, now 2x2):
Kite AI and OpenRouter cards changed from external links to
`href="#" data-coming-soon="1"` (LiteLLM was already coming-soon); added a 4th
card "FluxCD" / "GitOps Automation" using `/images/icons/fluxcd.png`, also
coming-soon (automation scripts with FluxCD+Prometheus+Grafana+node-exporter
are a future project). Applied identically to:
- `~/kitestacks-live/docker/kitestacks-portal-test/public/index.html` (monk, dev, port 3008)
- `~/kitestacks-live/docker/kitestacks-portal/public/index.html` (monk, LIVE, served by
"homepage" container 3005->3000 - this is the file that backs www.kitestacks.com)
- `/opt/kitestacks/docker/www-backup/kitestacks-portal/public/index.html` (kscloud1,
served by `homepage-backup` port 3015)
Verified `https://www.kitestacks.com` returns "FluxCD" consistently (6/6 requests
across both connectors).
NOTE: Portainer card on the live portal is currently `data-coming-soon="1"` -
update this to a real `href="https://portainer.kitestacks.com"` link (remove
data-coming-soon) once the Portainer SSO manual steps below are completed.
NOTE 2: "cloudflare should all be in the networking side" from the original
request was never resolved - Cloudflare card is still in the INFRASTRUCTURE
panel, not moved/renamed to a "NETWORKING" panel. Ambiguous, deprioritized,
not revisited.
### Karakeep SSO redirect_uri fix - DONE, confirmed working
Karakeep uses NextAuth.js with provider id "custom" (not "authentik") - actual
OAuth callback path is `/api/auth/callback/custom`, but Authentik's Karakeep
OAuth2Provider's `_redirect_uris` had the wrong path -> "Redirect URI Error".
FIX: direct Postgres UPDATE to
`authentik_providers_oauth2_oauth2provider._redirect_uris` (JSON column) on
the shared kscloud1 authentik-postgres (100.123.254.52), wrapped in explicit
`BEGIN; UPDATE ...; COMMIT;` (a bare single-statement -c "UPDATE..." reported
"UPDATE 1" but did NOT persist on first attempt - cause unclear, explicit
transaction fixed it). After the DB write, restarted authentik+authentik-worker
on BOTH monk and kscloud1 and polled
`docker inspect --format '{{.State.Health.Status}}'` until both reported
"healthy" (~50s) before retesting - first retest hit a transient 502 because
kscloud1's authentik was still "starting". CONFIRMED: Authentik now serves the
login page (not "Redirect URI Error") for Karakeep SSO.
PG_PASS GOTCHA: `~/kitestacks-live/docker/authentik/.env` PG_PASS value ends in
`=` - extract with `cut -d= -f2-` (NOT `-f2`, which truncates the trailing `=`
and causes "password authentication failed").
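
The same pitfall exists when parsing the `.env` line in a script: split on the first `=` only. A minimal sketch; `read_env_value` is a hypothetical helper and the sample value is made up.

```python
def read_env_value(line: str) -> str:
    """Return everything after the first '=', keeping any later '=' characters."""
    _key, _sep, value = line.rstrip("\n").partition("=")
    return value


sample = "PG_PASS=abc123="               # made-up value ending in '='
kept = read_env_value(sample)            # like `cut -d= -f2-`
truncated = sample.split("=")[1]         # like `cut -d= -f2`
```

Here `kept` retains the trailing `=` while `truncated` drops it, which is the difference between a working password and "password authentication failed".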
REUSABLE PATTERN for any future direct Authentik DB edit: (1) wrap writes in
explicit BEGIN/COMMIT, (2) restart authentik+authentik-worker on BOTH monk and
kscloud1, (3) wait for health=healthy on both before testing.
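
Step (3) of the pattern can be scripted instead of polled by hand. A sketch under assumptions: `docker_health` shells out to the same `docker inspect` command quoted above, while `wait_until_healthy`, its timeout, and its interval are hypothetical choices. It only talks to the local Docker daemon, so it would be run once on monk and once on kscloud1.

```python
import subprocess
import time
from typing import Callable, Iterable


def docker_health(container: str) -> str:
    """Read a container's health status via `docker inspect`."""
    out = subprocess.run(
        ["docker", "inspect", "--format", "{{.State.Health.Status}}", container],
        capture_output=True, text=True, check=True,
    )
    return out.stdout.strip()


def wait_until_healthy(
    containers: Iterable[str],
    probe: Callable[[str], str] = docker_health,
    timeout: float = 120.0,
    interval: float = 5.0,
) -> bool:
    """Poll until every container reports 'healthy' or the timeout expires."""
    pending = set(containers)
    deadline = time.monotonic() + timeout
    while pending:
        # Keep only the containers that are still not healthy.
        pending = {c for c in pending if probe(c) != "healthy"}
        if not pending:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
    return True
```

Usage would be `wait_until_healthy(["authentik", "authentik-worker"])` on each host before retesting, which avoids the transient 502 seen while kscloud1's authentik was still "starting".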
### OpenProject SSO - config bug fixed, but BLOCKED by Enterprise licensing (no further action possible)
`~/kitestacks-live/docker/openproject/docker-compose.yml` env vars were wrong in
two ways: (1) extra "PROVIDERS_" segment in var names caused
`seed_oidc_provider = {"providers": {"authentik": {...}}}` instead of
`{"authentik": {...}}`, producing a broken stub provider record (slug=
"providers", id=1, since deleted via Rails runner); (2) `discovery_endpoint`
isn't read by `ConfigurationMapper` at all - replaced with explicit
ISSUER/AUTHORIZATION__ENDPOINT/TOKEN__ENDPOINT/USERINFO__ENDPOINT/
END__SESSION__ENDPOINT/JWKS__URI vars (current docker-compose.yml has the
corrected version, see file - all derived from
`https://auth.kitestacks.com/application/o/openproject/.well-known/openid-configuration`).
After fixing both, the seeder correctly creates provider slug="authentik",
available=true, all fields correct - BUT the SSO button still does not appear
on /login. CONFIRMED ROOT CAUSE (terminal, source-code-verified): OpenProject
CE 2025/v15's OmniAuth SSO strategy
(`OpenProject::Plugins::AuthPlugin`/`OpenIDConnect`) AND SAML
(`auth_saml/lib/open_project/auth_saml/engine.rb`, `enterprise_feature:
"sso_auth_providers"`) are BOTH gated behind an Enterprise Edition license -
"OmniAuth SSO strategy ... is only available for Enterprise Editions". No
app/config-level workaround exists. Only remaining options: buy EE license, OR
put a forward-auth proxy (oauth2-proxy / Authentik embedded outpost) in front
of OpenProject - DEFERRED along with Prometheus/Uptime Kuma proxy work (see
below) until Oracle VPS topology is decided.
OpenProject container is healthy, `/login` returns 200, no projects yet.
### Portainer SSO - Authentik side DONE, two manual steps PENDING (not yet done by user)
Per user: "yes continue with portainer" / "yes but make sure it is still
secure" (approved exposing Portainer publicly via a NEW Cloudflare Tunnel
hostname, with explicit requirement to keep it secure -> access restricted to
the `homelab-admin` Authentik group).
Created via `docker exec authentik ak shell` (Django ORM, no Authentik API
token configured) on kscloud1's shared authentik-postgres:
- OAuth2Provider "Portainer": client_id=`portainer`,
client_secret=`wTim3mrMwt34ko1RYMvK1RNnjwWOMi_d4r4cS6exr7DjozCrL5zKthHl-5KjargF`,
provider_id=9, redirect_uri=`https://portainer.kitestacks.com` (strict),
scopes openid/email/profile, sub_mode=user_email, signing key + flows copied
from existing providers (same pattern as Karakeep/Grafana).
- Application "Portainer" (slug="portainer", meta_launch_url=
`https://portainer.kitestacks.com`).
- PolicyBinding restricting the Portainer application to Authentik group
`homelab-admin` (UUID e21b0aa5-62e7-4b3a-8302-130b0ae148a5) - this is the
"make sure it is still secure" piece (only homelab-admin members can SSO in).
- Verified discovery doc resolves:
`https://auth.kitestacks.com/application/o/portainer/.well-known/openid-configuration`.
PENDING MANUAL STEPS (user must do via UI - confirmed `portainer.kitestacks.com`
still returns `000` as of 2026-06-10):
1. Cloudflare dashboard -> Tunnel -> add Public Hostname `portainer.kitestacks.com`
-> service `https://portainer:9443` (HTTPS), enable "No TLS Verify". (This is
in the Tunnel config UI, which Cloudflare happens to host under the "Zero
Trust" nav section, but adding a Tunnel hostname is NOT enabling Zero
Trust/Access - does not violate the no-Zero-Trust constraint.)
2. In Portainer -> Settings -> Authentication -> OAuth (Provider: Custom), on
BOTH monk's and kscloud1's SEPARATE Portainer instances, configure:
- Client ID: `portainer`
- Client Secret: `wTim3mrMwt34ko1RYMvK1RNnjwWOMi_d4r4cS6exr7DjozCrL5zKthHl-5KjargF`
- Authorization URL: `https://auth.kitestacks.com/application/o/authorize/`
- Access Token URL: `https://auth.kitestacks.com/application/o/token/`
- Resource/Userinfo URL: `https://auth.kitestacks.com/application/o/userinfo/`
- Redirect URL: `https://portainer.kitestacks.com`
- Logout URL: `https://auth.kitestacks.com/application/o/portainer/end-session/`
- Scopes: `openid email profile`, User identifier claim: `email`
AFTER both steps done: update the live portal's Portainer card (in the 3 files
above) from `data-coming-soon="1"` to a real
`href="https://portainer.kitestacks.com" target="_blank" rel="noopener"` link.
### App-level SSO status summary (end of 2026-06-10 session)
Grafana: working (pre-existing). Forgejo: working (pre-existing). Karakeep:
fixed this session, working. OpenProject: blocked by EE license (terminal at
app level). Portainer: Authentik side done, waiting on user's 2 manual steps
above. Prometheus + Uptime Kuma: DEFERRED - neither has native OAuth, need a
forward-auth proxy (oauth2-proxy or Authentik embedded outpost) - deferred per
user's "ok lets do smaller app level" (hold new infra until Oracle VPS decided).
Cloudflare itself: no SSO concept applicable (it's Cloudflare's own dashboard
managed outside the lab login) - was always about the portal's Cloudflare card
placement, see "Portal UI changes" note above.
### Uptime Kuma + Authentik SSO resumed on monk (2026-06-15)
User confirmed the next task is setting up Uptime Kuma with Authentik SSO in
the main KiteStacks lab, and explicitly requested saving progress to
`~/claude-memory` and pushing to the Forgejo `kenpat/claude-memory` repo as we
go.
Verified current live state on monk before making changes:
- `uptime-kuma` container is running and healthy, published on host port
`3001`, image `louislam/uptime-kuma:latest`.
- Installed Uptime Kuma version inside the container is `1.23.17`.
- Uptime Kuma compose file is
`~/kitestacks-live/docker/uptime-kuma/docker-compose.yml`, using external
Docker volume `uptime-kuma:/app/data` and networks `default` + external
`kitestacks`.
- Uptime Kuma SQLite DB path inside container is `/app/data/kuma.db`; tables
include `user`, `setting`, `monitor`, `heartbeat`, `status_page`,
`notification`, `api_key`, and related monitor/status tables. No obvious
native OAuth/OIDC tables were present in the initial schema list.
- Grafana is already configured for Authentik generic OAuth in
`~/kitestacks-live/docker/grafana/docker-compose.yml` with Authentik public
authorize URL and internal token/userinfo URLs.
- `authentik` is healthy; `authentik-worker` currently shows unhealthy in
`docker ps` even though it has been running for ~35h. Check logs/health
before relying on new Authentik-side automation.
- Existing Authentik objects were found for Uptime Kuma:
- Application slug `uptime-kuma`, name `Uptime Kuma`, provider id `7`.
- ProxyProvider `Uptime Kuma`, external host `https://status.kitestacks.com`,
internal host `http://uptime-kuma:3001`, mode `proxy`.
- Embedded proxy outpost already includes providers `Karakeep`,
`Uptime Kuma`, and `LiteLLM`.
- `https://status.kitestacks.com` still routes directly to Kuma as of
2026-06-15: public curl gets Kuma's `/dashboard` redirect and 200 response,
not an Authentik authorization flow. Cloudflare tunnel route still needs to
be changed from direct Kuma to the Authentik embedded outpost/server.
- Security fix applied 2026-06-15: created PolicyBinding
`6f2ac876-2f47-473d-986d-d7c5d2a3214e` from the Uptime Kuma application to
Authentik group `homelab-admin`, enabled, order 0. This matches the Portainer
restriction pattern.
- Cloudflared is remote-managed: container command is `tunnel --no-autoupdate
run`, no local ingress config exists, and the compose file stores a
`TUNNEL_TOKEN`. Do not print that token; treat it as sensitive. Routing
changes must be made through Cloudflare's tunnel API/dashboard unless a
suitable Cloudflare API token is available locally.
- Local validation after the Authentik binding: `curl -I -H 'Host:
status.kitestacks.com' http://localhost:9001` returns `302` to
`https://status.kitestacks.com/outpost.goauthentik.io/start?...`, proving
the embedded outpost/proxy provider works when traffic reaches Authentik.
- No suitable Cloudflare API token was found during the local search; only the
cloudflared connector tunnel token is present. Remaining blocker is changing
the Cloudflare Tunnel public hostname for `status.kitestacks.com` from
`http://uptime-kuma:3001` to `http://authentik:9000` (or equivalent
Authentik service target in the Tunnel UI).
- Correction after user tested: user does NOT want front-door proxy behavior
for Uptime Kuma. Desired UX is an in-app "single sign on" button on the
Uptime Kuma login screen, like Grafana/Forgejo style native OAuth. Authentik
proxy redirect is not acceptable for this requirement.
- Confirmed in the installed Uptime Kuma 1.23.17 frontend:
`/app/src/components/Login.vue` only renders username, password, remember-me,
and login submit controls. No native OAuth/OIDC/SSO button exists in this
version's login component, and local source search only found monitor OAuth
client-credentials support, not app login SSO.
- If staying on Uptime Kuma 1.23.17, revert Cloudflare route for
`status.kitestacks.com` back to `http://uptime-kuma:3001`; otherwise users
get Authentik first and then still see Kuma's local login. Native in-app SSO
would require an Uptime Kuma version/plugin/fork with login OIDC support or
custom app code, not the Authentik proxy provider.
- User reset the Cloudflare route back to `http://uptime-kuma:3001` and asked
to continue with an in-app Authentik button. Upstream latest checked via
GitHub API: Uptime Kuma latest release is `2.4.0` (published 2026-05-31) and
upstream `src/components/Login.vue` still has only username/password login,
no native OAuth/OIDC button. Proceeded with a custom overlay patch.
- Custom native Authentik SSO overlay deployed on BOTH active tunnel backends
(monk and kscloud1) so public load-balanced traffic behaves consistently:
- monk path: `~/kitestacks-live/docker/uptime-kuma/`
- kscloud1 path: `/opt/kitestacks/docker/uptime-kuma/`
- backend preload module:
`custom/server/authentik-sso.js`
- frontend mounted files:
`custom/dist/index.html`, `index.html.gz`, `index.html.br`
- compose now sets `NODE_OPTIONS=--require /app/custom/server/authentik-sso.js`,
loads `.env.sso`, and bind-mounts the custom files over Kuma's built HTML.
- Authentik native OAuth provider/application created:
- OAuth2Provider name `Uptime Kuma Native`, provider id `12`
- Application slug `uptime-kuma-native`, name `Uptime Kuma Native SSO`
- Client ID `uptime-kuma-native`
- Redirect URI `https://status.kitestacks.com/auth/authentik/callback`
- Restricted to Authentik group `homelab-admin` via PolicyBinding
`2e1eaa95-b397-4c4f-bfc7-abb337906cf3`
- Client secret is stored only in each host's `.env.sso`; do not print it.
- Custom flow behavior:
- Login page injects a `Sign in with Authentik` button linking to
`/auth/authentik`.
- Backend starts Authentik OIDC, validates callback state, fetches userinfo,
maps the login to existing Kuma user `kenpat`, issues Kuma's normal JWT,
then redirects to `/?authentik_token=<token>`.
- Frontend one-time script stores the JWT in `localStorage.token`, removes
the URL token, and redirects to `/dashboard`, letting Kuma's normal
`loginByToken` flow establish the session.
- Verification 2026-06-15:
- monk local `/dashboard` HTML contains `Sign in with Authentik`,
`/auth/authentik`, and `authentik_token`.
- kscloud1 local `/dashboard` HTML contains the same and `/auth/authentik`
redirects to Authentik with client_id `uptime-kuma-native`.
- Public repeated check:
`for i in 1 2 3 4 5 6; do curl -sSL --compressed https://status.kitestacks.com/dashboard | grep -q "Sign in with Authentik" && echo button; done`
printed `button` for all 6 attempts, consistent with both active connectors
serving the button.
- Post-test screenshot showed Uptime Kuma login page with red banner "Lost
connection to the socket server. Reconnecting..." after clicking the SSO
button. Root cause: active-active JWT mismatch. Uptime Kuma JWTs include a
signature using `setting.jwtSecret`; monk and kscloud1 had matching user
password hashes but different JWT secrets, so a token minted by one backend
failed if the browser's websocket connected to the other backend. Fixed
2026-06-15 by copying monk's exact `jwtSecret` into kscloud1's
`/app/data/kuma.db` using base64 transport (avoid shell expansion of secret
chars), then restarting kscloud1 Uptime Kuma. Verified both hashes now match:
`jwtSecret` length 60, sha3 prefix `FA67E6E9EDCC8E1D`. Public button check
still returns `button` 6/6. If a browser still has a pre-fix bad token in
localStorage, clear site data or click the Authentik button again to mint a
fresh token.
- User retested and still saw the socket reconnect banner. Follow-up finding:
public Uptime Kuma frontend was using Socket.IO's default long-polling-first
transport. In the active-active Cloudflare Tunnel setup, polling requests can
bounce between monk and kscloud1 before a socket session is established,
causing reconnect loops before Kuma even logs `Login by token`.
- Fix applied 2026-06-15 on BOTH monk and kscloud1: copied the built frontend
bundle `index-BBxTfFCS.js` into the overlay and patched the minified socket
call from `Ze=Nc(n)` to `Ze=Nc(n,{transports:["websocket"]})`. Regenerated
`.gz` and `.br` variants and mounted all three over
`/app/dist/assets/index-BBxTfFCS.js*` in both compose files. Restarted both
Uptime Kuma containers.
- Verification after websocket-only patch:
- monk local asset contains `transports:["websocket"]`
- kscloud1 local asset contains `transports:["websocket"]`
- public repeated asset check over `https://status.kitestacks.com/assets/index-BBxTfFCS.js`
found `transports:["websocket"]` 6/6, confirming both tunnel backends serve
the patched client bundle.
- User still saw the same issue after trying another browser. Follow-up:
websocket connections were reaching Kuma, but logs showed no `Login by token`,
so the handoff from Authentik callback to Kuma storage was unreliable. Changed
the SSO callback from `/?authentik_token=<jwt>` URL handoff to a short-lived
readable cookie `uk_authentik_token` plus redirect directly to `/dashboard`.
Updated injected HTML to read that cookie before Kuma initializes, store the
token in `localStorage.token`, set `localStorage.remember=1`, then delete the
cookie. This avoids long-token URL handling.
- Important operational gotcha: Uptime Kuma caches `index.html` in memory at
startup. After changing the mounted `index.html`/compressed variants,
`docker compose up -d` was not enough because the containers stayed "Running"
and were not recreated; had to run `docker compose restart uptime-kuma` on BOTH
monk and kscloud1 to reload the HTML into memory.
- Verification after cookie handoff + explicit restarts:
- monk local `/dashboard` HTML contains `uk_authentik_token`, `authentik_token`,
and `Sign in with Authentik`.
- kscloud1 local `/dashboard` HTML contains the same.
- public repeated check for `uk_authentik_token` over
`https://status.kitestacks.com/dashboard` returned `cookie-handoff` 6/6.
- User confirmed after retest: Uptime Kuma Authentik SSO button works.
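The repeated public checks used throughout this section (marker found 6/6) can be sketched in Python as follows. This is a minimal sketch, not the command that was actually run: `fetch_text`, `count_marker_hits`, and the injectable `fetch` parameter are hypothetical names, and it assumes repeated requests get spread across connectors, which Cloudflare does not guarantee for any single run.

```python
import urllib.request
from typing import Callable


def fetch_text(url: str, timeout: float = 10.0) -> str:
    """Fetch a URL and return the body as text (urllib follows redirects)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


def count_marker_hits(url: str, marker: str, attempts: int = 6,
                      fetch: Callable[[str], str] = fetch_text) -> int:
    """Request `url` `attempts` times and count responses containing `marker`."""
    hits = 0
    for _ in range(attempts):
        try:
            if marker in fetch(url):
                hits += 1
        except OSError:
            # URLError/HTTPError are OSError subclasses; a failed request is a miss.
            pass
    return hits
```

For example, `count_marker_hits("https://status.kitestacks.com/dashboard", "uk_authentik_token")` would be expected to return 6 only if every response served during the run contained the patched HTML.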
### Uptime Kuma monitors mirrored into Prometheus/Grafana (2026-06-15)
User asked to set up the same monitors currently in Uptime Kuma for Grafana and
Prometheus. Existing Uptime Kuma monitor list at the time:
- `T14 Deb Assassin`: ping `127.0.0.1`
- `HomeRouter`: ping `192.168.1.254`
- `Google DNS`: ping `8.8.8.8`
- `TailScale`: ping `100.90.13.55`
Implemented on monk's live Prometheus/Grafana stack:
- Added `prom/blackbox-exporter` service to
`~/kitestacks-live/docker/prometheus/docker-compose.yml`.
- Added blackbox config
`~/kitestacks-live/docker/prometheus/blackbox.yml` with ICMP module
(`preferred_ip_protocol: ip4`, timeout 5s).
- Added Prometheus scrape job `uptime-kuma-ping-probes` in
`~/kitestacks-live/docker/prometheus/prometheus.yml`, using `/probe` with
`module=icmp` and labels `monitor_name` matching the Uptime Kuma names.
- Added Grafana provisioned dashboard
`~/kitestacks-live/docker/grafana/provisioning/dashboards/kitestacks-uptime-probes.json`
titled `KiteStacks Uptime Probes`, with stat/timeseries panels for
`probe_success{job="uptime-kuma-ping-probes"}` and
`probe_duration_seconds{job="uptime-kuma-ping-probes"}`.
- Ran `docker compose up -d` in the Prometheus directory, pulled/started
`blackbox-exporter`, restarted Prometheus, and restarted Grafana.
Verification:
- Prometheus config validates with `promtool check config`.
- Prometheus active targets include all four `uptime-kuma-ping-probes`.
- Query result for `probe_success{job="uptime-kuma-ping-probes"}`:
`Google DNS=1`, `T14 Deb Assassin=1`, `HomeRouter=0`, `TailScale=0`.
The two failures match Kuma's existing failing ping behavior from inside the
container/network namespace.
- Grafana logs show dashboard provisioning completed without dashboard errors
(only unrelated bundled plugin permission warnings).
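The `probe_success` verification above can be reproduced against the Prometheus HTTP API. The sketch below assumes the standard instant-query endpoint (`/api/v1/query`) and the `monitor_name` label described above; the function names are hypothetical and the base URL is whatever Prometheus address is reachable from the caller.

```python
import json
import urllib.parse
import urllib.request

QUERY = 'probe_success{job="uptime-kuma-ping-probes"}'


def parse_probe_success(payload: dict) -> dict:
    """Map monitor_name -> True/False from an instant-query vector response."""
    if payload.get("status") != "success":
        raise ValueError(f"query failed: {payload.get('error', 'unknown error')}")
    states = {}
    for sample in payload["data"]["result"]:
        name = sample["metric"].get("monitor_name", "<unnamed>")
        _timestamp, value = sample["value"]  # value arrives as a string, e.g. "1"
        states[name] = float(value) == 1.0
    return states


def query_probe_success(prom_url: str, timeout: float = 5.0) -> dict:
    """Run the instant query against `prom_url` and return the parsed states."""
    params = urllib.parse.urlencode({"query": QUERY})
    url = f"{prom_url.rstrip('/')}/api/v1/query?{params}"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return parse_probe_success(json.load(resp))
```

Keeping the parsing separate from the HTTP call makes it possible to check the mapping against a saved response without reaching Prometheus.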
### Desktop widget for the same monitor set (2026-06-15)
User asked for a Rainmeter-like desktop widget on Debian 13 that can show the
same Uptime Kuma monitor state in real time.
Created a local Conky-based widget scaffold in the desktop user's home:
- `~/.local/bin/kitestacks-uptime-widget.sh`
- `~/.config/conky/kitestacks-uptime.conf`
Behavior:
- Polls Prometheus for `probe_success` and `probe_duration_seconds` from the
`uptime-kuma-ping-probes` job.
- Initially defaulted to `http://192.168.1.205:9090`, with `PROM_URL`
override support (default later changed; see Connectivity note).
- Prints the four Kuma monitor names, state, latency, and a summary line.
- Degrades cleanly with `Prometheus unavailable at ...` when the endpoint
cannot be reached.
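The output behavior listed above can be illustrated with a short Python sketch. The real widget is a shell script, so this is only a model of the logic: `render_widget`, its arguments, and the exact column layout are hypothetical; only the `Prometheus unavailable at ...` fallback text comes from the notes.

```python
from typing import Dict, List, Optional


def render_widget(states: Optional[Dict[str, bool]],
                  latencies: Dict[str, float],
                  prom_url: str) -> List[str]:
    """Return the text lines a Conky exec block could print.

    `states` maps monitor name -> up/down; None means Prometheus was unreachable.
    `latencies` maps monitor name -> probe duration in seconds.
    """
    if states is None:
        return [f"Prometheus unavailable at {prom_url}"]
    lines = []
    for name in sorted(states):
        up = states[name]
        latency = latencies.get(name)
        # Only show latency for monitors that are up and have a sample.
        latency_text = f"{latency * 1000:.0f} ms" if up and latency is not None else "-"
        lines.append(f"{name:<18} {'UP' if up else 'DOWN':<4} {latency_text}")
    lines.append(f"{sum(states.values())}/{len(states)} monitors up")
    return lines
```

Returning a list of lines rather than printing directly keeps the degrade path and the summary line easy to check in isolation.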
Note: Conky is the closest direct Rainmeter-style equivalent for Debian/Linux
desktop widgets; `eww` is the more modern alternative if the desktop session is
Wayland-first and the user prefers GTK/Rust widgets instead of a classic
desktop overlay.
Debian 13 package note:
- `conky` is a virtual package in trixie.
- Install `conky-all` for the full desktop widget experience:
`sudo apt update && sudo apt install conky-all`
Connectivity note:
- The laptop could not reach Prometheus at `192.168.1.205:9090`, which means
the widget can only work from a host that can reach the homelab LAN or a
public/tunneled Prometheus endpoint.
- The existing KiteStacks docs mark Prometheus as excluded from the Cloudflare
tunnel, so there is no known public Prometheus URL to target yet.
- The desktop widget script now defaults to `https://prometheus.kitestacks.com`
and can send `CF-Access-Client-Id` / `CF-Access-Client-Secret` headers if the
hostname is protected by Cloudflare Access.
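A minimal sketch of the endpoint selection and Cloudflare Access headers described above. The header names are the ones given in the notes; the environment variable names `CF_ACCESS_CLIENT_ID` and `CF_ACCESS_CLIENT_SECRET` and the function name are assumptions, not taken from the actual script.

```python
import os
import urllib.request
from typing import Mapping, Optional

DEFAULT_PROM_URL = "https://prometheus.kitestacks.com"


def build_prometheus_request(path: str,
                             env: Optional[Mapping[str, str]] = None) -> urllib.request.Request:
    """Build a request for `path`, honoring PROM_URL and optional Access credentials."""
    env = os.environ if env is None else env
    base = env.get("PROM_URL", DEFAULT_PROM_URL).rstrip("/")
    req = urllib.request.Request(base + path)
    client_id = env.get("CF_ACCESS_CLIENT_ID")          # hypothetical variable name
    client_secret = env.get("CF_ACCESS_CLIENT_SECRET")  # hypothetical variable name
    if client_id and client_secret:
        # Service-token headers are sent only when both values are present.
        req.add_header("CF-Access-Client-Id", client_id)
        req.add_header("CF-Access-Client-Secret", client_secret)
    return req
```

Reading the credentials from the environment keeps them out of the script file and out of the Conky config.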
Cyberpunk widget styling:
- Conky panel tuned to the wallpaper palette with black base and neon
cyan/magenta accents.
- Header uses `#ff4df0` pink and `#2de0ff` blue.
- Monitor rows color-code `UP` as cyan and `DOWN` as pink for fast scanning.
Important security hygiene: local git remote for `~/claude-memory` contains an
HTTP token in the URL; do not print it in summaries. Prefer redacted URLs in
handoffs.
### Oracle VPS migration - PLANNED, upcoming (stated 2026-06-11)
User confirmed on 2026-06-11: "we are going to switch things soon from hetzner
cloud to oracle soon." -> kscloud1 (Hetzner, 5.78.233.28) is intended to be
REPLACED by an Oracle Cloud VPS in the near future ("soon", no firm date yet).
Originally raised 2026-06-10 as exploratory ("how easy would it be to move
everything to oracle vps after?"), now an actual plan.
Implication: avoid investing further one-off/manual config work that's hard to
redo (e.g. more one-time DB syncs, hand-edited sqlite, etc.) on kscloud1 if
avoidable - prefer changes that are easy to replicate on a new host. When the
Oracle VPS is provisioned, plan to follow the same pattern as the kscloud1
cloud-failover build-out (new Cloudflare Tunnel connector + full service
replicas + shared Authentik/Postgres/Redis over Tailscale + the Forgejo
FORGEJO_API_BASE-over-Tailscale pattern for the portal's Recent Activity, see
"Recent Activity" fix below) - then retire kscloud1 the same way assassin/T14
was retired (decommission once Oracle replica verified working).
## Prior migration gotchas (monk, kept for reference - see git history/old notes if needed)
- rsync `--files-from` recursion bug.
- Bind-mounted postgres dirs come over empty when copied as non-root (use
`pg_dumpall`/`pg_dump --clean` from the running container instead).
- `pg_dumpall --clean` across template1 breaks on client/server version mismatch
(use single-db `pg_dump` + `psql` instead).
- grafana data dir needs `chown 472:472`.
- kite-litellm needed manual `docker network connect kitestacks kite-litellm`.
## 2026-06-12: SSO fixes + Portainer deployed on kscloud1
### Root cause: monk reconnect race condition
When monk goes offline (user travels) and reconnects, Cloudflare starts routing
some token-exchange requests to monk for codes that were created on kscloud1
during the offline window. Auth codes had a 60-second TTL and expired before
monk's Authentik had fully started (~5 min startup). FIX: increased
`access_code_validity` from `minutes=1` to `minutes=10` for ALL 9 OAuth2
providers in the shared Postgres DB. This leaves enough buffer for monk's
containers to start before codes expire.
Command used (via python:3-alpine container):
`docker run --rm --network host -v /tmp/fix_auth.py:/fix.py python:3-alpine sh -c ...`
connecting to shared Postgres at 100.123.254.52.
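The reasoning behind the `minutes=1` to `minutes=10` change can be checked with a small sketch. The parser below is a simplified stand-in, not Authentik's own code: it assumes validity strings are `unit=value` pairs separated by `;` whose units match Python `timedelta` keyword arguments, and the ~5 minute startup figure is the estimate from the notes above.

```python
from datetime import timedelta

ALLOWED_UNITS = {"seconds", "minutes", "hours", "days", "weeks"}


def parse_validity(expr: str) -> timedelta:
    """Parse a string such as 'minutes=10' or 'hours=1;minutes=30' into a timedelta."""
    kwargs = {}
    for pair in expr.split(";"):
        if not pair.strip():
            continue
        unit, _, amount = pair.partition("=")
        unit = unit.strip().lower()
        if unit not in ALLOWED_UNITS:
            raise ValueError(f"unsupported unit: {unit!r}")
        kwargs[unit] = float(amount)
    return timedelta(**kwargs)


def code_survives_startup(validity: str, startup: timedelta) -> bool:
    """True if an auth code issued at startup begin is still valid when startup ends."""
    return parse_validity(validity) > startup
```

With a 5 minute startup, `code_survives_startup("minutes=1", timedelta(minutes=5))` is false while the same call with `"minutes=10"` is true, which matches the observed failure and the fix.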
### Karakeep redirect_uri reverted and re-fixed
The Karakeep OAuth2Provider `_redirect_uris` had reverted to the proxy pattern
(`/outpost.goauthentik.io/callback?...`) instead of the correct NextAuth callback
(`https://links.kitestacks.com/api/auth/callback/custom`). This caused "Redirect URI
Error" from Authentik whenever SSO was attempted. Root cause unknown (possibly an
Authentik blueprint or UI save that regenerated/overrode the field). FIX: same
Postgres UPDATE pattern. WATCH: if this reverts again, check Authentik blueprints
and whether someone modified the Karakeep provider via the Authentik admin UI.
### Portainer deployed on kscloud1
Created `/opt/kitestacks/docker/portainer/docker-compose.yml` (same image/config as
monk's portainer). Container running as `portainer`, port 9443:9443, on `kitestacks`
network. Volume is local (NOT shared with monk - fresh Portainer instance).
STILL PENDING (user action in Cloudflare dashboard):
- Tunnel ID: 5e60ea8e-a543-49b6-bab5-325f39441e00, Account: d0bb7673333fcd794622956f1662f785
- Add hostname `portainer.kitestacks.com` → service `https://portainer:9443`, No TLS Verify
STILL PENDING (user action in both Portainer UIs): configure OAuth (see prior notes
in "Portainer SSO" section above for exact credentials).
Portal card update (3 files) also still pending until tunnel+OAuth done.
## Phase 2 Planned: Obsidian Mind Map → HTML Mind Map Sync
User wants to create an Obsidian mind map of the KiteStacks homelab that syncs/exports to a live HTML mind map embedded in the homelab portal or a standalone page. To be built after full Obsidian+samurai setup is complete.
## 2026-06-13: OpenProject removed + Oracle VPS migration started
### OpenProject REMOVED permanently
OpenProject requires an Enterprise Edition license for SSO (confirmed last session).
Removed from local stack (monk):
- Docker volume `openproject_openproject_assets` deleted
- `/home/kenpatmonk/kitestacks-live/docker/openproject/` directory removed (pgdata dir
needed sudo — user ran manually; pgdata was owned by container UID mapped to `avahi`)
- NOT deploying on Oracle VPS
- tasks.kitestacks.com subdomain is now dead — update Cloudflare/portal accordingly
TODO: remove `apps/openproject/` from kitestacks-homelab Forgejo repo once user can log in.
### Forgejo issues found + partially fixed (2026-06-13)
Forgejo login page has two issues:
1. URL banner: "configured to be served on http://5.78.233.28:3000/" — caused by kscloud1's
Forgejo having wrong ROOT_URL. kscloud1 Forgejo has only 1 repo (separate DB from monk's
13-repo instance). Cloudflare tunnel load-balances between monk and kscloud1 Forgejo.
FIX PENDING: stop Forgejo on kscloud1 (or fix its ROOT_URL). Deferred — do during Oracle migration.
2. SSO button says "Proceed with OpenID" instead of "Authentik".
PARTIAL FIX: renamed login_source from `authentik` → `Authentik` via admin CLI:
`docker exec -u git forgejo /app/gitea/gitea admin auth update-oauth --id 1 --name Authentik ...`
Provider type remains `openidConnect` — button text may still say "OpenID" (depends on
Forgejo 11 template behavior). User to verify after refresh. Full fix may require admin UI
once user can log into Forgejo.
Forgejo DB: 13 repos under `kenpat`, 1 user (kenpat, admin, active, no 2FA).
Forgejo login: username `kenpat`, direct password login works on the same page.
### kitestacks-homelab repo: apps/forgejo/docker-compose.yml has wrong ROOT_URL
`FORGEJO__server__ROOT_URL=http://192.168.1.205:3006` — old local IP, never updated.
The LIVE local stack (`~/kitestacks-live/docker/forgejo/docker-compose.yml`) is correct
(`https://gitforge.kitestacks.com/`). The repo copy needs updating.
TODO: fix and commit once user can log in and clone the repo.
### Oracle VPS migration plan (kscloud1 → Oracle Cloud)
Goal: replace Hetzner kscloud1 (5.78.233.28, $14.50/mo) with Oracle Cloud ARM VPS ($8.50/mo).
Oracle instance: Ampere A1 Flex, 4 OCPU / 24 GB RAM, Chicago region (us-chicago-1).
Status as of 2026-06-13: user is provisioning — hit "no capacity" in Chicago.
No capacity was available for the 4 OCPU config. Options:
- Try smaller shape (1 OCPU / 6 GB), resize after provisioning
- Subscribe to another region (Frankfurt, Osaka, Toronto have better A1 availability)
- Keep retrying (capacity opens randomly, early UTC morning tends to be better)
ARM64 compatibility analysis (images checked; Shaarli still unverified):
- ✅ All services ARM64-compatible EXCEPT osTicket
- ❌ osTicket (`campbellsoftwaresolutions/osticket`) - x86 only
FIX: enable QEMU binfmt emulation on Oracle ARM host, run with `--platform linux/amd64`
Performance under emulation is expected to be acceptable for a ticket system.
- ⚠️ Shaarli - verify ARM64 at deploy time
Services to deploy on Oracle VPS (OpenProject EXCLUDED):
authentik, bookstack, cloudflared, forgejo, grafana, homepage/portal,
karakeep (+meilisearch +chrome), kavita, kite-ai (litellm+openwebui),
linkding, osticket, portainer, prometheus+node-exporter, shaarli, uptime-kuma
Migration phases:
1. Oracle VPS provisioning (in progress)
2. Oracle initial setup: Ubuntu 22.04 ARM64, Docker, iptables flush (Oracle blocks by default),
QEMU binfmt for osTicket x86 emulation
3. Deploy full stack — fix Forgejo ROOT_URL correctly from day one
4. Connect cloudflared on Oracle to KiteStacks tunnel (same TUNNEL_TOKEN)
5. Verify all services, then remove kscloud1 from tunnel + cancel Hetzner
NOTE: same active-active pattern as kscloud1 — shared Authentik Postgres+Redis over
Tailscale, same TUNNEL_TOKEN, fresh DBs for stateful apps except identity (authentik/kavita).
IMPORTANT Oracle gotcha: Ubuntu images on Oracle ship with host iptables rules that
reject most inbound traffic (other than SSH) at boot, even after Security List rules
are opened. Must flush iptables as part of initial setup.
## osTicket deployed on monk + kscloud1 (found 2026-06-13/14, installed ~2026-06-12)
osTicket (campbellsoftwaresolutions/osticket image, x86 - runs natively on both hosts,
no QEMU needed) + nginx proxy + MariaDB 10.11, under
`~/kitestacks-live/docker/osticket/` (monk) and `/opt/kitestacks/docker/osticket/`
(kscloud1). `tasks.kitestacks.com` -> "KiteStacks Help Desk", verified HTTP 200.
Admin: kenpat7177 / kenpat7177@gmail.com. Host ports: monk 8092:8080, kscloud1 8090:8080
(both nginx -> osticket-app:80). .env (OSTICKET_DB_PASS/ROOT/ADMIN_PASS/INSTALL_SECRET)
is IDENTICAL on both hosts.
### DB unification (2026-06-13/14) - same pattern as Authentik shared-DB fix
Both hosts originally had their OWN osticket-db (drift risk like pre-fix Kavita). Per
user request ("database should be accessible from any computer"), unified onto
kscloud1's osticket-db as canonical:
- kscloud1 osticket-db: added `ports: - "100.123.254.52:3306:3306"` (Tailscale-only,
matches authentik-postgres/redis pattern) to
`/opt/kitestacks/docker/osticket/docker-compose.yml`, `docker compose up -d`.
- monk: `docker compose stop osticket-db` (left stopped, NOT removed - rollback data
intact in its volume). Edited `~/kitestacks-live/docker/osticket/docker-compose.yml`:
removed osticket-db service block, changed osticket-app's `MYSQL_HOST=osticket-db`
-> `MYSQL_HOST=100.123.254.52`, removed `depends_on: osticket-db`. Then ran
`docker compose up -d osticket-app`.
- GOTCHA: after recreating osticket-app, the `osticket` nginx proxy container on monk
returned 502 (cached stale upstream IP for osticket-app from its old container) -
fixed with `docker restart osticket`. Apply this same restart on kscloud1's `osticket`
nginx if its osticket-app is ever recreated.
- Verified: both DBs had identical data before merge (1 ticket, 1 staff/kenpat7177) so
no data loss either way. tasks.kitestacks.com returns 200 consistently post-merge.
- Backups: `docker-compose.yml.bak` left in both hosts' osticket dirs.
### osticket-capstone Forgejo repo (created 2026-06-13/14)
New private repo `kenpat/osticket-capstone` on gitforge (created via API using a
scoped token `claude-capstone-osticket` generated via
`docker exec -u git forgejo /app/gitea/gitea admin user generate-access-token` on
monk's forgejo container - token has write:repository,write:user scopes). Holds
redacted osTicket deployment config + Per Scholas capstone docs/evidence - see
[[project-per-scholas-capstone]]. NOTE: gitforge.kitestacks.com is also
active-active load-balanced (monk/kscloud1 separate forgejo DBs) - API calls
against the public hostname can hit the wrong DB; use monk's local
`http://localhost:3006` for API operations tied to monk's forgejo data.
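The note above about targeting monk's local Forgejo can be sketched as follows. This is a hedged illustration, not the script that created the repo: the function names are hypothetical, the token is passed in rather than shown, and the sketch assumes the Gitea-compatible `GET /api/v1/repos/{owner}/{repo}` endpoint with `Authorization: token ...` authentication.

```python
import json
import urllib.request

# Local address from the notes; the public hostname may reach either host's DB.
LOCAL_FORGEJO = "http://localhost:3006"


def build_repo_request(owner: str, repo: str, token: str,
                       base_url: str = LOCAL_FORGEJO) -> urllib.request.Request:
    """Build an authenticated request for one repository's metadata."""
    url = f"{base_url.rstrip('/')}/api/v1/repos/{owner}/{repo}"
    return urllib.request.Request(url, headers={
        "Authorization": f"token {token}",
        "Accept": "application/json",
    })


def get_repo(owner: str, repo: str, token: str,
             base_url: str = LOCAL_FORGEJO) -> dict:
    """Fetch repository metadata as a dict (raises on HTTP errors)."""
    req = build_repo_request(owner, repo, token, base_url)
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)
```

Run on monk, `get_repo("kenpat", "osticket-capstone", token)` would hit monk's own Forgejo data regardless of how the tunnel balances public traffic; the token is best read from an environment variable rather than pasted into the call.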
### Remaining osTicket work
- Authentik SSO plugin for osTicket staff/agent login (osTicket has no native OIDC,
needs 3rd-party OAuth2/SAML plugin) - NOT YET DONE.
- End-user ticket submission uses osTicket's native client portal signup (works
out of the box, no SSO needed).
## 2026-06-14/15: Forgejo sync fixed + osTicket Authentik LDAP SSO complete
### Forgejo sync (monk → kscloud1) - FIXED
- Ran `docker exec -u git forgejo /app/gitea/gitea dump` on monk, scp'd to kscloud1
- Restored: 13 repos + DB synced, ROOT_URL fixed on kscloud1 to `https://gitforge.kitestacks.com/`
- kscloud1 Forgejo docker-compose updated (correct ROOT_URL + SSH port 2222)
- Sync script: `~/kitestacks-live/docker/forgejo/sync-to-cloud.sh` (rsync repos + DB dump)
- Cron: `0 */6 * * *` runs sync-to-cloud.sh, logs to `/tmp/forgejo-sync.log`
- Authentik redirect URI fixed: updated `_redirect_uris` in shared Postgres from
`authentik/callback` → `Authentik/callback` (matched renamed Forgejo source name)
### osTicket Authentik LDAP SSO - COMPLETE (2026-06-14/15)
Uses Authentik's LDAP outpost + osTicket's built-in auth-ldap.phar plugin.
**Authentik side:**
- LDAPProvider "osTicket LDAP" (pk=11, base_dn=DC=ldap,DC=goauthentik,DC=io)
- Application "osTicket LDAP" (slug=osticket-ldap, backchannel provider)
- Outpost "osTicket LDAP Outpost" (pk=5c42f5ba-64bd-434e-a47f-7ce9da13227a)
- Outpost service token: `jjYRKWuGtoeq9r0qeifbCnXGHDjhCJU2MLnkCvMMduIGA1kQKz85qnt7u5Zf`
- ldap-svc user (search account): DN=`cn=ldap-svc,ou=users,dc=ldap,dc=goauthentik,dc=io`
password=`IlgQaxBPv9rdoq03CsoY53tH`, member of homelab-admin group
**Docker services added on monk:**
- `~/kitestacks-live/docker/authentik-ldap/docker-compose.yml`
- `authentik-ldap` (ghcr.io/goauthentik/ldap:2025.2.4) on kitestacks+osticket_default networks
- `authentik-ldap-proxy` (alpine/socat) bridges port 389→3389 on osticket_default
so osticket-app can reach standard LDAP port without phar URI workaround
**Docker services added on kscloud1:**
- `/opt/kitestacks/docker/authentik-ldap/docker-compose.yml`
- Same authentik-ldap container, bound to 100.123.254.52:3389 (Tailscale) + 127.0.0.1:3389
**auth-ldap.phar patches (3 patches applied, original backed up as auth-ldap.phar.orig):**
1. `authentication.php` - `getConnection()`: adds binddn/bindpw from plugin config to
Net_LDAP2 params so initial connect uses credentials (not anonymous, which Authentik rejects)
2. `config.php` - validation block: sets include_path to phar's include dir before
`require_once Net/LDAP2.php` so sub-files resolve correctly in FPM context
3. ALL `include/Net/LDAP2/*.php` files: guards `require_once 'PEAR.php'` with
`if (!class_exists('PEAR', false))` to prevent fatal conflict between osTicket's
`/include/pear/PEAR.php` and PHP global `/usr/local/lib/php/PEAR.php`
**osTicket LDAP plugin config (namespace plugin.2 in ost_config):**
- servers: `authentik-ldap-proxy` (via socat on port 389)
- bind_dn: `cn=ldap-svc,ou=users,dc=ldap,dc=goauthentik,dc=io`
- bind_pw: encrypted with `Crypto::encrypt(pass, SECRET_SALT, 'plugin.2')`
- search_base: `ou=users,dc=ldap,dc=goauthentik,dc=io`
- schema: auto, auth-staff: 1, auth-client: 0, domain: ldap.goauthentik.io
**Staff login:** username=`kenpat7177`, password=Authentik password (reset to `KiteStacks2026!`)
on `tasks.kitestacks.com/scp/login.php`
### Per Scholas IT Support Capstone - IN PROGRESS
See [[project-per-scholas-capstone]]. Next steps:
- Create capstone incident tickets in osTicket (5-phase challenge)
- Set up osTicket user/client portal for non-staff users (Phase 3 end-user access)
- Each capstone ticket maps to a phase scenario (migration event, incident response, etc.)