---
name: project-kitestacks-migration
description: "Migration of the live KiteStacks homelab/website from assassin (T14) to monk - COMPLETE. Plus full Hetzner cloud failover (kscloud1, 5.78.233.28) - COMPLETE. All 9 subdomains can run from any single host. Plus 2026-06-10 portal/SSO push: portal FluxCD+coming-soon changes deployed, Karakeep SSO fixed, OpenProject SSO blocked by EE license, Portainer SSO Authentik-side done (pending user manual steps)."
metadata:
node_type: memory
type: project
originSessionId: 33992890-3940-4d4a-a94a-22b5621e9c1a
---

## Final Polish, Security, and Runbook Completion (2026-06-15)

The KiteStacks infrastructure is now in its final, secured, and documented state:
- **GitOps UI/Dashboard:** Added a standalone Nginx container for FluxCD status, bypassing Authentik so the Cloudflare edge can route to it freely. The dashboard is live at `flux.kitestacks.com`.
- **Security Posture:** Validated the Zero Trust architecture: no inbound open ports, strict mesh networking via Tailscale (`100.x.x.x`), and Authentik protecting all administrative dashboards (`/scp/` for osTicket, Portainer, Grafana, Kite AI).
- **Runbook Cleaned:** `RUNBOOK.md` trimmed and reorganized. Historical issues (such as the Authentik `invalid_grant` error and the osTicket email failures caused by the missing MTA) have been relocated to `docs/DEBUGGING.md`.
- **osTicket Diagnostics:** Documented that activation emails fail because Docker containers lack a local MTA. The fix is to configure an external SMTP server in the osTicket Admin Panel.

## T14s GitOps Automation SUCCESS (2026-06-15)

The cluster configuration originally written for "assassin" (T14) has been moved to the
**T14s**. The machine is now fully bootstrapped with FluxCD GitOps.

- **Cluster Hostname:** monk (T14s)
- **GitOps Repo:** `kitestacks-homelab` (main branch)
- **Path:** `clusters/T14s`
- **Automation:** FluxCD is now managing the `kavita` namespace.
- **Kavita Manifests:**
  - Deployment, Service, PVC (2Gi local-path), and Namespace.
  - Successfully synced and running (verified 2026-06-15).
- **Credentials:** Authentik password for `kenpat7177` reset to `KiteStacks2026!`.
- **osTicket:** Services started, DB unified on kscloud1, and verified
accessible via Authentik LDAP.

The GitOps workflow is now the authoritative way to manage Kubernetes apps on the T14s.

monk is the live production host. assassin (T14) is OFF. kscloud1 (Hetzner VPS,
5.78.233.28) is now a THIRD active Cloudflare Tunnel connector and runs a FULL
replica of all 9 services, so the site stays up even if both monk and assassin
are off (verified by user testing with home wifi off, from phone + mom's phone).

All 9 public subdomains (www, ai, auth, gitforge, grafana, kavita, links, status, tasks)
verified returning correct status codes via the live tunnel with kscloud1 in rotation.

## Governing principle (user's explicit words)
"leave the cloud backup on at all times" / "thats the point of it. if I am
travelling my site will go down otherwise." -> kscloud1 runs PERMANENTLY as a
3rd connector, NOT cold standby. Cloudflare Tunnel load-balances ACTIVE-ACTIVE
across all 3 connectors (no primary/backup priority). This means stateful apps
(gitforge, openproject, authentik, karakeep, kavita, openwebui, etc.) may show
DIFFERENT/STALE data depending on which connector serves a given request -
EXPLICITLY ACCEPTED by user as the cost of guaranteed uptime. Fresh/separate
databases on kscloud1 are fine; do not try to sync data between monk and kscloud1.

## kscloud1 access
SSH: `ssh -i ~/.ssh/id_ed25519_kscloud1 kenpat@5.78.233.28` (passwordless, key auth).
sudo needs a password ("p12217177") and has no askpass helper - avoid sudo;
most things doable as kenpat or via docker.
All services live under `/opt/kitestacks/docker/<service>/docker-compose.yml`,
same one-dir-per-app pattern as monk's `~/kitestacks-live/docker/`.

## kscloud1 services deployed (all `docker compose up -d`, joined to local `kitestacks` network)
- cloudflared (3rd tunnel connector, same TUNNEL_TOKEN, connector id 78521d9f-71c0-4e3d-992f-bd1f77da1a8f)
- homepage-backup (alias `homepage`) + caddy + kitestacks-metrics-api-backup - PRE-EXISTING
- forgejo (alias `forgejo`) - PRE-EXISTING, separate DB from monk's (gitforge data inconsistent across connectors, accepted)
- prometheus + node-exporter (job `kscloud1-node`)
- grafana (alias `grafana`, port 3150) - Prometheus datasource (uid 000000001) + "Node Exporter Full"
dashboard (id 1860) provisioned via `./provisioning/`. OAuth->authentik config present but
authentik on kscloud1 has a FRESH db (no provider apps configured) so OAuth login won't work
there; local admin login works.
- uptime-kuma (alias `status`->`uptime-kuma`) - kuma.db seeded by copying monk's admin user
(same login: kenpat / same password hash) + monitors: kscloud1 self-ping, Google DNS, and
HTTP checks for all 9 *.kitestacks.com subdomains (external monitoring of the live site).
- kavita (alias `kavita`) - empty library (fresh)
- karakeep + karakeep-chrome + karakeep-meilisearch (alias `karakeep`) - fresh meilisearch/db
- authentik + authentik-worker + authentik-postgres + authentik-redis (alias on `auth`) - FRESH DB.
Bootstrap admin: `akadmin@kitestacks.com` / password `6KlYpfCyYxbnKQNiOewN` (set via
AUTHENTIK_BOOTSTRAP_PASSWORD in .env). No OAuth provider apps exist yet (would need to be
manually recreated in authentik UI for grafana/openwebui/karakeep/openproject SSO to work
when kscloud1 is the active backend).
- kite-litellm + kite-openwebui (alias `ai`->openwebui) - same .env/secrets as monk. OpenWebUI
has `ENABLE_SIGNUP=true` (changed from monk's `false`) so kenpat can create a local admin
account on first use, since authentik OAuth won't work with kscloud1's fresh authentik.
- openproject (alias on `tasks`, port 8090:80 host - port 80 was taken by caddy) - FRESH db,
self-initialized via the all-in-one image's bootstrap (took ~3-4 min). Empty/no projects yet.

## monk-side changes made for cross-host monitoring
- `~/kitestacks-live/docker/prometheus/prometheus.yml`: added scrape job
`kscloud1-node` -> `5.78.233.28:9100` (kscloud1's node-exporter is exposed
0.0.0.0:9100, no firewall - reachable from monk's public IP). monk's grafana
(the live one, "Node Exporter Full" dashboard now provisioned via
`~/kitestacks-live/docker/grafana/provisioning/`) shows BOTH `t14-node`
(monk/"this pc") and `kscloud1-node` ("the cloud") via the instance picker.
- kscloud1's prometheus only scrapes itself (`kscloud1-node`) - monk is behind
home NAT, not reachable from kscloud1.

## Resource notes (kscloud1: 3 vCPU, 3.7GB RAM + 6GB swap, 75GB disk)
With all services running: ~2.8-2.9GB RAM used, ~2.6-2.8GB swap used (of 6GB),
~835MB-1.2GB "available", disk 29GB/75GB used. Site is functional but under
memory pressure - if BOTH monk and assassin are down for an extended period
with real concurrent usage, expect sluggishness (esp. openproject/authentik/
openwebui). Not yet stress-tested under real failover load.

## Key gotchas from THIS phase (cloud failover build-out)
- kscloud1's `kitestacks` Docker network is LOCAL/separate from monk's (same name,
no conflict). cloudflared on each host resolves container names against its
own host's network.
- Adding a new tunnel connector that lacks a backend for an ingress hostname ->
502 for requests routed there. If it has a DIFFERENT backend (e.g. forgejo) ->
serves different data inconsistently. Both accepted/expected now that all 9
hostnames have backends on kscloud1.
- port 80 on kscloud1 is owned by `caddy` (serves www-backup/git-backup.kitestacks.com
direct A-records, pre-existing, unrelated to the tunnel) - openproject uses 8090:80
for its host port instead (internal container port 8080 is what cloudflared hits).
- uptime-kuma / grafana have no simple file-based config API for monitors/datasources
beyond grafana provisioning - used direct sqlite manipulation (`docker exec ... sqlite3`,
or python3 sqlite3 module via a throwaway `python:3-alpine` container with the volume
mounted) to seed uptime-kuma's kuma.db with users/monitors.
- authentik first boot takes ~1-2 min (migrations); openproject first boot takes
~4-5 min (postgres initdb + Rails migrations + Puma boot), watch `docker logs`
for "Listening on http://0.0.0.0:8080" before testing.

## Authentik/Kavita login fix (2026-06-10, post cloud-failover)
PROBLEM: Cloudflare Tunnel load-balances auth.kitestacks.com and kavita.kitestacks.com
active-active across monk and kscloud1. kscloud1's authentik had only the fresh
`akadmin` bootstrap user (not kenpat7177) and kscloud1's kavita had ZERO users ->
~50% of requests showed "wrong password" on authentik and a "create admin account"
(signup) screen on kavita instead of login. This contradicts the earlier "fresh DBs
are fine" assumption - for IDENTITY apps it breaks login, so it was NOT acceptable.

FIX APPLIED (one-time sync, same pattern as uptime-kuma's kuma.db seed):
- pg_dump'd monk's authentik-postgres `authentik` db (--clean --if-exists),
scp'd to kscloud1, stopped authentik+authentik-worker on kscloud1, restored
via `docker exec -i authentik-postgres psql -U authentik -d authentik < dump`,
restarted. Worked cleanly because AUTHENTIK_SECRET_KEY and PG_PASS were
ALREADY IDENTICAL between monk's and kscloud1's authentik/.env.
- For kavita: copying the raw kavita.db file via plain `cp` produced
"database disk image is malformed" (WAL-mode db isn't standalone-consistent
as a flat file copy even when -wal/-shm look small). FIX: use python3
sqlite3 `Connection.backup()` (via throwaway python:3-alpine container) to
produce a consistent copy, THEN on kscloud1 stop kavita, rm the OLD
kavita.db-shm/kavita.db-wal too (stale WAL files against new db = same
corruption error), copy in the new kavita.db (chown root:root, chmod 644
to match original ownership - kavita container runs as root), restart.
- Result: kscloud1 authentik now has kenpat7177 (matches monk), kscloud1
kavita now has kenpat7177 + acurrie (matches monk). Both connectors now
return the same login screen/credentials. NOTE: this is a ONE-TIME sync,
not continuous - if monk's users/passwords change later, kscloud1 will
drift again and the same symptoms could return; re-run this sync if so.
- kscloud1 kavita's library entries point at /books paths that don't exist on
kscloud1 (no actual book files there) - login works fine, but browsing the
library when served by kscloud1 will show entries with missing files. Same
"stale data" tradeoff as gitforge, accepted.

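The WAL-safe kavita.db copy described above can be sketched in Python. This is a minimal sketch, assuming the database file is reachable at a local path (for example inside the throwaway `python:3-alpine` container with the config volume mounted); `backup_sqlite` and `integrity_ok` are hypothetical helper names, not part of Kavita.

```python
import sqlite3


def backup_sqlite(src_path: str, dest_path: str) -> None:
    """Write a self-contained copy of src_path to dest_path.

    Connection.backup() reads through SQLite itself, so committed pages that
    still live in the -wal file are included, unlike a plain file copy.
    """
    src = sqlite3.connect(src_path)
    dst = sqlite3.connect(dest_path)
    try:
        src.backup(dst)
    finally:
        dst.close()
        src.close()


def integrity_ok(db_path: str) -> bool:
    """Run PRAGMA integrity_check and report whether SQLite answers 'ok'."""
    conn = sqlite3.connect(db_path)
    try:
        return conn.execute("PRAGMA integrity_check").fetchone()[0] == "ok"
    finally:
        conn.close()
```

Running `integrity_ok()` on the copy before shipping it to kscloud1 would catch a malformed file early. Stale `-wal`/`-shm` files next to the destination still have to be removed by hand, as noted above.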
## Authentik shared Postgres+Redis over Tailscale (2026-06-10) - fixes "invalid_grant" SSO
PROBLEM: Even after the one-time DB sync above, "Sign in with Authentik" on
Kavita could fail with "invalid_grant" / "Code does not exist". Root cause:
monk and kscloud1 each ran their OWN authentik-postgres. OAuth2 authorization
codes are short-lived per-flow rows in `authentik_providers_oauth2_authorizationcode`.
If Cloudflare Tunnel's active-active routing sends `/authorize` to one
connector and `/application/o/token/` to the other, the code only exists in
one of the two DBs -> invalid_grant. A one-time data sync can't fix this
because the data is created fresh on every login attempt.

FIX: Converted to a single shared Postgres+Redis (HA pattern), hosted on
kscloud1, reachable ONLY over Tailscale:
- Installed Tailscale on both monk and kscloud1 (same tailnet). kscloud1's
tailscale IP is `100.123.254.52`.
- kscloud1's `/opt/kitestacks/docker/authentik/docker-compose.yml`:
authentik-postgres now binds `100.123.254.52:5432:5432` (was unbound/internal-only),
authentik-redis now binds `100.123.254.52:6379:6379`. Both still also reachable
on the local `kitestacks` docker network for kscloud1's own authentik+worker.
Backup of pre-change file: `docker-compose.yml.backup-before-shared-db-20260610-1138`.
- monk's `~/kitestacks-live/docker/authentik/docker-compose.yml`: REMOVED the
`postgresql` and `redis` services entirely. monk's `authentik`/`authentik-worker`
now point `AUTHENTIK_POSTGRESQL__HOST` and `AUTHENTIK_REDIS__HOST` at
`100.123.254.52` (kscloud1 over Tailscale), using the same `PG_PASS` /
`AUTHENTIK_SECRET_KEY` as before (already identical between hosts).
- monk's old local `authentik-postgres`/`authentik-redis` containers were
STOPPED (not removed) - data dirs preserved under
`~/kitestacks-live/docker/authentik/postgres` in case of rollback, but no
longer in use.
- Result: BOTH connectors' authentik+worker now read/write the SAME db/redis,
regardless of which one handles `/authorize` vs `/application/o/token/`.
Verified both `authentik`+`authentik-worker` healthy on monk and kscloud1,
OIDC discovery docs identical, user list matches (`kenpat7177` etc.) on both.
CONFIRMED BY USER 2026-06-10: "Sign in with Authentik" on Kavita now works
(when monk's connector serves the request).

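The failure mode above can be illustrated with a toy model. This is only a sketch of the reasoning, not Authentik's code: `CodeStore` and `login` are hypothetical names, and each `CodeStore` stands in for one Authentik database holding authorization codes.

```python
import secrets


class CodeStore:
    """Stand-in for one Authentik database holding short-lived auth codes."""

    def __init__(self):
        self._codes = set()

    def issue(self):
        """What /authorize does: create a code row and hand it to the client."""
        code = secrets.token_urlsafe(16)
        self._codes.add(code)
        return code

    def redeem(self, code):
        """What the token endpoint does: the code must exist in THIS store."""
        if code in self._codes:
            self._codes.remove(code)  # codes are single-use
            return "token issued"
        return "invalid_grant"


def login(authorize_backend, token_backend):
    """One login where the two requests may land on different connectors."""
    code = authorize_backend.issue()
    return token_backend.redeem(code)
```

With two separate stores (`login(monk_db, cloud_db)`) the exchange fails whenever the two requests land on different connectors; passing the same store for both, which is what the shared Postgres achieves, always succeeds.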
## Kavita "Sign in with Authentik" button missing on kscloud1 - FIXED 2026-06-10
After the shared-Authentik-DB fix above, the button still didn't appear when
Cloudflare routed kavita.kitestacks.com to kscloud1's connector. CAUSE: Kavita's
OIDC config lives in ITS OWN db (kavita.db `ServerSetting` table, Key=40, a JSON
blob with Authority/ClientId/Secret/Enabled), separate from Authentik's db.
The earlier one-time kavita.db sync (see fix above) was taken BEFORE OIDC SSO
was configured in monk's Kavita, so kscloud1's copy had Key=40 with empty
Authority/Secret and `"Enabled":false`. FIX: copied monk's Key=40 JSON value
verbatim into kscloud1's kavita.db (stop kavita, `docker run --rm -v
.../kavita/config:/data -v fix.sql:/fix.sql alpine` + apk sqlite + `sqlite3
/data/kavita.db < fix.sql` with `UPDATE ServerSetting SET Value='...' WHERE
"Key"=40`, restart kavita). NOTE: AspNetUserLogins (OIDC account-linkage table)
is empty on BOTH monk and kscloud1 - Kavita creates this row on first OIDC
login per-instance (matches existing local user by email since
ProvisionAccounts=false), so no extra action needed there.

GOTCHA: ServerSetting's PK column is `"Key"` (INTEGER), not `Id` - must quote
it in sqlite (`"Key"`) since KEY is a SQL reserved word.

DRIFT WARNING: any future Kavita server-setting change (OIDC config, library
paths, etc.) made on monk will NOT propagate to kscloud1's kavita.db
automatically - same one-time-sync caveat as the user-table sync above.

UPDATE 2026-06-10 (RESOLVED via Kavita UI, not direct DB edit): Direct SQL
edits to ServerSetting Key=40 got WIPED back to disabled/empty by Kavita on
every container restart (RowVersion incremented +2 each time, Authority/Secret
cleared, Enabled->false) - confirmed twice, even with a full WAL-consistent
kavita.db replace from monk. Direct DB writes to this table do NOT survive a
restart; only saves through Kavita's own Settings UI/API persist correctly.

FIX: opened an SSH local port-forward (`ssh -L 5099:localhost:5000
kenpat@5.78.233.28`) so the user could reach kscloud1's Kavita directly at
http://localhost:5099 (bypassing the Cloudflare load-balanced domain), logged
in with their normal kenpat7177 Kavita password, and re-entered the OIDC
config in Settings -> OIDC:
- Authority: `https://auth.kitestacks.com/application/o/kavita/`
(MUST include trailing slash - Kavita validates that this exactly matches
the `issuer` claim in Authentik's `.well-known/openid-configuration`,
which has a trailing slash. Without it: "Kavita can load the OIDC
configuration, but the issuer does not match".)
- Client ID: `kavita`, Client Secret: (96-hex-char secret from Authentik's
Kavita OAuth2 provider - watch for copy/paste truncation, verify length=96)
- Enabled: true, ProviderName: authentik

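Both pitfalls in the list above (the trailing slash and the truncated secret) can be checked before saving the form. The following is a hedged sketch: `issuer_matches`, `looks_like_full_secret`, and `fetch_discovery` are hypothetical helpers, and the exact-match rule simply mirrors the behaviour described in the notes rather than Kavita's actual validation code.

```python
import json
import string
from urllib.request import urlopen


def issuer_matches(authority: str, discovery_doc: dict) -> bool:
    """Exact string comparison against the `issuer` claim (slash included)."""
    return authority == discovery_doc.get("issuer")


def looks_like_full_secret(secret: str, expected_len: int = 96) -> bool:
    """Guard against copy/paste truncation: right length, hex characters only."""
    return len(secret) == expected_len and all(c in string.hexdigits for c in secret)


def fetch_discovery(authority: str) -> dict:
    """Fetch <authority>/.well-known/openid-configuration (needs network)."""
    url = authority.rstrip("/") + "/.well-known/openid-configuration"
    with urlopen(url, timeout=10) as resp:
        return json.load(resp)
```

For example, `issuer_matches(authority, fetch_discovery(authority))` would be expected to return `False` for the Authority without the trailing slash, matching the "issuer does not match" error quoted above.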
Saved via UI (RowVersion 8->12->14 across two saves to fix a 1-char-truncated
secret), then `docker compose restart kavita` on kscloud1 - config SURVIVED
this restart (unlike the direct-SQL attempts) and `/api/settings/oidc` now
reports `"enabled": true`. SSH tunnel closed afterward (no firewall changes
were made/needed). Set a temporary ApiKey on kenpat7177's kscloud1 kavita
account during troubleshooting (for a Plugin/authenticate attempt that turned
out to return 401 / unused) - left in place, harmless (grants API access to
that user's own account only).

TAKEAWAY FOR FUTURE KAVITA CONFIG CHANGES ON KSCLOUD1: always use the Kavita
UI (via SSH port-forward to localhost:5000) rather than direct sqlite edits -
direct edits to ServerSetting do not survive a restart.

CONFIRMED BY USER 2026-06-10: "Sign in with Authentik" now works on Kavita
regardless of which connector (monk/kscloud1) answers.

## Kavita cover images missing on kscloud1 - FIXED 2026-06-10
After the kavita.db sync from monk, kscloud1's db referenced cover image files
(e.g. `v1_c1.png`..`v10_c10.png` in `ServerSetting`/`Series.CoverImage`) that
didn't exist on kscloud1's filesystem - kscloud1's `config/covers/` dir was
empty (monk has 9 files, ~1.3MB). Result: book/series cover thumbnails didn't
load when kscloud1 served the request. FIX: tar'd monk's
`~/kitestacks-live/docker/kavita/config/covers/` (owned 1000:1000), scp'd to
kscloud1, extracted into `/opt/kitestacks/docker/kavita/config/covers/` via a
throwaway alpine container, `chown -R 1000:1000`. No kavita restart needed -
covers are served as static files from disk. CONFIRMED BY USER: covers now
load correctly.

NOTE: this is another one-time sync (same drift caveat) - if new books/covers
are added on monk later, they won't appear on kscloud1 unless re-synced
(covers/ dir + kavita.db + actual book files under library/books, none of
which exist on kscloud1 per the earlier "stale data" note).
SECURITY NOTE (applies to the shared Authentik Postgres+Redis setup, not to
Kavita covers): postgres/redis on kscloud1 are bound to the Tailscale interface
IP only (100.123.254.52), not 0.0.0.0 - not exposed to the public internet.

ROLLBACK (same shared Authentik Postgres+Redis setup): if Tailscale connectivity
ever breaks, monk's authentik will fail to start (can't reach 100.123.254.52).
To roll back: restore monk's docker-compose.yml from git/backup to use local
postgresql/redis services again, restart monk's old
authentik-postgres/authentik-redis containers
(`docker start authentik-postgres authentik-redis` in
~/kitestacks-live/docker/authentik), `docker compose up -d`. Note this would
mean monk's authentik db is now STALE (kscloud1's shared db has any logins/
changes since 2026-06-10) - would need a fresh pg_dump sync from kscloud1 first.

## kscloud1 ufw blocks docker-bridge -> host port 8000 (metrics API) - FIXED 2026-06-10
kscloud1 has ufw active with `default deny incoming/routed`. The
kitestacks-metrics-api-backup container (network_mode: host, binds 0.0.0.0:8000)
was unreachable from homepage-backup via `host.docker.internal:8000` (TCP
timeout, not refused -> ufw drop), causing the homepage System Status widget to
show 0%/"Offline" when kscloud1 served the request. FIXED by adding:
`sudo ufw allow from 172.16.0.0/12 to any port 8000 proto tcp` (covers all
docker bridge subnets on this host: 172.17-172.29.x.x). Verified
homepage-backup -> host.docker.internal:8000/api/metrics now returns real
CPU/RAM/storage/network data. kscloud1 sudo password is in the "kscloud1
access" section above - needed `echo PASS | sudo -S <cmd>` (no askpass helper,
non-interactive sudo via -S works fine).

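The claim that `172.16.0.0/12` covers every Docker bridge subnet on the host can be double-checked with the standard `ipaddress` module. A small sketch, assuming the bridges use /16 subnets in the 172.17-172.29 range mentioned above (`covered` is a hypothetical helper name):

```python
import ipaddress

# The source range allowed by the ufw rule above.
ALLOWED = ipaddress.ip_network("172.16.0.0/12")


def covered(subnet: str) -> bool:
    """True when the given subnet lies entirely inside the allowed range."""
    return ipaddress.ip_network(subnet).subnet_of(ALLOWED)


# Bridge subnets as described in the note (assumed to be /16 each).
bridge_subnets = [f"172.{second_octet}.0.0/16" for second_octet in range(17, 30)]
all_covered = all(covered(s) for s in bridge_subnets)
```

The /12 spans 172.16.0.0 through 172.31.255.255, so the rule would also keep working if Docker later allocated a bridge in 172.30 or 172.31.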
## Portal SSO/coming-soon push + Karakeep fix + OpenProject EE blocker + Portainer SSO setup (2026-06-10)
Per user's original request: SSO for grafana/prometheus/portainer/cloudflare/uptime-kuma
(authentik=source), KITEAI/LITELLM/OPENROUTER -> "coming soon", SSO for openproject/forgejo/
karakeep, FluxCD card in AI&Automation. Hard constraint: user does NOT use Cloudflare Zero
Trust/Access ("costs money") - any Cloudflare work must avoid those products (note: the
Tunnel UI itself lives under Cloudflare's "Zero Trust" dashboard section, but configuring a
Tunnel public hostname there is NOT the same as enabling Zero Trust/Access - fine to use).

### Portal UI changes - DEPLOYED to all 3 copies, verified live
Edited the AI & AUTOMATION panel (`cards cards-3` -> `cards cards-2`, now 2x2):
Kite AI and OpenRouter cards changed from external links to
`href="#" data-coming-soon="1"` (LiteLLM was already coming-soon); added a 4th
card "FluxCD" / "GitOps Automation" using `/images/icons/fluxcd.png`, also
coming-soon (automation scripts with FluxCD+Prometheus+Grafana+node-exporter
are a future project). Applied identically to:
- `~/kitestacks-live/docker/kitestacks-portal-test/public/index.html` (monk, dev, port 3008)
- `~/kitestacks-live/docker/kitestacks-portal/public/index.html` (monk, LIVE, served by
"homepage" container 3005->3000 - this is the file that backs www.kitestacks.com)
- `/opt/kitestacks/docker/www-backup/kitestacks-portal/public/index.html` (kscloud1,
served by `homepage-backup` port 3015)

Verified `https://www.kitestacks.com` returns "FluxCD" consistently (6/6 requests
across both connectors).

NOTE: Portainer card on the live portal is currently `data-coming-soon="1"` -
update this to a real `href="https://portainer.kitestacks.com"` link (remove
data-coming-soon) once the Portainer SSO manual steps below are completed.

NOTE 2: "cloudflare should all be in the networking side" from the original
request was never resolved - Cloudflare card is still in the INFRASTRUCTURE
panel, not moved/renamed to a "NETWORKING" panel. Ambiguous, deprioritized,
not revisited.

### Karakeep SSO redirect_uri fix - DONE, confirmed working
Karakeep uses NextAuth.js with provider id "custom" (not "authentik") - actual
OAuth callback path is `/api/auth/callback/custom`, but Authentik's Karakeep
OAuth2Provider's `_redirect_uris` had the wrong path -> "Redirect URI Error".
FIX: direct Postgres UPDATE to
`authentik_providers_oauth2_oauth2provider._redirect_uris` (JSON column) on
the shared kscloud1 authentik-postgres (100.123.254.52), wrapped in explicit
`BEGIN; UPDATE ...; COMMIT;` (a bare single-statement -c "UPDATE..." reported
"UPDATE 1" but did NOT persist on first attempt - cause unclear, explicit
transaction fixed it). After the DB write, restarted authentik+authentik-worker
on BOTH monk and kscloud1 and polled
`docker inspect --format '{{.State.Health.Status}}'` until both reported
"healthy" (~50s) before retesting - first retest hit a transient 502 because
kscloud1's authentik was still "starting". CONFIRMED: Authentik now serves the
login page (not "Redirect URI Error") for Karakeep SSO.

PG_PASS GOTCHA: `~/kitestacks-live/docker/authentik/.env` PG_PASS value ends in
`=` - extract with `cut -d= -f2-` (NOT `-f2`, which truncates the trailing `=`
and causes "password authentication failed").

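The same trap exists when the `.env` value is read from a script: splitting on every `=` drops the trailing one, while splitting only on the first keeps it, which is what `cut -d= -f2-` does. A minimal sketch, assuming plain unquoted `KEY=value` lines; `read_env_value` is a hypothetical helper and the sample value is a placeholder, not the real password.

```python
def read_env_value(env_text: str, key: str) -> str:
    """Return the value for `key`, keeping any '=' characters inside it."""
    for line in env_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        # Split on the FIRST '=' only, like `cut -d= -f2-`.
        name, value = line.split("=", 1)
        if name.strip() == key:
            return value
    raise KeyError(key)


sample = "PG_PASS=placeholderValue=\nAUTHENTIK_SECRET_KEY=other\n"
full = read_env_value(sample, "PG_PASS")
# The buggy equivalent of `cut -d= -f2`: second field only, trailing '=' lost.
truncated = sample.splitlines()[0].split("=")[1]
```

Here `full` keeps the trailing `=` while `truncated` loses it, which is the difference between a working login and "password authentication failed".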
REUSABLE PATTERN for any future direct Authentik DB edit: (1) wrap writes in
explicit BEGIN/COMMIT, (2) restart authentik+authentik-worker on BOTH monk and
kscloud1, (3) wait for health=healthy on both before testing.

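Step (3) of the pattern above can be scripted instead of re-running `docker inspect` by hand. A hedged sketch: `container_health` shells out to the same `docker inspect` format string quoted earlier, while `wait_healthy` and its parameters are hypothetical. It only sees the local Docker daemon, so it would have to run once on monk and once on kscloud1 (for example over SSH).

```python
import subprocess
import time


def container_health(name: str) -> str:
    """Return the Docker health status of a container on the local host."""
    result = subprocess.run(
        ["docker", "inspect", "--format", "{{.State.Health.Status}}", name],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


def wait_healthy(names, get_status=container_health,
                 timeout=180.0, interval=5.0, sleep=time.sleep) -> bool:
    """Poll until every container reports 'healthy' or the timeout expires."""
    deadline = time.monotonic() + timeout
    pending = set(names)
    while pending:
        # Keep only the containers that are not healthy yet.
        pending = {n for n in pending if get_status(n) != "healthy"}
        if not pending:
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(interval)
    return True
```

A call such as `wait_healthy(["authentik", "authentik-worker"])` on each host before retesting would avoid the transient 502 seen when kscloud1's authentik was still "starting".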
### OpenProject SSO - config bug fixed, but BLOCKED by Enterprise licensing (no further action possible)
`~/kitestacks-live/docker/openproject/docker-compose.yml` env vars were wrong in
two ways: (1) extra "PROVIDERS_" segment in var names caused
`seed_oidc_provider = {"providers": {"authentik": {...}}}` instead of
`{"authentik": {...}}`, producing a broken stub provider record (slug=
"providers", id=1, since deleted via Rails runner); (2) `discovery_endpoint`
isn't read by `ConfigurationMapper` at all - replaced with explicit
ISSUER/AUTHORIZATION__ENDPOINT/TOKEN__ENDPOINT/USERINFO__ENDPOINT/
END__SESSION__ENDPOINT/JWKS__URI vars (current docker-compose.yml has the
corrected version, see file - all derived from
`https://auth.kitestacks.com/application/o/openproject/.well-known/openid-configuration`).

After fixing both, the seeder correctly creates provider slug="authentik",
available=true, all fields correct - BUT the SSO button still does not appear
on /login. CONFIRMED ROOT CAUSE (terminal, source-code-verified): OpenProject
CE 2025/v15's OmniAuth SSO strategy
(`OpenProject::Plugins::AuthPlugin`/`OpenIDConnect`) AND SAML
(`auth_saml/lib/open_project/auth_saml/engine.rb`, `enterprise_feature:
"sso_auth_providers"`) are BOTH gated behind an Enterprise Edition license -
"OmniAuth SSO strategy ... is only available for Enterprise Editions". No
app/config-level workaround exists. Only remaining options: buy EE license, OR
put a forward-auth proxy (oauth2-proxy / Authentik embedded outpost) in front
of OpenProject - DEFERRED along with Prometheus/Uptime Kuma proxy work (see
below) until Oracle VPS topology is decided.

OpenProject container is healthy, `/login` returns 200, no projects yet.

### Portainer SSO - Authentik side DONE, two manual steps PENDING (not yet done by user)
Per user: "yes continue with portainer" / "yes but make sure it is still
secure" (approved exposing Portainer publicly via a NEW Cloudflare Tunnel
hostname, with explicit requirement to keep it secure -> access restricted to
the `homelab-admin` Authentik group).
Created via `docker exec authentik ak shell` (Django ORM, no Authentik API
token configured) on kscloud1's shared authentik-postgres:
- OAuth2Provider "Portainer": client_id=`portainer`,
client_secret=`wTim3mrMwt34ko1RYMvK1RNnjwWOMi_d4r4cS6exr7DjozCrL5zKthHl-5KjargF`,
provider_id=9, redirect_uri=`https://portainer.kitestacks.com` (strict),
scopes openid/email/profile, sub_mode=user_email, signing key + flows copied
from existing providers (same pattern as Karakeep/Grafana).
- Application "Portainer" (slug="portainer", meta_launch_url=
`https://portainer.kitestacks.com`).
- PolicyBinding restricting the Portainer application to Authentik group
`homelab-admin` (UUID e21b0aa5-62e7-4b3a-8302-130b0ae148a5) - this is the
"make sure it is still secure" piece (only homelab-admin members can SSO in).
- Verified discovery doc resolves:
`https://auth.kitestacks.com/application/o/portainer/.well-known/openid-configuration`.

PENDING MANUAL STEPS (user must do via UI - confirmed `portainer.kitestacks.com`
still returns `000` as of 2026-06-10):
1. Cloudflare dashboard -> Tunnel -> add Public Hostname `portainer.kitestacks.com`
-> service `https://portainer:9443` (HTTPS), enable "No TLS Verify". (This is
in the Tunnel config UI, which Cloudflare happens to host under the "Zero
Trust" nav section, but adding a Tunnel hostname is NOT enabling Zero
Trust/Access - does not violate the no-Zero-Trust constraint.)
2. In Portainer -> Settings -> Authentication -> OAuth (Provider: Custom), on
BOTH monk's and kscloud1's SEPARATE Portainer instances, configure:
- Client ID: `portainer`
- Client Secret: `wTim3mrMwt34ko1RYMvK1RNnjwWOMi_d4r4cS6exr7DjozCrL5zKthHl-5KjargF`
- Authorization URL: `https://auth.kitestacks.com/application/o/authorize/`
- Access Token URL: `https://auth.kitestacks.com/application/o/token/`
- Resource/Userinfo URL: `https://auth.kitestacks.com/application/o/userinfo/`
- Redirect URL: `https://portainer.kitestacks.com`
- Logout URL: `https://auth.kitestacks.com/application/o/portainer/end-session/`
- Scopes: `openid email profile`, User identifier claim: `email`

AFTER both steps done: update the live portal's Portainer card (in the 3 files
above) from `data-coming-soon="1"` to a real
`href="https://portainer.kitestacks.com" target="_blank" rel="noopener"` link.

### App-level SSO status summary (end of 2026-06-10 session)
Grafana: working (pre-existing). Forgejo: working (pre-existing). Karakeep:
fixed this session, working. OpenProject: blocked by EE license (terminal at
app level). Portainer: Authentik side done, waiting on user's 2 manual steps
above. Prometheus + Uptime Kuma: DEFERRED - neither has native OAuth, need a
forward-auth proxy (oauth2-proxy or Authentik embedded outpost) - deferred per
user's "ok lets do smaller app level" (hold new infra until Oracle VPS decided).
Cloudflare itself: no SSO concept applicable (it's Cloudflare's own dashboard
managed outside the lab login) - was always about the portal's Cloudflare card
placement, see "Portal UI changes" note above.

### Uptime Kuma + Authentik SSO resumed on monk (2026-06-15)
User confirmed the next task is setting up Uptime Kuma with Authentik SSO in
the main KiteStacks lab, and explicitly requested saving progress to
`~/claude-memory` and pushing to the Forgejo `kenpat/claude-memory` repo as we
go.

Verified current live state on monk before making changes:
- `uptime-kuma` container is running and healthy, published on host port
`3001`, image `louislam/uptime-kuma:latest`.
- Installed Uptime Kuma version inside the container is `1.23.17`.
- Uptime Kuma compose file is
`~/kitestacks-live/docker/uptime-kuma/docker-compose.yml`, using external
Docker volume `uptime-kuma:/app/data` and networks `default` + external
`kitestacks`.
- Uptime Kuma SQLite DB path inside container is `/app/data/kuma.db`; tables
include `user`, `setting`, `monitor`, `heartbeat`, `status_page`,
`notification`, `api_key`, and related monitor/status tables. No obvious
native OAuth/OIDC tables were present in the initial schema list.
- Grafana is already configured for Authentik generic OAuth in
`~/kitestacks-live/docker/grafana/docker-compose.yml` with Authentik public
authorize URL and internal token/userinfo URLs.
- `authentik` is healthy; `authentik-worker` currently shows unhealthy in
`docker ps` even though it has been running for ~35h. Check logs/health
before relying on new Authentik-side automation.
- Existing Authentik objects were found for Uptime Kuma:
  - Application slug `uptime-kuma`, name `Uptime Kuma`, provider id `7`.
  - ProxyProvider `Uptime Kuma`, external host `https://status.kitestacks.com`,
    internal host `http://uptime-kuma:3001`, mode `proxy`.
  - Embedded proxy outpost already includes providers `Karakeep`,
    `Uptime Kuma`, and `LiteLLM`.
- `https://status.kitestacks.com` still routes directly to Kuma as of
2026-06-15: public curl gets Kuma's `/dashboard` redirect and 200 response,
not an Authentik authorization flow. Cloudflare tunnel route still needs to
be changed from direct Kuma to the Authentik embedded outpost/server.
- Security fix applied 2026-06-15: created PolicyBinding
`6f2ac876-2f47-473d-986d-d7c5d2a3214e` from the Uptime Kuma application to
Authentik group `homelab-admin`, enabled, order 0. This matches the Portainer
restriction pattern.
- Cloudflared is remote-managed: container command is `tunnel --no-autoupdate
|
|
run`, no local ingress config exists, and the compose file stores a
|
|
`TUNNEL_TOKEN`. Do not print that token; treat it as sensitive. Routing
|
|
changes must be made through Cloudflare's tunnel API/dashboard unless a
|
|
suitable Cloudflare API token is available locally.
|
|
- Local validation after the Authentik binding:
  `curl -I -H 'Host: status.kitestacks.com' http://localhost:9001` returns
  `302` to `https://status.kitestacks.com/outpost.goauthentik.io/start?...`,
  proving the embedded outpost/proxy provider works when traffic reaches
  Authentik.
- No suitable Cloudflare API token was found during the local search; only the
  cloudflared connector tunnel token is present. Remaining blocker is changing
  the Cloudflare Tunnel public hostname for `status.kitestacks.com` from
  `http://uptime-kuma:3001` to `http://authentik:9000` (or equivalent
  Authentik service target in the Tunnel UI).
- Correction after user tested: user does NOT want front-door proxy behavior
  for Uptime Kuma. Desired UX is an in-app "single sign on" button on the
  Uptime Kuma login screen, like Grafana/Forgejo style native OAuth. Authentik
  proxy redirect is not acceptable for this requirement.
- Confirmed in the installed Uptime Kuma 1.23.17 frontend:
  `/app/src/components/Login.vue` only renders username, password, remember-me,
  and login submit controls. No native OAuth/OIDC/SSO button exists in this
  version's login component, and local source search only found monitor OAuth
  client-credentials support, not app login SSO.
- If staying on Uptime Kuma 1.23.17, revert Cloudflare route for
  `status.kitestacks.com` back to `http://uptime-kuma:3001`; otherwise users
  get Authentik first and then still see Kuma's local login. Native in-app SSO
  would require an Uptime Kuma version/plugin/fork with login OIDC support or
  custom app code, not the Authentik proxy provider.
- User reset the Cloudflare route back to `http://uptime-kuma:3001` and asked
  to continue with an in-app Authentik button. Upstream latest checked via
  GitHub API: Uptime Kuma latest release is `2.4.0` (published 2026-05-31) and
  upstream `src/components/Login.vue` still has only username/password login,
  no native OAuth/OIDC button. Proceeded with a custom overlay patch.
- Custom native Authentik SSO overlay deployed on BOTH active tunnel backends
  (monk and kscloud1) so public load-balanced traffic behaves consistently:
  - monk path: `~/kitestacks-live/docker/uptime-kuma/`
  - kscloud1 path: `/opt/kitestacks/docker/uptime-kuma/`
  - backend preload module:
    `custom/server/authentik-sso.js`
  - frontend mounted files:
    `custom/dist/index.html`, `index.html.gz`, `index.html.br`
  - compose now sets `NODE_OPTIONS=--require /app/custom/server/authentik-sso.js`,
    loads `.env.sso`, and bind-mounts the custom files over Kuma's built HTML.
- Authentik native OAuth provider/application created:
  - OAuth2Provider name `Uptime Kuma Native`, provider id `12`
  - Application slug `uptime-kuma-native`, name `Uptime Kuma Native SSO`
  - Client ID `uptime-kuma-native`
  - Redirect URI `https://status.kitestacks.com/auth/authentik/callback`
  - Restricted to Authentik group `homelab-admin` via PolicyBinding
    `2e1eaa95-b397-4c4f-bfc7-abb337906cf3`
  - Client secret is stored only in each host's `.env.sso`; do not print it.
- Custom flow behavior:
  - Login page injects a `Sign in with Authentik` button linking to
    `/auth/authentik`.
  - Backend starts Authentik OIDC, validates callback state, fetches userinfo,
    maps the login to existing Kuma user `kenpat`, issues Kuma's normal JWT,
    then redirects to `/?authentik_token=<token>`.
  - Frontend one-time script stores the JWT in `localStorage.token`, removes
    the URL token, and redirects to `/dashboard`, letting Kuma's normal
    `loginByToken` flow establish the session.
- Verification 2026-06-15:
  - monk local `/dashboard` HTML contains `Sign in with Authentik`,
    `/auth/authentik`, and `authentik_token`.
  - kscloud1 local `/dashboard` HTML contains the same and `/auth/authentik`
    redirects to Authentik with client_id `uptime-kuma-native`.
  - Public repeated check:
    `for i in 1 2 3 4 5 6; do curl -sSL --compressed https://status.kitestacks.com/dashboard | grep -q "Sign in with Authentik" && echo button || echo missing; done`
    returned `button` for all 6 attempts, confirming both active connectors
    serve the button.
- Post-test screenshot showed Uptime Kuma login page with red banner "Lost
  connection to the socket server. Reconnecting..." after clicking the SSO
  button. Root cause: active-active JWT mismatch. Uptime Kuma JWTs include a
  signature using `setting.jwtSecret`; monk and kscloud1 had matching user
  password hashes but different JWT secrets, so a token minted by one backend
  failed if the browser's websocket connected to the other backend. Fixed
  2026-06-15 by copying monk's exact `jwtSecret` into kscloud1's
  `/app/data/kuma.db` using base64 transport (avoid shell expansion of secret
  chars), then restarting kscloud1 Uptime Kuma. Verified both hashes now match:
  `jwtSecret` length 60, sha3 prefix `FA67E6E9EDCC8E1D`. Public button check
  still returns `button` 6/6. If a browser still has a pre-fix bad token in
  localStorage, clear site data or click the Authentik button again to mint a
  fresh token.
- User retested and still saw the socket reconnect banner. Follow-up finding:
  public Uptime Kuma frontend was using Socket.IO's default long-polling-first
  transport. In the active-active Cloudflare Tunnel setup, polling requests can
  bounce between monk and kscloud1 before a socket session is established,
  causing reconnect loops before Kuma even logs `Login by token`.
- Fix applied 2026-06-15 on BOTH monk and kscloud1: copied the built frontend
  bundle `index-BBxTfFCS.js` into the overlay and patched the minified socket
  call from `Ze=Nc(n)` to `Ze=Nc(n,{transports:["websocket"]})`. Regenerated
  `.gz` and `.br` variants and mounted all three over
  `/app/dist/assets/index-BBxTfFCS.js*` in both compose files. Restarted both
  Uptime Kuma containers.
- Verification after websocket-only patch:
  - monk local asset contains `transports:["websocket"]`
  - kscloud1 local asset contains `transports:["websocket"]`
  - public repeated asset check over `https://status.kitestacks.com/assets/index-BBxTfFCS.js`
    found `transports:["websocket"]` 6/6, confirming both tunnel backends serve
    the patched client bundle.
- User still saw the same issue after trying another browser. Follow-up:
  websocket connections were reaching Kuma, but logs showed no `Login by token`,
  so the handoff from Authentik callback to Kuma storage was unreliable. Changed
  the SSO callback from `/?authentik_token=<jwt>` URL handoff to a short-lived
  readable cookie `uk_authentik_token` plus redirect directly to `/dashboard`.
  Updated injected HTML to read that cookie before Kuma initializes, store the
  token in `localStorage.token`, set `localStorage.remember=1`, then delete the
  cookie. This avoids long-token URL handling.
- Important operational gotcha: Uptime Kuma caches `index.html` in memory at
  startup. After changing the mounted `index.html`/compressed variants,
  `docker compose up -d` was not enough because containers stayed "Running";
  had to run `docker compose restart uptime-kuma` on BOTH monk and kscloud1 to
  reload the HTML into memory.
- Verification after cookie handoff + explicit restarts:
  - monk local `/dashboard` HTML contains `uk_authentik_token`, `authentik_token`,
    and `Sign in with Authentik`.
  - kscloud1 local `/dashboard` HTML contains the same.
  - public repeated check for `uk_authentik_token` over
    `https://status.kitestacks.com/dashboard` returned `cookie-handoff` 6/6.
- User confirmed after retest: Uptime Kuma Authentik SSO button works.
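
The active-active JWT mismatch recorded above can be reproduced in miniature. This is a minimal sketch, assuming Kuma's tokens are ordinary HS256 JWTs signed with `jwtSecret` (an assumption, not checked against Kuma's source); the secrets, payload, and helper names are made up for illustration. The `fingerprint` helper shows one way to compare secrets across hosts without printing them, assuming SHA3-256 as the "sha3" variant.

```python
import base64
import hashlib
import hmac
import json


def b64url(data: bytes) -> str:
    """Base64url-encode without padding, as compact JWTs do."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def sign_hs256(payload: dict, secret: str) -> str:
    """Mint a compact HS256 token: header.payload.signature."""
    header = {"alg": "HS256", "typ": "JWT"}
    parts = [
        b64url(json.dumps(part, separators=(",", ":")).encode())
        for part in (header, payload)
    ]
    signing_input = ".".join(parts)
    sig = hmac.new(secret.encode(), signing_input.encode(), hashlib.sha256).digest()
    return signing_input + "." + b64url(sig)


def verify_hs256(token: str, secret: str) -> bool:
    """Recompute the signature with this host's secret and compare."""
    signing_input, _, sig = token.rpartition(".")
    expected = hmac.new(secret.encode(), signing_input.encode(), hashlib.sha256).digest()
    return hmac.compare_digest(b64url(expected), sig)


def fingerprint(secret: str) -> str:
    """Short SHA3-256 prefix for comparing secrets without revealing them."""
    return hashlib.sha3_256(secret.encode()).hexdigest()[:16].upper()


if __name__ == "__main__":
    monk_secret = "example-secret-on-monk"        # hypothetical value
    cloud_secret = "example-secret-on-kscloud1"   # hypothetical value
    token = sign_hs256({"username": "kenpat"}, monk_secret)
    print("verified on monk:    ", verify_hs256(token, monk_secret))
    print("verified on kscloud1:", verify_hs256(token, cloud_secret))
    print("fingerprints match:  ", fingerprint(monk_secret) == fingerprint(cloud_secret))
```

A token minted with one secret only verifies on a backend holding the same secret, which is why copying `jwtSecret` between hosts resolved the first reconnect symptom.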

### Uptime Kuma monitors mirrored into Prometheus/Grafana (2026-06-15)
User asked to set up the same monitors currently in Uptime Kuma for Grafana and
Prometheus. Existing Uptime Kuma monitor list at the time:
- `T14 Deb Assassin`: ping `127.0.0.1`
- `HomeRouter`: ping `192.168.1.254`
- `Google DNS`: ping `8.8.8.8`
- `TailScale`: ping `100.90.13.55`

Implemented on monk's live Prometheus/Grafana stack:
- Added `prom/blackbox-exporter` service to
  `~/kitestacks-live/docker/prometheus/docker-compose.yml`.
- Added blackbox config
  `~/kitestacks-live/docker/prometheus/blackbox.yml` with ICMP module
  (`preferred_ip_protocol: ip4`, timeout 5s).
- Added Prometheus scrape job `uptime-kuma-ping-probes` in
  `~/kitestacks-live/docker/prometheus/prometheus.yml`, using `/probe` with
  `module=icmp` and labels `monitor_name` matching the Uptime Kuma names.
- Added Grafana provisioned dashboard
  `~/kitestacks-live/docker/grafana/provisioning/dashboards/kitestacks-uptime-probes.json`
  titled `KiteStacks Uptime Probes`, with stat/timeseries panels for
  `probe_success{job="uptime-kuma-ping-probes"}` and
  `probe_duration_seconds{job="uptime-kuma-ping-probes"}`.
- Ran `docker compose up -d` in the Prometheus directory, pulled/started
  `blackbox-exporter`, restarted Prometheus, and restarted Grafana.

Verification:
- Prometheus config validates with `promtool check config`.
- Prometheus active targets include all four `uptime-kuma-ping-probes`.
- Query result for `probe_success{job="uptime-kuma-ping-probes"}`:
  `Google DNS=1`, `T14 Deb Assassin=1`, `HomeRouter=0`, `TailScale=0`.
  The two failures match Kuma's existing failing ping behavior from inside the
  container/network namespace.
- Grafana logs show dashboard provisioning completed without dashboard errors
  (only unrelated bundled plugin permission warnings).
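
The probe job described above follows the usual blackbox-exporter relabeling pattern, and it can be sketched as a small generator. This is a minimal sketch, assuming the exporter is reachable from Prometheus at `blackbox-exporter:9115` (an assumed service name plus the exporter's default port) and one static config per monitor; the live `prometheus.yml` may be laid out differently. The script prints JSON, which is also valid YAML.

```python
import json

# Monitor names and targets as listed in the Uptime Kuma monitor list.
MONITORS = {
    "T14 Deb Assassin": "127.0.0.1",
    "HomeRouter": "192.168.1.254",
    "Google DNS": "8.8.8.8",
    "TailScale": "100.90.13.55",
}

# Assumption: compose service name and default blackbox-exporter port.
BLACKBOX_ADDR = "blackbox-exporter:9115"


def build_scrape_job(monitors, exporter):
    """Build one scrape job that probes every monitor through the exporter."""
    return {
        "job_name": "uptime-kuma-ping-probes",
        "metrics_path": "/probe",
        "params": {"module": ["icmp"]},
        "static_configs": [
            {"targets": [ip], "labels": {"monitor_name": name}}
            for name, ip in monitors.items()
        ],
        "relabel_configs": [
            # Pass the listed target to the exporter as ?target=...
            {"source_labels": ["__address__"], "target_label": "__param_target"},
            # Keep the probed address visible as the instance label.
            {"source_labels": ["__param_target"], "target_label": "instance"},
            # Actually scrape the exporter, not the probed host.
            {"target_label": "__address__", "replacement": exporter},
        ],
    }


if __name__ == "__main__":
    print(json.dumps({"scrape_configs": [build_scrape_job(MONITORS, BLACKBOX_ADDR)]}, indent=2))
```

Keeping the name-to-target map in one place makes it easier to keep `monitor_name` labels aligned with the Kuma names when monitors change.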

### Desktop widget for the same monitor set (2026-06-15)
User asked for a Rainmeter-like desktop widget on Debian 13 that can show the
same Uptime Kuma monitor state in real time.

Created a local Conky-based widget scaffold in the desktop user's home:
- `~/.local/bin/kitestacks-uptime-widget.sh`
- `~/.config/conky/kitestacks-uptime.conf`

Behavior:
- Polls Prometheus for `probe_success` and `probe_duration_seconds` from the
  `uptime-kuma-ping-probes` job.
- Defaults to `http://192.168.1.205:9090`, with `PROM_URL` override support.
- Prints the four Kuma monitor names, state, latency, and a summary line.
- Degrades cleanly with `Prometheus unavailable at ...` when the endpoint
  cannot be reached.
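
The behavior listed above can be sketched in a few functions. The real helper is a shell script whose contents are not recorded here, so this Python version is only an illustration of the same logic: it uses the Prometheus HTTP API's instant-query endpoint (`/api/v1/query`), and the function names and output layout are hypothetical.

```python
import json
import urllib.parse
import urllib.request

JOB = 'job="uptime-kuma-ping-probes"'


def query(prom_url, expr, timeout=5):
    """Run an instant query against the Prometheus HTTP API."""
    qs = urllib.parse.urlencode({"query": expr})
    url = f"{prom_url.rstrip('/')}/api/v1/query?{qs}"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def by_monitor(payload):
    """Map monitor_name -> sample value from an instant-vector response."""
    return {
        sample["metric"].get("monitor_name", "?"): float(sample["value"][1])
        for sample in payload["data"]["result"]
    }


def render(success, duration):
    """Format one row per monitor plus a summary line."""
    lines = []
    for name in sorted(success):
        state = "UP" if success[name] == 1 else "DOWN"
        millis = duration.get(name, 0.0) * 1000
        lines.append(f"{name:<18} {state:<4} {millis:7.1f} ms")
    up = sum(1 for value in success.values() if value == 1)
    lines.append(f"{up}/{len(success)} monitors up")
    return "\n".join(lines)


def widget_text(prom_url, run_query=query):
    """Return the widget body, or a fallback line if Prometheus is unreachable."""
    try:
        success = by_monitor(run_query(prom_url, f"probe_success{{{JOB}}}"))
        duration = by_monitor(run_query(prom_url, f"probe_duration_seconds{{{JOB}}}"))
    except (OSError, ValueError, KeyError):
        return f"Prometheus unavailable at {prom_url}"
    return render(success, duration)
```

Calling `widget_text(os.environ.get("PROM_URL", "http://192.168.1.205:9090"))` would mirror the default-with-override behavior; `run_query` is injectable so the formatting can be exercised without a live Prometheus.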

Note: Conky is the closest direct Rainmeter-style equivalent for Debian/Linux
desktop widgets; `eww` is the more modern alternative if the desktop session is
Wayland-first and the user prefers GTK/Rust widgets instead of a classic
desktop overlay.

Debian 13 package note:
- `conky` is a virtual package in trixie.
- Install `conky-all` for the full desktop widget experience:
  `sudo apt update && sudo apt install conky-all`

Connectivity note:
- The laptop could not reach Prometheus at `192.168.1.205:9090`, which means
  the widget can only work from a host that can reach the homelab LAN or a
  public/tunneled Prometheus endpoint.
- The existing KiteStacks docs mark Prometheus as excluded from the Cloudflare
  tunnel, so there is no known public Prometheus URL to target yet.
- The desktop widget script now defaults to `https://prometheus.kitestacks.com`
  and can send `CF-Access-Client-Id` / `CF-Access-Client-Secret` headers if the
  hostname is protected by Cloudflare Access.

Cyberpunk widget styling:
- Conky panel tuned to the wallpaper palette with black base and neon
  cyan/magenta accents.
- Header uses `#ff4df0` pink and `#2de0ff` blue.
- Monitor rows color-code `UP` as cyan and `DOWN` as pink for fast scanning.
- `conky.text` now uses `execpi` so the helper's parsed color markup renders as
  one combined widget instead of only the title line.
- The screenshot also showed a separate default Conky panel on the left; that
  is not part of the uptime widget itself.
- Added a unified Conky desktop config at `~/.conkyrc` plus an autostart
  wrapper that kills stray Conky instances and launches the single combined
  panel.

Important security hygiene: local git remote for `~/claude-memory` contains an
HTTP token in the URL; do not print it in summaries. Prefer redacted URLs in
handoffs.

### Oracle VPS migration - PLANNED, upcoming (stated 2026-06-11)
User confirmed on 2026-06-11: "we are going to switch things soon from hetzner
cloud to oracle soon." -> kscloud1 (Hetzner, 5.78.233.28) is intended to be
REPLACED by an Oracle Cloud VPS in the near future ("soon", no firm date yet).
Originally raised 2026-06-10 as exploratory ("how easy would it be to move
everything to oracle vps after?"), now an actual plan.
Implication: avoid investing further one-off/manual config work that's hard to
redo (e.g. more one-time DB syncs, hand-edited sqlite, etc.) on kscloud1 if
avoidable - prefer changes that are easy to replicate on a new host. When the
Oracle VPS is provisioned, plan to follow the same pattern as the kscloud1
cloud-failover build-out (new Cloudflare Tunnel connector + full service
replicas + shared Authentik/Postgres/Redis over Tailscale + the Forgejo
FORGEJO_API_BASE-over-Tailscale pattern for the portal's Recent Activity, see
"Recent Activity" fix below) - then retire kscloud1 the same way assassin/T14
was retired (decommission once Oracle replica verified working).

## Prior migration gotchas (monk, kept for reference - see git history/old notes if needed)
- rsync `--files-from` recursion bug.
- Bind-mount postgres dirs come over empty as non-root (use
  pg_dumpall/pg_dump --clean from running container instead).
- pg_dumpall --clean across template1 breaks on client/server version mismatch
  (use single-db pg_dump+psql instead).
- grafana data dir needs chown 472:472.
- kite-litellm needed manual `docker network connect kitestacks kite-litellm`.

## 2026-06-12: SSO fixes + Portainer deployed on kscloud1

### Root cause: monk reconnect race condition
When monk goes offline (user travels) and reconnects, Cloudflare starts routing
some token exchange requests to monk for codes that were created on kscloud1
during the offline window. Auth codes had a 60-second TTL, which expired before
monk's Authentik fully started (~5 min startup). FIX: increased
`access_code_validity` from `minutes=1` to `minutes=10` for ALL 9 OAuth2
providers in the shared Postgres DB. This gives enough buffer for monk's
containers to start before codes expire.
Command used (via python:3-alpine container):
`docker run --rm --network host -v /tmp/fix_auth.py:/fix.py python:3-alpine sh -c ...`
connecting to shared Postgres at 100.123.254.52.
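
The arithmetic behind that change is simple enough to check in code. This is a minimal sketch, assuming validity strings use the `unit=value` form quoted above (the semicolon-separated multi-unit form handled here is an assumption) and taking the roughly five-minute startup figure from the note; the function names are hypothetical and this is not the contents of `fix_auth.py`.

```python
from datetime import timedelta


def parse_validity(text):
    """Parse 'minutes=10' (or 'hours=1;minutes=30') into a timedelta."""
    parts = dict(item.split("=", 1) for item in text.split(";") if item)
    return timedelta(**{unit: float(value) for unit, value in parts.items()})


def survives_startup(validity, startup):
    """True if a code issued at reconnect time outlives the startup window."""
    return parse_validity(validity) > startup


# Approximate Authentik startup time on monk, per the note above.
STARTUP = timedelta(minutes=5)

if __name__ == "__main__":
    for setting in ("minutes=1", "minutes=10"):
        print(setting, "covers startup:", survives_startup(setting, STARTUP))
```

With these inputs the one-minute setting fails the check and the ten-minute setting passes, leaving about five minutes of slack.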

### Karakeep redirect_uri reverted and re-fixed
The Karakeep OAuth2Provider `_redirect_uris` had reverted to the proxy pattern
(`/outpost.goauthentik.io/callback?...`) instead of the correct NextAuth callback
(`https://links.kitestacks.com/api/auth/callback/custom`). This caused "Redirect URI
Error" from Authentik whenever SSO was attempted. Root cause unknown (possibly an
Authentik blueprint or UI save that regenerated/overrode the field). FIX: same
Postgres UPDATE pattern. WATCH: if this reverts again, check Authentik blueprints
and whether someone modified the Karakeep provider via the Authentik admin UI.

### Portainer deployed on kscloud1
Created `/opt/kitestacks/docker/portainer/docker-compose.yml` (same image/config as
monk's portainer). Container running as `portainer`, port 9443:9443, on `kitestacks`
network. Volume is local (NOT shared with monk - fresh Portainer instance).

STILL PENDING (user action in Cloudflare dashboard):
- Tunnel ID: 5e60ea8e-a543-49b6-bab5-325f39441e00, Account: d0bb7673333fcd794622956f1662f785
- Add hostname `portainer.kitestacks.com` → service `https://portainer:9443`, No TLS Verify

STILL PENDING (user action in both Portainer UIs): configure OAuth (see prior notes
in "Portainer SSO" section above for exact credentials).
Portal card update (3 files) also still pending until tunnel+OAuth done.

## Phase 2 Planned: Obsidian Mind Map → HTML Mind Map Sync
User wants to create an Obsidian mind map of the KiteStacks homelab that syncs/exports to a live HTML mind map embedded in the homelab portal or a standalone page. To be built after full Obsidian+samurai setup is complete.

## 2026-06-13: OpenProject removed + Oracle VPS migration started

### OpenProject REMOVED permanently
OpenProject requires an Enterprise Edition license for SSO (confirmed last session).
Removed from local stack (monk):
- Docker volume `openproject_openproject_assets` deleted
- `/home/kenpatmonk/kitestacks-live/docker/openproject/` directory removed (pgdata dir
  needed sudo - user ran manually; pgdata was owned by container UID mapped to `avahi`)
- NOT deploying on Oracle VPS
- tasks.kitestacks.com subdomain is now dead - update Cloudflare/portal accordingly

TODO: remove `apps/openproject/` from kitestacks-homelab Forgejo repo once user can log in.

### Forgejo issues found + partially fixed (2026-06-13)
Forgejo login page has two issues:
1. URL banner: "configured to be served on http://5.78.233.28:3000/" - caused by kscloud1's
   Forgejo having wrong ROOT_URL. kscloud1 Forgejo has only 1 repo (separate DB from monk's
   13-repo instance). Cloudflare tunnel load-balances between monk and kscloud1 Forgejo.
   FIX PENDING: stop Forgejo on kscloud1 (or fix its ROOT_URL). Deferred - do during Oracle migration.
2. SSO button says "Proceed with OpenID" instead of "Authentik".
   PARTIAL FIX: renamed login_source from `authentik` → `Authentik` via admin CLI:
   `docker exec -u git forgejo /app/gitea/gitea admin auth update-oauth --id 1 --name Authentik ...`
   Provider type remains `openidConnect` - button text may still say "OpenID" (depends on
   Forgejo 11 template behavior). User to verify after refresh. Full fix may require admin UI
   once user can log into Forgejo.

Forgejo DB: 13 repos under `kenpat`, 1 user (kenpat, admin, active, no 2FA).
Forgejo login: username `kenpat`, direct password login works on the same page.

### kitestacks-homelab repo: apps/forgejo/docker-compose.yml has wrong ROOT_URL
`FORGEJO__server__ROOT_URL=http://192.168.1.205:3006` - old local IP, never updated.
The LIVE local stack (`~/kitestacks-live/docker/forgejo/docker-compose.yml`) is correct
(`https://gitforge.kitestacks.com/`). The repo copy needs updating.
TODO: fix and commit once user can log in and clone the repo.

### Oracle VPS migration plan (kscloud1 → Oracle Cloud)
Goal: replace Hetzner kscloud1 (5.78.233.28, $14.50/mo) with Oracle Cloud ARM VPS ($8.50/mo).
Oracle instance: Ampere A1 Flex, 4 OCPU / 24 GB RAM, Chicago region (us-chicago-1).
Status as of 2026-06-13: user is provisioning - hit "no capacity" in Chicago.
No capacity is available for the 4 OCPU config. Options:
- Try smaller shape (1 OCPU / 6 GB), resize after provisioning
- Subscribe to another region (Frankfurt, Osaka, Toronto have better A1 availability)
- Keep retrying (capacity opens randomly, early UTC morning tends to be better)

ARM64 compatibility analysis (all images verified except Shaarli):
- ✅ All services ARM64-compatible EXCEPT osTicket
- ❌ osTicket (`campbellsoftwaresolutions/osticket`) - x86 only
  FIX: enable QEMU binfmt emulation on Oracle ARM host, run with `--platform linux/amd64`
  Performance acceptable for a ticket system.
- ⚠️ Shaarli - verify ARM64 at deploy time
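
For images still to be checked at deploy time, the platform list can be read from the JSON that `docker manifest inspect <image>` prints for multi-arch images. This is a minimal sketch; the sample document is illustrative only and is not real output for any image named above, and the helper names are hypothetical.

```python
import json


def platforms(manifest_json):
    """Return the set of os/architecture pairs listed in a manifest list."""
    doc = json.loads(manifest_json)
    return {
        f'{entry["platform"]["os"]}/{entry["platform"]["architecture"]}'
        for entry in doc.get("manifests", [])
        if "platform" in entry
    }


def supports_arm64(manifest_json):
    """True if the manifest list advertises a linux/arm64 variant."""
    return "linux/arm64" in platforms(manifest_json)


# Illustrative sample, shaped like a Docker manifest list.
SAMPLE = json.dumps({
    "schemaVersion": 2,
    "manifests": [
        {"platform": {"architecture": "amd64", "os": "linux"}},
        {"platform": {"architecture": "arm64", "os": "linux"}},
    ],
})

if __name__ == "__main__":
    print(sorted(platforms(SAMPLE)))
    print("arm64 supported:", supports_arm64(SAMPLE))
```

A single-architecture image returns a plain manifest with no `manifests` array, so an empty result here means "not a multi-arch list" rather than a definite "no ARM64".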

Services to deploy on Oracle VPS (OpenProject EXCLUDED):
authentik, bookstack, cloudflared, forgejo, grafana, homepage/portal,
karakeep (+meilisearch +chrome), kavita, kite-ai (litellm+openwebui),
linkding, osticket, portainer, prometheus+node-exporter, shaarli, uptime-kuma

Migration phases:
1. Oracle VPS provisioning (in progress)
2. Oracle initial setup: Ubuntu 22.04 ARM64, Docker, iptables flush (Oracle blocks by default),
   QEMU binfmt for osTicket x86 emulation
3. Deploy full stack - fix Forgejo ROOT_URL correctly from day one
4. Connect cloudflared on Oracle to KiteStacks tunnel (same TUNNEL_TOKEN)
5. Verify all services, then remove kscloud1 from tunnel + cancel Hetzner

NOTE: same active-active pattern as kscloud1 - shared Authentik Postgres+Redis over
Tailscale, same TUNNEL_TOKEN, fresh DBs for stateful apps except identity (authentik/kavita).

IMPORTANT Oracle gotcha: Ubuntu on Oracle ships host iptables rules that block inbound
traffic (other than SSH) from boot, even after Security List rules are opened. Flush or
amend iptables as part of initial setup.

## osTicket deployed on monk + kscloud1 (found 2026-06-13/14, installed ~2026-06-12)
osTicket (campbellsoftwaresolutions/osticket image, x86 - runs natively on both hosts,
no QEMU needed) + nginx proxy + MariaDB 10.11, under
`~/kitestacks-live/docker/osticket/` (monk) and `/opt/kitestacks/docker/osticket/`
(kscloud1). `tasks.kitestacks.com` -> "KiteStacks Help Desk", verified HTTP 200.
Admin: kenpat7177 / kenpat7177@gmail.com. Host ports: monk 8092:8080, kscloud1 8090:8080
(both nginx -> osticket-app:80). .env (OSTICKET_DB_PASS/ROOT/ADMIN_PASS/INSTALL_SECRET)
is IDENTICAL on both hosts.

### DB unification (2026-06-13/14) - same pattern as Authentik shared-DB fix
Both hosts originally had their OWN osticket-db (drift risk like pre-fix Kavita). Per
user request ("database should be accessible from any computer"), unified onto
kscloud1's osticket-db as canonical:
- kscloud1 osticket-db: added `ports: - "100.123.254.52:3306:3306"` (Tailscale-only,
  matches authentik-postgres/redis pattern) to
  `/opt/kitestacks/docker/osticket/docker-compose.yml`, `docker compose up -d`.
- monk: `docker compose stop osticket-db` (left stopped, NOT removed - rollback data
  intact in its volume). Edited `~/kitestacks-live/docker/osticket/docker-compose.yml`:
  removed osticket-db service block, changed osticket-app's `MYSQL_HOST=osticket-db`
  -> `MYSQL_HOST=100.123.254.52`, removed `depends_on: osticket-db`.
  `docker compose up -d osticket-app`.
- GOTCHA: after recreating osticket-app, the `osticket` nginx proxy container on monk
  returned 502 (cached stale upstream IP for osticket-app from its old container) -
  fixed with `docker restart osticket`. Apply this same restart on kscloud1's `osticket`
  nginx if its osticket-app is ever recreated.
- Verified: both DBs had identical data before merge (1 ticket, 1 staff/kenpat7177) so
  no data loss either way. tasks.kitestacks.com returns 200 consistently post-merge.
- Backups: `docker-compose.yml.bak` left in both hosts' osticket dirs.
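
Before repointing another host's `MYSQL_HOST` at the shared database, it can help to confirm the Tailscale-bound port is reachable from that host. This is a minimal sketch of such a check; the helper name is hypothetical, and it only tests TCP reachability, not credentials or schema.

```python
import socket


def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable hosts, and timeouts.
        return False
```

For the setup above the call would be `port_open("100.123.254.52", 3306)`, run from the host whose osticket-app is about to be switched over.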

### osticket-capstone Forgejo repo (created 2026-06-13/14)
New private repo `kenpat/osticket-capstone` on gitforge (created via API using a
scoped token `claude-capstone-osticket` generated via
`docker exec -u git forgejo /app/gitea/gitea admin user generate-access-token` on
monk's forgejo container - token has write:repository,write:user scopes). Holds
redacted osTicket deployment config + Per Scholas capstone docs/evidence - see
[[project-per-scholas-capstone]]. NOTE: gitforge.kitestacks.com is also
active-active load-balanced (monk/kscloud1 separate forgejo DBs) - API calls
against the public hostname can hit the wrong DB; use monk's local
`http://localhost:3006` for API operations tied to monk's forgejo data.
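
The advice to target monk's local endpoint can be made concrete with a small request builder. This is a minimal sketch, assuming the Gitea-compatible `/api/v1` routes and `Authorization: token ...` header that Forgejo exposes; the helper name and the `FORGEJO_TOKEN` environment variable mentioned below are hypothetical, and no token value appears here.

```python
import urllib.request

# Local base URL from the note above, so requests cannot be load-balanced
# to the other host's Forgejo database.
LOCAL_API = "http://localhost:3006/api/v1"


def repo_request(owner, repo, token, base=LOCAL_API):
    """Build (but do not send) a GET request for a repository's metadata."""
    req = urllib.request.Request(f"{base}/repos/{owner}/{repo}")
    req.add_header("Authorization", f"token {token}")
    req.add_header("Accept", "application/json")
    return req
```

Sending it would look like `urllib.request.urlopen(repo_request("kenpat", "osticket-capstone", os.environ["FORGEJO_TOKEN"]))`, run on monk itself.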

### Remaining osTicket work
- Authentik SSO plugin for osTicket staff/agent login (osTicket has no native OIDC,
  needs 3rd-party OAuth2/SAML plugin) - NOT YET DONE.
- End-user ticket submission uses osTicket's native client portal signup (works
  out of the box, no SSO needed).

## 2026-06-14/15: Forgejo sync fixed + osTicket Authentik LDAP SSO complete

### Forgejo sync (monk → kscloud1) - FIXED
- Ran `docker exec -u git forgejo /app/gitea/gitea dump` on monk, scp'd to kscloud1
- Restored: 13 repos + DB synced, ROOT_URL fixed on kscloud1 to `https://gitforge.kitestacks.com/`
- kscloud1 Forgejo docker-compose updated (correct ROOT_URL + SSH port 2222)
- Sync script: `~/kitestacks-live/docker/forgejo/sync-to-cloud.sh` (rsync repos + DB dump)
- Cron: `0 */6 * * *` runs sync-to-cloud.sh, logs to `/tmp/forgejo-sync.log`
- Authentik redirect URI fixed: updated `_redirect_uris` in shared Postgres from
  `authentik/callback` → `Authentik/callback` (matched renamed Forgejo source name)

### osTicket Authentik LDAP SSO - COMPLETE (2026-06-14/15)
Uses Authentik's LDAP outpost + osTicket's built-in auth-ldap.phar plugin.

**Authentik side:**
- LDAPProvider "osTicket LDAP" (pk=11, base_dn=DC=ldap,DC=goauthentik,DC=io)
- Application "osTicket LDAP" (slug=osticket-ldap, backchannel provider)
- Outpost "osTicket LDAP Outpost" (pk=5c42f5ba-64bd-434e-a47f-7ce9da13227a)
- Outpost service token: `jjYRKWuGtoeq9r0qeifbCnXGHDjhCJU2MLnkCvMMduIGA1kQKz85qnt7u5Zf`
- ldap-svc user (search account): DN=`cn=ldap-svc,ou=users,dc=ldap,dc=goauthentik,dc=io`
password=`IlgQaxBPv9rdoq03CsoY53tH`, member of homelab-admin group

**Docker services added on monk:**
- `~/kitestacks-live/docker/authentik-ldap/docker-compose.yml`
- `authentik-ldap` (ghcr.io/goauthentik/ldap:2025.2.4) on kitestacks+osticket_default networks
- `authentik-ldap-proxy` (alpine/socat) bridges port 389→3389 on osticket_default
  so osticket-app can reach standard LDAP port without phar URI workaround

**Docker services added on kscloud1:**
- `/opt/kitestacks/docker/authentik-ldap/docker-compose.yml`
- Same authentik-ldap container, bound to 100.123.254.52:3389 (Tailscale) + 127.0.0.1:3389

**auth-ldap.phar patches (3 patches applied, original backed up as auth-ldap.phar.orig):**
1. `authentication.php` - `getConnection()`: adds binddn/bindpw from plugin config to
   Net_LDAP2 params so initial connect uses credentials (not anonymous, which Authentik rejects)
2. `config.php` - validation block: sets include_path to phar's include dir before
   `require_once Net/LDAP2.php` so sub-files resolve correctly in FPM context
3. ALL `include/Net/LDAP2/*.php` files: guards `require_once 'PEAR.php'` with
   `if (!class_exists('PEAR', false))` to prevent fatal conflict between osTicket's
   `/include/pear/PEAR.php` and PHP global `/usr/local/lib/php/PEAR.php`

**osTicket LDAP plugin config (namespace plugin.2 in ost_config):**
- servers: `authentik-ldap-proxy` (via socat on port 389)
- bind_dn: `cn=ldap-svc,ou=users,dc=ldap,dc=goauthentik,dc=io`
- bind_pw: encrypted with `Crypto::encrypt(pass, SECRET_SALT, 'plugin.2')`
- search_base: `ou=users,dc=ldap,dc=goauthentik,dc=io`
- schema: auto, auth-staff: 1, auth-client: 0, domain: ldap.goauthentik.io

**Staff login:** username=`kenpat7177`, password=Authentik password (reset to `KiteStacks2026!`)
on `tasks.kitestacks.com/scp/login.php`

### Per Scholas IT Support Capstone - IN PROGRESS
See [[project-per-scholas-capstone]]. Next steps:
- Create capstone incident tickets in osTicket (5-phase challenge)
- Set up osTicket user/client portal for non-staff users (Phase 3 end-user access)
- Each capstone ticket maps to a phase scenario (migration event, incident response, etc.)