2026-06-12: full KiteStacks session sync
- KiteStacks migration memory updated: osTicket live, Portainer SSO live on both monk+kscloud1, portainer.kitestacks.com HTTP 200, CF noTLSVerify fixed via API, auth code TTL bumped 1->10min, Karakeep redirect_uri fixed
- Oracle Cloud ARM migration next: user provisioning manually (Ampere A1, 4 OCPU, 24GB RAM). osTicket x86-only issue to solve on Oracle side.
- CF API token kitestacks-dns-fix needs rolling (was exposed in chat)
- Portainer admin creds: monk=admin/n1t1MvVHCdcXWIIu, kscloud1=kenpat7177/same
- Added: feedback-forgejo-redaction, project-a-plus-core2 memories
Commit fcd8def71a (parent 00145b1657)
4 changed files with 511 additions and 5 deletions
project-kitestacks-migration.md (new file, 441 lines)
---
name: project-kitestacks-migration
description: "Migration of the live KiteStacks homelab/website from assassin (T14) to monk - COMPLETE. Plus full Hetzner cloud failover (kscloud1, 5.78.233.28) - COMPLETE. All 9 subdomains can run from any single host. Plus 2026-06-10 portal/SSO push: portal FluxCD+coming-soon changes deployed, Karakeep SSO fixed, OpenProject SSO blocked by EE license, Portainer SSO Authentik-side done (pending user manual steps)."
metadata:
  node_type: memory
  type: project
  originSessionId: 33992890-3940-4d4a-a94a-22b5621e9c1a
---

## STATUS: MIGRATION + CLOUD FAILOVER COMPLETE (2026-06-10)

monk is the live production host. assassin (T14) is OFF. kscloud1 (Hetzner VPS,
5.78.233.28) is now a THIRD active Cloudflare Tunnel connector and runs a FULL
replica of all 9 services, so the site stays up even if both monk and assassin
are off (verified by user testing with home wifi off, from phone + mom's phone).

All 9 public subdomains (www, ai, auth, gitforge, grafana, kavita, links, status, tasks)
verified returning correct status codes via the live tunnel with kscloud1 in rotation.

## Governing principle (user's explicit words)

"leave the cloud backup on at all times" / "thats the point of it. if I am
travelling my site will go down otherwise." -> kscloud1 runs PERMANENTLY as a
3rd connector, NOT cold standby. Cloudflare Tunnel load-balances ACTIVE-ACTIVE
across all live connectors (3 configured; assassin is currently off), with no
primary/backup priority. This means stateful apps
(gitforge, openproject, authentik, karakeep, kavita, openwebui, etc.) may show
DIFFERENT/STALE data depending on which connector serves a given request -
EXPLICITLY ACCEPTED by user as the cost of guaranteed uptime. Fresh/separate
databases on kscloud1 are fine; do not try to sync data between monk and kscloud1.
EXCEPTION: identity/SSO apps (authentik, kavita logins), where fresh DBs broke
login - see the Authentik/Kavita sections below.

## kscloud1 access
SSH: `ssh -i ~/.ssh/id_ed25519_kscloud1 kenpat@5.78.233.28` (passwordless, key auth).
sudo needs a password ("p12217177") and has no askpass helper - avoid sudo;
most things are doable as kenpat or via docker (if sudo is unavoidable,
`echo PASS | sudo -S <cmd>` works - see the ufw section below).
All services live under `/opt/kitestacks/docker/<service>/docker-compose.yml`,
following the same one-dir-per-app pattern as monk's `~/kitestacks-live/docker/`.

## kscloud1 services deployed (all `docker compose up -d`, joined to local `kitestacks` network)

State at initial build-out; the authentik-related and kavita entries were later
superseded (see the Authentik/Kavita sections below).

- cloudflared (3rd tunnel connector, same TUNNEL_TOKEN, connector id 78521d9f-71c0-4e3d-992f-bd1f77da1a8f)
- homepage-backup (alias `homepage`) + caddy + kitestacks-metrics-api-backup - PRE-EXISTING
- forgejo (alias `forgejo`) - PRE-EXISTING, separate DB from monk's (gitforge data inconsistent across connectors, accepted)
- prometheus + node-exporter (job `kscloud1-node`)
- grafana (alias `grafana`, port 3150) - Prometheus datasource (uid 000000001) + "Node Exporter Full"
dashboard (id 1860) provisioned via `./provisioning/`. OAuth->authentik config present but
authentik on kscloud1 has a FRESH db (no provider apps configured) so OAuth login won't work
there; local admin login works.
- uptime-kuma (alias `status`->`uptime-kuma`) - kuma.db seeded by copying monk's admin user
(same login: kenpat / same password hash) + monitors: kscloud1 self-ping, Google DNS, and
HTTP checks for all 9 `*.kitestacks.com` subdomains (external monitoring of the live site).
- kavita (alias `kavita`) - empty library (fresh)
- karakeep + karakeep-chrome + karakeep-meilisearch (alias `karakeep`) - fresh meilisearch/db
- authentik + authentik-worker + authentik-postgres + authentik-redis (alias on `auth`) - FRESH DB.
Bootstrap admin: `akadmin@kitestacks.com` / password `6KlYpfCyYxbnKQNiOewN` (set via
AUTHENTIK_BOOTSTRAP_PASSWORD in .env). No OAuth provider apps exist yet (they would need to be
manually recreated in the authentik UI for grafana/openwebui/karakeep/openproject SSO to work
when kscloud1 is the active backend).
- kite-litellm + kite-openwebui (alias `ai`->openwebui) - same .env/secrets as monk. OpenWebUI
has `ENABLE_SIGNUP=true` (changed from monk's `false`) so kenpat can create a local admin
account on first use, since authentik OAuth won't work with kscloud1's fresh authentik.
- openproject (alias on `tasks`, host port mapping 8090:80 because host port 80 is taken by caddy) - FRESH db,
self-initialized via the all-in-one image's bootstrap (took ~3-4 min). Empty/no projects yet.

## monk-side changes made for cross-host monitoring

- `~/kitestacks-live/docker/prometheus/prometheus.yml`: added scrape job
`kscloud1-node` -> `5.78.233.28:9100` (kscloud1's node-exporter is exposed on
0.0.0.0:9100 with no firewall in front, so monk can reach it over the public internet). monk's grafana
(the live one, "Node Exporter Full" dashboard now provisioned via
`~/kitestacks-live/docker/grafana/provisioning/`) shows BOTH `t14-node`
(monk/"this pc") and `kscloud1-node` ("the cloud") via the instance picker.
- kscloud1's prometheus only scrapes itself (`kscloud1-node`) - monk is behind
home NAT, not reachable from kscloud1.
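
To check from monk that both jobs are actually being scraped, instead of eyeballing the Grafana instance picker, here is a minimal Python sketch. It runs an instant query for the `up` metric against the Prometheus HTTP API endpoint `/api/v1/query`. Assumptions: monk's Prometheus answers at `http://localhost:9090` (hypothetical; adjust to the real published port), and `t14-node`/`kscloud1-node` are `job` label values.

```python
"""Check that expected Prometheus scrape jobs report up == 1 (sketch)."""
import json
import urllib.parse
import urllib.request

PROM_URL = "http://localhost:9090"  # hypothetical: adjust to monk's Prometheus address
EXPECTED_JOBS = {"t14-node", "kscloud1-node"}  # assumed to be `job` label values


def fetch_up(base_url: str = PROM_URL) -> dict:
    """Run the instant query `up` via the Prometheus HTTP API."""
    query = urllib.parse.urlencode({"query": "up"})
    url = f"{base_url}/api/v1/query?{query}"
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)


def job_status(payload: dict) -> dict:
    """Map job label -> True only if every target in that job has up == 1."""
    status = {}
    for series in payload["data"]["result"]:
        job = series["metric"].get("job", "")
        is_up = series["value"][1] == "1"  # value is [unix_ts, "sample as string"]
        status[job] = status.get(job, True) and is_up
    return status


def missing_or_down(payload: dict, expected=EXPECTED_JOBS) -> set:
    """Expected jobs that are absent from the result or have a target down."""
    status = job_status(payload)
    return {job for job in expected if not status.get(job, False)}
```

Usage on monk: `missing_or_down(fetch_up())` returns an empty set when every target of both jobs reports `up == 1`.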

## Resource notes (kscloud1: 3 vCPU, 3.7GB RAM + 6GB swap, 75GB disk)

With all services running: ~2.8-2.9GB RAM used, ~2.6-2.8GB swap used (of 6GB),
~835MB-1.2GB "available", disk 29GB/75GB used. Site is functional but under
memory pressure - if monk is down (assassin is already off) for an extended period
with real concurrent usage, expect sluggishness (esp. openproject/authentik/
openwebui). Not yet stress-tested under real failover load.
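
To watch that memory pressure without guessing, here is a minimal Python sketch that summarises `/proc/meminfo` (Linux only, values in kB). "Used" RAM is computed as MemTotal - MemAvailable. The 1 GiB `MemAvailable` warning threshold is an arbitrary illustration, not a measured limit for these services.

```python
"""Summarise RAM/swap pressure from /proc/meminfo text (values are in kB)."""


def parse_meminfo(text: str) -> dict:
    """Parse lines like 'MemAvailable:  1234567 kB' into {'MemAvailable': 1234567}."""
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        fields = rest.split()
        if sep and fields:
            values[name.strip()] = int(fields[0])
    return values


def pressure_summary(info: dict, warn_available_kb: int = 1024 * 1024) -> dict:
    """Used RAM/swap in MiB plus a low-memory flag (threshold is illustrative)."""
    mem_used = info["MemTotal"] - info["MemAvailable"]
    swap_used = info["SwapTotal"] - info["SwapFree"]
    return {
        "mem_used_mib": mem_used // 1024,
        "mem_available_mib": info["MemAvailable"] // 1024,
        "swap_used_mib": swap_used // 1024,
        "low_memory": info["MemAvailable"] < warn_available_kb,
    }
```

On kscloud1, run `with open("/proc/meminfo") as fh: print(pressure_summary(parse_meminfo(fh.read())))`, for example from cron during a failover test.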

## Key gotchas from THIS phase (cloud failover build-out)

- kscloud1's `kitestacks` Docker network is LOCAL/separate from monk's (same name,
no conflict). cloudflared on each host resolves container names against its
own host's network.
- Adding a new tunnel connector that lacks a backend for an ingress hostname ->
502 for requests routed there. If it has a DIFFERENT backend (e.g. forgejo) ->
serves different data inconsistently. Now that all 9 hostnames have backends on
kscloud1, only the second case applies, and it is accepted/expected.
- port 80 on kscloud1 is owned by `caddy` (serves www-backup/git-backup.kitestacks.com
direct A-records, pre-existing, unrelated to the tunnel) - openproject uses 8090:80
for its host port instead (cloudflared hits the container's internal port 8080).
- uptime-kuma has no file-based config for monitors/users (unlike grafana, whose
datasources/dashboards are handled by provisioning) - seeded uptime-kuma's kuma.db
with users/monitors via direct sqlite manipulation (`docker exec ... sqlite3`, or the
python3 sqlite3 module via a throwaway `python:3-alpine` container with the volume mounted).
- authentik first boot takes ~1-2 min (migrations); openproject first boot takes
~4-5 min (postgres initdb + Rails migrations + Puma boot) - watch `docker logs`
for "Listening on http://0.0.0.0:8080" before testing.

## Authentik/Kavita login fix (2026-06-10, post cloud-failover)

PROBLEM: Cloudflare Tunnel load-balances `auth.kitestacks.com` and `kavita.kitestacks.com`
active-active across monk and kscloud1. kscloud1's authentik had only the fresh `akadmin`
bootstrap user (not kenpat7177) and kscloud1's kavita had ZERO users -> ~50% of requests showed
"wrong password" on authentik and a "create admin account" (signup) screen on
kavita instead of login. This contradicts the earlier "fresh DBs are fine"
assumption - for IDENTITY apps it breaks login, so it was NOT acceptable.

FIX APPLIED (one-time sync, same pattern as uptime-kuma's kuma.db seed):
- pg_dump'd monk's authentik-postgres `authentik` db (--clean --if-exists),
scp'd it to kscloud1, stopped authentik+authentik-worker on kscloud1, restored
via `docker exec -i authentik-postgres psql -U authentik -d authentik < dump`,
restarted. Worked cleanly because AUTHENTIK_SECRET_KEY and PG_PASS were
ALREADY IDENTICAL between monk's and kscloud1's authentik/.env.
- For kavita: copying the raw kavita.db file via plain `cp` produced
"database disk image is malformed" (a WAL-mode db file isn't self-consistent on
its own: committed pages can still live in -wal, even when -wal/-shm look small).
FIX: use python3 sqlite3 `Connection.backup()` (via throwaway python:3-alpine
container) to produce a consistent copy, THEN on kscloud1 stop kavita, rm the OLD
kavita.db-shm/kavita.db-wal too (stale WAL files against a new db = same
corruption error), copy in the new kavita.db (chown root:root, chmod 644
to match the original ownership - the kavita container runs as root), restart.
- Result: kscloud1 authentik now has kenpat7177 (matches monk), kscloud1
kavita now has kenpat7177 + acurrie (matches monk). Both connectors now
return the same login screen/credentials. NOTE: this is a ONE-TIME sync,
not continuous - if monk's users/passwords change later, kscloud1 will
drift again and the same symptoms could return; re-run this sync if so.
- kscloud1 kavita's library entries point at /books paths that don't exist on
kscloud1 (no actual book files there) - login works fine, but browsing the
library when served by kscloud1 will show entries with missing files. Same
"stale data" tradeoff as gitforge, accepted.
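
The kavita.db copy procedure above can be sketched in Python using only the standard `sqlite3` module. `Connection.backup()` reads through the live WAL, so committed pages still sitting in `kavita.db-wal` end up in the copy. Stale `-wal`/`-shm` sidecars next to the target are deleted before the new file is dropped in. Paths are hypothetical. In practice this ran in a throwaway `python:3-alpine` container with the volume mounted. Kavita must be stopped on the receiving side, and ownership/permissions (root:root, 644) still have to be set separately.

```python
"""Consistent copy of a live WAL-mode SQLite db, then install it without stale sidecars."""
import os
import shutil
import sqlite3


def consistent_copy(src_path: str, dst_path: str) -> None:
    """Online backup: includes committed pages still held in the source's -wal file."""
    src = sqlite3.connect(src_path)
    dst = sqlite3.connect(dst_path)
    try:
        src.backup(dst)
    finally:
        dst.close()
        src.close()


def install_db(new_db: str, target_db: str) -> None:
    """Replace target_db with new_db. The app using target_db must be stopped first."""
    for suffix in ("-wal", "-shm"):
        stale = target_db + suffix
        if os.path.exists(stale):
            # Old WAL/SHM files paired with a new db file -> "malformed" errors.
            os.remove(stale)
    shutil.copyfile(new_db, target_db)


def integrity_ok(path: str) -> bool:
    """True if SQLite's own integrity check passes on the file."""
    con = sqlite3.connect(path)
    try:
        return con.execute("PRAGMA integrity_check").fetchone()[0] == "ok"
    finally:
        con.close()
```

Usage, with hypothetical paths:

1. On monk's side, run `consistent_copy("/data/kavita.db", "/data/kavita-copy.db")`.
2. Transfer the copy and stop kavita on kscloud1.
3. Run `install_db("kavita-copy.db", "/data/kavita.db")`.
4. Run `integrity_ok("/data/kavita.db")` before restarting.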

## Authentik shared Postgres+Redis over Tailscale (2026-06-10) - fixes "invalid_grant" SSO

PROBLEM: Even after the one-time DB sync above, "Sign in with Authentik" on
Kavita could fail with "invalid_grant" / "Code does not exist". Root cause:
monk and kscloud1 each ran their OWN authentik-postgres. OAuth2 authorization
codes are short-lived per-flow rows in `authentik_providers_oauth2_authorizationcode`.
If Cloudflare Tunnel's active-active routing sends `/authorize` to one
connector and `/application/o/token/` to the other, the code only exists in
one of the two DBs -> invalid_grant. A one-time data sync can't fix this
because the data is created fresh on every login attempt.

FIX: Converted to a single shared Postgres+Redis, hosted on kscloud1 and reachable
ONLY over Tailscale. This is the shared-backend pattern Authentik uses for multiple
server instances. Note that it makes kscloud1's Postgres/Redis a single point of
failure for BOTH connectors' authentik - see ROLLBACK below.
- Installed Tailscale on both monk and kscloud1 (same tailnet). kscloud1's
tailscale IP is `100.123.254.52`.
- kscloud1's `/opt/kitestacks/docker/authentik/docker-compose.yml`:
authentik-postgres now binds `100.123.254.52:5432:5432` (was unbound/internal-only),
authentik-redis now binds `100.123.254.52:6379:6379`. Both are still also reachable
on the local `kitestacks` docker network for kscloud1's own authentik+worker.
Backup of pre-change file: `docker-compose.yml.backup-before-shared-db-20260610-1138`.
- monk's `~/kitestacks-live/docker/authentik/docker-compose.yml`: REMOVED the
`postgresql` and `redis` services entirely. monk's `authentik`/`authentik-worker`
now point `AUTHENTIK_POSTGRESQL__HOST` and `AUTHENTIK_REDIS__HOST` at
`100.123.254.52` (kscloud1 over Tailscale), using the same `PG_PASS` /
`AUTHENTIK_SECRET_KEY` as before (already identical between hosts).
- monk's old local `authentik-postgres`/`authentik-redis` containers were
STOPPED (not removed) - data dirs preserved under
`~/kitestacks-live/docker/authentik/postgres` in case of rollback, but no
longer in use.
- Result: BOTH connectors' authentik+worker now read/write the SAME db/redis,
regardless of which one handles `/authorize` vs `/application/o/token/`.
Verified both `authentik`+`authentik-worker` healthy on monk and kscloud1,
OIDC discovery docs identical, user list matches (`kenpat7177` etc.) on both.

CONFIRMED BY USER 2026-06-10: "Sign in with Authentik" on Kavita now works
(when monk's connector serves the request).
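
Why per-host authorization codes break SSO, in numbers: a back-of-envelope Python sketch. It rests on a simplifying assumption that is not documented Cloudflare behaviour: each request independently lands on one of n equally likely live connectors. Under that model, `/authorize` and `/application/o/token/` hit different hosts with probability (n-1)/n. With monk + kscloud1 live, that is half of all logins.

```python
"""Chance that /authorize and /token land on different connectors (toy model)."""
import random
from fractions import Fraction


def p_split(n_connectors: int) -> Fraction:
    """Exact probability under the model: 1 - 1/n = (n - 1)/n."""
    return Fraction(n_connectors - 1, n_connectors)


def simulate(n_connectors: int, trials: int = 100_000, seed: int = 0) -> float:
    """Monte Carlo estimate of the same probability, as a sanity check."""
    rng = random.Random(seed)
    split = sum(
        rng.randrange(n_connectors) != rng.randrange(n_connectors)
        for _ in range(trials)
    )
    return split / trials


if __name__ == "__main__":
    for n in (2, 3):
        print(n, p_split(n), round(simulate(n), 3))
```

With one shared Postgres/Redis, the split no longer matters, because both hosts see the same code rows.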

## Kavita "Sign in with Authentik" button missing on kscloud1 - FIXED 2026-06-10

After the shared-Authentik-DB fix above, the button still didn't appear when
Cloudflare routed kavita.kitestacks.com to kscloud1's connector. CAUSE: Kavita's
OIDC config lives in ITS OWN db (kavita.db `ServerSetting` table, Key=40, a JSON
blob with Authority/ClientId/Secret/Enabled), separate from Authentik's db.
The earlier one-time kavita.db sync (see fix above) was taken BEFORE OIDC SSO
was configured in monk's Kavita, so kscloud1's copy had Key=40 with empty
Authority/Secret and `"Enabled":false`. FIRST ATTEMPT (did not survive a restart,
see UPDATE below): copied monk's Key=40 JSON value
verbatim into kscloud1's kavita.db (stop kavita, `docker run --rm -v
.../kavita/config:/data -v fix.sql:/fix.sql alpine` + apk sqlite + `sqlite3
/data/kavita.db < fix.sql` with `UPDATE ServerSetting SET Value='...' WHERE
"Key"=40`, restart kavita). NOTE: AspNetUserLogins (OIDC account-linkage table)
is empty on BOTH monk and kscloud1 - Kavita creates this row on first OIDC
login per-instance (matches existing local user by email since
ProvisionAccounts=false), so no extra action needed there.

GOTCHA: ServerSetting's PK column is `"Key"` (INTEGER), not `Id` - quote
it in sqlite (`"Key"`) since KEY is an SQL keyword.

DRIFT WARNING: any future Kavita server-setting change (OIDC config, library
paths, etc.) made on monk will NOT propagate to kscloud1's kavita.db
automatically - same one-time-sync caveat as the user-table sync above.
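
For diagnosis only (not editing: per the UPDATE below, direct writes don't survive a restart), here is a read-only Python sketch. It pulls the OIDC row from `ServerSetting` and reports the fields discussed above. The column names (`"Key"`, `Value`, `RowVersion`) and JSON keys follow this note; the exact Kavita schema is an assumption. A read-only open of a WAL database can fail when the `-shm` file is missing; running it against a backup copy avoids that.

```python
"""Read Kavita's OIDC ServerSetting row ("Key" = 40) without modifying the db."""
import json
import sqlite3

OIDC_SETTING_KEY = 40  # per this note


def read_oidc_setting(db_path: str) -> dict:
    # mode=ro: a diagnostic run must not write to kavita.db.
    con = sqlite3.connect(f"file:{db_path}?mode=ro", uri=True)
    try:
        row = con.execute(
            # "Key" is quoted because KEY is an SQL keyword.
            'SELECT Value, RowVersion FROM ServerSetting WHERE "Key" = ?',
            (OIDC_SETTING_KEY,),
        ).fetchone()
    finally:
        con.close()
    if row is None:
        raise LookupError('no ServerSetting row with "Key" = 40')
    cfg = json.loads(row[0])
    return {
        "row_version": row[1],
        "enabled": cfg.get("Enabled"),
        "authority": cfg.get("Authority"),
        "client_id": cfg.get("ClientId"),
        "secret_len": len(cfg.get("Secret") or ""),  # length only, never the secret
    }
```

Comparing `row_version`/`enabled` before and after `docker compose restart kavita` shows whether a change survived.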

UPDATE 2026-06-10 (RESOLVED via Kavita UI, not direct DB edit): Direct SQL
edits to ServerSetting Key=40 got WIPED back to disabled/empty by Kavita on
every container restart (RowVersion incremented +2 each time, Authority/Secret
cleared, Enabled->false) - confirmed twice, even with a full WAL-consistent
kavita.db replace from monk. Direct DB writes to this table do NOT survive a
restart; only saves through Kavita's own Settings UI/API persist correctly.

FIX: opened an SSH local port-forward (`ssh -L 5099:localhost:5000
kenpat@5.78.233.28`) so the user could reach kscloud1's Kavita directly at
http://localhost:5099 (bypassing the Cloudflare load-balanced domain), logged
in with their normal kenpat7177 Kavita password, and re-entered the OIDC
config in Settings -> OIDC:
- Authority: `https://auth.kitestacks.com/application/o/kavita/`
(MUST include the trailing slash - Kavita validates that this exactly matches
the `issuer` claim in Authentik's `.well-known/openid-configuration`,
which has a trailing slash. Without it: "Kavita can load the OIDC
configuration, but the issuer does not match".)
- Client ID: `kavita`, Client Secret: (96-hex-char secret from Authentik's
Kavita OAuth2 provider - watch for copy/paste truncation, verify length=96)
- Enabled: true, ProviderName: authentik

Saved via UI (RowVersion 8->12->14 across two saves to fix a 1-char-truncated
secret), then `docker compose restart kavita` on kscloud1 - config SURVIVED
this restart (unlike the direct-SQL attempts) and `/api/settings/oidc` now
reports `"enabled": true`. SSH tunnel closed afterward (no firewall changes
were made/needed). Set a temporary ApiKey on kenpat7177's kscloud1 kavita
account during troubleshooting (for a Plugin/authenticate attempt that turned
out to return 401 / unused) - left in place, harmless (grants API access to
that user's own account only).

TAKEAWAY FOR FUTURE KAVITA CONFIG CHANGES ON KSCLOUD1: always use the Kavita
UI (via SSH port-forward to localhost:5000) rather than direct sqlite edits -
direct edits to ServerSetting do not survive a restart.

CONFIRMED BY USER 2026-06-10: "Sign in with Authentik" now works on Kavita
regardless of which connector (monk/kscloud1) answers.
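
Both values that caused trouble here can be checked before pasting them into Kavita's UI. This minimal Python sketch fetches Authentik's discovery document for the Kavita provider and compares `issuer` to the Authority string exactly, trailing slash included. It also checks the secret's length and hex format. The 96-hex-char expectation comes from this note, not from Authentik documentation.

```python
"""Pre-flight checks for Kavita OIDC settings against Authentik's discovery doc."""
import json
import re
import urllib.request

AUTHORITY = "https://auth.kitestacks.com/application/o/kavita/"


def fetch_discovery(authority: str = AUTHORITY) -> dict:
    url = authority.rstrip("/") + "/.well-known/openid-configuration"
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)


def check_settings(authority: str, secret: str, discovery: dict) -> list:
    """Return human-readable problems (empty list == looks fine)."""
    problems = []
    issuer = discovery.get("issuer", "")
    if authority != issuer:  # exact match, trailing slash included
        problems.append(f"authority {authority!r} != issuer {issuer!r}")
    if not re.fullmatch(r"[0-9a-fA-F]{96}", secret):
        problems.append(f"secret is {len(secret)} chars or not hex (expected 96 hex)")
    return problems
```

Usage: `check_settings(AUTHORITY, secret, fetch_discovery())` returns an empty list when both values look right.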

## Kavita cover images missing on kscloud1 - FIXED 2026-06-10

After the kavita.db sync from monk, kscloud1's db referenced cover image files
(e.g. `v1_c1.png`..`v10_c10.png` in `ServerSetting`/`Series.CoverImage`) that
didn't exist on kscloud1's filesystem - kscloud1's `config/covers/` dir was
empty (monk has 9 files, ~1.3MB). Result: book/series cover thumbnails didn't
load when kscloud1 served the request. FIX: tar'd monk's
`~/kitestacks-live/docker/kavita/config/covers/` (owned 1000:1000), scp'd it to
kscloud1, extracted it into `/opt/kitestacks/docker/kavita/config/covers/` via a
throwaway alpine container, `chown -R 1000:1000`. No kavita restart needed -
covers are served as static files from disk. CONFIRMED BY USER: covers now
load correctly.

NOTE: this is another one-time sync (same drift caveat) - if new books/covers
are added on monk later, they won't appear on kscloud1 unless re-synced
(covers/ dir + kavita.db + the actual book files under library/books; the book
files themselves don't exist on kscloud1 at all, per the earlier "stale data" note).
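
This Python sketch spots this kind of drift before a user does. It lists cover filenames that kavita.db references but that are missing from the covers directory. `Series.CoverImage` comes from this note. Treating its values as bare filenames inside `config/covers/` is an assumption, and other tables (volumes/chapters) may reference covers too.

```python
"""List cover images referenced by kavita.db but missing on disk (drift check)."""
import os
import sqlite3


def missing_covers(db_path: str, covers_dir: str) -> list:
    con = sqlite3.connect(f"file:{db_path}?mode=ro", uri=True)
    try:
        rows = con.execute(
            "SELECT DISTINCT CoverImage FROM Series "
            "WHERE CoverImage IS NOT NULL AND CoverImage != ''"
        ).fetchall()
    finally:
        con.close()
    present = set(os.listdir(covers_dir)) if os.path.isdir(covers_dir) else set()
    # basename() tolerates values stored with a directory prefix.
    return sorted(name for (name,) in rows if os.path.basename(name) not in present)
```

A non-empty result for kscloud1's db and `/opt/kitestacks/docker/kavita/config/covers/` means the covers need re-syncing.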

SECURITY NOTE (shared Authentik Postgres/Redis over Tailscale, see that section
above): postgres/redis on kscloud1 are bound to the Tailscale interface
IP only (100.123.254.52), not 0.0.0.0 - not exposed to the public internet.

ROLLBACK (shared Authentik Postgres/Redis): if Tailscale connectivity ever breaks
(or kscloud1 is down), monk's authentik will fail to start (can't reach
100.123.254.52). To roll back: (1) restore monk's docker-compose.yml from
git/backup so it uses local postgresql/redis services again; (2) restart monk's
old authentik-postgres/authentik-redis containers
(`docker start authentik-postgres authentik-redis` in
~/kitestacks-live/docker/authentik); (3) monk's local authentik db is STALE
(kscloud1's shared db holds every login/change since 2026-06-10), so restore a
fresh pg_dump from kscloud1 into it; (4) `docker compose up -d`.

## kscloud1 ufw blocks docker-bridge -> host port 8000 (metrics API) - FIXED 2026-06-10

kscloud1 has ufw active with `default deny incoming/routed`. The
kitestacks-metrics-api-backup container (network_mode: host, binds 0.0.0.0:8000)
was unreachable from homepage-backup via `host.docker.internal:8000` (TCP
timeout, not refused -> ufw drop), causing the homepage System Status widget to
show 0%/"Offline" when kscloud1 served the request. FIXED by adding:
`sudo ufw allow from 172.16.0.0/12 to any port 8000 proto tcp` (172.16.0.0/12 spans
172.16.0.0-172.31.255.255, which covers all docker bridge subnets on this host:
172.17-172.29.x.x). Verified
homepage-backup -> host.docker.internal:8000/api/metrics now returns real
CPU/RAM/storage/network data. The kscloud1 sudo password is in the "kscloud1
access" section above - needed `echo PASS | sudo -S <cmd>` (no askpass helper;
non-interactive sudo via -S works fine).
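
This Python sketch uses the standard `ipaddress` module to confirm that a set of Docker bridge subnets falls inside the range the ufw rule allows. The subnets below are illustrative, not read from kscloud1; `docker network inspect` shows the real ones. If Docker ever allocates a bridge outside 172.16.0.0/12, this ufw rule would not cover it.

```python
"""Check Docker bridge subnets against the ufw-allowed source range."""
import ipaddress

ALLOWED = ipaddress.ip_network("172.16.0.0/12")


def uncovered(subnets, allowed=ALLOWED) -> list:
    """Return the subnets (as given) that are NOT fully inside `allowed`."""
    return [s for s in subnets if not ipaddress.ip_network(s).subnet_of(allowed)]


if __name__ == "__main__":
    print(ALLOWED[0], ALLOWED[-1])  # 172.16.0.0 172.31.255.255
    print(uncovered(["172.17.0.0/16", "172.29.0.0/16", "192.168.0.0/20"]))  # ['192.168.0.0/20']
```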

## Portal SSO/coming-soon push + Karakeep fix + OpenProject EE blocker + Portainer SSO setup (2026-06-10)

Per user's original request: SSO for grafana/prometheus/portainer/cloudflare/uptime-kuma
(authentik=source), KITEAI/LITELLM/OPENROUTER -> "coming soon", SSO for openproject/forgejo/
karakeep, FluxCD card in AI&Automation. Hard constraint: user does NOT use Cloudflare Zero
Trust/Access ("costs money") - any Cloudflare work must avoid those products (note: the
Tunnel UI itself lives under Cloudflare's "Zero Trust" dashboard section, but configuring a
Tunnel public hostname there is NOT the same as enabling Zero Trust/Access - fine to use).

### Portal UI changes - DEPLOYED to all 3 copies, verified live

Edited the AI & AUTOMATION panel (`cards cards-3` -> `cards cards-2`, now 2x2):
Kite AI and OpenRouter cards changed from external links to
`href="#" data-coming-soon="1"` (LiteLLM was already coming-soon); added a 4th
card "FluxCD" / "GitOps Automation" using `/images/icons/fluxcd.png`, also
coming-soon (automation scripts with FluxCD+Prometheus+Grafana+node-exporter
are a future project). Applied identically to:
- `~/kitestacks-live/docker/kitestacks-portal-test/public/index.html` (monk, dev, port 3008)
- `~/kitestacks-live/docker/kitestacks-portal/public/index.html` (monk, LIVE, served by
"homepage" container 3005->3000 - this is the file that backs www.kitestacks.com)
- `/opt/kitestacks/docker/www-backup/kitestacks-portal/public/index.html` (kscloud1,
served by `homepage-backup` port 3015)

Verified `https://www.kitestacks.com` returns "FluxCD" consistently (6/6 requests
across both connectors).

NOTE: Portainer card on the live portal is currently `data-coming-soon="1"` -
update this to a real `href="https://portainer.kitestacks.com"` link (remove
data-coming-soon) once the Portainer SSO manual steps below are completed.

NOTE 2: "cloudflare should all be in the networking side" from the original
request was never resolved - Cloudflare card is still in the INFRASTRUCTURE
panel, not moved/renamed to a "NETWORKING" panel. Ambiguous, deprioritized,
not revisited.

### Karakeep SSO redirect_uri fix - DONE, confirmed working

Karakeep uses NextAuth.js with provider id "custom" (not "authentik") - the actual
OAuth callback path is `/api/auth/callback/custom`, but Authentik's Karakeep
OAuth2Provider's `_redirect_uris` had the wrong path -> "Redirect URI Error".
FIX: direct Postgres UPDATE to
`authentik_providers_oauth2_oauth2provider._redirect_uris` (JSON column) on
the shared kscloud1 authentik-postgres (100.123.254.52), wrapped in an explicit
`BEGIN; UPDATE ...; COMMIT;` (a bare single-statement -c "UPDATE..." reported
"UPDATE 1" but did NOT persist on the first attempt - cause unclear; the explicit
transaction fixed it). After the DB write, restarted authentik+authentik-worker
on BOTH monk and kscloud1 and polled
`docker inspect --format '{{.State.Health.Status}}'` until both reported
"healthy" (~50s) before retesting - the first retest hit a transient 502 because
kscloud1's authentik was still "starting". CONFIRMED: Authentik now serves the
login page (not "Redirect URI Error") for Karakeep SSO.

PG_PASS GOTCHA: the `~/kitestacks-live/docker/authentik/.env` PG_PASS value ends in
`=` - extract it with `cut -d= -f2-` (NOT `-f2`, which truncates the trailing `=`
and causes "password authentication failed").
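
The same first-`=`-only split as the `cut -d= -f2-` fix, in Python, for scripts that read the .env directly. A minimal sketch: it ignores quoting and `export` prefixes, which this note doesn't say the Authentik .env uses. The PG_PASS value in the demo is hypothetical.

```python
"""Parse KEY=value .env lines, splitting on the first '=' only (like cut -d= -f2-)."""


def parse_env(text: str) -> dict:
    env = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blanks, comments and malformed lines
        key, value = line.split("=", 1)  # maxsplit=1 keeps any '=' inside the value
        env[key.strip()] = value
    return env


if __name__ == "__main__":
    line = "PG_PASS=c2VjcmV0cGFzcw=="  # hypothetical value ending in '='
    print(line.split("=")[1])          # like cut -f2  -> c2VjcmV0cGFzcw (truncated)
    print(parse_env(line)["PG_PASS"])  # like cut -f2- -> c2VjcmV0cGFzcw==
```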

REUSABLE PATTERN for any future direct Authentik DB edit: (1) wrap writes in
explicit BEGIN/COMMIT, (2) restart authentik+authentik-worker on BOTH monk and
kscloud1, (3) wait for health=healthy on both before testing.
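
Step (3) as a Python sketch: poll `docker inspect --format '{{.State.Health.Status}}'` until every named container reports `healthy`, or give up after a timeout. It works on ONE host, so run it on monk and on kscloud1 separately (e.g. over the SSH access above). The 300 s timeout and 5 s interval are arbitrary. The status getter, sleep and clock are injectable so the loop can be exercised without Docker.

```python
"""Wait until named containers report Docker health status 'healthy'."""
import subprocess
import time


def docker_health(name: str) -> str:
    """Read .State.Health.Status via `docker inspect` (container needs a HEALTHCHECK)."""
    out = subprocess.run(
        ["docker", "inspect", "--format", "{{.State.Health.Status}}", name],
        capture_output=True, text=True, check=True,
    )
    return out.stdout.strip()


def wait_healthy(names, get_status=docker_health, timeout=300.0, interval=5.0,
                 sleep=time.sleep, clock=time.monotonic) -> bool:
    """Poll until every container is 'healthy'; False if the timeout expires first."""
    deadline = clock() + timeout
    pending = list(names)
    while True:
        pending = [n for n in pending if get_status(n) != "healthy"]
        if not pending:
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

Usage on each host: `wait_healthy(["authentik", "authentik-worker"])` returns True once both are healthy.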

### OpenProject SSO - config bug fixed, but BLOCKED by Enterprise licensing (no further action possible)

`~/kitestacks-live/docker/openproject/docker-compose.yml` env vars were wrong in
two ways: (1) an extra "PROVIDERS_" segment in the var names produced
`seed_oidc_provider = {"providers": {"authentik": {...}}}` instead of
`{"authentik": {...}}`, creating a broken stub provider record (slug=
"providers", id=1, since deleted via Rails runner); (2) `discovery_endpoint`
isn't read by `ConfigurationMapper` at all - replaced it with explicit
ISSUER/AUTHORIZATION__ENDPOINT/TOKEN__ENDPOINT/USERINFO__ENDPOINT/
END__SESSION__ENDPOINT/JWKS__URI vars (the current docker-compose.yml has the
corrected version, see file - all derived from
`https://auth.kitestacks.com/application/o/openproject/.well-known/openid-configuration`).

After fixing both, the seeder correctly creates provider slug="authentik",
available=true, all fields correct - BUT the SSO button still does not appear
on /login. CONFIRMED ROOT CAUSE (terminal, source-code-verified): OpenProject
CE 2025/v15's OmniAuth SSO strategy
(`OpenProject::Plugins::AuthPlugin`/`OpenIDConnect`) AND SAML
(`auth_saml/lib/open_project/auth_saml/engine.rb`, `enterprise_feature:
"sso_auth_providers"`) are BOTH gated behind an Enterprise Edition license -
"OmniAuth SSO strategy ... is only available for Enterprise Editions". No
app/config-level workaround exists. Only remaining options: buy an EE license, OR
put a forward-auth proxy (oauth2-proxy / Authentik embedded outpost) in front
of OpenProject - DEFERRED along with the Prometheus/Uptime Kuma proxy work (see
below) until the Oracle VPS topology is decided.

OpenProject container is healthy, `/login` returns 200, no projects yet.
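
Bug (1) is easier to see with the naming rule written out. This Python sketch assumes the convention these variable names suggest: a single `_` starts a new nesting level, and a doubled `__` is a literal underscore inside a key. Under that assumption it maps env var names to nested keys and shows how a stray `PROVIDERS_` segment adds an extra `providers` level. It mimics the convention only; it is not OpenProject's parser, and the example variable names are hypothetical.

```python
"""Illustrative env-var -> nested-key mapping ('_' = new level, '__' = literal '_')."""

PLACEHOLDER = "\0"  # temporary stand-in for a literal underscore


def env_to_path(name: str, prefix: str = "OPENPROJECT_") -> list:
    """'OPENPROJECT_OPENID__CONNECT_AUTHENTIK_ISSUER' -> ['openid_connect', 'authentik', 'issuer']."""
    if name.startswith(prefix):
        name = name[len(prefix):]
    name = name.replace("__", PLACEHOLDER)
    return [part.replace(PLACEHOLDER, "_").lower() for part in name.split("_")]


def nest(env: dict, prefix: str = "OPENPROJECT_") -> dict:
    """Build a nested dict from the env vars that carry `prefix`."""
    tree: dict = {}
    for name, value in env.items():
        if not name.startswith(prefix):
            continue
        *parents, leaf = env_to_path(name, prefix)
        node = tree
        for key in parents:
            node = node.setdefault(key, {})
        node[leaf] = value
    return tree
```

Under this rule, `OPENPROJECT_OPENID__CONNECT_PROVIDERS_AUTHENTIK_ISSUER` nests under `openid_connect -> providers -> authentik`. Dropping `PROVIDERS_` puts `authentik` directly under `openid_connect`.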

### Portainer SSO - Authentik side DONE, two manual steps PENDING (not yet done by user)

Per user: "yes continue with portainer" / "yes but make sure it is still
secure" (approved exposing Portainer publicly via a NEW Cloudflare Tunnel
hostname, with an explicit requirement to keep it secure -> access restricted to
the `homelab-admin` Authentik group).

Created via `docker exec authentik ak shell` (Django ORM, no Authentik API
token configured) on kscloud1's shared authentik-postgres:
- OAuth2Provider "Portainer": client_id=`portainer`,
client_secret=`wTim3mrMwt34ko1RYMvK1RNnjwWOMi_d4r4cS6exr7DjozCrL5zKthHl-5KjargF`,
provider_id=9, redirect_uri=`https://portainer.kitestacks.com` (strict),
scopes openid/email/profile, sub_mode=user_email, signing key + flows copied
from existing providers (same pattern as Karakeep/Grafana).
- Application "Portainer" (slug="portainer", meta_launch_url=
`https://portainer.kitestacks.com`).
- PolicyBinding restricting the Portainer application to Authentik group
`homelab-admin` (UUID e21b0aa5-62e7-4b3a-8302-130b0ae148a5) - this is the
"make sure it is still secure" piece (only homelab-admin members can SSO in).
- Verified discovery doc resolves:
`https://auth.kitestacks.com/application/o/portainer/.well-known/openid-configuration`.

PENDING MANUAL STEPS (user must do via UI - confirmed `portainer.kitestacks.com`
still returned status `000`, i.e. no HTTP response at all, as of 2026-06-10):
1. Cloudflare dashboard -> Tunnel -> add Public Hostname `portainer.kitestacks.com`
-> service `https://portainer:9443` (HTTPS), enable "No TLS Verify". (This is
in the Tunnel config UI, which Cloudflare happens to host under the "Zero
Trust" nav section, but adding a Tunnel hostname is NOT enabling Zero
Trust/Access - does not violate the no-Zero-Trust constraint.)
2. In Portainer -> Settings -> Authentication -> OAuth (Provider: Custom), on
BOTH monk's and kscloud1's SEPARATE Portainer instances, configure:
- Client ID: `portainer`
- Client Secret: `wTim3mrMwt34ko1RYMvK1RNnjwWOMi_d4r4cS6exr7DjozCrL5zKthHl-5KjargF`
- Authorization URL: `https://auth.kitestacks.com/application/o/authorize/`
- Access Token URL: `https://auth.kitestacks.com/application/o/token/`
- Resource/Userinfo URL: `https://auth.kitestacks.com/application/o/userinfo/`
- Redirect URL: `https://portainer.kitestacks.com`
- Logout URL: `https://auth.kitestacks.com/application/o/portainer/end-session/`
- Scopes: `openid email profile`, User identifier claim: `email`

AFTER both steps are done: update the live portal's Portainer card (in the 3 files
above) from `data-coming-soon="1"` to a real
`href="https://portainer.kitestacks.com" target="_blank" rel="noopener"` link.
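
Before typing the URLs above into both Portainer instances, they can be cross-checked against the Portainer provider's discovery document (the URL verified earlier). A minimal Python sketch. The pairing of Portainer field names to discovery keys (`authorization_endpoint`, `token_endpoint`, `userinfo_endpoint`, `end_session_endpoint`) is an assumption based on the usual OIDC meaning of each field.

```python
"""Cross-check planned Portainer OAuth URLs against Authentik's discovery document."""
import json
import urllib.request

DISCOVERY_URL = (
    "https://auth.kitestacks.com/application/o/portainer/"
    ".well-known/openid-configuration"
)
BASE = "https://auth.kitestacks.com/application/o/"

# Portainer UI field -> (discovery key [assumed pairing], value planned in this note)
PLANNED = {
    "Authorization URL": ("authorization_endpoint", BASE + "authorize/"),
    "Access Token URL": ("token_endpoint", BASE + "token/"),
    "Resource URL": ("userinfo_endpoint", BASE + "userinfo/"),
    "Logout URL": ("end_session_endpoint", BASE + "portainer/end-session/"),
}


def fetch_discovery(url: str = DISCOVERY_URL) -> dict:
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)


def mismatches(discovery: dict, planned=PLANNED) -> dict:
    """Return {field: (planned, discovered)} for every field that differs."""
    out = {}
    for field, (key, value) in planned.items():
        found = discovery.get(key)
        if found != value:
            out[field] = (value, found)
    return out
```

Usage: `mismatches(fetch_discovery())` returns an empty dict when every planned URL matches.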

### App-level SSO status summary (end of 2026-06-10 session)

- Grafana: working (pre-existing).
- Forgejo: working (pre-existing).
- Karakeep: fixed this session, working.
- OpenProject: blocked by EE license (terminal at app level).
- Portainer: Authentik side done, waiting on user's 2 manual steps above.
- Prometheus + Uptime Kuma: DEFERRED - neither has native OAuth, so each needs a
forward-auth proxy (oauth2-proxy or Authentik embedded outpost); deferred per
user's "ok lets do smaller app level" (hold new infra until Oracle VPS decided).
- Cloudflare itself: no SSO concept applicable (it's Cloudflare's own dashboard
login) - this item was always about the portal's Cloudflare card placement, see the
"Portal UI changes" note above.

### Oracle VPS migration - PLANNED, upcoming (stated 2026-06-11)

User confirmed on 2026-06-11: "we are going to switch things soon from hetzner
cloud to oracle soon." -> kscloud1 (Hetzner, 5.78.233.28) is intended to be
REPLACED by an Oracle Cloud VPS in the near future ("soon", no firm date yet).
Originally raised 2026-06-10 as exploratory ("how easy would it be to move
everything to oracle vps after?"), now an actual plan.

Implication: avoid investing further one-off/manual config work that's hard to
redo (e.g. more one-time DB syncs, hand-edited sqlite, etc.) on kscloud1 if
avoidable - prefer changes that are easy to replicate on a new host. When the
Oracle VPS is provisioned, plan to follow the same pattern as the kscloud1
cloud-failover build-out (new Cloudflare Tunnel connector + full service
replicas + shared Authentik/Postgres/Redis over Tailscale + the Forgejo
FORGEJO_API_BASE-over-Tailscale pattern for the portal's Recent Activity; the
"Recent Activity" fix notes are not in this file). Then retire kscloud1 the same
way assassin/T14 was retired (decommission once the Oracle replica is verified
working). Since the shared Authentik Postgres/Redis currently lives on kscloud1,
it has to move before kscloud1 is retired.

## Prior migration gotchas (monk, kept for reference - see git history/old notes if needed)

- `rsync --files-from` recursion bug.
- bind-mount postgres dirs come over empty when copied as non-root - use
`pg_dumpall`/`pg_dump --clean` from the running container instead.
- `pg_dumpall --clean` across template1 breaks on client/server version mismatch -
use single-db pg_dump+psql instead.
- grafana data dir needs `chown 472:472`.
- kite-litellm needed a manual `docker network connect kitestacks kite-litellm`.

## 2026-06-12: SSO fixes + Portainer deployed on kscloud1

### Root cause: monk reconnect race condition

When monk goes offline (user travels) and reconnects, Cloudflare starts routing
some token-exchange requests to monk while the codes were created on kscloud1 during
the offline window. Auth codes had a 60-second TTL, which expired before monk's
Authentik had fully started (~5 min startup). FIX: increased `access_code_validity`
from `minutes=1` to `minutes=10` for ALL 9 OAuth2 providers in the shared Postgres
DB. This gives enough buffer for monk's containers to start before codes expire.
Command used (via python:3-alpine container):
`docker run --rm --network host -v /tmp/fix_auth.py:/fix.py python:3-alpine sh -c ...`
connecting to shared Postgres at 100.123.254.52.
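
The TTL reasoning in numbers. Authentik stores these validities as strings like `minutes=10`. This Python sketch parses that `unit=value` form and compares the TTL against the ~5 min startup window quoted above. Treating `;` as the separator for multi-unit values is an assumption about Authentik's syntax.

```python
"""Parse Authentik-style duration strings and compare a code TTL to a startup window."""
from datetime import timedelta


def parse_duration(spec: str) -> timedelta:
    """'minutes=10' or 'hours=1;minutes=30' -> timedelta (';' separator assumed)."""
    kwargs = {}
    for part in spec.split(";"):
        unit, _, amount = part.strip().partition("=")
        kwargs[unit] = float(amount)  # unit must be a timedelta keyword, e.g. minutes
    return timedelta(**kwargs)


def survives_startup(code_ttl: str, startup: timedelta) -> bool:
    """True if a code with this TTL outlives the given startup window."""
    return parse_duration(code_ttl) > startup


if __name__ == "__main__":
    startup = timedelta(minutes=5)  # approximate monk startup time from this note
    for ttl in ("minutes=1", "minutes=10"):
        print(ttl, survives_startup(ttl, startup))
```

With the ~5 min window, `minutes=1` fails the check and `minutes=10` passes.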

### Karakeep redirect_uri reverted and re-fixed

The Karakeep OAuth2Provider `_redirect_uris` had reverted back to the proxy pattern
(`/outpost.goauthentik.io/callback?...`) instead of the correct NextAuth callback
(`https://links.kitestacks.com/api/auth/callback/custom`). This caused "Redirect URI
Error" from Authentik whenever SSO was attempted. Root cause unknown (possibly an
Authentik blueprint or UI save that regenerated/overrode the field). FIX: same
Postgres UPDATE pattern. WATCH: if this reverts again, check Authentik blueprints
and whether someone modified the Karakeep provider via the Authentik admin UI.
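
A quick Python check for this regression. NextAuth.js serves OAuth callbacks at `/api/auth/callback/<provider id>`, so with provider id `custom` the expected URI follows from Karakeep's base URL. The sketch takes the raw `_redirect_uris` JSON (however it was read from the shared Postgres) and flags a missing NextAuth callback or a leftover outpost URL. It accepts both plain strings and `{"url": ...}` objects, on the assumption that the stored shape varies across Authentik versions.

```python
"""Detect the Karakeep redirect_uri regression (NextAuth callback vs outpost URL)."""
import json

KARAKEEP_BASE = "https://links.kitestacks.com"
PROVIDER_ID = "custom"  # NextAuth provider id used by Karakeep, per this note


def expected_callback(base: str = KARAKEEP_BASE, provider_id: str = PROVIDER_ID) -> str:
    return f"{base.rstrip('/')}/api/auth/callback/{provider_id}"


def redirect_urls(raw_json: str) -> list:
    """Accept a JSON list of strings or of {"url": ...} objects (shape assumed)."""
    items = json.loads(raw_json)
    return [item["url"] if isinstance(item, dict) else item for item in items]


def problems(raw_json: str) -> list:
    urls = redirect_urls(raw_json)
    issues = []
    if expected_callback() not in urls:
        issues.append("NextAuth callback missing")
    if any("/outpost.goauthentik.io/" in u for u in urls):
        issues.append("outpost (proxy) callback present - field likely reverted")
    return issues
```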

### Portainer deployed on kscloud1

Created `/opt/kitestacks/docker/portainer/docker-compose.yml` (same image/config as
monk's portainer). Container running as `portainer`, port 9443:9443, on `kitestacks`
network. Volume is local (NOT shared with monk - fresh Portainer instance).

STILL PENDING (user action in Cloudflare dashboard):
- Tunnel ID: 5e60ea8e-a543-49b6-bab5-325f39441e00, Account: d0bb7673333fcd794622956f1662f785
- Add hostname `portainer.kitestacks.com` -> service `https://portainer:9443`, No TLS Verify

STILL PENDING (user action in both Portainer UIs): configure OAuth (see prior notes
in "Portainer SSO" section above for exact credentials).

Portal card update (3 files) also still pending until tunnel+OAuth done.

## Phase 2 Planned: Obsidian Mind Map → HTML Mind Map Sync

User wants to create an Obsidian mind map of the KiteStacks homelab that syncs/exports to a live HTML mind map embedded in the homelab portal or a standalone page. To be built after full Obsidian+samurai setup is complete.