- KiteStacks migration memory updated: OSticket live, Portainer SSO live on both monk+kscloud1, portainer.kitestacks.com HTTP 200, CF noTLSVerify fixed via API, auth code TTL bumped 1->10 min, Karakeep redirect_uri fixed
- Oracle Cloud ARM migration next: user is provisioning manually (Ampere A1, 4 OCPU, 24GB RAM). OSticket x86-only issue to solve on the Oracle side.
- CF API token kitestacks-dns-fix needs rolling (was exposed in chat)
- Portainer admin creds: monk=admin/n1t1MvVHCdcXWIIu, kscloud1=kenpat7177/same
- Added: feedback-forgejo-redaction, project-a-plus-core2 memories
| name | description | metadata |
|---|---|---|
| project-kitestacks-migration | Migration of the live KiteStacks homelab/website from assassin (T14) to monk — COMPLETE. Plus full Hetzner cloud failover (kscloud1, 5.78.233.28) — COMPLETE. All 9 subdomains can run from any single host. Plus 2026-06-10 portal/SSO push: portal FluxCD+coming-soon changes deployed, Karakeep SSO fixed, OpenProject SSO blocked by EE license, Portainer SSO Authentik-side done (pending user manual steps). | |
STATUS: MIGRATION + CLOUD FAILOVER COMPLETE (2026-06-10)
monk is the live production host. assassin (T14) is OFF. kscloud1 (Hetzner VPS, 5.78.233.28) is now a THIRD active Cloudflare Tunnel connector and runs a FULL replica of all 9 services, so the site stays up even if both monk and assassin are off (verified by user testing with home wifi off, from phone + mom's phone).
All 9 public subdomains (www, ai, auth, gitforge, grafana, kavita, links, status, tasks) verified returning correct status codes via the live tunnel with kscloud1 in rotation.
Governing principle (user's explicit words)
"leave the cloud backup on at all times" / "thats the point of it. if I am travelling my site will go down otherwise." -> kscloud1 runs PERMANENTLY as a 3rd connector, NOT cold standby. Cloudflare Tunnel load-balances ACTIVE-ACTIVE across all 3 connectors (no primary/backup priority). This means stateful apps (gitforge, openproject, authentik, karakeep, kavita, openwebui, etc.) may show DIFFERENT/STALE data depending on which connector serves a given request - EXPLICITLY ACCEPTED by user as the cost of guaranteed uptime. Fresh/separate databases on kscloud1 are fine; do not try to sync data between monk and kscloud1.
kscloud1 access
SSH: ssh -i ~/.ssh/id_ed25519_kscloud1 kenpat@5.78.233.28 (passwordless, key auth).
sudo needs a password ("p12217177") and has no askpass helper - avoid sudo;
most things are doable as kenpat or via docker.
All services live under /opt/kitestacks/docker/<service>/docker-compose.yml,
same one-dir-per-app pattern as monk's ~/kitestacks-live/docker/.
kscloud1 services deployed (all docker compose up -d, joined to local kitestacks network)
- cloudflared (3rd tunnel connector, same TUNNEL_TOKEN, connector id 78521d9f-71c0-4e3d-992f-bd1f77da1a8f)
- homepage-backup (alias homepage) + caddy + kitestacks-metrics-api-backup - PRE-EXISTING
- forgejo (alias forgejo) - PRE-EXISTING, separate DB from monk's (gitforge data inconsistent across connectors, accepted)
- prometheus + node-exporter (job kscloud1-node)
- grafana (alias grafana, port 3150) - Prometheus datasource (uid 000000001) + "Node Exporter Full" dashboard (id 1860) provisioned via ./provisioning/. OAuth->authentik config is present, but authentik on kscloud1 has a FRESH db (no provider apps configured), so OAuth login won't work there; local admin login works.
- uptime-kuma (alias status->uptime-kuma) - kuma.db seeded by copying monk's admin user (same login: kenpat / same password hash) + monitors: kscloud1 self-ping, Google DNS, and HTTP checks for all 9 *.kitestacks.com subdomains (external monitoring of the live site).
- kavita (alias kavita) - empty library (fresh)
- karakeep + karakeep-chrome + karakeep-meilisearch (alias karakeep) - fresh meilisearch/db
- authentik + authentik-worker + authentik-postgres + authentik-redis (alias auth) - FRESH DB. Bootstrap admin: akadmin@kitestacks.com / password 6KlYpfCyYxbnKQNiOewN (set via AUTHENTIK_BOOTSTRAP_PASSWORD in .env). No OAuth provider apps exist yet (they would need to be manually recreated in the authentik UI for grafana/openwebui/karakeep/openproject SSO to work when kscloud1 is the active backend).
- kite-litellm + kite-openwebui (alias ai->openwebui) - same .env/secrets as monk. OpenWebUI has ENABLE_SIGNUP=true (changed from monk's false) so kenpat can create a local admin account on first use, since authentik OAuth won't work with kscloud1's fresh authentik.
- openproject (alias tasks, host port 8090:80 - port 80 was taken by caddy) - FRESH db, self-initialized via the all-in-one image's bootstrap (took ~3-4 min). Empty/no projects yet.
monk-side changes made for cross-host monitoring
- ~/kitestacks-live/docker/prometheus/prometheus.yml: added scrape job kscloud1-node -> 5.78.233.28:9100 (kscloud1's node-exporter is exposed on 0.0.0.0:9100 with no firewall, so it is reachable from monk's public IP). monk's grafana (the live one; "Node Exporter Full" dashboard now provisioned via ~/kitestacks-live/docker/grafana/provisioning/) shows BOTH t14-node (monk/"this pc") and kscloud1-node ("the cloud") via the instance picker.
- kscloud1's prometheus only scrapes itself (kscloud1-node) - monk is behind home NAT and not reachable from kscloud1.
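To confirm from monk that both node jobs are actually being scraped, Prometheus's HTTP API lists scrape targets at /api/v1/targets. A minimal sketch that summarizes target health per job/instance; the Prometheus base URL is an assumption (its port isn't recorded in these notes) and the helper names are hypothetical:

```python
import json
import urllib.request


def target_health(targets_json: dict) -> dict:
    """Map (job, instance) -> health ("up", "down" or "unknown")."""
    health = {}
    for target in targets_json.get("data", {}).get("activeTargets", []):
        labels = target.get("labels", {})
        key = (labels.get("job", "?"), labels.get("instance", "?"))
        health[key] = target.get("health", "unknown")
    return health


def fetch_targets(prometheus_url: str, timeout: float = 5.0) -> dict:
    """GET <prometheus_url>/api/v1/targets and decode the JSON body."""
    url = prometheus_url.rstrip("/") + "/api/v1/targets"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def unhealthy(health: dict) -> list:
    """Targets that are not currently reporting "up", sorted for stable output."""
    return sorted(key for key, state in health.items() if state != "up")
```

Usage (assuming Prometheus's default port 9090 on monk): unhealthy(target_health(fetch_targets("http://localhost:9090"))) should come back empty when both t14-node and kscloud1-node are being scraped.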
Resource notes (kscloud1: 3 vCPU, 3.7GB RAM + 6GB swap, 75GB disk)
With all services running: ~2.8-2.9GB RAM used, ~2.6-2.8GB swap used (of 6GB), ~835MB-1.2GB "available", disk 29GB/75GB used. The site is functional but under memory pressure - if BOTH monk and assassin are down for an extended period with real concurrent usage, expect sluggishness (esp. openproject/authentik/openwebui). Not yet stress-tested under real failover load.
Key gotchas from THIS phase (cloud failover build-out)
- kscloud1's kitestacks Docker network is LOCAL/separate from monk's (same name, no conflict). cloudflared on each host resolves container names against its own host's network.
- Adding a new tunnel connector that lacks a backend for an ingress hostname -> 502 for requests routed there. If it has a DIFFERENT backend (e.g. forgejo) -> serves different data inconsistently. Both accepted/expected now that all 9 hostnames have backends on kscloud1.
- port 80 on kscloud1 is owned by caddy (serves www-backup/git-backup.kitestacks.com direct A-records, pre-existing, unrelated to the tunnel) - openproject uses 8090:80 for its host port instead (internal container port 8080 is what cloudflared hits).
- uptime-kuma / grafana have no simple file-based config API for monitors/datasources beyond grafana provisioning - used direct sqlite manipulation (docker exec ... sqlite3, or the python3 sqlite3 module via a throwaway python:3-alpine container with the volume mounted) to seed uptime-kuma's kuma.db with users/monitors.
- authentik first boot takes ~1-2 min (migrations); openproject first boot takes ~4-5 min (postgres initdb + Rails migrations + Puma boot) - watch docker logs for "Listening on http://0.0.0.0:8080" before testing.
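The "watch docker logs before testing" step can be scripted. A minimal sketch that polls docker logs for the readiness line quoted above; the container name (e.g. openproject) and the 600 s default timeout (roughly twice the observed 4-5 min boot) are assumptions, and the helper names are hypothetical:

```python
import subprocess
import time

# Readiness line quoted in these notes for OpenProject's first boot.
READY_MARKER = "Listening on http://0.0.0.0:8080"


def log_is_ready(log_text: str, marker: str = READY_MARKER) -> bool:
    """True once the readiness marker appears anywhere in the captured logs."""
    return marker in log_text


def wait_for_log_marker(container: str, marker: str = READY_MARKER,
                        timeout_s: float = 600, interval_s: float = 10) -> bool:
    """Poll `docker logs <container>` until the marker shows up or time runs out."""
    deadline = time.monotonic() + timeout_s
    while True:
        proc = subprocess.run(["docker", "logs", container],
                              capture_output=True, text=True, errors="replace")
        # docker logs replays the container's stdout and stderr separately; check both.
        if log_is_ready(proc.stdout + proc.stderr, marker):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)
```

Usage: wait_for_log_marker("openproject") returns True once Puma reports it is listening, so tests only start after boot has finished.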
Authentik/Kavita login fix (2026-06-10, post cloud-failover)
PROBLEM: Cloudflare Tunnel load-balances auth./kavita. active-active across monk
and kscloud1. kscloud1's authentik had only the fresh akadmin bootstrap user
(not kenpat7177) and kscloud1's kavita had ZERO users -> ~50% of requests showed
"wrong password" on authentik and a "create admin account" (signup) screen on
kavita instead of login. This contradicts the earlier "fresh DBs are fine"
assumption - for IDENTITY apps it breaks login, so it was NOT acceptable.
FIX APPLIED (one-time sync, same pattern as uptime-kuma's kuma.db seed):
- pg_dump'd monk's authentik-postgres authentik db (--clean --if-exists), scp'd it to kscloud1, stopped authentik+authentik-worker on kscloud1, restored via docker exec -i authentik-postgres psql -U authentik -d authentik < dump, restarted. Worked cleanly because AUTHENTIK_SECRET_KEY and PG_PASS were ALREADY IDENTICAL between monk's and kscloud1's authentik/.env.
- For kavita: copying the raw kavita.db file via plain cp produced "database disk image is malformed" (a WAL-mode db isn't standalone-consistent as a flat file copy, even when -wal/-shm look small). FIX: use python3 sqlite3 Connection.backup() (via a throwaway python:3-alpine container) to produce a consistent copy, THEN on kscloud1 stop kavita, rm the OLD kavita.db-shm/kavita.db-wal too (stale WAL files against a new db = same corruption error), copy in the new kavita.db (chown root:root, chmod 644 to match the original ownership - the kavita container runs as root), restart.
- Result: kscloud1 authentik now has kenpat7177 (matches monk), kscloud1 kavita now has kenpat7177 + acurrie (matches monk). Both connectors now return the same login screen/credentials. NOTE: this is a ONE-TIME sync, not continuous - if monk's users/passwords change later, kscloud1 will drift again and the same symptoms could return; re-run this sync if so.
- kscloud1 kavita's library entries point at /books paths that don't exist on kscloud1 (no actual book files there) - login works fine, but browsing the library when served by kscloud1 will show entries with missing files. Same "stale data" tradeoff as gitforge, accepted.
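The Connection.backup() step above, as a minimal sketch using only Python's standard sqlite3 module; the function name and paths are hypothetical, and in these notes it ran inside a throwaway python:3-alpine container with the Kavita config volume mounted:

```python
import sqlite3


def consistent_copy(src_path: str, dst_path: str) -> None:
    """Write a standalone, transactionally consistent copy of a SQLite db.

    Unlike a plain `cp` of the main file, the online-backup API reads through a
    normal connection, so committed pages still sitting in the -wal file of a
    WAL-mode database end up in the copy too.
    """
    src = sqlite3.connect(src_path)
    dst = sqlite3.connect(dst_path)
    try:
        with dst:
            src.backup(dst)
    finally:
        dst.close()
        src.close()
```

This only makes the copy itself consistent; on the destination host the stale kavita.db-wal / kavita.db-shm files still have to be removed (with kavita stopped) before dropping the new kavita.db in place, as described above.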
Authentik shared Postgres+Redis over Tailscale (2026-06-10) - fixes "invalid_grant" SSO
PROBLEM: Even after the one-time DB sync above, "Sign in with Authentik" on
Kavita could fail with "invalid_grant" / "Code does not exist". Root cause:
monk and kscloud1 each ran their OWN authentik-postgres. OAuth2 authorization
codes are short-lived per-flow rows in authentik_providers_oauth2_authorizationcode,
so if Cloudflare Tunnel's active-active routing sends /authorize to one connector and /application/o/token/ to the other, the code only exists in one of the two DBs -> invalid_grant. A one-time data sync can't fix this because the data is created fresh on every login attempt.
FIX: Converted to a single shared Postgres+Redis (HA pattern), hosted on kscloud1 and reachable ONLY over Tailscale:
- Installed Tailscale on both monk and kscloud1 (same tailnet). kscloud1's tailscale IP is 100.123.254.52.
- kscloud1's /opt/kitestacks/docker/authentik/docker-compose.yml: authentik-postgres now binds 100.123.254.52:5432:5432 (was unbound/internal-only), authentik-redis now binds 100.123.254.52:6379:6379. Both are still also reachable on the local kitestacks docker network for kscloud1's own authentik+worker. Backup of the pre-change file: docker-compose.yml.backup-before-shared-db-20260610-1138.
- monk's ~/kitestacks-live/docker/authentik/docker-compose.yml: REMOVED the postgresql and redis services entirely. monk's authentik/authentik-worker now point AUTHENTIK_POSTGRESQL__HOST and AUTHENTIK_REDIS__HOST at 100.123.254.52 (kscloud1 over Tailscale), using the same PG_PASS/AUTHENTIK_SECRET_KEY as before (already identical between hosts).
- monk's old local authentik-postgres/authentik-redis containers were STOPPED (not removed) - data dirs preserved under ~/kitestacks-live/docker/authentik/postgres in case of rollback, but no longer in use.
- Result: BOTH connectors' authentik+worker now read/write the SAME db/redis, regardless of which one handles /authorize vs /application/o/token/. Verified both authentik+authentik-worker healthy on monk and kscloud1, OIDC discovery docs identical, user list matches (kenpat7177 etc.) on both. CONFIRMED BY USER 2026-06-10: "Sign in with Authentik" on Kavita now works (when monk's connector serves the request).
Kavita "Sign in with Authentik" button missing on kscloud1 - FIXED 2026-06-10
After the shared-Authentik-DB fix above, the button still didn't appear when
Cloudflare routed kavita.kitestacks.com to kscloud1's connector. CAUSE: Kavita's
OIDC config lives in ITS OWN db (kavita.db ServerSetting table, Key=40, a JSON
blob with Authority/ClientId/Secret/Enabled), separate from Authentik's db.
The earlier one-time kavita.db sync (see fix above) was taken BEFORE OIDC SSO
was configured in monk's Kavita, so kscloud1's copy had Key=40 with empty
Authority/Secret and "Enabled":false. FIX: copied monk's Key=40 JSON value
verbatim into kscloud1's kavita.db (stop kavita, docker run --rm -v .../kavita/config:/data -v fix.sql:/fix.sql alpine + apk sqlite + sqlite3 /data/kavita.db < fix.sql with UPDATE ServerSetting SET Value='...' WHERE "Key"=40, restart kavita). NOTE: AspNetUserLogins (OIDC account-linkage table)
is empty on BOTH monk and kscloud1 - Kavita creates this row on first OIDC
login per-instance (matches existing local user by email since
ProvisionAccounts=false), so no extra action needed there.
GOTCHA: ServerSetting's PK column is "Key" (INTEGER), not Id - must quote
it in sqlite ("Key") since KEY is a SQL reserved word.
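Since direct writes to this row get overwritten by Kavita on restart (see the UPDATE 2026-06-10 note below), the quoting gotcha matters most for read-only inspection, e.g. checking what Kavita actually persisted. A minimal sketch, assuming the ServerSetting layout described in these notes ("Key" INTEGER, Value holding a JSON blob at Key=40); the function name is hypothetical:

```python
import json
import sqlite3

OIDC_SETTING_KEY = 40  # per these notes: Key=40 holds Kavita's OIDC JSON blob


def read_oidc_setting(db_path: str) -> dict:
    """Return the decoded OIDC setting, or {} if the row or value is missing."""
    # Read-only URI so the inspection itself cannot modify kavita.db.
    conn = sqlite3.connect(f"file:{db_path}?mode=ro", uri=True)
    try:
        row = conn.execute(
            # "Key" is double-quoted because KEY is an SQL keyword.
            'SELECT Value FROM ServerSetting WHERE "Key" = ?', (OIDC_SETTING_KEY,)
        ).fetchone()
    finally:
        conn.close()
    return json.loads(row[0]) if row and row[0] else {}
```

Usage: read_oidc_setting("/data/kavita.db").get("Enabled") shows whether the OIDC config survived the last restart (the /data mount path is illustrative).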
DRIFT WARNING: any future Kavita server-setting change (OIDC config, library
paths, etc.) made on monk will NOT propagate to kscloud1's kavita.db
automatically - same one-time-sync caveat as the user-table sync above.
UPDATE 2026-06-10 (RESOLVED via Kavita UI, not direct DB edit): Direct SQL
edits to ServerSetting Key=40 got WIPED back to disabled/empty by Kavita on
every container restart (RowVersion incremented +2 each time, Authority/Secret
cleared, Enabled->false) - confirmed twice, even with a full WAL-consistent
kavita.db replace from monk. Direct DB writes to this table do NOT survive a
restart; only saves through Kavita's own Settings UI/API persist correctly.
FIX: opened an SSH local port-forward (ssh -L 5099:localhost:5000 kenpat@5.78.233.28) so the user could reach kscloud1's Kavita directly at
http://localhost:5099 (bypassing the Cloudflare load-balanced domain), logged
in with their normal kenpat7177 Kavita password, and re-entered the OIDC
config in Settings -> OIDC:
- Authority: https://auth.kitestacks.com/application/o/kavita/ (MUST include the trailing slash - Kavita validates that this exactly matches the issuer claim in Authentik's .well-known/openid-configuration, which has a trailing slash. Without it: "Kavita can load the OIDC configuration, but the issuer does not match".)
- Client ID: kavita
- Client Secret: (96-hex-char secret from Authentik's Kavita OAuth2 provider - watch for copy/paste truncation, verify length=96)
- Enabled: true
- ProviderName: authentik
Saved via UI (RowVersion 8->12->14 across two saves to fix a 1-char-truncated secret), then docker compose restart kavita on kscloud1 - the config SURVIVED this restart (unlike the direct-SQL attempts) and /api/settings/oidc now reports "enabled": true. SSH tunnel closed afterward (no firewall changes were made/needed).
Set a temporary ApiKey on kenpat7177's kscloud1 kavita account during troubleshooting (for a Plugin/authenticate attempt that turned out to return 401 / unused) - left in place, harmless (grants API access to that user's own account only).
TAKEAWAY FOR FUTURE KAVITA CONFIG CHANGES ON KSCLOUD1: always use the Kavita UI (via SSH port-forward to localhost:5000) rather than direct sqlite edits - direct edits to ServerSetting do not survive a restart.
CONFIRMED BY USER 2026-06-10: "Sign in with Authentik" now works on Kavita regardless of which connector (monk/kscloud1) answers.
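Both pitfalls above (missing trailing slash, truncated secret) can be checked before saving. A minimal sketch; the helper names are hypothetical, and the 96-character length is taken from these notes rather than from any Authentik guarantee:

```python
import json
import urllib.request
from typing import List


def fetch_issuer(authority: str, timeout: float = 5.0) -> str:
    """Read the `issuer` field from the provider's OIDC discovery document."""
    url = authority.rstrip("/") + "/.well-known/openid-configuration"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)["issuer"]


def check_kavita_oidc(authority: str, issuer: str, client_secret: str,
                      expected_secret_len: int = 96) -> List[str]:
    """Return human-readable problems; an empty list means both checks pass."""
    problems = []
    # Exact string comparison, mirroring Kavita: a missing trailing slash is a mismatch.
    if authority != issuer:
        problems.append(f"authority {authority!r} does not equal issuer {issuer!r}")
    if len(client_secret) != expected_secret_len:
        problems.append(
            f"client secret is {len(client_secret)} chars, expected {expected_secret_len}")
    return problems
```

Usage: with a = "https://auth.kitestacks.com/application/o/kavita/", check_kavita_oidc(a, fetch_issuer(a), secret) should return [] before the values are entered in Kavita's Settings -> OIDC.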
Kavita cover images missing on kscloud1 - FIXED 2026-06-10
After the kavita.db sync from monk, kscloud1's db referenced cover image files
(e.g. v1_c1.png..v10_c10.png in ServerSetting/Series.CoverImage) that
didn't exist on kscloud1's filesystem - kscloud1's config/covers/ dir was
empty (monk has 9 files, ~1.3MB). Result: book/series cover thumbnails didn't
load when kscloud1 served the request. FIX: tar'd monk's
~/kitestacks-live/docker/kavita/config/covers/ (owned 1000:1000), scp'd to
kscloud1, extracted into /opt/kitestacks/docker/kavita/config/covers/ via a
throwaway alpine container, chown -R 1000:1000. No kavita restart needed -
covers are served as static files from disk. CONFIRMED BY USER: covers now
load correctly.
NOTE: this is another one-time sync (same drift caveat) - if new books/covers
are added on monk later, they won't appear on kscloud1 unless re-synced
(covers/ dir + kavita.db + actual book files under library/books, none of
which exist on kscloud1 per the earlier "stale data" note).
SECURITY NOTE (shared Authentik Postgres/Redis over Tailscale): postgres/redis on kscloud1 are bound to the Tailscale interface
IP only (100.123.254.52), not 0.0.0.0 - not exposed to the public internet.
ROLLBACK (shared Authentik Postgres/Redis): if Tailscale connectivity ever breaks, monk's authentik will fail to
start (can't reach 100.123.254.52). To roll back: restore monk's
docker-compose.yml from git/backup to use local postgresql/redis services
again, restart monk's old authentik-postgres/authentik-redis containers
(docker start authentik-postgres authentik-redis in
~/kitestacks-live/docker/authentik), then docker compose up -d. Note that
monk's authentik db would then be STALE (kscloud1's shared db has any logins/
changes since 2026-06-10), so do a fresh pg_dump sync from kscloud1 first.
kscloud1 ufw blocks docker-bridge -> host port 8000 (metrics API) - FIXED 2026-06-10
kscloud1 has ufw active with default deny incoming/routed. The
kitestacks-metrics-api-backup container (network_mode: host, binds 0.0.0.0:8000)
was unreachable from homepage-backup via host.docker.internal:8000 (TCP
timeout, not refused -> ufw drop), causing the homepage System Status widget to
show 0%/"Offline" when kscloud1 served the request. FIXED by adding:
sudo ufw allow from 172.16.0.0/12 to any port 8000 proto tcp (covers all
docker bridge subnets on this host: 172.17-172.29.x.x). Verified
homepage-backup -> host.docker.internal:8000/api/metrics now returns real
CPU/RAM/storage/network data. kscloud1 sudo password is in the "kscloud1
access" section above - needed echo PASS | sudo -S <cmd> (no askpass helper,
non-interactive sudo via -S works fine).
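The ufw rule works because every Docker bridge subnet on this host sits inside 172.16.0.0/12 (172.16.0.0 through 172.31.255.255). A minimal sketch using Python's standard ipaddress module to check that; the function name is hypothetical, and the subnets to feed it would come from something like docker network inspect:

```python
import ipaddress
from typing import Iterable, List

# Source range of: sudo ufw allow from 172.16.0.0/12 to any port 8000 proto tcp
UFW_RULE_SOURCE = ipaddress.ip_network("172.16.0.0/12")


def uncovered_subnets(bridge_subnets: Iterable[str]) -> List[str]:
    """Return the Docker bridge subnets that fall OUTSIDE the ufw rule's source range."""
    return [s for s in bridge_subnets
            if not ipaddress.ip_network(s).subnet_of(UFW_RULE_SOURCE)]
```

A non-empty result means a container on that network would hit the same ufw drop. Docker's default address pools may also hand out 192.168.x.x subnets once the 172.x ranges are used up, and those would not be covered by this rule.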
Portal SSO/coming-soon push + Karakeep fix + OpenProject EE blocker + Portainer SSO setup (2026-06-10)
Per user's original request: SSO for grafana/prometheus/portainer/cloudflare/uptime-kuma (authentik=source), KITEAI/LITELLM/OPENROUTER -> "coming soon", SSO for openproject/forgejo/ karakeep, FluxCD card in AI&Automation. Hard constraint: user does NOT use Cloudflare Zero Trust/Access ("costs money") - any Cloudflare work must avoid those products (note: the Tunnel UI itself lives under Cloudflare's "Zero Trust" dashboard section, but configuring a Tunnel public hostname there is NOT the same as enabling Zero Trust/Access - fine to use).
Portal UI changes - DEPLOYED to all 3 copies, verified live
Edited the AI & AUTOMATION panel (cards cards-3 -> cards cards-2, now 2x2):
Kite AI and OpenRouter cards changed from external links to
href="#" data-coming-soon="1" (LiteLLM was already coming-soon); added a 4th
card "FluxCD" / "GitOps Automation" using /images/icons/fluxcd.png, also
coming-soon (automation scripts with FluxCD+Prometheus+Grafana+node-exporter
are a future project). Applied identically to:
- ~/kitestacks-live/docker/kitestacks-portal-test/public/index.html (monk, dev, port 3008)
- ~/kitestacks-live/docker/kitestacks-portal/public/index.html (monk, LIVE, served by the "homepage" container 3005->3000 - this is the file that backs www.kitestacks.com)
- /opt/kitestacks/docker/www-backup/kitestacks-portal/public/index.html (kscloud1, served by homepage-backup on port 3015)
Verified https://www.kitestacks.com returns "FluxCD" consistently (6/6 requests across both connectors).
NOTE: the Portainer card on the live portal is currently data-coming-soon="1" - update it to a real href="https://portainer.kitestacks.com" link (remove data-coming-soon) once the Portainer SSO manual steps below are completed.
NOTE 2: "cloudflare should all be in the networking side" from the original request was never resolved - the Cloudflare card is still in the INFRASTRUCTURE panel, not moved/renamed to a "NETWORKING" panel. Ambiguous, deprioritized, not revisited.
Karakeep SSO redirect_uri fix - DONE, confirmed working
Karakeep uses NextAuth.js with provider id "custom" (not "authentik") - actual
OAuth callback path is /api/auth/callback/custom, but Authentik's Karakeep
OAuth2Provider's _redirect_uris had the wrong path -> "Redirect URI Error".
FIX: direct Postgres UPDATE to
authentik_providers_oauth2_oauth2provider._redirect_uris (JSON column) on
the shared kscloud1 authentik-postgres (100.123.254.52), wrapped in explicit
BEGIN; UPDATE ...; COMMIT; (a bare single-statement -c "UPDATE..." reported
"UPDATE 1" but did NOT persist on first attempt - cause unclear, explicit
transaction fixed it). After the DB write, restarted authentik+authentik-worker
on BOTH monk and kscloud1 and polled
docker inspect --format '{{.State.Health.Status}}' until both reported
"healthy" (~50s) before retesting - first retest hit a transient 502 because
kscloud1's authentik was still "starting". CONFIRMED: Authentik now serves the
login page (not "Redirect URI Error") for Karakeep SSO.
PG_PASS GOTCHA: ~/kitestacks-live/docker/authentik/.env PG_PASS value ends in
= - extract with cut -d= -f2- (NOT -f2, which truncates the trailing =
and causes "password authentication failed").
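The same first-"=" rule applies to any script that reads these .env files, and the earlier one-time Authentik sync only worked because PG_PASS and AUTHENTIK_SECRET_KEY matched on both hosts. A minimal sketch, assuming plain KEY=value lines (no quoting or export syntax); the function names are hypothetical:

```python
from typing import Iterable, List, Optional


def read_env_value(env_text: str, name: str) -> Optional[str]:
    """Return NAME's value from .env text, splitting on the FIRST '=' only."""
    for raw in env_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)  # same effect as `cut -d= -f2-`
        if key.strip() == name:
            return value
    return None


def mismatched_keys(env_a: str, env_b: str,
                    names: Iterable[str] = ("PG_PASS", "AUTHENTIK_SECRET_KEY")) -> List[str]:
    """Names whose values are missing in env_a or differ between the two files."""
    return [n for n in names
            if read_env_value(env_a, n) is None
            or read_env_value(env_a, n) != read_env_value(env_b, n)]
```

Usage: run mismatched_keys on monk's and kscloud1's authentik/.env contents before any cross-host DB restore; an empty list means the shared-secret precondition holds.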
REUSABLE PATTERN for any future direct Authentik DB edit: (1) wrap writes in
explicit BEGIN/COMMIT, (2) restart authentik+authentik-worker on BOTH monk and
kscloud1, (3) wait for health=healthy on both before testing.
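Step (3) can be scripted with the same docker inspect --format '{{.State.Health.Status}}' check used above. A minimal sketch; the helper names, the 300 s timeout, and the optional ssh_target (e.g. a hypothetical ~/.ssh/config alias for kscloud1) are assumptions:

```python
import shlex
import subprocess
import time
from typing import Callable, Iterable, Optional


def container_health(container: str, ssh_target: Optional[str] = None) -> str:
    """Return Docker's health status for a container, optionally on a remote host via ssh."""
    cmd = ["docker", "inspect", "--format", "{{.State.Health.Status}}", container]
    if ssh_target:
        # ssh hands one command string to the remote shell, so quote it once here.
        cmd = ["ssh", ssh_target, shlex.join(cmd)]
    proc = subprocess.run(cmd, capture_output=True, text=True)
    return proc.stdout.strip() if proc.returncode == 0 else "missing"


def wait_all_healthy(containers: Iterable[str],
                     status_fn: Callable[[str], str] = container_health,
                     timeout_s: float = 300, interval_s: float = 5,
                     sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll until every container reports "healthy"; return False on timeout."""
    deadline = time.monotonic() + timeout_s
    pending = set(containers)
    while True:
        pending = {c for c in pending if status_fn(c) != "healthy"}
        if not pending:
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(interval_s)
```

Usage: on monk, wait_all_healthy(["authentik", "authentik-worker"]), and for kscloud1 pass status_fn=functools.partial(container_health, ssh_target="kscloud1"); retest SSO only after both return True.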
OpenProject SSO - config bug fixed, but BLOCKED by Enterprise licensing (no further action possible)
~/kitestacks-live/docker/openproject/docker-compose.yml env vars were wrong in
two ways: (1) extra "PROVIDERS_" segment in var names caused
seed_oidc_provider = {"providers": {"authentik": {...}}} instead of
{"authentik": {...}}, producing a broken stub provider record (slug=
"providers", id=1, since deleted via Rails runner); (2) discovery_endpoint
isn't read by ConfigurationMapper at all - replaced with explicit
ISSUER/AUTHORIZATION__ENDPOINT/TOKEN__ENDPOINT/USERINFO__ENDPOINT/
END__SESSION__ENDPOINT/JWKS__URI vars (current docker-compose.yml has the
corrected version, see file - all derived from
https://auth.kitestacks.com/application/o/openproject/.well-known/openid-configuration).
After fixing both, the seeder correctly creates provider slug="authentik",
available=true, all fields correct - BUT the SSO button still does not appear
on /login. CONFIRMED ROOT CAUSE (terminal, source-code-verified): OpenProject
CE 2025/v15's OmniAuth SSO strategy
(OpenProject::Plugins::AuthPlugin/OpenIDConnect) AND SAML
(auth_saml/lib/open_project/auth_saml/engine.rb, enterprise_feature: "sso_auth_providers") are BOTH gated behind an Enterprise Edition license -
"OmniAuth SSO strategy ... is only available for Enterprise Editions". No
app/config-level workaround exists. Only remaining options: buy EE license, OR
put a forward-auth proxy (oauth2-proxy / Authentik embedded outpost) in front
of OpenProject - DEFERRED along with Prometheus/Uptime Kuma proxy work (see
below) until Oracle VPS topology is decided.
OpenProject container is healthy, /login returns 200, no projects yet.
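The corrected env vars were all derived from Authentik's discovery document; that derivation can be sketched in a few lines. The key-to-suffix mapping follows the var list above (issuer, authorization_endpoint, token_endpoint, userinfo_endpoint, end_session_endpoint, jwks_uri are standard OIDC discovery fields), but the variable-name prefix is deliberately a parameter because these notes don't record it; copy it from the corrected docker-compose.yml and make sure it has no extra PROVIDERS_ segment. Function names are hypothetical:

```python
import json
import urllib.request

# Discovery-document key -> env-var suffix, per the corrected var list in these notes.
SUFFIX_FOR_KEY = {
    "issuer": "ISSUER",
    "authorization_endpoint": "AUTHORIZATION__ENDPOINT",
    "token_endpoint": "TOKEN__ENDPOINT",
    "userinfo_endpoint": "USERINFO__ENDPOINT",
    "end_session_endpoint": "END__SESSION__ENDPOINT",
    "jwks_uri": "JWKS__URI",
}


def fetch_discovery(url: str, timeout: float = 5.0) -> dict:
    """Download and decode an OIDC discovery document."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def env_from_discovery(discovery: dict, prefix: str) -> dict:
    """Build {prefix + suffix: value}; fail loudly if any expected field is absent."""
    missing = [key for key in SUFFIX_FOR_KEY if key not in discovery]
    if missing:
        raise KeyError(f"discovery document lacks {missing}")
    return {prefix + suffix: discovery[key] for key, suffix in SUFFIX_FOR_KEY.items()}
```

Usage: env_from_discovery(fetch_discovery("https://auth.kitestacks.com/application/o/openproject/.well-known/openid-configuration"), prefix) yields the six values to paste into the compose file. This doesn't change the EE-license blocker above; it only keeps the env vars consistent with Authentik.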
Portainer SSO - Authentik side DONE, two manual steps PENDING (not yet done by user)
Per user: "yes continue with portainer" / "yes but make sure it is still
secure" (approved exposing Portainer publicly via a NEW Cloudflare Tunnel
hostname, with explicit requirement to keep it secure -> access restricted to
the homelab-admin Authentik group).
Created via docker exec authentik ak shell (Django ORM, no Authentik API
token configured) on kscloud1's shared authentik-postgres:
- OAuth2Provider "Portainer": client_id=portainer, client_secret=wTim3mrMwt34ko1RYMvK1RNnjwWOMi_d4r4cS6exr7DjozCrL5zKthHl-5KjargF, provider_id=9, redirect_uri=https://portainer.kitestacks.com (strict), scopes openid/email/profile, sub_mode=user_email, signing key + flows copied from existing providers (same pattern as Karakeep/Grafana).
- Application "Portainer" (slug="portainer", meta_launch_url=https://portainer.kitestacks.com).
- PolicyBinding restricting the Portainer application to Authentik group homelab-admin (UUID e21b0aa5-62e7-4b3a-8302-130b0ae148a5) - this is the "make sure it is still secure" piece (only homelab-admin members can SSO in).
- Verified the discovery doc resolves: https://auth.kitestacks.com/application/o/portainer/.well-known/openid-configuration.
PENDING MANUAL STEPS (user must do via UI - confirmed portainer.kitestacks.com still returns 000 as of 2026-06-10):
- Cloudflare dashboard -> Tunnel -> add Public Hostname portainer.kitestacks.com -> service https://portainer:9443 (HTTPS), enable "No TLS Verify". (This is in the Tunnel config UI, which Cloudflare happens to host under the "Zero Trust" nav section, but adding a Tunnel hostname is NOT enabling Zero Trust/Access - does not violate the no-Zero-Trust constraint.)
- In Portainer -> Settings -> Authentication -> OAuth (Provider: Custom), on BOTH monk's and kscloud1's SEPARATE Portainer instances, configure:
  - Client ID: portainer
  - Client Secret: wTim3mrMwt34ko1RYMvK1RNnjwWOMi_d4r4cS6exr7DjozCrL5zKthHl-5KjargF
  - Authorization URL: https://auth.kitestacks.com/application/o/authorize/
  - Access Token URL: https://auth.kitestacks.com/application/o/token/
  - Resource/Userinfo URL: https://auth.kitestacks.com/application/o/userinfo/
  - Redirect URL: https://portainer.kitestacks.com
  - Logout URL: https://auth.kitestacks.com/application/o/portainer/end-session/
  - Scopes: openid email profile, User identifier claim: email
AFTER both steps are done: update the live portal's Portainer card (in the 3 files above) from data-coming-soon="1" to a real href="https://portainer.kitestacks.com" target="_blank" rel="noopener" link.
App-level SSO status summary (end of 2026-06-10 session)
Grafana: working (pre-existing). Forgejo: working (pre-existing). Karakeep: fixed this session, working. OpenProject: blocked by EE license (terminal at app level). Portainer: Authentik side done, waiting on user's 2 manual steps above. Prometheus + Uptime Kuma: DEFERRED - neither has native OAuth, need a forward-auth proxy (oauth2-proxy or Authentik embedded outpost) - deferred per user's "ok lets do smaller app level" (hold new infra until Oracle VPS decided). Cloudflare itself: no SSO concept applicable (it's Cloudflare's own dashboard login) - was always about the portal's Cloudflare card placement, see "Portal UI changes" note above.
Oracle VPS migration - PLANNED, upcoming (stated 2026-06-11)
User confirmed on 2026-06-11: "we are going to switch things soon from hetzner cloud to oracle soon." -> kscloud1 (Hetzner, 5.78.233.28) is intended to be REPLACED by an Oracle Cloud VPS in the near future ("soon", no firm date yet). Originally raised 2026-06-10 as exploratory ("how easy would it be to move everything to oracle vps after?"), now an actual plan. Implication: avoid investing further one-off/manual config work that's hard to redo (e.g. more one-time DB syncs, hand-edited sqlite, etc.) on kscloud1 if avoidable - prefer changes that are easy to replicate on a new host. When the Oracle VPS is provisioned, plan to follow the same pattern as the kscloud1 cloud-failover build-out (new Cloudflare Tunnel connector + full service replicas + shared Authentik/Postgres/Redis over Tailscale + the Forgejo FORGEJO_API_BASE-over-Tailscale pattern for the portal's Recent Activity, see "Recent Activity" fix below) - then retire kscloud1 the same way assassin/T14 was retired (decommission once Oracle replica verified working).
Prior migration gotchas (monk, kept for reference - see git history/old notes if needed)
- rsync --files-from recursion bug.
- bind-mount postgres dirs come over empty as non-root (use pg_dumpall/pg_dump --clean from the running container instead).
- pg_dumpall --clean across template1 breaks on client/server version mismatch (use single-db pg_dump+psql instead).
- grafana data dir needs chown 472:472.
- kite-litellm needed a manual docker network connect kitestacks kite-litellm.
2026-06-12: SSO fixes + Portainer deployed on kscloud1
Root cause: monk reconnect race condition
When monk goes offline (user travels) and reconnects, Cloudflare starts routing
some token exchange requests to monk while codes were created on kscloud1 during
the offline window. Auth codes had a 60-second TTL, which expired before monk's
Authentik fully started (~5 min startup). FIX: increased access_code_validity
from minutes=1 to minutes=10 for ALL 9 OAuth2 providers in the shared Postgres
DB. This gives enough buffer for monk's containers to start before codes expire.
Command used (via python:3-alpine container):
docker run --rm --network host -v /tmp/fix_auth.py:/fix.py python:3-alpine sh -c ...
connecting to shared Postgres at 100.123.254.52.
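The contents of fix_auth.py aren't recorded here. As an alternative sketch (not the script that was actually run), the same change can be piped into psql inside the shared authentik-postgres container on kscloud1, using the explicit BEGIN/COMMIT from the reusable pattern above. The table name and the access_code_validity column/value format come from these notes; the container/user/db names mirror the earlier restore command; the function names are hypothetical:

```python
import subprocess


def validity_update_sql(new_value: str = "minutes=10") -> str:
    """SQL script updating access_code_validity for ALL OAuth2 providers in one transaction."""
    if "'" in new_value:
        raise ValueError("quote characters are not allowed in the validity string")
    return (
        "BEGIN;\n"
        "UPDATE authentik_providers_oauth2_oauth2provider "
        f"SET access_code_validity = '{new_value}';\n"
        "COMMIT;\n"
    )


def run_psql(sql: str, container: str = "authentik-postgres",
             user: str = "authentik", db: str = "authentik") -> subprocess.CompletedProcess:
    """Pipe a script into psql inside the container; ON_ERROR_STOP aborts before COMMIT on error."""
    return subprocess.run(
        ["docker", "exec", "-i", container, "psql", "-U", user, "-d", db,
         "-v", "ON_ERROR_STOP=1"],
        input=sql, capture_output=True, text=True, check=True,
    )
```

Usage: run_psql(validity_update_sql()) on kscloud1, then restart authentik+authentik-worker on BOTH hosts and wait for healthy before testing, per the reusable pattern.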
Karakeep redirect_uri reverted and re-fixed
The Karakeep OAuth2Provider _redirect_uris had reverted back to the proxy pattern
(/outpost.goauthentik.io/callback?...) instead of the correct NextAuth callback
(https://links.kitestacks.com/api/auth/callback/custom). This caused "Redirect URI
Error" from Authentik whenever SSO was attempted. Root cause unknown (possibly an
Authentik blueprint or UI save that regenerated/overrode the field). FIX: same
Postgres UPDATE pattern. WATCH: if this reverts again, check Authentik blueprints
or if someone modified the Karakeep provider via the Authentik admin UI.
Portainer deployed on kscloud1
Created /opt/kitestacks/docker/portainer/docker-compose.yml (same image/config as
monk's portainer). Container running as portainer, port 9443:9443, on kitestacks
network. Volume is local (NOT shared with monk - fresh Portainer instance).
STILL PENDING (user action in Cloudflare dashboard):
- Tunnel ID: 5e60ea8e-a543-49b6-bab5-325f39441e00, Account: d0bb7673333fcd794622956f1662f785
- Add hostname portainer.kitestacks.com → service https://portainer:9443, No TLS Verify
STILL PENDING (user action in both Portainer UIs): configure OAuth (see prior notes in the "Portainer SSO" section above for exact credentials). The portal card update (3 files) is also still pending until tunnel+OAuth are done.
Phase 2 Planned: Obsidian Mind Map → HTML Mind Map Sync
User wants to create an Obsidian mind map of the KiteStacks homelab that syncs/exports to a live HTML mind map embedded in the homelab portal or a standalone page. To be built after full Obsidian+samurai setup is complete.