2026-06-12: full KiteStacks session sync
- KiteStacks migration memory updated: OSticket live, Portainer SSO live on both monk+kscloud1, portainer.kitestacks.com HTTP 200, CF noTLSVerify fixed via API, auth code TTL bumped 1->10min, Karakeep redirect_uri fixed - Oracle Cloud ARM migration next: user provisioning manually (Ampere A1, 4 OCPU, 24GB RAM). OSticket x86-only issue to solve on Oracle side. - CF API token kitestacks-dns-fix needs rolling (was exposed in chat) - Portainer admin creds: monk=admin/n1t1MvVHCdcXWIIu, kscloud1=kenpat7177/same - Added: feedback-forgejo-redaction, project-a-plus-core2 memories
parent 00145b1657
commit fcd8def71a
4 changed files with 511 additions and 5 deletions
@ -1,5 +1,3 @@
# Memory Index

- [KiteStacks migration + Hetzner cloud failover (COMPLETE)](project-kitestacks-migration.md) — monk primary, kscloud1 cloud replica, Oracle VPS coming. 2026-06-12 DONE: OSticket live, Portainer SSO live on both hosts (portainer.kitestacks.com HTTP 200, noTLSVerify fixed via CF API), docs v1.4.0 in Forgejo. NEXT: Oracle Cloud ARM VPS (user provisioning manually — 4 OCPU 24GB Ampere A1). OSticket is x86-only so needs swap for Oracle ARM. CF API token kitestacks-dns-fix needs rolling (was exposed in chat).
- [Forgejo doc redaction rule](feedback-forgejo-redaction.md) — always redact IPs, ports, and passwords in any homelab Forgejo repo files before committing.
- [Claude memory sync](project_claude_memory_sync.md) — claude-memory Forgejo repo for cross-device context
- [A+ Core 2 study plan](project-a-plus-core2.md) — exam goal June 28 2026, started 2026-06-11 9:15 PM, Professor Messer diagnostic first, CertMaster next week.
- [Cyberpunk wallpaper project](project_cyberpunk_wallpaper.md) — Rainmeter/Wallpaper Engine dashboard on samurai
- [Periodic memory commits](feedback_periodic_memory_commits.md) — push memory updates to claude-memory repo throughout long sessions, not just at the end
feedback-forgejo-redaction.md (Normal file, 18 lines changed)
@ -0,0 +1,18 @@
---
name: feedback-forgejo-redaction
description: "Always redact IPs, ports, and passwords in any files committed to the homelab Forgejo repo"
metadata:
node_type: memory
type: feedback
originSessionId: 20e70bfb-0880-4ec4-aece-a21855bb3dfe
---

Always redact IPs, ports, and passwords before committing or editing any file in the KiteStacks homelab Forgejo repo (kitestacks-homelab). This applies to all documents: RUNBOOK.md, docs/, projects/, DEBUG-DOCUMENTATION.md, README.md, etc.

**Why:** Security — user does not want real infrastructure details (IPs, port bindings, credentials) in the public Forgejo repository.

**How to apply:**
- IPs → descriptive placeholders like `<KSCLOUD1_PUBLIC_IP>`, `<MONK_LAN_IP>`, `<KSCLOUD1_TAILSCALE_IP>`, etc.
- Port numbers in host bindings, IP:port combos, explicit app URLs → `<port>` placeholder
- Passwords, sudo passwords, OAuth secrets → `<password>` or descriptive placeholder like `<KSCLOUD1_SUDO_PASSWORD>`
- Apply proactively when writing new content for these docs, not just on request
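A minimal sketch of how the placeholder substitution in the rule above could be automated as a pre-commit pass. The `KNOWN_HOSTS` mapping, the function name `redact`, and the generic `<IP>` fallback are assumptions made for illustration, and the addresses are taken from the reserved documentation ranges rather than from any real host. Passwords and OAuth secrets are deliberately not handled: they have no reliable pattern and still need a manual review.

```python
import re

# Hypothetical mapping from known addresses to descriptive placeholders.
# The IPs below are documentation-range examples, not real infrastructure.
KNOWN_HOSTS = {
    "203.0.113.10": "<KSCLOUD1_PUBLIC_IP>",
    "192.0.2.20": "<MONK_LAN_IP>",
}

# An IPv4 address, optionally followed by ":port".
IPV4_WITH_OPTIONAL_PORT = re.compile(r"\b((?:\d{1,3}\.){3}\d{1,3})(:\d{1,5})?\b")


def redact(text: str) -> str:
    """Replace IPv4 addresses (and any attached port) with placeholders."""

    def _substitute(match: re.Match) -> str:
        ip, port = match.group(1), match.group(2)
        placeholder = KNOWN_HOSTS.get(ip, "<IP>")  # unknown hosts get a generic tag
        return placeholder + (":<port>" if port else "")

    return IPV4_WITH_OPTIONAL_PORT.sub(_substitute, text)
```

Three-part version strings such as `v1.4.0` are left alone because the pattern requires four dotted groups.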
project-a-plus-core2.md (Normal file, 49 lines changed)
@ -0,0 +1,49 @@
---
name: project-a-plus-core2
description: "A+ Core 2 study plan and progress tracking — exam goal June 28, 2026"
metadata:
node_type: memory
type: project
originSessionId: 20e70bfb-0880-4ec4-aece-a21855bb3dfe
---

## A+ Core 2 Study Progress

**Exam goal:** Before July 4th week (preferred ~June 28), hard deadline July 12, 2026
**July 4th week:** Time off — buffer week if needed, or use for final prep
**Strategy:** Monitor readiness via practice tests, don't sit the real exam until consistently hitting 85%+
**Study started:** 2026-06-11 at 9:15 PM
**Strategy:** Diagnostic test first, then focus on weak areas

**Why:** June 28 is achievable at 3.5 hours/day. User passed Core 1 with highest score in class of 22.

## Study Log

| Date | Activity | Notes |
|------|----------|-------|
| 2026-06-11 | Started Core 2 study, 9:15 PM | Took Sybex diagnostic practice exam — scored 50% (50/100) |

## Planned Tests
- **This week (started 2026-06-11):** Professor Messer practice exam (diagnostic — taken cold first)
- **Next week:** CompTIA CertMaster practice test

## Study Plan (17 days)
- Days 1–4: Operating Systems domain
- Days 5–8: Security domain
- Days 9–11: Software Troubleshooting
- Days 12–13: Operational Procedures
- Days 14–15: Full timed practice exams
- Day 16: Weak area review only
- Day 17 (June 28): Exam

## Key Weak Areas to Watch (common for homelab/Linux users)
- Windows command line tools (sfc, DISM, chkdsk, bootrec, diskpart)
- Malware types and removal procedures (pure memorization)

## Resources
- Professor Messer A+ Core 2 (YouTube + paid practice exams)
- CompTIA CertMaster (next week)
- Jason Dion practice exams (Udemy backup)
- r/CompTIA, Professor Messer Discord for group study

**How to apply:** When user mentions Core 2 or exam prep, reference this log and check in on progress toward June 28 goal.
project-kitestacks-migration.md (Normal file, 441 lines changed)
@ -0,0 +1,441 @@
---
name: project-kitestacks-migration
description: "Migration of the live KiteStacks homelab/website from assassin (T14) to monk — COMPLETE. Plus full Hetzner cloud failover (kscloud1, 5.78.233.28) — COMPLETE. All 9 subdomains can run from any single host. Plus 2026-06-10 portal/SSO push: portal FluxCD+coming-soon changes deployed, Karakeep SSO fixed, OpenProject SSO blocked by EE license, Portainer SSO Authentik-side done (pending user manual steps)."
metadata:
node_type: memory
type: project
originSessionId: 33992890-3940-4d4a-a94a-22b5621e9c1a
---

## STATUS: MIGRATION + CLOUD FAILOVER COMPLETE (2026-06-10)

monk is the live production host. assassin (T14) is OFF. kscloud1 (Hetzner VPS,
5.78.233.28) is now a THIRD active Cloudflare Tunnel connector and runs a FULL
replica of all 9 services, so the site stays up even if both monk and assassin
are off (verified by user testing with home wifi off, from phone + mom's phone).

All 9 public subdomains (www, ai, auth, gitforge, grafana, kavita, links, status, tasks)
verified returning correct status codes via the live tunnel with kscloud1 in rotation.

## Governing principle (user's explicit words)
"leave the cloud backup on at all times" / "thats the point of it. if I am
travelling my site will go down otherwise." -> kscloud1 runs PERMANENTLY as a
3rd connector, NOT cold standby. Cloudflare Tunnel load-balances ACTIVE-ACTIVE
across all 3 connectors (no primary/backup priority). This means stateful apps
(gitforge, openproject, authentik, karakeep, kavita, openwebui, etc.) may show
DIFFERENT/STALE data depending on which connector serves a given request -
EXPLICITLY ACCEPTED by user as the cost of guaranteed uptime. Fresh/separate
databases on kscloud1 are fine; do not try to sync data between monk and kscloud1.

## kscloud1 access
SSH: `ssh -i ~/.ssh/id_ed25519_kscloud1 kenpat@5.78.233.28` (passwordless, key auth).
sudo needs a password ("p12217177") and has no askpass helper - avoid sudo;
most things doable as kenpat or via docker.
All services live under `/opt/kitestacks/docker/<service>/docker-compose.yml`,
same one-dir-per-app pattern as monk's `~/kitestacks-live/docker/`.

## kscloud1 services deployed (all `docker compose up -d`, joined to local `kitestacks` network)
- cloudflared (3rd tunnel connector, same TUNNEL_TOKEN, connector id 78521d9f-71c0-4e3d-992f-bd1f77da1a8f)
- homepage-backup (alias `homepage`) + caddy + kitestacks-metrics-api-backup - PRE-EXISTING
- forgejo (alias `forgejo`) - PRE-EXISTING, separate DB from monk's (gitforge data inconsistent across connectors, accepted)
- prometheus + node-exporter (job `kscloud1-node`)
- grafana (alias `grafana`, port 3150) - Prometheus datasource (uid 000000001) + "Node Exporter Full"
dashboard (id 1860) provisioned via `./provisioning/`. OAuth->authentik config present but
authentik on kscloud1 has a FRESH db (no provider apps configured) so OAuth login won't work
there; local admin login works.
- uptime-kuma (alias `status`->`uptime-kuma`) - kuma.db seeded by copying monk's admin user
(same login: kenpat / same password hash) + monitors: kscloud1 self-ping, Google DNS, and
HTTP checks for all 9 *.kitestacks.com subdomains (external monitoring of the live site).
- kavita (alias `kavita`) - empty library (fresh)
- karakeep + karakeep-chrome + karakeep-meilisearch (alias `karakeep`) - fresh meilisearch/db
- authentik + authentik-worker + authentik-postgres + authentik-redis (alias on `auth`) - FRESH DB.
Bootstrap admin: `akadmin@kitestacks.com` / password `6KlYpfCyYxbnKQNiOewN` (set via
AUTHENTIK_BOOTSTRAP_PASSWORD in .env). No OAuth provider apps exist yet (would need to be
manually recreated in authentik UI for grafana/openwebui/karakeep/openproject SSO to work
when kscloud1 is the active backend).
- kite-litellm + kite-openwebui (alias `ai`->openwebui) - same .env/secrets as monk. OpenWebUI
has `ENABLE_SIGNUP=true` (changed from monk's `false`) so kenpat can create a local admin
account on first use, since authentik OAuth won't work with kscloud1's fresh authentik.
- openproject (alias on `tasks`, port 8090:80 host - port 80 was taken by caddy) - FRESH db,
self-initialized via the all-in-one image's bootstrap (took ~3-4 min). Empty/no projects yet.

## monk-side changes made for cross-host monitoring
- `~/kitestacks-live/docker/prometheus/prometheus.yml`: added scrape job
`kscloud1-node` -> `5.78.233.28:9100` (kscloud1's node-exporter is exposed
0.0.0.0:9100, no firewall - reachable from monk's public IP). monk's grafana
(the live one, "Node Exporter Full" dashboard now provisioned via
`~/kitestacks-live/docker/grafana/provisioning/`) shows BOTH `t14-node`
(monk/"this pc") and `kscloud1-node` ("the cloud") via the instance picker.
- kscloud1's prometheus only scrapes itself (`kscloud1-node`) - monk is behind
home NAT, not reachable from kscloud1.

## Resource notes (kscloud1: 3 vCPU, 3.7GB RAM + 6GB swap, 75GB disk)
With all services running: ~2.8-2.9GB RAM used, ~2.6-2.8GB swap used (of 6GB),
~835MB-1.2GB "available", disk 29GB/75GB used. Site is functional but under
memory pressure - if BOTH monk and assassin are down for an extended period
with real concurrent usage, expect sluggishness (esp. openproject/authentik/
openwebui). Not yet stress-tested under real failover load.

## Key gotchas from THIS phase (cloud failover build-out)
- kscloud1's `kitestacks` Docker network is LOCAL/separate from monk's (same name,
no conflict). cloudflared on each host resolves container names against its
own host's network.
- Adding a new tunnel connector that lacks a backend for an ingress hostname ->
502 for requests routed there. If it has a DIFFERENT backend (e.g. forgejo) ->
serves different data inconsistently. Both accepted/expected now that all 9
hostnames have backends on kscloud1.
- port 80 on kscloud1 is owned by `caddy` (serves www-backup/git-backup.kitestacks.com
direct A-records, pre-existing, unrelated to the tunnel) - openproject uses 8090:80
for its host port instead (internal container port 8080 is what cloudflared hits).
- uptime-kuma / grafana have no simple file-based config API for monitors/datasources
beyond grafana provisioning - used direct sqlite manipulation (`docker exec ... sqlite3`,
or python3 sqlite3 module via a throwaway `python:3-alpine` container with the volume
mounted) to seed uptime-kuma's kuma.db with users/monitors.
- authentik first boot takes ~1-2 min (migrations); openproject first boot takes
~4-5 min (postgres initdb + Rails migrations + Puma boot), watch `docker logs`
for "Listening on http://0.0.0.0:8080" before testing.

## Authentik/Kavita login fix (2026-06-10, post cloud-failover)
PROBLEM: Cloudflare Tunnel load-balances auth./kavita. active-active across monk
and kscloud1. kscloud1's authentik had only the fresh `akadmin` bootstrap user
(not kenpat7177) and kscloud1's kavita had ZERO users -> ~50% of requests showed
"wrong password" on authentik and a "create admin account" (signup) screen on
kavita instead of login. This contradicts the earlier "fresh DBs are fine"
assumption - for IDENTITY apps it breaks login, so it was NOT acceptable.
FIX APPLIED (one-time sync, same pattern as uptime-kuma's kuma.db seed):
- pg_dump'd monk's authentik-postgres `authentik` db (--clean --if-exists),
scp'd to kscloud1, stopped authentik+authentik-worker on kscloud1, restored
via `docker exec -i authentik-postgres psql -U authentik -d authentik < dump`,
restarted. Worked cleanly because AUTHENTIK_SECRET_KEY and PG_PASS were
ALREADY IDENTICAL between monk's and kscloud1's authentik/.env.
- For kavita: copying the raw kavita.db file via plain `cp` produced
"database disk image is malformed" (WAL-mode db isn't standalone-consistent
as a flat file copy even when -wal/-shm look small). FIX: use python3
sqlite3 `Connection.backup()` (via throwaway python:3-alpine container) to
produce a consistent copy, THEN on kscloud1 stop kavita, rm the OLD
kavita.db-shm/kavita.db-wal too (stale WAL files against new db = same
corruption error), copy in the new kavita.db (chown root:root, chmod 644
to match original ownership - kavita container runs as root), restart.
- Result: kscloud1 authentik now has kenpat7177 (matches monk), kscloud1
kavita now has kenpat7177 + acurrie (matches monk). Both connectors now
return the same login screen/credentials. NOTE: this is a ONE-TIME sync,
not continuous - if monk's users/passwords change later, kscloud1 will
drift again and the same symptoms could return; re-run this sync if so.
- kscloud1 kavita's library entries point at /books paths that don't exist on
kscloud1 (no actual book files there) - login works fine, but browsing the
library when served by kscloud1 will show entries with missing files. Same
"stale data" tradeoff as gitforge, accepted.
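The WAL-safe kavita.db copy described in the fix above can be sketched with the standard-library `sqlite3` module. This is a minimal illustration, not the exact script that was run: the function name `consistent_copy` and both path arguments are placeholders.

```python
import sqlite3


def consistent_copy(src_path: str, dst_path: str) -> None:
    """Copy a SQLite database, including pages still sitting in its WAL.

    Connection.backup() reads through SQLite itself, so the destination is a
    single self-contained file, unlike a plain `cp` of a WAL-mode database.
    """
    src = sqlite3.connect(src_path)
    dst = sqlite3.connect(dst_path)
    try:
        src.backup(dst)  # page-by-page online backup into dst
    finally:
        dst.close()
        src.close()
```

The resulting file carries no `-wal`/`-shm` companions of its own, which is why the stale ones on the destination host have to be removed before the copy is dropped in.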

## Authentik shared Postgres+Redis over Tailscale (2026-06-10) - fixes "invalid_grant" SSO
PROBLEM: Even after the one-time DB sync above, "Sign in with Authentik" on
Kavita could fail with "invalid_grant" / "Code does not exist". Root cause:
monk and kscloud1 each ran their OWN authentik-postgres. OAuth2 authorization
codes are short-lived per-flow rows in `authentik_providers_oauth2_authorizationcode`
- if Cloudflare Tunnel's active-active routing sends `/authorize` to one
connector and `/application/o/token/` to the other, the code only exists in
one of the two DBs -> invalid_grant. A one-time data sync can't fix this
because the data is created fresh on every login attempt.
FIX: Converted to a single shared Postgres+Redis (HA pattern), hosted on
kscloud1, reachable ONLY over Tailscale:
- Installed Tailscale on both monk and kscloud1 (same tailnet). kscloud1's
tailscale IP is `100.123.254.52`.
- kscloud1's `/opt/kitestacks/docker/authentik/docker-compose.yml`:
authentik-postgres now binds `100.123.254.52:5432:5432` (was unbound/internal-only),
authentik-redis now binds `100.123.254.52:6379:6379`. Both still also reachable
on the local `kitestacks` docker network for kscloud1's own authentik+worker.
Backup of pre-change file: `docker-compose.yml.backup-before-shared-db-20260610-1138`.
- monk's `~/kitestacks-live/docker/authentik/docker-compose.yml`: REMOVED the
`postgresql` and `redis` services entirely. monk's `authentik`/`authentik-worker`
now point `AUTHENTIK_POSTGRESQL__HOST` and `AUTHENTIK_REDIS__HOST` at
`100.123.254.52` (kscloud1 over Tailscale), using the same `PG_PASS` /
`AUTHENTIK_SECRET_KEY` as before (already identical between hosts).
- monk's old local `authentik-postgres`/`authentik-redis` containers were
STOPPED (not removed) - data dirs preserved under
`~/kitestacks-live/docker/authentik/postgres` in case of rollback, but no
longer in use.
- Result: BOTH connectors' authentik+worker now read/write the SAME db/redis,
regardless of which one handles `/authorize` vs `/application/o/token/`.
Verified both `authentik`+`authentik-worker` healthy on monk and kscloud1,
OIDC discovery docs identical, user list matches (`kenpat7177` etc.) on both.
CONFIRMED BY USER 2026-06-10: "Sign in with Authentik" on Kavita now works
(when monk's connector serves the request).
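A toy model of the invalid_grant failure and of why a shared store fixes it. This is not Authentik code: the `AuthServer` class, its in-memory dict standing in for the authorization-code table, and the token string format are all invented for illustration.

```python
import secrets


class AuthServer:
    """Stand-in for one authentik instance backed by a code store."""

    def __init__(self, code_store: dict):
        self.codes = code_store  # plays the role of the authorization-code table

    def authorize(self, user: str) -> str:
        code = secrets.token_hex(8)
        self.codes[code] = user  # the row exists only in THIS store
        return code

    def token(self, code: str) -> str:
        user = self.codes.pop(code, None)  # codes are single-use
        return "invalid_grant" if user is None else f"token-for-{user}"


# Separate databases: /authorize lands on one connector, /token on the other.
monk, cloud = AuthServer({}), AuthServer({})
split_result = cloud.token(monk.authorize("kenpat7177"))

# Shared database: both instances read and write the same store.
shared = {}
monk, cloud = AuthServer(shared), AuthServer(shared)
shared_result = cloud.token(monk.authorize("kenpat7177"))
```

With separate stores the exchange fails whenever the two requests are routed to different connectors; with one store the routing no longer matters.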

## Kavita "Sign in with Authentik" button missing on kscloud1 - FIXED 2026-06-10
After the shared-Authentik-DB fix above, the button still didn't appear when
Cloudflare routed kavita.kitestacks.com to kscloud1's connector. CAUSE: Kavita's
OIDC config lives in ITS OWN db (kavita.db `ServerSetting` table, Key=40, a JSON
blob with Authority/ClientId/Secret/Enabled), separate from Authentik's db.
The earlier one-time kavita.db sync (see fix above) was taken BEFORE OIDC SSO
was configured in monk's Kavita, so kscloud1's copy had Key=40 with empty
Authority/Secret and `"Enabled":false`. FIX: copied monk's Key=40 JSON value
verbatim into kscloud1's kavita.db (stop kavita, `docker run --rm -v
.../kavita/config:/data -v fix.sql:/fix.sql alpine` + apk sqlite + `sqlite3
/data/kavita.db < fix.sql` with `UPDATE ServerSetting SET Value='...' WHERE
"Key"=40`, restart kavita). NOTE: AspNetUserLogins (OIDC account-linkage table)
is empty on BOTH monk and kscloud1 - Kavita creates this row on first OIDC
login per-instance (matches existing local user by email since
ProvisionAccounts=false), so no extra action needed there.
GOTCHA: ServerSetting's PK column is `"Key"` (INTEGER), not `Id` - must quote
it in sqlite (`"Key"`) since KEY is a SQL reserved word.
DRIFT WARNING: any future Kavita server-setting change (OIDC config, library
paths, etc.) made on monk will NOT propagate to kscloud1's kavita.db
automatically - same one-time-sync caveat as the user-table sync above.

UPDATE 2026-06-10 (RESOLVED via Kavita UI, not direct DB edit): Direct SQL
edits to ServerSetting Key=40 got WIPED back to disabled/empty by Kavita on
every container restart (RowVersion incremented +2 each time, Authority/Secret
cleared, Enabled->false) - confirmed twice, even with a full WAL-consistent
kavita.db replace from monk. Direct DB writes to this table do NOT survive a
restart; only saves through Kavita's own Settings UI/API persist correctly.
FIX: opened an SSH local port-forward (`ssh -L 5099:localhost:5000
kenpat@5.78.233.28`) so the user could reach kscloud1's Kavita directly at
http://localhost:5099 (bypassing the Cloudflare load-balanced domain), logged
in with their normal kenpat7177 Kavita password, and re-entered the OIDC
config in Settings -> OIDC:
- Authority: `https://auth.kitestacks.com/application/o/kavita/`
(MUST include trailing slash - Kavita validates that this exactly matches
the `issuer` claim in Authentik's `.well-known/openid-configuration`,
which has a trailing slash. Without it: "Kavita can load the OIDC
configuration, but the issuer does not match".)
- Client ID: `kavita`, Client Secret: (96-hex-char secret from Authentik's
Kavita OAuth2 provider - watch for copy/paste truncation, verify length=96)
- Enabled: true, ProviderName: authentik
Saved via UI (RowVersion 8->12->14 across two saves to fix a 1-char-truncated
secret), then `docker compose restart kavita` on kscloud1 - config SURVIVED
this restart (unlike the direct-SQL attempts) and `/api/settings/oidc` now
reports `"enabled": true`. SSH tunnel closed afterward (no firewall changes
were made/needed). Set a temporary ApiKey on kenpat7177's kscloud1 kavita
account during troubleshooting (for a Plugin/authenticate attempt that turned
out to return 401 / unused) - left in place, harmless (grants API access to
that user's own account only).
TAKEAWAY FOR FUTURE KAVITA CONFIG CHANGES ON KSCLOUD1: always use the Kavita
UI (via SSH port-forward to localhost:5000) rather than direct sqlite edits -
direct edits to ServerSetting do not survive a restart.
CONFIRMED BY USER 2026-06-10: "Sign in with Authentik" now works on Kavita
regardless of which connector (monk/kscloud1) answers.
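The two copy/paste pitfalls from the OIDC re-entry above (missing trailing slash, truncated secret) can be checked before saving. A minimal sketch: the helper names are invented, the exact-string comparison is only a model of the behaviour described in these notes rather than Kavita's real validation code, and the secret used in the example is a dummy value.

```python
import re


def issuer_matches(authority: str, discovery_doc: dict) -> bool:
    """True when the configured Authority equals the discovery `issuer` exactly."""
    return authority == discovery_doc.get("issuer")


def looks_like_full_secret(secret: str, expected_len: int = 96) -> bool:
    """True when the secret is exactly `expected_len` hexadecimal characters."""
    return re.fullmatch(r"[0-9a-fA-F]{%d}" % expected_len, secret) is not None


# Shape of the relevant part of a discovery document (other fields omitted).
discovery = {"issuer": "https://auth.kitestacks.com/application/o/kavita/"}

with_slash = issuer_matches("https://auth.kitestacks.com/application/o/kavita/", discovery)
without_slash = issuer_matches("https://auth.kitestacks.com/application/o/kavita", discovery)

dummy_secret = "ab" * 48  # 96 hex characters, NOT a real secret
truncated_ok = looks_like_full_secret(dummy_secret[:-1])
```

A one-character truncation like the one fixed between the two UI saves is caught by the length check alone.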

## Kavita cover images missing on kscloud1 - FIXED 2026-06-10
After the kavita.db sync from monk, kscloud1's db referenced cover image files
(e.g. `v1_c1.png`..`v10_c10.png` in `ServerSetting`/`Series.CoverImage`) that
didn't exist on kscloud1's filesystem - kscloud1's `config/covers/` dir was
empty (monk has 9 files, ~1.3MB). Result: book/series cover thumbnails didn't
load when kscloud1 served the request. FIX: tar'd monk's
`~/kitestacks-live/docker/kavita/config/covers/` (owned 1000:1000), scp'd to
kscloud1, extracted into `/opt/kitestacks/docker/kavita/config/covers/` via a
throwaway alpine container, `chown -R 1000:1000`. No kavita restart needed -
covers are served as static files from disk. CONFIRMED BY USER: covers now
load correctly.
NOTE: this is another one-time sync (same drift caveat) - if new books/covers
are added on monk later, they won't appear on kscloud1 unless re-synced
(covers/ dir + kavita.db + actual book files under library/books, none of
which exist on kscloud1 per the earlier "stale data" note).
SECURITY NOTE: postgres/redis on kscloud1 are bound to the Tailscale interface
IP only (100.123.254.52), not 0.0.0.0 - not exposed to the public internet.
ROLLBACK: if Tailscale connectivity ever breaks, monk's authentik will fail to
start (can't reach 100.123.254.52). To roll back: restore monk's
docker-compose.yml from git/backup to use local postgresql/redis services
again, restart monk's old authentik-postgres/authentik-redis containers
(`docker start authentik-postgres authentik-redis` in
~/kitestacks-live/docker/authentik), `docker compose up -d`. Note this would
mean monk's authentik db is now STALE (kscloud1's shared db has any logins/
changes since 2026-06-10) - would need a fresh pg_dump sync from kscloud1 first.

## kscloud1 ufw blocks docker-bridge -> host port 8000 (metrics API) - FIXED 2026-06-10
kscloud1 has ufw active with `default deny incoming/routed`. The
kitestacks-metrics-api-backup container (network_mode: host, binds 0.0.0.0:8000)
was unreachable from homepage-backup via `host.docker.internal:8000` (TCP
timeout, not refused -> ufw drop), causing the homepage System Status widget to
show 0%/"Offline" when kscloud1 served the request. FIXED by adding:
`sudo ufw allow from 172.16.0.0/12 to any port 8000 proto tcp` (covers all
docker bridge subnets on this host: 172.17-172.29.x.x). Verified
homepage-backup -> host.docker.internal:8000/api/metrics now returns real
CPU/RAM/storage/network data. kscloud1 sudo password is in the "kscloud1
access" section above - needed `echo PASS | sudo -S <cmd>` (no askpass helper,
non-interactive sudo via -S works fine).
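The claim that one `172.16.0.0/12` rule covers every docker bridge subnet on the host can be checked with the standard-library `ipaddress` module. A small sketch: the `/16` prefix length used for the bridge subnets is an assumption (these notes only give the 172.17-172.29 range), and `covered` is an invented helper name.

```python
import ipaddress

# The source range from the ufw rule in this section.
ALLOWED = ipaddress.ip_network("172.16.0.0/12")


def covered(subnet: str) -> bool:
    """True when `subnet` lies entirely inside the allowed source range."""
    return ipaddress.ip_network(subnet).subnet_of(ALLOWED)


# Bridge subnets 172.17.x.x through 172.29.x.x, assumed to be /16 each.
bridge_subnets = [f"172.{second_octet}.0.0/16" for second_octet in range(17, 30)]
all_covered = all(covered(s) for s in bridge_subnets)
```

The /12 spans 172.16.0.0 through 172.31.255.255, so it is slightly wider than the bridges currently in use and leaves room for networks docker creates later.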

## Portal SSO/coming-soon push + Karakeep fix + OpenProject EE blocker + Portainer SSO setup (2026-06-10)
Per user's original request: SSO for grafana/prometheus/portainer/cloudflare/uptime-kuma
(authentik=source), KITEAI/LITELLM/OPENROUTER -> "coming soon", SSO for openproject/forgejo/
karakeep, FluxCD card in AI&Automation. Hard constraint: user does NOT use Cloudflare Zero
Trust/Access ("costs money") - any Cloudflare work must avoid those products (note: the
Tunnel UI itself lives under Cloudflare's "Zero Trust" dashboard section, but configuring a
Tunnel public hostname there is NOT the same as enabling Zero Trust/Access - fine to use).

### Portal UI changes - DEPLOYED to all 3 copies, verified live
Edited the AI & AUTOMATION panel (`cards cards-3` -> `cards cards-2`, now 2x2):
Kite AI and OpenRouter cards changed from external links to
`href="#" data-coming-soon="1"` (LiteLLM was already coming-soon); added a 4th
card "FluxCD" / "GitOps Automation" using `/images/icons/fluxcd.png`, also
coming-soon (automation scripts with FluxCD+Prometheus+Grafana+node-exporter
are a future project). Applied identically to:
- `~/kitestacks-live/docker/kitestacks-portal-test/public/index.html` (monk, dev, port 3008)
- `~/kitestacks-live/docker/kitestacks-portal/public/index.html` (monk, LIVE, served by
"homepage" container 3005->3000 - this is the file that backs www.kitestacks.com)
- `/opt/kitestacks/docker/www-backup/kitestacks-portal/public/index.html` (kscloud1,
served by `homepage-backup` port 3015)
Verified `https://www.kitestacks.com` returns "FluxCD" consistently (6/6 requests
across both connectors).
NOTE: Portainer card on the live portal is currently `data-coming-soon="1"` -
update this to a real `href="https://portainer.kitestacks.com"` link (remove
data-coming-soon) once the Portainer SSO manual steps below are completed.
NOTE 2: "cloudflare should all be in the networking side" from the original
request was never resolved - Cloudflare card is still in the INFRASTRUCTURE
panel, not moved/renamed to a "NETWORKING" panel. Ambiguous, deprioritized,
not revisited.

### Karakeep SSO redirect_uri fix - DONE, confirmed working
Karakeep uses NextAuth.js with provider id "custom" (not "authentik") - actual
OAuth callback path is `/api/auth/callback/custom`, but Authentik's Karakeep
OAuth2Provider's `_redirect_uris` had the wrong path -> "Redirect URI Error".
FIX: direct Postgres UPDATE to
`authentik_providers_oauth2_oauth2provider._redirect_uris` (JSON column) on
the shared kscloud1 authentik-postgres (100.123.254.52), wrapped in explicit
`BEGIN; UPDATE ...; COMMIT;` (a bare single-statement -c "UPDATE..." reported
"UPDATE 1" but did NOT persist on first attempt - cause unclear, explicit
transaction fixed it). After the DB write, restarted authentik+authentik-worker
on BOTH monk and kscloud1 and polled
`docker inspect --format '{{.State.Health.Status}}'` until both reported
"healthy" (~50s) before retesting - first retest hit a transient 502 because
kscloud1's authentik was still "starting". CONFIRMED: Authentik now serves the
login page (not "Redirect URI Error") for Karakeep SSO.
PG_PASS GOTCHA: `~/kitestacks-live/docker/authentik/.env` PG_PASS value ends in
`=` - extract with `cut -d= -f2-` (NOT `-f2`, which truncates the trailing `=`
and causes "password authentication failed").
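The PG_PASS gotcha above comes down to splitting on the first `=` only. A short Python equivalent of `cut -d= -f2-`: the function name and the sample `.env` content are made up, and the value shown is a dummy, not the real password.

```python
def read_env_value(env_text: str, key: str) -> str:
    """Return everything after the FIRST '=' on the line for `key`."""
    for line in env_text.splitlines():
        name, sep, value = line.partition("=")  # splits once, keeps later '='
        if sep and name.strip() == key:
            return value
    raise KeyError(key)


sample_env = "PG_USER=authentik\nPG_PASS=dummyBase64Value=\n"

full_value = read_env_value(sample_env, "PG_PASS")

# The buggy variant, analogous to `cut -d= -f2`: the trailing '=' is lost.
truncated_value = "PG_PASS=dummyBase64Value=".split("=")[1]
```

Values that end in `=` are common for base64-encoded secrets, so the same care applies to any other key read from the file.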
REUSABLE PATTERN for any future direct Authentik DB edit: (1) wrap writes in
explicit BEGIN/COMMIT, (2) restart authentik+authentik-worker on BOTH monk and
kscloud1, (3) wait for health=healthy on both before testing.
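Step (3) of the pattern above, sketched as a polling loop around the same `docker inspect` call used in this session. Assumptions: the function names, the timeout and interval defaults, and the injectable `probe`, `sleep`, and `clock` parameters are illustrative, and the default probe only sees containers on the local docker daemon, so the loop would have to be run once on monk and once on kscloud1.

```python
import subprocess
import time


def health_status(container: str) -> str:
    """Ask the local docker daemon for a container's health status."""
    result = subprocess.run(
        ["docker", "inspect", "--format", "{{.State.Health.Status}}", container],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


def wait_until_healthy(containers, probe=health_status, timeout_s=120.0,
                       interval_s=5.0, sleep=time.sleep, clock=time.monotonic) -> bool:
    """Poll until every container reports "healthy" or the timeout passes."""
    deadline = clock() + timeout_s
    pending = set(containers)
    while pending:
        # Keep only the containers that are not healthy yet.
        pending = {c for c in pending if probe(c) != "healthy"}
        if not pending:
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)
    return True
```

Called as `wait_until_healthy(["authentik", "authentik-worker"])`, it returns only after both report healthy, which avoids the transient 502 seen when the retest ran while one instance was still "starting".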
|
||||||
|
|
||||||
|
### OpenProject SSO - config bug fixed, but BLOCKED by Enterprise licensing (no further action possible)
`~/kitestacks-live/docker/openproject/docker-compose.yml` env vars were wrong in
two ways: (1) extra "PROVIDERS_" segment in var names caused
`seed_oidc_provider = {"providers": {"authentik": {...}}}` instead of
`{"authentik": {...}}`, producing a broken stub provider record (slug=
"providers", id=1, since deleted via Rails runner); (2) `discovery_endpoint`
isn't read by `ConfigurationMapper` at all - replaced with explicit
ISSUER/AUTHORIZATION__ENDPOINT/TOKEN__ENDPOINT/USERINFO__ENDPOINT/
END__SESSION__ENDPOINT/JWKS__URI vars (current docker-compose.yml has the
corrected version, see file - all derived from
`https://auth.kitestacks.com/application/o/openproject/.well-known/openid-configuration`).
After fixing both, the seeder correctly creates provider slug="authentik",
available=true, all fields correct - BUT the SSO button still does not appear
on /login. CONFIRMED ROOT CAUSE (terminal, source-code-verified): OpenProject
CE 2025/v15's OmniAuth SSO strategy
(`OpenProject::Plugins::AuthPlugin`/`OpenIDConnect`) AND SAML
(`auth_saml/lib/open_project/auth_saml/engine.rb`, `enterprise_feature:
"sso_auth_providers"`) are BOTH gated behind an Enterprise Edition license -
"OmniAuth SSO strategy ... is only available for Enterprise Editions". No
app/config-level workaround exists. Only remaining options: buy EE license, OR
put a forward-auth proxy (oauth2-proxy / Authentik embedded outpost) in front
of OpenProject - DEFERRED along with Prometheus/Uptime Kuma proxy work (see
below) until Oracle VPS topology is decided.
OpenProject container is healthy, `/login` returns 200, no projects yet.

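To see how one stray name segment can change the seeded structure, here is a small sketch of underscore-delimited environment names being unflattened into a nested mapping. It is an illustration only, not OpenProject's actual parser: it assumes a single `_` separates nesting levels and a doubled `__` stands for a literal underscore inside a key (consistent with names like `AUTHORIZATION__ENDPOINT` above). The prefix `OPENPROJECT_SEED_OIDC_PROVIDER_` and the helper name `unflatten` are assumptions.

```python
import re
from typing import Any, Dict, Mapping

PREFIX = "OPENPROJECT_SEED_OIDC_PROVIDER_"  # assumed prefix, for illustration only


def unflatten(env: Mapping[str, str], prefix: str = PREFIX) -> Dict[str, Any]:
    """Build a nested dict from PREFIX_A_B__C=value style names.

    A single "_" starts a new nesting level; "__" is a literal underscore.
    """
    result: Dict[str, Any] = {}
    for name, value in env.items():
        if not name.startswith(prefix):
            continue
        # Split on single underscores only, then restore the escaped ones.
        parts = re.split(r"(?<!_)_(?!_)", name[len(prefix):])
        keys = [p.replace("__", "_").lower() for p in parts]
        node = result
        for key in keys[:-1]:
            node = node.setdefault(key, {})
        node[keys[-1]] = value
    return result
```

Under these assumptions, `PREFIX + "AUTHENTIK_ISSUER"` yields `{"authentik": {"issuer": ...}}`, while `PREFIX + "PROVIDERS_AUTHENTIK_ISSUER"` yields `{"providers": {"authentik": {"issuer": ...}}}`, which matches the broken stub record with slug "providers" described above.
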
### Portainer SSO - Authentik side DONE, two manual steps PENDING (not yet done by user)
Per user: "yes continue with portainer" / "yes but make sure it is still
secure" (approved exposing Portainer publicly via a NEW Cloudflare Tunnel
hostname, with explicit requirement to keep it secure -> access restricted to
the `homelab-admin` Authentik group).
Created via `docker exec authentik ak shell` (Django ORM, no Authentik API
token configured) on kscloud1's shared authentik-postgres:
- OAuth2Provider "Portainer": client_id=`portainer`,
client_secret=`wTim3mrMwt34ko1RYMvK1RNnjwWOMi_d4r4cS6exr7DjozCrL5zKthHl-5KjargF`,
provider_id=9, redirect_uri=`https://portainer.kitestacks.com` (strict),
scopes openid/email/profile, sub_mode=user_email, signing key + flows copied
from existing providers (same pattern as Karakeep/Grafana).
- Application "Portainer" (slug="portainer", meta_launch_url=
`https://portainer.kitestacks.com`).
- PolicyBinding restricting the Portainer application to Authentik group
`homelab-admin` (UUID e21b0aa5-62e7-4b3a-8302-130b0ae148a5) - this is the
"make sure it is still secure" piece (only homelab-admin members can SSO in).
- Verified discovery doc resolves:
`https://auth.kitestacks.com/application/o/portainer/.well-known/openid-configuration`.
PENDING MANUAL STEPS (user must do via UI - confirmed `portainer.kitestacks.com`
still returns `000` as of 2026-06-10):
1. Cloudflare dashboard -> Tunnel -> add Public Hostname `portainer.kitestacks.com`
-> service `https://portainer:9443` (HTTPS), enable "No TLS Verify". (This is
in the Tunnel config UI, which Cloudflare happens to host under the "Zero
Trust" nav section, but adding a Tunnel hostname is NOT enabling Zero
Trust/Access - does not violate the no-Zero-Trust constraint.)
2. In Portainer -> Settings -> Authentication -> OAuth (Provider: Custom), on
BOTH monk's and kscloud1's SEPARATE Portainer instances, configure:
- Client ID: `portainer`
- Client Secret: `wTim3mrMwt34ko1RYMvK1RNnjwWOMi_d4r4cS6exr7DjozCrL5zKthHl-5KjargF`
- Authorization URL: `https://auth.kitestacks.com/application/o/authorize/`
- Access Token URL: `https://auth.kitestacks.com/application/o/token/`
- Resource/Userinfo URL: `https://auth.kitestacks.com/application/o/userinfo/`
- Redirect URL: `https://portainer.kitestacks.com`
- Logout URL: `https://auth.kitestacks.com/application/o/portainer/end-session/`
- Scopes: `openid email profile`, User identifier claim: `email`
AFTER both steps done: update the live portal's Portainer card (in the 3 files
above) from `data-coming-soon="1"` to a real
`href="https://portainer.kitestacks.com" target="_blank" rel="noopener"` link.

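The endpoint values listed above follow a regular pattern, so they can be derived from the Authentik base URL and the application slug. The following is a minimal sketch of that derivation, useful for cross-checking what gets typed into each Portainer UI. `portainer_oauth_settings` is a hypothetical helper, the dictionary keys are just labels mirroring the list above (not Portainer API field names), and the client secret is deliberately left out.

```python
from typing import Dict


def portainer_oauth_settings(
    auth_base: str, slug: str, client_id: str, app_url: str
) -> Dict[str, str]:
    """Derive the custom-OAuth values for one Authentik application.

    Authorize, token, and userinfo endpoints are shared across applications;
    only the end-session URL includes the application slug.
    """
    base = auth_base.rstrip("/") + "/application/o"
    return {
        "client_id": client_id,
        "authorization_url": f"{base}/authorize/",
        "access_token_url": f"{base}/token/",
        "resource_url": f"{base}/userinfo/",
        "redirect_url": app_url,
        "logout_url": f"{base}/{slug}/end-session/",
        "scopes": "openid email profile",
        "user_identifier": "email",
    }
```

Calling it with `"https://auth.kitestacks.com"`, slug `"portainer"`, client ID `"portainer"`, and `"https://portainer.kitestacks.com"` reproduces the URLs in the list, which makes a copy/paste slip on either Portainer instance easier to spot.
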
### App-level SSO status summary (end of 2026-06-10 session)
Grafana: working (pre-existing). Forgejo: working (pre-existing). Karakeep:
fixed this session, working. OpenProject: blocked by EE license (terminal at
app level). Portainer: Authentik side done, waiting on user's 2 manual steps
above. Prometheus + Uptime Kuma: DEFERRED - neither has native OAuth, need a
forward-auth proxy (oauth2-proxy or Authentik embedded outpost) - deferred per
user's "ok lets do smaller app level" (hold new infra until Oracle VPS decided).
Cloudflare itself: no SSO concept applicable (it's Cloudflare's own dashboard
login) - was always about the portal's Cloudflare card placement, see "Portal UI
changes" note above.

### Oracle VPS migration - PLANNED, upcoming (stated 2026-06-11)
User confirmed on 2026-06-11: "we are going to switch things soon from hetzner
cloud to oracle soon." -> kscloud1 (Hetzner, 5.78.233.28) is intended to be
REPLACED by an Oracle Cloud VPS in the near future ("soon", no firm date yet).
Originally raised 2026-06-10 as exploratory ("how easy would it be to move
everything to oracle vps after?"), now an actual plan.
Implication: avoid investing further one-off/manual config work that's hard to
redo (e.g. more one-time DB syncs, hand-edited sqlite, etc.) on kscloud1 if
avoidable - prefer changes that are easy to replicate on a new host. When the
Oracle VPS is provisioned, plan to follow the same pattern as the kscloud1
cloud-failover build-out (new Cloudflare Tunnel connector + full service
replicas + shared Authentik/Postgres/Redis over Tailscale + the Forgejo
FORGEJO_API_BASE-over-Tailscale pattern for the portal's Recent Activity, see
"Recent Activity" fix below) - then retire kscloud1 the same way assassin/T14
was retired (decommission once Oracle replica verified working).

## Prior migration gotchas (monk, kept for reference - see git history/old notes if needed)
- rsync --files-from recursion bug.
- Bind-mount postgres dirs come over empty as non-root (use pg_dumpall/pg_dump
`--clean` from running container instead).
- pg_dumpall --clean across template1 breaks on client/server version mismatch
(use single-db pg_dump+psql instead).
- grafana data dir needs chown 472:472.
- kite-litellm needed manual `docker network connect kitestacks kite-litellm`.

## 2026-06-12: SSO fixes + Portainer deployed on kscloud1

### Root cause: monk reconnect race condition
When monk goes offline (user travels) and reconnects, Cloudflare starts routing
some token exchange requests to monk while codes were created on kscloud1 during
the offline window. Auth codes had a 60-second TTL, which expired before monk's
Authentik fully started (~5 min startup). FIX: increased `access_code_validity`
from `minutes=1` to `minutes=10` for ALL 9 OAuth2 providers in the shared Postgres
DB. This gives enough buffer for monk's containers to start before codes expire.
Command used (via python:3-alpine container):
`docker run --rm --network host -v /tmp/fix_auth.py:/fix.py python:3-alpine sh -c ...`
connecting to shared Postgres at 100.123.254.52.

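The contents of `/tmp/fix_auth.py` are not recorded here. As a rough sketch of the reasoning behind the change, the snippet below parses a validity string of the `minutes=10` form into a `timedelta`, checks it against the ~5 min startup window, and builds the transactional UPDATE. The table name `authentik_providers_oauth2_oauth2provider` is an assumption based on Django's default `<app_label>_<model>` naming and should be verified against the live schema. The semicolon-separated multi-part form and all helper names are likewise assumptions.

```python
from datetime import timedelta

# Assumed table name (Django default naming); verify before running anything.
TABLE = "authentik_providers_oauth2_oauth2provider"


def parse_validity(value: str) -> timedelta:
    """Parse "minutes=10" (or an assumed "hours=1;minutes=30") into a timedelta."""
    kwargs = {}
    for part in value.split(";"):
        unit, _, amount = part.strip().partition("=")
        kwargs[unit] = float(amount)
    return timedelta(**kwargs)


def covers_startup(validity: str, startup: timedelta) -> bool:
    """True if an auth code outlives the given container startup time."""
    return parse_validity(validity) > startup


def build_update(old: str, new: str) -> str:
    """Return the UPDATE wrapped in an explicit BEGIN/COMMIT.

    Values are interpolated directly, so only pass trusted constants.
    """
    return (
        "BEGIN;\n"
        f"UPDATE {TABLE} SET access_code_validity = '{new}' "
        f"WHERE access_code_validity = '{old}';\n"
        "COMMIT;"
    )
```

With a 5-minute startup, `covers_startup("minutes=1", timedelta(minutes=5))` is false and `covers_startup("minutes=10", timedelta(minutes=5))` is true, which is the buffer the fix relies on.
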
### Karakeep redirect_uri reverted and re-fixed
The Karakeep OAuth2Provider `_redirect_uris` had reverted to the proxy pattern
(`/outpost.goauthentik.io/callback?...`) instead of the correct NextAuth callback
(`https://links.kitestacks.com/api/auth/callback/custom`). This caused "Redirect URI
Error" from Authentik whenever SSO was attempted. Root cause unknown (possibly an
Authentik blueprint or UI save that regenerated/overrode the field). FIX: same
Postgres UPDATE pattern. WATCH: if this reverts again, check Authentik blueprints
and whether someone modified the Karakeep provider via the Authentik admin UI.

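Since the root cause is unknown, a periodic check could flag the regression early. A minimal sketch, assuming the current redirect URI values have already been read out of Authentik by some other means (the lookup itself is not shown, and `redirect_uri_problems` is a hypothetical helper):

```python
from typing import Iterable, List

EXPECTED = "https://links.kitestacks.com/api/auth/callback/custom"
PROXY_MARKER = "/outpost.goauthentik.io/callback"


def redirect_uri_problems(uris: Iterable[str], expected: str = EXPECTED) -> List[str]:
    """Return human-readable problems; an empty list means the config looks right."""
    cleaned = [u.strip() for u in uris if u.strip()]
    problems: List[str] = []
    if expected not in cleaned:
        problems.append(f"missing expected callback: {expected}")
    for uri in cleaned:
        # The proxy-provider callback pattern is the known bad state.
        if PROXY_MARKER in uri:
            problems.append(f"proxy-style callback present: {uri}")
    return problems
```

An empty result means the NextAuth callback is present and no proxy-style callback has crept back in; anything else is a cue to check blueprints and recent admin UI edits.
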
### Portainer deployed on kscloud1
Created `/opt/kitestacks/docker/portainer/docker-compose.yml` (same image/config as
monk's portainer). Container running as `portainer`, port 9443:9443, on `kitestacks`
network. Volume is local (NOT shared with monk - fresh Portainer instance).
STILL PENDING (user action in Cloudflare dashboard):
- Tunnel ID: 5e60ea8e-a543-49b6-bab5-325f39441e00, Account: d0bb7673333fcd794622956f1662f785
- Add hostname `portainer.kitestacks.com` → service `https://portainer:9443`, No TLS Verify
STILL PENDING (user action in both Portainer UIs): configure OAuth (see prior notes
in "Portainer SSO" section above for exact credentials).
Portal card update (3 files) also still pending until tunnel+OAuth done.

## Phase 2 Planned: Obsidian Mind Map → HTML Mind Map Sync
User wants to create an Obsidian mind map of the KiteStacks homelab that syncs/exports to a live HTML mind map embedded in the homelab portal or a standalone page. To be built after full Obsidian+samurai setup is complete.