From 2233c3274fdb6280fea41d56303a7b6b9d76845c Mon Sep 17 00:00:00 2001
From: kenpat
Date: Thu, 18 Jun 2026 21:08:26 +0000
Subject: [PATCH] Update architecture overview: add BookStack, fix 2-connector diagram

---
 architecture/overview.md | 375 ++++++++++++++++++---------------------
 1 file changed, 170 insertions(+), 205 deletions(-)

diff --git a/architecture/overview.md b/architecture/overview.md
index 83a1722..0868a0f 100644
--- a/architecture/overview.md
+++ b/architecture/overview.md
@@ -1,221 +1,186 @@
-# KiteStacks Architecture — Full System Overview
+# KiteStacks Architecture Overview
 
-## The Big Picture
+**Last updated:** 2026-06-18
+**Status:** Active production homelab
+
+---
+
+## High-Level Architecture
 
 ```
-                      INTERNET
-                          │
-                   ┌──────▼──────┐
-                   │ Cloudflare  │  DNS + TLS termination
-                   │   (edge)    │  Zero Trust Tunnel
-                   └──────┬──────┘
-                          │ HTTPS (443) only
-         ┌────────────────┼────────────────┐
-         │ connector 1    │ connector 2    │ connector 3
-         │                │                │
-  ┌──────▼──────┐         │         ┌──────▼──────┐
-  │    MONK     │         │         │  KSCLOUD1   │
-  │  (home PC)  │         │         │ (Hetzner VPS│
-  │             │ Active  │         │  5.78.x.x)  │
-  │    All 9    │ Active  │         │             │
-  │  services   │         │         │    All 9    │
-  │             │         │         │  services   │
-  └──────┬──────┘         │         └──────┬──────┘
-         │                │                │
-         └────────────────┼───────────────┘
-                    TAILSCALE VPN
-                  (100.x.x.x range)
-                          │
-                 ┌────────▼────────┐
-                 │ SHARED DB LAYER │
-                 │   on kscloud1   │
-                 │ Postgres :5432  │
-                 │ Redis :6379     │
-                 │ (Tailscale      │
-                 │   only, private)│
-                 └─────────────────┘
+┌──────────────────────────────────────────────────┐
+│                 Public Internet                  │
+│             (via Cloudflare Tunnel)              │
+└───────────────────────┬──────────────────────────┘
+                        │
+          ┌─────────────▼──────────────┐
+          │   Cloudflare Zero Trust    │
+          │    Active-Active Tunnel    │
+          └──────┬────────────┬────────┘
+                 │            │
+    ┌────────────▼───┐  ┌─────▼──────────────┐
+    │ monk (home)    │  │  kscloud1 (Hetzner)│
+    │ cloudflared    │  │  cloudflared       │
+    │ All services   │  │  Replica services  │
+    │ Tailscale mesh │  │ Shared Authentik DB │
+    └────────────────┘  └─────────────────────┘
+            │                    │
+            └────────────────────┘
+              Tailscale overlay
+              (private network)
+```
+
+The two machines share one Cloudflare Tunnel token, so Cloudflare load-balances across both connectors automatically. If monk goes offline, kscloud1 continues serving all public subdomains within seconds.
+
+---
+
+## Service Map
+
+### Identity & Access
+| Service | Host | URL | Purpose |
+|---------|------|-----|---------|
+| Authentik server | monk | auth.kitestacks.com | SSO identity provider |
+| Authentik worker | monk | (internal) | Background jobs, flow execution |
+| Authentik LDAP | monk | (internal) | LDAP proxy for non-OIDC apps |
+| Authentik PostgreSQL | kscloud1 | (Tailscale only) | Shared auth database |
+| Authentik Redis | kscloud1 | (Tailscale only) | Session cache |
+
+### Infrastructure
+| Service | Host | URL | Purpose |
+|---------|------|-----|---------|
+| cloudflared | monk + kscloud1 | (no UI) | CF Tunnel connector |
+| Portainer | monk | portainer.kitestacks.com | Docker container management |
+| Forgejo | monk | gitforge.kitestacks.com | Self-hosted Git (repos + CI) |
+| Uptime Kuma | monk | status.kitestacks.com | Service uptime monitoring |
+
+### Observability
+| Service | Host | URL | Purpose |
+|---------|------|-----|---------|
+| Prometheus | monk | (internal) | Metrics collection |
+| Grafana | monk | grafana.kitestacks.com | Metrics dashboards |
+| Node Exporter | monk | (internal) | Host OS metrics |
+| Blackbox Exporter | monk | (internal) | External endpoint probing |
+
+### Knowledge & Productivity
+| Service | Host | URL | Purpose |
+|---------|------|-----|---------|
+| BookStack | monk + kscloud1 | wiki.kitestacks.com | Internal wiki / documentation |
+| Karakeep | monk | links.kitestacks.com | Bookmark manager |
+| Kavita | monk | kavita.kitestacks.com | Ebook/manga reader |
+| OSTicket | monk | tasks.kitestacks.com | Help desk / ticket system |
+| ntfy | monk | (push notifications) | Push notifications |
+
+### AI Stack
+| Service | Host | URL | Purpose |
+|---------|------|-----|---------|
+| Open WebUI | monk | ai.kitestacks.com | Chat interface (GPT-4, Claude, local) |
+| LiteLLM | monk | (internal) | LLM API proxy / model router |
+
+### Portal
+| Service | Host | URL | Purpose |
+|---------|------|-----|---------|
+| KiteStacks Portal | monk + kscloud1 | www.kitestacks.com | Custom homepage / service launcher |
+| Metrics API | monk | (internal at /api) | FastAPI — live stats for portal |
+
+---
+
+## Authentication Flow
+
+Every service uses Authentik SSO via OIDC or OAuth2:
+
+```
+Browser → https://service.kitestacks.com
+    │
+    └─► Service: "Not logged in" → redirect to Authentik
+                    │
+                    ▼
+         https://auth.kitestacks.com/if/flow/...
+                    │
+                    ├─ User logs in with username + password
+                    ├─ Authentik validates credentials
+                    └─ Issues authorization code → redirect back to service
+                                │
+                                ▼
+                    Service exchanges code for tokens
+                    Decodes JWT to get user info (email)
+                    Creates local session
+```
+
+**BookStack-specific note:** `OIDC_ISSUER_DISCOVER=true` and `OIDC_ISSUER` must point to the per-app URL (`/application/o/bookstack/`), not the global Authentik URL. The Authentik provider must have `issuer_mode='per_provider'`.
+
+---
+
+## Network Architecture
+
+### External Access
+All public traffic enters via Cloudflare Tunnel. No ports are open on monk's router. kscloud1 (Hetzner) has no firewall rules open for HTTP/HTTPS either — all access via the same tunnel.
+
+### Internal Networking
+- All Docker containers attach to the `kitestacks` bridge network
+- Containers communicate using container names as DNS (e.g., `bookstack-db`, `prometheus`)
+- Docker's embedded DNS server (`127.0.0.11`) resolves container names automatically
+
+### Tailscale Overlay
+Tailscale creates an encrypted mesh between monk and kscloud1:
+- Used for: Authentik PostgreSQL/Redis access, SSH to kscloud1, Prometheus scraping kscloud1 metrics
+- Not used for: public traffic (that goes through Cloudflare)
+
+---
+
+## Storage Layout
+
+### monk
+```
+~/kitestacks-live/docker/
+├── authentik/          # media, custom-templates
+├── bookstack/          # config/, db/
+├── cloudflared/        # .env (TUNNEL_TOKEN)
+├── forgejo/            # data/
+├── grafana/            # grafana_data volume
+├── karakeep/           # data/
+├── kavita/             # config/
+├── kitestacks-portal/  # static HTML + nginx
+├── osticket/           # db/, uploads/
+├── portainer/          # portainer_data volume
+└── prometheus/         # prometheus.yml, prometheus_data volume
+```
+
+### kscloud1
+```
+/opt/kitestacks/docker/
+├── authentik/          # postgresql data volume, redis data
+├── bookstack/          # config/, db/
+├── cloudflared/        # .env (same TUNNEL_TOKEN)
+└── ...                 # replica services
 ```
 
 ---
 
-## Every Service and What It Does
+## Resilience Model
 
-### The Nine Public Services
-
-| Service | Container Name | What It Does | Why It's Here |
-|---------|---------------|--------------|---------------|
-| **Portal** | `homepage` | The public website (kitestacks.com) — custom nginx serving static HTML/CSS/JS with a cyberpunk theme | Front door to everything. Shows system stats, recent activity, links to all services |
-| **Authentik** | `authentik` | Identity provider — handles all logins via OIDC/OAuth2 SSO | Single place to manage all user accounts and access control |
-| **Forgejo** | `forgejo` | Self-hosted Git platform (like GitHub but yours) | Store all homelab code, config, and documentation |
-| **OpenProject** | `openproject` | Project management (like Jira) | Task tracking, project planning |
-| **Open WebUI** | `kite-openwebui` | ChatGPT-like AI chat interface | Access multiple AI models through one interface |
-| **Karakeep** | `karakeep` | Bookmark and read-it-later manager | Save links, articles, and content |
-| **Kavita** | `kavita` | eBook and manga reader | Personal digital library |
-| **Grafana** | `grafana` | Monitoring dashboards | Visualize CPU, RAM, network, uptime across both hosts |
-| **Uptime Kuma** | `uptime-kuma` | Status page and uptime monitoring | Monitor that all 9 services are up and alert if they go down |
-
-### The Infrastructure Services (Not Public-Facing)
-
-| Service | What It Does |
-|---------|-------------|
-| `cloudflared` | Cloudflare Tunnel connector — creates encrypted outbound tunnel to Cloudflare edge |
-| `prometheus` | Metrics collection — scrapes system stats from both monk and kscloud1 every 15 seconds |
-| `node-exporter` | Exposes host system metrics (CPU, RAM, disk, network) for Prometheus to scrape |
-| `kite-litellm` | LLM proxy gateway — routes AI requests to OpenRouter (multiple free models) |
-| `portainer` | Docker management UI — visual interface to manage all containers |
-| `kitestacks-metrics-api` | Python FastAPI service — serves real-time system stats, weather, and Forgejo activity to the portal |
+| Scenario | Impact | Recovery |
+|----------|--------|----------|
+| monk goes offline | All monk services unreachable; kscloud1 serves portal + wiki | Automatic (CF Tunnel failover) |
+| kscloud1 goes offline | Authentik logins may fail (DB unreachable); all other services up | Restart kscloud1 or point Authentik to local postgres |
+| Cloudflare Tunnel down | All public access lost; Tailscale still works | Check CF dashboard; restart cloudflared |
+| MariaDB crash (BookStack) | BookStack down | `docker restart bookstack-db` then `docker restart bookstack` |
+| Portainer lockout | No Docker UI | Use `portainer/helper-reset-password` |
 
 ---
 
-## How Traffic Flows
+## Key Design Decisions
 
-### When Someone Visits www.kitestacks.com
-
-```
-1. Browser sends HTTPS request to www.kitestacks.com
-2. DNS resolves to Cloudflare's anycast IP (not your home IP)
-3. Cloudflare terminates TLS — your home router never sees HTTPS
-4. Cloudflare routes the request through the tunnel to whichever
-   cloudflared connector responds first (monk or kscloud1)
-5. cloudflared resolves "homepage" via Docker DNS
-6. Request hits the nginx container serving the static portal
-7. Portal's JavaScript fetches /api/metrics and /api/activity
-   from the kitestacks-metrics-api container via nginx proxy
-8. Page renders with live system stats and recent git activity
-```
+**Why Cloudflare Tunnel instead of port-forwarding?**
+Port-forwarding exposes your home IP, requires a static IP, and can't failover. CF Tunnel is free, hides your IPs, and trivially supports multi-origin failover.
 
-### When Someone Clicks "Sign In with Authentik"
+**Why active-active instead of active-passive?**
+Active-passive requires detecting failure and switching. Active-active — same token, two connectors — Cloudflare handles routing automatically. Simpler and zero RPO.
 
-```
-1. App (e.g., Grafana) redirects browser to auth.kitestacks.com/application/o/authorize/
-2. Authentik presents login page
-3. User enters credentials — Authentik validates against its database
-   (stored on kscloud1's Postgres, shared over Tailscale)
-4. Authentik generates an authorization code and redirects back to Grafana
-5. Grafana's backend calls auth.kitestacks.com/application/o/token/
-   to exchange the code for an access token
-6. Authentik validates the code (found in shared DB) and returns a JWT
-7. Grafana reads the user's email/name from the JWT and logs them in
-```
+**Why Authentik over Keycloak or Authelia?**
+Authentik is easier to self-host (Docker Compose, sensible defaults), has a good UI, and supports LDAP + OIDC + SAML. Authelia lacks SAML. Keycloak is heavier and more complex.
 
-**The critical detail:** Steps 1 and 5 can hit different tunnel connectors (monk vs kscloud1). The authorization code from step 4 must exist in whichever database step 5 hits. That's why both connectors point to the SAME Postgres on kscloud1 — otherwise step 5 returns `invalid_grant` because the code isn't found.
-
----
-
-## The Two Hosts in Detail
-
-### Monk (Primary Home Machine)
-
-- **Role:** Primary production host
-- **Network:** Home LAN, no open ports on router (Cloudflare Tunnel handles all inbound)
-- **Services:** All 9 public services + all infrastructure services
-- **Data:** Each service has its own database/storage
-- **Authentik DB:** Points to kscloud1's Postgres over Tailscale (100.x.x.x)
-
-### kscloud1 (Hetzner VPS)
-
-- **Role:** Permanent cloud replica — always on, even when monk is off (travel, power outage, etc.)
-- **Network:** Public IP, Cloudflare Tunnel connector 3
-- **Services:** Full replica of all 9 public services (separate databases except Authentik)
-- **Hosts:** The shared Authentik Postgres + Redis (bound to Tailscale interface only)
-- **Resources:** 3 vCPU, 3.7 GB RAM — tight but functional
-
-### What's the Same Across Both
-
-- Same Cloudflare Tunnel token (different connector IDs assigned automatically)
-- Same Authentik database (shared via Tailscale)
-- Same Authentik secret key (required for JWT validation)
-- Same kavita.db (one-time sync — users and OIDC config)
-
-### What's Different Across Both
-
-- Forgejo data (separate repos — accepted inconsistency)
-- OpenProject data (separate projects)
-- Karakeep bookmarks (separate)
-- Kavita book files (monk has them, kscloud1 doesn't — covers synced, books not)
-
----
-
-## The Docker Network
-
-Every container joins the `kitestacks` external Docker bridge network:
-
-```bash
-docker network create kitestacks
-```
-
-This is what makes Cloudflare Tunnel work. The cloudflared container is also on this network, so when Cloudflare tells cloudflared to route `http://grafana:3000`, Docker's internal DNS resolves `grafana` to the grafana container's IP on that network.
-
-Without this shared network, cloudflared can't reach the service containers by name.
-
----
-
-## Why No Open Ports on the Router
-
-Traditional homelab: open port 80/443 on home router → NAT to home server → expose home IP.
-
-Problems with that:
-- Your home IP is public (DDoS risk, targeted attacks)
-- Router configuration is fragile
-- ISP can change your IP (dynamic IP)
-- Some ISPs block port 80/443
-
-Cloudflare Tunnel approach:
-- cloudflared container makes an OUTBOUND connection to Cloudflare
-- Cloudflare holds that connection open
-- Inbound requests come through Cloudflare, over that existing outbound tunnel
-- Your home IP is never exposed
-- Works on any network, any ISP, any firewall
-
-This is why you can run a public website from a home PC with zero router configuration.
-
----
-
-## Tailscale — The Private Backbone
-
-Tailscale creates a private overlay network (VPN mesh) across all your devices:
-
-```
-monk (100.x.x.x) ←—— encrypted ——→ kscloud1 (100.x.x.x)
-monk (100.x.x.x) ←—— encrypted ——→ pixel-6 (100.x.x.x)
-```
-
-Used in this project for:
-1. **Shared Authentik DB:** kscloud1's Postgres binds to its Tailscale IP, not its public IP. Only devices on the tailnet can connect. Monk points to that address.
-2. **Forgejo activity feed:** On kscloud1, the metrics API fetches recent commits from monk's Forgejo via monk's Tailscale IP — so both portal instances show the same activity feed.
-3. **SSH/Admin access:** You can SSH into any device on the tailnet from anywhere.
-
----
-
-## The Monitoring Stack
-
-```
-node-exporter (monk) → prometheus (monk) → grafana (monk)
-node-exporter (kscloud1) ↗ (scrapes 5.78.x.x:9100)
-```
-
-Prometheus scrapes metrics every 15 seconds from:
-- `node-exporter:9100` — monk's own node-exporter (via Docker DNS)
-- `5.78.x.x:9100` — kscloud1's node-exporter (via public IP, port exposed 0.0.0.0)
-
-Grafana visualizes both, letting you switch between hosts in the instance picker.
-
----
-
-## The Portal Architecture
-
-The portal is NOT gethomepage or any pre-built dashboard. It's a custom-built static site:
-
-```
-nginx (container: "homepage")
-  ├── / → serves static HTML/CSS/JS from ./public/
-  └── /api/* → proxy_pass to kitestacks-metrics-api:8000 (host)
-
-kitestacks-metrics-api (network_mode: host, pid: host)
-  ├── GET /api/metrics → psutil reads HOST's CPU/RAM/disk/network
-  ├── GET /api/weather → wttr.in API → current weather by IP geolocation
-  ├── GET /api/activity → Forgejo API → recent commits
-  └── GET /api/health → {"ok": true}
-```
-
-The metrics API runs with `network_mode: host` and `pid: host` so it reads the HOST machine's process table and `/proc` filesystem — not the container's. Without this, it would report container stats, not laptop stats.
+**Why BookStack over Notion/Confluence?**
+Self-hosted, no external API calls, Markdown-first, OIDC SSO. Data stays in-house.
 
+**Why Forgejo over GitLab?**
+Forgejo is lightweight (~200MB RAM vs GitLab's 4GB+). Full git server with CI runners, issues, PRs. GitLab is overkill for a homelab.