Update architecture overview: add BookStack, fix 2-connector diagram

kenpat 2026-06-18 21:08:26 +00:00
parent ca9e8a7959
commit 2233c3274f


@@ -1,221 +1,186 @@
# KiteStacks Architecture — Full System Overview
# KiteStacks Architecture Overview
## The Big Picture
**Last updated:** 2026-06-18
**Status:** Active production homelab
---
## High-Level Architecture
```
INTERNET
┌──────────────────────────────────────────────────┐
│ Public Internet │
│ (via Cloudflare Tunnel) │
└───────────────────────┬──────────────────────────┘
┌──────▼──────┐
│ Cloudflare │ DNS + TLS termination
│ (edge) │ Zero Trust Tunnel
└──────┬──────┘
│ HTTPS (443) only
┌────────────────┼────────────────┐
│ connector 1 │ connector 2 │ connector 3
│ │ │
┌──────▼──────┐ │ ┌──────▼──────┐
│ MONK │ │ │ KSCLOUD1 │
│ (home PC) │ │ │ (Hetzner VPS│
│ │ Active │ │ 5.78.x.x) │
│ All 9 │ Active │ │ │
│ services │ │ │ All 9 │
│ │ │ │ services │
└──────┬──────┘ │ └──────┬──────┘
│ │ │
└────────────────┼───────────────┘
TAILSCALE VPN
(100.x.x.x range)
┌─────────────▼──────────────┐
│ Cloudflare Zero Trust │
│ Active-Active Tunnel │
└──────┬────────────┬────────┘
│ │
┌────────────▼───┐ ┌─────▼──────────────┐
│ monk (home) │ │ kscloud1 (Hetzner)│
│ cloudflared │ │ cloudflared │
│ All services │ │ Replica services │
│ Tailscale mesh │ │ Shared Authentik DB │
└────────────────┘ └─────────────────────┘
│ │
└────────────────────┘
Tailscale overlay
(private network)
```
The two machines run cloudflared with the same Cloudflare Tunnel token, so both register as connectors of one tunnel and Cloudflare routes each request to a healthy connector automatically. If monk goes offline, kscloud1 continues serving all public subdomains within seconds.
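The failover property can be pictured with a toy model. This is a sketch under stated assumptions, not Cloudflare's routing algorithm (which runs inside its edge): `Connector` and `pick_connector` are hypothetical names, and the model only captures that any healthy connector on the shared tunnel can serve a request.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Connector:
    """One cloudflared instance registered on the shared tunnel (toy model)."""
    host: str
    healthy: bool


def pick_connector(connectors: list[Connector]) -> Optional[Connector]:
    """Return the first healthy connector, or None if every connector is down.

    Hypothetical helper: real edge routing is internal to Cloudflare; this
    only shows that losing one connector leaves the other serving traffic.
    """
    for connector in connectors:
        if connector.healthy:
            return connector
    return None


tunnel = [Connector("monk", healthy=True), Connector("kscloud1", healthy=True)]
tunnel[0].healthy = False         # monk goes offline
serving = pick_connector(tunnel)  # kscloud1 keeps answering
```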
---
## Service Map
### Identity & Access
| Service | Host | URL | Purpose |
|---------|------|-----|---------|
| Authentik server | monk | auth.kitestacks.com | SSO identity provider |
| Authentik worker | monk | (internal) | Background jobs, flow execution |
| Authentik LDAP | monk | (internal) | LDAP proxy for non-OIDC apps |
| Authentik PostgreSQL | kscloud1 | (Tailscale only) | Shared auth database |
| Authentik Redis | kscloud1 | (Tailscale only) | Session cache |
### Infrastructure
| Service | Host | URL | Purpose |
|---------|------|-----|---------|
| cloudflared | monk + kscloud1 | (no UI) | CF Tunnel connector |
| Portainer | monk | portainer.kitestacks.com | Docker container management |
| Forgejo | monk | gitforge.kitestacks.com | Self-hosted Git (repos + CI) |
| Uptime Kuma | monk | status.kitestacks.com | Service uptime monitoring |
### Observability
| Service | Host | URL | Purpose |
|---------|------|-----|---------|
| Prometheus | monk | (internal) | Metrics collection |
| Grafana | monk | grafana.kitestacks.com | Metrics dashboards |
| Node Exporter | monk | (internal) | Host OS metrics |
| Blackbox Exporter | monk | (internal) | External endpoint probing |
### Knowledge & Productivity
| Service | Host | URL | Purpose |
|---------|------|-----|---------|
| BookStack | monk + kscloud1 | wiki.kitestacks.com | Internal wiki / documentation |
| Karakeep | monk | links.kitestacks.com | Bookmark manager |
| Kavita | monk | kavita.kitestacks.com | Ebook/manga reader |
| OSTicket | monk | tasks.kitestacks.com | Help desk / ticket system |
| ntfy | monk | (push notifications) | Push notifications |
### AI Stack
| Service | Host | URL | Purpose |
|---------|------|-----|---------|
| Open WebUI | monk | ai.kitestacks.com | Chat interface (GPT-4, Claude, local) |
| LiteLLM | monk | (internal) | LLM API proxy / model router |
### Portal
| Service | Host | URL | Purpose |
|---------|------|-----|---------|
| KiteStacks Portal | monk + kscloud1 | www.kitestacks.com | Custom homepage / service launcher |
| Metrics API | monk | (internal at /api) | FastAPI — live stats for portal |
---
## Authentication Flow
Services sign in through Authentik SSO via OIDC/OAuth2 (apps without OIDC support can use the Authentik LDAP proxy instead). The OIDC login flow:
```
Browser → https://service.kitestacks.com
┌────────▼────────┐
│ SHARED DB LAYER │
│ on kscloud1 │
│ Postgres :5432 │
│ Redis :6379 │
│ (Tailscale │
│ only, private)│
└─────────────────┘
└─► Service: "Not logged in" → redirect to Authentik
https://auth.kitestacks.com/if/flow/...
├─ User logs in with username + password
├─ Authentik validates credentials
└─ Issues authorization code → redirect back to service
Service exchanges code for tokens
Decodes JWT to get user info (email)
Creates local session
```
**BookStack-specific note:** set `OIDC_ISSUER_DISCOVER=true`, and point `OIDC_ISSUER` at the per-app URL (`/application/o/bookstack/`), not the global Authentik URL. The Authentik provider must have `issuer_mode='per_provider'`.
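To sanity-check these settings, the per-provider issuer and its discovery URL can be derived and compared. A minimal sketch, assuming Authentik's per-provider layout `/application/o/<slug>/` described above; the function names and the sample discovery document are hypothetical, and no network call is made. OIDC Discovery requires the `issuer` in the fetched document to match the configured issuer exactly, so even a trailing-slash difference counts as a mismatch.

```python
from urllib.parse import urljoin

AUTHENTIK_BASE = "https://auth.kitestacks.com"


def per_provider_issuer(slug: str) -> str:
    """Issuer URL for an Authentik provider with issuer_mode='per_provider'."""
    return f"{AUTHENTIK_BASE}/application/o/{slug}/"


def discovery_url(issuer: str) -> str:
    """Discovery document location; relies on the issuer's trailing slash."""
    return urljoin(issuer, ".well-known/openid-configuration")


def issuer_matches(configured: str, discovery_doc: dict) -> bool:
    """OIDC Discovery: the document's "issuer" must equal the configured one exactly."""
    return discovery_doc.get("issuer") == configured


# Values BookStack would be given as environment variables in this setup:
#   OIDC_ISSUER=https://auth.kitestacks.com/application/o/bookstack/
#   OIDC_ISSUER_DISCOVER=true
issuer = per_provider_issuer("bookstack")
sample_doc = {"issuer": issuer}  # hypothetical stand-in for the fetched JSON
```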
---
## Network Architecture
### External Access
All public traffic enters via Cloudflare Tunnel. No ports are forwarded on monk's router, and kscloud1 (Hetzner) has no HTTP/HTTPS ports open either; all web access goes through the same tunnel.
### Internal Networking
- All Docker containers attach to the `kitestacks` bridge network
- Containers communicate using container names as DNS (e.g., `bookstack-db`, `prometheus`)
- Docker's embedded DNS server (`127.0.0.11`) resolves container names automatically
### Tailscale Overlay
Tailscale creates an encrypted mesh between monk and kscloud1:
- Used for: Authentik PostgreSQL/Redis access, SSH to kscloud1, Prometheus scraping kscloud1 metrics
- Not used for: public traffic (that goes through Cloudflare)
---
## Storage Layout
### monk
```
~/kitestacks-live/docker/
├── authentik/ # media, custom-templates
├── bookstack/ # config/, db/
├── cloudflared/ # .env (TUNNEL_TOKEN)
├── forgejo/ # data/
├── grafana/ # grafana_data volume
├── karakeep/ # data/
├── kavita/ # config/
├── kitestacks-portal/ # static HTML + nginx
├── osticket/ # db/, uploads/
├── portainer/ # portainer_data volume
└── prometheus/ # prometheus.yml, prometheus_data volume
```
### kscloud1
```
/opt/kitestacks/docker/
├── authentik/ # postgresql data volume, redis data
├── bookstack/ # config/, db/
├── cloudflared/ # .env (same TUNNEL_TOKEN)
└── ... # replica services
```
---
## Every Service and What It Does
## Resilience Model
### The Nine Public Services
| Service | Container Name | What It Does | Why It's Here |
|---------|---------------|--------------|---------------|
| **Portal** | `homepage` | The public website (kitestacks.com) — custom nginx serving static HTML/CSS/JS with a cyberpunk theme | Front door to everything. Shows system stats, recent activity, links to all services |
| **Authentik** | `authentik` | Identity provider — handles all logins via OIDC/OAuth2 SSO | Single place to manage all user accounts and access control |
| **Forgejo** | `forgejo` | Self-hosted Git platform (like GitHub but yours) | Store all homelab code, config, and documentation |
| **OpenProject** | `openproject` | Project management (like Jira) | Task tracking, project planning |
| **Open WebUI** | `kite-openwebui` | ChatGPT-like AI chat interface | Access multiple AI models through one interface |
| **Karakeep** | `karakeep` | Bookmark and read-it-later manager | Save links, articles, and content |
| **Kavita** | `kavita` | eBook and manga reader | Personal digital library |
| **Grafana** | `grafana` | Monitoring dashboards | Visualize CPU, RAM, network, uptime across both hosts |
| **Uptime Kuma** | `uptime-kuma` | Status page and uptime monitoring | Monitors all 9 services and alerts if any of them goes down |
### The Infrastructure Services (Not Public-Facing)
| Service | What It Does |
|---------|-------------|
| `cloudflared` | Cloudflare Tunnel connector — creates encrypted outbound tunnel to Cloudflare edge |
| `prometheus` | Metrics collection — scrapes system stats from both monk and kscloud1 every 15 seconds |
| `node-exporter` | Exposes host system metrics (CPU, RAM, disk, network) for Prometheus to scrape |
| `kite-litellm` | LLM proxy gateway — routes AI requests to OpenRouter (multiple free models) |
| `portainer` | Docker management UI — visual interface to manage all containers |
| `kitestacks-metrics-api` | Python FastAPI service — serves real-time system stats, weather, and Forgejo activity to the portal |
| Scenario | Impact | Recovery |
|----------|--------|----------|
| monk goes offline | All monk services unreachable; kscloud1 serves portal + wiki | Automatic (CF Tunnel failover) |
| kscloud1 goes offline | Authentik logins may fail (DB unreachable); all other services up | Restart kscloud1 or point Authentik to a local PostgreSQL |
| Cloudflare Tunnel down | All public access lost; Tailscale still works | Check CF dashboard; restart cloudflared |
| MariaDB crash (BookStack) | BookStack down | `docker restart bookstack-db` then `docker restart bookstack` |
| Portainer lockout | No Docker UI | Use `portainer/helper-reset-password` |
---
## How Traffic Flows
## Key Design Decisions
### When Someone Visits www.kitestacks.com
**Why Cloudflare Tunnel instead of port-forwarding?**
Port-forwarding exposes your home IP, needs a static IP (or dynamic DNS), and offers no built-in failover. CF Tunnel is free, hides your origin IPs, and supports failover across multiple connectors.
```
1. Browser sends HTTPS request to www.kitestacks.com
2. DNS resolves to Cloudflare's anycast IP (not your home IP)
3. Cloudflare terminates TLS at its edge; your home router never receives an inbound HTTPS connection
4. Cloudflare sends the request through the tunnel to one of the connected
cloudflared connectors (monk or kscloud1)
5. cloudflared resolves "homepage" via Docker DNS
6. Request hits the nginx container serving the static portal
7. Portal's JavaScript fetches /api/metrics and /api/activity
from the kitestacks-metrics-api container via nginx proxy
8. Page renders with live system stats and recent git activity
```
**Why active-active instead of active-passive?**
Active-passive requires detecting failure and switching over. Active-active (same token, two connectors) lets Cloudflare handle routing automatically: simpler, with no switchover step.
### When Someone Clicks "Sign In with Authentik"
**Why Authentik over Keycloak or Authelia?**
Authentik is easier to self-host (Docker Compose, sensible defaults), has a good UI, and supports LDAP + OIDC + SAML. Authelia lacks SAML. Keycloak is heavier and more complex.
```
1. App (e.g., Grafana) redirects browser to auth.kitestacks.com/application/o/authorize/
2. Authentik presents login page
3. User enters credentials — Authentik validates against its database
(stored on kscloud1's Postgres, shared over Tailscale)
4. Authentik generates an authorization code and redirects back to Grafana
5. Grafana's backend calls auth.kitestacks.com/application/o/token/
to exchange the code for an access token
6. Authentik validates the code (found in shared DB) and returns a JWT
7. Grafana reads the user's email/name from the JWT and logs them in
```
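Steps 1 and 5 boil down to two HTTP requests whose parameters come from the OAuth 2.0 authorization code grant. A minimal sketch: the `client_id`, the secret, and the Grafana callback path `/login/generic_oauth` are assumptions for illustration, and the helper names are hypothetical. Sending `client_secret` in the POST body is one of the client authentication methods OAuth 2.0 allows; a given app may use HTTP Basic instead.

```python
import secrets
from urllib.parse import parse_qs, urlencode, urlparse

AUTHORIZE_ENDPOINT = "https://auth.kitestacks.com/application/o/authorize/"
TOKEN_ENDPOINT = "https://auth.kitestacks.com/application/o/token/"


def build_authorize_url(client_id: str, redirect_uri: str, state: str) -> str:
    """Step 1: the URL the app redirects the browser to."""
    params = {
        "response_type": "code",          # ask Authentik for an authorization code
        "client_id": client_id,
        "redirect_uri": redirect_uri,     # where Authentik sends the browser back (step 4)
        "scope": "openid email profile",  # "openid" makes an ID token (JWT) come back in step 6
        "state": state,                   # random value the app re-checks on return (CSRF guard)
    }
    return f"{AUTHORIZE_ENDPOINT}?{urlencode(params)}"


def token_request_form(code: str, client_id: str, client_secret: str, redirect_uri: str) -> dict:
    """Step 5: form fields the app's backend POSTs to TOKEN_ENDPOINT."""
    return {
        "grant_type": "authorization_code",
        "code": code,                   # the one-time code from step 4
        "redirect_uri": redirect_uri,   # must match the value sent in step 1
        "client_id": client_id,
        "client_secret": client_secret,
    }


redirect = "https://grafana.kitestacks.com/login/generic_oauth"  # assumed callback path
state = secrets.token_urlsafe(16)
url = build_authorize_url("grafana-example-client", redirect, state)  # hypothetical client_id
query = parse_qs(urlparse(url).query)
```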
**Why BookStack over Notion/Confluence?**
Self-hosted, no external API calls, WYSIWYG editor with an optional Markdown editor, OIDC SSO. Data stays in-house.
**The critical detail:** the authorization request (steps 1 to 4) and the token exchange (step 5) can hit different tunnel connectors (monk vs kscloud1). The authorization code from step 4 must exist in whichever database step 5 hits. That's why the Authentik instances behind both connectors use the SAME Postgres on kscloud1; otherwise step 5 returns `invalid_grant` because the code isn't found.
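The `invalid_grant` failure can be reproduced with a toy model: two Authentik instances that issue single-use codes into either their own store or one shared store. A sketch only; `ToyAuthentik` is a hypothetical class, and a plain dict stands in for the Postgres table.

```python
import secrets


class ToyAuthentik:
    """Toy identity provider: issues single-use authorization codes into a store."""

    def __init__(self, name: str, code_store: dict):
        self.name = name
        self.codes = code_store  # code -> username; stands in for the Postgres table

    def authorize(self, username: str) -> str:
        """Steps 3-4: validate the login (skipped here) and issue a code."""
        code = secrets.token_urlsafe(16)
        self.codes[code] = username
        return code

    def token(self, code: str) -> dict:
        """Step 5: exchange a code; unknown or already-used codes are rejected."""
        username = self.codes.pop(code, None)  # pop() makes each code single-use
        if username is None:
            return {"error": "invalid_grant"}
        return {"access_token": f"token-for-{username}"}


# Separate databases: the code issued via monk is unknown on kscloud1.
monk, kscloud1 = ToyAuthentik("monk", {}), ToyAuthentik("kscloud1", {})
separate_result = kscloud1.token(monk.authorize("alice"))

# One shared database: either host can redeem the code, but only once.
shared = {}
monk, kscloud1 = ToyAuthentik("monk", shared), ToyAuthentik("kscloud1", shared)
code = monk.authorize("alice")
shared_result = kscloud1.token(code)
replay_result = monk.token(code)
```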
---
## The Two Hosts in Detail
### Monk (Primary Home Machine)
- **Role:** Primary production host
- **Network:** Home LAN, no open ports on router (Cloudflare Tunnel handles all inbound)
- **Services:** All 9 public services + all infrastructure services
- **Data:** Each service has its own database/storage
- **Authentik DB:** Points to kscloud1's Postgres over Tailscale (100.x.x.x)
### kscloud1 (Hetzner VPS)
- **Role:** Permanent cloud replica — always on, even when monk is off (travel, power outage, etc.)
- **Network:** Public IP, Cloudflare Tunnel connector 3
- **Services:** Full replica of all 9 public services (separate databases except Authentik)
- **Hosts:** The shared Authentik Postgres + Redis (bound to Tailscale interface only)
- **Resources:** 3 vCPU, 3.7 GB RAM — tight but functional
### What's the Same Across Both
- Same Cloudflare Tunnel token (different connector IDs assigned automatically)
- Same Authentik database (shared via Tailscale)
- Same Authentik secret key (`AUTHENTIK_SECRET_KEY`, used to sign cookies, so sessions stay valid on either host)
- Same kavita.db (one-time sync — users and OIDC config)
### What's Different Across Both
- Forgejo data (separate repos — accepted inconsistency)
- OpenProject data (separate projects)
- Karakeep bookmarks (separate)
- Kavita book files (monk has them, kscloud1 doesn't — covers synced, books not)
---
## The Docker Network
Every container joins the `kitestacks` external Docker bridge network:
```bash
docker network create kitestacks
```
This shared network is what lets the tunnel reach services by name. The cloudflared container is also on this network, so when Cloudflare tells cloudflared to route to `http://grafana:3000`, Docker's embedded DNS resolves `grafana` to the grafana container's IP on that network.
Without this shared network, cloudflared can't reach the service containers by name.
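Hostname-to-container routing follows cloudflared's ingress rules: rules are checked top to bottom, the first hostname match wins, and the last rule must be a catch-all. A minimal sketch of that matching; the rule list is hypothetical (the `homepage` port 80 in particular is an assumption), and `fnmatchcase` only approximates cloudflared's wildcard handling. The host part of the chosen service URL is the name Docker's embedded DNS has to resolve on the `kitestacks` network.

```python
from fnmatch import fnmatchcase
from typing import Optional
from urllib.parse import urlparse

# Hypothetical rules in cloudflared's ingress order; the last one has no
# hostname, so it catches every request the earlier rules did not match.
INGRESS = [
    {"hostname": "grafana.kitestacks.com", "service": "http://grafana:3000"},
    {"hostname": "www.kitestacks.com", "service": "http://homepage:80"},
    {"service": "http_status:404"},
]


def route(host: str, rules: list = INGRESS) -> str:
    """Return the service of the first rule whose hostname matches."""
    for rule in rules:
        pattern = rule.get("hostname")
        if pattern is None or fnmatchcase(host.lower(), pattern):
            return rule["service"]
    raise ValueError("ingress rules must end with a catch-all rule")


def upstream_name(service: str) -> Optional[str]:
    """Container name Docker DNS must resolve, e.g. 'grafana'; None for http_status."""
    return urlparse(service).hostname
```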
---
## Why No Open Ports on the Router
Traditional homelab: open ports 80/443 on the home router → NAT to home server → expose home IP.
Problems with that:
- Your home IP is public (DDoS risk, targeted attacks)
- Router configuration is fragile
- ISP can change your IP (dynamic IP)
- Some ISPs block ports 80/443
Cloudflare Tunnel approach:
- cloudflared container makes an OUTBOUND connection to Cloudflare
- Cloudflare holds that connection open
- Inbound requests come through Cloudflare, over that existing outbound tunnel
- Your home IP is never exposed
- Works behind NAT and most firewalls, on any ISP, as long as outbound connections to Cloudflare are allowed
This is why you can run a public website from a home PC with zero router configuration.
---
## Tailscale — The Private Backbone
Tailscale creates a private overlay network (VPN mesh) across all your devices:
```
monk (100.x.x.x) ←—— encrypted ——→ kscloud1 (100.x.x.x)
monk (100.x.x.x) ←—— encrypted ——→ pixel-6 (100.x.x.x)
```
Used in this project for:
1. **Shared Authentik DB:** kscloud1's Postgres binds to its Tailscale IP, not its public IP. Only devices on the tailnet can connect. Monk points to that address.
2. **Forgejo activity feed:** On kscloud1, the metrics API fetches recent commits from monk's Forgejo via monk's Tailscale IP — so both portal instances show the same activity feed.
3. **SSH/Admin access:** You can SSH into any device on the tailnet from anywhere.
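Because `100.x.x.x` is broader than what Tailscale uses (node addresses come from the CGNAT block `100.64.0.0/10`), a quick config check can confirm that a database host setting really points at a tailnet address. A minimal sketch; `AUTHENTIK_POSTGRESQL__HOST` is Authentik's setting for the Postgres host, while the sample value and the helper name are hypothetical.

```python
import ipaddress

# Tailscale assigns IPv4 node addresses from the CGNAT range 100.64.0.0/10.
TAILNET_V4 = ipaddress.ip_network("100.64.0.0/10")


def is_tailnet_address(host: str) -> bool:
    """True if host is an IPv4 literal inside Tailscale's CGNAT range."""
    try:
        return ipaddress.ip_address(host) in TAILNET_V4
    except ValueError:
        return False  # a hostname, not an IP literal


# Hypothetical environment for Authentik on monk.
env = {"AUTHENTIK_POSTGRESQL__HOST": "100.101.102.103"}
db_on_tailnet = is_tailnet_address(env["AUTHENTIK_POSTGRESQL__HOST"])
```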
---
## The Monitoring Stack
```
node-exporter (monk) → prometheus (monk) → grafana (monk)
node-exporter (kscloud1) ↗ (scrapes 5.78.x.x:9100)
```
Prometheus scrapes metrics every 15 seconds from:
- `node-exporter:9100` — monk's own node-exporter (via Docker DNS)
- `5.78.x.x:9100` — kscloud1's node-exporter (via public IP, port exposed 0.0.0.0)
Grafana visualizes both, letting you switch between hosts in the instance picker.
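A common Grafana CPU panel derives utilization from node-exporter's `node_cpu_seconds_total` counter, e.g. `100 * (1 - avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])))`. The arithmetic behind that query, applied to two idle-time samples taken one 15-second scrape apart, can be sketched as follows. The sample numbers are made up for illustration, and unlike PromQL's `rate()` this ignores counter resets.

```python
def cpu_busy_percent(idle_t0: float, idle_t1: float, interval_s: float, n_cpus: int) -> float:
    """Busy % between two samples of idle CPU-seconds summed over all cores.

    Mirrors 100 * (1 - avg(rate(node_cpu_seconds_total{mode="idle"}))):
    idle seconds accrued per wall-clock second, averaged over cores.
    """
    idle_per_second = (idle_t1 - idle_t0) / interval_s  # idle CPU-seconds per second
    idle_fraction = idle_per_second / n_cpus            # 1.0 means every core idle
    return 100.0 * (1.0 - idle_fraction)


# 4 cores, 15 s scrape interval, 45 idle CPU-seconds accrued: 25% busy.
busy = cpu_busy_percent(1000.0, 1045.0, 15.0, 4)
```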
---
## The Portal Architecture
The portal is NOT gethomepage or any pre-built dashboard. It's a custom-built static site:
```
nginx (container: "homepage")
├── / → serves static HTML/CSS/JS from ./public/
└── /api/* → proxy_pass to kitestacks-metrics-api:8000 (host)
kitestacks-metrics-api (network_mode: host, pid: host)
├── GET /api/metrics → psutil reads HOST's CPU/RAM/disk/network
├── GET /api/weather → wttr.in API → current weather by IP geolocation
├── GET /api/activity → Forgejo API → recent commits
└── GET /api/health → {"ok": true}
```
The metrics API runs with `network_mode: host` and `pid: host` so it reads the HOST machine's process table and `/proc` filesystem, not the container's. Without this, it would report container stats, not host stats.
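As an illustration of what reading `/proc` means for the memory figure, the sketch below parses `/proc/meminfo` text and computes used RAM as (total - available) / total, essentially the formula psutil's `virtual_memory().percent` uses on Linux. This is a stdlib-only stand-in, not the metrics API's actual code (which uses psutil), and the sample text is made up.

```python
def parse_meminfo(text: str) -> dict:
    """Parse /proc/meminfo lines such as 'MemTotal:  8000000 kB' into {key: number}."""
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        fields = rest.split()
        if sep and fields:
            values[key.strip()] = int(fields[0])  # first field is the value (kB for memory lines)
    return values


def memory_used_percent(meminfo: dict) -> float:
    """Share of RAM not available to new allocations, as a percentage."""
    total = meminfo["MemTotal"]
    available = meminfo["MemAvailable"]
    return 100.0 * (total - available) / total


SAMPLE = """MemTotal:        8000000 kB
MemFree:         1000000 kB
MemAvailable:    2000000 kB
"""
used = memory_used_percent(parse_meminfo(SAMPLE))  # 75.0
```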
**Why Forgejo over GitLab?**
Forgejo is lightweight (~200MB RAM vs GitLab's 4GB+). Full git server with CI runners, issues, PRs. GitLab is overkill for a homelab.