Update architecture overview: add BookStack, fix 2-connector diagram
parent ca9e8a7959
commit 2233c3274f
1 changed files with 170 additions and 205 deletions
@@ -1,221 +1,186 @@
# KiteStacks Architecture — Full System Overview
# KiteStacks Architecture Overview

## The Big Picture
**Last updated:** 2026-06-18
**Status:** Active production homelab

---

## High-Level Architecture

```
                    INTERNET
                        │
                 ┌──────▼──────┐
                 │ Cloudflare  │  DNS + TLS termination
                 │   (edge)    │  Zero Trust Tunnel
                 └──────┬──────┘
                        │ HTTPS (443) only
       ┌────────────────┼────────────────┐
       │ connector 1    │ connector 2    │ connector 3
       │                │                │
┌──────▼──────┐         │         ┌──────▼──────┐
│    MONK     │         │         │  KSCLOUD1   │
│  (home PC)  │         │         │ (Hetzner VPS│
│             │ Active  │         │  5.78.x.x)  │
│    All 9    │ Active  │         │             │
│  services   │         │         │    All 9    │
│             │         │         │  services   │
└──────┬──────┘         │         └──────┬──────┘
       │                │                │
       └────────────────┼────────────────┘
                  TAILSCALE VPN
                (100.x.x.x range)
                        │
               ┌────────▼────────┐
               │ SHARED DB LAYER │
               │   on kscloud1   │
               │ Postgres :5432  │
               │ Redis :6379     │
               │   (Tailscale    │
               │  only, private) │
               └─────────────────┘

┌──────────────────────────────────────────────────┐
│                 Public Internet                  │
│             (via Cloudflare Tunnel)              │
└───────────────────────┬──────────────────────────┘
                        │
          ┌─────────────▼──────────────┐
          │   Cloudflare Zero Trust    │
          │   Active-Active Tunnel     │
          └──────┬────────────┬────────┘
                 │            │
    ┌────────────▼───┐  ┌─────▼───────────────┐
    │ monk (home)    │  │ kscloud1 (Hetzner)  │
    │ cloudflared    │  │ cloudflared         │
    │ All services   │  │ Replica services    │
    │ Tailscale mesh │  │ Shared Authentik DB │
    └────────────────┘  └─────────────────────┘
            │                    │
            └────────────────────┘
              Tailscale overlay
              (private network)
```

The two machines share one Cloudflare Tunnel token, so Cloudflare sees two connectors for the same tunnel and routes each request to an available one automatically. If monk goes offline, kscloud1 continues serving all public subdomains within seconds.
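
The failover behaviour described above can be pictured with a small toy model. This is only a sketch: Cloudflare's real connector selection is internal to its edge, and the `Connector` class and `pick_connector` function are hypothetical names introduced here for illustration.

```python
from dataclasses import dataclass


@dataclass
class Connector:
    """Hypothetical stand-in for one cloudflared connector."""
    name: str
    online: bool = True


def pick_connector(connectors):
    """Return the first online connector, or None if all are offline."""
    for connector in connectors:
        if connector.online:
            return connector
    return None


# Both hosts up: a request can be served by monk.
both_up = [Connector("monk"), Connector("kscloud1")]

# monk offline: the same tunnel still has a live connector on kscloud1.
monk_down = [Connector("monk", online=False), Connector("kscloud1")]
```

With `monk_down`, `pick_connector` returns the kscloud1 connector, which mirrors the "kscloud1 continues serving" case in the text.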

---

## Service Map

### Identity & Access
| Service | Host | URL | Purpose |
|---------|------|-----|---------|
| Authentik server | monk | auth.kitestacks.com | SSO identity provider |
| Authentik worker | monk | (internal) | Background jobs, flow execution |
| Authentik LDAP | monk | (internal) | LDAP proxy for non-OIDC apps |
| Authentik PostgreSQL | kscloud1 | (Tailscale only) | Shared auth database |
| Authentik Redis | kscloud1 | (Tailscale only) | Session cache |

### Infrastructure
| Service | Host | URL | Purpose |
|---------|------|-----|---------|
| cloudflared | monk + kscloud1 | (no UI) | CF Tunnel connector |
| Portainer | monk | portainer.kitestacks.com | Docker container management |
| Forgejo | monk | gitforge.kitestacks.com | Self-hosted Git (repos + CI) |
| Uptime Kuma | monk | status.kitestacks.com | Service uptime monitoring |

### Observability
| Service | Host | URL | Purpose |
|---------|------|-----|---------|
| Prometheus | monk | (internal) | Metrics collection |
| Grafana | monk | grafana.kitestacks.com | Metrics dashboards |
| Node Exporter | monk | (internal) | Host OS metrics |
| Blackbox Exporter | monk | (internal) | External endpoint probing |

### Knowledge & Productivity
| Service | Host | URL | Purpose |
|---------|------|-----|---------|
| BookStack | monk + kscloud1 | wiki.kitestacks.com | Internal wiki / documentation |
| Karakeep | monk | links.kitestacks.com | Bookmark manager |
| Kavita | monk | kavita.kitestacks.com | Ebook/manga reader |
| OSTicket | monk | tasks.kitestacks.com | Help desk / ticket system |
| ntfy | monk | (push notifications) | Push notifications |

### AI Stack
| Service | Host | URL | Purpose |
|---------|------|-----|---------|
| Open WebUI | monk | ai.kitestacks.com | Chat interface (GPT-4, Claude, local) |
| LiteLLM | monk | (internal) | LLM API proxy / model router |

### Portal
| Service | Host | URL | Purpose |
|---------|------|-----|---------|
| KiteStacks Portal | monk + kscloud1 | www.kitestacks.com | Custom homepage / service launcher |
| Metrics API | monk | (internal at /api) | FastAPI — live stats for portal |

---

## Authentication Flow

Every service uses Authentik SSO via OIDC or OAuth2:

```
Browser → https://service.kitestacks.com
    │
    └─► Service: "Not logged in" → redirect to Authentik
              │
              ▼
        https://auth.kitestacks.com/if/flow/...
              │
              ├─ User logs in with username + password
              ├─ Authentik validates credentials
              └─ Issues authorization code → redirect back to service
                        │
                        ▼
                  Service exchanges code for tokens
                  Decodes JWT to get user info (email)
                  Creates local session
```

**BookStack-specific note:** Set `OIDC_ISSUER_DISCOVER=true`, and point `OIDC_ISSUER` at the per-app URL (`/application/o/bookstack/`), not the global Authentik URL. The Authentik provider must have `issuer_mode='per_provider'`.
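
To make the per-app issuer concrete, the sketch below builds the issuer URL and the discovery document URL that an OIDC client would fetch when discovery is enabled. The helper names `per_app_issuer` and `discovery_url` are hypothetical; the `.well-known/openid-configuration` suffix comes from the OpenID Connect Discovery convention, and the slug `bookstack` is taken from the path in the note above.

```python
from urllib.parse import urljoin

# Base URL of the Authentik server, as listed in the service map.
AUTHENTIK_BASE = "https://auth.kitestacks.com"


def per_app_issuer(app_slug: str) -> str:
    """Build the per-application issuer URL (note the trailing slash)."""
    return urljoin(AUTHENTIK_BASE, f"/application/o/{app_slug}/")


def discovery_url(issuer: str) -> str:
    """Build the OIDC discovery document URL under the given issuer."""
    return urljoin(issuer, ".well-known/openid-configuration")


issuer = per_app_issuer("bookstack")
print(issuer)
print(discovery_url(issuer))
```

The trailing slash matters here: without it, `urljoin` would replace the last path segment instead of appending to it.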

---

## Network Architecture

### External Access
All public traffic enters via Cloudflare Tunnel. No ports are open on monk's router. kscloud1 (Hetzner) has no firewall rules open for HTTP/HTTPS either; all access goes through the same tunnel.

### Internal Networking
- All Docker containers attach to the `kitestacks` bridge network
- Containers communicate using container names as DNS (e.g., `bookstack-db`, `prometheus`)
- Docker's embedded DNS server (`127.0.0.11`) resolves container names automatically
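
As a rough illustration of name-based lookup, the sketch below resolves a hostname with the standard resolver. Run inside a container on the `kitestacks` network, a name such as `bookstack-db` would be answered by Docker's embedded DNS; run anywhere else it simply returns an empty list. The `resolve` helper is a hypothetical name, not part of the stack.

```python
import socket


def resolve(name: str) -> list:
    """Return the sorted unique IPs for a hostname, or [] if it does not resolve."""
    try:
        infos = socket.getaddrinfo(name, None)
    except socket.gaierror:
        return []
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos})


# Inside a container on the shared network this would list the container's IP;
# outside Docker the name is unknown and the result is an empty list.
print(resolve("bookstack-db"))
```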

### Tailscale Overlay
Tailscale creates an encrypted mesh between monk and kscloud1:
- Used for: Authentik PostgreSQL/Redis access, SSH to kscloud1, Prometheus scraping kscloud1 metrics
- Not used for: public traffic (that goes through Cloudflare)

---

## Storage Layout

### monk
```
~/kitestacks-live/docker/
├── authentik/           # media, custom-templates
├── bookstack/           # config/, db/
├── cloudflared/         # .env (TUNNEL_TOKEN)
├── forgejo/             # data/
├── grafana/             # grafana_data volume
├── karakeep/            # data/
├── kavita/              # config/
├── kitestacks-portal/   # static HTML + nginx
├── osticket/            # db/, uploads/
├── portainer/           # portainer_data volume
└── prometheus/          # prometheus.yml, prometheus_data volume
```

### kscloud1
```
/opt/kitestacks/docker/
├── authentik/     # postgresql data volume, redis data
├── bookstack/     # config/, db/
├── cloudflared/   # .env (same TUNNEL_TOKEN)
└── ...            # replica services
```

---

## Every Service and What It Does
## Resilience Model

### The Nine Public Services

| Service | Container Name | What It Does | Why It's Here |
|---------|---------------|--------------|---------------|
| **Portal** | `homepage` | The public website (kitestacks.com) — custom nginx serving static HTML/CSS/JS with a cyberpunk theme | Front door to everything. Shows system stats, recent activity, links to all services |
| **Authentik** | `authentik` | Identity provider — handles all logins via OIDC/OAuth2 SSO | Single place to manage all user accounts and access control |
| **Forgejo** | `forgejo` | Self-hosted Git platform (like GitHub but yours) | Store all homelab code, config, and documentation |
| **OpenProject** | `openproject` | Project management (like Jira) | Task tracking, project planning |
| **Open WebUI** | `kite-openwebui` | ChatGPT-like AI chat interface | Access multiple AI models through one interface |
| **Karakeep** | `karakeep` | Bookmark and read-it-later manager | Save links, articles, and content |
| **Kavita** | `kavita` | eBook and manga reader | Personal digital library |
| **Grafana** | `grafana` | Monitoring dashboards | Visualize CPU, RAM, network, uptime across both hosts |
| **Uptime Kuma** | `uptime-kuma` | Status page and uptime monitoring | Checks that all 9 services are up and alerts when one goes down |

### The Infrastructure Services (Not Public-Facing)

| Service | What It Does |
|---------|-------------|
| `cloudflared` | Cloudflare Tunnel connector — creates encrypted outbound tunnel to Cloudflare edge |
| `prometheus` | Metrics collection — scrapes system stats from both monk and kscloud1 every 15 seconds |
| `node-exporter` | Exposes host system metrics (CPU, RAM, disk, network) for Prometheus to scrape |
| `kite-litellm` | LLM proxy gateway — routes AI requests to OpenRouter (multiple free models) |
| `portainer` | Docker management UI — visual interface to manage all containers |
| `kitestacks-metrics-api` | Python FastAPI service — serves real-time system stats, weather, and Forgejo activity to the portal |

| Scenario | Impact | Recovery |
|----------|--------|----------|
| monk goes offline | All monk services unreachable; kscloud1 serves portal + wiki | Automatic (CF Tunnel failover) |
| kscloud1 goes offline | Authentik logins may fail (DB unreachable); all other services up | Restart kscloud1 or point Authentik to local postgres |
| Cloudflare Tunnel down | All public access lost; Tailscale still works | Check CF dashboard; restart cloudflared |
| MariaDB crash (BookStack) | BookStack down | `docker restart bookstack-db` then `docker restart bookstack` |
| Portainer lockout | No Docker UI | Use `portainer/helper-reset-password` |

---

## How Traffic Flows
## Key Design Decisions

### When Someone Visits www.kitestacks.com
**Why Cloudflare Tunnel instead of port-forwarding?**
Port-forwarding exposes your home IP, requires a static IP or dynamic DNS, and can't fail over. CF Tunnel is free, hides your IPs, and trivially supports multi-origin failover.

```
1. Browser sends HTTPS request to www.kitestacks.com
2. DNS resolves to Cloudflare's anycast IP (not your home IP)
3. Cloudflare terminates TLS, so your home router never receives inbound HTTPS
4. Cloudflare routes the request through the tunnel to whichever
   cloudflared connector responds first (monk or kscloud1)
5. cloudflared resolves "homepage" via Docker DNS
6. Request hits the nginx container serving the static portal
7. Portal's JavaScript fetches /api/metrics and /api/activity
   from the kitestacks-metrics-api container via nginx proxy
8. Page renders with live system stats and recent git activity
```
**Why active-active instead of active-passive?**
Active-passive requires detecting failure and switching over. With active-active (same token, two connectors), Cloudflare handles routing automatically. It is simpler and needs no manual failover step.

### When Someone Clicks "Sign In with Authentik"
**Why Authentik over Keycloak or Authelia?**
Authentik is easier to self-host (Docker Compose, sensible defaults), has a good UI, and supports LDAP + OIDC + SAML. Authelia lacks SAML. Keycloak is heavier and more complex.

```
1. App (e.g., Grafana) redirects browser to auth.kitestacks.com/application/o/authorize/
2. Authentik presents login page
3. User enters credentials — Authentik validates against its database
   (stored on kscloud1's Postgres, shared over Tailscale)
4. Authentik generates an authorization code and redirects back to Grafana
5. Grafana's backend calls auth.kitestacks.com/application/o/token/
   to exchange the code for an access token
6. Authentik validates the code (found in shared DB) and returns a JWT
7. Grafana reads the user's email/name from the JWT and logs them in
```
**Why BookStack over Notion/Confluence?**
Self-hosted, no external API calls, Markdown-first, OIDC SSO. Data stays in-house.

**The critical detail:** Steps 1 and 5 can hit different tunnel connectors (monk vs kscloud1). The authorization code from step 4 must exist in whichever database step 5 hits. That's why the Authentik instances behind both connectors point to the SAME Postgres on kscloud1; otherwise step 5 returns `invalid_grant` because the code isn't found.
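
The failure mode is easy to reproduce in a toy model. The sketch below is not Authentik code: `AuthInstance` is a hypothetical stand-in whose "database" is a plain dictionary, used only to show why the code issued in step 4 must be visible to whichever instance handles step 5.

```python
import secrets


class AuthInstance:
    """Toy stand-in for one identity-provider instance backed by a code store."""

    def __init__(self, store: dict):
        self.store = store  # plays the role of the Postgres database

    def authorize(self, user: str) -> str:
        """Step 4: issue a one-time authorization code for the user."""
        code = secrets.token_urlsafe(16)
        self.store[code] = user
        return code

    def exchange(self, code: str) -> dict:
        """Steps 5-6: redeem the code; unknown codes are rejected."""
        user = self.store.pop(code, None)  # codes are single-use
        if user is None:
            return {"error": "invalid_grant"}
        return {"user": user}


# Shared store: the code issued via one instance is redeemable on the other.
shared = {}
monk, kscloud1 = AuthInstance(shared), AuthInstance(shared)
shared_result = kscloud1.exchange(monk.authorize("alice"))

# Separate stores: the second instance has never seen the code.
split_a, split_b = AuthInstance({}), AuthInstance({})
split_result = split_b.exchange(split_a.authorize("alice"))
```

Here `shared_result` carries the user while `split_result` is the `invalid_grant` error, which is the behaviour the shared Postgres is there to avoid.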

---

## The Two Hosts in Detail

### Monk (Primary Home Machine)

- **Role:** Primary production host
- **Network:** Home LAN, no open ports on router (Cloudflare Tunnel handles all inbound)
- **Services:** All 9 public services + all infrastructure services
- **Data:** Each service has its own database/storage
- **Authentik DB:** Points to kscloud1's Postgres over Tailscale (100.x.x.x)

### kscloud1 (Hetzner VPS)

- **Role:** Permanent cloud replica — always on, even when monk is off (travel, power outage, etc.)
- **Network:** Public IP, Cloudflare Tunnel connector 3
- **Services:** Full replica of all 9 public services (separate databases except Authentik)
- **Hosts:** The shared Authentik Postgres + Redis (bound to Tailscale interface only)
- **Resources:** 3 vCPU, 3.7 GB RAM — tight but functional

### What's the Same Across Both

- Same Cloudflare Tunnel token (different connector IDs assigned automatically)
- Same Authentik database (shared via Tailscale)
- Same Authentik secret key (required for JWT validation)
- Same kavita.db (one-time sync — users and OIDC config)

### What's Different Across Both

- Forgejo data (separate repos — accepted inconsistency)
- OpenProject data (separate projects)
- Karakeep bookmarks (separate)
- Kavita book files (monk has them, kscloud1 doesn't — covers synced, books not)

---

## The Docker Network

Every container joins the `kitestacks` external Docker bridge network:

```bash
docker network create kitestacks
```

This is what makes Cloudflare Tunnel work. The cloudflared container is also on this network, so when Cloudflare tells cloudflared to forward a request to `http://grafana:3000`, Docker's internal DNS resolves `grafana` to the grafana container's IP on that network.

Without this shared network, cloudflared can't reach the service containers by name.

---

## Why No Open Ports on the Router

Traditional homelab: open port 80/443 on home router → NAT to home server → expose home IP.

Problems with that:
- Your home IP is public (DDoS risk, targeted attacks)
- Router configuration is fragile
- ISP can change your IP (dynamic IP)
- Some ISPs block port 80/443

Cloudflare Tunnel approach:
- cloudflared container makes an OUTBOUND connection to Cloudflare
- Cloudflare holds that connection open
- Inbound requests come through Cloudflare, over that existing outbound tunnel
- Your home IP is never exposed
- Works on any network that allows outbound connections, whatever the ISP or inbound firewall rules

This is why you can run a public website from a home PC with zero router configuration.

---

## Tailscale — The Private Backbone

Tailscale creates a private overlay network (VPN mesh) across all your devices:

```
monk (100.x.x.x) ←—— encrypted ——→ kscloud1 (100.x.x.x)
monk (100.x.x.x) ←—— encrypted ——→ pixel-6 (100.x.x.x)
```

Used in this project for:
1. **Shared Authentik DB:** kscloud1's Postgres binds to its Tailscale IP, not its public IP. Only devices on the tailnet can connect. Monk points to that address.
2. **Forgejo activity feed:** On kscloud1, the metrics API fetches recent commits from monk's Forgejo via monk's Tailscale IP — so both portal instances show the same activity feed.
3. **SSH/Admin access:** You can SSH into any device on the tailnet from anywhere.
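
A quick way to sanity-check that a service is bound to a Tailscale address rather than a public one is to test the address against the carrier-grade NAT block `100.64.0.0/10`, which is where Tailscale assigns its `100.x.x.x` IPv4 addresses. The helper below is a minimal sketch; `is_tailnet_address` is a hypothetical name and the sample addresses are made up.

```python
import ipaddress

# Tailscale hands out IPv4 addresses from the CGNAT block 100.64.0.0/10.
TAILSCALE_RANGE = ipaddress.ip_network("100.64.0.0/10")


def is_tailnet_address(addr: str) -> bool:
    """Return True if addr falls inside the Tailscale IPv4 range."""
    return ipaddress.ip_address(addr) in TAILSCALE_RANGE


print(is_tailnet_address("100.100.1.2"))  # inside the range
print(is_tailnet_address("5.78.0.1"))     # a public-style address, outside it
```

Note that not every `100.x.x.x` address qualifies: the block ends at `100.127.255.255`.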

---

## The Monitoring Stack

```
node-exporter (monk)     → prometheus (monk) → grafana (monk)
node-exporter (kscloud1) ↗  (scrapes 5.78.x.x:9100)
```

Prometheus scrapes metrics every 15 seconds from:
- `node-exporter:9100` — monk's own node-exporter (via Docker DNS)
- `5.78.x.x:9100` — kscloud1's node-exporter (via public IP, port exposed 0.0.0.0)

Grafana visualizes both, letting you switch between hosts in the instance picker.
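
For a sense of what each scrape returns, node-exporter serves plain text in the Prometheus exposition format: comment lines starting with `#`, then one `name{labels} value` sample per line. The parser below is a deliberately small sketch that ignores optional timestamps; `parse_metrics` and the sample payload are illustrative, not taken from this deployment.

```python
def parse_metrics(text: str) -> dict:
    """Parse 'name{labels} value' lines, skipping blanks and # comments."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # The value is the last whitespace-separated token on the line.
        name, _, value = line.rpartition(" ")
        samples[name] = float(value)
    return samples


# Made-up payload in the shape node-exporter produces.
SAMPLE = """\
# HELP node_load1 1m load average.
# TYPE node_load1 gauge
node_load1 0.42
node_memory_MemAvailable_bytes 2.5e+09
node_filesystem_avail_bytes{mountpoint="/"} 1.0e+10
"""

metrics = parse_metrics(SAMPLE)
print(metrics["node_load1"])
```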

---

## The Portal Architecture

The portal is NOT gethomepage or any pre-built dashboard. It's a custom-built static site:

```
nginx (container: "homepage")
  ├── /           → serves static HTML/CSS/JS from ./public/
  └── /api/*      → proxy_pass to kitestacks-metrics-api:8000 (host)

kitestacks-metrics-api (network_mode: host, pid: host)
  ├── GET /api/metrics    → psutil reads HOST's CPU/RAM/disk/network
  ├── GET /api/weather    → wttr.in API → current weather by IP geolocation
  ├── GET /api/activity   → Forgejo API → recent commits
  └── GET /api/health     → {"ok": true}
```

The metrics API runs with `network_mode: host` and `pid: host` so it reads the HOST machine's process table and `/proc` filesystem, not the container's. Without this, it would report the container's stats rather than the host's.
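
The real service uses psutil, which on Linux reads files such as `/proc/meminfo`. As a dependency-free sketch of that idea, the snippet below parses meminfo-style text and derives a memory-used percentage. The function names and the sample values are assumptions made for illustration, not the portal's actual code.

```python
def parse_meminfo(text: str) -> dict:
    """Map each /proc/meminfo field name to its numeric value (kB for most fields)."""
    fields = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            fields[key.strip()] = int(parts[0])
    return fields


def memory_used_percent(fields: dict) -> float:
    """Percentage of RAM in use, based on MemTotal and MemAvailable."""
    total, available = fields["MemTotal"], fields["MemAvailable"]
    return round(100 * (total - available) / total, 1)


# Made-up sample in the format of /proc/meminfo.
SAMPLE = "MemTotal:        4000000 kB\nMemAvailable:    1000000 kB\n"

print(memory_used_percent(parse_meminfo(SAMPLE)))
```

On a real host the input would come from reading `/proc/meminfo`, which is why the container needs a view of the host's `/proc` for the numbers to describe the machine rather than the container.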

**Why Forgejo over GitLab?**
Forgejo is lightweight (~200 MB RAM vs GitLab's 4 GB+). It is a full Git server with CI runners, issues, and PRs. GitLab is overkill for a homelab.