
KiteStacks Architecture — Full System Overview

Last Updated: 2026-06-19


The Big Picture

                          INTERNET
                             │
                      ┌──────▼──────┐
                      │  Cloudflare  │  DNS + TLS termination
                      │   (edge)     │  Tunnel routing
                      └──────┬──────┘
                             │  HTTPS only — home IP never exposed
              ┌──────────────┴──────────────┐
              │ connector 1                 │ connector 2
              │                             │
       ┌──────▼──────┐               ┌──────▼──────┐
       │    MONK     │               │   KSCLOUD1  │
       │ (ThinkPad   │               │ (Hetzner VPS│
       │  T14s, home)│               │  Germany)   │
       │             │               │             │
       │ Development │               │ ALWAYS LIVE │
       │ Pushes to → │               │ Receives ←  │
       │ kscloud1    │               │ from monk   │
       └──────┬──────┘               └──────┬──────┘
              │                             │
              └─────────── TAILSCALE ───────┘
                         (100.x.x.x range)
                         Encrypted peer-to-peer
                                 │
                    ┌────────────▼────────────┐
                    │    SHARED DATABASE LAYER │
                    │    hosted on kscloud1    │
                    │                         │
                    │  PostgreSQL  :5432       │
                    │  Redis       :6379       │
                    │                         │
                    │  Bound to Tailscale IP   │
                    │  only — not public       │
                    └─────────────────────────┘

The key idea: each host runs a cloudflared connector that holds a persistent outbound connection to Cloudflare, one from monk and one from kscloud1. Every request to kitestacks.com arrives at Cloudflare, which routes it to a connector that is currently connected. If monk goes offline, kscloud1 handles everything. Your home IP is never involved.
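The failover behaviour can be pictured with a toy model. This is only a sketch: the function name `pick_connector` and the dictionary of connection states are hypothetical, and Cloudflare's real selection logic is not described in this document.

```python
from typing import Dict, Optional


def pick_connector(connectors: Dict[str, bool]) -> Optional[str]:
    """Return the name of the first connector whose tunnel is connected.

    `connectors` maps a host name to True (tunnel connected) or False.
    Returns None when no connector is connected at all.
    """
    for name, connected in connectors.items():
        if connected:
            return name
    return None


if __name__ == "__main__":
    # monk is powered off, so the request lands on kscloud1.
    print(pick_connector({"monk": False, "kscloud1": True}))
```

The point of the model is that the caller never needs to know which host answered, only that at least one connector is up.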


How Work Flows Between the Two Hosts

monk (dev)  ──push──►  kscloud1 (prod, always live)
  • monk is where changes are made: editing config files, testing new services, writing code
  • kscloud1 receives those changes and is always serving live traffic
  • If monk is off, kscloud1 continues serving the last pushed state — users see no downtime
  • A third machine (Samurai desktop) is planned as a future second home connector

The Eleven Public Services

| Service | Container | URL | What It Does |
|---|---|---|---|
| Portal | homepage | www.kitestacks.com | Custom homepage: links, live stats, cyberpunk theme |
| Authentik | authentik | auth.kitestacks.com | SSO identity provider, handles all logins |
| Forgejo | forgejo | gitforge.kitestacks.com | Self-hosted Git (like GitHub) |
| Open WebUI | kite-openwebui | ai.kitestacks.com | AI chat interface |
| Karakeep | karakeep | links.kitestacks.com | Bookmark and read-it-later manager |
| Kavita | kavita | kavita.kitestacks.com | eBook and manga reader |
| Grafana | grafana | grafana.kitestacks.com | Monitoring dashboards |
| Uptime Kuma | uptime-kuma | status.kitestacks.com | Public status page and uptime monitoring |
| BookStack | bookstack | wiki.kitestacks.com | Self-hosted wiki / docs platform |
| OSTicket | osticket-app | tasks.kitestacks.com | Help desk ticketing system |
| Portainer | portainer | portainer.kitestacks.com | Docker management dashboard |

The Infrastructure Services (Internal Only)

| Container | What It Does |
|---|---|
| cloudflared | Cloudflare Tunnel connector: outbound connection to Cloudflare edge |
| prometheus | Metrics collector, scrapes node-exporter every 15 seconds |
| node-exporter | Exposes host CPU/RAM/disk/network metrics for Prometheus |
| blackbox-exporter | HTTP probe monitor, checks that endpoints return 200 |
| kite-litellm | LLM proxy, routes AI requests to OpenRouter (many free models) |
| kitestacks-metrics-api | Python FastAPI, serves live stats and Forgejo activity to the portal |
| ntfy | Push notification server, sends alerts to phone |
| flux | GitOps controller, watches Forgejo and deploys changes automatically |
| authentik-worker | Background job processor for Authentik |
| authentik-ldap | LDAP proxy layer for Authentik |

How Traffic Flows — Step by Step

Someone visits www.kitestacks.com

1. Browser → DNS lookup "www.kitestacks.com"
2. DNS returns Cloudflare's anycast IP (not your home IP)
3. Browser → HTTPS request to Cloudflare edge
4. Cloudflare reads Host header: "www.kitestacks.com"
5. Cloudflare routes request through active tunnel connector
   (monk or kscloud1 — whichever responds first)
6. cloudflared resolves "homepage" via Docker DNS
7. Request hits nginx in the homepage container
8. nginx serves static HTML/CSS/JS from ./public/
9. Browser JavaScript calls /api/metrics and /api/activity
10. nginx proxies those to kitestacks-metrics-api (Python, host network)
11. metrics-api reads CPU/RAM via psutil (sees real host, not container)
12. metrics-api calls Forgejo API for recent commits
13. Browser renders complete page with live stats
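Steps 4 to 6 amount to a lookup from the Host header to a container name. The sketch below models that lookup with the hostnames and container names from the public services list. The names `ROUTES` and `upstream_for` are hypothetical, and this is not the actual tunnel configuration, which the document does not show.

```python
from typing import Optional

# Hostname -> container name, copied from the public services list.
# Ports are left out because the document does not list them for every service.
ROUTES = {
    "www.kitestacks.com": "homepage",
    "auth.kitestacks.com": "authentik",
    "gitforge.kitestacks.com": "forgejo",
    "ai.kitestacks.com": "kite-openwebui",
    "links.kitestacks.com": "karakeep",
    "kavita.kitestacks.com": "kavita",
    "grafana.kitestacks.com": "grafana",
    "status.kitestacks.com": "uptime-kuma",
    "wiki.kitestacks.com": "bookstack",
    "tasks.kitestacks.com": "osticket-app",
    "portainer.kitestacks.com": "portainer",
}


def upstream_for(host_header: str) -> Optional[str]:
    """Map an HTTP Host header to a container name, or None if unknown."""
    # Normalise case and drop an optional ":port" suffix before the lookup.
    host = host_header.strip().lower().split(":", 1)[0]
    return ROUTES.get(host)
```

Docker DNS then turns the returned container name into an IP address on the shared bridge network.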

Someone clicks "Sign In with Authentik"

1. App (e.g. Grafana) redirects browser to:
   https://auth.kitestacks.com/application/o/authorize/
   ?client_id=grafana&redirect_uri=...&response_type=code

2. Cloudflare routes this to a cloudflared connector
3. Authentik shows login page
4. User enters username + password
5. Authentik validates against shared Postgres (on kscloud1, over Tailscale)
6. Authentik creates an authorization code (row in DB) and redirects:
   https://grafana.kitestacks.com/login/generic_oauth?code=abc123

7. Grafana backend POSTs to auth.kitestacks.com/application/o/token/
   with code=abc123 and client_secret

8. THIS REQUEST may hit a DIFFERENT connector than step 2 did
   → This is why the shared DB matters: the code must exist in one DB,
     not two separate ones that might be out of sync

9. Authentik finds code=abc123 in shared Postgres, validates it
10. Authentik returns JWT (access_token + id_token)
11. Grafana reads user's email from JWT, creates/updates local user
12. User is logged in — never re-enters password for other SSO apps
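The two HTTP requests in this flow (steps 1 and 7) can be sketched with the standard library. The endpoint URLs come from the steps above. The function names and the `state` parameter are assumptions added for illustration (`state` is a standard OAuth2 parameter but is not shown in this document), and the sketch only builds the requests without sending them.

```python
from typing import Dict, Optional
from urllib.parse import parse_qs, urlencode, urlsplit

AUTHORIZE_URL = "https://auth.kitestacks.com/application/o/authorize/"
TOKEN_URL = "https://auth.kitestacks.com/application/o/token/"


def build_authorize_url(client_id: str, redirect_uri: str, state: str) -> str:
    """Step 1: the URL the app redirects the browser to."""
    query = urlencode({
        "client_id": client_id,
        "redirect_uri": redirect_uri,
        "response_type": "code",
        "state": state,  # assumption: opaque value echoed back by the provider
    })
    return f"{AUTHORIZE_URL}?{query}"


def extract_code(callback_url: str) -> Optional[str]:
    """Step 6: pull the authorization code out of the redirect back to the app."""
    values = parse_qs(urlsplit(callback_url).query).get("code")
    return values[0] if values else None


def build_token_request(code: str, client_id: str, client_secret: str,
                        redirect_uri: str) -> Dict[str, str]:
    """Step 7: form fields the app backend would POST to TOKEN_URL."""
    return {
        "grant_type": "authorization_code",
        "code": code,
        "client_id": client_id,
        "client_secret": client_secret,
        "redirect_uri": redirect_uri,
    }
```

Because the browser makes the first request and the app backend makes the second, nothing guarantees that both reach the same host.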

The Shared Database — Why It Exists

After deploying two connectors (monk + kscloud1), users got invalid_grant errors when signing in. The cause: each host had its own separate Authentik database. The OAuth2 flow makes two separate HTTP requests:

  1. /application/o/authorize/ → creates authorization code → stored in Database A
  2. /application/o/token/ → looks up authorization code → hits Database B → not found

Cloudflare load-balances requests, so steps 1 and 2 can hit different hosts.

Fix: Both connectors point to a single shared Postgres+Redis hosted on kscloud1. It is bound only to kscloud1's Tailscale IP (100.123.x.x) — never the public IP. Only devices on the Tailscale network can connect.
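The failure and the fix can be reproduced in a few lines. This is a toy model, not Authentik code: `AuthServer` is a hypothetical class and a plain dictionary stands in for Postgres.

```python
import secrets
from typing import Dict


class AuthServer:
    """Toy identity provider that keeps authorization codes in `code_store`."""

    def __init__(self, code_store: Dict[str, str]) -> None:
        self.codes = code_store

    def authorize(self, user: str) -> str:
        """Request 1: issue a one-time authorization code for `user`."""
        code = secrets.token_urlsafe(16)
        self.codes[code] = user
        return code

    def token(self, code: str) -> str:
        """Request 2: exchange the code. Unknown or reused codes are rejected."""
        user = self.codes.pop(code, None)
        if user is None:
            return "invalid_grant"
        return f"token-for-{user}"


if __name__ == "__main__":
    # Separate databases: the code issued on one host is unknown on the other.
    monk, kscloud1 = AuthServer({}), AuthServer({})
    print(kscloud1.token(monk.authorize("alice")))   # invalid_grant

    # Shared database: either host can redeem the code.
    shared: Dict[str, str] = {}
    monk, kscloud1 = AuthServer(shared), AuthServer(shared)
    print(kscloud1.token(monk.authorize("alice")))   # token-for-alice
```

With a shared store, it no longer matters which connector serves which request.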

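Forgejo also uses this shared Postgres (separate database on the same server). Both monk's and kscloud1's Forgejo read from the same database, so accounts, issues, and other repository metadata are consistent regardless of which connector serves the request. Note that Forgejo keeps the git objects themselves on disk, not in Postgres.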


The Docker Network

Every container joins the kitestacks external Docker bridge network:

# Create once on each host:
docker network create kitestacks

All service containers and the cloudflared container join this network. Docker provides built-in DNS: when cloudflared needs to route to Grafana, it resolves the hostname grafana to that container's IP address on the bridge network.

cloudflared → "grafana" → Docker DNS → 172.x.x.x:3000 → grafana container

Without this shared network, cloudflared cannot reach services by name.
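A quick way to check name resolution from inside a container on the kitestacks network is a small lookup helper. This is a sketch: `resolve_service` is a hypothetical name, and the `resolver` parameter exists only so the function can be exercised outside Docker.

```python
import socket
from typing import Callable, Optional


def resolve_service(
    name: str,
    resolver: Callable[[str], str] = socket.gethostbyname,
) -> Optional[str]:
    """Return the IP address for a container name, or None if it does not resolve.

    Inside a container attached to the shared network, the default resolver
    asks Docker's built-in DNS. Elsewhere the lookup simply fails.
    """
    try:
        return resolver(name)
    except OSError:  # socket.gaierror is a subclass of OSError
        return None


if __name__ == "__main__":
    print(resolve_service("grafana"))
```

A result of None from inside cloudflared's container would suggest the target service is not attached to the same network.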


Why No Open Ports on the Home Router

Traditional approach: open port 80 and 443 on the router → NAT to home server → home IP in DNS.

Problems:

  • Home IP is exposed publicly (DDoS target, ISP tracks it)
  • Dynamic home IP breaks DNS when it changes
  • Some ISPs block residential port 80/443
  • Router misconfiguration = exposed server

Cloudflare Tunnel approach:

  • cloudflared opens long-lived, encrypted outbound connections to Cloudflare edge servers
  • Cloudflare keeps those connections open
  • All inbound traffic arrives over those existing outbound connections
  • The home router sees only outbound encrypted traffic, so no port forwarding is needed
  • Home IP is never in DNS, never exposed

Result: A public website running on a home PC with zero router configuration and no exposed home IP address.
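The core trick, serving inbound requests over a connection that the home side dialled out, can be shown on localhost. This is only an illustration of the idea with plain TCP sockets. The function names are hypothetical and it does not reflect cloudflared's actual protocol.

```python
import socket
import threading
from typing import Tuple


def connector(edge_addr: Tuple[str, int]) -> None:
    """Home side: dial OUT to the edge, then answer a request on that socket."""
    with socket.create_connection(edge_addr) as conn:
        request = conn.recv(1024)
        conn.sendall(b"response to " + request)


def demo() -> bytes:
    """Edge side: the only listening socket. The home side never listens."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as edge:
        edge.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
        edge.listen(1)
        worker = threading.Thread(target=connector, args=(edge.getsockname(),))
        worker.start()
        tunnel, _ = edge.accept()  # the connection the home side opened
        with tunnel:
            tunnel.sendall(b"GET /")  # a visitor's request, pushed down the tunnel
            reply = tunnel.recv(1024)
        worker.join()
        return reply


if __name__ == "__main__":
    print(demo())
```

Only the edge calls `listen()`, which is why the home router needs no inbound rule.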


Tailscale — The Private Backbone

Tailscale creates an encrypted overlay network across all your devices. Every device gets a stable 100.x.x.x IP regardless of physical location.

monk       100.85.x.x  ←── WireGuard ───► 100.123.x.x  kscloud1
samurai    100.74.x.x  ←── WireGuard ───► 100.123.x.x  kscloud1
phone      100.x.x.x   ←── WireGuard ───► 100.123.x.x  kscloud1

Used in this homelab for:

  1. Shared Authentik DB: kscloud1 Postgres and Redis are bound to 100.123.x.x only. The Authentik instance on monk connects to that address. Traffic is encrypted peer-to-peer.

  2. SSH admin access: SSH to kscloud1 from anywhere using its Tailscale IP. This works even behind a hotel firewall or on mobile data, because Tailscale falls back to relay servers when a direct connection is not possible.

  3. Uptime monitoring: The Conky desktop widget on monk reads Uptime Kuma status from kscloud1 directly via Tailscale (not through Cloudflare), so it shows the true kscloud1-side status.
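When binding a database to the Tailscale interface, it helps to confirm that an address really belongs to the Tailscale range. Tailscale assigns IPv4 addresses from the 100.64.0.0/10 block, which is the "100.x.x.x" range mentioned above. The helper name `is_tailscale_ip` is hypothetical.

```python
import ipaddress

# Tailscale hands out IPv4 addresses from the carrier-grade NAT block.
TAILSCALE_RANGE = ipaddress.ip_network("100.64.0.0/10")


def is_tailscale_ip(addr: str) -> bool:
    """Return True if `addr` falls inside the Tailscale IPv4 range."""
    return ipaddress.ip_address(addr) in TAILSCALE_RANGE


if __name__ == "__main__":
    for candidate in ("100.85.0.1", "100.123.4.5", "203.0.113.7"):
        print(candidate, is_tailscale_ip(candidate))
```

Note that the block covers 100.64.0.0 through 100.127.255.255, so not every address starting with 100 qualifies.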


The Monitoring Stack

                  ┌──────────────┐
monk's            │  node-exporter│ ← exposes CPU/RAM/disk/network
node-exporter     │  port 9100    │
                  └──────┬───────┘
                         │ scrape every 15s
                  ┌──────▼───────┐
kscloud1's  ───► │  prometheus   │ (also scrapes kscloud1:9100 via public IP)
metrics           └──────┬───────┘
                         │
                  ┌──────▼───────┐
                  │   grafana    │ ← visualize both hosts, switch via instance picker
                  └──────────────┘

Uptime Kuma → HTTP checks every 60s → all public service URLs
Conky widget → reads Uptime Kuma API on kscloud1 → shows live dot per service
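The kind of check that Uptime Kuma and blackbox-exporter perform can be approximated with the standard library. This is a minimal sketch, not either tool's implementation: `http_status` and `check_all` are hypothetical names, and "up" is taken to mean an HTTP 200 response.

```python
from typing import Callable, Dict, Iterable
from urllib.error import HTTPError, URLError
from urllib.request import urlopen


def http_status(url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status code for `url`, or 0 if the request failed."""
    try:
        with urlopen(url, timeout=timeout) as resp:
            return resp.status
    except HTTPError as err:  # 4xx/5xx responses still carry a status code
        return err.code
    except (URLError, OSError):  # DNS failure, refused connection, timeout
        return 0


def check_all(
    urls: Iterable[str],
    fetch: Callable[[str], int] = http_status,
) -> Dict[str, bool]:
    """Map each URL to True when it answered with HTTP 200."""
    return {url: fetch(url) == 200 for url in urls}


if __name__ == "__main__":
    print(check_all(["https://status.kitestacks.com"]))
```

The `fetch` parameter makes it possible to test the logic without touching the network.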

The Portal Architecture

The portal is a custom static site — not a pre-built dashboard:

nginx container ("homepage")
  ├── /           → static HTML/CSS/JS (cyberpunk theme, service cards)
  └── /api/*      → proxy_pass → kitestacks-metrics-api on host

kitestacks-metrics-api (Python FastAPI, network_mode: host, pid: host)
  ├── GET /api/metrics   → psutil reads HOST CPU/RAM/disk/network
  ├── GET /api/weather   → wttr.in API → current conditions
  ├── GET /api/activity  → Forgejo API → recent commits across all repos
  └── GET /api/health    → {"ok": true}

network_mode: host means the container shares the host's network namespace. Without it, psutil's network counters would cover only the container's virtual interface, not the host's real interfaces.

pid: host means the container can see the host's process table via /proc. Without it, process-level stats would cover only the container's own processes.
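To make concrete what "reading host stats" means, here is a standard-library stand-in for the memory part of /api/metrics. It parses /proc/meminfo text directly instead of calling psutil, so it is not the actual metrics-api code. The function names are hypothetical.

```python
from typing import Dict


def parse_meminfo(text: str) -> Dict[str, int]:
    """Parse /proc/meminfo content into {field name: value in kB}."""
    values: Dict[str, int] = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue  # skip lines that are not "Key: value" pairs
        parts = rest.split()
        if parts and parts[0].isdigit():
            values[key.strip()] = int(parts[0])
    return values


def memory_used_percent(text: str) -> float:
    """Share of memory in use, based on MemTotal and MemAvailable."""
    info = parse_meminfo(text)
    total = info["MemTotal"]
    available = info["MemAvailable"]
    return round(100.0 * (total - available) / total, 1)


if __name__ == "__main__":
    with open("/proc/meminfo", encoding="utf-8") as handle:
        print({"memory_percent": memory_used_percent(handle.read())})
```

The parsing is kept separate from the file read so the calculation can be checked against a fixed sample.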