Complete documentation suite for KiteStacks covering all 11 services across 2-host active-active architecture. Includes beginner track (with AI, 8 files) and advanced track (without AI, 7 files) with time estimates, real troubleshooting cases, and command-by-command explanations. Updates certifications roadmap to reflect July 7 2026 A+ Core 2 exam goal. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
# KiteStacks Architecture — Full System Overview

**Last Updated:** 2026-06-19

---

## The Big Picture

```
                        INTERNET
                            │
                     ┌──────▼──────┐
                     │ Cloudflare  │  DNS + TLS termination
                     │   (edge)    │  Tunnel routing
                     └──────┬──────┘
                            │  HTTPS only — home IP never exposed
             ┌──────────────┴──────────────┐
             │ connector 1                 │ connector 2
             │                             │
      ┌──────▼──────┐               ┌──────▼──────┐
      │    MONK     │               │  KSCLOUD1   │
      │ (ThinkPad   │               │ (Hetzner VPS│
      │  T14s, home)│               │  Germany)   │
      │             │               │             │
      │ Development │               │ ALWAYS LIVE │
      │ Pushes to → │               │ Receives ←  │
      │  kscloud1   │               │  from monk  │
      └──────┬──────┘               └──────┬──────┘
             │                             │
             └─────────── TAILSCALE ───────┘
                    (100.x.x.x range)
                 Encrypted peer-to-peer
                            │
               ┌────────────▼────────────┐
               │  SHARED DATABASE LAYER  │
               │   hosted on kscloud1    │
               │                         │
               │  PostgreSQL  :5432      │
               │  Redis       :6379      │
               │                         │
               │  Bound to Tailscale IP  │
               │  only — not public      │
               └─────────────────────────┘
```

**The key idea:** Cloudflare holds two persistent outbound connections — one from monk,
one from kscloud1. Every request to kitestacks.com arrives at Cloudflare, which routes
it to whichever connector responds. If monk goes offline, kscloud1 handles everything.
Your home IP is never involved.
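
The failover behaviour described above can be pictured with a toy model. This is a sketch only, not Cloudflare's actual routing algorithm: the `Connector` class and the `route` function are hypothetical names introduced for illustration, and picking the first online connector is an assumption made to keep the example small.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Connector:
    """One tunnel connector (one per host) and whether it is currently online."""
    host: str
    online: bool = True


def route(connectors: List[Connector]) -> str:
    """Return the host of the first online connector.

    Toy model: the real edge chooses among live connectors on its own terms.
    """
    for connector in connectors:
        if connector.online:
            return connector.host
    raise RuntimeError("no connector online: the site would be unreachable")


connectors = [Connector("monk"), Connector("kscloud1")]
first_choice = route(connectors)

# Simulate the laptop going offline: kscloud1 now takes every request.
connectors[0].online = False
after_failover = route(connectors)
```

The point of the model is that the caller never needs to know which host answered, only that at least one connector is still up.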

---

## How Work Flows Between the Two Hosts

```
monk (dev) ──push──► kscloud1 (prod, always live)
```

- **monk** is where changes are made: editing config files, testing new services, writing code
- **kscloud1** receives those changes and is always serving live traffic
- If monk is off, kscloud1 continues serving the last pushed state — users see no downtime
- A third machine (Samurai desktop) is planned as a future second home connector

---

## The Eleven Public Services

| Service | Container | URL | What It Does |
|---------|-----------|-----|-------------|
| Portal | `homepage` | www.kitestacks.com | Custom homepage — links, live stats, cyberpunk theme |
| Authentik | `authentik` | auth.kitestacks.com | SSO identity provider — handles all logins |
| Forgejo | `forgejo` | gitforge.kitestacks.com | Self-hosted Git (like GitHub) |
| Open WebUI | `kite-openwebui` | ai.kitestacks.com | AI chat interface |
| Karakeep | `karakeep` | links.kitestacks.com | Bookmark and read-it-later manager |
| Kavita | `kavita` | kavita.kitestacks.com | eBook and manga reader |
| Grafana | `grafana` | grafana.kitestacks.com | Monitoring dashboards |
| Uptime Kuma | `uptime-kuma` | status.kitestacks.com | Public status page and uptime monitoring |
| BookStack | `bookstack` | wiki.kitestacks.com | Self-hosted wiki / docs platform |
| OSTicket | `osticket-app` | tasks.kitestacks.com | Help desk ticketing system |
| Portainer | `portainer` | portainer.kitestacks.com | Docker management dashboard |

## The Infrastructure Services (Internal Only)

| Container | What It Does |
|-----------|-------------|
| `cloudflared` | Cloudflare Tunnel connector — outbound connection to Cloudflare edge |
| `prometheus` | Metrics collector — scrapes node-exporter every 15 seconds |
| `node-exporter` | Exposes host CPU/RAM/disk/network metrics for Prometheus |
| `blackbox-exporter` | HTTP probe monitor — checks endpoints are returning 200 |
| `kite-litellm` | LLM proxy — routes AI requests to OpenRouter (many free models) |
| `kitestacks-metrics-api` | Python FastAPI — serves live stats and Forgejo activity to portal |
| `ntfy` | Push notification server — sends alerts to phone |
| `flux` | GitOps controller — watches Forgejo, deploys changes automatically |
| `authentik-worker` | Background job processor for Authentik |
| `authentik-ldap` | LDAP proxy layer for Authentik |

---

## How Traffic Flows — Step by Step

### Someone visits www.kitestacks.com

```
1.  Browser → DNS lookup "www.kitestacks.com"
2.  DNS returns Cloudflare's anycast IP (not your home IP)
3.  Browser → HTTPS request to Cloudflare edge
4.  Cloudflare reads Host header: "www.kitestacks.com"
5.  Cloudflare routes request through active tunnel connector
    (monk or kscloud1 — whichever responds first)
6.  cloudflared resolves "homepage" via Docker DNS
7.  Request hits nginx in the homepage container
8.  nginx serves static HTML/CSS/JS from ./public/
9.  Browser JavaScript calls /api/metrics and /api/activity
10. nginx proxies those to kitestacks-metrics-api (Python, host network)
11. metrics-api reads CPU/RAM via psutil (sees real host, not container)
12. metrics-api calls Forgejo API for recent commits
13. Browser renders complete page with live stats
```
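
Steps 4 to 6 amount to a lookup from the request's Host header to a container name. The sketch below illustrates that idea using the hostnames and container names from the public services table. The `INGRESS` dictionary and `upstream_for` function are hypothetical; the tunnel's real ingress configuration is not shown in this document.

```python
from typing import Optional

# Hostname -> container name, copied from the public services table.
INGRESS = {
    "www.kitestacks.com": "homepage",
    "auth.kitestacks.com": "authentik",
    "gitforge.kitestacks.com": "forgejo",
    "ai.kitestacks.com": "kite-openwebui",
    "links.kitestacks.com": "karakeep",
    "kavita.kitestacks.com": "kavita",
    "grafana.kitestacks.com": "grafana",
    "status.kitestacks.com": "uptime-kuma",
    "wiki.kitestacks.com": "bookstack",
    "tasks.kitestacks.com": "osticket-app",
    "portainer.kitestacks.com": "portainer",
}


def upstream_for(host_header: str) -> Optional[str]:
    """Map a Host header to a container name, or None if no rule matches."""
    # Drop an optional ":port" suffix and normalise case before the lookup.
    hostname = host_header.split(":")[0].strip().lower()
    return INGRESS.get(hostname)


portal = upstream_for("www.kitestacks.com")
unknown = upstream_for("nope.example.com")
```

A request whose hostname has no rule gets `None` here; a real tunnel would typically answer such requests with an error rather than forward them.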

### Someone clicks "Sign In with Authentik"

```
1.  App (e.g. Grafana) redirects browser to:
    https://auth.kitestacks.com/application/o/authorize/
    ?client_id=grafana&redirect_uri=...&response_type=code

2.  Cloudflare routes this to a cloudflared connector
3.  Authentik shows login page
4.  User enters username + password
5.  Authentik validates against shared Postgres (on kscloud1, over Tailscale)
6.  Authentik creates an authorization code (row in DB) and redirects:
    https://grafana.kitestacks.com/login/generic_oauth?code=abc123

7.  Grafana backend POSTs to auth.kitestacks.com/application/o/token/
    with code=abc123 and client_secret

8.  THIS REQUEST may hit a DIFFERENT connector than step 2 did
    → This is why the shared DB matters: the code must exist in one DB,
      not two separate ones that might be out of sync

9.  Authentik finds code=abc123 in shared Postgres, validates it
10. Authentik returns JWT (access_token + id_token)
11. Grafana reads user's email from JWT, creates/updates local user
12. User is logged in — never re-enters password for other SSO apps
```

---

## The Shared Database — Why It Exists

After deploying two connectors (monk + kscloud1), users got `invalid_grant` errors when
signing in. The cause: each host had its own separate Authentik database. The OAuth2 flow
makes two separate HTTP requests:

1. `/application/o/authorize/` → creates authorization code → stored in Database A
2. `/application/o/token/` → looks up authorization code → hits Database B → **not found**

Cloudflare can send each request to either connector, so steps 1 and 2 can hit different hosts.

**Fix:** The Authentik instances on both hosts point to a single shared Postgres+Redis hosted
on kscloud1. It is bound only to kscloud1's Tailscale IP (`100.123.x.x`), never the public IP.
Only devices on the Tailscale network can connect.
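
The failure and the fix can be reproduced in a few lines. This is a minimal sketch, not Authentik's implementation: the `AuthServer` class is hypothetical, a plain dictionary stands in for Postgres, and the returned strings are placeholders for real token responses.

```python
import secrets
from typing import Dict


class AuthServer:
    """One Authentik-like instance backed by a code store (a dict stands in for Postgres)."""

    def __init__(self, store: Dict[str, str]) -> None:
        self.store = store

    def authorize(self, client_id: str) -> str:
        """Request 1: issue an authorization code and persist it."""
        code = secrets.token_urlsafe(8)
        self.store[code] = client_id
        return code

    def token(self, code: str, client_id: str) -> str:
        """Request 2: redeem the code once, or fail with invalid_grant."""
        if self.store.pop(code, None) != client_id:
            return "invalid_grant"
        return "access_token_issued"


# Before the fix: each host has its own database.
monk, kscloud1 = AuthServer({}), AuthServer({})
code = monk.authorize("grafana")
separate_result = kscloud1.token(code, "grafana")

# After the fix: both hosts read and write the same database.
shared: Dict[str, str] = {}
monk, kscloud1 = AuthServer(shared), AuthServer(shared)
code = monk.authorize("grafana")
shared_result = kscloud1.token(code, "grafana")
replay_result = monk.token(code, "grafana")  # a code can only be redeemed once
```

With separate stores the second request cannot find the code; with one shared store it succeeds regardless of which host handled the first request.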

**Forgejo** also uses this shared Postgres (separate database on the same server).
Both monk's and kscloud1's Forgejo read from the same data, so git repos are consistent
regardless of which connector serves the request.

---

## The Docker Network

Every container joins the `kitestacks` external Docker bridge network:

```bash
# Create once on each host:
docker network create kitestacks
```

All service containers and the cloudflared container join this network. Docker provides
built-in DNS: when cloudflared needs to route to Grafana, it resolves the hostname `grafana`
to that container's IP address on the bridge network.

```
cloudflared → "grafana" → Docker DNS → 172.x.x.x:3000 → grafana container
```

Without this shared network, cloudflared cannot reach services by name.
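
Name resolution of this kind can be checked from inside any container on the network. The helper below is a small sketch using only the standard library; `resolve` is a hypothetical name, and the example call uses `localhost` so that it runs anywhere.

```python
import socket
from typing import Optional


def resolve(name: str) -> Optional[str]:
    """Return the IPv4 address for a hostname, or None if it does not resolve."""
    try:
        return socket.gethostbyname(name)
    except OSError:
        # socket.gaierror (a subclass of OSError) is raised for unknown names.
        return None


# Runs on any machine; inside a container on the kitestacks network,
# resolve("grafana") would be the equivalent check for a service name.
loopback = resolve("localhost")
```

A `None` result for a service name from inside a container is a quick hint that the container is not attached to the shared network.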

---

## Why No Open Ports on the Home Router

Traditional approach: open port 80 and 443 on the router → NAT to home server → home IP in DNS.

Problems:
- Home IP is exposed publicly (DDoS target, ISP tracks it)
- Dynamic home IP breaks DNS when it changes
- Some ISPs block residential port 80/443
- Router misconfiguration = exposed server

**Cloudflare Tunnel approach:**
- cloudflared opens outbound connections (never inbound) to Cloudflare edge servers
- Those connections are kept open for as long as cloudflared is running
- All inbound traffic arrives over those existing outbound connections
- The home router sees only ordinary outbound traffic, with no port forwards to configure
- Home IP is never in DNS, never exposed

**Result:** A public website running on a home PC with zero router configuration and
no exposed home IP address.

---

## Tailscale — The Private Backbone

Tailscale creates an encrypted overlay network across all your devices.
Every device gets a stable `100.x.x.x` IP regardless of physical location.

```
monk      100.85.x.x  ←── WireGuard ───►  100.123.x.x  kscloud1
samurai   100.74.x.x  ←── WireGuard ───►  100.123.x.x  kscloud1
phone     100.x.x.x   ←── WireGuard ───►  100.123.x.x  kscloud1
```

Used in this homelab for:

1. **Shared Authentik DB:** kscloud1 Postgres and Redis are bound to `100.123.x.x` only.
   Monk's Authentik connects to that address. Traffic is encrypted peer-to-peer.

2. **SSH admin access:** SSH to kscloud1 from anywhere using its Tailscale IP.
   Even behind a hotel firewall or mobile data — Tailscale routes around it.

3. **Uptime monitoring:** The Conky desktop widget on monk reads Uptime Kuma status
   from kscloud1 directly via Tailscale (not through Cloudflare), so it shows the
   true kscloud1-side status.
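
Because the databases must only ever listen on a Tailscale address, a small guard can catch a bad bind address before a service starts. The sketch below relies on the fact that Tailscale assigns IPv4 addresses from the `100.64.0.0/10` block. The function names and the concrete address `100.123.0.1` are hypothetical, since the document elides the real one.

```python
import ipaddress

# Tailscale hands out IPv4 addresses from the CGNAT block 100.64.0.0/10.
TAILSCALE_RANGE = ipaddress.ip_network("100.64.0.0/10")


def is_tailscale_ip(address: str) -> bool:
    """True if the address falls inside the Tailscale IPv4 range."""
    return ipaddress.ip_address(address) in TAILSCALE_RANGE


def safe_bind_address(address: str) -> str:
    """Return the address unchanged, or refuse anything outside the tailnet range."""
    if not is_tailscale_ip(address):
        raise ValueError(f"{address} is not a Tailscale address; refusing to bind")
    return address


# Hypothetical address standing in for kscloud1's elided 100.123.x.x.
bind_to = safe_bind_address("100.123.0.1")
wildcard_ok = is_tailscale_ip("0.0.0.0")
```

Binding to `0.0.0.0` would expose the port on every interface, including the public one, which is exactly what the check rejects.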

---

## The Monitoring Stack

```
                  ┌──────────────┐
monk's            │ node-exporter│  ← exposes CPU/RAM/disk/network
node-exporter     │  port 9100   │
                  └──────┬───────┘
                         │ scrape every 15s
                  ┌──────▼───────┐
kscloud1's ───►   │  prometheus  │  (also scrapes kscloud1:9100 via public IP)
metrics           └──────┬───────┘
                         │
                  ┌──────▼───────┐
                  │   grafana    │  ← visualize both hosts, switch via instance picker
                  └──────────────┘

Uptime Kuma → HTTP checks every 60s → all 13 public service URLs
Conky widget → reads Uptime Kuma API on kscloud1 → shows live dot per service
```
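
The HTTP checks in this stack boil down to "request the URL and see whether it answers 200". The sketch below shows that idea with the standard library only. It is not how Uptime Kuma or blackbox-exporter are implemented, and `probe` and `summarize` are hypothetical names.

```python
import urllib.request
from typing import Dict


def probe(url: str, timeout: float = 5.0) -> bool:
    """Return True if the URL answers with HTTP 200 within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # Covers URLError/HTTPError (4xx, 5xx), refused connections and timeouts.
        return False


def summarize(results: Dict[str, bool]) -> str:
    """Condense per-service results into a one-line status string."""
    up = sum(results.values())
    return f"{up}/{len(results)} services up"


# Example with canned results; a real run would call probe() on each service URL.
status_line = summarize({"portal": True, "grafana": True, "wiki": False})
```

A scheduler that calls `probe` once a minute per URL and feeds the results to `summarize` would approximate the 60-second check loop shown in the diagram.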

---

## The Portal Architecture

The portal is a custom static site — not a pre-built dashboard:

```
nginx container ("homepage")
  ├── /           → static HTML/CSS/JS (cyberpunk theme, service cards)
  └── /api/*      → proxy_pass → kitestacks-metrics-api on host

kitestacks-metrics-api (Python FastAPI, network_mode: host, pid: host)
  ├── GET /api/metrics   → psutil reads HOST CPU/RAM/disk/network
  ├── GET /api/weather   → wttr.in API → current conditions
  ├── GET /api/activity  → Forgejo API → recent commits across all repos
  └── GET /api/health    → {"ok": true}
```

`network_mode: host` — the container shares the host's network namespace.
Without it, psutil's network counters would cover only the container's virtual
interface, not the host's real interfaces.

`pid: host` — the container can see the host's process table via `/proc`.
Without it, process-level stats would cover only the container's own processes.
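
To make the shape of the metrics endpoint concrete, here is a dependency-free sketch of the kind of payload `/api/metrics` and `/api/health` could return. The real service uses FastAPI and psutil as described above; this version substitutes standard-library calls so it runs without installing anything, and the function names and field names are assumptions. `os.getloadavg` is only available on Unix-like systems.

```python
import os
import shutil
from typing import Dict


def collect_metrics(path: str = "/") -> Dict[str, float]:
    """Gather a few host-level numbers using only the standard library."""
    load_1m, _, _ = os.getloadavg()  # 1-minute load average (Unix only)
    disk = shutil.disk_usage(path)
    used_percent = round(disk.used / disk.total * 100, 1) if disk.total else 0.0
    return {
        "cpu_count": os.cpu_count() or 1,
        "load_1m": load_1m,
        "disk_used_percent": used_percent,
    }


def health() -> Dict[str, bool]:
    """Mirror the health payload shown in the endpoint list."""
    return {"ok": True}


metrics = collect_metrics()
```

In the real service these functions would sit behind route handlers, with nginx forwarding `/api/*` requests to them.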