Rewrite architecture overview and build guide in plain English

Both docs now use everyday analogies (Cloudflare = post office, Authentik = doorman)
instead of technical jargon, making them accessible to anyone learning the project.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
kenpat 2026-06-18 18:46:47 -05:00
parent c3eedf91af
commit 4f91c42780
2 changed files with 145 additions and 157 deletions

@@ -1,186 +1,173 @@
# KiteStacks Architecture Overview # KiteStacks Architecture — How It All Works
**Last updated:** 2026-06-18 **Last updated:** 2026-06-18
**Status:** Active production homelab
--- ---
## High-Level Architecture ## The Simple Version
KiteStacks is two computers working together to run a bunch of websites.
- **monk** — your home machine. Runs almost everything.
- **kscloud1** — a rented computer in Germany (Hetzner). Backs everything up.
People visit the websites through **Cloudflare**, which acts like a secret post-office.
Cloudflare knows where monk and kscloud1 are, but the rest of the internet doesn't.
That means your home address never gets exposed.
``` ```
┌──────────────────────────────────────────────────┐ You (on any device) → Cloudflare (the post office) → monk or kscloud1
│ Public Internet │
│ (via Cloudflare Tunnel) │
└───────────────────────┬──────────────────────────┘
┌─────────────▼──────────────┐
│ Cloudflare Zero Trust │
│ Active-Active Tunnel │
└──────┬────────────┬────────┘
│ │
┌────────────▼───┐ ┌─────▼──────────────┐
│ monk (home) │ │ kscloud1 (Hetzner)│
│ cloudflared │ │ cloudflared │
│ All services │ │ Replica services │
│ Tailscale mesh │ │ Shared Authentik DB │
└────────────────┘ └─────────────────────┘
│ │
└────────────────────┘
Tailscale overlay
(private network)
``` ```
The two machines share one Cloudflare Tunnel token, so Cloudflare load-balances across both connectors automatically. If monk goes offline, kscloud1 continues serving all public subdomains within seconds. If monk goes offline, Cloudflare automatically sends traffic to kscloud1 instead.
Both are always ready to handle requests — this is called **active-active**.
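To make the active-active idea concrete, here is a minimal toy model in Python. It is a sketch only: the `Connector` class and `route` function are hypothetical names, and the round-robin choice is an assumption made for illustration, not a description of how Cloudflare actually picks a connector.

```python
from dataclasses import dataclass
from itertools import cycle


@dataclass
class Connector:
    """One cloudflared instance (hypothetical model, not a real API)."""
    name: str
    healthy: bool = True


def route(connectors, request_count):
    """Spread requests round-robin over the healthy connectors only."""
    healthy = [c for c in connectors if c.healthy]
    if not healthy:
        raise RuntimeError("no healthy connector: all public sites unreachable")
    picker = cycle(healthy)
    return [next(picker).name for _ in range(request_count)]


monk = Connector("monk")
kscloud1 = Connector("kscloud1")

# Both machines up: traffic is shared.
both_up = route([monk, kscloud1], 4)

# monk goes offline: every request lands on kscloud1, with no switch-over step.
monk.healthy = False
monk_down = route([monk, kscloud1], 4)
```

The point of the sketch is that nothing has to "promote" kscloud1 when monk fails; the unhealthy connector simply drops out of the pool.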
--- ---
## Service Map ## What Each Service Does
### Identity & Access ### Login (Identity)
| Service | Host | URL | Purpose | | Service | What it does |
|---------|------|-----|---------| |---------|-------------|
| Authentik server | monk | auth.kitestacks.com | SSO identity provider | | **Authentik** | The doorman — checks who you are before letting you into any site |
| Authentik worker | monk | (internal) | Background jobs, flow execution | | Authentik worker | Runs background jobs for Authentik |
| Authentik LDAP | monk | (internal) | LDAP proxy for non-OIDC apps | | Authentik PostgreSQL | The address book — stores all usernames and passwords (on kscloud1) |
| Authentik PostgreSQL | kscloud1 | (Tailscale only) | Shared auth database | | Authentik Redis | Fast memory — remembers who is logged in so you don't need to log in again |
| Authentik Redis | kscloud1 | (Tailscale only) | Session cache |
### Infrastructure ### Infrastructure
| Service | Host | URL | Purpose | | Service | What it does |
|---------|------|-----|---------| |---------|-------------|
| cloudflared | monk + kscloud1 | (no UI) | CF Tunnel connector | | **cloudflared** | Runs on both machines — creates the secret tunnel to Cloudflare |
| Portainer | monk | portainer.kitestacks.com | Docker container management | | **Portainer** | A control panel to manage all the little program-boxes (containers) |
| Forgejo | monk | gitforge.kitestacks.com | Self-hosted Git (repos + CI) | | **Forgejo** | Like GitHub but yours — stores all the code and scripts |
| Uptime Kuma | monk | status.kitestacks.com | Service uptime monitoring | | **Uptime Kuma** | A watchdog — alerts when any service goes down |
### Observability ### Monitoring
| Service | Host | URL | Purpose | | Service | What it does |
|---------|------|-----|---------| |---------|-------------|
| Prometheus | monk | (internal) | Metrics collection | | **Prometheus** | Collects numbers (CPU, memory, disk) from both machines every 15 seconds |
| Grafana | monk | grafana.kitestacks.com | Metrics dashboards | | **Grafana** | Turns those numbers into charts you can watch |
| Node Exporter | monk | (internal) | Host OS metrics | | **Node Exporter** | Runs on each machine and reports its health to Prometheus |
| Blackbox Exporter | monk | (internal) | External endpoint probing |
### Knowledge & Productivity ### Apps
| Service | Host | URL | Purpose | | Service | What it does |
|---------|------|-----|---------| |---------|-------------|
| BookStack | monk + kscloud1 | wiki.kitestacks.com | Internal wiki / documentation | | **BookStack** | A private wiki — all notes and guides live here |
| Karakeep | monk | links.kitestacks.com | Bookmark manager | | **Karakeep** | Saves bookmarks and website archives |
| Kavita | monk | kavita.kitestacks.com | Ebook/manga reader | | **Kavita** | Reads ebooks and manga |
| OSTicket | monk | tasks.kitestacks.com | Help desk / ticket system | | **OSTicket** | Help-desk system — tracks tasks and tickets |
| ntfy | monk | (push notifications) | Push notifications | | **Open WebUI** | Chat with AI (GPT-4, Claude, or local models) |
| **LiteLLM** | Routes AI requests to the right model |
### AI Stack | **KiteStacks Portal** | The homepage at www.kitestacks.com |
| Service | Host | URL | Purpose |
|---------|------|-----|---------|
| Open WebUI | monk | ai.kitestacks.com | Chat interface (GPT-4, Claude, local) |
| LiteLLM | monk | (internal) | LLM API proxy / model router |
### Portal
| Service | Host | URL | Purpose |
|---------|------|-----|---------|
| KiteStacks Portal | monk + kscloud1 | www.kitestacks.com | Custom homepage / service launcher |
| Metrics API | monk | (internal at /api) | FastAPI — live stats for portal |
--- ---
## Authentication Flow ## How Login Works (SSO)
Every service uses Authentik SSO via OIDC or OAuth2: Every website on KiteStacks uses **Authentik** for login. You log in once, and every
website trusts that. This is called **Single Sign-On (SSO)**.
Here's what happens when you visit a site:
``` ```
Browser → https://service.kitestacks.com 1. You go to wiki.kitestacks.com (BookStack)
2. BookStack checks: "Are you logged in?" — No.
└─► Service: "Not logged in" → redirect to Authentik 3. BookStack sends you to auth.kitestacks.com (Authentik)
4. Authentik asks for your username and password
5. You log in — Authentik issues a proof-token
https://auth.kitestacks.com/if/flow/... 6. Authentik sends you back to BookStack with the proof
7. BookStack reads the proof and creates your session
├─ User logs in with username + password 8. You're in!
├─ Authentik validates credentials
└─ Issues authorization code → redirect back to service
Service exchanges code for tokens
Decodes JWT to get user info (email)
Creates local session
``` ```
**BookStack-specific note:** `OIDC_ISSUER_DISCOVER=true` and `OIDC_ISSUER` must point to the per-app URL (`/application/o/bookstack/`), not the global Authentik URL. The Authentik provider must have `issuer_mode='per_provider'`. This system uses a standard called **OIDC** (OpenID Connect). Every website speaks OIDC,
so they all work the same way with Authentik as the login source.
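As a rough illustration of the redirect to Authentik in the login flow, the sketch below builds a standard OIDC authorization-code request in Python. The issuer path `/application/o/bookstack/` comes from the BookStack note in the earlier version of the doc. The authorization endpoint, the client ID, and the callback URL are assumptions for the example; a real client would read the endpoint from the discovery document rather than hard-code it.

```python
import secrets
from urllib.parse import parse_qs, urlencode, urlsplit


def discovery_url(issuer: str) -> str:
    """OIDC Discovery: provider metadata lives under the issuer URL."""
    return issuer.rstrip("/") + "/.well-known/openid-configuration"


def build_login_redirect(authorization_endpoint, client_id, redirect_uri):
    """Return the URL the app redirects the browser to, plus the state value."""
    state = secrets.token_urlsafe(16)  # random value the app checks on return
    query = urlencode({
        "response_type": "code",        # authorization-code flow
        "client_id": client_id,
        "redirect_uri": redirect_uri,
        "scope": "openid email profile",
        "state": state,
    })
    return f"{authorization_endpoint}?{query}", state


ISSUER = "https://auth.kitestacks.com/application/o/bookstack/"

url, state = build_login_redirect(
    # Assumed endpoint; normally taken from the discovery document.
    "https://auth.kitestacks.com/application/o/authorize/",
    "bookstack-client-id",                        # hypothetical client ID
    "https://wiki.kitestacks.com/oidc/callback",  # hypothetical callback URL
)
params = parse_qs(urlsplit(url).query)
```

When Authentik sends the browser back, the app compares the returned `state` with the one it generated, then exchanges the authorization code for tokens.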
--- ---
## Network Architecture ## How the Network Works
### External Access ### Public traffic (the websites)
All public traffic enters via Cloudflare Tunnel. No ports are open on monk's router. kscloud1 (Hetzner) has no firewall rules open for HTTP/HTTPS either — all access via the same tunnel. All public traffic enters through **Cloudflare Tunnel**.
### Internal Networking - Both monk and kscloud1 run a small program called `cloudflared`
- All Docker containers attach to the `kitestacks` bridge network - `cloudflared` connects outward to Cloudflare — no ports need to be open on your router
- Containers communicate using container names as DNS (e.g., `bookstack-db`, `prometheus`) - Cloudflare sends visitor traffic through whichever connector is healthy
- Docker's embedded DNS server (`127.0.0.11`) resolves container names automatically - If monk is off, kscloud1 handles everything within seconds
### Tailscale Overlay ### Private traffic (machine-to-machine)
Tailscale creates an encrypted mesh between monk and kscloud1: monk and kscloud1 talk to each other through **Tailscale** — a private encrypted network.
- Used for: Authentik PostgreSQL/Redis access, SSH to kscloud1, Prometheus scraping kscloud1 metrics
- Not used for: public traffic (that goes through Cloudflare) Tailscale is used for:
- monk reaching the database (PostgreSQL) on kscloud1 for Authentik logins
- SSH from monk to kscloud1 for management
- Prometheus on monk scraping metrics from kscloud1
Nothing on Tailscale is visible to the public internet.
--- ---
## Storage Layout ## Where Files Live
### monk ### On monk
``` ```
~/kitestacks-live/docker/ ~/kitestacks-live/docker/
├── authentik/ # media, custom-templates ├── authentik/ ← login system
├── bookstack/ # config/, db/ ├── bookstack/ ← wiki + its database
├── cloudflared/ # .env (TUNNEL_TOKEN) ├── cloudflared/ ← cloudflare tunnel connector
├── forgejo/ # data/ ├── forgejo/ ← git server
├── grafana/ # grafana_data volume ├── grafana/ ← monitoring charts
├── karakeep/ # data/ ├── karakeep/ ← bookmarks
├── kavita/ # config/ ├── kavita/ ← ebook reader
├── kitestacks-portal/ # static HTML + nginx ├── kitestacks-portal/ ← homepage
├── osticket/ # db/, uploads/ ├── osticket/ ← help desk
├── portainer/ # portainer_data volume ├── portainer/ ← container dashboard
└── prometheus/ # prometheus.yml, prometheus_data volume └── prometheus/ ← metrics collector
``` ```
### kscloud1 ### On kscloud1
``` ```
/opt/kitestacks/docker/ /opt/kitestacks/docker/
├── authentik/ # postgresql data volume, redis data ├── authentik/ ← PostgreSQL + Redis (shared with monk's Authentik)
├── bookstack/ # config/, db/ ├── bookstack/ ← backup wiki
├── cloudflared/ # .env (same TUNNEL_TOKEN) └── cloudflared/ ← backup tunnel connector
└── ... # replica services
``` ```
--- ---
## Resilience Model ## What Happens When Things Break
| Scenario | Impact | Recovery | | What breaks | What users see | Comes back automatically? |
|----------|--------|----------| |-------------|----------------|--------------------------|
| monk goes offline | All monk services unreachable; kscloud1 serves portal + wiki | Automatic (CF Tunnel failover) | | monk offline | monk services down; portal + wiki still work on kscloud1 | Yes — Cloudflare switches to kscloud1 |
| kscloud1 goes offline | Authentik logins may fail (DB unreachable); all other services up | Restart kscloud1 or point Authentik to local postgres | | kscloud1 offline | Authentik logins may fail (database unreachable) | No — restart kscloud1 or switch to local DB |
| Cloudflare Tunnel down | All public access lost; Tailscale still works | Check CF dashboard; restart cloudflared | | Cloudflare tunnel down | All public websites unreachable | No — check CF dashboard, restart cloudflared |
| MariaDB crash (BookStack) | BookStack down | `docker restart bookstack-db` then `docker restart bookstack` | | BookStack database crashes | BookStack shows an error | Run: `docker restart bookstack-db && docker restart bookstack` |
| Portainer lockout | No Docker UI | Use `portainer/helper-reset-password` | | Portainer lockout | Can't manage containers from the web | Run the password reset helper (see RUNBOOK.md) |
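The failure table can also be read as a small lookup: given a failure, is recovery automatic, and if not, what is the manual step? The Python sketch below encodes the rows of the new table. The scenario keys and the `needs_manual_action` helper are hypothetical names chosen for this example.

```python
# scenario -> (recovers automatically?, what happens / what to do)
RECOVERY = {
    "monk offline": (True, "Cloudflare switches to kscloud1"),
    "kscloud1 offline": (False, "restart kscloud1 or switch to local DB"),
    "cloudflare tunnel down": (False, "check CF dashboard, restart cloudflared"),
    "bookstack database crash": (
        False,
        "docker restart bookstack-db && docker restart bookstack",
    ),
    "portainer lockout": (False, "run the password reset helper (see RUNBOOK.md)"),
}


def needs_manual_action(scenario: str):
    """Return the manual recovery step, or None if recovery is automatic."""
    automatic, action = RECOVERY[scenario.lower()]
    return None if automatic else action


manual_steps = {name: needs_manual_action(name) for name in RECOVERY}
```

Only the "monk offline" row resolves without intervention; every other row maps to a step an operator has to run.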
--- ---
## Key Design Decisions ## Key Design Decisions
**Why Cloudflare Tunnel instead of port-forwarding?** **Why Cloudflare Tunnel instead of opening router ports?**
Port-forwarding exposes your home IP, requires a static IP, and can't failover. CF Tunnel is free, hides your IPs, and trivially supports multi-origin failover. Opening ports exposes your home IP address. Anyone can then scan it, try to break in,
or use it to locate you. Cloudflare Tunnel creates a private outbound connection — your
IP stays hidden. It's also free and supports automatic failover.
**Why active-active instead of active-passive?** **Why active-active instead of active-passive?**
Active-passive requires detecting failure and switching. Active-active — same token, two connectors — Cloudflare handles routing automatically. Simpler and zero RPO. Active-passive requires detecting failure and switching over, which takes time. Active-active
is simpler — both machines are always handling traffic, so Cloudflare just stops sending
to the broken one automatically.
**Why Authentik over Keycloak or Authelia?** **Why Authentik for login instead of passwords per app?**
Authentik is easier to self-host (Docker Compose, sensible defaults), has a good UI, and supports LDAP + OIDC + SAML. Authelia lacks SAML. Keycloak is heavier and more complex. If every app has its own password, you manage dozens of credentials and each app stores
its own user database. Authentik is one place — one login to change, one place to block
a user. Every app just asks Authentik "is this person who they say they are?"
**Why BookStack over Notion/Confluence?** **Why Forgejo instead of just GitHub?**
Self-hosted, no external API calls, Markdown-first, OIDC SSO. Data stays in-house. GitHub can disappear, change pricing, or expose your private repos. Forgejo is
self-hosted — runs on monk, uses almost no RAM, and keeps everything in-house.
**Why Forgejo over GitLab?** **Why BookStack instead of Notion?**
Forgejo is lightweight (~200MB RAM vs GitLab's 4GB+). Full git server with CI runners, issues, PRs. GitLab is overkill for a homelab. Notion is a third-party service that can change pricing or lose your data. BookStack is
self-hosted — the data is on your machine, and you own it completely.

@@ -1,52 +1,53 @@
# KiteStacks Build Guide # KiteStacks Build Guide
This guide walks you through rebuilding the entire KiteStacks homelab from scratch on a blank machine. Two paths are available — choose the one that fits how you work. This guide walks you through rebuilding the entire KiteStacks homelab from scratch
on a blank machine. Two paths are available — choose the one that fits how you work.
--- ---
## Choose Your Path ## Choose Your Path
### Path A — With AI (Claude Code) ### Path A — With AI (Claude Code)
You provide the high-level goals, Claude Code writes the configs, debugs the errors, and explains every decision. Fastest path. Best for learning while doing. Tell Claude Code what you want to build. Claude writes the configs, debugs errors,
and explains every decision as it goes. Fastest path. Great for learning while doing.
→ [Build with AI](./with-ai/README.md) → [Build with AI](./with-ai/README.md)
### Path B — Manual (No AI) ### Path B — Do It Yourself
Step-by-step instructions you follow yourself. Every command, every config, every file. Best for deep understanding and exam prep (answering "how does this work" in interviews). Step-by-step instructions where you type every command yourself. Every config, every
file, explained. Best for really understanding how things work — great for exam prep.
→ [Build Manually](./without-ai/README.md) → [Build Manually](./without-ai/README.md)
--- ---
## Prerequisites (Both Paths) ## What You Need Before Starting (Both Paths)
Before starting either path, have the following ready: | What you need | Details |
|---------------|---------|
| Requirement | Details | | A Linux computer | Ubuntu 24.04 recommended. At least 16GB RAM, 500GB SSD |
|-------------|---------| | A Cloudflare account | Free tier. You need a domain name pointed to Cloudflare |
| A Linux machine | Ubuntu 24.04+ or CachyOS/Arch recommended. At least 16GB RAM, 500GB SSD |
| A Cloudflare account | Free tier is fine. You need a domain pointed to Cloudflare |
| A domain name | Any registrar works — point nameservers to Cloudflare | | A domain name | Any registrar works — point nameservers to Cloudflare |
| A Hetzner account (optional) | For the cloud replica (kscloud1). CAX11 or CX22 works | | A Hetzner account (optional) | For the cloud backup machine (kscloud1). Any small VPS works |
| A Tailscale account | Free tier — needed for the private overlay network | | A Tailscale account | Free — creates the private network between machines |
| Docker + Docker Compose | Install before starting either path | | Docker installed | The foundation everything runs on |
--- ---
## High-Level Build Order ## Build Order (Both Paths Follow This)
Regardless of which path you take, build in this order: Build in this order — each step depends on the one before it:
``` ```
1. Docker + networking foundation Step 1: Install Docker and set up networking
2. Cloudflare Tunnel (cloudflared) Step 2: Set up Cloudflare Tunnel (the secret post-office connection)
3. Authentik (SSO identity provider) Step 3: Set up Authentik (the single login system)
4. Core services (Portainer, Forgejo, BookStack) Step 4: Set up core services (Portainer, Forgejo, BookStack)
5. Monitoring (Prometheus, Node Exporter, Grafana) Step 5: Set up monitoring (Prometheus, Node Exporter, Grafana)
6. Application services (Karakeep, Kavita, OSTicket) Step 6: Set up app services (Karakeep, Kavita, OSTicket)
7. AI services (Open WebUI, LiteLLM) Step 7: Set up AI services (Open WebUI, LiteLLM)
8. Portal (homepage + metrics API) Step 8: Set up the portal (main homepage)
9. kscloud1 cloud replica Step 9: Add the cloud backup machine (kscloud1)
``` ```
Each layer depends on the one before it. Don't skip ahead. Don't skip ahead — if you skip Authentik, none of the SSO logins will work.
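The "don't skip ahead" rule can be expressed as a simple check over the nine steps. This Python sketch is an illustration under stated assumptions: the short step labels and the `next_step` helper are hypothetical, and it treats the order as a strict chain because the guide says each step depends on the one before it.

```python
# Short labels for the nine build steps, in the order the guide lists them.
BUILD_ORDER = [
    "docker",
    "cloudflare-tunnel",
    "authentik",
    "core-services",
    "monitoring",
    "app-services",
    "ai-services",
    "portal",
    "kscloud1",
]


def next_step(completed):
    """Return the next step to build, or None when everything is done.

    Raises ValueError if a later step was completed before an earlier one.
    """
    done = set(completed)
    for index, step in enumerate(BUILD_ORDER):
        if step not in done:
            skipped_ahead = done & set(BUILD_ORDER[index + 1:])
            if skipped_ahead:
                raise ValueError(
                    f"finish {step!r} before {sorted(skipped_ahead)}"
                )
            return step
    return None
```

For example, `next_step(["docker", "cloudflare-tunnel"])` returns `"authentik"`, while `next_step(["docker", "core-services"])` raises an error because the tunnel and Authentik were skipped.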