Rewrite architecture overview and build guide in plain English

Both docs now use everyday analogies (Cloudflare = post office, Authentik = doorman)
instead of technical jargon, making them accessible to anyone learning the project.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
kenpat 2026-06-18 18:46:47 -05:00
parent c3eedf91af
commit 4f91c42780
2 changed files with 145 additions and 157 deletions

@@ -1,186 +1,173 @@
# KiteStacks Architecture Overview
# KiteStacks Architecture — How It All Works
**Last updated:** 2026-06-18
**Status:** Active production homelab
**Last updated:** 2026-06-18
---
## High-Level Architecture
## The Simple Version
KiteStacks is two computers working together to run a bunch of websites.
- **monk** — your home machine. Runs almost everything.
- **kscloud1** — a rented computer in Germany (Hetzner). Runs backup copies of the portal and wiki, and holds the login database.
People visit the websites through **Cloudflare**, which acts like a secret post-office.
Cloudflare knows where monk and kscloud1 are, but the rest of the internet doesn't.
That means your home address never gets exposed.
```
┌──────────────────────────────────────────────────┐
│                 Public Internet                  │
│             (via Cloudflare Tunnel)              │
└───────────────────────┬──────────────────────────┘
          ┌─────────────▼──────────────┐
          │   Cloudflare Zero Trust    │
          │   Active-Active Tunnel     │
          └──────┬────────────┬────────┘
                 │            │
    ┌────────────▼───┐  ┌─────▼───────────────┐
    │ monk (home)    │  │ kscloud1 (Hetzner)  │
    │ cloudflared    │  │ cloudflared         │
    │ All services   │  │ Replica services    │
    │ Tailscale mesh │  │ Shared Authentik DB │
    └────────────────┘  └─────────────────────┘
            │                    │
            └────────────────────┘
              Tailscale overlay
              (private network)
You (on any device) → Cloudflare (the post office) → monk or kscloud1
```
The two machines share one Cloudflare Tunnel token, so Cloudflare load-balances across both connectors automatically. If monk goes offline, kscloud1 continues serving all public subdomains within seconds.
If monk goes offline, Cloudflare automatically sends traffic to kscloud1 instead.
Both are always ready to handle requests — this is called **active-active**.
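
The failover idea can be pictured with a small simulation. This is only a sketch: `Connector` and `pick_connector` are made-up names for illustration, and the simple rotation used here is an assumption. How Cloudflare actually chooses between connectors is up to Cloudflare.

```python
from dataclasses import dataclass


@dataclass
class Connector:
    """Hypothetical stand-in for one cloudflared connector."""
    host: str
    healthy: bool


def pick_connector(connectors, request_number):
    """Return the host that should serve a request, or None if all are down.

    Rotates across healthy connectors only. This is a simplification,
    not Cloudflare's real selection logic.
    """
    healthy = [c for c in connectors if c.healthy]
    if not healthy:
        return None
    return healthy[request_number % len(healthy)].host


connectors = [Connector("monk", True), Connector("kscloud1", True)]
both_up = [pick_connector(connectors, n) for n in range(4)]

connectors[0].healthy = False  # monk goes offline
monk_down = [pick_connector(connectors, n) for n in range(4)]

print(both_up)
print(monk_down)
```

With both machines healthy, requests are shared between them. Once monk is marked unhealthy, every request lands on kscloud1 with no manual switch.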
---
## Service Map
## What Each Service Does
### Identity & Access
| Service | Host | URL | Purpose |
|---------|------|-----|---------|
| Authentik server | monk | auth.kitestacks.com | SSO identity provider |
| Authentik worker | monk | (internal) | Background jobs, flow execution |
| Authentik LDAP | monk | (internal) | LDAP proxy for non-OIDC apps |
| Authentik PostgreSQL | kscloud1 | (Tailscale only) | Shared auth database |
| Authentik Redis | kscloud1 | (Tailscale only) | Session cache |
### Login (Identity)
| Service | What it does |
|---------|-------------|
| **Authentik** | The doorman — checks who you are before letting you into any site |
| Authentik worker | Runs background jobs for Authentik |
| Authentik PostgreSQL | The address book — stores all usernames and passwords (on kscloud1) |
| Authentik Redis | Fast memory — remembers who is logged in so you don't need to log in again |
### Infrastructure
| Service | Host | URL | Purpose |
|---------|------|-----|---------|
| cloudflared | monk + kscloud1 | (no UI) | CF Tunnel connector |
| Portainer | monk | portainer.kitestacks.com | Docker container management |
| Forgejo | monk | gitforge.kitestacks.com | Self-hosted Git (repos + CI) |
| Uptime Kuma | monk | status.kitestacks.com | Service uptime monitoring |
| Service | What it does |
|---------|-------------|
| **cloudflared** | Runs on both machines — creates the secret tunnel to Cloudflare |
| **Portainer** | A control panel to manage all the little program-boxes (containers) |
| **Forgejo** | Like GitHub but yours — stores all the code and scripts |
| **Uptime Kuma** | A watchdog — alerts when any service goes down |
### Observability
| Service | Host | URL | Purpose |
|---------|------|-----|---------|
| Prometheus | monk | (internal) | Metrics collection |
| Grafana | monk | grafana.kitestacks.com | Metrics dashboards |
| Node Exporter | monk | (internal) | Host OS metrics |
| Blackbox Exporter | monk | (internal) | External endpoint probing |
### Monitoring
| Service | What it does |
|---------|-------------|
| **Prometheus** | Collects numbers (CPU, memory, disk) from both machines every 15 seconds |
| **Grafana** | Turns those numbers into charts you can watch |
| **Node Exporter** | Runs on each machine and reports its health to Prometheus |
### Knowledge & Productivity
| Service | Host | URL | Purpose |
|---------|------|-----|---------|
| BookStack | monk + kscloud1 | wiki.kitestacks.com | Internal wiki / documentation |
| Karakeep | monk | links.kitestacks.com | Bookmark manager |
| Kavita | monk | kavita.kitestacks.com | Ebook/manga reader |
| OSTicket | monk | tasks.kitestacks.com | Help desk / ticket system |
| ntfy | monk | (push notifications) | Push notifications |
### AI Stack
| Service | Host | URL | Purpose |
|---------|------|-----|---------|
| Open WebUI | monk | ai.kitestacks.com | Chat interface (GPT-4, Claude, local) |
| LiteLLM | monk | (internal) | LLM API proxy / model router |
### Portal
| Service | Host | URL | Purpose |
|---------|------|-----|---------|
| KiteStacks Portal | monk + kscloud1 | www.kitestacks.com | Custom homepage / service launcher |
| Metrics API | monk | (internal at /api) | FastAPI — live stats for portal |
### Apps
| Service | What it does |
|---------|-------------|
| **BookStack** | A private wiki — all notes and guides live here |
| **Karakeep** | Saves bookmarks and website archives |
| **Kavita** | Reads ebooks and manga |
| **OSTicket** | Help-desk system — tracks tasks and tickets |
| **Open WebUI** | Chat with AI (GPT-4, Claude, or local models) |
| **LiteLLM** | Routes AI requests to the right model |
| **KiteStacks Portal** | The homepage at www.kitestacks.com |
---
## Authentication Flow
## How Login Works (SSO)
Every service uses Authentik SSO via OIDC or OAuth2:
Every website on KiteStacks uses **Authentik** for login. You log in once, and every
website trusts that. This is called **Single Sign-On (SSO)**.
Here's what happens when you visit a site:
```
Browser → https://service.kitestacks.com
└─► Service: "Not logged in" → redirect to Authentik
https://auth.kitestacks.com/if/flow/...
├─ User logs in with username + password
├─ Authentik validates credentials
└─ Issues authorization code → redirect back to service
Service exchanges code for tokens
Decodes JWT to get user info (email)
Creates local session
1. You go to wiki.kitestacks.com (BookStack)
2. BookStack checks: "Are you logged in?" — No.
3. BookStack sends you to auth.kitestacks.com (Authentik)
4. Authentik asks for your username and password
5. You log in — Authentik issues a proof-token
6. Authentik sends you back to BookStack with the proof
7. BookStack reads the proof and creates your session
8. You're in!
```
**BookStack-specific note:** `OIDC_ISSUER_DISCOVER=true` and `OIDC_ISSUER` must point to the per-app URL (`/application/o/bookstack/`), not the global Authentik URL. The Authentik provider must have `issuer_mode='per_provider'`.
This system uses a standard called **OIDC** (OpenID Connect), which is built on OAuth2.
Every KiteStacks service speaks OIDC or plain OAuth2, so they all work the same way with Authentik as the login source.
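
Step 3 of the login walk-through (the app sending you to Authentik) is just a redirect to a URL with a few standard OIDC query parameters. The sketch below builds such a URL. The function name `build_login_redirect`, the client ID, the authorize endpoint path, and the callback URL are all placeholders chosen for illustration, not the project's real settings.

```python
import secrets
from urllib.parse import urlencode, urlparse, parse_qs


def build_login_redirect(authorize_endpoint, client_id, redirect_uri):
    """Build the URL an app sends the browser to in step 3.

    Returns (url, state). The app keeps `state` and compares it with the
    value that comes back when Authentik redirects to the app in step 6.
    """
    state = secrets.token_urlsafe(16)
    query = urlencode({
        "response_type": "code",  # ask for an authorization code
        "client_id": client_id,
        "redirect_uri": redirect_uri,
        "scope": "openid email",
        "state": state,
    })
    return f"{authorize_endpoint}?{query}", state


# Placeholder values (assumptions), not taken from the real deployment.
url, state = build_login_redirect(
    "https://auth.kitestacks.com/application/o/authorize/",
    "bookstack-client-id",
    "https://wiki.kitestacks.com/oidc/callback",
)
params = parse_qs(urlparse(url).query)
print(url)
```

The random `state` value is what lets the app confirm that the redirect coming back from Authentik belongs to the login it started.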
---
## Network Architecture
## How the Network Works
### External Access
All public traffic enters via Cloudflare Tunnel. No ports are open on monk's router. kscloud1 (Hetzner) has no firewall rules open for HTTP/HTTPS either — all access via the same tunnel.
### Public traffic (the websites)
All public traffic enters through **Cloudflare Tunnel**.
### Internal Networking
- All Docker containers attach to the `kitestacks` bridge network
- Containers communicate using container names as DNS (e.g., `bookstack-db`, `prometheus`)
- Docker's embedded DNS server (`127.0.0.11`) resolves container names automatically
- Both monk and kscloud1 run a small program called `cloudflared`
- `cloudflared` connects outward to Cloudflare — no ports need to be open on your router
- Cloudflare sends visitor traffic through whichever connector is healthy
- If monk is off, kscloud1 takes over the sites it has copies of (portal and wiki) within seconds
### Tailscale Overlay
Tailscale creates an encrypted mesh between monk and kscloud1:
- Used for: Authentik PostgreSQL/Redis access, SSH to kscloud1, Prometheus scraping kscloud1 metrics
- Not used for: public traffic (that goes through Cloudflare)
### Private traffic (machine-to-machine)
monk and kscloud1 talk to each other through **Tailscale** — a private encrypted network.
Tailscale is used for:
- monk reaching the database (PostgreSQL) on kscloud1 for Authentik logins
- SSH from monk to kscloud1 for management
- Prometheus on monk scraping metrics from kscloud1
Nothing on Tailscale is visible to the public internet.
---
## Storage Layout
## Where Files Live
### monk
### On monk
```
~/kitestacks-live/docker/
├── authentik/ # media, custom-templates
├── bookstack/ # config/, db/
├── cloudflared/ # .env (TUNNEL_TOKEN)
├── forgejo/ # data/
├── grafana/ # grafana_data volume
├── karakeep/ # data/
├── kavita/ # config/
├── kitestacks-portal/ # static HTML + nginx
├── osticket/ # db/, uploads/
├── portainer/ # portainer_data volume
└── prometheus/ # prometheus.yml, prometheus_data volume
├── authentik/ ← login system
├── bookstack/ ← wiki + its database
├── cloudflared/ ← cloudflare tunnel connector
├── forgejo/ ← git server
├── grafana/ ← monitoring charts
├── karakeep/ ← bookmarks
├── kavita/ ← ebook reader
├── kitestacks-portal/ ← homepage
├── osticket/ ← help desk
├── portainer/ ← container dashboard
└── prometheus/ ← metrics collector
```
### kscloud1
### On kscloud1
```
/opt/kitestacks/docker/
├── authentik/ # postgresql data volume, redis data
├── bookstack/ # config/, db/
├── cloudflared/ # .env (same TUNNEL_TOKEN)
└── ... # replica services
├── authentik/ ← PostgreSQL + Redis (shared with monk's Authentik)
├── bookstack/ ← backup wiki
└── cloudflared/ ← backup tunnel connector
```
---
## Resilience Model
## What Happens When Things Break
| Scenario | Impact | Recovery |
|----------|--------|----------|
| monk goes offline | All monk services unreachable; kscloud1 serves portal + wiki | Automatic (CF Tunnel failover) |
| kscloud1 goes offline | Authentik logins may fail (DB unreachable); all other services up | Restart kscloud1 or point Authentik to local postgres |
| Cloudflare Tunnel down | All public access lost; Tailscale still works | Check CF dashboard; restart cloudflared |
| MariaDB crash (BookStack) | BookStack down | `docker restart bookstack-db` then `docker restart bookstack` |
| Portainer lockout | No Docker UI | Use `portainer/helper-reset-password` |
| What breaks | What users see | Comes back automatically? |
|-------------|----------------|--------------------------|
| monk offline | monk services down; portal + wiki still work on kscloud1 | Yes — Cloudflare switches to kscloud1 |
| kscloud1 offline | Authentik logins may fail (database unreachable) | No — restart kscloud1 or switch to local DB |
| Cloudflare tunnel down | All public websites unreachable | No — check CF dashboard, restart cloudflared |
| BookStack database crashes | BookStack shows an error | No. Run: `docker restart bookstack-db && docker restart bookstack` |
| Portainer lockout | Can't manage containers from the web | No. Run the password reset helper (see RUNBOOK.md) |
---
## Key Design Decisions
**Why Cloudflare Tunnel instead of port-forwarding?**
Port-forwarding exposes your home IP, requires a static IP, and can't failover. CF Tunnel is free, hides your IPs, and trivially supports multi-origin failover.
**Why Cloudflare Tunnel instead of opening router ports?**
Opening ports exposes your home IP address. Anyone can then scan it, try to break in,
or use it to locate you. Cloudflare Tunnel creates a private outbound connection — your
IP stays hidden. It's also free and supports automatic failover.
**Why active-active instead of active-passive?**
Active-passive requires detecting failure and switching. Active-active — same token, two connectors — Cloudflare handles routing automatically. Simpler and zero RPO.
Active-passive requires detecting failure and switching over, which takes time. Active-active
is simpler — both machines are always handling traffic, so Cloudflare just stops sending
to the broken one automatically.
**Why Authentik over Keycloak or Authelia?**
Authentik is easier to self-host (Docker Compose, sensible defaults), has a good UI, and supports LDAP + OIDC + SAML. Authelia lacks SAML. Keycloak is heavier and more complex.
**Why Authentik for login instead of passwords per app?**
If every app has its own password, you manage dozens of credentials and each app stores
its own user database. Authentik is one place — one login to change, one place to block
a user. Every app just asks Authentik "is this person who they say they are?"
**Why BookStack over Notion/Confluence?**
Self-hosted, no external API calls, Markdown-first, OIDC SSO. Data stays in-house.
**Why Forgejo instead of just GitHub?**
GitHub can disappear, change pricing, or expose your private repos. Forgejo is
self-hosted — runs on monk, uses almost no RAM, and keeps everything in-house.
**Why Forgejo over GitLab?**
Forgejo is lightweight (~200MB RAM vs GitLab's 4GB+). Full git server with CI runners, issues, PRs. GitLab is overkill for a homelab.
**Why BookStack instead of Notion?**
Notion is a third-party service that can change pricing or lose your data. BookStack is
self-hosted — the data is on your machine, and you own it completely.

@@ -1,52 +1,53 @@
# KiteStacks Build Guide
This guide walks you through rebuilding the entire KiteStacks homelab from scratch on a blank machine. Two paths are available — choose the one that fits how you work.
This guide walks you through rebuilding the entire KiteStacks homelab from scratch
on a blank machine. Two paths are available — choose the one that fits how you work.
---
## Choose Your Path
### Path A — With AI (Claude Code)
You provide the high-level goals, Claude Code writes the configs, debugs the errors, and explains every decision. Fastest path. Best for learning while doing.
Tell Claude Code what you want to build. Claude writes the configs, debugs errors,
and explains every decision as it goes. Fastest path. Great for learning while doing.
→ [Build with AI](./with-ai/README.md)
### Path B — Manual (No AI)
Step-by-step instructions you follow yourself. Every command, every config, every file. Best for deep understanding and exam prep (answering "how does this work" in interviews).
### Path B — Do It Yourself
Step-by-step instructions where you type every command yourself. Every config, every
file, explained. Best for really understanding how things work — great for exam prep.
→ [Build Manually](./without-ai/README.md)
---
## Prerequisites (Both Paths)
## What You Need Before Starting (Both Paths)
Before starting either path, have the following ready:
| Requirement | Details |
|-------------|---------|
| A Linux machine | Ubuntu 24.04+ or CachyOS/Arch recommended. At least 16GB RAM, 500GB SSD |
| A Cloudflare account | Free tier is fine. You need a domain pointed to Cloudflare |
| What you need | Details |
|---------------|---------|
| A Linux computer | Ubuntu 24.04 recommended. At least 16GB RAM, 500GB SSD |
| A Cloudflare account | Free tier. You need a domain name pointed to Cloudflare |
| A domain name | Any registrar works — point nameservers to Cloudflare |
| A Hetzner account (optional) | For the cloud replica (kscloud1). CAX11 or CX22 works |
| A Tailscale account | Free tier — needed for the private overlay network |
| Docker + Docker Compose | Install before starting either path |
| A Hetzner account (optional) | For the cloud backup machine (kscloud1). Any small VPS works |
| A Tailscale account | Free — creates the private network between machines |
| Docker installed | The foundation everything runs on |
---
## High-Level Build Order
## Build Order (Both Paths Follow This)
Regardless of which path you take, build in this order:
Build in this order — each step depends on the one before it:
```
1. Docker + networking foundation
2. Cloudflare Tunnel (cloudflared)
3. Authentik (SSO identity provider)
4. Core services (Portainer, Forgejo, BookStack)
5. Monitoring (Prometheus, Node Exporter, Grafana)
6. Application services (Karakeep, Kavita, OSTicket)
7. AI services (Open WebUI, LiteLLM)
8. Portal (homepage + metrics API)
9. kscloud1 cloud replica
Step 1: Install Docker and set up networking
Step 2: Set up Cloudflare Tunnel (the secret post-office connection)
Step 3: Set up Authentik (the single login system)
Step 4: Set up core services (Portainer, Forgejo, BookStack)
Step 5: Set up monitoring (Prometheus, Node Exporter, Grafana)
Step 6: Set up app services (Karakeep, Kavita, OSTicket)
Step 7: Set up AI services (Open WebUI, LiteLLM)
Step 8: Set up the portal (main homepage)
Step 9: Add the cloud backup machine (kscloud1)
```
Each layer depends on the one before it. Don't skip ahead.
Don't skip ahead — if you skip Authentik, none of the SSO logins will work.
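
The "each step depends on the one before it" rule can be written down as a tiny checker. This is a sketch only: the step labels are shortened from the list above, and `next_step` and `can_start` are hypothetical helper names, not part of any KiteStacks script.

```python
BUILD_ORDER = [
    "Docker and networking",
    "Cloudflare Tunnel",
    "Authentik",
    "Core services",
    "Monitoring",
    "App services",
    "AI services",
    "Portal",
    "kscloud1 cloud replica",
]


def next_step(completed):
    """Return the first step not yet completed, or None when all are done."""
    for step in BUILD_ORDER:
        if step not in completed:
            return step
    return None


def can_start(step, completed):
    """A step may start only when every earlier step is complete."""
    index = BUILD_ORDER.index(step)
    return all(earlier in completed for earlier in BUILD_ORDER[:index])


done = {"Docker and networking", "Cloudflare Tunnel"}
print(next_step(done))
print(can_start("Core services", done))
```

Here Authentik is the next step, and the core services are blocked until it is finished, which matches the warning about SSO logins.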