docs: comprehensive homelab-mastery rewrite with full build guides

Complete documentation suite for KiteStacks covering all 11 services across
a 2-host active-active architecture. Includes beginner track (with AI, 8 files)
and advanced track (without AI, 7 files) with time estimates, real troubleshooting
cases, and command-by-command explanations. Updates certifications roadmap to
reflect July 7, 2026 A+ Core 2 exam goal.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
kenpat 2026-06-19 01:08:43 -05:00
parent e3cfa80d98
commit 1e8319ee75
24 changed files with 5243 additions and 298 deletions


@ -1,48 +1,109 @@
# Homelab Mastery — KiteStacks Learning Guide
# KiteStacks Homelab — Master Guide
**Owner:** kenpat
**Purpose:** Everything needed to understand, explain, rebuild, and build a career around the KiteStacks homelab project.
**Domain:** kitestacks.com
**Status:** Live and running
**Last Updated:** 2026-06-19
---
## Your Current Status
## What Is KiteStacks?
| Milestone | Status |
|-----------|--------|
| CompTIA A+ Core 1 | ✅ Passed — highest score in class (22 people) |
| CompTIA A+ Core 2 | 🔄 In progress |
| CCNA | 📅 Next |
| Cloud / AI certs | 📅 After CCNA |
KiteStacks is a self-hosted homelab — a real, production web platform running on two computers
that serves eleven public websites to the internet, 24 hours a day, even when the home machine
is off.
It is not a tutorial project. It is not a demo. It runs at a real domain, with real users,
real uptime monitoring, and real failover. Every service is protected by single sign-on (SSO),
meaning one account unlocks everything. All traffic goes through Cloudflare's global network —
no ports are open on the home router, and the home IP address is never exposed.
### The One-Paragraph Summary
> *KiteStacks is a self-hosted homelab running eleven public-facing services behind Cloudflare
> Tunnel with no open ports on the home router. All logins are handled by Authentik — a
> self-hosted identity provider using OIDC/OAuth2, so one account unlocks every service.
> A Hetzner cloud VPS (kscloud1) acts as a permanent cloud replica: if the home machine (monk)
> goes offline, kscloud1 keeps everything running with zero downtime. Both hosts share a single
> Postgres and Redis database over a private Tailscale VPN, so SSO logins always work regardless
> of which server answers. Monitoring runs via Prometheus, Grafana, Uptime Kuma, and a desktop
> Conky widget that shows live kscloud1 service health at a glance.*
---
## What This Repo Is
## The Two Computers
You built a production homelab — a real multi-host, highly available web platform with SSO, monitoring, cloud failover, and AI services. Most people learning DevOps do tutorials with fake projects. You have a real one running at a real domain.
| Name | What It Is | Role |
|------|-----------|------|
| **monk** | Home PC (ThinkPad T14s) | Development machine. Code and configs are built here, then pushed to kscloud1. |
| **kscloud1** | Hetzner VPS in Germany | Always-live production server. Receives what monk pushes. Stays up even if monk is off. |
This repo exists so you can:
1. **Understand** what everything does at the conceptual level
2. **Explain it** confidently to a hiring manager, recruiter, or LinkedIn connection
3. **Rebuild it** from scratch on a new machine if you ever need to
4. **Map it** to real certifications and career paths
A third machine — the **Samurai desktop** — will eventually join as a second home connector,
adding more redundancy when it is running.
---
## The Eleven Public Services
| Service | URL | What It Does |
|---------|-----|-------------|
| **Portal** | www.kitestacks.com | The homepage — links to everything, live system stats |
| **Authentik** | auth.kitestacks.com | SSO login provider — one account for all services |
| **Forgejo** | gitforge.kitestacks.com | Self-hosted Git — stores all code and documentation |
| **Open WebUI** | ai.kitestacks.com | AI chat interface (ChatGPT-style, self-hosted) |
| **Karakeep** | links.kitestacks.com | Bookmark and read-it-later manager |
| **Kavita** | kavita.kitestacks.com | eBook and manga library |
| **Grafana** | grafana.kitestacks.com | Monitoring dashboards — CPU, RAM, network |
| **Uptime Kuma** | status.kitestacks.com | Service uptime status page |
| **BookStack** | wiki.kitestacks.com | Self-hosted wiki and documentation platform |
| **OSTicket** | tasks.kitestacks.com | Help desk and ticket tracking system |
| **Portainer** | portainer.kitestacks.com | Docker container management dashboard |
---
## Navigation
| Section | What's Inside |
|---------|--------------|
| [certifications/](certifications/roadmap.md) | Full cert roadmap for cloud engineering, what each cert proves, study order |
| [architecture/](architecture/overview.md) | How the entire system works, why it was built this way |
| [concepts/](concepts/) | Deep dives on every technology: Docker, networking, OAuth2, Tailscale, etc. |
| [build-guide/](build-guide/README.md) | Step-by-step rebuild from a blank machine, with explanations of every decision |
| [interview-prep/](interview-prep/explain-the-project.md) | Exactly what to say to hiring managers, common questions + model answers |
| [learning-path/](learning-path/README.md) | Structured study plan, free resources, what to learn in what order |
| Section | What Is Inside |
|---------|---------------|
| [architecture/overview.md](architecture/overview.md) | How the whole system is wired together — diagrams, traffic flow |
| [architecture/services.md](architecture/services.md) | Every service: container name, port, volume, command reference |
| [architecture/decisions.md](architecture/decisions.md) | Why each technology was chosen over the alternatives |
| [build-guide/README.md](build-guide/README.md) | How to build this from scratch — choose beginner (AI) or advanced |
| [concepts/docker.md](concepts/docker.md) | What Docker actually is and how containers work |
| [concepts/networking.md](concepts/networking.md) | DNS, ports, TLS, Tailscale, Cloudflare Tunnel, firewalls |
| [concepts/oauth2-oidc.md](concepts/oauth2-oidc.md) | How SSO works — OAuth2, OIDC, JWTs explained simply |
| [concepts/linux.md](concepts/linux.md) | Linux commands, file ownership, sudo, SSH tunnels |
| [certifications/roadmap.md](certifications/roadmap.md) | Cert path from A+ to CKA — what to study and in what order |
| [interview-prep/explain-the-project.md](interview-prep/explain-the-project.md) | What to say to hiring managers — model answers |
| [learning-path/README.md](learning-path/README.md) | Structured study plan, free resources, daily habits |
---
## The One-Paragraph Project Summary
## Where to Start
> *KiteStacks is a self-hosted homelab running nine public-facing services behind Cloudflare Tunnel, with full SSO via Authentik (OIDC/OAuth2), active-active cloud failover on a Hetzner VPS, private networking over Tailscale, and real-time monitoring via Prometheus and Grafana. The platform serves a public domain (kitestacks.com) and stays online even when the primary home machine is off — all running on commodity hardware with no open ports on the home router.*
**If you want to understand what you built:**
→ [architecture/overview.md](architecture/overview.md)
That is what you built. Now learn to own every word of it.
**If you want to rebuild it from scratch:**
→ [build-guide/README.md](build-guide/README.md) — pick your track
**If you have an interview coming up:**
→ [interview-prep/explain-the-project.md](interview-prep/explain-the-project.md)
**If you want to understand the tech behind it:**
→ Pick a topic in [concepts/](concepts/)
**If you want to know what certifications to study next:**
→ [certifications/roadmap.md](certifications/roadmap.md)
---
## Certification Progress
| Cert | Status |
|------|--------|
| CompTIA A+ Core 1 | ✅ Passed — highest score in class (22 people) |
| CompTIA A+ Core 2 | 🔄 In progress — exam goal July 7, 2026 |
| CCNA | 📅 Next after A+ Core 2 |
| AWS Solutions Architect Associate | 📅 After CCNA |
| CKA (Kubernetes) | 📅 After AWS certs |


@ -1,12 +1,16 @@
# Architecture Decisions — The Why Behind Every Choice
For every technology choice, there was a reason. Understanding the "why" is what separates someone who copied commands from someone who designed a system.
For every technology choice, there was a reason. Understanding the "why" is what separates
someone who copied commands from someone who designed a system.
**Last Updated:** 2026-06-19
---
## Why Docker Instead of Running Services Directly?
**Problem:** Running 15+ services directly on a Linux host creates dependency hell — different Python versions, conflicting library versions, services affecting each other.
**Problem:** Running 15+ services directly on a Linux host creates dependency conflicts —
different Python versions, conflicting library versions, services that break each other on updates.
**Options considered:**
- Bare metal: install each app directly on the OS
@ -16,13 +20,15 @@ For every technology choice, there was a reason. Understanding the "why" is what
**Decision:** Docker
**Why:**
- Each container has its own filesystem, dependencies, and runtime — they can't conflict
- Starting/stopping/updating one service doesn't affect others
- The `docker-compose.yml` file IS the documentation — it shows exactly what the service needs to run
- Each container has its own filesystem and runtime — they can't conflict
- Starting, stopping, or updating one service doesn't affect others
- The `docker-compose.yml` file IS the documentation — it shows exactly what the service needs
- Portability: move the same compose file to a new machine and it works identically
- Isolation: if Karakeep gets compromised, it can't easily touch Forgejo's data
- `restart: unless-stopped` means containers self-heal after a crash or host reboot
**What you'd say to a hiring manager:** *"I containerized every service using Docker and Docker Compose so each has isolated dependencies and the entire deployment is reproducible from a single YAML file."*
**What to say in an interview:**
> *"I containerized every service using Docker Compose so each has isolated dependencies
> and the entire deployment is reproducible from a single YAML file."*
---
@ -30,170 +36,247 @@ For every technology choice, there was a reason. Understanding the "why" is what
**Problem:** How do you make home services accessible from the internet?
**Traditional approach:** Open port 80 and 443 on the home router, configure NAT, point DNS to home IP.
**Traditional approach:** Open ports 80 and 443 on the home router, configure NAT,
point DNS to your home IP address.
**Problems with that:**
- Exposes your home IP address publicly (DDoS risk, can be found, ISP tracks it)
- Dynamic home IP means DNS breaks every time IP changes
- Some ISPs block residential port 80/443
- Router configuration is error-prone and varies by hardware
- Your home IP is public (DDoS risk, can be scanned and targeted)
- Dynamic home IP means DNS breaks every time the ISP changes it
- Some ISPs block residential ports 80 and 443
- Router configuration is fragile and varies by hardware
**Decision:** Cloudflare Tunnel (cloudflared)
**Why:**
- cloudflared makes an OUTBOUND connection to Cloudflare — no inbound ports needed
- Home IP never exposed
- Works regardless of ISP restrictions
- Cloudflare handles TLS/HTTPS — you don't manage SSL certificates
- cloudflared makes an outbound connection to Cloudflare — no inbound ports needed at all
- Home IP is never exposed to the public internet
- Works on any ISP, any network, any firewall
- Cloudflare handles TLS certificates automatically (no Let's Encrypt setup)
- Free tier covers everything needed
- Bonus: built-in DDoS protection
- Built-in DDoS protection at Cloudflare's edge
**The trade-off:** You depend on Cloudflare. If Cloudflare has an outage, your site goes down even if your hardware is fine. This is acceptable — Cloudflare's uptime is better than most home internet connections.
**The tradeoff:** You depend on Cloudflare. If Cloudflare has an outage, your site goes down
even if your hardware is fine. Acceptable — Cloudflare's uptime exceeds most home ISPs.
---
## Why Authentik for SSO Instead of Separate Logins Per App?
## Why Authentik for SSO?
**Problem:** 9 services means 9 different usernames and passwords to manage. Adding a user requires going into 9 admin panels. Removing access means 9 places to deactivate.
**Problem:** Eleven services means eleven separate usernames and passwords. Adding a user
means eleven admin panels. Removing access means eleven places to deactivate.
**Options:**
- Separate logins per service (no SSO)
- Authelia (simpler, forward-auth proxy only)
- Authentik (full OIDC provider, more complex)
- Keycloak (enterprise-grade, very heavy)
- No SSO — separate logins per service
- Authelia — simpler, forward-auth proxy only
- Authentik — full OIDC provider, more complex to set up
- Keycloak — enterprise-grade, very heavy on RAM
**Decision:** Authentik
**Why:**
- One account controls access to everything
- Apps that support native OIDC (Grafana, Kavita, Open WebUI, Karakeep) get real SSO — the user is authenticated inside the app
- Can restrict which groups can access which applications (Portainer restricted to homelab-admin group)
- Self-hosted — user data stays on your infrastructure
- Authentik supports both native OIDC (for apps that support it) and proxy provider (for apps that don't)
- Apps that support native OIDC (Grafana, Kavita, Karakeep, Open WebUI, Portainer, BookStack,
Forgejo) get real SSO — user is authenticated inside the app with a JWT, not just at a proxy
- Access policies per application (Portainer restricted to `homelab-admin` group only)
- Self-hosted — user data never leaves your infrastructure
**The trade-off:** Authentik is complex to set up and has a significant memory footprint. Authelia would be simpler. But Authelia only does forward-auth proxy — it can't give an app a real JWT. Authentik does both.
**Why not Authelia:** Authelia is built primarily around forward-auth proxying. With forward
auth alone, the proxy blocks access to the app until the user is authenticated, but the app
itself never receives a signed identity token. Authentik sends a real JWT with the user's
email and name, so apps can create user accounts automatically on first login.
---
## Why a Shared Postgres Instead of Separate Authentik Databases?
**Problem:** After setting up active-active failover, users kept getting `invalid_grant` errors when signing in through SSO.
**Problem:** After deploying two Cloudflare Tunnel connectors, users got `invalid_grant`
errors when signing in through SSO — roughly 50% of the time.
**Root cause:** OAuth2 authorization codes are rows in a database. The flow is:
1. `/authorize` → code stored in Database A (monk's Authentik)
2. `/token` → looks for code in Database B (kscloud1's Authentik)
3. Code not found → `invalid_grant`
**Root cause:** OAuth2 authorization codes are short-lived rows in a database.
Cloudflare Tunnel load-balances between monk and kscloud1 for every HTTP request. Steps 1 and 2 of the OAuth flow can hit different hosts.
```
Step 1: /authorize → creates code → stored in monk's Authentik DB
Step 2: /token → looks for code → hits kscloud1's Authentik DB → NOT FOUND
```
Cloudflare load-balances every HTTP request independently. Steps 1 and 2 of the OAuth2
flow can hit completely different hosts. The code exists in one database but not the other.
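
The failure mode can be reproduced in a few lines. This is a minimal simulation, not Authentik's actual code: `CodeStore`, `authorize`, and `token` are hypothetical names, and a plain dict stands in for the database table that holds authorization codes.

```python
import secrets


class CodeStore:
    """Stand-in for one Authentik database (hypothetical, dict-backed)."""

    def __init__(self):
        self.codes = {}


def authorize(store, user):
    """Step 1: create a one-time authorization code and persist it."""
    code = secrets.token_urlsafe(8)
    store.codes[code] = user
    return code


def token(store, code):
    """Step 2: redeem the code; it must exist in the store being queried."""
    user = store.codes.pop(code, None)  # codes are single-use
    if user is None:
        return {"error": "invalid_grant"}
    return {"access_token": f"token-for-{user}"}


# Separate databases: the code is written on monk but looked up on kscloud1.
monk_db, kscloud1_db = CodeStore(), CodeStore()
code = authorize(monk_db, "kenpat")
split_result = token(kscloud1_db, code)

# Shared database: both hosts read and write the same store.
shared_db = CodeStore()
code = authorize(shared_db, "kenpat")
shared_result = token(shared_db, code)

print(split_result)   # {'error': 'invalid_grant'}
print(shared_result)  # {'access_token': 'token-for-kenpat'}
```

With two stores, whether the lookup succeeds depends on which host happens to answer each request, which matches the intermittent errors described above. With one store, the lookup succeeds no matter which host answers.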
**Options:**
- Sync databases continuously (complex, slow, conflict-prone)
- Sync both databases continuously (complex, slow, conflict-prone)
- Use sticky sessions (Cloudflare paid feature)
- Share one database (simple, reliable)
- Share one database between both Authentik instances
**Decision:** Shared Postgres on kscloud1, accessible only over Tailscale
**Decision:** Single shared Postgres + Redis hosted on kscloud1, accessible only over Tailscale
**Why:**
- Both monk and kscloud1 Authentik read/write the same database — authorization codes always found
- Tailscale binding means the database is never exposed to the public internet (security)
- Simple: one line change in each `docker-compose.yml` to point to a different host
- Cost: free (already paying for kscloud1)
- Both connectors' Authentik instances read and write the same database
- Authorization codes are always found regardless of which host handles which request
- Database is bound to kscloud1's Tailscale IP — never reachable from the public internet
- Simple configuration change: one environment variable pointing to the shared host
**The trade-off:** If kscloud1 goes down and Tailscale connectivity breaks, monk's Authentik can't start. Rollback procedure: restore monk's compose to use a local Postgres.
**The tradeoff:** If kscloud1 or the Tailscale link goes down, monk's Authentik can't connect
to the database and fails to start. Rollback: restore local Postgres in monk's compose file.
---
## Why Tailscale Instead of WireGuard or OpenVPN?
**Problem:** Need private networking between monk (home) and kscloud1 (Hetzner cloud) without exposing the Authentik database to the public internet.
**Problem:** Need private networking between monk (home) and kscloud1 (Hetzner cloud).
The shared Authentik database must not be exposed to the public internet.
**Options:**
- WireGuard: manual key exchange, manual routing, technical to configure
- OpenVPN: even more complex, slower
- WireGuard: manual key exchange, manual routing, hard to configure through NAT
- OpenVPN: complex, slower, more overhead
- Tailscale: managed WireGuard, automatic key exchange, works behind NAT
**Decision:** Tailscale
**Why:**
- Works instantly — install, authenticate, done
- Handles NAT traversal automatically (monk is behind home router NAT)
- Devices get stable 100.x.x.x IPs regardless of actual network location
- Works in minutes: install, authenticate, done
- Handles NAT traversal automatically — monk is behind home router NAT
- Every device gets a stable `100.x.x.x` IP regardless of location
- Free for up to 100 devices
- Uses WireGuard under the hood — same encryption, much easier configuration
- WireGuard underneath — same encryption, much easier operation
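
Because the shared database should only listen on a Tailscale address, a small guard can catch a misconfigured bind address before it is deployed. The sketch below is illustrative and not part of the project: `is_tailscale_ip` and `safe_bind_address` are hypothetical helpers. It relies on Tailscale assigning node addresses from the CGNAT block `100.64.0.0/10`, which is narrower than "anything starting with 100".

```python
import ipaddress

# Tailscale assigns node addresses from the CGNAT block 100.64.0.0/10.
TAILSCALE_RANGE = ipaddress.ip_network("100.64.0.0/10")


def is_tailscale_ip(address):
    """Return True if the address falls inside Tailscale's address range."""
    return ipaddress.ip_address(address) in TAILSCALE_RANGE


def safe_bind_address(address):
    """Reject bind addresses that would expose the database beyond Tailscale."""
    if not is_tailscale_ip(address):
        raise ValueError(f"{address} is not a Tailscale address")
    return address


print(is_tailscale_ip("100.101.102.103"))  # True
print(is_tailscale_ip("0.0.0.0"))          # False
```

A check like this could run in a deploy script so that a bind address of `0.0.0.0` fails loudly instead of quietly publishing Postgres.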
**The trade-off:** Tailscale is a managed service — you trust Tailscale's coordination servers. The actual data is encrypted peer-to-peer (Tailscale can't see it), but they control device authentication. Self-hosted alternative: Headscale.
**The tradeoff:** You trust Tailscale's coordination servers to manage device authentication.
Actual data is encrypted peer-to-peer (Tailscale never sees it), but they control who can
join your network. Self-hosted alternative if needed: Headscale.
---
## Why Active-Active Instead of Active-Passive Failover?
## Why Active-Active Failover Instead of Active-Passive?
**The context:** The user travels. When away from home, monk might be inaccessible (home network down, ISP outage, power). kscloud1 should keep the site running.
**The situation:** The user travels. When away from home, monk may be unreachable.
kscloud1 must keep the site running.
**Active-Passive:** kscloud1 only starts serving if monk is detected as down. Cloudflare would need health checks and failover rules.
**Active-Passive:** kscloud1 only starts serving if Cloudflare detects monk as down.
Requires health checks, failover rules, and a delay before traffic switches.
**Active-Active:** Both monk and kscloud1 are always in the Cloudflare Tunnel rotation. Every request might hit either host.
**Active-Active:** Both monk and kscloud1 are always in the Cloudflare Tunnel rotation.
Every request may hit either host at any time.
**Decision:** Active-Active
**Why:**
- Simpler: no health checks to configure, no failover logic
- Instant: if monk goes down, kscloud1 is already handling 50% of traffic
- Free: Cloudflare Tunnel active-active is free; health-check-based failover requires paid plans
- No failover logic needed — both are always live
- Instant: if monk goes down, kscloud1 is already handling traffic
- Free: Cloudflare Tunnel active-active is included; health-check-based failover is paid
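
The "instant" property can be illustrated with a toy router. This is a sketch under a loud assumption: Cloudflare's real connector selection logic is not documented here, so `route` simply picks any live connector at random. The point is only that no switchover step exists when one host disappears.

```python
import random


def route(connectors, rng):
    """Pick any live connector for one request; None if all are down."""
    live = [name for name, up in connectors.items() if up]
    return rng.choice(live) if live else None


rng = random.Random(0)
both_up = {"monk": True, "kscloud1": True}
monk_down = {"monk": False, "kscloud1": True}

# With both hosts up, requests may land on either one.
seen = {route(both_up, rng) for _ in range(100)}

# With monk down, every request lands on kscloud1 with no failover step.
after_failure = {route(monk_down, rng) for _ in range(100)}
print(after_failure)  # {'kscloud1'}
```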
**The trade-off:** Stateful apps (Forgejo, OpenProject, Kavita) have separate databases on each host. A user might see different data depending on which host answers. This was explicitly accepted: the point is uptime, not data consistency across hosts.
**The tradeoff:** Stateful apps with separate databases (Kavita, Karakeep) may show
different data depending on which host answers. Explicitly accepted — the priority is
uptime, not data consistency across hosts. Forgejo and Authentik share databases so
they are consistent.
---
## Why nginx for the Portal Instead of a Pre-Built Dashboard?
## Why a Custom Portal Instead of a Pre-Built Dashboard?
**Options:**
- gethomepage (what was used before) — nice but limited customization
- Homepage (gethomepage) — nice but limited customization
- Heimdall — similar limitations
- Custom static site + nginx — full control
- Custom static HTML/CSS/JS + nginx — full control, full ownership
**Decision:** Custom static HTML/CSS/JS + nginx
**Decision:** Custom static site
**Why:**
- Complete visual control — the cyberpunk theme, the layout, every pixel
- Static files served by nginx are extremely fast and reliable
- Can proxy the metrics API for real-time stats without CORS issues
- No framework dependencies — no Node.js, no build step, just files
- Complete visual control — the cyberpunk theme, layout, every card, every color
- Static files + nginx are extremely fast and reliable (no Node.js, no build step)
- nginx proxies the `/api/*` endpoints to the metrics API without CORS issues
- No dependency on external frameworks that can change or break
**The trade-off:** More work to build and maintain than a pre-built dashboard. But you now understand every line of it.
**The tradeoff:** More work to build and maintain. But you understand every line of it,
and you can explain exactly why every piece is there.
---
## Why Python + FastAPI for the Metrics API?
**Problem:** The portal needs real-time system stats (CPU, RAM, network), weather, and Forgejo activity. These can't come from static HTML files.
**Problem:** The portal needs live system stats (CPU, RAM, network), weather, and
Forgejo git activity. Static HTML can't provide these.
**Options:**
- Shell scripts + cron → write stats to a JSON file the frontend reads
- Node.js + Express
- Python + FastAPI
**Decision:** Python FastAPI
**Decision:** Python FastAPI with `psutil`
**Why:**
- Python's `psutil` library reads system metrics with one line of code
- FastAPI is modern, fast, and automatically documents the API
- `psutil` reads host system metrics in one line of Python
- FastAPI auto-generates API documentation and handles async requests well
- Python is readable — easy to understand and modify
- `async/await` means the API doesn't block while waiting for weather API responses
- Python is readable — you can understand and modify the code
**The special requirement:** The container needs `network_mode: host` and `pid: host`. Without these:
- `network_mode: host`: the container can see the host's network interfaces and report real network throughput (not container-level)
- `pid: host`: psutil can read the host's `/proc` filesystem, showing actual system stats instead of container stats
**Special requirements:**
- `network_mode: host` — container shares host network namespace so psutil sees real
network interfaces, not the container's virtual interface
- `pid: host` — container can read the host's `/proc` filesystem for accurate process stats
Without these flags, the API would report container-level stats instead of actual host stats.
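
To see why the namespace matters: on Linux, per-interface network counters are exposed in `/proc/net/dev`, which lists only the interfaces visible in the reader's network namespace. The sketch below is illustrative and is not the project's metrics API. `parse_net_dev` is a hypothetical helper, and the sample text (interface names and counters) is made up for the example.

```python
def parse_net_dev(text):
    """Parse /proc/net/dev content into {interface: (rx_bytes, tx_bytes)}."""
    counters = {}
    for line in text.splitlines()[2:]:  # first two lines are column headers
        if ":" not in line:
            continue
        name, fields = line.split(":", 1)
        values = fields.split()
        # Field 0 is received bytes, field 8 is transmitted bytes.
        counters[name.strip()] = (int(values[0]), int(values[8]))
    return counters


# Made-up sample of what a host network namespace might expose.
HOST_SAMPLE = """\
Inter-|   Receive                                                |  Transmit
 face |bytes    packets errs drop fifo frame compressed multicast|bytes    packets errs drop fifo colls carrier compressed
    lo:    1000      10    0    0    0     0          0         0     1000      10    0    0    0     0       0          0
wlp3s0: 5000000    4000    0    0    0     0          0         0   250000    3000    0    0    0     0       0          0
"""

stats = parse_net_dev(HOST_SAMPLE)
print(sorted(stats))    # ['lo', 'wlp3s0']
print(stats["wlp3s0"])  # (5000000, 250000)
```

On a real machine the input would come from `open("/proc/net/dev").read()`. Run inside a container without `network_mode: host`, the same call would list only the container's own virtual interfaces.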
---
## Why the Forgejo Repo for Documentation?
## Why Forgejo Instead of GitHub or GitLab?
You could keep documentation in Notion, Google Docs, or a wiki.
**Problem:** Need to store all homelab code, configs, and documentation in version control.
**Why Forgejo:**
- It's self-hosted — you own the data
- Git tracks every change with a timestamp and message
- The documentation lives alongside the configs it describes
- Hiring managers can see the commit history and read your documentation directly
**Options:**
- GitHub: free, reliable, but your configs and docs are on someone else's server
- GitLab: self-hostable but heavy (4GB+ RAM for full install)
- Forgejo: lightweight GitHub-like self-hosted Git, fork of Gitea
**What this shows to a hiring manager:** You treat documentation like code — version-controlled, structured, maintained.
**Decision:** Forgejo
**Why:**
- Self-hosted — configs and documentation stay on your infrastructure
- Very lightweight — uses less than 100MB RAM
- GitHub-compatible API — tools that work with GitHub also work with Forgejo
- Full UI with code review, issues, CI/CD (Forgejo Actions)
- Shows commit history and documentation to anyone you give access to
**The tradeoff:** You maintain it yourself. If Forgejo goes down, git operations fail.
Mitigated by kscloud1 running a replica and the shared Postgres.
---
## Why OSTicket for the Help Desk?
**What it replaced:** OpenProject (project management tool on tasks.kitestacks.com)
**Why OpenProject was removed:**
- OpenProject CE (Community Edition) requires an Enterprise Edition license for SSO
- The SSO button simply does not appear in CE — it is a hard paywall with no workaround
- OpenProject is also resource-heavy for what it provides
**Why OSTicket:**
- Lightweight and runs well on the existing stack
- Email integration works (SMTP via Gmail app password — confirmed working)
- Handles the ticket/task tracking use case without the licensing barrier
---
## Why BookStack for the Wiki?
**Problem:** Need a place for long-form documentation that's more structured than markdown files.
**Decision:** BookStack
**Why:**
- Clean, organized UI: Shelves → Books → Chapters → Pages hierarchy
- WYSIWYG editor — easy to write docs without markdown syntax
- Authentik OIDC SSO works natively
- API available — docs can be pushed programmatically from scripts or CI
**Key gotcha:** Cache directory must be writable by the container user.
`chown -R abc:users /config/www/framework/cache/` is required after first install.
---
## Why the Forgejo Shared Postgres?
**Problem:** With two connectors in active-active, Forgejo on monk and kscloud1 had
separate SQLite databases. Repos created on one weren't visible on the other.
**Fix:** Migrated both Forgejo instances to a single shared PostgreSQL database on kscloud1
(same shared server as Authentik's Postgres). Both connectors now serve identical Forgejo data.
**How it was done:**
- `forgejo dump --database postgres` — exported clean SQL from monk's Forgejo
- Dropped the pgloader schema (had wrong structure), reloaded the clean SQL
- Both compose files point to `authentik-postgres:5432` database `forgejo`, user `forgejo`
- kscloud1's Forgejo joined the `authentik_default` Docker network to reach authentik-postgres


@ -1,138 +1,169 @@
# KiteStacks Architecture — Full System Overview
**Last Updated:** 2026-06-19
---
## The Big Picture
```
INTERNET
┌──────▼──────┐
│ Cloudflare │ DNS + TLS termination
│ (edge) │ Zero Trust Tunnel
└──────┬──────┘
│ HTTPS (443) only
┌────────────────┼────────────────┐
│ connector 1 │ connector 2 │ connector 3
│ │ │
┌──────▼──────┐ │ ┌──────▼──────┐
│ MONK │ │ │ KSCLOUD1 │
│ (home PC) │ │ │ (Hetzner VPS│
│ │ Active │ │ 5.78.x.x) │
│ All 9 │ Active │ │ │
│ services │ │ │ All 9 │
│ │ │ │ services │
└──────┬──────┘ │ └──────┬──────┘
│ │ │
└────────────────┼───────────────┘
TAILSCALE VPN
(100.x.x.x range)
┌────────▼────────┐
│ SHARED DB LAYER │
│ on kscloud1 │
│ Postgres :5432 │
│ Redis :6379 │
│ (Tailscale │
│ only, private)│
└─────────────────┘
INTERNET
┌──────▼──────┐
│ Cloudflare │ DNS + TLS termination
│ (edge) │ Tunnel routing
└──────┬──────┘
│ HTTPS only — home IP never exposed
┌──────────────┴──────────────┐
│ connector 1 │ connector 2
│ │
┌──────▼──────┐ ┌──────▼──────┐
│ MONK │ │ KSCLOUD1 │
│ (ThinkPad │ │ (Hetzner VPS│
│ T14s, home)│ │ Germany) │
│ │ │ │
│ Development │ │ ALWAYS LIVE │
│ Pushes to → │ │ Receives ← │
│ kscloud1 │ │ from monk │
└──────┬──────┘ └──────┬──────┘
│ │
└─────────── TAILSCALE ───────┘
(100.x.x.x range)
Encrypted peer-to-peer
┌────────────▼────────────┐
│ SHARED DATABASE LAYER │
│ hosted on kscloud1 │
│ │
│ PostgreSQL :5432 │
│ Redis :6379 │
│ │
│ Bound to Tailscale IP │
│ only — not public │
└─────────────────────────┘
```
**The key idea:** Cloudflare holds two persistent outbound connections — one from monk,
one from kscloud1. Every request to kitestacks.com arrives at Cloudflare, which routes
it to whichever connector responds. If monk goes offline, kscloud1 handles everything.
Your home IP is never involved.
---
## How Work Flows Between the Two Hosts
```
monk (dev) ──push──► kscloud1 (prod, always live)
```
- **monk** is where changes are made: editing config files, testing new services, writing code
- **kscloud1** receives those changes and is always serving live traffic
- If monk is off, kscloud1 continues serving the last pushed state — users see no downtime
- A third machine (Samurai desktop) is planned as a future second home connector
---
## The Eleven Public Services
| Service | Container | URL | What It Does |
|---------|-----------|-----|-------------|
| Portal | `homepage` | www.kitestacks.com | Custom homepage — links, live stats, cyberpunk theme |
| Authentik | `authentik` | auth.kitestacks.com | SSO identity provider — handles all logins |
| Forgejo | `forgejo` | gitforge.kitestacks.com | Self-hosted Git (like GitHub) |
| Open WebUI | `kite-openwebui` | ai.kitestacks.com | AI chat interface |
| Karakeep | `karakeep` | links.kitestacks.com | Bookmark and read-it-later manager |
| Kavita | `kavita` | kavita.kitestacks.com | eBook and manga reader |
| Grafana | `grafana` | grafana.kitestacks.com | Monitoring dashboards |
| Uptime Kuma | `uptime-kuma` | status.kitestacks.com | Public status page and uptime monitoring |
| BookStack | `bookstack` | wiki.kitestacks.com | Self-hosted wiki / docs platform |
| OSTicket | `osticket-app` | tasks.kitestacks.com | Help desk ticketing system |
| Portainer | `portainer` | portainer.kitestacks.com | Docker management dashboard |
## The Infrastructure Services (Internal Only)
| Container | What It Does |
|-----------|-------------|
| `cloudflared` | Cloudflare Tunnel connector — outbound connection to Cloudflare edge |
| `prometheus` | Metrics collector — scrapes node-exporter every 15 seconds |
| `node-exporter` | Exposes host CPU/RAM/disk/network metrics for Prometheus |
| `blackbox-exporter` | HTTP probe monitor — checks endpoints are returning 200 |
| `kite-litellm` | LLM proxy — routes AI requests to OpenRouter (many free models) |
| `kitestacks-metrics-api` | Python FastAPI — serves live stats and Forgejo activity to portal |
| `ntfy` | Push notification server — sends alerts to phone |
| `flux` | GitOps controller — watches Forgejo, deploys changes automatically |
| `authentik-worker` | Background job processor for Authentik |
| `authentik-ldap` | LDAP proxy layer for Authentik |
---
## How Traffic Flows — Step by Step
### Someone visits www.kitestacks.com
```
1. Browser → DNS lookup "www.kitestacks.com"
2. DNS returns Cloudflare's anycast IP (not your home IP)
3. Browser → HTTPS request to Cloudflare edge
4. Cloudflare reads Host header: "www.kitestacks.com"
5. Cloudflare routes request through active tunnel connector
(monk or kscloud1 — whichever responds first)
6. cloudflared resolves "homepage" via Docker DNS
7. Request hits nginx in the homepage container
8. nginx serves static HTML/CSS/JS from ./public/
9. Browser JavaScript calls /api/metrics and /api/activity
10. nginx proxies those to kitestacks-metrics-api (Python, host network)
11. metrics-api reads CPU/RAM via psutil (sees real host, not container)
12. metrics-api calls Forgejo API for recent commits
13. Browser renders complete page with live stats
```
### Someone clicks "Sign In with Authentik"
```
1. App (e.g. Grafana) redirects browser to:
https://auth.kitestacks.com/application/o/authorize/
?client_id=grafana&redirect_uri=...&response_type=code
2. Cloudflare routes this to a cloudflared connector
3. Authentik shows login page
4. User enters username + password
5. Authentik validates against shared Postgres (on kscloud1, over Tailscale)
6. Authentik creates an authorization code (row in DB) and redirects:
https://grafana.kitestacks.com/login/generic_oauth?code=abc123
7. Grafana backend POSTs to auth.kitestacks.com/application/o/token/
with code=abc123 and client_secret
8. THIS REQUEST may hit a DIFFERENT connector than step 2 did
→ This is why the shared DB matters: the code must exist in one DB,
not two separate ones that might be out of sync
9. Authentik finds code=abc123 in shared Postgres, validates it
10. Authentik returns JWT (access_token + id_token)
11. Grafana reads user's email from JWT, creates/updates local user
12. User is logged in — never re-enters password for other SSO apps
```
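The three HTTP artifacts in steps 1, 6, and 7 can be sketched in a few lines of Python. This is a minimal illustration, not Grafana's or Authentik's actual code: the function names are hypothetical, and the `scope` and `state` parameters are standard OAuth2/OIDC additions that the flow above does not show.

```python
from urllib.parse import urlencode, urlparse, parse_qs

# Base path taken from the flow above.
AUTH_BASE = "https://auth.kitestacks.com/application/o"


def build_authorize_url(client_id: str, redirect_uri: str, state: str) -> str:
    """Step 1: the URL the app redirects the browser to."""
    query = urlencode({
        "client_id": client_id,
        "redirect_uri": redirect_uri,
        "response_type": "code",
        "scope": "openid email profile",  # assumption: typical OIDC scopes
        "state": state,                   # assumption: CSRF protection value
    })
    return f"{AUTH_BASE}/authorize/?{query}"


def extract_code(callback_url: str, expected_state: str) -> str:
    """Step 6: pull the authorization code out of the callback URL."""
    params = parse_qs(urlparse(callback_url).query)
    if params.get("state", [None])[0] != expected_state:
        raise ValueError("state mismatch")
    return params["code"][0]


def build_token_request(code: str, client_id: str,
                        client_secret: str, redirect_uri: str) -> dict:
    """Step 7: form fields the app backend POSTs to the token endpoint."""
    return {
        "grant_type": "authorization_code",
        "code": code,
        "client_id": client_id,
        "client_secret": client_secret,
        "redirect_uri": redirect_uri,
    }
```

The browser only ever carries the short-lived code. The `client_secret` travels in the backend-to-backend request of step 7, which is why that request can arrive at a different connector than the browser's request did.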
---
## Every Service and What It Does
## The Shared Database — Why It Exists
### The Nine Public Services
After deploying two connectors (monk + kscloud1), users got `invalid_grant` errors when
signing in. The cause: each host had its own separate Authentik database. The OAuth2 flow
makes two separate HTTP requests:
| Service | Container Name | What It Does | Why It's Here |
|---------|---------------|--------------|---------------|
| **Portal** | `homepage` | The public website (kitestacks.com) — custom nginx serving static HTML/CSS/JS with a cyberpunk theme | Front door to everything. Shows system stats, recent activity, links to all services |
| **Authentik** | `authentik` | Identity provider — handles all logins via OIDC/OAuth2 SSO | Single place to manage all user accounts and access control |
| **Forgejo** | `forgejo` | Self-hosted Git platform (like GitHub but yours) | Store all homelab code, config, and documentation |
| **OpenProject** | `openproject` | Project management (like Jira) | Task tracking, project planning |
| **Open WebUI** | `kite-openwebui` | ChatGPT-like AI chat interface | Access multiple AI models through one interface |
| **Karakeep** | `karakeep` | Bookmark and read-it-later manager | Save links, articles, and content |
| **Kavita** | `kavita` | eBook and manga reader | Personal digital library |
| **Grafana** | `grafana` | Monitoring dashboards | Visualize CPU, RAM, network, uptime across both hosts |
| **Uptime Kuma** | `uptime-kuma` | Status page and uptime monitoring | Monitor that all 9 services are up and alert if they go down |
1. `/authorize` → creates authorization code → stored in Database A
2. `/application/o/token/` → looks up authorization code → hits Database B → **not found**
### The Infrastructure Services (Not Public-Facing)
Cloudflare load-balances requests, so steps 1 and 2 can hit different hosts.
| Service | What It Does |
|---------|-------------|
| `cloudflared` | Cloudflare Tunnel connector — creates encrypted outbound tunnel to Cloudflare edge |
| `prometheus` | Metrics collection — scrapes system stats from both monk and kscloud1 every 15 seconds |
| `node-exporter` | Exposes host system metrics (CPU, RAM, disk, network) for Prometheus to scrape |
| `kite-litellm` | LLM proxy gateway — routes AI requests to OpenRouter (multiple free models) |
| `portainer` | Docker management UI — visual interface to manage all containers |
| `kitestacks-metrics-api` | Python FastAPI service — serves real-time system stats, weather, and Forgejo activity to the portal |
**Fix:** Both connectors point to a single shared Postgres+Redis hosted on kscloud1.
It is bound only to kscloud1's Tailscale IP (`100.123.x.x`) — never the public IP.
Only devices on the Tailscale network can connect.
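The failure and the fix can be reproduced with a toy model. In this sketch a plain Python dict stands in for the Postgres table of authorization codes, and the `Connector` class is a hypothetical stand-in for one Authentik instance behind a tunnel connector. It is not how Authentik is implemented.

```python
import secrets


class Connector:
    """One Authentik instance behind a tunnel connector."""

    def __init__(self, name: str, code_store: dict):
        self.name = name
        self.code_store = code_store  # stands in for the database of codes

    def authorize(self, user: str) -> str:
        """Create an authorization code and store it."""
        code = secrets.token_urlsafe(8)
        self.code_store[code] = user
        return code

    def token(self, code: str) -> dict:
        """Exchange a code for a token. Codes are single-use."""
        user = self.code_store.pop(code, None)
        if user is None:
            return {"error": "invalid_grant"}
        return {"access_token": f"token-for-{user}"}


# Before the fix: each host has its own database.
monk = Connector("monk", {})
kscloud1 = Connector("kscloud1", {})
code = monk.authorize("kenpat")
broken = kscloud1.token(code)   # code was stored on monk only

# After the fix: both hosts use one shared database.
shared = {}
monk = Connector("monk", shared)
kscloud1 = Connector("kscloud1", shared)
code = monk.authorize("kenpat")
fixed = kscloud1.token(code)    # code is found in the shared store

print(broken)  # {'error': 'invalid_grant'}
print(fixed)   # {'access_token': 'token-for-kenpat'}
```

The only difference between the two runs is whether both connectors receive the same store object, which mirrors pointing both Authentik instances at the same Postgres.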
---
## How Traffic Flows
### When Someone Visits www.kitestacks.com
```
1. Browser sends HTTPS request to www.kitestacks.com
2. DNS resolves to Cloudflare's anycast IP (not your home IP)
3. Cloudflare terminates TLS — your home router never sees HTTPS
4. Cloudflare routes the request through the tunnel to whichever
cloudflared connector responds first (monk or kscloud1)
5. cloudflared resolves "homepage" via Docker DNS
6. Request hits the nginx container serving the static portal
7. Portal's JavaScript fetches /api/metrics and /api/activity
from the kitestacks-metrics-api container via nginx proxy
8. Page renders with live system stats and recent git activity
```
### When Someone Clicks "Sign In with Authentik"
```
1. App (e.g., Grafana) redirects browser to auth.kitestacks.com/application/o/authorize/
2. Authentik presents login page
3. User enters credentials — Authentik validates against its database
(stored on kscloud1's Postgres, shared over Tailscale)
4. Authentik generates an authorization code and redirects back to Grafana
5. Grafana's backend calls auth.kitestacks.com/application/o/token/
to exchange the code for an access token
6. Authentik validates the code (found in shared DB) and returns a JWT
7. Grafana reads the user's email/name from the JWT and logs them in
```
**The critical detail:** Steps 1 and 5 can hit different tunnel connectors (monk vs kscloud1). The authorization code from step 4 must exist in whichever database step 5 hits. That's why both connectors point to the SAME Postgres on kscloud1 — otherwise step 5 returns `invalid_grant` because the code isn't found.
---
## The Two Hosts in Detail
### Monk (Primary Home Machine)
- **Role:** Primary production host
- **Network:** Home LAN, no open ports on router (Cloudflare Tunnel handles all inbound)
- **Services:** All 9 public services + all infrastructure services
- **Data:** Each service has its own database/storage
- **Authentik DB:** Points to kscloud1's Postgres over Tailscale (100.x.x.x)
### kscloud1 (Hetzner VPS)
- **Role:** Permanent cloud replica — always on, even when monk is off (travel, power outage, etc.)
- **Network:** Public IP, Cloudflare Tunnel connector 3
- **Services:** Full replica of all 9 public services (separate databases except Authentik)
- **Hosts:** The shared Authentik Postgres + Redis (bound to Tailscale interface only)
- **Resources:** 3 vCPU, 3.7 GB RAM — tight but functional
### What's the Same Across Both
- Same Cloudflare Tunnel token (different connector IDs assigned automatically)
- Same Authentik database (shared via Tailscale)
- Same Authentik secret key (required for JWT validation)
- Same kavita.db (one-time sync — users and OIDC config)
### What's Different Across Both
- Forgejo data (separate repos — accepted inconsistency)
- OpenProject data (separate projects)
- Karakeep bookmarks (separate)
- Kavita book files (monk has them, kscloud1 doesn't — covers synced, books not)
**Forgejo** also uses this shared Postgres (separate database on the same server).
Both monk's and kscloud1's Forgejo read from the same data, so git repos are consistent
regardless of which connector serves the request.
---
@ -141,81 +172,109 @@
Every container joins the `kitestacks` external Docker bridge network:
```bash
# Create once on each host:
docker network create kitestacks
```
This is what makes Cloudflare Tunnel work. The cloudflared container is also on this network, so when Cloudflare tells cloudflared to route `http://grafana:3000`, Docker's internal DNS resolves `grafana` to the grafana container's IP on that network.
All service containers and the cloudflared container join this network. Docker provides
built-in DNS: when cloudflared needs to route to Grafana, it resolves the hostname `grafana`
to that container's IP address on the bridge network.
Without this shared network, cloudflared can't reach the service containers by name.
```
cloudflared → "grafana" → Docker DNS → 172.x.x.x:3000 → grafana container
```
Without this shared network, cloudflared cannot reach services by name.
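One way to confirm that name resolution works is to resolve a container name from inside any container attached to the `kitestacks` network. The helper below is a small sketch that uses only the standard library; the function name is hypothetical. Run on the host itself it will normally return `None`, because Docker's embedded DNS only answers inside the network.

```python
import socket
from typing import Optional


def resolve_service(name: str) -> Optional[str]:
    """Return the IPv4 address that DNS gives for a name, or None.

    Inside a container on the shared bridge network, the answer for a
    container name such as "grafana" comes from Docker's embedded DNS.
    """
    try:
        return socket.gethostbyname(name)
    except socket.gaierror:
        return None
```

Calling `resolve_service("grafana")` from a container on the network should return an address on the bridge subnet. A `None` result usually means the calling container is not attached to `kitestacks`.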
---
## Why No Open Ports on the Router
## Why No Open Ports on the Home Router
Traditional homelab: open port 80/443 on home router → NAT to home server → expose home IP.
Traditional approach: open port 80 and 443 on the router → NAT to home server → home IP in DNS.
Problems with that:
- Your home IP is public (DDoS risk, targeted attacks)
- Router configuration is fragile
- ISP can change your IP (dynamic IP)
- Some ISPs block port 80/443
Problems:
- Home IP is exposed publicly (DDoS target, ISP tracks it)
- Dynamic home IP breaks DNS when it changes
- Some ISPs block residential port 80/443
- Router misconfiguration = exposed server
Cloudflare Tunnel approach:
- cloudflared container makes an OUTBOUND connection to Cloudflare
- Cloudflare holds that connection open
- Inbound requests come through Cloudflare, over that existing outbound tunnel
- Your home IP is never exposed
- Works on any network, any ISP, any firewall
**Cloudflare Tunnel approach:**
- cloudflared opens a small number of outbound, encrypted connections to Cloudflare edge servers
- Cloudflare holds those connections open for as long as the connector is running
- All inbound traffic arrives over those existing outbound connections
- The home router sees only outbound traffic, which it already allows by default, so nothing needs forwarding
- Home IP is never in DNS, never exposed
This is why you can run a public website from a home PC with zero router configuration.
**Result:** A public website running on a home PC with zero router configuration and
no exposed home IP address.
---
## Tailscale — The Private Backbone
Tailscale creates a private overlay network (VPN mesh) across all your devices:
Tailscale creates an encrypted overlay network across all your devices.
Every device gets a stable `100.x.x.x` IP regardless of physical location.
```
monk (100.x.x.x) ←—— encrypted ——→ kscloud1 (100.x.x.x)
monk (100.x.x.x) ←—— encrypted ——→ pixel-6 (100.x.x.x)
monk 100.85.x.x ←── WireGuard ───► 100.123.x.x kscloud1
samurai 100.74.x.x ←── WireGuard ───► 100.123.x.x kscloud1
phone 100.x.x.x ←── WireGuard ───► 100.123.x.x kscloud1
```
Used in this project for:
1. **Shared Authentik DB:** kscloud1's Postgres binds to its Tailscale IP, not its public IP. Only devices on the tailnet can connect. Monk points to that address.
2. **Forgejo activity feed:** On kscloud1, the metrics API fetches recent commits from monk's Forgejo via monk's Tailscale IP — so both portal instances show the same activity feed.
3. **SSH/Admin access:** You can SSH into any device on the tailnet from anywhere.
Used in this homelab for:
1. **Shared Authentik DB:** kscloud1 Postgres and Redis are bound to `100.123.x.x` only.
Monk's Authentik connects to that address. Traffic is encrypted peer-to-peer.
2. **SSH admin access:** SSH to kscloud1 from anywhere using its Tailscale IP.
Even behind a hotel firewall or mobile data — Tailscale routes around it.
3. **Uptime monitoring:** The Conky desktop widget on monk reads Uptime Kuma status
from kscloud1 directly via Tailscale (not through Cloudflare), so it shows the
true kscloud1-side status.
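Because Tailscale assigns addresses from the shared CGNAT block `100.64.0.0/10`, a bind address can be sanity-checked before a database is started. The sketch below is illustrative: the function names are hypothetical, and the sample addresses in the usage note are placeholders, since the real ones are redacted in this guide.

```python
import ipaddress

# Tailscale hands out IPv4 addresses from the CGNAT block 100.64.0.0/10.
TAILSCALE_RANGE = ipaddress.ip_network("100.64.0.0/10")


def is_tailscale_ip(address: str) -> bool:
    """True if the address falls inside the Tailscale IPv4 range."""
    return ipaddress.ip_address(address) in TAILSCALE_RANGE


def check_bind_address(address: str) -> str:
    """Classify a service bind address by how widely it is reachable."""
    if address in ("0.0.0.0", "::"):
        return "exposed on every interface"
    if is_tailscale_ip(address):
        return "tailnet only"
    return "not a Tailscale address"
```

For example, `check_bind_address("100.123.0.1")` returns `"tailnet only"`, while `check_bind_address("0.0.0.0")` flags a service that would also listen on the public interface.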
---
## The Monitoring Stack
```
node-exporter (monk) → prometheus (monk) → grafana (monk)
node-exporter (kscloud1) ↗ (scrapes 5.78.x.x:9100)
┌──────────────┐
monk's │ node-exporter│ ← exposes CPU/RAM/disk/network
node-exporter │ port 9100 │
└──────┬───────┘
│ scrape every 15s
┌──────▼───────┐
kscloud1's ───► │ prometheus │ (also scrapes kscloud1:9100 via public IP)
metrics └──────┬───────┘
┌──────▼───────┐
│ grafana │ ← visualize both hosts, switch via instance picker
└──────────────┘
Uptime Kuma → HTTP checks every 60s → all 13 public service URLs
Conky widget → reads Uptime Kuma API on kscloud1 → shows live dot per service
```
Prometheus scrapes metrics every 15 seconds from:
- `node-exporter:9100` — monk's own node-exporter (via Docker DNS)
- `5.78.x.x:9100` — kscloud1's node-exporter (via public IP, port exposed 0.0.0.0)
Grafana visualizes both, letting you switch between hosts in the instance picker.
---
## The Portal Architecture
The portal is NOT gethomepage or any pre-built dashboard. It's a custom-built static site:
The portal is a custom static site — not a pre-built dashboard:
```
nginx (container: "homepage")
├── / → serves static HTML/CSS/JS from ./public/
└── /api/* → proxy_pass to kitestacks-metrics-api:8000 (host)
nginx container ("homepage")
├── / → static HTML/CSS/JS (cyberpunk theme, service cards)
└── /api/* → proxy_pass → kitestacks-metrics-api on host
kitestacks-metrics-api (network_mode: host, pid: host)
├── GET /api/metrics → psutil reads HOST's CPU/RAM/disk/network
├── GET /api/weather → wttr.in API → current weather by IP geolocation
├── GET /api/activity → Forgejo API → recent commits
kitestacks-metrics-api (Python FastAPI, network_mode: host, pid: host)
├── GET /api/metrics → psutil reads HOST CPU/RAM/disk/network
├── GET /api/weather → wttr.in API → current conditions
├── GET /api/activity → Forgejo API → recent commits across all repos
└── GET /api/health → {"ok": true}
```
The metrics API runs with `network_mode: host` and `pid: host` so it reads the HOST machine's process table and `/proc` filesystem — not the container's. Without this, it would report container stats, not laptop stats.
`network_mode: host` means the container shares the host's network namespace.
Without it, psutil would report the container's own network interfaces and traffic, not the laptop's.
`pid: host` means the container shares the host's PID namespace, so `/proc` lists the host's processes.
Without it, psutil would see only the processes running inside the container.
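To see why the network namespace matters, note that on Linux per-interface traffic counters come from `/proc/net/dev`, and that file shows only the interfaces of the current network namespace. The sketch below parses that file with the standard library. It illustrates the idea only, it is not the metrics API's actual code, and the function names are hypothetical.

```python
def parse_net_dev(text: str) -> dict:
    """Parse /proc/net/dev into {interface: {"rx_bytes": int, "tx_bytes": int}}."""
    counters = {}
    for line in text.splitlines()[2:]:  # the first two lines are headers
        name, sep, rest = line.partition(":")
        fields = rest.split()
        if not sep or len(fields) < 16:
            continue
        counters[name.strip()] = {
            "rx_bytes": int(fields[0]),  # first receive column
            "tx_bytes": int(fields[8]),  # first transmit column
        }
    return counters


def read_interfaces(path: str = "/proc/net/dev") -> dict:
    """Read the counters visible in the current network namespace."""
    with open(path) as handle:
        return parse_net_dev(handle.read())
```

In a container on a bridge network, `read_interfaces()` typically lists only `lo` and the container's own `eth0`. With `network_mode: host` it lists the host's real interfaces, which is what the portal is meant to display.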


@ -0,0 +1,388 @@
# KiteStacks — Complete Service Reference
Every service that runs in KiteStacks: what it does, where it lives, how to manage it,
and what commands to use. This is the day-to-day operations reference.
**Last Updated:** 2026-06-19
---
## Quick Reference — All Containers on monk
```bash
docker ps --format "table {{.Names}}\t{{.Status}}\t{{.Ports}}"
```
| Container | Purpose | Public URL |
|-----------|---------|-----------|
| `homepage` | Portal / main website | www.kitestacks.com |
| `authentik` | SSO identity provider | auth.kitestacks.com |
| `authentik-worker` | Authentik background jobs | — |
| `authentik-ldap` | LDAP interface for Authentik | — |
| `authentik-ldap-proxy` | LDAP proxy | — |
| `forgejo` | Git platform | gitforge.kitestacks.com |
| `kite-openwebui` | AI chat | ai.kitestacks.com |
| `kite-litellm` | LLM proxy gateway | — |
| `karakeep` | Bookmarks | links.kitestacks.com |
| `karakeep-chrome` | Headless browser for Karakeep | — |
| `karakeep-meilisearch` | Search engine for Karakeep | — |
| `kavita` | eBook reader | kavita.kitestacks.com |
| `grafana` | Monitoring dashboards | grafana.kitestacks.com |
| `uptime-kuma` | Status page | status.kitestacks.com |
| `bookstack` | Wiki / docs | wiki.kitestacks.com |
| `bookstack-db` | MariaDB for BookStack | — |
| `osticket-app` | Help desk | tasks.kitestacks.com |
| `osticket-db` | MySQL for OSTicket | — |
| `portainer` | Docker management UI | portainer.kitestacks.com |
| `cloudflared` | Tunnel connector | — |
| `prometheus` | Metrics collector | — |
| `node-exporter` | Host metrics exporter | — |
| `blackbox-exporter` | HTTP probe monitor | — |
| `kitestacks-metrics-api` | System stats API for portal | — |
| `ntfy` | Push notifications | — |
| `flux` | GitOps controller | — |
---
## Service Deep Dives
### homepage — Portal
**What it is:** Custom-built static website served by nginx.
**Directory:** `~/kitestacks-live/docker/kitestacks-portal/`
**Public files:** `./public/index.html` — edit this to change what the portal shows
**Config:** `./nginx.conf` — nginx routing rules
```bash
# Restart portal
cd ~/kitestacks-live/docker/kitestacks-portal
docker compose restart homepage
# Edit the portal
nano public/index.html
# View nginx logs
docker logs homepage -f
```
**Ports:** 3005:3000 (host:container). Cloudflare Tunnel uses container port 3000 directly.
---
### authentik — SSO Identity Provider
**What it is:** Self-hosted OAuth2/OIDC identity provider. Handles all logins for every service.
**Directory:** `~/kitestacks-live/docker/authentik/`
**Database:** Shared PostgreSQL on kscloud1 at `100.123.x.x:5432`, database `authentik`
**Redis:** Shared Redis on kscloud1 at `100.123.x.x:6379`
```bash
cd ~/kitestacks-live/docker/authentik
# Start all Authentik services
docker compose up -d
# Check health (wait for "healthy" before testing SSO)
docker inspect --format '{{.State.Health.Status}}' authentik
docker inspect --format '{{.State.Health.Status}}' authentik-worker
# Open an interactive Django shell (admin tasks, user management)
docker exec -it authentik ak shell
# View logs
docker logs authentik -f
docker logs authentik-worker -f
```
**SSO apps configured in Authentik:**
- Grafana, Forgejo, Kavita, Karakeep, Open WebUI, Portainer, BookStack
**Key Authentik admin panel:** https://auth.kitestacks.com/if/admin/
**Important:** OAuth2 code TTL is set to 10 minutes (increased from default 1 minute)
to allow monk's Authentik to finish starting up after a reconnect before codes expire.
---
### forgejo — Git Platform
**What it is:** Self-hosted Git. Stores all homelab code, configs, and documentation.
**Directory:** `~/kitestacks-live/docker/forgejo/`
**Database:** Shared PostgreSQL on kscloud1, database `forgejo`, user `forgejo`
**Data volume:** `./data/` (repositories, avatars, attachments)
```bash
cd ~/kitestacks-live/docker/forgejo
# Start
docker compose up -d
# Admin commands
docker exec -u git forgejo forgejo admin user list
docker exec -u git forgejo forgejo admin user create --username newuser --password pass --email e@mail.com --admin
# View logs
docker logs forgejo -f
# API token for automation
# Token: stored in .env — used by kitestacks-metrics-api for activity feed
```
**API base URL:** `https://gitforge.kitestacks.com/api/v1/`
**Web access (via Cloudflare):** gitforge.kitestacks.com
---
### kite-openwebui — AI Chat
**What it is:** Self-hosted ChatGPT-like interface connected to LiteLLM proxy.
**Directory:** `~/kitestacks-live/docker/kite-openwebui/`
**Backend:** `kite-litellm` — routes to OpenRouter (many models, free tier available)
```bash
cd ~/kitestacks-live/docker/kite-openwebui
docker compose up -d
docker logs kite-openwebui -f
docker logs kite-litellm -f
```
**SSO:** Authentik OIDC — "Sign in with Authentik" on login page.
---
### karakeep — Bookmarks
**What it is:** Bookmark manager and read-it-later tool. Saves full page content.
**Directory:** `~/kitestacks-live/docker/karakeep/`
**Depends on:** `karakeep-chrome` (headless Chromium for page capture) + `karakeep-meilisearch` (search)
```bash
cd ~/kitestacks-live/docker/karakeep
docker compose up -d
# SSO callback URL: https://links.kitestacks.com/api/auth/callback/custom
# (NextAuth.js uses "custom" as the provider ID, not "authentik")
```
**SSO:** Authentik OAuth2 — redirect URI must be `/api/auth/callback/custom` (not `/callback/authentik`)
---
### kavita — eBook Reader
**What it is:** eBook, manga, and comic library.
**Directory:** `~/kitestacks-live/docker/kavita/`
**Book files:** `./library/books/` — add books here, then scan library in Kavita UI
**Config/DB:** `./config/kavita.db` (SQLite)
```bash
cd ~/kitestacks-live/docker/kavita
docker compose up -d
docker logs kavita -f
# If you change OIDC settings, use the Kavita UI at kavita.kitestacks.com/settings
# Do NOT edit kavita.db directly for OIDC config — Kavita overwrites it on restart
# Use SSH port-forward to access kscloud1's Kavita directly if needed:
# ssh -L 5099:localhost:5000 kenpat@kscloud1-tailscale-ip
# Then visit http://localhost:5099
```
**SSO:** Authentik OIDC — Authority URL must end with trailing slash:
`https://auth.kitestacks.com/application/o/kavita/`
---
### grafana — Monitoring Dashboards
**What it is:** Visualizes metrics collected by Prometheus.
**Directory:** `~/kitestacks-live/docker/grafana/`
**Provisioning:** `./provisioning/` — auto-loads datasource (Prometheus) and dashboard (Node Exporter Full)
**Data:** Named Docker volume `grafana-data`
```bash
cd ~/kitestacks-live/docker/grafana
docker compose up -d
docker logs grafana -f
```
**Dashboards auto-loaded:**
- Node Exporter Full (id 1860) — CPU, RAM, disk, network for both monk and kscloud1
- Switch between hosts using the "instance" variable at top of dashboard
**SSO:** Authentik OAuth2. Local admin login also works.
---
### uptime-kuma — Status Page
**What it is:** Uptime monitoring with a public status page.
**Directory:** `~/kitestacks-live/docker/uptime-kuma/`
**Database:** Named Docker volume `uptime-kuma` (SQLite kuma.db)
**Status page slug:** `homelab` → https://status.kitestacks.com/status/homelab
```bash
cd ~/kitestacks-live/docker/uptime-kuma
docker compose up -d
docker logs uptime-kuma -f
# To push kuma.db to kscloud1 after changes (monk → kscloud1):
# See scripts/sync-kuma.sh (or follow the sqlite backup pattern)
```
**Monitors configured:** All 11 public services + kscloud1 ping + Monk ping + Samurai ping.
**Conky widget:** Reads kscloud1's Uptime Kuma directly via Tailscale IP at
`http://100.123.x.x:3001/api/status-page/homelab`. This means the widget shows
kscloud1's health, not monk's — which is what matters for production status.
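A widget like this needs to turn a status payload into one up/down value per monitor. The sketch below assumes the JSON shape returned by Uptime Kuma's status-page heartbeat endpoint (assumed here to be `/api/status-page/heartbeat/homelab`): a `heartbeatList` object keyed by monitor ID, each holding a chronological list of beats with a numeric `status`. Treat the endpoint path, the field names, and the status codes as assumptions to check against your Uptime Kuma version.

```python
# Assumed Uptime Kuma status codes.
STATUS_NAMES = {0: "down", 1: "up", 2: "pending", 3: "maintenance"}


def latest_status(payload: dict) -> dict:
    """Map each monitor ID to the name of its most recent status.

    Assumes payload["heartbeatList"] is {monitor_id: [beat, ...]} with
    the newest beat last, and that each beat has a numeric "status".
    """
    result = {}
    for monitor_id, beats in payload.get("heartbeatList", {}).items():
        if not beats:
            result[monitor_id] = "unknown"
            continue
        result[monitor_id] = STATUS_NAMES.get(beats[-1]["status"], "unknown")
    return result
```

The function takes an already-decoded dict, so it can be fed from `json.load` on an HTTP response or from a saved file while testing the widget offline.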
---
### bookstack — Wiki
**What it is:** Self-hosted documentation wiki with a clean UI.
**Directory:** `~/kitestacks-live/docker/bookstack/`
**Database:** MariaDB container `bookstack-db`
**Config:** `.env` file (APP_URL, DB settings, OIDC config)
```bash
cd ~/kitestacks-live/docker/bookstack
docker compose up -d
docker logs bookstack -f
# BookStack API (used to push docs from Forgejo):
# Token created via: DB injection + bcrypt hash for API key
# Token ID/secret stored in .env
```
**SSO:** Authentik OIDC. Key config:
- `OIDC_ISSUER=https://auth.kitestacks.com/application/o/bookstack/`
- `OIDC_ISSUER_DISCOVER=true`
- Cache dir must be writable: `chown -R abc:users /config/www/framework/cache/`
---
### osticket-app — Help Desk
**What it is:** OSTicket help desk and ticketing system.
**Directory:** `~/kitestacks-live/docker/osticket/`
**Database:** MySQL container `osticket-db`
**URL:** tasks.kitestacks.com (took over from OpenProject)
```bash
cd ~/kitestacks-live/docker/osticket
docker compose up -d
docker logs osticket-app -f
```
**SMTP:** Configured for smtp.gmail.com:587 using kitestacks.helpdesk@gmail.com.
App password stored in `ost_email` table (smtp_auth_creds=1 for all email entries).
**Confirmed working:** Email delivery verified 2026-06-19.
---
### portainer — Docker Management
**What it is:** Web UI for managing Docker containers on both monk and kscloud1.
**Directory:** `~/kitestacks-live/docker/portainer/`
**URL:** portainer.kitestacks.com
```bash
cd ~/kitestacks-live/docker/portainer
docker compose up -d
```
**SSO:** Authentik OAuth2 (AuthenticationMethod=3). User kenpat7177@gmail.com pre-created as admin.
**Security:** Authentik PolicyBinding restricts Portainer app to `homelab-admin` group only.
---
### cloudflared — Tunnel Connector
**What it is:** Creates the outbound tunnel to Cloudflare. This is what makes all
public services reachable without opening ports on the router.
**Directory:** `~/kitestacks-live/docker/cloudflared/`
**Token:** Read from `.env` file as `TUNNEL_TOKEN` (never hardcoded in docker-compose.yml)
```bash
cd ~/kitestacks-live/docker/cloudflared
docker compose up -d
docker logs cloudflared -f
# To rotate the token (runs on both monk and kscloud1):
# ~/kitestacks-homelab/scripts/rollout-cloudflared-token.sh '<new-token>'
```
**Tunnel ID:** 5e60ea8e-a543-49b6-bab5-325f39441e00
**Account:** Cloudflare dashboard → Zero Trust → Networks → Tunnels
---
### prometheus + node-exporter — Metrics
**What it is:** Prometheus collects time-series metrics. node-exporter exposes host stats.
**Directory:** `~/kitestacks-live/docker/prometheus/`
**Config:** `./prometheus.yml` — defines scrape targets
```bash
cd ~/kitestacks-live/docker/prometheus
docker compose up -d
docker logs prometheus -f
# Scrape targets configured:
# - node-exporter:9100 (monk, via Docker DNS)
# - 5.78.x.x:9100 (kscloud1, via public IP — node-exporter exposed on 0.0.0.0)
```
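To see what Prometheus receives from each scrape target, the text that node-exporter serves at `/metrics` can be fetched and parsed by hand. This is a simplified sketch, not a full parser for the exposition format: it keeps only samples without labels and ignores optional timestamps. The function names are hypothetical.

```python
from urllib.request import urlopen


def fetch_metrics(target: str, timeout: float = 5.0) -> str:
    """Fetch the raw metrics text from a host:port scrape target."""
    with urlopen(f"http://{target}/metrics", timeout=timeout) as response:
        return response.read().decode("utf-8")


def parse_metrics(text: str) -> dict:
    """Parse unlabelled samples from Prometheus text exposition format."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip HELP/TYPE comments
            continue
        name, _, value = line.rpartition(" ")
        if not name or "{" in name:  # skip labelled series in this sketch
            continue
        try:
            samples[name] = float(value)
        except ValueError:
            continue
    return samples
```

From a container on the `kitestacks` network, `parse_metrics(fetch_metrics("node-exporter:9100"))["node_load1"]` would give monk's one-minute load average.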
---
## Common Operations
### Restart a single service
```bash
cd ~/kitestacks-live/docker/<service-name>
docker compose restart <container-name>
```
### View live logs
```bash
docker logs <container-name> -f
# -f = follow (live tail). Ctrl+C to stop.
```
### Update a service to latest image
```bash
cd ~/kitestacks-live/docker/<service-name>
docker compose pull
docker compose up -d
```
### Check all container health at once
```bash
docker ps --format "table {{.Names}}\t{{.Status}}"
```
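With around two dozen containers, it can help to list only the ones that need attention. The sketch below filters the output of `docker ps --format '{{.Names}}\t{{.Status}}'` (no `table` keyword, so the fields stay tab-separated and there is no header row). The function names are hypothetical, and matching on status text is an assumption about how Docker words those strings.

```python
import subprocess


def docker_ps_status() -> str:
    """Return tab-separated 'name<TAB>status' lines for running containers."""
    result = subprocess.run(
        ["docker", "ps", "--format", "{{.Names}}\t{{.Status}}"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def containers_needing_attention(ps_output: str) -> list:
    """Names of containers that are unhealthy or stuck restarting."""
    flagged = []
    for line in ps_output.splitlines():
        name, sep, status = line.partition("\t")
        if not sep:
            continue
        if "(unhealthy)" in status or status.startswith("Restarting"):
            flagged.append(name)
    return flagged
```

Running `containers_needing_attention(docker_ps_status())` on either host returns an empty list when everything is healthy.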
### Enter a container's shell
```bash
docker exec -it <container-name> bash
# or sh if bash isn't available:
docker exec -it <container-name> sh
```
### Check disk and memory usage
```bash
docker system df # Docker disk usage
free -h # RAM usage
df -h # Disk usage
```
### Push a kuma.db update to kscloud1
```bash
# 1. Make changes to monk's Uptime Kuma (add monitors, etc.)
# 2. Backup monk's db:
docker run --rm -v uptime-kuma:/src:ro -v /tmp:/out python:3-alpine \
python3 -c "import sqlite3; s=sqlite3.connect('/src/kuma.db'); b=sqlite3.connect('/out/kuma.db.push'); s.backup(b); b.close(); s.close()"
# 3. Transfer and restore on kscloud1:
gzip -c /tmp/kuma.db.push | ssh -i ~/.ssh/id_ed25519_kscloud1 kenpat@100.123.x.x \
"gunzip > /home/kenpat/kuma.db.push"
# Then on kscloud1: stop uptime-kuma, restore via same sqlite.backup() pattern, restart
```
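The one-liner in step 2 is easier to read, and to reuse for the restore on kscloud1, when written out as a function. This sketch uses the same `sqlite3` backup API. The function name is hypothetical, and the paths in the usage note are examples rather than the exact paths on either host.

```python
import sqlite3


def backup_sqlite(source_path: str, dest_path: str) -> None:
    """Copy a SQLite database with the online backup API.

    Unlike copying the file directly, this yields a consistent snapshot
    even if another process writes to the source during the copy.
    """
    source = sqlite3.connect(source_path)
    dest = sqlite3.connect(dest_path)
    try:
        source.backup(dest)
    finally:
        dest.close()
        source.close()
```

For the push direction this would be `backup_sqlite("/src/kuma.db", "/out/kuma.db.push")`. For the restore, swap the arguments so the pushed file is the source and the volume's `kuma.db` is the destination, with the uptime-kuma container stopped.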


@ -0,0 +1,85 @@
# KiteStacks Build Guide — Choose Your Path
This guide teaches you how to build the entire KiteStacks homelab from a blank machine.
There are two tracks. Pick the one that fits where you are right now.
---
## Track A — With AI (Beginner)
**Who this is for:** Someone with zero or very little tech experience.
You do not need to know Linux, Docker, or networking. You just need to be able to
follow instructions and copy commands.
**How it works:** You use an AI assistant (Claude, ChatGPT, or similar) as your guide
throughout the build. The AI explains what each command does in plain language before
you run it. You never copy something without understanding what it does — the AI makes
sure of that.
**Time to complete:** 2–4 weeks of evenings and weekends (2–3 hours per session).
**What you will have at the end:** A fully working homelab identical to KiteStacks.
**[Start the AI-Assisted Build](with-ai/01-what-you-need.md)**
---
## Track B — Without AI (Advanced)
**Who this is for:** Someone who wants to understand everything deeply and build skills
along the way — not just copy commands but know what every line does and why.
**How it works:** You build the homelab from scratch, learning Bash scripting, Python,
Docker internals, Linux administration, and networking as you go. Every command is
explained in full. No shortcuts.
**Time to complete:** 3–6 months of consistent part-time study and building
(evenings and weekends). Full-time: 6–10 weeks.
**What you will learn:** Linux, Bash scripting, Python, Docker, networking (DNS, ports,
TLS, firewalls), OAuth2/OIDC, infrastructure design, and troubleshooting methodology.
**[Start the Advanced Build](without-ai/01-linux-foundations.md)**
---
## What Both Tracks Build
By the end of either track you will have:
- ✅ A public domain (e.g. kitestacks.com) serving real websites
- ✅ Eleven self-hosted services running in Docker
- ✅ Single sign-on — one account for everything
- ✅ A cloud VPS as a permanent backup — site stays up when your home PC is off
- ✅ Private networking between home and cloud via Tailscale VPN
- ✅ Real-time monitoring with Grafana and Uptime Kuma
- ✅ A desktop widget showing live service status
---
## Hardware and Accounts Needed (Both Tracks)
### Hardware
- Any PC or laptop running Linux (or one you can install Linux on), with at least 8 GB RAM and 100 GB disk
- A domain name from Cloudflare Registrar, Namecheap, or similar (~$10–15/year)
- A credit card for the cloud VPS (~€4–5/month on Hetzner, less than a coffee)
### Accounts to Create
- **Cloudflare** — free account at cloudflare.com
- **Hetzner** — cloud VPS provider at hetzner.com (or any VPS: DigitalOcean, Vultr, Linode)
- **Tailscale** — free at tailscale.com (up to 100 devices)
- **OpenRouter** — free AI model access at openrouter.ai (for the AI chat service)
### What You Are Building On
```
Home PC (monk)
└── Ubuntu or similar Linux OS
└── Docker + Docker Compose
└── ~15 containers running
Cloud VPS (kscloud1)
└── Ubuntu Linux
└── Docker + Docker Compose
└── Same 15 containers running (replica)
└── Shared PostgreSQL + Redis
```


@ -0,0 +1,182 @@
# Step 1 — What You Need Before You Start
**Track:** With AI (Beginner)
**Time for this step:** 1–2 hours
Welcome. You are about to build a real, working homelab that serves websites to the
actual internet. It sounds complicated, but with an AI assistant helping you every step
of the way, you can absolutely do this even if you have never used Linux before.
---
## How to Use This Guide
Throughout this build, whenever you see a command like this:
```bash
docker ps
```
That is something you type into a terminal (a black window where you type commands).
Before you type any command, **ask your AI assistant what it does**. Say:
> "What does this command do: `docker ps`"
The AI will explain it in plain language. Never run a command you do not understand.
That is the rule throughout this entire build.
---
## What You Need
### 1. A Computer to Run Everything On
You need a PC or laptop that will be your home server. This will be called **monk**
throughout this guide (that is just a nickname — you can call it whatever you want).
Minimum specs:
- **RAM:** 8 GB (16 GB recommended — you will run about 15 programs at once)
- **Storage:** 100 GB free space
- **Operating system:** Linux (Ubuntu 22.04 or 24.04 recommended)
If your computer currently runs Windows, you have two options:
- Install Ubuntu alongside Windows (dual boot)
- Replace Windows with Ubuntu entirely (easier, recommended)
**Ask your AI:** "How do I install Ubuntu 24.04 on my computer?"
---
### 2. A Domain Name
A domain name is your address on the internet — for example, `kitestacks.com`.
You need to buy one. It costs about $10–15 per year.
**Where to buy:** Cloudflare Registrar (registrar.cloudflare.com) is recommended
because you will use Cloudflare for everything else and it keeps things in one place.
**Tips for picking a domain:**
- Keep it short and memorable
- `.com` is most professional
- Avoid hyphens and numbers
**Ask your AI:** "How do I buy a domain name on Cloudflare Registrar?"
---
### 3. A Cloudflare Account
Cloudflare is the service that sits between the internet and your home computer.
It hides your home IP address, handles all the security, and routes traffic to
your services. Best part: everything you need is on their free plan.
Go to cloudflare.com and create a free account.
If you bought your domain from Cloudflare Registrar, your account is already set up.
If you bought it elsewhere, you will need to add the domain to your Cloudflare account and
switch its nameservers to the ones Cloudflare assigns. Ask your AI how.
---
### 4. A Cloud VPS (Virtual Private Server)
A VPS is a small virtual computer that you rent in a data center. It runs 24 hours a day
even when your home computer is off. This is what keeps your websites online when
you are travelling or when your home internet goes down.
**Recommended provider:** Hetzner (hetzner.com) — excellent value, based in Germany.
**Plan to choose:** CX22 — 2 vCPU, 4 GB RAM, 40 GB disk — approximately €4/month.
Create a Hetzner account, then ask your AI: "How do I create a new CX22 VPS on Hetzner
with Ubuntu 24.04?"
This second computer will be called **kscloud1** throughout this guide.
---
### 5. A Tailscale Account
Tailscale is a free service that creates a private, encrypted connection between your
home computer and your cloud VPS. Think of it as a private tunnel that only your
devices can use.
Go to tailscale.com and create a free account.
---
### 6. An OpenRouter Account (for AI services)
OpenRouter gives you access to dozens of AI models for free (with rate limits) or
for very low cost. Your KiteStacks AI service will use this.
Go to openrouter.ai and create a free account.
---
## Setting Up Your Home Computer (monk)
Once Ubuntu is installed on your home computer, open a terminal. On Ubuntu,
press `Ctrl + Alt + T` to open one.
You will see something like:
```
kenpatmonk@monk:~$
```
That `$` means you are ready to type commands.
**First, update your system. Ask your AI what this does, then run it:**
```bash
sudo apt update && sudo apt upgrade -y
```
**Then install some tools you will need:**
```bash
sudo apt install -y curl git nano wget
```
**Ask your AI:** "What does `sudo apt install` do and why do I need curl, git, nano, and wget?"
---
## Setting Up Your Cloud VPS (kscloud1)
After creating your VPS on Hetzner, you will get an IP address (something like `5.78.233.28`).
You connect to it using a tool called SSH.
**Ask your AI:** "What is SSH and how do I connect to my VPS from Ubuntu?"
The basic command looks like this:
```bash
ssh root@YOUR_VPS_IP
```
Replace `YOUR_VPS_IP` with the actual IP Hetzner gave you.
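Typing the root password on every login gets tedious, and key-based login is generally considered safer. The following is a minimal sketch, assuming OpenSSH is installed on your home computer. The key file name `kscloud1_ed25519` and the comment `monk-to-kscloud1` are made-up examples, not part of the original KiteStacks setup.

```bash
# Hypothetical key path; pick any name you like
KEY="$HOME/.ssh/kscloud1_ed25519"

# Make sure the .ssh folder exists with private permissions
mkdir -p "$HOME/.ssh"
chmod 700 "$HOME/.ssh"

# Create the key pair only if it does not exist yet (empty passphrase for simplicity)
if command -v ssh-keygen >/dev/null 2>&1 && [ ! -f "$KEY" ]; then
  ssh-keygen -t ed25519 -f "$KEY" -N "" -C "monk-to-kscloud1"
fi

# Show the public half, which is the part that gets copied to the VPS
if [ -f "$KEY.pub" ]; then
  cat "$KEY.pub"
fi
```

After that, `ssh-copy-id -i ~/.ssh/kscloud1_ed25519.pub root@YOUR_VPS_IP` installs the public key on the VPS, and `ssh -i ~/.ssh/kscloud1_ed25519 root@YOUR_VPS_IP` should log in without asking for the password.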
Once connected, update the VPS just like you did on your home computer:
```bash
apt update && apt upgrade -y
```
---
## Checkpoint
Before moving to Step 2, make sure you have:
- [ ] Ubuntu installed and running on your home computer
- [ ] A domain name purchased (Step 2 connects it to Cloudflare if you bought it elsewhere)
- [ ] A Cloudflare account (free)
- [ ] A Hetzner VPS created with Ubuntu (noted your VPS IP address)
- [ ] A Tailscale account (free)
- [ ] An OpenRouter account (free)
- [ ] You can open a terminal on your home computer
- [ ] You can SSH into your VPS
If any of these are not done, stop here and ask your AI for help completing them
before moving on. Every future step assumes all of these are in place.
---
**Next:** [Step 2 — DNS and Cloudflare Setup](02-dns-and-cloudflare.md)

@ -0,0 +1,129 @@
# Step 2 — DNS and Cloudflare Setup
**Track:** With AI (Beginner)
**Time for this step:** 1 to 2 hours
In this step you will set up Cloudflare so your domain points to Cloudflare's servers,
and you will create the Cloudflare Tunnel that allows the internet to reach your home
computer without exposing your home IP address.
---
## What Is Happening Here?
When someone types `www.kitestacks.com` into a browser, their computer asks a system
called DNS: "What is the IP address for www.kitestacks.com?"
Normally, that answer would be your home IP address. But we do NOT want that — your
home IP could change, could be targeted by attackers, or could be blocked by your ISP.
Instead, the DNS answer will be Cloudflare's IP address. Traffic goes to Cloudflare,
Cloudflare sends it to your computer through a tunnel, and your home IP address is never revealed to visitors.
**Ask your AI:** "Can you explain in simple terms how Cloudflare Tunnel works?"
---
## Step 2A — Add Your Domain to Cloudflare
If you bought your domain from Cloudflare Registrar, skip to Step 2B.
If you bought it elsewhere (Namecheap, GoDaddy, etc.):
1. Log in to Cloudflare at cloudflare.com
2. Click "Add a site"
3. Enter your domain name
4. Choose the Free plan
5. Cloudflare will give you two nameserver addresses (like `vera.ns.cloudflare.com`)
6. Go to your domain registrar's website and replace the nameservers with Cloudflare's
**Ask your AI:** "How do I change nameservers on [your registrar]?"
It can take up to 24 hours for nameserver changes to propagate worldwide, but usually
it happens within an hour.
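To find out whether the nameserver change has taken effect, you can look up the domain's NS records yourself. This is a rough sketch, assuming the `dig` tool is available (on Ubuntu it typically comes from the `dnsutils` package). The helper name `uses_cloudflare_ns` is invented for this example.

```bash
# Succeeds when any line on stdin is a Cloudflare nameserver
uses_cloudflare_ns() {
  grep -qi 'ns\.cloudflare\.com\.*$'
}

DOMAIN="yourdomain.com"   # replace with your real domain

if command -v dig >/dev/null 2>&1; then
  # +short prints only the answers; +time and +tries keep the lookup from hanging
  if dig +short +time=2 +tries=1 NS "$DOMAIN" | uses_cloudflare_ns; then
    echo "$DOMAIN is using Cloudflare nameservers"
  else
    echo "$DOMAIN is not on Cloudflare nameservers yet; wait and try again"
  fi
fi
```

If the second message keeps appearing after a day, double-check the nameserver entries at your registrar.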
---
## Step 2B — Create Your Cloudflare Tunnel
A Cloudflare Tunnel is the invisible connection between your home computer and Cloudflare.
Your home computer reaches out to Cloudflare (outbound connection). Cloudflare holds that
connection open. When someone visits your website, Cloudflare sends the request back through
that existing connection. Your home router never needs to be configured.
**To create a tunnel:**
1. In your Cloudflare dashboard, go to: **Zero Trust → Networks → Tunnels**
2. Click **"Create a tunnel"**
3. Choose **"Cloudflared"** as the connector type
4. Name your tunnel (e.g., `kitestacks-tunnel`)
5. Cloudflare will show you a token — a long string of characters starting with `eyJ`
6. **Save this token somewhere safe** — you will need it in Step 3
---
## Step 2C — Add Public Hostnames to the Tunnel
A public hostname tells Cloudflare: "When someone visits this URL, send the traffic
to this container on my home computer."
You will set up hostnames for all eleven of your services. For each one:
1. In the tunnel settings, click **"Public Hostnames"**
2. Click **"Add a public hostname"**
Add all of these (you will complete the services in later steps, but adding the
hostnames now means they are ready):
| Subdomain | Domain | Service (internal address) | Resulting public URL |
|-----------|--------|---------|-----|
| www | yourdomain.com | http://homepage:3000 | www.yourdomain.com |
| auth | yourdomain.com | http://authentik:9000 | auth.yourdomain.com |
| gitforge | yourdomain.com | http://forgejo:3000 | gitforge.yourdomain.com |
| ai | yourdomain.com | http://kite-openwebui:8080 | ai.yourdomain.com |
| links | yourdomain.com | http://karakeep:3000 | links.yourdomain.com |
| kavita | yourdomain.com | http://kavita:5000 | kavita.yourdomain.com |
| grafana | yourdomain.com | http://grafana:3000 | grafana.yourdomain.com |
| status | yourdomain.com | http://uptime-kuma:3001 | status.yourdomain.com |
| wiki | yourdomain.com | http://bookstack:80 | wiki.yourdomain.com |
| tasks | yourdomain.com | http://osticket-app:80 | tasks.yourdomain.com |
| portainer | yourdomain.com | https://portainer:9443 | portainer.yourdomain.com |
For the `portainer` entry, enable **"No TLS Verify"** (Portainer uses its own self-signed certificate internally).
Replace `yourdomain.com` with your actual domain throughout.
**Ask your AI:** "What does the 'service' field in a Cloudflare Tunnel hostname mean?
Why do I use `http://homepage:3000` instead of an IP address?"
---
## Step 2D — Create the Docker Network
Everything in this homelab runs in Docker (covered in the next step), and all the
containers need to be able to talk to each other and to the Cloudflare connector.
They do this by being on the same Docker network.
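On your **home computer**, run the command below. It needs Docker, so if `docker` is not found, finish the installation in Step 3 first and then come back to it: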
```bash
docker network create kitestacks
```
You will also do this on your **cloud VPS** in a later step.
**Ask your AI:** "What is a Docker network and why do all containers need to be on the same one?"
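Running `docker network create` a second time fails because the network already exists. A slightly more forgiving variant, sketched here as one possible approach, checks first and only creates the network when it is missing. The variable name `NETWORK` is just a convenience for this example.

```bash
NETWORK="kitestacks"

if ! command -v docker >/dev/null 2>&1; then
  echo "docker is not installed yet; run this after Step 3"
elif docker network inspect "$NETWORK" >/dev/null 2>&1; then
  echo "network $NETWORK already exists"
else
  docker network create "$NETWORK" \
    || echo "could not create network; is the Docker daemon running?"
fi
```

Because it is safe to run repeatedly, the same snippet can be reused on the VPS later.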
---
## Checkpoint
Before moving to Step 3, make sure:
- [ ] Your domain is on Cloudflare (nameservers changed or bought from Cloudflare)
- [ ] You created a Cloudflare Tunnel and saved the tunnel token
- [ ] You added all 11 public hostnames to the tunnel
- [ ] You ran `docker network create kitestacks` on your home computer (do this right after installing Docker in Step 3 if Docker was not installed yet)
---
**Next:** [Step 3 — Installing Docker](03-docker-setup.md)

@ -0,0 +1,196 @@
# Step 3 — Installing Docker
**Track:** With AI (Beginner)
**Time for this step:** 30 to 60 minutes (on both your home computer and your VPS)
Docker is the technology that runs all your services. Think of it like a machine that
can run many small, isolated programs at the same time — each program thinks it is
the only one on the computer, even though they are all sharing the same hardware.
Each program is called a **container**. By the end of Step 5 you will have about 22 containers running.
---
## What Is Docker? (Plain English)
Imagine you want to run fifteen different apps on your computer. If you installed them
all directly, they might conflict — one app needs Python version 3.9, another needs 3.11,
and they fight over which one to use. Docker solves this by giving each app its own
little bubble where it has exactly what it needs, completely separate from everything else.
A **container** is one of those bubbles.
A **Docker image** is the recipe for making a bubble.
**Docker Compose** is a tool that lets you describe multiple containers in one file
and start them all with one command.
**Ask your AI:** "Can you explain Docker containers vs Docker images using a simple analogy?"
---
## Installing Docker on Your Home Computer (monk)
Run these commands one at a time. Before each one, ask your AI what it does.
```bash
# Install required packages
sudo apt install -y ca-certificates curl
# Add Docker's official GPG key (proves the software is authentic)
sudo install -m 0755 -d /etc/apt/keyrings
sudo curl -fsSL https://download.docker.com/linux/ubuntu/gpg -o /etc/apt/keyrings/docker.asc
sudo chmod a+r /etc/apt/keyrings/docker.asc
# Add Docker's package source
echo \
"deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.asc] \
https://download.docker.com/linux/ubuntu \
$(. /etc/os-release && echo "$VERSION_CODENAME") stable" | \
sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
# Update package list and install Docker
sudo apt update
sudo apt install -y docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin
```
Now let Docker start automatically when your computer boots:
```bash
sudo systemctl enable docker
sudo systemctl start docker
```
Add yourself to the Docker group so you do not need `sudo` every time:
```bash
sudo usermod -aG docker $USER
```
**Log out and log back in** (or reboot) for this change to take effect.
Test that Docker is installed:
```bash
docker --version
docker compose version
```
You should see version numbers printed. If you see errors, ask your AI to help.
---
## Installing Docker on Your Cloud VPS (kscloud1)
SSH into your VPS and run the exact same commands as above. The process is identical.
```bash
ssh root@YOUR_VPS_IP
```
Then run all the same installation commands. Because you are logged in as `root` on the VPS, you can skip the `usermod` step.
---
## Your First Container — Cloudflared (Tunnel Connector)
The first container you will run is `cloudflared` — this is what creates the tunnel
between your computer and Cloudflare. Without this, nothing else can be reached from
the internet.
**On your home computer**, create a folder for it:
```bash
mkdir -p ~/kitestacks-live/docker/cloudflared
cd ~/kitestacks-live/docker/cloudflared
```
Create a file called `.env` that holds your tunnel token:
```bash
nano .env
```
Inside the file, type:
```
TUNNEL_TOKEN=paste-your-token-here
```
Replace `paste-your-token-here` with the token you saved from Step 2.
Press `Ctrl+X`, then `Y`, then `Enter` to save.
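The tunnel token works like a password, so it is worth making the `.env` file readable only by your own user. A small sketch, assuming you are still inside the folder that holds `.env`:

```bash
# Run inside ~/kitestacks-live/docker/cloudflared (the folder holding .env)
touch .env        # leaves an existing file's contents untouched
chmod 600 .env    # owner can read and write; nobody else can
ls -l .env        # the permissions column should start with -rw-------
```

The same habit applies to every other `.env` file created in later steps.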
Now create the `docker-compose.yml` file:
```bash
nano docker-compose.yml
```
Paste this content:
```yaml
services:
  cloudflared:
    image: cloudflare/cloudflared:latest
    container_name: cloudflared
    restart: unless-stopped
    command: tunnel --no-autoupdate run
    environment:
      - TUNNEL_TOKEN=${TUNNEL_TOKEN:?set TUNNEL_TOKEN in .env}
    networks:
      - default
      - kitestacks
networks:
  kitestacks:
    external: true
```
Save and close the file. Then start it:
```bash
docker compose up -d
```
Check that it is running:
```bash
docker ps
```
You should see `cloudflared` in the list with a status of `Up`.
Check the logs to confirm it connected:
```bash
docker logs cloudflared
```
You should see something like "Connection established" or "Registered tunnel connection".
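Scrolling through the logs by eye is easy to get wrong, so a quick count can help. The sketch below counts log lines containing "Registered tunnel connection". It assumes that exact phrase appears in your cloudflared version's output, which may vary. The function name `count_tunnel_connections` is invented for this example.

```bash
# Count "Registered tunnel connection" lines in log text read from stdin
count_tunnel_connections() {
  grep -c "Registered tunnel connection" || true
}

if command -v docker >/dev/null 2>&1; then
  # docker logs writes to stderr as well, so merge both streams before counting
  docker logs cloudflared 2>&1 | count_tunnel_connections
fi
```

A count of zero suggests the connector has not reached Cloudflare yet; re-check the token in `.env` and read the full log for error lines.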
**Ask your AI:** "What does `restart: unless-stopped` mean in a Docker Compose file?"
---
## Run Cloudflared on Your VPS Too
SSH into your VPS and do the exact same thing. Use the **same tunnel token** — Cloudflare
will register this as a second connector for the same tunnel. If your home computer goes
offline, the VPS connector keeps answering requests for every service that is also running on the VPS.
```bash
mkdir -p /opt/kitestacks/docker/cloudflared
cd /opt/kitestacks/docker/cloudflared
```
Create the same `.env` and `docker-compose.yml` files, then:
```bash
docker compose up -d
docker logs cloudflared
```
---
## Checkpoint
Before moving to Step 4:
- [ ] Docker is installed on your home computer
- [ ] Docker is installed on your VPS
- [ ] `docker ps` shows `cloudflared` running on both machines
- [ ] `docker logs cloudflared` shows successful connection on both
Go to your Cloudflare Tunnel dashboard. Under your tunnel, you should now see
**2 connectors** listed — one from your home computer and one from your VPS.
If you only see one, wait a few minutes and refresh.
---
**Next:** [Step 4 — Core Services](04-core-services.md)

@ -0,0 +1,298 @@
# Step 4 — Core Services: Portal, Forgejo, and Authentik
**Track:** With AI (Beginner)
**Time for this step:** 3 to 5 hours
These three services form the foundation of KiteStacks:
- **Portal** — the homepage that links to everything
- **Forgejo** — stores all your code and configurations in Git
- **Authentik** — handles all logins for every service (SSO)
Set these up first. Everything else depends on them.
---
## How Docker Compose Files Work
Every service in this homelab has its own folder with a `docker-compose.yml` file.
That file describes the service: what image to use, what environment variables to set,
what folders to use for data, and what network to join.
You will create these files using `nano` (a simple text editor in the terminal).
**Ask your AI:** "Can you explain what each section of a docker-compose.yml file does:
services, image, container_name, restart, environment, volumes, networks?"
---
## Service 1 — The Portal (Homepage)
The portal is your home page at `www.yourdomain.com`. It shows links to all your
services and displays live system stats.
```bash
mkdir -p ~/kitestacks-live/docker/kitestacks-portal/public
cd ~/kitestacks-live/docker/kitestacks-portal
```
Create `docker-compose.yml`:
```yaml
services:
  homepage:
    image: nginx:alpine
    container_name: homepage
    restart: unless-stopped
    volumes:
      - ./public:/usr/share/nginx/html:ro
      - ./nginx.conf:/etc/nginx/conf.d/default.conf:ro
    networks:
      - kitestacks
networks:
  kitestacks:
    external: true
```
Create a basic `nginx.conf`:
```nginx
server {
    listen 3000;
    root /usr/share/nginx/html;
    index index.html;
    location / {
        try_files $uri $uri/ /index.html;
    }
}
```
Create a basic `public/index.html` to test:
```html
<!DOCTYPE html>
<html>
<head><title>KiteStacks</title></head>
<body>
<h1>KiteStacks is live!</h1>
</body>
</html>
```
Start it:
```bash
docker compose up -d
docker ps
```
Visit `www.yourdomain.com` in a browser. You should see your page.
If it works, you have confirmed the tunnel is routing correctly.
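If the page does not load, it helps to find out whether the problem is the container or the tunnel. One way to do that, sketched here under the assumption that the public `curlimages/curl` image is acceptable to you, is to request the page from inside the `kitestacks` network, using the same address the tunnel uses.

```bash
TARGET="http://homepage:3000"   # same value as the tunnel's Service field

if command -v docker >/dev/null 2>&1; then
  # Start a throwaway curl container on the kitestacks network and print the HTTP status
  docker run --rm --network kitestacks curlimages/curl \
    -sS -o /dev/null -w '%{http_code}\n' "$TARGET" \
    || echo "could not reach $TARGET from inside the kitestacks network"
fi
```

A `200` here while the public URL still fails points at the tunnel or hostname settings. An error here points at the `homepage` container or its `nginx.conf`.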
**Ask your AI:** "I want to build a proper homepage for my homelab. It should have a
dark cyberpunk theme with cards for each of my services. Can you help me write the HTML?"
Work with your AI to build the portal you want. The KiteStacks portal source is in
`~/kitestacks-homelab/apps/kitestacks-portal/` as reference.
---
## Service 2 — Forgejo (Git)
Forgejo stores all your code. You will push your homelab configs to it so everything
is version-controlled and you never lose your work.
First, set up the shared PostgreSQL database (Forgejo will use this):
```bash
mkdir -p ~/kitestacks-live/docker/postgres
cd ~/kitestacks-live/docker/postgres
```
Create `.env`:
```
POSTGRES_USER=authentik
POSTGRES_PASSWORD=choose-a-strong-password-here
POSTGRES_DB=authentik
```
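For the `choose-a-strong-password-here` placeholder, a randomly generated value is usually a better choice than something typed by hand. A minimal sketch using `openssl` (already used elsewhere in this guide); the variable name `PG_PASSWORD` is only for illustration. Hexadecimal output avoids characters such as `@`, `/`, or quotes, which sometimes need escaping in `.env` files and connection strings.

```bash
# 24 random bytes printed as 48 hexadecimal characters
PG_PASSWORD="$(openssl rand -hex 24)"
echo "POSTGRES_PASSWORD=$PG_PASSWORD"
```

Copy the printed line into `.env` and keep a copy of the password, because Authentik's `.env` needs the same value later.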
Create `docker-compose.yml`:
```yaml
services:
  authentik-postgres:
    image: postgres:16-alpine
    container_name: authentik-postgres
    restart: unless-stopped
    env_file: .env
    volumes:
      - ./data:/var/lib/postgresql/data
    networks:
      - kitestacks
networks:
  kitestacks:
    external: true
```
```bash
docker compose up -d
```
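The `.env` above only creates the `authentik` user and database, while the Forgejo settings in the next part refer to a database and user both named `forgejo`. Those have to exist before Forgejo can connect. One way to create them is sketched below, assuming the container name `authentik-postgres` and `POSTGRES_USER=authentik` from above. The variable names are made up for this example.

```bash
# Use the same password you will put in Forgejo's .env
FORGEJO_DB_PASSWORD='choose-a-strong-password-here'

SQL_ROLE="CREATE ROLE forgejo WITH LOGIN PASSWORD '${FORGEJO_DB_PASSWORD}';"
SQL_DB="CREATE DATABASE forgejo OWNER forgejo;"

if command -v docker >/dev/null 2>&1; then
  # psql runs inside the container as the authentik superuser created from .env
  docker exec authentik-postgres psql -U authentik -d authentik -c "$SQL_ROLE" \
    && docker exec authentik-postgres psql -U authentik -d authentik -c "$SQL_DB" \
    || echo "could not run psql; check that authentik-postgres is up (docker ps)"
fi
```

PostgreSQL can take a few seconds to initialise on first start, so if the command fails immediately after `docker compose up -d`, wait briefly and retry.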
Now create the Forgejo service:
```bash
mkdir -p ~/kitestacks-live/docker/forgejo
cd ~/kitestacks-live/docker/forgejo
```
Create `.env`:
```
FORGEJO__database__DB_TYPE=postgres
FORGEJO__database__HOST=authentik-postgres:5432
FORGEJO__database__NAME=forgejo
FORGEJO__database__USER=forgejo
FORGEJO__database__PASSWD=choose-a-strong-password-here
FORGEJO__server__DOMAIN=gitforge.yourdomain.com
FORGEJO__server__SSH_DOMAIN=gitforge.yourdomain.com
FORGEJO__server__ROOT_URL=https://gitforge.yourdomain.com
```
Create `docker-compose.yml`:
```yaml
services:
  forgejo:
    image: codeberg.org/forgejo/forgejo:latest
    container_name: forgejo
    restart: unless-stopped
    env_file: .env
    volumes:
      - ./data:/data
    networks:
      - kitestacks
networks:
  kitestacks:
    external: true
```
```bash
docker compose up -d
docker logs forgejo -f
```
Wait for it to finish starting (about 30 seconds), then visit `gitforge.yourdomain.com`.
You will see a Forgejo setup page — follow the on-screen instructions to create your admin account.
**Ask your AI:** "How do I create a repository on Forgejo and push my local files to it?"
---
## Service 3 — Authentik (Single Sign-On)
Authentik is the most complex service to set up, but it is worth it — once done,
you log in once and every other service recognizes you automatically.
First, set up Redis (Authentik needs this for session management):
```bash
mkdir -p ~/kitestacks-live/docker/redis
cd ~/kitestacks-live/docker/redis
```
Create `docker-compose.yml`:
```yaml
services:
  authentik-redis:
    image: redis:alpine
    container_name: authentik-redis
    restart: unless-stopped
    networks:
      - kitestacks
networks:
  kitestacks:
    external: true
```
```bash
docker compose up -d
```
Now create Authentik:
```bash
mkdir -p ~/kitestacks-live/docker/authentik
cd ~/kitestacks-live/docker/authentik
```
Generate a secret key (run this and save the output):
```bash
openssl rand -base64 60 | tr -d '\n'
```
Create `.env` (replace the values):
```
PG_PASS=same-postgres-password-from-above
AUTHENTIK_SECRET_KEY=paste-the-generated-key-here
AUTHENTIK_BOOTSTRAP_EMAIL=your@email.com
AUTHENTIK_BOOTSTRAP_PASSWORD=choose-a-strong-admin-password
AUTHENTIK_POSTGRESQL__HOST=authentik-postgres
AUTHENTIK_POSTGRESQL__USER=authentik
AUTHENTIK_POSTGRESQL__NAME=authentik
AUTHENTIK_POSTGRESQL__PASSWORD=same-postgres-password-from-above
AUTHENTIK_REDIS__HOST=authentik-redis
```
Create `docker-compose.yml`:
```yaml
services:
  authentik:
    image: ghcr.io/goauthentik/server:latest
    container_name: authentik
    restart: unless-stopped
    command: server
    env_file: .env
    networks:
      - kitestacks
  authentik-worker:
    image: ghcr.io/goauthentik/server:latest
    container_name: authentik-worker
    restart: unless-stopped
    command: worker
    env_file: .env
    networks:
      - kitestacks
networks:
  kitestacks:
    external: true
```
```bash
docker compose up -d
```
Authentik takes about 2 minutes to start on first run (it sets up the database).
Watch the logs:
```bash
docker logs authentik -f
```
When you see "Starting authentik server" it is ready.
Visit `auth.yourdomain.com` and log in with the bootstrap email and password you set.
**Ask your AI:** "I have Authentik running. How do I create an OAuth2 provider for Grafana
so it can use SSO? Walk me through the steps in the Authentik admin panel."
Use the same process (with your AI's help) to create OAuth2 providers for each service
as you add them in the next steps.
---
## Checkpoint
Before moving to Step 5:
- [ ] Portal is live at `www.yourdomain.com`
- [ ] Forgejo is live at `gitforge.yourdomain.com` with your admin account created
- [ ] Authentik is live at `auth.yourdomain.com` and you can log in
- [ ] You can see `homepage`, `forgejo`, `authentik`, `authentik-worker`, `authentik-postgres`, and `authentik-redis` in `docker ps`
---
**Next:** [Step 5 — All Remaining Services](05-all-services.md)

@ -0,0 +1,266 @@
# Step 5 — All Remaining Services
**Track:** With AI (Beginner)
**Time for this step:** 4 to 8 hours (take breaks and deploy one service at a time)
In this step you will deploy the remaining eight services. For each one:
1. Create the folder
2. Create the `docker-compose.yml` file
3. Run `docker compose up -d`
4. Verify it is working
5. Move on to the next one
For each service, ask your AI to explain the docker-compose file before you run it.
---
## How to Use Your AI for Each Service
For every service in this step, you can say to your AI:
> "I am setting up [service name] in my KiteStacks homelab. It is a self-hosted [description].
> Can you give me a docker-compose.yml for it that joins a network called 'kitestacks'?
> I want to understand each part before I run it."
Then ask follow-up questions about anything you do not understand.
---
## Service 4 — Open WebUI + LiteLLM (AI Chat)
Open WebUI is your ChatGPT-style interface. LiteLLM sits behind it and routes your
AI requests to OpenRouter (where you have free model access).
```bash
mkdir -p ~/kitestacks-live/docker/kite-openwebui
mkdir -p ~/kitestacks-live/docker/kite-litellm
```
**Ask your AI:**
> "I want to set up Open WebUI (ghcr.io/open-webui/open-webui) with LiteLLM as the
> backend. LiteLLM should route to OpenRouter. Can you give me docker-compose files
> for both? Container names: kite-openwebui and kite-litellm. Network: kitestacks."
Work with your AI to get the right environment variables (you will need your OpenRouter
API key from openrouter.ai).
Start both:
```bash
cd ~/kitestacks-live/docker/kite-litellm && docker compose up -d
cd ~/kitestacks-live/docker/kite-openwebui && docker compose up -d
```
Visit `ai.yourdomain.com` and create your admin account.
---
## Service 5 — Karakeep (Bookmarks)
Karakeep saves bookmarks, articles, and links. It uses a headless Chrome browser
to capture the full content of pages you save.
```bash
mkdir -p ~/kitestacks-live/docker/karakeep
```
**Ask your AI:**
> "I want to set up Karakeep (ghcr.io/karakeep/karakeep) for bookmark management.
> It needs a headless Chrome container (browserless/chrome) for page capture and
> a Meilisearch container for search. Container names: karakeep, karakeep-chrome,
> karakeep-meilisearch. All on the 'kitestacks' network. Give me one docker-compose.yml
> for all three."
```bash
cd ~/kitestacks-live/docker/karakeep && docker compose up -d
```
Visit `links.yourdomain.com`.
**Important:** When you set up SSO for Karakeep in Step 6, note that Karakeep uses
NextAuth.js with the provider ID `custom` — so the OAuth2 redirect URL will be
`https://links.yourdomain.com/api/auth/callback/custom` (not `/callback/authentik`).
This is a common mistake. Make a note of it now.
---
## Service 6 — Kavita (eBook Reader)
Kavita lets you read ebooks, manga, and comics from a library you maintain.
```bash
mkdir -p ~/kitestacks-live/docker/kavita/library/books
mkdir -p ~/kitestacks-live/docker/kavita/config
```
**Ask your AI:**
> "I want to set up Kavita (jvmilazz0/kavita) as an ebook reader. Container name: kavita.
> The library should be mounted from ./library/books into the container. Config directory
> at ./config. Network: kitestacks. Give me the docker-compose.yml."
```bash
cd ~/kitestacks-live/docker/kavita && docker compose up -d
```
Visit `kavita.yourdomain.com` and create your admin account. Add your books by placing
ebook files in `~/kitestacks-live/docker/kavita/library/books/` and scanning the library
in Kavita's settings.
**Important for SSO:** Kavita's OIDC settings must be configured through the Kavita web UI,
not by editing files directly. The Authority URL must end with a trailing slash:
`https://auth.yourdomain.com/application/o/kavita/`
---
## Service 7 — Grafana (Monitoring Dashboards)
Grafana shows you beautiful graphs of your server's CPU, RAM, network, and disk usage.
```bash
mkdir -p ~/kitestacks-live/docker/grafana/provisioning/datasources
mkdir -p ~/kitestacks-live/docker/grafana/provisioning/dashboards
```
**Ask your AI:**
> "I want to set up Grafana (grafana/grafana) with Prometheus as the data source.
> I want the 'Node Exporter Full' dashboard (id 1860) to auto-load via provisioning.
> Container name: grafana. Network: kitestacks. Give me the docker-compose.yml and
> the provisioning YAML files for the datasource and dashboard."
```bash
cd ~/kitestacks-live/docker/grafana && docker compose up -d
```
Visit `grafana.yourdomain.com`.
**Also set up Prometheus and node-exporter (Grafana needs these for data):**
**Ask your AI:**
> "I want to set up Prometheus to scrape metrics from node-exporter running on the same
> host. Container names: prometheus and node-exporter. Network: kitestacks. Give me the
> docker-compose.yml and prometheus.yml config file."
---
## Service 8 — Uptime Kuma (Status Page)
Uptime Kuma monitors all your services and shows a public status page.
```bash
mkdir -p ~/kitestacks-live/docker/uptime-kuma
```
**Ask your AI:**
> "Set up Uptime Kuma (louislam/uptime-kuma). Container name: uptime-kuma. Network: kitestacks.
> Use a named volume called 'uptime-kuma' for data. Give me the docker-compose.yml."
```bash
cd ~/kitestacks-live/docker/uptime-kuma && docker compose up -d
```
Visit `status.yourdomain.com`, create your admin account, then add HTTP monitors for
each of your eleven services. Set each monitor to check every 60 seconds.
**Add a status page:**
- In Uptime Kuma → Status Pages → New Status Page
- Slug: `homelab`
- Add all your monitors to it
- Your public status page will be at `status.yourdomain.com/status/homelab`
---
## Service 9 — BookStack (Wiki)
BookStack is a clean wiki for writing and organizing documentation.
```bash
mkdir -p ~/kitestacks-live/docker/bookstack
```
**Ask your AI:**
> "Set up BookStack (lscr.io/linuxserver/bookstack) with its own MariaDB database.
> Container names: bookstack and bookstack-db. APP_URL should be https://wiki.yourdomain.com.
> Network: kitestacks. Give me the docker-compose.yml."
```bash
cd ~/kitestacks-live/docker/bookstack && docker compose up -d
```
BookStack takes about a minute to start on first run. Visit `wiki.yourdomain.com`.
Default login: `admin@admin.com` / `password` — change this immediately.
---
## Service 10 — OSTicket (Help Desk)
OSTicket is a help desk and ticketing system.
```bash
mkdir -p ~/kitestacks-live/docker/osticket
```
**Ask your AI:**
> "Set up OSTicket using the docker image campbellsoftwaresolutions/osticket with its
> own MySQL database. Container names: osticket-app and osticket-db. Network: kitestacks.
> What environment variables do I need? Give me the docker-compose.yml."
```bash
cd ~/kitestacks-live/docker/osticket && docker compose up -d
```
Visit `tasks.yourdomain.com` to complete the web-based setup.
---
## Service 11 — Portainer (Docker Management)
Portainer gives you a visual dashboard to manage all your containers.
```bash
mkdir -p ~/kitestacks-live/docker/portainer
```
**Ask your AI:**
> "Set up Portainer CE (portainer/portainer-ce). Container name: portainer. Port 9443 (HTTPS).
> Mount the Docker socket (/var/run/docker.sock) so it can manage containers.
> Network: kitestacks. Give me the docker-compose.yml."
```bash
cd ~/kitestacks-live/docker/portainer && docker compose up -d
```
Visit `portainer.yourdomain.com`. Create your admin account.
---
## Checkpoint
Run this to see all your containers:
```bash
docker ps --format "table {{.Names}}\t{{.Status}}"
```
You should see all of these running:
- cloudflared
- homepage
- forgejo
- authentik + authentik-worker
- kite-openwebui + kite-litellm
- karakeep + karakeep-chrome + karakeep-meilisearch
- kavita
- grafana + prometheus + node-exporter
- uptime-kuma
- bookstack + bookstack-db
- osticket-app + osticket-db
- portainer
- authentik-postgres + authentik-redis
If any are missing or show as unhealthy, check their logs:
```bash
docker logs <container-name>
```
Ask your AI to help diagnose any errors.
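Comparing a long `docker ps` listing against the checklist by eye is error-prone. The sketch below prints only the expected containers that are not running. The container names are taken from the list above; the helper name `missing_containers` is invented for this example.

```bash
# Every container name from the checklist above
EXPECTED="cloudflared homepage forgejo authentik authentik-worker
authentik-postgres authentik-redis kite-openwebui kite-litellm
karakeep karakeep-chrome karakeep-meilisearch kavita
grafana prometheus node-exporter uptime-kuma
bookstack bookstack-db osticket-app osticket-db portainer"

# Read running container names on stdin; print each expected name that is absent
missing_containers() {
  running="$(cat)"
  for name in $EXPECTED; do
    printf '%s\n' "$running" | grep -qx -- "$name" || echo "$name"
  done
}

if command -v docker >/dev/null 2>&1; then
  docker ps --format '{{.Names}}' | missing_containers
fi
```

No output means every expected container is up. Any name that is printed is a good candidate for `docker logs <container-name>`.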
---
**Next:** [Step 6 — Single Sign-On (SSO)](06-sso.md)

@ -0,0 +1,242 @@
# Step 6 — Single Sign-On (SSO)
**Track:** With AI (Beginner)
**Time for this step:** 3 to 5 hours
SSO (Single Sign-On) means one login for everything. After this step, you will log in
with your Authentik account once and every service will recognize you automatically.
No more logging in to each service separately.
---
## How SSO Works (Plain English)
Without SSO:
```
You → Grafana login page → type username + password → logged in to Grafana
You → Forgejo login page → type username + password → logged in to Forgejo
(repeat for every service)
```
With SSO:
```
You → Grafana "Sign in with Authentik" button
→ Authentik asks for login (once, or already remembered)
→ Authentik tells Grafana "this is kenpat, let them in"
→ Logged in to Grafana
You → Forgejo "Sign in with Authentik"
→ Already logged into Authentik → instantly logged in to Forgejo
```
The technology behind this is called **OAuth2** and **OIDC**. For now, you do not
need to know the details — just follow the steps. (The concepts file explains it
deeply if you are curious: [concepts/oauth2-oidc.md](../../concepts/oauth2-oidc.md))
---
## The Process for Each Service
For every service, you do the same three things:
**In Authentik:**
1. Create an OAuth2 Provider for the service
2. Create an Application that links to that Provider
3. (Optional) Add a Policy to restrict who can access it
**In the service:**
4. Enter the Authentik credentials (client ID, client secret, URLs)
Your AI will guide you through each one. Use this prompt template:
> "I want to configure SSO for [service name] using Authentik as the OIDC provider.
> The service is at https://[service].yourdomain.com. Walk me through:
> 1. Creating an OAuth2 provider in Authentik's admin panel
> 2. What redirect URI to use
> 3. How to configure the service to use Authentik for login"
---
## SSO for Grafana
**In Authentik admin panel (auth.yourdomain.com/if/admin/):**
1. Go to **Applications → Providers → Create**
2. Choose **OAuth2/OpenID Provider**
3. Name: `Grafana`
4. Client type: `Confidential`
5. Redirect URIs: `https://grafana.yourdomain.com/login/generic_oauth`
6. Scopes: openid, email, profile
7. Save — note the **Client ID** and **Client Secret**
8. Go to **Applications → Applications → Create**
9. Name: `Grafana`, Slug: `grafana`
10. Provider: select the Grafana provider you just created
11. Save
**In Grafana's `.env` or `docker-compose.yml` environment:**
```
GF_SERVER_ROOT_URL=https://grafana.yourdomain.com
GF_AUTH_GENERIC_OAUTH_ENABLED=true
GF_AUTH_GENERIC_OAUTH_NAME=Authentik
GF_AUTH_GENERIC_OAUTH_CLIENT_ID=paste-client-id-here
GF_AUTH_GENERIC_OAUTH_CLIENT_SECRET=paste-client-secret-here
GF_AUTH_GENERIC_OAUTH_SCOPES=openid email profile
GF_AUTH_GENERIC_OAUTH_AUTH_URL=https://auth.yourdomain.com/application/o/authorize/
GF_AUTH_GENERIC_OAUTH_TOKEN_URL=https://auth.yourdomain.com/application/o/token/
GF_AUTH_GENERIC_OAUTH_API_URL=https://auth.yourdomain.com/application/o/userinfo/
GF_AUTH_GENERIC_OAUTH_ROLE_ATTRIBUTE_PATH=contains(groups, 'homelab-admin') && 'Admin' || 'Viewer'
```
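Apply the change by recreating the container with `docker compose up -d`. A plain `docker compose restart` reuses the old environment and will not pick up the new variables.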
Visit `grafana.yourdomain.com` — you should see a "Sign in with Authentik" button.
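The Grafana values above follow a URL pattern that comes back in every SSO section of this step: three shared endpoints, plus an issuer and discovery URL that contain the application's slug. The helper below only prints those URLs so they can be copied without typos. It is an illustrative sketch, and the function name `authentik_oidc_urls` is made up.

```bash
# Print the Authentik OIDC URLs for one application
# $1 = your domain (without "auth."), $2 = the application slug in Authentik
authentik_oidc_urls() {
  domain="$1"
  slug="$2"
  base="https://auth.${domain}/application/o"
  echo "issuer=${base}/${slug}/"
  echo "discovery=${base}/${slug}/.well-known/openid-configuration"
  echo "authorize=${base}/authorize/"
  echo "token=${base}/token/"
  echo "userinfo=${base}/userinfo/"
}

authentik_oidc_urls yourdomain.com grafana
```

Swap `grafana` for `forgejo`, `karakeep`, `kavita`, and so on when working through the sections below. Opening the discovery URL in a browser is a quick way to confirm the slug is right.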
---
## SSO for Forgejo
**In Authentik:** Create an OAuth2 Provider with:
- Redirect URI: `https://gitforge.yourdomain.com/user/oauth2/authentik/callback`
**In Forgejo:**
- Site Administration → Authentication Sources → Add Authentication Source
- Type: OAuth2
- Name: `authentik`
- OAuth2 Provider: OpenID Connect
- Client ID and Secret from Authentik
- OpenID Connect Discovery URL: `https://auth.yourdomain.com/application/o/forgejo/.well-known/openid-configuration`
**Ask your AI:** "Walk me through adding an OAuth2 authentication source in Forgejo's admin panel."
---
## SSO for Karakeep
**Important:** Karakeep uses NextAuth.js internally. The redirect URI is NOT the usual
`/callback/authentik` — it is `/api/auth/callback/custom`.
**In Authentik:** Create OAuth2 Provider with:
- Redirect URI: `https://links.yourdomain.com/api/auth/callback/custom`
**In Karakeep's environment:**
```
NEXTAUTH_URL=https://links.yourdomain.com
NEXTAUTH_SECRET=generate-a-random-secret
OAUTH_WELLKNOWN_URL=https://auth.yourdomain.com/application/o/karakeep/.well-known/openid-configuration
OAUTH_CLIENT_ID=paste-client-id
OAUTH_CLIENT_SECRET=paste-client-secret
OAUTH_PROVIDER_NAME=Authentik
OAUTH_ALLOW_DANGEROUS_EMAIL_ACCOUNT_LINKING=true
```
---
## SSO for Kavita
**In Authentik:** Create OAuth2 Provider with:
- Redirect URI: `https://kavita.yourdomain.com/api/auth/callback`
**In Kavita:** Go to Settings → OIDC (must be done through the UI, not by editing files):
- Authority: `https://auth.yourdomain.com/application/o/kavita/` ← trailing slash required
- Client ID and Client Secret from Authentik
- Enabled: on
**Critical:** The trailing slash in the Authority URL is required. Without it, Kavita
gives an "issuer does not match" error.
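Since a missing slash is so easy to overlook, a tiny check before pasting the value can save a debugging session. The sketch below appends the slash only when it is absent; the variable name `AUTHORITY` is just for this example.

```bash
AUTHORITY="https://auth.yourdomain.com/application/o/kavita"

# Append a trailing slash only if the value does not already end with one
case "$AUTHORITY" in
  */) ;;                              # already ends with a slash
  *)  AUTHORITY="${AUTHORITY}/" ;;    # add the missing slash
esac

echo "$AUTHORITY"
```

Paste the printed value into Kavita's Authority field.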
---
## SSO for Open WebUI
**In Authentik:** Create OAuth2 Provider with:
- Redirect URI: `https://ai.yourdomain.com/oauth/oidc/callback`
**In Open WebUI's environment:**
```
ENABLE_OAUTH_SIGNUP=true
OAUTH_PROVIDER_NAME=Authentik
OPENID_PROVIDER_URL=https://auth.yourdomain.com/application/o/openwebui/.well-known/openid-configuration
OAUTH_CLIENT_ID=paste-client-id
OAUTH_CLIENT_SECRET=paste-client-secret
```
---
## SSO for BookStack
**In Authentik:** Create OAuth2 Provider with:
- Redirect URI: `https://wiki.yourdomain.com/oidc/callback`
- Issuer mode: **Per Provider** (important — set this in Authentik's provider settings)
**In BookStack's `.env`:**
```
AUTH_METHOD=oidc
AUTH_AUTO_INITIATE=false
OIDC_NAME=Authentik
OIDC_DISPLAY_NAME_CLAIMS=name
OIDC_CLIENT_ID=paste-client-id
OIDC_CLIENT_SECRET=paste-client-secret
OIDC_ISSUER=https://auth.yourdomain.com/application/o/bookstack/
OIDC_ISSUER_DISCOVER=true
```
After setting this up, the BookStack cache directory needs to be writable:
```bash
docker exec bookstack chown -R abc:users /config/www/framework/cache/
docker compose restart bookstack
```
---
## SSO for Portainer
**In Authentik:** Create OAuth2 Provider with:
- Redirect URI: `https://portainer.yourdomain.com`
**In Portainer:** Settings → Authentication → OAuth:
- Provider: Custom
- Client ID and Secret from Authentik
- Authorization URL: `https://auth.yourdomain.com/application/o/authorize/`
- Token URL: `https://auth.yourdomain.com/application/o/token/`
- Userinfo URL: `https://auth.yourdomain.com/application/o/userinfo/`
- Redirect URL: `https://portainer.yourdomain.com`
- Scopes: `openid email profile`
**Security note:** In Authentik, add a Policy Binding to the Portainer application
to restrict access to your admin group only. This prevents anyone with an Authentik
account from accessing the Docker management panel.
---
## Restricting Access by Group (Security)
For sensitive services like Portainer, you want only administrators to access them:
1. In Authentik, go to **Directory → Groups → Create**
2. Name: `homelab-admin`
3. Add yourself to this group
4. Go to **Applications → Applications → [Portainer] → Policy Bindings**
5. Add a binding: Group → `homelab-admin` → Allow
Now only members of `homelab-admin` can use the Portainer application through SSO.
---
## Checkpoint
Test SSO for each service:
- [ ] Grafana — "Sign in with Authentik" works
- [ ] Forgejo — OAuth2 login works
- [ ] Karakeep — SSO login works
- [ ] Kavita — "Sign in with Authentik" works
- [ ] Open WebUI — SSO login works
- [ ] BookStack — OIDC login works
- [ ] Portainer — OAuth login works
If any fail, check the error message and ask your AI: "I'm getting this error when
signing in to [service] with Authentik: [paste the error]. What does it mean and
how do I fix it?"
---
**Next:** [Step 7 — Cloud Failover (kscloud1)](07-cloud-failover.md)


@ -0,0 +1,202 @@
# Step 7 — Cloud Failover (kscloud1)
**Track:** With AI (Beginner)
**Time for this step:** 4-6 hours
Right now, if your home computer goes off, your entire website goes offline. This step
fixes that. You will turn your cloud VPS (kscloud1) into a full mirror of your homelab,
so that when your home computer is off, kscloud1 keeps everything running.
---
## What You Are Building
```
Home (monk) ←—— always developing ——→ pushes to ——→ Cloud (kscloud1)
                                                    always live
                                                    never goes down
Cloudflare routes traffic to whichever host responds.
If monk is off, kscloud1 handles everything by itself.
```
---
## Step 7A — Set Up Tailscale on Both Machines
Tailscale creates a private, encrypted connection between your home computer and your VPS.
You need this so both machines can share a database securely.
**On your home computer:**
```bash
curl -fsSL https://tailscale.com/install.sh | sh
sudo tailscale up
```
Follow the link it gives you to authenticate in your browser.
**On your VPS (via SSH):**
```bash
curl -fsSL https://tailscale.com/install.sh | sh
sudo tailscale up
```
Authenticate again.
After both are connected, check their Tailscale IPs:
```bash
tailscale ip -4
```
Write down both IPs — they look like `100.x.x.x`. You will use these in the next steps.
**Ask your AI:** "I have Tailscale installed on two machines. How do I verify they can
reach each other using their Tailscale IPs?"
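Tailscale hands out IPv4 addresses from the 100.64.0.0/10 range (100.64.0.0 through 100.127.255.255). Before pasting an address into a config file, a quick sanity check like the sketch below can catch a LAN or public IP copied by mistake. The function name is an assumption made for illustration:

```bash
# is_tailscale_ip ADDR: succeed if ADDR is an IPv4 address inside 100.64.0.0/10.
is_tailscale_ip() {
    local ip="$1" a b c d
    [[ "$ip" =~ ^([0-9]{1,3})\.([0-9]{1,3})\.([0-9]{1,3})\.([0-9]{1,3})$ ]] || return 1
    a="${BASH_REMATCH[1]}" b="${BASH_REMATCH[2]}"
    c="${BASH_REMATCH[3]}" d="${BASH_REMATCH[4]}"
    # 10# forces base 10 so octets like "08" are not read as octal.
    (( 10#$c <= 255 && 10#$d <= 255 )) || return 1
    (( 10#$a == 100 && 10#$b >= 64 && 10#$b <= 127 ))
}

for ip in 100.101.102.103 192.168.1.100; do
    if is_tailscale_ip "$ip"; then
        echo "$ip looks like a Tailscale address"
    else
        echo "$ip is not in the Tailscale range"
    fi
done
```

This only validates the address format. To test reachability, run `tailscale ping <ip>` or a plain `ping -c 3 <ip>` from the other machine.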
---
## Step 7B — Move the Shared Databases to kscloud1
For SSO to work properly across both machines, both Authentik instances must share
one database. If they have separate databases, logins will fail roughly half the time.
This means:
- Move (or start fresh) Postgres and Redis on kscloud1
- Configure both monk and kscloud1's Authentik to point to kscloud1's database over Tailscale
**On kscloud1**, create the database containers. Use the same passwords you used on monk:
```bash
mkdir -p /opt/kitestacks/docker/authentik
cd /opt/kitestacks/docker/authentik
```
Create `docker-compose.yml` with Postgres and Redis bound to the Tailscale IP:
```yaml
services:
  authentik-postgres:
    image: postgres:16-alpine
    container_name: authentik-postgres
    restart: unless-stopped
    environment:
      POSTGRES_PASSWORD: your-db-password
      POSTGRES_USER: authentik
      POSTGRES_DB: authentik
    ports:
      - "100.123.x.x:5432:5432" # bind to Tailscale IP only
    volumes:
      - ./postgres:/var/lib/postgresql/data
    networks:
      - kitestacks
  authentik-redis:
    image: redis:alpine
    container_name: authentik-redis
    restart: unless-stopped
    ports:
      - "100.123.x.x:6379:6379" # bind to Tailscale IP only
    networks:
      - kitestacks
networks:
  kitestacks:
    external: true
Replace `100.123.x.x` with kscloud1's actual Tailscale IP.
```bash
docker compose up -d
```
**On monk**, update Authentik's environment to point to kscloud1's database:
```
AUTHENTIK_POSTGRESQL__HOST=100.123.x.x # kscloud1's Tailscale IP
AUTHENTIK_REDIS__HOST=100.123.x.x
```
Restart Authentik on monk:
```bash
cd ~/kitestacks-live/docker/authentik
docker compose down
docker compose up -d
```
**Ask your AI:** "I need to migrate my Authentik database from one host to another.
How do I dump the data from my current Postgres and restore it on the new host?"
---
## Step 7C — Deploy All Services on kscloud1
Now deploy the same services on kscloud1. SSH into your VPS and create the same
folder structure and docker-compose files that you have on monk.
```bash
mkdir -p /opt/kitestacks/docker
```
For each service (forgejo, homepage, karakeep, kavita, grafana, etc.):
1. Create the folder: `mkdir -p /opt/kitestacks/docker/<service>`
2. Copy your docker-compose.yml from monk (with any path changes for `/opt/kitestacks/`)
3. Copy your .env files
4. Run `docker compose up -d`
The fastest way is to have your AI help you:
> "I have all my services running on my home computer at ~/kitestacks-live/docker/.
> I want to replicate them on my VPS at /opt/kitestacks/docker/. Can you help me
> go through each service and identify what needs to change for the VPS environment?"
**Important differences on kscloud1:**
- Authentik already points to the shared Postgres/Redis (same as monk now)
- Forgejo should also use the shared Postgres (add a `forgejo` database to it)
- Paths use `/opt/kitestacks/` instead of `~/kitestacks-live/`
---
## Step 7D — Verify Failover Works
With both machines running and both cloudflared connectors active, test that failover works:
1. In your Cloudflare Tunnel dashboard, you should see **2 connectors**
2. Visit your website from your phone (not connected to home WiFi)
3. Everything should work
4. Now stop monk's cloudflared: `cd ~/kitestacks-live/docker/cloudflared && docker compose stop`
5. Visit your website again from your phone
6. Everything should still work (kscloud1 is serving it)
7. Restart monk's cloudflared: `docker compose start cloudflared`
If step 6 works, your cloud failover is complete.
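The manual phone test can be backed up with a scripted check run from any machine outside your home network. A sketch under stated assumptions: the function names are invented here, and a 2xx or 3xx status is treated as "up" because SSO-protected services usually answer with a redirect to the login page:

```bash
# check_url URL: print "UP" if URL answers with a 2xx or 3xx status, else "DOWN".
check_url() {
    local url="$1" code
    # -o /dev/null discards the body; -w prints only the status code.
    code="$(curl -s -o /dev/null --max-time 10 -w '%{http_code}' "$url" || true)"
    if [[ "$code" =~ ^[23][0-9][0-9]$ ]]; then
        echo "UP"
    else
        echo "DOWN"
    fi
}

# check_all URL...: report each URL on its own line.
check_all() {
    local url
    for url in "$@"; do
        printf '%-45s %s\n' "$url" "$(check_url "$url")"
    done
}

# Example (hostnames are the placeholders used in this guide):
# check_all https://www.yourdomain.com https://auth.yourdomain.com
```

Run it once with both connectors active and once with monk's cloudflared stopped; the two reports should match.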
---
## Step 7E — Set Up Uptime Kuma on kscloud1
Your Conky desktop widget reads Uptime Kuma from kscloud1 (not monk). Set it up there:
Deploy uptime-kuma on kscloud1 the same way you did on monk. Then push your monitors
from monk to kscloud1 by copying the database.
**Ask your AI:** "How do I copy a SQLite database from one Docker container to another
on a different machine, safely and without data corruption?"
The trick is Python's `sqlite3.Connection.backup()` method, which creates a consistent
copy even while the database is in use.
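A minimal sketch of that backup step, wrapped in a shell function so it fits the rest of this guide. The function name and the file paths are assumptions; where the Uptime Kuma database file sits on your host depends on how its volume is mounted:

```bash
# backup_sqlite SRC DST: snapshot a (possibly live) SQLite database into DST
# using Python's sqlite3.Connection.backup().
backup_sqlite() {
    local src="$1" dst="$2"
    # "python3 -" reads the program from stdin; the two paths arrive in sys.argv.
    python3 - "$src" "$dst" <<'PY'
import sqlite3
import sys

src_path, dst_path = sys.argv[1], sys.argv[2]
src = sqlite3.connect(src_path)
dst = sqlite3.connect(dst_path)
try:
    src.backup(dst)  # online backup: consistent even if src is being written
finally:
    dst.close()
    src.close()
PY
}

# Example (paths are placeholders):
# backup_sqlite /path/to/uptime-kuma/data/kuma.db /tmp/kuma-snapshot.db
```

Copy the snapshot file to kscloud1 (for example with `scp`), stop the Uptime Kuma container there, replace its database file, and start it again.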
---
## Checkpoint
- [ ] Tailscale is installed on both machines and they can reach each other
- [ ] Shared Postgres and Redis are running on kscloud1's Tailscale IP
- [ ] Both Authentik instances (monk and kscloud1) point to the shared database
- [ ] All 11 services are running on kscloud1
- [ ] Cloudflare Tunnel shows 2 connectors
- [ ] Website works when monk's cloudflared is stopped
---
**Next:** [Step 8 — Monitoring](08-monitoring.md)


@ -0,0 +1,229 @@
# Step 8 — Monitoring
**Track:** With AI (Beginner)
**Time for this step:** 2-3 hours
Monitoring means knowing when something is wrong before your users tell you.
In this step you will set up three layers of monitoring:
1. **Grafana** — beautiful dashboards showing CPU, RAM, disk, and network over time
2. **Uptime Kuma** — checks every 60 seconds that each service responds correctly
3. **Conky** — a desktop widget on your home computer showing live kscloud1 status
---
## Monitoring Layer 1 — Grafana + Prometheus
You already deployed Grafana and Prometheus in Step 5. Now configure them properly.
### Edit the Prometheus Config
Prometheus needs to know where to collect metrics from. Tell it about both machines:
```bash
nano ~/kitestacks-live/docker/prometheus/prometheus.yml
```
Add this content:
```yaml
global:
  scrape_interval: 15s
scrape_configs:
  - job_name: 'monk-node'
    static_configs:
      - targets: ['node-exporter:9100']
        labels:
          instance: 'monk'
  - job_name: 'kscloud1-node'
    static_configs:
      - targets: ['YOUR_VPS_IP:9100']
        labels:
          instance: 'kscloud1'
```
Replace `YOUR_VPS_IP` with your VPS's public IP address.
**On kscloud1**, make sure node-exporter is configured to be reachable publicly:
```yaml
# In node-exporter's docker-compose.yml on kscloud1
ports:
  - "0.0.0.0:9100:9100"
```
Restart Prometheus:
```bash
cd ~/kitestacks-live/docker/prometheus
docker compose restart prometheus
```
### Configure Grafana Provisioning
Tell Grafana to automatically load Prometheus as a data source and load the
Node Exporter Full dashboard:
Create `~/kitestacks-live/docker/grafana/provisioning/datasources/prometheus.yml`:
```yaml
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    uid: "000000001"   # quoted so YAML keeps it as a string, not a number
    url: http://prometheus:9090
    isDefault: true
```
Create `~/kitestacks-live/docker/grafana/provisioning/dashboards/dashboards.yml`:
```yaml
apiVersion: 1
providers:
  - name: default
    folder: KiteStacks
    type: file
    options:
      path: /etc/grafana/provisioning/dashboards
```
The Node Exporter Full dashboard (id 1860) can be imported from Grafana's dashboard library:
1. Log in to grafana.yourdomain.com
2. Left menu → Dashboards → Import
3. Enter ID: `1860`
4. Select your Prometheus datasource
5. Import
You should now see CPU, RAM, disk, and network graphs for both monk and kscloud1.
Switch between them using the "instance" dropdown at the top of the dashboard.
---
## Monitoring Layer 2 — Uptime Kuma
You set up Uptime Kuma in Step 5. Now add monitors for all your services.
Log in to `status.yourdomain.com` and add an HTTP monitor for each service:
| Monitor Name | URL | Check Interval |
|-------------|-----|----------------|
| Main Website | https://www.yourdomain.com | 60s |
| Authentik | https://auth.yourdomain.com | 60s |
| Forgejo | https://gitforge.yourdomain.com | 60s |
| KiteAI | https://ai.yourdomain.com | 60s |
| Karakeep | https://links.yourdomain.com | 60s |
| Kavita | https://kavita.yourdomain.com | 60s |
| Grafana | https://grafana.yourdomain.com | 60s |
| BookStack | https://wiki.yourdomain.com | 60s |
| OSTicket | https://tasks.yourdomain.com | 60s |
| Portainer | https://portainer.yourdomain.com | 60s |
| kscloud1 | (ping to kscloud1 IP) | 60s |
| Monk | (ping to monk's Tailscale IP) | 60s |
Then create a Status Page:
1. Status Pages → New Status Page
2. Title: "KiteStacks Status"
3. Slug: `homelab`
4. Add all monitors to it
**Push Uptime Kuma to kscloud1:**
The Conky widget on your desktop reads kscloud1's Uptime Kuma, not monk's. Push monk's
database to kscloud1 after setting up monitors:
**Ask your AI:** "How do I copy a Docker named volume's SQLite database from one machine
to another using Python's sqlite3.Connection.backup() method?"
---
## Monitoring Layer 3 — Conky Desktop Widget
Conky is a program that draws information on your desktop background in real time.
Your KiteStacks widget shows whether each service on kscloud1 is up (green dot) or
down (red dot), refreshed every 15 seconds.
### Install Conky
```bash
sudo apt install conky-all
```
### Install the Widget Script
The widget script reads Uptime Kuma's API and formats the output for Conky.
It lives at `apps/conky/kitestacks-uptime-widget.sh` in the homelab repo.
Copy it to `~/.local/bin/` on your machine:
```bash
mkdir -p ~/.local/bin
cp ~/kitestacks-homelab/apps/conky/kitestacks-uptime-widget.sh ~/.local/bin/
chmod +x ~/.local/bin/kitestacks-uptime-widget.sh
```
Edit the script to use your kscloud1's Tailscale IP:
```bash
nano ~/.local/bin/kitestacks-uptime-widget.sh
```
Change the `KUMA_URL` line:
```bash
KUMA_URL="http://100.123.x.x:3001" # kscloud1's Tailscale IP
```
### Enable the Conky Config
```bash
cp ~/kitestacks-homelab/apps/conky/kitestacks-uptime.conf ~/.config/conky/kitestacks-uptime.conf
conky -c ~/.config/conky/kitestacks-uptime.conf -d
```
The widget should appear in the top-right corner of your desktop, showing a dot for
each service — green for up, red for down.
**Ask your AI:** "How do I make Conky start automatically when I log in to my Ubuntu desktop?"
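One common answer is an XDG autostart entry, which GNOME on Ubuntu reads at login. The sketch below writes one; the file name `kitestacks-conky.desktop` and the 10-second delay (to let the desktop finish loading first) are assumptions you can change:

```bash
# install_conky_autostart: write an autostart entry that launches the widget at login.
install_conky_autostart() {
    local dir="$HOME/.config/autostart"
    local conf="$HOME/.config/conky/kitestacks-uptime.conf"
    mkdir -p "$dir"
    # Unquoted EOF so $conf is expanded now; .desktop files do not expand ~ or $HOME.
    cat > "$dir/kitestacks-conky.desktop" <<EOF
[Desktop Entry]
Type=Application
Name=KiteStacks Conky
Exec=sh -c "sleep 10 && conky -c '$conf' -d"
X-GNOME-Autostart-enabled=true
EOF
    echo "Wrote $dir/kitestacks-conky.desktop"
}

# Run once, then log out and back in:
# install_conky_autostart
```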
---
## Setting Up Alerts
Uptime Kuma can send you a notification on your phone when a service goes down.
**Option 1: ntfy (recommended — self-hosted)**
You have ntfy running as a container. Set up an ntfy notification in Uptime Kuma:
- Notification Type: ntfy
- URL: your ntfy server URL
- Topic: choose a topic name (e.g., `homelab-alerts`)
Install the ntfy app on your phone and subscribe to your topic.
**Option 2: Email**
Configure email notifications in Uptime Kuma using your email address.
**Ask your AI:** "How do I configure Uptime Kuma to send notifications via ntfy?"
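Before wiring ntfy into Uptime Kuma, it helps to confirm that your phone receives a message at all. ntfy accepts a plain HTTP POST to the topic URL, so a test message can be sent with `curl`. A sketch; the function name, server URL, and default title are placeholders:

```bash
# send_ntfy SERVER TOPIC MESSAGE [TITLE]: publish MESSAGE to TOPIC on an ntfy server.
send_ntfy() {
    local server="${1%/}" topic="$2" message="$3" title="${4:-Homelab alert}"
    # ntfy takes the message as the POST body and the title as a header.
    curl -fsS -H "Title: $title" -d "$message" "$server/$topic"
}

# Example (server URL and topic are placeholders):
# send_ntfy https://ntfy.yourdomain.com homelab-alerts "Test message from monk"
```

If the test message arrives, use the same server URL and topic in Uptime Kuma's notification settings.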
---
## Checkpoint
- [ ] Prometheus is collecting metrics from both monk and kscloud1
- [ ] Grafana shows Node Exporter Full dashboard with both hosts
- [ ] Uptime Kuma has monitors for all 11 services
- [ ] Uptime Kuma status page is live at status.yourdomain.com/status/homelab
- [ ] Uptime Kuma database has been pushed to kscloud1
- [ ] Conky widget is showing on your desktop with live service status
- [ ] You receive a notification when you manually pause a service in Uptime Kuma
---
## Congratulations — Your Homelab Is Complete
You have built a production homelab with:
- 11 self-hosted services running in Docker
- Single sign-on via Authentik
- Cloud failover on a Hetzner VPS
- Private networking over Tailscale
- Real-time monitoring via Grafana and Uptime Kuma
- A live desktop status widget
Everything you built here maps directly to enterprise cloud engineering skills.
Every concept has a certification that covers it in depth.
**Your next step:** [certifications/roadmap.md](../../certifications/roadmap.md)


@ -0,0 +1,321 @@
# Without AI — Part 1: Linux Foundations
**Track:** Advanced (No AI)
**Time for this section:** 1-2 weeks of evenings and weekends
Before you touch Docker or any service, you need a solid foundation in Linux.
Every command you run in this homelab is a Linux command. If you skip this,
you will be copying without understanding — which means you cannot debug when
things go wrong.
---
## Total Build Time Estimate (Without AI)
Before you start, here is an honest breakdown of how long this entire homelab
takes to build from scratch — assuming you are learning as you go, working
2-3 hours on evenings and weekends:
| Phase | What You Are Learning / Building | Estimated Time |
|-------|----------------------------------|---------------|
| 1 — Linux Foundations | Shell, filesystem, permissions, SSH | 1-2 weeks |
| 2 — Bash Scripting | Variables, loops, conditionals, scripts | 1-2 weeks |
| 3 — Python Basics | Data structures, sqlite3, HTTP requests | 1-2 weeks |
| 4 — Docker Deep Dive | Images, volumes, networks, compose | 1-2 weeks |
| 5 — Networking | DNS, ports, TLS, Tailscale, firewalls | 1-2 weeks |
| 6 — Full Build | Deploying all 11 services + cloud failover | 4-8 weeks |
| 7 — Troubleshooting | Debugging, production issues, fixes | Ongoing |
| Documentation | Writing what you built and why | 1 week |
**Total: approximately 3-6 months** working part-time (evenings + weekends).
**Full-time (8 hours/day):** 6-10 weeks.
The wide ranges reflect the honest reality: some people hit a DNS issue that takes
3 hours to debug. Some services take a day to configure SSO for. Budget extra time.
The troubleshooting you will do along the way is not wasted time — it is where most
of the real learning happens.
---
## What Is Linux?
Linux is an operating system — like Windows or macOS — but open source, free, and
used to run most of the internet. Your home server, your cloud VPS, and almost every
web server in existence runs Linux.
**Why Linux and not Windows Server?**
- Free — no licensing cost
- More control — no hidden processes you can't see or stop
- Docker runs natively on Linux (on Windows, Docker runs inside a hidden Linux VM)
- The entire cloud engineering industry is Linux-first
You will use **Ubuntu 24.04 LTS** — the most widely used Linux distribution for servers.
---
## The Terminal
The terminal (also called the shell or command line) is where you work. There is no
graphical interface for most server tasks. You type a command, press Enter, read the
output, and type the next command.
Open a terminal on Ubuntu: `Ctrl + Alt + T`
You will see a prompt like:
```
kenpat@monk:~$
```
Breaking that down:
- `kenpat` — your username
- `monk` — the machine name (hostname)
- `~` — your current directory (`~` means your home directory, `/home/kenpat`)
- `$` — indicates you are a regular user (not root/admin)
---
## The Filesystem
Linux organizes everything in a tree of directories (folders) starting at `/` (root).
```
/
├── home/ ← user home directories
│ └── kenpat/ ← your home directory (~)
├── etc/ ← system configuration files
├── var/ ← variable data (logs, databases)
├── usr/ ← installed programs
├── tmp/ ← temporary files (cleared on reboot)
├── opt/ ← optional software (we use this for kscloud1)
└── proc/ ← virtual filesystem — represents running processes
```
**Key commands:**
```bash
pwd # Print Working Directory — where am I right now?
ls # List files in current directory
ls -la # List all files, including hidden ones, with permissions
cd /home/kenpat # Change Directory — move to a specific path
cd ~ # Go to your home directory
cd .. # Go up one level
mkdir mydir # Make a new directory
mkdir -p a/b/c # Make directories including parents (-p = parents)
rm file.txt # Remove a file
rm -rf mydir/ # Remove a directory and everything inside it (-r = recursive, -f = force)
cp file.txt backup.txt # Copy a file
mv file.txt newname.txt # Move or rename a file
cat file.txt # Print the contents of a file
less file.txt # View a file page by page (q to quit)
nano file.txt # Open a file in the nano text editor
```
**Practice:** Run these commands. Navigate around the filesystem. Understand what you see.
```bash
pwd # Where are you?
ls / # What is in the root directory?
ls /home # What home directories exist?
ls -la ~ # What files are in YOUR home directory? (hidden files too)
cd /var/log # Go to the log directory
ls # What log files exist?
cat /etc/hostname # What is this machine's hostname?
cd ~ # Go back home
```
---
## File Permissions
Every file in Linux has permissions that control who can read it, write to it, or
execute it. This is crucial — misconfigured permissions are a common source of bugs.
```
-rw-r--r-- 1 kenpat kenpat 1234 Jun 19 10:00 myfile.txt
```
Breaking it down:
- `-` — file type (`d` = directory, `-` = regular file, `l` = symlink)
- `rw-` — owner permissions: read, write, no execute
- `r--` — group permissions: read only
- `r--` — everyone else: read only
- `kenpat kenpat` — owner and group
```bash
chmod 644 myfile.txt # rw-r--r-- (owner read/write, others read)
chmod 755 myscript.sh # rwxr-xr-x (owner full, others read+execute)
chmod +x myscript.sh # Add execute permission for everyone
chown kenpat:kenpat file.txt # Change owner to kenpat, group to kenpat
chown -R 1000:1000 /mydir/ # Change owner recursively for entire directory
```
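The numbers passed to `chmod` are sums: 4 for read, 2 for write, 1 for execute, with one digit each for owner, group, and everyone else. The sketch below spells that arithmetic out; `octal_to_symbolic` is an illustrative helper, not a standard command:

```bash
# octal_to_symbolic MODE: print the rwx string for a three-digit octal mode.
octal_to_symbolic() {
    local mode="$1" out="" digit i
    [[ "$mode" =~ ^[0-7]{3}$ ]] || { echo "expected three octal digits" >&2; return 1; }
    for (( i = 0; i < 3; i++ )); do
        digit="${mode:i:1}"
        # Each digit is a sum: 4 = read, 2 = write, 1 = execute.
        if (( digit & 4 )); then out+="r"; else out+="-"; fi
        if (( digit & 2 )); then out+="w"; else out+="-"; fi
        if (( digit & 1 )); then out+="x"; else out+="-"; fi
    done
    echo "$out"
}

octal_to_symbolic 644   # rw-r--r--
octal_to_symbolic 755   # rwxr-xr-x
```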
**Why this matters in Docker:** Docker containers run as specific user IDs.
If a container expects to own a file (e.g., UID 1000) but the file is owned by
root, the container cannot write to it. Many Docker setup issues come down to
file permission mistakes.
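A quick way to test for this mismatch is to compare a file's numeric owner with the UID the container runs as. A sketch assuming GNU `stat` (the default on Ubuntu); the function name is made up:

```bash
# owned_by FILE UID: succeed if FILE's numeric owner is UID.
owned_by() {
    local path="$1" want="$2" have
    have="$(stat -c '%u' "$path")"   # GNU stat: %u prints the owner's numeric UID
    [[ "$have" == "$want" ]]
}

# Demo on a scratch file, which is owned by whoever runs the script.
scratch="$(mktemp)"
if owned_by "$scratch" "$(id -u)"; then
    echo "$scratch is owned by UID $(id -u)"
fi
rm -f "$scratch"
```

When the check fails for a container's data directory, `sudo chown -R <uid>:<gid> <dir>` with the container's IDs is the usual fix.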
---
## Users and sudo
Linux separates regular users from the administrator (called `root`).
Root can do anything — delete system files, stop critical services, change any setting.
Regular users cannot.
`sudo` lets a trusted user run a single command as root:
```bash
sudo apt update # Run apt update as root
sudo systemctl restart docker # Restart Docker as root
sudo nano /etc/hosts # Edit a system file as root
```
**Non-interactive sudo** (used in scripts when there is no terminal to type a password):
```bash
echo mypassword | sudo -S apt update
# -S reads password from stdin (standard input)
```
**Become root entirely** (use carefully):
```bash
sudo -i # Opens a root shell. Prompt changes from $ to #
exit # Return to regular user
```
---
## SSH — Connecting to Remote Machines
SSH (Secure Shell) lets you control a remote machine over an encrypted connection.
```bash
ssh kenpat@192.168.1.100 # Connect to a local machine
ssh root@5.78.x.x # Connect to your VPS as root
ssh -i ~/.ssh/mykey kenpat@host # Connect using a specific private key
ssh -L 5099:localhost:5000 kenpat@host # Local port forward
```
### SSH Keys (Better Than Passwords)
Instead of typing a password every time, you generate a key pair:
- **Private key** (`~/.ssh/id_ed25519`) — stays on your machine, never shared
- **Public key** (`~/.ssh/id_ed25519.pub`) — put this on the server
```bash
# Generate a new key pair
ssh-keygen -t ed25519 -C "monk-to-kscloud1" -f ~/.ssh/id_ed25519_kscloud1
# Copy your public key to the server
ssh-copy-id -i ~/.ssh/id_ed25519_kscloud1.pub kenpat@your-vps-ip
# Connect using the key
ssh -i ~/.ssh/id_ed25519_kscloud1 kenpat@your-vps-ip
```
### SSH Local Port Forwarding
Sometimes a service is running on a remote machine but not exposed publicly.
You can forward a local port to a remote port through the SSH connection:
```bash
ssh -L 5099:localhost:5000 kenpat@kscloud1-tailscale-ip
```
This means: "On MY machine, port 5099 forwards to kscloud1's localhost:5000."
Now visiting `http://localhost:5099` in your browser reaches kscloud1's port 5000.
Used in this homelab to access kscloud1's Kavita directly (bypassing Cloudflare)
when configuring OIDC settings.
---
## Package Management (apt)
Ubuntu uses `apt` to install, update, and remove software:
```bash
sudo apt update # Refresh the list of available packages
sudo apt upgrade -y # Install all available updates
sudo apt install -y curl git # Install specific packages
sudo apt remove package # Remove a package
sudo apt search keyword # Search for a package by name
dpkg -l | grep docker # List installed packages matching "docker"
```
---
## Processes and Services
```bash
ps aux # List all running processes
ps aux | grep docker # Find processes matching "docker"
top # Live process monitor (q to quit)
htop # Better live monitor (install with: sudo apt install htop)
kill 1234 # Send SIGTERM (a polite request to exit) to process ID 1234
kill -9 1234 # Force kill (cannot be ignored)
pkill conky # Kill all processes named "conky"
systemctl status docker # Check if Docker service is running
systemctl start docker # Start it
systemctl stop docker # Stop it
systemctl restart docker # Restart it
systemctl enable docker # Make it start automatically on boot
systemctl disable docker # Prevent it from starting on boot
```
---
## Reading Logs
When something breaks, you read the logs to find out why:
```bash
journalctl -u docker # System logs for the Docker service
journalctl -f # Follow all system logs live
cat /var/log/syslog # System log file
tail -f /var/log/syslog # Follow (live tail) the system log
dmesg | tail -20 # Kernel messages, last 20 lines
```
---
## Essential Tools
```bash
curl -s https://example.com # Make an HTTP GET request
curl -s https://example.com | head # Pipe output through head (first 10 lines)
wget https://example.com/file.zip # Download a file
grep "error" /var/log/syslog # Search a file for a pattern
grep -r "TUNNEL_TOKEN" ~/kitestacks-live/ # Search recursively in a directory
find ~ -name "*.env" 2>/dev/null # Find all .env files in home dir
find /opt -name "docker-compose.yml" # Find all compose files
wc -l file.txt # Count lines in a file
cut -d= -f2 file.env # Cut: split by = and take field 2
tr -d '\n' # Remove newlines from input
| # Pipe: send output of one command to another
> # Redirect: write output to a file (overwrites)
>> # Redirect: append output to a file
2>/dev/null # Redirect error output to /dev/null (discard errors)
```
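These tools become useful once they are chained. The sketch below combines `grep`, `cut`, and `tr` from the list above to pull one value out of a `.env` file; the function name and the demo token are invented for illustration:

```bash
# read_env_value FILE KEY: print the value of KEY from a KEY=value file.
read_env_value() {
    local file="$1" key="$2"
    # grep picks the line, cut -f2- keeps everything after the first "=",
    # tr strips the trailing newline so the value can be embedded elsewhere.
    grep "^${key}=" "$file" | head -n 1 | cut -d= -f2- | tr -d '\n'
}

# Demo with a throwaway file; the token value is made up.
demo_env="$(mktemp)"
printf 'TUNNEL_TOKEN=abc=123\nOTHER=x\n' > "$demo_env"
echo "Token: $(read_env_value "$demo_env" TUNNEL_TOKEN)"   # Token: abc=123
rm -f "$demo_env"
```

Using `-f2-` instead of `-f2` matters here: the value itself contains an `=`, and `-f2` alone would cut it short.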
---
## Practice Exercises
Do these before moving on:
1. Navigate to `/var/log` and read the last 20 lines of `syslog`
2. Create a directory structure: `~/practice/a/b/c/`
3. Create a file in `c/` with your name in it using `echo "your name" > ~/practice/a/b/c/name.txt`
4. Read it with `cat`
5. Check its permissions with `ls -la`
6. Change its permissions to read-only: `chmod 444 ~/practice/a/b/c/name.txt`
7. Try to edit it — what happens?
8. Find all `.conf` files in `/etc/` that contain the word "ubuntu"
9. Generate an SSH key pair with `ssh-keygen`
10. SSH into your VPS
---
**Next:** [Part 2 — Bash Scripting](02-bash-scripting.md)


@ -0,0 +1,333 @@
# Without AI — Part 2: Bash Scripting
**Track:** Advanced (No AI)
**Time for this section:** 1-2 weeks
Bash is the language of the Linux shell. Almost every automation script in this
homelab is a Bash script. You do not need to master it — you need to be able to
read it, write simple scripts, and understand what a script does before you run it.
---
## What Is a Script?
A script is a text file containing a sequence of shell commands. Instead of typing
commands one by one, you put them in a file and run the file.
```bash
#!/usr/bin/env bash
# This is a comment
echo "Hello from my script"
```
The first line (`#!/usr/bin/env bash`) is called the **shebang**. It tells Linux
which interpreter to use to run this file. Without it, Linux may use the wrong shell.
To run a script:
```bash
chmod +x myscript.sh # Make it executable
./myscript.sh # Run it
```
Or without making it executable:
```bash
bash myscript.sh
```
---
## Variables
Variables store values you want to reuse:
```bash
name="kenpat"
port=3000
greeting="Hello, $name"
echo $name # prints: kenpat
echo $port # prints: 3000
echo $greeting # prints: Hello, kenpat
echo "${name}s" # prints: kenpats (braces needed when appending)
```
**Special variables:**
```bash
$0 # The script's own filename
$1 $2 $3 # Command-line arguments (first, second, third)
$# # Number of arguments passed
$? # Exit code of the last command (0 = success, non-zero = error)
$$ # Current process ID (PID)
$HOME # Your home directory path
$USER # Your username
```
**Environment variables:**
```bash
export MY_VAR="value" # Make available to child processes
printenv # List all environment variables
printenv MY_VAR # Print one variable
```
---
## Conditionals (if/else)
```bash
if [[ condition ]]; then
    # commands if true
elif [[ other_condition ]]; then
    # commands if second condition is true
else
    # commands if nothing was true
fi
```
**Common conditions:**
```bash
[[ -f /path/to/file ]] # True if file exists and is a regular file
[[ -d /path/to/dir ]] # True if directory exists
[[ -s /path/to/file ]] # True if file exists and is non-empty
[[ -z "$var" ]] # True if variable is empty
[[ -n "$var" ]] # True if variable is NOT empty
[[ "$a" == "$b" ]] # True if strings are equal
[[ "$a" != "$b" ]] # True if strings are NOT equal
[[ $n -eq 5 ]] # True if number equals 5
[[ $n -gt 5 ]] # True if number is greater than 5
[[ $n -lt 5 ]] # True if number is less than 5
```
**Real example from the homelab:**
```bash
if [[ $# -ne 1 ]]; then
    echo "Usage: $0 '<cloudflare_tunnel_token>'" >&2
    exit 2
fi
```
This checks that exactly one argument was provided (`$# -ne 1` means "number of args
is not equal to 1"). If not, it prints usage instructions and exits with code 2 (error).
The `>&2` sends the message to stderr (error output) instead of stdout (normal output).
---
## Loops
**For loop — iterate over a list:**
```bash
for item in one two three; do
    echo "Item: $item"
done

# Iterate over files
for file in *.yml; do
    echo "Found compose file: $file"
done

# Iterate over a range of numbers
for i in {1..10}; do
    echo "Number: $i"
done
```
**While loop — repeat while a condition is true:**
```bash
count=0
while [[ $count -lt 5 ]]; do
    echo "Count: $count"
    count=$(( count + 1 ))
done

# Wait until a container is healthy
while [[ "$(docker inspect --format '{{.State.Health.Status}}' authentik)" != "healthy" ]]; do
    echo "Waiting for authentik..."
    sleep 5
done
echo "Authentik is healthy"
```
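A `while` loop that waits for a container will wait forever if the container never becomes healthy. A bounded retry loop is a common alternative; the sketch below shows one way to write it. `wait_for` and `is_healthy` are hypothetical helper names, not part of the homelab scripts:

```bash
# wait_for ATTEMPTS DELAY CMD...: run CMD until it succeeds, at most ATTEMPTS times.
wait_for() {
    local attempts="$1" delay="$2" n
    shift 2
    for (( n = 1; n <= attempts; n++ )); do
        if "$@"; then
            return 0
        fi
        # Do not sleep after the final failed attempt.
        if (( n < attempts )); then
            sleep "$delay"
        fi
    done
    return 1
}

# is_healthy NAME: succeed if Docker reports the container as healthy.
is_healthy() {
    [[ "$(docker inspect --format '{{.State.Health.Status}}' "$1" 2>/dev/null)" == "healthy" ]]
}

# Example: give authentik up to 5 minutes (60 attempts, 5 seconds apart).
# wait_for 60 5 is_healthy authentik || echo "authentik never became healthy" >&2
```

Because `wait_for` returns a normal exit code, it works with `if`, `||`, and `set -e` like any other command.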
---
## Functions
```bash
greet() {
    local name="$1" # local = only exists inside this function
    echo "Hello, $name"
}
greet "kenpat" # prints: Hello, kenpat
greet "world" # prints: Hello, world
```
**Why local variables matter:** Without `local`, variables are global and can
accidentally overwrite values from other parts of the script.
---
## Error Handling
```bash
set -euo pipefail
```
Put this near the top of every script you write. It sets three behaviors:
- `-e` — exit immediately if any command fails (returns non-zero exit code)
- `-u` — exit if you use an undefined variable
- `-o pipefail` — if any command in a pipeline fails, the whole pipeline fails
Without this, a script can silently continue after an error, potentially causing
damage downstream (like deleting data after a failed backup).
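The effect of these options is easiest to see side by side. In this small demonstration each case runs in its own `bash -c` process so the options do not leak into your current shell:

```bash
# Without pipefail, the pipeline's status is the LAST command's status.
bash -c 'false | true; echo "without pipefail: $?"'

# With pipefail, any failing stage makes the whole pipeline fail.
bash -c 'set -o pipefail; false | true; echo "with pipefail: $?"'

# With -u, reading an unset variable aborts the script before "unreachable".
bash -c 'set -u; echo "$not_set"; echo "unreachable"' 2>/dev/null \
    || echo "set -u stopped the script"
```

The first command reports status 0 even though `false` failed, which is exactly the silent failure that `pipefail` exists to catch.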
**Checking a command's result:**
```bash
if curl -s https://example.com > /dev/null; then
    echo "Site is up"
else
    echo "Site is down"
fi
```
**Exit codes:**
```bash
exit 0 # Success
exit 1 # Generic error
exit 2 # Misuse (bad arguments)
```
---
## String Manipulation
```bash
var="TUNNEL_TOKEN=abc123"
# Split by delimiter, take field 2
echo "$var" | cut -d= -f2 # prints: abc123
# But what if the value itself contains = signs?
echo "$var" | cut -d= -f2- # prints: abc123 (f2- = from field 2 to end)
# Remove trailing newline
echo "hello" | tr -d '\n'
# Convert to lowercase
echo "HELLO" | tr '[:upper:]' '[:lower:]'
# Replace text
echo "hello world" | sed 's/world/there/' # prints: hello there
echo "aabbcc" | sed 's/b/B/g' # prints: aaBBcc (g = all occurrences)
# Extract with grep
echo "addr: 192.168.1.1" | grep -oP '\d+\.\d+\.\d+\.\d+' # prints: 192.168.1.1
```
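Bash can also do much of this without starting `cut`, `tr`, or `sed`, using parameter expansion. A short sketch of the equivalents; the variable names are only for illustration:

```bash
var="TUNNEL_TOKEN=abc=123"

key="${var%%=*}"           # remove the longest suffix matching "=*"  -> TUNNEL_TOKEN
value="${var#*=}"          # remove the shortest prefix matching "*=" -> abc=123
lower="${key,,}"           # lowercase (bash 4+)                       -> tunnel_token
swapped="${var/abc/xyz}"   # replace the first "abc"                   -> TUNNEL_TOKEN=xyz=123

echo "$key"
echo "$value"
echo "$lower"
echo "$swapped"
```

Parameter expansion runs inside the shell itself, so it is faster in loops and never fails because a tool is missing.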
---
## Here Documents (heredoc)
A heredoc lets you write multi-line strings inline:
```bash
cat <<'EOF'
This is line one
This is line two
Variables like $HOME are NOT expanded (because of the quotes around EOF)
EOF
cat <<EOF
This is line one
HOME is: $HOME (expanded because no quotes)
EOF
```
Used in this homelab to write multi-line content to files:
```bash
cat > /tmp/fix.sql <<'EOF'
BEGIN;
UPDATE ServerSetting SET Value='{"enabled":true}' WHERE "Key"=40;
COMMIT;
EOF
```
---
## Real Scripts in This Homelab
### The Token Rotation Script
`~/kitestacks-homelab/scripts/rollout-cloudflared-token.sh`:
```bash
#!/usr/bin/env bash
set -euo pipefail
if [[ $# -ne 1 ]]; then
    echo "Usage: $0 '<cloudflare_tunnel_token>'" >&2
    exit 2
fi
token="$1"
monk_dir="${MONK_CLOUDFLARED_DIR:-$HOME/kitestacks-live/docker/cloudflared}"
kscloud1_host="${KSCLOUD1_HOST:?set KSCLOUD1_HOST, for example user@host}"
kscloud1_key="${KSCLOUD1_KEY:-$HOME/.ssh/id_ed25519_kscloud1}"
kscloud1_dir="${KSCLOUD1_CLOUDFLARED_DIR:-/opt/kitestacks/docker/cloudflared}"
```
Walking through each line:
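- `set -euo pipefail` — exit on the first failed command (`-e`), treat unset variables as errors (`-u`), and fail a pipeline if any stage fails (`-o pipefail`)
- `$# -ne 1` — check that exactly one argument was given; otherwise print usage to stderr and exit with code 2
- `${MONK_CLOUDFLARED_DIR:-$HOME/...}` — use the environment variable if it is set and non-empty, otherwise use the default
- `${KSCLOUD1_HOST:?...}` — if `KSCLOUD1_HOST` is unset or empty, print that error message and exit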
This is a real production script. Read it in full at that path.
---
## Writing Your Own Scripts
**Template for any script:**
```bash
#!/usr/bin/env bash
set -euo pipefail
# --- Configuration (change these) ---
MY_VAR="${MY_ENV_VAR:-default_value}"
TARGET_DIR="${1:?Usage: $0 <directory>}"
# --- Functions ---
log() {
  echo "[$(date '+%H:%M:%S')] $*"
}
die() {
  echo "ERROR: $*" >&2
  exit 1
}
# --- Main ---
log "Starting..."
if [[ ! -d "$TARGET_DIR" ]]; then
  die "Directory does not exist: $TARGET_DIR"
fi
log "Done."
```
---
## Practice Exercises
1. Write a script that checks if Docker is running and prints "Docker is up" or "Docker is down"
2. Write a script that takes a service name as an argument and shows its logs:
`./show-logs.sh forgejo`
3. Write a script that loops through all directories in `~/kitestacks-live/docker/`
and prints the service name and whether it has a `.env` file
4. Write a script that checks if a URL returns 200 OK and prints "UP" or "DOWN":
`./check-url.sh https://gitforge.kitestacks.com`
5. Read and understand every line of `scripts/rollout-cloudflared-token.sh`
---
**Next:** [Part 3 — Python Basics](03-python-basics.md)

@ -0,0 +1,347 @@
# Without AI — Part 3: Python Basics
**Track:** Advanced (No AI)
**Time for this section:** 1-2 weeks
Python is used in this homelab for:
1. **Database operations** — copying SQLite databases safely between machines
2. **HTTP requests** — hitting APIs to configure services
3. **The metrics API** — the Python FastAPI service that feeds live stats to the portal
4. **One-off automation** — scripts that are too complex for Bash
You do not need to be a Python developer. You need to read Python code, understand
what it does, modify it for your situation, and write simple scripts.
---
## Installing Python
Ubuntu 24.04 comes with Python 3 already installed:
```bash
python3 --version # Should show 3.12.x or similar
pip3 --version # Package manager for Python
```
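Install the packages used in this homelab. Ubuntu 24.04 marks the system Python as externally managed (PEP 668), so a bare `pip3 install` is refused with an `externally-managed-environment` error. Install into a virtual environment instead (this needs the `python3-venv` package; the `~/.venvs/kitestacks` path is only an example):
```bash
python3 -m venv ~/.venvs/kitestacks && source ~/.venvs/kitestacks/bin/activate
pip install requests fastapi uvicorn psutil
```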
---
## Python Syntax Basics
Python uses indentation (spaces) to define blocks of code instead of `{}` like many
other languages. This is critical — wrong indentation causes errors.
```python
# This is a comment
name = "kenpat" # string
port = 3000 # integer
price = 4.99 # float
is_running = True # boolean
print(name) # prints: kenpat
print(f"Port is {port}") # f-string: prints: Port is 3000
print(f"{name!r}") # repr: prints: 'kenpat' (with quotes)
```
---
## Data Structures
```python
# List (ordered, mutable)
services = ["forgejo", "grafana", "authentik"]
services.append("portainer") # add to end
services[0] # "forgejo" (zero-indexed)
services[-1] # "portainer" (last item)
len(services) # 4
for service in services:
    print(service)

# Dictionary (key-value pairs, like JSON)
monitor = {
    "name": "Forgejo",
    "url": "https://gitforge.kitestacks.com",
    "id": 16,
    "active": True
}
monitor["name"]                     # "Forgejo"
monitor.get("missing", "default")   # "default" (safe get with fallback)
monitor.keys()                      # dict_keys(['name', 'url', 'id', 'active'])
for key, value in monitor.items():
    print(f"{key}: {value}")

# List of dicts (very common in API responses)
monitors = [
    {"id": 16, "name": "Forgejo"},
    {"id": 17, "name": "Grafana"},
]
for m in monitors:
    print(m["id"], m["name"])
```
---
## Functions and Conditionals
```python
def check_service(name, url):
    """Check that a service URL uses HTTPS (no request is sent)."""
    if not url.startswith("https://"):
        return False
    print(f"Checking {name} at {url}")
    return True

result = check_service("Grafana", "https://grafana.kitestacks.com")
print(result)  # True
```
**Conditionals:**
```python
status = 200
if status == 200:
    print("OK")
elif status in (301, 302):
    print("Redirect")
elif status >= 500:
    print("Server error")
else:
    print(f"Unexpected status: {status}")
```
---
## Working with JSON
Almost every API in this homelab sends and receives JSON (JavaScript Object Notation).
Python's `json` module converts between JSON strings and Python dicts/lists:
```python
import json
# JSON string to Python dict
data = json.loads('{"name": "Forgejo", "id": 16}')
print(data["name"]) # Forgejo
# Python dict to JSON string
obj = {"monitors": [1, 2, 3]}
json_str = json.dumps(obj, indent=2)
print(json_str)
# {
#   "monitors": [
#     1,
#     2,
#     3
#   ]
# }

# Read JSON from a file
with open("/tmp/kuma.meta.json") as f:
    kuma_data = json.load(f)

# Parse Uptime Kuma heartbeat data
for monitor_id, heartbeats in kuma_data.get("heartbeatList", {}).items():
    if heartbeats:
        last = heartbeats[-1]
        status = "UP" if last["status"] == 1 else "DOWN"
        print(f"Monitor {monitor_id}: {status}")
```
---
## HTTP Requests with `requests`
The `requests` library makes HTTP calls easy:
```python
import requests
# GET request
response = requests.get(
    "https://gitforge.kitestacks.com/api/v1/repos/search",
    headers={"Authorization": "token your-api-token"},
    timeout=5,
)
print(response.status_code)      # 200
data = response.json()           # Parse JSON response body
print(data["data"][0]["name"])   # First repo name

# POST request with JSON body
response = requests.post(
    "https://auth.kitestacks.com/api/v3/core/tokens/",
    headers={"Authorization": "Bearer your-admin-token"},
    json={"identifier": "my-token", "user": "kenpat"},
    timeout=5,
)
if response.ok:  # True for 2xx status codes
    print("Token created:", response.json()["key"])
else:
    print(f"Failed: {response.status_code} {response.text}")
```
---
## SQLite — The Key Database Skill in This Homelab
SQLite is a database that lives in a single file. Uptime Kuma, Kavita, and other services
use SQLite. This homelab uses Python's `sqlite3` module to copy those databases safely between machines.
```python
import sqlite3
# Connect to a database file
conn = sqlite3.connect("/path/to/kuma.db")
# Run a query
cursor = conn.execute("SELECT id, name, url FROM monitor ORDER BY id")
rows = cursor.fetchall() # Get all results
for row in rows:
    print(row[0], row[1], row[2])
# Insert data
conn.execute(
    "INSERT INTO monitor (name, type, url, active) VALUES (?, ?, ?, ?)",
    ("BookStack", "http", "https://wiki.kitestacks.com", 1)
)
conn.commit() # Save changes (without commit, nothing is written)
# Use a transaction explicitly (safer for multiple changes)
conn.execute("BEGIN")
conn.execute("UPDATE monitor SET active=1 WHERE id=26")
conn.execute("UPDATE monitor SET active=1 WHERE id=27")
conn.execute("COMMIT")
conn.close()
```
### The `backup()` Method — Copying Databases Safely
SQLite databases in WAL mode (write-ahead log) cannot be copied with a plain file copy
while they are in use. The `Connection.backup()` method creates a consistent snapshot:
```python
import sqlite3
def safe_backup(source_path, dest_path):
    """Copy a SQLite database safely, even if it's in use."""
    src = sqlite3.connect(source_path)
    dst = sqlite3.connect(dest_path)
    src.backup(dst)  # Creates a consistent copy
    dst.close()
    src.close()
    print(f"Backed up {source_path} to {dest_path}")

safe_backup("/src/kuma.db", "/out/kuma.db.backup")
```
**Why a plain `cp` can fail:** SQLite in WAL mode keeps two extra files next to the database:
`kuma.db-wal` (changes that have not yet been checkpointed into the main file) and `kuma.db-shm`
(a shared-memory index for the WAL). If you copy the main file without them, or while a write is
in progress, the copy can be missing recent data or be corrupted.
`Connection.backup()` handles all of this correctly.
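To see the point about WAL files in practice, here is a minimal sketch that builds a throwaway WAL-mode database in a temporary directory, keeps a writer connection open, and backs it up while the `-wal` file is still present. The file names `live.db` and `copy.db` and the helper `backup_sqlite` are made up for this demo; they only stand in for the real `kuma.db`.

```python
import os
import sqlite3
import tempfile


def backup_sqlite(source_path, dest_path):
    """Copy a live SQLite database using the online backup API."""
    src = sqlite3.connect(source_path)
    dst = sqlite3.connect(dest_path)
    try:
        src.backup(dst)
    finally:
        dst.close()
        src.close()


with tempfile.TemporaryDirectory() as tmp:
    live = os.path.join(tmp, "live.db")   # hypothetical stand-in for kuma.db
    copy = os.path.join(tmp, "copy.db")

    # A writer that stays open, like a running service would.
    writer = sqlite3.connect(live)
    writer.execute("PRAGMA journal_mode=WAL")
    writer.execute("CREATE TABLE monitor (id INTEGER PRIMARY KEY, name TEXT)")
    writer.execute("INSERT INTO monitor (name) VALUES ('Forgejo')")
    writer.commit()

    # Committed rows may still live in live.db-wal at this point.
    wal_present = os.path.exists(live + "-wal")

    backup_sqlite(live, copy)

    check = sqlite3.connect(copy)
    rows = check.execute("SELECT id, name FROM monitor").fetchall()
    check.close()
    writer.close()

print("WAL file present during backup:", wal_present)
print("Rows found in the copy:", rows)
```

The copy contains the committed row even though the writer never closed, which is the property a plain file copy cannot promise.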
---
## Writing a Simple FastAPI Service
The kitestacks-metrics-api is a Python FastAPI service. Understanding it helps you
modify or extend it:
```python
from fastapi import FastAPI
import psutil

app = FastAPI()

@app.get("/api/health")
def health():
    return {"ok": True}

@app.get("/api/metrics")
def metrics():
    return {
        "cpu_percent": psutil.cpu_percent(interval=1),
        "ram_percent": psutil.virtual_memory().percent,
        "ram_used_gb": psutil.virtual_memory().used / 1e9,
        "disk_percent": psutil.disk_usage("/").percent,
    }
```
Run it:
```bash
uvicorn myapi:app --host 0.0.0.0 --port 8000
```
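On Linux, `psutil` reads the CPU and memory figures from `/proc`. Those files are not
namespaced, so the values describe the host even inside a container. Running the container
with `pid: host` additionally exposes the host's process list, and `network_mode: host` its
network counters.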
---
## Environment Variables in Python
```python
import os
token = os.environ.get("FORGEJO_TOKEN") # None if not set
token = os.environ.get("FORGEJO_TOKEN", "") # Empty string if not set
token = os.environ["FORGEJO_TOKEN"] # KeyError if not set (explicit)
# Check and fail clearly
token = os.environ.get("FORGEJO_TOKEN")
if not token:
    raise ValueError("FORGEJO_TOKEN environment variable is required")
```
---
## File Operations
```python
import os
# Read a file
with open("/tmp/kuma.json") as f:
    content = f.read()

# Write a file
with open("/tmp/output.sql", "w") as f:
    f.write("UPDATE ServerSetting SET Value='test' WHERE \"Key\"=40;\n")

# Check if a file exists
if os.path.exists("/data/kuma.db"):
    print("Database found")

# Delete a file safely
for fname in ["/data/kuma.db-shm", "/data/kuma.db-wal"]:
    if os.path.exists(fname):
        os.remove(fname)
        print(f"Removed {fname}")

# List files in a directory
for filename in os.listdir("/app/data"):
    print(filename)
```
---
## Practice Exercises
1. Write a Python script that reads `monitors.json` from Uptime Kuma's API response
and prints each monitor's name and status
2. Write a script that connects to a SQLite database, lists all tables, and prints
the first 5 rows of the `monitor` table
3. Write a script that uses `requests` to check if all 11 KiteStacks URLs return
a status code between 200 and 399, and prints a summary
4. Read the kitestacks-metrics-api source code and understand what each endpoint does
5. Modify the `safe_backup()` function to also delete `-shm` and `-wal` files from
the destination before writing (prevents WAL conflicts after restore)
---
**Next:** [Part 4 — Docker Deep Dive](04-docker-deep-dive.md)

@ -0,0 +1,303 @@
# Without AI — Part 4: Docker Deep Dive
**Track:** Advanced (No AI)
**Time for this section:** 1-2 weeks
Docker is the technology that runs every service in this homelab. Understanding it
deeply — not just copying compose files — is what separates someone who can maintain
and troubleshoot a homelab from someone who hopes nothing breaks.
---
## What Docker Actually Is
Most explanations say "containers are like lightweight VMs." That is wrong and leads
to confusion. Here is what a container actually is:
**A container is a Linux process with isolation applied.**
Two Linux kernel features provide that isolation:
**Namespaces** — the container gets its own view of:
- Filesystem (it sees `/` but it is a different tree than the host's `/`)
- Network interfaces (its own `eth0`, its own IP on the Docker network)
- Process list (it can only see its own processes, not the host's)
- User IDs (with user-namespace remapping, a process can be "root" inside without being root on the host; by default, container root is UID 0 on the host too)
**cgroups (control groups)** — limits how much of the host's resources the container can use:
- CPU cores and usage limits
- RAM limits
- Disk I/O limits
- Process count (PID) limits
**Result:** No second kernel, no hardware emulation, no hypervisor. The nginx process
in your `homepage` container is a regular Linux process on your machine — it just
thinks it is alone.
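One way to make "a process with isolation applied" concrete is to look at the namespace identifiers the kernel exposes under `/proc`. The sketch below is illustrative only: `list_namespaces` is a made-up helper, it assumes a Linux host, and it simply returns an empty result where `/proc/<pid>/ns` is not available.

```python
import os


def list_namespaces(pid="self"):
    """Return {namespace_type: identifier} for a process, read from /proc."""
    ns_dir = f"/proc/{pid}/ns"
    result = {}
    if not os.path.isdir(ns_dir):        # not Linux, or /proc is unavailable
        return result
    for entry in sorted(os.listdir(ns_dir)):
        try:
            # Each entry is a symlink such as "net:[4026531840]".
            result[entry] = os.readlink(os.path.join(ns_dir, entry))
        except OSError:                  # unreadable without enough privileges
            result[entry] = "unreadable"
    return result


for ns_type, ident in list_namespaces().items():
    print(f"{ns_type:20} {ident}")
```

If you run this once on the host and once inside a container that has Python available, the identifiers for namespaces such as `net`, `pid` and `mnt` should differ, while the kernel underneath is the same.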
---
## Images vs Containers
```
Image                          Container
─────────────────────────      ─────────────────────────────────────────
A recipe                       A running instance made from the recipe
Read-only, immutable           Has a writable layer on top of the image
Stored in layers               One writable layer per container
Shared across containers       Separate per container
Survives container deletion    Deleted with the container (unless volume)
```
**Layers:** Docker images are built in layers. Each `Dockerfile` instruction that changes the
filesystem (`RUN`, `COPY`, `ADD`) creates a layer. When an image is updated, only the layers that
changed are re-downloaded. This is why pulling an update is fast — most layers are already local.
```bash
docker image ls # List local images
docker image inspect nginx:alpine # See image metadata and layers
docker image history nginx:alpine # See how the image was built, layer by layer
docker image pull postgres:16-alpine # Download an image explicitly
docker image rm nginx:alpine # Remove a local image
```
---
## Docker Networks — In Depth
Docker provides several networking modes:
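**bridge (default):** Container gets its own virtual network interface with a private IP
(172.x.x.x range). Containers on the same bridge network can reach each other by IP. On a
user-defined bridge network such as `kitestacks` they can also reach each other by name (via
Docker's built-in DNS); the default `bridge` network does not offer name resolution.
Containers on different bridge networks are isolated.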
**host:** Container shares the host's network namespace entirely. `--network host` means
no isolation — the container sees all host network interfaces and binds directly to
host ports. Used for kitestacks-metrics-api so psutil can see real network stats.
**none:** No networking at all. Rarely used.
```bash
# Create a named bridge network
docker network create kitestacks
# See all networks
docker network ls
# Inspect a network — see which containers are connected and their IPs
docker network inspect kitestacks
# Connect a running container to a network
docker network connect kitestacks my-container
# Disconnect
docker network disconnect kitestacks my-container
```
**The DNS trick:** When two containers are on the same user-defined bridge network, Docker runs a
DNS server at `127.0.0.11` inside each container. Container names resolve to their
internal IPs. This is why `cloudflared` can connect to `http://grafana:3000`:
Docker DNS resolves `grafana` to the grafana container's IP.
```bash
# Verify DNS works from inside the network.
# The official cloudflared image ships without a shell or tools such as nslookup and curl,
# so run the checks from a throwaway Alpine container on the same network.
docker run --rm --network kitestacks alpine nslookup grafana
docker run --rm --network kitestacks alpine wget -qO- http://grafana:3000/api/health
```
---
## Volumes — Persisting Data
Containers are ephemeral. When you delete a container, its writable layer is gone.
To keep data, you use volumes.
**Bind mount:** You choose the path on the host.
```yaml
volumes:
  - ./data:/forgejo-data            # host path : container path
  - /home/kenpat/books:/books:ro    # :ro = read-only
```
Data is at `./data` on the host. You can navigate there with `cd`. You can back it up.
**Named volume:** Docker manages the path.
```yaml
    # inside the service definition
    volumes:
      - uptime-kuma:/app/data

# at the top level of the compose file
volumes:
  uptime-kuma:   # define the named volume
```
Data is at `/var/lib/docker/volumes/uptime-kuma/_data/` on the host (Docker manages this).
```bash
docker volume ls # List named volumes
docker volume inspect uptime-kuma # See where it is stored
docker volume rm uptime-kuma # Delete a volume (and its data!)
```
**Access a named volume from a one-off container:**
```bash
docker run --rm -v uptime-kuma:/data alpine ls /data
```
This is the pattern used throughout this homelab to read or modify volumes without
stopping the running service (for reads) or after stopping it (for writes).
---
## Docker Compose — The Full Picture
Docker Compose reads a YAML file and manages the lifecycle of multiple containers.
```yaml
services:
  forgejo:
    image: codeberg.org/forgejo/forgejo:latest
    container_name: forgejo        # Fixed name (not random)
    restart: unless-stopped        # Restart on crash or host reboot
    env_file: .env                 # Load environment variables from file
    environment:
      FORGEJO__server__DOMAIN: gitforge.kitestacks.com  # Override one env var
    volumes:
      - ./data:/data               # Bind mount: ./data on host → /data in container
    ports:
      - "127.0.0.1:2222:22"        # Bind host 127.0.0.1:2222 to container port 22 (SSH)
    networks:
      - kitestacks
    depends_on:
      - authentik-postgres         # Start this service before forgejo
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:3000/health"]
      interval: 30s
      timeout: 10s
      retries: 3
networks:
  kitestacks:
    external: true                 # Use existing network (don't create a new one)
```
**Key fields explained:**
`restart: unless-stopped`
- `no` — never restart
- `always` — always restart; a container you stopped manually stays stopped only until the Docker daemon (or the host) restarts
- `on-failure` — restart only if exit code is non-zero
- `unless-stopped` — restart on crash or reboot, but not if you manually stopped it
`env_file: .env`
Reads `KEY=VALUE` pairs from a file. The `.env` file is in `.gitignore` so secrets
never get committed to git. Always use this for passwords, tokens, and secrets.
`depends_on`
Starts services in dependency order. Does NOT wait for a service to be "ready" —
just waits for the container to START. If you need to wait for a database to be ready,
add a health check and use `condition: service_healthy`.
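The difference between "started" and "ready" can also be handled from a script. The following is a rough sketch, not part of the homelab's code: `wait_for_port` is a hypothetical helper that polls until a TCP port accepts connections. A local listener stands in for the database so the example runs anywhere. Note that an open TCP port is a weaker signal than a real health check such as `pg_isready`.

```python
import socket
import time


def wait_for_port(host, port, timeout=30.0, interval=0.5):
    """Return True once host:port accepts a TCP connection, False on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Connection refused or timed out: the service is not ready yet.
            time.sleep(interval)
    return False


# Self-contained demo: a local listener plays the role of the database.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))      # port 0 lets the OS pick a free port
listener.listen()
demo_port = listener.getsockname()[1]

ready = wait_for_port("127.0.0.1", demo_port, timeout=5)
listener.close()
print("ready:", ready)
```

In a real setup you would call something like `wait_for_port("authentik-postgres", 5432)` from a container on the same Docker network before starting work that needs the database.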
**Common commands:**
```bash
docker compose up -d # Start all services in background
docker compose down # Stop and remove containers (not volumes)
docker compose down -v # Stop, remove containers AND volumes (data loss!)
docker compose restart forgejo # Restart one service
docker compose pull # Pull latest images
docker compose logs -f forgejo # Follow logs for one service
docker compose ps # Show service status
docker compose exec forgejo bash # Open shell in running service
docker compose config # Validate and show merged config
```
---
## Port Mappings — When to Use Them
```yaml
ports:
  - "3005:3000"              # host_port:container_port
  - "127.0.0.1:3005:3000"    # bind to localhost only (not accessible from outside host)
  - "0.0.0.0:9100:9100"      # bind on all interfaces (accessible from outside)
```
**In this homelab, most services do NOT expose host ports** — they only communicate
through the Docker network. Cloudflare Tunnel connects directly to the container via
the Docker bridge network, so no host ports are needed for public services.
The only services that need host ports:
- `node-exporter` on kscloud1 (so Prometheus on monk can scrape it via public IP)
- `kitestacks-metrics-api` does NOT use ports — it uses `network_mode: host`
- `portainer` uses 9443 (HTTPS)
---
## Inspecting and Debugging
```bash
# See everything about a container
docker inspect forgejo
# See just its IP address on each network
docker inspect forgejo --format '{{range .NetworkSettings.Networks}}{{.IPAddress}}{{end}}'
# See its environment variables (careful — this shows secrets!)
docker inspect forgejo --format '{{range .Config.Env}}{{println .}}{{end}}'
# See its mounts
docker inspect forgejo --format '{{json .Mounts}}' | python3 -m json.tool
# See resource usage
docker stats # Live, all containers
docker stats forgejo --no-stream # One snapshot for one container
# See what the container's filesystem looks like
docker exec forgejo ls /
docker exec forgejo cat /etc/forgejo/app.ini
docker exec forgejo find /data -name "*.db" 2>/dev/null
```
---
## Common Gotchas
**Containers share the host's kernel:** If you run an Alpine-based image but your
host kernel is too old, some syscalls may not work. Rare but real.
**Named volumes are invisible by default:** New developers spend hours wondering where
data went after deleting a container. Named volumes survive `docker compose down`.
They do NOT survive `docker compose down -v`.
**Order vs readiness:** `depends_on` does not mean "wait until ready." A Postgres
container starts in milliseconds, but PostgreSQL inside it takes 3-5 seconds to accept
connections. Use healthchecks for real readiness checking.
**Port conflicts:** Two containers cannot bind the same host port. If you get
`Bind for 0.0.0.0:3000 failed: port is already allocated`, something else is already
using that host port.
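The same rule applies to any process, not just containers: only one listener can own a given address and port. This small sketch reproduces the conflict with two plain sockets on the loopback interface; the variable names are arbitrary.

```python
import socket

# First listener takes a free port chosen by the OS.
first = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
first.bind(("127.0.0.1", 0))
first.listen()
port = first.getsockname()[1]

# Second socket tries to bind the very same address and port.
second = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
conflict = None
try:
    second.bind(("127.0.0.1", port))
except OSError as exc:
    conflict = exc            # typically "Address already in use"
finally:
    second.close()
    first.close()

print(f"Second bind to port {port}:", conflict or "succeeded")
```

Docker's "port is already allocated" message is the container-level version of this error. `sudo ss -tlnp` on the host shows which process holds the port.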
**network_mode: host and named networks cannot coexist:**
```yaml
network_mode: host # This means the container has NO network isolation
# You cannot also add networks: [...] — they conflict
```
---
## Practice Exercises
1. Pull the `nginx:alpine` image and run it: `docker run -d -p 8080:80 nginx:alpine`
Visit `http://localhost:8080`. Then exec into it and find the nginx config.
2. Run two containers (`alpine`) on the same custom network and verify they can
ping each other by container name
3. Create a named volume and mount it in two different containers. Write a file from
one container and read it from the other
4. Write a `docker-compose.yml` with three services: one nginx, one redis, one alpine
that waits for redis to be healthy before starting
5. Use `docker inspect` to find the IP address of your `forgejo` container on the
`kitestacks` network. Confirm it matches what Docker DNS resolves.
---
**Next:** [Part 5 — Networking](05-networking.md)

View file

@ -0,0 +1,352 @@
# Without AI — Part 5: Networking
**Track:** Advanced (No AI)
**Time for this section:** 1-2 weeks
Networking is the hardest part to learn and the most important. Every problem in this
homelab ultimately involves a packet trying to get somewhere. If you understand how
packets travel, you can debug anything.
---
## IP Addresses
Every device on a network has an IP address — a number that identifies it.
**IPv4:** Four octets (0-255) separated by dots: `192.168.1.205`
**Address ranges:**
| Range | Who Owns It | Used For |
|-------|------------|---------|
| `10.0.0.0/8` | Private | Corporate networks, VPNs |
| `172.16.0.0/12` | Private | Docker bridge networks |
| `192.168.0.0/16` | Private | Home networks (your router) |
| `100.64.0.0/10` | Shared | Tailscale uses this range |
| Everything else | Public | Routable on the internet |
Private addresses are not routable on the internet. Your home router uses NAT
(Network Address Translation) to let private-addressed devices reach the internet.
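The table above can be checked mechanically with Python's standard `ipaddress` module. This is a simplified sketch: `RANGES` and `classify` are names invented here, and the function follows the table's "everything else is public" shortcut, so it ignores loopback, link-local and other special ranges. The `100.123.0.1` address is only an example from the Tailscale range.

```python
import ipaddress

# Ranges from the table, in the order they are checked.
RANGES = [
    ("10.0.0.0/8", "private"),
    ("172.16.0.0/12", "private"),
    ("192.168.0.0/16", "private"),
    ("100.64.0.0/10", "shared (used by Tailscale)"),
]


def classify(address):
    """Return the table's label for an IPv4 address."""
    ip = ipaddress.ip_address(address)
    for cidr, label in RANGES:
        if ip in ipaddress.ip_network(cidr):
            return label
    return "public"


for addr in ["192.168.1.205", "172.17.0.2", "100.123.0.1", "8.8.8.8"]:
    print(f"{addr:15} {classify(addr)}")
```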
---
## Subnetting and CIDR Notation
CIDR (Classless Inter-Domain Routing) notation describes a range of IP addresses:
```
192.168.1.0/24
            └── prefix length: how many bits are fixed
```
An IPv4 address is 32 bits. A `/24` means the first 24 bits are fixed (the network),
leaving 8 bits for hosts. `2^8 = 256` addresses, minus network (`.0`) and broadcast (`.255`)
= 254 usable host addresses.
| CIDR | Addresses | Usable | Example |
|------|-----------|--------|---------|
| `/32` | 1 | 1 | A single IP |
| `/31` | 2 | 2 | Point-to-point link |
| `/30` | 4 | 2 | Small link |
| `/29` | 8 | 6 | Small subnet |
| `/28` | 16 | 14 | |
| `/27` | 32 | 30 | |
| `/26` | 64 | 62 | |
| `/25` | 128 | 126 | |
| `/24` | 256 | 254 | Typical home/office LAN |
| `/16` | 65,536 | 65,534 | Large network |
| `/12` | 1,048,576 | — | Docker range: 172.16.0.0/12 |
| `/8` | 16,777,216 | — | 10.x.x.x range |
**Subnetting practice:** Calculating the host range of `172.17.0.0/16`:
- Fixed: `172.17` (first 16 bits)
- Variable: last 16 bits
- Host range: `172.17.0.1` to `172.17.255.254`
- This covers all of `172.17.x.x`
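The arithmetic above is easy to verify with the standard `ipaddress` module. In this sketch, `describe` is a hypothetical helper that reports host bits, total and usable addresses, and the first and last host for any IPv4 CIDR block. It treats `/31` and `/32` as special cases with no reserved addresses, matching the table.

```python
import ipaddress


def describe(cidr):
    """Summarise an IPv4 network: sizes and usable host range."""
    net = ipaddress.ip_network(cidr)
    host_bits = net.max_prefixlen - net.prefixlen
    total = net.num_addresses                      # 2 ** host_bits
    # Ordinary subnets reserve the network and broadcast addresses.
    reserved = 0 if net.prefixlen >= 31 else 1
    return {
        "host_bits": host_bits,
        "total": total,
        "usable": total - 2 * reserved,
        "first": str(net.network_address + reserved),
        "last": str(net.broadcast_address - reserved),
    }


for cidr in ["192.168.1.0/24", "172.17.0.0/16", "10.0.0.0/30"]:
    print(cidr, describe(cidr))
```

Working a few blocks by hand first and then checking them with `describe` is a quick way to practise subnetting.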
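**Why `/12` covers Docker's default networks:**
`172.16.0.0/12` covers `172.16.0.0` through `172.31.255.255`.
Docker's default address pool hands out bridge networks from `172.17.0.0/16`, `172.18.0.0/16`, and so on.
All of those are inside `172.16.0.0/12`, so one ufw rule covers them. Only when that pool is
exhausted, or when custom address pools are configured, does Docker pick subnets from other ranges such as `192.168.0.0/16`.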
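A quick containment check makes the claim testable. The sketch below uses `ipaddress.ip_network(...).subnet_of(...)` from the standard library; the list of candidate subnets is only an example, and the last entry is deliberately outside the block for contrast.

```python
import ipaddress

docker_block = ipaddress.ip_network("172.16.0.0/12")
print("Block spans", docker_block.network_address, "to", docker_block.broadcast_address)

candidates = ["172.17.0.0/16", "172.18.0.0/16", "172.31.0.0/16", "192.168.0.0/20"]
results = {}
for cidr in candidates:
    # subnet_of() is True when every address of cidr lies inside docker_block.
    results[cidr] = ipaddress.ip_network(cidr).subnet_of(docker_block)
    print(f"{cidr:16} inside 172.16.0.0/12: {results[cidr]}")
```

You can feed it the real subnets on a host, taken from `docker network inspect`, to confirm that a firewall rule written for `172.16.0.0/12` actually matches them.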
---
## Ports
A port is a 16-bit number (0-65535) that identifies which service on a host should
handle a connection.
```
IP address = the building
Port = the apartment number
```
**Common ports** (0-1023 are the well-known range; 5432 and 6379 are registered ports in 1024-49151):
| Port | Protocol | Service |
|------|----------|---------|
| 22 | TCP | SSH |
| 25 | TCP | SMTP (email sending) |
| 53 | UDP/TCP | DNS |
| 80 | TCP | HTTP |
| 443 | TCP | HTTPS |
| 5432 | TCP | PostgreSQL |
| 6379 | TCP | Redis |
**Ephemeral ports (49152-65535):** The OS assigns these for outbound connections. This is the IANA range; Linux defaults to 32768-60999.
**In Docker:**
```yaml
ports:
  - "9100:9100"   # host:container — both the same number
```
Container port 9100 is mapped to host port 9100.
External systems connect to the host IP on port 9100.
Internally, containers on the Docker network use the container port directly.
---
## DNS (Domain Name System)
DNS is a distributed database that maps names to IP addresses.
**The hierarchy:**
```
. (root)
└── com
    └── kitestacks
        ├── www → Cloudflare anycast IP
        ├── auth → Cloudflare anycast IP
        └── grafana → Cloudflare anycast IP
```
**Resolution process for `grafana.kitestacks.com`:**
1. Browser checks local cache — not found
2. Browser asks OS resolver (usually `127.0.0.53`)
3. OS asks the configured DNS server (your home router, or 8.8.8.8)
4. Resolver asks root nameservers: "who handles `.com`?"
5. Root says: "Ask Verisign's servers"
6. Resolver asks Verisign: "who handles `kitestacks.com`?"
7. Verisign says: "Ask Cloudflare's nameservers (`vera.ns.cloudflare.com`)"
8. Resolver asks Cloudflare: "what is `grafana.kitestacks.com`?"
9. Cloudflare returns: "Cloudflare's anycast IP: 104.x.x.x"
10. Browser connects to 104.x.x.x on port 443
**Internal Docker DNS:**
Inside the `kitestacks` Docker network, Docker runs a DNS server at `127.0.0.11`.
When cloudflared resolves `grafana`, Docker DNS returns the container's bridge IP.
```bash
# Check what an external name resolves to
dig grafana.kitestacks.com
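# Check DNS from inside the Docker network (the cloudflared image has no shell or
# tools, so use a throwaway Alpine container attached to the same network)
docker run --rm --network kitestacks alpine nslookup grafana
docker run --rm --network kitestacks alpine cat /etc/resolv.conf # Shows the DNS server: 127.0.0.11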
```
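From Python, the same lookup goes through the operating system's resolver, so it sees whatever DNS server the machine or container is configured to use. A minimal sketch, where `resolve` is a made-up helper name and `localhost` is used only because it resolves without network access:

```python
import socket


def resolve(name):
    """Return the sorted, de-duplicated addresses the OS resolver gives for name."""
    infos = socket.getaddrinfo(name, None)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos})


print(resolve("localhost"))
```

Run on the host, `resolve("grafana.kitestacks.com")` should return public Cloudflare addresses. Run inside a container on the `kitestacks` network, `resolve("grafana")` should return the container's bridge IP instead.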
---
## HTTP and HTTPS
**HTTP:** Plain text request/response protocol. Anyone who can see the traffic can read it.
```
GET /api/health HTTP/1.1
Host: grafana.kitestacks.com
Accept: application/json

HTTP/1.1 200 OK
Content-Type: application/json

{"ok": true}
```
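Because HTTP is plain text, you can speak it by hand over a raw socket. The sketch below is a self-contained illustration, not part of the homelab: it starts a tiny local server from Python's standard library (the `HealthHandler` class is invented for the demo), sends a hand-written request, and prints the raw response bytes.

```python
import socket
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class HealthHandler(BaseHTTPRequestHandler):
    """Answer every GET with a small JSON body."""

    def do_GET(self):
        body = b'{"ok": true}'
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):    # keep the demo output quiet
        pass


server = HTTPServer(("127.0.0.1", 0), HealthHandler)   # port 0 = any free port
port = server.server_address[1]
threading.Thread(target=server.serve_forever, daemon=True).start()

# The request is just text: request line, headers, blank line.
request = (
    "GET /api/health HTTP/1.1\r\n"
    "Host: localhost\r\n"
    "Connection: close\r\n"
    "\r\n"
)
with socket.create_connection(("127.0.0.1", port), timeout=5) as sock:
    sock.sendall(request.encode("ascii"))
    chunks = []
    while True:
        data = sock.recv(4096)
        if not data:                 # server closed the connection
            break
        chunks.append(data)

raw = b"".join(chunks).decode("utf-8")
server.shutdown()
server.server_close()
print(raw)
```

The printed response has the same shape as the example above: a status line, headers, a blank line, then the body. Anyone on the path could read it, which is the problem TLS solves.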
**HTTPS:** HTTP inside a TLS-encrypted tunnel. The connection is encrypted from the client to
Cloudflare's edge, and from the edge to `cloudflared` it travels inside the encrypted tunnel.
Only the last hop, from `cloudflared` to your containers inside the Docker network, is
plain HTTP. This is fine because that traffic never leaves the host.
**TLS handshake (simplified):**
1. Client says "hello, I support these cipher suites"
2. Server sends its certificate (proves it is `kitestacks.com`)
3. Client verifies certificate against trusted Certificate Authorities
4. Both sides agree on encryption keys (Diffie-Hellman key exchange)
5. Encrypted connection established
6. HTTP requests flow inside this encrypted tunnel
In this homelab, Cloudflare handles TLS entirely. Your containers never see TLS.
---
## Cloudflare Tunnel — Technical Details
**What cloudflared actually does:**
```bash
# Watch cloudflared connect
docker logs cloudflared -f
# Look for a line such as: Registered tunnel connection connIndex=0 ... location=ORD
# (exact wording varies by cloudflared version)
# ORD = Chicago data center (or nearest Cloudflare POP to you)
```
cloudflared establishes persistent, multiplexed connections to Cloudflare's
edge network. When a request comes in:
```
Internet user → Cloudflare edge → tunnel (multiplexed) → cloudflared
cloudflared reads Ingress rules from Cloudflare API:
  grafana.kitestacks.com → http://grafana:3000
cloudflared → Docker DNS → grafana container IP → sends request
```
The tunnel connection uses QUIC (UDP-based) when possible and falls back to HTTP/2 over TCP.
**Active-Active with two connectors:**
Each connector registers separately. Cloudflare maintains a list of active connectors.
Incoming requests are distributed across connectors by Cloudflare — no configuration
needed on your end. If one connector drops, the others take all traffic within seconds.
---
## Tailscale — WireGuard Under the Hood
Tailscale is a managed WireGuard VPN. Understanding WireGuard explains Tailscale.
**WireGuard:**
- Modern VPN protocol, designed in 2016
- Uses UDP (faster than TCP-based VPNs like OpenVPN)
- Cryptography: Curve25519 key exchange, ChaCha20-Poly1305 encryption
- Each peer has a public/private key pair (like SSH keys)
- Configured via static peer lists with IP allowances
**The NAT problem:** Home machines are behind NAT. Their public IP is the router's IP,
not their own. Two NAT-ed machines cannot easily make direct connections.
**Tailscale's solution — UDP hole punching:**
1. Both machines connect to Tailscale's coordination server, which exchanges their public keys
and candidate endpoints; DERP relay servers can carry the encrypted traffic until a direct path exists
2. Tailscale orchestrates a "hole punch": both machines send packets to each other
simultaneously, which opens NAT mappings on both routers
3. Direct WireGuard connection established peer-to-peer
4. The DERP relays drop out of the data path; if hole punching fails, traffic keeps flowing through a relay
```bash
# Check Tailscale status
tailscale status
# See your device's Tailscale IP
tailscale ip -4
# Check connectivity to kscloud1
tailscale ping 100.123.x.x
# See if connection is direct or via relay
tailscale status --json | python3 -m json.tool | grep -A5 "kscloud1"
```
**Why Tailscale IPs are stable:** Each device's `100.x.x.x` IP is tied to its machine
identity in Tailscale's database. It does not change when you move networks or reconnect.
---
## Firewalls (ufw)
ufw (Uncomplicated Firewall) is a frontend for iptables/nftables.
**kscloud1's firewall configuration:**
```bash
# View current rules
sudo ufw status verbose
# Default policies
sudo ufw default deny incoming # Block all inbound by default
sudo ufw default allow outgoing # Allow all outbound
# Allow specific services
sudo ufw allow ssh # Allow SSH (port 22)
sudo ufw allow from 172.16.0.0/12 to any port 8000 proto tcp # Docker → metrics API
# Why 172.16.0.0/12 and not just the specific Docker subnet?
# Docker assigns each new bridge network the next free subnet from its default 172.17-172.31 pool.
# /12 covers that whole pool, so the rule keeps working when networks are recreated.
```
**The ufw/Docker conflict:** Docker modifies iptables rules directly, bypassing ufw.
This means Docker's port mappings (`-p 9100:9100`) are accessible regardless of ufw rules.
Only services running in `network_mode: host` are controlled by ufw.
kscloud1's metrics API uses `network_mode: host`, so it needs an explicit ufw allow rule
for Docker containers to reach it.
---
## Reverse Proxies
A reverse proxy receives requests on behalf of backend services:
```
Client → Reverse Proxy → Backend A
                       → Backend B
                       → Backend C
```
In this homelab:
- **Cloudflare + cloudflared** — the primary reverse proxy routing by hostname
- **nginx (homepage container)** — secondary proxy forwarding `/api/*` to metrics API
nginx config that proxies API calls:
```nginx
location /api/ {
    proxy_pass http://host.docker.internal:8000/;
    proxy_set_header Host $host;
    proxy_set_header X-Real-IP $remote_addr;
}
```
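`host.docker.internal` resolves to the host machine's IP from inside a Docker container.
On Linux the name exists only if the container is started with
`extra_hosts: ["host.docker.internal:host-gateway"]` (Docker Desktop defines it automatically).
This lets the nginx container reach the metrics API running in `network_mode: host`.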
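The core idea of routing by hostname fits in a few lines. This is a toy sketch, not how cloudflared or nginx are implemented: `ROUTES` is a hypothetical table in the spirit of the tunnel's hostname rules (the `forgejo:3000` backend is an assumption for illustration), and `pick_backend` only does an exact match on the `Host` header.

```python
# Hypothetical hostname → backend table, for illustration only.
ROUTES = {
    "grafana.kitestacks.com": "http://grafana:3000",
    "gitforge.kitestacks.com": "http://forgejo:3000",   # assumed backend address
}


def pick_backend(host_header, routes=ROUTES):
    """Return the backend URL for a Host header, or None if no rule matches."""
    # Drop an optional ":port" suffix and compare case-insensitively.
    hostname = host_header.split(":", 1)[0].lower()
    return routes.get(hostname)


for host in ["grafana.kitestacks.com", "Gitforge.kitestacks.com:443", "unknown.example"]:
    print(f"{host:30} -> {pick_backend(host)}")
```

A real reverse proxy adds much more, such as wildcard rules, path matching, header rewriting and connection reuse, but every request still starts with a lookup like this one.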
---
## Diagnosing Network Problems
**"I can't reach the service from outside"**
```bash
# Is cloudflared running and connected?
docker logs cloudflared | tail -20
# Is the target container running and on the right network?
docker inspect homepage --format '{{range .NetworkSettings.Networks}}{{println .}}{{end}}'
# Can a container on the same network reach it? (the cloudflared image has no curl or shell)
docker run --rm --network kitestacks alpine wget -qO- http://homepage:3000
```
**"Two containers can't talk to each other"**
```bash
# Are they on the same network?
docker network inspect kitestacks | grep -A5 "Containers"
# DNS resolution working?
docker exec service-a nslookup service-b
# Is the target port open inside the container?
docker exec service-b ss -tlnp
```
**"The database won't accept connections"**
```bash
# Is Postgres listening?
docker exec authentik-postgres ss -tlnp | grep 5432
# From another container, can we reach it?
docker exec authentik nc -zv authentik-postgres 5432
# Is it bound to the right interface on kscloud1?
docker exec authentik-postgres ss -tlnp | grep 5432
# Should show: *:5432 or 100.123.x.x:5432, not 127.0.0.1:5432
```
---
**Next:** [Part 6 — Full Build](06-full-build.md)

View file

@ -0,0 +1,478 @@
# Without AI — Part 6: Full Build
**Track:** Advanced (No AI)
**Time for this section:** 4-8 weeks
You now have the foundations: Linux, Bash, Python, Docker, and Networking.
This section builds the entire KiteStacks homelab from scratch — command by command,
with every command explained.
---
## Before You Start
You need:
- Ubuntu 24.04 installed on your home PC (monk) and your VPS (kscloud1)
- A domain name with DNS managed by Cloudflare
- SSH key access to kscloud1
- Tailscale account and CLI installed on both machines
- Cloudflare account with a tunnel created (token saved)
---
## Phase 1 — Prepare Both Machines
Run on **both monk and kscloud1**:
```bash
# Update the system
sudo apt update && sudo apt upgrade -y
# Install essential tools
sudo apt install -y curl git nano wget python3 python3-pip ufw
# Install Docker
sudo apt install -y ca-certificates curl
sudo install -m 0755 -d /etc/apt/keyrings
sudo curl -fsSL https://download.docker.com/linux/ubuntu/gpg \
-o /etc/apt/keyrings/docker.asc
sudo chmod a+r /etc/apt/keyrings/docker.asc
echo "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.asc] \
https://download.docker.com/linux/ubuntu \
$(. /etc/os-release && echo "$VERSION_CODENAME") stable" | \
sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
sudo apt update
sudo apt install -y docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin
# Enable and start Docker
sudo systemctl enable docker
sudo systemctl start docker
# Add your user to the docker group (avoids sudo for every docker command)
sudo usermod -aG docker $USER
# Log out and back in for this to take effect
# Create the shared Docker network (prefix with sudo if you have not logged out and back in yet)
docker network create kitestacks
```
On **kscloud1** specifically, set up the firewall:
```bash
sudo ufw default deny incoming
sudo ufw default allow outgoing
sudo ufw allow ssh
# Allow Docker bridge networks to reach host port 8000 (metrics API)
sudo ufw allow from 172.16.0.0/12 to any port 8000 proto tcp
sudo ufw --force enable
sudo ufw status verbose
```
Install Tailscale on both machines:
```bash
curl -fsSL https://tailscale.com/install.sh | sh
sudo tailscale up
# Follow the URL to authenticate
tailscale ip -4 # Note this IP — you will use it throughout the build
```
---
## Phase 2 — Cloudflared (Tunnel Connector)
Run on **monk**:
```bash
mkdir -p ~/kitestacks-live/docker/cloudflared
cd ~/kitestacks-live/docker/cloudflared
cat > .env <<'EOF'
TUNNEL_TOKEN=your-tunnel-token-from-cloudflare
EOF
cat > docker-compose.yml <<'EOF'
services:
  cloudflared:
    image: cloudflare/cloudflared:latest
    container_name: cloudflared
    restart: unless-stopped
    command: tunnel --no-autoupdate run
    environment:
      - TUNNEL_TOKEN=${TUNNEL_TOKEN:?set TUNNEL_TOKEN in .env}
    networks:
      - default
      - kitestacks
networks:
  kitestacks:
    external: true
EOF
docker compose up -d
docker logs cloudflared # Confirm "Connection established"
```
**Why `${TUNNEL_TOKEN:?set TUNNEL_TOKEN in .env}`:**
The `:?` syntax means: if the variable is unset or empty, exit with the given error message.
This prevents silently running cloudflared with no token (which would produce a confusing error).
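The same `:?` expansion exists in POSIX shell, so its behavior can be tried outside Compose. The following is a minimal sketch, assuming a hypothetical helper named `require_token` that is not part of KiteStacks:

```bash
#!/usr/bin/env bash
# require_token is a hypothetical helper for illustration only.
# It succeeds only when TUNNEL_TOKEN is set and non-empty.
require_token() {
  # The subshell contains the failure: ${VAR:?msg} aborts a
  # non-interactive shell when VAR is unset or empty.
  ( : "${TUNNEL_TOKEN:?set TUNNEL_TOKEN in .env}" ) 2>/dev/null
}

unset TUNNEL_TOKEN
require_token || echo "unset: refused"

TUNNEL_TOKEN=""
require_token || echo "empty: refused"

TUNNEL_TOKEN="example-token"
require_token && echo "set: accepted"
```

Using `:-` instead of `:?` would silently substitute a default, which is the failure mode the compose file is trying to avoid.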
Repeat on **kscloud1** using the same token, same docker-compose.yml, at `/opt/kitestacks/docker/cloudflared/`.
---
## Phase 3 — Shared Database Layer (on kscloud1)
The shared Postgres and Redis will run on kscloud1. Both monk's and kscloud1's Authentik
will point to these. Forgejo will use the same Postgres (different database).
On **kscloud1**:
```bash
# Get kscloud1's Tailscale IP
TAILSCALE_IP=$(tailscale ip -4)
echo "Tailscale IP: $TAILSCALE_IP"
mkdir -p /opt/kitestacks/docker/authentik
cd /opt/kitestacks/docker/authentik
# Generate a strong Postgres password
PG_PASS=$(openssl rand -base64 32 | tr -d '/+=')
echo "Postgres password: $PG_PASS" # Save this
cat > .env <<EOF
PG_PASS=${PG_PASS}
EOF
cat > docker-compose.yml <<EOF
services:
  authentik-postgres:
    image: postgres:16-alpine
    container_name: authentik-postgres
    restart: unless-stopped
    environment:
      POSTGRES_PASSWORD: \${PG_PASS}
      POSTGRES_USER: authentik
      POSTGRES_DB: authentik
    ports:
      - "${TAILSCALE_IP}:5432:5432"
    volumes:
      - ./postgres:/var/lib/postgresql/data
    networks:
      - kitestacks
      - authentik_default
  authentik-redis:
    image: redis:7-alpine
    container_name: authentik-redis
    restart: unless-stopped
    ports:
      - "${TAILSCALE_IP}:6379:6379"
    networks:
      - kitestacks
      - authentik_default
networks:
  kitestacks:
    external: true
  authentik_default:
    name: authentik_default
EOF
docker compose up -d
docker ps # Confirm both containers are Up
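# Verify the published port is bound to the Tailscale IP only (NOT 0.0.0.0).
# Check on the host: inside the container, Postgres listens on all container interfaces regardless.
sudo ss -tlnp | grep 5432
# Expected: LISTEN ... 100.x.x.x:5432 (0.0.0.0:5432 would mean the port is publicly exposed)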
```
**Why the Tailscale IP binding matters:**
`"${TAILSCALE_IP}:5432:5432"` tells Docker: bind host port 5432 only on the Tailscale
interface. If you used `"5432:5432"` (or `"0.0.0.0:5432:5432"`), Postgres would be
reachable from the public internet — a serious security risk. Only devices on your
Tailscale network can reach `100.x.x.x:5432`.
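The difference between the two mapping forms can be sketched with a small parser. This is an illustration only: `binding_scope` is a hypothetical helper, it handles IPv4 short-syntax mappings only, and `100.64.0.5` is a placeholder address.

```bash
#!/usr/bin/env bash
# binding_scope is a hypothetical helper: it reads a Compose short-syntax
# port mapping (IPv4 only) and reports which host interfaces it binds.
binding_scope() {
  local mapping="$1"
  local -a parts
  IFS=':' read -r -a parts <<< "$mapping"
  # Three fields means HOST_IP:HOST_PORT:CONTAINER_PORT.
  if [ "${#parts[@]}" -eq 3 ] && [ "${parts[0]}" != "0.0.0.0" ]; then
    echo "host address ${parts[0]} only"
  else
    echo "all host interfaces"
  fi
}

binding_scope "100.64.0.5:5432:5432"   # Tailscale-style binding
binding_scope "0.0.0.0:5432:5432"      # explicit wildcard
binding_scope "5432:5432"              # no host IP given, same as wildcard
```

Only the first form keeps the database off the public interface; the other two leave exposure up to whatever firewall sits in front of the host.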
Create the Forgejo database:
```bash
docker exec -e PGPASSWORD="${PG_PASS}" authentik-postgres \
psql -U authentik -c "CREATE USER forgejo WITH PASSWORD 'forgejo-password-here';"
docker exec -e PGPASSWORD="${PG_PASS}" authentik-postgres \
psql -U authentik -c "CREATE DATABASE forgejo OWNER forgejo;"
```
---
## Phase 4 — Authentik (SSO)
On **monk** first:
```bash
mkdir -p ~/kitestacks-live/docker/authentik
cd ~/kitestacks-live/docker/authentik
# Get kscloud1's Tailscale IP
KSCLOUD1_TAILSCALE=100.123.x.x # Replace with your actual value
# Generate Authentik secret key (must be same on both hosts)
SECRET_KEY=$(openssl rand -base64 60 | tr -d '\n')
echo "Secret key: $SECRET_KEY" # Save this — both hosts need the SAME key
cat > .env <<EOF
PG_PASS=your-postgres-password-from-phase-3
AUTHENTIK_SECRET_KEY=${SECRET_KEY}
AUTHENTIK_POSTGRESQL__HOST=${KSCLOUD1_TAILSCALE}
AUTHENTIK_POSTGRESQL__USER=authentik
AUTHENTIK_POSTGRESQL__NAME=authentik
AUTHENTIK_POSTGRESQL__PASSWORD=your-postgres-password-from-phase-3
AUTHENTIK_REDIS__HOST=${KSCLOUD1_TAILSCALE}
AUTHENTIK_BOOTSTRAP_EMAIL=your@email.com
AUTHENTIK_BOOTSTRAP_PASSWORD=choose-strong-password
EOF
cat > docker-compose.yml <<'EOF'
services:
  authentik:
    image: ghcr.io/goauthentik/server:latest
    container_name: authentik
    restart: unless-stopped
    command: server
    env_file: .env
    networks:
      - kitestacks
  authentik-worker:
    image: ghcr.io/goauthentik/server:latest
    container_name: authentik-worker
    restart: unless-stopped
    command: worker
    env_file: .env
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
    networks:
      - kitestacks
networks:
  kitestacks:
    external: true
EOF
docker compose up -d
# Wait for Authentik to be healthy (takes ~2 minutes on first boot)
until [[ "$(docker inspect --format '{{.State.Health.Status}}' authentik)" == "healthy" ]]; do
echo "Waiting for Authentik... $(docker inspect --format '{{.State.Health.Status}}' authentik)"
sleep 10
done
echo "Authentik is healthy"
```
**What happens on first boot:** Authentik runs database migrations (creates all tables),
generates cryptographic keys, and starts the server. The worker process handles
background jobs (email, background flows). Both need the same `.env` file.
**Why `AUTHENTIK_REDIS__HOST` and not just `REDIS_HOST`:**
Authentik uses a config format where `__` in environment variable names means "nested key".
`AUTHENTIK_POSTGRESQL__HOST` maps to `authentik.postgresql.host` in the config tree.
On **kscloud1**, create the same Authentik setup pointing to the local Postgres:
```bash
# On kscloud1, AUTHENTIK_POSTGRESQL__HOST should be authentik-postgres
# (via the Docker network), not the Tailscale IP
# kscloud1's Authentik is on the same Docker network as Postgres
```
---
## Phase 5 — Forgejo
On **monk**:
```bash
mkdir -p ~/kitestacks-live/docker/forgejo
cd ~/kitestacks-live/docker/forgejo
KSCLOUD1_TAILSCALE=100.123.x.x # kscloud1's Tailscale IP
cat > .env <<EOF
FORGEJO__database__DB_TYPE=postgres
FORGEJO__database__HOST=${KSCLOUD1_TAILSCALE}:5432
FORGEJO__database__NAME=forgejo
FORGEJO__database__USER=forgejo
FORGEJO__database__PASSWD=forgejo-password-from-phase-3
FORGEJO__server__DOMAIN=gitforge.yourdomain.com
FORGEJO__server__ROOT_URL=https://gitforge.yourdomain.com
FORGEJO__server__SSH_DOMAIN=gitforge.yourdomain.com
EOF
cat > docker-compose.yml <<'EOF'
services:
  forgejo:
    image: codeberg.org/forgejo/forgejo:latest
    container_name: forgejo
    restart: unless-stopped
    env_file: .env
    volumes:
      - ./data:/data
    networks:
      - kitestacks
networks:
  kitestacks:
    external: true
EOF
docker compose up -d
docker logs forgejo -f # Watch for errors
```
Visit `gitforge.yourdomain.com`. Complete the initial setup, then create your admin account.
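On **kscloud1**: Same configuration. Both Forgejo instances point to the same Postgres `forgejo` database, so users, settings, and repository metadata are identical on both. The Git repository contents themselves live under `./data` on each host, so they must be kept in sync separately if both instances need to serve the same repos.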
---
## Phase 6 — All Remaining Services
For each remaining service, the pattern is the same:
1. `mkdir -p ~/kitestacks-live/docker/<service>`
2. Create `.env` with secrets
3. Create `docker-compose.yml`
4. `docker compose up -d`
5. Verify with `docker ps` and `docker logs <container>`
Detailed compose files for each service are in `~/kitestacks-homelab/apps/<service>/`.
Use those as your reference — read each file before running it.
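Steps 1 and 2 of the pattern can be sketched as a reusable function. This is a minimal sketch under stated assumptions: `scaffold_service` is a hypothetical helper, and the demo writes to a throwaway directory rather than `~/kitestacks-live/docker`.

```bash
#!/usr/bin/env bash
# scaffold_service is a hypothetical helper covering steps 1 and 2 of the
# pattern: create the service directory and an empty, private .env file.
scaffold_service() {
  local base="$1" service="$2"
  local dir="${base}/${service}"
  mkdir -p "$dir" || return 1
  # Never overwrite an existing .env: it may already hold real secrets.
  if [ ! -e "${dir}/.env" ]; then
    # umask 177 makes the new file readable and writable by the owner only.
    ( umask 177 && : > "${dir}/.env" ) || return 1
  fi
  printf '%s\n' "$dir"
}

# Example run against a throwaway directory.
demo_base="$(mktemp -d)"
scaffold_service "$demo_base" example-service
```

Creating `.env` with owner-only permissions up front means secrets pasted in later are never briefly world-readable.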
Key services and their main configuration points:
**Karakeep:** Provider ID is `custom` (not `authentik`) — OAuth redirect URI is
`https://links.yourdomain.com/api/auth/callback/custom`.
**Kavita:** OIDC must be configured via web UI (Settings → OIDC), not by file editing.
Authority URL requires trailing slash.
**BookStack:** After first start, fix cache permissions:
```bash
docker exec bookstack chown -R abc:users /config/www/framework/cache/
docker compose restart bookstack
```
**kitestacks-metrics-api:**
```yaml
services:
  kitestacks-metrics-api:
    image: your-metrics-api-image # Build from apps/kitestacks-metrics-api/
    container_name: kitestacks-metrics-api
    restart: unless-stopped
    network_mode: host # Must be host, not the kitestacks network
    pid: host # Must be host: reads /proc for real stats
    environment:
      - FORGEJO_API_BASE=https://gitforge.yourdomain.com
      - FORGEJO_TOKEN=your-forgejo-api-token
```
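Note: `network_mode: host` and `networks:` cannot coexist. The metrics API is reachable
at `host.docker.internal:8000` from other containers. On Linux that name resolves only when
the calling container maps it, for example with `extra_hosts: ["host.docker.internal:host-gateway"]`
in its compose file.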
---
## Phase 7 — SSO Configuration
For each service, in Authentik admin panel (`auth.yourdomain.com/if/admin/`):
1. **Applications → Providers → Create → OAuth2/OpenID Provider**
- Client type: Confidential
- Redirect URIs: service-specific (see SSO guide)
- Signing key: authentik Self-signed Certificate
- Scopes: openid, email, profile
2. **Applications → Applications → Create**
- Provider: the one you just created
- Launch URL: the service's public URL
3. (For sensitive services) **Policy Binding** → restrict to `homelab-admin` group
OAuth2 code TTL — increase to prevent `invalid_grant` during monk reconnect:
```bash
# Connect to shared Postgres from kscloud1
docker exec -it authentik-postgres psql -U authentik -d authentik
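-- Check the current value first; Authentik's admin UI shows this field as a duration string such as minutes=1
SELECT name, access_code_validity FROM authentik_providers_oauth2_oauth2provider;
-- Increase code lifetime for all providers to 10 minutes, matching that format
UPDATE authentik_providers_oauth2_oauth2provider
SET access_code_validity = 'minutes=10';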
-- Restart both Authentik instances after this
\q
```
---
## Phase 8 — Push Everything to kscloud1
With monk as the source, push configurations to kscloud1:
```bash
# For each service, copy the docker-compose.yml and .env (with paths adjusted)
# The standard pattern:
for service in forgejo karakeep kavita grafana uptime-kuma bookstack osticket portainer; do
  ssh -i ~/.ssh/id_ed25519_kscloud1 kenpat@100.123.x.x \
    "mkdir -p /opt/kitestacks/docker/$service"
  scp -i ~/.ssh/id_ed25519_kscloud1 \
    ~/kitestacks-live/docker/$service/docker-compose.yml \
    ~/kitestacks-live/docker/$service/.env \
    kenpat@100.123.x.x:/opt/kitestacks/docker/$service/
done
```
Then on kscloud1, start each service:
```bash
for service in forgejo karakeep kavita grafana uptime-kuma bookstack osticket portainer; do
  cd /opt/kitestacks/docker/$service
  docker compose up -d
done
```
Verify all 11 services return the expected status:
```bash
for url in www auth gitforge ai links kavita grafana status wiki tasks portainer; do
code=$(curl -s -o /dev/null -w "%{http_code}" "https://${url}.yourdomain.com" --max-time 5)
echo "${url}.yourdomain.com: ${code}"
done
```
All should return 200 or 302 (redirect to login).
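The pass/fail rule can be made explicit so the loop flags problems instead of relying on reading the output by eye. This is a minimal sketch, assuming a hypothetical helper named `is_expected_status`; the codes in the demo loop are samples, not real results.

```bash
#!/usr/bin/env bash
# is_expected_status is a hypothetical helper: it accepts the two codes the
# guide treats as healthy and rejects everything else.
is_expected_status() {
  case "$1" in
    200|302) return 0 ;;
    *)       return 1 ;;
  esac
}

# Sample codes only; in the real loop, pass the value captured from curl.
for code in 200 302 502 000; do
  if is_expected_status "$code"; then
    echo "${code}: ok"
  else
    echo "${code}: needs attention"
  fi
done
```

curl reports `000` for `%{http_code}` when it receives no HTTP response at all (DNS failure, timeout), which is why that value is included in the sample.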
---
## Committing Everything to Forgejo
Once your homelab is working, commit all configurations:
```bash
cd ~/kitestacks-live
git init
git remote add origin https://gitforge.yourdomain.com/kenpat/kitestacks-live.git
# Add a .gitignore BEFORE adding files — never commit secrets
cat > .gitignore <<'EOF'
**/.env
**/data/
**/postgres/
**/config/
**/*.db
**/*.db-shm
**/*.db-wal
EOF
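# Stage the ignore file and the per-service compose files (this guide creates no top-level docker-compose.yml)
git add .gitignore docker/*/docker-compose.yml
git commit -m "initial: all service compose files"
git branch -M main   # git init may have created 'master'; the push below expects 'main'
git push -u origin main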
```
Your `.env` files (which contain passwords and tokens) must NEVER be committed.
The `.gitignore` above prevents this.
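A `.gitignore` entry only stops untracked files from being staged; `git add -f` or a file that is already tracked bypasses it. As an extra guard, the staged file list can be checked before each commit. This is a sketch only: `reject_env_files` is a hypothetical helper and the paths in the demo are examples.

```bash
#!/usr/bin/env bash
# reject_env_files is a hypothetical helper: it reads file paths from stdin
# (one per line) and fails if any of them is a .env file.
reject_env_files() {
  local path found=0
  while IFS= read -r path; do
    case "$path" in
      .env|*/.env)
        echo "refusing to commit secret file: $path" >&2
        found=1
        ;;
    esac
  done
  return "$found"
}

# Demo with example paths; the second one triggers the guard.
printf '%s\n' "docker/forgejo/docker-compose.yml" "docker/forgejo/.env" \
  | reject_env_files || echo "commit blocked"
```

One way to wire this in would be a `.git/hooks/pre-commit` script that runs `git diff --cached --name-only | reject_env_files`.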
---
**Next:** [Part 7 — Troubleshooting](07-troubleshooting.md)


@ -0,0 +1,389 @@
# Without AI — Part 7: Troubleshooting
**Track:** Advanced (No AI)
**Time for this section:** Ongoing (this is a reference you return to)
Troubleshooting is not a step you complete — it is a skill you build over time.
This section teaches the methodology and documents the real issues encountered
building KiteStacks, with full explanations of how each was diagnosed and fixed.
---
## The Troubleshooting Mindset
Before running any command, form a hypothesis. Before Googling, read the error.
**The diagnostic loop:**
1. **Observe** — what exactly is failing? URL? Error message? Which service?
2. **Hypothesize** — what could cause this? List 2-3 possibilities
3. **Test** — run the simplest command to prove or disprove your hypothesis
4. **Narrow** — eliminate possibilities until one remains
5. **Fix** — apply the fix
6. **Verify** — confirm the fix worked
7. **Document** — write what broke and what fixed it
The most common mistake: jumping to step 5 without completing steps 2-4.
---
## Diagnostic Commands to Know Cold
```bash
# Container status
docker ps # All running containers
docker ps -a # All containers (including stopped)
docker inspect <container> # Full container config and state
# Logs
docker logs <container> # All logs
docker logs <container> --tail 50 # Last 50 lines
docker logs <container> -f # Follow live
docker logs <container> --since 5m # Last 5 minutes
# Network
docker exec <container> curl -s http://other-container:port/health
docker exec <container> nslookup other-container
docker exec <container> ss -tlnp
docker network inspect kitestacks
# Disk and resources
docker system df # Docker disk usage
docker stats --no-stream # One-shot resource usage
df -h # Host disk usage
free -h # Host RAM
# DNS and HTTP from host
curl -sv https://grafana.kitestacks.com # -v = verbose (shows headers, TLS)
dig grafana.kitestacks.com # DNS lookup
```
---
## Real Issues Encountered Building KiteStacks
### Issue 1 — SSO: `invalid_grant` on OAuth Login (50% of the time)
**Symptom:** Clicking "Sign in with Authentik" in Grafana, Kavita, etc. sometimes
worked and sometimes showed `invalid_grant: The provided authorization grant is invalid`.
Happened roughly 50% of the time. No correlation to time of day.
**Observation:** The error appeared specifically after the authorization code redirect,
during the token exchange step.
**Hypothesis:**
1. Authentik configuration wrong (but then it would fail 100% of the time)
2. Network issue (but HTTP 400 means request reached Authentik)
3. The code created in step 1 is not found in step 2
**Testing:**
```bash
# Check whether both Authentik instances see the same data: run this on each host against its own Postgres
docker exec authentik-postgres psql -U authentik -d authentik -c "SELECT count(*) FROM authentik_providers_oauth2_authorizationcode;"
# On monk: count = 3
# On kscloud1: count = 1
# Different! Step 1 created the code in one DB, step 2 looked in the other.
```
**Root cause:** Two Authentik instances, two separate Postgres databases. Cloudflare
routes `/authorize` and `/application/o/token/` independently — they can hit different hosts.
**Fix:** Migrate both Authentik instances to a single shared Postgres, hosted on kscloud1,
bound to the Tailscale IP only.
```bash
# 1. Dump monk's Authentik DB
docker exec authentik-postgres pg_dump -U authentik authentik --clean --if-exists \
> /tmp/authentik_dump.sql
# 2. Restore to kscloud1's new shared Postgres
scp /tmp/authentik_dump.sql kenpat@100.123.x.x:/tmp/
ssh kenpat@100.123.x.x "docker exec -i authentik-postgres psql -U authentik -d authentik \
< /tmp/authentik_dump.sql"
# 3. Update monk's Authentik .env to point to kscloud1's Tailscale IP (edit these two lines in the file):
#    AUTHENTIK_POSTGRESQL__HOST=100.123.x.x
#    AUTHENTIK_REDIS__HOST=100.123.x.x
# 4. Remove monk's local Postgres and Redis
docker stop authentik-postgres authentik-redis # Stop, don't delete (keep data as backup)
# 5. Restart monk's Authentik
docker compose up -d
```
**Verification:** Logged in from a browser with DevTools open, watching Network tab.
`/authorize` returned 302 with a code. `/token` returned 200 with a JWT. Done.
**Lesson:** Stateful services with active-active routing need shared state. Any session,
token, or code stored in one instance's database is invisible to the other instance.
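The failure can be reproduced with a toy model. This is not Authentik code: each temporary directory stands in for one database, each file for one authorization code, and the function names are hypothetical.

```bash
#!/usr/bin/env bash
# Toy model, not Authentik code: each directory stands in for one database,
# and each file inside it stands in for one authorization code.
issue_code()  { : > "$1/$2"; }                       # args: store dir, code
redeem_code() { [ -e "$1/$2" ] && rm -f "$1/$2"; }   # codes are single use

store_monk="$(mktemp -d)"
store_kscloud1="$(mktemp -d)"

# Split state: /authorize lands on monk, /token lands on kscloud1.
issue_code "$store_monk" code-1
redeem_code "$store_kscloud1" code-1 || echo "split stores: invalid_grant"

# Shared state: both hosts read and write the same store.
shared="$(mktemp -d)"
issue_code "$shared" code-2
redeem_code "$shared" code-2 && echo "shared store: token issued"
```

With random routing across two hosts, the split case fails whenever the two requests land on different hosts, which matches the roughly 50% failure rate in the symptom.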
---
### Issue 2 — Phantom Third Connector in Cloudflare Dashboard
**Symptom:** Cloudflare Tunnel showed 3 active connectors when only 2 were expected
(monk + kscloud1). Which was the third?
**Investigation:**
```bash
# Check running Docker containers for cloudflared
docker ps | grep cloudflared
# Shows: one cloudflared container — expected
# Check for non-Docker cloudflared processes
ps aux | grep cloudflared
# Shows: TWO processes!
# /usr/bin/cloudflared (system-installed, running as a systemd service)
# /usr/local/bin/cloudflared (Docker container)
```
**Root cause:** A cloudflared systemd service was installed separately from the Docker
container. Both connected to the same tunnel with the same token, registering as separate connectors.
```bash
# Verify the systemd service
sudo systemctl status cloudflared
# Fix: disable the systemd service
sudo systemctl stop cloudflared
sudo systemctl disable cloudflared
# Verify only one connector process remains
ps aux | grep cloudflared
```
**Verification:** Cloudflare dashboard refreshed to show 2 connectors within 30 seconds.
**Lesson:** A service installed via package manager AND in Docker is a recipe for duplicate
processes. Check both `docker ps` and `ps aux` when troubleshooting unexpected behavior.
---
### Issue 3 — Karakeep SSO "Redirect URI Error"
**Symptom:** After configuring Authentik OAuth2 for Karakeep, clicking "Sign in"
showed "Redirect URI Error: The provided redirect_uri does not match any of the
allowed redirect URIs" from Authentik.
**Investigation:**
```bash
# Check what redirect URI was used in the OAuth2 request
# Read from Authentik's logs
docker logs authentik --tail 100 | grep "redirect_uri"
# Shows: redirect_uri=https://links.kitestacks.com/api/auth/callback/authentik
```
**Root cause:** Karakeep uses NextAuth.js internally with provider ID `custom`.
NextAuth constructs callback URLs as `/api/auth/callback/<provider-id>`.
The provider ID is `custom`, not `authentik`.
So the callback is `/api/auth/callback/custom`, not `/api/auth/callback/authentik`.
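The URL construction described above can be sketched in a few lines. `callback_url` is a hypothetical helper that mirrors the pattern; it is not taken from Karakeep or NextAuth.

```bash
#!/usr/bin/env bash
# callback_url is a hypothetical helper that mirrors the pattern described
# above: <base URL>/api/auth/callback/<provider-id>.
callback_url() {
  local base="${1%/}" provider_id="$2"   # drop one trailing slash from the base
  printf '%s/api/auth/callback/%s\n' "$base" "$provider_id"
}

callback_url "https://links.kitestacks.com" custom      # what the app sends
callback_url "https://links.kitestacks.com" authentik   # what was registered by mistake
```

The two results differ only in the final path segment, and Authentik rejects the request unless the registered URI matches the one the app sends.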
**Fix:**
```bash
# Update Authentik's OAuth2 provider for Karakeep in the shared Postgres
docker exec -it authentik-postgres psql -U authentik -d authentik
BEGIN;
UPDATE authentik_providers_oauth2_oauth2provider
SET _redirect_uris = '["https://links.kitestacks.com/api/auth/callback/custom"]'
WHERE name = 'Karakeep';
COMMIT;
-- Verify
SELECT name, _redirect_uris FROM authentik_providers_oauth2_oauth2provider WHERE name = 'Karakeep';
\q
```
Restart Authentik on both hosts:
```bash
docker compose restart authentik authentik-worker
# Wait for healthy before testing
```
**Lesson:** When you get a redirect URI mismatch, always check what URI the APP is
actually sending — not what you think it should send. The app's logs or browser DevTools
Network tab show the actual request.
---
### Issue 4 — Kavita OIDC Config Gets Wiped on Restart
**Symptom:** Configured Kavita's OIDC settings by editing `kavita.db` directly
(using sqlite3). Settings looked correct in the DB. After `docker compose restart kavita`,
the OIDC config was reset to empty/disabled.
**Investigation:**
```bash
# Check the ServerSetting row before and after restart
docker exec -it kavita sqlite3 /kavita/config/kavita.db \
"SELECT Value, RowVersion FROM ServerSetting WHERE \"Key\"=40;"
# Before restart: {"enabled":true,"authority":"...","clientId":"kavita",...}, RowVersion=8
# After restart: {"enabled":false,"authority":"","clientId":"","clientSecret":"",...}, RowVersion=10
# RowVersion incremented by 2 — Kavita wrote to the row twice during startup
```
**Root cause:** Kavita validates and resets `ServerSetting` rows during startup from
its own defaults. Any value that does not pass Kavita's internal validation (including
OIDC config with the wrong format) gets reset to defaults. Direct SQL writes do not
go through Kavita's validation pipeline, so they get overwritten.
**Fix:** Use Kavita's own Settings UI via SSH port forwarding to bypass Cloudflare
and reach kscloud1's Kavita directly:
```bash
# Forward kscloud1's Kavita port to localhost
ssh -L 5099:localhost:5000 -i ~/.ssh/id_ed25519_kscloud1 kenpat@100.123.x.x -N &
# Now visit http://localhost:5099 in browser
# Log in with your Kavita credentials
# Settings → OIDC → configure there
# Click Save → changes survive restart
```
**Verification:** After saving in the UI, checked `RowVersion` was not incrementing on restart.
**Lesson:** Do not write directly to application databases unless you know the app does not
reinitialize those values on startup. Use the application's own APIs or UI.
**Critical detail:** The Authority URL MUST have a trailing slash:
`https://auth.kitestacks.com/application/o/kavita/`
Without it: "issuer does not match" error, because Authentik's `openid-configuration`
returns an `issuer` field that includes the trailing slash, and Kavita compares them exactly.
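An exact string comparison shows why a single character matters. This is a sketch, assuming the check is a plain byte-for-byte comparison as described above; `issuer_matches` is a hypothetical helper, not Kavita's code.

```bash
#!/usr/bin/env bash
# issuer_matches is a hypothetical helper: a byte-for-byte comparison, which
# is how the text above describes Kavita's check.
issuer_matches() {
  [ "$1" = "$2" ]
}

# The issuer value as described above, including the trailing slash.
advertised="https://auth.kitestacks.com/application/o/kavita/"

issuer_matches "https://auth.kitestacks.com/application/o/kavita" "$advertised" \
  || echo "no trailing slash: issuer does not match"
issuer_matches "https://auth.kitestacks.com/application/o/kavita/" "$advertised" \
  && echo "trailing slash: issuer matches"
```

To see the value a provider actually advertises, fetch its `.well-known/openid-configuration` document and read the `issuer` field before filling in the Authority URL.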
---
### Issue 5 — SSO Login Fails After monk Reconnects
**Symptom:** When monk went offline and came back, SSO logins failed for 5-10 minutes
with `invalid_grant`, then started working again.
**Investigation:**
Timeline reconstruction:
- T+0: monk goes offline (power or network)
- T+0: kscloud1 handles all traffic solo — SSO works fine, codes stored in shared DB
- T+5min: monk comes back online, cloudflared reconnects
- T+5min to T+8min: monk's Authentik is still starting (container startup takes ~3-4 min)
- During this window: Cloudflare routes some `/authorize` to kscloud1, some `/token` to monk
- Monk's Authentik hasn't finished starting — it responds with errors or invalid state
**Root cause:** The OAuth2 authorization code has a 1-minute TTL (default). Monk's Authentik
takes 3-5 minutes to fully start. During startup, Cloudflare is already routing traffic to
monk's cloudflared (which is running), but monk's Authentik is not ready.
Codes created on kscloud1 expire before monk's Authentik is healthy enough to serve them.
**Fix:** Increase the OAuth2 code TTL from 1 minute to 10 minutes:
```bash
docker exec -it authentik-postgres psql -U authentik -d authentik
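UPDATE authentik_providers_oauth2_oauth2provider
SET access_code_validity = 'minutes=10'; -- duration-string format as shown in Authentik's admin UI; confirm the existing format with a SELECT first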
\q
```
Restart both Authentik instances. Now codes have a 10-minute window — enough for monk
to finish starting before the code expires.
**Alternative/additional fix:** Add a health check to monk's cloudflared or Authentik
that keeps cloudflared from accepting traffic until Authentik is healthy.
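One way to sketch such a gate is a polling function that blocks until a probe command reports `healthy`. This is a sketch under stated assumptions: `wait_for_healthy` is a hypothetical helper, and it only helps if cloudflared on monk is started by a script that calls it, rather than starting on its own at boot.

```bash
#!/usr/bin/env bash
# wait_for_healthy is a hypothetical helper: it runs a probe command every
# <interval> seconds until the probe prints "healthy" or <timeout> seconds pass.
# Usage: wait_for_healthy <timeout> <interval> <probe command...>
# The interval must be at least 1, otherwise the loop never times out.
wait_for_healthy() {
  local timeout="$1" interval="$2"
  shift 2
  local waited=0
  while [ "$waited" -le "$timeout" ]; do
    if [ "$("$@" 2>/dev/null)" = "healthy" ]; then
      return 0
    fi
    sleep "$interval"
    waited=$((waited + interval))
  done
  return 1
}

# Demo with a stand-in probe; a real probe would query the container's health.
wait_for_healthy 5 1 echo healthy && echo "gate open: safe to start cloudflared"
```

On monk, the probe could be `docker inspect --format '{{.State.Health.Status}}' authentik`, the same command used in Phase 4, followed by `docker compose up -d` in the cloudflared directory once it succeeds.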
---
### Issue 6 — kscloud1 SSH Key Auth Broken After Long Absence
**Symptom:** After not connecting to kscloud1 for several weeks, `ssh kenpat@kscloud1`
returned "Permission denied (publickey)".
**Investigation:**
```bash
ssh -v -i ~/.ssh/id_ed25519_kscloud1 kenpat@100.123.x.x
# Verbose output showed: offered key was not accepted
# No other errors — key was being offered but rejected
```
**Root cause:** The `authorized_keys` file on kscloud1 had somehow been reset or corrupted
(possibly from a VPS maintenance event or snapshot restore).
**Fix:** Use Hetzner's console (web-based terminal that does not require SSH):
1. Hetzner dashboard → Server → Console
2. Log in as root (reset root password via Hetzner UI if needed)
3. Restore the public key:
```bash
# On kscloud1 via Hetzner console
mkdir -p /home/kenpat/.ssh
cat >> /home/kenpat/.ssh/authorized_keys << 'EOF'
ssh-ed25519 AAAA... your-public-key-here
EOF
chmod 700 /home/kenpat/.ssh
chmod 600 /home/kenpat/.ssh/authorized_keys
chown -R kenpat:kenpat /home/kenpat/.ssh
```
**Lesson:** Always keep your public key backed up. Cloud providers (Hetzner, AWS, DigitalOcean)
all have web-based console access for exactly this situation. Never rely only on SSH for
access to a remote server.
---
### Issue 7 — ufw Blocking Docker Container to Host Port
**Symptom:** The portal homepage on kscloud1 showed "0%" and "Offline" for the System Status
widget. On monk it showed real values.
**Investigation:**
```bash
# Test the metrics API directly from inside the homepage container on kscloud1
docker exec homepage-backup curl -s http://host.docker.internal:8000/api/metrics
# No response after timeout
# Test from host directly
curl -s http://localhost:8000/api/metrics
# Returns real metrics immediately
# Check ufw rules
sudo ufw status verbose
# default deny incoming — no specific rule for port 8000
```
**Root cause:** The `kitestacks-metrics-api` container runs with `network_mode: host`.
When `homepage-backup` calls `host.docker.internal:8000`, the kernel sees the source IP
as the Docker bridge network (`172.x.x.x`). ufw's `default deny incoming` blocks it.
Docker's iptables rules (which let published ports work despite ufw) do not apply
here because this is traffic from a container to a port on the host itself, which passes
through the INPUT chain that ufw manages, not forwarded traffic to a published container port.
**Fix:**
```bash
sudo ufw allow from 172.16.0.0/12 to any port 8000 proto tcp
sudo ufw status verbose # Verify rule added
```
`172.16.0.0/12` covers all Docker bridge subnets (172.16.x.x through 172.31.x.x).
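Whether a given container address is matched by this rule can be checked with a small range test. This is a sketch with a hypothetical helper name. Note that Docker can also allocate bridge subnets from `192.168.0.0/16` (or from custom address pools), and those would not be covered by this rule.

```bash
#!/usr/bin/env bash
# in_172_16_slash_12 is a hypothetical helper: it reports whether an IPv4
# address falls inside 172.16.0.0/12 (172.16.0.0 through 172.31.255.255).
in_172_16_slash_12() {
  local o1 o2 o3 o4
  IFS='.' read -r o1 o2 o3 o4 <<< "$1"
  # A /12 fixes the first octet and the top four bits of the second octet.
  [ "$o1" = "172" ] && [ "$o2" -ge 16 ] 2>/dev/null && [ "$o2" -le 31 ]
}

for ip in 172.17.0.2 172.31.255.254 172.32.0.1 192.168.1.10; do
  if in_172_16_slash_12 "$ip"; then
    echo "$ip: allowed by the rule"
  else
    echo "$ip: not covered"
  fi
done
```

The subnet a container actually uses is listed in the output of `docker network inspect kitestacks`, which is worth checking if the rule appears to have no effect.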
**Verification:**
```bash
docker exec homepage-backup curl -s http://host.docker.internal:8000/api/metrics
# Now returns: {"cpu_percent": 4.2, "ram_percent": 71.3, ...}
```
---
## General Troubleshooting Cheatsheet
| Symptom | First Commands to Run |
|---------|----------------------|
| Container won't start | `docker logs <container>` |
| Container starts then crashes | `docker logs <container> --tail 30` |
| Can't reach service from browser | `docker exec cloudflared curl -s http://<service>:<port>` |
| SSL/TLS error in browser | `curl -sv https://yourdomain.com` (check Cloudflare is resolving) |
| SSO failing with invalid_grant | Check both Authentik instances point to same shared Postgres |
| Database error | Check data directory permissions: `ls -la ./data/` |
| Port already in use | `sudo ss -tlnp \| grep :<port>` |
| Out of disk space | `df -h` and `docker system df` |
| Out of RAM | `free -h` and `docker stats --no-stream` |
| Can't ping between containers | `docker network inspect kitestacks` |
| Forgejo 502 | `docker logs forgejo` — likely DB connection issue |
| Authentik won't start | Check it can reach `$KSCLOUD1_TAILSCALE:5432` (Tailscale up?) |


@ -132,12 +132,12 @@ Given where you are today:
| Timeframe | Milestone |
|-----------|-----------|
| Next 1-2 months | CompTIA A+ Core 2 ✅ |
| Months 3-8 | CCNA |
| Months 9-11 | AWS SAA-C03 |
| Months 12-14 | AWS SysOps Associate |
| Months 15-18 | CKA (or CompTIA Cloud+) |
| Months 18+ | AI/ML certs |
| **July 7, 2026** | **CompTIA A+ Core 2** — exam goal (hard deadline July 12) |
| Months 1-6 after A+ | CCNA |
| Months 7-9 after A+ | AWS SAA-C03 |
| Months 10-12 after A+ | AWS SysOps Associate |
| Months 13-16 after A+ | CKA (or CompTIA Cloud+) |
| Months 16+ after A+ | AI/ML certs |
---


@ -6,16 +6,16 @@ This is the concept that most people get wrong. Understanding it cold will impre
## The Problem SSO Solves
Without SSO: 9 services = 9 separate user databases. To add a friend:
Without SSO: 11 services = 11 separate user databases. To add a friend:
- Create account in Forgejo
- Create account in Grafana
- Create account in Open WebUI
- Create account in Kavita
- ... 9 times
- ... 11 times
To remove their access: 9 places to deactivate.
To remove their access: 11 places to deactivate.
With SSO: 1 account in Authentik. Access to all 9 services. Deactivate once.
With SSO: 1 account in Authentik. Access to all 11 services. Deactivate once.
---
@ -168,4 +168,4 @@ Authentik acts as a reverse proxy in front of the app. The user authenticates wi
## What to Say About SSO
> *"I implemented single sign-on across all nine services using Authentik as the OIDC identity provider. Each service is registered as an OAuth2 client with a unique client ID and redirect URI. The OAuth2 authorization code flow means user credentials only ever go to Authentik — other services receive a signed JWT and never see the password. I hit a distributed systems issue in production where authorization codes were being invalidated by active-active load balancing across two hosts — I diagnosed it by tracing the OAuth2 flow and fixed it by sharing a single Postgres database between both Authentik instances over a private Tailscale network."*
> *"I implemented single sign-on across all eleven services using Authentik as the OIDC identity provider. Each service is registered as an OAuth2 client with a unique client ID and redirect URI. The OAuth2 authorization code flow means user credentials only ever go to Authentik — other services receive a signed JWT and never see the password. I hit a distributed systems issue in production where authorization codes were being invalidated by active-active load balancing across two hosts — I diagnosed it by tracing the OAuth2 flow and fixed it by sharing a single Postgres database between both Authentik instances over a private Tailscale network."*


@ -2,19 +2,19 @@
## The 30-Second Version (LinkedIn DM, recruiter screen)
> *"I built a self-hosted homelab running a public website at kitestacks.com with nine services — including a Git platform, AI assistant, eBook library, monitoring stack, and SSO. It runs on my home PC with a Hetzner cloud VPS as a live failover, connected through Cloudflare Tunnel so no ports are exposed on my home network. Everything is containerized with Docker and documented in a private Forgejo repo."*
> *"I built a self-hosted homelab running a public website at kitestacks.com with eleven services — including a Git platform, AI assistant, eBook library, bookmark manager, wiki, help desk, monitoring stack, and SSO. It runs on my home PC with a Hetzner cloud VPS as a live failover, connected through Cloudflare Tunnel so no ports are exposed on my home network. Everything is containerized with Docker and documented in a private Forgejo repo."*
---
## The 2-Minute Version (phone screen, LinkedIn intro)
> *"I built KiteStacks — a multi-host self-hosted platform running at kitestacks.com. The core is nine services containerized with Docker: a Forgejo Git instance, Grafana monitoring, Authentik for single sign-on, Open WebUI for AI access, Kavita for reading, Karakeep for bookmarks, OpenProject for tasks, Uptime Kuma for monitoring, and a custom portal I built myself.*
> *"I built KiteStacks — a multi-host self-hosted platform running at kitestacks.com. The core is eleven services containerized with Docker: a custom portal, Forgejo Git instance, Authentik for single sign-on, Open WebUI for AI access, Karakeep for bookmarks, Kavita for reading, Grafana with Prometheus for monitoring, Uptime Kuma for uptime checks, BookStack for documentation, OSTicket for help desk, and Portainer for container management.*
>
> *It runs on my home machine with a Hetzner VPS as a permanent cloud replica — active-active load balanced through Cloudflare Tunnel so the site stays up even when I'm traveling and my home network is down.*
>
> *The hardest part was a production SSO bug where OAuth2 authorization codes were being invalidated by the active-active routing — I traced the OAuth2 flow, identified it as a split-database problem, and solved it by migrating both hosts to a shared Postgres instance accessible only over a private Tailscale network.*
>
> *I'm currently studying for the CCNA to formalize the networking knowledge this project required."*
> *I'm currently studying for CompTIA A+ Core 2 (exam goal July 2026), then CCNA to formalize the networking knowledge this project required."*
---
@ -52,7 +52,7 @@ Be ready to go deep on any of these topics. Know the answers cold.
**"How does the monitoring work?"**
> *"Prometheus scrapes metrics from two node-exporter instances every 15 seconds — one on the home machine via Docker DNS and one on the Hetzner VPS via its public IP. Grafana visualizes both with the Node Exporter Full dashboard, and you can switch between hosts with an instance picker. Uptime Kuma runs external HTTP checks against all nine public subdomains and would alert me if any went down."*
> *"Prometheus scrapes metrics from two node-exporter instances every 15 seconds — one on the home machine via Docker DNS and one on the Hetzner VPS via its public IP. Grafana visualizes both with the Node Exporter Full dashboard, and you can switch between hosts with an instance picker. Uptime Kuma runs external HTTP checks against all eleven public subdomains and alerts me if any go down."*
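
The external HTTP checks mentioned in the answer can be sketched in a few lines. This is only a rough approximation of what an uptime monitor does, not how Uptime Kuma is implemented. The function names `is_up` and `check_all` are hypothetical, and the `fetch` parameter is an assumption added so the check can be exercised without touching the network.

```python
from urllib.request import urlopen


def is_up(url, fetch=urlopen, timeout=10):
    """Return True if the URL answers with a 2xx or 3xx status."""
    try:
        with fetch(url, timeout=timeout) as response:
            return 200 <= response.status < 400
    except OSError:
        # URLError and HTTPError are OSError subclasses, so this covers
        # DNS failures, refused connections, timeouts, and the HTTP
        # 4xx/5xx responses that urlopen raises as exceptions.
        return False


def check_all(urls, fetch=urlopen):
    """Map each URL to its up/down result."""
    return {url: is_up(url, fetch=fetch) for url in urls}
```

Calling `check_all` with a list of the public subdomain URLs would return a dictionary of up/down flags. A real monitor adds scheduling, retries, and notifications on top of this.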
---


@ -2,13 +2,13 @@
## Your Advantage
You don't have a blank canvas. You have a live production system you built. Most people study networking in a textbook. You configured Cloudflare DNS, set up Tailscale, debugged a Docker networking ufw issue, and traced a distributed systems bug in OAuth2. That's hands-on experience that study alone can't replicate.
You don't have a blank canvas. You have a live production system you built: eleven services running across two hosts with SSO, active-active failover, and shared databases. Most people study networking from a textbook. You configured Cloudflare DNS, set up Tailscale, debugged a Docker networking issue involving ufw, and traced a distributed-systems bug in OAuth2. That's hands-on experience that study alone can't replicate.
The goal now: attach the vocabulary, depth, and theory to things you've already done.
---
## Phase 1 — Complete A+ Core 2 (Now)
## Phase 1 — Complete A+ Core 2 (Exam goal: July 7, 2026)
**Focus areas that directly map to your homelab:**
@ -66,16 +66,18 @@ The CCNA will make everything in your homelab make deeper sense. After CCNA, re-
|-----|------------------------|
| EC2 | Hetzner VPS (kscloud1) |
| S3 | Static file storage |
| VPC | Docker bridge network |
| VPC | Docker bridge network (kitestacks) |
| ALB + CloudFront | Cloudflare Tunnel + edge |
| RDS | Authentik Postgres |
| ElastiCache | Authentik Redis |
| RDS | Shared Postgres on kscloud1 (Authentik + Forgejo) |
| ElastiCache | Shared Redis on kscloud1 |
| CloudWatch | Prometheus + Grafana |
| Route 53 | Cloudflare DNS |
| IAM | Authentik RBAC / groups |
| IAM | Authentik RBAC / groups (homelab-admin) |
| Secrets Manager | .env files (what you'd replace) |
| ECS / Fargate | Docker Compose (what you use) |
| VPC Peering | Tailscale overlay |
| Confluence/SharePoint | BookStack |
| ServiceNow | osTicket |
---