merge: add homelab-mastery as subdir
Moved homelab-mastery repo content into homelab-mastery/ subdirectory. Covers architecture, concepts, certifications, interview-prep, and learning-path. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
parent fb822d5142
commit 0d3fc4051c
10 changed files with 1534 additions and 0 deletions
48
homelab-mastery/README.md
Normal file
@@ -0,0 +1,48 @@
# Homelab Mastery — KiteStacks Learning Guide

**Owner:** kenpat
**Purpose:** Everything needed to understand, explain, rebuild, and build a career around the KiteStacks homelab project.

---

## Your Current Status

| Milestone | Status |
|-----------|--------|
| CompTIA A+ Core 1 | ✅ Passed — highest score in class (22 people) |
| CompTIA A+ Core 2 | 🔄 In progress |
| CCNA | 📅 Next |
| Cloud / AI certs | 📅 After CCNA |

---

## What This Repo Is

You built a production homelab — a real multi-host, highly available web platform with SSO, monitoring, cloud failover, and AI services. Most people learning DevOps do tutorials with fake projects. You have a real one running at a real domain.

This repo exists so you can:
1. **Understand** what everything does at the conceptual level
2. **Explain it** confidently to a hiring manager, recruiter, or LinkedIn connection
3. **Rebuild it** from scratch on a new machine if you ever need to
4. **Map it** to real certifications and career paths

---

## Navigation

| Section | What's Inside |
|---------|--------------|
| [certifications/](certifications/roadmap.md) | Full cert roadmap for cloud engineering, what each cert proves, study order |
| [architecture/](architecture/overview.md) | How the entire system works, why it was built this way |
| [concepts/](concepts/) | Deep dives on every technology: Docker, networking, OAuth2, Tailscale, etc. |
| [build-guide/](build-guide/README.md) | Step-by-step rebuild from a blank machine, with explanations of every decision |
| [interview-prep/](interview-prep/explain-the-project.md) | Exactly what to say to hiring managers, common questions + model answers |
| [learning-path/](learning-path/README.md) | Structured study plan, free resources, what to learn in what order |

---

## The One-Paragraph Project Summary

> *KiteStacks is a self-hosted homelab running nine public-facing services behind Cloudflare Tunnel, with full SSO via Authentik (OIDC/OAuth2), active-active cloud failover on a Hetzner VPS, private networking over Tailscale, and real-time monitoring via Prometheus and Grafana. The platform serves a public domain (kitestacks.com) and stays online even when the primary home machine is off — all running on commodity hardware with no open ports on the home router.*

That is what you built. Now learn to own every word of it.
199
homelab-mastery/architecture/decisions.md
Normal file
@@ -0,0 +1,199 @@
# Architecture Decisions — The Why Behind Every Choice

For every technology choice, there was a reason. Understanding the "why" is what separates someone who copied commands from someone who designed a system.

---

## Why Docker Instead of Running Services Directly?

**Problem:** Running 15+ services directly on a Linux host creates dependency hell — different Python versions, conflicting library versions, services affecting each other.

**Options considered:**
- Bare metal: install each app directly on the OS
- Virtual machines: one VM per service
- Docker containers: isolated processes with their own dependencies

**Decision:** Docker

**Why:**
- Each container has its own filesystem, dependencies, and runtime — they can't conflict
- Starting/stopping/updating one service doesn't affect others
- The `docker-compose.yml` file IS the documentation — it shows exactly what the service needs to run
- Portability: move the same compose file to a new machine and it works identically
- Isolation: if Karakeep gets compromised, it can't easily touch Forgejo's data

**What you'd say to a hiring manager:** *"I containerized every service using Docker and Docker Compose so each has isolated dependencies and the entire deployment is reproducible from a single YAML file."*

---

## Why Cloudflare Tunnel Instead of Port Forwarding?

**Problem:** How do you make home services accessible from the internet?

**Traditional approach:** Open port 80 and 443 on the home router, configure NAT, point DNS to home IP.

**Problems with that:**
- Exposes your home IP address publicly (DDoS risk, easier to target)
- Dynamic home IP means DNS breaks every time IP changes
- Some ISPs block residential port 80/443
- Router configuration is error-prone and varies by hardware

**Decision:** Cloudflare Tunnel (cloudflared)

**Why:**
- cloudflared makes an OUTBOUND connection to Cloudflare — no inbound ports needed
- Home IP never exposed
- Works regardless of ISP restrictions
- Cloudflare handles TLS/HTTPS — you don't manage SSL certificates
- Free tier covers everything needed
- Bonus: built-in DDoS protection

**The trade-off:** You depend on Cloudflare. If Cloudflare has an outage, your site goes down even if your hardware is fine. This is acceptable — Cloudflare's uptime is better than most home internet connections.

---

## Why Authentik for SSO Instead of Separate Logins Per App?

**Problem:** 9 services means 9 different usernames and passwords to manage. Adding a user requires going into 9 admin panels. Removing access means 9 places to deactivate.

**Options:**
- Separate logins per service (no SSO)
- Authelia (simpler, built primarily around forward-auth)
- Authentik (full OIDC provider, more complex)
- Keycloak (enterprise-grade, very heavy)

**Decision:** Authentik

**Why:**
- One account controls access to everything
- Apps that support native OIDC (Grafana, Kavita, Open WebUI, Karakeep) get real SSO — the user is authenticated inside the app
- Can restrict which groups can access which applications (Portainer restricted to homelab-admin group)
- Self-hosted — user data stays on your infrastructure
- Authentik supports both native OIDC (for apps that support it) and proxy provider (for apps that don't)

**The trade-off:** Authentik is complex to set up and has a significant memory footprint. Authelia would be simpler, but it is built primarily around forward-auth, with OIDC provider support added later. Authentik was designed as a full identity provider and does both.

---

## Why a Shared Postgres Instead of Separate Authentik Databases?

**Problem:** After setting up active-active failover, users kept getting `invalid_grant` errors when signing in through SSO.

**Root cause:** OAuth2 authorization codes are rows in a database. The flow is:
1. `/authorize` → code stored in Database A (monk's Authentik)
2. `/token` → looks for code in Database B (kscloud1's Authentik)
3. Code not found → `invalid_grant`

Cloudflare Tunnel load-balances between monk and kscloud1 for every HTTP request. Steps 1 and 2 of the OAuth flow can hit different hosts.

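The failure mode above can be reproduced in a few lines. The sketch below is a toy model, not Authentik's actual storage code: `AuthServer`, `authorize`, and `token` are hypothetical names, and a plain dict stands in for each Postgres database.

```python
import secrets


class AuthServer:
    """Toy stand-in for one Authentik instance backed by a code store."""

    def __init__(self, name, code_store):
        self.name = name
        self.code_store = code_store  # dict standing in for a Postgres table

    def authorize(self, user):
        # /authorize: issue a one-time code and persist it in this instance's DB
        code = secrets.token_urlsafe(16)
        self.code_store[code] = user
        return code

    def token(self, code):
        # /token: the code must exist in the DB this instance reads from
        user = self.code_store.pop(code, None)
        if user is None:
            return {"error": "invalid_grant"}
        return {"access_token": f"token-for-{user}"}


# Separate databases: the code lands in monk's DB, the exchange hits kscloud1's DB
monk = AuthServer("monk", {})
kscloud1 = AuthServer("kscloud1", {})
separate_result = kscloud1.token(monk.authorize("kenpat"))

# Shared database: both instances read and write the same store
shared_db = {}
monk = AuthServer("monk", shared_db)
kscloud1 = AuthServer("kscloud1", shared_db)
shared_result = kscloud1.token(monk.authorize("kenpat"))

print(separate_result)  # {'error': 'invalid_grant'}
print(shared_result)    # {'access_token': 'token-for-kenpat'}
```
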
**Options:**
- Sync databases continuously (complex, slow, conflict-prone)
- Use sticky sessions (Cloudflare paid feature)
- Share one database (simple, reliable)

**Decision:** Shared Postgres on kscloud1, accessible only over Tailscale

**Why:**
- Both monk and kscloud1 Authentik read/write the same database — authorization codes always found
- Tailscale binding means the database is never exposed to the public internet (security)
- Simple: a one-line change in each `docker-compose.yml` to point to a different host
- Cost: free (already paying for kscloud1)

**The trade-off:** If kscloud1 goes down or Tailscale connectivity breaks, monk's Authentik can't reach its database and can't start. Rollback procedure: restore monk's compose to use a local Postgres.

---

## Why Tailscale Instead of WireGuard or OpenVPN?

**Problem:** Need private networking between monk (home) and kscloud1 (Hetzner cloud) without exposing the Authentik database to the public internet.

**Options:**
- WireGuard: manual key exchange, manual routing, more involved to configure
- OpenVPN: even more complex, slower
- Tailscale: managed WireGuard, automatic key exchange, works behind NAT

**Decision:** Tailscale

**Why:**
- Works instantly — install, authenticate, done
- Handles NAT traversal automatically (monk is behind home router NAT)
- Devices get stable 100.x.x.x IPs regardless of actual network location
- Free for up to 100 devices
- Uses WireGuard under the hood — same encryption, much easier configuration

**The trade-off:** Tailscale is a managed service — you trust Tailscale's coordination servers. The actual data is encrypted peer-to-peer (Tailscale can't see it), but they control device authentication. Self-hosted alternative: Headscale.

---

## Why Active-Active Instead of Active-Passive Failover?

**The context:** The user travels. When away from home, monk might be inaccessible (home network down, ISP outage, power). kscloud1 should keep the site running.

**Active-Passive:** kscloud1 only starts serving if monk is detected as down. Cloudflare would need health checks and failover rules.

**Active-Active:** Both monk and kscloud1 are always in the Cloudflare Tunnel rotation. Every request might hit either host.

**Decision:** Active-Active

**Why:**
- Simpler: no health checks to configure, no failover logic
- Instant: if monk goes down, kscloud1 is already serving its share of traffic and simply takes the rest
- Free: Cloudflare Tunnel active-active is free; health-check-based failover requires paid plans

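A rough way to see why active-active needs no failover logic: if each request goes to any connector that is currently up, losing one host changes nothing for the caller. This is a simplified model only. `pick_connector` is a hypothetical helper, and Cloudflare's real connector selection is not described in this document.

```python
import random


def pick_connector(connectors, rng):
    """Return the name of any connector that is currently up, or None."""
    healthy = [name for name, up in connectors.items() if up]
    return rng.choice(healthy) if healthy else None


rng = random.Random(42)  # seeded so the run is repeatable

# Both hosts up: requests are spread across monk and kscloud1
connectors = {"monk": True, "kscloud1": True}
both_up = {pick_connector(connectors, rng) for _ in range(100)}

# monk goes offline: every request lands on kscloud1, with no failover step
connectors["monk"] = False
monk_down = {pick_connector(connectors, rng) for _ in range(100)}

print(sorted(both_up))
print(sorted(monk_down))
```
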
**The trade-off:** Stateful apps (Forgejo, OpenProject, Kavita) have separate databases on each host. A user might see different data depending on which host answers. This was explicitly accepted: the point is uptime, not data consistency across hosts.

---

## Why nginx for the Portal Instead of a Pre-Built Dashboard?

**Options:**
- gethomepage (what was used before) — nice but limited customization
- Heimdall — similar limitations
- Custom static site + nginx — full control

**Decision:** Custom static HTML/CSS/JS + nginx

**Why:**
- Complete visual control — the cyberpunk theme, the layout, every pixel
- Static files served by nginx are extremely fast and reliable
- Can proxy the metrics API for real-time stats without CORS issues
- No framework dependencies — no Node.js, no build step, just files

**The trade-off:** More work to build and maintain than a pre-built dashboard. But you now understand every line of it.

---

## Why Python + FastAPI for the Metrics API?

**Problem:** The portal needs real-time system stats (CPU, RAM, network), weather, and Forgejo activity. These can't come from static HTML files.

**Options:**
- Shell scripts + cron → write stats to a JSON file the frontend reads
- Node.js + Express
- Python + FastAPI

**Decision:** Python FastAPI

**Why:**
- Python's `psutil` library reads system metrics with one line of code
- FastAPI is modern, fast, and automatically documents the API
- `async/await` means the API doesn't block while waiting for weather API responses
- Python is readable — you can understand and modify the code

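The `async/await` point can be illustrated with the standard library alone. The sketch below is not the real metrics API: `fetch_weather` and `fetch_activity` are hypothetical stand-ins that sleep instead of calling wttr.in or Forgejo.

```python
import asyncio
import time


async def fetch_weather():
    await asyncio.sleep(0.2)  # stands in for a slow external HTTP call
    return {"weather": "sunny"}


async def fetch_activity():
    await asyncio.sleep(0.2)  # stands in for a Forgejo API call
    return {"commits": 3}


async def gather_dashboard():
    # Both waits overlap, so the total is close to one call's latency, not two
    weather, activity = await asyncio.gather(fetch_weather(), fetch_activity())
    return {**weather, **activity}


start = time.perf_counter()
dashboard = asyncio.run(gather_dashboard())
elapsed = time.perf_counter() - start
print(dashboard, f"{elapsed:.2f}s")
```

While one coroutine is waiting on its (simulated) network call, the event loop is free to run the other, which is the behaviour the bullet above describes.
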
**The special requirement:** The container needs `network_mode: host` and `pid: host`. What each one provides:
- `network_mode: host`: the container can see the host's network interfaces and report real network throughput (not container-level)
- `pid: host`: the container shares the host's PID namespace, so psutil sees the host's processes through `/proc` and reports actual system stats instead of container stats

---

## Why the Forgejo Repo for Documentation?

You could keep documentation in Notion, Google Docs, or a wiki.

**Why Forgejo:**
- It's self-hosted — you own the data
- Git tracks every change with a timestamp and message
- The documentation lives alongside the configs it describes
- Hiring managers can see the commit history and read your documentation directly

**What this shows to a hiring manager:** You treat documentation like code — version-controlled, structured, maintained.
221
homelab-mastery/architecture/overview.md
Normal file
@@ -0,0 +1,221 @@
# KiteStacks Architecture — Full System Overview

## The Big Picture

```
                    INTERNET
                        │
                 ┌──────▼──────┐
                 │ Cloudflare  │  DNS + TLS termination
                 │   (edge)    │  Zero Trust Tunnel
                 └──────┬──────┘
                        │ HTTPS (443) only
       ┌────────────────┼────────────────┐
       │ connector 1    │ connector 2    │ connector 3
       │                │                │
┌──────▼──────┐         │         ┌──────▼──────┐
│    MONK     │         │         │  KSCLOUD1   │
│  (home PC)  │         │         │ (Hetzner VPS│
│             │  Active │         │  5.78.x.x)  │
│   All 9     │  Active │         │             │
│  services   │         │         │   All 9     │
│             │         │         │  services   │
└──────┬──────┘         │         └──────┬──────┘
       │                │                │
       └────────────────┼────────────────┘
                  TAILSCALE VPN
                (100.x.x.x range)
                        │
               ┌────────▼────────┐
               │ SHARED DB LAYER │
               │   on kscloud1   │
               │ Postgres :5432  │
               │ Redis    :6379  │
               │ (Tailscale      │
               │  only, private) │
               └─────────────────┘
```

---

## Every Service and What It Does

### The Nine Public Services

| Service | Container Name | What It Does | Why It's Here |
|---------|---------------|--------------|---------------|
| **Portal** | `homepage` | The public website (kitestacks.com) — custom nginx serving static HTML/CSS/JS with a cyberpunk theme | Front door to everything. Shows system stats, recent activity, links to all services |
| **Authentik** | `authentik` | Identity provider — handles all logins via OIDC/OAuth2 SSO | Single place to manage all user accounts and access control |
| **Forgejo** | `forgejo` | Self-hosted Git platform (like GitHub but yours) | Store all homelab code, config, and documentation |
| **OpenProject** | `openproject` | Project management (like Jira) | Task tracking, project planning |
| **Open WebUI** | `kite-openwebui` | ChatGPT-like AI chat interface | Access multiple AI models through one interface |
| **Karakeep** | `karakeep` | Bookmark and read-it-later manager | Save links, articles, and content |
| **Kavita** | `kavita` | eBook and manga reader | Personal digital library |
| **Grafana** | `grafana` | Monitoring dashboards | Visualize CPU, RAM, network, uptime across both hosts |
| **Uptime Kuma** | `uptime-kuma` | Status page and uptime monitoring | Monitor that all 9 services are up and alert if they go down |

### The Infrastructure Services (Not Public-Facing)

| Service | What It Does |
|---------|-------------|
| `cloudflared` | Cloudflare Tunnel connector — creates encrypted outbound tunnel to Cloudflare edge |
| `prometheus` | Metrics collection — scrapes system stats from both monk and kscloud1 every 15 seconds |
| `node-exporter` | Exposes host system metrics (CPU, RAM, disk, network) for Prometheus to scrape |
| `kite-litellm` | LLM proxy gateway — routes AI requests to OpenRouter (multiple free models) |
| `portainer` | Docker management UI — visual interface to manage all containers |
| `kitestacks-metrics-api` | Python FastAPI service — serves real-time system stats, weather, and Forgejo activity to the portal |

---

## How Traffic Flows

### When Someone Visits www.kitestacks.com

```
1. Browser sends HTTPS request to www.kitestacks.com
2. DNS resolves to Cloudflare's anycast IP (not your home IP)
3. Cloudflare terminates TLS, so your home router never sees an inbound HTTPS connection
4. Cloudflare routes the request through the tunnel to whichever
   cloudflared connector responds first (monk or kscloud1)
5. cloudflared resolves "homepage" via Docker DNS
6. Request hits the nginx container serving the static portal
7. Portal's JavaScript fetches /api/metrics and /api/activity
   from the kitestacks-metrics-api container via nginx proxy
8. Page renders with live system stats and recent git activity
```

### When Someone Clicks "Sign In with Authentik"

```
1. App (e.g., Grafana) redirects browser to auth.kitestacks.com/application/o/authorize/
2. Authentik presents login page
3. User enters credentials — Authentik validates against its database
   (stored on kscloud1's Postgres, shared over Tailscale)
4. Authentik generates an authorization code and redirects back to Grafana
5. Grafana's backend calls auth.kitestacks.com/application/o/token/
   to exchange the code for an access token
6. Authentik validates the code (found in shared DB) and returns a JWT
7. Grafana reads the user's email/name from the JWT and logs them in
```

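Step 7 works because a JWT's payload is just base64url-encoded JSON. The sketch below builds an unsigned sample token and decodes its claims. The claim values are made up, `decode_claims` is a hypothetical helper, and a real client must verify the signature before trusting anything inside the token.

```python
import base64
import json


def b64url_encode(data: bytes) -> str:
    # JWT segments use URL-safe base64 with the trailing '=' padding removed
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def decode_claims(token: str) -> dict:
    """Decode the payload segment of a JWT WITHOUT verifying its signature."""
    payload_segment = token.split(".")[1]
    padding = "=" * (-len(payload_segment) % 4)  # restore the stripped padding
    return json.loads(base64.urlsafe_b64decode(payload_segment + padding))


# Hypothetical token: header.payload.signature (signature is a placeholder)
header = b64url_encode(json.dumps({"alg": "RS256", "typ": "JWT"}).encode())
payload = b64url_encode(
    json.dumps({"email": "user@example.com", "name": "Example User"}).encode()
)
sample_token = f"{header}.{payload}.signature-placeholder"

claims = decode_claims(sample_token)
print(claims["email"], claims["name"])
```
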
**The critical detail:** Steps 1 and 5 can hit different tunnel connectors (monk vs kscloud1). The authorization code from step 4 must exist in whichever database the Authentik instance handling step 5 reads from. That's why both Authentik instances point to the SAME Postgres on kscloud1. Otherwise step 5 returns `invalid_grant` because the code isn't found.

---

## The Two Hosts in Detail

### Monk (Primary Home Machine)

- **Role:** Primary production host
- **Network:** Home LAN, no open ports on router (Cloudflare Tunnel handles all inbound)
- **Services:** All 9 public services + all infrastructure services
- **Data:** Each service has its own database/storage
- **Authentik DB:** Points to kscloud1's Postgres over Tailscale (100.x.x.x)

### kscloud1 (Hetzner VPS)

- **Role:** Permanent cloud replica — always on, even when monk is off (travel, power outage, etc.)
- **Network:** Public IP, Cloudflare Tunnel connector 3
- **Services:** Full replica of all 9 public services (separate databases except Authentik)
- **Hosts:** The shared Authentik Postgres + Redis (bound to Tailscale interface only)
- **Resources:** 3 vCPU, 3.7 GB RAM — tight but functional

### What's the Same Across Both

- Same Cloudflare Tunnel token (different connector IDs assigned automatically)
- Same Authentik database (shared via Tailscale)
- Same Authentik secret key (required for JWT validation)
- Same kavita.db (one-time sync — users and OIDC config)

### What's Different Across Both

- Forgejo data (separate repos — accepted inconsistency)
- OpenProject data (separate projects)
- Karakeep bookmarks (separate)
- Kavita book files (monk has them, kscloud1 doesn't — covers synced, books not)

---

## The Docker Network

Every container behind the tunnel joins the `kitestacks` external Docker bridge network (the metrics API is the exception, since it runs with `network_mode: host`):

```bash
docker network create kitestacks
```

This is what makes Cloudflare Tunnel work. The cloudflared container is also on this network, so when Cloudflare tells cloudflared to route `http://grafana:3000`, Docker's internal DNS resolves `grafana` to the grafana container's IP on that network.

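Conceptually, the tunnel's routing is a lookup from a public hostname to an internal service URL, and the container name in that URL only resolves because both containers share the network. The sketch below models that lookup. Only `http://grafana:3000` comes from the text; the hostnames, the `homepage` port, and the `route` helper are illustrative assumptions, not the project's actual ingress rules.

```python
# Hypothetical ingress table: public hostname -> internal service URL.
INGRESS = {
    "grafana.kitestacks.com": "http://grafana:3000",  # hostname is an assumption
    "www.kitestacks.com": "http://homepage:80",       # port is an assumption
}

# Container names that Docker's embedded DNS could resolve on the shared network
ON_KITESTACKS_NETWORK = {"grafana", "homepage", "cloudflared"}


def route(hostname, reachable=ON_KITESTACKS_NETWORK):
    """Return the upstream URL for a hostname, or None if it cannot be reached."""
    upstream = INGRESS.get(hostname)
    if upstream is None:
        return None
    container = upstream.split("//")[1].split(":")[0]
    # A container that is not on the shared network cannot be resolved by name
    return upstream if container in reachable else None


print(route("grafana.kitestacks.com"))
print(route("grafana.kitestacks.com", reachable={"cloudflared"}))
```

The second call models a Grafana container that was never attached to the network: the rule exists, but the name has nothing to resolve to.
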
Without this shared network, cloudflared can't reach the service containers by name.

---

## Why No Open Ports on the Router

Traditional homelab: open port 80/443 on home router → NAT to home server → expose home IP.

Problems with that:
- Your home IP is public (DDoS risk, targeted attacks)
- Router configuration is fragile
- ISP can change your IP (dynamic IP)
- Some ISPs block port 80/443

Cloudflare Tunnel approach:
- cloudflared container makes an OUTBOUND connection to Cloudflare
- Cloudflare holds that connection open
- Inbound requests come through Cloudflare, over that existing outbound tunnel
- Your home IP is never exposed
- Works on any network that allows outbound connections to Cloudflare

This is why you can run a public website from a home PC with zero router configuration.

---

## Tailscale — The Private Backbone

Tailscale creates a private overlay network (VPN mesh) across all your devices:

```
monk (100.x.x.x) ←—— encrypted ——→ kscloud1 (100.x.x.x)
monk (100.x.x.x) ←—— encrypted ——→ pixel-6 (100.x.x.x)
```

Used in this project for:
1. **Shared Authentik DB:** kscloud1's Postgres binds to its Tailscale IP, not its public IP. Only devices on the tailnet can connect. Monk points to that address.
2. **Forgejo activity feed:** On kscloud1, the metrics API fetches recent commits from monk's Forgejo via monk's Tailscale IP — so both portal instances show the same activity feed.
3. **SSH/Admin access:** You can SSH into any device on the tailnet from anywhere.

---

## The Monitoring Stack

```
node-exporter (monk)     → prometheus (monk) → grafana (monk)
node-exporter (kscloud1) ↗ (scrapes 5.78.x.x:9100)
```

Prometheus scrapes metrics every 15 seconds from:
- `node-exporter:9100`: monk's own node-exporter (via Docker DNS)
- `5.78.x.x:9100`: kscloud1's node-exporter (via its public IP; the port is bound to 0.0.0.0)

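node-exporter serves plain text in the Prometheus exposition format, one sample per line. The sketch below parses a small hand-written sample of that format. The values are invented, `parse_metrics` is a hypothetical helper, and real output contains many more metrics plus labels that this parser ignores.

```python
SAMPLE = """\
# HELP node_memory_MemTotal_bytes Memory information field MemTotal_bytes.
# TYPE node_memory_MemTotal_bytes gauge
node_memory_MemTotal_bytes 3.972e+09
node_memory_MemAvailable_bytes 1.986e+09
"""


def parse_metrics(text):
    """Parse unlabelled 'name value' samples, skipping comments and blank lines."""
    metrics = {}
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue
        name, value = line.split()[:2]
        metrics[name] = float(value)
    return metrics


metrics = parse_metrics(SAMPLE)
used_fraction = 1 - (
    metrics["node_memory_MemAvailable_bytes"] / metrics["node_memory_MemTotal_bytes"]
)
print(f"memory used: {used_fraction:.0%}")  # memory used: 50%
```
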
Grafana visualizes both, letting you switch between hosts in the instance picker.

---

## The Portal Architecture

The portal is NOT gethomepage or any pre-built dashboard. It's a custom-built static site:

```
nginx (container: "homepage")
├── /          → serves static HTML/CSS/JS from ./public/
└── /api/*     → proxy_pass to kitestacks-metrics-api:8000 (host)

kitestacks-metrics-api (network_mode: host, pid: host)
├── GET /api/metrics   → psutil reads HOST's CPU/RAM/disk/network
├── GET /api/weather   → wttr.in API → current weather by IP geolocation
├── GET /api/activity  → Forgejo API → recent commits
└── GET /api/health    → {"ok": true}
```

The metrics API runs with `network_mode: host` and `pid: host` so it reads the HOST machine's process table and `/proc` filesystem, not the container's. Without this, it would report container stats, not host stats.
165
homelab-mastery/certifications/roadmap.md
Normal file
@@ -0,0 +1,165 @@
# Certification Roadmap — Cloud Engineering Track

Your goal: Cloud Engineer. This is a well-paid, in-demand role in tech.
Your project already demonstrates cloud engineering skills. Certs give you the vocabulary and credentials to prove it on paper.

---

## Your Path (In Order)

```
CompTIA A+ Core 1                   ✅ DONE (highest score)
        ↓
CompTIA A+ Core 2                   ← YOU ARE HERE
        ↓
CompTIA Network+                    ← OPTIONAL (CCNA covers this and more)
        ↓
CCNA                                ← Strong networking foundation
        ↓
AWS Solutions Architect Associate   ← Core cloud cert
        ↓
CompTIA Cloud+                      ← Vendor-neutral cloud (optional, pairs well with AWS SAA)
        ↓
AWS SysOps Administrator Associate  ← Operations focus (very relevant to homelab)
        ↓
Kubernetes (CKA)                    ← Container orchestration (natural next step from Docker)
        ↓
AI / Prompt Engineering certs       ← After cloud foundation is solid
```

---

## Each Cert Explained

### CompTIA A+ Core 2 (In Progress)

**What it covers:** Windows OS, macOS, Linux basics, security fundamentals, troubleshooting, remote support
**Why it matters:** Completes your A+ certification — required baseline for most IT roles
**How it connects to your homelab:** Linux troubleshooting, OS concepts, security basics

**Study tips:**
- Professor Messer (free on YouTube) — best A+ resource, period
- Jason Dion practice exams (Udemy, ~$15) — take these until you consistently hit 85%+
- Focus on Core 2's security domain — it maps directly to your Authentik/SSO work

---

### CCNA (Cisco Certified Network Associate)

**What it covers:** TCP/IP networking, routing, switching, VLANs, subnetting, wireless, security basics, automation basics
**Why it matters:** The gold standard networking cert. Hiring managers trust it more than Network+. Cloud engineering requires deep networking knowledge.
**How it connects to your homelab:**
- Subnetting: your Docker bridge networks (172.x.x.x), Tailscale (100.x.x.x) are subnets
- DNS: you configured Cloudflare DNS for every subdomain
- Routing: Cloudflare Tunnel routes traffic to specific containers by hostname
- Firewalls: you configured ufw rules on kscloud1
- TCP/UDP: you opened specific ports, understand why services bind to certain ports

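The subnetting bullet can be checked with Python's standard `ipaddress` module. The sample addresses below are made up and `classify` is a hypothetical helper. The ranges are the well-known ones: Tailscale assigns addresses from the 100.64.0.0/10 carrier-grade NAT block, and Docker's bridge networks typically fall inside the private 172.16.0.0/12 block.

```python
import ipaddress

CGNAT = ipaddress.ip_network("100.64.0.0/10")        # block Tailscale draws from
PRIVATE_172 = ipaddress.ip_network("172.16.0.0/12")  # RFC 1918 block Docker bridges usually use


def classify(address):
    """Label an IPv4 address by which of the two ranges it falls in."""
    ip = ipaddress.ip_address(address)
    if ip in CGNAT:
        return "tailscale-range"
    if ip in PRIVATE_172:
        return "docker-bridge-range"
    return "other"


# Hypothetical example addresses
print(classify("100.101.102.103"))  # tailscale-range
print(classify("172.18.0.5"))       # docker-bridge-range
print(classify("5.78.0.1"))         # other
print(CGNAT.num_addresses)          # 4194304, i.e. 2**22 addresses in a /10
```

Working out by hand why 100.101.102.103 is inside a /10 but 100.128.0.1 is not is the kind of exercise the CCNA subnetting questions are built on.
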
**Study resources:**
- *Jeremy's IT Lab* (free, YouTube + Packet Tracer labs) — best free CCNA content
- *Neil Anderson's CCNA course* (Udemy) — comprehensive paid option
- Cisco Packet Tracer (free simulator) — build labs, don't just watch
- Wendell Odom's *CCNA 200-301 Official Cert Guide* (Cisco Press), the official book

**Timeline:** Plan 3–6 months of consistent study. Don't rush it.

---

### AWS Solutions Architect — Associate (SAA-C03)

**What it covers:** EC2, S3, VPC, IAM, RDS, load balancers, auto-scaling, serverless, storage, CDN, security
**Why it matters:** Most in-demand cloud cert in the market. AWS holds roughly a third of the cloud infrastructure market. This cert is the entry point to cloud engineering jobs.
**How it connects to your homelab:**
- Your Hetzner VPS is essentially what an EC2 instance is on AWS
- Your Cloudflare Tunnel is similar to AWS CloudFront + ALB
- Your Docker networking maps to AWS VPC concepts
- Your Tailscale private network maps to AWS VPC peering / PrivateLink
- Your Prometheus/Grafana stack maps to AWS CloudWatch
- Your active-active failover maps to AWS multi-AZ architecture

**Study resources:**
- *Stephane Maarek's AWS SAA course* (Udemy, ~$15 on sale) — the best, period
- *Tutorial Dojo practice exams* by Jon Bonso — most accurate practice exams for AWS
- AWS Free Tier — build the same things you built in your homelab, but on AWS

**Timeline:** 2–3 months after CCNA. Easier once you know networking well.

---

### AWS SysOps Administrator — Associate (SOA-C02)

**What it covers:** Monitoring, logging, automation, deployments, security, cost management, high availability
**Why it matters:** More hands-on than SAA. Directly maps to what you did in your homelab — keeping systems running, monitoring them, troubleshooting.
**How it connects to your homelab:** This is literally your homelab at enterprise scale. Prometheus → CloudWatch. Docker → EC2/ECS. Cloudflare Tunnel → ALB. Tailscale → VPC.

**Take this after SAA.** Many people skip it — don't. It makes you a better engineer.

---

### Certified Kubernetes Administrator (CKA)

**What it covers:** Container orchestration, Kubernetes cluster management, deployments, networking, storage, troubleshooting
**Why it matters:** Docker Compose is what you use at home. Kubernetes is what companies use in production. This cert is highly valued at mid-to-senior level.
**How it connects to your homelab:** You run containers with Docker Compose, and Kubernetes does the same job across a cluster of machines. Your `kitestacks` Docker network maps loosely to Kubernetes Services and cluster DNS. Your services map to Kubernetes Deployments.

**Study resources:**
- *Mumshad Mannambeth's CKA course* (KodeKloud) — industry standard
- KodeKloud labs — hands-on practice environment built specifically for this exam

**When to take it:** After AWS certs. Kubernetes before cloud fundamentals is backwards.

---

### AI / Prompt Engineering Certifications
|
||||||
|
|
||||||
|
Since you're already running Open WebUI + LiteLLM, you have a head start.
|
||||||
|
|
||||||
|
| Cert | Provider | Cost | Best For |
|
||||||
|
|------|----------|------|----------|
|
||||||
|
| **AWS AI Practitioner (AIF-C01)** | AWS | ~$100 | Cloud AI fundamentals, pairs with your AWS path |
|
||||||
|
| **Azure AI-900** | Microsoft | ~$99 | Broad AI concepts, vendor-neutral feel |
|
||||||
|
| **Google Generative AI Fundamentals** | Google Cloud | Free | Quick badge, good starter |
|
||||||
|
| **DeepLearning.AI — Prompt Engineering** | Coursera/DeepLearning | Free (audit) | Best hands-on prompt content |
|
||||||
|
| **Vanderbilt Prompt Engineering Specialization** | Coursera | ~$50 | Certificate for LinkedIn |
|
||||||
|
|
||||||
|
**Honest advice:** For prompt engineering, a portfolio beats a cert. Document your LiteLLM/Open WebUI setup. Show model routing configurations. Write about the decisions you made. That's more valuable than any certificate.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Certification Timeline
|
||||||
|
|
||||||
|
Given where you are today:
|
||||||
|
|
||||||
|
| Timeframe | Milestone |
|
||||||
|
|-----------|-----------|
|
||||||
|
| Next 1–2 months | CompTIA A+ Core 2 (in progress) |
|
||||||
|
| Months 3–8 | CCNA |
|
||||||
|
| Months 9–11 | AWS SAA-C03 |
|
||||||
|
| Months 12–14 | AWS SysOps Associate |
|
||||||
|
| Months 15–18 | CKA (or CompTIA Cloud+) |
|
||||||
|
| Months 18+ | AI/ML certs |
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Why This Order Matters
|
||||||
|
|
||||||
|
**Networking before cloud:** AWS, Azure, and GCP are all managed networking + compute at their core. If you don't understand subnets, routing, and DNS, cloud will be confusing. Doing CCNA first makes the networking portions of the cloud certs much easier.
|
||||||
|
|
||||||
|
**Associate before specialty:** Don't skip to advanced certs. The associate level forces you to learn breadth. You'll encounter scenarios in the SysOps exam that directly map to what broke in your homelab.
|
||||||
|
|
||||||
|
**Hands-on alongside study:** The fastest way to pass any of these is to *build the thing* while you study. You already have a homelab. Use it. Every AWS service you study — ask yourself: "what's the equivalent in my homelab?"
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## What These Certs Say to a Hiring Manager
|
||||||
|
|
||||||
|
| You Have | They Hear |
|
||||||
|
|----------|-----------|
|
||||||
|
| A+ | You know how hardware and OS work |
|
||||||
|
| CCNA | You understand networking deeply, not just surface level |
|
||||||
|
| AWS SAA | You can architect solutions in the cloud |
|
||||||
|
| AWS SysOps | You can keep cloud infrastructure running in production |
|
||||||
|
| CKA | You can manage container workloads at scale |
|
||||||
|
| Homelab project | You do this for fun, not just for a paycheck |
|
||||||
|
|
||||||
|
The last row is the most important one.
|
||||||
126
homelab-mastery/concepts/docker.md
Normal file
|
|
@ -0,0 +1,126 @@
|
||||||
|
# Docker — What It Actually Is
|
||||||
|
|
||||||
|
## The Wrong Mental Model
|
||||||
|
|
||||||
|
Most people think containers are "mini virtual machines." They're not. Understanding the real model is what makes Docker make sense.
|
||||||
|
|
||||||
|
## What a Container Actually Is
|
||||||
|
|
||||||
|
A container is a **process** (or group of processes) running on the host Linux kernel, with two things applied:
|
||||||
|
|
||||||
|
1. **Namespaces** — isolation. The process gets its own view of the filesystem, network, processes, users. It can't see other containers' processes.
|
||||||
|
2. **cgroups (control groups)** — resource limits. The process is limited to a certain amount of CPU, RAM, etc.
|
||||||
|
|
||||||
|
That's it. There's no second kernel. No hypervisor. No hardware emulation. The nginx running in your `homepage` container is a regular Linux process on your laptop — it just *thinks* it's alone.
|
||||||
|
|
||||||
|
This is why containers start in a fraction of a second (there is no OS to boot) and add almost no overhead compared with running the process directly.
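
One way to see these kernel features for yourself, sketched under the assumption of a Linux host with `/proc` mounted: every process, containerized or not, has namespace links under `/proc/<pid>/ns` and a cgroup membership in `/proc/<pid>/cgroup`. The function name `show_isolation` is made up for illustration.

```bash
#!/usr/bin/env bash
# Print the namespaces and cgroup of a process (default: this shell).
# Processes in the same container share these IDs; processes in
# different containers do not.
show_isolation() {
  local pid="${1:-$$}"
  if [ ! -d "/proc/${pid}/ns" ]; then
    echo "no namespace info for PID ${pid} (not Linux, or /proc not mounted)"
    return 0
  fi
  echo "namespaces for PID ${pid}:"
  local ns
  for ns in pid net mnt uts ipc user; do
    # Each entry is a symlink whose target looks like "pid:[4026531836]".
    printf '  %-5s %s\n' "$ns" "$(readlink "/proc/${pid}/ns/${ns}" 2>/dev/null || echo '?')"
  done
  echo "cgroup membership:"
  sed 's/^/  /' "/proc/${pid}/cgroup" 2>/dev/null || echo "  ?"
}

show_isolation
```

To compare against a container, pass the host PID of a containerized process, for example the one reported by `docker inspect --format '{{.State.Pid}}' homepage`. Reading another user's `/proc/<pid>/ns` may need `sudo`.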
|
||||||
|
|
||||||
|
## Images vs Containers
|
||||||
|
|
||||||
|
| Concept | Analogy | What it is |
|
||||||
|
|---------|---------|-----------|
|
||||||
|
| **Image** | A recipe | A read-only template — filesystem layers, default command, environment |
|
||||||
|
| **Container** | A meal cooked from that recipe | A running instance of an image — has its own writable layer on top |
|
||||||
|
|
||||||
|
You can run 10 containers from the same image. They all share the read-only image layers and each gets its own writable layer on top. If a container is deleted, its writable layer is gone. The image remains.
|
||||||
|
|
||||||
|
When you run `docker compose up -d`, Docker:
|
||||||
|
1. Pulls the image if not already local
|
||||||
|
2. Creates a container (adds writable layer)
|
||||||
|
3. Attaches it to the specified networks
|
||||||
|
4. Mounts the volumes
|
||||||
|
5. Starts the process defined in the image's CMD or your compose override
|
||||||
|
|
||||||
|
## Docker Networks — Why the `kitestacks` Network Exists
|
||||||
|
|
||||||
|
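Docker creates several default networks, but name resolution only works on **user-defined** networks. Containers on the same user-defined network can reach each other by **container name** through Docker's built-in DNS; containers on the default `bridge` network can only reach each other by IP.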
|
||||||
|
|
||||||
|
In this homelab:
|
||||||
|
```
|
||||||
|
docker network create kitestacks
|
||||||
|
```
|
||||||
|
|
||||||
|
Every container joins this network. So when cloudflared routes traffic for `www.kitestacks.com`, it resolves `homepage` via Docker DNS to the container's IP on the `kitestacks` network. Without this shared network, cloudflared can't find the other containers.
|
||||||
|
|
||||||
|
```
|
||||||
|
cloudflared container → DNS lookup "homepage" → 172.x.x.x (homepage container)
|
||||||
|
```
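
A hedged way to confirm the lookup yourself: start a throwaway container on the same network and resolve the name from inside it. The helper name `check_container_dns` and the choice of the `busybox` image are assumptions for this sketch (the `cloudflared` image itself may not include a shell or lookup tools).

```bash
#!/usr/bin/env bash
# Resolve a container name from inside a Docker network by starting a
# throwaway busybox container attached to that network.
check_container_dns() {
  local network="$1" name="$2"
  if ! command -v docker >/dev/null 2>&1; then
    echo "docker not found; cannot look up ${name}"
    return 0
  fi
  # --rm deletes the helper container as soon as nslookup exits.
  docker run --rm --network "$network" busybox nslookup "$name"
}

# Example (run on a host where the kitestacks network exists):
#   check_container_dns kitestacks homepage
```

If the name resolves to a `172.x.x.x` address, Docker DNS is doing its job; a failure here usually means the target container is not attached to that network.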
|
||||||
|
|
||||||
|
**`network_mode: host`** is different — the container shares the HOST's network namespace entirely. No isolation. Used for the metrics API so it can read actual host network stats.
|
||||||
|
|
||||||
|
## Volumes — Keeping Data When Containers Are Deleted
|
||||||
|
|
||||||
|
Containers are ephemeral — their writable layer is deleted when the container is removed. To persist data:
|
||||||
|
|
||||||
|
**Bind mount:** Links a host directory to a container path.
|
||||||
|
```yaml
|
||||||
|
volumes:
|
||||||
|
- ./data:/forgejo-data
|
||||||
|
```
|
||||||
|
`./data` on your laptop → `/forgejo-data` inside container. Data lives on your laptop. You can browse it with `ls`.
|
||||||
|
|
||||||
|
**Named volume:** Docker manages the storage location.
|
||||||
|
```yaml
|
||||||
|
volumes:
|
||||||
|
- prometheus-data:/prometheus
|
||||||
|
```
|
||||||
|
Docker stores it under `/var/lib/docker/volumes/` (Compose prefixes the volume name with the project name, and the files live in a `_data` subdirectory). You don't specify where.
|
||||||
|
|
||||||
|
**In this homelab:** Databases, config files, and user data use bind mounts (`./data`, `./config`, etc.) so you know exactly where everything is. Named volumes are used where location doesn't matter (Prometheus metrics, Portainer settings).
|
||||||
|
|
||||||
|
## Docker Compose — What It's Doing
|
||||||
|
|
||||||
|
`docker compose up -d` reads `docker-compose.yml` and for each service:
|
||||||
|
1. Ensures the image exists (pull if needed)
|
||||||
|
2. Creates the network if it doesn't exist
|
||||||
|
3. Creates the container with all specified config (env vars, volumes, ports, networks)
|
||||||
|
4. Starts the container
|
||||||
|
|
||||||
|
`-d` means detached — run in background.
|
||||||
|
|
||||||
|
`restart: unless-stopped` means Docker will restart the container if it crashes or if the host reboots — unless you explicitly stop it with `docker compose stop`.
|
||||||
|
|
||||||
|
## Port Mappings
|
||||||
|
|
||||||
|
```yaml
|
||||||
|
ports:
|
||||||
|
- "3006:3000"
|
||||||
|
```
|
||||||
|
|
||||||
|
`HOST_PORT:CONTAINER_PORT`
|
||||||
|
|
||||||
|
Port 3000 inside the container is mapped to port 3006 on the host. From the host, `http://localhost:3006` reaches the service. From within the `kitestacks` Docker network, other containers use `http://forgejo:3000` (the container port, via Docker DNS).
|
||||||
|
|
||||||
|
Cloudflare Tunnel doesn't use host ports — it goes through the Docker network directly using the container name and container port.
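
A small sketch of the two vantage points described above, using only shell string handling. The function name `describe_mapping` is hypothetical, and it assumes the two-part `HOST:CONTAINER` form (not the three-part form that adds a bind address).

```bash
#!/usr/bin/env bash
# Split a Compose port mapping ("HOST:CONTAINER") and print the URL to
# use from each vantage point.
describe_mapping() {
  local mapping="$1" service="$2"
  local host_port="${mapping%%:*}"        # text before the first colon
  local container_port="${mapping##*:}"   # text after the last colon
  echo "from the host:        http://localhost:${host_port}"
  echo "from another service: http://${service}:${container_port}"
}

describe_mapping "3006:3000" forgejo
```

The second URL only works from containers attached to the same Docker network as the service.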
|
||||||
|
|
||||||
|
## Commands to Know Cold
|
||||||
|
|
||||||
|
```bash
# See all running containers
docker ps

# See logs for a container
docker logs forgejo
docker logs -f forgejo          # follow (live tail)

# Execute a command inside a running container
docker exec -it forgejo bash    # open a shell
docker exec -u git forgejo forgejo admin user list   # run a specific command as the git user (Forgejo refuses to run as root)

# Inspect a container's config
docker inspect authentik

# See all networks
docker network ls
docker network inspect kitestacks

# See disk usage
docker system df

# Remove stopped containers, unused networks, dangling images, and build cache
docker system prune
```
|
||||||
|
|
||||||
|
## What to Say About Docker
|
||||||
|
|
||||||
|
> *"I containerized every service using Docker and Docker Compose. Each service is isolated in its own container with its own dependencies, connected through a shared Docker bridge network named 'kitestacks' so they can communicate by container name. Data is persisted via bind-mounted host directories. The entire stack is defined in version-controlled YAML files, making it reproducible on any Linux host."*
|
||||||
185
homelab-mastery/concepts/linux.md
Normal file
|
|
@ -0,0 +1,185 @@
|
||||||
|
# Linux — Commands and Concepts to Own
|
||||||
|
|
||||||
|
You've been running Linux commands without fully owning them. This fixes that.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## The Filesystem
|
||||||
|
|
||||||
|
Everything in Linux is a file. The filesystem tree starts at `/` (root):
|
||||||
|
|
||||||
|
```
|
||||||
|
/
|
||||||
|
├── etc/ Configuration files (system-wide)
|
||||||
|
├── home/ User home directories (/home/kenpat)
|
||||||
|
├── opt/ Optional/third-party software (kscloud1 services live here)
|
||||||
|
├── proc/ Virtual filesystem — running processes, kernel info
|
||||||
|
│ ├── uptime System uptime in seconds
|
||||||
|
│ └── net/route Routing table (used by metrics API to find active interface)
|
||||||
|
├── sys/ Virtual filesystem — hardware/kernel info
|
||||||
|
├── var/ Variable data — logs, databases, cache
|
||||||
|
└── usr/ User programs and libraries
|
||||||
|
```
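
The two `/proc` entries in the tree can be read with plain shell. This is a sketch assuming a Linux host; whether the homelab's metrics API reads them in exactly this way is not stated in the source, and the function names are made up.

```bash
#!/usr/bin/env bash
# Uptime in whole seconds: first field of /proc/uptime, fraction dropped.
uptime_seconds() {
  local up _idle
  read -r up _idle < /proc/uptime || return 1
  echo "${up%%.*}"
}

# Interface that carries the default route: the row of /proc/net/route
# whose Destination column (hex) is 00000000.
default_interface() {
  awk 'NR > 1 && $2 == "00000000" { print $1; exit }' /proc/net/route
}

echo "uptime:  $(uptime_seconds) s"
echo "default: $(default_interface 2>/dev/null)"
```

Nothing here touches a disk: the kernel generates these "files" on the fly each time they are read.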
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## File Permissions
|
||||||
|
|
||||||
|
Every file has three permission sets: **owner**, **group**, **others**.
|
||||||
|
|
||||||
|
```bash
ls -lh ~/kitestacks-live/docker/kavita/config/kavita.db
-rw-r--r-- 1 kenpat kenpat 2.4M Jun 11 kavita.db
```
|
||||||
|
|
||||||
|
Breaking it down:
|
||||||
|
- `-` — it's a file (not a directory `d` or symlink `l`)
|
||||||
|
- `rw-` — owner (kenpat): read + write
|
||||||
|
- `r--` — group (kenpat): read only
|
||||||
|
- `r--` — others: read only
|
||||||
|
|
||||||
|
**chmod** changes permissions. **chown** changes owner.
|
||||||
|
|
||||||
|
Why this mattered: When syncing kavita.db to kscloud1, you ran `chown 1000:1000` because the Kavita container runs as user ID 1000. If the file is owned by the wrong user ID, the container can't write to it.
|
||||||
|
|
||||||
|
```bash
|
||||||
|
chmod 644 kavita.db # rw-r--r--
|
||||||
|
chmod 755 script.sh # rwxr-xr-x (executable)
|
||||||
|
chown 1000:1000 kavita.db # set owner to UID 1000, GID 1000
|
||||||
|
chown -R kenpat:kenpat ./ # recursive (-R) on a directory
|
||||||
|
```
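
The octal numbers are sums of read = 4, write = 2, execute = 1, one digit each for owner, group, and others. A minimal sketch that converts between the two notations (the function name `mode_to_symbolic` is hypothetical, and it handles plain three-digit modes only):

```bash
#!/usr/bin/env bash
# Convert a three-digit octal mode (e.g. 644) to rwx notation.
# Each digit is a sum: read = 4, write = 2, execute = 1.
mode_to_symbolic() {
  local mode="$1" out="" digit i
  for (( i = 0; i < ${#mode}; i++ )); do
    digit="${mode:i:1}"
    if (( digit & 4 )); then out+="r"; else out+="-"; fi
    if (( digit & 2 )); then out+="w"; else out+="-"; fi
    if (( digit & 1 )); then out+="x"; else out+="-"; fi
  done
  echo "$out"
}

mode_to_symbolic 644   # rw-r--r--
mode_to_symbolic 755   # rwxr-xr-x
```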
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Processes
|
||||||
|
|
||||||
|
```bash
|
||||||
|
ps aux # all running processes
|
||||||
|
ps aux | grep forgejo # find forgejo processes
|
||||||
|
kill 1234 # send SIGTERM to PID 1234 (polite stop)
|
||||||
|
kill -9 1234 # send SIGKILL (force kill, no cleanup)
|
||||||
|
```
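
The polite-then-forceful pattern can be scripted. The sketch below sends SIGTERM, waits, and only falls back to SIGKILL if the process ignores the request; `stop_gracefully` is a made-up helper name and `sleep 300` stands in for a real service.

```bash
#!/usr/bin/env bash
# Ask a process to exit with SIGTERM, wait up to N seconds, then
# fall back to SIGKILL if it is still alive.
stop_gracefully() {
  local pid="$1" timeout="${2:-10}" waited=0
  kill -TERM "$pid" 2>/dev/null || return 0   # already gone
  while kill -0 "$pid" 2>/dev/null; do        # signal 0 only checks existence
    if (( waited >= timeout )); then
      kill -KILL "$pid" 2>/dev/null
      break
    fi
    sleep 1
    waited=$(( waited + 1 ))
  done
}

sleep 300 &              # stand-in for a long-running service
stop_gracefully "$!" 5
```

`docker stop` follows the same idea: it sends SIGTERM first and SIGKILL only after a grace period.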
|
||||||
|
|
||||||
|
**systemctl** manages systemd services (services that start on boot):
|
||||||
|
```bash
|
||||||
|
systemctl status tailscaled # is tailscale running?
|
||||||
|
systemctl restart tailscaled # restart it
|
||||||
|
systemctl enable tailscaled # start on boot
|
||||||
|
journalctl -u tailscaled # logs for tailscale service
|
||||||
|
```
|
||||||
|
|
||||||
|
Your containers don't use systemd — Docker manages them with `restart: unless-stopped`.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Networking Commands
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# What ports is this machine listening on?
|
||||||
|
ss -tlnp # TCP listening, numeric, with process
|
||||||
|
ss -tlnp | grep :3006 # is Forgejo's port bound?
|
||||||
|
|
||||||
|
# Test connectivity
|
||||||
|
ping 8.8.8.8 # can I reach Google DNS?
|
||||||
|
curl -I https://auth.kitestacks.com # HTTP headers from Authentik
|
||||||
|
curl -s http://localhost:8000/api/health # test metrics API
|
||||||
|
|
||||||
|
# DNS lookup
|
||||||
|
dig www.kitestacks.com # full DNS query details
|
||||||
|
nslookup gitforge.kitestacks.com # simpler DNS lookup
|
||||||
|
|
||||||
|
# Firewall
|
||||||
|
sudo ufw status # what rules are active?
|
||||||
|
sudo ufw allow 22/tcp # allow SSH
|
||||||
|
sudo ufw deny 3306/tcp # block MySQL from outside
|
||||||
|
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Piping and Redirection
|
||||||
|
|
||||||
|
The `|` (pipe) sends output of one command as input to another:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
docker ps | grep forgejo # filter docker ps output
|
||||||
|
cat prometheus.yml | grep job # find job lines in config
|
||||||
|
docker logs authentik 2>&1 | grep ERROR # show only errors
|
||||||
|
```
|
||||||
|
|
||||||
|
`2>&1` redirects stderr (error output, stream 2) to stdout (stream 1) — so errors appear in the same stream as normal output and can be piped.
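
A self-contained demonstration, using a stand-in function (`noisy`, invented here) that writes one line to each stream:

```bash
#!/usr/bin/env bash
# A stand-in for a program that writes to both streams.
noisy() {
  echo "normal output"        # stream 1 (stdout)
  echo "ERROR: failed" >&2    # stream 2 (stderr)
}

# A pipe carries only stdout, so grep never sees the error line;
# it bypasses the pipe and goes straight to the terminal.
noisy | grep -c ERROR || true   # grep counts 0 matches

# With 2>&1, stderr is merged into stdout before the pipe.
noisy 2>&1 | grep -c ERROR      # grep counts 1 match
```

Order matters when redirecting to a file: `cmd > out.log 2>&1` captures both streams, while `cmd 2>&1 > out.log` leaves stderr on the terminal.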
|
||||||
|
|
||||||
|
`>` redirects output to a file (overwrites):
|
||||||
|
```bash
|
||||||
|
pg_dump authentik > authentik-backup.sql
|
||||||
|
```
|
||||||
|
|
||||||
|
`>>` appends to a file:
|
||||||
|
```bash
|
||||||
|
echo "new line" >> ~/.ssh/config
|
||||||
|
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## SSH
|
||||||
|
|
||||||
|
SSH (Secure Shell) gives you a terminal session on a remote machine.
|
||||||
|
|
||||||
|
```bash
|
||||||
|
ssh kenpat@5.78.x.x # basic SSH
|
||||||
|
ssh -i ~/.ssh/id_ed25519_kscloud1 kenpat@5.78.x.x # specify key
|
||||||
|
ssh -L 5099:localhost:5000 kenpat@5.78.x.x # local port forward
|
||||||
|
```
|
||||||
|
|
||||||
|
**Local port forward** (`-L local_port:remote_host:remote_port`):
|
||||||
|
`ssh -L 5099:localhost:5000 kenpat@kscloud1` means:
|
||||||
|
- Traffic to YOUR localhost:5099
|
||||||
|
- Gets tunneled through the SSH connection
|
||||||
|
- And hits kscloud1's localhost:5000
|
||||||
|
|
||||||
|
You used this to access kscloud1's Kavita instance (running on port 5000) from your browser at http://localhost:5099 — without opening that port to the internet.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## sudo and Non-Interactive Usage
|
||||||
|
|
||||||
|
`sudo` runs a command as root. It normally prompts for your password.
|
||||||
|
|
||||||
|
**The kscloud1 problem:** In automated scripts, there's no terminal to enter a password. Solution:
|
||||||
|
```bash
|
||||||
|
echo YOUR_PASSWORD | sudo -S command
|
||||||
|
# -S reads password from stdin instead of terminal
|
||||||
|
```
|
||||||
|
|
||||||
|
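You used this to run ufw commands non-interactively. The downside is that the password ends up in the script or in your shell history, so treat it as a homelab shortcut. In real production environments, this is handled differently (a sudoers entry with NOPASSWD for specific commands, or SSH key-based service accounts).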
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Grep, Sed, Awk
|
||||||
|
|
||||||
|
**grep** finds lines matching a pattern:
|
||||||
|
```bash
|
||||||
|
grep "error" /var/log/syslog # lines containing "error"
|
||||||
|
grep -i "error" logfile # case-insensitive
|
||||||
|
grep -n "AUTHENTIK" docker-compose.yml # show line numbers
|
||||||
|
grep -r "p12217177" /opt/kitestacks/ # recursive search in directory
|
||||||
|
```
|
||||||
|
|
||||||
|
**sed** (stream editor) modifies text:
|
||||||
|
```bash
|
||||||
|
sed 's/old_text/new_text/g' file.txt # replace all occurrences
|
||||||
|
sed -i 's/old/new/g' file.txt # -i edits file in place
|
||||||
|
```
|
||||||
|
|
||||||
|
**cut** extracts fields from delimited lines (reach for **awk** when you need per-column logic):
```bash
grep PG_PASS .env | cut -d= -f2-   # get the value after the first = (keeps any later = characters)
# cut -d= splits on =, and -f2- means "field 2 and everything after"
# Why it matters: with PG_PASS=abc=, -f2 returns "abc" (the trailing = is lost)
# With -f2-, you get "abc=" (correct)
```
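
For comparison, a short awk sketch. awk splits each line into fields (`$1`, `$2`, ...) and runs a block for every line that matches a condition. The sample data and the `env_value` helper are invented for illustration.

```bash
#!/usr/bin/env bash
# Made-up sample: service name, status, port.
sample='forgejo   running 3006
grafana   running 3000
kavita    exited  5000'

# Print column 1 for rows whose column 2 is "running".
printf '%s\n' "$sample" | awk '$2 == "running" { print $1 }'

# Same job as `cut -d= -f2-`: keep everything after "KEY=".
env_value() {
  awk -v key="$1" 'index($0, key "=") == 1 { print substr($0, length(key) + 2) }'
}
printf 'PG_PASS=abc=\n' | env_value PG_PASS   # abc=
```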
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## What to Say About Linux
|
||||||
|
|
||||||
|
> *"All services run on Linux hosts. I'm comfortable with file permissions, process management, SSH configuration (including local port forwarding for secure access to non-exposed services), firewall rules with ufw, and command-line tools like grep, curl, and docker CLI. I diagnosed and fixed a network configuration issue on the cloud VPS where ufw's default-deny policy was blocking Docker container traffic to a host-network-mode service."*
|
||||||
187
homelab-mastery/concepts/networking.md
Normal file
|
|
@ -0,0 +1,187 @@
|
||||||
|
# Networking — The Foundation of Everything
|
||||||
|
|
||||||
|
This is the most important concept to master. Every other technology in this homelab is built on networking fundamentals. CCNA will teach this deeply — this is the overview.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## IP Addresses
|
||||||
|
|
||||||
|
Every device on a network has an IP address — a unique identifier.
|
||||||
|
|
||||||
|
**IPv4:** Four numbers 0–255 separated by dots: `192.168.1.205`
|
||||||
|
|
||||||
|
**Private ranges** (not routable on the internet, only on local networks):
|
||||||
|
- `192.168.x.x` — home networks (your router assigns these)
|
||||||
|
- `172.16.x.x – 172.31.x.x` — Docker bridge networks use this range
|
||||||
|
- `10.x.x.x` — corporate networks often use this
|
||||||
|
|
||||||
|
**Public IPs:** Routable on the internet. Your home has one (assigned by ISP). kscloud1 has one (assigned by Hetzner).
|
||||||
|
|
||||||
|
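**Tailscale IPs:** `100.x.x.x`, drawn from `100.64.0.0/10`, the carrier-grade NAT (CGNAT) range. Like the private ranges above, it is not routable on the public internet, and Tailscale uses it for its overlay network.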
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Subnets and CIDR Notation
|
||||||
|
|
||||||
|
A subnet is a range of IP addresses. CIDR notation describes the range:
|
||||||
|
- `192.168.1.0/24` — all addresses from `192.168.1.0` to `192.168.1.255` (256 addresses)
|
||||||
|
- `172.16.0.0/12` — a large range (`172.16.0.0` to `172.31.255.255`) that contains Docker's default bridge subnets
|
||||||
|
- `/32` — a single IP address
|
||||||
|
|
||||||
|
The number after `/` is the prefix length — how many bits are fixed. The remaining bits define the host range.
|
||||||
|
|
||||||
|
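In this homelab: `ufw allow from 172.16.0.0/12` allows traffic from any Docker container to the host. That `/12` covers every `172.16.x.x` to `172.31.x.x` address, which is where Docker places bridge networks by default (a network created with a custom subnet could fall outside it).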
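
The prefix arithmetic can be checked directly: a block with prefix length `p` contains 2^(32 − p) addresses. A minimal sketch (the function name `cidr_size` is made up):

```bash
#!/usr/bin/env bash
# Number of addresses in a block: 2^(32 - prefix length).
cidr_size() {
  local prefix="${1##*/}"          # "192.168.1.0/24" -> "24"
  echo $(( 1 << (32 - prefix) ))
}

cidr_size 192.168.1.0/24   # 256
cidr_size 172.16.0.0/12    # 1048576
cidr_size 10.0.0.1/32      # 1
```

In a normal subnet two of those addresses (network and broadcast) are not assignable to hosts, so a `/24` has 254 usable host addresses.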
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Ports
|
||||||
|
|
||||||
|
A port is a number (0–65535) that identifies a specific service on a host. Think of the IP address as the building, and the port as the apartment number.
|
||||||
|
|
||||||
|
**Common ports:**
|
||||||
|
- 22 — SSH
|
||||||
|
- 80 — HTTP
|
||||||
|
- 443 — HTTPS
|
||||||
|
- 3306 — MySQL/MariaDB
|
||||||
|
- 5432 — PostgreSQL
|
||||||
|
- 6379 — Redis
|
||||||
|
|
||||||
|
**Your homelab ports** (just examples — you know yours):
|
||||||
|
- Each service binds to a port inside the container
|
||||||
|
- Docker maps host ports to container ports: `3006:3000`
|
||||||
|
|
||||||
|
When a service "listens on a port," it's waiting for TCP/UDP connections on that port. When cloudflared connects to `http://grafana:3000`, it's connecting to the IP of the `grafana` container on port 3000.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## DNS — How Names Become IPs
|
||||||
|
|
||||||
|
DNS (Domain Name System) translates human-readable names to IP addresses.
|
||||||
|
|
||||||
|
```
www.kitestacks.com → DNS lookup    → Cloudflare's IP address
grafana            → Docker DNS    → 172.x.x.x (container IP)
kscloud1           → Tailscale DNS → 100.123.x.x
```
|
||||||
|
|
||||||
|
**Cloudflare DNS:** You pointed `kitestacks.com` at Cloudflare's nameservers by changing the NS records at your registrar, so Cloudflare now controls all DNS for the domain. The record for `www.kitestacks.com` is proxied: public lookups return Cloudflare's anycast IPs, never your home IP.
|
||||||
|
|
||||||
|
**Docker DNS:** Inside the `kitestacks` Docker network, Docker runs an internal DNS server at `127.0.0.11`. When cloudflared looks up `homepage`, Docker DNS returns the container's IP on the bridge network.
|
||||||
|
|
||||||
|
**How to check DNS:**
|
||||||
|
```bash
|
||||||
|
dig www.kitestacks.com # what does the public DNS say?
|
||||||
|
nslookup grafana # from inside a container
|
||||||
|
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## HTTP vs HTTPS
|
||||||
|
|
||||||
|
**HTTP (HyperText Transfer Protocol):** Data is sent in plain text. Anyone who can see the network traffic can read it.
|
||||||
|
|
||||||
|
**HTTPS:** HTTP + TLS encryption. Data is encrypted in transit.
|
||||||
|
|
||||||
|
**TLS (Transport Layer Security):** A cryptographic protocol. Requires a certificate proving the server is who it claims to be.
|
||||||
|
|
||||||
|
In this homelab:
|
||||||
|
- All internal Docker network traffic is HTTP — it never leaves the host, so encryption isn't needed
|
||||||
|
- All public traffic goes through Cloudflare, which handles TLS — Cloudflare terminates HTTPS at the edge
|
||||||
|
- Between Cloudflare and cloudflared (the tunnel itself), traffic is encrypted by the tunnel protocol
|
||||||
|
|
||||||
|
**Certificates:** Cloudflare manages TLS certificates for `*.kitestacks.com` automatically — you don't need to configure Let's Encrypt or buy a certificate.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Reverse Proxy
|
||||||
|
|
||||||
|
A reverse proxy sits in front of services and routes requests to them.
|
||||||
|
|
||||||
|
```
|
||||||
|
Client → Reverse Proxy → Service A
|
||||||
|
↘ Service B
|
||||||
|
↘ Service C
|
||||||
|
```
|
||||||
|
|
||||||
|
In this homelab, Cloudflare + cloudflared acts as the reverse proxy:
|
||||||
|
- Receives all inbound HTTPS traffic
|
||||||
|
- Decrypts TLS
|
||||||
|
- Reads the `Host` header (`www.kitestacks.com`, `grafana.kitestacks.com`, etc.)
|
||||||
|
- Routes to the correct container based on the hostname rules you configured
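
The hostname-based routing step boils down to a lookup table. The sketch below models it with a shell `case` statement. The `grafana` upstream comes from this guide; the `homepage` port is an assumption, and the fallback string is modeled on cloudflared's catch-all ingress rule.

```bash
#!/usr/bin/env bash
# Map a request's Host header to an internal upstream URL.
route_for_host() {
  case "$1" in
    grafana.kitestacks.com) echo "http://grafana:3000" ;;
    www.kitestacks.com)     echo "http://homepage:80" ;;   # port is an assumption
    *)                      echo "http_status:404" ;;      # no rule matched
  esac
}

route_for_host grafana.kitestacks.com   # http://grafana:3000
route_for_host unknown.kitestacks.com   # http_status:404
```

The first matching rule wins, which is why the catch-all sits last.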
|
||||||
|
|
||||||
|
nginx (the portal container) is also a reverse proxy — it forwards `/api/*` requests to the metrics API running on the host.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Cloudflare Tunnel — Deep Dive
|
||||||
|
|
||||||
|
The tunnel replaces the need for port forwarding. Here's exactly what happens:
|
||||||
|
|
||||||
|
**Setup (happens once when you start cloudflared):**
|
||||||
|
1. cloudflared reads the TUNNEL_TOKEN
|
||||||
|
2. It makes an outbound, TLS-encrypted connection to Cloudflare's edge servers (`region1.v2.argotunnel.com` on port 7844, over QUIC or HTTP/2)
|
||||||
|
3. It authenticates and registers as a connector
|
||||||
|
4. Cloudflare keeps this connection open (persistent, long-lived)
|
||||||
|
|
||||||
|
**When a request comes in:**
|
||||||
|
1. User's browser connects to Cloudflare's edge (the public IP in DNS)
|
||||||
|
2. Cloudflare sees the Host header: `grafana.kitestacks.com`
|
||||||
|
3. Cloudflare looks up the tunnel configuration — `grafana.kitestacks.com` → `http://grafana:3000`
|
||||||
|
4. Cloudflare sends the request over the existing tunnel connection to cloudflared
|
||||||
|
5. cloudflared resolves `grafana` via Docker DNS → gets container IP
|
||||||
|
6. cloudflared forwards the request to the grafana container
|
||||||
|
7. Response goes back through the tunnel to Cloudflare → to the user
|
||||||
|
|
||||||
|
**The key insight:** All of this happens over outbound connections that cloudflared opened (it keeps a few open for redundancy). No inbound ports. Your home router sees only ordinary outbound traffic.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Tailscale — Overlay Network
|
||||||
|
|
||||||
|
Tailscale creates a WireGuard mesh between devices. Each device gets a `100.x.x.x` IP that works regardless of physical location or network.
|
||||||
|
|
||||||
|
**Under the hood:**
|
||||||
|
- WireGuard: a modern VPN protocol, UDP-based, very fast, cryptographically simple
|
||||||
|
- Tailscale coordinates key exchange via their servers, but actual traffic is peer-to-peer
|
||||||
|
- Works behind NAT via UDP hole-punching (most of the time)
|
||||||
|
- Falls back to relay servers (DERP) if direct connection isn't possible
|
||||||
|
|
||||||
|
**Why this matters for the homelab:**
|
||||||
|
- kscloud1's Postgres and Redis bind to `100.123.x.x` (Tailscale IP), not `0.0.0.0`
|
||||||
|
- Even though kscloud1 has a public IP, the database is unreachable from the internet
|
||||||
|
- Only devices on the tailnet can connect to it
|
||||||
|
- Monk's Authentik connects to `100.123.x.x:5432` — traffic goes through the encrypted Tailscale tunnel
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Firewall Basics (ufw)
|
||||||
|
|
||||||
|
ufw (Uncomplicated Firewall) manages Linux's netfilter/iptables rules.
|
||||||
|
|
||||||
|
```bash
|
||||||
|
ufw default deny incoming # block all inbound by default
|
||||||
|
ufw allow ssh # allow SSH (port 22)
|
||||||
|
ufw allow from 172.16.0.0/12 to any port 8000 # Docker containers → metrics API
|
||||||
|
```
|
||||||
|
|
||||||
|
On kscloud1: ufw blocks everything by default. The exception for `172.16.0.0/12` allows containers (which use 172.x.x.x addresses) to reach port 8000 on the host (where the metrics API runs in host network mode).
|
||||||
|
|
||||||
|
Without that rule: the homepage container calls `host.docker.internal:8000` → kernel sees source `172.x.x.x` → ufw blocks it → System Status widget shows "Offline."
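
Whether a source address matches a rule like `172.16.0.0/12` is a bitmask comparison. The sketch below reproduces that check in shell so you can test addresses yourself; the function names are hypothetical and it assumes well-formed IPv4 input without leading zeros.

```bash
#!/usr/bin/env bash
# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
  local a b c d
  IFS=. read -r a b c d <<< "$1"
  echo $(( (a << 24) | (b << 16) | (c << 8) | d ))
}

# Succeed if IP $1 falls inside CIDR block $2 (e.g. 172.16.0.0/12).
ip_in_cidr() {
  local ip net prefix mask
  ip=$(ip_to_int "$1")
  net=$(ip_to_int "${2%/*}")
  prefix="${2#*/}"
  mask=$(( (0xFFFFFFFF << (32 - prefix)) & 0xFFFFFFFF ))
  (( (ip & mask) == (net & mask) ))
}

ip_in_cidr 172.18.0.5 172.16.0.0/12 && echo "172.18.0.5: matched by the rule"
ip_in_cidr 192.168.1.50 172.16.0.0/12 || echo "192.168.1.50: not matched"
```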
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## What to Know Cold for CCNA
|
||||||
|
|
||||||
|
- **Subnetting:** Practice calculating subnets. `/24`, `/25`, `/26`, `/27` etc. — know the host ranges by heart.
|
||||||
|
- **OSI Model:** 7 layers. Know what each layer does and what protocols live there.
|
||||||
|
- **TCP vs UDP:** TCP is reliable (handshake, acknowledgements). UDP is fast (no handshake, fire and forget). HTTP/1.1 and HTTP/2 use TCP; HTTP/3 runs over QUIC, which uses UDP. DNS uses UDP (mostly).
|
||||||
|
- **The TCP 3-way handshake:** SYN → SYN-ACK → ACK. This is how every TCP connection starts.
|
||||||
|
- **ARP:** How a device finds the MAC address for an IP on the same subnet.
|
||||||
|
- **Default gateway:** The router. Packets destined for outside the local subnet go to the default gateway.
|
||||||
|
- **NAT:** Network Address Translation. How your home router lets multiple devices share one public IP. Crucial to understand — it's why cloudflared uses outbound connections.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## What to Say About Networking
|
||||||
|
|
||||||
|
> *"The homelab uses Cloudflare Tunnel for all inbound traffic, which means no ports are open on the home router. All nine public subdomains have DNS pointing to Cloudflare, and a cloudflared connector on each host maintains a persistent outbound tunnel. Internally, services communicate over a Docker bridge network using container DNS. A Tailscale overlay network connects monk and kscloud1 for private database access — the shared Authentik Postgres is bound only to the Tailscale interface so it's never exposed to the public internet."*
|
||||||
171
homelab-mastery/concepts/oauth2-oidc.md
Normal file
|
|
@ -0,0 +1,171 @@
|
||||||
|
# OAuth2 and OIDC — How SSO Actually Works
|
||||||
|
|
||||||
|
This is the concept that most people get wrong. Understanding it cold will impress any interviewer.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## The Problem SSO Solves
|
||||||
|
|
||||||
|
Without SSO: 9 services = 9 separate user databases. To add a friend:
|
||||||
|
- Create account in Forgejo
|
||||||
|
- Create account in Grafana
|
||||||
|
- Create account in Open WebUI
|
||||||
|
- Create account in Kavita
|
||||||
|
- ... 9 times
|
||||||
|
|
||||||
|
To remove their access: 9 places to deactivate.
|
||||||
|
|
||||||
|
With SSO: 1 account in Authentik. Access to all 9 services. Deactivate once.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## OAuth2 — Authorization, Not Authentication
|
||||||
|
|
||||||
|
OAuth2 is commonly misunderstood. It was designed for **authorization** (what can you access?) not **authentication** (who are you?).
|
||||||
|
|
||||||
|
**The core flow (Authorization Code):**
|
||||||
|
|
||||||
|
```
|
||||||
|
1. You click "Sign in with Authentik" in Grafana
|
||||||
|
|
||||||
|
2. Grafana redirects your browser to Authentik:
|
||||||
|
GET https://auth.kitestacks.com/application/o/authorize/
|
||||||
|
?client_id=grafana
|
||||||
|
&redirect_uri=https://grafana.kitestacks.com/login/generic_oauth
|
||||||
|
&response_type=code
|
||||||
|
&scope=openid email profile
|
||||||
|
&state=random_string_to_prevent_csrf
|
||||||
|
|
||||||
|
3. Authentik presents login page
|
||||||
|
You enter username + password
|
||||||
|
Authentik validates credentials against its database
|
||||||
|
|
||||||
|
4. Authentik redirects your browser BACK to Grafana:
|
||||||
|
GET https://grafana.kitestacks.com/login/generic_oauth
|
||||||
|
?code=abc123xyz ← authorization code (short-lived, one-time use)
|
||||||
|
&state=random_string ← must match what Grafana sent in step 2
|
||||||
|
|
||||||
|
5. Grafana's backend (not browser) calls Authentik directly:
|
||||||
|
POST https://auth.kitestacks.com/application/o/token/
|
||||||
|
client_id=grafana
|
||||||
|
client_secret=<secret>
|
||||||
|
code=abc123xyz
|
||||||
|
grant_type=authorization_code
|
||||||
|
|
||||||
|
6. Authentik validates: code exists? client_secret correct?
|
||||||
|
Returns: access_token + id_token (JWTs)
|
||||||
|
|
||||||
|
7. Grafana reads the user's info from the token
|
||||||
|
Logs the user in
|
||||||
|
```
|
||||||
|
|
||||||
|
**Why the code exchange in steps 5-6?**
|
||||||
|
The authorization code goes through the browser (URL redirect) — visible in browser history, logs, etc. The actual tokens go server-to-server (Grafana backend → Authentik). This keeps tokens out of the browser.
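
To make step 2 concrete, here is a sketch that assembles the redirect URL by hand. Grafana does this internally; the helper names are invented, the endpoint and parameters are the ones shown in the flow above, and the encoder assumes ASCII input.

```bash
#!/usr/bin/env bash
# Percent-encode a string for use in a URL query (ASCII input assumed).
urlencode() {
  local s="$1" out="" c i
  for (( i = 0; i < ${#s}; i++ )); do
    c="${s:i:1}"
    case "$c" in
      [a-zA-Z0-9.~_-]) out+="$c" ;;
      *) out+=$(printf '%%%02X' "'$c") ;;
    esac
  done
  echo "$out"
}

# Random, unguessable state value: 16 bytes from the kernel as hex.
new_state() {
  head -c 16 /dev/urandom | od -An -tx1 | tr -d ' \n'
}

# Build the step-2 redirect URL.
authorize_url() {
  local client_id="$1" redirect_uri="$2" state="$3"
  local base="https://auth.kitestacks.com/application/o/authorize/"
  printf '%s?client_id=%s&redirect_uri=%s&response_type=code&scope=%s&state=%s\n' \
    "$base" "$(urlencode "$client_id")" "$(urlencode "$redirect_uri")" \
    "$(urlencode 'openid email profile')" "$state"
}

state=$(new_state)
authorize_url grafana https://grafana.kitestacks.com/login/generic_oauth "$state"
```

The client must remember `state` and compare it with the value that comes back in step 4; a mismatch means the response did not originate from this login attempt.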
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## The `invalid_grant` Bug — Explained
|
||||||
|
|
||||||
|
This is the exact bug you hit and fixed. Now understand why:
|
||||||
|
|
||||||
|
In step 4, Authentik stores the `code=abc123xyz` in its database.
|
||||||
|
In step 5, Grafana sends that code to Authentik to exchange for tokens.
|
||||||
|
|
||||||
|
With active-active failover:
|
||||||
|
- Step 4 might hit monk's cloudflared connector → monk's Authentik → code stored in monk's Postgres
|
||||||
|
- Step 5 might hit kscloud1's cloudflared connector → kscloud1's Authentik → looks for code in kscloud1's Postgres → NOT FOUND → `invalid_grant`
|
||||||
|
|
||||||
|
**The fix:** Both Authentik instances share ONE Postgres database (on kscloud1, via Tailscale). The code is always found regardless of which connector handles each request.
|
||||||
|
|
||||||
|
This is a real distributed systems problem — **stateful operations across load-balanced nodes** — and you diagnosed and fixed it.
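
The failure and the fix can be reproduced in a few lines of Python. This is a toy model, not Authentik's code: `AuthNode` and its in-memory dict are hypothetical stand-ins for an Authentik instance and its Postgres database.

```python
import secrets


class AuthNode:
    """Toy identity provider node; `code_store` stands in for its database."""

    def __init__(self, name: str, code_store: dict):
        self.name = name
        self.codes = code_store  # maps authorization code -> username

    def issue_code(self, username: str) -> str:
        code = secrets.token_urlsafe(16)
        self.codes[code] = username
        return code

    def exchange_code(self, code: str) -> dict:
        # pop() removes the code, which makes it single-use.
        username = self.codes.pop(code, None)
        if username is None:
            return {"error": "invalid_grant"}
        return {"access_token": f"token-for-{username}"}


# Before the fix: each node has its own store.
monk = AuthNode("monk", {})
kscloud1 = AuthNode("kscloud1", {})
code = monk.issue_code("kenpat")
broken = kscloud1.exchange_code(code)  # code is unknown to this node

# After the fix: both nodes share one store.
shared_store = {}
monk_fixed = AuthNode("monk", shared_store)
kscloud1_fixed = AuthNode("kscloud1", shared_store)
code = monk_fixed.issue_code("kenpat")
fixed = kscloud1_fixed.exchange_code(code)  # found, regardless of node
replay = monk_fixed.exchange_code(code)     # second use is rejected

print(broken, fixed, replay)
```

The last call shows that sharing state does not weaken the single-use guarantee: a replayed code still fails, on either node.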
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## OIDC — Adding Identity on Top of OAuth2
|
||||||
|
|
||||||
|
OAuth2 handles authorization: what an app is allowed to access on a user's behalf. On its own, it doesn't tell the app who the user is.
|
||||||
|
|
||||||
|
OpenID Connect (OIDC) is a layer on top of OAuth2 that adds identity. It introduces the **ID Token** — a JWT (JSON Web Token) that contains claims about the user:
|
||||||
|
|
||||||
|
```json
|
||||||
|
{
|
||||||
|
"sub": "user-uuid-from-authentik",
|
||||||
|
"email": "kenpat7177@gmail.com",
|
||||||
|
"name": "kenpat",
|
||||||
|
"preferred_username": "kenpat7177",
|
||||||
|
"iat": 1234567890,
|
||||||
|
"exp": 1234571490,
|
||||||
|
"iss": "https://auth.kitestacks.com/application/o/kavita/"
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
|
The app reads these claims to create or update the user's local account. This is why when you log into Grafana with Authentik, Grafana knows your username and email without you creating a separate Grafana account.
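
As a rough illustration of what an app does with these claims, the sketch below creates or updates a local account from an ID token payload. The `upsert_user` helper and the in-memory `local_users` dict are hypothetical; real apps such as Grafana have their own provisioning logic.

```python
def upsert_user(local_users: dict, claims: dict) -> dict:
    """Create or update a local account keyed by the stable `sub` claim."""
    user = local_users.setdefault(claims["sub"], {})
    # Fall back to `sub` if the provider did not send a username.
    user["username"] = claims.get("preferred_username", claims["sub"])
    user["email"] = claims.get("email")
    user["display_name"] = claims.get("name")
    return user


local_users = {}
claims = {
    "sub": "user-uuid-from-authentik",
    "email": "kenpat7177@gmail.com",
    "name": "kenpat",
    "preferred_username": "kenpat7177",
}

first_login = dict(upsert_user(local_users, claims))
# A later login with a changed email updates the same account.
second_login = dict(upsert_user(local_users, {**claims, "email": "new@example.com"}))
print(len(local_users), second_login["email"])
```

Keying on `sub` rather than on email is a design choice in this sketch: the email can change, while `sub` is meant to stay stable for a given user at a given issuer.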
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## JWT — What It Is
|
||||||
|
|
||||||
|
A JWT (JSON Web Token) is a signed string made of three base64url-encoded parts separated by dots:
|
||||||
|
|
||||||
|
```
|
||||||
|
eyJhbGciOiJSUzI1NiJ9.eyJzdWIiOiJ1c2VyaWQiLCJlbWFpbCI6ImtlbnBhdEBraXRlc3RhY2tzLmNvbSJ9.SIGNATURE
|
||||||
|
HEADER PAYLOAD (claims) SIGNATURE
|
||||||
|
```
|
||||||
|
|
||||||
|
The signature is created by Authentik using a private RSA key. Any app can verify it using Authentik's public key (available at the JWKS endpoint). This means apps can validate tokens without calling Authentik again.
|
||||||
|
|
||||||
|
**What this means practically:** Grafana can fetch and cache Authentik's public key, then check each token's signature locally. No round trip to Authentik is needed just to validate a token, which is why this is called stateless authentication.
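
You can inspect the example token above with nothing but the Python standard library. The sketch below decodes the header and payload; it deliberately does not verify the signature, so treat it as a debugging aid rather than validation. Real verification needs the public key from the JWKS endpoint and a JWT library.

```python
import base64
import json


def b64url_decode(segment: str) -> bytes:
    """Decode base64url, restoring the '=' padding that JWTs strip."""
    padding = "=" * (-len(segment) % 4)
    return base64.urlsafe_b64decode(segment + padding)


def peek_jwt(token: str):
    """Return (header, claims) WITHOUT verifying the signature."""
    header_b64, payload_b64, _signature = token.split(".")
    return json.loads(b64url_decode(header_b64)), json.loads(b64url_decode(payload_b64))


token = (
    "eyJhbGciOiJSUzI1NiJ9."
    "eyJzdWIiOiJ1c2VyaWQiLCJlbWFpbCI6ImtlbnBhdEBraXRlc3RhY2tzLmNvbSJ9."
    "SIGNATURE"
)
header, claims = peek_jwt(token)
print(header)
print(claims)
```

Anyone can read the payload this way, which is why a JWT should never carry secrets. The signature protects the claims from being changed; it does not hide them.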
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## The Discovery Document
|
||||||
|
|
||||||
|
Every OIDC provider exposes a discovery document:
|
||||||
|
|
||||||
|
```
|
||||||
|
https://auth.kitestacks.com/application/o/grafana/.well-known/openid-configuration
|
||||||
|
```
|
||||||
|
|
||||||
|
This JSON file tells any OIDC client where to find every endpoint:
|
||||||
|
- `authorization_endpoint` — where to send the user to log in
|
||||||
|
- `token_endpoint` — where to exchange codes for tokens
|
||||||
|
- `userinfo_endpoint` — where to get user details
|
||||||
|
- `jwks_uri` — where to find the public keys for JWT validation
|
||||||
|
|
||||||
|
Apps use this URL to auto-configure themselves. That's why you set `OPENID_PROVIDER_URL` in Open WebUI and it figured out the rest.
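
A minimal sketch of that auto-configuration step, assuming the discovery document has already been downloaded as a JSON string. The `SAMPLE` document is illustrative: only the token endpoint comes from the flow earlier in this guide, and the other URLs are guesses at the shape, not confirmed Authentik paths.

```python
import json

# The four endpoints described in the list above.
WANTED_KEYS = ("authorization_endpoint", "token_endpoint", "userinfo_endpoint", "jwks_uri")


def endpoints_from_discovery(raw_json: str) -> dict:
    """Pick the endpoints an OIDC client needs out of a discovery document."""
    doc = json.loads(raw_json)
    missing = [key for key in WANTED_KEYS if key not in doc]
    if missing:
        raise ValueError(f"discovery document is missing: {missing}")
    return {key: doc[key] for key in WANTED_KEYS}


# Illustrative document; a real one has many more fields.
SAMPLE = json.dumps({
    "issuer": "https://auth.kitestacks.com/application/o/grafana/",
    "authorization_endpoint": "https://auth.kitestacks.com/application/o/authorize/",
    "token_endpoint": "https://auth.kitestacks.com/application/o/token/",
    "userinfo_endpoint": "https://auth.kitestacks.com/application/o/userinfo/",
    "jwks_uri": "https://auth.kitestacks.com/application/o/grafana/jwks/",
})

endpoints = endpoints_from_discovery(SAMPLE)
print(endpoints["token_endpoint"])
```

To compare against the real thing, open the `.well-known/openid-configuration` URL shown above in a browser and look for the same four keys.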
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Redirect URI — Why It Must Match Exactly
|
||||||
|
|
||||||
|
In step 2 of the flow, Grafana sends:
|
||||||
|
```
|
||||||
|
&redirect_uri=https://grafana.kitestacks.com/login/generic_oauth
|
||||||
|
```
|
||||||
|
|
||||||
|
Authentik checks: is this redirect URI registered for this client? If not, it refuses — this prevents attackers from creating a malicious app that tricks users into sending their authorization codes to an attacker's server.
|
||||||
|
|
||||||
|
This is why Karakeep broke until you fixed the redirect URI. Karakeep uses NextAuth.js with provider ID `custom`, so the actual callback path was `/api/auth/callback/custom` — not `/api/auth/callback/authentik`. The URI had to match exactly.
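
The check itself is simple, which is why it is so unforgiving. The sketch below models it as an exact string comparison against a registered set. The `karakeep.kitestacks.com` hostname and the `REGISTERED` table are assumptions for illustration; Authentik's real matching options are configured per provider.

```python
# Hypothetical registration table: client_id -> allowed redirect URIs.
REGISTERED = {
    "karakeep": {"https://karakeep.kitestacks.com/api/auth/callback/custom"},
}


def redirect_allowed(client_id: str, redirect_uri: str) -> bool:
    """Exact string match only: no prefix, wildcard, or case-insensitive match."""
    return redirect_uri in REGISTERED.get(client_id, set())


good = redirect_allowed("karakeep", "https://karakeep.kitestacks.com/api/auth/callback/custom")
wrong_path = redirect_allowed("karakeep", "https://karakeep.kitestacks.com/api/auth/callback/authentik")
trailing_slash = redirect_allowed("karakeep", "https://karakeep.kitestacks.com/api/auth/callback/custom/")
print(good, wrong_path, trailing_slash)
```

Even a trailing slash counts as a different URI here, so when debugging a redirect error it helps to copy the exact value the app sends from the browser's address bar.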
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Provider Patterns in This Homelab
|
||||||
|
|
||||||
|
**Native OIDC** (app handles SSO itself):
|
||||||
|
|
||||||
|
| App | How it works |
|
||||||
|
|-----|-------------|
|
||||||
|
| Grafana | Generic OAuth2 env vars (`GF_AUTH_GENERIC_OAUTH_*`) |
|
||||||
|
| Open WebUI | `OAUTH_CLIENT_ID` + `OPENID_PROVIDER_URL` env vars |
|
||||||
|
| Kavita | Settings UI → OIDC section (must use UI, not database) |
|
||||||
|
| Karakeep | NextAuth.js with `custom` provider ID in `.env` |
|
||||||
|
| Forgejo | OAuth2 authentication source in Forgejo admin UI |
|
||||||
|
|
||||||
|
**The Authentik proxy pattern** (for apps with no native SSO support):
|
||||||
|
Authentik acts as a reverse proxy in front of the app. The user authenticates with Authentik, and only authenticated requests are forwarded to the app. Uptime Kuma and Prometheus use this pattern (not yet fully deployed — requires Cloudflare route update).
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## What to Say About SSO
|
||||||
|
|
||||||
|
> *"I implemented single sign-on across all nine services using Authentik as the OIDC identity provider. Each service is registered as an OAuth2 client with a unique client ID and redirect URI. The OAuth2 authorization code flow means user credentials only ever go to Authentik; other services receive a signed JWT and never see the password. I hit a distributed systems issue in production where active-active load balancing across two hosts meant an authorization code issued by one Authentik instance could be sent to the other, which had no record of it. I diagnosed it by tracing the OAuth2 flow and fixed it by sharing a single Postgres database between both Authentik instances over a private Tailscale network."*
|
||||||
88
homelab-mastery/interview-prep/explain-the-project.md
Normal file
|
|
@ -0,0 +1,88 @@
|
||||||
|
# How to Explain This Project
|
||||||
|
|
||||||
|
## The 30-Second Version (LinkedIn DM, recruiter screen)
|
||||||
|
|
||||||
|
> *"I built a self-hosted homelab running a public website at kitestacks.com with nine services — including a Git platform, AI assistant, eBook library, monitoring stack, and SSO. It runs on my home PC with a Hetzner cloud VPS as a live failover, connected through Cloudflare Tunnel so no ports are exposed on my home network. Everything is containerized with Docker and documented in a private Forgejo repo."*
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## The 2-Minute Version (phone screen, LinkedIn intro)
|
||||||
|
|
||||||
|
> *"I built KiteStacks — a multi-host self-hosted platform running at kitestacks.com. The core is nine services containerized with Docker: a Forgejo Git instance, Grafana monitoring, Authentik for single sign-on, Open WebUI for AI access, Kavita for reading, Karakeep for bookmarks, OpenProject for tasks, Uptime Kuma for monitoring, and a custom portal I built myself.*
|
||||||
|
>
|
||||||
|
> *It runs on my home machine with a Hetzner VPS as a permanent cloud replica — active-active load balanced through Cloudflare Tunnel so the site stays up even when I'm traveling and my home network is down.*
|
||||||
|
>
|
||||||
|
> *The hardest part was a production SSO bug where OAuth2 authorization codes issued by one host could not be redeemed on the other because of the active-active routing. I traced the OAuth2 flow, identified it as a split-database problem, and solved it by migrating both hosts to a shared Postgres instance accessible only over a private Tailscale network.*
|
||||||
|
>
|
||||||
|
> *I'm currently studying for the CCNA to formalize the networking knowledge this project required."*
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## The Technical Deep-Dive (hiring manager, technical interview)
|
||||||
|
|
||||||
|
Be ready to go deep on any of these topics. Know the answers cold.
|
||||||
|
|
||||||
|
### On the Architecture
|
||||||
|
|
||||||
|
**"Walk me through how a request to www.kitestacks.com actually works."**
|
||||||
|
|
||||||
|
> *"The domain's DNS is managed by Cloudflare, pointing to their anycast edge. When a request arrives, Cloudflare routes it through the Cloudflare Tunnel — a persistent outbound connection maintained by the cloudflared container on each host. There are two connectors in rotation: one on my home machine and one on a Hetzner VPS. Cloudflare load-balances between them. The connector resolves the service name via Docker's internal DNS and forwards the request to the homepage container — an nginx instance serving the static portal files. For API calls, nginx proxies to a Python FastAPI service running in host-network mode so it can read actual host metrics via psutil."*
|
||||||
|
|
||||||
|
**"Why did you choose Cloudflare Tunnel over just opening ports on your router?"**
|
||||||
|
|
||||||
|
> *"Three reasons: security, reliability, and simplicity. Security — my home IP is never exposed publicly, which eliminates a whole class of attacks. Reliability — my home ISP uses dynamic IPs, so DNS would break every time it changes; Cloudflare handles that. Simplicity — no router configuration needed, works on any network, and Cloudflare handles TLS automatically."*
|
||||||
|
|
||||||
|
### On the SSO Implementation
|
||||||
|
|
||||||
|
**"How does the SSO work?"**
|
||||||
|
|
||||||
|
> *"I'm using Authentik as an OIDC identity provider. Each service is registered as an OAuth2 client with a unique client ID and redirect URI. The flow is the standard authorization code flow: the app redirects the user to Authentik, the user authenticates, Authentik issues an authorization code, the app's backend exchanges that code for a JWT, and reads the user's identity from the token's claims. Credentials never leave Authentik — other services only ever receive signed JWTs."*
|
||||||
|
|
||||||
|
**"Tell me about a problem you hit in production and how you fixed it."**
|
||||||
|
|
||||||
|
> *"The hardest issue was an `invalid_grant` error on SSO logins after I deployed cloud failover. OAuth2 authorization codes are single-use rows in a database — created when the user authenticates, consumed when the app exchanges them for tokens. With two Authentik instances and Cloudflare load-balancing between them, step one could hit monk's Authentik and step two could hit kscloud1's Authentik. The code existed in one database but not the other.*
|
||||||
|
>
|
||||||
|
> *I diagnosed it by tracing the OAuth2 authorization code flow and checking both Authentik instances' databases. The fix was to share a single Postgres and Redis between both instances, hosted on kscloud1 and accessible only over Tailscale — so both connectors read and write the same state regardless of which one handles which request. The Postgres is bound to kscloud1's Tailscale IP, not the public IP, so it's never reachable from the internet."*
|
||||||
|
|
||||||
|
### On Docker and Infrastructure
|
||||||
|
|
||||||
|
**"Why containerize everything with Docker?"**
|
||||||
|
|
||||||
|
> *"Dependency isolation, reproducibility, and operational consistency. Each service has its own runtime environment — Karakeep runs Node.js, Authentik runs Python, Forgejo runs Go, and they never conflict. The entire stack is defined in version-controlled docker-compose files, so I can recreate it exactly on a new machine. Docker's restart policies handle service recovery automatically. And the shared bridge network lets cloudflared reach any service by container name, which is cleaner than managing static IP assignments."*
|
||||||
|
|
||||||
|
**"How does the monitoring work?"**
|
||||||
|
|
||||||
|
> *"Prometheus scrapes metrics from two node-exporter instances every 15 seconds — one on the home machine via Docker DNS and one on the Hetzner VPS via its public IP. Grafana visualizes both with the Node Exporter Full dashboard, and you can switch between hosts with an instance picker. Uptime Kuma runs external HTTP checks against all nine public subdomains and would alert me if any went down."*
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Questions They Might Ask You — With Answers
|
||||||
|
|
||||||
|
**Q: What would you do differently if rebuilding from scratch?**
|
||||||
|
> *"I'd set up Kubernetes instead of Docker Compose — the active-active architecture I built manually is what Kubernetes handles natively. I'd also use Terraform for the cloud infrastructure instead of manual setup, and consider a secrets manager like Vault instead of .env files."*
|
||||||
|
|
||||||
|
**Q: How do you handle secrets and credentials?**
|
||||||
|
> *"Currently .env files on each host, not committed to version control. In a production environment I'd use HashiCorp Vault or a cloud provider's secrets manager. The next improvement I have planned is integrating Vault with Docker secrets injection."*
|
||||||
|
|
||||||
|
**Q: What's the difference between what you built and a production enterprise setup?**
|
||||||
|
> *"Scale, observability, and automation. I have manual deployments; production uses CI/CD pipelines. I have two hosts; production might have dozens in multiple regions. My monitoring covers infrastructure; production monitoring also covers application performance, business metrics, and has PagerDuty integration. And production would have formal incident response runbooks, not just a RUNBOOK.md file."*
|
||||||
|
|
||||||
|
**Q: How does this relate to AWS/cloud engineering?**
|
||||||
|
> *"Almost everything I built has a direct AWS equivalent. My Hetzner VPS is an EC2 instance. My Cloudflare Tunnel is comparable to an ALB with CloudFront. My Tailscale private networking maps to VPC peering. My Prometheus/Grafana stack is what CloudWatch does as a managed service. My Docker Compose files are the manual equivalent of ECS task definitions or Kubernetes manifests. Building this homelab gave me the conceptual foundation that makes cloud services make intuitive sense."*
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Red Flags to Avoid
|
||||||
|
|
||||||
|
- Don't say "I just followed tutorials" — you debugged production issues
|
||||||
|
- Don't say "I don't know how X works" — say "I understand the concept, I'd need to look up the specific syntax"
|
||||||
|
- Don't memorize answers — understand them. If you understand the OAuth2 flow, any question about it is answerable
|
||||||
|
- Don't undersell the `invalid_grant` fix — that's a real distributed systems problem that junior engineers typically can't diagnose
|
||||||
|
|
||||||
|
## Green Flags to Show
|
||||||
|
|
||||||
|
- You know WHY, not just WHAT
|
||||||
|
- You can trace a request from browser to service to database and back
|
||||||
|
- You fixed a real production bug by understanding the underlying protocol
|
||||||
|
- You documented everything
|
||||||
|
- You're actively studying (CCNA, cloud certs coming)
|
||||||
144
homelab-mastery/learning-path/README.md
Normal file
|
|
@ -0,0 +1,144 @@
|
||||||
|
# Learning Path — From Where You Are to Cloud Engineer
|
||||||
|
|
||||||
|
## Your Advantage
|
||||||
|
|
||||||
|
You don't have a blank canvas. You have a live production system you built. Most people study networking in a textbook. You configured Cloudflare DNS, set up Tailscale, debugged a Docker networking ufw issue, and traced a distributed systems bug in OAuth2. That's hands-on experience that study alone can't replicate.
|
||||||
|
|
||||||
|
The goal now: attach the vocabulary, depth, and theory to things you've already done.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Phase 1 — Complete A+ Core 2 (Now)
|
||||||
|
|
||||||
|
**Focus areas that directly map to your homelab:**
|
||||||
|
|
||||||
|
| A+ Core 2 Topic | Your Homelab Connection |
|
||||||
|
|----------------|------------------------|
|
||||||
|
| Linux command line | You've been using it — now learn the theory |
|
||||||
|
| Security fundamentals | Cloudflare Tunnel, ufw, Tailscale private networking |
|
||||||
|
| Scripting basics | The bash commands you've run |
|
||||||
|
| Troubleshooting methodology | The `invalid_grant` debug process |
|
||||||
|
| Remote access | SSH, SSH tunnels (you used `-L` forwarding) |
|
||||||
|
|
||||||
|
**Study approach:**
|
||||||
|
- Professor Messer's Core 2 videos (free YouTube)
|
||||||
|
- Jason Dion practice exams on Udemy — aim for 85%+ before scheduling
|
||||||
|
- For each topic, ask: "Where did I see this in my homelab?"
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Phase 2 — CCNA (3–6 Months)
|
||||||
|
|
||||||
|
The CCNA will make everything in your homelab make deeper sense. After CCNA, re-read the networking.md file in this repo and you'll see how much more you understand.
|
||||||
|
|
||||||
|
**Study approach:**
|
||||||
|
1. **Jeremy's IT Lab** (free YouTube + Anki flashcards) — start here
|
||||||
|
2. **Packet Tracer labs** (free from Cisco) — build networks, don't just watch
|
||||||
|
3. **Subnetting practice** — do it daily until it's instant. Use subnettingpractice.com
|
||||||
|
4. **Week 1-4:** OSI model, TCP/IP, subnetting, Ethernet, switching
|
||||||
|
5. **Week 5-8:** VLANs, Spanning Tree, inter-VLAN routing
|
||||||
|
6. **Week 9-16:** IPv4 routing (static routes, single-area OSPF), IPv6, ACLs, NAT
|
||||||
|
7. **Week 17-20:** WAN, wireless, security, automation basics, practice exams
|
||||||
|
|
||||||
|
**Labs to build in Packet Tracer that map to your homelab:**
|
||||||
|
- Build the monk + kscloud1 network topology
|
||||||
|
- Simulate the Cloudflare Tunnel concept with a router acting as the "edge"
|
||||||
|
- Set up ACLs that mimic your ufw rules
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Phase 3 — AWS SAA-C03 (After CCNA)
|
||||||
|
|
||||||
|
**Study approach:**
|
||||||
|
1. **Stephane Maarek's course** (Udemy) — the industry standard
|
||||||
|
2. **AWS Free Tier** — rebuild your homelab services as AWS equivalents:
|
||||||
|
- Forgejo → CodeCommit
|
||||||
|
- Custom portal → S3 static website + CloudFront
|
||||||
|
- Prometheus/Grafana → CloudWatch
|
||||||
|
- Authentik → Cognito
|
||||||
|
- Docker Compose → ECS Fargate
|
||||||
|
- Tailscale → VPC + PrivateLink
|
||||||
|
3. **Tutorials Dojo practice exams** (Jon Bonso's exams are widely regarded as the closest to the real test)
|
||||||
|
|
||||||
|
**For each AWS service you study, map it back to your homelab:**
|
||||||
|
|
||||||
|
| AWS | Your Homelab Equivalent |
|
||||||
|
|-----|------------------------|
|
||||||
|
| EC2 | Hetzner VPS (kscloud1) |
|
||||||
|
| S3 | Static file storage |
|
||||||
|
| VPC | Docker bridge network |
|
||||||
|
| ALB + CloudFront | Cloudflare Tunnel + edge |
|
||||||
|
| RDS | Authentik Postgres |
|
||||||
|
| ElastiCache | Authentik Redis |
|
||||||
|
| CloudWatch | Prometheus + Grafana |
|
||||||
|
| Route 53 | Cloudflare DNS |
|
||||||
|
| IAM | Authentik RBAC / groups |
|
||||||
|
| Secrets Manager | .env files (what you'd replace) |
|
||||||
|
| ECS / Fargate | Docker Compose (what you use) |
|
||||||
|
| VPC Peering | Tailscale overlay |
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Phase 4 — Hands-On Learning Between Certs
|
||||||
|
|
||||||
|
Don't just study. Build.
|
||||||
|
|
||||||
|
**Projects to add to your homelab that teach real cloud concepts:**
|
||||||
|
|
||||||
|
1. **Add Terraform** — define your kscloud1 server in Terraform so you can destroy and recreate it in minutes. This is Infrastructure as Code, a core cloud skill.
|
||||||
|
|
||||||
|
2. **Add a CI/CD pipeline** — set up Forgejo Actions (Forgejo's built-in CI/CD) so that pushing to a repo automatically tests and deploys changes. This is what DevOps engineers do all day.
|
||||||
|
|
||||||
|
3. **Add Vault**: replace .env files with HashiCorp Vault for secrets management. Production environments generally avoid keeping secrets in plain-text .env files.
|
||||||
|
|
||||||
|
4. **Add Kubernetes** — migrate one or two services from Docker Compose to a local k3s cluster. k3s is lightweight Kubernetes — you have enough RAM on monk.
|
||||||
|
|
||||||
|
5. **Add automated backups** — write a script that backs up your Docker volumes to an S3 bucket (or kscloud1) nightly.
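
As a starting point for the backup project in item 5, here is a minimal Python sketch that packs one directory into a dated `.tar.gz` archive. The `backup_directory` helper and the demo paths are hypothetical; the demo runs against a throwaway temp directory, and copying the archive to S3 or kscloud1 is left out. Note that archiving a live database volume this way may capture an inconsistent state, so a database dump is usually the safer input.

```python
import tarfile
import tempfile
from datetime import date
from pathlib import Path
from typing import Optional


def backup_directory(source: Path, dest_dir: Path, today: Optional[date] = None) -> Path:
    """Write <name>-<YYYY-MM-DD>.tar.gz into dest_dir and return its path."""
    stamp = (today or date.today()).isoformat()
    dest_dir.mkdir(parents=True, exist_ok=True)
    archive = dest_dir / f"{source.name}-{stamp}.tar.gz"
    with tarfile.open(archive, "w:gz") as tar:
        # arcname keeps paths inside the archive relative to the volume name.
        tar.add(source, arcname=source.name)
    return archive


# Demo against a throwaway directory instead of a real Docker volume.
workdir = Path(tempfile.mkdtemp())
volume = workdir / "demo-volume"
volume.mkdir()
(volume / "data.txt").write_text("hello")

archive = backup_directory(volume, workdir / "backups", today=date(2025, 1, 31))
print(archive.name)
```

Running a script like this nightly from cron or a systemd timer, then restoring from one of the archives at least once, turns it from a script into a backup.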
|
||||||
|
|
||||||
|
Each of these is a cert objective AND a portfolio item.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Daily Practice Habits
|
||||||
|
|
||||||
|
**15 minutes per day beats 3 hours on weekends.**
|
||||||
|
|
||||||
|
- **Subnetting:** Do 10 subnet calculations per day during CCNA study
|
||||||
|
- **Flashcards:** Anki for networking concepts, AWS services
|
||||||
|
- **Logs:** Check `docker logs` on a different service each day — understand what it's saying
|
||||||
|
- **Break something:** Pick one service per week, deliberately misconfigure it, diagnose and fix it. Document what you broke and how you fixed it.
|
||||||
|
- **Read error messages:** When something breaks, read the full error before Googling. Form a hypothesis first.
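
For the daily subnetting habit in the list above, Python's standard `ipaddress` module makes a handy answer key: work the problem by hand first, then check it. The `describe_subnet` helper is a hypothetical convenience wrapper, and the `- 2` host count only applies to prefixes shorter than /31.

```python
import ipaddress


def describe_subnet(cidr: str) -> dict:
    """Return the key facts for the subnet that contains the given address."""
    # strict=False lets us pass a host address such as 192.168.1.130/25.
    net = ipaddress.ip_network(cidr, strict=False)
    return {
        "network": str(net.network_address),
        "broadcast": str(net.broadcast_address),
        "mask": str(net.netmask),
        "first_host": str(net.network_address + 1),
        "last_host": str(net.broadcast_address - 1),
        "usable_hosts": net.num_addresses - 2,  # minus network and broadcast
    }


answer = describe_subnet("192.168.1.130/25")
for key, value in answer.items():
    print(f"{key}: {value}")
```

A /25 splits a /24 into two halves of 128 addresses each, so the host 192.168.1.130 lands in the upper half, which starts at 192.168.1.128.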
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Resources — Free First
|
||||||
|
|
||||||
|
| Topic | Resource | Cost |
|
||||||
|
|-------|----------|------|
|
||||||
|
| A+ Core 2 | Professor Messer (YouTube) | Free |
|
||||||
|
| CCNA | Jeremy's IT Lab (YouTube) | Free |
|
||||||
|
| CCNA labs | Cisco Packet Tracer | Free |
|
||||||
|
| AWS SAA | AWS Skill Builder free tier | Free |
|
||||||
|
| Python | Automate the Boring Stuff with Python (automatetheboringstuff.com) | Free |
|
||||||
|
| Docker | docs.docker.com "Get Started" | Free |
|
||||||
|
| Git | git-scm.com/book | Free |
|
||||||
|
| Linux | linuxcommand.org | Free |
|
||||||
|
| Networking deeper | tcpdump / Wireshark tutorials | Free |
|
||||||
|
|
||||||
|
**Worth paying for:**
|
||||||
|
- Stephane Maarek's AWS SAA on Udemy ($15 on sale — never pay full price)
|
||||||
|
- Tutorials Dojo AWS practice exams ($15)
|
||||||
|
- Jason Dion A+/CCNA practice exams on Udemy ($15)
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## How to Know You're Ready to Interview
|
||||||
|
|
||||||
|
You're ready when you can:
|
||||||
|
1. Explain the OAuth2 authorization code flow from memory without notes
|
||||||
|
2. Subnet any /24 or /25 network in under 30 seconds
|
||||||
|
3. Describe what happens at each layer of the OSI model when you ping google.com
|
||||||
|
4. Walk someone through what happens when a request hits www.kitestacks.com
|
||||||
|
5. Explain the difference between authentication and authorization
|
||||||
|
6. Describe what a VPC is and why it exists
|
||||||
|
7. Answer "what would you do differently?" with a real answer (not "nothing")
|
||||||