kenpat 39a1541270 docs: remove personal A+ cert info from homelab docs

Strip all CompTIA A+ references, exam dates, and deadlines from the
project-facing documentation. Certifications roadmap now starts at CCNA,
learning path phases renumbered, interview prep updated accordingly.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

2026-06-19 01:12:16 -05:00


How to Explain This Project

The 30-Second Version (LinkedIn DM, recruiter screen)

"I built a self-hosted homelab running a public website at kitestacks.com with eleven services — including a Git platform, AI assistant, eBook library, bookmark manager, wiki, help desk, monitoring stack, and SSO. It runs on my home PC with a Hetzner cloud VPS as a live failover, connected through Cloudflare Tunnel so no ports are exposed on my home network. Everything is containerized with Docker and documented in a private Forgejo repo."

The 2-Minute Version (phone screen, LinkedIn intro)

"I built KiteStacks, a multi-host self-hosted platform running at kitestacks.com. The core is eleven services containerized with Docker: a custom portal, Forgejo Git instance, Authentik for single sign-on, Open WebUI for AI access, Karakeep for bookmarks, Kavita for reading, Grafana with Prometheus for monitoring, Uptime Kuma for uptime checks, BookStack for documentation, osTicket for help desk, and Portainer for container management.

It runs on my home machine with a Hetzner VPS as a permanent cloud replica. The two hosts run active-active, load-balanced through Cloudflare Tunnel, so the site stays up even when I'm traveling and my home network is down.

The hardest part was a production SSO bug where logins failed with invalid_grant: with active-active routing, an OAuth2 authorization code issued by one host's Authentik could be redeemed at the other host, whose database had never seen it. I traced the OAuth2 flow, identified it as a split-database problem, and solved it by migrating both hosts to a shared Postgres instance accessible only over a private Tailscale network.

I'm currently studying for the CCNA to formalize the networking knowledge this project required."

The Technical Deep-Dive (hiring manager, technical interview)

Be ready to go deep on any of these topics. Know the answers cold.

On the Architecture

"Walk me through how a request to www.kitestacks.com actually works."

"The domain's DNS is managed by Cloudflare, pointing to their anycast edge. When a request arrives, Cloudflare routes it through the Cloudflare Tunnel — a persistent outbound connection maintained by the cloudflared container on each host. There are two connectors in rotation: one on my home machine and one on a Hetzner VPS. Cloudflare load-balances between them. The connector resolves the service name via Docker's internal DNS and forwards the request to the homepage container — an nginx instance serving the static portal files. For API calls, nginx proxies to a Python FastAPI service running in host-network mode so it can read actual host metrics via psutil."
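If an interviewer asks how cloudflared decides which container gets the request, the routing step can be sketched in Python. cloudflared evaluates its ingress rules from top to bottom, uses the first rule whose hostname matches, and requires a final catch-all rule. This sketch models only that matching logic (no wildcards or path rules), and the hostnames and service URLs such as `http://homepage:80` and `http://forgejo:3000` are hypothetical stand-ins, not this stack's actual config.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class IngressRule:
    hostname: Optional[str]  # None means "match anything" (the catch-all)
    service: str             # where cloudflared forwards the request


# Hypothetical rules; the service names would resolve via Docker's internal DNS.
RULES = [
    IngressRule("www.kitestacks.com", "http://homepage:80"),
    IngressRule("git.kitestacks.com", "http://forgejo:3000"),
    IngressRule(None, "http_status:404"),  # required final catch-all
]


def route(host: str, rules: list[IngressRule] = RULES) -> str:
    """Return the origin service for a hostname: the first matching rule wins."""
    for rule in rules:
        if rule.hostname is None or rule.hostname == host.lower():
            return rule.service
    raise ValueError("ingress rules must end with a catch-all")


print(route("www.kitestacks.com"))      # http://homepage:80
print(route("unknown.kitestacks.com"))  # http_status:404
```

Because order matters, a catch-all placed anywhere but last would swallow every request after it.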

"Why did you choose Cloudflare Tunnel over just opening ports on your router?"

"Three reasons: security, reliability, and simplicity. Security: my home IP is never published and no inbound ports are open, so there is nothing on my home network for internet scanners to hit directly. Reliability: my home ISP assigns a dynamic IP, so a DNS record pointing at it would go stale every time the address changes; the tunnel is an outbound connection, so the address doesn't matter. Simplicity: no port forwarding on the router, it works on any network, and Cloudflare handles TLS certificates automatically."

On the SSO Implementation

"How does the SSO work?"

"I'm using Authentik as an OIDC identity provider. Each service is registered as an OAuth2 client with a unique client ID and redirect URI. The flow is the standard authorization code flow: the app redirects the user to Authentik, the user authenticates, Authentik redirects back with a short-lived authorization code, the app's backend exchanges that code for tokens (including an ID token, which is a signed JWT), and the app reads the user's identity from the token's claims. Credentials never leave Authentik; other services only ever receive signed tokens."
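The three client-side steps of the authorization code flow can be sketched in Python using only the standard library. The `/application/o/authorize/` path follows Authentik's OAuth2 provider URL layout, but the hostname, client ID, and redirect URI are hypothetical. The claim reader skips signature verification on purpose to keep the sketch short; a real app must validate the JWT against the provider's published keys before trusting any claim.

```python
import base64
import json
import secrets
from urllib.parse import urlencode

# Hypothetical values for illustration only.
AUTHORIZE_URL = "https://auth.kitestacks.com/application/o/authorize/"
CLIENT_ID = "example-client-id"
REDIRECT_URI = "https://app.kitestacks.com/oauth/callback"


def build_authorize_url(state: str) -> str:
    """Step 1: the app redirects the browser here to start the code flow."""
    params = {
        "response_type": "code",       # ask for an authorization code
        "client_id": CLIENT_ID,
        "redirect_uri": REDIRECT_URI,  # must match the registered URI exactly
        "scope": "openid email profile",
        "state": state,                # CSRF protection, checked on callback
    }
    return f"{AUTHORIZE_URL}?{urlencode(params)}"


def token_request_body(code: str, client_secret: str) -> dict:
    """Step 2: the backend POSTs this form to the token endpoint to swap the code."""
    return {
        "grant_type": "authorization_code",
        "code": code,
        "redirect_uri": REDIRECT_URI,
        "client_id": CLIENT_ID,
        "client_secret": client_secret,
    }


def read_claims(jwt: str) -> dict:
    """Step 3: decode the ID token's payload segment (signature NOT verified)."""
    payload = jwt.split(".")[1]
    payload += "=" * (-len(payload) % 4)  # restore stripped base64url padding
    return json.loads(base64.urlsafe_b64decode(payload))


state = secrets.token_urlsafe(16)
print(build_authorize_url(state))
```

The client secret only ever appears in the backend's token request, which is why the exchange happens server-side rather than in the browser.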

"Tell me about a problem you hit in production and how you fixed it."

"The hardest issue was an invalid_grant error on SSO logins after I deployed cloud failover. OAuth2 authorization codes are single-use rows in Authentik's database: created when the user authenticates, consumed when the app exchanges them for tokens. With two Authentik instances and Cloudflare load-balancing between them, the login step could hit the Authentik on monk, my home machine, while the code exchange hit the one on kscloud1, the Hetzner VPS. The code existed in one database but not the other.

I diagnosed it by tracing the OAuth2 authorization code flow and checking both Authentik instances' databases. The fix was to point both instances at a single shared Postgres and Redis, hosted on kscloud1 and reachable only over Tailscale, so both Authentik instances read and write the same state regardless of which one handles which request. Postgres is bound to kscloud1's Tailscale IP, not its public IP, so it's never reachable from the internet."
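The bug and the fix can be reproduced with a toy Python model. This is not Authentik's code: `CodeStore`, `AuthInstance`, and `login` are hypothetical names, and only the code lifecycle (issue, then single-use redeem) is modeled. Giving each instance its own store reproduces invalid_grant whenever the exchange lands on the other host; sharing one store makes the routing irrelevant.

```python
class CodeStore:
    """Stands in for the database table holding pending authorization codes."""

    def __init__(self) -> None:
        self.codes: dict[str, str] = {}  # code -> user it was issued for


class AuthInstance:
    """A toy identity-provider instance backed by some CodeStore."""

    def __init__(self, name: str, store: CodeStore) -> None:
        self.name = name
        self.store = store
        self._counter = 0

    def issue_code(self, user: str) -> str:
        self._counter += 1
        code = f"{self.name}-code-{self._counter}"  # unique even in a shared store
        self.store.codes[code] = user
        return code

    def exchange(self, code: str) -> str:
        # Codes are single-use: pop() deletes the row as it is consumed.
        user = self.store.codes.pop(code, None)
        if user is None:
            return "invalid_grant"
        return f"token for {user}"


def login(issuer: AuthInstance, redeemer: AuthInstance, user: str) -> str:
    """Authorize on one instance, exchange on whichever one the balancer picks."""
    code = issuer.issue_code(user)
    return redeemer.exchange(code)


# Before the fix: each host has its own database.
home = AuthInstance("monk", CodeStore())
cloud = AuthInstance("kscloud1", CodeStore())
print(login(home, home, "ken"))   # token for ken
print(login(home, cloud, "ken"))  # invalid_grant

# After the fix: both hosts share one store.
shared = CodeStore()
home = AuthInstance("monk", shared)
cloud = AuthInstance("kscloud1", shared)
print(login(home, cloud, "ken"))  # token for ken
```

The failure is intermittent in real life because it only happens when the two requests land on different hosts, which is what makes this class of bug hard to spot.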

On Docker and Infrastructure

"Why containerize everything with Docker?"

"Dependency isolation, reproducibility, and operational consistency. Each service has its own runtime environment (Karakeep runs Node.js, Authentik runs Python, Forgejo runs Go) and they never conflict. The entire stack is defined in version-controlled docker-compose files, so I can recreate it exactly on a new machine. Docker's restart policies handle service recovery automatically. And the shared user-defined bridge network lets cloudflared reach any service by container name through Docker's embedded DNS, which is cleaner than managing static IP assignments."

"How does the monitoring work?"

"Prometheus scrapes metrics from two node-exporter instances every 15 seconds — one on the home machine via Docker DNS and one on the Hetzner VPS via its public IP. Grafana visualizes both with the Node Exporter Full dashboard, and you can switch between hosts with an instance picker. Uptime Kuma runs external HTTP checks against all eleven public subdomains and alerts me if any go down."
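What Prometheus actually collects on each scrape is plain text. The Python sketch below parses label-free samples in Prometheus's text exposition format and computes memory usage from two standard Linux node_exporter gauges; a common PromQL expression for the same figure is `100 * (1 - node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes)`. The sample values are made up, and labelled series are deliberately skipped to keep the parser short.

```python
def parse_exposition(text: str) -> dict[str, float]:
    """Parse label-free samples from Prometheus's text exposition format."""
    samples: dict[str, float] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank line or a '# HELP' / '# TYPE' comment
        if "{" in line:
            continue  # labelled series are out of scope for this sketch
        name, value = line.split()[:2]  # an optional timestamp may follow
        samples[name] = float(value)
    return samples


def memory_used_percent(samples: dict[str, float]) -> float:
    """Share of RAM in use, from two standard Linux node_exporter gauges."""
    total = samples["node_memory_MemTotal_bytes"]
    available = samples["node_memory_MemAvailable_bytes"]
    return 100 * (1 - available / total)


# Made-up scrape output, shaped like what node_exporter serves on :9100/metrics.
SCRAPE = """\
# HELP node_memory_MemTotal_bytes Memory information field MemTotal_bytes.
# TYPE node_memory_MemTotal_bytes gauge
node_memory_MemTotal_bytes 1.6e+10
# TYPE node_memory_MemAvailable_bytes gauge
node_memory_MemAvailable_bytes 4e+09
"""

print(memory_used_percent(parse_exposition(SCRAPE)))  # 75.0
```

Grafana never touches the exporters directly; it queries Prometheus, which is why one dashboard can switch between both hosts with an instance picker.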

Questions They Might Ask You — With Answers

Q: What would you do differently if rebuilding from scratch?

"I'd set up Kubernetes instead of Docker Compose — the active-active architecture I built manually is what Kubernetes handles natively. I'd also use Terraform for the cloud infrastructure instead of manual setup, and consider a secrets manager like Vault instead of .env files."

Q: How do you handle secrets and credentials?

"Currently .env files on each host, not committed to version control. In a production environment I'd use HashiCorp Vault or a cloud provider's secrets manager. The next improvement I have planned is integrating Vault with Docker secrets injection."
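A first step toward Docker secrets injection can be sketched in Python. Docker mounts each secret as a file under `/run/secrets/<name>`, which keeps the value out of the container's environment variables (and therefore out of `docker inspect` output). The helper below, `read_secret`, is a hypothetical name, and the fallback to an uppercase environment variable is an assumed convention that lets the same code keep working with the current .env files during migration.

```python
import os
from pathlib import Path
from typing import Optional


def read_secret(name: str, secrets_dir: str = "/run/secrets") -> Optional[str]:
    """Prefer a Docker secret file; fall back to an environment variable.

    Hypothetical convention: secret "db_password" falls back to $DB_PASSWORD.
    Returns None if neither source provides a value.
    """
    path = Path(secrets_dir) / name
    if path.is_file():
        return path.read_text().strip()  # strip the trailing newline editors add
    return os.environ.get(name.upper())
```

For example, `read_secret("db_password")` reads `/run/secrets/db_password` inside a container that has the secret mounted, and `$DB_PASSWORD` everywhere else.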

Q: What's the difference between what you built and a production enterprise setup?

"Scale, observability, and automation. I deploy manually; production uses CI/CD pipelines. I have two hosts; production might have dozens across multiple regions. My monitoring covers infrastructure; production monitoring also covers application performance and business metrics, and pages on-call engineers through a tool like PagerDuty. And production would have formal incident response runbooks, not just a RUNBOOK.md file."

Q: How does this relate to AWS/cloud engineering?

"Almost everything I built has a direct AWS equivalent. My Hetzner VPS is an EC2 instance. My Cloudflare Tunnel is comparable to an ALB with CloudFront. My Tailscale private networking maps to VPC peering. My Prometheus/Grafana stack is what CloudWatch does as a managed service. My Docker Compose files are the manual equivalent of ECS task definitions or Kubernetes manifests. Building this homelab gave me the conceptual foundation that makes cloud services make intuitive sense."

Red Flags to Avoid

Don't say "I just followed tutorials" — you debugged production issues
Don't say "I don't know how X works" — say "I understand the concept, I'd need to look up the specific syntax"
Don't memorize answers — understand them. If you understand the OAuth2 flow, any question about it is answerable
Don't undersell the invalid_grant fix — that's a real distributed systems problem that junior engineers typically can't diagnose

Green Flags to Show

You know WHY, not just WHAT
You can trace a request from browser to service to database and back
You fixed a real production bug by understanding the underlying protocol
You documented everything
You're actively studying (CCNA, cloud certs coming)
