Complete documentation suite for KiteStacks covering all 11 services across 2-host active-active architecture. Includes beginner track (with AI, 8 files) and advanced track (without AI, 7 files) with time estimates, real troubleshooting cases, and command-by-command explanations. Updates certifications roadmap to reflect July 7 2026 A+ Core 2 exam goal. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
229 lines
6.6 KiB
Markdown
# Step 8 — Monitoring

**Track:** With AI (Beginner)
**Time for this step:** 2–3 hours

Monitoring means knowing when something is wrong before your users tell you.
In this step you will set up three layers of monitoring:

1. **Grafana** — beautiful dashboards showing CPU, RAM, disk, and network over time
2. **Uptime Kuma** — checks every 60 seconds that each service responds correctly
3. **Conky** — a desktop widget on your home computer showing live kscloud1 status

---
## Monitoring Layer 1 — Grafana + Prometheus

You already deployed Grafana and Prometheus in Step 5. Now configure them properly.

### Edit the Prometheus Config

Prometheus needs to know where to collect metrics from. Tell it about both machines:

```bash
nano ~/kitestacks-live/docker/prometheus/prometheus.yml
```
Add this content:
```yaml
global:
  scrape_interval: 15s

scrape_configs:
  - job_name: 'monk-node'
    static_configs:
      - targets: ['node-exporter:9100']
        labels:
          instance: 'monk'

  - job_name: 'kscloud1-node'
    static_configs:
      - targets: ['YOUR_VPS_IP:9100']
        labels:
          instance: 'kscloud1'
```

Replace `YOUR_VPS_IP` with your VPS's public IP address.

**On kscloud1**, publish node-exporter's port 9100 so that Prometheus on monk can reach it. The `0.0.0.0` binding below listens on every network interface, including the public one, so restrict access to that port in your firewall.
```yaml
# In node-exporter's docker-compose.yml on kscloud1
ports:
  - "0.0.0.0:9100:9100"
```

Restart Prometheus:
```bash
cd ~/kitestacks-live/docker/prometheus
docker compose restart prometheus
```
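
To confirm that both scrape jobs are working after the restart, you can ask Prometheus which targets it currently sees. The following is a minimal sketch, not part of the KiteStacks repo: it reads Prometheus's `/api/v1/targets` endpoint and prints each target's `instance` label next to its health. The script name and the URL you pass in are assumptions; use whatever address Prometheus is reachable on from the machine where you run it.

```python
import json
import sys
from urllib.request import urlopen


def summarize_targets(payload):
    """Map each scrape target's `instance` label to its reported health."""
    return {
        target["labels"]["instance"]: target["health"]
        for target in payload["data"]["activeTargets"]
    }


def fetch_targets(base_url):
    """Fetch and decode the JSON served at Prometheus's /api/v1/targets."""
    with urlopen(f"{base_url}/api/v1/targets", timeout=5) as response:
        return json.load(response)


# Run as: python3 check_targets.py http://localhost:9090
# (script name and URL are placeholders, not values from this guide)
if __name__ == "__main__" and len(sys.argv) == 2:
    for instance, health in summarize_targets(fetch_targets(sys.argv[1])).items():
        print(f"{instance}: {health}")
```

With the config above, a healthy setup should report `up` for both `monk` and `kscloud1`. A target that stays `down` usually means the address or port in `prometheus.yml` is not reachable from the Prometheus container.
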

### Configure Grafana Provisioning

Tell Grafana to load Prometheus as a data source automatically and to pick up
dashboard files from a provisioning folder.

Create `~/kitestacks-live/docker/grafana/provisioning/datasources/prometheus.yml`:
```yaml
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    uid: "000000001"  # quoted so YAML keeps the leading zeros as a string
    url: http://prometheus:9090
    isDefault: true
```

Create `~/kitestacks-live/docker/grafana/provisioning/dashboards/dashboards.yml`:
```yaml
apiVersion: 1
providers:
  - name: default
    folder: KiteStacks
    type: file
    options:
      path: /etc/grafana/provisioning/dashboards
```

Grafana reads provisioning files at startup, so restart the Grafana container before continuing.

The Node Exporter Full dashboard (ID 1860) can be imported from Grafana's dashboard library:
1. Log in to grafana.yourdomain.com
2. Left menu → Dashboards → Import
3. Enter ID: `1860`
4. Select your Prometheus datasource
5. Import

You should now see CPU, RAM, disk, and network graphs for both monk and kscloud1.
Switch between them using the "instance" dropdown at the top of the dashboard.

---
## Monitoring Layer 2 — Uptime Kuma

You set up Uptime Kuma in Step 5. Now add monitors for all your services.

Log in to `status.yourdomain.com` and add an HTTP monitor for each service:

| Monitor Name | URL | Check Interval |
|--------------|-----|----------------|
| Main Website | https://www.yourdomain.com | 60s |
| Authentik | https://auth.yourdomain.com | 60s |
| Forgejo | https://gitforge.yourdomain.com | 60s |
| KiteAI | https://ai.yourdomain.com | 60s |
| Karakeep | https://links.yourdomain.com | 60s |
| Kavita | https://kavita.yourdomain.com | 60s |
| Grafana | https://grafana.yourdomain.com | 60s |
| BookStack | https://wiki.yourdomain.com | 60s |
| OSTicket | https://tasks.yourdomain.com | 60s |
| Portainer | https://portainer.yourdomain.com | 60s |
| kscloud1 | (ping to kscloud1 IP) | 60s |
| Monk | (ping to monk's Tailscale IP) | 60s |

Then create a Status Page:
1. Status Pages → New Status Page
2. Title: "KiteStacks Status"
3. Slug: `homelab`
4. Add all monitors to it

**Push Uptime Kuma to kscloud1:**

The Conky widget on your desktop reads kscloud1's Uptime Kuma, not monk's. Push monk's
database to kscloud1 after setting up monitors:
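
One way to take a consistent copy of a SQLite database while the application may still be writing to it is Python's `sqlite3.Connection.backup()`. The sketch below shows only that copy step. Both file paths are hypothetical arguments: this guide does not say where the database file sits inside the Uptime Kuma volume, and getting the copied file onto kscloud1 is left out.

```python
import sqlite3
import sys


def backup_sqlite(src_path, dest_path):
    """Copy a SQLite database using SQLite's online backup API."""
    src = sqlite3.connect(src_path)
    dest = sqlite3.connect(dest_path)
    try:
        # Copies every page of the source's "main" database into dest.
        src.backup(dest)
    finally:
        dest.close()
        src.close()


# Run as: python3 backup_kuma.py /path/to/source.db /path/to/copy.db
# (script name and both paths are placeholders, not values from this guide)
if __name__ == "__main__" and len(sys.argv) == 3:
    backup_sqlite(sys.argv[1], sys.argv[2])
    print(f"Copied {sys.argv[1]} to {sys.argv[2]}")
```
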

**Ask your AI:** "How do I copy a Docker named volume's SQLite database from one machine
to another using Python's sqlite3.backup() method?"

---
## Monitoring Layer 3 — Conky Desktop Widget

Conky is a program that draws information on your desktop background in real time.
Your KiteStacks widget shows whether each service on kscloud1 is up (green dot) or
down (red dot), refreshed every 15 seconds.

### Install Conky

```bash
sudo apt install conky-all
```

### Install the Widget Script

The widget script reads Uptime Kuma's API and formats the output for Conky.
In the homelab repo it lives at `apps/conky/kitestacks-uptime-widget.sh`; you will install it to `~/.local/bin/`.

Copy it to your machine:
```bash
mkdir -p ~/.local/bin
cp ~/kitestacks-homelab/apps/conky/kitestacks-uptime-widget.sh ~/.local/bin/
chmod +x ~/.local/bin/kitestacks-uptime-widget.sh
```

Edit the script to use your kscloud1's Tailscale IP:
```bash
nano ~/.local/bin/kitestacks-uptime-widget.sh
```

Change the `KUMA_URL` line:
```bash
KUMA_URL="http://100.123.x.x:3001" # kscloud1's Tailscale IP
```

### Enable the Conky Config

```bash
# Create the config directory first; cp fails if it does not exist yet
mkdir -p ~/.config/conky
cp ~/kitestacks-homelab/apps/conky/kitestacks-uptime.conf ~/.config/conky/kitestacks-uptime.conf
conky -c ~/.config/conky/kitestacks-uptime.conf -d
```

The widget should appear in the top-right corner of your desktop, showing a dot for
each service — green for up, red for down.
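
To make the green and red dots less mysterious, here is a rough illustration of the formatting step. It is not the repo's shell script, and it skips the part that queries Uptime Kuma: the `conky_lines` function and its input dictionary are hypothetical. It relies on Conky's `${color ...}` markup, which Conky interprets when a script's output is read through a parsing variable such as `execpi`.

```python
def conky_lines(statuses):
    """Turn {service name: is_up} into Conky markup, one line per service."""
    lines = []
    for name, is_up in statuses.items():
        color = "green" if is_up else "red"
        # ${color X} switches the drawing color; a bare ${color} resets it.
        lines.append(f"${{color {color}}}●${{color}} {name}")
    return "\n".join(lines)


# Hypothetical sample data; the real widget gets this from Uptime Kuma.
print(conky_lines({"Grafana": True, "Forgejo": False}))
# Output:
# ${color green}●${color} Grafana
# ${color red}●${color} Forgejo
```
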

**Ask your AI:** "How do I make Conky start automatically when I log in to my Ubuntu desktop?"

---
## Setting Up Alerts

Uptime Kuma can send you a notification on your phone when a service goes down.

**Option 1: ntfy (recommended — self-hosted)**
You have ntfy running as a container. Set up an ntfy notification in Uptime Kuma:
- Notification Type: ntfy
- URL: your ntfy server URL
- Topic: choose a topic name (e.g., `homelab-alerts`)

Install the ntfy app on your phone and subscribe to your topic.
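
Before relying on Uptime Kuma for alerts, it can help to check that your phone receives anything at all from the topic. ntfy accepts a plain HTTP POST to `<server>/<topic>` with the message as the request body. The sketch below builds and sends such a request. The server URL is a placeholder, the default title is made up for illustration, and the sketch assumes your ntfy server allows publishing without authentication.

```python
import sys
from urllib.request import Request, urlopen


def build_ntfy_request(server_url, topic, message, title="KiteStacks test"):
    """Build the HTTP POST that publishes `message` to an ntfy topic."""
    return Request(
        f"{server_url.rstrip('/')}/{topic}",
        data=message.encode("utf-8"),
        headers={"Title": title},  # shown as the notification title
        method="POST",
    )


# Run as: python3 ntfy_test.py https://ntfy.example.com homelab-alerts
# (script name and server URL are placeholders, not values from this guide)
if __name__ == "__main__" and len(sys.argv) == 3:
    request = build_ntfy_request(sys.argv[1], sys.argv[2], "Test alert from monk")
    with urlopen(request, timeout=5) as response:
        print("ntfy answered with HTTP", response.status)
```
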

**Option 2: Email**
Configure email notifications in Uptime Kuma using your email address.

**Ask your AI:** "How do I configure Uptime Kuma to send notifications via ntfy?"

---
## Checkpoint

- [ ] Prometheus is collecting metrics from both monk and kscloud1
- [ ] Grafana shows Node Exporter Full dashboard with both hosts
- [ ] Uptime Kuma has monitors for all 11 services
- [ ] Uptime Kuma status page is live at status.yourdomain.com/status/homelab
- [ ] Uptime Kuma database has been pushed to kscloud1
- [ ] Conky widget is showing on your desktop with live service status
- [ ] You receive a notification when you manually pause a service in Uptime Kuma

---
## Congratulations — Your Homelab Is Complete

You have built a production homelab with:
- 11 self-hosted services running in Docker
- Single sign-on via Authentik
- Cloud failover on a Hetzner VPS
- Private networking over Tailscale
- Real-time monitoring via Grafana and Uptime Kuma
- A live desktop status widget

Everything you built here maps directly to enterprise cloud engineering skills.
Every concept has a certification that covers it in depth.

**Your next step:** [certifications/roadmap.md](../../certifications/roadmap.md)