My Homelab

The homelab is ever changing (mostly growing) and currently consists of a single Proxmox box, a Raspberry Pi, a Unifi stack, and an ever-growing list of things to self-host. The whole thing runs on a strict GitOps loop. Terraform provisions the VMs, Ansible handles the OS setup, and ArgoCD manages k8s.

Description

All workloads run as VMs on the Proxmox host. Two of them handle network foundations: one for AdGuard Home (DNS plus ad blocking), one for Tailscale. The rest form a k3s cluster, with the Pi available for low-risk workloads (backup adguard dns and internal cluster status).

The cluster runs Traefik for ingress, MetalLB for LoadBalancer services, Longhorn for distributed block storage, Authentik for SSO, cert-manager for TLS, CloudNativePG for managed Postgres, self-hosted Renovate for automated dependency updates, and the common observability stack of (Loki, Prometheus, and Grafana although there is quite a lot of improvements I should perform for observability). I also host Home Assistant, Immich, Seafile, Dawarich, FreshRSS, and some other services. A self-hosted AI stack runs alongside them: LiteLLM as an internal, OpenAI-compatible gateway (ClusterIP-only, with provider keys and budgets kept in one place and a scoped virtual key per consumer… meaning my wife and myself), LibreChat as the chat UI in front of it, and SearXNG as a private metasearch engine that doubles as the stack’s web-search backend. Coder provides remote dev environments in the cluster.

External traffic goes through a Cloudflare Tunnel to avoid port forwarding and public IPs. The tunnel terminates at Traefik inside the cluster, which handles TLS and routes to services from there. Admin interfaces (Grafana, ArgoCD, AdGuard) don’t have public DNS records at all. AdGuard resolves their hostnames only on the local network, pointing at Traefik’s LAN IP. Remote access to the LAN itself, for managing VMs or hitting internal-only services, goes through Tailscale. A dedicated VM advertises the entire homelab subnet into the tailnet, so any authorised device gets full LAN access with no extra tunnelling.

The homelab repo isn’t public, but secrets still get “proper” handling: They are all encrypted with SOPS + age before they are committed.

A small (lifetime) VPS runs an “external status check” using uptime-kuma. It does not have any special access to the homelab and just checks publicly accessible pages: status page. (I also made an ansible module to automate this: uptime-kuma-ansible (or rather asked AI to extract the code into that module ;-))

For an internal and more detailed status checks, gatus runs on the Pi (pinned via taint). If the main VM cluster has a problem the Pi’s independent hardware and network path hopefully mean the status page is still up and actually shows what’s broken (in theory…). Both gatus and Prometheus’s Alertmanager push alerts to a Telegram bot so I receive them on my phone.

k8up (a Kubernetes backup operator built on restic) handles scheduled snapshots to Hetzner S3. Databases get exec-based backups, running pg_dump or tar inside the live container; the CloudNativePG-managed clusters instead archive their WAL and take scheduled base backups straight to Hetzner S3.

Kubescape scans the cluster continuously and flags anything that drifts from sensible defaults.