01 · System — base OS, Docker, K3s, Longhorn, DNS, SSL
Note
Status: ✅ active · Last Updated: 2026-06-23
Upstream: Ansible sub-hub · Factory provisioning hub
Downstream: 02 · Setup · 03 · CI/CD
Related: Storage & recovery · Secrets & Vault · Naming conventions · ADR-0001 safe prod-like environment
What it does
01 · System takes three bare Raspberry Pis (pi1, pi2, pi3) and turns them into a configured K3s cluster. The wrapper playbooks/01_system.yml does nothing but import_playbook the stage orchestrator playbooks/system/system.yml, which in turn imports ten sub-playbooks in strict order. Each sub-play layers one capability: hostname/DNS hygiene, Pi-hole HA DNS, the step-ca PKI, the external backup disk, Docker, the iSCSI/dm-crypt prerequisites for Longhorn, K3s itself, CoreDNS forwarding, the cert-manager issuer, and finally the cluster config (Longhorn + Traefik).
All host-facing plays target raspberries:&local — the intersection of the raspberries group and the local group, which resolves to pi1/pi2/pi3 (see Inventory & variables). The K3s server/agent split is decided at runtime: the first host (alphabetically) becomes the server, the rest become agents.
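The host selection and role split described above can be sketched in a few lines of Python. This is an illustrative model, not the playbook's code: the function name `split_k3s_roles`, the group contents, and the extra `laptop` host are hypothetical, and plain `sorted()` stands in for whatever sort the playbook actually applies.

```python
def split_k3s_roles(raspberries, local):
    """Model `raspberries:&local`, then split the result into server and agents."""
    # `A:&B` in an Ansible host pattern keeps only hosts present in both groups.
    targets = sorted(set(raspberries) & set(local))
    if not targets:
        raise ValueError("no host belongs to both groups")
    # First host in sorted order becomes the K3s server, the rest are agents.
    return {"server": targets[0], "agents": targets[1:]}


# Hypothetical group contents, for illustration only.
roles = split_k3s_roles(
    raspberries=["pi3", "pi1", "pi2"],
    local=["pi2", "pi1", "pi3", "laptop"],
)
print(roles)  # {'server': 'pi1', 'agents': ['pi2', 'pi3']}
```

One consequence worth keeping in mind: under this rule, renaming a host or adding one that sorts before `pi1` would change which node is picked as the server.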
Ordered steps
| # | Sub-playbook | Purpose | Key vars / versions |
|---|---|---|---|
| 1 | system/rpi.yml | Set each node's hostname to its inventory_hostname. On Pi-hole nodes (pi1/pi3) add dnsmasq to the dip group, then stop & disable dnsmasq to free port 53 for pihole-FTL. | tags: never (opt-in only) |
| 2 | dns/dns.yml → dns/pihole.yml | Install & configure Pi-hole HA DNS via the pihole role. Adds custom records mapping .arcodange.lab and .arcodange.duckdns.org to pi1. | pihole_custom_dns → pi1.preferred_ip |
| 3 | ssl/ssl.yml → ssl/step-ca.yml | Install step-ca (the step_ca role) on all three Pis; fetch the root CA from pi1; build a Gitea runner image that trusts the CA (runner-images:ubuntu-latest-ca) and push it to the registry. | step_ca_primary: pi1, root at /home/step/.step/certs/root_ca.crt |
| 4 | system/prepare_disks.yml | Auto-detect the largest external (non-mmcblk0) USB partition, format it ext4 with label arcodange_500, mount at /mnt/arcodange, and persist in fstab. Skips format if the label already exists. Pauses for confirmation before any format. | mount_point: /mnt/arcodange, disk_label: arcodange_500 |
| 5 | system/system_docker.yml | Install Docker via geerlingguy.docker; write daemon.json with json-file logging (max-size 10m, max-file 5) and data-root: /mnt/arcodange/docker (only when the external disk is mounted). | tags: never; storage-driver: overlay2 |
| 6 | system/iscsi_longhorn.yml | Install open-iscsi (+ enable iscsid) and cryptsetup, and load the dm_crypt kernel module (persisted in /etc/modules); these are Longhorn's encrypted-volume prerequisites. Creates /mnt/arcodange/longhorn. | module dm_crypt |
| 7 | system/system_k3s.yml | Build the K3s inventory dynamically (first sorted host → server, rest → agent), install the k3s-ansible content, run k3s.orchestration.site, then fetch the kubeconfig to ~/.kube/config (rewriting 127.0.0.1 → server IP). | k3s v1.34.3+k3s1; server args --docker --disable traefik |
| 8 | system/k3s_dns.yml | Create the coredns-custom ConfigMap so cluster DNS forwards arcodange.lab:53 to the Pi-hole IPs; also patch the main CoreDNS Corefile to forward to the same HA Pi-holes. | pihole_ips (extracted from hostvars) |
| 9 | system/k3s_ssl.yml | Deploy cert-manager + step-issuer as k3s static HelmCharts; create the StepClusterIssuer step-ca wired to the JWK provisioner and root CA. | cert-manager v1.19.2, step-issuer 1.9.11, caUrl: https://ssl-ca.arcodange.lab:8443, ARM64 kube-rbac-proxy override |
| 10 | system/k3s_config.yml | Deploy Longhorn + Traefik as HelmCharts; issue the wildcard cert, set the default TLSStore, wire Gitea, the IP-allow-list middleware, and the CrowdSec bouncer plugin; then delete the old Traefik to force a redeploy. | Longhorn v1.9.1, Traefik v37.4.0 (see detail below) |
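As a concrete reading of step 5, the sketch below assembles a `daemon.json` payload from the values listed in the table. It is an assumption-laden illustration rather than the role's template: the function name `build_daemon_config` and the boolean flag are hypothetical, while the keys (`log-driver`, `log-opts`, `storage-driver`, `data-root`) are standard Docker daemon options.

```python
import json


def build_daemon_config(external_disk_mounted, mount_point="/mnt/arcodange"):
    """Return a daemon.json-style dict using the values from step 5."""
    config = {
        "log-driver": "json-file",
        # Docker expects log-opts values as strings.
        "log-opts": {"max-size": "10m", "max-file": "5"},
        "storage-driver": "overlay2",
    }
    # data-root is only set when the external disk is actually mounted;
    # otherwise Docker keeps its default location on the system disk.
    if external_disk_mounted:
        config["data-root"] = f"{mount_point}/docker"
    return config


print(json.dumps(build_daemon_config(external_disk_mounted=True), indent=2))
```

Making `data-root` conditional matters because step 5 runs after step 4: if the disk were missing, pointing Docker at an unmounted `/mnt/arcodange` would silently write image data to the SD card.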
How the stages fit together
%%{init: {'theme':'base', 'themeVariables': {'primaryColor':'#1f2937','primaryTextColor':'#f9fafb','lineColor':'#6b7280','fontSize':'13px'}}}%%
flowchart TD
classDef host fill:#1e3a5f,stroke:#3b82f6,color:#f9fafb;
classDef cluster fill:#1e4032,stroke:#22c55e,color:#f0fdf4;
classDef danger fill:#5f1e1e,stroke:#ef4444,color:#fef2f2;
rpi["1 · rpi.yml<br>hostname + dnsmasq off"]:::host
dns["2 · pihole<br>HA DNS"]:::host
ssl["3 · step-ca<br>root CA + CA-trusting runner image"]:::host
disk["4 · prepare_disks.yml<br>ext4 arcodange_500 -> /mnt/arcodange"]:::danger
docker["5 · system_docker.yml<br>data-root on external disk"]:::host
iscsi["6 · iscsi_longhorn.yml<br>open-iscsi + dm_crypt"]:::host
k3s["7 · system_k3s.yml<br>k3s v1.34.3 (--disable traefik)"]:::cluster
cdns["8 · k3s_dns.yml<br>coredns-custom -> Pi-hole"]:::cluster
cmgr["9 · k3s_ssl.yml<br>cert-manager + step-issuer"]:::cluster
cfg["10 · k3s_config.yml<br>Longhorn + Traefik + redeploy"]:::cluster
rpi --> dns --> ssl --> disk --> docker --> iscsi --> k3s --> cdns --> cmgr --> cfg
- rpi.yml fixes the hostname and, on Pi-hole nodes, stops dnsmasq so pihole-FTL can own port 53.
- Pi-hole comes up as the HA DNS authority for arcodange.lab.
- step-ca is installed; its root CA is fetched and baked into a Gitea runner image so CI can trust internal TLS.
- prepare_disks.yml formats and mounts the external USB disk at /mnt/arcodange (with a confirmation pause).
- Docker installs with its data-root pointed at that disk and capped logging.
- iSCSI + dm_crypt prerequisites land so Longhorn can attach (and encrypt) volumes.
- K3s installs with the first host as server, Docker as the container runtime, and Traefik disabled.
- CoreDNS is reconfigured to forward arcodange.lab to the Pi-holes.
- cert-manager + step-issuer wire the in-cluster issuer to step-ca.
- k3s_config.yml deploys Longhorn and a fully-customized Traefik, then deletes the old Traefik so the helm-controller redeploys with the new config.
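To make the CoreDNS step more tangible, here is a minimal sketch of the kind of server block that forwards one zone to a set of upstream resolvers. The helper `render_forward_block` is hypothetical and the IP addresses are documentation placeholders, not the real Pi-hole addresses; the ConfigMap key under which the playbook stores such a block is not shown on this page and is not assumed here.

```python
def render_forward_block(zone, upstream_ips, port=53):
    """Render a CoreDNS server block that forwards `zone` to the given upstreams."""
    if not upstream_ips:
        raise ValueError("at least one upstream resolver is required")
    upstreams = " ".join(upstream_ips)
    # `forward . <ip> <ip>` sends every query for this zone to the upstreams.
    return f"{zone}:{port} {{\n    forward . {upstreams}\n}}\n"


# Placeholder addresses standing in for the two Pi-hole nodes.
block = render_forward_block("arcodange.lab", ["192.0.2.11", "192.0.2.13"])
print(block)
```

Listing both Pi-holes in one `forward` line is what gives the cluster the same HA behaviour as the LAN: CoreDNS can fall back to the second resolver if the first stops answering.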
k3s_config.yml — Longhorn & Traefik detail
| Resource | Value | Notes |
|---|---|---|
| Longhorn HelmChart | v1.9.1 | defaultSettings.defaultDataPath: /mnt/arcodange/longhorn; volumes live on the external disk. |
| Traefik HelmChart | v37.4.0 | Deployed as a k3s static manifest (traefik-v3.yaml) with an inline traefik-configmap. |
| Wildcard cert | wildcard-arcodange-lab | Certificate for arcodange.lab + *.arcodange.lab, issued by the step-issuer StepClusterIssuer. |
| TLSStore default | defaultCertificate: wildcard-arcodange-lab | Makes the wildcard cert the cluster-wide default. |
| Gitea exposure | gitea-external ExternalName Service → pi2 port 3000 | Gitea runs outside K3s as Docker Compose on pi2; Traefik routes gitea.arcodange.lab to it. |
| localIp middleware | ipAllowList | Restricts dashboard/Gitea routers to LAN + pod CIDR + the detected public IP. |
| CrowdSec bouncer | plugin v1.3.3 | Traefik experimental plugin crowdsec-bouncer-traefik-plugin (config completed in 04 · Tools). |
| DuckDNS token | traefik-duckdns-token Secret → DUCKDNS_TOKEN | Consumed by the letsencrypt ACME DNS-challenge resolver via envFrom. |
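The `localIp` allow-list combines three sources, which the following sketch models with Python's standard `ipaddress` module. All three CIDR values are placeholders chosen for illustration (the page does not state the real LAN range, pod CIDR, or public IP), and both helper names are hypothetical. In Traefik the resulting list would go under the middleware's `sourceRange` option.

```python
import ipaddress


def build_source_ranges(lan_cidr, pod_cidr, public_ip):
    """Combine LAN, pod CIDR and a single public IP into allow-list ranges."""
    networks = [
        ipaddress.ip_network(lan_cidr),
        ipaddress.ip_network(pod_cidr),
        # A bare address is normalised to a single-host /32 network.
        ipaddress.ip_network(public_ip),
    ]
    return [str(net) for net in networks]


def is_allowed(client_ip, source_ranges):
    """Return True when the client address falls inside any allowed range."""
    addr = ipaddress.ip_address(client_ip)
    return any(addr in ipaddress.ip_network(r) for r in source_ranges)


# Placeholder values; the real ranges come from the playbook's variables.
source_ranges = build_source_ranges("192.168.1.0/24", "10.42.0.0/16", "203.0.113.7")
print(source_ranges)
print(is_allowed("10.42.3.9", source_ranges), is_allowed("198.51.100.1", source_ranges))
```

Including the pod CIDR keeps in-cluster callers working, while the detected public IP covers requests that leave the LAN and come back in through the router.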
Gotchas
Caution
Step 4 formats a disk: data loss is real.
prepare_disks.yml picks the largest non-system partition and runs mkfs.ext4 -F on it when the arcodange_500 label is absent. The run_once pause prompt ("tapez 'oui' pour continuer", French for "type 'oui' to continue") is the only guard, and a wrong USB stick plugged into the wrong Pi will be wiped. Confirm target_device in the debug output before answering. If a candidate already carries the label, the format is skipped and the disk is only (re)mounted.
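The selection rule behind this warning can be modelled as a small pure function, which makes it easier to reason about which device would be hit. This is a sketch under stated assumptions: the function `plan_disk`, the dictionary shape, and the sample devices are all hypothetical stand-ins for whatever the playbook reads from the system.

```python
def plan_disk(partitions, disk_label="arcodange_500", system_prefix="mmcblk0"):
    """Decide which partition is targeted and whether it would be formatted."""
    # Anything on the SD card (mmcblk0*) is treated as the system disk.
    candidates = [p for p in partitions if not p["name"].startswith(system_prefix)]
    if not candidates:
        raise ValueError("no external partition found")
    # An already-labelled partition is reused and never formatted.
    labelled = [p for p in candidates if p.get("label") == disk_label]
    if labelled:
        return {"target_device": labelled[0]["name"], "format": False}
    # Otherwise the largest external partition is chosen and would be wiped.
    largest = max(candidates, key=lambda p: p["size_bytes"])
    return {"target_device": largest["name"], "format": True}


# Hypothetical device list for illustration.
fresh = [
    {"name": "mmcblk0p2", "size_bytes": 64_000_000_000, "label": "rootfs"},
    {"name": "sda1", "size_bytes": 500_000_000_000, "label": None},
    {"name": "sdb1", "size_bytes": 16_000_000_000, "label": None},
]
print(plan_disk(fresh))  # {'target_device': 'sda1', 'format': True}
```

The model shows why the prompt deserves attention: with no labelled disk present, "largest external partition" is the only criterion, so any large unrelated USB drive qualifies.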
Warning
K3s ships with --disable traefik. The bundled Traefik is intentionally turned off in step 7 so step 10 can deploy its own fully-customized v37.4.0. If you re-enable the bundled Traefik or run k3s_config.yml out of order, two Traefiks will fight over the ingress ports.
Warning
ARM64 needs the kube-rbac-proxy image override. step-issuer's default gcr.io/kubebuilder/kube-rbac-proxy:v0.15.0 is AMD64-only and crash-loops on pi3 (ARM64). k3s_ssl.yml overrides it to quay.io/brancz/kube-rbac-proxy:v0.15.0. Do not remove this override.
Warning
Traefik is force-redeployed. The last play of k3s_config.yml deletes the traefik Deployment and the helm-install-traefik Job so the k3s helm-controller re-runs the install against the new manifest. Expect a brief ingress outage during this window; the play then waits for the new Deployment to come back before finishing.
Note
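tags: never plays are opt-in. rpi.yml and system_docker.yml carry tags: never, so they are skipped unless you explicitly pass their tag (e.g. --tags rpi / --tags ...). Note that --tags all does not select never-tagged plays; Ansible runs them only when one of their own tags is requested. The K3s/Longhorn/Traefik plays run on a normal invocation.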
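The opt-in behaviour can be pictured with a deliberately simplified model of tag selection. It is not Ansible's implementation: the function `is_selected` is hypothetical, the special tags `all`, `tagged`, `untagged` and `always` are left out on purpose, and the assumption that rpi.yml carries an `rpi` tag alongside `never` is inferred from the `--tags rpi` example rather than read from the playbook.

```python
def is_selected(play_tags, requested_tags):
    """Simplified model: would a play run for the given --tags values?"""
    tags = set(play_tags)
    requested = set(requested_tags)
    if "never" in tags:
        # A never-tagged play runs only when one of its tags is asked for.
        return not tags.isdisjoint(requested)
    if not requested:
        # No --tags given: everything without `never` runs.
        return True
    return not tags.isdisjoint(requested)


# Assumed tag sets, for illustration only.
print(is_selected(["never", "rpi"], []))       # False: skipped on a normal run
print(is_selected(["never", "rpi"], ["rpi"]))  # True: explicitly requested
print(is_selected([], []))                     # True: untagged play, normal run
```

Read this way, the two `never` plays behave like maintenance switches: a routine run of the stage leaves hostnames and the Docker install alone unless the operator names them.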