# 01 · System — base OS, Docker, K3s, Longhorn, DNS, SSL

> [!NOTE]
> **Status:** ✅ active · **Last Updated:** 2026-06-23
> **Upstream:** [Ansible sub-hub](README.md) · [Factory provisioning hub](../README.md)
> **Downstream:** [02 · Setup](02-setup.md) · [03 · CI/CD](03-cicd.md)
> **Related:** [Storage & recovery](../../lab-ecosystem/storage-and-recovery.md) · [Secrets & Vault](../../lab-ecosystem/secrets-and-vault.md) · [Naming conventions](../../lab-ecosystem/naming-conventions.md) · [ADR-0001 safe prod-like environment](../../../ADR/0001-safe-prod-like-environment.md)

## What it does

`01 · System` takes three bare Raspberry Pis (`pi1`, `pi2`, `pi3`) and turns them into a configured K3s cluster. The wrapper [`playbooks/01_system.yml`](../../../../ansible/arcodange/factory/playbooks/01_system.yml) does nothing but `import_playbook` the stage orchestrator [`playbooks/system/system.yml`](../../../../ansible/arcodange/factory/playbooks/system/system.yml). That orchestrator in turn imports ten sub-playbooks **in strict order**. Each sub-playbook adds one capability: hostname/DNS hygiene, Pi-hole HA DNS, the step-ca PKI, the external backup disk, Docker, the iSCSI/dm-crypt prerequisites for Longhorn, K3s itself, CoreDNS forwarding, the cert-manager issuer, and finally the cluster config (Longhorn + Traefik).

All host-facing plays target `raspberries:&local`, the intersection of the `raspberries` and `local` groups, which resolves to `pi1`/`pi2`/`pi3` (see [Inventory & variables](inventory.md)). The K3s server/agent split is decided at runtime: the **first host (alphabetically) becomes the server**, and the rest become agents.
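
The host targeting and the runtime server/agent split described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the playbook's logic. The `GROUPS` mapping and the helpers `resolve_intersection` and `split_roles` are hypothetical, and only the `:&` (intersection) operator of Ansible's host-pattern syntax is modelled.

```python
"""Sketch: resolve 'raspberries:&local' and split K3s server/agents.

Hypothetical: GROUPS is illustrative; the real memberships live in the
factory inventory and the real split is done in system_k3s.yml.
"""

# Hypothetical inventory groups (membership is illustrative only).
GROUPS: dict[str, set[str]] = {
    "raspberries": {"pi3", "pi1", "pi2"},
    "local": {"pi1", "pi2", "pi3", "localhost"},
}


def resolve_intersection(pattern: str, groups: dict[str, set[str]]) -> set[str]:
    """Resolve an 'a:&b' pattern: hosts that are in group a AND group b."""
    first, *rest = pattern.split(":&")
    hosts = set(groups.get(first, set()))
    for name in rest:
        hosts &= groups.get(name, set())
    return hosts


def split_roles(hosts: set[str]) -> tuple[str, list[str]]:
    """First host in sorted order becomes the K3s server; the rest are agents."""
    ordered = sorted(hosts)
    if not ordered:
        raise ValueError("no hosts matched the pattern")
    return ordered[0], ordered[1:]


if __name__ == "__main__":
    targets = resolve_intersection("raspberries:&local", GROUPS)
    server, agents = split_roles(targets)
    print(f"server={server} agents={agents}")  # server=pi1 agents=['pi2', 'pi3']
```

`sorted` compares hostnames as plain strings. Under this sketch, a hypothetical `pi10` would therefore sort before `pi2` and become the server. This assumes the real play also sorts hostnames as strings.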

## Ordered steps

| # | Sub-playbook | Purpose | Key vars / versions |
| --- | --- | --- | --- |
| 1 | [`system/rpi.yml`](../../../../ansible/arcodange/factory/playbooks/system/rpi.yml) | Set each node's hostname to its `inventory_hostname`. On Pi-hole nodes (`pi1`/`pi3`), add `dnsmasq` to the `dip` group, then **stop & disable `dnsmasq`** to free port 53 for `pihole-FTL`. | `tags: never` (opt-in only) |
| 2 | [`dns/dns.yml`](../../../../ansible/arcodange/factory/playbooks/dns/dns.yml) → [`dns/pihole.yml`](../../../../ansible/arcodange/factory/playbooks/dns/pihole.yml) | Install & configure **Pi-hole HA DNS** via the `pihole` role. Adds custom records mapping `.arcodange.lab` and `.arcodange.duckdns.org` to `pi1`. | `pihole_custom_dns` → `pi1.preferred_ip` |
| 3 | [`ssl/ssl.yml`](../../../../ansible/arcodange/factory/playbooks/ssl/ssl.yml) → [`ssl/step-ca.yml`](../../../../ansible/arcodange/factory/playbooks/ssl/step-ca.yml) | Install **step-ca** (the `step_ca` role) on all three Pis; fetch the root CA from `pi1`; build a **Gitea runner image that trusts the CA** (`runner-images:ubuntu-latest-ca`) and push it to the registry. | `step_ca_primary: pi1`, root at `/home/step/.step/certs/root_ca.crt` |
| 4 | [`system/prepare_disks.yml`](../../../../ansible/arcodange/factory/playbooks/system/prepare_disks.yml) | Auto-detect the largest external (non-`mmcblk0`) USB partition, format it **ext4 with label `arcodange_500`**, mount it at `/mnt/arcodange`, and persist the mount in `fstab`. Skips the format if the label already exists. **Pauses for confirmation before any format.** | `mount_point: /mnt/arcodange`, `disk_label: arcodange_500` |
| 5 | [`system/system_docker.yml`](../../../../ansible/arcodange/factory/playbooks/system/system_docker.yml) | Install Docker via `geerlingguy.docker`; write `daemon.json` with **json-file logging** (`max-size 10m`, `max-file 5`) and **`data-root: /mnt/arcodange/docker`** (only when the external disk is mounted). | `tags: never`; `storage-driver: overlay2` |
| 6 | [`system/iscsi_longhorn.yml`](../../../../ansible/arcodange/factory/playbooks/system/iscsi_longhorn.yml) | Install `open-iscsi` (+ enable `iscsid`) and `cryptsetup`, and load the **`dm_crypt`** kernel module (persisted in `/etc/modules`). These are what Longhorn needs to attach and encrypt volumes. Creates `/mnt/arcodange/longhorn`. | module `dm_crypt` |
| 7 | [`system/system_k3s.yml`](../../../../ansible/arcodange/factory/playbooks/system/system_k3s.yml) | Build the K3s inventory dynamically (first sorted host → `server`, rest → `agent`), install the `k3s-ansible` content, run `k3s.orchestration.site`, then **fetch the kubeconfig** to `~/.kube/config` (rewriting `127.0.0.1` → server IP). | **k3s `v1.34.3+k3s1`**; server args `--docker --disable traefik` |
| 8 | [`system/k3s_dns.yml`](../../../../ansible/arcodange/factory/playbooks/system/k3s_dns.yml) | Create the **`coredns-custom`** ConfigMap so cluster DNS forwards `arcodange.lab:53` to the Pi-hole IPs; also patch the main CoreDNS Corefile to forward to the same HA Pi-holes. | `pihole_ips` (extracted from hostvars) |
| 9 | [`system/k3s_ssl.yml`](../../../../ansible/arcodange/factory/playbooks/system/k3s_ssl.yml) | Deploy **cert-manager** + **step-issuer** as k3s static HelmCharts; create the `StepClusterIssuer` `step-ca` wired to the JWK provisioner and root CA. | cert-manager `v1.19.2`, step-issuer `1.9.11`, `caUrl: https://ssl-ca.arcodange.lab:8443`, **ARM64 `kube-rbac-proxy` override** |
| 10 | [`system/k3s_config.yml`](../../../../ansible/arcodange/factory/playbooks/system/k3s_config.yml) | Deploy **Longhorn** + **Traefik** as HelmCharts; issue the wildcard cert, set the default `TLSStore`, and wire up Gitea, the IP-allow-list middleware, and the CrowdSec bouncer plugin; then **delete the old Traefik** to force a redeploy. | Longhorn `v1.9.1`, Traefik `v37.4.0` (see detail below) |

## How the stages fit together

```mermaid
%%{init: {'theme':'base', 'themeVariables': {'primaryColor':'#1f2937','primaryTextColor':'#f9fafb','lineColor':'#6b7280','fontSize':'13px'}}}%%
flowchart TD
  classDef host fill:#1e3a5f,stroke:#3b82f6,color:#f9fafb;
  classDef cluster fill:#1e4032,stroke:#22c55e,color:#f0fdf4;
  classDef danger fill:#5f1e1e,stroke:#ef4444,color:#fef2f2;

  rpi["1 · rpi.yml<br/>hostname + dnsmasq off"]:::host
  dns["2 · pihole<br/>HA DNS"]:::host
  ssl["3 · step-ca<br/>root CA + CA-trusting runner image"]:::host
  disk["4 · prepare_disks.yml<br/>ext4 arcodange_500 -> /mnt/arcodange"]:::danger
  docker["5 · system_docker.yml<br/>data-root on external disk"]:::host
  iscsi["6 · iscsi_longhorn.yml<br/>open-iscsi + dm_crypt"]:::host
  k3s["7 · system_k3s.yml<br/>k3s v1.34.3 (--disable traefik)"]:::cluster
  cdns["8 · k3s_dns.yml<br/>coredns-custom -> Pi-hole"]:::cluster
  cmgr["9 · k3s_ssl.yml<br/>cert-manager + step-issuer"]:::cluster
  cfg["10 · k3s_config.yml<br/>Longhorn + Traefik + redeploy"]:::cluster

  rpi --> dns --> ssl --> disk --> docker --> iscsi --> k3s --> cdns --> cmgr --> cfg
```

1. **`rpi.yml`** fixes the hostname and, on Pi-hole nodes, stops `dnsmasq` so `pihole-FTL` can own port 53.
2. **Pi-hole** comes up as the HA DNS authority for `arcodange.lab`.
3. **step-ca** is installed; its root CA is fetched and baked into a Gitea runner image so CI can trust internal TLS.
4. **`prepare_disks.yml`** formats and mounts the external USB disk at `/mnt/arcodange` (after a confirmation pause).
5. **Docker** is installed with its data-root on that disk and its logs capped.
6. **iSCSI + dm_crypt** prerequisites are installed so Longhorn can attach (and encrypt) volumes.
7. **K3s** is installed with the first host (alphabetically) as server, Docker as the container runtime, and the bundled Traefik disabled.
8. **CoreDNS** is reconfigured to forward `arcodange.lab` to the Pi-holes.
9. **cert-manager + step-issuer** wire the in-cluster issuer to step-ca.
10. **`k3s_config.yml`** deploys Longhorn and a fully customized Traefik, then deletes the old Traefik so the helm-controller redeploys it with the new config.

## `k3s_config.yml` — Longhorn & Traefik detail

| Resource | Value | Notes |
| --- | --- | --- |
| Longhorn HelmChart | `v1.9.1` | `defaultSettings.defaultDataPath: /mnt/arcodange/longhorn`, so volumes live on the external disk. |
| Traefik HelmChart | `v37.4.0` | Deployed as a k3s static manifest (`traefik-v3.yaml`) with an inline `traefik-configmap`. |
| Wildcard cert | `wildcard-arcodange-lab` | `Certificate` for `arcodange.lab` + `*.arcodange.lab`, issued through step-issuer's `StepClusterIssuer` (created in step 9). |
| `TLSStore` `default` | `defaultCertificate: wildcard-arcodange-lab` | Makes the wildcard cert the cluster-wide default. |
| Gitea exposure | `gitea-external` `ExternalName` Service → `pi2` port 3000 | Gitea runs **outside** K3s as Docker Compose on `pi2`; Traefik routes `gitea.arcodange.lab` to it. |
| `localIp` middleware | `ipAllowList` | Restricts the dashboard and Gitea routers to the LAN, the pod CIDR, and the detected public IP. |
| CrowdSec bouncer | plugin `v1.3.3` | Traefik experimental plugin `crowdsec-bouncer-traefik-plugin` (config completed in [04 · Tools](04-tools.md)). |
| DuckDNS token | `traefik-duckdns-token` Secret → `DUCKDNS_TOKEN` | Consumed by the `letsencrypt` ACME DNS-challenge resolver via `envFrom`. |

## Gotchas

> [!CAUTION]
> **Step 4 formats a disk: data loss is real.** `prepare_disks.yml` picks the **largest non-system partition** and runs `mkfs.ext4 -F` on it when the `arcodange_500` label is absent. The `run_once` `pause` prompt ("tapez 'oui' pour continuer", i.e. "type 'oui' to continue") is the only guard. A stray USB stick that happens to be the largest partition on a Pi will be wiped. Confirm `target_device` in the debug output before answering. If a candidate already carries the label, the format is skipped and the disk is only (re)mounted.

> [!WARNING]
> **K3s ships with `--disable traefik`.** The bundled Traefik is intentionally turned off in step 7 so that step 10 can deploy its own fully customized `v37.4.0`. If you re-enable the bundled Traefik or run `k3s_config.yml` out of order, two Traefiks will fight over the ingress ports.

> [!WARNING]
> **ARM64 needs the `kube-rbac-proxy` image override.** step-issuer's default `gcr.io/kubebuilder/kube-rbac-proxy:v0.15.0` is AMD64-only and **crash-loops on `pi3` (ARM64)**. `k3s_ssl.yml` overrides it to `quay.io/brancz/kube-rbac-proxy:v0.15.0`. Do not remove this override.

> [!WARNING]
> **Traefik is force-redeployed.** The last play of `k3s_config.yml` deletes the `traefik` Deployment **and** the `helm-install-traefik` Job so the k3s helm-controller re-runs the install against the new manifest. Expect a brief ingress outage during this window; the play then waits for the new Deployment to come back before finishing.
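
To make the step 4 caution concrete, here is a minimal sketch of the device-selection rule it describes. The `Partition` tuple, the `plan_disk` function and the sample devices are hypothetical, since the real play gathers disk facts its own way. The sketch also assumes that when a partition already carries `arcodange_500`, that partition is the one reused and mounted.

```python
"""Illustrative model of the prepare_disks.yml device choice (not the real play).

Hypothetical: partitions arrive as Partition tuples, and a partition already
labelled arcodange_500 is the one that gets reused.
"""
from typing import NamedTuple, Optional

DISK_LABEL = "arcodange_500"
SYSTEM_DISK = "mmcblk0"  # SD card holding the OS; never a candidate


class Partition(NamedTuple):
    device: str            # e.g. "/dev/sda1"
    size_bytes: int
    label: Optional[str]   # filesystem label, or None when unlabelled


def plan_disk(partitions: list[Partition]) -> tuple[str, bool]:
    """Return (target_device, needs_format) for the external data disk."""
    candidates = [p for p in partitions if SYSTEM_DISK not in p.device]
    if not candidates:
        raise RuntimeError("no external partition found")
    # An existing arcodange_500 label means: skip mkfs, just (re)mount.
    for part in candidates:
        if part.label == DISK_LABEL:
            return part.device, False
    # Otherwise the largest external partition is chosen and WILL be formatted.
    largest = max(candidates, key=lambda p: p.size_bytes)
    return largest.device, True


if __name__ == "__main__":
    parts = [
        Partition("/dev/mmcblk0p2", 64_000_000_000, "rootfs"),
        Partition("/dev/sda1", 500_000_000_000, None),
        Partition("/dev/sdb1", 32_000_000_000, None),
    ]
    print(plan_disk(parts))  # ('/dev/sda1', True)
```

The sketch shows why a stray stick is dangerous. When no partition is labelled, size alone decides which external partition gets formatted, regardless of which disk you intended.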

> [!NOTE]
> **`tags: never` plays are opt-in.** `rpi.yml` and `system_docker.yml` carry `tags: never`, so they are skipped unless you explicitly pass one of their tags (e.g. `--tags rpi` / `--tags ...`) or `--tags never`. `--tags all` is the default and does **not** select `never`-tagged plays. The K3s/Longhorn/Traefik plays run on a normal invocation.
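
The opt-in behaviour of `tags: never` follows Ansible's special-tag rules. The function below is a simplified model of those documented rules, not Ansible's implementation; it ignores `--skip-tags` and the `tagged` special tag. The tag set `{"never", "rpi"}` for `rpi.yml` is an assumption based on the `--tags rpi` example.

```python
"""Simplified model of Ansible's --tags selection for tagged plays.

Models only --tags with the special tags 'all', 'always', 'never' and
'untagged'; --skip-tags and 'tagged' are deliberately left out.
"""


def selected(play_tags: set[str], requested: set[str]) -> bool:
    """Return True if a play carrying `play_tags` runs under `--tags requested`."""
    tags = play_tags or {"untagged"}      # untagged plays match the 'untagged' tag
    wanted = requested or {"all"}         # no --tags behaves like --tags all
    if "always" in tags:
        return True                       # 'always' runs unless skipped explicitly
    if "all" in wanted and "never" not in tags:
        return True                       # 'all' matches everything except 'never'
    return not tags.isdisjoint(wanted)    # otherwise an explicit tag must match


if __name__ == "__main__":
    rpi_play = {"never", "rpi"}           # hypothetical tags for rpi.yml
    for req in ({"all"}, {"rpi"}, {"never"}, {"all", "rpi"}):
        print(sorted(req), selected(rpi_play, req))
```

Under these rules, `--tags rpi` alone runs `rpi.yml` but skips every play that carries neither `rpi` nor `always`. By contrast, `--tags all,rpi` runs all non-`never` plays plus `rpi.yml`.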