Ansible — factory provisioning
Note
Status: ✅ active · Last Updated: 2026-06-23
Upstream: Factory provisioning hub · Lab ecosystem · 01 factory
Downstream: 01 · System · 02 · Setup · 03 · CI/CD · 04 · Tools · 05 · Backup · 06 · Recover · Inventory & variables · Roles reference
Related: Secrets & Vault · Storage & recovery · Naming conventions · ADR-0001 safe prod-like environment
Ansible is the imperative half of the factory: it takes three bare Raspberry Pis (pi1, pi2, pi3) and turns them into a running K3s cluster with Docker, Longhorn storage, Gitea CI runners, CrowdSec, and Vault. OpenTofu (the declarative half) then provisions everything that lives outside the cluster — see the OpenTofu sub-hub.
Collection layout
Everything ships as a single Ansible collection committed under ansible/arcodange/factory/. The collection root, not the repo root, is what ansible-galaxy collection install and the FQCN references (arcodange.factory.<role>) resolve against.
| File | Path | What it declares |
|---|---|---|
| `galaxy.yml` | `ansible/arcodange/factory/galaxy.yml` | Collection identity: namespace `arcodange`, name `factory`, version `1.0.0`. The namespace and name together form the FQCN prefix `arcodange.factory.*` used by every role and playbook import. |
| `requirements.yml` | `ansible/requirements.yml` | External dependencies pulled at install time (see table below). |
| `ansible.cfg` | `ansible/arcodange/factory/ansible.cfg` | `collections_path = ~/.ansible/collections` and `scp_if_ssh = True` for the SSH connection plugin. |
| `inventory/` | `ansible/arcodange/factory/inventory/` | `hosts.yml` + `group_vars/`. Detailed in Inventory & variables. |
| `playbooks/` | `ansible/arcodange/factory/playbooks/` | The numbered pipeline `01..05` plus the `recover/` branch. |
| `roles/` | `ansible/arcodange/factory/roles/` | Seven reusable roles. Detailed in Roles reference. |
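
As a rough illustration of the `galaxy.yml` row above, the collection metadata might look like the following. Only `namespace`, `name`, and `version` are stated on this page; the `readme` and `authors` values are placeholders assumed here because `ansible-galaxy` requires those keys, and the real file may differ.

```yaml
# Sketch of ansible/arcodange/factory/galaxy.yml (not copied from the repo).
namespace: arcodange   # first half of the FQCN prefix
name: factory          # second half: arcodange.factory.<role>
version: 1.0.0
# Assumed placeholders: required keys whose values are not documented here.
readme: README.md
authors:
  - "<author name>"
```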
External dependencies (requirements.yml)
| Dependency | Type | Why it is needed |
|---|---|---|
| `geerlingguy.docker` | role | Installs and configures the Docker engine on each Pi. |
| `ansible.posix` | collection | POSIX primitives (mounts, sysctl, synchronize). |
| `community.crypto` | collection | Certificate/key generation for the step-ca PKI and Traefik. |
| `community.docker` | collection | Manages containers and Compose stacks (Gitea, `act_runner`). |
| `community.general` | collection | Broad utility modules used across the pipeline. |
| `kubernetes.core` | collection | `k8s` / `helm` modules used by every K3s-facing task. Needs the `kubernetes` Python lib at runtime. |
| `k3s-ansible` (`git+https://github.com/k3s-io/k3s-ansible.git`) | git role/collection | Upstream playbooks that install and cluster K3s itself. |
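
A minimal sketch of how these dependencies could be declared in `ansible/requirements.yml`. The names come from the table above; the split between `roles:` and `collections:`, the absence of version pins, and listing `k3s-ansible` as a git-sourced collection are assumptions, so check the real file before relying on this.

```yaml
# Hypothetical layout of ansible/requirements.yml (not copied from the repo).
roles:
  - name: geerlingguy.docker

collections:
  - name: ansible.posix
  - name: community.crypto
  - name: community.docker
  - name: community.general
  - name: kubernetes.core
  # Assumed form for the git dependency; the repo may reference it differently.
  - name: https://github.com/k3s-io/k3s-ansible.git
    type: git
```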
Tip
The runtime Python libraries (`kubernetes`, `jmespath`, `dnspython`) that `kubernetes.core` and friends import are declared in the repo-root `pyproject.toml`, not in `requirements.yml`. `uv sync` installs them; `ansible-galaxy` installs the Galaxy/git content. Both steps are required.
Invocation pattern
The control node runs Ansible from a uv-managed venv. The localhost inventory entry sets ansible_python_interpreter: "{{ ansible_playbook_python }}", so uv run is enough to put Ansible on the venv's Python — no hardcoded interpreter path. Full recipe lives in ansible/README.md.
- Sync the venv (installs `ansible-core` plus the runtime Python deps): `uv sync`
- Install collection dependencies (pulls the Galaxy + git content from `requirements.yml`): `uv run ansible-galaxy collection install -r ansible/requirements.yml`
- Run a stage (point `-i` at the inventory directory and pass one numbered playbook): `uv run ansible-playbook -i ansible/arcodange/factory/inventory ansible/arcodange/factory/playbooks/<NN_name>.yml`
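
The localhost inventory entry described above could be sketched roughly as follows. Only the `ansible_python_interpreter` line is taken from this page; the surrounding structure and the `ansible_connection` setting are assumptions about how `hosts.yml` is laid out.

```yaml
# Hypothetical excerpt of ansible/arcodange/factory/inventory/hosts.yml
all:
  hosts:
    localhost:
      ansible_connection: local  # assumption: control-node tasks run locally
      # From this page: reuse the interpreter that launched ansible-playbook,
      # which is the venv's Python when started through `uv run`.
      ansible_python_interpreter: "{{ ansible_playbook_python }}"
```

Because `ansible_playbook_python` resolves at run time, the same inventory works on any control node without editing an interpreter path.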
The vault password (ANSIBLE_VAULT_PASSWORD_FILE)
Encrypted vars are decrypted with a password that is sourced from the cluster, not stored on disk. ANSIBLE_VAULT_PASSWORD_FILE points at a tiny executable script that reads the K8s secret arcodange-ansible-vault from the kube-system namespace:
```shell
kubectl get secret -n kube-system arcodange-ansible-vault \
  --template='{{index .data.pass | base64decode}}'
```
Important
The same `arcodange-ansible-vault` secret in `kube-system` is consumed by the Gitea CI runners (needed for the Gitea mailer). Create it once with `kubectl create secret generic arcodange-ansible-vault --from-literal="pass=<ansible_vault_password>" -n kube-system`. See Secrets & Vault for how this fits the broader secret model.
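
For readers who prefer a manifest to the imperative command, the same secret can be sketched declaratively. This is an equivalent inferred from the `kubectl create secret generic` call above, not a file from the repo; the password placeholder is kept as-is and a filled-in copy should stay out of version control.

```yaml
# Sketch: declarative equivalent of the `kubectl create secret generic` call.
apiVersion: v1
kind: Secret
metadata:
  name: arcodange-ansible-vault
  namespace: kube-system
type: Opaque
stringData:
  # stringData takes the plain value; the API server stores it base64-encoded,
  # which is why the read side pipes .data.pass through base64decode.
  pass: "<ansible_vault_password>"
```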
The provisioning pipeline
The numbered playbooks are meant to be run in order on a fresh cluster. Each is a thin wrapper that uses `import_playbook` to pull in the playbook from its stage directory (e.g. `01_system.yml` → `system/system.yml`). The `recover/` playbooks are not part of the linear sequence; they are an on-demand branch used only during disaster recovery.
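
The thin-wrapper pattern can be illustrated with a sketch of the first stage. The file names come from the example above; the play name is illustrative, and the real wrapper may import more than one playbook.

```yaml
# Sketch of ansible/arcodange/factory/playbooks/01_system.yml
# The imported path is resolved relative to this wrapper file.
- name: 01 System (illustrative play name)
  ansible.builtin.import_playbook: system/system.yml
```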
```mermaid
%%{init: {'theme':'base', 'themeVariables': {'primaryColor':'#1f2937','primaryTextColor':'#f9fafb','lineColor':'#6b7280','fontSize':'14px'}}}%%
flowchart LR
classDef stage fill:#1e3a5f,stroke:#3b82f6,color:#f9fafb;
classDef recover fill:#5f1e1e,stroke:#ef4444,color:#fef2f2;
s01["01 · System<br/>Docker · K3s · Longhorn · DNS · SSL"]:::stage
s02["02 · Setup<br/>Gitea · Postgres · NFS backup"]:::stage
s03["03 · CI/CD<br/>act_runner registration"]:::stage
s04["04 · Tools<br/>CrowdSec · Vault"]:::stage
s05["05 · Backup<br/>cron reports · PVC/db dumps"]:::stage
rec["recover/*<br/>Longhorn + data restore"]:::recover
s01 --> s02 --> s03 --> s04 --> s05
s05 -. "on disaster" .-> rec
rec -. "rejoin pipeline" .-> s01
```
- 01 · System: base OS hardening on each Pi, then Docker, Longhorn disk prep + iSCSI, K3s install, CoreDNS, the step-ca cert issuer, and final K3s config (kubeconfig, Longhorn, Traefik).
- 02 · Setup: deploys the cluster-resident services: Gitea, PostgreSQL (on `pi2`), and the NFS backup target.
- 03 · CI/CD: fetches a Gitea runner-registration token and rolls out the `act_runner` Docker Compose stack on every non-Gitea Pi so CI jobs have executors.
- 04 · Tools: installs the operational tooling layer: CrowdSec (WAF/IPS) and HashiCorp Vault.
- 05 · Backup: schedules the cron-driven backup + email-report jobs and the Gitea / Postgres / K3s-PVC dump routines.
- `recover/*` (on demand): invoked only after data loss to rebuild Longhorn and replay volume data; once recovered, the cluster re-enters the normal pipeline at 01 · System.
Index
| # | Page | Covers | State |
|---|---|---|---|
| 01 | System | RPi hardening, Docker, K3s, Longhorn/iSCSI, CoreDNS, step-ca SSL | ✅ |
| 02 | Setup | Gitea, PostgreSQL, NFS backup target | ✅ |
| 03 | CI/CD | Gitea `act_runner` registration & Compose deploy | ✅ |
| 04 | Tools | CrowdSec, HashiCorp Vault | ✅ |
| 05 | Backup | Cron report jobs, Gitea/Postgres/PVC dumps | ✅ |
| 06 | Recover | Longhorn + data restore (on-demand DR branch) | 🟡 |
| — | Inventory & variables | `hosts.yml` groups, `group_vars/` layering, host→service mapping | ✅ |
| — | Roles reference | The seven `arcodange.factory.*` roles | ✅ |
Maintenance rule
Important
Alter a playbook, role, inventory entry, or `group_vars` → update the matching page here in the same change. Adding a stage, renaming a role, bumping the K3s version or a `requirements.yml` dependency, or moving a host between groups all change what the pages above describe. Edit the page in the PR that changes the code, never as a follow-up. This is the factory-provisioning maintenance rule applied to the Ansible half; the guidebooks' full Rules to contribute also apply.