Gabriel Radureau 1824a1885d docs(vibe): add maintenance rule to the ansible + opentofu sub-hubs
The two factory-provisioning sub-hubs were the only guidebook index pages without
the "alter a documented component -> update its page in the same PR" reminder that
every sibling hub carries. Add a scoped maintenance rule to each, pointing back to
the factory-provisioning maintenance rule and the guidebooks' Rules to contribute,
so no folder hub silently drifts.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-23 23:42:24 +02:00


Ansible — factory provisioning

Note

Status: active · Last Updated: 2026-06-23
Upstream: Factory provisioning hub · Lab ecosystem · 01 factory
Downstream: 01 · System · 02 · Setup · 03 · CI/CD · 04 · Tools · 05 · Backup · 06 · Recover · Inventory & variables · Roles reference
Related: Secrets & Vault · Storage & recovery · Naming conventions · ADR-0001 safe prod-like environment

Ansible is the imperative half of the factory: it takes three bare Raspberry Pis (pi1, pi2, pi3) and turns them into a running K3s cluster with Docker, Longhorn storage, Gitea CI runners, CrowdSec, and Vault. OpenTofu (the declarative half) then provisions everything that lives outside the cluster — see the OpenTofu sub-hub.


Collection layout

Everything ships as a single Ansible collection committed under ansible/arcodange/factory/. The collection root, not the repo root, is what ansible-galaxy collection install and the FQCN references (arcodange.factory.<role>) resolve against.

| File | Path | What it declares |
| --- | --- | --- |
| galaxy.yml | ansible/arcodange/factory/galaxy.yml | Collection identity: namespace arcodange, name factory, version 1.0.0. Together they form the FQCN prefix arcodange.factory.* used by every role and playbook import. |
| requirements.yml | ansible/requirements.yml | External dependencies pulled at install time (see table below). |
| ansible.cfg | ansible/arcodange/factory/ansible.cfg | collections_path = ~/.ansible/collections and scp_if_ssh = True for the SSH connection plugin. |
| inventory/ | ansible/arcodange/factory/inventory/ | hosts.yml + group_vars/. Detailed in Inventory & variables. |
| playbooks/ | ansible/arcodange/factory/playbooks/ | The numbered pipeline 01..05 plus the recover/ branch. |
| roles/ | ansible/arcodange/factory/roles/ | Seven reusable roles. Detailed in Roles reference. |
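The FQCN of each role is the galaxy.yml namespace and name followed by the role's directory name under roles/. The helper below is a minimal sketch that prints those names. `list_role_fqcns` is a hypothetical helper, not part of the repo. It assumes `namespace:` and `name:` are unindented top-level keys in galaxy.yml with no inline comments. A real YAML parser would be more robust.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print arcodange.factory.<role> for every role directory.
# Assumes galaxy.yml keeps namespace:/name: as unindented keys without inline comments.

list_role_fqcns() {
  local root="${1:-ansible/arcodange/factory}"   # the collection root, not the repo root
  local ns name dir
  ns=$(sed -n 's/^namespace:[[:space:]]*//p' "$root/galaxy.yml" | tr -d "\"' ")
  name=$(sed -n 's/^name:[[:space:]]*//p' "$root/galaxy.yml" | tr -d "\"' ")
  [[ -n "$ns" && -n "$name" ]] || { echo "namespace/name not found in $root/galaxy.yml" >&2; return 1; }
  for dir in "$root"/roles/*/; do
    [[ -d "$dir" ]] || continue                   # glob matched nothing
    echo "${ns}.${name}.$(basename "$dir")"       # FQCN = namespace.name.role
  done
}
```

Source the file and run `list_role_fqcns` from the repo root. It prints one line per role directory. Comparing that output with the Roles reference page is a cheap way to apply the maintenance rule below.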

External dependencies (requirements.yml)

| Dependency | Type | Why it is needed |
| --- | --- | --- |
| geerlingguy.docker | role | Installs and configures the Docker engine on each Pi. |
| ansible.posix | collection | POSIX primitives (mounts, sysctl, synchronize). |
| community.crypto | collection | Certificate/key generation for the step-ca PKI and Traefik. |
| community.docker | collection | Manages containers and Compose stacks (Gitea, act_runner). |
| community.general | collection | Broad utility modules used across the pipeline. |
| kubernetes.core | collection | k8s / helm modules used by every K3s-facing task. Needs the kubernetes Python lib at runtime. |
| k3s-ansible (git+https://github.com/k3s-io/k3s-ansible.git) | git role/collection | Upstream playbooks that install and cluster K3s itself. |

Tip

The runtime Python libraries (kubernetes, jmespath, dnspython) that kubernetes.core and friends import are declared in the repo-root pyproject.toml, not in requirements.yml. uv sync installs them; ansible-galaxy installs the Galaxy/git content. Both steps are required.
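Forgetting `uv sync` usually surfaces later as an import error inside a kubernetes.core task. A cheap preflight check can catch it earlier. The function below is a minimal sketch and `check_runtime_libs` is a hypothetical name. It only uses `uv run python -c "import ..."`. Note that dnspython is imported as `dns`, not under its package name.

```shell
#!/usr/bin/env bash
# Hypothetical preflight: confirm `uv sync` installed the runtime libs into the venv.
# Import names: kubernetes, jmespath, and dns (the dnspython package).

check_runtime_libs() {
  local missing=0 mod
  for mod in kubernetes jmespath dns; do
    if ! uv run python -c "import ${mod}" >/dev/null 2>&1; then
      echo "missing Python module: ${mod} (run: uv sync)" >&2
      missing=1
    fi
  done
  return "$missing"   # 0 only when every module imports cleanly
}
```

This does not check the Galaxy/git content. That still comes only from the `ansible-galaxy collection install` step.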


Invocation pattern

The control node runs Ansible from a uv-managed venv. The localhost inventory entry sets ansible_python_interpreter: "{{ ansible_playbook_python }}", so uv run is enough to put Ansible on the venv's Python — no hardcoded interpreter path. Full recipe lives in ansible/README.md.

  1. Sync the venv — installs ansible-core plus the runtime Python deps:
    uv sync
    
  2. Install collection dependencies — pulls the Galaxy + git content from requirements.yml:
    uv run ansible-galaxy collection install -r ansible/requirements.yml
    
  3. Run a stage — point -i at the inventory directory and pass one numbered playbook:
    uv run ansible-playbook \
      -i ansible/arcodange/factory/inventory \
      ansible/arcodange/factory/playbooks/<NN_name>.yml
    
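Step 3 is the command you type most often. Wrapping it in a small function keeps the inventory path consistent. This is a minimal sketch: `factory_play` and `FACTORY_DIR` are hypothetical names. Any arguments after the playbook name are passed straight to `ansible-playbook`.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper for step 3: run one numbered stage against the factory inventory.
# Paths are relative to the repo root; override FACTORY_DIR to run from elsewhere.

FACTORY_DIR="${FACTORY_DIR:-ansible/arcodange/factory}"

factory_play() {
  [[ $# -ge 1 ]] || { echo "usage: factory_play <NN_name>.yml [ansible-playbook args...]" >&2; return 2; }
  local playbook="$1"; shift
  uv run ansible-playbook \
    -i "${FACTORY_DIR}/inventory" \
    "${FACTORY_DIR}/playbooks/${playbook}" \
    "$@"                                  # extra flags, e.g. --check or --limit pi1
}
```

For example, `factory_play 01_system.yml --check --limit pi1` runs the system stage in check mode against pi1 only. Some modules do not fully support check mode, so treat that run as indicative rather than exact.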

The vault password (ANSIBLE_VAULT_PASSWORD_FILE)

Encrypted vars are decrypted with a password that is sourced from the cluster, not stored on disk. ANSIBLE_VAULT_PASSWORD_FILE points at a tiny executable script that reads the K8s secret arcodange-ansible-vault from the kube-system namespace:

kubectl get secret -n kube-system arcodange-ansible-vault \
  --template='{{index .data.pass | base64decode}}'

Important

The same arcodange-ansible-vault secret in kube-system is consumed by the Gitea CI runners (needed for the Gitea mailer). Create it once with kubectl create secret generic arcodange-ansible-vault --from-literal="pass=<ansible_vault_password>" -n kube-system. See Secrets & Vault for how this fits the broader secret model.
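Wiring up the password client takes a one-time setup. Ansible executes an executable ANSIBLE_VAULT_PASSWORD_FILE and reads the password from its stdout, so the kubectl call above only has to be saved as a script. The snippet below is a minimal sketch. The script location `~/.ansible/arcodange-vault-pass.sh` and the `VAULT_PASS_SCRIPT` variable are assumptions; any executable path works.

```shell
#!/usr/bin/env bash
# Sketch: install the vault password client and point Ansible at it.
# VAULT_PASS_SCRIPT and its default path are hypothetical choices.
set -euo pipefail

VAULT_PASS_SCRIPT="${VAULT_PASS_SCRIPT:-$HOME/.ansible/arcodange-vault-pass.sh}"
mkdir -p "$(dirname "$VAULT_PASS_SCRIPT")"

# Quoted 'EOF' keeps {{ ... }} and $ literal inside the generated script.
cat > "$VAULT_PASS_SCRIPT" <<'EOF'
#!/usr/bin/env bash
set -euo pipefail
kubectl get secret -n kube-system arcodange-ansible-vault \
  --template='{{index .data.pass | base64decode}}'
EOF
chmod +x "$VAULT_PASS_SCRIPT"   # must be executable, or Ansible reads it as a plain password file

export ANSIBLE_VAULT_PASSWORD_FILE="$VAULT_PASS_SCRIPT"
```

The `export` only lasts for the current shell. Source the snippet, or put the export in your shell profile, so later `uv run ansible-playbook` calls inherit it. The generated script needs a working kubeconfig for the cluster at run time.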


The provisioning pipeline

The numbered playbooks are meant to be run in order on a fresh cluster. Each one is a thin wrapper that uses import_playbook to pull in the playbook of its stage directory (e.g. 01_system.yml → system/system.yml). The recover/ playbooks are not part of the linear sequence; they are an on-demand branch used only during disaster recovery.

%%{init: {'theme':'base', 'themeVariables': {'primaryColor':'#1f2937','primaryTextColor':'#f9fafb','lineColor':'#6b7280','fontSize':'14px'}}}%%
flowchart LR
  classDef stage fill:#1e3a5f,stroke:#3b82f6,color:#f9fafb;
  classDef recover fill:#5f1e1e,stroke:#ef4444,color:#fef2f2;

  s01["01 · System<br/>Docker · K3s · Longhorn · DNS · SSL"]:::stage
  s02["02 · Setup<br/>Gitea · Postgres · NFS backup"]:::stage
  s03["03 · CI/CD<br/>act_runner registration"]:::stage
  s04["04 · Tools<br/>CrowdSec · Vault"]:::stage
  s05["05 · Backup<br/>cron reports · PVC/db dumps"]:::stage
  rec["recover/*<br/>Longhorn + data restore"]:::recover

  s01 --> s02 --> s03 --> s04 --> s05
  s05 -. "on disaster" .-> rec
  rec -. "rejoin pipeline" .-> s01

  1. 01 · System — base OS hardening on each Pi, then Docker, Longhorn disk prep + iSCSI, K3s install, CoreDNS, the step-ca cert issuer, and final K3s config (kubeconfig, Longhorn, Traefik).
  2. 02 · Setup — deploys the cluster-resident services: Gitea, PostgreSQL (on pi2), and the NFS backup target.
  3. 03 · CI/CD — fetches a Gitea runner-registration token and rolls out the act_runner Docker Compose stack on every non-Gitea Pi so CI jobs have executors.
  4. 04 · Tools — installs the operational tooling layer: CrowdSec (WAF/IPS) and HashiCorp Vault.
  5. 05 · Backup — schedules the cron-driven backup + email-report jobs and the Gitea / Postgres / K3s-PVC dump routines.
  6. recover/* (on demand) — invoked only after data loss to rebuild Longhorn and replay volume data; once recovered, the cluster re-enters the normal pipeline at 01 · System.
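On a fresh cluster, the linear order above can be scripted. The loop below is a minimal sketch, and `run_factory_pipeline` is a hypothetical name. It matches only playbooks/NN_*.yml, so the recover/ branch is never picked up. Glob order (01, 02, ...) doubles as run order, and the loop stops at the first failing stage. Whether running all stages back to back unattended is safe for this cluster is an assumption to check against each stage page.

```shell
#!/usr/bin/env bash
# Hypothetical runner: execute the numbered stages in order, stopping at the first failure.
# Only playbooks/[0-9][0-9]_*.yml match, so recover/* stays out of the sequence.

run_factory_pipeline() {
  local root="${1:-ansible/arcodange/factory}" pb
  for pb in "$root"/playbooks/[0-9][0-9]_*.yml; do
    [[ -e "$pb" ]] || { echo "no numbered playbooks under $root/playbooks" >&2; return 1; }
    echo "==> $(basename "$pb")" >&2
    uv run ansible-playbook -i "$root/inventory" "$pb" || return $?   # propagate the failing stage's status
  done
}
```

Run it from the repo root as `run_factory_pipeline`. To rerun a single stage after a fix, call `uv run ansible-playbook` directly as in the invocation pattern above.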

Index

| # | Page | Covers | State |
| --- | --- | --- | --- |
| 01 | System | RPi hardening, Docker, K3s, Longhorn/iSCSI, CoreDNS, step-ca SSL | |
| 02 | Setup | Gitea, PostgreSQL, NFS backup target | |
| 03 | CI/CD | Gitea act_runner registration & Compose deploy | |
| 04 | Tools | CrowdSec, HashiCorp Vault | |
| 05 | Backup | Cron report jobs, Gitea/Postgres/PVC dumps | |
| 06 | Recover | Longhorn + data restore (on-demand DR branch) | 🟡 |
| | Inventory & variables | hosts.yml groups, group_vars/ layering, host→service mapping | |
| | Roles reference | The seven arcodange.factory.* roles | |

Maintenance rule

Important

Alter a playbook, role, inventory entry, or group_vars → update the matching page here in the same change. Adding a stage, renaming a role, bumping the K3s version or a requirements.yml dependency, or moving a host between groups all change what the pages above describe — edit the page in the PR that changes the code, never as a follow-up. This is the factory-provisioning maintenance rule applied to the Ansible half; the guidebooks' full Rules to contribute also apply.