01 · factory
- Status: ✅ Active
- Last Updated: 2026-06-23
- Downstream: 02 · tools · 03 · cms
- Deeper dive: Factory provisioning guidebook, a page-by-page walkthrough of the Ansible playbooks/roles and OpenTofu modules summarized here
- Related: naming-conventions.md · secrets-and-vault.md · storage-and-recovery.md
factory is the cornerstone admin repo: it provisions the hosts and the cluster, declares what gets deployed, and owns the platform-level cloud/Gitea/Vault/Postgres state that every app leans on. It has four pillars — Ansible (imperative host & cluster setup), ArgoCD (declarative app-of-apps), iac/ (OpenTofu for the cloud/Gitea/Vault edge), and postgres/iac/ (per-app PostgreSQL provisioning). The repos tools and cms are deployed by factory's ArgoCD and are mapped in 02 · tools and 03 · cms.
Pillar 1 — Ansible (ansible/)
The collection lives at ansible/arcodange/factory/. The inventory groups the three Pis and pins the service placement; numbered playbooks run an ordered narrative from bare OS to backups; recover/ holds the disaster-recovery playbooks.
Inventory (inventory/hosts.yml)
| Group | Hosts | Purpose |
|---|---|---|
| raspberries | pi1, pi2, pi3 (192.168.1.201-203) | All three Pis; ansible_user: pi |
| postgres | pi2 | The PostgreSQL host (docker-compose, outside k3s) |
| gitea | children of postgres (→ pi2) | Gitea co-located with PG on pi2 |
| pihole | pi1, pi3 | Internal DNS resolvers |
| step_ca | pi1, pi2, pi3 | Step-CA PKI for `*.arcodange.lab` (primary pi1, replicas pi2/pi3) |
| local | localhost + the Pis | Control-node-local tasks |
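The group table can be read as a small graph: gitea lists no hosts of its own and inherits pi2 through its child group postgres. The following is a minimal Python sketch of that resolution. It assumes a simplified dict in place of the real inventory/hosts.yml; the dict layout and the `resolve_hosts` helper are illustrative and not part of the repo.

```python
# Simplified stand-in for inventory/hosts.yml (layout is an assumption).
INVENTORY = {
    "raspberries": {"hosts": ["pi1", "pi2", "pi3"]},
    "postgres": {"hosts": ["pi2"]},
    "gitea": {"children": ["postgres"]},
    "pihole": {"hosts": ["pi1", "pi3"]},
    "step_ca": {"hosts": ["pi1", "pi2", "pi3"]},
    "local": {"hosts": ["localhost", "pi1", "pi2", "pi3"]},
}


def resolve_hosts(group: str, inventory: dict = INVENTORY) -> set[str]:
    """Return every host in a group, following child groups recursively."""
    entry = inventory[group]
    hosts = set(entry.get("hosts", []))
    for child in entry.get("children", []):
        hosts |= resolve_hosts(child, inventory)
    return hosts


if __name__ == "__main__":
    for name in INVENTORY:
        print(f"{name}: {sorted(resolve_hosts(name))}")
```

Resolving gitea this way yields only pi2, which matches the co-location with PostgreSQL described in the table.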
Numbered playbooks (playbooks/)
| Playbook | Imports / does | Notes |
|---|---|---|
| 01_system | system/system.yml → rpi base, DNS, SSL, prepare disks, Docker, iSCSI, k3s install (--docker --disable traefik), CoreDNS, cert-issuer, Longhorn/Traefik config | k3s v1.34.3+k3s1 via upstream k3s-ansible; pi1 server, pi2/pi3 agents |
| 02_setup | setup/setup.yml → PostgreSQL + Gitea docker-compose; optional backup-NFS share | Stands up the two out-of-cluster source-of-truth services on pi2 |
| 03_cicd | Gitea act-runner docker-compose on pi1/pi3 (raspberries:&local:!gitea), plus the ArgoCD/Image-Updater install | See the ArgoCD caveat below |
| 04_tools | tools/tools.yml → hashicorp_vault.yml, crowdsec.yml | Platform tooling that bootstraps the cluster's Vault + CrowdSec |
| 05_backup | backup/backup.yml → postgres.yml, gitea.yml, k3s_pvc.yml to /mnt/backups | Scheduled PG/Gitea/PVC backups; cron-report wiring present |
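The host pattern `raspberries:&local:!gitea` used by 03_cicd combines Ansible's intersection (`:&`) and exclusion (`:!`) operators. A minimal sketch of the same selection with Python sets, assuming the group memberships from the inventory table above (the variable and function names are illustrative):

```python
# Group membership as described in the inventory table (simplified).
raspberries = {"pi1", "pi2", "pi3"}
local = {"localhost", "pi1", "pi2", "pi3"}
gitea = {"pi2"}  # inherited from the postgres group


def runner_hosts() -> set[str]:
    """Evaluate raspberries:&local:!gitea with plain set algebra."""
    # ':&' keeps hosts present in both groups, ':!' then removes gitea hosts.
    return (raspberries & local) - gitea


if __name__ == "__main__":
    print(sorted(runner_hosts()))
```

Under these assumptions the pattern selects pi1 and pi3, which is where the table places the act-runners.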
Recovery playbooks (playbooks/recover/)
| Playbook | When to use |
|---|---|
| longhorn.yml | Recover Longhorn after a power cut when Volume CRDs still exist (CSI driver registration loss) |
| longhorn_data.yml | Recover app data from raw replica .img files when Volume CRDs are gone (block-device level) |
The tested power-cut recovery sequence (Longhorn restore → Vault unseal → VSO re-auth → ERP scaled up last) is documented in CLUSTER_RECOVERY.md at the lab root (outside this repo) and summarized in storage-and-recovery.md. Background on PVC recovery is in the Longhorn PVC recovery ADR.
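The choice between the two recovery playbooks, and the order of the power-cut sequence, can be sketched as follows. This is an illustration only: the helper name and the boolean input are assumptions, and CLUSTER_RECOVERY.md remains the authoritative procedure.

```python
# Order of the tested power-cut recovery, as summarized on this page.
RECOVERY_SEQUENCE = [
    "Longhorn restore",
    "Vault unseal",
    "VSO re-auth",
    "ERP scale-up",
]


def pick_longhorn_playbook(volume_crds_exist: bool) -> str:
    """Choose the recovery playbook from the state of the Volume CRDs."""
    if volume_crds_exist:
        # CRDs survived: only the CSI driver registration was lost.
        return "playbooks/recover/longhorn.yml"
    # CRDs are gone: recover from the raw replica .img files.
    return "playbooks/recover/longhorn_data.yml"


if __name__ == "__main__":
    print(pick_longhorn_playbook(volume_crds_exist=True))
    print(" -> ".join(RECOVERY_SEQUENCE))
```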
Key roles
deploy_docker_compose (renders compose stacks), gitea_repo / gitea_token / gitea_secret / gitea_sync (Gitea repo/token/secret/mirror management), traefik_certs, playwright, plus sub-roles step_ca, hashicorp_vault, crowdsec, pihole.
Pillar 2 — ArgoCD app-of-apps (argocd/)
A Helm chart whose templates/apps.yaml loops over values.gitea_applications and emits one Application CRD per app. Each Application derives everything from the app name: repoURL = https://gitea.arcodange.lab/<org>/<app>, path = chart, namespace = <app> (CreateNamespace=true), with syncPolicy.automated prune: true + selfHeal: true by default.
| App | Org override | Image Updater |
|---|---|---|
| url-shortener | - | - |
| tools | - | explicit prune+selfHeal |
| webapp | - | ✅ digest strategy |
| telegram-gateway | arcodange | ✅ digest strategy |
| erp | - | - |
| cms | - | ✅ digest strategy |
| dance-lessons-coach | arcodange | ✅ digest strategy |
Note
The chart also templates a `longhorn_backup_target` and the ArgoCD Image Updater config (argocd.arcodange.lab). ArgoCD itself is not currently deployed in-cluster: its install is commented out in `03_cicd`. This page documents the intended steady state; treat ArgoCD as "designed, not live" until that step is enabled.
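The name-derived fields of each Application can be sketched in Python as below. This is not the chart's template, only a model of the derivation rules stated above. `DEFAULT_ORG` is a hypothetical placeholder because the chart's real default org is not given on this page, and the dict covers only the fields described here (a complete Application needs more, such as a project and a destination cluster).

```python
GITEA_BASE_URL = "https://gitea.arcodange.lab"
DEFAULT_ORG = "example-org"  # hypothetical: the real default org is not stated here


def build_application(app: str, org: str = DEFAULT_ORG) -> dict:
    """Derive the stated Application fields from the app name alone."""
    return {
        "apiVersion": "argoproj.io/v1alpha1",
        "kind": "Application",
        "metadata": {"name": app},
        "spec": {
            "source": {
                "repoURL": f"{GITEA_BASE_URL}/{org}/{app}",
                "path": "chart",
            },
            "destination": {"namespace": app},
            "syncPolicy": {
                "automated": {"prune": True, "selfHeal": True},
                "syncOptions": ["CreateNamespace=true"],
            },
        },
    }


if __name__ == "__main__":
    # telegram-gateway carries an org override in the table above.
    manifest = build_application("telegram-gateway", org="arcodange")
    print(manifest["spec"]["source"]["repoURL"])
```

Keeping the app name as the single input is what lets one loop over `values.gitea_applications` produce every Application without per-app boilerplate.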
Pillar 3 — OpenTofu (iac/)
Manages the cloud/Gitea/Vault edge. State lives in GCS (backend "gcs", bucket arcodange-tf, prefix factory/main). Tofu authenticates to Vault via Gitea OIDC JWT (mount gitea_jwt, role gitea_cicd).
| Provider | Used for |
|---|---|
| go-gitea/gitea (0.6.0) | Repos, users, action secrets (e.g. the restricted tofu_module_reader CI user, CMS secrets) |
| vault (4.4.0) | KV secrets + policies + k8s auth roles (e.g. Longhorn GCS-backup creds & policy) |
| google (7.0.1) | GCS backup bucket + service account + HMAC key for Longhorn |
| cloudflare/cloudflare (~> 5) | R2 bucket, API tokens, CMS edge wiring (detailed in 03 · cms) |
| ovh/ovh (2.8.0) | OAuth2 client + IAM policy for the arcodange.fr domain (registrar = OVH) |
modules/cloudflare_token is a reusable scoped-token factory. The root module uses the repo's <app> name as its GCS state prefix (<app>/main, here factory/main); see naming-conventions.md.
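The Vault authentication mentioned at the top of this pillar corresponds to a login call against the JWT auth mount. A minimal sketch of that request, assuming the standard Vault JWT login endpoint (`/v1/auth/<mount>/login` with `role` and `jwt` in the body); the Vault address and the token value are placeholders, and the request is only built, not sent:

```python
import json
import urllib.request


def build_vault_login_request(
    vault_addr: str,
    jwt: str,
    mount: str = "gitea_jwt",
    role: str = "gitea_cicd",
) -> urllib.request.Request:
    """Build (but do not send) the JWT login request for Vault."""
    url = f"{vault_addr.rstrip('/')}/v1/auth/{mount}/login"
    body = json.dumps({"role": role, "jwt": jwt}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    # Placeholder address and token, for illustration only.
    req = build_vault_login_request("https://vault.example.invalid", "<oidc-jwt>")
    print(req.get_method(), req.full_url)
```

In a CI job the JWT would be the OIDC token issued by Gitea for that run; how the job obtains it is outside this sketch.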
Pillar 4 — per-app PostgreSQL (postgres/iac/)
OpenTofu using the cyrilgdn/postgresql provider against PG on 192.168.1.202 (state prefix factory/postgres). It iterates over a var.applications set and, per app, creates:
| Resource | Name pattern | Purpose |
|---|---|---|
| Database | <app> | The app's database (template0, owned by the role) |
| Owner role (non-login) | <app>_role | Database owner; granted to dynamic users by Vault |
| Editor role (login) | credentials_editor | Shared admin role that can grant the per-app roles |
| user_lookup() function | per-<app> db | SECURITY DEFINER lookup for pgbouncer auth (granted to pgbouncer_auth, revoked from public) |
Current applications set: webapp, erp, crowdsec, plausible, dance-lessons-coach. Vault's PostgreSQL secrets engine then issues dynamic credentials on top of these roles — see secrets-and-vault.md. The pooler (pgbouncer) that consumes user_lookup() lives in the tools namespace — see 02 · tools.
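The per-app naming pattern can be expressed as a small mapping. The sketch below mirrors the table's name patterns in Python; it is an illustration of the convention, not the OpenTofu code, and the function and key names are assumptions.

```python
# Current applications set, as listed on this page.
APPLICATIONS = ["webapp", "erp", "crowdsec", "plausible", "dance-lessons-coach"]


def postgres_objects(app: str) -> dict[str, str]:
    """Return the PostgreSQL object names provisioned for one app."""
    return {
        "database": app,                      # <app>
        "owner_role": f"{app}_role",          # non-login owner role
        "editor_role": "credentials_editor",  # shared across all apps
        "lookup_function": "user_lookup()",   # created in the app's database
    }


if __name__ == "__main__":
    for app in APPLICATIONS:
        print(postgres_objects(app))
```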
Provisioning order
```mermaid
%%{init: {'theme': 'base'}}%%
flowchart LR
classDef proc fill:#059669,stroke:#047857,color:#fff
classDef store fill:#7c3aed,stroke:#6d28d9,color:#fff
S1["01_system<br>OS + k3s + Longhorn"]:::proc --> S2["02_setup<br>PG + Gitea (pi2)"]:::proc --> S3["03_cicd<br>runners + ArgoCD"]:::proc --> S4["04_tools<br>Vault + CrowdSec"]:::proc --> S5["05_backup<br>PG/Gitea/PVC"]:::proc
IAC["iac/ + postgres/iac<br>(OpenTofu state in GCS)"]:::store -. "declares cloud/Gitea/Vault/PG" .- S2
```
1. `01_system` lays the OS, disks, Docker, and k3s with Longhorn + Traefik onto the three Pis.
2. `02_setup` stands up PostgreSQL and Gitea as docker-compose on `pi2`, the out-of-cluster source-of-truth services.
3. `03_cicd` registers the Gitea act-runners (and is where ArgoCD would install, currently commented out).
4. `04_tools` bootstraps the cluster's Vault and CrowdSec.
5. `05_backup` schedules PostgreSQL, Gitea, and k3s-PVC backups to `/mnt/backups`.

- In parallel, OpenTofu (`iac/` and `postgres/iac/`) declares the cloud, Gitea, Vault, and PostgreSQL objects, keeping state in GCS.
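The numbered order above can be turned into an ordered list of `ansible-playbook` invocations. This sketch only builds the command lines and runs nothing. The `playbooks/<name>.yml` file names and the relative inventory path are assumptions about the collection layout; adjust them to the actual tree.

```python
# Provisioning order as described on this page.
PLAYBOOK_ORDER = ["01_system", "02_setup", "03_cicd", "04_tools", "05_backup"]


def playbook_commands(
    inventory: str = "inventory/hosts.yml",
    start_at: str = "01_system",
) -> list[list[str]]:
    """Build the ordered ansible-playbook commands from a starting step."""
    first = PLAYBOOK_ORDER.index(start_at)  # raises ValueError if unknown
    return [
        # File name pattern is an assumption, not confirmed by this page.
        ["ansible-playbook", "-i", inventory, f"playbooks/{name}.yml"]
        for name in PLAYBOOK_ORDER[first:]
    ]


if __name__ == "__main__":
    for cmd in playbook_commands():
        print(" ".join(cmd))
```

Passing `start_at="04_tools"` yields only the last two steps, which is convenient when the earlier layers are already in place.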
Cross-references
- Lab ecosystem hub: the whole-lab map this page sits under.
- 02 · tools: what ArgoCD deploys into the `tools` namespace (incl. pgbouncer that consumes the PG `user_lookup()`).
- 03 · cms: the CMS edge that `iac/cloudflare.tf` and `iac/ovh.tf` wire up.
- naming-conventions.md: the `<app>` join key these pillars share.
- secrets-and-vault.md: Gitea OIDC JWT for Tofu/CI and dynamic PG creds.
- storage-and-recovery.md: Longhorn + GCS backup + power-cut recovery.
- new-web-app runbook · conventions: the step-by-step procedure these pillars support.
- doc/adr: the canonical infrastructure ADRs.
- Longhorn PVC recovery ADR: recovery background.