factory/vibe/guidebooks/lab-ecosystem/01-factory.md
Gabriel Radureau 7647a68cdc docs(vibe): bootstrap vibe/ knowledge tree + ecosystem AGENTS.md
2026-06-23 11:52:37 +02:00


vibe > Guidebooks > Lab ecosystem > 01 · factory

01 · factory

Status: Active
Last Updated: 2026-06-23
Downstream: 02 · tools · 03 · cms
Related: naming-conventions.md · secrets-and-vault.md · storage-and-recovery.md

factory is the cornerstone admin repo: it provisions the hosts and the cluster, declares what gets deployed, and owns the platform-level cloud/Gitea/Vault/Postgres state that every app leans on. It has four pillars — Ansible (imperative host & cluster setup), ArgoCD (declarative app-of-apps), iac/ (OpenTofu for the cloud/Gitea/Vault edge), and postgres/iac/ (per-app PostgreSQL provisioning). The repos tools and cms are deployed by factory's ArgoCD and are mapped in 02 · tools and 03 · cms.

Pillar 1 — Ansible (ansible/)

The collection lives at ansible/arcodange/factory/. The inventory groups the three Pis and pins the service placement; numbered playbooks run an ordered narrative from bare OS to backups; recover/ holds the disaster-recovery playbooks.

Inventory (inventory/hosts.yml)

| Group | Hosts | Purpose |
| --- | --- | --- |
| raspberries | pi1, pi2, pi3 (192.168.1.201-203) | All three Pis; ansible_user: pi |
| postgres | pi2 | The PostgreSQL host (docker-compose, outside k3s) |
| gitea | children of postgres (→ pi2) | Gitea co-located with PG on pi2 |
| pihole | pi1, pi3 | Internal DNS resolvers |
| step_ca | pi1, pi2, pi3 | Step-CA PKI for `*.arcodange.lab` (primary pi1, replicas pi2/pi3) |
| local | localhost + the Pis | Control-node-local tasks |

Numbered playbooks (playbooks/)

| Playbook | Imports / does | Notes |
| --- | --- | --- |
| 01_system | system/system.yml → rpi base, DNS, SSL, prepare disks, Docker, iSCSI, k3s install (`--docker --disable traefik`), CoreDNS, cert-issuer, Longhorn/Traefik config | k3s v1.34.3+k3s1 via upstream k3s-ansible; pi1 server, pi2/pi3 agents |
| 02_setup | setup/setup.yml → PostgreSQL + Gitea docker-compose; optional backup-NFS share | Stands up the two out-of-cluster source-of-truth services on pi2 |
| 03_cicd | Gitea act-runner docker-compose on pi1/pi3 (`raspberries:&local:!gitea`), plus the ArgoCD/Image-Updater install | See the ArgoCD caveat below |
| 04_tools | tools/tools.yml → hashicorp_vault.yml, crowdsec.yml | Platform tooling that bootstraps the cluster's Vault + CrowdSec |
| 05_backup | backup/backup.yml → postgres.yml, gitea.yml, k3s_pvc.yml to /mnt/backups | Scheduled PG/Gitea/PVC backups; cron-report wiring present |
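
The host pattern used by 03_cicd (`raspberries:&local:!gitea`) can be read as set arithmetic over the inventory groups. The sketch below is illustrative only: `INVENTORY` and `resolve_pattern` are hypothetical names, the group membership is transcribed from the inventory table, and only the `:`, `&`, and `!` operators are modeled, not Ansible's full pattern syntax.

```python
# Group membership transcribed from the inventory table; "local" is taken to
# mean localhost plus the three Pis. This dict is an illustration, not the
# real inventory/hosts.yml.
INVENTORY = {
    "raspberries": {"pi1", "pi2", "pi3"},
    "postgres": {"pi2"},
    "gitea": {"pi2"},  # children of postgres
    "pihole": {"pi1", "pi3"},
    "step_ca": {"pi1", "pi2", "pi3"},
    "local": {"localhost", "pi1", "pi2", "pi3"},
}


def resolve_pattern(pattern: str, inventory: dict[str, set[str]]) -> set[str]:
    """Resolve a simplified host pattern made of group names joined by ':'.

    Plain terms are unioned first, '&' terms are then intersected, and
    '!' terms are excluded last.
    """
    selected: set[str] = set()
    intersections: list[set[str]] = []
    excluded: set[str] = set()
    for term in pattern.split(":"):
        if term.startswith("&"):
            intersections.append(inventory[term[1:]])
        elif term.startswith("!"):
            excluded |= inventory[term[1:]]
        else:
            selected |= inventory[term]
    for group in intersections:
        selected &= group
    return selected - excluded


runner_hosts = resolve_pattern("raspberries:&local:!gitea", INVENTORY)
print(sorted(runner_hosts))  # ['pi1', 'pi3']
```

The result matches the placement stated in the table: the act-runners land on pi1 and pi3 because pi2 is removed through the gitea group.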

Recovery playbooks (playbooks/recover/)

| Playbook | When to use |
| --- | --- |
| longhorn.yml | Recover Longhorn after a power cut when Volume CRDs still exist (CSI driver registration loss) |
| longhorn_data.yml | Recover app data from raw replica .img files when Volume CRDs are gone (block-device level) |

The tested power-cut recovery sequence (Longhorn restore → Vault unseal → VSO re-auth → ERP scaled up last) is documented in CLUSTER_RECOVERY.md at the lab root (outside this repo) and summarized in storage-and-recovery.md. Background on PVC recovery is in the Longhorn PVC recovery ADR.

Key roles

The main roles are deploy_docker_compose (renders compose stacks); gitea_repo, gitea_token, gitea_secret, and gitea_sync (Gitea repo, token, secret, and mirror management); traefik_certs; playwright; and the sub-roles step_ca, hashicorp_vault, crowdsec, and pihole.

Pillar 2 — ArgoCD app-of-apps (argocd/)

A Helm chart whose templates/apps.yaml loops over values.gitea_applications and emits one Application CRD per app. Each Application derives everything from the app name: repoURL = https://gitea.arcodange.lab/<org>/<app>, path = chart, namespace = <app> (CreateNamespace=true), with syncPolicy.automated prune: true + selfHeal: true by default.

App Org override Image Updater
url-shortener
tools explicit prune+selfHeal
webapp digest strategy
telegram-gateway arcodange digest strategy
erp
cms digest strategy
dance-lessons-coach arcodange digest strategy
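
To make the name-based derivation concrete, here is a minimal Python sketch of the fields one generated Application would carry. It is not the chart's actual template: `application_manifest` is a hypothetical helper, and the `project`, `targetRevision`, and `destination.server` values are assumptions that this page does not state. The org is passed explicitly because the default org is not given here.

```python
def application_manifest(app: str, org: str) -> dict:
    """Build an Argo CD Application as a plain dict, deriving fields from the app name."""
    return {
        "apiVersion": "argoproj.io/v1alpha1",
        "kind": "Application",
        "metadata": {"name": app},
        "spec": {
            "project": "default",  # assumption: not stated on this page
            "source": {
                "repoURL": f"https://gitea.arcodange.lab/{org}/{app}",
                "path": "chart",
                "targetRevision": "HEAD",  # assumption: not stated on this page
            },
            "destination": {
                "server": "https://kubernetes.default.svc",  # assumption: in-cluster target
                "namespace": app,
            },
            "syncPolicy": {
                "automated": {"prune": True, "selfHeal": True},
                "syncOptions": ["CreateNamespace=true"],
            },
        },
    }


manifest = application_manifest("telegram-gateway", org="arcodange")
print(manifest["spec"]["source"]["repoURL"])
print(manifest["spec"]["destination"]["namespace"])
```

Because every field follows from the app name and org, adding an app to values.gitea_applications is enough to get a namespace, a source repo, and an automated sync policy without writing a manifest by hand.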

Note

The chart also templates a longhorn_backup_target and the ArgoCD Image Updater config (argocd.arcodange.lab). ArgoCD itself is not currently deployed in-cluster — its install is commented out in 03_cicd. This page documents the intended steady state; treat ArgoCD as "designed, not live" until that step is enabled.

Pillar 3 — OpenTofu (iac/)

Manages the cloud/Gitea/Vault edge. State lives in GCS (backend "gcs", bucket arcodange-tf, prefix factory/main). Tofu authenticates to Vault via Gitea OIDC JWT (mount gitea_jwt, role gitea_cicd).

| Provider | Used for |
| --- | --- |
| go-gitea/gitea (0.6.0) | Repos, users, action secrets (e.g. the restricted tofu_module_reader CI user, CMS secrets) |
| vault (4.4.0) | KV secrets + policies + k8s auth roles (e.g. Longhorn GCS-backup creds & policy) |
| google (7.0.1) | GCS backup bucket + service account + HMAC key for Longhorn |
| cloudflare/cloudflare (~> 5) | R2 bucket, API tokens, CMS edge wiring (detailed in 03 · cms) |
| ovh/ovh (2.8.0) | OAuth2 client + IAM policy for the arcodange.fr domain (registrar = OVH) |

modules/cloudflare_token is a reusable scoped-token factory. The whole module reuses the <app> name as the GCS state prefix (<app>/main) — see naming-conventions.md.
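
As a rough illustration of the prefix convention, the sketch below computes where a state object would be expected to sit in the bucket. It assumes the GCS backend's usual `<prefix>/<workspace>.tfstate` object layout, the default workspace, and that every stack shares the arcodange-tf bucket; `state_object_uri` is a hypothetical helper, not something defined in the repo.

```python
BUCKET = "arcodange-tf"  # bucket named in this section


def state_object_uri(app: str, stack: str = "main", workspace: str = "default") -> str:
    """Return the gs:// URI where the state object would be expected.

    The prefix follows the <app>/<stack> convention; the object name assumes
    the GCS backend layout <prefix>/<workspace>.tfstate.
    """
    prefix = f"{app}/{stack}"
    return f"gs://{BUCKET}/{prefix}/{workspace}.tfstate"


print(state_object_uri("factory"))              # gs://arcodange-tf/factory/main/default.tfstate
print(state_object_uri("factory", "postgres"))  # gs://arcodange-tf/factory/postgres/default.tfstate
```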

Pillar 4 — per-app PostgreSQL (postgres/iac/)

OpenTofu using the cyrilgdn/postgresql provider against PG on 192.168.1.202 (state prefix factory/postgres). It iterates over a var.applications set and, per app, creates:

| Resource | Name pattern | Purpose |
| --- | --- | --- |
| Database | `<app>` | The app's database (template0, owned by the role) |
| Owner role (non-login) | `<app>_role` | Database owner; granted to dynamic users by Vault |
| Editor role (login) | `credentials_editor` | Shared admin role that can grant the per-app roles |
| `user_lookup()` function | per-`<app>` db | SECURITY DEFINER lookup for pgbouncer auth (granted to pgbouncer_auth, revoked from public) |

Current applications set: webapp, erp, crowdsec, plausible, dance-lessons-coach. Vault's PostgreSQL secrets engine then issues dynamic credentials on top of these roles — see secrets-and-vault.md. The pooler (pgbouncer) that consumes user_lookup() lives in the tools namespace — see 02 · tools.
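
The per-app naming can be sketched as a small Python model. This is only an illustration of the name patterns in the table: `AppDatabasePlan` and `plan_for` are hypothetical names, and the pattern is applied literally, so whether hyphenated app names such as dance-lessons-coach are normalized in the real module is not something this sketch can confirm.

```python
from dataclasses import dataclass

EDITOR_ROLE = "credentials_editor"  # shared login role, same for every app

APPLICATIONS = {"webapp", "erp", "crowdsec", "plausible", "dance-lessons-coach"}


@dataclass(frozen=True)
class AppDatabasePlan:
    database: str     # <app>
    owner_role: str   # <app>_role, non-login owner of the database
    editor_role: str  # shared role allowed to grant the owner role


def plan_for(app: str) -> AppDatabasePlan:
    """Apply the naming patterns from the table to a single app name."""
    return AppDatabasePlan(
        database=app,
        owner_role=f"{app}_role",
        editor_role=EDITOR_ROLE,
    )


for app in sorted(APPLICATIONS):
    plan = plan_for(app)
    print(f"{plan.database}: owner={plan.owner_role}, editor={plan.editor_role}")
```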

Provisioning order

%%{init: {'theme': 'base'}}%%
flowchart LR
    classDef proc fill:#059669,stroke:#047857,color:#fff
    classDef store fill:#7c3aed,stroke:#6d28d9,color:#fff
    S1["01_system<br>OS + k3s + Longhorn"]:::proc --> S2["02_setup<br>PG + Gitea (pi2)"]:::proc --> S3["03_cicd<br>runners + ArgoCD"]:::proc --> S4["04_tools<br>Vault + CrowdSec"]:::proc --> S5["05_backup<br>PG/Gitea/PVC"]:::proc
    IAC["iac/ + postgres/iac<br>(OpenTofu state in GCS)"]:::store -. "declares cloud/Gitea/Vault/PG" .- S2
  1. 01_system lays the OS, disks, Docker, and k3s with Longhorn + Traefik onto the three Pis.
  2. 02_setup stands up PostgreSQL and Gitea as docker-compose on pi2 — the out-of-cluster source-of-truth services.
  3. 03_cicd registers the Gitea act-runners (and is where ArgoCD would install, currently commented out).
  4. 04_tools bootstraps the cluster's Vault and CrowdSec.
  5. 05_backup schedules PostgreSQL, Gitea, and k3s-PVC backups to /mnt/backups.
  6. In parallel, OpenTofu (iac/ and postgres/iac/) declares the cloud, Gitea, Vault, and PostgreSQL objects, keeping state in GCS.
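
The ordered steps above can be sketched as a list of commands to run in sequence. This is a hedged illustration, not the repo's actual tooling: it assumes each numbered playbook is a `.yml` file directly under playbooks/ and is run with the inventory at inventory/hosts.yml, neither of which this page confirms. `playbook_commands` is a hypothetical helper.

```python
import shlex

# Numbered playbooks in the order described above.
PLAYBOOKS = ["01_system", "02_setup", "03_cicd", "04_tools", "05_backup"]


def playbook_commands(
    inventory: str = "inventory/hosts.yml",
    playbook_dir: str = "playbooks",
) -> list[str]:
    """Return one ansible-playbook command line per numbered playbook, in order."""
    return [
        shlex.join(["ansible-playbook", "-i", inventory, f"{playbook_dir}/{name}.yml"])
        for name in PLAYBOOKS
    ]


for command in playbook_commands():
    print(command)
```

The OpenTofu stacks are deliberately left out of this list because they are applied alongside the playbooks rather than as one of the numbered steps.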

Cross-references