# 01 · factory

> **Status:** ✅ Active
> **Last Updated:** 2026-06-23
> **Downstream:** [02 · tools](02-tools.md) · [03 · cms](03-cms.md)
> **Deeper dive:** [Factory provisioning guidebook](../factory-provisioning/README.md): page-by-page walkthrough of the Ansible playbooks/roles and OpenTofu modules summarized here
> **Related:** [naming-conventions.md](naming-conventions.md) · [secrets-and-vault.md](secrets-and-vault.md) · [storage-and-recovery.md](storage-and-recovery.md)

`factory` is the **cornerstone admin repo**: it provisions the hosts and the cluster, declares what gets deployed, and owns the platform-level cloud/Gitea/Vault/Postgres state that every app leans on. It has four pillars: **Ansible** (imperative host & cluster setup), **ArgoCD** (declarative app-of-apps), **`iac/`** (OpenTofu for the cloud/Gitea/Vault edge), and **`postgres/iac/`** (per-app PostgreSQL provisioning). The repos `tools` and `cms` are deployed *by* factory's ArgoCD and are mapped in [02 · tools](02-tools.md) and [03 · cms](03-cms.md).

## Pillar 1 — Ansible ([`ansible/`](../../../ansible/))

The collection lives at `ansible/arcodange/factory/`. The inventory groups the three Pis and pins the service placement; numbered playbooks run an ordered narrative from bare OS to backups; `recover/` holds the disaster-recovery playbooks.
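
As a rough illustration of how such an inventory can be laid out, the sketch below expresses the groups from the Inventory table in Ansible's YAML inventory format. It is not a copy of `inventory/hosts.yml`: the per-host `ansible_host` values assume that the `192.168.1.201-203` range maps to `pi1`, `pi2`, `pi3` in order, and `ansible_connection: local` for `localhost` is an assumption as well.

```yaml
# Hypothetical sketch of the group layout, not the real inventory/hosts.yml.
all:
  children:
    raspberries:
      hosts:
        pi1:
          ansible_host: 192.168.1.201   # assumed mapping
        pi2:
          ansible_host: 192.168.1.202   # PostgreSQL host per this page
        pi3:
          ansible_host: 192.168.1.203   # assumed mapping
      vars:
        ansible_user: pi
    postgres:
      hosts:
        pi2:
    gitea:
      # Gitea inherits its hosts from the postgres group (resolves to pi2).
      children:
        postgres:
    pihole:
      hosts:
        pi1:
        pi3:
    step_ca:
      hosts:
        pi1:
        pi2:
        pi3:
    local:
      hosts:
        localhost:
          ansible_connection: local     # assumption for control-node tasks
        pi1:
        pi2:
        pi3:
```

With this shape, a host pattern such as `raspberries:&local:!gitea` (used by `03_cicd`) would resolve to `pi1` and `pi3`: hosts in both `raspberries` and `local`, minus the `gitea` host.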

### Inventory (`inventory/hosts.yml`)

| Group | Hosts | Purpose |
|---|---|---|
| `raspberries` | `pi1`, `pi2`, `pi3` (`192.168.1.201-203`) | All three Pis; `ansible_user: pi` |
| `postgres` | `pi2` | The PostgreSQL host (docker-compose, outside k3s) |
| `gitea` | children of `postgres` (→ `pi2`) | Gitea co-located with PG on `pi2` |
| `pihole` | `pi1`, `pi3` | Internal DNS resolvers |
| `step_ca` | `pi1`, `pi2`, `pi3` | Step-CA PKI for `*.arcodange.lab` (primary `pi1`, replicas `pi2`/`pi3`) |
| `local` | `localhost` + the Pis | Control-node-local tasks |

### Numbered playbooks (`playbooks/`)

| Playbook | Imports / does | Notes |
|---|---|---|
| `01_system` | `system/system.yml` → rpi base, DNS, SSL, prepare disks, Docker, iSCSI, **k3s install** (`--docker --disable traefik`), CoreDNS, cert-issuer, Longhorn/Traefik config | k3s `v1.34.3+k3s1` via upstream `k3s-ansible`; pi1 server, pi2/pi3 agents |
| `02_setup` | `setup/setup.yml` → PostgreSQL + Gitea docker-compose; optional backup-NFS share | Stands up the two out-of-cluster source-of-truth services on `pi2` |
| `03_cicd` | Gitea **act-runner** docker-compose on `pi1`/`pi3` (`raspberries:&local:!gitea`), plus the ArgoCD/Image-Updater install | See the ArgoCD caveat below |
| `04_tools` | `tools/tools.yml` → `hashicorp_vault.yml`, `crowdsec.yml` | Platform tooling that bootstraps the cluster's Vault + CrowdSec |
| `05_backup` | `backup/backup.yml` → `postgres.yml`, `gitea.yml`, `k3s_pvc.yml` to `/mnt/backups` | Scheduled PG/Gitea/PVC backups; cron-report wiring present |

### Recovery playbooks (`playbooks/recover/`)

| Playbook | When to use |
|---|---|
| `longhorn.yml` | Recover Longhorn after a power cut when **Volume CRDs still exist** (CSI driver registration loss) |
| `longhorn_data.yml` | Recover app data from **raw replica `.img` files** when Volume CRDs are gone (block-device level) |

The tested power-cut recovery sequence (Longhorn restore → Vault unseal → VSO re-auth → ERP scaled up last) is documented in
`CLUSTER_RECOVERY.md` at the lab root (outside this repo) and summarized in [storage-and-recovery.md](storage-and-recovery.md). Background on PVC recovery is in the [Longhorn PVC recovery ADR](../../../ansible/arcodange/factory/docs/adr/20260414-longhorn-pvc-recovery.md).

### Key roles

`deploy_docker_compose` (renders compose stacks), `gitea_repo` / `gitea_token` / `gitea_secret` / `gitea_sync` (Gitea repo/token/secret/mirror management), `traefik_certs`, `playwright`, plus sub-roles `step_ca`, `hashicorp_vault`, `crowdsec`, `pihole`.

## Pillar 2 — ArgoCD app-of-apps ([`argocd/`](../../../argocd/))

A Helm chart whose `templates/apps.yaml` loops over `values.gitea_applications` and emits one `Application` CRD per app. Each Application derives everything from the app name: `repoURL = https://gitea.arcodange.lab//`, `path = chart`, `namespace = ` (`CreateNamespace=true`), with `syncPolicy.automated` `prune: true` + `selfHeal: true` by default.

> [!TIP]
> **Deeper dive:** the [Applications guidebook](../applications/README.md) maps what these `Application` CRDs deploy: the common app-repo pattern (Dockerfile + `chart/` + optional `iac/` + CI) every app in the list below shares, and the two archetypes (Go + Postgres vs Rust + SQLite).

| App | Org override | Image Updater |
|---|---|---|
| `url-shortener` | - | - |
| `tools` | - | explicit `prune`+`selfHeal` |
| `webapp` | - | ✅ digest strategy |
| `telegram-gateway` | `arcodange` | ✅ digest strategy |
| `erp` | - | - |
| `cms` | - | ✅ digest strategy |
| `dance-lessons-coach` | `arcodange` | ✅ digest strategy |

> [!NOTE]
> The chart also templates a `longhorn_backup_target` and the ArgoCD Image Updater config (`argocd.arcodange.lab`). **ArgoCD itself is not currently deployed in-cluster**; its install is commented out in `03_cicd`. This page documents the intended steady state; treat ArgoCD as "designed, not live" until that step is enabled.
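
For orientation, a single rendered `Application` in this pillar might look roughly like the manifest below. It is a sketch under assumptions, not output from `templates/apps.yaml`: the app name `example-app` and organisation `example-org` are hypothetical, the repo URL shape assumes the elided path segments are an organisation followed by the app name, and `metadata.namespace`, `project`, `targetRevision`, and `destination.server` are common ArgoCD defaults rather than values confirmed by the chart.

```yaml
# Hypothetical rendering for one entry of values.gitea_applications.
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: example-app              # hypothetical app name
  namespace: argocd              # assumption: ArgoCD's own namespace
spec:
  project: default               # assumption
  source:
    # Assumed URL shape: Gitea host, then organisation, then app repo.
    repoURL: https://gitea.arcodange.lab/example-org/example-app
    targetRevision: HEAD         # assumption
    path: chart                  # each app repo ships its Helm chart here
  destination:
    server: https://kubernetes.default.svc   # assumption: in-cluster target
    namespace: example-app       # namespace derived from the app name
  syncPolicy:
    automated:
      prune: true                # delete resources removed from Git
      selfHeal: true             # revert manual drift in the cluster
    syncOptions:
      - CreateNamespace=true
```

Because every field except the app name (and an optional org override) follows from the same pattern, adding an app to the list would not require writing a new manifest by hand.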

## Pillar 3 — OpenTofu ([`iac/`](../../../iac/))

Manages the cloud/Gitea/Vault edge. State lives in **GCS** (`backend "gcs"`, bucket `arcodange-tf`, prefix `factory/main`). Tofu authenticates to Vault via **Gitea OIDC JWT** (mount `gitea_jwt`, role `gitea_cicd`).

| Provider | Used for |
|---|---|
| `go-gitea/gitea` (`0.6.0`) | Repos, users, action secrets (e.g. the restricted `tofu_module_reader` CI user, CMS secrets) |
| `vault` (`4.4.0`) | KV secrets + policies + k8s auth roles (e.g. Longhorn GCS-backup creds & policy) |
| `google` (`7.0.1`) | GCS backup bucket + service account + HMAC key for Longhorn |
| `cloudflare/cloudflare` (`~> 5`) | R2 bucket, API tokens, CMS edge wiring (detailed in [03 · cms](03-cms.md)) |
| `ovh/ovh` (`2.8.0`) | OAuth2 client + IAM policy for the `arcodange.fr` domain (registrar = OVH) |

`modules/cloudflare_token` is a reusable scoped-token factory. The whole module reuses the `` name as the GCS state prefix (`/main`); see [naming-conventions.md](naming-conventions.md).

## Pillar 4 — per-app PostgreSQL ([`postgres/iac/`](../../../postgres/))

OpenTofu using the `cyrilgdn/postgresql` provider against PG on `192.168.1.202` (state prefix `factory/postgres`). It iterates over a `var.applications` set and, **per app**, creates:

| Resource | Name pattern | Purpose |
|---|---|---|
| Database | `` | The app's database (`template0`, owned by the role) |
| Owner role (non-login) | `_role` | Database owner; granted to dynamic users by Vault |
| Editor role (login) | `credentials_editor` | Shared admin role that can grant the per-app roles |
| `user_lookup()` function | per-`` db | `SECURITY DEFINER` lookup for **pgbouncer** auth (granted to `pgbouncer_auth`, revoked from `public`) |

Current `applications` set: `webapp`, `erp`, `crowdsec`, `plausible`, `dance-lessons-coach`.

Vault's PostgreSQL secrets engine then issues **dynamic** credentials on top of these roles; see [secrets-and-vault.md](secrets-and-vault.md).
The pooler (`pgbouncer`) that consumes `user_lookup()` lives in the `tools` namespace; see [02 · tools](02-tools.md).

## Provisioning order

```mermaid
%%{init: {'theme': 'base'}}%%
flowchart LR
    classDef proc fill:#059669,stroke:#047857,color:#fff
    classDef store fill:#7c3aed,stroke:#6d28d9,color:#fff
    S1["01_system<br/>OS + k3s + Longhorn"]:::proc --> S2["02_setup<br/>PG + Gitea (pi2)"]:::proc --> S3["03_cicd<br/>runners + ArgoCD"]:::proc --> S4["04_tools<br/>Vault + CrowdSec"]:::proc --> S5["05_backup<br/>PG/Gitea/PVC"]:::proc
    IAC["iac/ + postgres/iac<br/>(OpenTofu state in GCS)"]:::store -. "declares cloud/Gitea/Vault/PG" .- S2
```

1. **`01_system`** lays the OS, disks, Docker, and k3s with Longhorn + Traefik onto the three Pis.
2. **`02_setup`** stands up PostgreSQL and Gitea as docker-compose on `pi2`, the out-of-cluster source-of-truth services.
3. **`03_cicd`** registers the Gitea act-runners (and is where ArgoCD would install, currently commented out).
4. **`04_tools`** bootstraps the cluster's Vault and CrowdSec.
5. **`05_backup`** schedules PostgreSQL, Gitea, and k3s-PVC backups to `/mnt/backups`.
6. In parallel, **OpenTofu** (`iac/` and `postgres/iac/`) declares the cloud, Gitea, Vault, and PostgreSQL objects, keeping state in GCS.

## Cross-references

- [Lab ecosystem hub](README.md): the whole-lab map this page sits under.
- [Applications guidebook](../applications/README.md): the apps ArgoCD's app-of-apps deploys, covering the common app-repo pattern and the Go+Postgres / Rust+SQLite archetypes.
- [02 · tools](02-tools.md): what ArgoCD deploys into the `tools` namespace (incl. pgbouncer that consumes the PG `user_lookup()`).
- [03 · cms](03-cms.md): the CMS edge that `iac/cloudflare.tf` and `iac/ovh.tf` wire up.
- [naming-conventions.md](naming-conventions.md): the `` join key these pillars share.
- [secrets-and-vault.md](secrets-and-vault.md): Gitea OIDC JWT for Tofu/CI and dynamic PG creds.
- [storage-and-recovery.md](storage-and-recovery.md): Longhorn + GCS backup + power-cut recovery.
- [new-web-app runbook](../../../doc/runbooks/new-web-app/README.md) · [conventions](../../../doc/runbooks/new-web-app/conventions.md): the step-by-step procedure these pillars support.
- [doc/adr](../../../doc/adr/README.md): the canonical infrastructure ADRs.
- [Longhorn PVC recovery ADR](../../../ansible/arcodange/factory/docs/adr/20260414-longhorn-pvc-recovery.md): recovery background.