feat(multi-env): Phase B — make factory machinery env-capable (no activation)

ADR-0002 Phase B. Makes postgres/iac, argocd, and the conventions docs multi-environment-capable WITHOUT activating any sandbox yet. Every app stays prod-only, so this change is behaviour-neutral:
- postgres/iac `tofu plan` is a no-op (proven: the elision flatten keys are bare app names, db=<app>, role=<app>_role, so every resource address is identical)
- the argocd apps.yaml render is byte-identical (181→181 lines, diff empty) since no app declares `envs`

postgres/iac:
- variables.tf: `applications` becomes set(object({name = string, envs = optional(list(string), ["prod"])}))
- main.tf: a `local.app_instances` flatten of applications × envs, keyed by the elided instance id (env=prod → "<app>", otherwise "<app>-<env>"); per-app resources iterate it and reference each.key / each.value.{database,role}. For prod-only apps every resource address and attribute is unchanged. (main.tf also got a full `tofu fmt` pass: the pgbouncer function block reindents 4→2 spaces, which is cosmetic; the correctness gate is the CI tofu plan, not the text diff.)
- terraform.tfvars: string entries → { name = "..." } objects.

argocd/templates/apps.yaml:
- after the prod Application, a `range` loop over `$app_attr.envs` renders one extra Application per non-prod env: Application name and destination namespace `<app>-<env>`, shared repoURL, helm.valueFiles [values.yaml, values-<env>.yaml], optional per-env syncPolicy override. It renders nothing while no app sets `envs`, so the prod render is unchanged.

docs:
- doc/runbooks/new-web-app/conventions.md (FR, authoritative): new section "Plusieurs environnements pour une même app" (multiple environments for the same app) covering the elision rule, the suffix rule, the snake-case owner-role exception, the erp/erp-sandbox table, and the ADR-0002 link.
- vibe/guidebooks/lab-ecosystem/naming-conventions.md (EN mirror): the env coordinate section plus a "Two sandbox models" section reconciling the separate-cluster (ADR-0001, names repeat) and in-cluster sibling (ADR-0002, <env> suffix) strategies; Last Updated bumped; ADR-0002 cross-links.

Activation (erp gets envs=["prod","sandbox"] in postgres tfvars + argocd values + erp/iac) is Phase D, gated by its own plan review.

Refs ADR-0002 (factory#15). Phase A = tools#2 (merged). Phase C = erp#11 (merged).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-06-28 16:28:28 +02:00
parent 8a1a63ee10
commit c00c4cdd5c
6 changed files with 178 additions and 59 deletions
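
To make the no-op claim concrete, here is a minimal standalone sketch of the elision flatten. It reuses the `app_instances` expression from postgres/iac/main.tf, but feeds it an inline `local.applications` list instead of `var.applications`; that local, the `output` block, and the file being standalone are illustrative assumptions. The `envs = ["prod", "sandbox"]` entry for `erp` is the Phase D activation described in the message, which this commit does not apply.

```hcl
# Hypothetical standalone file for illustration only; not part of this commit.
locals {
  # Stand-in for var.applications. `envs` is written out explicitly because a
  # plain local has no optional() default to fall back on.
  applications = [
    { name = "webapp", envs = ["prod"] },
    { name = "erp", envs = ["prod", "sandbox"] }, # Phase D shape, assumed here
  ]

  # Same expression as main.tf: prod elides to "<app>", other envs become
  # "<app>-<env>"; the owner role stays snake-case.
  app_instances = merge([
    for app in local.applications : {
      for env in app.envs :
      (env == "prod" ? app.name : "${app.name}-${env}") => {
        database = env == "prod" ? app.name : "${app.name}-${env}"
        role     = env == "prod" ? "${app.name}_role" : "${app.name}_${env}_role"
      }
    }
  ]...)
}

output "app_instances" {
  value = local.app_instances
}
```

Evaluated with `tofu console` (expression `local.app_instances`), this should yield three keys: `webapp` and `erp` with the bare names and `<app>_role` owner roles that the old `set(string)` for_each produced, plus `erp-sandbox` with database `erp-sandbox` and role `erp_sandbox_role`. Only the last key is new, which is why prod-only inputs leave every resource address untouched.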
--- a/argocd/templates/apps.yaml
+++ b/argocd/templates/apps.yaml
@@ -31,4 +31,47 @@ spec:
    {{- end }}
    syncOptions:
      - CreateNamespace=true
+{{- /*
+  Non-prod environments (ADR-0002 elision rule): one extra Application per env
+  under `<app_attr>.envs`. Each renders the SAME repo + chart, overlaid with
+  values-<env>.yaml, into the `<app>-<env>` namespace. Apps with no `envs` key
+  render nothing extra here, so prod-only apps are byte-identical.
+*/ -}}
+{{- range $env_name, $env_attr := $app_attr.envs }}
+---
+apiVersion: argoproj.io/v1alpha1
+kind: Application
+metadata:
+  name: {{ $app_name }}-{{ $env_name }}
+  namespace: argocd
+  finalizers:
+    - resources-finalizer.argocd.argoproj.io
+  {{- with $env_attr.annotations }}
+  annotations:
+    {{- toYaml . | nindent 4 }}
+  {{- end }}
+spec:
+  project: default
+  source:
+    repoURL: https://gitea.arcodange.lab/{{ $org }}/{{ $app_name }}
+    targetRevision: HEAD
+    path: chart
+    helm:
+      valueFiles:
+        - values.yaml
+        - values-{{ $env_name }}.yaml
+  destination:
+    server: https://kubernetes.default.svc
+    namespace: {{ $app_name }}-{{ $env_name }}
+  syncPolicy:
+    {{- if $env_attr.syncPolicy }}
+    {{- toYaml $env_attr.syncPolicy | nindent 4 }}
+    {{- else }}
+    automated:
+      prune: true
+      selfHeal: true
+    {{- end }}
+    syncOptions:
+      - CreateNamespace=true
+{{- end }}
 {{ end }}
--- a/doc/runbooks/new-web-app/conventions.md
+++ b/doc/runbooks/new-web-app/conventions.md
@@ -44,6 +44,29 @@ Les briques se « branchent » entre elles **par convention de nom**, pas par co
 ✅ **Use a short, stable, kebab-case name** from the start.
 ❌ **Do not introduce** variants (`my_app` vs `my-app`, `MyApp`, plurals): nothing will warn you, and the app will silently fail to connect or deploy.

+## Multiple environments for the same app
+
+An application can be deployed several times (prod, sandbox, …) **without becoming a separate app**: same repository, same chart, same version. A second coordinate `<env>` is added to the name, governed by an **elision rule** ([ADR-0002](../../../vibe/ADR/0002-per-application-environments.md)):
+
+- **`env` defaults to `prod`, and `prod` is elided.** When `env == prod`, **no suffix** is added: every derived name is identical to the single-environment case described above. An existing app therefore does not change (empty `plan`).
+- **Non-prod environments take the `<app>-<env>` suffix** in kebab-case everywhere (database, namespace, Vault paths/roles/policies, ArgoCD Application, DNS host, GCS state sub-prefix), **with one exception**: the PostgreSQL owner role stays snake-case, `<app>_<env>_role`, to remain consistent with the `_role` suffix.
+- **A single repository and a single chart** serve all environments; differences are overlaid via `values-<env>.yaml`. **A single CI JWT role** (`gitea_cicd_<app>`) per repository covers all of its environments.
+
+Example: `erp` (prod, elided) vs `erp-sandbox`:
+
+| System | `erp` (env = prod) | `erp-sandbox` (env = sandbox) |
+|---|---|---|
+| PostgreSQL database | `erp` | `erp-sandbox` |
+| PG owner role | `erp_role` | `erp_sandbox_role` |
+| Namespace + ServiceAccount | `erp` | `erp-sandbox` |
+| Vault dynamic DB creds | `postgres/creds/erp` | `postgres/creds/erp-sandbox` |
+| KV config secret | `kvv2/erp/config` | `kvv2/erp-sandbox/config` |
+| ArgoCD Application | `erp` | `erp-sandbox` |
+| Internal domain | `erp.arcodange.lab` | `erp-sandbox.arcodange.lab` |
+| Gitea repository / chart / CI JWT | `arcodange-org/erp` · chart · `gitea_cicd_erp` | shared (same values) |
+
+Declaration: `postgres/iac/terraform.tfvars` and the `applications` list on the `tools` side accept `envs = ["prod", "sandbox"]`; omitting it is equivalent to `["prod"]`. The non-prod ArgoCD `Application` is declared via an `envs` key under the app in [argocd/values.yaml](https://gitea.arcodange.lab/arcodange-org/factory/src/branch/main/argocd/values.yaml).
+
 ## Cross-references

 - [01 · Gitea repository](01-gitea-repo.md): sets `<app>` as the repository name under `arcodange-org`.
--- a/postgres/iac/main.tf
+++ b/postgres/iac/main.tf
@@ -1,3 +1,21 @@
+locals {
+  # Flatten applications × envs into per-instance objects, keyed by the elided
+  # instance id (ADR-0002 elision rule): env=prod → "<app>", else "<app>-<env>".
+  # The Postgres owner role stays snake-case: "<app>_role" (prod) / "<app>_<env>_role".
+  # For a prod-only app the key equals "<app>", database equals "<app>", and role
+  # equals "<app>_role" — identical to the previous set(string) for_each, so every
+  # resource address and attribute is unchanged (a no-op plan).
+  app_instances = merge([
+    for app in var.applications : {
+      for env in app.envs :
+      (env == "prod" ? app.name : "${app.name}-${env}") => {
+        database = env == "prod" ? app.name : "${app.name}-${env}"
+        role     = env == "prod" ? "${app.name}_role" : "${app.name}_${env}_role"
+      }
+    }
+  ]...)
+}
+
 resource "random_password" "credentials_editor" {
  length           = 24
  override_special = "-:!+<>"
@@ -24,27 +42,27 @@ resource "vault_kv_secret" "postgres_admin_credentials" {
 }

 resource "postgresql_role" "app_role" {
-  for_each = var.applications
-  name     = "${each.value}_role"
+  for_each = local.app_instances
+  name     = each.value.role
  login    = false
 }
 resource "postgresql_grant_role" "credentials_editor_app_role" {
-  for_each = var.applications
+  for_each          = local.app_instances
  role              = postgresql_role.credentials_editor.name
-  grant_role = postgresql_role.app_role[each.value].name
+  grant_role        = postgresql_role.app_role[each.key].name
  with_admin_option = true
 }
 resource "postgresql_database" "app_db" {
-  for_each = var.applications
-  name                   = each.value
-  owner                  = postgresql_role.app_role[each.value].name
+  for_each               = local.app_instances
+  name                   = each.value.database
+  owner                  = postgresql_role.app_role[each.key].name
  template               = "template0"
  alter_object_ownership = true
 }
 resource "postgresql_function" "pgbouncer_user_lookup" {
-    for_each = var.applications
+  for_each = local.app_instances
  name     = "user_lookup"
-    database = postgresql_database.app_db[each.value].name
+  database = postgresql_database.app_db[each.key].name
  arg {
    mode = "IN"
    name = "i_username"
@@ -73,25 +91,25 @@ resource "postgresql_function" "pgbouncer_user_lookup" {
  security_definer = true
 }
 resource "postgresql_grant" "pgbouncer_user_lookup_public_revoke" {
-  for_each = var.applications
-  database    = postgresql_function.pgbouncer_user_lookup[each.value].database
+  for_each    = local.app_instances
+  database    = postgresql_function.pgbouncer_user_lookup[each.key].database
  role        = "public"
  schema      = "public"
  object_type = "function"
  objects = [
-    postgresql_function.pgbouncer_user_lookup[each.value].name,
+    postgresql_function.pgbouncer_user_lookup[each.key].name,
  ]
  privileges = []
 }
 resource "postgresql_grant" "pgbouncer_user_lookup" {
-  depends_on = [ postgresql_grant.pgbouncer_user_lookup_public_revoke ] # can't do both in parallel
-  for_each = var.applications
-  database    = postgresql_function.pgbouncer_user_lookup[each.value].database
+  depends_on  = [postgresql_grant.pgbouncer_user_lookup_public_revoke] # can't do both in parallel
+  for_each    = local.app_instances
+  database    = postgresql_function.pgbouncer_user_lookup[each.key].database
  role        = "pgbouncer_auth"
  schema      = "public"
  object_type = "function"
  objects = [
-    postgresql_function.pgbouncer_user_lookup[each.value].name,
+    postgresql_function.pgbouncer_user_lookup[each.key].name,
  ]
  privileges = ["EXECUTE"]
 }
--- a/postgres/iac/terraform.tfvars
+++ b/postgres/iac/terraform.tfvars
@@ -1,7 +1,7 @@
 applications = [
-    "webapp",
-    "erp",
-    "crowdsec",
-    "plausible",
-    "dance-lessons-coach",
+  { name = "webapp" },
+  { name = "erp" },
+  { name = "crowdsec" },
+  { name = "plausible" },
+  { name = "dance-lessons-coach" },
 ]
--- a/postgres/iac/variables.tf
+++ b/postgres/iac/variables.tf
@@ -1,3 +1,11 @@
 variable "applications" {
-  type = set(string)
+  # Multi-env (ADR-0002): each application declares the environments it deploys to.
+  # `envs` defaults to ["prod"] so every existing entry is unchanged in behaviour —
+  # by the elision rule the prod instance keeps the bare `<app>` identifiers, so its
+  # database, owner role, and all derived resources keep their exact current names
+  # and Terraform addresses (a no-op plan).
+  type = set(object({
+    name = string
+    envs = optional(list(string), ["prod"])
+  }))
 }
--- a/vibe/guidebooks/lab-ecosystem/naming-conventions.md
+++ b/vibe/guidebooks/lab-ecosystem/naming-conventions.md
@@ -3,8 +3,8 @@
 # Naming conventions — the `<app>` join key

 > **Status**: 🟢 Active
-> **Last Updated**: 2026-06-23
-> **Related**: [Lab ecosystem](README.md) · [Factory brick](01-factory.md) · [Secrets & Vault](secrets-and-vault.md) · [PRD — isolation boundary](../../PRD/safe-prod-like-environment/isolation-boundary.md)
+> **Last Updated**: 2026-06-25
+> **Related**: [Lab ecosystem](README.md) · [Factory brick](01-factory.md) · [Secrets & Vault](secrets-and-vault.md) · [PRD — isolation boundary](../../PRD/safe-prod-like-environment/isolation-boundary.md) · [ADR 0002 — per-application environments](../../ADR/0002-per-application-environments.md)
 > **Upstream (source of truth)**: [doc/runbooks/new-web-app/conventions.md](../../../doc/runbooks/new-web-app/conventions.md) (French, authoritative)

 ## TL;DR
@@ -83,9 +83,35 @@ The symptom is always the same: a brick that *looks* provisioned but never conne
 ✅ Choose a short, stable, lowercase kebab-case name up front and reuse it character-for-character.
 ❌ Never introduce variants (case, separators, plurals); nothing will warn you.

-## Why this makes a sandbox safe
+## Multiple environments per app (the `<env>` coordinate)

-The `<app>` convention is also the reason a **production-like sandbox can reuse the exact same names** without colliding with production. Because every brick derives its resource names from `<app>` and from nothing else, an entire parallel universe of the platform — its own Vault, its own Postgres instance, its own k3s namespace scope — can host an `erp` named identically to the production `erp`, provided the two universes never share a backing store. Identity comes from the *environment boundary*, not from the name; the name is free to repeat. This is what lets QA and recovery drills run against `erp`, `webapp`, etc. with realistic identifiers instead of mangled `erp-staging`-style aliases that would themselves break the name-wiring. See the PRD's [isolation boundary](../../PRD/safe-prod-like-environment/isolation-boundary.md) for how that environment fence is drawn.
+A single application can run as several deployed instances — `prod`, `sandbox`, and so on — **without becoming a separate app**: same repo, same chart, same version. A second coordinate `<env>` extends the join key, governed by an **elision rule** ([ADR 0002](../../ADR/0002-per-application-environments.md)):
+
+- `env` defaults to `prod`, and **`prod` elides** — when `env == prod` no suffix is added, so every derived name is exactly the single-coordinate output of the mapping above. Existing apps are unaffected (their plan is a no-op).
+- Non-prod envs take the **`<app>-<env>`** suffix everywhere — namespace, Vault paths / roles / policies, ArgoCD Application, DNS, GCS state sub-prefix — with the one snake-case exception inherited from the `_role` convention: the Postgres owner role is `<app>_<env>_role`.
+- One repo, one chart, and one CI JWT role (`gitea_cicd_<app>`) serve every env; per-env differences are a `values-<env>.yaml` overlay.
+
+Worked example — `erp` (prod, elided) and `erp-sandbox`:
+
+| System | `erp` (env = prod) | `erp-sandbox` |
+| --- | --- | --- |
+| PostgreSQL database | `erp` | `erp-sandbox` |
+| PostgreSQL owner role | `erp_role` | `erp_sandbox_role` |
+| Namespace + ServiceAccount | `erp` | `erp-sandbox` |
+| Vault dynamic DB creds | `postgres/creds/erp` | `postgres/creds/erp-sandbox` |
+| Vault KV config | `kvv2/erp/config` | `kvv2/erp-sandbox/config` |
+| ArgoCD Application | `erp` | `erp-sandbox` |
+| Internal DNS | `erp.arcodange.lab` | `erp-sandbox.arcodange.lab` |
+| Gitea repo / chart / CI JWT | `arcodange-org/erp` · chart · `gitea_cicd_erp` | shared |
+
+## Two sandbox models, two naming strategies
+
+There are two distinct ways to stand up a non-production copy, and they treat the join key differently — by design, not by accident.
+
+- **Separate-cluster sandbox** ([ADR 0001](../../ADR/0001-safe-prod-like-environment.md)) — a whole parallel universe (its own Vault, Postgres, k3s) on the control node, for rehearsing dangerous *infrastructure* changes. The two universes never share a backing store, so identity comes from the *environment boundary*, not the name: the sandbox hosts an `erp` named identically to production. Names repeat freely; no `<env>` suffix is needed, so the name-wiring stays intact and drills run against realistic identifiers.
+- **In-cluster sibling instance** ([ADR 0002](../../ADR/0002-per-application-environments.md)) — a second instance on the *same* cluster (e.g. `erp-sandbox` beside `erp`), for rehearsing *application-data* writes against the real API. Here there is no cluster fence to disambiguate by, so the `<env>` suffix *is* the separator: every derived name carries `-sandbox` to avoid colliding with prod's namespace, database, Vault paths, and DNS.
+
+Both keep the name-wiring coherent — one by repeating the slug behind a cluster fence, the other by extending the slug with the elided `<env>` coordinate. See the PRD's [isolation boundary](../../PRD/safe-prod-like-environment/isolation-boundary.md) for how the separate-cluster fence is drawn, and [ADR 0002](../../ADR/0002-per-application-environments.md) for why the in-cluster sibling's blast radius stays bounded to one app's data.

 ## See also

@@ -93,4 +119,5 @@ The `<app>` convention is also the reason a **production-like sandbox can reuse
 - [Secrets & Vault](secrets-and-vault.md) — how `gitea_cicd_<app>` and the `<app>` / `<app>-ops` policies fit the auth model.
 - [Factory brick](01-factory.md) — where the ArgoCD app-of-apps, the Postgres OpenTofu, and the IaC live.
 - [PRD — isolation boundary](../../PRD/safe-prod-like-environment/isolation-boundary.md) — why identical names are safe across environments.
-- [ADR 0001 — Safe, production-like environment](../../ADR/0001-safe-prod-like-environment.md).
+- [ADR 0001 — Safe, production-like environment](../../ADR/0001-safe-prod-like-environment.md) — the separate-cluster sandbox model.
+- [ADR 0002 — Per-application environments](../../ADR/0002-per-application-environments.md) — the `<env>` coordinate + elision rule, and the in-cluster sibling sandbox model.