Merge pull request 'feat(multi-env): Phase D4 — register erp-sandbox ArgoCD Application' (#18) from claude/phaseD-erp-sandbox-argocd into main

feat(multi-env): Phase D4 — register erp-sandbox ArgoCD Application
ADR-0002 Phase D, final step. Adds `envs: { sandbox: {} }` to the erp entry in argocd/values.yaml, so the Phase B per-env loop in templates/apps.yaml renders an extra Application "erp-sandbox":

- source: same erp repo + chart, overlaid with values.yaml + values-sandbox.yaml
- destination namespace: erp-sandbox (CreateNamespace=true)
- syncPolicy: automated prune + selfHeal (default)

GitOps activation: on merge to main, the factory app-of-apps re-renders and ArgoCD creates the erp-sandbox Application, which deploys the Dolibarr chart into the erp-sandbox namespace. The pod's VSO reads the Vault paths created in D2/D3 (auth/kubernetes/role/erp-sandbox, postgres/creds/erp-sandbox, kvv2/erp-sandbox/config) and connects to the erp-sandbox DB created in D1.

Render verified: the only diff vs main is the added erp-sandbox Application; prod erp + all other apps render byte-identical.

No DNS/TLS change needed (Phase E): *.arcodange.lab is a wildcard in Pi-hole (CoreDNS forwards to it) and cert-manager holds a *.arcodange.lab wildcard set as Traefik's default TLS, so erp-sandbox.arcodange.lab resolves + gets HTTPS automatically once the ingress is up.

Completes Phase D. D1=factory#17, D2=tools#3, D3=erp#12 (all merged).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-06-28 18:36:21 +02:00 · 2026-06-28 18:35:48 +02:00 · 2026-06-28 17:09:45 +02:00 · 2026-06-28 17:05:50 +02:00 · 2026-06-28 16:53:39 +02:00 · 2026-06-28 16:28:28 +02:00
11 changed files with 284 additions and 72 deletions
--- a/argocd/templates/apps.yaml
+++ b/argocd/templates/apps.yaml
@@ -31,4 +31,47 @@ spec:
    {{- end }}
    syncOptions:
      - CreateNamespace=true
+{{- /*
+  Non-prod environments (ADR-0002 elision rule): one extra Application per env
+  under `<app_attr>.envs`. Each renders the SAME repo + chart, overlaid with
+  values-<env>.yaml, into the `<app>-<env>` namespace. Apps with no `envs` key
+  render nothing extra here, so prod-only apps are byte-identical.
+*/ -}}
+{{- range $env_name, $env_attr := $app_attr.envs }}
+---
+apiVersion: argoproj.io/v1alpha1
+kind: Application
+metadata:
+  name: {{ $app_name }}-{{ $env_name }}
+  namespace: argocd
+  finalizers:
+    - resources-finalizer.argocd.argoproj.io
+  {{- with $env_attr.annotations }}
+  annotations:
+    {{- toYaml . | nindent 4 }}
+  {{- end }}
+spec:
+  project: default
+  source:
+    repoURL: https://gitea.arcodange.lab/{{ $org }}/{{ $app_name }}
+    targetRevision: HEAD
+    path: chart
+    helm:
+      valueFiles:
+        - values.yaml
+        - values-{{ $env_name }}.yaml
+  destination:
+    server: https://kubernetes.default.svc
+    namespace: {{ $app_name }}-{{ $env_name }}
+  syncPolicy:
+    {{- if $env_attr.syncPolicy }}
+    {{- toYaml $env_attr.syncPolicy | nindent 4 }}
+    {{- else }}
+    automated:
+      prune: true
+      selfHeal: true
+    {{- end }}
+    syncOptions:
+      - CreateNamespace=true
+{{- end }}
 {{ end }}
--- a/argocd/values.yaml
+++ b/argocd/values.yaml
@@ -21,6 +21,11 @@ gitea_applications:
      argocd-image-updater.argoproj.io/telegram-gateway.update-strategy: digest
  erp:
    annotations: {}
+    # Non-prod environments (ADR-0002). Each key renders an extra Application
+    # "<app>-<env>" overlaid with chart/values-<env>.yaml into namespace
+    # "<app>-<env>". Prod erp is unaffected.
+    envs:
+      sandbox: {}
  cms:
    annotations:
      argocd-image-updater.argoproj.io/image-list: cms=gitea.arcodange.lab/arcodange-org/cms:latest
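
Reviewer note: the expansion performed by the per-env loop in `templates/apps.yaml` can be modelled outside Helm. The sketch below is an illustrative Python model, not the chart's code. `render_env_applications` is a hypothetical helper, and it mirrors only the `name`, `namespace`, and `valueFiles` fields of the template.

```python
def render_env_applications(gitea_applications):
    """Model the per-env loop: one extra Application per key under `envs`."""
    rendered = []
    for app_name, app_attr in gitea_applications.items():
        # Apps without an `envs` key contribute nothing extra,
        # which is why prod-only apps render unchanged.
        envs = (app_attr or {}).get("envs") or {}
        # Go templates visit map keys in sorted order, so sort here too.
        for env_name in sorted(envs):
            instance = f"{app_name}-{env_name}"
            rendered.append({
                "name": instance,
                "namespace": instance,
                "valueFiles": ["values.yaml", f"values-{env_name}.yaml"],
            })
    return rendered


# Trimmed-down stand-in for argocd/values.yaml (annotations omitted).
values = {
    "erp": {"annotations": {}, "envs": {"sandbox": {}}},
    "cms": {"annotations": {}},
}
apps = render_env_applications(values)
for app in apps:
    print(app["name"], app["namespace"], app["valueFiles"])
```

With the values from this PR, the model yields a single extra Application, `erp-sandbox`, while `cms` yields none.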
--- a/doc/runbooks/new-web-app/conventions.md
+++ b/doc/runbooks/new-web-app/conventions.md
@@ -44,6 +44,29 @@ Les briques se « branchent » entre elles **par convention de nom**, pas par co
 ✅ **Use a short, stable, kebab-case name** from the start.
 ❌ **Do not introduce** variants (`my_app` vs `my-app`, `MyApp`, plurals): nothing will warn you, and the app will silently fail to connect or deploy.

+## Several environments for the same app
+
+An application can be deployed several times (prod, sandbox, …) **without becoming a separate app**: same repository, same chart, same version. A second coordinate `<env>` is added to the name, governed by an **elision rule** ([ADR-0002](../../../vibe/ADR/0002-per-application-environments.md)):
+
+- **`env` defaults to `prod`, and `prod` elides.** When `env == prod`, **no suffix** is added: every derived name is identical to the single-environment case described above. An existing app therefore does not change (its `plan` is a no-op).
+- **Non-prod environments take the `<app>-<env>` suffix** in kebab-case everywhere (database, namespace, Vault paths/roles/policies, ArgoCD Application, DNS host, GCS state sub-prefix), **with one exception**: the PostgreSQL owner role stays snake-case as `<app>_<env>_role`, to remain consistent with the `_role` suffix.
+- **A single repository and a single chart** serve every environment; differences are overlaid via `values-<env>.yaml`. **A single CI JWT role** (`gitea_cicd_<app>`) per repository covers all its environments.
+
+Example: `erp` (prod, elided) vs `erp-sandbox`:
+
+| System | `erp` (env = prod) | `erp-sandbox` (env = sandbox) |
+|---|---|---|
+| PostgreSQL database | `erp` | `erp-sandbox` |
+| PG owner role | `erp_role` | `erp_sandbox_role` |
+| Namespace + ServiceAccount | `erp` | `erp-sandbox` |
+| Vault dynamic DB creds | `postgres/creds/erp` | `postgres/creds/erp-sandbox` |
+| KV config secret | `kvv2/erp/config` | `kvv2/erp-sandbox/config` |
+| ArgoCD Application | `erp` | `erp-sandbox` |
+| Internal domain | `erp.arcodange.lab` | `erp-sandbox.arcodange.lab` |
+| Gitea repository / chart / CI JWT | `arcodange-org/erp` · chart · `gitea_cicd_erp` | shared (same values) |
+
+Declaration: `postgres/iac/terraform.tfvars` and the `applications` list on the `tools` side accept `envs = ["prod", "sandbox"]`; omitting it is equivalent to `["prod"]`. The non-prod ArgoCD `Application` is declared via an `envs` key under the app in [argocd/values.yaml](https://gitea.arcodange.lab/arcodange-org/factory/src/branch/main/argocd/values.yaml).
+
 ## Cross-references

 - [01 · Gitea repository](01-gitea-repo.md): sets `<app>` as the repository name under `arcodange-org`.
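
Reviewer note: the naming table added to `conventions.md` can be read as one function of `(app, env)`. The following Python sketch is only an illustration of that reading. `derived_names` and its dictionary keys are hypothetical names, and the sketch covers only the systems listed in the table.

```python
def derived_names(app, env="prod"):
    """Derive the per-system names for one deployed instance."""
    # Elision rule: prod adds no suffix, so the instance id is the app name.
    instance = app if env == "prod" else f"{app}-{env}"
    # The one snake-case exception: the PostgreSQL owner role.
    owner_role = f"{app}_role" if env == "prod" else f"{app}_{env}_role"
    return {
        "database": instance,
        "owner_role": owner_role,
        "namespace": instance,
        "vault_db_creds": f"postgres/creds/{instance}",
        "vault_kv_config": f"kvv2/{instance}/config",
        "argocd_application": instance,
        "dns_host": f"{instance}.arcodange.lab",
        # Shared across every env of the same repository.
        "ci_jwt_role": f"gitea_cicd_{app}",
    }


prod = derived_names("erp")
sandbox = derived_names("erp", "sandbox")
for system in prod:
    print(f"{system}: {prod[system]} | {sandbox[system]}")
```

Only `ci_jwt_role` is identical in both columns, which matches the "shared" row of the table.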
--- a/iac/cloudflare.tf
+++ b/iac/cloudflare.tf
@@ -14,16 +14,6 @@ resource "cloudflare_r2_bucket" "arcodange_tf" {
  jurisdiction = "eu"
 }

-# One-time state reconcile. The arcodange-tf R2 bucket already exists in the EU jurisdiction, but its
-# prior state entry lacked the jurisdiction, so cloudflare provider >= 5.20 read it as "not found" and
-# tried to recreate it (which fails: "already exists"). Re-import it with the jurisdiction-qualified id
-# (<account_id>/<bucket_name>/<jurisdiction>) so the next apply adopts the real bucket instead.
-# This block is a no-op once the bucket is in state and can be removed afterwards.
-import {
-  to = cloudflare_r2_bucket.arcodange_tf
-  id = "f7fcf28c0823cecb44e53b6e92d5144f/arcodange-tf/eu"
-}
-
 module "cf_r2_arcodange_tf_token" {
  source     = "./modules/cloudflare_token"
  account_id = local.cloudflare_account_id
--- a/postgres/iac/main.tf
+++ b/postgres/iac/main.tf
@@ -1,12 +1,30 @@
+locals {
+  # Flatten applications × envs into per-instance objects, keyed by the elided
+  # instance id (ADR-0002 elision rule): env=prod → "<app>", else "<app>-<env>".
+  # The Postgres owner role stays snake-case: "<app>_role" (prod) / "<app>_<env>_role".
+  # For a prod-only app the key equals "<app>", database equals "<app>", and role
+  # equals "<app>_role" — identical to the previous set(string) for_each, so every
+  # resource address and attribute is unchanged (a no-op plan).
+  app_instances = merge([
+    for app in var.applications : {
+      for env in app.envs :
+      (env == "prod" ? app.name : "${app.name}-${env}") => {
+        database = env == "prod" ? app.name : "${app.name}-${env}"
+        role     = env == "prod" ? "${app.name}_role" : "${app.name}_${env}_role"
+      }
+    }
+  ]...)
+}
+
 resource "random_password" "credentials_editor" {
  length           = 24
  override_special = "-:!+<>"
 }

 resource "postgresql_role" "credentials_editor" {
-  name     = "credentials_editor"
-  login    = true
-  password = random_password.credentials_editor.result
+  name        = "credentials_editor"
+  login       = true
+  password    = random_password.credentials_editor.result
  create_role = true
  lifecycle {
    ignore_changes = [
@@ -24,74 +42,74 @@ resource "vault_kv_secret" "postgres_admin_credentials" {
 }

 resource "postgresql_role" "app_role" {
-  for_each = var.applications
-  name     = "${each.value}_role"
+  for_each = local.app_instances
+  name     = each.value.role
  login    = false
 }
 resource "postgresql_grant_role" "credentials_editor_app_role" {
-  for_each = var.applications
-  role       = postgresql_role.credentials_editor.name
-  grant_role = postgresql_role.app_role[each.value].name
+  for_each          = local.app_instances
+  role              = postgresql_role.credentials_editor.name
+  grant_role        = postgresql_role.app_role[each.key].name
  with_admin_option = true
 }
 resource "postgresql_database" "app_db" {
-  for_each = var.applications
-  name                   = each.value
-  owner                  = postgresql_role.app_role[each.value].name
+  for_each               = local.app_instances
+  name                   = each.value.database
+  owner                  = postgresql_role.app_role[each.key].name
  template               = "template0"
  alter_object_ownership = true
 }
 resource "postgresql_function" "pgbouncer_user_lookup" {
-    for_each = var.applications
-    name = "user_lookup"
-    database = postgresql_database.app_db[each.value].name
-    arg {
-        mode = "IN"
-        name = "i_username"
-        type = "text"
-    }
-    arg {
-        mode = "OUT"
-        name = "uname"
-        type = "text"
-    }
-    arg {
-        mode = "OUT"
-        name = "phash"
-        type = "text"
-    }
-    returns = "record"
-    language = "plpgsql"
-    body = <<-EOF
+  for_each = local.app_instances
+  name     = "user_lookup"
+  database = postgresql_database.app_db[each.key].name
+  arg {
+    mode = "IN"
+    name = "i_username"
+    type = "text"
+  }
+  arg {
+    mode = "OUT"
+    name = "uname"
+    type = "text"
+  }
+  arg {
+    mode = "OUT"
+    name = "phash"
+    type = "text"
+  }
+  returns          = "record"
+  language         = "plpgsql"
+  body             = <<-EOF
        BEGIN
            SELECT usename, passwd FROM pg_catalog.pg_shadow
            WHERE usename = i_username INTO uname, phash;
            RETURN;
        END;
    EOF
-    parallel = "SAFE"
-    security_definer = true
+  parallel         = "SAFE"
+  security_definer = true
 }
 resource "postgresql_grant" "pgbouncer_user_lookup_public_revoke" {
-  for_each = var.applications
-  database    = postgresql_function.pgbouncer_user_lookup[each.value].database
+  for_each    = local.app_instances
+  database    = postgresql_function.pgbouncer_user_lookup[each.key].database
  role        = "public"
  schema      = "public"
  object_type = "function"
-  objects     = [
-    postgresql_function.pgbouncer_user_lookup[each.value].name,
+  objects = [
+    postgresql_function.pgbouncer_user_lookup[each.key].name,
  ]
-  privileges  = []
+  privileges = []
 }
 resource "postgresql_grant" "pgbouncer_user_lookup" {
-  depends_on = [ postgresql_grant.pgbouncer_user_lookup_public_revoke ] # can't do both in parallel
-  for_each = var.applications
-  database    = postgresql_function.pgbouncer_user_lookup[each.value].database
+  depends_on  = [postgresql_grant.pgbouncer_user_lookup_public_revoke] # can't do both in parallel
+  for_each    = local.app_instances
+  database    = postgresql_function.pgbouncer_user_lookup[each.key].database
  role        = "pgbouncer_auth"
  schema      = "public"
  object_type = "function"
-  objects     = [
-    postgresql_function.pgbouncer_user_lookup[each.value].name,
+  objects = [
+    postgresql_function.pgbouncer_user_lookup[each.key].name,
  ]
-  privileges  = ["EXECUTE"]
-}
+  privileges = ["EXECUTE"]
+}
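
Reviewer note: the `app_instances` local above flattens applications × envs into one map. A rough Python equivalent, written to check the claim that prod-only apps keep their previous keys, might look like this. The function is a hypothetical stand-in for the HCL expression, and the `["prod"]` default is taken from `variables.tf`.

```python
def app_instances(applications):
    """Flatten applications x envs into a map keyed by the elided instance id."""
    instances = {}
    for app in applications:
        name = app["name"]
        # `envs` is optional and defaults to ["prod"], as in variables.tf.
        for env in app.get("envs", ["prod"]):
            key = name if env == "prod" else f"{name}-{env}"
            role = f"{name}_role" if env == "prod" else f"{name}_{env}_role"
            instances[key] = {"database": key, "role": role}
    return instances


declared = [
    {"name": "webapp"},
    {"name": "erp", "envs": ["prod", "sandbox"]},
]
flattened = app_instances(declared)
for key, attrs in flattened.items():
    print(key, attrs)
```

For a prod-only entry such as `webapp`, the key, database, and role match what the previous `set(string)` input produced, which is the basis of the no-op plan.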
--- a/postgres/iac/terraform.tfvars
+++ b/postgres/iac/terraform.tfvars
@@ -1,7 +1,7 @@
 applications = [
-    "webapp",
-    "erp",
-    "crowdsec",
-    "plausible",
-    "dance-lessons-coach",
-]
+  { name = "webapp" },
+  { name = "erp", envs = ["prod", "sandbox"] },
+  { name = "crowdsec" },
+  { name = "plausible" },
+  { name = "dance-lessons-coach" },
+]
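
Reviewer note: because app names may themselves contain hyphens (for example `dance-lessons-coach`), an instance id does not decompose uniquely into app and env. An app literally named `erp-sandbox` would produce the same key as `erp` with env `sandbox`, and HCL's `merge` keeps the later value for a duplicate key rather than failing. The Python sketch below shows one way such a collision could be detected before apply. It is an assumption-laden illustration, not a check that exists in this repository, and `find_collisions` is a hypothetical name.

```python
from collections import Counter


def instance_ids(applications):
    """List every instance id the declarations would produce."""
    ids = []
    for app in applications:
        for env in app.get("envs", ["prod"]):
            ids.append(app["name"] if env == "prod" else f"{app['name']}-{env}")
    return ids


def find_collisions(applications):
    """Return instance ids produced by more than one (app, env) pair."""
    counts = Counter(instance_ids(applications))
    return sorted(i for i, n in counts.items() if n > 1)


# The list declared in postgres/iac/terraform.tfvars in this PR.
declared = [
    {"name": "webapp"},
    {"name": "erp", "envs": ["prod", "sandbox"]},
    {"name": "crowdsec"},
    {"name": "plausible"},
    {"name": "dance-lessons-coach"},
]
print(find_collisions(declared))
print(find_collisions(declared + [{"name": "erp-sandbox"}]))
```

The declared list has no collisions; adding a hypothetical app named `erp-sandbox` is reported as a duplicate of the sandbox instance of `erp`.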
--- a/postgres/iac/variables.tf
+++ b/postgres/iac/variables.tf
@@ -1,3 +1,11 @@
 variable "applications" {
-  type = set(string)
-}
+  # Multi-env (ADR-0002): each application declares the environments it deploys to.
+  # `envs` defaults to ["prod"] so every existing entry is unchanged in behaviour —
+  # by the elision rule the prod instance keeps the bare `<app>` identifiers, so its
+  # database, owner role, and all derived resources keep their exact current names
+  # and Terraform addresses (a no-op plan).
+  type = set(object({
+    name = string
+    envs = optional(list(string), ["prod"])
+  }))
+}
--- a/vibe/ADR/0002-per-application-environments.md
+++ b/vibe/ADR/0002-per-application-environments.md
@@ -0,0 +1,97 @@
+[vibe](../README.md) > [ADR](README.md) > **0002 · Per-application environments**
+
+# ADR-0002: Per-application environments via an env coordinate
+
+> **Status**: Accepted
+> **Date**: 2026-06-25
+> **Deciders**: @arcodange
+
+## Context
+
+The [`<app>` join key](../../doc/runbooks/new-web-app/conventions.md) threads one kebab-case identifier identically through every system that makes up an application: the Gitea repo, the Postgres database + `<app>_role`, Vault (`postgres/creds/<app>`, the k8s auth role `<app>`, the policies `<app>` / `<app>-ops`, the CI JWT role `gitea_cicd_<app>`), the k8s namespace + ServiceAccount, the ArgoCD Application, the GCS state prefix `<app>/main`, and DNS (`<app>.arcodange.lab`). Bricks wire together by name convention, not explicit config.
+
+That convention conflates two ideas it never separated: an **application** and a **deployed instance** of it. There is exactly one of everything per app — one namespace, one database, one Vault creds path, one DNS host. The model cannot express "the same app, a second time, somewhere else."
+
+The motivating need makes the gap concrete. The Arcodange Dolibarr ERP is growing a write-capable AI-agent skill — auto-creating supplier invoices from ingested emails, fixing thirdparty data, and similar mutations. Before such writes touch the production accounting database, the operator needs a place where the agent can run write operations autonomously, a human reviews the result, and only then the same operation is promoted to prod. That requires a **second deployed instance of the same application**: the same Dolibarr chart, the same version, the same conventions — differing only in *where* it runs and *which data* it touches.
+
+| Force | Pressure it creates |
+| --- | --- |
+| One identifier per app, no env coordinate | "Same app, different environment" is inexpressible without inventing a whole second app. |
+| Write-capable AI agent landing on the prod ERP | A wrong autonomous write corrupts live accounting data with no rehearsal surface. |
+| Fidelity requirement for the rehearsal surface | The sandbox must run the *real* Dolibarr API against *prod-like* data, or the rehearsal predicts nothing. |
+| [ADR-0001](0001-safe-prod-like-environment.md) rejected an in-cluster sandbox | Its Alternative 3 ("sandbox namespace on the real cluster") was rejected for shared blast radius — so any in-cluster sibling instance must be reconciled against that, not pretended away. |
+
+Treating the sandbox as a wholly separate app would fork the chart, the repo, the runbook chain, and the Vault wiring — four things that then drift apart over time, defeating the "same app, same version" fidelity the rehearsal depends on.
+
+## Decision
+
+We will extend the `<app>` convention with a second coordinate, `<env>`, governed by an **elision rule** so that adding the coordinate changes nothing for any existing app.
+
+- **`env` defaults to `prod`, and `prod` elides.** When `env == prod`, no suffix is added: every derived name is character-for-character identical to today's single-env output. The instance name equals the app name (`local.instance == local.name`), so every existing app's `tofu plan` is a no-op.
+- **Non-prod envs take the `<app>-<env>` suffix** in kebab-case everywhere — namespace, Vault paths / roles / policies, ArgoCD Application, DNS host, GCS-state sub-prefix — with one exception: the Postgres owner role stays snake-case as `<app>_<env>_role`, matching the existing `_role` suffix convention.
+- **One repo and one chart serve every env of an app.** Per-env differences are overlaid via `values-<env>.yaml`; the chart's instance-specific values are `.Values`-driven, not hardcoded literals, so the same chart renders any instance.
+- **One CI JWT role (`gitea_cicd_<app>`) per repo covers all its envs.** Its ops policy is widened to the `<app>-*` path family. Each running instance keeps its own runtime Vault policy.
+
+### Worked example: `erp` and `erp-sandbox`
+
+| Coordinate | `erp` (env = prod, elided) | `erp-sandbox` (env = sandbox) |
+| --- | --- | --- |
+| Postgres database | `erp` | `erp-sandbox` |
+| Postgres owner role | `erp_role` | `erp_sandbox_role` |
+| k8s namespace + ServiceAccount | `erp` | `erp-sandbox` |
+| Vault dynamic DB creds | `postgres/creds/erp` | `postgres/creds/erp-sandbox` |
+| Vault KV config | `kvv2/erp/config` | `kvv2/erp-sandbox/config` |
+| ArgoCD Application | `erp` | `erp-sandbox` |
+| Internal DNS | `erp.arcodange.lab` | `erp-sandbox.arcodange.lab` |
+| Gitea repo | `arcodange-org/erp` | `arcodange-org/erp` (shared) |
+| Helm chart | one chart | one chart (shared) |
+| CI JWT role | `gitea_cicd_erp` | `gitea_cicd_erp` (shared) |
+
+### Why this is not what ADR-0001 rejected
+
+[ADR-0001](0001-safe-prod-like-environment.md) chose a **local-only** safe environment (k3d / arm64 VMs) and rejected its Alternative 3, an in-cluster "sandbox namespace on the real cluster," for shared blast radius. ADR-0002 introduces an in-cluster sibling instance (`erp-sandbox`), which looks like the very thing that was rejected. The two stand together because they operate at **different layers**.
+
+ADR-0001's rejection is scoped to rehearsing **infrastructure / platform** change-classes — Ansible playbooks, Vault policy / auth / mount changes, Postgres superuser migrations, ArgoCD prune / selfHeal, Longhorn ops, DNS / email. Those couplings share fleet-wide control planes, so an in-cluster sandbox cannot isolate them; only a separate cluster + Vault + state + DNS zone can. That is exactly why ADR-0001 is local-only.
+
+ADR-0002 operates one layer up. The AI agent's only reach is the **Dolibarr HTTP API**, holding a write-scoped, app-specific API key against an isolated database — `erp-sandbox` on its own `erp_sandbox_role`, its own namespace, its own Vault creds path. The agent never touches kubectl, the Vault root, the Postgres superuser, ArgoCD, Longhorn, or DNS. The fleet-level blast radius that doomed Alternative 3 for infra rehearsal is simply **not in the agent's reach**; the blast radius of a wrong AI write is bounded to the sandbox app's own data.
+
+The two ADRs are therefore complementary, not contradictory, and ADR-0002 does not supersede ADR-0001. ADR-0001 isolates the *operator* from breaking the *fleet*. ADR-0002 isolates the *AI agent* from corrupting *one app's production data*, while preserving the prod-like API surface and real-data fidelity that the local k3d sandbox — which carries no prod data — cannot offer.
+
+## Consequences
+
+- **+** Every existing app (webapp, erp, crowdsec, plausible, dance-lessons-coach, cms) is unaffected: the elision rule makes the prod instance's derived names byte-identical, so adoption ships with zero migration and a no-op plan.
+- **+** A second instance of an app is now a `values-<env>.yaml` overlay plus an `envs` entry — not a forked repo, chart, and runbook chain — so prod and sandbox share one source of truth and stay on the same version by construction.
+- **+** The AI-agent write skill gets a prod-like rehearsal surface with real-shaped data: the *same* Dolibarr API and chart, an *isolated* database, a bounded blast radius.
+- **+** The convention chain (db + role → Vault creds + policy → namespace + SA → ArgoCD → DNS) is reused verbatim for the `-sandbox` instance, so runbooks read identically for any env.
+- **−** Names are no longer a flat app list: every consumer must reason about the `instance == app` (prod) versus `app-env` (non-prod) distinction, and the snake-case owner-role exception (`<app>_<env>_role`) is a special case that must be carried in the modules.
+- **−** A single shared Vault CI policy widened to `<app>-*` means the CI role for a repo can write the ops paths of *all* that repo's envs — a deliberately looser ops scope than one-policy-per-instance.
+- **−** A single shared OpenTofu state per repo holds every env's resources together, so the envs of one app share a blast radius at the state layer (mitigated by `for_each`, accepted at current scale — see Alternatives).
+- **→** The AI-agent promotion workflow this unlocks: the agent runs writes against `erp-sandbox` autonomously, emits a structured changeset, a human reviews it, and the **same** operation is re-applied to prod only with explicit confirmation — never auto-applied by the agent. The read/write skills resolve their target by an env switch (e.g. `DOLIBARR_TARGET=prod|sandbox`, defaulting to `prod`).
+- **→** Rollout is additive and phased, each phase gated by a no-op `tofu plan` against existing apps: **(A)** the `tools` repo adds an optional `env` / `envs` parameter to the shared `app_roles` and `app_policy` Vault modules; **(B)** the `factory` repo gains the `envs` schema in `postgres/iac` tfvars, renders one ArgoCD Application per env, and documents the elision rule in `conventions.md`; **(C)** the `erp` chart literals are templated to `.Values`; **(D)** `erp` + `factory` activate `erp-sandbox`; **(E)** DNS + ArgoCD registration.
+- **→** Per-env state separation (`<app>/<env>` prefixes) is a door left open: if env-to-env blast-radius isolation at the state layer becomes warranted, the prefix scheme can be revisited without changing the naming model.
+
+## Alternatives considered
+
+| Option | Why not |
+| --- | --- |
+| Treat `erp-sandbox` as a wholly separate `<app>` (own repo, own chart copy) | Forks the chart, the repo, and the runbook chain; the two copies drift over time; defeats the "same app, same version" fidelity the rehearsal depends on. |
+| Use the [ADR-0001](0001-safe-prod-like-environment.md) local-only sandbox (k3d / VMs) for the AI-agent writes | That environment carries **no production data** — the write-rehearsal needs prod-like data and the real Dolibarr API surface to be meaningful. Complementary to ADR-0001, not a substitute for it. |
+| Per-env OpenTofu state (`<app>/<env>` prefixes) instead of one shared state per repo | Buys more env-to-env blast-radius isolation, but at the cost of more CI plumbing and cross-env output wiring than current scale warrants; one shared state with `for_each` keeps runbooks simple. A real decision point — the chosen path is single shared state per repo, with the prefix scheme left as a future door. |
+| No elision — always suffix, even prod (`<app>-prod`) | Breaks every existing derived name, forcing a fleet-wide rename plus `tofu` resource moves; rejected in favour of the elision rule's zero-migration property. |
+
+## QA & validation
+
+- **Backwards-compat no-op gate** — after the module change, `tofu plan` against every existing app (webapp, erp, crowdsec, plausible, dance-lessons-coach, cms) reports zero changes. The elision rule guarantees `local.instance == local.name` for `env == prod`, so no prod resource moves.
+- **Byte-identical chart render** — `helm template erp chart/` before versus after the literal-templating refactor diffs to nothing (verified: 10857 bytes on both sides, `diff` exit 0).
+- **`tofu fmt -check` + `tofu validate`** are clean on the module changes.
+- **Sandbox activation gate** — when `erp-sandbox` is stood up, the [new-web-app convention chain](../../doc/runbooks/new-web-app/conventions.md) must resolve end to end for the `-sandbox` instance (db + role → Vault creds + policy → namespace + SA → ArgoCD Healthy/Synced → VSO injects → pod Running), exactly as the prod instance does.
+- **Promotion gate** — no AI-authored write reaches the prod ERP until it has been applied to `erp-sandbox`, produced a reviewed changeset, and been explicitly re-applied with human confirmation.
+
+## References
+
+- [ADR-0001 · Safe, production-like environment](0001-safe-prod-like-environment.md) — the local-only safe environment for infra rehearsal that this ADR complements (it stands; this does not supersede it).
+- [PRD · Safe, production-like environment](../PRD/safe-prod-like-environment/README.md) — the product view this work relates to, and its [isolation-boundary leaf](../PRD/safe-prod-like-environment/isolation-boundary.md) detailing the cluster/Vault/state/DNS boundary.
+- [new-web-app conventions](../../doc/runbooks/new-web-app/conventions.md) — the single-env `<app>` convention this ADR extends with the env coordinate.
+- [Phase A — `tools` Vault module env parameter](https://gitea.arcodange.lab/arcodange-org/tools/pulls/2) — adds the optional `env` / `envs` parameter to the shared `app_roles` and `app_policy` modules.
+- [Phase C — `erp` chart literal templating](https://gitea.arcodange.lab/arcodange-org/erp/pulls/11) — templates the chart's single-env literals to `.Values` so one chart renders any instance.
+- [PR factory#15 — this ADR](https://gitea.arcodange.lab/arcodange-org/factory/pulls/15) — the change that introduces ADR-0002 (links back to this file).
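
Reviewer note: the ADR describes an env switch (`DOLIBARR_TARGET=prod|sandbox`, defaulting to `prod`) and a promotion gate requiring human confirmation for prod writes. A minimal Python sketch of how a skill might honour both is shown below. The function names, the `https://` base URL shape, and the `human_confirmed` flag are assumptions made for illustration; only the variable name, its default, and the two hostnames come from the ADR.

```python
import os


def resolve_target(environ=None):
    """Resolve the Dolibarr target from DOLIBARR_TARGET (defaults to prod)."""
    environ = os.environ if environ is None else environ
    target = environ.get("DOLIBARR_TARGET", "prod")
    if target not in ("prod", "sandbox"):
        raise ValueError(f"unsupported DOLIBARR_TARGET: {target!r}")
    # Elision rule: prod uses the bare app host, sandbox carries the suffix.
    host = "erp.arcodange.lab" if target == "prod" else "erp-sandbox.arcodange.lab"
    return target, f"https://{host}"


def may_write(target, human_confirmed):
    """Promotion gate: sandbox writes run autonomously, prod writes need confirmation."""
    return target == "sandbox" or human_confirmed


print(resolve_target({}))
print(resolve_target({"DOLIBARR_TARGET": "sandbox"}))
print(may_write("prod", human_confirmed=False))
```

Rejecting unknown values instead of falling back keeps a typo in the variable from silently selecting prod.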
--- a/vibe/ADR/README.md
+++ b/vibe/ADR/README.md
@@ -3,7 +3,7 @@
 # Architecture Decision Records

 > **Status**: 🟢 Active
-> **Last Updated**: 2026-06-23
+> **Last Updated**: 2026-06-25
 > **Related**: [vibe/PRD](../PRD/README.md) · [vibe/Investigations](../investigations/README.md)
 > **Historical**: [doc/adr](../../doc/adr/README.md) (foundational infra) · [ansible/.../docs/adr](../../ansible/arcodange/factory/docs/adr/) (dated infra ADRs)

@@ -34,6 +34,7 @@ When a new decision *supersedes* one of the historical records, write the new AD
 | # | Title | Status | Date |
 | --- | --- | --- | --- |
 | [0001](0001-safe-prod-like-environment.md) | Safe, production-like environment | 🟢 Accepted | 2026-06-23 |
+| [0002](0002-per-application-environments.md) | Per-application environments | 🟢 Accepted | 2026-06-25 |

 ## Rules to contribute

--- a/vibe/PRD/safe-prod-like-environment/README.md
+++ b/vibe/PRD/safe-prod-like-environment/README.md
@@ -3,9 +3,9 @@
 # Safe, production-like environment

 > **Status:** In design
-> **Last Updated:** 2026-06-23
+> **Last Updated:** 2026-06-25
 > **Design record:** [ADR 0001 — Safe, production-like environment](../../ADR/0001-safe-prod-like-environment.md)
-> **Adjacent:** [INV-001 — prod blast-radius couplings](../../investigations/INV-001-prod-blast-radius-couplings.md)
+> **Adjacent:** [INV-001 — prod blast-radius couplings](../../investigations/INV-001-prod-blast-radius-couplings.md) · [ADR 0002 — per-application environments](../../ADR/0002-per-application-environments.md) (the application-data-layer counterpart)
 > **Map:** [Lab ecosystem guidebook](../../guidebooks/lab-ecosystem/README.md)

 ## Problem
--- a/vibe/guidebooks/lab-ecosystem/naming-conventions.md
+++ b/vibe/guidebooks/lab-ecosystem/naming-conventions.md
@@ -3,8 +3,8 @@
 # Naming conventions — the `<app>` join key

 > **Status**: 🟢 Active
-> **Last Updated**: 2026-06-23
-> **Related**: [Lab ecosystem](README.md) · [Factory brick](01-factory.md) · [Secrets & Vault](secrets-and-vault.md) · [PRD — isolation boundary](../../PRD/safe-prod-like-environment/isolation-boundary.md)
+> **Last Updated**: 2026-06-25
+> **Related**: [Lab ecosystem](README.md) · [Factory brick](01-factory.md) · [Secrets & Vault](secrets-and-vault.md) · [PRD — isolation boundary](../../PRD/safe-prod-like-environment/isolation-boundary.md) · [ADR 0002 — per-application environments](../../ADR/0002-per-application-environments.md)
 > **Upstream (source of truth)**: [doc/runbooks/new-web-app/conventions.md](../../../doc/runbooks/new-web-app/conventions.md) (French, authoritative)

 ## TL;DR
@@ -83,9 +83,35 @@ The symptom is always the same: a brick that *looks* provisioned but never conne
 ✅ Choose a short, stable, lowercase kebab-case name up front and reuse it character-for-character.
 ❌ Never introduce variants (case, separators, plurals); nothing will warn you.

-## Why this makes a sandbox safe
+## Multiple environments per app (the `<env>` coordinate)

-The `<app>` convention is also the reason a **production-like sandbox can reuse the exact same names** without colliding with production. Because every brick derives its resource names from `<app>` and from nothing else, an entire parallel universe of the platform — its own Vault, its own Postgres instance, its own k3s namespace scope — can host an `erp` named identically to the production `erp`, provided the two universes never share a backing store. Identity comes from the *environment boundary*, not from the name; the name is free to repeat. This is what lets QA and recovery drills run against `erp`, `webapp`, etc. with realistic identifiers instead of mangled `erp-staging`-style aliases that would themselves break the name-wiring. See the PRD's [isolation boundary](../../PRD/safe-prod-like-environment/isolation-boundary.md) for how that environment fence is drawn.
+A single application can run as several deployed instances — `prod`, `sandbox`, and so on — **without becoming a separate app**: same repo, same chart, same version. A second coordinate `<env>` extends the join key, governed by an **elision rule** ([ADR 0002](../../ADR/0002-per-application-environments.md)):
+
+- `env` defaults to `prod`, and **`prod` elides** — when `env == prod` no suffix is added, so every derived name is exactly the single-coordinate output of the mapping above. Existing apps are unaffected (their plan is a no-op).
+- Non-prod envs take the **`<app>-<env>`** suffix everywhere — namespace, Vault paths / roles / policies, ArgoCD Application, DNS, GCS state sub-prefix — with the one snake-case exception inherited from the `_role` convention: the Postgres owner role is `<app>_<env>_role`.
+- One repo, one chart, and one CI JWT role (`gitea_cicd_<app>`) serve every env; per-env differences are a `values-<env>.yaml` overlay.
+
+Worked example — `erp` (prod, elided) and `erp-sandbox`:
+
+| System | `erp` (env = prod) | `erp-sandbox` |
+| --- | --- | --- |
+| PostgreSQL database | `erp` | `erp-sandbox` |
+| PostgreSQL owner role | `erp_role` | `erp_sandbox_role` |
+| Namespace + ServiceAccount | `erp` | `erp-sandbox` |
+| Vault dynamic DB creds | `postgres/creds/erp` | `postgres/creds/erp-sandbox` |
+| Vault KV config | `kvv2/erp/config` | `kvv2/erp-sandbox/config` |
+| ArgoCD Application | `erp` | `erp-sandbox` |
+| Internal DNS | `erp.arcodange.lab` | `erp-sandbox.arcodange.lab` |
+| Gitea repo / chart / CI JWT | `arcodange-org/erp` · chart · `gitea_cicd_erp` | shared |
+
+## Two sandbox models, two naming strategies
+
+There are two distinct ways to stand up a non-production copy, and they treat the join key differently — by design, not by accident.
+
+- **Separate-cluster sandbox** ([ADR 0001](../../ADR/0001-safe-prod-like-environment.md)) — a whole parallel universe (its own Vault, Postgres, k3s) on the control node, for rehearsing dangerous *infrastructure* changes. The two universes never share a backing store, so identity comes from the *environment boundary*, not the name: the sandbox hosts an `erp` named identically to production. Names repeat freely; no `<env>` suffix is needed, so the name-wiring stays intact and drills run against realistic identifiers.
+- **In-cluster sibling instance** ([ADR 0002](../../ADR/0002-per-application-environments.md)) — a second instance on the *same* cluster (e.g. `erp-sandbox` beside `erp`), for rehearsing *application-data* writes against the real API. Here there is no cluster fence to disambiguate by, so the `<env>` suffix *is* the separator: every derived name carries `-sandbox` to avoid colliding with prod's namespace, database, Vault paths, and DNS.
+
+Both keep the name-wiring coherent — one by repeating the slug behind a cluster fence, the other by extending the slug with the elided `<env>` coordinate. See the PRD's [isolation boundary](../../PRD/safe-prod-like-environment/isolation-boundary.md) for how the separate-cluster fence is drawn, and [ADR 0002](../../ADR/0002-per-application-environments.md) for why the in-cluster sibling's blast radius stays bounded to one app's data.

 ## See also

@@ -93,4 +119,5 @@ The `<app>` convention is also the reason a **production-like sandbox can reuse
 - [Secrets & Vault](secrets-and-vault.md) — how `gitea_cicd_<app>` and the `<app>` / `<app>-ops` policies fit the auth model.
 - [Factory brick](01-factory.md) — where the ArgoCD app-of-apps, the Postgres OpenTofu, and the IaC live.
 - [PRD — isolation boundary](../../PRD/safe-prod-like-environment/isolation-boundary.md) — why identical names are safe across environments.
- [ADR 0001 — Safe, production-like environment](../../ADR/0001-safe-prod-like-environment.md).
+- [ADR 0001 — Safe, production-like environment](../../ADR/0001-safe-prod-like-environment.md) — the separate-cluster sandbox model.
+- [ADR 0002 — Per-application environments](../../ADR/0002-per-application-environments.md) — the `<env>` coordinate + elision rule, and the in-cluster sibling sandbox model.
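
Illustrative note, not part of the diff: the elision rule and the `erp` / `erp-sandbox` table in the hunk above can be read as one small pure function from `(app, env)` to the derived names. The Python below is a minimal sketch under that reading. The function names (`instance_id`, `owner_role`, `derived_names`) and the `domain` default are hypothetical and do not exist in the factory repo, where the real derivation lives in the OpenTofu flatten and the ArgoCD Helm template. It also assumes slugs are used as-is in the owner role, since the source only shows `erp`.

```python
PROD = "prod"  # the env that elides (adds no suffix)


def instance_id(app: str, env: str = PROD) -> str:
    """Elided instance id: '<app>' for prod, '<app>-<env>' for any other env."""
    return app if env == PROD else f"{app}-{env}"


def owner_role(app: str, env: str = PROD) -> str:
    """Postgres owner role: the one snake-case exception to the '-<env>' suffix."""
    return f"{app}_role" if env == PROD else f"{app}_{env}_role"


def derived_names(app: str, env: str = PROD, domain: str = "arcodange.lab") -> dict[str, str]:
    """Map one (app, env) pair to the names listed in the worked-example table."""
    inst = instance_id(app, env)
    return {
        "database": inst,
        "owner_role": owner_role(app, env),
        "namespace": inst,
        "vault_db_creds": f"postgres/creds/{inst}",
        "vault_kv_config": f"kvv2/{inst}/config",
        "argocd_application": inst,
        "dns": f"{inst}.{domain}",
        # Shared across every env of the app: keyed on <app> only (assumption
        # drawn from the "shared" row of the table).
        "ci_jwt_role": f"gitea_cicd_{app}",
    }


if __name__ == "__main__":
    for env in ("prod", "sandbox"):
        print(env, derived_names("erp", env))
```

Under these assumptions, `derived_names("erp")` reproduces the prod column of the table and `derived_names("erp", "sandbox")` the sandbox column; the CI JWT role is identical in both, matching the "shared" row.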
Author	SHA1	Message	Date
arcodange	5c60677171	Merge pull request 'feat(multi-env): Phase D4 — register erp-sandbox ArgoCD Application' (#18 ) from claude/phaseD-erp-sandbox-argocd into main	2026-06-28 18:36:21 +02:00
Gabriel Radureau	90498e4f55	feat(multi-env): Phase D4 — register erp-sandbox ArgoCD Application ADR-0002 Phase D, final step. Adds `envs: { sandbox: {} }` to the erp entry in argocd/values.yaml, so the Phase B per-env loop in templates/apps.yaml renders an extra Application "erp-sandbox": - source: same erp repo + chart, overlaid with values.yaml + values-sandbox.yaml - destination namespace: erp-sandbox (CreateNamespace=true) - syncPolicy: automated prune + selfHeal (default) GitOps activation: on merge to main, the factory app-of-apps re-renders and ArgoCD creates the erp-sandbox Application, which deploys the Dolibarr chart into the erp-sandbox namespace. The pod's VSO reads the Vault paths created in D2/D3 (auth/kubernetes/role/erp-sandbox, postgres/creds/erp-sandbox, kvv2/erp-sandbox/config) and connects to the erp-sandbox DB created in D1. Render verified: the only diff vs main is the added erp-sandbox Application; prod erp + all other apps render byte-identical. No DNS/TLS change needed (Phase E): *.arcodange.lab is a wildcard in Pi-hole (CoreDNS forwards to it) and cert-manager holds a *.arcodange.lab wildcard set as Traefik's default TLS — so erp-sandbox.arcodange.lab resolves + gets HTTPS automatically once the ingress is up. Completes Phase D. D1=factory#17, D2=tools#3, D3=erp#12 (all merged). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-06-28 18:35:48 +02:00
arcodange	a38c8b39f1	Merge pull request 'feat(multi-env): Phase D1 — provision erp-sandbox Postgres DB + role' (#17 ) from claude/phaseD-erp-sandbox-postgres into main	2026-06-28 17:09:45 +02:00
Gabriel Radureau	00a838799b	feat(multi-env): Phase D1 — provision erp-sandbox Postgres DB + role Activates the sandbox environment for the ERP on the Postgres side (ADR-0002 Phase D). `erp` gains `envs = ["prod", "sandbox"]`, so the elision flatten now materialises a second instance `erp-sandbox`: - database `erp-sandbox` - owner role `erp_sandbox_role` (snake-case per the convention) - pgbouncer user_lookup function + grants for the new DB The prod `erp` instance is unchanged (db `erp`, role `erp_role`) — the apply is purely additive (~6 resources for erp-sandbox, 0 changed, 0 destroyed on everything else). Verified the flatten output with a standalone tofu apply before pushing. This is D1 of the Phase D activation. D2 (tools Vault policies), D3 (erp iac creds + KV), D4 (ArgoCD Application) follow in order. Refs ADR-0002 (factory#15), Phase B (factory#16). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-06-28 17:05:50 +02:00
arcodange	235ff72ac0	Merge pull request 'feat(multi-env): Phase B — factory machinery env-capable (no activation)' (#16 ) from claude/multi-env-phaseb into main	2026-06-28 16:53:39 +02:00
Gabriel Radureau	c00c4cdd5c	feat(multi-env): Phase B — make factory machinery env-capable (no activation) ADR-0002 Phase B. Makes postgres/iac, argocd, and the conventions docs multi-environment-capable WITHOUT activating any sandbox yet — every app stays prod-only, so this change is behaviour-neutral: - postgres/iac `tofu plan` is a no-op (proven: the elision flatten keys are bare app names, db=<app>, role=<app>_role — identical addresses) - the argocd apps.yaml render is byte-identical (181→181 lines, diff empty) since no app declares `envs` postgres/iac: - variables.tf: `applications` becomes set(object({name, envs=optional(["prod"])})) - main.tf: a `local.app_instances` flatten of applications × envs keyed by the elided instance id (env=prod → "<app>"); per-app resources iterate it and reference each.key / each.value.{database,role}. For prod-only apps every resource address + attribute is unchanged. (main.tf also got a full `tofu fmt` pass — the pgbouncer function block reindents 4→2 spaces, which is cosmetic; the correctness gate is the CI tofu plan, not the text diff.) - terraform.tfvars: string entries → { name = "..." } objects. argocd/templates/apps.yaml: - after the prod Application, a `range $app_attr.envs` loop renders one extra Application per non-prod env: name/namespace `<app>-<env>`, shared repoURL, helm.valueFiles [values.yaml, values-<env>.yaml], per-env syncPolicy override. Renders nothing while no app sets `envs` → prod render unchanged. docs: - doc/runbooks/new-web-app/conventions.md (FR, authoritative): new section "Plusieurs environnements pour une même app" — elision rule, suffix rule, snake-case owner-role exception, erp/erp-sandbox table, ADR-0002 link. - vibe/guidebooks/lab-ecosystem/naming-conventions.md (EN mirror): the env coordinate section + a "Two sandbox models" section reconciling the separate-cluster (ADR-0001, names repeat) vs in-cluster sibling (ADR-0002, <env> suffix) strategies; Last Updated bumped; ADR-0002 cross-links. Activation (erp gets envs=["prod","sandbox"] in postgres tfvars + argocd values + erp/iac) is Phase D, gated by its own plan review. Refs ADR-0002 (factory#15). Phase A = tools#2 (merged). Phase C = erp#11 (merged). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-06-28 16:28:28 +02:00
arcodange	8a1a63ee10	Merge pull request 'docs(adr): ADR-0002 — per-application environments via an env coordinate' (#15 ) from claude/adr-multi-env into main	2026-06-28 16:17:37 +02:00
Gabriel Radureau	c35b510040	docs(adr): fill the ADR-0002 ↔ PR backlink (factory#15) Replaces the placeholder References line with the PR URL so the ADR↔PR crosslink is bidirectional per the AGENTS.md rule. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-06-25 14:56:09 +02:00
Gabriel Radureau	3961914613	docs(adr): ADR-0002 — per-application environments via an env coordinate Records the decision to extend the <app> join key with a second coordinate <env>, governed by an elision rule (env=prod elides → every existing app's derived names are byte-identical and its tofu plan is a no-op; non-prod envs take the <app>-<env> suffix, with the Postgres owner role staying snake-case <app>_<env>_role). Motivated by the ERP's incoming write-capable AI-agent skill: it needs an in-cluster sandbox instance (erp-sandbox) with a prod-like Dolibarr API + isolated database to rehearse writes before a human promotes them to prod. The ADR reconciles this against ADR-0001 honestly — ADR-0001 rejected an in-cluster sandbox for INFRA-change rehearsal (shared fleet-wide control planes); ADR-0002 operates one layer up where the agent's only reach is the app's HTTP API against an isolated DB, so the fleet blast radius is not in scope. The two are complementary; ADR-0002 does not supersede ADR-0001. Also: - vibe/ADR/README.md: index row for 0002 + Last Updated 2026-06-25 - PRD safe-prod-like-environment README: bidirectional back-link to ADR-0002 on the Adjacent line + Last Updated 2026-06-25 Authored via the ADR Scribe persona, validated via the Continuity Warden checklist (no-tombstone, breadcrumb, MADR-lite sections, dead-link scan, bidirectional links). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-06-25 14:55:19 +02:00
arcodange	801724e1bc	Merge pull request 'chore(iac): remove spent R2 import block' (#14 ) from arcodange/r2-import-cleanup into main	2026-06-24 13:24:09 +02:00
Gabriel Radureau	7727b244ad	chore(iac): remove spent R2 import block The one-time import block from the previous change reconciled cloudflare_r2_bucket.arcodange_tf into state (run #29: "Import complete", "Apply complete! Resources: 1 imported"). It is now a no-op, so remove it. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-24 13:23:42 +02:00
arcodange	e2a79a08a7	Merge pull request 'fix(iac): import existing EU R2 bucket into state' (#13 ) from arcodange/r2-state-import into main	2026-06-24 13:19:56 +02:00