18 Commits

Author SHA1 Message Date
9a42346852 Merge pull request 'docs(adr): ADR-0003 — sandbox state lifecycle (iso-prod seed, reset & prod-write isolation)' (#19) from claude/adr-0003-sandbox-reset into main 2026-06-28 20:21:54 +02:00
8e69004b4c docs(adr): fill the ADR-0003 ↔ PR backlink (factory#19)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-06-28 20:21:45 +02:00
23d8bc9231 docs(adr): ADR-0003 — sandbox state lifecycle (iso-prod seed, reset & prod-write isolation)
Records how erp-sandbox's DATA is seeded, reset, and kept structurally
incapable of harming prod — the application-data-layer complement to ADR-0001
(which rejected an in-cluster sandbox for INFRA rehearsal) and the lifecycle for
the erp-sandbox instance ADR-0002 stood up.

Decision: (1) iso-prod golden via read-only pg_dump of prod erp, app-scoped to
llx_*; (2) reset = DROP OWNED BY erp_sandbox_role CASCADE + pg_restore
--no-owner --role=erp_sandbox_role into the EXISTING db (no DROP/CREATE DATABASE,
no CREATEDB, no superuser; provisioner-owned infra objects like the pgbouncer
user_lookup function are left untouched); (3) prod-write isolation as a
structural invariant (superuser only in human-gated postgres.yaml CI; DROP
DATABASE gated by ownership — erp_sandbox_role owns only erp-sandbox, never prod
erp/erp_role; sandbox-scoped Dolibarr key; membership-only runtime creds;
host-guard; resettability); plus a human-gated promote via the read-only
dolibarr-data-snapshot diff under a separate prod-write credential.

The reset mechanism + the integrity invariant were validated against the live
erp-sandbox: DROP OWNED BY erp_sandbox_role + app-scoped pg_restore round-trips
to the golden checkpoint using only erp_sandbox_role membership (superuser=false,
createdb=false, not a member of erp_role), proving prod is structurally
unreachable from the sandbox credential.

Drafted via a clean-context agent; mechanism refined from a live prototype.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-06-28 20:21:00 +02:00
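The reset described in this commit can be sketched as a dry-run command builder. This is a minimal illustration, not the ADR's actual tooling: the function names and the `golden_dump` path are hypothetical, the dump is assumed to be in a format `pg_restore` accepts, and connection settings (host, user, password) are assumed to come from the usual `PG*` environment variables. Only the database name, the role name, and the `DROP OWNED BY` / `pg_restore --no-owner --role=` pair come from the commit message.

```python
import shlex

SANDBOX_DB = "erp-sandbox"         # database name, from the commit message
SANDBOX_ROLE = "erp_sandbox_role"  # owner role, from the commit message


def reset_commands(golden_dump: str) -> list[list[str]]:
    """Build (but do not run) the two commands of a sandbox reset.

    `golden_dump` is a hypothetical path to the iso-prod golden dump.
    """
    # Step 1: drop everything the sandbox role owns, inside the existing db.
    drop_owned = [
        "psql", "--dbname", SANDBOX_DB,
        "--command", f"DROP OWNED BY {SANDBOX_ROLE} CASCADE",
    ]
    # Step 2: restore the golden dump into the same db, owned by the role.
    restore = [
        "pg_restore", "--no-owner", f"--role={SANDBOX_ROLE}",
        "--dbname", SANDBOX_DB, golden_dump,
    ]
    return [drop_owned, restore]


def render(commands: list[list[str]]) -> str:
    """Render the commands as shell-quoted lines for review."""
    return "\n".join(shlex.join(command) for command in commands)


if __name__ == "__main__":
    print(render(reset_commands("golden.dump")))
```

Neither command contains `DROP DATABASE` or `CREATE DATABASE`, which is the point of the mechanism: the reset needs no `CREATEDB` and no superuser, and objects owned by the provisioner rather than by `erp_sandbox_role` are not touched by `DROP OWNED BY`.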
5c60677171 Merge pull request 'feat(multi-env): Phase D4 — register erp-sandbox ArgoCD Application' (#18) from claude/phaseD-erp-sandbox-argocd into main 2026-06-28 18:36:21 +02:00
90498e4f55 feat(multi-env): Phase D4 — register erp-sandbox ArgoCD Application
ADR-0002 Phase D, final step. Adds `envs: { sandbox: {} }` to the erp entry
in argocd/values.yaml, so the Phase B per-env loop in templates/apps.yaml
renders an extra Application "erp-sandbox":
  - source: same erp repo + chart, overlaid with values.yaml + values-sandbox.yaml
  - destination namespace: erp-sandbox (CreateNamespace=true)
  - syncPolicy: automated prune + selfHeal (default)

GitOps activation: on merge to main, the factory app-of-apps re-renders and
ArgoCD creates the erp-sandbox Application, which deploys the Dolibarr chart
into the erp-sandbox namespace. The pod's VSO reads the Vault paths created in
D2/D3 (auth/kubernetes/role/erp-sandbox, postgres/creds/erp-sandbox,
kvv2/erp-sandbox/config) and connects to the erp-sandbox DB created in D1.

Render verified: the only diff vs main is the added erp-sandbox Application;
prod erp + all other apps render byte-identical.

No DNS/TLS change needed (Phase E): *.arcodange.lab is a wildcard in Pi-hole
(CoreDNS forwards to it) and cert-manager holds a *.arcodange.lab wildcard set
as Traefik's default TLS — so erp-sandbox.arcodange.lab resolves + gets HTTPS
automatically once the ingress is up.

Completes Phase D. D1=factory#17, D2=tools#3, D3=erp#12 (all merged).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-06-28 18:35:48 +02:00
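As a rough model of what the per-env loop renders, the sketch below mirrors the template logic in plain Python. It is an illustration under assumptions, not the Helm template itself: `env_applications` is a hypothetical helper, and the dictionaries stand in for the rendered YAML. The field values (repo URL pattern, value files, default sync policy, `CreateNamespace=true`) follow the commit message and the `apps.yaml` hunk.

```python
DEFAULT_SYNC_POLICY = {"automated": {"prune": True, "selfHeal": True}}


def env_applications(org: str, app_name: str, app_attr: dict) -> list[dict]:
    """Return one Application manifest (as a dict) per non-prod env."""
    apps = []
    envs = app_attr.get("envs") or {}
    # Helm's `range` over a map visits keys in sorted order; do the same.
    for env_name in sorted(envs):
        env_attr = envs[env_name] or {}
        instance = f"{app_name}-{env_name}"
        # Per-env override if present, otherwise automated prune + selfHeal.
        sync_policy = dict(env_attr.get("syncPolicy") or DEFAULT_SYNC_POLICY)
        sync_policy["syncOptions"] = ["CreateNamespace=true"]
        apps.append({
            "apiVersion": "argoproj.io/v1alpha1",
            "kind": "Application",
            "metadata": {"name": instance, "namespace": "argocd"},
            "spec": {
                "source": {
                    "repoURL": f"https://gitea.arcodange.lab/{org}/{app_name}",
                    "path": "chart",
                    "helm": {"valueFiles": ["values.yaml",
                                            f"values-{env_name}.yaml"]},
                },
                "destination": {"namespace": instance},
                "syncPolicy": sync_policy,
            },
        })
    return apps
```

An app without an `envs` key yields an empty list, which corresponds to the "prod render unchanged" claim: the loop contributes nothing unless an environment is declared.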
a38c8b39f1 Merge pull request 'feat(multi-env): Phase D1 — provision erp-sandbox Postgres DB + role' (#17) from claude/phaseD-erp-sandbox-postgres into main 2026-06-28 17:09:45 +02:00
00a838799b feat(multi-env): Phase D1 — provision erp-sandbox Postgres DB + role
Activates the sandbox environment for the ERP on the Postgres side
(ADR-0002 Phase D). `erp` gains `envs = ["prod", "sandbox"]`, so the
elision flatten now materialises a second instance `erp-sandbox`:
  - database `erp-sandbox`
  - owner role `erp_sandbox_role` (snake-case per the convention)
  - pgbouncer user_lookup function + grants for the new DB

The prod `erp` instance is unchanged (db `erp`, role `erp_role`) — the
apply is purely additive (~6 resources for erp-sandbox, 0 changed,
0 destroyed on everything else). Verified the flatten output with a
standalone tofu apply before pushing.

This is D1 of the Phase D activation. D2 (tools Vault policies),
D3 (erp iac creds + KV), D4 (ArgoCD Application) follow in order.

Refs ADR-0002 (factory#15), Phase B (factory#16).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-06-28 17:05:50 +02:00
235ff72ac0 Merge pull request 'feat(multi-env): Phase B — factory machinery env-capable (no activation)' (#16) from claude/multi-env-phaseb into main 2026-06-28 16:53:39 +02:00
c00c4cdd5c feat(multi-env): Phase B — make factory machinery env-capable (no activation)
ADR-0002 Phase B. Makes postgres/iac, argocd, and the conventions docs
multi-environment-capable WITHOUT activating any sandbox yet — every app
stays prod-only, so this change is behaviour-neutral:
  - postgres/iac `tofu plan` is a no-op (proven: the elision flatten keys
    are bare app names, db=<app>, role=<app>_role — identical addresses)
  - the argocd apps.yaml render is byte-identical (181→181 lines, diff
    empty) since no app declares `envs`

postgres/iac:
- variables.tf: `applications` becomes set(object({name, envs=optional(["prod"])}))
- main.tf: a `local.app_instances` flatten of applications × envs keyed by the
  elided instance id (env=prod → "<app>"); per-app resources iterate it and
  reference each.key / each.value.{database,role}. For prod-only apps every
  resource address + attribute is unchanged. (main.tf also got a full
  `tofu fmt` pass — the pgbouncer function block reindents 4→2 spaces, which
  is cosmetic; the correctness gate is the CI tofu plan, not the text diff.)
- terraform.tfvars: string entries → { name = "..." } objects.

argocd/templates/apps.yaml:
- after the prod Application, a `range $app_attr.envs` loop renders one extra
  Application per non-prod env: name/namespace `<app>-<env>`, shared repoURL,
  helm.valueFiles [values.yaml, values-<env>.yaml], per-env syncPolicy override.
  Renders nothing while no app sets `envs` → prod render unchanged.

docs:
- doc/runbooks/new-web-app/conventions.md (FR, authoritative): new section
  "Plusieurs environnements pour une même app" — elision rule, suffix rule,
  snake-case owner-role exception, erp/erp-sandbox table, ADR-0002 link.
- vibe/guidebooks/lab-ecosystem/naming-conventions.md (EN mirror): the env
  coordinate section + a "Two sandbox models" section reconciling the
  separate-cluster (ADR-0001, names repeat) vs in-cluster sibling (ADR-0002,
  <env> suffix) strategies; Last Updated bumped; ADR-0002 cross-links.

Activation (erp gets envs=["prod","sandbox"] in postgres tfvars + argocd
values + erp/iac) is Phase D, gated by its own plan review.

Refs ADR-0002 (factory#15). Phase A = tools#2 (merged). Phase C = erp#11 (merged).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-06-28 16:28:28 +02:00
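The elision rule this commit introduces can be illustrated with a small Python sketch. `derived_names` is a hypothetical helper written for this example; it is not part of the factory repo. The name patterns are the ones given in the commit message and in the conventions table, and the sketch assumes environment names contain no hyphens.

```python
def derived_names(app: str, env: str = "prod") -> dict[str, str]:
    """Derive per-system names for an (app, env) pair.

    env == "prod" is elided, so a prod instance keeps the bare app name.
    """
    # Kebab-case instance id: "<app>" for prod, "<app>-<env>" otherwise.
    instance = app if env == "prod" else f"{app}-{env}"
    # The Postgres owner role is the one snake-case exception.
    role = f"{app}_role" if env == "prod" else f"{app}_{env}_role"
    return {
        "database": instance,
        "owner_role": role,
        "namespace": instance,
        "vault_db_creds": f"postgres/creds/{instance}",
        "vault_kv_config": f"kvv2/{instance}/config",
        "argocd_application": instance,
        "host": f"{instance}.arcodange.lab",
    }
```

Calling `derived_names("erp")` reproduces the existing prod names, which is why the change is a no-op for prod-only apps, while `derived_names("erp", "sandbox")` gives the `erp-sandbox` column of the table with `erp_sandbox_role` as the owner role.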
8a1a63ee10 Merge pull request 'docs(adr): ADR-0002 — per-application environments via an env coordinate' (#15) from claude/adr-multi-env into main 2026-06-28 16:17:37 +02:00
c35b510040 docs(adr): fill the ADR-0002 ↔ PR backlink (factory#15)
Replaces the placeholder References line with the PR URL so the
ADR↔PR crosslink is bidirectional per the AGENTS.md rule.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-06-25 14:56:09 +02:00
3961914613 docs(adr): ADR-0002 — per-application environments via an env coordinate
Records the decision to extend the <app> join key with a second
coordinate <env>, governed by an elision rule (env=prod elides → every
existing app's derived names are byte-identical and its tofu plan is a
no-op; non-prod envs take the <app>-<env> suffix, with the Postgres
owner role staying snake-case <app>_<env>_role).

Motivated by the ERP's incoming write-capable AI-agent skill: it needs
an in-cluster sandbox instance (erp-sandbox) with a prod-like Dolibarr
API + isolated database to rehearse writes before a human promotes them
to prod. The ADR reconciles this against ADR-0001 honestly — ADR-0001
rejected an in-cluster sandbox for INFRA-change rehearsal (shared
fleet-wide control planes); ADR-0002 operates one layer up where the
agent's only reach is the app's HTTP API against an isolated DB, so the
fleet blast radius is not in scope. The two are complementary; ADR-0002
does not supersede ADR-0001.

Also:
- vibe/ADR/README.md: index row for 0002 + Last Updated 2026-06-25
- PRD safe-prod-like-environment README: bidirectional back-link to
  ADR-0002 on the Adjacent line + Last Updated 2026-06-25

Authored via the ADR Scribe persona, validated via the Continuity Warden
checklist (no-tombstone, breadcrumb, MADR-lite sections, dead-link scan,
bidirectional links).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-06-25 14:55:19 +02:00
801724e1bc Merge pull request 'chore(iac): remove spent R2 import block' (#14) from arcodange/r2-import-cleanup into main 2026-06-24 13:24:09 +02:00
7727b244ad chore(iac): remove spent R2 import block
The one-time import block from the previous change reconciled
cloudflare_r2_bucket.arcodange_tf into state (run #29: "Import complete",
"Apply complete! Resources: 1 imported"). It is now a no-op, so remove it.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-24 13:23:42 +02:00
e2a79a08a7 Merge pull request 'fix(iac): import existing EU R2 bucket into state' (#13) from arcodange/r2-state-import into main 2026-06-24 13:19:56 +02:00
a0fbe5c655 fix(iac): import existing EU R2 bucket into state
Run #28 applied cleanly except cloudflare_r2_bucket.arcodange_tf: the bucket
exists in the EU jurisdiction, but its prior state entry lacked the jurisdiction,
so cloudflare provider >=5.20 read it as not-found, removed it from state, and
then failed to recreate it ("already exists"). Add a config-driven import block
with the jurisdiction-qualified id (<account_id>/<bucket_name>/<jurisdiction>) so
the next apply adopts the real bucket. No-op once reconciled; removable after.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-24 13:19:32 +02:00
fc28c52b85 Merge pull request 'fix(iac): pin cloudflare provider + lockfile, trust homelab CA in gitea provider' (#12) from arcodange/iac-provider-fixes into main 2026-06-24 13:03:16 +02:00
9b545e6f8f fix(iac): pin cloudflare provider + lockfile, trust homelab CA in gitea provider
With the runner CA fix (#11) the iac workflow now runs far enough to apply,
which exposed two provider problems:

cloudflare drift — `cloudflare/cloudflare` floated on `~> 5` with no committed
lock file, so CI pulled v5.21.1 where `cloudflare_account_token.policies[].resources`
is a JSON string, not a map ("Incorrect attribute value type"). Fix:
- pin to `~> 5.21` and commit a multi-platform `.terraform.lock.hcl`
  (linux_arm64 for the runner + darwin_arm64 for local);
- `jsonencode(...)` the module's policy resources;
- bind the cloudflare_token module to `cloudflare/cloudflare` explicitly (it was
  defaulting to `hashicorp/cloudflare`, pulling a redundant provider);
- stop `.gitignore` from hiding the lock file (the old `.terraform.*` rule did).

gitea provider TLS — it runs inside the dflook/terraform-apply container, which
doesn't trust the homelab CA (only the ubuntu-latest-ca runner does), so it
failed `x509: certificate signed by unknown authority` reaching
gitea.arcodange.lab. Fix: feed it the homelab CA via the provider's `cacert_file`
(TF_VAR_gitea_cacert_file -> the homelab.pem the workflow already materializes).

Validated locally with `tofu validate` + provider-schema inspection (no prod
calls). Complements #11. Out of scope (need a live run / operator): the OVH
consumer-key scope, and the R2 bucket "not found" on refresh (a state reconcile).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-24 12:56:46 +02:00
17 changed files with 588 additions and 68 deletions


@@ -62,6 +62,10 @@ jobs:
         run: echo -n "${{ secrets.HOMELAB_CA_CERT }}" | base64 -d > $VAULT_CACERT
       - name: terraform apply
         uses: dflook/terraform-apply@v1
+        env:
+          # the apply runs in dflook's container, which doesn't trust the homelab CA;
+          # hand the gitea provider the CA cert the step above wrote to the workspace
+          TF_VAR_gitea_cacert_file: "${{ github.workspace }}/homelab.pem"
         with:
           path: iac
           auto_approve: true

6
.gitignore vendored

@@ -1,5 +1,7 @@
-.terraform
-.terraform.*
+.terraform/
+*.tfstate
+*.tfstate.*
+# keep .terraform.lock.hcl tracked (it pins provider versions; the old `.terraform.*` rule hid it)
 .DS_Store
 node_modules/
 .venv/


@@ -31,4 +31,47 @@ spec:
     {{- end }}
     syncOptions:
       - CreateNamespace=true
+{{- /*
+Non-prod environments (ADR-0002 elision rule): one extra Application per env
+under `<app_attr>.envs`. Each renders the SAME repo + chart, overlaid with
+values-<env>.yaml, into the `<app>-<env>` namespace. Apps with no `envs` key
+render nothing extra here, so prod-only apps are byte-identical.
+*/ -}}
+{{- range $env_name, $env_attr := $app_attr.envs }}
+---
+apiVersion: argoproj.io/v1alpha1
+kind: Application
+metadata:
+  name: {{ $app_name }}-{{ $env_name }}
+  namespace: argocd
+  finalizers:
+    - resources-finalizer.argocd.argoproj.io
+  {{- with $env_attr.annotations }}
+  annotations:
+    {{- toYaml . | nindent 4 }}
+  {{- end }}
+spec:
+  project: default
+  source:
+    repoURL: https://gitea.arcodange.lab/{{ $org }}/{{ $app_name }}
+    targetRevision: HEAD
+    path: chart
+    helm:
+      valueFiles:
+        - values.yaml
+        - values-{{ $env_name }}.yaml
+  destination:
+    server: https://kubernetes.default.svc
+    namespace: {{ $app_name }}-{{ $env_name }}
+  syncPolicy:
+    {{- if $env_attr.syncPolicy }}
+    {{- toYaml $env_attr.syncPolicy | nindent 4 }}
+    {{- else }}
+    automated:
+      prune: true
+      selfHeal: true
+    {{- end }}
+    syncOptions:
+      - CreateNamespace=true
+{{- end }}
 {{ end }}


@@ -21,6 +21,11 @@ gitea_applications:
       argocd-image-updater.argoproj.io/telegram-gateway.update-strategy: digest
   erp:
     annotations: {}
+    # Non-prod environments (ADR-0002). Each key renders an extra Application
+    # "<app>-<env>" overlaid with chart/values-<env>.yaml into namespace
+    # "<app>-<env>". Prod erp is unaffected.
+    envs:
+      sandbox: {}
   cms:
     annotations:
       argocd-image-updater.argoproj.io/image-list: cms=gitea.arcodange.lab/arcodange-org/cms:latest


@@ -44,6 +44,29 @@ Les briques se « branchent » entre elles **par convention de nom**, pas par co
 **Use a short, stable, kebab-case name** from the start.
 **Do not introduce** variants (`my_app` vs `my-app`, `MyApp`, plurals): nothing will warn you, and the app will silently fail to connect or to deploy.
+## Multiple environments for the same app
+An application can be deployed several times (prod, sandbox, …) **without becoming a separate app**: same repository, same chart, same version. A second coordinate `<env>` is added to the name, governed by an **elision rule** ([ADR-0002](../../../vibe/ADR/0002-per-application-environments.md)):
+- **`env` defaults to `prod`, and `prod` is elided.** When `env == prod`, **no suffix** is added: every derived name is identical to the single-environment case described above. An existing app therefore does not change (empty `plan`).
+- **Non-prod environments take the `<app>-<env>` suffix** in kebab-case everywhere (database, namespace, Vault paths/roles/policies, ArgoCD Application, DNS host, GCS state sub-prefix), **with one exception**: the PostgreSQL owner role stays snake-case, `<app>_<env>_role`, to remain consistent with the `_role` suffix.
+- **A single repository and a single chart** serve all environments; differences are overlaid via `values-<env>.yaml`. **A single CI JWT role** (`gitea_cicd_<app>`) per repository covers all of its environments.
+Example: `erp` (prod, elided) vs `erp-sandbox`:
+| System | `erp` (env = prod) | `erp-sandbox` (env = sandbox) |
+|---|---|---|
+| PostgreSQL database | `erp` | `erp-sandbox` |
+| PG owner role | `erp_role` | `erp_sandbox_role` |
+| Namespace + ServiceAccount | `erp` | `erp-sandbox` |
+| Vault dynamic DB creds | `postgres/creds/erp` | `postgres/creds/erp-sandbox` |
+| KV config secret | `kvv2/erp/config` | `kvv2/erp-sandbox/config` |
+| ArgoCD Application | `erp` | `erp-sandbox` |
+| Internal domain | `erp.arcodange.lab` | `erp-sandbox.arcodange.lab` |
+| Gitea repository / chart / CI JWT | `arcodange-org/erp` · chart · `gitea_cicd_erp` | shared (same values) |
+Declaration: `postgres/iac/terraform.tfvars` and the `applications` list on the `tools` side accept `envs = ["prod", "sandbox"]`; omitting it is equivalent to `["prod"]`. The non-prod ArgoCD `Application` is declared via an `envs` key under the app in [argocd/values.yaml](https://gitea.arcodange.lab/arcodange-org/factory/src/branch/main/argocd/values.yaml).
 ## Cross-references
 - [01 · Gitea repository](01-gitea-repo.md): sets `<app>` as the repository name under `arcodange-org`.

174
iac/.terraform.lock.hcl generated Normal file

@@ -0,0 +1,174 @@
# This file is maintained automatically by "tofu init".
# Manual edits may be lost in future updates.
provider "registry.opentofu.org/cloudflare/cloudflare" {
version = "5.21.1"
constraints = ">= 5.20.0, ~> 5.21"
hashes = [
"h1:gNF1Sro3G9nXhtdkitXwDVKxI1jpBAf8KPv+Y4kAJwk=",
"h1:iWJb0lHfVWmCJQSyroXOT8zQlFOT8k1caHcfaooG5wk=",
"zh:049719425b8be43d9d4f0c208217aca0baa22374f061d7ff92f02563490f649c",
"zh:0a8a3c1b26680b437fe9e7910ca81e532d36f8efacfb14f45690b6a779856993",
"zh:32b61f80892243f7ab8e453fa038c1f3e2aac733ccb98307c2cfe798b2793b32",
"zh:42c27f3cd62979e70716c51f682a3d131d51ad76d86dff83d8cdbfffcebac841",
"zh:4c8cd464f9b6ecde5cd4430bbba4be3b810826105e51ef6328b6a2b69f821443",
"zh:586ea42ef74d6c5bc4c9b89da6b1f8618a19f4e80272fe8d615e7d5b11c491af",
"zh:b09b86c7cac7085e01c9b7a828f09d13c44589d3e3cd42f0b694ca3e4cd3ed0a",
"zh:eac80665e60c701b37a6318f4e405d67f1720f8da5f93135c6256049282d3367",
"zh:f809ab383cca0a5f83072981c64208cbd7fa67e986a86ee02dd2c82333221e32",
]
}
provider "registry.opentofu.org/go-gitea/gitea" {
version = "0.6.0"
constraints = "0.6.0"
hashes = [
"h1:DB9cn3EvZt6yEDAW/4s7clYOQhIwXQpSMQ+kDAK+o9Q=",
"h1:MTo8bBuGgh5t3u/UuBI6oMJ/pT7a3GwdyVS1i/aPsh8=",
"zh:16269c27d36157a9248cdb3acd4ee507950cb84bc0eaf843f74b302b3d194285",
"zh:20951c7b571853def841942499a161bc806325b5c7d17de3cb49516bfcab3863",
"zh:3d9a69119d4a76de25a4562d9ed87ea72773733b97bc98084a9ba7572c5124c4",
"zh:49a8fb4735c12169cb0f66e1dd286a3cc008ebc212e486a82758fe3c50456e52",
"zh:4daa6ce8136204aa60f47b519c2da0a551e9ed45fdc684cafb8c3170c106f5d3",
"zh:88df966ec884351492f1284fbc55a5c35d3723a863b58f1d9d98039e3b7bc7c6",
"zh:9b12af85486a96aedd8d7984b0ff811a4b42e3d88dad1a3fb4c0b580d04fa425",
"zh:a107ceacfca8341c4141574daacbbc6f91fb6414e0c541b27ac79948d4456ba9",
"zh:a3231c31c194f0606f01e06f44f74d37414f7a74e52452d6e93366ab2bcbcd4e",
"zh:a39af6f3dcb1d4fe6c19a8d64755a13fffb2febbec0be428291aa62c37295d17",
"zh:ac0c894d15a9c57e51ea67667fe9fdea0cf40dc7b97398e7e69fec03568801f7",
"zh:c3b70df30b8882b5d38e75b1709c292522a6f8a9bb226bc0a3258c4942c17c9d",
"zh:edd887c4eb5f721dcf6e288b0e1e599c2a7742135c1b3ab5493957f9a8f9dbe1",
"zh:f60b1f57123d11109caeff030c1c4456eed659bb88ec60b8d01b09c6a6954f00",
"zh:f7086eec6d90c2c0bd6385b7cccc80fffb5ce4300736f4bb982d70bf6eeecf48",
]
}
provider "registry.opentofu.org/hashicorp/google" {
version = "7.0.1"
constraints = "7.0.1"
hashes = [
"h1:kYx0VRlMuHcgOxEfbvORwTVGH+3WQUJJJJf1+PNh3k8=",
"h1:n9AyrMUKkTDkmfy1UBwaOh2ANepQ8i3Oa8ILLS6oaMI=",
"zh:0c1f204c23de0d63a5e3bf993a7f12d0b594f6a8020ef6dbac4ab711b2fc22d3",
"zh:2578d65af13c8b1971e6fb7c4725bbf93284c1e46a39d6528ec2323c17c84fb8",
"zh:3555b358d6c029929109fe629192ae19599d4efe1fee86d497d58b692a9313dd",
"zh:4475bc4fd37a51c962e5268a4ad65e059bcb074e5e0a9bac0d092bc23fda0927",
"zh:49af845bb5e1117bcb8885b9ecd4cce37dee00b43a1a08617392239c74398e8d",
"zh:ad5128adc7f3f1cb8ffbfdf98c1295c54e65da6d1e59849671081aac5caca01f",
"zh:c30baca3b476ea7ae9ad11f81ea85e8113b7f51ec21b4d6239142556131ada68",
"zh:c6de66d3674adc23abc65a3eea09829e9afbb1864aa563b140e1e5207671279e",
"zh:e9dda7a294a0c8f972c7ba20861be2c6fa7ee4c3c86550952e3a9199efd95d0e",
"zh:efdc977432a7bfe77a50dccdf1e890a7d0d9a8fb75dcd3a963cd0416ce175e8d",
]
}
provider "registry.opentofu.org/hashicorp/null" {
version = "3.3.0"
hashes = [
"h1:EvvCOc4FJY3NitSm6BpzCcUPU53LayVCB/tPOxYmy7U=",
"h1:mdu+qpyVmjDDLMrcL1JFy+cSyF58I3TFJwB5NssCZ58=",
"zh:083dcc0bec53f8abfa3f2aa2ce9d732a9675338fd60ae7d61162e25db7cb08bf",
"zh:19f7456b5a2ad16595860974714bfdb25b87bc16356ea9d5c7453892aaa27864",
"zh:222c0ed1fed4e4c677ebe626104dbfdba66763e264de0d9c27c58ce60104ee69",
"zh:271711d6caa7dd5a4e9b79fe8c679fab61a840bcf80040a0f5ebb425d1b27d97",
"zh:5adcf35f30baaea13f80c2a2c774deb9369892719493049687e23476c9dff40f",
"zh:5bcfd19df16e73d7f0ad75bd09e2b3b86cf6700d09822d585d68304b71de1d97",
"zh:604edecf263e38674decb35bb4e0e048fdc951f26fa103c33065ff9728f0313b",
"zh:782acbfb4fa4807e273e588fe45b4aaea9dd0fd1136f76ec3200f6f4db3af8d6",
"zh:84411a596d528fe67294e5c1cfd0c2036b08802497bcc4215ce518924f3c9a4a",
"zh:85e79eecf3f5348975cffec3016b0eba3baf605646102d4348796ccd2df2e5f6",
"zh:95669535ca17aeefef307ebfd59ce6930953173baae5637e8cbbf0297ec7ad58",
"zh:d04d9b177747bfd66b4a45b5d911a2a7822aa8451f5e35621971fb7a4206b530",
"zh:e6d9c924475283e90833450a14a732f4deb6d9bb131db8f86ab856e894270836",
"zh:ebcab0c8a1334c86ed7cfa53f571a17ad6d27e9901f27a8854ea622a74b54bb6",
"zh:ef9c757bb2c83d2103811a3d86b6ec5be06b0ffc337b84db1582d023bce7cdcd",
]
}
provider "registry.opentofu.org/hashicorp/random" {
version = "3.9.0"
hashes = [
"h1:U8KXqGCoNI9/guYbTvzgdtVk3fRthoG0UXwm1JoEpIs=",
"h1:gGDdPPibmw2EWROx+sh1RGLjR5+nPwZyrf6/N9jXfeM=",
"zh:03f1114cc20b8913523735ab76e0f0a2b16ce13c92923a53304bf85f07fc0dbc",
"zh:105b678ee72322a3067f105d7e05e940f6143238f377f6e87ff4ec909246ac2a",
"zh:55f3bbf13ea18cbace61a706566a80f25f33fe2b1780b6f3d7b582af2a05b6d2",
"zh:63adf996db48f082f7a6351eb485e219cd88795fc71e6ec60a837263ab0d2cb1",
"zh:7e99550738a4e3cc68b8a467714b0d69371025fe95e3326d5323d026d55653e9",
"zh:8342b54af3a18a37e075eeae61be57f4de2ba71b35d95c5075d402dd2c1f289d",
"zh:83ee18e32ac9dd5fc91298554b7c4cfa4c3a1db50f4c797945637cc93c0844ae",
"zh:993ecc0adbf6bd535a59fbc9b735d8c33950e6f6eb5e621d750da9b71d65d80a",
"zh:ad722bc59d4edbf1415e827fc007c0efe6e0e9462d5568bae20b34be1058a261",
"zh:ae9448e1f87b2f9a6c5197a0e9862162ec6b137cb3a3835e11522995d8939e7c",
"zh:bc9cdd3aac784f759125c6627f6f6416e8726a1c184eb9cf3e55b9edbc94c627",
"zh:c8e35b89572ba1c40a9b20022e033a3395fb8d42e7604d50c900f193ba10382e",
"zh:e2deaa8a9975ef81d9f62baed12c41286918b0a10908e0e031f13f69a3b730a1",
"zh:ee39707557210a0ab1098aa357d2cdfe502e5a312d0dbdffb09d08facc4d3fc5",
"zh:f81afe4eb63e8aa9e0ea71be6c990f0dc69cb360e7191c0742a991f4a5081b64",
]
}
provider "registry.opentofu.org/hashicorp/tls" {
version = "4.3.0"
hashes = [
"h1:GizReb5vbh71HnhHlGphHhVFj3ghwAaC2MKqb2d8Ye8=",
"h1:ZxKvDInYHzss9rv75M778pInFm08ME6hY31XMyFP4IA=",
"zh:07bb8c6e64124dada7dff57a38a46f2f323b3fd77920404c0c550293d1cf6188",
"zh:0b3bfda2df39c52f1c5452d05cf3107bedd5d20ab6977c90ede540c695fb6c3e",
"zh:110a055289f0400a63ac172bedb0e671d059b7a5ba22d4a3f5f246ccac0ad676",
"zh:15e532d8c711377499dece832e60170a8bef39830125b8154f4bda81d9721d29",
"zh:22ca65d96e9fc1be5605372d855c9e1eba2d86d510f7ac8593968f5649435e47",
"zh:36df38dfd03e8c1298c5704fd85e28b69a3927ed0b339f9628d0b56dac99c6b5",
"zh:429e2bfcb81656e1fe90b7b284767d1453c1a4100b16d27e4b29c34aa12f0ce1",
"zh:5b6679953065f0279bf018426c6fb06dd93a851a7a9369f2e3a1fec5bc417e83",
"zh:6a72c88d5aa945ddb32041350755377c96681563136decfe7e05c7cdea7988f1",
"zh:6f05757c50da9f8354a735b5756bd63a71126fcd142129525b90c56bfd081d61",
"zh:751703b7a4d40c3a111c4ed0d5da3ec91c14f880faf6f010a5000a2eb5366011",
"zh:87a5279e61b8198798a2fe86cfe3b74e5340bb486f4e148bb5b4d46f860cf1db",
"zh:942af95e9fd73327a7e9ab0803c4d701b782ddacd78c9b7ce9c91e38b3051522",
"zh:a457d0efea3c404178a182d240ba21cdeb0c620ffabeeb9a8977b024a85e1360",
"zh:d5eac8f4f0ae1ff41cbcc1008e6a74a8491dc27f4c6e5a0c32c5c4b6ef2e4087",
]
}
provider "registry.opentofu.org/hashicorp/vault" {
version = "4.4.0"
constraints = "4.4.0"
hashes = [
"h1:IhKDv0pTgpy89K3QYmDX872H75Wl7kZKR2scUQynuiA=",
"h1:t74F5RJkOMm0N/PbcvxPGyi0V1hwHjuOv0lFZ7lII6c=",
"zh:0309ea8f81386e17ab13c06c5991ca959708c55c815b0cfba2bbcd865e0d606e",
"zh:40e56199ccd266bffa216e8ebbcdc2e29b6ef5145b39377be766e763cac759c8",
"zh:6fad1f073bd2e53e34736e000f98db581137e153ac80bbb5c4f1a1e38b46a1d2",
"zh:74564fd4759decccf7f3c952aa2feba1012f103a66ec354aa3b3292a2f1b2412",
"zh:7aae012c1a43e6e5dae6f608ec0f08cdb3f95fa121a32e413fe7ee37cb99947f",
"zh:7c83f508e164844b1dd9bafe9de0fe60c7be7b55a02e704a6e2f50cff38b7d96",
"zh:873a42322b68d9fba4a38217b97ee04a1eb617e811d7f9954016f5c3eb6cb0bc",
"zh:9db2b13472cf91a5f18f0a7c6ae532277c05b0980d87f492341426b981679f7b",
"zh:ac1cbd2926265db80efe3f1814bed82901f7d8a7d4e5b1e22592e1eef234b1c7",
"zh:f465a955cc96f640e7426a648ba672c169a4a2959bad6146fe61583d67642561",
]
}
provider "registry.opentofu.org/ovh/ovh" {
version = "2.8.0"
constraints = "2.8.0"
hashes = [
"h1:wfhxUnZfCPsc6veiUOkEBXwyvF9ZGi2SwR85cp2CUws=",
"h1:zbnPL6Y4k/dY1X2u2JVyTid5hXwcCIfz65VC9UbkDrE=",
"zh:026d6590900388d8845af9d99a438e3cd90fcf50ef5f95a24b9dc646f391aa5c",
"zh:1375f3947bbdfe19c05abf0dbc0cb6f319d79976909282a269f4eb934a67fb18",
"zh:13cc7536d366935cb31b89f2b714c5ac8eac7e825e6897477fe56caebb04992e",
"zh:388696109f5f03c95775407df10dca822d0651237872a579fe7e953312a75ff6",
"zh:3ca9fd5e6756fe9f448066f74e7d6d7de5e7c0f34f923032d3a976ab6772a86a",
"zh:43ab0d8e362e2b22cac53747f609798de9e267a3eceaa66146b36e8ed6b16a98",
"zh:456d80cf53e21258d4df1a239ba3f7b1482631e558497cd797fafd25f8eea3ca",
"zh:54d46a83305120a9331b1dc12e6039b895b5285434bb96904d30f1fe277bbde7",
"zh:5b6b2628ef1a00579e769d7f67482fb8b59534f8761b399e7baf683e716e5d88",
"zh:68e6df5c16b92601d4545739855ec309b1ce7fce6597d8d6e4776357a5da7a7c",
"zh:80745afe134180fc441cc1c34c3a9ea20756f01ae793ba625255ce92817f5f5d",
"zh:a81a6896e60526588f8d16168d06018842c083ff5a1d73193cf7e9b26c3a4076",
"zh:ce68d4e6ca846f5e97de06fce5a4d6aca16154ddd8cf43580fd89b581e1ee471",
"zh:e498f560263abebf96a2cc698492b603c5a78851f77235d141c1ee7336ab866c",
]
}


@@ -41,13 +41,13 @@ locals {
     length(local.selected_account_permissions) > 0 ? {
       effect = "allow"
       permission_groups = [for id in local.selected_account_permissions : { id = id }]
-      resources = local.account_resource
+      resources = jsonencode(local.account_resource) # cloudflare provider >=5.20 types policies[].resources as a JSON string
     } : null,
     length(local.selected_bucket_permissions) > 0 ? {
       effect = "allow"
       permission_groups = [for id in local.selected_bucket_permissions : { id = id }]
-      resources = local.bucket_resource
+      resources = jsonencode(local.bucket_resource) # cloudflare provider >=5.20 types policies[].resources as a JSON string
     } : null
   ] : policy if policy != null]


@@ -0,0 +1,12 @@
# Bind the module's cloudflare_* resources to the cloudflare/cloudflare provider explicitly.
# Without this, OpenTofu defaults the module's provider source to hashicorp/cloudflare, pulling a
# second (redundant) provider into the lock file and relying on a registry redirect.
# >= 5.20 because policies[].resources is now a JSON string (set via jsonencode in main.tf).
terraform {
required_providers {
cloudflare = {
source = "cloudflare/cloudflare"
version = ">= 5.20"
}
}
}


@@ -14,7 +14,7 @@ terraform {
     }
     cloudflare = {
       source = "cloudflare/cloudflare"
-      version = "~> 5"
+      version = "~> 5.21" # pinned + .terraform.lock.hcl committed to avoid silent v5.x drift
     }
     ovh = {
       source = "ovh/ovh"
@@ -23,8 +23,18 @@ terraform {
   }
 }
+variable "gitea_cacert_file" {
+  # The gitea provider runs inside the dflook/terraform-apply container, which does NOT trust the
+  # homelab CA (unlike the ubuntu-latest-ca runner). Point it at the CA the workflow already writes
+  # so it can verify https://gitea.arcodange.lab. Set via TF_VAR_gitea_cacert_file in CI; null locally.
+  description = "Path to the homelab CA cert for the Gitea provider (set in CI). Null = use system trust."
+  type = string
+  default = null
+}
 provider "gitea" { # https://registry.terraform.io/providers/go-gitea/gitea/latest/docs
   base_url = "https://gitea.arcodange.lab"
+  cacert_file = var.gitea_cacert_file
   # use GITEA_TOKEN env var
 }


@@ -1,3 +1,21 @@
locals {
# Flatten applications × envs into per-instance objects, keyed by the elided
# instance id (ADR-0002 elision rule): env=prod → "<app>", else "<app>-<env>".
# The Postgres owner role stays snake-case: "<app>_role" (prod) / "<app>_<env>_role".
# For a prod-only app the key equals "<app>", database equals "<app>", and role
# equals "<app>_role" — identical to the previous set(string) for_each, so every
# resource address and attribute is unchanged (a no-op plan).
app_instances = merge([
for app in var.applications : {
for env in app.envs :
(env == "prod" ? app.name : "${app.name}-${env}") => {
database = env == "prod" ? app.name : "${app.name}-${env}"
role = env == "prod" ? "${app.name}_role" : "${app.name}_${env}_role"
}
}
]...)
}
 resource "random_password" "credentials_editor" {
   length = 24
   override_special = "-:!+<>"
@@ -24,27 +42,27 @@ resource "vault_kv_secret" "postgres_admin_credentials" {
 }
 resource "postgresql_role" "app_role" {
-  for_each = var.applications
-  name = "${each.value}_role"
+  for_each = local.app_instances
+  name = each.value.role
   login = false
 }
 resource "postgresql_grant_role" "credentials_editor_app_role" {
-  for_each = var.applications
+  for_each = local.app_instances
   role = postgresql_role.credentials_editor.name
-  grant_role = postgresql_role.app_role[each.value].name
+  grant_role = postgresql_role.app_role[each.key].name
   with_admin_option = true
 }
 resource "postgresql_database" "app_db" {
-  for_each = var.applications
-  name = each.value
-  owner = postgresql_role.app_role[each.value].name
+  for_each = local.app_instances
+  name = each.value.database
+  owner = postgresql_role.app_role[each.key].name
   template = "template0"
   alter_object_ownership = true
 }
 resource "postgresql_function" "pgbouncer_user_lookup" {
-  for_each = var.applications
+  for_each = local.app_instances
   name = "user_lookup"
-  database = postgresql_database.app_db[each.value].name
+  database = postgresql_database.app_db[each.key].name
   arg {
     mode = "IN"
     name = "i_username"
@@ -73,25 +91,25 @@ resource "postgresql_function" "pgbouncer_user_lookup" {
   security_definer = true
 }
 resource "postgresql_grant" "pgbouncer_user_lookup_public_revoke" {
-  for_each = var.applications
-  database = postgresql_function.pgbouncer_user_lookup[each.value].database
+  for_each = local.app_instances
+  database = postgresql_function.pgbouncer_user_lookup[each.key].database
   role = "public"
   schema = "public"
   object_type = "function"
   objects = [
-    postgresql_function.pgbouncer_user_lookup[each.value].name,
+    postgresql_function.pgbouncer_user_lookup[each.key].name,
   ]
   privileges = []
 }
 resource "postgresql_grant" "pgbouncer_user_lookup" {
-  depends_on = [ postgresql_grant.pgbouncer_user_lookup_public_revoke ] # can't do both in parallel
-  for_each = var.applications
-  database = postgresql_function.pgbouncer_user_lookup[each.value].database
+  depends_on = [postgresql_grant.pgbouncer_user_lookup_public_revoke] # can't do both in parallel
+  for_each = local.app_instances
+  database = postgresql_function.pgbouncer_user_lookup[each.key].database
   role = "pgbouncer_auth"
   schema = "public"
   object_type = "function"
   objects = [
-    postgresql_function.pgbouncer_user_lookup[each.value].name,
+    postgresql_function.pgbouncer_user_lookup[each.key].name,
   ]
   privileges = ["EXECUTE"]
 }

@@ -1,7 +1,7 @@
 applications = [
-  "webapp",
-  "erp",
-  "crowdsec",
-  "plausible",
-  "dance-lessons-coach",
+  { name = "webapp" },
+  { name = "erp", envs = ["prod", "sandbox"] },
+  { name = "crowdsec" },
+  { name = "plausible" },
+  { name = "dance-lessons-coach" },
 ]

@@ -1,3 +1,11 @@
 variable "applications" {
-  type = set(string)
+  # Multi-env (ADR-0002): each application declares the environments it deploys to.
+  # `envs` defaults to ["prod"] so every existing entry is unchanged in behaviour —
+  # by the elision rule the prod instance keeps the bare `<app>` identifiers, so its
+  # database, owner role, and all derived resources keep their exact current names
+  # and Terraform addresses (a no-op plan).
+  type = set(object({
+    name = string
+    envs = optional(list(string), ["prod"])
+  }))
 }

@@ -0,0 +1,97 @@
[vibe](../README.md) > [ADR](README.md) > **0002 · Per-application environments**
# ADR-0002: Per-application environments via an env coordinate
> **Status**: Accepted
> **Date**: 2026-06-25
> **Deciders**: @arcodange
## Context
The [`<app>` join key](../../doc/runbooks/new-web-app/conventions.md) threads one kebab-case identifier identically through every system that makes up an application: the Gitea repo, the Postgres database + `<app>_role`, Vault (`postgres/creds/<app>`, the k8s auth role `<app>`, the policies `<app>` / `<app>-ops`, the CI JWT role `gitea_cicd_<app>`), the k8s namespace + ServiceAccount, the ArgoCD Application, the GCS state prefix `<app>/main`, and DNS (`<app>.arcodange.lab`). Bricks wire together by name convention, not explicit config.
That convention conflates two ideas it never separated: an **application** and a **deployed instance** of it. There is exactly one of everything per app — one namespace, one database, one Vault creds path, one DNS host. The model cannot express "the same app, a second time, somewhere else."
The motivating need makes the gap concrete. The Arcodange Dolibarr ERP is growing a write-capable AI-agent skill — auto-creating supplier invoices from ingested emails, fixing thirdparty data, and similar mutations. Before such writes touch the production accounting database, the operator needs a place where the agent can run write operations autonomously, a human reviews the result, and only then the same operation is promoted to prod. That requires a **second deployed instance of the same application**: the same Dolibarr chart, the same version, the same conventions — differing only in *where* it runs and *which data* it touches.
| Force | Pressure it creates |
| --- | --- |
| One identifier per app, no env coordinate | "Same app, different environment" is inexpressible without inventing a whole second app. |
| Write-capable AI agent landing on the prod ERP | A wrong autonomous write corrupts live accounting data with no rehearsal surface. |
| Fidelity requirement for the rehearsal surface | The sandbox must run the *real* Dolibarr API against *prod-like* data, or the rehearsal predicts nothing. |
| [ADR-0001](0001-safe-prod-like-environment.md) rejected an in-cluster sandbox | Its Alternative 3 ("sandbox namespace on the real cluster") was rejected for shared blast radius — so any in-cluster sibling instance must be reconciled against that, not pretended away. |
Treating the sandbox as a wholly separate app would fork the chart, the repo, the runbook chain, and the Vault wiring — four things that then drift apart over time, defeating the "same app, same version" fidelity the rehearsal depends on.
## Decision
We will extend the `<app>` convention with a second coordinate, `<env>`, governed by an **elision rule** so that adding the coordinate changes nothing for any existing app.
- **`env` defaults to `prod`, and `prod` elides.** When `env == prod`, no suffix is added: every derived name is character-for-character identical to today's single-env output. The instance name equals the app name (`local.instance == local.name`), so every existing app's `tofu plan` is a no-op.
- **Non-prod envs take the `<app>-<env>` suffix** in kebab-case everywhere — namespace, Vault paths / roles / policies, ArgoCD Application, DNS host, GCS-state sub-prefix — with one exception: the Postgres owner role stays snake-case as `<app>_<env>_role`, matching the existing `_role` suffix convention.
- **One repo and one chart serve every env of an app.** Per-env differences are overlaid via `values-<env>.yaml`; the chart's instance-specific values are `.Values`-driven, not hardcoded literals, so the same chart renders any instance.
- **One CI JWT role (`gitea_cicd_<app>`) per repo covers all its envs.** Its ops policy is widened to the `<app>-*` path family. Each running instance keeps its own runtime Vault policy.
### Worked example: `erp` and `erp-sandbox`
| Coordinate | `erp` (env = prod, elided) | `erp-sandbox` (env = sandbox) |
| --- | --- | --- |
| Postgres database | `erp` | `erp-sandbox` |
| Postgres owner role | `erp_role` | `erp_sandbox_role` |
| k8s namespace + ServiceAccount | `erp` | `erp-sandbox` |
| Vault dynamic DB creds | `postgres/creds/erp` | `postgres/creds/erp-sandbox` |
| Vault KV config | `kvv2/erp/config` | `kvv2/erp-sandbox/config` |
| ArgoCD Application | `erp` | `erp-sandbox` |
| Internal DNS | `erp.arcodange.lab` | `erp-sandbox.arcodange.lab` |
| Gitea repo | `arcodange-org/erp` | `arcodange-org/erp` (shared) |
| Helm chart | one chart | one chart (shared) |
| CI JWT role | `gitea_cicd_erp` | `gitea_cicd_erp` (shared) |
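
The elision rule and the table above can be restated as a small derivation function. This is a minimal sketch in Python rather than the HCL the modules use; the function name `instance_names` and the dictionary keys are hypothetical, and only the name patterns themselves come from this ADR.

```python
def instance_names(app: str, env: str = "prod") -> dict[str, str]:
    """Derive per-instance names under the elision rule (illustrative only)."""
    # prod elides: the instance name is the bare app name.
    instance = app if env == "prod" else f"{app}-{env}"
    # The Postgres owner role is the one snake-case exception.
    role = f"{app}_role" if env == "prod" else f"{app}_{env}_role"
    return {
        "database": instance,
        "owner_role": role,
        "namespace": instance,
        "vault_db_creds": f"postgres/creds/{instance}",
        "vault_kv_config": f"kvv2/{instance}/config",
        "argocd_application": instance,
        "dns": f"{instance}.arcodange.lab",
        "ci_jwt_role": f"gitea_cicd_{app}",  # shared by every env of the repo
    }


if __name__ == "__main__":
    for env in ("prod", "sandbox"):
        print(env, instance_names("erp", env))
```

Calling `instance_names("erp")` reproduces the prod column and `instance_names("erp", "sandbox")` the sandbox column, which is the property the no-op plan gate relies on.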
### Why this is not what ADR-0001 rejected
[ADR-0001](0001-safe-prod-like-environment.md) chose a **local-only** safe environment (k3d / arm64 VMs) and rejected its Alternative 3, an in-cluster "sandbox namespace on the real cluster," for shared blast radius. ADR-0002 introduces an in-cluster sibling instance (`erp-sandbox`), which looks like the very thing that was rejected. The two stand together because they operate at **different layers**.
ADR-0001's rejection is scoped to rehearsing **infrastructure / platform** change-classes — Ansible playbooks, Vault policy / auth / mount changes, Postgres superuser migrations, ArgoCD prune / selfHeal, Longhorn ops, DNS / email. Those couplings share fleet-wide control planes, so an in-cluster sandbox cannot isolate them; only a separate cluster + Vault + state + DNS zone can. That is exactly why ADR-0001 is local-only.
ADR-0002 operates one layer up. The AI agent's only reach is the **Dolibarr HTTP API**, holding a write-scoped, app-specific API key against an isolated database — `erp-sandbox` on its own `erp_sandbox_role`, its own namespace, its own Vault creds path. The agent never touches kubectl, the Vault root, the Postgres superuser, ArgoCD, Longhorn, or DNS. The fleet-level blast radius that doomed Alternative 3 for infra rehearsal is simply **not in the agent's reach**; the blast radius of a wrong AI write is bounded to the sandbox app's own data.
The two ADRs are therefore complementary, not contradictory, and ADR-0002 does not supersede ADR-0001. ADR-0001 isolates the *operator* from breaking the *fleet*. ADR-0002 isolates the *AI agent* from corrupting *one app's production data*, while preserving the prod-like API surface and real-data fidelity that the local k3d sandbox — which carries no prod data — cannot offer.
## Consequences
- **+** Every existing app (webapp, erp, crowdsec, plausible, dance-lessons-coach, cms) is unaffected: the elision rule makes the prod instance's derived names byte-identical, so adoption ships with zero migration and a no-op plan.
- **+** A second instance of an app is now a `values-<env>.yaml` overlay plus an `envs` entry — not a forked repo, chart, and runbook chain — so prod and sandbox share one source of truth and stay on the same version by construction.
- **+** The AI-agent write skill gets a prod-like rehearsal surface with real-shaped data: the *same* Dolibarr API and chart, an *isolated* database, a bounded blast radius.
- **+** The convention chain (db + role → Vault creds + policy → namespace + SA → ArgoCD → DNS) is reused verbatim for the `-sandbox` instance, so runbooks read identically for any env.
- **−** Names are no longer a flat app list: every consumer must reason about the `instance == app` (prod) versus `app-env` (non-prod) distinction, and the snake-case owner-role exception (`<app>_<env>_role`) is a special case that must be carried in the modules.
- **−** A single shared Vault CI policy widened to `<app>-*` means the CI role for a repo can write the ops paths of *all* that repo's envs, a deliberately looser ops scope than one-policy-per-instance.
- **−** A single shared OpenTofu state per repo holds every env's resources together, so the envs of one app share a blast radius at the state layer (mitigated by `for_each` and accepted at current scale; see Alternatives).
- **→** The AI-agent promotion workflow this unlocks: the agent runs writes against `erp-sandbox` autonomously, emits a structured changeset, a human reviews it, and the **same** operation is re-applied to prod only with explicit confirmation — never auto-applied by the agent. The read/write skills resolve their target by an env switch (e.g. `DOLIBARR_TARGET=prod|sandbox`, defaulting to `prod`).
- **→** Rollout is additive and phased, each phase gated by a no-op `tofu plan` against existing apps: **(A)** the `tools` repo adds an optional `env` / `envs` parameter to the shared `app_roles` and `app_policy` Vault modules; **(B)** the `factory` repo gains the `envs` schema in `postgres/iac` tfvars, renders one ArgoCD Application per env, and documents the elision rule in `conventions.md`; **(C)** the `erp` chart literals are templated to `.Values`; **(D)** `erp` + `factory` activate `erp-sandbox`; **(E)** DNS + ArgoCD registration.
- **→** Per-env state separation (`<app>/<env>` prefixes) is a door left open: if env-to-env blast-radius isolation at the state layer becomes warranted, the prefix scheme can be revisited without changing the naming model.
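
The env switch named in the consequences above could resolve its target along these lines. This is a sketch under assumptions: the `TARGET_HOSTS` mapping and the `resolve_target` helper are hypothetical, and only the variable name `DOLIBARR_TARGET`, its two values, its `prod` default, and the two hostnames come from this ADR.

```python
import os

# Hosts taken from the worked example; the mapping itself is an assumption.
TARGET_HOSTS = {
    "prod": "erp.arcodange.lab",
    "sandbox": "erp-sandbox.arcodange.lab",
}


def resolve_target(environ=None) -> str:
    """Return the Dolibarr host selected by DOLIBARR_TARGET (default: prod)."""
    environ = os.environ if environ is None else environ
    target = environ.get("DOLIBARR_TARGET", "prod")
    if target not in TARGET_HOSTS:
        # Fail closed on typos instead of silently falling back to a host.
        raise ValueError(f"unknown DOLIBARR_TARGET: {target!r}")
    return TARGET_HOSTS[target]
```

Rejecting unknown values keeps a misspelled target from resolving to any host at all.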
## Alternatives considered
| Option | Why not |
| --- | --- |
| Treat `erp-sandbox` as a wholly separate `<app>` (own repo, own chart copy) | Forks the chart, the repo, and the runbook chain; the two copies drift over time; defeats the "same app, same version" fidelity the rehearsal depends on. |
| Use the [ADR-0001](0001-safe-prod-like-environment.md) local-only sandbox (k3d / VMs) for the AI-agent writes | That environment carries **no production data** — the write-rehearsal needs prod-like data and the real Dolibarr API surface to be meaningful. Complementary to ADR-0001, not a substitute for it. |
| Per-env OpenTofu state (`<app>/<env>` prefixes) instead of one shared state per repo | Buys more env-to-env blast-radius isolation, but at the cost of more CI plumbing and cross-env output wiring than current scale warrants; one shared state with `for_each` keeps runbooks simple. A real decision point — the chosen path is single shared state per repo, with the prefix scheme left as a future door. |
| No elision — always suffix, even prod (`<app>-prod`) | Breaks every existing derived name, forcing a fleet-wide rename plus `tofu` resource moves; rejected in favour of the elision rule's zero-migration property. |
## QA & validation
- **Backwards-compat no-op gate** — after the module change, `tofu plan` against every existing app (webapp, erp, crowdsec, plausible, dance-lessons-coach, cms) reports zero changes. The elision rule guarantees `local.instance == local.name` for `env == prod`, so no prod resource moves.
- **Byte-identical chart render** — `helm template erp chart/` before versus after the literal-templating refactor diffs to nothing (verified: 10857 bytes on both sides, `diff` exit 0).
- **`tofu fmt -check` + `tofu validate`** are clean on the module changes.
- **Sandbox activation gate** — when `erp-sandbox` is stood up, the [new-web-app convention chain](../../doc/runbooks/new-web-app/conventions.md) must resolve end to end for the `-sandbox` instance (db + role → Vault creds + policy → namespace + SA → ArgoCD Healthy/Synced → VSO injects → pod Running), exactly as the prod instance does.
- **Promotion gate** — no AI-authored write reaches the prod ERP until it has been applied to `erp-sandbox`, produced a reviewed changeset, and been explicitly re-applied with human confirmation.
## References
- [ADR-0001 · Safe, production-like environment](0001-safe-prod-like-environment.md) — the local-only safe environment for infra rehearsal that this ADR complements (it stands; this does not supersede it).
- [PRD · Safe, production-like environment](../PRD/safe-prod-like-environment/README.md) — the product view this work relates to, and its [isolation-boundary leaf](../PRD/safe-prod-like-environment/isolation-boundary.md) detailing the cluster/Vault/state/DNS boundary.
- [new-web-app conventions](../../doc/runbooks/new-web-app/conventions.md) — the single-env `<app>` convention this ADR extends with the env coordinate.
- [Phase A — `tools` Vault module env parameter](https://gitea.arcodange.lab/arcodange-org/tools/pulls/2) — adds the optional `env` / `envs` parameter to the shared `app_roles` and `app_policy` modules.
- [Phase C — `erp` chart literal templating](https://gitea.arcodange.lab/arcodange-org/erp/pulls/11) — templates the chart's single-env literals to `.Values` so one chart renders any instance.
- [PR factory#15 — this ADR](https://gitea.arcodange.lab/arcodange-org/factory/pulls/15) — the change that introduces ADR-0002 (links back to this file).

@@ -0,0 +1,95 @@
[vibe](../README.md) > [ADR](README.md) > **0003 · Sandbox state lifecycle**
# ADR-0003: Sandbox state lifecycle — iso-prod seed, reset & prod-write isolation
> **Status**: Accepted
> **Date**: 2026-06-28
> **Deciders**: @arcodange
## Context
[ADR-0002](0002-per-application-environments.md) introduced the `<env>` coordinate and stood up `erp-sandbox` in-cluster: its own Postgres database `erp-sandbox` owned by `erp_sandbox_role`, its own Vault auth role with dynamic credentials at `postgres/creds/erp-sandbox` and KV config at `kvv2/erp-sandbox/config`, its own ArgoCD Application, reachable at `https://erp-sandbox.arcodange.lab`. That ADR created the *place*. It deliberately left open *how that place's data is filled, refreshed, and kept incapable of harming prod* — the lifecycle of the sandbox's state.
The motivating workload is the write-capable AI-agent skill foreshadowed by ADR-0002 (the future "V9" Dolibarr write skill): auto-creating supplier invoices, fixing thirdparty records, and similar mutations. For that rehearsal to predict anything, three forces must be satisfied at once:
| Force | Pressure it creates |
| --- | --- |
| Rehearsal must run against prod-shaped data | A sandbox seeded with synthetic data predicts nothing about how a write behaves on the real accounting set. |
| Rehearsal must be repeatable and disposable | An agent (and BDD suite) must run writes, observe, and roll back to a known-good state many times without manual cleanup. |
| The rehearsal path must be structurally unable to write prod | "Same app, different env" puts a sibling instance one API call away from the production database; intent alone is not a fence. |
The reach matters. The agent's only surface is the **Dolibarr REST API** against `erp-sandbox.arcodange.lab` — it never touches kubectl, the Vault root, the Postgres superuser, ArgoCD, Longhorn, or DNS. That is precisely the boundary ADR-0002 established, and it is what makes an application-data rehearsal safe to operate in-cluster.
### Why this is not what ADR-0001 rejected
[ADR-0001](0001-safe-prod-like-environment.md) rejected its Alternative 3 — a "sandbox namespace on the real cluster" — for shared blast radius, and chose a **local-only** safe environment (k3d / arm64 VMs) instead. That rejection is scoped to rehearsing **infrastructure / platform** change-classes: Ansible playbooks, Vault policy / auth / mount changes, Postgres superuser migrations, ArgoCD prune / selfHeal, Longhorn ops, DNS / email. Those couplings share fleet-wide control planes, so an in-cluster sandbox cannot isolate them, and a sandbox that *looks* like prod at the infra layer gives false confidence — it cannot faithfully mirror a three-node fleet, its Longhorn, or its single Vault.
ADR-0003 operates one layer up, at the **application-data layer**. The question here is not "is this Terraform/Ansible change safe to apply to the fleet" but "is this Dolibarr write safe to apply to the accounting data." At that layer the agent's reach is **API-only**, the state is a single Postgres database plus an uploads PVC, and a wrong write's blast radius is bounded to one app's data — all of which a sibling environment *can* faithfully carry, because it runs the real Dolibarr API against a real copy of prod's rows. This ADR does not reverse ADR-0001; it addresses a different problem at a different altitude. ADR-0001 isolates the *operator* from breaking the *fleet*; ADR-0003 defines how the *AI agent* rehearses *one app's data* without a structural path to prod.
## Decision
We will define the `erp-sandbox` state lifecycle around three mechanisms — an iso-prod seed, an object-level reset, and structural prod-write isolation — plus a human-gated promote step that carries a reviewed change from sandbox to prod.
### 1 · Iso-prod seed (the golden checkpoint)
We will produce a "golden" copy of production data with a **read-only** `pg_dump` of the prod `erp` database and store it as a reusable artifact. Seeding or refreshing the sandbox loads that golden into `erp-sandbox`. The dump is the source of business fidelity; Dolibarr's uploaded `documents/` PVC may *optionally* be rsync'd alongside it for file-level fidelity, but the database carries the data the rehearsal asserts against. `pg_dump` reads — it never writes prod — so producing the golden is itself a safe operation.
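
One way to picture the golden dump is as a single `pg_dump` invocation. The sketch below only builds the argument list and does not run it. The helper name, the output path, and the choice of the custom archive format are assumptions; the `llx_*` scoping follows the reset mechanism described in the next section, and the read-only credential used to connect is out of scope here.

```python
import shlex


def golden_dump_argv(out_path: str, dbname: str = "erp") -> list[str]:
    """Build a pg_dump command for the app-scoped golden (not executed here)."""
    return [
        "pg_dump",
        "--format=custom",     # archive format that pg_restore can read
        "--table=llx_*",       # Dolibarr's own objects only
        f"--file={out_path}",
        f"--dbname={dbname}",  # host and credentials come from the PG* environment
    ]


if __name__ == "__main__":
    # Hypothetical artifact name, for illustration only.
    print(shlex.join(golden_dump_argv("erp-golden.dump")))
```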
### 2 · Reset via object-level wipe-and-reload — not `DROP/CREATE DATABASE`
We will reset the sandbox by restoring an **app-scoped** golden dump **into the existing `erp-sandbox` database**, not by dropping and recreating the database. Concretely: the golden is a `pg_dump` scoped to the application's own objects (Dolibarr prefixes every table `llx_*`), and reset is **`DROP OWNED BY erp_sandbox_role CASCADE`** — which removes every object owned by the app role, i.e. the app tables *and* any drift a rehearsal created, regardless of name — followed by `pg_restore --no-owner --role=erp_sandbox_role`. It runs with the sandbox's **own dynamic credentials** — a short-lived login role that is a member of `erp_sandbox_role`, which owns the objects — so it needs **no `CREATEDB`, no superuser**, and is structurally confined to objects the app role owns.
Infrastructure objects that share the `public` schema but are owned by the *provisioner* rather than the app role (notably the pgbouncer `user_lookup` function created per-database by `postgres/iac`) are deliberately left untouched: they are identical across environments, are not part of the app's data, and the app credential cannot (and must not) drop or recreate them. This is why the golden is scoped to `llx_*` and the wipe is `DROP OWNED BY <app role>` rather than a blanket `pg_restore --clean` (which would try to recreate the provisioner-owned function and fail on ownership) or a `DROP SCHEMA public CASCADE` (which would take the infra function with it). The Dolibarr pod is scaled to 0 (or its backends terminated) for the duration of the restore so that the restore has exclusive access to the database. A faster `DROP/CREATE DATABASE … TEMPLATE` variant exists but requires a `CREATEDB` role; it is deferred (see Alternatives considered below).
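
The two reset steps can be sketched as a pair of commands. The code below builds the argument lists without executing them. The helper name and the `--set=ON_ERROR_STOP=1` and `--exit-on-error` flags are additions for illustration, not part of the recorded decision; scaling the Dolibarr pod to 0 and fetching the dynamic credentials are assumed to happen outside this sketch.

```python
RESET_ROLE = "erp_sandbox_role"
SANDBOX_DB = "erp-sandbox"


def reset_commands(golden_path: str) -> list[list[str]]:
    """Return the wipe and reload steps as argv lists (built, not executed)."""
    wipe = [
        "psql",
        f"--dbname={SANDBOX_DB}",
        "--set=ON_ERROR_STOP=1",  # assumption: stop at the first SQL error
        f"--command=DROP OWNED BY {RESET_ROLE} CASCADE",
    ]
    reload = [
        "pg_restore",
        f"--dbname={SANDBOX_DB}",  # restore into the EXISTING database
        "--no-owner",              # ignore ownership recorded in the dump
        f"--role={RESET_ROLE}",    # restored objects end up owned by the app role
        "--exit-on-error",         # assumption: fail fast on a partial restore
        golden_path,
    ]
    return [wipe, reload]
```

Neither command creates or drops a database, which is why the sequence needs neither `CREATEDB` nor superuser.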
### 3 · Prod-write isolation — defense in depth
We record the following as the integrity invariant of the sandbox: **no path the agent can reach can mutate prod.** Each layer is enforced *structurally* — by ownership and credential scope, not by policy or convention — so they hold even on a misdirected command:
- **The only super-credential lives behind the human gate.** The sole credential that can create or drop databases or otherwise reach prod is the Postgres provider configured `superuser = true` in `postgres/iac/providers.tf`. It authenticates via Vault JWT and is exercised **only** inside the human-gated `postgres.yaml` CI run (Gitea OIDC handoff + PR merge). No standing or autonomous credential holds it.
- **`DROP DATABASE` requires ownership.** The sandbox owner role `erp_sandbox_role` owns **only** `erp-sandbox`; it is structurally incapable of dropping prod `erp`, which is owned by `erp_role`. Ownership — not a deny rule — is the fence.
- **The write skill is sandbox-scoped at the application layer.** The V9 write skill authenticates with a Dolibarr user and API key valid only on `erp-sandbox.arcodange.lab`. Prod stays read-only: the `ai_agent` account has no Dolibarr write permissions, so even a write misdirected at prod is rejected by Dolibarr itself.
- **The runtime DB creds carry no prod rights.** The sandbox runtime credentials (`postgres/creds/erp-sandbox`) grant only membership in `erp_sandbox_role` — no rights on the prod database.
- **A host-guard refuses non-sandbox targets.** The write tooling refuses any operation whose target host is not `erp-sandbox.*`.
- **Resettability is itself a safety layer.** Any mistake made in the sandbox is reverted by the next reset, so the cost of a wrong sandbox write is bounded to "reset and retry."
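
The host-guard layer is the easiest of these to illustrate. The following is a minimal sketch, assuming the guard is a glob match on the URL's hostname; the function name and the choice of `PermissionError` are hypothetical, and only the `erp-sandbox.*` pattern comes from this ADR.

```python
from fnmatch import fnmatchcase
from urllib.parse import urlsplit

ALLOWED_HOST_PATTERN = "erp-sandbox.*"


def assert_sandbox_target(url: str) -> str:
    """Return the URL's host if it matches the sandbox pattern, else raise."""
    # URLs without a scheme yield no hostname and are refused as well.
    host = (urlsplit(url).hostname or "").lower()
    if not fnmatchcase(host, ALLOWED_HOST_PATTERN):
        raise PermissionError(f"refusing non-sandbox target: {host or url!r}")
    return host
```

A glob this loose would also accept `erp-sandbox.` under any other domain, so in this sketch the guard complements the sandbox-scoped API key rather than replacing it.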
### 4 · Human-in-the-loop promote
After rehearsing in the sandbox, the change is captured as a reviewable diff using the existing read-only `dolibarr-data-snapshot` skill (in the `erp` repo), which produces content-addressable before/after snapshots. A human approves the diff, and only then are the **same** operations applied to prod under a **separate, deliberately-scoped prod-write credential** used exclusively at promote time. That credential is never part of the agent's standing credentials — the agent authors and rehearses; promotion to prod is a distinct, human-initiated act.
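
To make "content-addressable before/after snapshots" concrete, here is one possible shape for the hash and the diff. It is not the `dolibarr-data-snapshot` implementation, whose internals this ADR does not describe; the function names, the JSON canonicalisation, and matching rows on a `rowid` column are all assumptions.

```python
import hashlib
import json


def snapshot_hash(rows: list[dict]) -> str:
    """Hash rows canonically so equal content always yields an equal digest."""
    canonical = json.dumps(
        sorted(rows, key=lambda r: json.dumps(r, sort_keys=True)),
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def snapshot_diff(before: list[dict], after: list[dict], key: str = "rowid") -> dict:
    """Classify rows as added, removed, or changed, matched on a key column."""
    old = {r[key]: r for r in before}
    new = {r[key]: r for r in after}
    shared = sorted(old.keys() & new.keys())
    return {
        "added": [new[k] for k in sorted(new.keys() - old.keys())],
        "removed": [old[k] for k in sorted(old.keys() - new.keys())],
        "changed": [(old[k], new[k]) for k in shared if old[k] != new[k]],
    }
```

Because the digest ignores row order, the same function can back a reset round-trip check: after a reset, the sandbox should hash back to the golden checkpoint's value.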
## Consequences
- **+** Autonomous agents get a faithful, disposable rehearsal target — real prod-shaped data via the real Dolibarr API — with zero structural path to prod writes.
- **+** The existing read-only skill family (`dolibarr-tva-summary`, `dolibarr-payments-state`, `dolibarr-invoice-audit`, `dolibarr-thirdparty-completeness`) becomes the BDD assertion library: each skill is a ready-made check the rehearsal can run before and after a write.
- **+** Reset reuses the same Vault + Postgres scoping that already protects prod — no new privilege surface is introduced for the lifecycle; the sandbox's own dynamic creds suffice.
- **+** The promote path keeps the prod-write credential out of the agent's hands entirely, so the only writes prod ever sees are human-approved replays.
- **−** Encryption fidelity is imperfect. Dolibarr ties some encrypted fields to `DOLI_INSTANCE_UNIQUE_ID`; the sandbox has its own uuid, so a few encrypted fields will not decrypt unless prod's uuid and key are copied into the sandbox KV. That "high-fidelity" mode is opt-in because it brings a prod secret into the sandbox; the default is the sandbox's own uuid, accepting the minor breakage of a few undecryptable fields.
- **−** Reset requires scaling the Dolibarr pod to 0 briefly, so the sandbox is unavailable for the duration of the restore.
- **−** `pg_restore` cost grows with database size; a large enough golden makes reset slow.
- **→** If reset becomes slow, introduce a `CREATEDB`-scoped role that owns only the sandbox and golden databases and switch reset to the `DROP/CREATE DATABASE … TEMPLATE` clone path — still structurally unable to drop prod, because it does not own `erp`.
- **→** Optional `documents/` PVC rsync is a door left open for file-level fidelity if a rehearsal ever needs to assert on uploaded attachments, not just the database rows.
## Alternatives considered
| Option | Why not |
| --- | --- |
| `DROP/CREATE DATABASE … TEMPLATE` for fast reset | Rejected as the **default** because it requires a `CREATEDB` role. Acceptable **later** only via a dedicated role that owns only the sandbox + golden databases — ownership keeps prod undroppable — and documented here as the escape hatch for scale, not the day-one path. |
| Use the human-gated CI superuser path (`postgres.yaml`) for resets | Rejected for autonomous / BDD use: that credential can reach prod, so it must stay behind the human merge gate. Wiring it into an automated reset loop would put a prod-capable credential on the agent's hot path — exactly what the integrity invariant forbids. |
| A fully separate cluster ([ADR-0001](0001-safe-prod-like-environment.md)'s model) | The right answer for **infra** rehearsal, but overkill here. The agent's reach is API-only and the state is one database plus a PVC; a sibling in-cluster environment carries that data faithfully without a second cluster to operate. |
| Synthetic / fixture seed data instead of an iso-prod dump | Cheaper and carries no prod secrets, but predicts nothing about how a write behaves on the real accounting set — the rehearsal's whole point is prod-shaped data. The encryption-fidelity trade-off is accepted instead. |
## QA & validation
- **Reset round-trip gate** — seed `erp-sandbox` from the golden, run a known write via the V9 skill, reset, and assert the sandbox state hashes back to the golden checkpoint (via the content-addressable `dolibarr-data-snapshot` hash). A reset that does not return to the golden hash is a failure.
- **No-superuser proof** — the reset path runs end to end using only `postgres/creds/erp-sandbox` (membership in `erp_sandbox_role`); it must succeed with **no** `CREATEDB` and **no** superuser. If it needs either, the object-level mechanism is not confined as claimed.
- **Prod-undroppable proof** — attempting `DROP DATABASE erp` (or any object write on prod) with the sandbox runtime credential must be rejected by Postgres on ownership grounds, and a write to `erp.arcodange.lab` with the sandbox Dolibarr key must be rejected by Dolibarr's permission model.
- **Host-guard check** — the write tooling refuses any target host not matching `erp-sandbox.*`.
- **Promotion gate** — no AI-authored write reaches prod until it has been rehearsed in `erp-sandbox`, captured as a reviewed before/after snapshot diff, and explicitly replayed against prod under the separate promote-time credential with human confirmation.
## References
- [ADR-0001 · Safe, production-like environment](0001-safe-prod-like-environment.md) — the local-only safe environment for **infra** rehearsal; this ADR addresses the **application-data** layer and does not supersede it.
- [ADR-0002 · Per-application environments](0002-per-application-environments.md) — established the `<env>` coordinate and stood up the `erp-sandbox` instance whose state lifecycle this ADR defines.
- `factory` `postgres/iac/providers.tf` — the `superuser = true` Postgres provider, the sole prod-capable credential, exercised only in the human-gated `postgres.yaml` CI run.
- `factory` `postgres/iac/main.tf` — the per-instance flatten that owns each database by its `<app>_role` / `<app>_<env>_role`; `erp-sandbox` is owned by `erp_sandbox_role`, prod `erp` by `erp_role`, which is why the sandbox cannot drop prod.
- `tools` `hashicorp-vault/iac/modules/app_roles/main.tf` — the dynamic-credential role whose creation statement grants only `GRANT <app>_role TO {{name}}` (membership only), so `postgres/creds/erp-sandbox` carries no rights on the prod database.
- `erp` `.claude/skills/dolibarr-data-snapshot/` — the read-only, content-addressable snapshot skill used to capture the reviewable before/after diff at promote time and to verify the reset round-trip.
- PRs: this ADR is introduced by [PR factory#19](https://gitea.arcodange.lab/arcodange-org/factory/pulls/19) (links back to this file).

@@ -3,7 +3,7 @@
 # Architecture Decision Records
 > **Status**: 🟢 Active
-> **Last Updated**: 2026-06-23
+> **Last Updated**: 2026-06-28
 > **Related**: [vibe/PRD](../PRD/README.md) · [vibe/Investigations](../investigations/README.md)
 > **Historical**: [doc/adr](../../doc/adr/README.md) (foundational infra) · [ansible/.../docs/adr](../../ansible/arcodange/factory/docs/adr/) (dated infra ADRs)
@@ -34,6 +34,8 @@ When a new decision *supersedes* one of the historical records, write the new AD
 | # | Title | Status | Date |
 | --- | --- | --- | --- |
 | [0001](0001-safe-prod-like-environment.md) | Safe, production-like environment | 🟢 Accepted | 2026-06-23 |
+| [0002](0002-per-application-environments.md) | Per-application environments | 🟢 Accepted | 2026-06-25 |
+| [0003](0003-sandbox-state-lifecycle.md) | Sandbox state lifecycle | 🟢 Accepted | 2026-06-28 |
 ## Rules to contribute

@@ -3,9 +3,9 @@
 # Safe, production-like environment
 > **Status:** In design
-> **Last Updated:** 2026-06-23
+> **Last Updated:** 2026-06-25
 > **Design record:** [ADR 0001 — Safe, production-like environment](../../ADR/0001-safe-prod-like-environment.md)
-> **Adjacent:** [INV-001 — prod blast-radius couplings](../../investigations/INV-001-prod-blast-radius-couplings.md)
+> **Adjacent:** [INV-001 — prod blast-radius couplings](../../investigations/INV-001-prod-blast-radius-couplings.md) · [ADR 0002 — per-application environments](../../ADR/0002-per-application-environments.md) (the application-data-layer counterpart)
 > **Map:** [Lab ecosystem guidebook](../../guidebooks/lab-ecosystem/README.md)
 ## Problem

@@ -3,8 +3,8 @@
# Naming conventions — the `<app>` join key # Naming conventions — the `<app>` join key
> **Status**: 🟢 Active > **Status**: 🟢 Active
> **Last Updated**: 2026-06-23 > **Last Updated**: 2026-06-25
> **Related**: [Lab ecosystem](README.md) · [Factory brick](01-factory.md) · [Secrets & Vault](secrets-and-vault.md) · [PRD — isolation boundary](../../PRD/safe-prod-like-environment/isolation-boundary.md) > **Related**: [Lab ecosystem](README.md) · [Factory brick](01-factory.md) · [Secrets & Vault](secrets-and-vault.md) · [PRD — isolation boundary](../../PRD/safe-prod-like-environment/isolation-boundary.md) · [ADR 0002 — per-application environments](../../ADR/0002-per-application-environments.md)
> **Upstream (source of truth)**: [doc/runbooks/new-web-app/conventions.md](../../../doc/runbooks/new-web-app/conventions.md) (French, authoritative) > **Upstream (source of truth)**: [doc/runbooks/new-web-app/conventions.md](../../../doc/runbooks/new-web-app/conventions.md) (French, authoritative)
## TL;DR ## TL;DR
@@ -83,9 +83,35 @@ The symptom is always the same: a brick that *looks* provisioned but never conne
✅ Choose a short, stable, lowercase kebab-case name up front and reuse it character-for-character. ✅ Choose a short, stable, lowercase kebab-case name up front and reuse it character-for-character.
❌ Never introduce variants (case, separators, plurals); nothing will warn you. ❌ Never introduce variants (case, separators, plurals); nothing will warn you.
## Why this makes a sandbox safe ## Multiple environments per app (the `<env>` coordinate)
The `<app>` convention is also the reason a **production-like sandbox can reuse the exact same names** without colliding with production. Because every brick derives its resource names from `<app>` and from nothing else, an entire parallel universe of the platform — its own Vault, its own Postgres instance, its own k3s namespace scope — can host an `erp` named identically to the production `erp`, provided the two universes never share a backing store. Identity comes from the *environment boundary*, not from the name; the name is free to repeat. This is what lets QA and recovery drills run against `erp`, `webapp`, etc. with realistic identifiers instead of mangled `erp-staging`-style aliases that would themselves break the name-wiring. See the PRD's [isolation boundary](../../PRD/safe-prod-like-environment/isolation-boundary.md) for how that environment fence is drawn. A single application can run as several deployed instances — `prod`, `sandbox`, and so on — **without becoming a separate app**: same repo, same chart, same version. A second coordinate `<env>` extends the join key, governed by an **elision rule** ([ADR 0002](../../ADR/0002-per-application-environments.md)):
- `env` defaults to `prod`, and **`prod` elides** — when `env == prod` no suffix is added, so every derived name is exactly the single-coordinate output of the mapping above. Existing apps are unaffected (their plan is a no-op).
- Non-prod envs take the **`<app>-<env>`** suffix everywhere — namespace, Vault paths / roles / policies, ArgoCD Application, DNS, GCS state sub-prefix — with the one snake-case exception inherited from the `_role` convention: the Postgres owner role is `<app>_<env>_role`.
- One repo, one chart, and one CI JWT role (`gitea_cicd_<app>`) serve every env; per-env differences are a `values-<env>.yaml` overlay.
Worked example — `erp` (prod, elided) and `erp-sandbox`:
| System | `erp` (env = prod) | `erp-sandbox` |
| --- | --- | --- |
| PostgreSQL database | `erp` | `erp-sandbox` |
| PostgreSQL owner role | `erp_role` | `erp_sandbox_role` |
| Namespace + ServiceAccount | `erp` | `erp-sandbox` |
| Vault dynamic DB creds | `postgres/creds/erp` | `postgres/creds/erp-sandbox` |
| Vault KV config | `kvv2/erp/config` | `kvv2/erp-sandbox/config` |
| ArgoCD Application | `erp` | `erp-sandbox` |
| Internal DNS | `erp.arcodange.lab` | `erp-sandbox.arcodange.lab` |
| Gitea repo / chart / CI JWT | `arcodange-org/erp` · chart · `gitea_cicd_erp` | shared |
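The elision rule and the worked example above can be sketched as a small derivation function. This is an illustrative sketch only, not the factory's actual implementation: the function name `derive_names` and the dictionary keys are hypothetical, and it assumes an `<app>` slug without hyphens (how a hyphenated slug maps to the snake-case role name is not covered here).

```python
def derive_names(app: str, env: str = "prod") -> dict[str, str]:
    """Derive per-environment resource names from the <app>/<env> join key.

    Hypothetical helper: names and keys are illustrative, taken from the
    worked example table, not from the real provisioning code.
    """
    # Elision rule: prod adds no suffix; every other env appends "-<env>".
    slug = app if env == "prod" else f"{app}-{env}"
    # Snake-case exception inherited from the `_role` convention.
    owner_role = f"{app}_role" if env == "prod" else f"{app}_{env}_role"
    return {
        "database": slug,
        "owner_role": owner_role,
        "namespace": slug,
        "vault_db_creds": f"postgres/creds/{slug}",
        "vault_kv_config": f"kvv2/{slug}/config",
        "argocd_application": slug,
        "internal_dns": f"{slug}.arcodange.lab",
        # Shared across every env of the same app: never suffixed.
        "ci_jwt_role": f"gitea_cicd_{app}",
    }


if __name__ == "__main__":
    for env in ("prod", "sandbox"):
        print(env, derive_names("erp", env))
```

Calling `derive_names("erp")` reproduces the single-coordinate names, which is why existing apps see a no-op plan; `derive_names("erp", "sandbox")` yields the `erp-sandbox` column, with `gitea_cicd_erp` unchanged.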
## Two sandbox models, two naming strategies
There are two distinct ways to stand up a non-production copy, and they treat the join key differently — by design, not by accident.
- **Separate-cluster sandbox** ([ADR 0001](../../ADR/0001-safe-prod-like-environment.md)) — a whole parallel universe (its own Vault, Postgres, k3s) on the control node, for rehearsing dangerous *infrastructure* changes. The two universes never share a backing store, so identity comes from the *environment boundary*, not the name: the sandbox hosts an `erp` named identically to production. Names repeat freely; no `<env>` suffix is needed, so the name-wiring stays intact and drills run against realistic identifiers.
- **In-cluster sibling instance** ([ADR 0002](../../ADR/0002-per-application-environments.md)) — a second instance on the *same* cluster (e.g. `erp-sandbox` beside `erp`), for rehearsing *application-data* writes against the real API. Here there is no cluster fence to disambiguate by, so the `<env>` suffix *is* the separator: every derived name carries `-sandbox` to avoid colliding with prod's namespace, database, Vault paths, and DNS.
Both keep the name-wiring coherent — one by repeating the slug behind a cluster fence, the other by extending the slug with the elided `<env>` coordinate. See the PRD's [isolation boundary](../../PRD/safe-prod-like-environment/isolation-boundary.md) for how the separate-cluster fence is drawn, and [ADR 0002](../../ADR/0002-per-application-environments.md) for why the in-cluster sibling's blast radius stays bounded to one app's data.
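One way to picture the two strategies is to treat a resource's identity as the pair (environment boundary, name). The sketch below is a hypothetical model, not part of the platform: `ResourceId`, `collisions`, and the cluster labels are invented for illustration.

```python
from typing import Iterable, NamedTuple


class ResourceId(NamedTuple):
    cluster: str  # hypothetical label for the environment boundary
    name: str     # name derived from the join key


def collisions(ids: Iterable[ResourceId]) -> set[ResourceId]:
    """Return every identity that appears more than once."""
    seen: set[ResourceId] = set()
    dupes: set[ResourceId] = set()
    for rid in ids:
        if rid in seen:
            dupes.add(rid)
        seen.add(rid)
    return dupes


prod = ResourceId("prod-cluster", "erp")
# Separate-cluster sandbox: same name, different boundary.
separate = ResourceId("sandbox-cluster", "erp")
# In-cluster sibling: same boundary, so the <env> suffix separates it.
sibling = ResourceId("prod-cluster", "erp-sandbox")
# What would happen if the sibling skipped the suffix.
unsuffixed = ResourceId("prod-cluster", "erp")

if __name__ == "__main__":
    print(collisions([prod, separate, sibling]))
    print(collisions([prod, unsuffixed]))
```

In this model the first check finds no duplicates, while an in-cluster sibling without the suffix collides with prod, which is the case the `<env>` coordinate exists to prevent.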
## See also ## See also
@@ -93,4 +119,5 @@ The `<app>` convention is also the reason a **production-like sandbox can reuse
- [Secrets & Vault](secrets-and-vault.md) — how `gitea_cicd_<app>` and the `<app>` / `<app>-ops` policies fit the auth model. - [Secrets & Vault](secrets-and-vault.md) — how `gitea_cicd_<app>` and the `<app>` / `<app>-ops` policies fit the auth model.
- [Factory brick](01-factory.md) — where the ArgoCD app-of-apps, the Postgres OpenTofu, and the IaC live. - [Factory brick](01-factory.md) — where the ArgoCD app-of-apps, the Postgres OpenTofu, and the IaC live.
- [PRD — isolation boundary](../../PRD/safe-prod-like-environment/isolation-boundary.md) — why identical names are safe across environments. - [PRD — isolation boundary](../../PRD/safe-prod-like-environment/isolation-boundary.md) — why identical names are safe across environments.
- [ADR 0001 — Safe, production-like environment](../../ADR/0001-safe-prod-like-environment.md). - [ADR 0001 — Safe, production-like environment](../../ADR/0001-safe-prod-like-environment.md) — the separate-cluster sandbox model.
- [ADR 0002 — Per-application environments](../../ADR/0002-per-application-environments.md) — the `<env>` coordinate + elision rule, and the in-cluster sibling sandbox model.