docs(adr): ADR-0002 — per-application environments via an env coordinate #15
vibe/ADR/0002-per-application-environments.md
@@ -0,0 +1,97 @@
[vibe](../README.md) > [ADR](README.md) > **0002 · Per-application environments**
# ADR-0002: Per-application environments via an env coordinate

> **Status**: Accepted
> **Date**: 2026-06-25
> **Deciders**: @arcodange
## Context

The [`<app>` join key](../../doc/runbooks/new-web-app/conventions.md) threads one kebab-case identifier identically through every system that makes up an application: the Gitea repo, the Postgres database + `<app>_role`, Vault (`postgres/creds/<app>`, the k8s auth role `<app>`, the policies `<app>` / `<app>-ops`, the CI JWT role `gitea_cicd_<app>`), the k8s namespace + ServiceAccount, the ArgoCD Application, the GCS state prefix `<app>/main`, and DNS (`<app>.arcodange.lab`). Bricks wire together by name convention, not explicit config.

That convention conflates two ideas it never separated: an **application** and a **deployed instance** of it. There is exactly one of everything per app: one namespace, one database, one Vault creds path, one DNS host. The model cannot express "the same app, a second time, somewhere else."

The motivating need makes the gap concrete. The Arcodange Dolibarr ERP is growing a write-capable AI-agent skill: auto-creating supplier invoices from ingested emails, fixing thirdparty data, and similar mutations. Before such writes touch the production accounting database, the operator needs a place where the agent can run write operations autonomously, a human reviews the result, and only then the same operation is promoted to prod. That requires a **second deployed instance of the same application**: the same Dolibarr chart, the same version, the same conventions, differing only in *where* it runs and *which data* it touches.

| Force | Pressure it creates |
| --- | --- |
| One identifier per app, no env coordinate | "Same app, different environment" is inexpressible without inventing a whole second app. |
| Write-capable AI agent landing on the prod ERP | A wrong autonomous write corrupts live accounting data with no rehearsal surface. |
| Fidelity requirement for the rehearsal surface | The sandbox must run the *real* Dolibarr API against *prod-like* data, or the rehearsal predicts nothing. |
| [ADR-0001](0001-safe-prod-like-environment.md) rejected an in-cluster sandbox | Its Alternative 3 ("sandbox namespace on the real cluster") was rejected for shared blast radius, so any in-cluster sibling instance must be reconciled against that, not pretended away. |

Treating the sandbox as a wholly separate app would fork the chart, the repo, the runbook chain, and the Vault wiring: four things that then drift apart over time, defeating the "same app, same version" fidelity the rehearsal depends on.
## Decision

We will extend the `<app>` convention with a second coordinate, `<env>`, governed by an **elision rule** so that adding the coordinate changes nothing for any existing app.

- **`env` defaults to `prod`, and `prod` elides.** When `env == prod`, no suffix is added: every derived name is character-for-character identical to today's single-env output. The instance name equals the app name (`local.instance == local.name`), so every existing app's `tofu plan` is a no-op.
- **Non-prod envs append a `-<env>` suffix, giving `<app>-<env>`** in kebab-case everywhere (namespace, Vault paths / roles / policies, ArgoCD Application, DNS host, GCS-state sub-prefix), with one exception: the Postgres owner role stays snake-case as `<app>_<env>_role`, matching the existing `_role` suffix convention.
- **One repo and one chart serve every env of an app.** Per-env differences are overlaid via `values-<env>.yaml`; the chart's instance-specific values are `.Values`-driven, not hardcoded literals, so the same chart renders any instance.
- **One CI JWT role (`gitea_cicd_<app>`) per repo covers all its envs.** Its ops policy is widened to the `<app>-*` path family. Each running instance keeps its own runtime Vault policy.
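
The naming rules above can be sketched as a small pure function. This is only an illustration in Python, not the OpenTofu module code: the `InstanceNames` type, the `derive_names` helper, and the field names are hypothetical, and the sketch assumes the app identifier is used unchanged inside the owner-role name.

```python
from dataclasses import dataclass

PROD = "prod"  # the env that elides


@dataclass(frozen=True)
class InstanceNames:
    """Derived names for one deployed instance (hypothetical field names)."""
    instance: str
    pg_database: str
    pg_owner_role: str
    namespace: str
    vault_db_creds: str
    vault_kv_config: str
    argocd_app: str
    dns_host: str
    ci_jwt_role: str


def derive_names(app: str, env: str = PROD) -> InstanceNames:
    # Elision rule: prod adds no suffix, so instance == app.
    instance = app if env == PROD else f"{app}-{env}"
    # Owner-role exception: snake-case, keeping the existing `_role` suffix.
    # Assumption: the app identifier itself is inserted as-is.
    pg_owner_role = f"{app}_role" if env == PROD else f"{app}_{env}_role"
    return InstanceNames(
        instance=instance,
        pg_database=instance,
        pg_owner_role=pg_owner_role,
        namespace=instance,
        vault_db_creds=f"postgres/creds/{instance}",
        vault_kv_config=f"kvv2/{instance}/config",
        argocd_app=instance,
        dns_host=f"{instance}.arcodange.lab",
        # One CI JWT role per repo, shared by every env of the app.
        ci_jwt_role=f"gitea_cicd_{app}",
    )
```

Calling `derive_names("erp")` yields the same names the single-env convention produces today, while `derive_names("erp", "sandbox")` yields the `erp-sandbox` family with `erp_sandbox_role` as the owner role.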
### Worked example: `erp` and `erp-sandbox`

| Coordinate | `erp` (env = prod, elided) | `erp-sandbox` (env = sandbox) |
| --- | --- | --- |
| Postgres database | `erp` | `erp-sandbox` |
| Postgres owner role | `erp_role` | `erp_sandbox_role` |
| k8s namespace + ServiceAccount | `erp` | `erp-sandbox` |
| Vault dynamic DB creds | `postgres/creds/erp` | `postgres/creds/erp-sandbox` |
| Vault KV config | `kvv2/erp/config` | `kvv2/erp-sandbox/config` |
| ArgoCD Application | `erp` | `erp-sandbox` |
| Internal DNS | `erp.arcodange.lab` | `erp-sandbox.arcodange.lab` |
| Gitea repo | `arcodange-org/erp` | `arcodange-org/erp` (shared) |
| Helm chart | one chart | one chart (shared) |
| CI JWT role | `gitea_cicd_erp` | `gitea_cicd_erp` (shared) |
### Why this is not what ADR-0001 rejected

[ADR-0001](0001-safe-prod-like-environment.md) chose a **local-only** safe environment (k3d / arm64 VMs) and rejected its Alternative 3, an in-cluster "sandbox namespace on the real cluster," for shared blast radius. ADR-0002 introduces an in-cluster sibling instance (`erp-sandbox`), which looks like the very thing that was rejected. The two stand together because they operate at **different layers**.

ADR-0001's rejection is scoped to rehearsing **infrastructure / platform** change-classes: Ansible playbooks, Vault policy / auth / mount changes, Postgres superuser migrations, ArgoCD prune / selfHeal, Longhorn ops, DNS / email. Those couplings share fleet-wide control planes, so an in-cluster sandbox cannot isolate them; only a separate cluster + Vault + state + DNS zone can. That is exactly why ADR-0001 is local-only.

ADR-0002 operates one layer up. The AI agent's only reach is the **Dolibarr HTTP API**, holding a write-scoped, app-specific API key against an isolated database: `erp-sandbox` on its own `erp_sandbox_role`, its own namespace, its own Vault creds path. The agent never touches kubectl, the Vault root, the Postgres superuser, ArgoCD, Longhorn, or DNS. The fleet-level blast radius that doomed Alternative 3 for infra rehearsal is simply **not in the agent's reach**; the blast radius of a wrong AI write is bounded to the sandbox app's own data.

The two ADRs are therefore complementary, not contradictory, and ADR-0002 does not supersede ADR-0001. ADR-0001 isolates the *operator* from breaking the *fleet*. ADR-0002 isolates the *AI agent* from corrupting *one app's production data*, while preserving the prod-like API surface and real-data fidelity that the local k3d sandbox, which carries no prod data, cannot offer.
## Consequences

- **+** Every existing app (webapp, erp, crowdsec, plausible, dance-lessons-coach, cms) is unaffected: the elision rule makes the prod instance's derived names byte-identical, so adoption ships with zero migration and a no-op plan.
- **+** A second instance of an app is now a `values-<env>.yaml` overlay plus an `envs` entry, not a forked repo, chart, and runbook chain, so prod and sandbox share one source of truth and stay on the same version by construction.
- **+** The AI-agent write skill gets a prod-like rehearsal surface with real-shaped data: the *same* Dolibarr API and chart, an *isolated* database, a bounded blast radius.
- **+** The convention chain (db + role → Vault creds + policy → namespace + SA → ArgoCD → DNS) is reused verbatim for the `-sandbox` instance, so runbooks read identically for any env.
- **−** Names are no longer a flat app list: every consumer must reason about the `instance == app` (prod) versus `app-env` (non-prod) distinction, and the snake-case owner-role exception (`<app>_<env>_role`) is a special case that must be carried in the modules.
- **−** A single shared Vault CI policy widened to `<app>-*` means the CI role for a repo can write the ops paths of *all* that repo's envs: a deliberately looser ops scope than one-policy-per-instance.
- **−** A single shared OpenTofu state per repo holds every env's resources together, so the envs of one app share a blast radius at the state layer (mitigated by `for_each`, accepted at current scale; see Alternatives).
- **→** The AI-agent promotion workflow this unlocks: the agent runs writes against `erp-sandbox` autonomously, emits a structured changeset, a human reviews it, and the **same** operation is re-applied to prod only with explicit confirmation, never auto-applied by the agent. The read/write skills resolve their target by an env switch (e.g. `DOLIBARR_TARGET=prod|sandbox`, defaulting to `prod`).
- **→** Rollout is additive and phased, each phase gated by a no-op `tofu plan` against existing apps:
  - **(A)** the `tools` repo adds an optional `env` / `envs` parameter to the shared `app_roles` and `app_policy` Vault modules;
  - **(B)** the `factory` repo gains the `envs` schema in `postgres/iac` tfvars, renders one ArgoCD Application per env, and documents the elision rule in `conventions.md`;
  - **(C)** the `erp` chart literals are templated to `.Values`;
  - **(D)** `erp` + `factory` activate `erp-sandbox`;
  - **(E)** DNS + ArgoCD registration.
- **→** Per-env state separation (`<app>/<env>` prefixes) is a door left open: if env-to-env blast-radius isolation at the state layer becomes warranted, the prefix scheme can be revisited without changing the naming model.
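
The env switch mentioned for the read/write skills could look roughly like the following. This is a hedged sketch, not the skills' actual code: `TARGET_HOSTS` and `resolve_target` are hypothetical names, and only the `DOLIBARR_TARGET` variable, its two values, its `prod` default, and the two DNS hosts come from this ADR.

```python
import os
from typing import Mapping

# Hypothetical lookup table; the hosts are the DNS names used in this ADR.
TARGET_HOSTS = {
    "prod": "erp.arcodange.lab",
    "sandbox": "erp-sandbox.arcodange.lab",
}


def resolve_target(environ: Mapping[str, str] = os.environ) -> str:
    """Return the Dolibarr host selected by DOLIBARR_TARGET (default: prod)."""
    target = environ.get("DOLIBARR_TARGET", "prod")
    if target not in TARGET_HOSTS:
        # Fail loudly rather than guess which instance a typo meant.
        raise ValueError(f"unknown DOLIBARR_TARGET: {target!r}")
    return TARGET_HOSTS[target]
```

Rejecting unknown values, rather than falling back to `prod`, is a design choice of this sketch: a mistyped target should not silently route a write to the production instance.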
## Alternatives considered

| Option | Why not |
| --- | --- |
| Treat `erp-sandbox` as a wholly separate `<app>` (own repo, own chart copy) | Forks the chart, the repo, and the runbook chain; the two copies drift over time; defeats the "same app, same version" fidelity the rehearsal depends on. |
| Use the [ADR-0001](0001-safe-prod-like-environment.md) local-only sandbox (k3d / VMs) for the AI-agent writes | That environment carries **no production data**; the write-rehearsal needs prod-like data and the real Dolibarr API surface to be meaningful. Complementary to ADR-0001, not a substitute for it. |
| Per-env OpenTofu state (`<app>/<env>` prefixes) instead of one shared state per repo | Buys more env-to-env blast-radius isolation, but at the cost of more CI plumbing and cross-env output wiring than current scale warrants; one shared state with `for_each` keeps runbooks simple. A real decision point: the chosen path is single shared state per repo, with the prefix scheme left as a future door. |
| No elision: always suffix, even prod (`<app>-prod`) | Breaks every existing derived name, forcing a fleet-wide rename plus `tofu` resource moves; rejected in favour of the elision rule's zero-migration property. |
## QA & validation

- **Backwards-compat no-op gate**: after the module change, `tofu plan` against every existing app (webapp, erp, crowdsec, plausible, dance-lessons-coach, cms) reports zero changes. The elision rule guarantees `local.instance == local.name` for `env == prod`, so no prod resource moves.
- **Byte-identical chart render**: `helm template erp chart/` before versus after the literal-templating refactor diffs to nothing (verified: 10857 bytes on both sides, `diff` exit 0).
- **`tofu fmt -check` + `tofu validate`** are clean on the module changes.
- **Sandbox activation gate**: when `erp-sandbox` is stood up, the [new-web-app convention chain](../../doc/runbooks/new-web-app/conventions.md) must resolve end to end for the `-sandbox` instance (db + role → Vault creds + policy → namespace + SA → ArgoCD Healthy/Synced → VSO injects → pod Running), exactly as the prod instance does.
- **Promotion gate**: no AI-authored write reaches the prod ERP until it has been applied to `erp-sandbox`, produced a reviewed changeset, and been explicitly re-applied with human confirmation.
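
The promotion gate reads as a conjunction of three conditions, which can be sketched as a predicate. The `Changeset` type, its fields, and `may_apply_to_prod` are hypothetical; the ADR does not define the changeset format, so this only illustrates the ordering the gate enforces.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Changeset:
    """Hypothetical record of one AI-authored write operation."""
    operation: str
    applied_to_sandbox: bool = False  # the agent ran it against erp-sandbox
    reviewed: bool = False            # a human reviewed the resulting changeset
    human_confirmed: bool = False     # a human explicitly confirmed the prod re-apply


def may_apply_to_prod(changeset: Changeset) -> bool:
    # All three conditions must hold; the agent alone can only set the first.
    return (
        changeset.applied_to_sandbox
        and changeset.reviewed
        and changeset.human_confirmed
    )
```

A changeset that has only been applied to the sandbox, or reviewed but not confirmed, stays blocked under this sketch.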
## References

- [ADR-0001 · Safe, production-like environment](0001-safe-prod-like-environment.md): the local-only safe environment for infra rehearsal that this ADR complements (it stands; this does not supersede it).
- [PRD · Safe, production-like environment](../PRD/safe-prod-like-environment/README.md): the product view this work relates to, and its [isolation-boundary leaf](../PRD/safe-prod-like-environment/isolation-boundary.md) detailing the cluster/Vault/state/DNS boundary.
- [new-web-app conventions](../../doc/runbooks/new-web-app/conventions.md): the single-env `<app>` convention this ADR extends with the env coordinate.
- [Phase A: `tools` Vault module env parameter](https://gitea.arcodange.lab/arcodange-org/tools/pulls/2): adds the optional `env` / `envs` parameter to the shared `app_roles` and `app_policy` modules.
- [Phase C: `erp` chart literal templating](https://gitea.arcodange.lab/arcodange-org/erp/pulls/11): templates the chart's single-env literals to `.Values` so one chart renders any instance.
- [PR factory#15: this ADR](https://gitea.arcodange.lab/arcodange-org/factory/pulls/15): the change that introduces ADR-0002 (links back to this file).
@@ -3,7 +3,7 @@
 # Architecture Decision Records

 > **Status**: 🟢 Active
-> **Last Updated**: 2026-06-23
+> **Last Updated**: 2026-06-25
 > **Related**: [vibe/PRD](../PRD/README.md) · [vibe/Investigations](../investigations/README.md)
 > **Historical**: [doc/adr](../../doc/adr/README.md) (foundational infra) · [ansible/.../docs/adr](../../ansible/arcodange/factory/docs/adr/) (dated infra ADRs)
@@ -34,6 +34,7 @@ When a new decision *supersedes* one of the historical records, write the new AD
 | # | Title | Status | Date |
 | --- | --- | --- | --- |
 | [0001](0001-safe-prod-like-environment.md) | Safe, production-like environment | 🟢 Accepted | 2026-06-23 |
+| [0002](0002-per-application-environments.md) | Per-application environments | 🟢 Accepted | 2026-06-25 |

 ## Rules to contribute
@@ -3,9 +3,9 @@
 # Safe, production-like environment

 > **Status:** In design
-> **Last Updated:** 2026-06-23
+> **Last Updated:** 2026-06-25
 > **Design record:** [ADR 0001 — Safe, production-like environment](../../ADR/0001-safe-prod-like-environment.md)
-> **Adjacent:** [INV-001 — prod blast-radius couplings](../../investigations/INV-001-prod-blast-radius-couplings.md)
+> **Adjacent:** [INV-001 — prod blast-radius couplings](../../investigations/INV-001-prod-blast-radius-couplings.md) · [ADR 0002 — per-application environments](../../ADR/0002-per-application-environments.md) (the application-data-layer counterpart)
 > **Map:** [Lab ecosystem guidebook](../../guidebooks/lab-ecosystem/README.md)

 ## Problem