Compare commits
12 Commits
arcodange/
...
main
| Author | SHA1 | Date |
|---|---|---|
| | 5c60677171 | |
| | 90498e4f55 | |
| | a38c8b39f1 | |
| | 00a838799b | |
| | 235ff72ac0 | |
| | c00c4cdd5c | |
| | 8a1a63ee10 | |
| | c35b510040 | |
| | 3961914613 | |
| | 801724e1bc | |
| | 7727b244ad | |
| | e2a79a08a7 | |
@@ -31,4 +31,47 @@ spec:
|
||||
     {{- end }}
     syncOptions:
       - CreateNamespace=true
+{{- /*
+Non-prod environments (ADR-0002 elision rule): one extra Application per env
+under `<app_attr>.envs`. Each renders the SAME repo + chart, overlaid with
+values-<env>.yaml, into the `<app>-<env>` namespace. Apps with no `envs` key
+render nothing extra here, so prod-only apps are byte-identical.
+*/ -}}
+{{- range $env_name, $env_attr := $app_attr.envs }}
+---
+apiVersion: argoproj.io/v1alpha1
+kind: Application
+metadata:
+  name: {{ $app_name }}-{{ $env_name }}
+  namespace: argocd
+  finalizers:
+    - resources-finalizer.argocd.argoproj.io
+  {{- with $env_attr.annotations }}
+  annotations:
+    {{- toYaml . | nindent 4 }}
+  {{- end }}
+spec:
+  project: default
+  source:
+    repoURL: https://gitea.arcodange.lab/{{ $org }}/{{ $app_name }}
+    targetRevision: HEAD
+    path: chart
+    helm:
+      valueFiles:
+        - values.yaml
+        - values-{{ $env_name }}.yaml
+  destination:
+    server: https://kubernetes.default.svc
+    namespace: {{ $app_name }}-{{ $env_name }}
+  syncPolicy:
+    {{- if $env_attr.syncPolicy }}
+    {{- toYaml $env_attr.syncPolicy | nindent 4 }}
+    {{- else }}
+    automated:
+      prune: true
+      selfHeal: true
+    {{- end }}
+    syncOptions:
+      - CreateNamespace=true
+{{- end }}
 {{ end }}
|
||||
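The `range` loop in the hunk above can be read as a small function from an app's `envs` map to a list of extra Applications. The following is a minimal Python sketch of that mapping, not the chart's actual rendering path: the function name `extra_applications` and the dictionary shape are hypothetical, and the sketch simplifies the template by always setting `syncOptions` itself.

```python
from typing import Any

# Host taken from the repoURL in the template; everything else is illustrative.
REPO_BASE = "https://gitea.arcodange.lab"


def extra_applications(org: str, app_name: str, app_attr: dict[str, Any]) -> list[dict[str, Any]]:
    """Return one descriptor per entry under `envs`, mirroring the template's range loop."""
    default_sync = {"automated": {"prune": True, "selfHeal": True}}
    apps: list[dict[str, Any]] = []
    # Go templates visit map keys in sorted order, so sort here as well.
    for env_name, env_attr in sorted((app_attr.get("envs") or {}).items()):
        env_attr = env_attr or {}
        # A per-env syncPolicy replaces the automated default, as in the if/else branch.
        sync = dict(env_attr.get("syncPolicy") or default_sync)
        sync["syncOptions"] = ["CreateNamespace=true"]
        instance = f"{app_name}-{env_name}"
        apps.append({
            "name": instance,
            "namespace": instance,
            "repoURL": f"{REPO_BASE}/{org}/{app_name}",
            "valueFiles": ["values.yaml", f"values-{env_name}.yaml"],
            "syncPolicy": sync,
        })
    return apps
```

An app without an `envs` key yields an empty list, which corresponds to the template rendering nothing extra for prod-only apps.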
@@ -21,6 +21,11 @@ gitea_applications:
|
||||
       argocd-image-updater.argoproj.io/telegram-gateway.update-strategy: digest
   erp:
     annotations: {}
+    # Non-prod environments (ADR-0002). Each key renders an extra Application
+    # "<app>-<env>" overlaid with chart/values-<env>.yaml into namespace
+    # "<app>-<env>". Prod erp is unaffected.
+    envs:
+      sandbox: {}
   cms:
     annotations:
       argocd-image-updater.argoproj.io/image-list: cms=gitea.arcodange.lab/arcodange-org/cms:latest
|
||||
|
||||
@@ -44,6 +44,29 @@ Les briques se « branchent » entre elles **par convention de nom**, pas par co
|
||||
 ✅ **Use a short, stable, kebab-case name** from the start.
 ❌ **Do not introduce** variants (`my_app` vs `my-app`, `MyApp`, plurals): nothing will warn you, and the app will silently fail to connect or to deploy.

+## Several environments for the same app
+
+An application can be deployed several times (prod, sandbox, …) **without becoming a separate app**: same repo, same chart, same version. A second coordinate, `<env>`, is added to the name, governed by an **elision rule** ([ADR-0002](../../../vibe/ADR/0002-per-application-environments.md)):
+
+- **`env` defaults to `prod`, and `prod` elides.** When `env == prod`, **no suffix** is added: every derived name is identical to the single-environment case described above. An existing app therefore does not change (empty `plan`).
+- **Non-prod environments take the `<app>-<env>` suffix** in kebab-case everywhere (database, namespace, Vault paths/roles/policies, ArgoCD Application, DNS host, GCS state sub-prefix), **with one exception**: the PostgreSQL owner role stays snake-case, `<app>_<env>_role`, to remain consistent with the `_role` suffix.
+- **A single repo and a single chart** serve every environment; differences are overlaid via `values-<env>.yaml`. **A single CI JWT role** (`gitea_cicd_<app>`) per repo covers all of its environments.
+
+Example: `erp` (prod, elided) vs `erp-sandbox`:
+
+| System | `erp` (env = prod) | `erp-sandbox` (env = sandbox) |
+|---|---|---|
+| PostgreSQL database | `erp` | `erp-sandbox` |
+| PG owner role | `erp_role` | `erp_sandbox_role` |
+| Namespace + ServiceAccount | `erp` | `erp-sandbox` |
+| Vault dynamic DB creds | `postgres/creds/erp` | `postgres/creds/erp-sandbox` |
+| KV config secret | `kvv2/erp/config` | `kvv2/erp-sandbox/config` |
+| ArgoCD Application | `erp` | `erp-sandbox` |
+| Internal domain | `erp.arcodange.lab` | `erp-sandbox.arcodange.lab` |
+| Gitea repo / chart / CI JWT | `arcodange-org/erp` · chart · `gitea_cicd_erp` | shared (same values) |
+
+Declaration: `postgres/iac/terraform.tfvars` and the `applications` list on the `tools` side accept `envs = ["prod", "sandbox"]`; omitting it is equivalent to `["prod"]`. The non-prod ArgoCD `Application` is declared via an `envs` key under the app in [argocd/values.yaml](https://gitea.arcodange.lab/arcodange-org/factory/src/branch/main/argocd/values.yaml).
+
 ## Cross-references

 - [01 · Gitea repo](01-gitea-repo.md): sets `<app>` as the repo name under `arcodange-org`.
|
||||
|
||||
@@ -14,16 +14,6 @@ resource "cloudflare_r2_bucket" "arcodange_tf" {
|
||||
   jurisdiction = "eu"
 }

-# One-time state reconcile. The arcodange-tf R2 bucket already exists in the EU jurisdiction, but its
-# prior state entry lacked the jurisdiction, so cloudflare provider >= 5.20 read it as "not found" and
-# tried to recreate it (which fails: "already exists"). Re-import it with the jurisdiction-qualified id
-# (<account_id>/<bucket_name>/<jurisdiction>) so the next apply adopts the real bucket instead.
-# This block is a no-op once the bucket is in state and can be removed afterwards.
-import {
-  to = cloudflare_r2_bucket.arcodange_tf
-  id = "f7fcf28c0823cecb44e53b6e92d5144f/arcodange-tf/eu"
-}
-
 module "cf_r2_arcodange_tf_token" {
   source = "./modules/cloudflare_token"
   account_id = local.cloudflare_account_id
|
||||
|
||||
@@ -1,12 +1,30 @@
|
||||
+locals {
+  # Flatten applications × envs into per-instance objects, keyed by the elided
+  # instance id (ADR-0002 elision rule): env=prod → "<app>", else "<app>-<env>".
+  # The Postgres owner role stays snake-case: "<app>_role" (prod) / "<app>_<env>_role".
+  # For a prod-only app the key equals "<app>", database equals "<app>", and role
+  # equals "<app>_role", identical to the previous set(string) for_each, so every
+  # resource address and attribute is unchanged (a no-op plan).
+  app_instances = merge([
+    for app in var.applications : {
+      for env in app.envs :
+      (env == "prod" ? app.name : "${app.name}-${env}") => {
+        database = env == "prod" ? app.name : "${app.name}-${env}"
+        role = env == "prod" ? "${app.name}_role" : "${app.name}_${env}_role"
+      }
+    }
+  ]...)
+}
+
 resource "random_password" "credentials_editor" {
   length = 24
   override_special = "-:!+<>"
 }

 resource "postgresql_role" "credentials_editor" {
-  name = "credentials_editor"
-  login = true
-  password = random_password.credentials_editor.result
+  name = "credentials_editor"
+  login = true
+  password = random_password.credentials_editor.result
   create_role = true
   lifecycle {
     ignore_changes = [
|
||||
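The `app_instances` local in the hunk above flattens applications × envs into one map keyed by the elided instance id. A minimal Python sketch of the same flattening follows, assuming the tfvars entries are modelled as plain dictionaries; the function name mirrors the local but is otherwise illustrative, and this is not the module's code.

```python
def app_instances(applications: list[dict]) -> dict[str, dict[str, str]]:
    """Flatten applications x envs into instances keyed by the elided instance id."""
    instances: dict[str, dict[str, str]] = {}
    for app in applications:
        name = app["name"]
        # `envs` defaults to ["prod"], matching the optional() default of the variable.
        for env in app.get("envs", ["prod"]):
            # Elision rule: prod keeps the bare app name, other envs get "<app>-<env>".
            key = name if env == "prod" else f"{name}-{env}"
            # The owner role uses underscores around the env: "<app>_<env>_role".
            role = f"{name}_role" if env == "prod" else f"{name}_{env}_role"
            instances[key] = {"database": key, "role": role}
    return instances


# Prod-only entries produce exactly the keys of the old set(string),
# which is the property behind the no-op plan.
old_set = {"webapp", "crowdsec"}
assert set(app_instances([{"name": n} for n in old_set])) == old_set
```

Like the HCL expression, the sketch does not rewrite hyphens inside the app name itself; only the separator around `<env>` changes for the role.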
@@ -24,74 +42,74 @@ resource "vault_kv_secret" "postgres_admin_credentials" {
|
||||
}
|
||||
|
||||
 resource "postgresql_role" "app_role" {
-  for_each = var.applications
-  name = "${each.value}_role"
+  for_each = local.app_instances
+  name = each.value.role
   login = false
 }
|
||||
 resource "postgresql_grant_role" "credentials_editor_app_role" {
-  for_each = var.applications
-  role = postgresql_role.credentials_editor.name
-  grant_role = postgresql_role.app_role[each.value].name
+  for_each = local.app_instances
+  role = postgresql_role.credentials_editor.name
+  grant_role = postgresql_role.app_role[each.key].name
   with_admin_option = true
 }
|
||||
 resource "postgresql_database" "app_db" {
-  for_each = var.applications
-  name = each.value
-  owner = postgresql_role.app_role[each.value].name
+  for_each = local.app_instances
+  name = each.value.database
+  owner = postgresql_role.app_role[each.key].name
   template = "template0"
   alter_object_ownership = true
 }
|
||||
 resource "postgresql_function" "pgbouncer_user_lookup" {
-  for_each = var.applications
-  name = "user_lookup"
-  database = postgresql_database.app_db[each.value].name
-  arg {
-    mode = "IN"
-    name = "i_username"
-    type = "text"
-  }
-  arg {
-    mode = "OUT"
-    name = "uname"
-    type = "text"
-  }
-  arg {
-    mode = "OUT"
-    name = "phash"
-    type = "text"
-  }
-  returns = "record"
-  language = "plpgsql"
-  body = <<-EOF
+  for_each = local.app_instances
+  name = "user_lookup"
+  database = postgresql_database.app_db[each.key].name
+  arg {
+    mode = "IN"
+    name = "i_username"
+    type = "text"
+  }
+  arg {
+    mode = "OUT"
+    name = "uname"
+    type = "text"
+  }
+  arg {
+    mode = "OUT"
+    name = "phash"
+    type = "text"
+  }
+  returns = "record"
+  language = "plpgsql"
+  body = <<-EOF
     BEGIN
       SELECT usename, passwd FROM pg_catalog.pg_shadow
       WHERE usename = i_username INTO uname, phash;
       RETURN;
     END;
   EOF
-  parallel = "SAFE"
-  security_definer = true
+  parallel = "SAFE"
+  security_definer = true
 }
|
||||
 resource "postgresql_grant" "pgbouncer_user_lookup_public_revoke" {
-  for_each = var.applications
-  database = postgresql_function.pgbouncer_user_lookup[each.value].database
+  for_each = local.app_instances
+  database = postgresql_function.pgbouncer_user_lookup[each.key].database
   role = "public"
   schema = "public"
   object_type = "function"
-  objects = [
-    postgresql_function.pgbouncer_user_lookup[each.value].name,
+  objects = [
+    postgresql_function.pgbouncer_user_lookup[each.key].name,
   ]
-  privileges = []
+  privileges = []
 }
|
||||
 resource "postgresql_grant" "pgbouncer_user_lookup" {
-  depends_on = [ postgresql_grant.pgbouncer_user_lookup_public_revoke ] # can't do both in parallel
-  for_each = var.applications
-  database = postgresql_function.pgbouncer_user_lookup[each.value].database
+  depends_on = [postgresql_grant.pgbouncer_user_lookup_public_revoke] # can't do both in parallel
+  for_each = local.app_instances
+  database = postgresql_function.pgbouncer_user_lookup[each.key].database
   role = "pgbouncer_auth"
   schema = "public"
   object_type = "function"
-  objects = [
-    postgresql_function.pgbouncer_user_lookup[each.value].name,
+  objects = [
+    postgresql_function.pgbouncer_user_lookup[each.key].name,
   ]
-  privileges = ["EXECUTE"]
-}
+  privileges = ["EXECUTE"]
+}
|
||||
|
||||
@@ -1,7 +1,7 @@
|
||||
 applications = [
-  "webapp",
-  "erp",
-  "crowdsec",
-  "plausible",
-  "dance-lessons-coach",
-]
+  { name = "webapp" },
+  { name = "erp", envs = ["prod", "sandbox"] },
+  { name = "crowdsec" },
+  { name = "plausible" },
+  { name = "dance-lessons-coach" },
+]
|
||||
|
||||
@@ -1,3 +1,11 @@
|
||||
 variable "applications" {
-  type = set(string)
-}
+  # Multi-env (ADR-0002): each application declares the environments it deploys to.
+  # `envs` defaults to ["prod"] so every existing entry is unchanged in behaviour:
+  # by the elision rule the prod instance keeps the bare `<app>` identifiers, so its
+  # database, owner role, and all derived resources keep their exact current names
+  # and Terraform addresses (a no-op plan).
+  type = set(object({
+    name = string
+    envs = optional(list(string), ["prod"])
+  }))
+}
|
||||
|
||||
97
vibe/ADR/0002-per-application-environments.md
Normal file
@@ -0,0 +1,97 @@
|
||||
[vibe](../README.md) > [ADR](README.md) > **0002 · Per-application environments**
|
||||
|
||||
# ADR-0002: Per-application environments via an env coordinate
|
||||
|
||||
> **Status**: Accepted
|
||||
> **Date**: 2026-06-25
|
||||
> **Deciders**: @arcodange
|
||||
|
||||
## Context
|
||||
|
||||
The [`<app>` join key](../../doc/runbooks/new-web-app/conventions.md) threads one kebab-case identifier identically through every system that makes up an application: the Gitea repo, the Postgres database + `<app>_role`, Vault (`postgres/creds/<app>`, the k8s auth role `<app>`, the policies `<app>` / `<app>-ops`, the CI JWT role `gitea_cicd_<app>`), the k8s namespace + ServiceAccount, the ArgoCD Application, the GCS state prefix `<app>/main`, and DNS (`<app>.arcodange.lab`). Bricks wire together by name convention, not explicit config.
|
||||
|
||||
That convention conflates two ideas it never separated: an **application** and a **deployed instance** of it. There is exactly one of everything per app — one namespace, one database, one Vault creds path, one DNS host. The model cannot express "the same app, a second time, somewhere else."
|
||||
|
||||
The motivating need makes the gap concrete. The Arcodange Dolibarr ERP is growing a write-capable AI-agent skill — auto-creating supplier invoices from ingested emails, fixing thirdparty data, and similar mutations. Before such writes touch the production accounting database, the operator needs a place where the agent can run write operations autonomously, a human reviews the result, and only then the same operation is promoted to prod. That requires a **second deployed instance of the same application**: the same Dolibarr chart, the same version, the same conventions — differing only in *where* it runs and *which data* it touches.
|
||||
|
||||
| Force | Pressure it creates |
|
||||
| --- | --- |
|
||||
| One identifier per app, no env coordinate | "Same app, different environment" is inexpressible without inventing a whole second app. |
|
||||
| Write-capable AI agent landing on the prod ERP | A wrong autonomous write corrupts live accounting data with no rehearsal surface. |
|
||||
| Fidelity requirement for the rehearsal surface | The sandbox must run the *real* Dolibarr API against *prod-like* data, or the rehearsal predicts nothing. |
|
||||
| [ADR-0001](0001-safe-prod-like-environment.md) rejected an in-cluster sandbox | Its Alternative 3 ("sandbox namespace on the real cluster") was rejected for shared blast radius — so any in-cluster sibling instance must be reconciled against that, not pretended away. |
|
||||
|
||||
Treating the sandbox as a wholly separate app would fork the chart, the repo, the runbook chain, and the Vault wiring — four things that then drift apart over time, defeating the "same app, same version" fidelity the rehearsal depends on.
|
||||
|
||||
## Decision
|
||||
|
||||
We will extend the `<app>` convention with a second coordinate, `<env>`, governed by an **elision rule** so that adding the coordinate changes nothing for any existing app.
|
||||
|
||||
- **`env` defaults to `prod`, and `prod` elides.** When `env == prod`, no suffix is added: every derived name is character-for-character identical to today's single-env output. The instance name equals the app name (`local.instance == local.name`), so every existing app's `tofu plan` is a no-op.
|
||||
- **Non-prod envs take the `<app>-<env>` suffix** in kebab-case everywhere — namespace, Vault paths / roles / policies, ArgoCD Application, DNS host, GCS-state sub-prefix — with one exception: the Postgres owner role stays snake-case as `<app>_<env>_role`, matching the existing `_role` suffix convention.
|
||||
- **One repo and one chart serve every env of an app.** Per-env differences are overlaid via `values-<env>.yaml`; the chart's instance-specific values are `.Values`-driven, not hardcoded literals, so the same chart renders any instance.
|
||||
- **One CI JWT role (`gitea_cicd_<app>`) per repo covers all its envs.** Its ops policy is widened to the `<app>-*` path family. Each running instance keeps its own runtime Vault policy.
|
||||
|
||||
### Worked example: `erp` and `erp-sandbox`
|
||||
|
||||
| Coordinate | `erp` (env = prod, elided) | `erp-sandbox` (env = sandbox) |
|
||||
| --- | --- | --- |
|
||||
| Postgres database | `erp` | `erp-sandbox` |
|
||||
| Postgres owner role | `erp_role` | `erp_sandbox_role` |
|
||||
| k8s namespace + ServiceAccount | `erp` | `erp-sandbox` |
|
||||
| Vault dynamic DB creds | `postgres/creds/erp` | `postgres/creds/erp-sandbox` |
|
||||
| Vault KV config | `kvv2/erp/config` | `kvv2/erp-sandbox/config` |
|
||||
| ArgoCD Application | `erp` | `erp-sandbox` |
|
||||
| Internal DNS | `erp.arcodange.lab` | `erp-sandbox.arcodange.lab` |
|
||||
| Gitea repo | `arcodange-org/erp` | `arcodange-org/erp` (shared) |
|
||||
| Helm chart | one chart | one chart (shared) |
|
||||
| CI JWT role | `gitea_cicd_erp` | `gitea_cicd_erp` (shared) |
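
The worked example can be reproduced mechanically from the two coordinates. Below is a minimal Python sketch, assuming the naming patterns listed in the Context and Decision sections; the `InstanceNames` class and `derive_names` function are hypothetical helpers for illustration, not part of any repo named here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InstanceNames:
    instance: str
    database: str
    owner_role: str
    namespace: str
    vault_db_creds: str
    vault_kv_config: str
    argocd_application: str
    dns_host: str
    gitea_repo: str
    ci_jwt_role: str


def derive_names(app: str, env: str = "prod", org: str = "arcodange-org") -> InstanceNames:
    """Derive every name in the worked-example table from (app, env)."""
    # Elision rule: prod adds no suffix, any other env adds "-<env>".
    instance = app if env == "prod" else f"{app}-{env}"
    # The one exception: the Postgres owner role uses underscores around the env.
    owner_role = f"{app}_role" if env == "prod" else f"{app}_{env}_role"
    return InstanceNames(
        instance=instance,
        database=instance,
        owner_role=owner_role,
        namespace=instance,
        vault_db_creds=f"postgres/creds/{instance}",
        vault_kv_config=f"kvv2/{instance}/config",
        argocd_application=instance,
        dns_host=f"{instance}.arcodange.lab",
        # Shared across envs: keyed by the app, never by the instance.
        gitea_repo=f"{org}/{app}",
        ci_jwt_role=f"gitea_cicd_{app}",
    )
```

Calling `derive_names("erp")` and `derive_names("erp", "sandbox")` yields the two columns of the table; the shared rows differ from the per-instance rows only in being derived from `app` rather than `instance`.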
|
||||
|
||||
### Why this is not what ADR-0001 rejected
|
||||
|
||||
[ADR-0001](0001-safe-prod-like-environment.md) chose a **local-only** safe environment (k3d / arm64 VMs) and rejected its Alternative 3, an in-cluster "sandbox namespace on the real cluster," for shared blast radius. ADR-0002 introduces an in-cluster sibling instance (`erp-sandbox`), which looks like the very thing that was rejected. The two stand together because they operate at **different layers**.
|
||||
|
||||
ADR-0001's rejection is scoped to rehearsing **infrastructure / platform** change-classes — Ansible playbooks, Vault policy / auth / mount changes, Postgres superuser migrations, ArgoCD prune / selfHeal, Longhorn ops, DNS / email. Those couplings share fleet-wide control planes, so an in-cluster sandbox cannot isolate them; only a separate cluster + Vault + state + DNS zone can. That is exactly why ADR-0001 is local-only.
|
||||
|
||||
ADR-0002 operates one layer up. The AI agent's only reach is the **Dolibarr HTTP API**, holding a write-scoped, app-specific API key against an isolated database — `erp-sandbox` on its own `erp_sandbox_role`, its own namespace, its own Vault creds path. The agent never touches kubectl, the Vault root, the Postgres superuser, ArgoCD, Longhorn, or DNS. The fleet-level blast radius that doomed Alternative 3 for infra rehearsal is simply **not in the agent's reach**; the blast radius of a wrong AI write is bounded to the sandbox app's own data.
|
||||
|
||||
The two ADRs are therefore complementary, not contradictory, and ADR-0002 does not supersede ADR-0001. ADR-0001 isolates the *operator* from breaking the *fleet*. ADR-0002 isolates the *AI agent* from corrupting *one app's production data*, while preserving the prod-like API surface and real-data fidelity that the local k3d sandbox — which carries no prod data — cannot offer.
|
||||
|
||||
## Consequences
|
||||
|
||||
- **+** Every existing app (webapp, erp, crowdsec, plausible, dance-lessons-coach, cms) is unaffected: the elision rule makes the prod instance's derived names byte-identical, so adoption ships with zero migration and a no-op plan.
|
||||
- **+** A second instance of an app is now a `values-<env>.yaml` overlay plus an `envs` entry — not a forked repo, chart, and runbook chain — so prod and sandbox share one source of truth and stay on the same version by construction.
|
||||
- **+** The AI-agent write skill gets a prod-like rehearsal surface with real-shaped data: the *same* Dolibarr API and chart, an *isolated* database, a bounded blast radius.
|
||||
- **+** The convention chain (db + role → Vault creds + policy → namespace + SA → ArgoCD → DNS) is reused verbatim for the `-sandbox` instance, so runbooks read identically for any env.
|
||||
- **−** Names are no longer a flat app list: every consumer must reason about the `instance == app` (prod) versus `app-env` (non-prod) distinction, and the snake-case owner-role exception (`<app>_<env>_role`) is a special case that must be carried in the modules.
|
||||
- **−** A single shared Vault CI policy widened to `<app>-*` means the CI role for a repo can write the ops paths of *all* that repo's envs — a deliberately looser ops scope than one-policy-per-instance.
|
||||
- **−** A single shared OpenTofu state per repo holds every env's resources together, so the envs of one app share a blast radius at the state layer (mitigated by `for_each`, accepted at current scale — see Alternatives).
|
||||
- **→** The AI-agent promotion workflow this unlocks: the agent runs writes against `erp-sandbox` autonomously, emits a structured changeset, a human reviews it, and the **same** operation is re-applied to prod only with explicit confirmation — never auto-applied by the agent. The read/write skills resolve their target by an env switch (e.g. `DOLIBARR_TARGET=prod|sandbox`, defaulting to `prod`).
|
||||
- **→** Rollout is additive and phased, each phase gated by a no-op `tofu plan` against existing apps: **(A)** the `tools` repo adds an optional `env` / `envs` parameter to the shared `app_roles` and `app_policy` Vault modules; **(B)** the `factory` repo gains the `envs` schema in `postgres/iac` tfvars, renders one ArgoCD Application per env, and documents the elision rule in `conventions.md`; **(C)** the `erp` chart literals are templated to `.Values`; **(D)** `erp` + `factory` activate `erp-sandbox`; **(E)** DNS + ArgoCD registration.
|
||||
- **→** Per-env state separation (`<app>/<env>` prefixes) is a door left open: if env-to-env blast-radius isolation at the state layer becomes warranted, the prefix scheme can be revisited without changing the naming model.
|
||||
|
||||
## Alternatives considered
|
||||
|
||||
| Option | Why not |
|
||||
| --- | --- |
|
||||
| Treat `erp-sandbox` as a wholly separate `<app>` (own repo, own chart copy) | Forks the chart, the repo, and the runbook chain; the two copies drift over time; defeats the "same app, same version" fidelity the rehearsal depends on. |
|
||||
| Use the [ADR-0001](0001-safe-prod-like-environment.md) local-only sandbox (k3d / VMs) for the AI-agent writes | That environment carries **no production data** — the write-rehearsal needs prod-like data and the real Dolibarr API surface to be meaningful. Complementary to ADR-0001, not a substitute for it. |
|
||||
| Per-env OpenTofu state (`<app>/<env>` prefixes) instead of one shared state per repo | Buys more env-to-env blast-radius isolation, but at the cost of more CI plumbing and cross-env output wiring than current scale warrants; one shared state with `for_each` keeps runbooks simple. A real decision point — the chosen path is single shared state per repo, with the prefix scheme left as a future door. |
|
||||
| No elision — always suffix, even prod (`<app>-prod`) | Breaks every existing derived name, forcing a fleet-wide rename plus `tofu` resource moves; rejected in favour of the elision rule's zero-migration property. |
|
||||
|
||||
## QA & validation
|
||||
|
||||
- **Backwards-compat no-op gate** — after the module change, `tofu plan` against every existing app (webapp, erp, crowdsec, plausible, dance-lessons-coach, cms) reports zero changes. The elision rule guarantees `local.instance == local.name` for `env == prod`, so no prod resource moves.
|
||||
- **Byte-identical chart render** — `helm template erp chart/` before versus after the literal-templating refactor diffs to nothing (verified: 10857 bytes on both sides, `diff` exit 0).
|
||||
- **`tofu fmt -check` + `tofu validate`** are clean on the module changes.
|
||||
- **Sandbox activation gate** — when `erp-sandbox` is stood up, the [new-web-app convention chain](../../doc/runbooks/new-web-app/conventions.md) must resolve end to end for the `-sandbox` instance (db + role → Vault creds + policy → namespace + SA → ArgoCD Healthy/Synced → VSO injects → pod Running), exactly as the prod instance does.
|
||||
- **Promotion gate** — no AI-authored write reaches the prod ERP until it has been applied to `erp-sandbox`, produced a reviewed changeset, and been explicitly re-applied with human confirmation.
|
||||
|
||||
## References
|
||||
|
||||
- [ADR-0001 · Safe, production-like environment](0001-safe-prod-like-environment.md) — the local-only safe environment for infra rehearsal that this ADR complements (it stands; this does not supersede it).
|
||||
- [PRD · Safe, production-like environment](../PRD/safe-prod-like-environment/README.md) — the product view this work relates to, and its [isolation-boundary leaf](../PRD/safe-prod-like-environment/isolation-boundary.md) detailing the cluster/Vault/state/DNS boundary.
|
||||
- [new-web-app conventions](../../doc/runbooks/new-web-app/conventions.md) — the single-env `<app>` convention this ADR extends with the env coordinate.
|
||||
- [Phase A — `tools` Vault module env parameter](https://gitea.arcodange.lab/arcodange-org/tools/pulls/2) — adds the optional `env` / `envs` parameter to the shared `app_roles` and `app_policy` modules.
|
||||
- [Phase C — `erp` chart literal templating](https://gitea.arcodange.lab/arcodange-org/erp/pulls/11) — templates the chart's single-env literals to `.Values` so one chart renders any instance.
|
||||
- [PR factory#15 — this ADR](https://gitea.arcodange.lab/arcodange-org/factory/pulls/15) — the change that introduces ADR-0002 (links back to this file).
|
||||
@@ -3,7 +3,7 @@
|
||||
# Architecture Decision Records
|
||||
|
||||
> **Status**: 🟢 Active
-> **Last Updated**: 2026-06-23
+> **Last Updated**: 2026-06-25
|
||||
> **Related**: [vibe/PRD](../PRD/README.md) · [vibe/Investigations](../investigations/README.md)
|
||||
> **Historical**: [doc/adr](../../doc/adr/README.md) (foundational infra) · [ansible/.../docs/adr](../../ansible/arcodange/factory/docs/adr/) (dated infra ADRs)
|
||||
|
||||
@@ -34,6 +34,7 @@ When a new decision *supersedes* one of the historical records, write the new AD
|
||||
| # | Title | Status | Date |
|
||||
| --- | --- | --- | --- |
|
||||
| [0001](0001-safe-prod-like-environment.md) | Safe, production-like environment | 🟢 Accepted | 2026-06-23 |
|
||||
| [0002](0002-per-application-environments.md) | Per-application environments | 🟢 Accepted | 2026-06-25 |
|
||||
|
||||
## Rules to contribute
|
||||
|
||||
|
||||
@@ -3,9 +3,9 @@
|
||||
# Safe, production-like environment
|
||||
|
||||
> **Status:** In design
-> **Last Updated:** 2026-06-23
+> **Last Updated:** 2026-06-25
 > **Design record:** [ADR 0001: Safe, production-like environment](../../ADR/0001-safe-prod-like-environment.md)
-> **Adjacent:** [INV-001: prod blast-radius couplings](../../investigations/INV-001-prod-blast-radius-couplings.md)
+> **Adjacent:** [INV-001: prod blast-radius couplings](../../investigations/INV-001-prod-blast-radius-couplings.md) · [ADR 0002: per-application environments](../../ADR/0002-per-application-environments.md) (the application-data-layer counterpart)
|
||||
> **Map:** [Lab ecosystem guidebook](../../guidebooks/lab-ecosystem/README.md)
|
||||
|
||||
## Problem
|
||||
|
||||
@@ -3,8 +3,8 @@
|
||||
# Naming conventions — the `<app>` join key
|
||||
|
||||
> **Status**: 🟢 Active
-> **Last Updated**: 2026-06-23
-> **Related**: [Lab ecosystem](README.md) · [Factory brick](01-factory.md) · [Secrets & Vault](secrets-and-vault.md) · [PRD: isolation boundary](../../PRD/safe-prod-like-environment/isolation-boundary.md)
+> **Last Updated**: 2026-06-25
+> **Related**: [Lab ecosystem](README.md) · [Factory brick](01-factory.md) · [Secrets & Vault](secrets-and-vault.md) · [PRD: isolation boundary](../../PRD/safe-prod-like-environment/isolation-boundary.md) · [ADR 0002: per-application environments](../../ADR/0002-per-application-environments.md)
|
||||
> **Upstream (source of truth)**: [doc/runbooks/new-web-app/conventions.md](../../../doc/runbooks/new-web-app/conventions.md) (French, authoritative)
|
||||
|
||||
## TL;DR
|
||||
@@ -83,9 +83,35 @@ The symptom is always the same: a brick that *looks* provisioned but never conne
|
||||
✅ Choose a short, stable, lowercase kebab-case name up front and reuse it character-for-character.
|
||||
❌ Never introduce variants (case, separators, plurals); nothing will warn you.
|
||||
|
||||
-## Why this makes a sandbox safe
+## Multiple environments per app (the `<env>` coordinate)

-The `<app>` convention is also the reason a **production-like sandbox can reuse the exact same names** without colliding with production. Because every brick derives its resource names from `<app>` and from nothing else, an entire parallel universe of the platform (its own Vault, its own Postgres instance, its own k3s namespace scope) can host an `erp` named identically to the production `erp`, provided the two universes never share a backing store. Identity comes from the *environment boundary*, not from the name; the name is free to repeat. This is what lets QA and recovery drills run against `erp`, `webapp`, etc. with realistic identifiers instead of mangled `erp-staging`-style aliases that would themselves break the name-wiring. See the PRD's [isolation boundary](../../PRD/safe-prod-like-environment/isolation-boundary.md) for how that environment fence is drawn.
+A single application can run as several deployed instances (`prod`, `sandbox`, and so on) **without becoming a separate app**: same repo, same chart, same version. A second coordinate `<env>` extends the join key, governed by an **elision rule** ([ADR 0002](../../ADR/0002-per-application-environments.md)):
|
||||
|
||||
- `env` defaults to `prod`, and **`prod` elides** — when `env == prod` no suffix is added, so every derived name is exactly the single-coordinate output of the mapping above. Existing apps are unaffected (their plan is a no-op).
|
||||
- Non-prod envs take the **`<app>-<env>`** suffix everywhere — namespace, Vault paths / roles / policies, ArgoCD Application, DNS, GCS state sub-prefix — with the one snake-case exception inherited from the `_role` convention: the Postgres owner role is `<app>_<env>_role`.
|
||||
- One repo, one chart, and one CI JWT role (`gitea_cicd_<app>`) serve every env; per-env differences are a `values-<env>.yaml` overlay.
|
||||
|
||||
Worked example — `erp` (prod, elided) and `erp-sandbox`:
|
||||
|
||||
| System | `erp` (env = prod) | `erp-sandbox` |
|
||||
| --- | --- | --- |
|
||||
| PostgreSQL database | `erp` | `erp-sandbox` |
|
||||
| PostgreSQL owner role | `erp_role` | `erp_sandbox_role` |
|
||||
| Namespace + ServiceAccount | `erp` | `erp-sandbox` |
|
||||
| Vault dynamic DB creds | `postgres/creds/erp` | `postgres/creds/erp-sandbox` |
|
||||
| Vault KV config | `kvv2/erp/config` | `kvv2/erp-sandbox/config` |
|
||||
| ArgoCD Application | `erp` | `erp-sandbox` |
|
||||
| Internal DNS | `erp.arcodange.lab` | `erp-sandbox.arcodange.lab` |
|
||||
| Gitea repo / chart / CI JWT | `arcodange-org/erp` · chart · `gitea_cicd_erp` | shared |
|
||||
|
||||
## Two sandbox models, two naming strategies
|
||||
|
||||
There are two distinct ways to stand up a non-production copy, and they treat the join key differently — by design, not by accident.
|
||||
|
||||
- **Separate-cluster sandbox** ([ADR 0001](../../ADR/0001-safe-prod-like-environment.md)) — a whole parallel universe (its own Vault, Postgres, k3s) on the control node, for rehearsing dangerous *infrastructure* changes. The two universes never share a backing store, so identity comes from the *environment boundary*, not the name: the sandbox hosts an `erp` named identically to production. Names repeat freely; no `<env>` suffix is needed, so the name-wiring stays intact and drills run against realistic identifiers.
|
||||
- **In-cluster sibling instance** ([ADR 0002](../../ADR/0002-per-application-environments.md)) — a second instance on the *same* cluster (e.g. `erp-sandbox` beside `erp`), for rehearsing *application-data* writes against the real API. Here there is no cluster fence to disambiguate by, so the `<env>` suffix *is* the separator: every derived name carries `-sandbox` to avoid colliding with prod's namespace, database, Vault paths, and DNS.
|
||||
|
||||
Both keep the name-wiring coherent — one by repeating the slug behind a cluster fence, the other by extending the slug with the elided `<env>` coordinate. See the PRD's [isolation boundary](../../PRD/safe-prod-like-environment/isolation-boundary.md) for how the separate-cluster fence is drawn, and [ADR 0002](../../ADR/0002-per-application-environments.md) for why the in-cluster sibling's blast radius stays bounded to one app's data.
|
||||
|
||||
## See also
|
||||
|
||||
@@ -93,4 +119,5 @@ The `<app>` convention is also the reason a **production-like sandbox can reuse
|
||||
- [Secrets & Vault](secrets-and-vault.md) — how `gitea_cicd_<app>` and the `<app>` / `<app>-ops` policies fit the auth model.
|
||||
- [Factory brick](01-factory.md) — where the ArgoCD app-of-apps, the Postgres OpenTofu, and the IaC live.
|
||||
- [PRD — isolation boundary](../../PRD/safe-prod-like-environment/isolation-boundary.md) — why identical names are safe across environments.
|
||||
-- [ADR 0001: Safe, production-like environment](../../ADR/0001-safe-prod-like-environment.md).
+- [ADR 0001: Safe, production-like environment](../../ADR/0001-safe-prod-like-environment.md): the separate-cluster sandbox model.
+- [ADR 0002: Per-application environments](../../ADR/0002-per-application-environments.md): the `<env>` coordinate + elision rule, and the in-cluster sibling sandbox model.
|
||||
|
||||