erp/ops/sandbox
Gabriel Radureau 7264f00ed4 feat(ops): erp-sandbox iso-prod seed + documents sync tooling (ADR-0003 E2)
Productionizes the sandbox state-lifecycle mechanisms validated live against
erp-sandbox. `ops/sandbox/sandbox-lifecycle.sh`:
  - refresh-from-prod: read-only pg_dump of prod erp (default_transaction_read_only)
    -> DROP OWNED BY erp_sandbox_role CASCADE -> pg_restore into erp-sandbox, using
    the sandbox's own membership creds (no DROP/CREATE DATABASE, no CREATEDB, no
    superuser). Dumps the full public schema (so app helper functions + triggers
    come over) and filters the provisioner-owned pgbouncer user_lookup function
    from the restore TOC. Scales the pod to 0 for exclusive access; copies prod
    creds into a transient secret that is deleted on exit.
  - sync-documents: tar-pipe the documents/mycompany tree (company logo + uploads)
    prod -> sandbox, since uploaded files live on the PVC, not the DB.

Prod integrity is structural: prod is read-only during dump; the restore can only
write erp-sandbox (erp_sandbox_role owns only the sandbox DB and cannot drop prod
erp/erp_role); the platform's only prod-capable superuser stays behind the
human-gated postgres.yaml CI and is never used here.

README documents the integrity guarantee, the encryption + PVC fidelity caveats,
the BDD reset loop, and the hardening backlog (dedicated read-only dump role,
golden-cache PVC).

Refs ADR-0003 (factory#19). Chart owner-role fix = erp#13.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-06-29 07:42:00 +02:00

erp-sandbox lifecycle ops

Tooling to make erp-sandbox iso-prod and to reset it, implementing ADR-0003 (sandbox state lifecycle). The sandbox exists so AI agents can rehearse Dolibarr write operations against a faithful copy of prod, with a structural guarantee that the rehearsal path can never mutate prod.

The prod-integrity guarantee (why this is safe)

| Layer | Enforcement |
|---|---|
| the dump session cannot write prod | pg_dump runs with default_transaction_read_only=on |
| the restore can only write the sandbox | it uses the sandbox's own dynamic creds: a member of erp_sandbox_role, which owns only erp-sandbox |
| no database is dropped or created | the wipe is DROP OWNED BY erp_sandbox_role CASCADE; the reload is pg_restore (no CREATEDB, no superuser) |
| prod is structurally undroppable | DROP DATABASE requires ownership; erp_sandbox_role does not own prod erp (owned by erp_role) |
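The three rows map onto three commands. The sketch below is a minimal illustration, not the actual sandbox-lifecycle.sh code: the function names are hypothetical, and PROD_URL, SANDBOX_URL and DUMP_FILE are assumed placeholders for libpq connection URIs built from prod's and the sandbox's credentials plus a scratch file.

```bash
#!/usr/bin/env bash
# Hypothetical sketch. PROD_URL / SANDBOX_URL / DUMP_FILE are assumed placeholders.

dump_prod_readonly() {
  # libpq reads PGOPTIONS at connect time: every transaction pg_dump opens defaults
  # to READ ONLY, so writes are rejected unless a client explicitly opts back into
  # READ WRITE (hence the "dedicated read-only role" item in the backlog).
  PGOPTIONS='-c default_transaction_read_only=on' \
    pg_dump --dbname="$PROD_URL" --schema=public --format=custom --file="$DUMP_FILE"
}

wipe_sandbox() {
  # Drops objects owned by erp_sandbox_role in the connected (sandbox) database.
  # DROP OWNED never removes databases, so no DROP DATABASE privilege is involved.
  psql "$SANDBOX_URL" -v ON_ERROR_STOP=1 -c 'DROP OWNED BY erp_sandbox_role CASCADE'
}

restore_sandbox() {
  # --role issues SET ROLE erp_sandbox_role; with --no-owner the restored objects are
  # owned by that role, so the next wipe_sandbox removes them again.
  # Extra arguments (e.g. --use-list=FILE) are passed through to pg_restore.
  pg_restore --dbname="$SANDBOX_URL" --no-owner --role=erp_sandbox_role "$@" "$DUMP_FILE"
}
```

Owning the restored objects as erp_sandbox_role is what keeps the wipe/restore cycle closed: DROP OWNED can only ever reach what the previous restore created.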

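The only superuser credential on the platform that can act on prod is the superuser=true provider in factory postgres/iac/providers.tf, used only by the human-gated postgres.yaml CI; this tooling never touches it. refresh-from-prod does borrow prod's own app credentials for the dump (copied into a transient secret that is deleted on exit), which is why the hardening backlog calls for a dedicated read-only role.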

Usage

```bash
./sandbox-lifecycle.sh refresh-from-prod   # clone prod DB (data + config) into erp-sandbox
./sandbox-lifecycle.sh sync-documents      # copy mycompany/ uploads (company logo, PDFs)
./sandbox-lifecycle.sh refresh             # both, in order
```

refresh-from-prod scales the sandbox pod to 0, dumps the full prod public schema (read-only), wipes the sandbox's app objects, restores, and scales back up. It dumps the whole schema (not just llx_*) so app helper functions and their triggers (e.g. update_modified_column_tms()) come over; it filters out the provisioner-owned user_lookup pgbouncer function from the restore TOC because that object already exists per-environment and is not app data.
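Two mechanics from that paragraph, the scale-down that must always be undone and the TOC filter, can be sketched as follows. This is a hypothetical illustration: the helper names, the `erp-sandbox` namespace and deployment, and the assumption that the sandbox runs as a single-replica Deployment are placeholders, not taken from the actual script. The sample TOC signature for user_lookup is likewise illustrative.

```bash
#!/usr/bin/env bash
# Hypothetical helpers; names, namespace, deployment and replica count are assumptions.

# Read a `pg_restore -l` listing on stdin and comment out (prefix with ';') every
# entry naming the user_lookup function, including any ACL/COMMENT entries for it.
# pg_restore -L skips lines that start with ';'.
filter_toc() {
  sed -E '/ user_lookup\(/ s/^/;/'
}

# Scale the sandbox to 0, run the given command, and scale back up when the
# subshell exits, whether the command succeeded or failed.
with_sandbox_scaled_down() (
  local ns=${SANDBOX_NS:-erp-sandbox} deploy=${SANDBOX_DEPLOY:-erp-sandbox}
  kubectl -n "$ns" scale deployment "$deploy" --replicas=0 || return
  trap 'kubectl -n "$ns" scale deployment "$deploy" --replicas=1' EXIT
  "$@"
)

# Typical use (paths and the db_steps function are placeholders):
#   pg_restore -l prod.dump | filter_toc > restore.list
#   with_sandbox_scaled_down db_steps   # wipe + pg_restore --use-list=restore.list
```

Running the body in a subshell scopes the EXIT trap to this one operation, so the scale-up fires even when a restore step fails midway.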

Two fidelity caveats (by design — see ADR-0003)

  1. Encryption. Dolibarr ties some encrypted fields (notably API keys) to DOLI_INSTANCE_UNIQUE_ID. The sandbox has its own uuid, so prod-encrypted values won't decrypt there. This is why the write-scoped ai_agent_sandbox API key must be generated inside the sandbox (see ../../test/ POC), not copied from prod. Most data is plaintext and unaffected.
  2. Uploaded files live on the PVC, not the DB. A DB refresh copies the logo const (MAIN_INFO_SOCIETE_LOGO) but not the image; sync-documents copies the documents/mycompany tree so the logo + attachments actually render.
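The tar pipe behind sync-documents can be sketched locally between two directories; the function name copy_tree is hypothetical. `-C` makes archive paths relative (mycompany/...), so the tree lands under the destination unchanged and nothing is staged in a temp file on either side.

```bash
#!/usr/bin/env bash
# Hypothetical helper: copy SRC_ROOT/SUBDIR to DST_ROOT/SUBDIR through a tar stream.
copy_tree() (
  set -o pipefail                  # fail if either side of the pipe fails
  src_root=$1 dst_root=$2 subdir=$3
  mkdir -p "$dst_root" &&
    tar -C "$src_root" -cf - "$subdir" | tar -C "$dst_root" -xf -
)
```

Against the cluster, each side would run inside its pod, for example `kubectl exec -n <prod-ns> deploy/<prod-deploy> -- tar -C "$DOCS_DIR" -cf - mycompany | kubectl exec -i -n erp-sandbox deploy/erp-sandbox -- tar -C "$DOCS_DIR" -xf -`, where the namespaces, deployments and DOCS_DIR are placeholders. Only the receiving side needs `-i`, and neither side should use `-t`, since a TTY would corrupt the binary stream. Extraction overwrites matching files but does not delete sandbox-only files that are absent from prod.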

BDD reset loop (E4)

For repeated rehearsals, refresh-from-prod is the "reset to prod state". A faster checkpoint/reset that avoids re-reading prod each time (cache a golden dump on a small PVC, then DROP OWNED + pg_restore from it) is the documented next optimization — see ADR-0003 §Decision/Consequences.
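A checkpoint/reset along those lines might look like the sketch below. It is hypothetical: GOLDEN_DUMP (a file on the small PVC), PROD_URL and SANDBOX_URL are placeholders, and extra arguments (such as a filtered `--use-list`) are passed through to pg_restore. Prod is read once on a cache miss; every later reset only touches the sandbox.

```bash
#!/usr/bin/env bash
# Hypothetical golden-cache reset; GOLDEN_DUMP / PROD_URL / SANDBOX_URL are placeholders.
reset_sandbox() {
  if [ ! -s "$GOLDEN_DUMP" ]; then
    # Cache miss: one read-only pass over prod, written to a temp name and renamed
    # so an interrupted first run never leaves a truncated "golden" file behind.
    PGOPTIONS='-c default_transaction_read_only=on' \
      pg_dump --dbname="$PROD_URL" --schema=public --format=custom \
              --file="$GOLDEN_DUMP.tmp" \
      && mv "$GOLDEN_DUMP.tmp" "$GOLDEN_DUMP" \
      || return
  fi
  # Reset = wipe sandbox-owned objects, then restore from the cached file.
  psql "$SANDBOX_URL" -v ON_ERROR_STOP=1 -c 'DROP OWNED BY erp_sandbox_role CASCADE' || return
  pg_restore --dbname="$SANDBOX_URL" --no-owner --role=erp_sandbox_role "$@" "$GOLDEN_DUMP"
}
```

Deleting the cached file forces the next reset to re-read prod, which is how the cache would be refreshed.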

Hardening backlog

  • Replace the transient copy of prod's read+write creds with a dedicated read-only Postgres role (issued via a Vault dynamic role) so the dump path is least-privilege by construction, not just by default_transaction_read_only.
  • Provision a golden-cache PVC for fast BDD resets.
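For the first backlog item, a Vault database-secrets dynamic role could issue short-lived dump-only users. The sketch below prints hypothetical `creation_statements` using Vault's `{{name}}`, `{{password}}` and `{{expiration}}` templates. It assumes Vault's connection user for prod is allowed to grant SELECT on erp's public schema. Read-only is then enforced by privileges, with default_transaction_read_only kept only as a second layer.

```bash
#!/usr/bin/env bash
# Hypothetical creation_statements for a dump-only dynamic role on prod erp.
dump_role_sql() {
  cat <<'SQL'
CREATE ROLE "{{name}}" WITH LOGIN PASSWORD '{{password}}' VALID UNTIL '{{expiration}}';
GRANT CONNECT ON DATABASE erp TO "{{name}}";
GRANT USAGE ON SCHEMA public TO "{{name}}";
GRANT SELECT ON ALL TABLES IN SCHEMA public TO "{{name}}";
GRANT SELECT ON ALL SEQUENCES IN SCHEMA public TO "{{name}}";
ALTER ROLE "{{name}}" SET default_transaction_read_only = on;
SQL
}
```

Registering it might look like `vault write database/roles/erp-dump-ro db_name=erp creation_statements="$(dump_role_sql)" default_ttl=15m max_ttl=1h`, followed by `vault read database/creds/erp-dump-ro` at dump time. The `database/` mount path, role name and `db_name` connection name are placeholders. Because the grants run when each credential is issued, tables added later are still covered.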