feat(ops): erp-sandbox iso-prod seed + documents sync tooling (ADR-0003 E2)
Productionizes the sandbox state-lifecycle mechanisms validated live against
erp-sandbox. `ops/sandbox/sandbox-lifecycle.sh`:
- refresh-from-prod: read-only pg_dump of prod erp (default_transaction_read_only)
-> DROP OWNED BY erp_sandbox_role CASCADE -> pg_restore into erp-sandbox, using
the sandbox's own membership creds (no DROP/CREATE DATABASE, no CREATEDB, no
superuser). Dumps the full public schema (so app helper functions + triggers
come over) and filters the provisioner-owned pgbouncer user_lookup function
from the restore TOC. Scales the pod to 0 for exclusive access; copies prod
creds into a transient secret that is deleted on exit.
- sync-documents: tar-pipe the documents/mycompany tree (company logo + uploads)
prod -> sandbox, since uploaded files live on the PVC, not the DB.
Prod integrity is structural: prod is read-only during dump; the restore can only
write erp-sandbox (erp_sandbox_role owns only the sandbox DB and cannot drop prod
erp/erp_role); the platform's only prod-capable superuser stays behind the
human-gated postgres.yaml CI and is never used here.
README documents the integrity guarantee, the encryption + PVC fidelity caveats,
the BDD reset loop, and the hardening backlog (dedicated read-only dump role,
golden-cache PVC).
Refs ADR-0003 (factory#19). Chart owner-role fix = erp#13.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
ops/sandbox/README.md
# erp-sandbox lifecycle ops

Tooling to make `erp-sandbox` **iso-prod** and to reset it, implementing
[ADR-0003](https://gitea.arcodange.lab/arcodange-org/factory/src/branch/main/vibe/ADR/0003-sandbox-state-lifecycle.md)
(sandbox state lifecycle). The sandbox exists so AI agents can rehearse Dolibarr
**write** operations against a faithful copy of prod, with a structural guarantee
that the rehearsal path can never mutate prod.

## The prod-integrity guarantee (why this is safe)

| Layer | Enforcement |
| --- | --- |
| prod is **read-only** during a refresh | `pg_dump` runs with `default_transaction_read_only=on` |
| the restore can only write the sandbox | it connects with the sandbox's own dynamic creds (a member of `erp_sandbox_role`, which **owns only `erp-sandbox`**) |
| no database is dropped or created | the wipe is `DROP OWNED BY erp_sandbox_role CASCADE`; the reload is `pg_restore` (no `CREATEDB`, no superuser) |
| prod is structurally undroppable | `DROP DATABASE` requires ownership (or superuser); `erp_sandbox_role` does not own prod `erp` (owned by `erp_role`) |

The only prod-capable credential on the platform is the `superuser=true` provider
in the factory repo's `postgres/iac/providers.tf`, used **only** in the human-gated
`postgres.yaml` CI. This tooling never touches it.
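
The first three rows of the table reduce to a few commands. Below is a minimal sketch, assuming the script reaches each database through libpq connection URLs held in hypothetical `PROD_URL` and `SANDBOX_URL` variables; the real script's variable names and flags may differ. The hypothetical `run` helper prints each command unless `DRY_RUN=0`, so the sketch is safe to source.

```sh
#!/bin/sh
set -eu

# Print the command (the default) or execute it when DRY_RUN=0.
run() {
  if [ "${DRY_RUN:-1}" = 1 ]; then
    printf '+ %s\n' "$*"
  else
    "$@"
  fi
}

# Row 1: every transaction in the dump session defaults to read-only.
# PROD_URL is a hypothetical libpq URL for the prod `erp` database.
dump_prod() {
  run env PGOPTIONS='-c default_transaction_read_only=on' \
    pg_dump --dbname="$PROD_URL" --schema=public --format=custom --file="$1"
}

# Rows 2-3: run as a member of erp_sandbox_role against the sandbox DB.
# DROP OWNED removes only objects in the current database owned by that
# role; databases it owns are never dropped, so the reload needs no CREATEDB.
wipe_sandbox() {
  run psql --dbname="$SANDBOX_URL" -v ON_ERROR_STOP=1 \
    -c 'DROP OWNED BY erp_sandbox_role CASCADE'
}
```

For example, `dump_prod /tmp/erp.dump` with `PROD_URL` set and `DRY_RUN` unset only prints the `pg_dump` command line.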

## Usage

```sh
./sandbox-lifecycle.sh refresh-from-prod  # clone prod DB (data + config) into erp-sandbox
./sandbox-lifecycle.sh sync-documents     # copy mycompany/ uploads (company logo, PDFs)
./sandbox-lifecycle.sh refresh            # both, in order
```
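
The `refresh` subcommand composes the other two, DB first. A minimal sketch of that dispatch, assuming each subcommand is a shell function; `refresh_from_prod` and `sync_documents` below are hypothetical stubs, not the script's real internals. Under `set -e`, and with `main` called directly, a failed DB refresh stops the script before any files are copied.

```sh
#!/bin/sh
set -eu

# Hypothetical stubs standing in for the real implementations.
refresh_from_prod() { echo refresh-from-prod; }
sync_documents() { echo sync-documents; }

main() {
  case "${1:-}" in
    refresh-from-prod) refresh_from_prod ;;
    sync-documents) sync_documents ;;
    # DB first, then files; set -e aborts before the copy if the DB step fails.
    refresh) refresh_from_prod; sync_documents ;;
    *)
      echo "usage: $0 {refresh-from-prod|sync-documents|refresh}" >&2
      return 2
      ;;
  esac
}
```

A script built this way would end with `main "$@"`.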

`refresh-from-prod` scales the sandbox pod to 0 (for exclusive access to the
sandbox DB), dumps the full prod `public` schema (read-only), wipes the sandbox's
app objects, restores the dump, and scales the pod back up. It dumps the
**whole** schema (not just `llx_*`) so app helper functions and their triggers
(e.g. `update_modified_column_tms()`) come over. It filters the
provisioner-owned pgbouncer `user_lookup` function out of the restore TOC,
because that object already exists in each environment and is not app data.
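
`pg_restore --list` prints an archive's table of contents (TOC), and `pg_restore --use-list` restores only the entries that remain in an edited copy of it. Below is a minimal sketch of the filter step, assuming the pgbouncer function's entries are the only TOC lines containing `user_lookup`, and that `SANDBOX_URL` is a hypothetical libpq URL for the sandbox database. Restoring with `--role=erp_sandbox_role` is an assumption about how the reloaded objects stay owned by that role, so the next `DROP OWNED` still catches them.

```sh
#!/bin/sh
set -eu

# Drop every TOC line that mentions the provisioner-owned pgbouncer
# function (its FUNCTION entry and any ACL entry for it). Lines starting
# with ';' are comments, which pg_restore ignores; they pass through.
filter_toc() {
  grep -v 'user_lookup' || true   # grep exits 1 when no line survives
}

# Restore a custom-format archive into the sandbox, minus filtered entries.
restore_filtered() {
  tmp=$(mktemp -d)
  # Two steps instead of a pipe: a failing --list must stop the script,
  # not yield an empty list that silently restores nothing.
  pg_restore --list "$1" > "$tmp/toc.all"
  filter_toc < "$tmp/toc.all" > "$tmp/toc.list"
  pg_restore --dbname="$SANDBOX_URL" --no-owner --role=erp_sandbox_role \
    --use-list="$tmp/toc.list" "$1"
  rm -rf "$tmp"
}
```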

## Two fidelity caveats (by design, see ADR-0003)

1. **Encryption.** Dolibarr ties some encrypted fields (notably API keys) to
   `DOLI_INSTANCE_UNIQUE_ID`. The sandbox has its **own** instance ID, so
   prod-encrypted values won't decrypt there. This is why the write-scoped
   `ai_agent_sandbox` API key must be **generated inside the sandbox** (see the
   `../../test/` POC), not copied from prod. Most data is plaintext and unaffected.
2. **Uploaded files live on the PVC, not the DB.** A DB refresh copies the logo
   setting (the `MAIN_INFO_SOCIETE_LOGO` const) but not the image file;
   `sync-documents` copies the `documents/mycompany` tree so the logo and
   attachments actually render.
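
`sync-documents` is a tar pipe: one `tar` writes the tree to stdout on the prod side, and another unpacks it from stdin on the sandbox side. Below is a minimal sketch with the two sides as overridable hooks, so it also runs locally. It assumes `tar` exists in both containers; the namespaces and pod variables in the commented `kubectl exec` overrides, and the `sync_tree` name itself, are hypothetical.

```sh
#!/bin/sh
set -eu

# Hooks for each side of the pipe. Locally they just run the command;
# against the cluster they would wrap kubectl exec, for example
# (hypothetical namespaces and pod variables):
#   src_run() { kubectl exec -n erp "$PROD_POD" -- "$@"; }
#   dst_run() { kubectl exec -i -n erp-sandbox "$SANDBOX_POD" -- "$@"; }
# The destination needs -i so its tar receives the stream on stdin.
src_run() { "$@"; }
dst_run() { "$@"; }

# Copy <src_root>/<subtree> to <dst_root>/<subtree>. Same-named files in
# the destination are overwritten; files that exist only there are kept.
sync_tree() {
  src_root=$1 dst_root=$2 subtree=${3:-mycompany}
  src_run tar -C "$src_root" -cf - "$subtree" | dst_run tar -C "$dst_root" -xf -
}
```

Called as `sync_tree "$DOCS" "$DOCS"`, with a hypothetical `DOCS` pointing at Dolibarr's documents directory in both pods.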

## BDD reset loop (E4)

For repeated rehearsals, `refresh-from-prod` is the "reset to prod state". A
faster checkpoint/reset that avoids re-reading prod each time (cache a golden
dump on a small PVC, then `DROP OWNED` + `pg_restore` from it) is the documented
next optimization; see ADR-0003 §Decision/Consequences.
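
The checkpoint/reset idea reads prod once and then resets from the cached file as often as needed. Below is a minimal sketch, assuming a hypothetical `GOLDEN` path on the cache PVC and hypothetical `PROD_URL`/`SANDBOX_URL` libpq URLs. It prints commands unless `DRY_RUN=0`, and for brevity it leaves out the `user_lookup` TOC filtering described above.

```sh
#!/bin/sh
set -eu

run() {
  if [ "${DRY_RUN:-1}" = 1 ]; then printf '+ %s\n' "$*"; else "$@"; fi
}

# Hypothetical location of the golden dump on the cache PVC.
GOLDEN=${GOLDEN:-/cache/erp-golden.dump}

# Checkpoint: read prod (read-only) only when no golden dump is cached yet.
ensure_golden() {
  if [ -s "$GOLDEN" ]; then return 0; fi
  run env PGOPTIONS='-c default_transaction_read_only=on' \
    pg_dump --dbname="$PROD_URL" --schema=public --format=custom --file="$GOLDEN"
}

# Reset: wipe the sandbox's objects and reload the cached dump; prod is not read.
reset_sandbox() {
  run psql --dbname="$SANDBOX_URL" -v ON_ERROR_STOP=1 \
    -c 'DROP OWNED BY erp_sandbox_role CASCADE'
  run pg_restore --dbname="$SANDBOX_URL" --no-owner --role=erp_sandbox_role "$GOLDEN"
}
```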

## Hardening backlog

- Replace the transient copy of prod's read+write creds with a **dedicated
  read-only Postgres role** (issued via a Vault dynamic role) so the dump path is
  least-privilege by construction, not just by `default_transaction_read_only`,
  which is only a session default that the client can override.
- Provision a golden-cache PVC for fast BDD resets.
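
With HashiCorp Vault's database secrets engine, the read-only role from the first backlog item could look roughly like this. It is a sketch only: the mount path `database/`, the connection name `erp-prod`, the role name `erp-readonly-dump`, and the TTLs are all hypothetical, and `pg_read_all_data` requires PostgreSQL 14 or later. It prints the `vault` commands unless `DRY_RUN=0`.

```sh
#!/bin/sh
set -eu

run() {
  if [ "${DRY_RUN:-1}" = 1 ]; then printf '+ %s\n' "$*"; else "$@"; fi
}

# SQL Vault runs when it issues a credential; Vault fills in {{name}},
# {{password}} and {{expiration}}. Read access comes from pg_read_all_data,
# no write grants are added, and sessions also default to read-only.
creation_sql() {
  cat <<'SQL'
CREATE ROLE "{{name}}" WITH LOGIN PASSWORD '{{password}}' VALID UNTIL '{{expiration}}' NOSUPERUSER NOCREATEDB NOCREATEROLE;
GRANT pg_read_all_data TO "{{name}}";
ALTER ROLE "{{name}}" SET default_transaction_read_only = on;
SQL
}

# Register the dynamic role (hypothetical names and TTLs).
define_dump_role() {
  run vault write database/roles/erp-readonly-dump \
    db_name=erp-prod \
    creation_statements="$(creation_sql)" \
    default_ttl=1h max_ttl=2h
}

# Each call would issue a fresh, short-lived read-only credential.
issue_dump_creds() {
  run vault read database/creds/erp-readonly-dump
}
```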