Commit Graph

2 Commits

434be7488d fix(ops): sandbox refresh-from-prod actually restores now (pg_restore -U + self-heal pause)
refresh-from-prod was structurally broken and silently no-op'd the restore:

1. pg_restore lacked -U, so inside the postgres image it defaulted to the OS user
   `root` as the database user and failed authentication. The failure was swallowed
   by `|| echo "ignorable warnings"`, so the script reported success even though
   DROP OWNED had already emptied the DB.
   E2's original seed was a manual process, so this path had never really run.
   Fix: pass `-h $PGHOST -U $SB_PGUSER`, and stop trusting pg_restore's exit code (it
   returns non-zero on the harmless "schema public already exists" notice). Instead,
   verify by counting restored llx_* tables and fail the Job if there are fewer than 250.

2. erp-sandbox is ArgoCD-managed with self-heal ON, which reverts the
   `kubectl scale --replicas=0` within seconds, so the seed ran with Dolibarr
   still connected. Fix: pause self-heal for the duration of the refresh and re-arm
   it afterwards. Restoring the app, re-enabling self-heal, and cleaning up the
   secret are all guarded by an EXIT trap, so an interrupt can't strand the sandbox
   at replicas=0 with self-heal off.
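
The self-heal pause and EXIT-trap guard described in item 2 could look roughly like the sketch below. It is not the script's actual code: the namespaces, deployment name, secret name, and every variable and function name are hypothetical, and it assumes self-heal is toggled by patching `spec.syncPolicy.automated.selfHeal` on the ArgoCD Application.

```shell
#!/usr/bin/env bash
# Sketch only. All names below are illustrative assumptions, not taken from the repo.
KUBECTL="${KUBECTL:-kubectl}"
APP="${APP:-erp-sandbox}"                  # ArgoCD Application name (assumed)
ARGO_NS="${ARGO_NS:-argocd}"               # ArgoCD namespace (assumed)
SANDBOX_NS="${SANDBOX_NS:-erp-sandbox}"    # sandbox namespace (assumed)
DEPLOY="${DEPLOY:-deployment/erp-sandbox}" # workload to scale (assumed)
TMP_SECRET="${TMP_SECRET:-refresh-prod-creds}"  # transient secret (assumed)

# Toggle ArgoCD self-heal; $1 is "true" or "false".
set_self_heal() {
  "$KUBECTL" patch applications.argoproj.io "$APP" -n "$ARGO_NS" --type merge \
    -p "{\"spec\":{\"syncPolicy\":{\"automated\":{\"selfHeal\":$1}}}}"
}

# Cleanup run from the EXIT trap. Each step tolerates failure so that
# the remaining steps still run.
restore_sandbox() {
  "$KUBECTL" scale "$DEPLOY" -n "$SANDBOX_NS" --replicas=1 || true
  set_self_heal true || true
  "$KUBECTL" delete secret "$TMP_SECRET" -n "$SANDBOX_NS" --ignore-not-found || true
}

# Run the seed command ("$@") with self-heal paused and the app scaled to 0.
# The subshell scopes the trap, so cleanup fires on success, failure, or interrupt.
guarded_refresh() {
  (
    trap restore_sandbox EXIT
    set_self_heal false &&
      "$KUBECTL" scale "$DEPLOY" -n "$SANDBOX_NS" --replicas=0 &&
      "$@"
  )
}
```

Usage would be something like `guarded_refresh run_seed`, where `run_seed` stands for whatever performs the dump and restore. The seed's exit status is passed through, so a failed restore still fails the Job after the sandbox has been put back.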

Validated end-to-end on the live sandbox: 295 llx tables, company=Arcodange,
owner=erp_sandbox_role, self-heal re-armed, pod 1/1. README documents the self-heal
pause and the iso-prod consequence: ai_agent_sandbox is wiped by the refresh and
must be re-provisioned.
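
As an illustration of the table-count check that replaces pg_restore's exit code, a minimal sketch follows. The 250 threshold and `$PGHOST` / `$SB_PGUSER` come from this message; the function names, the `SB_PGDATABASE` variable, and the exact query are assumptions.

```shell
#!/usr/bin/env bash
# Sketch only: helper names, SB_PGDATABASE and the query are hypothetical.

# Count llx_* tables in the public schema of the sandbox DB.
count_llx_tables() {
  psql -h "$PGHOST" -U "$SB_PGUSER" -d "$SB_PGDATABASE" -At \
    -c "SELECT count(*) FROM pg_tables WHERE schemaname = 'public' AND left(tablename, 4) = 'llx_'"
}

# Succeed only if the observed count ($1) reaches the minimum ($2, default 250).
# Returns 2 when the count is empty or non-numeric (e.g. psql itself failed).
assert_restored() {
  n="$1"
  min="${2:-250}"
  case "$n" in
    '' | *[!0-9]*)
      echo "restore check: no usable table count ('$n')" >&2
      return 2
      ;;
  esac
  if [ "$n" -lt "$min" ]; then
    echo "restore check FAILED: $n llx_* tables, expected at least $min" >&2
    return 1
  fi
  echo "restore check OK: $n llx_* tables"
}
```

A caller would let pg_restore finish regardless of its exit code and then run `assert_restored "$(count_llx_tables)" || exit 1`, so an empty database can no longer be reported as a success.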

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-06-30 06:59:39 +02:00
7264f00ed4 feat(ops): erp-sandbox iso-prod seed + documents sync tooling (ADR-0003 E2)
Productionizes the sandbox state-lifecycle mechanisms validated live against
erp-sandbox. `ops/sandbox/sandbox-lifecycle.sh`:
  - refresh-from-prod: read-only pg_dump of prod erp (default_transaction_read_only)
    -> DROP OWNED BY erp_sandbox_role CASCADE -> pg_restore into erp-sandbox, using
    the sandbox's own membership creds (no DROP/CREATE DATABASE, no CREATEDB, no
    superuser). Dumps the full public schema (so app helper functions + triggers
    come over) and filters the provisioner-owned pgbouncer user_lookup function
    from the restore TOC. Scales the pod to 0 for exclusive access; copies prod
    creds into a transient secret that is deleted on exit.
  - sync-documents: tar-pipe the documents/mycompany tree (company logo + uploads)
    prod -> sandbox, since uploaded files live on the PVC, not the DB.
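
The two mechanisms above can be sketched as follows. This is a hedged outline, not the contents of `ops/sandbox/sandbox-lifecycle.sh`: every variable name (`PROD_PG*`, `SB_PGDATABASE`, `DUMP`, `DOC_ROOT`, the pod and namespace variables) and function name is hypothetical, and `-Fc`, `-n public`, and the `user_lookup(` match pattern are assumptions about how the dump and the TOC filter are done.

```shell
#!/usr/bin/env bash
# Sketch only: all variable and function names are illustrative assumptions.
KUBECTL="${KUBECTL:-kubectl}"

# Drop the pgbouncer user_lookup entries from a `pg_restore -l` listing on stdin.
filter_toc() {
  grep -v 'user_lookup('
}

# Read-only dump of prod, then restore into the sandbox with the filtered TOC.
dump_and_restore() {
  # PGOPTIONS makes the dump session read-only on the server side.
  PGOPTIONS='-c default_transaction_read_only=on' \
    pg_dump -h "$PROD_PGHOST" -U "$PROD_PGUSER" -d "$PROD_PGDATABASE" \
      -Fc -n public -f "$DUMP" &&
    pg_restore -l "$DUMP" | filter_toc > "$DUMP.toc" &&
    psql -h "$PGHOST" -U "$SB_PGUSER" -d "$SB_PGDATABASE" \
      -c 'DROP OWNED BY erp_sandbox_role CASCADE' &&
    # pg_restore's exit code alone is not a reliable success signal here.
    pg_restore -h "$PGHOST" -U "$SB_PGUSER" -d "$SB_PGDATABASE" \
      -L "$DUMP.toc" "$DUMP"
}

# Stream documents/mycompany from the prod pod to the sandbox pod via tar.
sync_documents() {
  "$KUBECTL" exec -n "$PROD_NS" "$PROD_POD" -- tar -C "$DOC_ROOT" -cf - mycompany |
    "$KUBECTL" exec -i -n "$SANDBOX_NS" "$SANDBOX_POD" -- tar -C "$DOC_ROOT" -xf -
}
```

Filtering the TOC rather than the dump keeps the archive itself untouched: the same file can be restored again later with a different entry list.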

Prod integrity is structural: prod is read-only during dump; the restore can only
write erp-sandbox (erp_sandbox_role owns only the sandbox DB and cannot drop prod
erp/erp_role); the platform's only prod-capable superuser stays behind the
human-gated postgres.yaml CI and is never used here.

README documents the integrity guarantee, the encryption + PVC fidelity caveats,
the BDD reset loop, and the hardening backlog (dedicated read-only dump role,
golden-cache PVC).

Refs ADR-0003 (factory#19). Chart owner-role fix = erp#13.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-06-29 07:42:00 +02:00