refresh-from-prod was structurally broken and silently no-op'd the restore:

1. pg_restore lacked -U, so the postgres image connected as its OS user `root` and auth-failed. The failure was swallowed by `|| echo "ignorable warnings"`, so the script reported success while the DROP OWNED had already emptied the DB. E2's original seed was a manual process, so this path had never really run.

   Fix: pass `-h $PGHOST -U $SB_PGUSER`; don't trust pg_restore's exit code (it returns non-zero on the harmless "schema public already exists" notice). Instead, verify by counting restored llx_* tables and FAIL the Job if < 250.

2. erp-sandbox is ArgoCD-managed with self-heal ON, which reverts the `kubectl scale --replicas=0` within seconds, so the seed ran with Dolibarr still connected.

   Fix: pause self-heal for the duration, re-arm it after; app restore + self-heal restoration + secret cleanup are guarded by an EXIT trap so an interrupt can't strand the sandbox at replicas=0 / self-heal off.

Validated end-to-end on the live sandbox: 295 llx tables, company=Arcodange, owner=erp_sandbox_role, self-heal re-armed, pod 1/1.

README documents the self-heal pause and the iso-prod consequence (ai_agent_sandbox is wiped → re-provision).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
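
For illustration, the two fixes described above could look roughly like the bash sketch below. This is not the actual script: the function names, `MIN_TABLES`, `PGDATABASE`, and the dump-file argument are assumptions made for the example, and `cleanup` is a placeholder for the real app restore, self-heal re-arm, and secret cleanup.

```shell
#!/usr/bin/env bash
# Sketch only. Function names, MIN_TABLES and PGDATABASE are illustrative
# assumptions, not names taken from the real refresh-from-prod script.

MIN_TABLES="${MIN_TABLES:-250}"

# Count llx_* tables in the restored database (needs psql and credentials).
count_llx_tables() {
  psql -h "$PGHOST" -U "$SB_PGUSER" -d "$PGDATABASE" -At \
    -c "SELECT count(*) FROM information_schema.tables WHERE left(table_name, 4) = 'llx_'"
}

# Succeed only if the count is a plain number and meets the threshold.
verify_table_count() {
  local count="$1"
  case "$count" in
    ''|*[!0-9]*)
      echo "restore check FAILED: no usable table count" >&2
      return 1
      ;;
  esac
  if [ "$count" -lt "$MIN_TABLES" ]; then
    echo "restore check FAILED: $count llx_ tables (< $MIN_TABLES)" >&2
    return 1
  fi
  echo "restore check ok: $count llx_ tables"
}

# Placeholder: the real cleanup would scale the app back up, re-arm
# self-heal and delete the temporary secret.
cleanup() {
  echo "cleanup: restoring app, self-heal and secrets"
}

run_refresh() {
  trap cleanup EXIT   # fires whenever the shell exits, whatever the reason
  # ... pause self-heal, scale to 0, DROP OWNED would happen here ...
  # pg_restore's exit code is deliberately ignored; the table count decides.
  pg_restore -h "$PGHOST" -U "$SB_PGUSER" -d "$PGDATABASE" "$1" || true
  verify_table_count "$(count_llx_tables)" || exit 1
}
```

The point of the design is that success is judged by the observable result (the number of restored tables) and not by an exit code that is known to be noisy, while the EXIT trap keeps the sandbox from being left scaled down if the run stops early.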
9.0 KiB
Executable File