fix(ops): sandbox refresh-from-prod actually restores (pg_restore -U + self-heal pause) #29
Surfaced while resetting the sandbox to a clean iso-prod checkpoint: `refresh-from-prod` was structurally broken and silently no-op'd the restore, leaving the sandbox DB empty after the wipe.

Two bugs

1. `pg_restore` connected as `root`. The command lacked `-U`, so the postgres image fell back to its OS user (`root`) → `password authentication failed for user "root"`. The failure was swallowed by `… || echo "completed with ignorable warnings"`, so the script reported success after `DROP OWNED` had already emptied the DB. (E2's original seed was a manual process, so this code path had never really executed.)
   → Pass `-h $PGHOST -U $SB_PGUSER`. Don't trust `pg_restore`'s exit code (it returns non-zero on the harmless `schema "public" already exists` notice); instead, verify by counting restored `llx_*` tables and fail the Job if the count is `< 250`.
2. `erp-sandbox` has `selfHeal: true`, which reverts `kubectl scale --replicas=0` within seconds, so the seed ran with Dolibarr still connected (its table re-creation collided with the restore).
   → Pause self-heal for the duration of the refresh and re-arm it afterwards. App restore, self-heal re-arm, and secret cleanup are guarded by an EXIT trap so an interrupt can't strand the sandbox at `replicas=0` with self-heal off.

Validated end-to-end on the live sandbox

README now documents the self-heal pause and the iso-prod consequence (a refresh wipes `ai_agent_sandbox` → re-run `provisionSandbox.ts`).

🤖 Generated with Claude Code
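
For reviewers who want the shape of the two fixes without opening the diff, here is a minimal sketch of the flow described above. It is not the script from this PR. It assumes `selfHeal: true` refers to an Argo CD `Application` named `erp-sandbox`, and the namespaces, the `deployment/dolibarr` workload, the `sandbox-refresh-creds` secret, the database name, and the replica count of 1 are all hypothetical placeholders. Only `PGHOST`, `SB_PGUSER`, the `llx_` prefix, and the 250-table threshold come from the description. The password is presumed to arrive through `PGPASSWORD`, and the `DROP OWNED` wipe is left out.

```shell
#!/bin/sh
# Sketch only: names marked "hypothetical" are placeholders, not taken from the PR.
: "${PGHOST:=localhost}"                    # DB host (variable named in the PR; default is made up)
: "${SB_PGUSER:=sandbox}"                   # sandbox role (variable named in the PR; default is made up)
: "${SB_PGDATABASE:=sandbox}"               # hypothetical database name
: "${MIN_TABLES:=250}"                      # threshold from the PR
: "${APP_NS:=argocd}"                       # hypothetical; assumes an Argo CD Application
: "${APP_NAME:=erp-sandbox}"
: "${DEPLOY_NS:=erp-sandbox}"               # hypothetical namespace
: "${DEPLOY:=deployment/dolibarr}"          # hypothetical workload name
: "${REPLICAS:=1}"                          # hypothetical normal replica count
: "${DUMP_SECRET:=sandbox-refresh-creds}"   # hypothetical temporary secret

set_self_heal() {  # $1 = true | false
  kubectl -n "$APP_NS" patch applications.argoproj.io "$APP_NAME" --type merge \
    -p "{\"spec\":{\"syncPolicy\":{\"automated\":{\"selfHeal\":$1}}}}"
}

cleanup() {
  # Best-effort: bring the app back, re-arm self-heal, drop the temporary secret.
  kubectl -n "$DEPLOY_NS" scale "$DEPLOY" --replicas="$REPLICAS" || true
  set_self_heal true || true
  kubectl -n "$DEPLOY_NS" delete secret "$DUMP_SECRET" --ignore-not-found || true
}

count_llx_tables() {
  # -A (unaligned) and -t (tuples only) make psql print just the number.
  psql -h "$PGHOST" -U "$SB_PGUSER" -d "$SB_PGDATABASE" -At \
    -c "SELECT count(*) FROM information_schema.tables WHERE table_schema = 'public' AND left(table_name, 4) = 'llx_'"
}

refresh_from_prod() {  # $1 = path to a pg_dump archive
  trap cleanup EXIT            # runs when the shell exits, on success or failure
  trap 'exit 1' INT TERM       # turn signals into a normal exit so the EXIT trap fires

  set_self_heal false || return 1
  kubectl -n "$DEPLOY_NS" scale "$DEPLOY" --replicas=0 || return 1

  # Explicit -h/-U so the client does not fall back to the OS user.
  # The exit code is reported but not trusted; the table count decides.
  pg_restore -h "$PGHOST" -U "$SB_PGUSER" -d "$SB_PGDATABASE" "$1" \
    || echo "pg_restore exited non-zero; deciding by table count" >&2

  n=$(count_llx_tables) || return 1
  case "$n" in
    ''|*[!0-9]*) echo "could not read table count: '$n'" >&2; return 1 ;;
  esac
  if [ "$n" -lt "$MIN_TABLES" ]; then
    echo "restore incomplete: $n llx_ tables (< $MIN_TABLES)" >&2
    return 1
  fi
  echo "restore verified: $n llx_ tables"
}
```

In a Job script this would be the last step, for example `refresh_from_prod /path/to/prod.dump || exit 1`, so that the EXIT trap runs right after the restore is judged. The signal trap is there because a plain POSIX `sh` does not guarantee that the EXIT trap runs when the shell is killed by a signal.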