Periodic cleanup goroutine, started alongside the worker when DATABASE_URL
is set. Three concerns:
- DELETE rows with status='done' older than QUEUE_DONE_RETENTION (default
168h / 7 days). Successful rows have no value beyond a short debugging
window.
- UPDATE rows stuck in status='running' for more than QUEUE_STUCK_TIMEOUT
(default 30m) back to 'pending' so a worker can retry. Handles the
case of a pod crashing mid-job (without this, jobs stay orphaned forever).
- 'dead' rows are NEVER auto-purged (volume negligible, kept for forensics).
Configurable via env:
- QUEUE_DONE_RETENTION (default 168h)
- QUEUE_STUCK_TIMEOUT (default 30m)
- QUEUE_JANITOR_INTERVAL (default 1h)
The janitor runs once immediately at startup (recovers anything orphaned
by the previous pod before opening for new traffic), then ticks on the
interval.
Queue interface gains PurgeDone + RecoverStuck — both use Postgres'
make_interval(secs) for safe parameterization.
4 new unit tests using a fakeQueue mock (47 total, clean under -race).
Adds the async dispatch infrastructure:
- Postgres pool + embedded migration (CREATE TABLE/INDEX IF NOT EXISTS
gateway_jobs). Auto-applied at boot. lib/pq driver (matches webapp
convention).
- queue.go: Enqueue (idempotent on UNIQUE(bot_slug, update_id), which
handles Telegram redelivery), Pop with FOR UPDATE SKIP LOCKED, MarkDone,
MarkFailed with exponential backoff (30s → 2m → 10m → 1h → dead at 5).
- worker.go: goroutine that drains the queue, dispatches via the same
Handler interface as sync, schedules retries on failure, notifies the
user once when a job goes to dead.
- BotConfig gains `async: bool`. Registry refuses bots with async=true
if DATABASE_URL is unset (queue=nil).
- Server: when bot.Async is set, the webhook is acked immediately and the
update payload is enqueued for the worker.
When DATABASE_URL is unset (current default), queue/worker stay disabled
and only sync handlers (echo, http, auth) work — no breaking change to
the running cluster.
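The registry check for async bots could look roughly like this. It is a
sketch under assumptions: the BotConfig fields other than Async, the
validateAsync name and the error wording are illustrative, and the real
registry may hold a queue value rather than a boolean.

```go
package main

import (
	"fmt"
)

// BotConfig is a trimmed stand-in for the real config struct; only the
// Async field is named in the commit, Slug is assumed for the error message.
type BotConfig struct {
	Slug  string
	Async bool
}

// validateAsync rejects any bot with Async=true when no queue is available,
// which is the case when DATABASE_URL is unset.
func validateAsync(bots []BotConfig, queueAvailable bool) error {
	for _, b := range bots {
		if b.Async && !queueAvailable {
			return fmt.Errorf("bot %q: async=true requires DATABASE_URL to be set", b.Slug)
		}
	}
	return nil
}

func main() {
	bots := []BotConfig{{Slug: "echo"}, {Slug: "factory", Async: true}}
	fmt.Println(validateAsync(bots, false)) // async bot without a queue is refused
	fmt.Println(validateAsync(bots, true))
}
```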
Refs ~/.claude/plans/pour-les-notifications-on-inherited-seal.md § Phase 2.
Adds an authentication layer in front of the bot handlers:
- Auth handler on the principal bot (@arcodange_factory_bot, slug
factory) parses /start, /auth <code>, /whoami, /logout. On a
successful /auth, the message containing the code is best-effort
deleted from the user's chat (replay defense).
- Redis-backed sessions (key tg-gw:auth:<from.id>, TTL 24h, configurable
via AUTH_SESSION_TTL). Constant-time secret compare via crypto/subtle.
- ALLOWED_USERS env (CSV of Telegram user IDs) — silent-drops anyone
not in the list before the auth gate runs.
- New per-bot field 'requireAuth' (pointer-bool). Default = true (secure
by default). Auto-forced to false for handler=auth (chicken-and-egg).
- Server gates: allowlist first, then requireAuth before handler dispatch.
- Fail-at-startup if a bot is configured with handler=auth or
requireAuth: true while AUTH_SECRET is unset.
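The gate order and the requireAuth default can be sketched as follows.
All function and type names are hypothetical. The sketch also assumes
that a user missing from the parsed allowlist is always dropped; how the
gateway treats an empty or unset ALLOWED_USERS is not stated in the
commit. Session lookup in Redis is reduced to a boolean.

```go
package main

import (
	"crypto/subtle"
	"fmt"
	"strconv"
	"strings"
)

// parseAllowedUsers parses a CSV of Telegram user IDs into a set.
func parseAllowedUsers(csv string) (map[int64]bool, error) {
	allowed := make(map[int64]bool)
	for _, field := range strings.Split(csv, ",") {
		field = strings.TrimSpace(field)
		if field == "" {
			continue // tolerate trailing commas and blanks
		}
		id, err := strconv.ParseInt(field, 10, 64)
		if err != nil {
			return nil, fmt.Errorf("ALLOWED_USERS: invalid id %q: %w", field, err)
		}
		allowed[id] = true
	}
	return allowed, nil
}

// effectiveRequireAuth resolves the pointer-bool: nil means true (secure by
// default), and handler=auth is forced to false to avoid a chicken-and-egg.
func effectiveRequireAuth(handler string, requireAuth *bool) bool {
	if handler == "auth" {
		return false
	}
	if requireAuth == nil {
		return true
	}
	return *requireAuth
}

// secretMatches compares the submitted code to the secret in constant time
// (for equal-length inputs; a length mismatch returns early).
func secretMatches(got, want string) bool {
	return subtle.ConstantTimeCompare([]byte(got), []byte(want)) == 1
}

type decision int

const (
	drop     decision = iota // silently ignore the update
	needAuth                 // user must authenticate first
	dispatch                 // hand over to the bot handler
)

// gate applies the allowlist first, then the requireAuth check.
func gate(allowed map[int64]bool, userID int64, handler string, requireAuth *bool, hasSession bool) decision {
	if !allowed[userID] {
		return drop
	}
	if effectiveRequireAuth(handler, requireAuth) && !hasSession {
		return needAuth
	}
	return dispatch
}

func main() {
	allowed, err := parseAllowedUsers("42, 1001")
	if err != nil {
		panic(err)
	}
	fmt.Println(gate(allowed, 7, "http", nil, true) == drop)
	fmt.Println(gate(allowed, 42, "http", nil, false) == needAuth)
	fmt.Println(gate(allowed, 42, "auth", nil, false) == dispatch)
	fmt.Println(secretMatches("s3cret", "s3cret"))
}
```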
Design: factory/docs/adr/20260509-telegram-gateway-auth.md (in factory PR).
User docs: AUTH.md (new), HOWTO_ADD_BOT.md (Cas 2 updated for the
requireAuth default of true and the gated flow).
New deps: github.com/redis/go-redis/v9.
Refs ~/.claude/plans/pour-les-notifications-on-inherited-seal.md § Phase 1.5.
Aligns the project name with the public URL (tg.arcodange.fr) and the
Arcodange organization conventions. The 'homelab-gateway' name was too
generic.
Touches: chart name + helpers, image registry path, Go module path,
secret/configmap names, deployment mountPath, all docs.