Commit Graph

16 Commits

Author SHA1 Message Date
abe77f5873 Phase 2d — gateway_jobs retention (Janitor goroutine)
All checks were successful
CI/CD / test (push) Successful in 20s
CI/CD / build-and-push-image (push) Successful in 58s
Periodic cleanup goroutine, started alongside the worker when DATABASE_URL
is set. It handles three concerns:

- DELETE rows with status='done' older than QUEUE_DONE_RETENTION (default
  168h / 7 days). Past success rows have no value beyond debug runway.
- UPDATE rows stuck in status='running' for more than QUEUE_STUCK_TIMEOUT
  (default 30m) back to 'pending' so a worker can retry. Handles the
  case of a pod crashing mid-job (without this, jobs stay orphaned forever).
- 'dead' rows are NEVER auto-purged (volume negligible, kept for forensics).

Configurable via environment variables:
- QUEUE_DONE_RETENTION (default 168h)
- QUEUE_STUCK_TIMEOUT  (default 30m)
- QUEUE_JANITOR_INTERVAL (default 1h)

The janitor runs once immediately at startup (recovers anything orphaned
by the previous pod before opening for new traffic), then ticks on the
interval.

Queue interface gains PurgeDone + RecoverStuck — both use Postgres'
make_interval(secs) for safe parameterization.

4 new unit tests via fakeQueue mock (47 total, race clean).
2026-05-09 16:06:54 +02:00
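
A minimal sketch of the janitor loop described in abe77f5873. The commit names `PurgeDone` and `RecoverStuck` but does not show their signatures, so the parameters and return values below are assumptions. `janitorQueue`, `countingQueue` and `demoJanitor` are hypothetical stand-ins, not the project's `Queue` or `fakeQueue`.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// janitorQueue is a hypothetical slice of the Queue interface. Only the
// method names come from the commit; parameters and results are assumed.
type janitorQueue interface {
	PurgeDone(ctx context.Context, olderThan time.Duration) (int64, error)
	RecoverStuck(ctx context.Context, olderThan time.Duration) (int64, error)
}

// runJanitor sweeps once immediately, then once per interval until ctx ends.
func runJanitor(ctx context.Context, q janitorQueue, retention, stuckTimeout, interval time.Duration) {
	sweep := func() {
		// Requeue orphaned 'running' rows first so a worker can retry them.
		if n, err := q.RecoverStuck(ctx, stuckTimeout); err != nil {
			fmt.Println("janitor: recover stuck:", err)
		} else if n > 0 {
			fmt.Println("janitor: requeued stuck jobs:", n)
		}
		// Then delete old 'done' rows. 'dead' rows are never touched.
		if n, err := q.PurgeDone(ctx, retention); err != nil {
			fmt.Println("janitor: purge done:", err)
		} else if n > 0 {
			fmt.Println("janitor: purged done jobs:", n)
		}
	}

	sweep() // startup pass, before the first tick

	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
			sweep()
		}
	}
}

// countingQueue is an in-memory stand-in that only counts calls.
type countingQueue struct{ purges, recoveries int }

func (c *countingQueue) PurgeDone(context.Context, time.Duration) (int64, error) {
	c.purges++
	return 0, nil
}

func (c *countingQueue) RecoverStuck(context.Context, time.Duration) (int64, error) {
	c.recoveries++
	return 0, nil
}

// demoJanitor runs the janitor with an already-cancelled context, so only
// the startup sweep executes before the loop exits.
func demoJanitor() (purges, recoveries int) {
	q := &countingQueue{}
	ctx, cancel := context.WithCancel(context.Background())
	cancel()
	runJanitor(ctx, q, 168*time.Hour, 30*time.Minute, time.Hour)
	return q.purges, q.recoveries
}

func main() {
	p, r := demoJanitor()
	fmt.Printf("purges=%d recoveries=%d\n", p, r)
}
```

The durations passed in `demoJanitor` are the documented defaults (168h, 30m, 1h). In a real process they would presumably come from the three `QUEUE_*` variables, for example via `time.ParseDuration`.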
95380dac99 ci: drop -race from CI test step (TSan VMA incompatible on ARM64 runners)
All checks were successful
CI/CD / test (push) Successful in 57s
CI/CD / build-and-push-image (push) Successful in 55s
The Gitea Actions runners are on ARM64 (pi1/pi3) and Go's
ThreadSanitizer fails with 'unsupported VMA range, Found 47 - Supported
48' on those kernels. Race detector is still available locally via
`make test-race`.
2026-05-09 15:22:27 +02:00
a288564fe7 ci: add 'test' job (go vet + go test -race) gating docker build
Some checks failed
CI/CD / test (push) Failing after 2m23s
CI/CD / build-and-push-image (push) Has been skipped
This was supposed to land in d63f195 but the prior Write didn't apply.
CI now runs unit + integration tests on every push and PR; the docker
image is only pushed on main, after tests pass.
2026-05-09 15:19:15 +02:00
d63f195b3d Phase 2c — testing infrastructure (43 tests, CI gating, docker-compose)
Some checks failed
Docker Build / build-and-push-image (push) Has been cancelled
Brings the project to a TDD/BDD-friendly state. Apologies for shipping
Phase 1.5 + Phase 2 code-first, which violated feedback_tdd_first_bdd_required.

What's added:

- helpers_test.go : FakeTelegram (httptest server that records sendMessage /
  deleteMessage / setWebhook / etc.), miniredis bootstrap, MakeUpdate /
  PostWebhook helpers. The same harness simulates 'a user DMing the bot'
  end-to-end without hitting the Telegram cloud, which answers the user's question.
- 43 tests covering: allowlist parsing, telegram type helpers (UserID /
  ChatID / Text / messageID), secret_token constant-time compare, Backoff
  schedule, Auth (login wrong/right/logout/TTL/nil-receiver), EchoHandler,
  HTTPHandler (forward / timeout / non-2xx / empty body), AuthHandler
  (start / auth / whoami / logout / replay defense delete), Server (bad
  secret 401, unknown bot 404, allowlist drop, gated bot prompt,
  full /auth → echo → /logout flow, healthz/readyz).
- All tests pass with -race in 1.6s, no external deps (miniredis +
  httptest in-process).

Infra:

- Updated .gitea/workflows/dockerimage.yaml : new 'test' job
  (go vet + go test -race) gates the build-and-push-image job. CI now
  also runs on pull_request.
- docker-compose.yml : redis + postgres for full local stack.
- Makefile : test-race, compose-up/down targets.
- README updated with test + local-dev sections.

Refs ~/.claude/plans/pour-les-notifications-on-inherited-seal.md § Phase 2.
2026-05-09 15:18:29 +02:00
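
A rough illustration of the recording-fake idea behind the test harness in d63f195b3d. This is not the project's `FakeTelegram`: the type `fakeTelegram`, its fields and `demoFakeTelegram` are hypothetical. It relies only on the Bot API URL shape `/bot<token>/<method>` and drives the handler through `httptest.NewRecorder`, so no socket is opened.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"path"
	"strings"
	"sync"
)

// fakeTelegram is a hypothetical recorder. It accepts any Bot API call of
// the form /bot<token>/<method> and remembers the method name.
type fakeTelegram struct {
	mu    sync.Mutex
	calls []string
}

func (f *fakeTelegram) ServeHTTP(w http.ResponseWriter, r *http.Request) {
	method := path.Base(r.URL.Path) // last path segment, e.g. "sendMessage"
	f.mu.Lock()
	f.calls = append(f.calls, method)
	f.mu.Unlock()

	// Simplified success envelope; real replies carry richer "result" objects.
	w.Header().Set("Content-Type", "application/json")
	io.WriteString(w, `{"ok":true,"result":true}`)
}

// Calls returns a copy of the recorded method names, in call order.
func (f *fakeTelegram) Calls() []string {
	f.mu.Lock()
	defer f.mu.Unlock()
	return append([]string(nil), f.calls...)
}

// demoFakeTelegram sends two calls straight into the handler and returns
// the recorded method names joined by commas.
func demoFakeTelegram() string {
	fake := &fakeTelegram{}
	for _, m := range []string{"sendMessage", "deleteMessage"} {
		req := httptest.NewRequest(http.MethodPost, "/bot123:TEST/"+m, strings.NewReader(`{"chat_id":1}`))
		fake.ServeHTTP(httptest.NewRecorder(), req)
	}
	return strings.Join(fake.Calls(), ",")
}

func main() {
	fmt.Println(demoFakeTelegram())
}
```

Wrapping the same handler in `httptest.NewServer` would yield a base URL that a bot client could be pointed at instead of the Telegram cloud.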
4f246ccc1d docs(DEPLOY): add Phase 2b activation steps
All checks were successful
Docker Build / build-and-push-image (push) Successful in 58s
2026-05-09 14:39:14 +02:00
799e10dcc2 Phase 2b — durable Postgres queue + worker (gated on DATABASE_URL)
Some checks failed
Docker Build / build-and-push-image (push) Has been cancelled
Adds the async dispatch infrastructure:

- Postgres pool + embedded migration (CREATE TABLE/INDEX IF NOT EXISTS
  gateway_jobs). Auto-applied at boot. lib/pq driver (matches webapp
  convention).
- queue.go : Enqueue (idempotent on UNIQUE(bot_slug, update_id) — handles
  Telegram redelivery), Pop with FOR UPDATE SKIP LOCKED, MarkDone,
  MarkFailed with exponential backoff (30s → 2m → 10m → 1h → dead at 5).
- worker.go : goroutine that drains the queue, dispatches via the same
  Handler interface as sync, schedules retries on failure, notifies the
  user once when a job goes to dead.
- BotConfig gains `async: bool`. Registry refuses bots with async=true
  if DATABASE_URL is unset (queue=nil).
- Server: when bot.Async, the webhook ack is immediate; the update
  payload is enqueued for the worker.

When DATABASE_URL is unset (current default), queue/worker stay disabled
and only sync handlers (echo, http, auth) work — no breaking change to
the running cluster.

Refs ~/.claude/plans/pour-les-notifications-on-inherited-seal.md § Phase 2.
2026-05-09 14:38:41 +02:00
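
The retry schedule quoted in 799e10dcc2 (30s → 2m → 10m → 1h → dead at 5) can be sketched as a lookup table. This assumes "dead at 5" means the fifth failure marks the job dead; the commit does not spell that out. The names `retryDelays` and `nextRetry` are illustrative, not taken from `queue.go`.

```go
package main

import (
	"fmt"
	"time"
)

// retryDelays mirrors the schedule quoted in the commit message.
var retryDelays = []time.Duration{
	30 * time.Second,
	2 * time.Minute,
	10 * time.Minute,
	time.Hour,
}

// nextRetry returns the wait before the next attempt after the given number
// of failures. Once the schedule is exhausted it reports dead=true.
// Assumption: failures is 1-based; values below 1 are treated as 1.
func nextRetry(failures int) (delay time.Duration, dead bool) {
	if failures < 1 {
		failures = 1
	}
	if failures > len(retryDelays) {
		return 0, true
	}
	return retryDelays[failures-1], false
}

func main() {
	for n := 1; n <= 5; n++ {
		delay, dead := nextRetry(n)
		fmt.Printf("failure %d: delay=%s dead=%t\n", n, delay, dead)
	}
}
```

A table keeps the steps explicit, which suits a schedule like this one whose ratios are not constant (4x, 5x, 6x).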
f90d5efdae docs(HOWTO): add Cas 2.5 — http forward handler (Phase 2a delivered)
All checks were successful
Docker Build / build-and-push-image (push) Successful in 52s
2026-05-09 14:29:40 +02:00
8001460f14 Phase 2a — add 'http' handler (sync forwarder)
All checks were successful
Docker Build / build-and-push-image (push) Successful in 53s
The http handler POSTs the Telegram Update JSON to a configurable
internal URL and expects a JSON {text} reply, which it sends back via
sendMessage. Sync: the webhook ack waits for the upstream answer
(timeout default 5s, capped at 30s — Telegram itself closes around 60s).

For slow / unreliable backends use the Phase 3 async handlers once the
queue is in place.

YAML config:

  bots:
    webappbot:
      handler: http
      http:
        url: http://webapp.webapp.svc.cluster.local:8080/telegram/update
        timeout: 5s

Refs ~/.claude/plans/pour-les-notifications-on-inherited-seal.md § Phase 2.
2026-05-09 14:27:58 +02:00
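
A hedged sketch of the forwarding step described in 8001460f14: POST the update JSON, require a 2xx answer, read a JSON `{text}` reply, and bound the wait with the 5s default and 30s cap. The function names, the 1 MiB body limit and the stub transport are assumptions for illustration; the real handler is not shown in the commit.

```go
package main

import (
	"bytes"
	"context"
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"strings"
	"time"
)

const (
	defaultTimeout = 5 * time.Second  // default stated in the commit
	maxTimeout     = 30 * time.Second // cap stated in the commit
)

// effectiveTimeout applies the default and the cap to a configured value.
func effectiveTimeout(configured time.Duration) time.Duration {
	switch {
	case configured <= 0:
		return defaultTimeout
	case configured > maxTimeout:
		return maxTimeout
	default:
		return configured
	}
}

// forwardUpdate POSTs the raw update to url and returns the reply text.
func forwardUpdate(ctx context.Context, client *http.Client, url string, update []byte, timeout time.Duration) (string, error) {
	ctx, cancel := context.WithTimeout(ctx, effectiveTimeout(timeout))
	defer cancel()

	req, err := http.NewRequestWithContext(ctx, http.MethodPost, url, bytes.NewReader(update))
	if err != nil {
		return "", err
	}
	req.Header.Set("Content-Type", "application/json")

	resp, err := client.Do(req)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()

	if resp.StatusCode < 200 || resp.StatusCode > 299 {
		return "", fmt.Errorf("upstream returned %s", resp.Status)
	}
	var reply struct {
		Text string `json:"text"`
	}
	// An empty body fails here with io.EOF. The 1 MiB limit is an assumption.
	if err := json.NewDecoder(io.LimitReader(resp.Body, 1<<20)).Decode(&reply); err != nil {
		return "", fmt.Errorf("decode reply: %w", err)
	}
	return reply.Text, nil
}

// stubTransport lets the demo answer requests without any network access.
type stubTransport func(*http.Request) (*http.Response, error)

func (f stubTransport) RoundTrip(r *http.Request) (*http.Response, error) { return f(r) }

// demoForward forwards one update to a stubbed upstream that replies "pong".
func demoForward() (string, error) {
	client := &http.Client{Transport: stubTransport(func(r *http.Request) (*http.Response, error) {
		return &http.Response{
			StatusCode: http.StatusOK,
			Status:     "200 OK",
			Header:     http.Header{"Content-Type": []string{"application/json"}},
			Body:       io.NopCloser(strings.NewReader(`{"text":"pong"}`)),
			Request:    r,
		}, nil
	})}
	return forwardUpdate(context.Background(), client, "http://upstream.invalid/telegram/update", []byte(`{"update_id":1}`), 0)
}

func main() {
	text, err := demoForward()
	fmt.Println(text, err)
}
```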
515b407db4 docs: align ADR path references to doc/adr (singular)
All checks were successful
Docker Build / build-and-push-image (push) Successful in 56s
Mirror of factory#8 path correction. Updates Gitea URLs in AUTH.md /
HOWTO_ADD_BOT.md and the '// Voir factory/...' header comments in code.
2026-05-09 14:26:12 +02:00
a6e2ef19b5 Dockerfile: bump golang base 1.23 → 1.24
All checks were successful
Docker Build / build-and-push-image (push) Successful in 1m19s
go-redis bumped go.mod's directive to 1.24; builder must match
(otherwise 'go mod download' fails).
2026-05-09 13:59:36 +02:00
07115e3162 Phase 1.5 — auth layer (Redis sessions, allowlist, requireAuth)
Some checks failed
Docker Build / build-and-push-image (push) Failing after 18s
Adds an authentication layer in front of the bot handlers:

- Auth handler on the principal bot (@arcodange_factory_bot, slug
  factory) parses /start, /auth <code>, /whoami, /logout. On a
  successful /auth, the message containing the code is best-effort
  deleted from the user's chat (replay defense).
- Redis-backed sessions (key tg-gw:auth:<from.id>, TTL 24h, configurable
  via AUTH_SESSION_TTL). Constant-time secret compare via crypto/subtle.
- ALLOWED_USERS env (CSV of Telegram user IDs) — silent-drops anyone
  not in the list before the auth gate runs.
- New per-bot field 'requireAuth' (pointer-bool). Default = true (secure
  by default). Auto-forced to false for handler=auth (chicken-and-egg).
- Server gates: allowlist first, then requireAuth before handler dispatch.
- Fail-at-startup if a bot is configured with handler=auth or
  requireAuth: true while AUTH_SECRET is unset.

Design: factory/docs/adr/20260509-telegram-gateway-auth.md (in factory PR).
User docs: AUTH.md (new), HOWTO_ADD_BOT.md (Cas 2 updated for default
true and gated flow).

New deps: github.com/redis/go-redis/v9.

Refs ~/.claude/plans/pour-les-notifications-on-inherited-seal.md § Phase 1.5.
2026-05-09 13:56:30 +02:00
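
The gate order from 07115e3162 (allowlist first, then requireAuth, then dispatch) might look roughly like the sketch below. All names here (`parseAllowlist`, `secretMatches`, `gate`, the `decision` values) are hypothetical. How an empty `ALLOWED_USERS` is treated is not stated in the commit; this sketch simply drops anyone absent from the set.

```go
package main

import (
	"crypto/subtle"
	"fmt"
	"strconv"
	"strings"
)

// parseAllowlist turns a CSV of Telegram user IDs into a set.
// Blank fields and surrounding spaces are ignored.
func parseAllowlist(csv string) (map[int64]bool, error) {
	allowed := make(map[int64]bool)
	for _, field := range strings.Split(csv, ",") {
		field = strings.TrimSpace(field)
		if field == "" {
			continue
		}
		id, err := strconv.ParseInt(field, 10, 64)
		if err != nil {
			return nil, fmt.Errorf("invalid user ID %q: %w", field, err)
		}
		allowed[id] = true
	}
	return allowed, nil
}

// secretMatches compares the candidate with the secret in constant time.
// ConstantTimeCompare returns 0 right away when lengths differ, so the
// secret's length is not hidden. An empty secret never matches.
func secretMatches(candidate, secret string) bool {
	if secret == "" {
		return false
	}
	return subtle.ConstantTimeCompare([]byte(candidate), []byte(secret)) == 1
}

type decision string

const (
	drop     decision = "drop"     // silent drop, user not in allowlist
	prompt   decision = "prompt"   // ask the user to authenticate
	dispatch decision = "dispatch" // hand the update to the bot handler
)

// gate applies the checks in the order given by the commit message.
func gate(allowed map[int64]bool, userID int64, requireAuth, hasSession bool) decision {
	if !allowed[userID] {
		return drop
	}
	if requireAuth && !hasSession {
		return prompt
	}
	return dispatch
}

func main() {
	allowed, err := parseAllowlist("42, 1001")
	if err != nil {
		fmt.Println("bad allowlist:", err)
		return
	}
	fmt.Println(gate(allowed, 7, true, true))    // not allowlisted
	fmt.Println(gate(allowed, 42, true, false))  // allowlisted, no session
	fmt.Println(gate(allowed, 42, true, true))   // allowlisted, logged in
	fmt.Println(secretMatches("s3cret", "s3cret"))
}
```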
6228169ac1 docs: HOWTO_ADD_BOT — 3 cases (outbound only / echo via gateway / agent-driven)
All checks were successful
Docker Build / build-and-push-image (push) Successful in 40s
2026-05-09 13:12:55 +02:00
d8b102fbf9 server: don't reject Telegram updates with unknown fields
All checks were successful
Docker Build / build-and-push-image (push) Successful in 42s
DisallowUnknownFields rejected real Telegram payloads (entities, from,
date, etc. that our minimal structs don't cover). Lenient decode is the
right default for an upstream webhook we don't control.
2026-05-09 13:06:40 +02:00
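
To illustrate the behaviour fixed in d8b102fbf9, the sketch below decodes the same payload with and without `DisallowUnknownFields`. The `update` struct and `samplePayload` are made-up minimal examples, not the project's types.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// update is a deliberately minimal struct, similar in spirit to the
// "minimal structs" the commit mentions. Field selection is an assumption.
type update struct {
	UpdateID int64 `json:"update_id"`
	Message  *struct {
		Text string `json:"text"`
	} `json:"message"`
}

// samplePayload carries fields (date, from, entities) the struct ignores.
const samplePayload = `{"update_id":7,"message":{"text":"hi","date":1715250000,"from":{"id":42},"entities":[]}}`

// decodeUpdate decodes payload; strict=true rejects unknown fields.
func decodeUpdate(payload string, strict bool) (update, error) {
	dec := json.NewDecoder(strings.NewReader(payload))
	if strict {
		dec.DisallowUnknownFields()
	}
	var u update
	err := dec.Decode(&u)
	return u, err
}

func main() {
	if _, err := decodeUpdate(samplePayload, true); err != nil {
		fmt.Println("strict decode failed:", err)
	}
	u, err := decodeUpdate(samplePayload, false)
	if err != nil {
		fmt.Println("lenient decode failed:", err)
		return
	}
	fmt.Println("lenient decode ok:", u.UpdateID, u.Message.Text)
}
```

Strict decoding is useful for configuration files the project owns; for an upstream payload that can grow new fields at any time, the default lenient mode avoids rejecting valid updates.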
5044890e7d chart: pin image.tag to 'latest'
The Gitea Actions workflow only produces the :latest tag and the branch ref;
the appVersion tag (0.1.0) does not exist → ImagePullBackOff.
2026-05-09 12:51:16 +02:00
13dc7aee13 rename: homelab-gateway → telegram-gateway
All checks were successful
Docker Build / build-and-push-image (push) Successful in 44s
Aligns the project name with the public URL (tg.arcodange.fr) and the
Arcodange organization conventions. The 'homelab-gateway' name was too
generic.

Touches: chart name + helpers, image registry path, Go module path,
secret/configmap names, deployment mountPath, all docs.
2026-05-09 12:35:03 +02:00
ee832de089 Phase 1 MVP — echo bot factory
All checks were successful
Docker Build / build-and-push-image (push) Successful in 1m8s
2026-05-09 12:23:59 +02:00