Merge pull request 'fix(docs): place ADR under doc/adr (singular) per convention' (#8) from fix/adr-path-doc-singular into main

Reviewed-on: #8
docs: place new ADR under doc/adr (singular) per convention
3 changed files with 272 additions and 0 deletions
--- a/ansible/arcodange/factory/playbooks/tools/roles/hashicorp_vault/tasks/gitea_oidc_auth.yml
+++ b/ansible/arcodange/factory/playbooks/tools/roles/hashicorp_vault/tasks/gitea_oidc_auth.yml
@@ -36,6 +36,11 @@
 # WARNING : this disables AND wipes ALL gitea_cicd_* per-app JWT roles
 # (created by tools/hashicorp-vault/iac/) every time it runs. Default is OFF
 # to preserve those roles across normal ansible runs ; opt-in only when you
 # really want to rebuild the OIDC backend from scratch (e.g. config drift on
 # bound_issuer or similar).
 - name: Delete existing Gitea OIDC backends if they exist
  include_tasks: vault_cmd.yml
  vars:
@@ -48,6 +53,7 @@
    - gitea_jwt
  loop_control:
    loop_var: backend_name
  when: vault_oidc_force_reset | default(false) | bool
 - name: use tofu to provision vault
  block:
--- a/argocd/values.yaml
+++ b/argocd/values.yaml
@@ -14,6 +14,11 @@ gitea_applications:
    annotations:
      argocd-image-updater.argoproj.io/image-list: webapp=gitea.arcodange.lab/arcodange-org/webapp:latest
      argocd-image-updater.argoproj.io/webapp.update-strategy: digest
  telegram-gateway:
    org: arcodange
    annotations:
      argocd-image-updater.argoproj.io/image-list: telegram-gateway=gitea.arcodange.lab/arcodange/telegram-gateway:latest
      argocd-image-updater.argoproj.io/telegram-gateway.update-strategy: digest
  erp:
    annotations: {}
  cms:
--- a/doc/adr/20260509-telegram-gateway-auth.md
+++ b/doc/adr/20260509-telegram-gateway-auth.md
@@ -0,0 +1,261 @@
 [← ADRs](.) · [factory](../..) · **20260509 — telegram-gateway auth**
 > **Cross-references** (bidirectional: every file listed below must cite this ADR at its top)
 >
 > - **Code** (repo `arcodange/telegram-gateway`) :
 >   [`auth.go`](https://gitea.arcodange.lab/arcodange/telegram-gateway/src/branch/main/auth.go) ·
 >   [`handler_auth.go`](https://gitea.arcodange.lab/arcodange/telegram-gateway/src/branch/main/handler_auth.go) ·
 >   [`allowlist.go`](https://gitea.arcodange.lab/arcodange/telegram-gateway/src/branch/main/allowlist.go) ·
 >   [`server.go`](https://gitea.arcodange.lab/arcodange/telegram-gateway/src/branch/main/server.go) ·
 >   [`chart/values.yaml`](https://gitea.arcodange.lab/arcodange/telegram-gateway/src/branch/main/chart/values.yaml)
 > - **User docs** :
 >   [`AUTH.md`](https://gitea.arcodange.lab/arcodange/telegram-gateway/src/branch/main/AUTH.md) ·
 >   [`HOWTO_ADD_BOT.md`](https://gitea.arcodange.lab/arcodange/telegram-gateway/src/branch/main/HOWTO_ADD_BOT.md)
 > - **Related ADR** :
 >   [`20260407-network-architecture.md`](20260407-network-architecture.md) (Cloudflare / Traefik / CrowdSec stack)
 > - **Implementation plan** : `~/.claude/plans/pour-les-notifications-on-inherited-seal.md` § Phase 1.5
 # ADR 20260509: Telegram Gateway — Authentication Layer
 ## Status
 Proposed
 ## Context
 The `telegram-gateway` service (Phase 1, delivered on 2026-05-09) exposes Telegram bots through public webhooks at `tg.arcodange.fr/bot/<slug>`. At this stage:
 - Any Telegram user who knows a bot's handle can DM it and trigger its handler.
 - The gateway validates the Telegram `secret_token` (which proves that **Telegram** sent the webhook), not the identity of the **user** behind the message.
 - Before the gateway is opened to other useful bots (`/build` commands, Ollama scripts, etc.), it needs an authentication protocol.
 The business need:
 - A **main bot** (`@arcodange_factory_bot`, internal slug `factory`) serves as the auth entry point.
 - An **`/auth <code>`** command validates a session for the Telegram user who sends it.
 - The other bots on the gateway answer only **already-authenticated users**, **by default** (secure-by-default).
 - As an additional safeguard, an **allowlist of Telegram IDs** can filter which users may talk to the bots, independently of auth (silent drop before any processing).
 ## Decision
 ### 1. User identity
 Telegram does **not expose the user's IP** to bots. The stable key is **`from.id`** (the Telegram user ID, an `int64`, identical for a given account across all devices). It is used as the session identifier.
 > Out of scope: auth tied to a device or IP, which would require a separate auth channel (web UI on the LAN, etc.).
 ### 2. Session storage
 - **Redis** (`redis.tools.svc.cluster.local:6379`, already deployed in the `tools` namespace).
 - Key: `tg-gw:auth:<from.id>` → value `1` (or JSON metadata if enriched later).
 - TTL: **24 h by default**, configurable through the `AUTH_SESSION_TTL` env variable (Go duration syntax such as `12h` or `168h`; Go's `time.ParseDuration` has no day unit, so `7d` is not a valid value).
 - Refresh: each successful `/auth` resets the TTL to its full value.
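A minimal sketch of the key format and TTL parsing described above, assuming the gateway parses `AUTH_SESSION_TTL` with the standard library. The function names `sessionKey` and `parseSessionTTL` are hypothetical and not taken from the repository.

```go
package main

import (
	"fmt"
	"time"
)

// defaultSessionTTL mirrors the 24 h default stated in the ADR.
const defaultSessionTTL = 24 * time.Hour

// sessionKey builds the Redis key for a Telegram user ID (from.id).
func sessionKey(userID int64) string {
	return fmt.Sprintf("tg-gw:auth:%d", userID)
}

// parseSessionTTL interprets the raw AUTH_SESSION_TTL value.
// An empty value falls back to the default; malformed or
// non-positive durations are rejected.
func parseSessionTTL(raw string) (time.Duration, error) {
	if raw == "" {
		return defaultSessionTTL, nil
	}
	d, err := time.ParseDuration(raw)
	if err != nil {
		return 0, fmt.Errorf("AUTH_SESSION_TTL: %w", err)
	}
	if d <= 0 {
		return 0, fmt.Errorf("AUTH_SESSION_TTL must be positive, got %s", d)
	}
	return d, nil
}

func main() {
	fmt.Println(sessionKey(12345))
	// "7d" is included on purpose: time.ParseDuration has no day unit.
	for _, raw := range []string{"", "12h", "168h", "7d"} {
		d, err := parseSessionTTL(raw)
		fmt.Printf("%q -> %v (err: %v)\n", raw, d, err)
	}
}
```

Rejecting a malformed value at startup, rather than silently falling back to 24 h, keeps a typo in the chart from changing the session lifetime unnoticed.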
 ### 3. Main bot & commands
 The `factory` bot switches from the `echo` handler to the `auth` handler. The `auth` handler recognizes:
 | Command | Effect |
 |---|---|
 | `/start` | Welcome message + list of available commands |
 | `/auth <code>` | Compares `<code>` to `AUTH_SECRET` in constant time; on match → Redis SET, deleteMessage on the original message (replay defense), reply "✅ Authentifié pour 24 h" |
 | `/whoami` | Shows the user_id and the remaining TTL (or "non authentifié") |
 | `/logout` | Redis DEL, reply "Déconnecté" |
 | _other_ | Reminder of the available commands |
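The command dispatch and the constant-time comparison could look roughly like the sketch below. `parseCommand` and `codeMatches` are hypothetical names; hashing both inputs before comparing is an assumption of this sketch (the ADR only states that the comparison is constant-time), added because `subtle.ConstantTimeCompare` returns immediately when the two slices differ in length.

```go
package main

import (
	"crypto/sha256"
	"crypto/subtle"
	"fmt"
	"strings"
)

// parseCommand splits a message text into a command and its first argument.
// Telegram may append "@botname" to a command; that suffix is dropped.
// Non-command text yields two empty strings.
func parseCommand(text string) (cmd, arg string) {
	fields := strings.Fields(text)
	if len(fields) == 0 || !strings.HasPrefix(fields[0], "/") {
		return "", ""
	}
	cmd, _, _ = strings.Cut(fields[0], "@")
	if len(fields) > 1 {
		arg = fields[1]
	}
	return cmd, arg
}

// codeMatches compares the submitted code with the secret in constant time.
// Both values are hashed first so the compared slices always have equal length.
func codeMatches(given, secret string) bool {
	if secret == "" {
		return false // never accept anything when no secret is configured
	}
	g := sha256.Sum256([]byte(given))
	s := sha256.Sum256([]byte(secret))
	return subtle.ConstantTimeCompare(g[:], s[:]) == 1
}

func main() {
	cmd, arg := parseCommand("/auth@arcodange_factory_bot s3cr3t")
	fmt.Println(cmd, codeMatches(arg, "s3cr3t"))
	fmt.Println(codeMatches("guess", "s3cr3t"))
}
```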
 ### 4. Allowlist safeguard
 Env `ALLOWED_USERS`: CSV of Telegram `from.id` values (`12345,67890`). Behavior:
 - Empty or unset → open to everyone (backward compatible with Phase 1).
 - Set → any `from.id` outside the list is **silently dropped** (empty HTTP 200 to Telegram, INFO log on the gateway side, **no reply to the user**).
 - The silent drop happens **before** the auth gate, which hides the existence of the bots from strangers.
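The allowlist semantics above can be sketched as follows. The `Allowlist` type and its methods are illustrative names, and rejecting a malformed ID at parse time is an assumption of this sketch rather than documented behavior.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// Allowlist is a set of Telegram user IDs.
// An empty set means "open to everyone" (Phase 1 behavior).
type Allowlist map[int64]struct{}

// parseAllowlist turns the ALLOWED_USERS CSV into a set.
// Blank entries are skipped; a non-numeric entry is an error.
func parseAllowlist(csv string) (Allowlist, error) {
	list := Allowlist{}
	for _, part := range strings.Split(csv, ",") {
		part = strings.TrimSpace(part)
		if part == "" {
			continue
		}
		id, err := strconv.ParseInt(part, 10, 64)
		if err != nil {
			return nil, fmt.Errorf("ALLOWED_USERS: invalid id %q: %w", part, err)
		}
		list[id] = struct{}{}
	}
	return list, nil
}

// Allows reports whether the update should be processed.
// A false result corresponds to the silent drop (empty 200, no reply).
func (a Allowlist) Allows(userID int64) bool {
	if len(a) == 0 {
		return true
	}
	_, ok := a[userID]
	return ok
}

func main() {
	list, err := parseAllowlist("12345, 67890")
	if err != nil {
		panic(err)
	}
	fmt.Println(list.Allows(12345), list.Allows(42))
}
```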
 ### 5. Per-bot `requireAuth` gate (secure-by-default)
 A boolean field in `chart/values.yaml`, set per bot. Semantics:
 - **Default = `true`** (secure-by-default). Any bot that omits this field is gated.
 - To make a bot public, add `requireAuth: false` explicitly.
 - For `handler: auth` (the main bot), `requireAuth` is automatically **forced to `false`** (chicken-and-egg: if auth itself were gated, nobody could authenticate).
 ```yaml
 bots:
  factory:
    handler: auth          # requireAuth automatically forced to false
  pingbot:
    handler: echo          # requireAuth: true (implicit default)
  statusbot:
    handler: echo
    requireAuth: false     # explicit opt-out, public bot
 ```
 When `requireAuth: true` and the user is not authenticated, the gateway replies:
 > 🔒 Authentifie-toi d'abord avec `/auth <code>` chez @arcodange_factory_bot
 … then acks Telegram with a 200. The bot's handler is **not** called.
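One way to express "omitted means `true`" in Go is a pointer field, so that an absent value can be told apart from an explicit `false`. The `BotConfig` struct below is a hypothetical in-memory shape for one entry under `bots:`, not the repository's actual type.

```go
package main

import "fmt"

// BotConfig is a hypothetical representation of one bot entry.
// RequireAuth is a pointer so that "field omitted" (nil) differs from "false".
type BotConfig struct {
	Handler     string
	RequireAuth *bool
}

// EffectiveRequireAuth applies the ADR rules:
// the auth handler is never gated, an omitted field means gated,
// and only an explicit false makes a bot public.
func (b BotConfig) EffectiveRequireAuth() bool {
	if b.Handler == "auth" {
		return false
	}
	if b.RequireAuth == nil {
		return true
	}
	return *b.RequireAuth
}

func main() {
	off := false
	bots := map[string]BotConfig{
		"factory":   {Handler: "auth"},
		"pingbot":   {Handler: "echo"},
		"statusbot": {Handler: "echo", RequireAuth: &off},
	}
	for _, name := range []string{"factory", "pingbot", "statusbot"} {
		fmt.Printf("%s gated=%v\n", name, bots[name].EffectiveRequireAuth())
	}
}
```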
 ### 6. Fail-at-startup
 If `AUTH_SECRET` is empty AND at least one bot has `handler=auth` or `requireAuth: true` (including by default) → the pod **fails at boot** with a clear message. This avoids the scenario "auth silently off, bots open to everyone without anyone noticing". With `requireAuth: true` as the default, in practice every deployment requires `AUTH_SECRET` (unless every bot opts out explicitly).
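The startup check could be sketched as a pure validation function that `main` calls before serving. `validateAuthConfig` and `BotConfig` are hypothetical names; a real entry point would presumably log the error and exit non-zero.

```go
package main

import "fmt"

// BotConfig is a hypothetical representation of one bot entry.
type BotConfig struct {
	Handler     string
	RequireAuth *bool // nil means "omitted", which the ADR treats as true
}

// needsSecret reports whether this bot depends on AUTH_SECRET:
// either it is the auth handler, or it is gated (explicitly or by default).
func (b BotConfig) needsSecret() bool {
	return b.Handler == "auth" || b.RequireAuth == nil || *b.RequireAuth
}

// validateAuthConfig returns an error when AUTH_SECRET is empty
// while at least one bot depends on it.
func validateAuthConfig(secret string, bots map[string]BotConfig) error {
	if secret != "" {
		return nil
	}
	for name, b := range bots {
		if b.needsSecret() {
			return fmt.Errorf("AUTH_SECRET is empty but bot %q (handler=%s) requires it", name, b.Handler)
		}
	}
	return nil
}

func main() {
	bots := map[string]BotConfig{"pingbot": {Handler: "echo"}}
	if err := validateAuthConfig("", bots); err != nil {
		// A real main would exit non-zero here so the pod fails at boot.
		fmt.Println("startup refused:", err)
	}
}
```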
 ## Architecture Diagrams
 ### 1. `/auth` flow (login)
 ```mermaid
 %%{init: {'theme':'neutral'}}%%
 sequenceDiagram
    participant U as User
    participant TG as Telegram
    participant GW as telegram-gateway
    participant R as Redis (tools)
    U->>TG: /auth s3cr3t (DM @arcodange_factory_bot)
    TG->>GW: POST /bot/factory<br/>X-Telegram-Bot-Api-Secret-Token: …
    GW->>GW: verify secret_token (Telegram→GW)
    GW->>GW: check ALLOWED_USERS (if configured)
    GW->>GW: factory.handler = auth, parse "/auth s3cr3t"
    GW->>GW: subtle.ConstantTimeCompare(s3cr3t, AUTH_SECRET)
    alt Valid code
        GW->>R: SET tg-gw:auth:<from.id> 1 EX 86400
        GW->>TG: deleteMessage (replay defense)
        GW->>TG: sendMessage "✅ Authentifié pour 24h"
        GW->>TG: 200 OK (ack webhook)
        TG->>U: "✅ Authentifié pour 24h"
    else Invalid code
        GW->>TG: sendMessage "❌ Mauvais code"
        GW->>TG: 200 OK
        TG->>U: "❌ Mauvais code"
    end
 ```
 ### 2. Access to a gated bot (`requireAuth: true`, the default)
 ```mermaid
 %%{init: {'theme':'neutral'}}%%
 sequenceDiagram
    participant U as User
    participant TG as Telegram
    participant GW as telegram-gateway
    participant R as Redis
    participant H as Bot handler (echo / http / shell…)
    U->>TG: ping (DM @other_bot)
    TG->>GW: POST /bot/other_bot
    GW->>GW: verify secret_token + parse Update
    GW->>GW: ALLOWED_USERS check
    GW->>R: EXISTS tg-gw:auth:<from.id>
    alt Authenticated
        R-->>GW: 1
        GW->>H: Handler.Handle(update, bot)
        H->>TG: sendMessage (business reply)
        GW->>TG: 200 OK
    else Not authenticated
        R-->>GW: 0
        GW->>TG: sendMessage "🔒 /auth chez @arcodange_factory_bot"
        GW->>TG: 200 OK
    end
 ```
 ### 3. Overall decision when a webhook arrives
 ```mermaid
 %%{init: {'theme':'neutral'}}%%
 graph TD
    %% classDef with explicit contrast: light background → dark text
    classDef ok fill:#d4edda,stroke:#28a745,color:#155724;
    classDef block fill:#f8d7da,stroke:#dc3545,color:#721c24;
    classDef neutral fill:#e2e3e5,stroke:#6c757d,color:#383d41;
    Start[Webhook POST /bot/&lt;slug&gt;]:::neutral
    SecretCheck{secret_token<br/>match?}:::neutral
    AllowlistCheck{from.id ∈<br/>ALLOWED_USERS?}:::neutral
    HandlerKind{handler == auth?}:::neutral
    AuthGate{requireAuth?<br/>+ valid session?}:::neutral
    Reject401[401 Unauthorized]:::block
    SilentDrop[empty 200<br/>silent drop]:::block
    Forbidden[reply &quot;🔒 /auth …&quot;<br/>200 OK]:::block
    AuthHandler[handler auth<br/>/auth /whoami /logout]:::ok
    BotHandler[Bot handler<br/>echo / http / shell]:::ok
    Start --> SecretCheck
    SecretCheck -- no --> Reject401
    SecretCheck -- yes --> AllowlistCheck
    AllowlistCheck -- no --> SilentDrop
    AllowlistCheck -- yes --> HandlerKind
    HandlerKind -- yes --> AuthHandler
    HandlerKind -- no --> AuthGate
    AuthGate -- not authorized --> Forbidden
    AuthGate -- OK --> BotHandler
 ```
 ## Consequences
 ### Positive
 - **Confidentiality**: business bots answer only authenticated Telegram accounts, **by default**.
 - **Defense in depth**: `ALLOWED_USERS` (allowlist), `secret_token` (Telegram→GW), `AUTH_SECRET` (user→bot), session TTL.
 - **Simple UX**: a one-off `/auth <code>`, valid for 24 h.
 - **No migration** for Phase 2/3: the gate slots in cleanly before the enqueue or the forward.
 - **Replay defense**: the message containing the code is deleted from the chat after a successful login.
 - **Secure-by-default**: a new bot added to the gateway requires a session with nothing to configure.
 ### Negative
 - **Shared code**: `AUTH_SECRET` is global (no TOTP, not per-user). If compromised → manual rotation (change the Secret + redeploy).
 - **No rate limit** on `/auth`: a user in `ALLOWED_USERS` can attempt to brute-force the code. Mitigation: `ALLOWED_USERS` acts as a floor, and a code with 128+ bits of entropy makes brute force impractical within the TTL window.
 - **Redis dependency**: if Redis goes down, no user is considered authenticated any more → every gated bot replies "🔒". Acceptable (fail-closed); Phase 1 has already restored Redis cleanly.
 - **No explicit multi-device session**: `from.id` is the same on all devices of an account → auth already covers every device, which is the expected behavior.
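The fail-closed behavior on a Redis outage can be illustrated with a small abstraction over the session lookup. `SessionStore`, `memStore`, and `downStore` are hypothetical stand-ins for the real Redis client, used here only so the sketch runs without a server.

```go
package main

import (
	"errors"
	"fmt"
)

// SessionStore is a hypothetical abstraction over the Redis EXISTS lookup.
type SessionStore interface {
	Exists(key string) (bool, error)
}

// memStore is an in-memory stand-in for a healthy Redis.
type memStore map[string]bool

func (m memStore) Exists(key string) (bool, error) { return m[key], nil }

// downStore simulates an unreachable Redis.
type downStore struct{}

func (downStore) Exists(string) (bool, error) {
	return false, errors.New("redis: connection refused")
}

// isAuthenticated fails closed: any lookup error is treated
// as "no session", so gated bots reply with the lock message.
func isAuthenticated(store SessionStore, userID int64) bool {
	ok, err := store.Exists(fmt.Sprintf("tg-gw:auth:%d", userID))
	if err != nil {
		return false
	}
	return ok
}

func main() {
	healthy := memStore{"tg-gw:auth:42": true}
	fmt.Println(isAuthenticated(healthy, 42))     // session present
	fmt.Println(isAuthenticated(healthy, 7))      // no session
	fmt.Println(isAuthenticated(downStore{}, 42)) // store unreachable
}
```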
 ## Alternatives Considered
 ### Alternative 1: IP-based auth
 **Rejected**. Telegram does not expose the user's IP to the bot. It would have required a secondary auth channel (web UI on the LAN, arcodange.fr landing page) and device binding. Significant cost for an ambiguous benefit.
 ### Alternative 2: TOTP / rotating OTP
 **Rejected for now**. More secure than the shared code, but it adds:
 - An enrollment step (display a QR code, scan it with an app).
 - A synchronized clock on both the gateway and the user side.
 - User-facing complexity (pulling out the app for every /auth).
 To be reconsidered if the shared code leaks regularly or if the gateway opens to more users.
 ### Alternative 3: Postgres instead of Redis for sessions
 **Rejected**. Postgres would be needed for Phase 2 (durable queue), but for short-TTL sessions Redis is the idiomatic tool:
 - Sub-millisecond latency.
 - Native TTL (`SET … EX 86400`).
 - Already deployed and in use (CrowdSec bouncer).
 ### Alternative 4: no session, code checked on every message
 **Rejected**. Terrible UX (retyping the code on every DM) with no benefit (the plaintext code lingers in the chat for longer).
 ### Alternative 5: `requireAuth: false` by default (insecure-by-default)
 **Rejected** (initially chosen, then reversed). Defaulting to `requireAuth: false` means a bot added without care is open to everyone. For a gateway designed as "private by design", the secure default `true` is a much better fit.
 ## Implementation plan
 See `~/.claude/plans/pour-les-notifications-on-inherited-seal.md` § Phase 1.5.
 Summary of the files touched:
 - **New** (repo `arcodange/telegram-gateway`): `auth.go`, `handler_auth.go`, `allowlist.go`, `AUTH.md`
 - **Modified**: `telegram_types.go`, `telegram.go`, `handlers.go`, `config.go`, `server.go`, `main.go`, `go.mod`, `chart/values.yaml`, `chart/templates/deployment.yaml`, `HOWTO_ADD_BOT.md`
 - **Cluster**: `kubectl patch secret telegram-gateway-bots` to add `AUTH_SECRET` and (optionally) `ALLOWED_USERS`
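For the `AUTH_SECRET` value itself, the ADR relies on 128+ bits of entropy to make brute force impractical. A minimal sketch of generating such a value with the standard library follows; `newAuthSecret` is a hypothetical helper, and how the value is then placed into the Kubernetes Secret is left to the operator.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"fmt"
)

// newAuthSecret returns 16 random bytes (128 bits) from the
// operating system's CSPRNG, hex-encoded as 32 characters.
func newAuthSecret() (string, error) {
	buf := make([]byte, 16)
	if _, err := rand.Read(buf); err != nil {
		return "", err
	}
	return hex.EncodeToString(buf), nil
}

func main() {
	secret, err := newAuthSecret()
	if err != nil {
		panic(err)
	}
	// Printed once for the operator; it should never reach the gateway logs.
	fmt.Println(secret)
}
```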
 ## Success Metrics
 - `/auth <wrong>` → 100 % rejected, 0 Redis SET.
 - `/auth <right>` → 100 % success, best-effort deleteMessage executed.
 - A bot with `requireAuth: true` (default) replies with the "🔒 …" message to 100 % of unauthenticated users.
 - The session actually expires after the TTL (checked via `kubectl exec redis-0 -- redis-cli TTL …`).
 - No secret (code, bot token) in the logs.
 - Latency added by the gate < 5 ms (local Redis EXISTS).
Author	SHA1	Message	Date
arcodange	54b3092305	Merge pull request 'fix(docs): place ADR under doc/adr (singular) per convention' (#8) from fix/adr-path-doc-singular into main Reviewed-on: #8	2026-05-09 15:29:22 +02:00
Gabriel Radureau	e0fb337a5f	docs: place new ADR under doc/adr (singular) per convention The 20260509 ADR landed in docs/adr/ (plural) by mistake. Convention is doc/adr/ (alongside the existing 00_, 01_, … docs and the network-architecture/cicd-architecture ADRs that pre-existed there). Note : 20260407-*.md files in the typo'd docs/adr/ are still untracked (never committed) — separate cleanup task.	2026-05-09 14:25:37 +02:00
arcodange	ea500abe62	Merge pull request 'docs(adr): telegram-gateway auth (Phase 1.5)' (#7) from docs/telegram-gateway-auth-adr into main Reviewed-on: #7	2026-05-09 14:22:12 +02:00
Gabriel Radureau	62673a2d65	docs(adr): telegram-gateway auth (Phase 1.5) Documents the authentication layer added to telegram-gateway in Phase 1.5: - main bot @arcodange_factory_bot (handler=auth) handles /auth, /whoami, /logout - 24h Redis session keyed by Telegram from.id (TTL via AUTH_SESSION_TTL) - optional allowlist (ALLOWED_USERS): silent drop before the gate - requireAuth secure-by-default (true), explicit per-bot opt-out - handler=auth forces requireAuth=false (chicken-and-egg) Bidirectional cross-links with the code (Gitea URLs to arcodange/telegram-gateway), AUTH.md (user-facing) and HOWTO_ADD_BOT.md (Case 2 updated). Mermaid diagrams with explicit contrast.	2026-05-09 13:58:27 +02:00
arcodange	4163b06659	Merge pull request 'argocd: add telegram-gateway application' (#6) from feat/homelab-gateway-app into main Reviewed-on: #6	2026-05-09 12:41:49 +02:00
Gabriel Radureau	3fb7544351	argocd: rename homelab-gateway → telegram-gateway Aligns with the upstream repo rename (arcodange/homelab-gateway → arcodange/telegram-gateway) so the name matches the public URL tg.arcodange.fr and Arcodange's naming conventions.	2026-05-09 12:35:37 +02:00
Gabriel Radureau	5038956332	argocd: add homelab-gateway application Adds the homelab-gateway Argo CD Application pointing at arcodange/homelab-gateway (user space, like dance-lessons-coach). Image Updater watches gitea.arcodange.lab/arcodange/homelab-gateway:latest with digest strategy. Phase 1 of the Telegram webhook gateway — a long-running pod that receives webhooks (no more polling) and routes per-bot to handler implementations. Initial bot: @arcodange_factory_bot, slug=factory, echo handler.	2026-05-09 12:25:30 +02:00
Gabriel Radureau	6ede249da9	🔒 fix(ansible): gate vault auth disable behind vault_oidc_force_reset (default off) (#5) Co-authored-by: Gabriel Radureau <arcodange@gmail.com> Co-committed-by: Gabriel Radureau <arcodange@gmail.com>	2026-05-06 15:03:33 +02:00
Gabriel Radureau	9e821e1626	♻️ refactor(ansible): move gitea secret user-propagation list to inventory (#4) Co-authored-by: Gabriel Radureau <arcodange@gmail.com> Co-committed-by: Gabriel Radureau <arcodange@gmail.com>	2026-05-06 14:48:05 +02:00