[vibe](../../README.md) > [Guidebooks](../README.md) > **Lab ecosystem**

# Lab ecosystem

> **Status:** ✅ Active
> **Last Updated:** 2026-06-23
> **Related:** [ADR-0001 · safe prod-like environment](../../ADR/0001-safe-prod-like-environment.md) · [PRD · safe prod-like environment](../../PRD/safe-prod-like-environment/README.md) · [INV-001 · prod blast-radius couplings](../../investigations/INV-001-prod-blast-radius-couplings.md)

## What this is

This guidebook is the **end-to-end map of the Arcodange home lab** — how the three repos (`factory`, `tools`, `cms`), the three Raspberry Pis, and the cloud edge wire together into one running system. It is a *descriptive reference map*, not a procedure: it answers *"how does this fit together right now?"*. For *"how do I add a new app step by step?"* see the [new-web-app runbook](../../../doc/runbooks/new-web-app/README.md); for *"why was it built this way?"* see the [factory ADRs](../../../doc/adr/README.md).

The lab is run from **one control node** — a MacBook Pro M4 — driving everything via Ansible (imperative host setup) and OpenTofu (declarative cloud/Gitea/Vault/Postgres state). The three Pis (`pi1`/`pi2`/`pi3` = `192.168.1.201-203`) sit behind a home Livebox. `pi1` is the k3s server; `pi2`/`pi3` are agents. Gitea + PostgreSQL run as Docker Compose **outside** k3s on `pi2`'s disk; everything else runs **inside** k3s on Longhorn distributed block storage. The public edge is a Cloudflared Zero-Trust tunnel into the internal Traefik, with Cloudflare DNS and Zoho email fronting `arcodange.fr`.
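
To make the topology concrete, here is a minimal Python sketch of the hosts described above. The addresses, k3s roles, and the `192.168.1.0/24` range come from this guidebook; the `HOSTS` layout and the helper names are hypothetical and do not mirror the real Ansible inventory.

```python
import ipaddress

# LAN range and host facts as stated in this guidebook. The dict layout is
# illustrative only; it is not the real Ansible inventory format.
LAN = ipaddress.ip_network("192.168.1.0/24")
HOSTS = {
    "pi1": {"ip": "192.168.1.201", "k3s_role": "server", "compose": []},
    "pi2": {"ip": "192.168.1.202", "k3s_role": "agent", "compose": ["gitea", "postgresql"]},
    "pi3": {"ip": "192.168.1.203", "k3s_role": "agent", "compose": []},
}


def hosts_outside_lan(hosts, lan=LAN):
    """Return the names of hosts whose address is not inside the LAN range."""
    return [name for name, h in hosts.items()
            if ipaddress.ip_address(h["ip"]) not in lan]


def hosts_with_role(hosts, role):
    """Return the names of hosts that have the given k3s role."""
    return [name for name, h in hosts.items() if h["k3s_role"] == role]


def compose_services(hosts):
    """Map each Docker Compose service (run outside k3s) to its host."""
    return {svc: name for name, h in hosts.items() for svc in h["compose"]}
```

A check like `hosts_outside_lan(HOSTS)` is the kind of cheap sanity test an agent could run before touching a host; it only restates what the map already says and should not be read as a source of truth.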

## The whole lab, end to end

```mermaid
%%{init: {'theme': 'base'}}%%
flowchart TB
    classDef ctrl fill:#2563eb,stroke:#1e40af,color:#fff
    classDef host fill:#0891b2,stroke:#0e7490,color:#fff
    classDef proc fill:#059669,stroke:#047857,color:#fff
    classDef store fill:#7c3aed,stroke:#6d28d9,color:#fff
    classDef edge fill:#d97706,stroke:#b45309,color:#fff
    classDef dead fill:#6b7280,stroke:#4b5563,color:#fff

    MAC["Control node (MacBook Pro M4)<br>Ansible + OpenTofu"]:::ctrl

    subgraph LAN["Home LAN (Livebox) — 192.168.1.0/24"]
        subgraph PI2["pi2 · 192.168.1.202 (docker-compose, outside k3s)"]
            GITEA["Gitea<br>arcodange-org/*"]:::host
            PG[("PostgreSQL")]:::store
        end
        subgraph K3S["k3s cluster — pi1 server, pi2/pi3 agents"]
            ARGO["ArgoCD app-of-apps<br> /argocd"]:::proc
            LH[("Longhorn<br>block storage")]:::store
            VAULT["Vault + VSO<br>secrets"]:::store
            TRAEFIK["Traefik<br>ingress"]:::proc
            TOOLS["tools namespace<br>(Vault, Grafana, CrowdSec, …)"]:::host
            APPS["app namespaces<br>(webapp, erp, cms, …)"]:::host
        end
        OLLAMA["pi3 · ollama"]:::host
    end

    subgraph CLOUD["Cloud edge"]
        CF["Cloudflare DNS<br>+ Cloudflared tunnel"]:::edge
        ZOHO["Zoho<br>email (arcodange.fr)"]:::edge
        GCS[("GCS gs://arcodange-tf<br>OpenTofu state + Longhorn backup")]:::store
    end

    INTERNET(["Internet"]):::edge

    MAC -- "Ansible: provision hosts, k3s, docker-compose" --> PI2
    MAC -- "Ansible: k3s, Longhorn, Traefik" --> K3S
    MAC -- "OpenTofu: Gitea/Vault/PG/Cloudflare/OVH state" --> GITEA
    MAC -- "OpenTofu state" --> GCS

    GITEA -- "repoURL chart/" --> ARGO
    ARGO -- "Application CRDs (prune+selfHeal)" --> TOOLS
    ARGO -- "Application CRDs (prune+selfHeal)" --> APPS
    VAULT -- "VSO injects secrets into pods" --> TOOLS
    VAULT -- "VSO injects secrets into pods" --> APPS
    APPS -- "dynamic creds" --> PG
    LH -. "PVCs" .- TOOLS
    LH -. "PVCs" .- APPS
    LH -- "backup target" --> GCS

    INTERNET --> CF -- "tunnel" --> TRAEFIK --> APPS
    INTERNET --> ZOHO
```

1. The **control node** (MacBook) provisions the three Pis with Ansible (OS, disks, Docker, k3s, Longhorn, Traefik) and manages all SaaS/Gitea/Vault/Postgres state with OpenTofu.
2. On **pi2**, Gitea and PostgreSQL run as Docker Compose *outside* k3s, on the local disk — they are the source-of-truth services the cluster depends on.
3. OpenTofu keeps its **state in GCS** (`gs://arcodange-tf`), and Longhorn pushes volume **backups** to the same GCS project.
4. **Gitea** hosts every app repo; each repo's `chart/` directory is the deployable Helm chart.
5. **ArgoCD's app-of-apps** turns each Gitea repo into an `Application` CRD (automated `prune` + `selfHeal`) that deploys into the `tools` namespace and the per-app namespaces.
6. **Vault** is the single source of truth for secrets; the **Vault Secrets Operator (VSO)** injects them into pods via Kubernetes auth, and apps draw dynamic PostgreSQL credentials from Vault against `pi2`.
7. **Longhorn** provides the PVCs the in-cluster workloads mount, and backs up to GCS.
8. The **public edge** routes Internet traffic through Cloudflare DNS and a Cloudflared Zero-Trust **tunnel** into the internal **Traefik**, which fronts the app namespaces; **Zoho** handles `arcodange.fr` email.

> [!NOTE]
> The ArgoCD Helm chart under [`argocd/`](../../../argocd/) is defined and templated, but **ArgoCD itself is not currently deployed in-cluster** (its install step is commented out in the `03_cicd` provisioning). The app-of-apps wiring documented here is the intended steady state; see [01 · factory](01-factory.md) for the caveat.
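
The dependencies in the numbered list above can be sketched as a small graph, which makes it easy to ask "if this component is down, what else is affected?". This is only an illustration: the `NEEDS` edges are one reading of the diagram made for this sketch (the intended steady state, with ArgoCD deployed), and the component names and the `affected_by` helper are hypothetical, not data taken from the repos.

```python
from collections import deque

# "consumer needs dependency" edges, read off the diagram and numbered list.
# Assumption: this simplified edge set is an interpretation for illustration.
NEEDS = {
    "argocd": ["gitea"],                                  # repoURL chart/
    "tools": ["argocd", "vault", "longhorn"],             # deploy, secrets, PVCs
    "apps": ["argocd", "vault", "longhorn", "postgresql"],  # plus dynamic creds
}


def affected_by(component, needs=NEEDS):
    """Return every component that directly or transitively needs `component`."""
    # Invert the map: for each dependency, who needs it directly?
    dependents = {}
    for consumer, deps in needs.items():
        for dep in deps:
            dependents.setdefault(dep, set()).add(consumer)

    # Breadth-first walk over the inverted edges.
    seen = set()
    queue = deque([component])
    while queue:
        current = queue.popleft()
        for consumer in dependents.get(current, ()):
            if consumer not in seen:
                seen.add(consumer)
                queue.append(consumer)
    return sorted(seen)
```

Under these assumed edges, `affected_by("gitea")` reaches ArgoCD and, through it, both the `tools` and app namespaces, which is the kind of coupling INV-001 describes in detail.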

## Deploy / secrets / DNS flows

- **Deploy flow.** Push to a Gitea repo → CI builds an image into the Gitea registry → ArgoCD (via the app-of-apps and, for some apps, the Image Updater) syncs the `chart/` directory into the matching namespace with `prune` + `selfHeal`. The whole chain keys off one `<app>` identifier — see [naming-conventions.md](naming-conventions.md).
- **Secrets flow.** Vault is the **single source of truth** (no sops/age). CI authenticates to Vault via **Gitea OIDC JWT** (role `gitea_cicd_<app>`); pods receive secrets at runtime via **VSO** (Kubernetes auth + `VaultDynamicSecret` CRDs). Detail in [secrets-and-vault.md](secrets-and-vault.md).
- **DNS / edge flow.** Internal names resolve under `*.arcodange.lab` (Pi-hole + Step-CA-issued TLS). Public traffic for `arcodange.fr` enters through Cloudflare and a Cloudflared tunnel to internal Traefik; public TLS is Let's Encrypt via Traefik's DNS-challenge (DuckDNS). Email runs through Zoho. Edge detail in [03 · cms](03-cms.md).
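
As a small illustration of how the flows above key off the `<app>` identifier, the sketch below derives the CI Vault role name from an app name. Only the `gitea_cicd_<app>` pattern and the kebab-case requirement come from this guidebook; the regex, the verbatim substitution of the app name, and the `ci_vault_role` helper are assumptions made for the example, and [naming-conventions.md](naming-conventions.md) remains the authority.

```python
import re

# Kebab-case: lowercase alphanumeric words joined by single hyphens.
# Assumption: the exact rule lives in naming-conventions.md; this regex is a guess.
KEBAB_CASE = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")


def ci_vault_role(app):
    """Return the Vault role CI uses for <app>, following gitea_cicd_<app>."""
    if not KEBAB_CASE.fullmatch(app):
        raise ValueError(f"not a kebab-case app name: {app!r}")
    # Assumption: the app name is substituted verbatim, hyphens included.
    return f"gitea_cicd_{app}"
```

Rejecting malformed names early matters here because the same identifier is reused across several systems, so a typo in one place would otherwise surface as a confusing auth or sync failure somewhere else.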

## Master index

| Page | What it maps | Status |
|---|---|---|
| [01 · factory](01-factory.md) | The cornerstone admin repo: Ansible host/cluster provisioning, ArgoCD app-of-apps, OpenTofu (`iac/`), and per-app PostgreSQL (`postgres/iac/`) | ✅ Active |
| [02 · tools](02-tools.md) | The `tools` namespace: Vault, VSO, Prometheus, Grafana, CrowdSec, poolers, Redis/KeyDB, Plausible + ClickHouse, the `tool` library chart | ✅ Active |
| [03 · cms](03-cms.md) | The public-facing site: Nuxt static site, Cloudflare zone + tunnel + Turnstile, Zoho email (MX/SPF/DKIM/DMARC/BIMI + aliases) | ✅ Active |
| [naming-conventions.md](naming-conventions.md) | The `<app>` join key — one kebab-case name reused identically across Gitea, PG, Vault, k8s, ArgoCD, GCS, DNS | ✅ Active |
| [secrets-and-vault.md](secrets-and-vault.md) | How Vault is the single source of truth: Gitea OIDC JWT for CI, VSO injection for pods, dynamic PostgreSQL creds | ✅ Active |
| [storage-and-recovery.md](storage-and-recovery.md) | Longhorn block storage, GCS backup target, and the tested power-cut recovery sequence | ✅ Active |

## Status legend

✅ done · 🟡 beta · 🔴 critical · ⚠️ known issue · ❌ disabled · ⬜ not started.

## Maintenance rule

> [!IMPORTANT]
> **If you alter a component documented here, update its page in the same change.** A reference map that drifts from reality sends readers (and agents) confidently down dead paths. The PR that changes the component is the PR that updates its guidebook page — treat the doc edit as part of the diff, not a follow-up.

## Cross-references

- [ADR-0001 · safe prod-like environment](../../ADR/0001-safe-prod-like-environment.md) — the decision this map supports.
- [PRD · safe prod-like environment](../../PRD/safe-prod-like-environment/README.md) — the product framing of an isolated, prod-like sandbox.
- [INV-001 · prod blast-radius couplings](../../investigations/INV-001-prod-blast-radius-couplings.md) — the couplings (the `<app>` join key, shared Vault/PG/Longhorn) that make blast radius real.
- [doc/adr](../../../doc/adr/README.md) — the canonical infrastructure ADRs (in French).
- [new-web-app conventions](../../../doc/runbooks/new-web-app/conventions.md) — the authoritative source for the `<app>` naming convention.