docs(vibe): add tools/ and cms/ guidebooks

Two code-grounded tree-docs guidebooks under vibe/guidebooks/, drilling into the
lab-ecosystem 02-tools and 03-cms pages (bidirectional):

- tools/  : hub + components.md (Vault+VSO, Prometheus, Grafana, CrowdSec,
  pgbouncer, Redis/KeyDB, Plausible, ClickHouse; pgcat/tool as Tier-2) +
  secrets-and-vso.md (Vault engines/auth, the app_roles/app_policy modules =
  the <app> join-key machinery, VSO CRDs, secret-paths inventory).
- cms/    : hub + site.md (Nuxt + dual Pages/k3s deploy) + cloudflare.md
  (zone via OVH->CF, Pages, cloudflared tunnel, Turnstile, R2 state) +
  zoho-email.md (OAuth, MX/SPF/DKIM/DMARC/BIMI, the 7 aliases).

Sibling-repo code linked via full gitea URLs; vibe-internal links bidirectional.
Reconciled the cloudflared tunnel token path to kvv2 cms/cloudflared (the chart
VaultStaticSecret is kv-v2; the kvv1 tofu reference is a commented-out stub).
6 mermaid diagrams MCP-validated; zero dead links. Lab Cartographer cohort.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-23 21:41:15 +02:00
parent dbe32161dc
commit 548dacfc44
10 changed files with 1110 additions and 0 deletions

# Components
> **Status:** ✅ Active
> **Last Updated:** 2026-06-23
> **Upstream:** [Tools hub](README.md) · [lab-ecosystem 02 · tools](../lab-ecosystem/02-tools.md)
> **Downstream:** [Secrets & VSO](secrets-and-vso.md)
> **Related:** [storage & recovery concept](../lab-ecosystem/storage-and-recovery.md) · [secrets-and-vault concept](../lab-ecosystem/secrets-and-vault.md) · [naming conventions](../lab-ecosystem/naming-conventions.md)

This is the **per-component reference** for the `tools` platform layer: pinned chart/app versions, the values that actually matter (replicas, storage, ports, auth), and the cross-service wiring. Every component lands in the single **`tools` namespace**. For the deploy model (how a single ArgoCD Application fans out into one Application per component), see the [Tools hub](README.md); for how Vault secrets reach the pods, see [Secrets & VSO](secrets-and-vso.md).

Components split into two **tiers**:

- **Tier 1** — the load-bearing services, each with its own subsection and value tables below.
- **Tier 2** — supporting / inactive pieces, summarised in a single table.

Severity legend (GitHub alerts): `[!NOTE]` informational · `[!TIP]` good-to-know · `[!WARNING]` operational hazard · `[!CAUTION]` live risk.

---
## Tier 1 — load-bearing services
### hashicorp-vault
[`hashicorp-vault/`](https://gitea.arcodange.lab/arcodange-org/tools/src/branch/main/hashicorp-vault) — the lab's secrets brain. The chart bundles **three** dependencies: the upstream `vault` server; the `vault-secrets-operator` (VSO), which syncs Vault secrets into Kubernetes `Secret`s for pods to consume; and the shared `tool` library chart.

| Key | Value |
|---|---|
| Chart deps | `vault` `0.28.1`, `vault-secrets-operator` `0.9.0`, `tool` `0.1.0` |
| Mode | `standalone` (single instance, **not** HA / raft) |
| Storage | `storage "file"` at `/vault/data` + audit storage enabled |
| Listener | TLS **off** (`tls_disable = 1`) on `[::]:8200` — terminated at the edge |
| Ingress | `vault.arcodange.lab` (Traefik `websecure`, Let's Encrypt, `localIp@file` middleware) |
| UI | enabled (`ui = true`) |
| Log level | `trace` |

**Mounts (secret engines) exposed:**

| Mount | Type | Purpose |
|---|---|---|
| `kvv1` | KV v1 | Static secrets (legacy / v1 layout) |
| `kvv2` | KV v2 | Versioned static secrets (primary store) |
| `transit` | transit | Encryption-as-a-service; backs VSO client-cache (`vso-client-cache` key) |
| `postgres` | database | Dynamic PostgreSQL credentials (connection via `pgbouncer.tools:5432`) |

**Auth methods enabled:**

| Method | Used by |
|---|---|
| `kubernetes` | In-cluster workloads (VSO, app ServiceAccounts) authenticate by SA token |
| `gitea_jwt` | Gitea Actions / OIDC-JWT pipelines authenticate from CI |

> [!NOTE]
> The full secret-engine layout, VSO `VaultAuth` / `VaultConnection` / `VaultDynamicSecret` wiring, and the `kvv2/data/...` path conventions are documented in [Secrets & VSO](secrets-and-vso.md) — this page only inventories what the chart stands up.

The VSO sub-chart ships a `defaultVaultConnection` pointing at `http://hashicorp-vault.tools.svc.cluster.local:8200` and a client cache with `persistenceModel: direct-encrypted`, whose contents are encrypted with the `vso-client-cache` key on the `transit` mount.
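
The two KV mounts use different HTTP layouts, so reading the same logical secret hits a different API path depending on the engine. Below is a minimal sketch of that mapping. It assumes the standard Vault HTTP API: `/v1/<mount>/<path>` for KV v1 and `/v1/<mount>/data/<path>` for KV v2 reads. `cms/cloudflared` is used only as an example path, and the default address mirrors the VSO connection above.

```python
from urllib.parse import quote

# In-cluster address, mirroring the VSO defaultVaultConnection.
VAULT_ADDR = "http://hashicorp-vault.tools.svc.cluster.local:8200"


def kv_read_url(mount: str, secret_path: str, version: int, addr: str = VAULT_ADDR) -> str:
    """Return the HTTP API URL that reads `secret_path` from a KV mount.

    KV v1 serves secrets directly under the mount; KV v2 inserts a `data/`
    segment for reads (version metadata lives under `metadata/` instead).
    """
    path = quote(secret_path.strip("/"))  # '/' stays unescaped by default
    if version == 1:
        return f"{addr}/v1/{mount}/{path}"
    if version == 2:
        return f"{addr}/v1/{mount}/data/{path}"
    raise ValueError(f"unsupported KV version: {version}")


print(kv_read_url("kvv1", "cms/cloudflared", 1))
print(kv_read_url("kvv2", "cms/cloudflared", 2))
```

Send a GET to either URL with a token in the `X-Vault-Token` header. KV v2 returns the secret under `data.data` (next to `data.metadata`), while KV v1 returns it directly under `data`.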
### prometheus
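[`prometheus/`](https://gitea.arcodange.lab/arcodange-org/tools/src/branch/main/prometheus) — metrics collection and TSDB, via the community `prometheus` Helm chart (a plain server deployment with annotation-based discovery, not the operator-based kube-prometheus stack).

| Key | Value |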
|---|---|
| Chart deps | `prometheus` `28.13.0` (app `v3.10.0`), `tool` `0.1.0` |
| Server replicas | `1` (Deployment, `strategy: Recreate`) |
| Server storage | `persistentVolume` enabled, **8Gi** at `/data` (`ReadWriteOnce`) |
| Retention | `15d` |
| Alertmanager | enabled, persistence **2Gi** (`ReadWriteOnce`) |
| node-exporter | enabled (DaemonSet, `prometheus-node-exporter` sub-chart) |
| kube-state-metrics | enabled |
| pushgateway | enabled (`prometheus.io/probe: pushgateway`) |
| Scrape / eval interval | `1m` (scrape timeout `10s`) |
| Ingress | none — **internal only** |

**Scrape targets** (default `scrapeConfigs`, all enabled): the Prometheus server itself, the Kubernetes API servers, nodes and kubelet cAdvisor, plus **annotation-based** service-endpoint and pod discovery (`prometheus.io/scrape`, `prometheus.io/port`, `prometheus.io/path`, `prometheus.io/scheme`), with `*-slow` variants (5m interval) for targets that should be scraped less often.
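
The annotation-based jobs decide, for each discovered endpoint or pod, whether to scrape it and at which URL. Below is a simplified Python model of those relabel rules. It assumes the chart's usual defaults: `http` scheme, `/metrics` path, and the discovered address when no port annotation is set. It illustrates the rule only. It is not Prometheus's relabel engine and ignores the `*-slow` jobs. The IP addresses in the examples are hypothetical.

```python
import re
from typing import Optional

PREFIX = "prometheus.io/"


def scrape_target(address: str, annotations: dict[str, str]) -> Optional[str]:
    """Return the URL Prometheus would scrape for a discovered target, or None if dropped.

    `address` is the discovered host[:port]; the annotations drive everything else.
    """
    # Opt-in: anything other than the exact string "true" drops the target.
    if annotations.get(PREFIX + "scrape") != "true":
        return None

    # Only "http" or "https" override the scheme; other values keep the default.
    scheme = annotations.get(PREFIX + "scheme", "http")
    if scheme not in ("http", "https"):
        scheme = "http"

    # An empty or missing path annotation keeps the default /metrics.
    path = annotations.get(PREFIX + "path") or "/metrics"

    # A numeric port annotation replaces whatever port discovery found.
    port = annotations.get(PREFIX + "port", "")
    if port.isdigit():
        host = re.sub(r":\d+$", "", address)
        address = f"{host}:{port}"

    return f"{scheme}://{address}{path}"


pod = {"prometheus.io/scrape": "true", "prometheus.io/port": "9100"}
print(scrape_target("10.42.0.17:8080", pod))  # http://10.42.0.17:9100/metrics
```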
### grafana
[`grafana/`](https://gitea.arcodange.lab/arcodange-org/tools/src/branch/main/grafana) — dashboards over Prometheus and ClickHouse.
| Key | Value |
|---|---|
| Chart deps | `grafana` `10.3.0` (app `latest`), `tool` `0.1.0` |
| Replicas | `1` (Deployment, `RollingUpdate`) |
| Persistence | **disabled** — ephemeral; dashboards/datasources are provisioned at boot |
| Ingress | `grafana.arcodange.lab` (Traefik `websecure`, Let's Encrypt, `localIp@file` middleware) |
| Plugin | `grafana-clickhouse-datasource` |
| Resources | requests `100m` / `128Mi`, limits `100m` / `512Mi` |
| Timezone | `Europe/Paris` |

**Datasources (provisioned):**

| Name | Type | Target | Default |
|---|---|---|---|
| Prometheus | `prometheus` | `http://prometheus-server.tools.svc.cluster.local` | ✅ yes |
| clickhouse | `grafana-clickhouse-datasource` | `clickhouse.tools.svc.cluster.local:9000` (native, `tlsSkipVerify`) | no |
> [!WARNING]
> The Grafana **admin password is static and committed** in [`grafana/values.yaml`](https://gitea.arcodange.lab/arcodange-org/tools/src/branch/main/grafana/values.yaml) (`adminUser: admin`). The provisioned ClickHouse datasource password is committed there too (`secureJsonData.password`). Treat these as lab-only credentials; do not reuse them outside the homelab.
### crowdsec
[`crowdsec/`](https://gitea.arcodange.lab/arcodange-org/tools/src/branch/main/crowdsec) — behavioural edge security that feeds a Traefik blocklist.
| Key | Value |
|---|---|
| Chart deps | `crowdsec` `0.20.1`, `tool` `0.1.0` |
| LAPI | Deployment (`RollingUpdate`, `maxUnavailable: 0`) — the local API + decision store |
| Agent | DaemonSet pinned to control-plane nodes (`node-role.kubernetes.io/control-plane`) |
| Log source | parses **Traefik** pod logs in `kube-system` (`podName: traefik-*`, `program: traefik`) |
| Collections | `crowdsecurity/traefik`, `crowdsecurity/http-cve` (+ AppSec rules below) |
| AppSec (WAF) | **enabled**: `crowdsecurity/appsec-default` on `0.0.0.0:7422`; collections `appsec-virtual-patching` + `appsec-generic-rules` |
| Database | external PostgreSQL `crowdsec` via **pgbouncer** (`host: pgbouncer.tools:5432`, `type: postgresql`) |
| DB credentials | dynamic, from secret `crowdsec-db-credentials` (`DB_USER` / `DB_PASSWORD`, sourced via VSO) |
| Console | enrolled as instance `homelab` |

The decisions CrowdSec produces are surfaced as a **Traefik middleware blocklist applied at the edge**, so malicious IPs are dropped before they reach app namespaces. The `server_reset_query: DEALLOCATE ALL` setting on pgbouncer (below) exists so that CrowdSec's prepared statements do not collide when pgbouncer hands a pooled server connection to a new client. The CAPTCHA challenge CrowdSec serves on remediated requests is a **Cloudflare Turnstile widget minted by the `cms` repo**; see the [CMS Cloudflare page](../cms/cloudflare.md), which produces the sitekey/secret this bouncer consumes from Vault.
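
Conceptually, the bouncer answers one question per request: does any active decision cover this client IP, and with which remediation? Below is a toy Python model of that lookup. It assumes decisions shaped like CrowdSec's, with a `scope` of `Ip` or `Range` and a `type` of `ban` or `captcha`. It also assumes a precedence rule in which a ban outranks a captcha. The real Traefik bouncer fetches and caches decisions from the LAPI, which this sketch does not attempt. Addresses come from the RFC 5737 documentation ranges.

```python
import ipaddress
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Decision:
    scope: str  # "Ip" or "Range"
    value: str  # an address such as "203.0.113.7" or a CIDR such as "198.51.100.0/24"
    type: str   # remediation: "ban" or "captcha"


# Assumption for this sketch: when several decisions match, ban outranks captcha.
PRIORITY = {"captcha": 1, "ban": 2}


def remediation(client_ip: str, decisions: list[Decision]) -> Optional[str]:
    """Return the strongest remediation covering client_ip, or None to let it through."""
    ip = ipaddress.ip_address(client_ip)
    best: Optional[str] = None
    for d in decisions:
        if d.scope == "Ip":
            hit = ip == ipaddress.ip_address(d.value)
        elif d.scope == "Range":
            hit = ip in ipaddress.ip_network(d.value, strict=False)
        else:
            continue  # other scopes (e.g. Country, AS) are not modelled here
        if hit and PRIORITY.get(d.type, 0) > PRIORITY.get(best or "", 0):
            best = d.type
    return best
```

A `captcha` result is the point where the Turnstile challenge described above would be served instead of a hard block.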
### pgbouncer
[`pgbouncer/`](https://gitea.arcodange.lab/arcodange-org/tools/src/branch/main/pgbouncer) — the connection pooler in front of the **external** PostgreSQL.
| Key | Value |
|---|---|
| Chart deps | `pgbouncer` `2.3.1` (`icoretech/pgbouncer`), `tool` `0.1.0` |
| Scheduling | `nodeSelector: kubernetes.io/hostname: pi2` (co-located with PostgreSQL) |
| Upstream DB | external PostgreSQL at `192.168.1.202:5432` (the `pi2` host), wildcard database `"*"` |
| Auth type | `scram-sha-256` |
| `auth_query` | `SELECT uname, phash FROM user_lookup($1)` |
| `server_reset_query` | `DEALLOCATE ALL` (drops prepared statements when a server connection is released, so CrowdSec's statements do not collide on reuse) |
| `server_idle_timeout` | `7200` (2h) |
| `ignore_startup_parameters` | `extra_float_digits` (a client startup parameter pgbouncer does not support; sent by JDBC and other drivers) |
| Exporter | disabled |
| Service | `pgbouncer.tools:5432` (cluster-internal) |
> [!NOTE]
> pgbouncer is the single front door to the lab's PostgreSQL: CrowdSec, Plausible, and Vault's `postgres` dynamic-secret backend all connect through `pgbouncer.tools:5432`, never to `192.168.1.202` directly.
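
The reason `DEALLOCATE ALL` matters is that a pooled server connection outlives the client that used it, and PostgreSQL keeps named prepared statements per server connection. Per the pgbouncer docs, `server_reset_query` runs when a server connection is released, and only in session pooling mode unless `server_reset_query_always` is set. Below is a toy Python simulation (no real PostgreSQL or pgbouncer; class and statement names are hypothetical). It shows a second client colliding with a leftover statement, and the reset query clearing it.

```python
class ServerConn:
    """Stand-in for one PostgreSQL backend connection held by the pooler."""

    def __init__(self) -> None:
        self.prepared: set[str] = set()

    def prepare(self, name: str) -> None:
        # PostgreSQL rejects a PREPARE whose name already exists on this backend.
        if name in self.prepared:
            raise RuntimeError(f'prepared statement "{name}" already exists')
        self.prepared.add(name)

    def execute_reset(self, query: str) -> None:
        if query.strip().upper() == "DEALLOCATE ALL":
            self.prepared.clear()


class Pool:
    """Single-connection pool: every client is handed the same backend in turn."""

    def __init__(self, server_reset_query: str = "") -> None:
        self.conn = ServerConn()
        self.reset_query = server_reset_query

    def run_client(self, statement_names: list[str]) -> None:
        for name in statement_names:
            self.conn.prepare(name)
        # On release, run server_reset_query (if configured) before the next client.
        if self.reset_query:
            self.conn.execute_reset(self.reset_query)


no_reset = Pool()
no_reset.run_client(["stmt_1"])
try:
    no_reset.run_client(["stmt_1"])  # second client, same backend
except RuntimeError as err:
    print(err)  # prepared statement "stmt_1" already exists

with_reset = Pool(server_reset_query="DEALLOCATE ALL")
with_reset.run_client(["stmt_1"])
with_reset.run_client(["stmt_1"])  # succeeds: the reset cleared the name
```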
### redis (KeyDB)
[`redis/`](https://gitea.arcodange.lab/arcodange-org/tools/src/branch/main/redis) — the in-memory cache / session store. The chart targets **KeyDB** (EqAlpha, Redis-compatible), tuned for the 2× Raspberry Pi 5 nodes.
| Key | Value |
|---|---|
| Chart deps | `redis` `2.1.0` (`pascaliske/redis`), `tool` `0.1.0` |
| Workload | **StatefulSet** (the pod at ordinal 0 is the master; any further pod runs `replicaof` against it) |
| Storage | PVC `create: true`, **1Gi** at `/data` (`ReadWriteOnce`) |
| Tuning | `server-threads 4` (one thread per Pi 5 core) |
| Port | `6379` (`ClusterIP`) |
| Security | `runAsUser/Group/fsGroup: 999`, non-root |
| Timezone | `Europe/Paris` |
> [!NOTE]
> To inspect the instance, run `kubectl port-forward -n tools svc/redis 6379:6379` and point Redis Insight at `localhost:6379` (per the [chart README](https://gitea.arcodange.lab/arcodange-org/tools/src/branch/main/redis/README.md)).
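
The master/replica split in the StatefulSet keys off each pod's stable ordinal. Below is a hypothetical sketch of how a pod could derive its `replicaof` line from its own hostname. It assumes pods named `redis-<n>` behind a headless service called `redis-headless`; both names are illustrative, not taken from the chart, which may wire replication through its own templates. It relies only on the standard StatefulSet DNS form `<pod>.<service>.<namespace>.svc.cluster.local`.

```python
from typing import Optional


def replication_directive(hostname: str, service: str = "redis-headless",
                          namespace: str = "tools", port: int = 6379) -> Optional[str]:
    """Return the KeyDB `replicaof` line for this pod, or None for the master.

    StatefulSet pods are named `<statefulset>-<ordinal>`; ordinal 0 is the master.
    `service` is a hypothetical headless-service name.
    """
    statefulset, _, ordinal = hostname.rpartition("-")
    if not statefulset or not ordinal.isdigit():
        raise ValueError(f"not a StatefulSet pod name: {hostname!r}")
    if int(ordinal) == 0:
        return None  # pod 0 is the master and gets no replicaof line
    master = f"{statefulset}-0.{service}.{namespace}.svc.cluster.local"
    return f"replicaof {master} {port}"


print(replication_directive("redis-1"))
```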
### plausible
[`plausible/`](https://gitea.arcodange.lab/arcodange-org/tools/src/branch/main/plausible) — privacy-friendly web analytics. Deployed via a **Kustomize** overlay that inflates the upstream Helm chart, rather than as a `Chart.yaml` dependency like the components above.
| Key | Value |
|---|---|
| Declared via | Kustomize `helmCharts:` inflation generator |
| Chart / version | `plausible` `2.0.0` (`pascaliske/plausible`), image `ghcr.io/plausible/community-edition` |
| Replicas | `1` (Deployment) |
| Ingress | `analytics.arcodange.lab` (Traefik IngressRoute, Let's Encrypt, `localIp@file` middleware) |
| App DB | PostgreSQL via **pgbouncer** — an **init container** assembles `DATABASE_URL` from VSO dynamic creds |
| Event store | **ClickHouse** (see below) |
| GeoIP | MaxMind **GeoLite2** (`GeoLite2-Country` + `GeoLite2-City`), license key from secret `plausible-geoip` |
| Secrets | `SECRET_KEY_BASE` / `TOTP_VAULT_KEY` from existing secret `plausible-config` (VSO-fed) |

Plausible writes analytics events to ClickHouse and stores app/account state in PostgreSQL — two distinct backends, both reached through lab-internal services.
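
The init container's job is essentially string assembly: it takes the short-lived username and password that VSO writes into a Kubernetes Secret and builds a `postgres://` URL pointing at pgbouncer. Below is a sketch of that step in Python. It assumes the credentials arrive as `DB_USER` / `DB_PASSWORD` (mirroring the CrowdSec secret's keys) and a database named `plausible`. Both are hypothetical, and the real init container may use different names. Percent-encoding matters because generated passwords can contain `@`, `:` or `/`.

```python
import os
from collections.abc import Mapping
from urllib.parse import quote


def database_url(user: str, password: str, host: str = "pgbouncer.tools",
                 port: int = 5432, dbname: str = "plausible") -> str:
    """Build a postgres:// URL with percent-encoded credentials."""
    # safe="" escapes every reserved character, including '/', '@' and ':'.
    return (f"postgres://{quote(user, safe='')}:{quote(password, safe='')}"
            f"@{host}:{port}/{dbname}")


def database_url_from_env(env: Mapping[str, str] = os.environ) -> str:
    """Read the hypothetical DB_USER / DB_PASSWORD keys and build DATABASE_URL."""
    return database_url(env["DB_USER"], env["DB_PASSWORD"])


print(database_url("app_user", "p@ss/w:rd"))
```

This sketch does not show how the result reaches the Plausible container (for example a shared volume or an env file), because that is specific to the chart.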
### clickhouse
[`clickhouse/`](https://gitea.arcodange.lab/arcodange-org/tools/src/branch/main/clickhouse) — the OLAP column store behind Plausible. Also a **Kustomize** overlay inflating the upstream chart, plus a `databases` sub-chart that runs an init job.
| Key | Value |
|---|---|
| Declared via | Kustomize `helmCharts:` inflation generator (`chartHome: charts`) |
| Chart / version | `clickhouse` `0.4.0` (`pascaliske/clickhouse`), image `clickhouse/clickhouse-server` |
| Workload | **StatefulSet**, `replicas: 1` |
| Storage | PVC **16Gi** at `/var/lib/clickhouse` (`ReadWriteOnce`) |
| Ports | `8123` (HTTP), `9000` (native protocol) |
| Custom user | `arcodange` (full network access, `access_management: 1`) via `custom-users.xml` |
| Security | `runAsUser/Group/fsGroup: 101`, non-root |
| Timezone | `Europe/Paris` |
> [!WARNING]
> The ClickHouse `arcodange` user password is **static and committed** in [`clickhouse/clickhouseValues.yaml`](https://gitea.arcodange.lab/arcodange-org/tools/src/branch/main/clickhouse/clickhouseValues.yaml) (`custom-users.xml`). The same value appears in Grafana's provisioned datasource — keep the two in sync if you rotate it.

> [!CAUTION]
> ClickHouse carries a `nodeAffinity` that **excludes `pi2`** (`kubernetes.io/hostname NotIn [pi2]`). `pi2` hosts PostgreSQL and pgbouncer; ClickHouse is deliberately kept off it to avoid I/O contention on that node. A cluster where `pi2` is the only schedulable node will leave ClickHouse `Pending`.
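
The `Pending` failure mode follows directly from how a `NotIn` node-affinity requirement filters candidate nodes. Below is a minimal Python model of that filter. It covers only the `In` and `NotIn` operators, and every node name except `pi2` is hypothetical. Any node whose `kubernetes.io/hostname` is in the excluded set is dropped; if nothing is left, the scheduler has nowhere to place the pod. With nodes `pi1` and `pi2`, `clickhouse_candidates` keeps only `pi1`. With `pi2` alone it returns an empty list.

```python
HOSTNAME = "kubernetes.io/hostname"


def matches(labels: dict[str, str], key: str, operator: str, values: list[str]) -> bool:
    """Evaluate one nodeSelectorRequirement (In/NotIn only) against node labels."""
    if operator == "In":
        return labels.get(key) in values
    if operator == "NotIn":
        # A node without the label also satisfies NotIn.
        return labels.get(key) not in values
    raise ValueError(f"operator not modelled in this sketch: {operator}")


def clickhouse_candidates(nodes: dict[str, dict[str, str]]) -> list[str]:
    """Nodes that pass ClickHouse's `kubernetes.io/hostname NotIn [pi2]` rule."""
    return [name for name, labels in nodes.items()
            if matches(labels, HOSTNAME, "NotIn", ["pi2"])]


print(clickhouse_candidates({"pi1": {HOSTNAME: "pi1"}, "pi2": {HOSTNAME: "pi2"}}))
print(clickhouse_candidates({"pi2": {HOSTNAME: "pi2"}}))  # empty: the pod stays Pending
```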
---
## Tier 2 — supporting & inactive
| Component | Status | Notes |
|---|---|---|
| [`pgcat/`](https://gitea.arcodange.lab/arcodange-org/tools/src/branch/main/pgcat) | ❌ disabled | Alternative Postgres pooler (`pgcat` chart `0.1.0`). Not in service — its sole pool has empty `username`/`password`/`database` placeholders, and it is **not** keyed under `tools:` in [`chart/values.yaml`](https://gitea.arcodange.lab/arcodange-org/tools/src/branch/main/chart/values.yaml), so ArgoCD renders no Application for it. [pgbouncer](#pgbouncer) is the active pooler. |
| [`tool/`](https://gitea.arcodange.lab/arcodange-org/tools/src/branch/main/tool) | ✅ active (library) | Helm **library chart** (`type: library`, version `0.1.0`) consumed by **every** component chart via `dependencies:`. Ships shared templates/helpers; **not deployable** on its own. |
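
`pgcat` renders nothing because the fan-out iterates over the keys under `tools:`, so a component directory without a key is never visited. Below is a Python model of that loop. It assumes a values structure in which each key under `tools:` names one component, and the example values are hypothetical. The real template lives in the tools `chart/` and is written in Helm, not Python. Given values that list only some components, `never_deployed` reports `pgcat`, plus `tool`, which is expected for a library chart.

```python
def rendered_applications(values: dict) -> list[str]:
    """Names of the Applications a `range` over `.Values.tools` would emit."""
    # Go templates visit map keys in sorted order, so mirror that here.
    return sorted((values.get("tools") or {}).keys())


def never_deployed(values: dict, component_dirs: list[str]) -> list[str]:
    """Component directories in the repo that no `tools:` key ever renders."""
    rendered = set(rendered_applications(values))
    return sorted(d for d in component_dirs if d not in rendered)


values = {"tools": {"hashicorp-vault": {}, "prometheus": {}, "pgbouncer": {}}}
dirs = ["hashicorp-vault", "pgbouncer", "pgcat", "prometheus", "tool"]
print(never_deployed(values, dirs))
```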
---
## Gotchas
> [!WARNING]
> **No high availability.** Every Tier-1 service runs a **single replica**: Vault (`standalone`), Prometheus (`replicaCount: 1`), Grafana (`replicas: 1`), the ClickHouse and Redis/KeyDB StatefulSets (`replicas: 1`), and the Plausible and CrowdSec LAPI Deployments (one pod each). Any node drain or pod restart is a brief outage for that service, not a failover.

> [!WARNING]
> **Static, committed passwords.** Grafana admin (+ its ClickHouse datasource), the ClickHouse `arcodange` user, and the pgbouncer admin/auth users all carry plaintext credentials in their committed values files. They are lab-only; rotate before any exposure and never copy them to a real environment.

> [!CAUTION]
> **ClickHouse must avoid `pi2`.** The `NotIn [pi2]` `nodeAffinity` keeps it off the PostgreSQL/pgbouncer host. If `pi2` is the only schedulable node, ClickHouse (and therefore Plausible analytics) stays `Pending`. See the [storage & recovery concept](../lab-ecosystem/storage-and-recovery.md) for how PVC-backed services map onto specific nodes.

> [!CAUTION]
> **Vault is single-instance and starts sealed.** After **any** restart (pod reschedule, node reboot, chart upgrade), Vault comes up **sealed** with no auto-unseal configured, and every VSO secret sync and dynamic-secret lease request stalls until an operator unseals it. This is the first thing to check when secrets stop flowing across the cluster; the unseal procedure lives in [Secrets & VSO](secrets-and-vso.md).
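
A quick way to confirm the sealed state before digging into VSO is Vault's unauthenticated `GET /v1/sys/seal-status` endpoint. Below is a small Python sketch that reads only the `sealed`, `t` (threshold) and `progress` fields. It assumes the address is reachable, either the in-cluster service from a debug pod or `localhost:8200` through a `kubectl port-forward`. It only reports state and does not attempt to unseal.

```python
import json
from urllib.request import urlopen

VAULT_ADDR = "http://hashicorp-vault.tools.svc.cluster.local:8200"


def fetch_seal_status(addr: str = VAULT_ADDR, timeout: float = 5.0) -> dict:
    """GET /v1/sys/seal-status; no token is needed, so it answers even while sealed."""
    with urlopen(f"{addr}/v1/sys/seal-status", timeout=timeout) as resp:
        return json.load(resp)


def describe_seal_status(status: dict) -> str:
    """Summarise the response using `sealed`, threshold `t`, and unseal `progress`."""
    if status["sealed"]:
        return f"sealed: {status.get('progress', 0)}/{status['t']} key shares entered"
    return "unsealed"
```

Typical use from a pod that can reach Vault is `print(describe_seal_status(fetch_seal_status()))`. Output such as `sealed: 0/3 key shares entered` means no operator has started unsealing yet.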