factory/vibe/guidebooks/tools/components.md
Gabriel Radureau 548dacfc44 docs(vibe): add tools/ and cms/ guidebooks
Two code-grounded tree-docs guidebooks under vibe/guidebooks/, drilling into the
lab-ecosystem 02-tools and 03-cms pages (bidirectional):

- tools/  : hub + components.md (Vault+VSO, Prometheus, Grafana, CrowdSec,
  pgbouncer, Redis/KeyDB, Plausible, ClickHouse; pgcat/tool as Tier-2) +
  secrets-and-vso.md (Vault engines/auth, the app_roles/app_policy modules =
  the <app> join-key machinery, VSO CRDs, secret-paths inventory).
- cms/    : hub + site.md (Nuxt + dual Pages/k3s deploy) + cloudflare.md
  (zone via OVH->CF, Pages, cloudflared tunnel, Turnstile, R2 state) +
  zoho-email.md (OAuth, MX/SPF/DKIM/DMARC/BIMI, the 7 aliases).

Sibling-repo code linked via full gitea URLs; vibe-internal links bidirectional.
Reconciled the cloudflared tunnel token path to kvv2 cms/cloudflared (the chart
VaultStaticSecret is kv-v2; the kvv1 tofu reference is a commented-out stub).
6 mermaid diagrams MCP-validated; zero dead links. Lab Cartographer cohort.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-23 21:41:15 +02:00

[vibe](../../README.md) > [Guidebooks](../README.md) > [Tools](README.md) > **Components**
# Components
> **Status:** ✅ Active
> **Last Updated:** 2026-06-23
> **Upstream:** [Tools hub](README.md) · [lab-ecosystem 02 · tools](../lab-ecosystem/02-tools.md)
> **Downstream:** [Secrets & VSO](secrets-and-vso.md)
> **Related:** [storage & recovery concept](../lab-ecosystem/storage-and-recovery.md) · [secrets-and-vault concept](../lab-ecosystem/secrets-and-vault.md) · [naming conventions](../lab-ecosystem/naming-conventions.md)
This is the **per-component reference** for the `tools` platform layer: pinned chart/app versions, the values that actually matter (replicas, storage, ports, auth), and the cross-service wiring. Every component lands in the single **`tools` namespace**. For the deploy model (how one ArgoCD Application fans out into one per component) see the [Tools hub](README.md); for how Vault secrets reach the pods see [Secrets & VSO](secrets-and-vso.md).
Components split into two **tiers**:
- **Tier 1** — the load-bearing services, each with its own subsection and value tables below.
- **Tier 2** — supporting / inactive pieces, summarised in a single table.
Severity legend (GitHub alerts): `[!NOTE]` informational · `[!TIP]` good-to-know · `[!WARNING]` operational hazard · `[!CAUTION]` live risk.
---
## Tier 1 — load-bearing services
### hashicorp-vault
[`hashicorp-vault/`](https://gitea.arcodange.lab/arcodange-org/tools/src/branch/main/hashicorp-vault) — the lab's secrets brain. The chart bundles **three** dependencies: the upstream `vault` server, the `vault-secrets-operator` (VSO) that injects secrets into pods, and the shared `tool` library chart.
| Key | Value |
|---|---|
| Chart deps | `vault` `0.28.1`, `vault-secrets-operator` `0.9.0`, `tool` `0.1.0` |
| Mode | `standalone` (single instance, **not** HA / raft) |
| Storage | `storage "file"` at `/vault/data` + audit storage enabled |
| Listener | TLS **off** (`tls_disable = 1`) on `[::]:8200` — terminated at the edge |
| Ingress | `vault.arcodange.lab` (Traefik `websecure`, Let's Encrypt, `localIp@file` middleware) |
| UI | enabled (`ui = true`) |
| Log level | `trace` |
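
For orientation, the ingress row above corresponds roughly to an Ingress object like the one below. This is a sketch rather than the rendered chart output: the annotation keys are Traefik's standard Ingress annotations, but the object name and the `letsencrypt` resolver name are assumptions, and the backend service and port are inferred from the in-cluster address `hashicorp-vault.tools.svc.cluster.local:8200`.

```yaml
# Sketch only: an Ingress approximating the "Ingress" row above.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: hashicorp-vault              # assumed object name
  namespace: tools
  annotations:
    # Route through Traefik's HTTPS entrypoint and terminate TLS there.
    traefik.ingress.kubernetes.io/router.entrypoints: websecure
    traefik.ingress.kubernetes.io/router.tls: "true"
    # Resolver name is an assumption; use the one defined in Traefik's static config.
    traefik.ingress.kubernetes.io/router.tls.certresolver: letsencrypt
    # Middleware declared in Traefik's file provider (presumably an IP allow-list, judging by its name).
    traefik.ingress.kubernetes.io/router.middlewares: localIp@file
spec:
  rules:
    - host: vault.arcodange.lab
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: hashicorp-vault
                port:
                  number: 8200       # plain HTTP inside the cluster (tls_disable = 1)
```
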
**Mounts (secret engines) exposed:**
| Mount | Type | Purpose |
|---|---|---|
| `kvv1` | KV v1 | Static secrets (legacy / v1 layout) |
| `kvv2` | KV v2 | Versioned static secrets (primary store) |
| `transit` | transit | Encryption-as-a-service; backs VSO client-cache (`vso-client-cache` key) |
| `postgres` | database | Dynamic PostgreSQL credentials (connection via `pgbouncer.tools:5432`) |
**Auth methods enabled:**
| Method | Used by |
|---|---|
| `kubernetes` | In-cluster workloads (VSO, app ServiceAccounts) authenticate by SA token |
| `gitea_jwt` | Gitea Actions / OIDC-JWT pipelines authenticate from CI |
> [!NOTE]
> The full secret-engine layout, VSO `VaultAuth` / `VaultConnection` / `VaultDynamicSecret` wiring, and the `kvv2/data/...` path conventions are documented in [Secrets & VSO](secrets-and-vso.md) — this page only inventories what the chart stands up.
The VSO sub-chart ships a `defaultVaultConnection` pointing at `http://hashicorp-vault.tools.svc.cluster.local:8200` and a client cache with `persistenceModel: direct-encrypted`, encrypted through the `transit` mount.
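
As a rough illustration, those two settings would sit under the VSO sub-chart's values along these lines. The key names follow the upstream `vault-secrets-operator` chart as far as I know them; the `kubernetes` auth mount used for the cache key and the `vso-operator` role name are assumptions, not values taken from the repo.

```yaml
# Sketch only: values passed to the vault-secrets-operator sub-chart.
vault-secrets-operator:
  defaultVaultConnection:
    enabled: true
    # In-cluster, plain-HTTP address of the Vault service.
    address: http://hashicorp-vault.tools.svc.cluster.local:8200
  controller:
    manager:
      clientCache:
        # Persist the client cache, encrypted through Vault's transit engine.
        persistenceModel: direct-encrypted
        storageEncryption:
          enabled: true
          transitMount: transit
          keyName: vso-client-cache
          mount: kubernetes          # assumed auth mount the operator logs in through
          kubernetes:
            role: vso-operator       # hypothetical Vault role name
```
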
### prometheus
[`prometheus/`](https://gitea.arcodange.lab/arcodange-org/tools/src/branch/main/prometheus): metrics collection and TSDB, via the community `prometheus` Helm chart, which bundles Alertmanager, the exporters and the pushgateway listed below.
| Key | Value |
|---|---|
| Chart deps | `prometheus` `28.13.0` (app `v3.10.0`), `tool` `0.1.0` |
| Server replicas | `1` (Deployment, `strategy: Recreate`) |
| Server storage | `persistentVolume` enabled, **8Gi** at `/data` (`ReadWriteOnce`) |
| Retention | `15d` |
| Alertmanager | enabled, persistence **2Gi** (`ReadWriteOnce`) |
| node-exporter | enabled (DaemonSet, `prometheus-node-exporter` sub-chart) |
| kube-state-metrics | enabled |
| pushgateway | enabled (`prometheus.io/probe: pushgateway`) |
| Scrape / eval interval | `1m` (scrape timeout `10s`) |
| Ingress | none — **internal only** |
**Scrape targets** (default `scrapeConfigs`, all enabled): the Prometheus server itself, the Kubernetes API servers, nodes + kubelet cadvisor, plus **annotation-based** service-endpoint and pod discovery (`prometheus.io/scrape`, `prometheus.io/port`, `prometheus.io/path`, `prometheus.io/scheme`), with `*-slow` (5m) variants for cheaper targets.
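
To make the annotation-based discovery concrete, here is a minimal sketch of a Service that opts in to scraping. The service name, labels, and port `9102` are hypothetical; only the four `prometheus.io/*` annotation keys come from the scrape configuration described above.

```yaml
# Sketch only: a Service picked up by the annotation-based service-endpoint job.
apiVersion: v1
kind: Service
metadata:
  name: example-app                  # hypothetical workload
  namespace: tools
  annotations:
    prometheus.io/scrape: "true"     # opt in; annotation values must be strings
    prometheus.io/port: "9102"       # port serving metrics (hypothetical)
    prometheus.io/path: /metrics     # metrics path
    prometheus.io/scheme: http       # http or https
spec:
  selector:
    app: example-app
  ports:
    - name: metrics
      port: 9102
      targetPort: 9102
```

The same annotations placed on a Pod's metadata are what the pod-discovery job looks for.
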
### grafana
[`grafana/`](https://gitea.arcodange.lab/arcodange-org/tools/src/branch/main/grafana) — dashboards over Prometheus and ClickHouse.
| Key | Value |
|---|---|
| Chart deps | `grafana` `10.3.0` (app `latest`), `tool` `0.1.0` |
| Replicas | `1` (Deployment, `RollingUpdate`) |
| Persistence | **disabled** — ephemeral; dashboards/datasources are provisioned at boot |
| Ingress | `grafana.arcodange.lab` (Traefik `websecure`, Let's Encrypt, `localIp@file` middleware) |
| Plugin | `grafana-clickhouse-datasource` |
| Resources (CPU / memory) | requests `100m` / `128Mi`, limits `100m` / `512Mi` |
| Timezone | `Europe/Paris` |
**Datasources (provisioned):**
| Name | Type | Target | Default |
|---|---|---|---|
| Prometheus | `prometheus` | `http://prometheus-server.tools.svc.cluster.local` | ✅ yes |
| clickhouse | `grafana-clickhouse-datasource` | `clickhouse.tools.svc.cluster.local:9000` (native, `tlsSkipVerify`) | no |
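
In Grafana's datasource provisioning format, the two rows above would look roughly like the sketch below. The field names follow Grafana's provisioning schema and the ClickHouse plugin's `jsonData` options as I understand them; the `arcodange` username is inferred from the ClickHouse section rather than read from the values file, and the password is deliberately left as a placeholder. How this block is nested inside the chart's values is not shown.

```yaml
# Sketch only: Grafana datasource provisioning equivalent to the table above.
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: http://prometheus-server.tools.svc.cluster.local
    isDefault: true
  - name: clickhouse
    type: grafana-clickhouse-datasource
    jsonData:
      host: clickhouse.tools.svc.cluster.local
      port: 9000                     # native protocol port
      protocol: native
      tlsSkipVerify: true
      username: arcodange            # assumption: the custom ClickHouse user
    secureJsonData:
      password: "<redacted>"         # placeholder, not the committed value
```
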
> [!WARNING]
> The Grafana **admin password is static and committed** in [`grafana/values.yaml`](https://gitea.arcodange.lab/arcodange-org/tools/src/branch/main/grafana/values.yaml) (`adminUser: admin`). The provisioned ClickHouse datasource password is committed there too (`secureJsonData.password`). Treat these as lab-only credentials; do not reuse them outside the homelab.
### crowdsec
[`crowdsec/`](https://gitea.arcodange.lab/arcodange-org/tools/src/branch/main/crowdsec) — behavioural edge security that feeds a Traefik blocklist.
| Key | Value |
|---|---|
| Chart deps | `crowdsec` `0.20.1`, `tool` `0.1.0` |
| LAPI | Deployment (`RollingUpdate`, `maxUnavailable: 0`) — the local API + decision store |
| Agent | DaemonSet pinned to control-plane nodes (`node-role.kubernetes.io/control-plane`) |
| Log source | parses **Traefik** pod logs in `kube-system` (`podName: traefik-*`, `program: traefik`) |
| Collections | `crowdsecurity/traefik`, `crowdsecurity/http-cve` (+ AppSec rules below) |
| AppSec (WAF) | **enabled**: `crowdsecurity/appsec-default` on `0.0.0.0:7422`; collections `appsec-virtual-patching` + `appsec-generic-rules` |
| Database | external PostgreSQL `crowdsec` via **pgbouncer** (`host: pgbouncer.tools:5432`, `type: postgresql`) |
| DB credentials | dynamic, from secret `crowdsec-db-credentials` (`DB_USER` / `DB_PASSWORD`, sourced via VSO) |
| Console | enrolled as instance `homelab` |
CrowdSec's decisions are enforced as a **Traefik middleware blocklist applied at the edge**, so malicious IPs are dropped before they reach app namespaces. `server_reset_query: DEALLOCATE ALL` on pgbouncer (below) exists so that prepared statements left on a pooled server connection are cleared before CrowdSec reuses it. The CAPTCHA challenge CrowdSec serves on remediated requests is a **Cloudflare Turnstile widget minted by the `cms` repo**: the [CMS Cloudflare page](../cms/cloudflare.md) covers how the sitekey/secret is produced, and the bouncer reads it from Vault.
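
The dynamic `crowdsec-db-credentials` secret from the table could be produced by a VSO `VaultDynamicSecret` along the following lines. This is a hedged sketch: the `default` auth reference and the `creds/crowdsec` role path are hypothetical, and the actual resource is documented in [Secrets & VSO](secrets-and-vso.md).

```yaml
# Sketch only: VSO resource rendering dynamic Postgres creds into a Kubernetes Secret.
apiVersion: secrets.hashicorp.com/v1beta1
kind: VaultDynamicSecret
metadata:
  name: crowdsec-db-credentials
  namespace: tools
spec:
  vaultAuthRef: default              # hypothetical VaultAuth name
  mount: postgres                    # the database secrets engine
  path: creds/crowdsec               # hypothetical role path
  destination:
    create: true
    name: crowdsec-db-credentials
    transformation:
      excludeRaw: true               # keep only the templated keys below
      templates:
        # If this manifest lives inside a Helm template, the braces need escaping.
        DB_USER:
          text: '{{ get .Secrets "username" }}'
        DB_PASSWORD:
          text: '{{ get .Secrets "password" }}'
```
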
### pgbouncer
[`pgbouncer/`](https://gitea.arcodange.lab/arcodange-org/tools/src/branch/main/pgbouncer) — the connection pooler in front of the **external** PostgreSQL.
| Key | Value |
|---|---|
| Chart deps | `pgbouncer` `2.3.1` (`icoretech/pgbouncer`), `tool` `0.1.0` |
| Scheduling | `nodeSelector: kubernetes.io/hostname: pi2` (co-located with PostgreSQL) |
| Upstream DB | external PostgreSQL at `192.168.1.202:5432` (the `pi2` host), wildcard database `"*"` |
| Auth type | `scram-sha-256` |
| `auth_query` | `SELECT uname, phash FROM user_lookup($1)` |
| `server_reset_query` | `DEALLOCATE ALL` (clears prepared statements — fixes CrowdSec re-use) |
| `server_idle_timeout` | `7200` (2h) |
| `ignore_startup_parameters` | `extra_float_digits` (a startup parameter sent by JDBC clients that pgbouncer would otherwise reject) |
| Exporter | disabled |
| Service | `pgbouncer.tools:5432` (cluster-internal) |
> [!NOTE]
> pgbouncer is the single front door to the lab's PostgreSQL: CrowdSec, Plausible, and Vault's `postgres` dynamic-secret backend all connect through `pgbouncer.tools:5432`, never to `192.168.1.202` directly.
### redis (KeyDB)
[`redis/`](https://gitea.arcodange.lab/arcodange-org/tools/src/branch/main/redis) — the in-memory cache / session store. The chart targets **KeyDB** (EqAlpha, Redis-compatible), tuned for the 2× Raspberry Pi 5 nodes.
| Key | Value |
|---|---|
| Chart deps | `redis` `2.1.0` (`pascaliske/redis`), `tool` `0.1.0` |
| Workload | **StatefulSet** (master at index 0, replica running `replicaof` the master) |
| Storage | PVC `create: true`, **1Gi** at `/data` (`ReadWriteOnce`) |
| Tuning | `server-threads 4` (ARM-tuned for the Pi 5 cores) |
| Port | `6379` (`ClusterIP`) |
| Security | `runAsUser/Group/fsGroup: 999`, non-root |
| Timezone | `Europe/Paris` |
> [!NOTE]
> To inspect the instance, run `kubectl port-forward -n tools svc/redis 6379:6379` and connect with Redis Insight (per the [chart README](https://gitea.arcodange.lab/arcodange-org/tools/src/branch/main/redis/README.md)).
### plausible
[`plausible/`](https://gitea.arcodange.lab/arcodange-org/tools/src/branch/main/plausible): privacy-friendly web analytics. Deployed via a **Kustomize** overlay that inflates the upstream Helm chart, rather than through a `Chart.yaml` dependency like the other Tier-1 charts above.
| Key | Value |
|---|---|
| Declared via | Kustomize `helmCharts:` inflation generator |
| Chart / version | `plausible` `2.0.0` (`pascaliske/plausible`), image `ghcr.io/plausible/community-edition` |
| Replicas | `1` (Deployment) |
| Ingress | `analytics.arcodange.lab` (Traefik IngressRoute, Let's Encrypt, `localIp@file` middleware) |
| App DB | PostgreSQL via **pgbouncer** — an **init container** assembles `DATABASE_URL` from VSO dynamic creds |
| Event store | **ClickHouse** (see below) |
| GeoIP | MaxMind **GeoLite2** (`GeoLite2-Country` + `GeoLite2-City`), license key from secret `plausible-geoip` |
| Secrets | `SECRET_KEY_BASE` / `TOTP_VAULT_KEY` from existing secret `plausible-config` (VSO-fed) |
Plausible writes analytics events to ClickHouse and stores app/account state in PostgreSQL — two distinct backends, both reached through lab-internal services.
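
A Kustomize overlay that inflates a Helm chart, as used for Plausible, typically looks like the sketch below. The `helmCharts` field names are Kustomize's own; the repository URL and the `values.yaml` file name are assumptions, not copied from the repo.

```yaml
# Sketch only: kustomization.yaml inflating the upstream Plausible chart.
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
namespace: tools
helmCharts:
  - name: plausible
    repo: https://charts.pascaliske.dev   # assumed chart repository URL
    version: 2.0.0
    releaseName: plausible
    namespace: tools
    valuesFile: values.yaml               # hypothetical values file name
```

Helm inflation is off by default in Kustomize, so a local render needs `kustomize build --enable-helm`, and ArgoCD needs the equivalent build option enabled.
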
### clickhouse
[`clickhouse/`](https://gitea.arcodange.lab/arcodange-org/tools/src/branch/main/clickhouse) — the OLAP column store behind Plausible. Also a **Kustomize** overlay inflating the upstream chart, plus a `databases` sub-chart that runs an init job.
| Key | Value |
|---|---|
| Declared via | Kustomize `helmCharts:` inflation generator (`chartHome: charts`) |
| Chart / version | `clickhouse` `0.4.0` (`pascaliske/clickhouse`), image `clickhouse/clickhouse-server` |
| Workload | **StatefulSet**, `replicas: 1` |
| Storage | PVC **16Gi** at `/var/lib/clickhouse` (`ReadWriteOnce`) |
| Ports | `8123` (HTTP), `9000` (native protocol) |
| Custom user | `arcodange` (full network access, `access_management: 1`) via `custom-users.xml` |
| Security | `runAsUser/Group/fsGroup: 101`, non-root |
| Timezone | `Europe/Paris` |
> [!WARNING]
> The ClickHouse `arcodange` user password is **static and committed** in [`clickhouse/clickhouseValues.yaml`](https://gitea.arcodange.lab/arcodange-org/tools/src/branch/main/clickhouse/clickhouseValues.yaml) (`custom-users.xml`). The same value appears in Grafana's provisioned datasource — keep the two in sync if you rotate it.
> [!CAUTION]
> ClickHouse carries a `nodeAffinity` that **excludes `pi2`** (`kubernetes.io/hostname NotIn [pi2]`). `pi2` hosts PostgreSQL and pgbouncer; ClickHouse is deliberately kept off it to avoid I/O contention on that node. A cluster where `pi2` is the only schedulable node will leave ClickHouse `Pending`.
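
The exclusion described above maps onto a standard Kubernetes affinity stanza. The sketch assumes the rule is a hard requirement (`requiredDuringSchedulingIgnoredDuringExecution`), which is what the `Pending` behaviour implies; where it is nested in the chart values is not shown.

```yaml
# Sketch only: pod-spec affinity keeping ClickHouse off the pi2 node.
affinity:
  nodeAffinity:
    # Hard requirement: with no other schedulable node, the pod stays Pending.
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
        - matchExpressions:
            - key: kubernetes.io/hostname
              operator: NotIn
              values:
                - pi2
```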
---
## Tier 2 — supporting & inactive
| Component | Status | Notes |
|---|---|---|
| [`pgcat/`](https://gitea.arcodange.lab/arcodange-org/tools/src/branch/main/pgcat) | ❌ disabled | Alternative Postgres pooler (`pgcat` chart `0.1.0`). Not in service — its sole pool has empty `username`/`password`/`database` placeholders, and it is **not** keyed under `tools:` in [`chart/values.yaml`](https://gitea.arcodange.lab/arcodange-org/tools/src/branch/main/chart/values.yaml), so ArgoCD renders no Application for it. [pgbouncer](#pgbouncer) is the active pooler. |
| [`tool/`](https://gitea.arcodange.lab/arcodange-org/tools/src/branch/main/tool) | ✅ active (library) | Helm **library chart** (`type: library`, version `0.1.0`) consumed by **every** component chart via `dependencies:`. Ships shared templates/helpers; **not deployable** on its own. |
---
## Gotchas
> [!WARNING]
> **No high availability.** Every Tier-1 service runs a **single replica** — Vault (`standalone`), Prometheus (`replicaCount: 1`), Grafana (`replicas: 1`), ClickHouse and Redis/KeyDB StatefulSets (`replicas: 1`), Plausible and the CrowdSec LAPI (single Deployment). Any node drain or pod restart is a brief outage for that service, not a failover.
> [!WARNING]
> **Static, committed passwords.** Grafana admin (+ its ClickHouse datasource), the ClickHouse `arcodange` user, and the pgbouncer admin/auth users all carry plaintext credentials in their `values.yaml`. They are lab-only; rotate before any exposure and never copy them to a real environment.
> [!CAUTION]
> **ClickHouse must avoid `pi2`.** The `NotIn [pi2]` `nodeAffinity` keeps it off the PostgreSQL/pgbouncer host. If `pi2` is the only schedulable node, ClickHouse (and therefore Plausible analytics) stays `Pending`. See the [storage & recovery concept](../lab-ecosystem/storage-and-recovery.md) for how PVC-backed services map onto specific nodes.
> [!CAUTION]
> **Vault is single-instance and starts sealed.** After **any** restart (pod reschedule, node reboot, chart upgrade) Vault comes up **sealed** with no automatic unseal configured — every VSO injection and dynamic-secret lease blocks until an operator unseals it. This is the first thing to check when secrets stop flowing across the cluster; the unseal procedure lives in [Secrets & VSO](secrets-and-vso.md).