docs(vibe): add tools/ and cms/ guidebooks
Two code-grounded tree-docs guidebooks under vibe/guidebooks/, drilling into the lab-ecosystem 02-tools and 03-cms pages (bidirectional):

- tools/ : hub + components.md (Vault+VSO, Prometheus, Grafana, CrowdSec, pgbouncer, Redis/KeyDB, Plausible, ClickHouse; pgcat/tool as Tier-2) + secrets-and-vso.md (Vault engines/auth, the app_roles/app_policy modules = the <app> join-key machinery, VSO CRDs, secret-paths inventory).
- cms/ : hub + site.md (Nuxt + dual Pages/k3s deploy) + cloudflare.md (zone via OVH->CF, Pages, cloudflared tunnel, Turnstile, R2 state) + zoho-email.md (OAuth, MX/SPF/DKIM/DMARC/BIMI, the 7 aliases).

Sibling-repo code linked via full gitea URLs; vibe-internal links bidirectional. Reconciled the cloudflared tunnel token path to kvv2 cms/cloudflared (the chart VaultStaticSecret is kv-v2; the kvv1 tofu reference is a commented-out stub). 6 mermaid diagrams MCP-validated; zero dead links.

Lab Cartographer cohort.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
vibe/guidebooks/tools/components.md
@@ -0,0 +1,218 @@
[vibe](../../README.md) > [Guidebooks](../README.md) > [Tools](README.md) > **Components**

# Components

> **Status:** ✅ Active
> **Last Updated:** 2026-06-23
> **Upstream:** [Tools hub](README.md) · [lab-ecosystem 02 · tools](../lab-ecosystem/02-tools.md)
> **Downstream:** [Secrets & VSO](secrets-and-vso.md)
> **Related:** [storage & recovery concept](../lab-ecosystem/storage-and-recovery.md) · [secrets-and-vault concept](../lab-ecosystem/secrets-and-vault.md) · [naming conventions](../lab-ecosystem/naming-conventions.md)

This is the **per-component reference** for the `tools` platform layer: pinned chart/app versions, the values that actually matter (replicas, storage, ports, auth), and the cross-service wiring. Every component lands in the single **`tools` namespace**. For the deploy model (how one ArgoCD Application fans out into one per component) see the [Tools hub](README.md); for how Vault secrets reach the pods see [Secrets & VSO](secrets-and-vso.md).

Components split into two **tiers**:

- **Tier 1**: the load-bearing services, each with its own subsection and value tables below.
- **Tier 2**: supporting / inactive pieces, summarised in a single table.

Severity legend (GitHub alerts): `[!NOTE]` informational · `[!TIP]` good-to-know · `[!WARNING]` operational hazard · `[!CAUTION]` live risk.

---

## Tier 1 — load-bearing services

### hashicorp-vault

[`hashicorp-vault/`](https://gitea.arcodange.lab/arcodange-org/tools/src/branch/main/hashicorp-vault): the lab's secrets brain. The chart bundles **three** dependencies: the upstream `vault` server, the `vault-secrets-operator` (VSO) that injects secrets into pods, and the shared `tool` library chart.

| Key | Value |
|---|---|
| Chart deps | `vault` `0.28.1`, `vault-secrets-operator` `0.9.0`, `tool` `0.1.0` |
| Mode | `standalone` (single instance, **not** HA / raft) |
| Storage | `storage "file"` at `/vault/data` + audit storage enabled |
| Listener | TLS **off** (`tls_disable = 1`) on `[::]:8200`; TLS is terminated at the edge |
| Ingress | `vault.arcodange.lab` (Traefik `websecure`, Let's Encrypt, `localIp@file` middleware) |
| UI | enabled (`ui = true`) |
| Log level | `trace` |

**Mounts (secret engines) exposed:**

| Mount | Type | Purpose |
|---|---|---|
| `kvv1` | KV v1 | Static secrets (legacy / v1 layout) |
| `kvv2` | KV v2 | Versioned static secrets (primary store) |
| `transit` | transit | Encryption-as-a-service; backs VSO client-cache (`vso-client-cache` key) |
| `postgres` | database | Dynamic PostgreSQL credentials (connection via `pgbouncer.tools:5432`) |

**Auth methods enabled:**

| Method | Used by |
|---|---|
| `kubernetes` | In-cluster workloads (VSO, app ServiceAccounts) authenticate by SA token |
| `gitea_jwt` | Gitea Actions / OIDC-JWT pipelines authenticate from CI |

> [!NOTE]
> The full secret-engine layout, VSO `VaultAuth` / `VaultConnection` / `VaultDynamicSecret` wiring, and the `kvv2/data/...` path conventions are documented in [Secrets & VSO](secrets-and-vso.md); this page only inventories what the chart stands up.

The VSO sub-chart ships a `defaultVaultConnection` pointing at `http://hashicorp-vault.tools.svc.cluster.local:8200` and a client cache with `persistenceModel: direct-encrypted`, encrypted through the `transit` mount.

### prometheus

[`prometheus/`](https://gitea.arcodange.lab/arcodange-org/tools/src/branch/main/prometheus): metrics collection and TSDB, via the community `prometheus` chart (server plus the Alertmanager, node-exporter, kube-state-metrics and pushgateway pieces listed below).

| Key | Value |
|---|---|
| Chart deps | `prometheus` `28.13.0` (app `v3.10.0`), `tool` `0.1.0` |
| Server replicas | `1` (Deployment, `strategy: Recreate`) |
| Server storage | `persistentVolume` enabled, **8Gi** at `/data` (`ReadWriteOnce`) |
| Retention | `15d` |
| Alertmanager | enabled, persistence **2Gi** (`ReadWriteOnce`) |
| node-exporter | enabled (DaemonSet, `prometheus-node-exporter` sub-chart) |
| kube-state-metrics | enabled |
| pushgateway | enabled (`prometheus.io/probe: pushgateway`) |
| Scrape / eval interval | `1m` (scrape timeout `10s`) |
| Ingress | none (**internal only**) |

**Scrape targets** (default `scrapeConfigs`, all enabled): the Prometheus server itself, the Kubernetes API servers, nodes + kubelet cadvisor, plus **annotation-based** service-endpoint and pod discovery (`prometheus.io/scrape`, `prometheus.io/port`, `prometheus.io/path`, `prometheus.io/scheme`), with `*-slow` (5m) variants for targets that only need a coarser scrape interval.
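
To make the annotation-based discovery concrete, here is a minimal Python sketch of how the four annotations combine into a scrape URL. It is an illustration only, not the chart's relabeling rules: the function name `scrape_url` and the pod IP are hypothetical, and requiring a `prometheus.io/port` annotation is a simplification made for this sketch. The `http` and `/metrics` fallbacks are Prometheus's own defaults for scheme and metrics path.

```python
from typing import Dict, Optional


def scrape_url(pod_ip: str, annotations: Dict[str, str]) -> Optional[str]:
    """Return the URL a pod would be scraped at, or None if it has not opted in.

    Hypothetical helper: it mirrors the annotation names from the table above,
    not the actual relabel_configs shipped by the chart.
    """
    # A pod opts in explicitly; anything other than "true" is ignored.
    if annotations.get("prometheus.io/scrape") != "true":
        return None

    # Simplification for this sketch: require an explicit port annotation.
    port = annotations.get("prometheus.io/port")
    if port is None:
        return None

    # Scheme and path fall back to the Prometheus defaults.
    scheme = annotations.get("prometheus.io/scheme", "http")
    path = annotations.get("prometheus.io/path", "/metrics")
    return f"{scheme}://{pod_ip}:{port}{path}"
```

A pod annotated with only `prometheus.io/scrape: "true"` and `prometheus.io/port: "9100"` would resolve to `http://<pod-ip>:9100/metrics` under these assumptions.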

### grafana

[`grafana/`](https://gitea.arcodange.lab/arcodange-org/tools/src/branch/main/grafana): dashboards over Prometheus and ClickHouse.

| Key | Value |
|---|---|
| Chart deps | `grafana` `10.3.0` (app `latest`), `tool` `0.1.0` |
| Replicas | `1` (Deployment, `RollingUpdate`) |
| Persistence | **disabled** (ephemeral); dashboards/datasources are provisioned at boot |
| Ingress | `grafana.arcodange.lab` (Traefik `websecure`, Let's Encrypt, `localIp@file` middleware) |
| Plugin | `grafana-clickhouse-datasource` |
| Resources | requests `100m` CPU / `128Mi` memory, limits `100m` CPU / `512Mi` memory |
| Timezone | `Europe/Paris` |

**Datasources (provisioned):**

| Name | Type | Target | Default |
|---|---|---|---|
| Prometheus | `prometheus` | `http://prometheus-server.tools.svc.cluster.local` | ✅ yes |
| clickhouse | `grafana-clickhouse-datasource` | `clickhouse.tools.svc.cluster.local:9000` (native, `tlsSkipVerify`) | no |

> [!WARNING]
> The Grafana **admin password is static and committed** in [`grafana/values.yaml`](https://gitea.arcodange.lab/arcodange-org/tools/src/branch/main/grafana/values.yaml) (`adminUser: admin`). The provisioned ClickHouse datasource password is committed there too (`secureJsonData.password`). Treat these as lab-only credentials; do not reuse them outside the homelab.

### crowdsec

[`crowdsec/`](https://gitea.arcodange.lab/arcodange-org/tools/src/branch/main/crowdsec): behavioural edge security that feeds a Traefik blocklist.

| Key | Value |
|---|---|
| Chart deps | `crowdsec` `0.20.1`, `tool` `0.1.0` |
| LAPI | Deployment (`RollingUpdate`, `maxUnavailable: 0`); the local API + decision store |
| Agent | DaemonSet pinned to control-plane nodes (`node-role.kubernetes.io/control-plane`) |
| Log source | parses **Traefik** pod logs in `kube-system` (`podName: traefik-*`, `program: traefik`) |
| Collections | `crowdsecurity/traefik`, `crowdsecurity/http-cve` (+ AppSec rules below) |
| AppSec (WAF) | **enabled**: `crowdsecurity/appsec-default` on `0.0.0.0:7422`; collections `appsec-virtual-patching` + `appsec-generic-rules` |
| Database | external PostgreSQL `crowdsec` via **pgbouncer** (`host: pgbouncer.tools:5432`, `type: postgresql`) |
| DB credentials | dynamic, from secret `crowdsec-db-credentials` (`DB_USER` / `DB_PASSWORD`, sourced via VSO) |
| Console | enrolled as instance `homelab` |

The decisions CrowdSec produces are surfaced as a **Traefik middleware blocklist applied at the edge**, so malicious IPs are dropped before they reach app namespaces. `server_reset_query: DEALLOCATE ALL` on pgbouncer (below) exists specifically so that prepared statements left behind by CrowdSec are cleared before a pooled server connection is reused. The CAPTCHA challenge CrowdSec serves on remediated requests is a **Cloudflare Turnstile widget minted by the `cms` repo**. See the [CMS Cloudflare page](../cms/cloudflare.md), which produces the sitekey/secret this bouncer consumes from Vault.

### pgbouncer

[`pgbouncer/`](https://gitea.arcodange.lab/arcodange-org/tools/src/branch/main/pgbouncer): the connection pooler in front of the **external** PostgreSQL.

| Key | Value |
|---|---|
| Chart deps | `pgbouncer` `2.3.1` (`icoretech/pgbouncer`), `tool` `0.1.0` |
| Scheduling | `nodeSelector: kubernetes.io/hostname: pi2` (co-located with PostgreSQL) |
| Upstream DB | external PostgreSQL at `192.168.1.202:5432` (the `pi2` host), wildcard database `"*"` |
| Auth type | `scram-sha-256` |
| `auth_query` | `SELECT uname, phash FROM user_lookup($1)` |
| `server_reset_query` | `DEALLOCATE ALL` (clears prepared statements; fixes CrowdSec re-use) |
| `server_idle_timeout` | `7200` (2h) |
| `ignore_startup_parameters` | `extra_float_digits` (unsupported JDBC arg) |
| Exporter | disabled |
| Service | `pgbouncer.tools:5432` (cluster-internal) |

> [!NOTE]
> pgbouncer is the single front door to the lab's PostgreSQL: CrowdSec, Plausible, and Vault's `postgres` dynamic-secret backend all connect through `pgbouncer.tools:5432`, never to `192.168.1.202` directly.

### redis (KeyDB)

[`redis/`](https://gitea.arcodange.lab/arcodange-org/tools/src/branch/main/redis): the in-memory cache / session store. The chart targets **KeyDB** (EqAlpha, Redis-compatible), tuned for the 2× Raspberry Pi 5 nodes.

| Key | Value |
|---|---|
| Chart deps | `redis` `2.1.0` (`pascaliske/redis`), `tool` `0.1.0` |
| Workload | **StatefulSet** (master at index 0, replica running `replicaof` the master) |
| Storage | PVC `create: true`, **1Gi** at `/data` (`ReadWriteOnce`) |
| Tuning | `server-threads 4` (ARM-tuned for the Pi 5 cores) |
| Port | `6379` (`ClusterIP`) |
| Security | `runAsUser/Group/fsGroup: 999`, non-root |
| Timezone | `Europe/Paris` |

> [!NOTE]
> To inspect the instance, run `kubectl port-forward -n tools svc/redis 6379:6379` and connect with Redis Insight (per the [chart README](https://gitea.arcodange.lab/arcodange-org/tools/src/branch/main/redis/README.md)).

### plausible

[`plausible/`](https://gitea.arcodange.lab/arcodange-org/tools/src/branch/main/plausible): privacy-friendly web analytics. Deployed via a **Kustomize** overlay that inflates the upstream Helm chart (not a `Chart.yaml` dependency like the other Tier-1 charts above).

| Key | Value |
|---|---|
| Declared via | Kustomize `helmCharts:` inflation generator |
| Chart / version | `plausible` `2.0.0` (`pascaliske/plausible`), image `ghcr.io/plausible/community-edition` |
| Replicas | `1` (Deployment) |
| Ingress | `analytics.arcodange.lab` (Traefik IngressRoute, Let's Encrypt, `localIp@file` middleware) |
| App DB | PostgreSQL via **pgbouncer**; an **init container** assembles `DATABASE_URL` from VSO dynamic creds |
| Event store | **ClickHouse** (see below) |
| GeoIP | MaxMind **GeoLite2** (`GeoLite2-Country` + `GeoLite2-City`), license key from secret `plausible-geoip` |
| Secrets | `SECRET_KEY_BASE` / `TOTP_VAULT_KEY` from existing secret `plausible-config` (VSO-fed) |

Plausible writes analytics events to ClickHouse and stores app/account state in PostgreSQL: two distinct backends, both reached through lab-internal services.
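
The `DATABASE_URL` assembly done by the init container can be sketched as follows. This is a minimal illustration in Python, not the init container's actual script: the function name and the database name `plausible` are assumptions, and only the `pgbouncer.tools:5432` endpoint comes from this page. The point it shows is that dynamically generated credentials should be percent-encoded before being placed in a URL, since a generated password may contain `@`, `/` or `:`.

```python
from urllib.parse import quote


def build_database_url(
    user: str,
    password: str,
    database: str,
    host: str = "pgbouncer.tools",  # pooler service named on this page
    port: int = 5432,
) -> str:
    """Assemble a PostgreSQL connection URL from short-lived credentials.

    Hypothetical helper: the real init container may build the URL differently.
    """
    # safe="" forces every reserved character (including "/") to be encoded,
    # so a generated password cannot break the URL structure.
    enc_user = quote(user, safe="")
    enc_password = quote(password, safe="")
    enc_database = quote(database, safe="")
    return f"postgres://{enc_user}:{enc_password}@{host}:{port}/{enc_database}"
```

Because the credentials are dynamic, the URL has to be rebuilt whenever they are reissued, which is presumably why it is assembled at pod start rather than stored as a static secret.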

### clickhouse

[`clickhouse/`](https://gitea.arcodange.lab/arcodange-org/tools/src/branch/main/clickhouse): the OLAP column store behind Plausible. Also a **Kustomize** overlay inflating the upstream chart, plus a `databases` sub-chart that runs an init job.

| Key | Value |
|---|---|
| Declared via | Kustomize `helmCharts:` inflation generator (`chartHome: charts`) |
| Chart / version | `clickhouse` `0.4.0` (`pascaliske/clickhouse`), image `clickhouse/clickhouse-server` |
| Workload | **StatefulSet**, `replicas: 1` |
| Storage | PVC **16Gi** at `/var/lib/clickhouse` (`ReadWriteOnce`) |
| Ports | `8123` (HTTP), `9000` (native protocol) |
| Custom user | `arcodange` (full network access, `access_management: 1`) via `custom-users.xml` |
| Security | `runAsUser/Group/fsGroup: 101`, non-root |
| Timezone | `Europe/Paris` |

> [!WARNING]
> The ClickHouse `arcodange` user password is **static and committed** in [`clickhouse/clickhouseValues.yaml`](https://gitea.arcodange.lab/arcodange-org/tools/src/branch/main/clickhouse/clickhouseValues.yaml) (`custom-users.xml`). The same value appears in Grafana's provisioned datasource, so keep the two in sync if you rotate it.

> [!CAUTION]
> ClickHouse carries a `nodeAffinity` that **excludes `pi2`** (`kubernetes.io/hostname NotIn [pi2]`). `pi2` hosts PostgreSQL and pgbouncer; ClickHouse is deliberately kept off it to avoid I/O contention on that node. A cluster where `pi2` is the only schedulable node will leave ClickHouse `Pending`.
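
The effect of the `NotIn [pi2]` rule can be illustrated with a small Python model of how a node-affinity match expression is evaluated. This is a simplified sketch of the `In` / `NotIn` semantics only, not the scheduler's implementation; the function names and the node name `pi1` are hypothetical.

```python
from typing import Dict, List


def matches_expression(
    node_labels: Dict[str, str], key: str, operator: str, values: List[str]
) -> bool:
    """Evaluate one match expression against a node's labels (In / NotIn only)."""
    label = node_labels.get(key)
    if operator == "In":
        return label is not None and label in values
    if operator == "NotIn":
        # A node that lacks the label entirely also satisfies NotIn.
        return label not in values
    raise ValueError(f"operator not modelled in this sketch: {operator}")


def eligible_nodes(nodes: Dict[str, Dict[str, str]]) -> List[str]:
    """Return the nodes ClickHouse could land on under the rule described above."""
    return sorted(
        name
        for name, labels in nodes.items()
        if matches_expression(labels, "kubernetes.io/hostname", "NotIn", ["pi2"])
    )
```

With only `pi2` available, `eligible_nodes` returns an empty list, which corresponds to the `Pending` state described in the caution above.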

---

## Tier 2 — supporting & inactive

| Component | Status | Notes |
|---|---|---|
| [`pgcat/`](https://gitea.arcodange.lab/arcodange-org/tools/src/branch/main/pgcat) | ❌ disabled | Alternative Postgres pooler (`pgcat` chart `0.1.0`). Not in service: its sole pool has empty `username`/`password`/`database` placeholders, and it is **not** keyed under `tools:` in [`chart/values.yaml`](https://gitea.arcodange.lab/arcodange-org/tools/src/branch/main/chart/values.yaml), so ArgoCD renders no Application for it. [pgbouncer](#pgbouncer) is the active pooler. |
| [`tool/`](https://gitea.arcodange.lab/arcodange-org/tools/src/branch/main/tool) | ✅ active (library) | Helm **library chart** (`type: library`, version `0.1.0`) consumed by **every** component chart via `dependencies:`. Ships shared templates/helpers; **not deployable** on its own. |

---

## Gotchas

> [!WARNING]
> **No high availability.** Every Tier-1 service runs a **single replica**: Vault (`standalone`), Prometheus (`replicaCount: 1`), Grafana (`replicas: 1`), ClickHouse and Redis/KeyDB StatefulSets (`replicas: 1`), Plausible and the CrowdSec LAPI (single Deployment). Any node drain or pod restart is a brief outage for that service, not a failover.

> [!WARNING]
> **Static, committed passwords.** Grafana admin (+ its ClickHouse datasource), the ClickHouse `arcodange` user, and the pgbouncer admin/auth users all carry plaintext credentials in their `values.yaml`. They are lab-only; rotate before any exposure and never copy them to a real environment.

> [!CAUTION]
> **ClickHouse must avoid `pi2`.** The `NotIn [pi2]` `nodeAffinity` keeps it off the PostgreSQL/pgbouncer host. If `pi2` is the only schedulable node, ClickHouse (and therefore Plausible analytics) stays `Pending`. See the [storage & recovery concept](../lab-ecosystem/storage-and-recovery.md) for how PVC-backed services map onto specific nodes.

> [!CAUTION]
> **Vault is single-instance and starts sealed.** After **any** restart (pod reschedule, node reboot, chart upgrade) Vault comes up **sealed** with no automatic unseal configured; every VSO injection and dynamic-secret lease blocks until an operator unseals it. This is the first thing to check when secrets stop flowing across the cluster; the unseal procedure lives in [Secrets & VSO](secrets-and-vso.md).
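
As a first diagnostic when secrets stop flowing, the seal state can be read from Vault's unauthenticated `sys/seal-status` HTTP endpoint. The sketch below is a minimal Python example under stated assumptions: it must run from somewhere that can reach the in-cluster address given earlier on this page (for instance from a pod or through a port-forward), and the function names are hypothetical. It only reports state; it does not unseal anything.

```python
import json
import urllib.request

# In-cluster address taken from the VSO defaultVaultConnection described above.
VAULT_ADDR = "http://hashicorp-vault.tools.svc.cluster.local:8200"


def parse_seal_status(payload: str) -> str:
    """Classify a sys/seal-status JSON body as uninitialized, sealed or unsealed."""
    status = json.loads(payload)
    if not status.get("initialized", False):
        return "uninitialized"
    # Treat a missing "sealed" field as sealed, the safer reading.
    return "sealed" if status.get("sealed", True) else "unsealed"


def fetch_seal_status(addr: str = VAULT_ADDR, timeout: float = 5.0) -> str:
    """Query Vault over HTTP and return the classified seal state."""
    url = f"{addr}/v1/sys/seal-status"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return parse_seal_status(resp.read().decode("utf-8"))


if __name__ == "__main__":
    print(fetch_seal_status())
```

If this prints `sealed`, follow the unseal procedure in [Secrets & VSO](secrets-and-vso.md) before investigating VSO or individual workloads.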