docs(vibe): bootstrap vibe/ knowledge tree + ecosystem AGENTS.md

Add a root AGENTS.md (ecosystem map of factory/tools/cms + agent operating
rules + the persona cohort & workflow) and a new vibe/ knowledge base for LLM
agents, modeled on tree-docs conventions and the factory house style.

vibe/ folders (each with a README hub + contribution rules):
- ADR/      optimized MADR-lite; canonical home going forward (doc/adr stays historical)
- PRD/      one subfolder per PRD, mandatory STATUS.md, QA strategy for big ones
- investigations/  single INV-NNN-slug.md, or stub + folder w/ notebooks
- guidebooks/      tree-docs maps; lab-ecosystem guidebook of factory+tools+cms
- runbooks/        [AGENT]/[HUMAN] step procedures (EN; doc/runbooks stays FR)
- shareouts/       dated FR handouts (decks/mp4)

Seed content (first ADR + PRD): a safe, production-like environment to rehearse
risky changes and recovery without touching real prod — local-only sandbox
(k3d + arm64 VMs) with a hard prod/sandbox isolation boundary. Includes
INV-001 (prod blast-radius couplings), the ecosystem guidebook, and a FR shareout.

Conventions enforced: no-tombstone rule, breadcrumb spine, bidirectional
cross-links, theme:base mermaid (MCP-validated) + ordered-list-after-diagram.
Built with a Workflow + persona cohort; 24 files, zero dead links.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-23 11:52:37 +02:00
parent 827af6b392
commit 7647a68cdc
25 changed files with 1878 additions and 0 deletions

vibe/runbooks/README.md

@@ -0,0 +1,60 @@
[vibe](../README.md) > **Runbooks**
# Runbooks
> **Status:** Active (conventions + template only — first concrete runbook lands with PRD Phase 1)
> **Last Updated:** 2026-06-23
> **Related:** [vibe guidebooks](../guidebooks/README.md) · [vibe shareouts](../shareouts/README.md) · [French human-operator runbooks under doc/runbooks](../../doc/runbooks/README.md)
## What lives here
`vibe/runbooks/` holds **agent-oriented operational runbooks, written in English** (this tree is for LLM agents). Each runbook is an ordered procedure where every step is tagged with an actor marker:
- **`[AGENT]`** — read-only or otherwise safe steps an agent may execute autonomously (inspecting state, dry-runs, generating files, running tests in a sandbox).
- **`[HUMAN]`** — production-mutating steps that require **explicit human approval** before they run (anything that writes to live infrastructure, deletes data, or changes the trunk).
The marker is load-bearing: it tells an agent reading the runbook exactly where its autonomy ends and where it must stop and hand control back to a human.
```mermaid
%%{init: {'theme': 'base'}}%%
flowchart LR
classDef agent fill:#059669,stroke:#047857,color:#fff
classDef human fill:#dc2626,stroke:#b91c1c,color:#fff
classDef gate fill:#7c3aed,stroke:#6d28d9,color:#fff
A["[AGENT] safe steps<br>(inspect, dry-run, generate)"]:::agent --> G{"approval<br>gate"}:::gate --> H["[HUMAN] prod-mutating steps<br>(explicit approval required)"]:::human
```
1. An agent executes the `[AGENT]`-tagged steps on its own — these only read state or act inside a sandbox.
2. When the procedure reaches a prod-mutating step, the agent stops at an approval gate.
3. A human reviews and approves; only then do the `[HUMAN]`-tagged steps run against live infrastructure.
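
The three steps above amount to splitting a runbook at its first `[HUMAN]` step. A minimal sketch of that split, assuming steps are written as numbered list items tagged `[AGENT]` or `[HUMAN]` (optionally bold, as in `_template.md`). The function name `agent_steps_before_gate` is hypothetical and not part of this repository:

```bash
#!/usr/bin/env bash
# Hypothetical helper (an assumption, not an existing tool in this repo):
# print the steps an agent may run on its own, stopping at the approval gate,
# i.e. the first numbered step tagged [HUMAN].
agent_steps_before_gate() {
  local runbook="$1" line
  # Matches "1. **[AGENT]** ..." as well as "1. [AGENT] ..."
  local re='^[0-9]+\. (\*\*)?\[(AGENT|HUMAN)\]'
  while IFS= read -r line || [[ -n "$line" ]]; do
    [[ "$line" =~ $re ]] || continue              # not a tagged step
    [[ "${BASH_REMATCH[2]}" == HUMAN ]] && break  # approval gate: stop here
    printf '%s\n' "$line"
  done < "$runbook"
}
```

Any `[AGENT]` step that comes after a `[HUMAN]` step is deliberately not printed: once the gate is reached, control goes back to a human.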
## Not the same as `doc/runbooks`
> [!IMPORTANT]
> There are **two** runbook collections in this lab, and they serve different readers — do not merge them.
>
> | Collection | Reader | Language | Step markers |
> |---|---|---|---|
> | **`vibe/runbooks/`** (this folder) | LLM agents | English | `[AGENT]` / `[HUMAN]` |
> | **[`doc/runbooks/`](../../doc/runbooks/README.md)** | Human operators | French | prose procedures |
>
> The canonical, human-facing operator procedures (e.g. [Nouvelle application web](../../doc/runbooks/new-web-app/README.md)) live in French under `doc/runbooks/`. This folder is the agent-facing mirror: same operational reality, written so an autonomous agent can execute the safe parts and gate the dangerous ones.
## Index
| Runbook | Summary | Status |
|---|---|---|
| [_template](_template.md) | Skeleton for new agent-oriented runbooks (`[AGENT]`/`[HUMAN]` markers, copy-paste commands, verification + rollback) | ✅ Active |
> [!NOTE]
> The first **concrete** runbook — a local sandbox game-day for the safe prod-like environment — ships with **PRD Phase 1** ([safe-prod-like-environment PRD](../PRD/safe-prod-like-environment/README.md)). Until then this folder holds the conventions and the template only.
## Rules to contribute
1. **Start from [`_template.md`](_template.md).** Copy it, rename to `kebab-case.md`, fill every section, then add a row to the index table above.
2. **Tag every procedure step** `[AGENT]` or `[HUMAN]`. When in doubt, tag it `[HUMAN]` — over-gating is safe, under-gating is not.
3. **Use the `tree-docs` skill** and keep the breadcrumb spine: first line is the breadcrumb trail, ancestors as relative links, current page bold-unlinked, separator ` > `.
4. **README hub stays current** — every new runbook gets an index row here with a one-line summary and status.
5. **Bidirectional links.** If a runbook references a guidebook, ADR, or the French operator runbook, link back from there too. Use descriptive link text.
6. **Commands are copy-paste ready.** Put them in fenced `bash` code blocks, with the `[AGENT]`/`[HUMAN]` marker on the step that owns them.
7. **Status legend.** ✅ done · 🟡 beta · 🔴 critical · ⚠️ known issue · ❌ disabled · ⬜ not started.
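
Rules 2 and 3 are mechanical enough to check before review. The sketch below is one possible lint, not an existing tool in this repository: `lint_runbook` is a hypothetical name, and it assumes the breadcrumb ends in a bold current page and that steps sit under a `## Procedure` heading, as in `_template.md`. It does not skip fenced code blocks, so a numbered line inside a fence would also be checked.

```bash
#!/usr/bin/env bash
# Hypothetical lint (an assumption, not part of this repo).
#   rule 3: the first line is a breadcrumb trail ending in a bold current page
#   rule 2: every numbered step under "## Procedure" is tagged [AGENT] or [HUMAN]
lint_runbook() {
  local file="$1" line in_procedure=0 lineno=0 status=0
  local crumb_re=' > \*\*[^*]+\*\*$'
  local step_re='^[0-9]+\. '
  local tag_re='^[0-9]+\. (\*\*)?\[(AGENT|HUMAN)\]'
  while IFS= read -r line || [[ -n "$line" ]]; do
    lineno=$((lineno + 1))
    if (( lineno == 1 )) && ! [[ "$line" =~ $crumb_re ]]; then
      echo "$file:1: first line is not a breadcrumb trail"
      status=1
    fi
    # Track whether the current line sits inside the Procedure section.
    case "$line" in
      '## Procedure') in_procedure=1 ;;
      '## '*)         in_procedure=0 ;;
    esac
    if (( in_procedure )) && [[ "$line" =~ $step_re ]] && ! [[ "$line" =~ $tag_re ]]; then
      echo "$file:$lineno: step is missing an [AGENT]/[HUMAN] tag"
      status=1
    fi
  done < "$file"
  return "$status"
}
```

The function returns non-zero when any finding is printed, so it could gate a pre-commit hook or CI job. Note that `_template.md` itself opens with an HTML comment and would need the comment removed (as its own instructions say) before the breadcrumb check passes.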

vibe/runbooks/_template.md

@@ -0,0 +1,80 @@
<!--
COPY THIS FILE to start a new agent-oriented runbook.
1. cp _template.md <kebab-case-name>.md
2. Replace the breadcrumb's bold last item with the new page title.
3. Fill every section below; delete this comment and all <…> placeholders.
4. Tag EVERY procedure step [AGENT] (read-only/safe) or [HUMAN] (prod-mutating, needs explicit approval). When in doubt, use [HUMAN].
5. Add a row for the new runbook to runbooks/README.md and wire any bidirectional links.
-->
[vibe](../README.md) > [Runbooks](README.md) > **_template**
# <Runbook title — imperative, e.g. "Run the local sandbox game-day">
> **Status:** ⬜ Not started
> **Audience:** LLM agents (English). For the human-operator equivalent see the French [doc/runbooks](../../doc/runbooks/README.md).
> **Last Updated:** 2026-06-23
## TL;DR
> [!TIP]
> <One or two sentences: what this runbook accomplishes and the single most important thing to know before starting. State up front which steps an agent may run alone and where the human approval gate sits.>
## Scope
<What this runbook covers, and explicitly what it does NOT cover. Name the systems touched (Gitea, Postgres, Vault, k3s, ArgoCD, …) and the `<app>` or environment in play.>
## Preconditions
<Bulleted, verifiable preconditions that must hold before the procedure runs. Examples:>
- [ ] Working in a worktree under `.claude/worktrees/<slug>/` (never the trunk).
- [ ] Access to <Vault role / k3s context / Gitea repo> confirmed.
- [ ] <Any backup taken / snapshot exists / CLUSTER_RECOVERY.md unseal key available>.
## Procedure
<Ordered steps. Each step is tagged [AGENT] (read-only/safe) or [HUMAN] (prod-mutating, requires explicit approval). Put copy-paste commands in fenced bash blocks owned by the step.>
1. **[AGENT]** <Inspect current state / dry-run: safe, no mutation.>
   ```bash
   # read-only example
   kubectl --context <ctx> get pods -n <app>
   ```
2. **[AGENT]** <Generate files, render manifests, run sandbox tests: safe.>
   ```bash
   # safe generation / sandbox example
   tofu -chdir=<path> plan
   ```
3. **[HUMAN]** <Prod-mutating step. STOP here for explicit approval before running.>
   ```bash
   # prod-mutating example: run only after approval
   tofu -chdir=<path> apply
   ```
4. **[HUMAN]** <Any further live mutation, each individually gated.>
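
One way a runbook author might make the stop in a `[HUMAN]` step explicit is to wrap the mutating command so that it cannot run without a typed confirmation. This is a minimal sketch under stated assumptions: `require_approval` is a hypothetical helper, not something this repository provides, and a real approval process may well live outside the shell entirely.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper for [HUMAN] steps (an assumption, not part of this repo):
# run the given command only after the exact word "approve" is read from stdin.
# Any other answer, or no answer at all, aborts without running anything.
require_approval() {
  local answer=""
  printf '[HUMAN] about to run: %s\nType "approve" to continue: ' "$*" >&2
  IFS= read -r answer || true
  if [[ "$answer" != "approve" ]]; then
    echo "not approved; nothing was run" >&2
    return 1
  fi
  "$@"
}
```

A step would then call it as `require_approval tofu -chdir=<path> apply`. Because the prompt reads from stdin, a non-interactive run with no input fails closed rather than applying.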
## Verification
<How to confirm the runbook succeeded. Prefer [AGENT]-runnable, read-only checks with expected output.>
```bash
# verification example
kubectl --context <ctx> get application <app> -n argocd -o jsonpath='{.status.sync.status}'
# expected: Synced
```
## Rollback
<How to undo each prod-mutating step if verification fails. Tag each rollback action [AGENT] or [HUMAN] just like the procedure. Reference CLUSTER_RECOVERY.md by name for full power-cut/cluster recovery (it lives outside this repo — name only, no link).>
## References
- <Related guidebook page, e.g. [Lab ecosystem](../guidebooks/lab-ecosystem/README.md)>
- <Related ADR under [vibe/ADR](../ADR/README.md), or a historical one under [doc/adr](../../doc/adr/README.md)>
- <Human-operator equivalent under [doc/runbooks](../../doc/runbooks/README.md)>