Commit Graph

95 Commits

Author SHA1 Message Date
6ede249da9 🔒 fix(ansible): gate vault auth disable behind vault_oidc_force_reset (default off) (#5)
Co-authored-by: Gabriel Radureau <arcodange@gmail.com>
Co-committed-by: Gabriel Radureau <arcodange@gmail.com>
2026-05-06 15:03:33 +02:00
9e821e1626 ♻️ refactor(ansible): move gitea secret user-propagation list to inventory (#4)
Co-authored-by: Gabriel Radureau <arcodange@gmail.com>
Co-committed-by: Gabriel Radureau <arcodange@gmail.com>
2026-05-06 14:48:05 +02:00
69b7e9ddcb Merge remote-tracking branch 'origin/main' 2026-05-06 14:38:01 +02:00
069edd72f1 chore(cicd): drop temporary commented-out tasks from 03_cicd.yml
Removes the commented PACKAGES_TOKEN/HOMELAB_CA_CERT blocks and the legacy
"Deploy Argo CD" play that were left behind during the migration to
Helm-based ArgoCD.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-06 14:37:48 +02:00
a644436746 🔒 fix(ansible): propagate vault_oauth__sh_b64 to user-owned namespaces (arcodange) (#3)
Co-authored-by: Gabriel Radureau <arcodange@gmail.com>
Co-committed-by: Gabriel Radureau <arcodange@gmail.com>
2026-05-06 14:18:06 +02:00
a3526e51f8 Merge remote-tracking branch 'origin/main' 2026-05-06 12:58:01 +02:00
01f0f37691 chore(ansible): add per-collection ansible.cfg + drop trailing whitespace
ansible/arcodange/factory/ansible.cfg sets collections_path so ansible
commands run from inside the collection directory still find user-installed
collections under ~/.ansible/collections.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-06 12:55:54 +02:00
f114d7e6f0 feat(argocd): allow per-app syncPolicy override in values.yaml
The apps template hardcoded automated{prune,selfHeal} for every app. Some
apps (e.g. tools, where Vault unseal is manual) need a custom syncPolicy
without selfHeal. Read $app_attr.syncPolicy when set, fall back to the
existing automated default otherwise. Use the override on `tools` to keep
the existing behavior explicit.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-06 12:55:49 +02:00
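
The fallback described in f114d7e6f0 can be sketched in Python. This is only an illustration of the "use the override when set, otherwise keep the automated default" logic, not the Helm template itself. The function name `resolve_sync_policy` and the shape of the example override are assumptions; only the `syncPolicy` key and the `automated{prune,selfHeal}` default come from the commit message.

```python
from copy import deepcopy

# Default named in the commit message: automated sync with prune and selfHeal.
DEFAULT_SYNC_POLICY = {"automated": {"prune": True, "selfHeal": True}}


def resolve_sync_policy(app_attr: dict) -> dict:
    """Return the per-app syncPolicy when set, else the automated default.

    Hypothetical helper: mirrors a template-style truthiness check, so an
    empty or missing syncPolicy falls back to the default.
    """
    override = app_attr.get("syncPolicy")
    if override:
        return deepcopy(override)
    return deepcopy(DEFAULT_SYNC_POLICY)


if __name__ == "__main__":
    # App without an override keeps the existing behavior.
    print(resolve_sync_policy({}))
    # Hypothetical override for an app that must not self-heal.
    print(resolve_sync_policy({"syncPolicy": {"automated": {"prune": True}}}))
```

Returning a copy rather than the shared default keeps one app's later edits from leaking into another app's policy.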
1688fe0dfd fix(crowdsec): clean up Failed pods before Traefik middleware reload
Re-running the role would leave behind crowdsec pods stuck in Failed phase
(typically after a config error on a previous run), which then blocked the
Traefik middleware refresh. Delete them up front so the next reconcile
schedules fresh pods.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-06 12:55:39 +02:00
499410a160 feat(cicd): persist gitea act-runner cache + isolate on dedicated docker network
Pins the actcache server to a fixed port (43707) and exposes it, then
mounts /mnt/arcodange/gitea-runner-cache and /mnt/arcodange/gitea-runner-act
into the runner so the actions/cache and act image layer cache survive
container restarts. Moves the runner onto a dedicated `gitea_action_network`
so CI job containers can reach the cache server by name without sharing the
host network.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-06 12:55:34 +02:00
e3e0decd98 docs(adr): extend network-architecture ADR with .lab SSL/TLS deep dive
Replaces the placeholder "Success Metrics" section with a detailed
walkthrough of the internal PKI: Step CA provisioners, cert-manager +
StepClusterIssuer wiring, certificate issuance/renewal sequence diagram,
device-trust installation steps, and troubleshooting playbook for the
common stuck-CertificateRequest / Traefik TLS / device-trust failures.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-06 12:55:27 +02:00
1ae28cb944 docs(longhorn): document 2026-04-13 power-cut recovery + add data-recovery tooling
Captures the post-mortem of the April 13 power-cut: incident timeline,
retrospective, and architecture/role diagrams. Adds an ADR explaining why
Longhorn cannot re-associate orphaned replica directories after a nuclear
reinstall (engine-id naming), plus block-device recovery runbooks and the
`playbooks/recover/longhorn_data.yml` automation that wires `merge-longhorn-layers.py`
to rebuild PVCs from raw `volume-head-*.img` chains.

Also extends the k3s_pvc backup to capture Longhorn `volumes`/`settings` CRDs
(needed for the fast-path restore) and rewrites the restore script with a
fallback dir + English messages.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-06 12:55:18 +02:00
934b62d922 chore(ansible): use project-local uv venv for ansible runtime deps
Moves the local ansible runtime from a global `uv tool install ansible-core`
(which required remembering `--with kubernetes --with jmespath --with dnspython`)
to a project-managed venv described by `pyproject.toml` + `uv.lock`. Fixes the
"Failed to import the required Python library (kubernetes)" error on localhost.

The localhost inventory entry now derives `ansible_python_interpreter` from
`{{ ansible_playbook_python }}`, so `uv run ansible-playbook` is enough — no
more hardcoded user-specific paths.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-06 12:35:28 +02:00
09a270d179 🤖 ci(postgres): declare dance-lessons-coach DB + role + pgbouncer lookup (#2)
Co-authored-by: Gabriel Radureau <arcodange@gmail.com>
Co-committed-by: Gabriel Radureau <arcodange@gmail.com>
2026-05-06 08:18:30 +02:00
0ce004cc6a 🤖 ci(argocd): enroll dance-lessons-coach + per-app org override in apps template (#1)
Co-authored-by: Gabriel Radureau <arcodange@gmail.com>
Co-committed-by: Gabriel Radureau <arcodange@gmail.com>
2026-05-06 08:01:50 +02:00
e6fc24c101 fix(dns): harden DNS resilience after power-cut incident
During the 2026-04-13 power cut recovery, DNS resolution failures blocked
Longhorn reinstall. Root causes:
- CoreDNS forwarded to a single hardcoded Pi-hole IP instead of both HA instances
- CoreDNS main Corefile forwarded to /etc/resolv.conf which pointed to itself on pi3
- Pi-hole lacked explicit upstream DNS, relying on DHCP-provided config
- dnsmasq system service conflicted with pihole-FTL on port 53

Changes:
- k3s_dns: forward CoreDNS to both Pi-hole HA instances (pi1 + pi3) dynamically
- k3s_dns: update main CoreDNS Corefile to forward to Pi-holes instead of resolv.conf
- pihole defaults: add explicit upstream DNS servers (8.8.8.8, 1.1.1.1, 8.8.4.4)
- pihole ha_setup: write /etc/dnsmasq.d/99-upstream.conf with explicit upstreams
- rpi: add dnsmasq user to dip group and disable conflicting dnsmasq service on Pi-hole nodes

See docs/adr/20260414-internal-dns-architecture.md for full rationale.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-14 10:54:42 +02:00
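
The first change listed in e6fc24c101 (forwarding CoreDNS to both Pi-hole instances instead of one hardcoded IP) amounts to rendering a single `forward` directive from a list of upstreams. The sketch below shows that idea in Python. The helper name `render_forward` and the addresses (taken from the 192.0.2.0/24 documentation range) are assumptions; the real role derives the pi1 and pi3 addresses dynamically and is not reproduced here.

```python
def render_forward(upstreams: list[str], zone: str = ".") -> str:
    """Build a CoreDNS `forward` directive listing every upstream.

    Hypothetical helper: refuses an empty list so that CoreDNS is never
    rendered without an upstream resolver.
    """
    if not upstreams:
        raise ValueError("at least one upstream DNS server is required")
    return f"forward {zone} " + " ".join(upstreams)


if __name__ == "__main__":
    # Placeholder addresses standing in for the two Pi-hole HA instances.
    pihole_ips = ["192.0.2.1", "192.0.2.3"]
    print(render_forward(pihole_ips))
```

Listing both instances in one directive lets CoreDNS keep resolving when either Pi-hole is down, which is the failure mode the commit describes.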
355ab11c4d fix(system_docker): fix daemon.json corruption on re-run
Two bugs caused daemon.json to be overwritten with invalid content:
- Invalid `when` condition using unsupported Ansible inline stat syntax,
  causing the existing file read to be silently skipped and docker_config
  to always reset to {}
- Folded scalar `>` in set_fact converted the dict to a Python string
  representation, which to_nice_json serialized as a JSON string instead
  of an object

Fixes identified during 2026-04-13 power cut incident post-mortem.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-14 10:52:27 +02:00
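
The second bug in 355ab11c4d (a dict turned into its string representation and then serialized as a JSON string) is easy to reproduce outside Ansible. The following is a Python analogy, not the role's code: `str()` stands in for what the folded scalar did to the fact, and `json.dumps` stands in for `to_nice_json`. The config values are hypothetical.

```python
import json


def serialize_broken(config: dict) -> str:
    """Stringify the dict first, as the folded scalar effectively did."""
    as_text = str(config)  # "{'data-root': ...}" is text, not a mapping
    return json.dumps(as_text, indent=4)


def serialize_fixed(config: dict) -> str:
    """Serialize the dict directly so the output is a JSON object."""
    return json.dumps(config, indent=4)


if __name__ == "__main__":
    # Hypothetical daemon.json content used only for illustration.
    docker_config = {"data-root": "/mnt/docker", "log-driver": "json-file"}
    print(serialize_broken(docker_config))  # one quoted JSON string
    print(serialize_fixed(docker_config))   # a JSON object
```

The broken output is still valid JSON, but its top-level value is a string rather than an object, so it is not a usable `daemon.json`.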
ad70b424cf Add sequence diagram to Docker storage ADR
This commit adds a detailed sequence diagram to the Docker storage optimization ADR, illustrating the workflow for configuring Docker storage, pinning images, and maintaining Longhorn performance.

Generated by Mistral Vibe.
Co-Authored-By: Mistral Vibe <vibe@mistral.ai>
2026-04-08 11:33:03 +02:00
b299469d00 Consolidate ADRs into docs/adr/
This commit moves Architecture Decision Records (ADRs) from ../../../docs/adr/ to docs/adr/ in the arcodange/factory repository. This centralizes all ADRs in one location for better maintainability and discoverability.

Generated by Mistral Vibe.
Co-Authored-By: Mistral Vibe <vibe@mistral.ai>
2026-04-08 11:09:34 +02:00
fc9164f11e Update README with detailed playbook execution sequence
This commit updates the README to include a detailed timeline of the playbook execution sequence, organized into sections for system setup, application setup, CI/CD, tools, and backups.

Generated by Mistral Vibe.
Co-Authored-By: Mistral Vibe <vibe@mistral.ai>
2026-04-08 11:04:11 +02:00
c751b621ba Enable PostgreSQL backup in backup playbook
This commit uncomments the PostgreSQL backup section in the backup playbook to enable regular backups of the PostgreSQL database.

Generated by Mistral Vibe.
Co-Authored-By: Mistral Vibe <vibe@mistral.ai>
2026-04-08 11:04:07 +02:00
07a619b274 Fix step-issuer ARM64 compatibility on pi3
The default kube-rbac-proxy image (gcr.io/kubebuilder/kube-rbac-proxy:v0.15.0) is AMD64-only and fails on pi3 (ARM64). This commit overrides the image to use quay.io/brancz/kube-rbac-proxy:v0.15.0, which supports ARM64.

Note: pi2 (ARMv7) may work with AMD64 images, but pi3 (ARM64) requires an ARM64-compatible image.

Generated by Mistral Vibe.
Co-Authored-By: Mistral Vibe <vibe@mistral.ai>
2026-04-08 11:04:03 +02:00
9931f81998 Update Docker storage configuration and revoke token task 2026-04-07 19:19:03 +02:00
437fd506ed Fix Vault Gitea OIDC setup: remove trailing slash from bound_issuer and pass CA certificate 2026-04-07 19:17:47 +02:00
943915be74 gitea act runner: reuse docker images 2026-04-07 09:20:30 +02:00
8a82d14797 upgrade gitea version to 1.25.5 2026-04-06 10:55:20 +02:00
0285d171ff tweak backup and setup cronjob to fix pg table ownership 2026-03-15 22:14:12 +01:00
55d137132f backup k3s volumes 2026-01-23 18:26:28 +01:00
451dfa5133 restart traefik when editing crowdsec middleware 2026-01-03 20:08:00 +01:00
17e99db641 runner image and setup for gitea workflow with self signed cert 2026-01-03 12:44:27 +01:00
07e5ff460b use self signed cert 2026-01-02 18:17:53 +01:00
5b3c896a25 use self signed cert for internal domain arcodange.lab 2025-12-31 17:38:04 +01:00
91219c49f1 use exposed webapp.arcodange.fr instead in gitea cicd 2025-12-23 14:23:12 +01:00
74b8676244 auto upgrade webapp image 2025-12-23 14:20:56 +01:00
1fd47e9d97 install pihole to fix failing duckdns name servers 2025-12-23 14:20:04 +01:00
0fbfbd589f tool plausible CE analytics database 2025-12-11 07:25:04 +01:00
8d6be311ae argocd: add --enable-helm to kustomize ; enable shell from web ui 2025-12-10 13:48:22 +01:00
2b4aa30a64 use cache redis with crowdsec traefik bouncer 2025-12-06 15:09:36 +01:00
cd3c4d86ff install socat package to enable kubectl port-forward 2025-12-06 15:09:12 +01:00
45d39d13b4 postgres db for crowdsec 2025-12-03 16:45:43 +01:00
f4cb04c9c9 configure crowdsec captcha with cloudflare turnstile 2025-12-03 16:45:25 +01:00
17a0f23bbb declare gitea external service 2025-12-01 16:22:44 +01:00
f7bfe2f71d get cloudflared client real ip and fix crowdsec mw 2025-11-29 17:24:51 +01:00
72628f0f0e add crowdsec plugin and middleware for traefik 2025-11-26 14:20:09 +01:00
b6d240ce31 configure ovh client and allow cms project to access zoho client 2025-11-07 13:54:52 +01:00
2d8f5de482 add s3 endpoint to cf r2 secret 2025-10-30 10:27:48 +01:00
140dab4f1d cloudflare management for cms 2025-10-30 10:17:14 +01:00
9b09e6bd86 fixes and set preferred_ip since new interface eth0 2025-10-09 17:27:42 +02:00
83410d9eb1 set cms application argo image updater strategy 2025-10-09 16:12:31 +02:00
fa5bc7e30e deploy argocd image updater 2025-10-09 15:01:05 +02:00