Consolidate ADRs into docs/adr/
This commit moves Architecture Decision Records (ADRs) from ../../../docs/adr/ to docs/adr/ in the arcodange/factory repository. This centralizes all ADRs in one location for better maintainability and discoverability.

Generated by Mistral Vibe.
Co-Authored-By: Mistral Vibe <vibe@mistral.ai>
160
ansible/arcodange/factory/docs/adr/20260407-cicd-architecture.md
Normal file
@@ -0,0 +1,160 @@
# ADR 20260407: CI/CD Architecture with ArgoCD, Gitea, and Vault

## Status
Proposed

## Context
The home lab requires a secure and automated CI/CD pipeline to deploy applications to the k3s cluster. The pipeline must integrate with:
- **Gitea**: For Git repository management and CI runners.
- **ArgoCD**: For GitOps-based continuous deployment.
- **Vault**: For secrets management and OIDC authentication.
- **Gitea Act Runner**: For executing CI jobs.

## Decision
We will implement a **GitOps-driven CI/CD pipeline** with the following components:

### 1. Gitea OIDC Authentication with Vault
- Gitea is registered as an OIDC application in Vault.
- Vault issues short-lived tokens for Gitea users.
- The `gitea_oidc_auth.yml` playbook automates this setup using Playwright and OpenTofu.
- **OIDC Workflow**:
  1. The `oidc_jwt_token.sh` script (base64-encoded in `secrets.vault_oauth__sh_b64`) handles the OIDC flow.
  2. Gitea Act Runner executes the script to obtain an ID token from Gitea.
  3. The ID token is used to authenticate with Vault and retrieve secrets.
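
The OIDC workflow steps above could be sketched as a Gitea Actions job. This is a minimal sketch, not the repository's actual workflow: the trigger, the `ubuntu-latest-ca` runner label, the Vault address, the `auth/jwt` mount, the `gitea_cicd` role, the presence of the `vault` CLI in the job image, and the assumption that `oidc_jwt_token.sh` prints the ID token on stdout are all hypothetical. Only the secret name and the script name come from this ADR.

```yaml
# Hypothetical .gitea/workflows/vault.yaml (sketch only)
name: vault
on:
  push:
    branches: [main]            # assumed trigger

jobs:
  tofu:
    runs-on: ubuntu-latest-ca   # assumed runner label for the custom CA image
    env:
      VAULT_ADDR: https://vault.arcodange.lab   # assumed Vault address
    steps:
      - name: Decode the OIDC helper script
        run: |
          echo "${{ secrets.vault_oauth__sh_b64 }}" | base64 -d > oidc_jwt_token.sh
          chmod +x oidc_jwt_token.sh

      - name: Exchange the Gitea ID token for a Vault token
        run: |
          # Assumes the script prints the ID token on stdout.
          ID_TOKEN="$(./oidc_jwt_token.sh)"
          # Assumes a JWT auth method mounted at auth/jwt with a role named gitea_cicd.
          VAULT_TOKEN="$(vault write -field=token auth/jwt/login role=gitea_cicd jwt="$ID_TOKEN")"
          export VAULT_TOKEN
          # Sanity check: the token is valid if the lookup succeeds.
          vault token lookup > /dev/null
```

Keeping the login and the commands that use the token in the same step avoids persisting the Vault token between steps.
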

### 2. Gitea Act Runner
- Deployed on `pi1` and `pi3` (not on the Gitea host, which is `pi2`).
- Uses Docker-in-Docker for job execution.
- **Custom Runner Image (`ubuntu-latest-ca`)**: Required because services on the `.lab` domain present certificates issued by the lab's own (self-signed) CA. The custom image includes the local CA certificate so that jobs trust the Gitea instance (`gitea.arcodange.lab`).
- Managed via Docker Compose (`03_cicd.yml`).
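
A runner service along these lines could appear in the Compose file. This is a sketch under assumptions: the service name, runner name, data path, and the `RUNNER_TOKEN` variable are hypothetical, and the actual content of `03_cicd.yml` is not shown in this ADR. It mounts the host's Docker socket, which is the simplest variant; a strict Docker-in-Docker setup would run a separate `docker:dind` service instead.

```yaml
# Hypothetical Compose excerpt for one runner host (sketch only)
services:
  act_runner:
    image: gitea/act_runner:latest
    restart: unless-stopped
    environment:
      GITEA_INSTANCE_URL: https://gitea.arcodange.lab
      GITEA_RUNNER_REGISTRATION_TOKEN: ${RUNNER_TOKEN}   # supplied outside the file
      GITEA_RUNNER_NAME: runner-pi1                      # assumed name
      # label:docker://image maps a workflow's runs-on label to a job image
      GITEA_RUNNER_LABELS: ubuntu-latest-ca:docker://gitea.arcodange.lab/arcodange-org/runner-images:ubuntu-latest-ca
    volumes:
      - ./data:/data                                     # runner registration state
      - /var/run/docker.sock:/var/run/docker.sock        # lets the runner start job containers
```
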

### 3. ArgoCD
- Deployed on the k3s cluster (via HelmChart in `/var/lib/rancher/k3s/server/manifests/argocd.yaml`).
- Uses Gitea as the source of truth for GitOps.
- Synchronizes the `factory` repository to deploy applications.
- Configured with Traefik for TLS termination.
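
To illustrate how ArgoCD could track the `factory` repository, here is a minimal `Application` sketch. The application name, namespaces, repository URL, branch, and path are assumptions made for illustration; none of them are specified in this ADR.

```yaml
# Hypothetical ArgoCD Application (sketch only)
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: webapp                  # hypothetical application name
  namespace: argocd             # assumed ArgoCD namespace
spec:
  project: default
  source:
    repoURL: https://gitea.arcodange.lab/arcodange-org/factory.git   # assumed URL
    targetRevision: main        # assumed branch
    path: apps/webapp           # hypothetical path inside the repository
  destination:
    server: https://kubernetes.default.svc   # the cluster ArgoCD runs in
    namespace: webapp
  syncPolicy:
    automated:
      prune: true               # delete resources that were removed from Git
      selfHeal: true            # revert manual drift in the cluster
    syncOptions:
      - CreateNamespace=true
```

With `automated` sync enabled, a push to the tracked branch is enough to roll out a change; without it, each sync has to be triggered manually.
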

### 4. Vault Secrets Operator
- Deployed in the `tools` namespace.
- Manages secrets for applications deployed via ArgoCD.
- Integrates with Gitea OIDC for authentication.
- **Helm Chart Integration**:
  - `VaultAuth`: Authenticates with Vault using Kubernetes service accounts.
  - `VaultStaticSecret`: Retrieves static secrets (e.g., `kvv2/webapp/config`).
  - `VaultDynamicSecret`: Generates dynamic secrets (e.g., PostgreSQL credentials).
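
The three custom resources listed above could look roughly like this in an application's Helm chart. This is a sketch, not the chart used in the repository: the resource names, the `webapp` namespace, the `kubernetes` auth mount, the Vault role, the `postgres` mount, and the destination Secret names are assumptions for illustration.

```yaml
# Hypothetical Vault Secrets Operator resources (sketch only)
apiVersion: secrets.hashicorp.com/v1beta1
kind: VaultAuth
metadata:
  name: webapp-auth
  namespace: webapp
spec:
  method: kubernetes
  mount: kubernetes             # assumed path of the Kubernetes auth method in Vault
  kubernetes:
    role: webapp                # assumed Vault role bound to the service account
    serviceAccount: default
    audiences:
      - vault
---
apiVersion: secrets.hashicorp.com/v1beta1
kind: VaultStaticSecret
metadata:
  name: webapp-config
  namespace: webapp
spec:
  vaultAuthRef: webapp-auth
  type: kv-v2
  mount: kvv2
  path: webapp/config
  refreshAfter: 30s             # how often the operator re-reads the secret
  destination:
    name: secretkv              # Kubernetes Secret written by the operator
    create: true
---
apiVersion: secrets.hashicorp.com/v1beta1
kind: VaultDynamicSecret
metadata:
  name: webapp-db
  namespace: webapp
spec:
  vaultAuthRef: webapp-auth
  mount: postgres               # assumed database secrets engine mount
  path: creds/webapp
  destination:
    name: vso-db-credentials
    create: true
  rolloutRestartTargets:        # restart the workload when credentials rotate
    - kind: Deployment
      name: webapp
```
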

### 5. Security
- **TLS**: Traefik terminates TLS using Let's Encrypt.
- **OIDC**: Gitea authentication via Vault.
- **Secrets**: Stored in Vault, injected via the Vault Secrets Operator.

## Architecture Diagram

```mermaid
%%{init: {'theme': 'base', 'themeVariables': { 'primaryColor': '#333333', 'edgeLabelBackground':'#f0f0f0', 'tertiaryColor': '#e67e22'}}}%%
graph TD
    %% Styles
    classDef gitea fill:#ffcc99,stroke:#cc9966,color:#333;
    classDef argocd fill:#99ffcc,stroke:#66cc99,color:#333;
    classDef vault fill:#ccccff,stroke:#6666cc,color:#333;
    classDef k3s fill:#ff9999,stroke:#cc0000,color:#333;
    classDef runner fill:#ffff99,stroke:#cccc00,color:#333;

    %% Components
    Gitea["Gitea (pi2)"]:::gitea
    ArgoCD["ArgoCD (k3s)"]:::argocd
    Vault["Vault (k3s/tools)"]:::vault
    Runner1["Gitea Act Runner (pi1)"]:::runner
    Runner2["Gitea Act Runner (pi3)"]:::runner
    VaultOperator["Vault Secrets Operator (k3s/tools)"]:::vault
    k3s["k3s Cluster"]:::k3s

    %% Workflow
    Gitea -->|OIDC Auth| Vault
    Gitea -->|Trigger CI| Runner1
    Gitea -->|Trigger CI| Runner2
    Runner1 -->|Deploy to| k3s
    Runner2 -->|Deploy to| k3s
    ArgoCD -->|GitOps Sync| Gitea
    ArgoCD -->|Deploy Apps| k3s
    VaultOperator -->|Inject Secrets| k3s
    Vault -->|Secrets| VaultOperator

    %% Annotations (link indices 0-8 cover all nine edges above)
    linkStyle 0,1,2,3,4,5,6,7,8 stroke:#999,stroke-width:1px;
```

## Consequences

### Positive
- **Automated Deployments**: ArgoCD ensures the cluster state matches Git.
- **Secure Secrets**: Vault centralizes secret management.
- **Scalable CI**: Gitea Act Runners can be added to any host.
- **OIDC Integration**: Secure authentication via Vault.

### Negative
- **Complexity**: Multiple moving parts (Gitea, ArgoCD, Vault).
- **Dependency on Vault**: If Vault fails, CI/CD may be disrupted.
- **Learning Curve**: Requires familiarity with GitOps and Vault.

## Alternatives Considered

### Alternative 1: GitHub Actions
- **Rejected**: Self-hosted Gitea aligns better with the home lab's privacy goals.

### Alternative 2: Jenkins
- **Rejected**: ArgoCD + Gitea Act Runner is lighter and more GitOps-native.

### Alternative 3: No CI/CD
- **Rejected**: Manual deployments are error-prone and unscalable.

## Sequence Diagrams

### 1. CI/CD Workflow for OpenTofu/Terraform

```mermaid
sequenceDiagram
    participant Gitea
    participant Runner as Gitea Act Runner (pi1/pi3)
    participant Vault
    participant WebApp as WebApp (k3s)

    Gitea->>Runner: Trigger vault.yaml workflow
    Runner->>Gitea: Execute vault_oauth__sh_b64 (OIDC)
    Gitea-->>Runner: Return ID Token
    Runner->>Vault: Authenticate with ID Token
    Vault-->>Runner: Return Vault Token
    Runner->>Runner: Run OpenTofu/Terraform
    Runner->>Vault: Fetch Secrets (via Vault Action)
    Vault-->>Runner: Return Secrets
    Runner->>WebApp: Deploy Changes
```

### 2. Vault Secrets Operator Workflow

```mermaid
sequenceDiagram
    participant ArgoCD
    participant WebApp as WebApp (k3s)
    participant VaultOperator as Vault Secrets Operator
    participant Vault

    ArgoCD->>WebApp: Deploy Helm Chart
    WebApp->>VaultOperator: Create VaultAuth (K8s Auth)
    VaultOperator->>Vault: Authenticate (K8s Service Account)
    Vault-->>VaultOperator: Return Vault Token
    WebApp->>VaultOperator: Create VaultStaticSecret (kvv2/webapp/config)
    VaultOperator->>Vault: Fetch Static Secret
    Vault-->>VaultOperator: Return Secret
    VaultOperator->>WebApp: Inject Secret (secretkv)
    WebApp->>VaultOperator: Create VaultDynamicSecret (postgres/creds/webapp)
    VaultOperator->>Vault: Generate Dynamic Secret
    Vault-->>VaultOperator: Return Credentials
    VaultOperator->>WebApp: Inject Credentials (vso-db-credentials)
    WebApp->>WebApp: Restart Pods (Rollout)
```

## Success Metrics
- Gitea Act Runners successfully execute CI jobs.
- ArgoCD synchronizes the `factory` repository without errors.
- Vault Secrets Operator injects secrets into deployed applications.
@@ -0,0 +1,130 @@
# ADR 20260407: Docker Storage Optimization for Gitea Act Runner

## Status
Proposed

## Context
The `pi3` machine (Raspberry Pi) is running both Docker and k3s, with the following storage constraints:
- Root filesystem (`/dev/mmcblk0p2`): 58G total, 89% used (6.4G free)
- External disk (`/dev/sda1`): 458G total, 22G used (413G free)
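
Gitea Act Runner images (`ubuntu-latest` and `ubuntu-latest-ca`) are frequently deleted, likely by automatic image garbage collection triggered by low disk space. Docker Engine does not prune images on its own, so a plausible trigger is the kubelet's image garbage collection, which k3s runs against the shared Docker runtime once disk usage crosses its threshold. These deletions disrupt CI/CD pipelines.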

### Current Setup
- Docker is configured via Ansible (`system_docker.yml`) using the `geerlingguy.docker` role.
- k3s is configured to use Docker as the container runtime (`--docker` flag).
- Longhorn is used for persistent storage in k3s, and we want to preserve its performance.

## Decision
We will implement a **hybrid storage strategy** to prevent Gitea Act Runner image deletion while maintaining Longhorn performance:

### 1. Pin Critical Images
Use a dummy container to pin the Gitea Act Runner images:
```yaml
# Add to system_docker.yml or a new playbook
- name: Pin Gitea Act Runner images
  community.docker.docker_container:
    name: pin-gitea-runner-ubuntu-latest-ca
    image: gitea.arcodange.lab/arcodange-org/runner-images:ubuntu-latest-ca
    # state: present creates the container without starting it; Docker refuses to
    # remove an image that any container, running or not, still references.
    state: present
    command: ["sh", "-c", "sleep infinity"]
    auto_remove: false
    restart_policy: unless-stopped
```
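
### 2. Configure Docker Storage with Overlay on External Disk
Modify `/etc/docker/daemon.json` to move Docker's storage to the external disk. Note that `data-root` relocates Docker's persistent state as a whole (images, containers, volumes, and their metadata), so the metadata moves to the external disk as well rather than staying on the root filesystem. The `overlay2.override_kernel_check` option has been deprecated since Docker 19.03 and is listed as removed in 24.0, so it may have to be dropped on recent engines:
```json
{
  "data-root": "/mnt/arcodange/docker",
  "storage-driver": "overlay2",
  "storage-opts": ["overlay2.override_kernel_check=true"]
}
```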

### 3. Ansible Implementation
Update `system_docker.yml` to:
1. Create `/mnt/arcodange/docker` if it doesn't exist.
2. Configure Docker to use the external disk.
3. Pin critical images post-installation.

```yaml
# Add to system_docker.yml tasks
- name: Ensure Docker storage directory exists on external disk
  ansible.builtin.file:
    path: /mnt/arcodange/docker
    state: directory
    mode: '0755'
    owner: root
    group: docker

- name: Configure Docker to use external storage
  ansible.builtin.copy:
    dest: /etc/docker/daemon.json
    content: |
      {
        "data-root": "/mnt/arcodange/docker",
        "storage-driver": "overlay2",
        "storage-opts": ["overlay2.override_kernel_check=true"],
        "log-driver": "json-file",
        "log-opts": {
          "max-size": "10m",
          "max-file": "5"
        }
      }
    mode: '0644'
  # "Redémarrer Docker" means "Restart Docker"; the name must match a handler defined in the playbook
  notify: Redémarrer Docker

- name: Pin Gitea Act Runner images
  community.docker.docker_container:
    name: "{{ item.name }}"
    image: "{{ item.image }}"
    state: present
    command: ["sh", "-c", "sleep infinity"]
    auto_remove: false
    restart_policy: unless-stopped
  loop:
    - { name: "pin-gitea-runner-ubuntu-latest", image: "gitea/runner-images:ubuntu-latest" }
    - { name: "pin-gitea-runner-ubuntu-latest-ca", image: "gitea.arcodange.lab/arcodange-org/runner-images:ubuntu-latest-ca" }
```

## Consequences

### Positive
- **Prevents Image Deletion**: Critical images are pinned and won't be garbage-collected.
- **Preserves Longhorn Performance**: Longhorn continues to use the root filesystem for its operations, maintaining performance.
- **Scalable Storage**: Docker images are stored on the external disk (413G free), preventing root filesystem exhaustion.
- **No k3s Changes Required**: k3s continues to use Docker as the runtime without modification.

### Negative
- **Migration Effort**: Existing Docker data must be migrated to the external disk (one-time operation).
- **Dependency on External Disk**: If `/dev/sda1` fails, Docker will not function until the disk is remounted or the configuration is reverted.
- **Slight Performance Overhead**: Accessing images from the external disk may be slightly slower than from the root filesystem; the actual impact depends on whether the external drive is an SSD or an HDD.

## Alternatives Considered

### Alternative 1: Increase Root Filesystem Size
- **Rejected**: The SD card is already at capacity, and expanding it is not feasible.

### Alternative 2: Disable Docker Garbage Collection
- **Rejected**: This would risk filling the root filesystem completely, causing system instability.

### Alternative 3: Use k3s Image Garbage Collection
- **Rejected**: k3s does not provide fine-grained control over image retention for non-k8s workloads (e.g., Gitea Act Runner).

### Alternative 4: Save/Load Images Manually
- **Rejected**: Manual intervention is not scalable and does not address the root cause.

## Migration Plan
1. **Backup**: Save critical images to `/mnt/arcodange`:
   ```bash
   docker save gitea.arcodange.lab/arcodange-org/runner-images:ubuntu-latest-ca -o /mnt/arcodange/gitea-runner-backup.tar
   ```
2. **Update Ansible**: Apply the changes to `system_docker.yml`.
3. **Run Playbook**: Execute the playbook to reconfigure Docker.
4. **Verify**: Ensure Gitea Act Runner functions correctly post-migration.
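
The plan above does not spell out how existing Docker data reaches the external disk. One possible sequence is sketched below as Ansible tasks. It is an assumption-laden sketch, not part of `system_docker.yml`: it assumes the current data lives in Docker's default `/var/lib/docker`, that `rsync` is installed on the host, that `daemon.json` already points at the new `data-root`, and that k3s has been stopped beforehand because it shares the Docker runtime.

```yaml
# Hypothetical one-time migration tasks (sketch only)
- name: Stop Docker before copying its data
  ansible.builtin.systemd:
    name: "{{ item }}"
    state: stopped
  loop:
    - docker.socket      # stop the socket first so Docker is not re-activated
    - docker.service

- name: Copy existing Docker data to the external disk
  ansible.builtin.command:
    # -a archive, -H hard links, -A ACLs, -X extended attributes
    cmd: rsync -aHAX /var/lib/docker/ /mnt/arcodange/docker/
  changed_when: true

- name: Start Docker again (now using the new data-root)
  ansible.builtin.systemd:
    name: docker.service
    state: started

- name: Read Docker host information
  community.docker.docker_host_info:
  register: docker_host

- name: Verify that Docker uses the external disk
  ansible.builtin.assert:
    that:
      - docker_host.host_info.DockerRootDir == '/mnt/arcodange/docker'
```

Keeping the old `/var/lib/docker` directory until the verification step passes leaves a simple rollback path: revert `daemon.json` and restart Docker.
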

## Success Metrics
- Gitea Act Runner images are no longer deleted between runs.
- Root filesystem usage drops below 80%.
- CI/CD pipelines complete without image pull errors.
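
The 80% target for the root filesystem could be checked automatically. The task below is a sketch that assumes fact gathering is enabled for the play (so that `ansible_facts.mounts` is populated); the variable names are hypothetical. The percentage is derived from available versus total bytes, so it can differ slightly from the `Use%` column reported by `df`.

```yaml
# Hypothetical post-migration check (sketch only)
- name: Check root filesystem usage against the 80% target
  vars:
    root_mount: "{{ ansible_facts.mounts | selectattr('mount', 'equalto', '/') | first }}"
    root_used_pct: "{{ (100 * (1 - root_mount.size_available / root_mount.size_total)) | round(1) }}"
  ansible.builtin.assert:
    that:
      - root_used_pct | float < 80
    fail_msg: "Root filesystem is {{ root_used_pct }}% used"
    success_msg: "Root filesystem is {{ root_used_pct }}% used"
```
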
@@ -0,0 +1,334 @@
# ADR 20260407: Network Architecture

## Status
Proposed

## Context
The home lab requires a secure and resilient network architecture to support:
- Internal services (`.lab` domain).
- External services (`.arcodange.fr` domain).
- DNS resolution and ad-blocking (Pi-hole).
- TLS certificate management (Step CA).
- Ingress routing (Traefik).
- CDN and DDoS protection (Cloudflare).

## Decision
We will implement a **multi-layered network architecture** with the following components:

### 1. External Layer (Internet)
- **Cloudflare**: CDN, DDoS protection, and DNS for `.arcodange.fr`.
- **DuckDNS**: Dynamic DNS for external access.
- **Livebox**: ISP-provided gateway (NAT, DHCP, firewall).

### 2. Internal Layer (Home Lab)
- **Pi-hole (pi1, pi3)**: DNS sinkhole for ad-blocking and internal DNS resolution.
- **Step CA (pi1)**: Internal certificate authority for `.lab` domain.
- **Traefik (k3s)**: Ingress controller with TLS termination.
- **k3s Cluster**: Hosts internal services with Longhorn storage.

### 3. DNS Architecture
- **Pi-hole**: Primary DNS for internal clients.
  - Forwards `.lab` queries to Step CA.
  - Forwards external queries to Cloudflare (1.1.1.1).
- **Step CA**: Issues certificates for `.lab` services.
- **Cloudflare**: Manages `.arcodange.fr` DNS records.

### 4. Ingress and TLS
- **Traefik**: Terminates TLS for both `.lab` and `.arcodange.fr` domains.
  - Uses Let's Encrypt for `.arcodange.fr`.
  - Uses Step CA for `.lab`.
- **Helm Chart Annotations**:
  - `traefik.ingress.kubernetes.io/router.entrypoints: websecure`
  - `traefik.ingress.kubernetes.io/router.tls.certresolver: letsencrypt`
  - `traefik.ingress.kubernetes.io/router.middlewares: localIp@file`
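
As an illustration of where these annotations go, the following `Ingress` sketch attaches them to a single route. The resource name, namespace, backend service, and port are hypothetical; the host `webapp.arcodange.fr` is used only as an example of an external name.

```yaml
# Hypothetical Ingress carrying the Traefik annotations (sketch only)
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: webapp                  # hypothetical
  namespace: webapp             # hypothetical
  annotations:
    traefik.ingress.kubernetes.io/router.entrypoints: websecure
    traefik.ingress.kubernetes.io/router.tls.certresolver: letsencrypt
    traefik.ingress.kubernetes.io/router.middlewares: localIp@file
spec:
  rules:
    - host: webapp.arcodange.fr
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: webapp    # assumed Service name
                port:
                  number: 80    # assumed Service port
```
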

### 5. Security
- **Cloudflare Tunnel**: Securely exposes internal services without port forwarding.
- **CrowdSec**: Intrusion detection and banning.
- **Traefik Middlewares**: IP filtering, rate limiting, and authentication.
- **Cloudflare Turnstile**: CAPTCHA protection for public-facing services.

## Architecture Diagrams

### 0. High-Level Network Architecture (Architecture Beta)

```mermaid
%%{init: {'theme': 'neutral', 'themeVariables': {
  'primaryColor': '#f0f0f0',
  'primaryBorderColor': '#333333',
  'primaryTextColor': '#333333',
  'lineColor': '#333333',
  'tertiaryColor': '#e67e22'
}}}%%
architectureBeta
    %% External Layer
    box "Internet" #f9f9f9
        component Cloudflare["Cloudflare\n(CDN/DNS)"] #f9f9f9
        component DuckDNS["DuckDNS\n(DDNS)"] #f9f9f9
    end

    %% External Gateway
    box "External Gateway" #e6e6e6
        component Livebox["Livebox\n(NAT/Firewall)"] #e6e6e6
    end

    %% Internal Layer
    box "Internal Network\n(192.168.1.0/24)" #d4d4d4
        %% DNS Layer
        box "DNS" #ffff99
            component PiHole1["Pi-hole\n(pi1)"] #ffff99
            component PiHole3["Pi-hole\n(pi3)"] #ffff99
            component StepCA["Step CA\n(pi1)"] #ccccff
        end

        %% k3s Layer
        box "k3s Cluster" #ff9999
            component Traefik["Traefik\n(Ingress)"] #ff9999
            component CrowdSec["CrowdSec\n(Security)"] #ff9999
            component Gitea["Gitea\n(pi2)"] #ffcc99
            component Vault["Vault\n(Secrets)"] #ccccff
        end
    end

    %% Connections
    Cloudflare --> Livebox : "DNS"
    DuckDNS --> Livebox : "DDNS"
    Livebox --> PiHole1 : "NAT"
    Livebox --> PiHole3 : "NAT"
    Livebox --> Traefik : "NAT"
    PiHole1 --> StepCA : "Forward .lab"
    PiHole1 --> Cloudflare : "Forward External"
    PiHole3 --> StepCA : "Forward .lab"
    PiHole3 --> Cloudflare : "Forward External"
    Traefik --> Cloudflare : "TLS (Let's Encrypt)"
    Traefik --> StepCA : "TLS (Step CA)"
    CrowdSec --> Traefik : "Ban IPs"
    Traefik --> Gitea : "Route"
    Traefik --> Vault : "Route"
```

### 1. High-Level Network Architecture

```mermaid
%%{init: {'theme': 'base', 'themeVariables': { 'primaryColor': '#333333', 'edgeLabelBackground':'#f0f0f0', 'tertiaryColor': '#f89136'}}}%%
graph TD
    %% Styles
    classDef internet fill:#f9f9f9,stroke:#999,color:#333;
    classDef external fill:#e6e6e6,stroke:#555,color:#333;
    classDef internal fill:#d4d4d4,stroke:#777,color:#333;
    classDef security fill:#ff9999,stroke:#cc0000,color:#333;
    classDef dns fill:#ffff99,stroke:#cccc00,color:#333;
    classDef ca fill:#ccccff,stroke:#6666cc,color:#333;

    %% Internet
    subgraph "Internet"
        Cloudflare["Cloudflare (CDN/DNS)"]:::internet
        DuckDNS["DuckDNS (DDNS)"]:::internet
    end

    %% External Gateway
    subgraph "External Gateway"
        Livebox["Livebox (NAT/Firewall)"]:::external
    end

    %% Internal Network
    subgraph "Internal Network (192.168.1.0/24)"
        %% Pi-hole DNS
        PiHole1["Pi-hole (pi1)"]:::dns
        PiHole3["Pi-hole (pi3)"]:::dns

        %% Step CA
        StepCA["Step CA (pi1)"]:::ca

        %% k3s Cluster
        k3s["k3s Cluster"]:::internal
        Traefik["Traefik (k3s)"]:::internal
        CrowdSec["CrowdSec (k3s)"]:::security

        %% Services
        Gitea["Gitea (pi2)"]:::internal
        Vault["Vault (k3s)"]:::internal
    end

    %% Connections
    Cloudflare -->|DNS| Livebox
    DuckDNS -->|DDNS| Livebox
    Livebox -->|NAT| PiHole1
    Livebox -->|NAT| PiHole3
    Livebox -->|NAT| k3s

    %% Internal DNS
    PiHole1 -->|Forward .lab| StepCA
    PiHole1 -->|Forward External| Cloudflare
    PiHole3 -->|Forward .lab| StepCA
    PiHole3 -->|Forward External| Cloudflare

    %% Ingress
    Traefik -->|"TLS (Let's Encrypt)"| Cloudflare
    Traefik -->|"TLS (Step CA)"| StepCA
    CrowdSec -->|Ban IPs| Traefik

    %% Service Access
    Traefik -->|Route| Gitea
    Traefik -->|Route| Vault
```

### 2. DNS Resolution Flow

```mermaid
sequenceDiagram
    participant Client
    participant PiHole
    participant StepCA
    participant Cloudflare
    participant ExternalDNS

    Client->>PiHole: Query example.lab
    PiHole->>StepCA: Forward .lab query
    StepCA-->>PiHole: Return A record
    PiHole-->>Client: Return response

    Client->>PiHole: Query example.com
    PiHole->>Cloudflare: Forward to 1.1.1.1
    Cloudflare->>ExternalDNS: Resolve externally
    ExternalDNS-->>Cloudflare: Return response
    Cloudflare-->>PiHole: Return response
    PiHole-->>Client: Return response
```

### 3. Ingress and TLS Flow

```mermaid
sequenceDiagram
    participant User
    participant Cloudflare
    participant Traefik
    participant LE as Let's Encrypt
    participant StepCA
    participant Service

    User->>Cloudflare: HTTPS Request (webapp.arcodange.fr)
    Cloudflare->>Traefik: Forward to internal IP
    Traefik->>LE: Request Certificate
    LE-->>Traefik: Issue Certificate
    Traefik->>Service: Route request
    Service-->>Traefik: Return response
    Traefik-->>Cloudflare: Return HTTPS response
    Cloudflare-->>User: Return response

    User->>Traefik: HTTPS Request (webapp.arcodange.lab)
    Traefik->>StepCA: Request Certificate
    StepCA-->>Traefik: Issue Certificate
    Traefik->>Service: Route request
    Service-->>Traefik: Return response
    Traefik-->>User: Return HTTPS response
```

### 4. Security Flow (CrowdSec + Traefik)

```mermaid
sequenceDiagram
    participant Attacker
    participant Traefik
    participant CrowdSec
    participant BannedIPs

    Attacker->>Traefik: Malicious Request
    Traefik->>CrowdSec: Log suspicious activity
    CrowdSec->>BannedIPs: Add IP to ban list
    BannedIPs-->>Traefik: Update middleware
    Traefik-->>Attacker: Block request (403)
```

## Playbook and Role Analysis

### 1. Pi-hole Deployment
- **Playbook**: `playbooks/system/pihole.yml`
- **Role**: `arcodange.factory.pihole`
- **Configuration**:
  - Upstream DNS: Cloudflare (1.1.1.1) and Step CA for `.lab`.
  - Blocklists: Ad-blocking and malware domains.

### 2. Step CA Deployment
- **Playbook**: `playbooks/ssl/ssl.yml`
- **Role**: `step_ca`
- **Configuration**:
  - Internal CA for `.lab` domain.
  - Short-lived certificates (default: 24h).

### 3. Traefik Deployment
- **Playbook**: `playbooks/system/system_k3s.yml` (via k3s)
- **Helm Chart**: `traefik` (installed via k3s)
- **Key Annotations**:
  ```yaml
  traefik.ingress.kubernetes.io/router.entrypoints: websecure
  traefik.ingress.kubernetes.io/router.tls.certresolver: letsencrypt
  traefik.ingress.kubernetes.io/router.middlewares: localIp@file
  ```

### 4. CrowdSec Deployment
- **Playbook**: `playbooks/tools/crowdsec.yml`
- **Role**: `arcodange.factory.crowdsec`
- **Configuration**:
  - Bouncer integration with Traefik.
  - Custom scenarios for brute-force and bot detection.

## Consequences

### Positive
- **Resilient DNS**: Pi-hole runs on two hosts (`pi1`, `pi3`) and provides ad-blocking and internal DNS resolution.
- **Secure TLS**: Step CA for internal services, Let's Encrypt for external.
- **DDoS Protection**: Cloudflare absorbs external attacks.
- **Intrusion Detection**: CrowdSec bans malicious IPs automatically.

### Negative
- **Complexity**: Multiple layers require careful configuration.
- **Single Point of Failure**: Pi-hole is critical for internal DNS; `.lab` names stop resolving if no Pi-hole instance is reachable.
- **Certificate Management**: Step CA requires maintenance for `.lab` domain.

## Alternatives Considered

### Alternative 1: Public DNS for `.lab`
- **Rejected**: Exposing internal domains is a security risk.

### Alternative 2: No Ad-Blocking
- **Rejected**: Pi-hole provides essential security and privacy.

### Alternative 3: Self-Signed Certificates
- **Rejected**: Step CA provides better usability with short-lived certs.

### 5. Cloudflare Turnstile + CrowdSec Flow

```mermaid
sequenceDiagram
    participant User
    participant Cloudflare
    participant Turnstile
    participant Traefik
    participant CrowdSec
    participant BannedIPs

    User->>Cloudflare: Request protected endpoint
    Cloudflare->>Turnstile: Challenge (CAPTCHA)
    Turnstile-->>Cloudflare: Return token
    Cloudflare->>Traefik: Forward request with token

    alt Valid Token
        Traefik->>Service: Route request
        Service-->>Traefik: Return response
        Traefik-->>Cloudflare: Return response
        Cloudflare-->>User: Return success
    else Invalid Token
        Traefik->>CrowdSec: Log suspicious activity
        CrowdSec->>BannedIPs: Add IP to ban list
        BannedIPs-->>Traefik: Update middleware
        Traefik-->>Cloudflare: Block request (403)
        Cloudflare-->>User: Return "Access Denied"
    end
```

## Success Metrics
- Pi-hole blocks >50% of ads and trackers.
- Step CA issues certificates without downtime.
- Traefik routes 100% of external traffic via Cloudflare.
- CrowdSec bans >10 malicious IPs per day.
- Cloudflare Turnstile blocks >90% of bot traffic.