Consolidate ADRs into docs/adr/
This commit moves Architecture Decision Records (ADRs) from ../../../docs/adr/ to docs/adr/ in the arcodange/factory repository. This centralizes all ADRs in one location for better maintainability and discoverability.

Generated by Mistral Vibe.
Co-Authored-By: Mistral Vibe <vibe@mistral.ai>
160
ansible/arcodange/factory/docs/adr/20260407-cicd-architecture.md
Normal file
@@ -0,0 +1,160 @@
# ADR 20260407: CI/CD Architecture with ArgoCD, Gitea, and Vault

## Status
Proposed

## Context
The home lab requires a secure and automated CI/CD pipeline to deploy applications to the k3s cluster. The pipeline must integrate with:
- **Gitea**: For Git repository management and CI runners.
- **ArgoCD**: For GitOps-based continuous deployment.
- **Vault**: For secrets management and OIDC authentication.
- **Gitea Act Runner**: For executing CI jobs.

## Decision
We will implement a **GitOps-driven CI/CD pipeline** with the following components:

### 1. Gitea OIDC Authentication with Vault
- Gitea is registered as an OIDC application in Vault.
- Vault issues short-lived tokens for Gitea users.
- The `gitea_oidc_auth.yml` playbook automates this setup using Playwright and OpenTofu.
- **OIDC Workflow**:
  1. The `oidc_jwt_token.sh` script (base64-encoded in `secrets.vault_oauth__sh_b64`) handles the OIDC flow.
  2. Gitea Act Runner executes the script to obtain an ID token from Gitea.
  3. The ID token is used to authenticate with Vault and retrieve secrets.

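
The OIDC workflow described above could be wired into a Gitea Actions workflow roughly as follows. This is a minimal sketch, not the repository's actual workflow: the Vault hostname, the `gitea_jwt` auth mount, the `ci` role, the step names, and the assumption that `oidc_jwt_token.sh` prints the ID token on stdout are all hypothetical. Only `secrets.vault_oauth__sh_b64`, the script name, and the `ubuntu-latest-ca` label come from this ADR. It also assumes `curl` and `jq` are available in the runner image.

```yaml
# Illustrative sketch only; names marked "assumed" are not from the ADR.
name: vault
on:
  push:
    branches: [main]

jobs:
  tofu:
    runs-on: ubuntu-latest-ca
    env:
      VAULT_ADDR: https://vault.arcodange.lab  # assumed hostname
    steps:
      - uses: actions/checkout@v4

      - name: Obtain an ID token from Gitea
        id: oidc
        env:
          OIDC_SCRIPT_B64: ${{ secrets.vault_oauth__sh_b64 }}
        run: |
          printf '%s' "$OIDC_SCRIPT_B64" | base64 -d > oidc_jwt_token.sh
          # Assumption: the script prints the ID token on stdout.
          id_token="$(bash oidc_jwt_token.sh)"
          echo "::add-mask::${id_token}"
          echo "id_token=${id_token}" >> "$GITHUB_OUTPUT"

      - name: Exchange the ID token for a Vault token
        id: vault
        env:
          ID_TOKEN: ${{ steps.oidc.outputs.id_token }}
        run: |
          # JWT login endpoint of Vault; mount path and role are assumed.
          vault_token="$(curl --silent --fail --request POST \
            --data "{\"role\": \"ci\", \"jwt\": \"${ID_TOKEN}\"}" \
            "${VAULT_ADDR}/v1/auth/gitea_jwt/login" | jq -r '.auth.client_token')"
          echo "::add-mask::${vault_token}"
          echo "token=${vault_token}" >> "$GITHUB_OUTPUT"
```

Later steps could then read `steps.vault.outputs.token` to run OpenTofu or fetch secrets. Both tokens are masked so that they do not appear in job logs.
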
### 2. Gitea Act Runner
- Deployed on `pi1` and `pi3` (not on the Gitea host, which is `pi2`).
- Uses Docker-in-Docker for job execution.
- **Custom Runner Image (`ubuntu-latest-ca`)**: Required due to the self-signed `.lab` domain. The custom image includes the local CA certificate to trust the Gitea instance (`gitea.arcodange.lab`).
- Managed via Docker Compose (`03_cicd.yml`).

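
A Compose service for one runner might look like the sketch below. It is not the content of `03_cicd.yml`: the service name, runner name, data volume, and the way the registration token is supplied are assumptions, and for brevity the sketch mounts the host Docker socket rather than reproducing the Docker-in-Docker setup. The label mapping shows how the `ubuntu-latest-ca` label can point at the custom image.

```yaml
# Illustrative sketch only; not the actual 03_cicd.yml.
services:
  act_runner:
    image: gitea/act_runner:latest
    restart: unless-stopped
    environment:
      GITEA_INSTANCE_URL: https://gitea.arcodange.lab
      # Assumption: the registration token is passed in from the environment.
      GITEA_RUNNER_REGISTRATION_TOKEN: ${GITEA_RUNNER_REGISTRATION_TOKEN}
      GITEA_RUNNER_NAME: pi1-runner  # hypothetical name
      # Map the workflow label to the custom image that trusts the local CA.
      GITEA_RUNNER_LABELS: ubuntu-latest-ca:docker://gitea.arcodange.lab/arcodange-org/runner-images:ubuntu-latest-ca
    volumes:
      - ./data:/data
      # Jobs are started through the host Docker daemon in this sketch.
      - /var/run/docker.sock:/var/run/docker.sock
```
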
### 3. ArgoCD
- Deployed on the k3s cluster (via HelmChart in `/var/lib/rancher/k3s/server/manifests/argocd.yaml`).
- Uses Gitea as the source of truth for GitOps.
- Synchronizes the `factory` repository to deploy applications.
- Configured with Traefik for TLS termination.

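
As an illustration of the GitOps sync, an ArgoCD `Application` pointing at the `factory` repository could look like this. It is a sketch under assumptions: the repository URL, branch, chart path, application name, and namespaces are hypothetical and are not taken from the repository.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: webapp             # hypothetical application name
  namespace: argocd        # assumed ArgoCD namespace
spec:
  project: default
  source:
    repoURL: https://gitea.arcodange.lab/arcodange-org/factory  # assumed URL
    targetRevision: main   # assumed branch
    path: charts/webapp    # hypothetical path inside the repository
  destination:
    server: https://kubernetes.default.svc
    namespace: webapp      # hypothetical target namespace
  syncPolicy:
    automated:
      prune: true          # delete resources that were removed from Git
      selfHeal: true       # revert manual drift in the cluster
    syncOptions:
      - CreateNamespace=true
```

With `automated` sync enabled, ArgoCD reconciles the cluster whenever the tracked revision changes, which is what keeps the cluster state aligned with Git.
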
### 4. Vault Secrets Operator
- Deployed in the `tools` namespace.
- Manages secrets for applications deployed via ArgoCD.
- Integrates with Gitea OIDC for authentication.
- **Helm Chart Integration**:
  - `VaultAuth`: Authenticates with Vault using Kubernetes service accounts.
  - `VaultStaticSecret`: Retrieves static secrets (e.g., `kvv2/webapp/config`).
  - `VaultDynamicSecret`: Generates dynamic secrets (e.g., PostgreSQL credentials).

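
The three custom resources listed above could be declared along these lines in an application's Helm chart. This is a hedged sketch: the `webapp` namespace, the `webapp-auth` resource name, the Vault role, the `kubernetes` auth mount, the service account, and the refresh interval are assumptions. The secret paths and the destination names `secretkv` and `vso-db-credentials` follow the sequence diagram in this ADR.

```yaml
apiVersion: secrets.hashicorp.com/v1beta1
kind: VaultAuth
metadata:
  name: webapp-auth          # hypothetical name
  namespace: webapp          # hypothetical namespace
spec:
  method: kubernetes
  mount: kubernetes          # assumed auth mount in Vault
  kubernetes:
    role: webapp             # assumed Vault role
    serviceAccount: default  # assumed service account
---
apiVersion: secrets.hashicorp.com/v1beta1
kind: VaultStaticSecret
metadata:
  name: webapp-config
  namespace: webapp
spec:
  vaultAuthRef: webapp-auth
  type: kv-v2
  mount: kvv2
  path: webapp/config
  refreshAfter: 30s          # assumed refresh interval
  destination:
    name: secretkv           # Kubernetes Secret created by the operator
    create: true
---
apiVersion: secrets.hashicorp.com/v1beta1
kind: VaultDynamicSecret
metadata:
  name: webapp-db
  namespace: webapp
spec:
  vaultAuthRef: webapp-auth
  mount: postgres
  path: creds/webapp
  destination:
    name: vso-db-credentials
    create: true
  rolloutRestartTargets:     # restart pods when credentials rotate
    - kind: Deployment
      name: webapp           # hypothetical Deployment name
```
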
### 5. Security
- **TLS**: Traefik terminates TLS using Let's Encrypt.
- **OIDC**: Gitea authentication via Vault.
- **Secrets**: Stored in Vault, injected via the Vault Secrets Operator.

## Architecture Diagram

```mermaid
%%{init: {'theme': 'base', 'themeVariables': { 'primaryColor': '#333333', 'edgeLabelBackground':'#f0f0f0', 'tertiaryColor': '#e67e22'}}}%%
graph TD
    %% Styles
    classDef gitea fill:#ffcc99,stroke:#cc9966,color:#333;
    classDef argocd fill:#99ffcc,stroke:#66cc99,color:#333;
    classDef vault fill:#ccccff,stroke:#6666cc,color:#333;
    classDef k3s fill:#ff9999,stroke:#cc0000,color:#333;
    classDef runner fill:#ffff99,stroke:#cccc00,color:#333;

    %% Components
    Gitea["Gitea (pi2)"]:::gitea
    ArgoCD["ArgoCD (k3s)"]:::argocd
    Vault["Vault (k3s/tools)"]:::vault
    Runner1["Gitea Act Runner (pi1)"]:::runner
    Runner2["Gitea Act Runner (pi3)"]:::runner
    VaultOperator["Vault Secrets Operator (k3s/tools)"]:::vault
    k3s["k3s Cluster"]:::k3s

    %% Workflow
    Gitea -->|OIDC Auth| Vault
    Gitea -->|Trigger CI| Runner1
    Gitea -->|Trigger CI| Runner2
    Runner1 -->|Deploy to| k3s
    Runner2 -->|Deploy to| k3s
    ArgoCD -->|GitOps Sync| Gitea
    ArgoCD -->|Deploy Apps| k3s
    VaultOperator -->|Inject Secrets| k3s
    Vault -->|Secrets| VaultOperator

    %% Annotations
    linkStyle 0,1,2,3,4,5,6,7,8 stroke:#999,stroke-width:1px;
```

## Consequences

### Positive
- **Automated Deployments**: ArgoCD ensures the cluster state matches Git.
- **Secure Secrets**: Vault centralizes secret management.
- **Scalable CI**: Gitea Act Runners can be added to any host.
- **OIDC Integration**: Secure authentication via Vault.

### Negative
- **Complexity**: Multiple moving parts (Gitea, ArgoCD, Vault).
- **Dependency on Vault**: If Vault fails, CI/CD may be disrupted.
- **Learning Curve**: Requires familiarity with GitOps and Vault.

## Alternatives Considered

### Alternative 1: GitHub Actions
- **Rejected**: Self-hosted Gitea aligns better with the home lab's privacy goals.

### Alternative 2: Jenkins
- **Rejected**: ArgoCD + Gitea Act Runner is lighter and more GitOps-native.

### Alternative 3: No CI/CD
- **Rejected**: Manual deployments are error-prone and unscalable.

## Sequence Diagrams

### 1. CI/CD Workflow for OpenTofu/Terraform

```mermaid
sequenceDiagram
    participant Gitea
    participant Runner as Gitea Act Runner (pi1/pi3)
    participant Vault
    participant WebApp as WebApp (k3s)

    Gitea->>Runner: Trigger vault.yaml workflow
    Runner->>Gitea: Execute vault_oauth__sh_b64 (OIDC)
    Gitea-->>Runner: Return ID Token
    Runner->>Vault: Authenticate with ID Token
    Vault-->>Runner: Return Vault Token
    Runner->>Runner: Run OpenTofu/Terraform
    Runner->>Vault: Fetch Secrets (via Vault Action)
    Vault-->>Runner: Return Secrets
    Runner->>WebApp: Deploy Changes
```

### 2. Vault Secrets Operator Workflow

```mermaid
sequenceDiagram
    participant ArgoCD
    participant WebApp as WebApp (k3s)
    participant VaultOperator as Vault Secrets Operator
    participant Vault

    ArgoCD->>WebApp: Deploy Helm Chart
    WebApp->>VaultOperator: Create VaultAuth (K8s Auth)
    VaultOperator->>Vault: Authenticate (K8s Service Account)
    Vault-->>VaultOperator: Return Vault Token
    WebApp->>VaultOperator: Create VaultStaticSecret (kvv2/webapp/config)
    VaultOperator->>Vault: Fetch Static Secret
    Vault-->>VaultOperator: Return Secret
    VaultOperator->>WebApp: Inject Secret (secretkv)
    WebApp->>VaultOperator: Create VaultDynamicSecret (postgres/creds/webapp)
    VaultOperator->>Vault: Generate Dynamic Secret
    Vault-->>VaultOperator: Return Credentials
    VaultOperator->>WebApp: Inject Credentials (vso-db-credentials)
    WebApp->>WebApp: Restart Pods (Rollout)
```

## Success Metrics
- Gitea Act Runners successfully execute CI jobs.
- ArgoCD synchronizes the `factory` repository without errors.
- Vault Secrets Operator injects secrets into deployed applications.

@@ -0,0 +1,130 @@
# ADR 20260407: Docker Storage Optimization for Gitea Act Runner

## Status
Proposed

## Context
The `pi3` machine (Raspberry Pi) is running both Docker and k3s, with the following storage constraints:
- Root filesystem (`/dev/mmcblk0p2`): 58G total, 89% used (6.4G free)
- External disk (`/dev/sda1`): 458G total, 22G used (413G free)

Gitea Act Runner images (`ubuntu-latest` and `ubuntu-latest-ca`) are frequently deleted, likely due to Docker's automatic garbage collection triggered by low disk space. This disrupts CI/CD pipelines.

### Current Setup
- Docker is configured via Ansible (`system_docker.yml`) using the `geerlingguy.docker` role.
- k3s is configured to use Docker as the container runtime (`--docker` flag).
- Longhorn is used for persistent storage in k3s, and we want to preserve its performance.

## Decision
We will implement a **hybrid storage strategy** to prevent Gitea Act Runner image deletion while maintaining Longhorn performance:

### 1. Pin Critical Images
Use a dummy container to pin the Gitea Act Runner images:
```yaml
# Add to system_docker.yml or a new playbook
- name: Pin Gitea Act Runner images
  community.docker.docker_container:
    name: pin-gitea-runner-ubuntu-latest-ca
    image: gitea.arcodange.lab/arcodange-org/runner-images:ubuntu-latest-ca
    state: present
    command: ["sh", "-c", "sleep infinity"]
    auto_remove: false
    restart_policy: unless-stopped
```

### 2. Configure Docker Storage with Overlay on External Disk
Set `data-root` in `/etc/docker/daemon.json` so that Docker keeps its images, containers, and volumes on the external disk instead of the root filesystem:
```json
{
  "data-root": "/mnt/arcodange/docker",
  "storage-driver": "overlay2",
  "storage-opts": ["overlay2.override_kernel_check=true"]
}
```

### 3. Ansible Implementation
Update `system_docker.yml` to:
1. Create `/mnt/arcodange/docker` if it doesn't exist.
2. Configure Docker to use the external disk.
3. Pin critical images post-installation.

```yaml
# Add to system_docker.yml tasks
- name: Ensure Docker storage directory exists on external disk
  ansible.builtin.file:
    path: /mnt/arcodange/docker
    state: directory
    mode: '0755'
    owner: root
    group: docker

- name: Configure Docker to use external storage
  ansible.builtin.copy:
    dest: /etc/docker/daemon.json
    content: |
      {
        "data-root": "/mnt/arcodange/docker",
        "storage-driver": "overlay2",
        "storage-opts": ["overlay2.override_kernel_check=true"],
        "log-driver": "json-file",
        "log-opts": {
          "max-size": "10m",
          "max-file": "5"
        }
      }
    mode: '0644'
  notify: Redémarrer Docker

- name: Pin Gitea Act Runner images
  community.docker.docker_container:
    name: "{{ item.name }}"
    image: "{{ item.image }}"
    state: present
    command: ["sh", "-c", "sleep infinity"]
    auto_remove: false
    restart_policy: unless-stopped
  loop:
    - { name: "pin-gitea-runner-ubuntu-latest", image: "gitea/runner-images:ubuntu-latest" }
    - { name: "pin-gitea-runner-ubuntu-latest-ca", image: "gitea.arcodange.lab/arcodange-org/runner-images:ubuntu-latest-ca" }
```

## Consequences

### Positive
- **Prevents Image Deletion**: Critical images are pinned and won't be garbage-collected.
- **Preserves Longhorn Performance**: Longhorn continues to use the root filesystem for its operations, maintaining performance.
- **Scalable Storage**: Docker images are stored on the external disk (413G free), preventing root filesystem exhaustion.
- **No k3s Changes Required**: k3s continues to use Docker as the runtime without modification.

### Negative
- **Migration Effort**: Existing Docker data must be migrated to the external disk (one-time operation).
- **Dependency on External Disk**: If `/dev/sda1` fails, Docker will not function until the disk is remounted or the configuration is reverted.
- **Slight Performance Overhead**: Accessing images from the external disk may be slightly slower than the root filesystem (mitigated by SSD/HDD performance).

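
The dependency on the external disk could be made explicit with a guard task that fails early when the disk is not mounted. This is a sketch that assumes `/dev/sda1` is mounted at `/mnt/arcodange` and that facts are gathered for the play; neither is stated in this ADR.

```yaml
# Hypothetical guard task; run before the daemon.json task.
- name: Check that the external disk is mounted before Docker relies on it
  ansible.builtin.assert:
    that:
      # ansible_facts.mounts lists every mounted filesystem on the host.
      - ansible_facts.mounts | selectattr('mount', 'equalto', '/mnt/arcodange') | list | length > 0
    fail_msg: "/mnt/arcodange is not mounted; refusing to point Docker at it."
    success_msg: "/mnt/arcodange is mounted."
```

Failing here avoids the case where Docker starts with an empty `data-root` directory on the SD card because the mount is missing.
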
## Alternatives Considered

### Alternative 1: Increase Root Filesystem Size
- **Rejected**: The SD card is already at capacity, and expanding it is not feasible.

### Alternative 2: Disable Docker Garbage Collection
- **Rejected**: This would risk filling the root filesystem completely, causing system instability.

### Alternative 3: Use k3s Image Garbage Collection
- **Rejected**: k3s does not provide fine-grained control over image retention for non-k8s workloads (e.g., Gitea Act Runner).

### Alternative 4: Save/Load Images Manually
- **Rejected**: Manual intervention is not scalable and does not address the root cause.

## Migration Plan
1. **Backup**: Save critical images to `/mnt/arcodange`:
   ```bash
   docker save gitea.arcodange.lab/arcodange-org/runner-images:ubuntu-latest-ca -o /mnt/arcodange/gitea-runner-backup.tar
   ```
2. **Update Ansible**: Apply the changes to `system_docker.yml`.
3. **Run Playbook**: Execute the playbook to reconfigure Docker.
4. **Verify**: Ensure Gitea Act Runner functions correctly post-migration.

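
The plan above does not spell out how existing Docker data reaches the external disk. One possible approach, sketched here under assumptions, is to copy the current data directory before Docker restarts with the new `data-root`. The sketch assumes the current data lives in the default `/var/lib/docker`, that `rsync` is installed, and that k3s (which uses Docker as its runtime) has been stopped beforehand. The task names are hypothetical.

```yaml
# Hypothetical migration tasks; not part of system_docker.yml.
- name: Stop Docker before copying its data
  ansible.builtin.service:
    name: docker
    state: stopped

- name: Copy existing Docker data to the external disk
  ansible.builtin.command:
    # -a preserves permissions and ownership; -H, -A, -X keep hard links, ACLs, xattrs.
    cmd: rsync -aHAX /var/lib/docker/ /mnt/arcodange/docker/
  changed_when: true

- name: Start Docker again
  ansible.builtin.service:
    name: docker
    state: started
```

On hosts where Docker is socket-activated, `docker.socket` may also need to be stopped so that the daemon is not restarted during the copy.
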
## Success Metrics
- Gitea Act Runner images are no longer deleted between runs.
- Root filesystem usage drops below 80%.
- CI/CD pipelines complete without image pull errors.

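
The 80% target for the root filesystem could be checked from Ansible after the migration. This is a minimal sketch that assumes facts are gathered for the play; the task and variable names are hypothetical.

```yaml
# Hypothetical verification task.
- name: Verify that root filesystem usage is below 80%
  vars:
    root_mount: "{{ ansible_facts.mounts | selectattr('mount', 'equalto', '/') | first }}"
    # Share of the root filesystem that is no longer available, in percent.
    used_pct: "{{ (100 * (1 - root_mount.size_available / root_mount.size_total)) | round(1) }}"
  ansible.builtin.assert:
    that:
      - used_pct | float < 80
    fail_msg: "Root filesystem is {{ used_pct }}% full."
    success_msg: "Root filesystem is {{ used_pct }}% full."
```
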
@@ -0,0 +1,334 @@
# ADR 20260407: Network Architecture

## Status
Proposed

## Context
The home lab requires a secure and resilient network architecture to support:
- Internal services (`.lab` domain).
- External services (`.arcodange.fr` domain).
- DNS resolution and ad-blocking (Pi-hole).
- TLS certificate management (Step CA).
- Ingress routing (Traefik).
- CDN and DDoS protection (Cloudflare).

## Decision
We will implement a **multi-layered network architecture** with the following components:

### 1. External Layer (Internet)
- **Cloudflare**: CDN, DDoS protection, and DNS for `.arcodange.fr`.
- **DuckDNS**: Dynamic DNS for external access.
- **Livebox**: ISP-provided gateway (NAT, DHCP, firewall).

### 2. Internal Layer (Home Lab)
- **Pi-hole (pi1, pi3)**: DNS sinkhole for ad-blocking and internal DNS resolution.
- **Step CA (pi1)**: Internal certificate authority for `.lab` domain.
- **Traefik (k3s)**: Ingress controller with TLS termination.
- **k3s Cluster**: Hosts internal services with Longhorn storage.

### 3. DNS Architecture
- **Pi-hole**: Primary DNS for internal clients.
  - Forwards `.lab` queries to Step CA.
  - Forwards external queries to Cloudflare (1.1.1.1).
- **Step CA**: Issues certificates for `.lab` services.
- **Cloudflare**: Manages `.arcodange.fr` DNS records.

### 4. Ingress and TLS
- **Traefik**: Terminates TLS for both `.lab` and `.arcodange.fr` domains.
  - Uses Let's Encrypt for `.arcodange.fr`.
  - Uses Step CA for `.lab`.
- **Helm Chart Annotations**:
  - `traefik.ingress.kubernetes.io/router.entrypoints: websecure`
  - `traefik.ingress.kubernetes.io/router.tls.certresolver: letsencrypt`
  - `traefik.ingress.kubernetes.io/router.middlewares: localIp@file`

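
Applied to a Kubernetes `Ingress`, the annotations listed above would look roughly like this. The sketch is illustrative: the resource name, namespace, backend service, and port are hypothetical, and the host is the example name used in the sequence diagrams of this ADR.

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: webapp              # hypothetical name
  namespace: webapp         # hypothetical namespace
  annotations:
    # Accept traffic on Traefik's HTTPS entry point only.
    traefik.ingress.kubernetes.io/router.entrypoints: websecure
    # Ask the named certificate resolver for the certificate.
    traefik.ingress.kubernetes.io/router.tls.certresolver: letsencrypt
    # Apply the middleware defined in Traefik's file provider.
    traefik.ingress.kubernetes.io/router.middlewares: localIp@file
spec:
  rules:
    - host: webapp.arcodange.fr
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: webapp  # hypothetical Service
                port:
                  number: 80  # assumed port
```
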
### 5. Security
- **Cloudflare Tunnel**: Securely exposes internal services without port forwarding.
- **CrowdSec**: Intrusion detection and banning.
- **Traefik Middlewares**: IP filtering, rate limiting, and authentication.
- **Cloudflare Turnstile**: CAPTCHA protection for public-facing services.

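
The IP filtering and rate limiting middlewares could be declared in Traefik's file provider along these lines. This is a sketch under assumptions: the ADR names the `localIp@file` middleware but does not show its definition, so the source range (taken from the internal network in the diagrams), the `rateLimit` middleware name, and its values are hypothetical. The `ipAllowList` key is the Traefik v3 name; older v2 releases call it `ipWhiteList`.

```yaml
# Hypothetical dynamic configuration for Traefik's file provider.
http:
  middlewares:
    localIp:
      ipAllowList:
        sourceRange:
          - 192.168.1.0/24   # internal network; assumed to be the intended range
    rateLimit:               # hypothetical middleware name
      rateLimit:
        average: 100         # assumed requests per second
        burst: 50            # assumed burst size
```
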
## Architecture Diagrams

### 0. High-Level Network Architecture (Architecture Beta)

```mermaid
%%{init: {'theme': 'neutral', 'themeVariables': {
    'primaryColor': '#f0f0f0',
    'primaryBorderColor': '#333333',
    'primaryTextColor': '#333333',
    'lineColor': '#333333',
    'tertiaryColor': '#e67e22'
}}}%%
architectureBeta
    %% External Layer
    box "Internet" #f9f9f9
        component Cloudflare["Cloudflare\n(CDN/DNS)"] #f9f9f9
        component DuckDNS["DuckDNS\n(DDNS)"] #f9f9f9
    end

    %% External Gateway
    box "External Gateway" #e6e6e6
        component Livebox["Livebox\n(NAT/Firewall)"] #e6e6e6
    end

    %% Internal Layer
    box "Internal Network\n(192.168.1.0/24)" #d4d4d4
        %% DNS Layer
        box "DNS" #ffff99
            component PiHole1["Pi-hole\n(pi1)"] #ffff99
            component PiHole3["Pi-hole\n(pi3)"] #ffff99
            component StepCA["Step CA\n(pi1)"] #ccccff
        end

        %% k3s Layer
        box "k3s Cluster" #ff9999
            component Traefik["Traefik\n(Ingress)"] #ff9999
            component CrowdSec["CrowdSec\n(Security)"] #ff9999
            component Gitea["Gitea\n(pi2)"] #ffcc99
            component Vault["Vault\n(Secrets)"] #ccccff
        end
    end

    %% Connections
    Cloudflare --> Livebox : "DNS"
    DuckDNS --> Livebox : "DDNS"
    Livebox --> PiHole1 : "NAT"
    Livebox --> PiHole3 : "NAT"
    Livebox --> Traefik : "NAT"
    PiHole1 --> StepCA : "Forward .lab"
    PiHole1 --> Cloudflare : "Forward External"
    PiHole3 --> StepCA : "Forward .lab"
    PiHole3 --> Cloudflare : "Forward External"
    Traefik --> Cloudflare : "TLS (Let's Encrypt)"
    Traefik --> StepCA : "TLS (Step CA)"
    CrowdSec --> Traefik : "Ban IPs"
    Traefik --> Gitea : "Route"
    Traefik --> Vault : "Route"
```

### 1. High-Level Network Architecture

```mermaid
%%{init: {'theme': 'base', 'themeVariables': { 'primaryColor': '#333333', 'edgeLabelBackground':'#f0f0f0', 'tertiaryColor': '#f89136'}}}%%
graph TD
    %% Styles
    classDef internet fill:#f9f9f9,stroke:#999,color:#333;
    classDef external fill:#e6e6e6,stroke:#555,color:#333;
    classDef internal fill:#d4d4d4,stroke:#777,color:#333;
    classDef security fill:#ff9999,stroke:#cc0000,color:#333;
    classDef dns fill:#ffff99,stroke:#cccc00,color:#333;
    classDef ca fill:#ccccff,stroke:#6666cc,color:#333;

    %% Internet
    subgraph "Internet"
        Cloudflare["Cloudflare (CDN/DNS)"]:::internet
        DuckDNS["DuckDNS (DDNS)"]:::internet
    end

    %% External Gateway
    subgraph "External Gateway"
        Livebox["Livebox (NAT/Firewall)"]:::external
    end

    %% Internal Network
    subgraph "Internal Network (192.168.1.0/24)"
        %% Pi-hole DNS
        PiHole1["Pi-hole (pi1)"]:::dns
        PiHole3["Pi-hole (pi3)"]:::dns

        %% Step CA
        StepCA["Step CA (pi1)"]:::ca

        %% k3s Cluster
        k3s["k3s Cluster"]:::internal
        Traefik["Traefik (k3s)"]:::internal
        CrowdSec["CrowdSec (k3s)"]:::security

        %% Services
        Gitea["Gitea (pi2)"]:::internal
        Vault["Vault (k3s)"]:::internal
    end

    %% Connections
    Cloudflare -->|DNS| Livebox
    DuckDNS -->|DDNS| Livebox
    Livebox -->|NAT| PiHole1
    Livebox -->|NAT| PiHole3
    Livebox -->|NAT| k3s

    %% Internal DNS
    PiHole1 -->|Forward .lab| StepCA
    PiHole1 -->|Forward External| Cloudflare
    PiHole3 -->|Forward .lab| StepCA
    PiHole3 -->|Forward External| Cloudflare

    %% Ingress
    Traefik -->|"TLS (Let's Encrypt)"| Cloudflare
    Traefik -->|"TLS (Step CA)"| StepCA
    CrowdSec -->|Ban IPs| Traefik

    %% Service Access
    Traefik -->|Route| Gitea
    Traefik -->|Route| Vault
```

### 2. DNS Resolution Flow

```mermaid
sequenceDiagram
    participant Client
    participant PiHole
    participant StepCA
    participant Cloudflare
    participant ExternalDNS

    Client->>PiHole: Query example.lab
    PiHole->>StepCA: Forward .lab query
    StepCA-->>PiHole: Return A record
    PiHole-->>Client: Return response

    Client->>PiHole: Query example.com
    PiHole->>Cloudflare: Forward to 1.1.1.1
    Cloudflare->>ExternalDNS: Resolve externally
    ExternalDNS-->>Cloudflare: Return response
    Cloudflare-->>PiHole: Return response
    PiHole-->>Client: Return response
```

### 3. Ingress and TLS Flow

```mermaid
sequenceDiagram
    participant User
    participant Cloudflare
    participant Traefik
    participant StepCA
    participant Service

    User->>Cloudflare: HTTPS Request (webapp.arcodange.fr)
    Cloudflare->>Traefik: Forward to internal IP
    Traefik->>Let's Encrypt: Request Certificate
    Let's Encrypt-->>Traefik: Issue Certificate
    Traefik->>Service: Route request
    Service-->>Traefik: Return response
    Traefik-->>Cloudflare: Return HTTPS response
    Cloudflare-->>User: Return response

    User->>Traefik: HTTPS Request (webapp.arcodange.lab)
    Traefik->>StepCA: Request Certificate
    StepCA-->>Traefik: Issue Certificate
    Traefik->>Service: Route request
    Service-->>Traefik: Return response
    Traefik-->>User: Return HTTPS response
```

### 4. Security Flow (CrowdSec + Traefik)

```mermaid
sequenceDiagram
    participant Attacker
    participant Traefik
    participant CrowdSec
    participant BannedIPs

    Attacker->>Traefik: Malicious Request
    Traefik->>CrowdSec: Log suspicious activity
    CrowdSec->>BannedIPs: Add IP to ban list
    BannedIPs-->>Traefik: Update middleware
    Traefik-->>Attacker: Block request (403)
```

## Playbook and Role Analysis

### 1. Pi-hole Deployment
- **Playbook**: `playbooks/system/pihole.yml`
- **Role**: `arcodange.factory.pihole`
- **Configuration**:
  - Upstream DNS: Cloudflare (1.1.1.1) and Step CA for `.lab`.
  - Blocklists: Ad-blocking and malware domains.

### 2. Step CA Deployment
- **Playbook**: `playbooks/ssl/ssl.yml`
- **Role**: `step_ca`
- **Configuration**:
  - Internal CA for `.lab` domain.
  - Short-lived certificates (default: 24h).

### 3. Traefik Deployment
- **Playbook**: `playbooks/system/system_k3s.yml` (via k3s)
- **Helm Chart**: `traefik` (installed via k3s)
- **Key Annotations**:
  ```yaml
  traefik.ingress.kubernetes.io/router.entrypoints: websecure
  traefik.ingress.kubernetes.io/router.tls.certresolver: letsencrypt
  traefik.ingress.kubernetes.io/router.middlewares: localIp@file
  ```

### 4. CrowdSec Deployment
- **Playbook**: `playbooks/tools/crowdsec.yml`
- **Role**: `arcodange.factory.crowdsec`
- **Configuration**:
  - Bouncer integration with Traefik.
  - Custom scenarios for brute-force and bot detection.

## Consequences

### Positive
- **Resilient DNS**: Pi-hole provides ad-blocking and internal DNS resolution.
- **Secure TLS**: Step CA for internal services, Let's Encrypt for external.
- **DDoS Protection**: Cloudflare absorbs external attacks.
- **Intrusion Detection**: CrowdSec bans malicious IPs automatically.

### Negative
- **Complexity**: Multiple layers require careful configuration.
- **Single Point of Failure**: Pi-hole is critical for internal DNS.
- **Certificate Management**: Step CA requires maintenance for `.lab` domain.

## Alternatives Considered

### Alternative 1: Public DNS for `.lab`
- **Rejected**: Exposing internal domains is a security risk.

### Alternative 2: No Ad-Blocking
- **Rejected**: Pi-hole provides essential security and privacy.

### Alternative 3: Self-Signed Certificates
- **Rejected**: Step CA provides better usability with short-lived certs.

### 5. Cloudflare Turnstile + CrowdSec Flow

```mermaid
sequenceDiagram
    participant User
    participant Cloudflare
    participant Turnstile
    participant Traefik
    participant CrowdSec
    participant BannedIPs

    User->>Cloudflare: Request protected endpoint
    Cloudflare->>Turnstile: Challenge (CAPTCHA)
    Turnstile-->>Cloudflare: Return token
    Cloudflare->>Traefik: Forward request with token

    alt Valid Token
        Traefik->>Service: Route request
        Service-->>Traefik: Return response
        Traefik-->>Cloudflare: Return response
        Cloudflare-->>User: Return success
    else Invalid Token
        Traefik->>CrowdSec: Log suspicious activity
        CrowdSec->>BannedIPs: Add IP to ban list
        BannedIPs-->>Traefik: Update middleware
        Traefik-->>Cloudflare: Block request (403)
        Cloudflare-->>User: Return "Access Denied"
    end
```

## Success Metrics
- Pi-hole blocks >50% of ads and trackers.
- Step CA issues certificates without downtime.
- Traefik routes 100% of external traffic via Cloudflare.
- CrowdSec bans >10 malicious IPs per day.
- Cloudflare Turnstile blocks >90% of bot traffic.