docs(adr): extend network-architecture ADR with .lab SSL/TLS deep dive
Replaces the placeholder "Success Metrics" section with a detailed walkthrough of the internal PKI: Step CA provisioners, cert-manager + StepClusterIssuer wiring, certificate issuance/renewal sequence diagram, device-trust installation steps, and a troubleshooting playbook for the common stuck-CertificateRequest / Traefik TLS / device-trust failures.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@@ -326,9 +326,251 @@ sequenceDiagram
end
```

## Deep Dive: `.lab` Domain SSL/TLS Architecture

### Overview

The `.lab` domain relies on a **zero-trust internal PKI** (Public Key Infrastructure) powered by **Step CA** and integrated with **k3s**, **Traefik**, and **cert-manager**. This section details the components, their interactions, and the operational workflows.

### Core Components

#### 1. **Step CA (Certificate Authority)**

- **Host**: `pi1` (primary), with standby nodes for resilience.
- **Ports**: `8443` (HTTPS), `443` (ACME).
- **Provisioners**:
  - `cert-manager`: Dedicated to k3s workloads.
  - `admin`: For manual certificate issuance.
- **Certificate Lifecycle**:
  - **Short-lived certificates** (default: 24h).
  - **Automatic renewal** via cert-manager.
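  - **Passive revocation**: a revoked certificate is refused at renewal time. Step CA does not provide OCSP, so the short lifetime is what bounds how long a compromised certificate stays usable.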
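
Before debugging anything downstream, it helps to confirm that Step CA answers on the HTTPS port listed above. Step CA serves a `/health` endpoint that returns `{"status":"ok"}` when it is up. The sketch below is a minimal probe under stated assumptions: `curl` is installed, and `CA_URL`, `ROOT_CA`, and the function names are hypothetical choices for this lab, not part of the existing tooling.

```bash
#!/usr/bin/env bash
# Sketch: probe Step CA's /health endpoint over TLS, trusting only the lab root.
# CA_URL and ROOT_CA are hypothetical defaults; override them in the environment.
CA_URL="${CA_URL:-https://ssl-ca.arcodange.lab:8443}"
ROOT_CA="${ROOT_CA:-./arcodange-lab-ca.crt}"

# Fetch the raw health document; --cacert pins trust to the lab root CA.
fetch_ca_health() {
  curl -fsS --max-time 5 --cacert "$ROOT_CA" "$CA_URL/health"
}

# Return 0 if a health JSON document (read from stdin) reports status "ok".
# Whitespace is stripped so both compact and pretty-printed JSON match.
is_health_ok() {
  tr -d ' \t\r\n' | grep -q '"status":"ok"'
}

# Combine both: print a one-line verdict and return non-zero on failure.
check_ca() {
  if fetch_ca_health | is_health_ok; then
    echo "Step CA at $CA_URL is healthy"
  else
    echo "Step CA at $CA_URL is unreachable or unhealthy" >&2
    return 1
  fi
}
```

Running `check_ca` from a cluster node separates network problems (a timeout) from trust problems (a TLS verification error reported by `curl`).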

#### 2. **cert-manager**

- **Namespace**: `cert-manager`.
- **CRDs**:
  - `Certificate`: Declares the desired certificate (DNS names, issuer, target `Secret`).
  - `CertificateRequest`: A single signing request (CSR) fulfilled by an issuer; here, signed by Step CA.
  - `ClusterIssuer`/`Issuer`: cert-manager's built-in issuer kinds, referencing Step CA.
  - `StepClusterIssuer`: External issuer CRD for Step CA, installed by smallstep's `step-issuer` controller rather than by cert-manager itself.
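
Because `StepClusterIssuer` lives in a different API group from cert-manager's own kinds, any `issuerRef` that points at it must carry `group: certmanager.step.sm`. The sketch below encodes that mapping for the kinds in this ADR only, and uses `kubectl api-resources --api-group=...` to confirm that both sets of CRDs are registered; the function names are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: which API group does each kind used in this ADR belong to?
# Only the kinds discussed in this section are covered; names are hypothetical.
crd_group_for_kind() {
  case "$1" in
    Certificate|CertificateRequest|Issuer|ClusterIssuer) echo "cert-manager.io" ;;
    StepClusterIssuer|StepIssuer) echo "certmanager.step.sm" ;;
    *) echo "unknown kind: $1" >&2; return 1 ;;
  esac
}

# Confirm the CRDs for both groups are registered in the cluster (needs kubectl).
check_crds_installed() {
  local group
  for group in cert-manager.io certmanager.step.sm; do
    if kubectl api-resources --api-group="$group" --no-headers | grep -q .; then
      echo "ok: $group"
    else
      echo "missing CRDs for API group $group" >&2
      return 1
    fi
  done
}
```

For example, `crd_group_for_kind StepClusterIssuer` prints `certmanager.step.sm`, the value used in the `Certificate` manifest's `issuerRef.group`.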

#### 3. **StepClusterIssuer**

- **Purpose**: Bridges cert-manager with Step CA.
- **Configuration**:

  ```yaml
  apiVersion: certmanager.step.sm/v1beta1
  kind: StepClusterIssuer
  metadata:
    name: step-issuer          # cluster-scoped, so no metadata.namespace
  spec:
    url: "https://ssl-ca.arcodange.lab:8443"
    caBundle: "<base64-encoded-root-ca>"
    provisioner:
      name: cert-manager
      kid: "<key-id>"
      passwordRef:
        namespace: cert-manager  # namespace of the password Secret (cluster-scoped issuer)
        name: step-jwk-password
        key: password
  ```

- **Workflow**:
  1. cert-manager creates a `CertificateRequest` from the `Certificate`.
  2. The `step-issuer` controller, configured by the `StepClusterIssuer`, forwards the CSR to Step CA, authenticating with the `cert-manager` JWK provisioner.
  3. Step CA signs the certificate; the controller writes it back to the `CertificateRequest`.
  4. cert-manager stores the certificate and private key in a Kubernetes `Secret`.
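
The two placeholders in the manifest above can be derived from the CA itself: `caBundle` is the base64-encoded PEM root certificate, and `kid` is the key ID of the `cert-manager` JWK provisioner. The sketch below assumes `step` and `jq` are installed and that `step ca provisioner list` prints a JSON array in which a JWK provisioner's key ID sits under `.key.kid`; the helper names are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: derive the StepClusterIssuer placeholders from the CA.
# Assumes `step` and `jq` are installed; function names are hypothetical.

# Base64-encode a PEM file on a single line (no wrapping), as caBundle expects.
# `tr -d '\n'` keeps this portable across GNU and BSD base64.
ca_bundle_from_pem() {
  base64 < "$1" | tr -d '\n'
}

# Print the key ID (kid) of the named JWK provisioner.
provisioner_kid() {
  local name="$1"
  step ca provisioner list \
    | jq -r --arg n "$name" '.[] | select(.type == "JWK" and .name == $n) | .key.kid'
}

# Create or update the password Secret referenced by passwordRef (needs kubectl).
# Reading from a file keeps the password out of shell history; note that a
# trailing newline in the file would become part of the stored value.
apply_password_secret() {
  local password_file="$1"
  kubectl -n cert-manager create secret generic step-jwk-password \
    --from-file=password="$password_file" \
    --dry-run=client -o yaml | kubectl apply -f -
}
```

For example, `ca_bundle_from_pem ./arcodange-lab-ca.crt` and `provisioner_kid cert-manager` print the values to paste into `caBundle` and `kid`.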

#### 4. **Traefik Ingress Controller**

- **Namespace**: `kube-system`.
- **TLS Configuration**:
  - **EntryPoints**: `websecure` (HTTPS), `web` (HTTP → redirect to HTTPS).
  - **Certificate Resolvers**:
    - `letsencrypt`: For `.arcodange.fr` (public).
    - `step-ca`: For `.lab` (internal).
- **Middlewares**:
  - `localIp@file`: IP allowlisting.
  - `crowdsec-bouncer`: Intrusion prevention.
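
The resolver split above is effectively a hostname-suffix rule. The sketch below mirrors it so a list of hostnames can be audited for anything that falls outside both zones; it is a hypothetical helper that encodes only the two resolvers listed here and does not query Traefik.

```bash
#!/usr/bin/env bash
# Sketch: pick the certificate resolver for a hostname, mirroring the split above.
# Hypothetical helper; it does not query Traefik.
resolver_for_host() {
  local host
  host=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')  # DNS names are case-insensitive
  host="${host%.}"                                        # tolerate a trailing dot (FQDN)
  case "$host" in
    arcodange.fr|*.arcodange.fr) echo "letsencrypt" ;;
    *.lab)                       echo "step-ca" ;;
    *) echo "no resolver for $host" >&2; return 1 ;;
  esac
}

# Audit hostnames read from stdin, one per line; unmatched names print UNROUTED.
audit_hosts() {
  local h
  while IFS= read -r h; do
    [[ -z "$h" ]] && continue
    printf '%-30s %s\n' "$h" "$(resolver_for_host "$h" 2>/dev/null || echo UNROUTED)"
  done
}
```

For standard `Ingress` objects, piping `kubectl get ingress -A -o jsonpath='{range .items[*].spec.rules[*]}{.host}{"\n"}{end}'` into `audit_hosts` flags hosts matching neither suffix.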

#### 5. **Certificate and CertificateRequest**

- **Example `Certificate` for `.lab`**:

  ```yaml
  apiVersion: cert-manager.io/v1
  kind: Certificate
  metadata:
    name: wildcard-arcodange-lab
    namespace: kube-system
  spec:
    secretName: wildcard-arcodange-lab-tls
    issuerRef:
      name: step-issuer
      kind: StepClusterIssuer
      group: certmanager.step.sm
    dnsNames:
      - "*.arcodange.lab"
      - "arcodange.lab"
  ```

- **Generated `CertificateRequest`**:
  - Created automatically by cert-manager.
  - References the `StepClusterIssuer`.
  - Status progression: `Pending` → `Approved` → `Ready` (in cert-manager terms, the `Approved` condition becomes `True`, then `Ready` becomes `True` with reason `Issued`).
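
The `Certificate` lists both `*.arcodange.lab` and `arcodange.lab` because a TLS wildcard covers exactly one left-most label: it matches `gitea.arcodange.lab` but neither the bare apex nor `a.b.arcodange.lab` (the RFC 6125 rule browsers follow). The sketch below implements that matching so you can check which hostnames a certificate's `dnsNames` actually cover; the function names are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: does a hostname match a certificate dnsName? Follows the common
# single-label wildcard rule (RFC 6125): "*.example" matches "a.example"
# but not "example" or "a.b.example". Hypothetical helper names.
dns_name_matches() {
  local host pattern
  host=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  pattern=$(printf '%s' "$2" | tr '[:upper:]' '[:lower:]')
  if [[ "$pattern" == "*."* ]]; then
    local suffix="${pattern#\*}"        # e.g. ".arcodange.lab"
    local label="${host%"$suffix"}"     # the part the wildcard must cover
    # Host must end with the suffix, and the covered part must be one label.
    [[ "$host" == *"$suffix" && -n "$label" && "$label" != *.* ]]
  else
    [[ "$host" == "$pattern" ]]
  fi
}

# covered_by <host> <dnsName>...: succeed if any dnsName covers the host.
covered_by() {
  local host="$1" name
  shift
  for name in "$@"; do
    dns_name_matches "$host" "$name" && return 0
  done
  return 1
}
```

Quote the wildcard so the shell does not glob it: `covered_by arcodange.lab '*.arcodange.lab'` fails, while adding `arcodange.lab` as another argument succeeds, which is why the manifest lists the apex explicitly.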

#### 6. **k3s Cluster Integration**

- **Nodes**: `pi1` (control plane), `pi2`, `pi3` (workers).
- **Storage**: Longhorn for persistent volumes.
- **Networking**:
  - **CNI**: Flannel.
  - **Ingress**: Traefik.
  - **Service mesh**: Linkerd (optional).

### Workflow: Certificate Issuance and Renewal

```mermaid
sequenceDiagram
    participant App as Application (e.g., Gitea)
    participant Cert as Certificate
    participant CR as CertificateRequest
    participant SCI as StepClusterIssuer
    participant StepCA as Step CA
    participant Secret as Kubernetes Secret
    participant Traefik as Traefik

    App->>Cert: Declare desired certificate
    Cert->>CR: Create CertificateRequest
    CR->>SCI: Forward to StepClusterIssuer
    SCI->>StepCA: Sign CSR (via JWK provisioner)
    StepCA-->>SCI: Return signed certificate
    SCI-->>CR: Attach signed certificate
    CR->>Secret: cert-manager stores certificate/key
    Secret-->>Traefik: Load as TLS secret
    Traefik->>App: Route traffic with TLS

    loop Every 2/3 of certificate lifetime
        Cert->>CR: Trigger renewal
        CR->>SCI: Submit new CSR
        SCI->>StepCA: Request new certificate
        StepCA-->>SCI: Return signed certificate
        SCI-->>CR: Attach signed certificate
        CR->>Secret: Update secret
    end
```
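
The loop's "2/3 of certificate lifetime" follows from cert-manager's default when `renewBefore` is not set: renewal begins once one third of the lifetime remains. A sketch of that arithmetic, assuming durations written with a single `h`, `m`, or `s` suffix (the helper names are hypothetical), shows how much slack the 24h default leaves:

```bash
#!/usr/bin/env bash
# Sketch: when does cert-manager start renewing a certificate?
# Assumes the default behaviour when spec.renewBefore is unset: renew once
# one third of the lifetime remains, i.e. at two thirds of the lifetime.

# Convert "24h", "90m" or "30s" to seconds (single-suffix durations only).
duration_to_seconds() {
  local d="$1" n="${1%[hms]}"
  [[ "$n" =~ ^[0-9]+$ ]] || { echo "unsupported duration: $d" >&2; return 1; }
  case "$d" in
    *h) echo $(( n * 3600 )) ;;
    *m) echo $(( n * 60 )) ;;
    *s) echo "$n" ;;
    *)  echo "unsupported duration: $d" >&2; return 1 ;;
  esac
}

# Seconds after issuance at which renewal begins (lifetime - lifetime/3).
renewal_offset_seconds() {
  local total
  total=$(duration_to_seconds "$1") || return 1
  echo $(( total - total / 3 ))
}

for ttl in 24h 1h; do
  offset=$(renewal_offset_seconds "$ttl")
  echo "$ttl certificate: renewal starts about $(( offset / 60 )) min after issuance"
done
```

With the 24h default, renewal starts about 16 hours in, leaving roughly 8 hours to fix a broken issuer before the certificate expires; the 1h TTL proposed under Future Enhancements would shrink that margin to about 20 minutes.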

### Device Trust: Adding `.lab` CA to External Devices

#### **Manual Trust Installation**

1. **Export Root CA**:

   ```bash
   scp pi1:/home/step/.step/certs/root_ca.crt ./arcodange-lab-ca.crt
   ```

2. **Install on Devices**:
   - **macOS**:

     ```bash
     sudo security add-trusted-cert -d -r trustRoot -k /Library/Keychains/System.keychain ./arcodange-lab-ca.crt
     ```

   - **Linux (Debian/Ubuntu)**:

     ```bash
     sudo cp arcodange-lab-ca.crt /usr/local/share/ca-certificates/
     sudo update-ca-certificates
     ```

   - **Windows**: Import via `certmgr.msc` → **Trusted Root Certification Authorities**.
   - **Android/iOS**: Email the `.crt` and install it via device settings. On iOS, also enable full trust under **Settings → General → About → Certificate Trust Settings**.
   - **Raspberry Pi** (Raspberry Pi OS is Debian-based, so `update-ca-certificates` reads from the same directory):

     ```bash
     sudo cp arcodange-lab-ca.crt /usr/local/share/ca-certificates/
     sudo update-ca-certificates
     ```
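
A fingerprint check guards against trusting the wrong file (a stale copy, or a root copied from the wrong host) before it is installed system-wide. A minimal sketch, assuming `openssl` on the workstation and a reference SHA-256 fingerprint obtained out of band (for example with `step certificate fingerprint` on `pi1`); the helper names are hypothetical:

```bash
#!/usr/bin/env bash
# Sketch: check an exported root CA against a fingerprint obtained out of band.
# Hypothetical helper names; requires openssl for the actual fingerprinting.

# Normalise fingerprints from different tools to lowercase hex without colons.
# Handles "sha256 Fingerprint=AB:CD:..." (openssl) and bare hex strings.
normalize_fp() {
  local fp="${1#*=}"          # drop any "...Fingerprint=" prefix
  fp="${fp//:/}"              # drop colon separators
  printf '%s' "$fp" | tr '[:upper:]' '[:lower:]' | tr -d ' \r\n'
}

# SHA-256 fingerprint of a PEM certificate, normalised.
cert_fp() {
  normalize_fp "$(openssl x509 -in "$1" -noout -fingerprint -sha256)"
}

# verify_root <cert-file> <expected-fingerprint>: succeed only on an exact match.
verify_root() {
  local actual expected
  actual=$(cert_fp "$1") || return 1
  expected=$(normalize_fp "$2")
  if [[ -n "$actual" && "$actual" == "$expected" ]]; then
    echo "fingerprint matches: safe to install $1"
  else
    echo "fingerprint mismatch for $1: do not install" >&2
    return 1
  fi
}
```

Run `verify_root ./arcodange-lab-ca.crt <fingerprint-from-pi1>` before any of the per-platform install commands.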

#### **Automated Trust via Ansible**

- **Playbook**: `playbooks/system/trust_ca.yml`
- **Role**: `arcodange.factory.trust_ca`
- **Targets**: All nodes in the `raspberries` group.
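
After the role runs, one way to confirm a node really trusts the CA is to make an HTTPS request with the system trust store and read `curl`'s exit code: 60 means the chain could not be verified against the installed CAs, while 6, 7, and 28 point at DNS, connection, and timeout problems instead. The sketch assumes `curl` on the target node; the helper names are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: classify curl's exit code when probing a .lab service from a node.
# Exit codes are curl's documented ones; helper names are hypothetical.
explain_curl_exit() {
  case "$1" in
    0)  echo "trusted: TLS handshake and chain verification succeeded" ;;
    6)  echo "dns: could not resolve host" ;;
    7)  echo "network: failed to connect" ;;
    28) echo "network: operation timed out" ;;
    35) echo "tls: handshake failed" ;;
    60) echo "untrusted: chain not verifiable with installed CAs" ;;
    *)  echo "other: curl exit code $1" ;;
  esac
}

# Probe a URL using only the system trust store (no -k, no --cacert).
probe_trust() {
  local url="$1" rc=0
  curl -sS -o /dev/null --max-time 5 "$url" 2>/dev/null || rc=$?
  explain_curl_exit "$rc"
  [[ "$rc" -eq 0 ]]
}
```

Across the whole group, an ad-hoc run such as `ansible raspberries -m ansible.builtin.shell -a 'curl -sS -o /dev/null https://ssl-ca.arcodange.lab:8443/health; echo $?'` prints the raw exit code per node, which `explain_curl_exit` then translates.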

### Troubleshooting Common Issues

#### 1. **Certificate Not Issued**

- **Symptoms**: `CertificateRequest` stuck in `Pending`.
- **Causes**:
  - Step CA unreachable.
  - Incorrect `caBundle` or provisioner `kid`.
  - Network policies blocking egress to Step CA.
- **Fixes**:

  ```bash
  # List requests with their approval and readiness columns
  kubectl get certificaterequests -A

  # Check StepClusterIssuer status (cluster-scoped, so no -n flag)
  kubectl describe stepclusterissuer step-issuer

  # Check the step-issuer controller logs for connection or auth errors
  kubectl -n cert-manager logs -l app.kubernetes.io/name=step-issuer

  # Test Step CA manually
  step ca health --ca-url https://ssl-ca.arcodange.lab:8443 \
    --root /home/step/.step/certs/root_ca.crt
  step ca certificate --ca-url https://ssl-ca.arcodange.lab:8443 \
    --root /home/step/.step/certs/root_ca.crt \
    test.lab test.crt test.key
  ```
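
"Stuck in `Pending`" can mean two different things: the request was never approved, or it was approved but the issuer has not returned a certificate yet. cert-manager records these as separate `Approved`, `Denied`, and `Ready` conditions, so a small sketch can read them with a `jsonpath` query and name the phase; the helper names and the `|`-separated format are assumptions for illustration.

```bash
#!/usr/bin/env bash
# Sketch: name the phase of a CertificateRequest from its status conditions.
# Input: "approved|denied|ready|readyReason" (empty fields when a condition is absent).
cr_phase() {
  local approved denied ready reason
  IFS='|' read -r approved denied ready reason <<< "$1"
  if [[ "$denied" == "True" ]]; then
    echo "Denied"
  elif [[ "$approved" != "True" ]]; then
    echo "AwaitingApproval"   # nothing has approved the request yet
  elif [[ "$ready" == "True" ]]; then
    echo "Issued"
  elif [[ "$reason" == "Failed" ]]; then
    echo "Failed"             # the issuer gave up; inspect with kubectl describe
  else
    echo "AwaitingIssuer"     # approved, still waiting on step-issuer / Step CA
  fi
}

# cr_conditions <namespace> <request-name>: fetch the four fields (needs kubectl).
cr_conditions() {
  local ns="$1" name="$2"
  local q='{.status.conditions[?(@.type=="Approved")].status}|{.status.conditions[?(@.type=="Denied")].status}|{.status.conditions[?(@.type=="Ready")].status}|{.status.conditions[?(@.type=="Ready")].reason}'
  kubectl -n "$ns" get certificaterequest "$name" -o jsonpath="$q"
}
```

`cr_phase "$(cr_conditions kube-system <request-name>)"` printing `AwaitingApproval` suggests an approval or RBAC problem on the cert-manager side, while `AwaitingIssuer` points back at the causes listed above.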

#### 2. **Traefik TLS Errors**

- **Symptoms**: `502 Bad Gateway` or TLS handshake failures.
- **Causes**:
  - Missing certificate in `Secret`.
  - Incorrect SNI routing.
  - Expired certificates.
- **Fixes**:

  ```bash
  # Check Traefik logs
  kubectl -n kube-system logs -l app.kubernetes.io/name=traefik

  # Verify the certificate secret, then decode the stored certificate's subject and expiry
  kubectl -n kube-system get secret wildcard-arcodange-lab-tls -o yaml
  kubectl -n kube-system get secret wildcard-arcodange-lab-tls \
    -o jsonpath='{.data.tls\.crt}' | base64 -d | openssl x509 -noout -subject -enddate

  # Restart Traefik
  kubectl -n kube-system rollout restart deployment/traefik
  ```
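
For the "missing certificate in `Secret`" case, the quickest signal is whether `tls.crt` and `tls.key` are both present and whether `tls.crt` decodes to a PEM certificate. The sketch keeps the check in pure shell on the base64 values, so it can be fed from `kubectl ... -o jsonpath`; the helper names are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: sanity-check the data of a kubernetes.io/tls Secret.
# Arguments are the base64 values of tls.crt and tls.key as stored in the Secret.
tls_secret_ok() {
  local crt_b64="$1" key_b64="$2" crt
  if [[ -z "$crt_b64" || -z "$key_b64" ]]; then
    echo "missing tls.crt or tls.key" >&2
    return 1
  fi
  crt=$(printf '%s' "$crt_b64" | base64 -d 2>/dev/null) || {
    echo "tls.crt is not valid base64" >&2
    return 1
  }
  if [[ "$crt" != *"-----BEGIN CERTIFICATE-----"* ]]; then
    echo "tls.crt does not contain a PEM certificate" >&2
    return 1
  fi
  echo "tls.crt and tls.key present"
}

# check_tls_secret <namespace> <secret-name>: pull both fields and check (needs kubectl).
check_tls_secret() {
  local ns="$1" name="$2"
  tls_secret_ok \
    "$(kubectl -n "$ns" get secret "$name" -o jsonpath='{.data.tls\.crt}')" \
    "$(kubectl -n "$ns" get secret "$name" -o jsonpath='{.data.tls\.key}')"
}
```

`check_tls_secret kube-system wildcard-arcodange-lab-tls` covers the example `Certificate`; if it passes, SNI routing and expiry are the next suspects.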

#### 3. **Device Trust Issues**

- **Symptoms**: Browser warnings such as `NET::ERR_CERT_AUTHORITY_INVALID` (CA not trusted) or `NET::ERR_CERT_DATE_INVALID` (clock skew).
- **Causes**:
  - CA not installed in the device trust store.
  - Clock skew, which puts the current time outside the certificate's validity window.
- **Fixes**:
  - Reinstall the CA certificate.
  - Sync the device clock with NTP:

    ```bash
    sudo ntpdate pool.ntp.org   # or, on systemd hosts: sudo timedatectl set-ntp true
    ```
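
Clock skew shows up as a certificate that is "not yet valid" or "expired" even though the CA is fine, and 24h certificates leave little slack for it. The sketch below does the comparison on epoch seconds so the core stays pure shell; the conversion helper assumes `openssl` and GNU `date -d`, and both function names are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: classify the current time against a certificate's validity window.
# All arguments are Unix epoch seconds; helper names are hypothetical.
validity_state() {
  local now="$1" not_before="$2" not_after="$3"
  if (( now < not_before )); then
    echo "not-yet-valid: clock is $(( not_before - now ))s behind notBefore"
  elif (( now > not_after )); then
    echo "expired: clock is $(( now - not_after ))s past notAfter"
  else
    echo "valid: $(( not_after - now ))s remaining"
  fi
}

# Read notBefore/notAfter from a PEM file and compare with the local clock.
# Assumes openssl and GNU date (date -d); macOS needs gdate or date -j -f instead.
check_cert_clock() {
  local start end
  start=$(openssl x509 -in "$1" -noout -startdate | cut -d= -f2)
  end=$(openssl x509 -in "$1" -noout -enddate | cut -d= -f2)
  validity_state "$(date +%s)" "$(date -d "$start" +%s)" "$(date -d "$end" +%s)"
}
```

Running `check_cert_clock` on the affected device against the served certificate shows whether the device clock, rather than the trust store, is at fault.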

### Security Considerations

#### 1. **Provisioner Security**

- **JWK Provisioner**: The provisioner's private key is encrypted with a password, which is stored in a Kubernetes `Secret` (`step-jwk-password`).
- **Password Rotation**:

  ```bash
  # Rotate JWK password via Ansible
  ansible-playbook playbooks/ssl/rotate_jwk_password.yml
  ```
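
The playbook's internals are not shown here, but a rotation has two halves: re-encrypting the provisioner key on the CA with the new password (assumed to be the playbook's job) and updating the `step-jwk-password` Secret that `passwordRef` points at. A sketch of the Kubernetes half, with hypothetical helper names and a random password drawn from `/dev/urandom`:

```bash
#!/usr/bin/env bash
# Sketch: generate a random alphanumeric password and push it into the Secret.
# Only the Kubernetes side of a rotation; the CA side is assumed to be handled
# elsewhere (e.g. by the rotation playbook). Helper names are hypothetical.

# gen_password [length]: alphanumeric characters from /dev/urandom.
gen_password() {
  local len="${1:-32}" out=""
  while (( ${#out} < len )); do
    # Read a fixed-size chunk so no process in the pipe is killed by SIGPIPE.
    out+=$(head -c 64 /dev/urandom | LC_ALL=C tr -dc 'A-Za-z0-9')
  done
  printf '%s' "${out:0:len}"
}

# update_jwk_secret <password>: create or update the Secret in place (needs kubectl).
update_jwk_secret() {
  kubectl -n cert-manager create secret generic step-jwk-password \
    --from-literal=password="$1" \
    --dry-run=client -o yaml | kubectl apply -f -
}
```

Apply `update_jwk_secret "$(gen_password)"` only after the CA side uses the same password; until both match, step-issuer cannot decrypt the provisioner key and issuance fails.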

#### 2. **Certificate Revocation**

- **Passive revocation**: Step CA does not implement OCSP; a revoked certificate is refused at renewal time, so short lifetimes are the primary control.
- **Manual Revocation**:

  ```bash
  step ca revoke <serial> --reason superseded
  ```

#### 3. **Network Isolation**

- **Step CA Access**: Restricted to k3s cluster IPs via firewall rules.
- **Traefik Middlewares**: Enforce IP allowlisting for internal services.
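
Both the firewall on `pi1` and the `localIp@file` allowlist reduce to one question: is this source address inside an allowed CIDR? A pure-shell sketch of that check follows, plus a `ufw` rule restricting the CA port; `192.168.1.0/24` is a placeholder for the real cluster subnet, and the helper names are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: IPv4 CIDR membership in pure bash (64-bit shell arithmetic).
# 192.168.1.0/24 below is a placeholder, not the lab's actual subnet.

# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
  local a b c d
  IFS=. read -r a b c d <<< "$1"
  echo $(( (a << 24) | (b << 16) | (c << 8) | d ))
}

# ip_in_cidr <ip> <network/prefix>: succeed if the address is inside the range.
ip_in_cidr() {
  local net="${2%/*}" bits="${2#*/}"
  local mask=$(( (0xFFFFFFFF << (32 - bits)) & 0xFFFFFFFF ))
  (( ($(ip_to_int "$1") & mask) == ($(ip_to_int "$net") & mask) ))
}

# Allow the CA port only from the given subnet (run on pi1; needs ufw).
# Only restrictive if ufw's default incoming policy is deny.
allow_ca_from_subnet() {
  local subnet="${1:-192.168.1.0/24}"
  sudo ufw allow from "$subnet" to any port 8443 proto tcp
}
```

For example, `ip_in_cidr 192.168.1.42 192.168.1.0/24` succeeds while `ip_in_cidr 192.168.2.1 192.168.1.0/24` fails, mirroring what the firewall rule would allow.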

### Future Enhancements

1. **Automated Device Onboarding**:
   - MDM (Mobile Device Management) integration for CA trust.
   - Ansible playbook for bulk device enrollment.

2. **Step CA High Availability**:
   - Multiple Step CA instances sharing one MySQL or PostgreSQL database, since step-ca has no built-in Raft consensus.
   - Automatic failover for provisioners.

3. **Certificate Transparency**:
   - Log all `.lab` certificates to a private CT log.

4. **Short-Lived Certificates**:
   - Reduce the default TTL to 1h for critical services.

### References

- [Step CA Documentation](https://smallstep.com/docs/step-ca/)
- [cert-manager Step Issuer](https://smallstep.com/docs/step-certificates/kubernetes/)
- [Traefik TLS Configuration](https://doc.traefik.io/traefik/https/tls/)