Infrastructure Brainstorming Session

Date: 2025-10-28
Status: Planning Phase


Initial Claude Code Discovery

I watched this video (https://youtu.be/MsQACpcuTkU?si=2h5VUlgtIcpLbP1v), literally took his word as fact, and subscribed to Claude Pro. I need to set this up on my 2013 Mac Pro running Sequoia (using OpenCore Legacy Patcher). Please outline the steps and process for making this happen.

Claude Code Interest - The /init Command

To clarify: I mean Claude Code specifically, and I am most intrigued by the /init command.

Setup Requirements:

  • Homebrew installation
  • Claude Code CLI tool
  • Authentication: sign in with the Claude Pro account, or use an Anthropic API key
  • Note: Claude Pro ≠ API access (API usage is billed separately)

Infrastructure Expansion Plans

Current Environment

VPS:

  • 2 cores / 4GB RAM
  • Running: Pangolin reverse proxy with Gerbil tunnels (WireGuard-based)
  • Concern: RAM and CPU usage limits

Home Lab (Proxmox):

  • DL380p: 32 cores, 96GB RAM (main cluster node)
  • i5: 8 cores, 8GB RAM (secondary cluster node)
  • OMV: 12TB storage node

Development Machine:

  • Mac Pro 2013 running Sequoia (via OpenCore Legacy Patcher)

Proposed New Services

  1. RustDesk Server - Self-hosted remote desktop
  2. n8n - Workflow automation platform
  3. Authentik - Single Sign-On (SSO) platform
  4. Obsidian Livesync - Self-hosted note synchronization

Architecture Decision: Hybrid Approach

VPS (Lightweight Services Only)

  • Pangolin reverse proxy (existing)
  • Gerbil tunnels (existing, WireGuard-based)
  • RustDesk relay server (hbbr) - ~30-50MB RAM; relays sessions only when clients cannot connect to each other directly

Reasoning: Keep VPS lightweight to avoid resource constraints

DL380p Proxmox (Heavy Lifting)

  • PostgreSQL (shared database server)
  • Authentik SSO with WebAuthn support
  • n8n workflow automation
  • RustDesk ID server (hbbs) - handles registration and signaling
  • Prometheus + Grafana monitoring
  • Obsidian CouchDB sync server

Reasoning: Abundant resources (32 cores, 96GB RAM) available for all services


Authentik SSO - Core Requirements

WebAuthn/FIDO2 Hardware Authentication

Critical Requirement: Device-specific hardware 2FA

Supported Devices:

  • iPhone with Face ID (biometric authentication)
  • Windows 11 laptop with Windows Hello (fingerprint/face/PIN)
  • No YubiKey required (but supported if needed later)

Security Features:

  • Phishing-resistant (each credential is bound to the site's domain, so a look-alike site cannot use it)
  • Each device has unique cryptographic key
  • Keys stored in device secure enclave (iPhone) or TPM (Windows)
  • Can revoke individual devices if lost/stolen
  • TOTP as backup MFA method
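The phishing-resistance point can be made concrete. The sketch below shows only the origin-binding part of a WebAuthn check, under the assumption that the relying party is auth.yourdomain.com; the function name is made up for illustration. Authentik performs this (plus challenge and signature verification, which are omitted here) internally, so this is an illustration of the idea, not something to deploy.

```python
import hashlib
import json

# Hypothetical relying-party values; substitute the real Authentik hostname.
EXPECTED_ORIGIN = "https://auth.yourdomain.com"
RP_ID = "auth.yourdomain.com"


def origin_is_trusted(client_data_json: bytes, authenticator_data: bytes) -> bool:
    """Return True only if the assertion was produced for our own origin."""
    client_data = json.loads(client_data_json)
    # The browser records the origin of the requesting page in clientDataJSON.
    if client_data.get("origin") != EXPECTED_ORIGIN:
        return False
    # The first 32 bytes of authenticator data are SHA-256 of the RP ID.
    expected_hash = hashlib.sha256(RP_ID.encode()).digest()
    return authenticator_data[:32] == expected_hash
```

A login attempt relayed through a look-alike domain carries that domain as its origin, so it fails the first comparison even if the user approves the Face ID or Windows Hello prompt.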

Integration Targets

Priority 1 (Critical):

  • Proxmox VE (OpenID Connect)
  • n8n (OAuth2)
  • Pangolin admin dashboard (if supported)

Priority 2 (Nice to have):

  • Grafana (OAuth2)
  • HomeAssistant (OAuth2)
  • Any future services

SSO Policies:

  • External access (via Pangolin): WebAuthn REQUIRED
  • Internal network access: WebAuthn preferred, TOTP acceptable
  • Admin operations: Always require WebAuthn
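The three policy rules above reduce to a small decision function. This is a sketch of the logic only; the names Mfa and acceptable_mfa are invented here, and in practice the rules would be expressed as Authentik flows and policies rather than standalone code.

```python
from enum import Enum


class Mfa(Enum):
    WEBAUTHN = "webauthn"
    TOTP = "totp"


def acceptable_mfa(external: bool, admin_operation: bool) -> list:
    """Return the acceptable MFA methods, most preferred first."""
    # External access and admin operations both insist on WebAuthn.
    if external or admin_operation:
        return [Mfa.WEBAUTHN]
    # Internal, non-admin access: WebAuthn preferred, TOTP acceptable.
    return [Mfa.WEBAUTHN, Mfa.TOTP]
```

Writing it out makes one consequence visible: the TOTP backup never helps for external or admin access, so losing every enrolled device means recovery has to start from the internal network.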

Network Architecture

Flow Diagram

Internet → VPS (Pangolin Reverse Proxy)
              ↓
       Gerbil Tunnel (WireGuard)
              ↓
       DL380p Proxmox Home Lab
              ↓
       Authentik SSO ←→ All Services
              ├─→ n8n
              ├─→ RustDesk (hbbs)
              ├─→ Grafana
              ├─→ Proxmox Web UI
              └─→ HomeAssistant (future)

Service Endpoints

  • auth.yourdomain.com → Authentik SSO
  • n8n.yourdomain.com → n8n workflows
  • grafana.yourdomain.com → Monitoring dashboards
  • obsidian.yourdomain.com → Note sync (CouchDB)

Implementation Strategy: 8 Phases

Phase 1: Planning & Preparation

  • Document current infrastructure
  • Make architecture decisions (LXC vs Docker, shared vs separate PostgreSQL)
  • Create project structure with Claude Code
  • Plan network layout and port assignments

Phase 2: Infrastructure Foundation on Proxmox

  • Deploy PostgreSQL 15 (shared database server)
  • Network and port planning
  • Reserve static IPs for all services
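One way to draft the static IP reservations is to assign consecutive addresses from a service subnet. The subnet 10.10.10.0/24 and the starting offset used below are placeholders, not decisions from this document, and the service names are just short labels for the services planned above.

```python
import ipaddress

# Short labels for the planned services; the order decides the address order.
SERVICES = ["postgresql", "authentik", "n8n", "rustdesk-hbbs", "monitoring", "obsidian-sync"]


def reserve_static_ips(subnet: str, first_host: int, services: list) -> dict:
    """Assign consecutive addresses in `subnet`, starting at host offset `first_host`."""
    if first_host < 1:
        raise ValueError("first_host must skip the network address")
    network = ipaddress.ip_network(subnet)
    base = network.network_address + first_host
    plan = {}
    for index, name in enumerate(services):
        address = base + index
        # Refuse addresses outside the subnet and the broadcast address.
        if address not in network or address == network.broadcast_address:
            raise ValueError(f"{subnet} has no room for {name}")
        plan[name] = str(address)
    return plan
```

For example, reserve_static_ips("10.10.10.0/24", 20, SERVICES) places PostgreSQL at 10.10.10.20 and the remaining services on the addresses that follow.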

Phase 3: Deploy Core Services on Proxmox

  • Authentik SSO with WebAuthn/FIDO2 support
  • n8n workflow automation
  • RustDesk ID server (hbbs)

Phase 4: VPS Configuration

  • RustDesk relay server (hbbr) - lightweight
  • Update Pangolin reverse proxy routes
  • DNS record creation
  • SSL certificate management

Phase 5: SSO Integration & WebAuthn Enrollment

  • Configure Authentik OAuth2/OIDC providers
  • Integrate Proxmox with OpenID Connect
  • Integrate n8n with OAuth2
  • Enroll all personal devices (iPhone, Windows laptop)
  • Set up TOTP backup
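Before pointing Proxmox or n8n at Authentik, it can help to confirm that the provider's OIDC discovery document is reachable and names the expected issuer. The sketch below relies on the standard OpenID Connect discovery path. The issuer shape https://auth.yourdomain.com/application/o/<slug>/ is an assumption about how Authentik publishes per-application issuers and should be checked against the provider page in the Authentik UI.

```python
import json
import urllib.request


def discovery_url(issuer: str) -> str:
    """OIDC Discovery: metadata lives under the issuer at a well-known path."""
    return issuer.rstrip("/") + "/.well-known/openid-configuration"


def fetch_provider_metadata(issuer: str, timeout: float = 5.0) -> dict:
    """Download the provider metadata and confirm it names the same issuer."""
    with urllib.request.urlopen(discovery_url(issuer), timeout=timeout) as response:
        metadata = json.load(response)
    # A mismatched issuer usually means a wrong slug or a proxy rewriting URLs.
    if metadata.get("issuer", "").rstrip("/") != issuer.rstrip("/"):
        raise ValueError("issuer mismatch in discovery document")
    return metadata
```

The returned dictionary includes authorization_endpoint and token_endpoint, which are the values most OAuth2 client setup forms ask for.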

Phase 6: Monitoring, Security & Hardening

  • Deploy Prometheus + Grafana monitoring stack
  • Security hardening (firewall rules, Fail2ban, SSL)
  • WebAuthn policies and device management
  • Configure alerts

Phase 7: Backup, Documentation & Testing

  • Comprehensive backup solution to OMV (NFS)
  • Complete infrastructure documentation
  • Testing and validation procedures
  • Disaster recovery drills

Phase 8: Future Integrations

  • HomeAssistant integration with Authentik
  • Obsidian Livesync deployment
  • Additional services as needed

Resource Allocation Plan

Proxmox DL380p Services

| Service         | Cores | RAM       | Storage | Purpose                          |
|-----------------|-------|-----------|---------|----------------------------------|
| PostgreSQL      | 2     | 4GB       | 20GB    | Shared database for all services |
| Authentik       | 2     | 3GB       | 30GB    | SSO platform with WebAuthn       |
| n8n             | 4     | 4GB       | 40GB    | Workflow automation              |
| RustDesk (hbbs) | 2     | 2GB       | 10GB    | Remote desktop ID server         |
| Monitoring      | 2     | 4GB       | 50GB    | Prometheus + Grafana             |
| Obsidian Sync   | 2     | 2GB       | 50GB    | CouchDB for note synchronization |
| Total           | 14    | 19GB      | 200GB   |                                  |
| Available       | 18/32 | 77GB/96GB | -       | Still plenty of headroom!        |
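The totals and headroom figures can be re-derived whenever an allocation changes. This is a small sketch using the numbers from the table; the variable and function names are illustrative.

```python
# (cores, RAM in GB, storage in GB) per service, copied from the table.
ALLOCATIONS = {
    "PostgreSQL": (2, 4, 20),
    "Authentik": (2, 3, 30),
    "n8n": (4, 4, 40),
    "RustDesk (hbbs)": (2, 2, 10),
    "Monitoring": (2, 4, 50),
    "Obsidian Sync": (2, 2, 50),
}
HOST_CORES, HOST_RAM_GB = 32, 96  # DL380p


def totals(allocations: dict) -> tuple:
    """Sum each column: returns (cores, ram_gb, storage_gb)."""
    return tuple(sum(column) for column in zip(*allocations.values()))


def headroom(allocations: dict) -> tuple:
    """Cores and RAM left on the host after all allocations."""
    cores, ram_gb, _ = totals(allocations)
    return HOST_CORES - cores, HOST_RAM_GB - ram_gb
```

With the values above, totals gives 14 cores, 19GB RAM and 200GB storage, and headroom gives 18 cores and 77GB, matching the table.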

VPS Resource Usage

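| Service         | Cores | RAM    | Purpose             |
|-----------------|-------|--------|---------------------|
| Pangolin        | ~1    | ~2GB   | Reverse proxy       |
| Gerbil          | ~0.5  | ~256MB | WireGuard tunnels   |
| RustDesk (hbbr) | ~0.5  | ~128MB | NAT traversal relay |
| Total           | ~2    | ~2.4GB |                     |
| Limit           | 2     | 4GB    | Within safe limits  |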
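The same arithmetic applies to the VPS, where margins are much tighter. The sketch below sums the estimates from the table and compares them with the 2-core / 4GB limit; the names are illustrative, and the figures are the rough estimates above, not measurements.

```python
# (estimated cores, estimated RAM in MB) per service, from the table.
VPS_USAGE = {
    "Pangolin": (1.0, 2048),
    "Gerbil": (0.5, 256),
    "RustDesk (hbbr)": (0.5, 128),
}
VPS_CORES, VPS_RAM_MB = 2, 4096


def vps_budget(usage: dict) -> dict:
    """Total the estimates and report what is left under the VPS limits."""
    cores = sum(c for c, _ in usage.values())
    ram_mb = sum(m for _, m in usage.values())
    return {
        "cores": cores,
        "ram_mb": ram_mb,
        "cores_free": VPS_CORES - cores,
        "ram_free_mb": VPS_RAM_MB - ram_mb,
    }
```

RAM comes to 2432MB (about 2.4GB), leaving roughly 1.6GB free. The core estimates already add up to both cores, so CPU is the resource to watch once real load is measured.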

Obsidian Implementation Details

Why Obsidian for Infrastructure Documentation?

  • Native markdown checkbox support
  • Real-time sync across all devices (Mac, Windows, iPhone)
  • Self-hosted sync (no subscription needed)
  • Can store infrastructure checklist, notes, diagrams
  • Works offline
  • End-to-end encrypted

Obsidian Livesync Architecture

  • CouchDB server on Proxmox (backend)
  • Obsidian apps on all devices (clients)
  • Self-hosted sync via Pangolin reverse proxy
  • Database: obsidian-vault
  • Backup to OMV storage

Device Setup

  1. Mac Pro: Primary documentation device
  2. Windows 11 Laptop: Access from work/travel
  3. iPhone: Mobile access to infrastructure notes and checklists

Integration with Infrastructure Project

  • Implementation checklist (190+ tasks) stored in Obsidian
  • Real-time updates across devices as tasks are completed
  • Can attach network diagrams, screenshots, configs
  • Version history via CouchDB replication

Security Considerations

Authentication Layers

  1. Network Level: Gerbil tunnel encryption (WireGuard)
  2. Application Level: Authentik SSO with WebAuthn
  3. Device Level: Hardware-based authentication (Face ID, Windows Hello)
  4. Backup Level: TOTP authenticator app

Firewall Strategy

  • VPS: Only expose Pangolin ports (80, 443, Gerbil tunnel port)
  • Proxmox: Internal network only, no direct external access
  • LXC containers: Isolated, only necessary inter-container communication
  • Fail2ban on Authentik and VPS SSH
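The VPS rule ("only expose Pangolin ports") can be checked mechanically by comparing what is listening against an allow-list. In this sketch the UDP port 51820 is only a placeholder (WireGuard's conventional port); the real Gerbil tunnel port should be substituted. The function name is illustrative.

```python
# Ports the plan allows on the VPS. 51820/udp is a placeholder for the
# Gerbil tunnel port, not a value taken from this document.
VPS_ALLOWED = {("tcp", 80), ("tcp", 443), ("udp", 51820)}


def unexpected_listeners(listening, allowed=VPS_ALLOWED):
    """Return listening (protocol, port) pairs that the policy does not allow."""
    return sorted(set(listening) - set(allowed))
```

The input could be collected from a tool such as ss or from an external port scan. Note that SSH (22/tcp) is not in the allow-list as written, so it would be reported; since Fail2ban is planned for VPS SSH, the list probably needs an explicit entry for it.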

Backup Security

  • Daily backups to OMV (12TB NFS storage)
  • Weekly and monthly rotation
  • PostgreSQL dumps (compressed)
  • Authentik media and config backups
  • n8n workflow backups (credentials encrypted)
  • RustDesk encryption keys (CRITICAL)
  • Grafana dashboards
  • Off-site backup optional (cloud via rclone)
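The daily/weekly/monthly rotation can be sketched as a function that decides which backup dates to keep. The retention counts (7 daily, 4 weekly, 6 monthly) are assumptions for illustration; the document leaves the actual schedule and retention to be decided.

```python
from datetime import date, timedelta


def backups_to_keep(backup_dates, daily=7, weekly=4, monthly=6):
    """Pick which backup dates survive a daily/weekly/monthly rotation."""
    newest_first = sorted(set(backup_dates), reverse=True)
    keep = set(newest_first[:daily])  # the most recent daily backups

    def keep_newest_per_bucket(bucket_of, limit):
        seen = []
        for day in newest_first:
            bucket = bucket_of(day)
            if bucket in seen:
                continue  # a newer backup already represents this bucket
            seen.append(bucket)
            if len(seen) > limit:
                break
            keep.add(day)

    keep_newest_per_bucket(lambda d: d.isocalendar()[:2], weekly)  # (ISO year, week)
    keep_newest_per_bucket(lambda d: (d.year, d.month), monthly)
    return keep
```

Anything not in the returned set is a candidate for deletion. A cleanup script would typically log the candidates first and delete only after the newest backup has been verified.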

Certificate Management

  • Let's Encrypt via Pangolin
  • Automated renewal
  • HSTS headers enabled
  • TLS 1.3 enforcement
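Automated renewal is easy to trust and forget, so an independent expiry check is a useful backstop. The sketch below uses only the Python standard library; the function names and the idea of alerting below some threshold are illustrative, not part of the existing setup.

```python
import socket
import ssl
from datetime import datetime, timezone


def days_until_expiry(not_after: str, now: datetime) -> float:
    """`not_after` uses the certificate text form, e.g. 'Nov 27 00:00:00 2025 GMT'."""
    expires = datetime.fromtimestamp(ssl.cert_time_to_seconds(not_after), tz=timezone.utc)
    return (expires - now).total_seconds() / 86400


def fetch_not_after(host: str, port: int = 443, timeout: float = 5.0) -> str:
    """Connect with normal certificate validation and return the notAfter field."""
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as raw:
        with context.wrap_socket(raw, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]
```

Let's Encrypt certificates are normally renewed well before expiry, so a result that drops below a couple of weeks suggests renewal has stalled.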

Development Approach: Claude Code Usage

Primary Use Cases

  1. Generate complete deployment scripts for each service
  2. Create LXC container configurations
  3. Generate Docker Compose files
  4. Create backup automation scripts
  5. Generate comprehensive documentation
  6. Create testing and validation scripts

Example /init Commands

PostgreSQL Deployment:

/init Create PostgreSQL 15 deployment for Proxmox LXC container with:
- Debian 12 base
- Separate databases for authentik, n8n, rustdesk, grafana
- Optimized for 4GB RAM
- Backup scripts to NFS mount
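"Optimized for 4GB RAM" can be made more specific before handing the prompt over. The sketch below derives starting values for common postgresql.conf memory settings. The 25% figure for shared_buffers follows the usual starting point for a dedicated database server; the other ratios are rules of thumb assumed here, to be adjusted after observing real load.

```python
def postgres_memory_settings(ram_mb: int, max_connections: int = 100) -> dict:
    """Starting values for postgresql.conf derived from the container's RAM."""
    shared_buffers = ram_mb // 4  # about 25% of RAM
    return {
        "shared_buffers": f"{shared_buffers}MB",
        # Planner hint only; this setting allocates no memory.
        "effective_cache_size": f"{ram_mb * 3 // 4}MB",
        "maintenance_work_mem": f"{min(ram_mb // 16, 1024)}MB",
        # Each sort or hash step may use work_mem, so divide conservatively.
        "work_mem": f"{max((ram_mb - shared_buffers) // (max_connections * 3), 4)}MB",
        "max_connections": str(max_connections),
    }
```

For the 4GB container this yields shared_buffers of 1024MB and effective_cache_size of 3072MB, which can be quoted in the prompt instead of the vague "optimized" wording.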

Authentik with WebAuthn:

/init Create Authentik SSO server deployment for Proxmox LXC with WebAuthn/FIDO2 support:
- Docker Compose setup
- External PostgreSQL connection
- WebAuthn enrollment flows
- OAuth2/OIDC provider configurations
- Integration templates for Proxmox, n8n, Grafana

Complete Infrastructure:

/init Create comprehensive project structure for self-hosted infrastructure:
- Folder organization for all services
- Deployment phase documentation
- Environment templates
- Backup automation
- Monitoring dashboards
- Security hardening checklists

Timeline Estimate

Week 1: Foundation (Phases 1-3)

  • Day 1-2: Planning and documentation
  • Day 3-4: PostgreSQL and network setup
  • Day 5-7: Deploy Authentik, n8n, RustDesk on Proxmox

Week 2: Integration (Phases 4-5)

  • Day 1-2: VPS services and Pangolin configuration
  • Day 3-5: SSO integration and WebAuthn enrollment
  • Day 6-7: Testing and troubleshooting

Week 3: Finalization (Phases 6-7)

  • Day 1-3: Monitoring, security hardening, backup automation
  • Day 4-5: Complete documentation
  • Day 6-7: Comprehensive testing and disaster recovery drill

Week 4+: Expansion (Phase 8)

  • HomeAssistant integration
  • Obsidian Livesync deployment
  • Additional services as needed

Note: This is a methodical, careful rollout. No rushing. Test each phase thoroughly before proceeding.


Success Metrics

Technical Metrics

  • All services accessible externally via SSO
  • WebAuthn works on all enrolled devices
  • No single service exceeding allocated resources
  • VPS usage stays below 50% CPU and 3GB RAM
  • Backups running successfully (100% success rate)
  • All monitoring dashboards populated with data
  • Zero unplanned downtime during deployment

User Experience Metrics

  • Single sign-on across all services
  • Face ID / Windows Hello authentication works seamlessly
  • No password fatigue (SSO handles everything)
  • Mobile access to all services via Authentik
  • Infrastructure documentation accessible from any device (Obsidian)
  • Fast response times (<2s for service access)
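The response-time target can be turned into a simple probe. This is a sketch, not the contents of health-check.sh; the function names and the three result labels are invented here. A 401 or 403 counts as reachable, because an SSO-protected service is expected to reject unauthenticated requests.

```python
import time
import urllib.error
import urllib.request

MAX_LATENCY_S = 2.0  # target from the metrics above


def classify(status, latency_s, max_latency_s=MAX_LATENCY_S):
    """Map an HTTP status (None if unreachable) and latency to a health label."""
    if status is None or status >= 500:
        return "down"
    if latency_s > max_latency_s:
        return "slow"
    return "ok"


def probe(url, timeout=5.0):
    """Time one request to `url` and classify the result."""
    started = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            status = response.status
    except urllib.error.HTTPError as error:
        status = error.code  # the server answered, but with 4xx/5xx
    except (urllib.error.URLError, OSError):
        status = None  # no answer at all
    return classify(status, time.monotonic() - started)
```

Running probe against each service endpoint from outside the home network measures the full path through Pangolin and the Gerbil tunnel, which is the latency the <2s target is about.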

Security Metrics

  • All external access requires WebAuthn
  • No default passwords remaining
  • Fail2ban protecting critical services
  • SSL certificates valid and auto-renewing
  • Audit logging enabled in Authentik
  • Regular backup verification (monthly)

Open Questions / Decisions Needed

To Decide Before Starting:

  • Confirm domain names to use (auth.domain.com, n8n.domain.com, etc.)
  • LXC containers vs Docker VMs? (Recommendation: LXC for efficiency)
  • Shared PostgreSQL or separate instances? (Recommendation: Shared)
  • Separate VLAN for services? (Recommendation: Yes, if possible)
  • Let's Encrypt via Pangolin or internal CA? (Recommendation: Let's Encrypt)
  • Off-site backup strategy? (Cloud, second location, etc.)

To Document During Setup:

  • IP addresses assigned to each service
  • Database credentials (store securely)
  • OAuth Client IDs and secrets
  • Authentik admin credentials
  • RustDesk encryption keys (CRITICAL!)
  • Backup schedule and retention
  • Emergency access procedures

Lessons Learned / Notes

Why Hybrid Architecture?

  • VPS is resource-constrained (2 cores / 4GB RAM)
  • DL380p has abundant resources (32 cores / 96GB RAM)
  • Gerbil tunnels already provide secure connectivity
  • Minimizes VPS costs while maximizing home lab utilization
  • Services stay responsive (no resource contention on VPS)

Why Authentik over Alternatives?

  • vs Keycloak: Much lighter weight (Keycloak needs 1-2GB+ RAM)
  • vs Authelia: More feature-complete, better app support
  • Native WebAuthn/FIDO2 support
  • Modern UI
  • Active development
  • Good documentation
  • Self-hosted (privacy and control)

Why LXC Containers?

  • More efficient than VMs (less overhead)
  • Native Proxmox integration
  • Easier backups and snapshots
  • Better resource utilization
  • Faster boot times
  • Still provides isolation

Why Shared PostgreSQL?

  • Single database server to manage
  • Easier backups (one dump for all databases)
  • Resource efficiency (connection pooling)
  • Simpler monitoring
  • Adequate for home lab scale
  • Can migrate to separate instances later if needed

Tools & Services

Documentation Created

  • CLAUDE.md - Repository guidance for Claude Code
  • RUNBOOK.md - Operational procedures
  • DISASTER-RECOVERY.md - Recovery procedures
  • SERVICES.md - Service configuration templates
  • IMPROVEMENTS.md - Infrastructure recommendations
  • MONITORING.md - Monitoring setup guide
  • infrastructure-audit.md - Infrastructure audit checklist
  • Infrastructure-Implementation-Checklist.md - Complete deployment checklist

Automation Scripts

  • backup-proxmox.sh - VM/container backups
  • backup-vps.sh - VPS configuration backups
  • health-check.sh - Service health monitoring
  • cert-check.sh - SSL certificate expiration
  • tunnel-monitor.sh - Gerbil tunnel monitoring
  • resource-report.sh - Weekly resource reports

Next Immediate Actions

  1. Review and finalize architecture decisions

    • Confirm domain names
    • Decide on LXC vs Docker
    • Plan network/VLAN layout
  2. Start with Claude Code project structure

    cd ~/proxmox-infrastructure
    claude
    /init Create comprehensive project structure...
    
  3. Fill out infrastructure audit checklist

    • Current VPS details
    • Proxmox network configuration
    • Available IP addresses
    • DNS provider details
  4. Set up Obsidian for documentation

    • Install on Mac Pro
    • Import implementation checklist
    • Begin checking off tasks as completed
  5. Begin Phase 1: Planning & Preparation

    • Document current state
    • Make final decisions
    • Create project scaffolding

Status: Ready to begin implementation!
Excitement Level: 🚀🚀🚀

Last Updated: 2025-10-28