# Infrastructure Brainstorming Session **Date**: 2025-10-28 **Status**: Planning Phase --- ## Initial Claude Code Discovery I watched this video, https://youtu.be/MsQACpcuTkU?si=2h5VUlgtIcpLbP1v literally took his word as fact and subscribed to Claude Pro. I need to set this up on my 2013 Mac Pro Running Sequoia (using Open Core Legacy Patcher) please outline the steps and process for making this happen ### Claude Code Interest - The /init Command Specifically Claude Code, as a clarification, I am most intrigued by the use of the /init command **Setup Requirements:** - Homebrew installation - Claude Code CLI tool - API authentication (separate from Claude Pro subscription) - Note: Claude Pro ≠ API access (separate billing) --- ## Infrastructure Expansion Plans ### Current Environment **VPS:** - 2 cores / 4GB RAM - Running: Pangolin reverse proxy with Gerbil tunnels (WireGuard-based) - Concern: RAM and CPU usage limits **Home Lab (Proxmox):** - **DL380p**: 32 cores, 96GB RAM (main cluster node) - **i5**: 8 cores, 8GB RAM (secondary cluster node) - **OMV**: 12TB storage node **Development Machine:** - Mac Pro 2013 running Sequoia (via Open Core Legacy Patcher) ### Proposed New Services 1. **RustDesk Server** - Self-hosted remote desktop 2. **n8n** - Workflow automation platform 3. **Authentik** - Single Sign-On (SSO) platform 4. **Obsidian Livesync** - Self-hosted note synchronization --- ## Architecture Decision: Hybrid Approach ### VPS (Lightweight Services Only) - Pangolin reverse proxy (existing) - Gerbil tunnels (existing, WireGuard-based) - RustDesk relay server (hbbr) - ~30-50MB RAM for NAT traversal only **Reasoning**: Keep VPS lightweight to avoid resource constraints ### DL380p Proxmox (Heavy Lifting) - PostgreSQL (shared database server) - Authentik SSO with WebAuthn support - n8n workflow automation - RustDesk ID server (hbbs) - handles registration and signaling - Prometheus + Grafana monitoring - Obsidian CouchDB sync server **Reasoning**: Abundant resources (32 cores, 96GB RAM) available for all services --- ## Authentik SSO - Core Requirements ### WebAuthn/FIDO2 Hardware Authentication **Critical Requirement**: Device-specific hardware 2FA **Supported Devices:** - iPhone with Face ID (biometric authentication) - Windows 11 laptop with Windows Hello (fingerprint/face/PIN) - No YubiKey required (but supported if needed later) **Security Features:** - Phishing-resistant (WebAuthn verifies domain) - Each device has unique cryptographic key - Keys stored in device secure enclave (iPhone) or TPM (Windows) - Can revoke individual devices if lost/stolen - TOTP as backup MFA method ### Integration Targets **Priority 1 (Critical):** - Proxmox VE (OpenID Connect) - n8n (OAuth2) - Pangolin admin dashboard (if supported) **Priority 2 (Nice to have):** - Grafana (OAuth2) - HomeAssistant (OAuth2) - Any future services **SSO Policies:** - External access (via Pangolin): WebAuthn REQUIRED - Internal network access: WebAuthn preferred, TOTP acceptable - Admin operations: Always require WebAuthn --- ## Network Architecture ### Flow Diagram ``` Internet → VPS (Pangolin Reverse Proxy) ↓ Gerbil Tunnel (WireGuard) ↓ DL380p Proxmox Home Lab ↓ Authentik SSO ←→ All Services ├─→ n8n ├─→ RustDesk (hbbs) ├─→ Grafana ├─→ Proxmox Web UI └─→ HomeAssistant (future) ``` ### Service Endpoints - `auth.yourdomain.com` → Authentik SSO - `n8n.yourdomain.com` → n8n workflows - `grafana.yourdomain.com` → Monitoring dashboards - `obsidian.yourdomain.com` → Note sync (CouchDB) --- ## Implementation Strategy: 8 Phases ### Phase 1: Planning & Preparation - Document current infrastructure - Make architecture decisions (LXC vs Docker, shared vs separate PostgreSQL) - Create project structure with Claude Code - Plan network layout and port assignments ### Phase 2: Infrastructure Foundation on Proxmox - Deploy PostgreSQL 15 (shared database server) - Network and port planning - Reserve static IPs for all services ### Phase 3: Deploy Core Services on Proxmox - Authentik SSO with WebAuthn/FIDO2 support - n8n workflow automation - RustDesk ID server (hbbs) ### Phase 4: VPS Configuration - RustDesk relay server (hbbr) - lightweight - Update Pangolin reverse proxy routes - DNS record creation - SSL certificate management ### Phase 5: SSO Integration & WebAuthn Enrollment - Configure Authentik OAuth2/OIDC providers - Integrate Proxmox with OpenID Connect - Integrate n8n with OAuth2 - Enroll all personal devices (iPhone, Windows laptop) - Set up TOTP backup ### Phase 6: Monitoring, Security & Hardening - Deploy Prometheus + Grafana monitoring stack - Security hardening (firewall rules, Fail2ban, SSL) - WebAuthn policies and device management - Configure alerts ### Phase 7: Backup, Documentation & Testing - Comprehensive backup solution to OMV (NFS) - Complete infrastructure documentation - Testing and validation procedures - Disaster recovery drills ### Phase 8: Future Integrations - HomeAssistant integration with Authentik - Obsidian Livesync deployment - Additional services as needed --- ## Resource Allocation Plan ### Proxmox DL380p Services | Service | Cores | RAM | Storage | Purpose | |---------|-------|-----|---------|---------| | PostgreSQL | 2 | 4GB | 20GB | Shared database for all services | | Authentik | 2 | 3GB | 30GB | SSO platform with WebAuthn | | n8n | 4 | 4GB | 40GB | Workflow automation | | RustDesk (hbbs) | 2 | 2GB | 10GB | Remote desktop ID server | | Monitoring | 2 | 4GB | 50GB | Prometheus + Grafana | | Obsidian Sync | 2 | 2GB | 50GB | CouchDB for note synchronization | | **Total** | **14** | **19GB** | **200GB** | | | **Available** | **18/32** | **77GB/96GB** | - | Still plenty of headroom! | ### VPS Resource Usage | Service | Cores | RAM | Purpose | |---------|-------|-----|---------| | Pangolin | ~1 | ~2GB | Reverse proxy | | Gerbil | ~0.5 | ~256MB | WireGuard tunnels | | RustDesk (hbbr) | ~0.5 | ~128MB | NAT traversal relay | | **Total** | **~2** | **~2.4GB** | | | **Limit** | **2** | **4GB** | Within safe limits ✅ | --- ## Obsidian Implementation Details ### Why Obsidian for Infrastructure Documentation? - Native markdown checkbox support - Real-time sync across all devices (Mac, Windows, iPhone) - Self-hosted sync (no subscription needed) - Can store infrastructure checklist, notes, diagrams - Works offline - End-to-end encrypted ### Obsidian Livesync Architecture - CouchDB server on Proxmox (backend) - Obsidian apps on all devices (clients) - Self-hosted sync via Pangolin reverse proxy - Database: `obsidian-vault` - Backup to OMV storage ### Device Setup 1. Mac Pro: Primary documentation device 2. Windows 11 Laptop: Access from work/travel 3. iPhone: Mobile access to infrastructure notes and checklists ### Integration with Infrastructure Project - Implementation checklist (190+ tasks) stored in Obsidian - Real-time updates across devices as tasks are completed - Can attach network diagrams, screenshots, configs - Version history via CouchDB replication --- ## Security Considerations ### Authentication Layers 1. **Network Level**: Gerbil tunnel encryption (WireGuard) 2. **Application Level**: Authentik SSO with WebAuthn 3. **Device Level**: Hardware-based authentication (Face ID, Windows Hello) 4. **Backup Level**: TOTP authenticator app ### Firewall Strategy - VPS: Only expose Pangolin ports (80, 443, Gerbil tunnel port) - Proxmox: Internal network only, no direct external access - LXC containers: Isolated, only necessary inter-container communication - Fail2ban on Authentik and VPS SSH ### Backup Security - Daily backups to OMV (12TB NFS storage) - Weekly and monthly rotation - PostgreSQL dumps (compressed) - Authentik media and config backups - n8n workflow backups (credentials encrypted) - RustDesk encryption keys (CRITICAL) - Grafana dashboards - Off-site backup optional (cloud via rclone) ### Certificate Management - Let's Encrypt via Pangolin - Automated renewal - HSTS headers enabled - TLS 1.3 enforcement --- ## Development Approach: Claude Code Usage ### Primary Use Cases 1. Generate complete deployment scripts for each service 2. Create LXC container configurations 3. Generate Docker Compose files 4. Create backup automation scripts 5. Generate comprehensive documentation 6. Create testing and validation scripts ### Example /init Commands **PostgreSQL Deployment:** ``` /init Create PostgreSQL 15 deployment for Proxmox LXC container with: - Debian 12 base - Separate databases for authentik, n8n, rustdesk, grafana - Optimized for 4GB RAM - Backup scripts to NFS mount ``` **Authentik with WebAuthn:** ``` /init Create Authentik SSO server deployment for Proxmox LXC with WebAuthn/FIDO2 support: - Docker Compose setup - External PostgreSQL connection - WebAuthn enrollment flows - OAuth2/OIDC provider configurations - Integration templates for Proxmox, n8n, Grafana ``` **Complete Infrastructure:** ``` /init Create comprehensive project structure for self-hosted infrastructure: - Folder organization for all services - Deployment phase documentation - Environment templates - Backup automation - Monitoring dashboards - Security hardening checklists ``` --- ## Timeline Estimate ### Week 1: Foundation (Phases 1-3) - Day 1-2: Planning and documentation - Day 3-4: PostgreSQL and network setup - Day 5-7: Deploy Authentik, n8n, RustDesk on Proxmox ### Week 2: Integration (Phases 4-5) - Day 1-2: VPS services and Pangolin configuration - Day 3-5: SSO integration and WebAuthn enrollment - Day 6-7: Testing and troubleshooting ### Week 3: Finalization (Phases 6-7) - Day 1-3: Monitoring, security hardening, backup automation - Day 4-5: Complete documentation - Day 6-7: Comprehensive testing and disaster recovery drill ### Week 4+: Expansion (Phase 8) - HomeAssistant integration - Obsidian Livesync deployment - Additional services as needed **Note**: This is a methodical, careful rollout. No rushing. Test each phase thoroughly before proceeding. --- ## Success Metrics ### Technical Metrics - All services accessible externally via SSO - WebAuthn works on all enrolled devices - No single service exceeding allocated resources - VPS CPU/RAM usage under control (<50% / <3GB) - Backups running successfully (100% success rate) - All monitoring dashboards populated with data - Zero unplanned downtime during deployment ### User Experience Metrics - Single sign-on across all services - Face ID / Windows Hello authentication works seamlessly - No password fatigue (SSO handles everything) - Mobile access to all services via Authentik - Infrastructure documentation accessible from any device (Obsidian) - Fast response times (<2s for service access) ### Security Metrics - All external access requires WebAuthn - No default passwords remaining - Fail2ban protecting critical services - SSL certificates valid and auto-renewing - Audit logging enabled in Authentik - Regular backup verification (monthly) --- ## Open Questions / Decisions Needed ### To Decide Before Starting: - [ ] Confirm domain names to use (auth.domain.com, n8n.domain.com, etc.) - [ ] LXC containers vs Docker VMs? (Recommendation: LXC for efficiency) - [ ] Shared PostgreSQL or separate instances? (Recommendation: Shared) - [ ] Separate VLAN for services? (Recommendation: Yes, if possible) - [ ] Let's Encrypt via Pangolin or internal CA? (Recommendation: Let's Encrypt) - [ ] Off-site backup strategy? (Cloud, second location, etc.) ### To Document During Setup: - [ ] IP addresses assigned to each service - [ ] Database credentials (store securely) - [ ] OAuth Client IDs and secrets - [ ] Authentik admin credentials - [ ] RustDesk encryption keys (CRITICAL!) - [ ] Backup schedule and retention - [ ] Emergency access procedures --- ## Lessons Learned / Notes ### Why Hybrid Architecture? - VPS is resource-constrained (2 cores / 4GB RAM) - DL380p has abundant resources (32 cores / 96GB RAM) - Gerbil tunnels already provide secure connectivity - Minimizes VPS costs while maximizing home lab utilization - Services stay responsive (no resource contention on VPS) ### Why Authentik over Alternatives? - **vs Keycloak**: Much lighter weight (Keycloak needs 1-2GB+ RAM) - **vs Authelia**: More feature-complete, better app support - Native WebAuthn/FIDO2 support - Modern UI - Active development - Good documentation - Self-hosted (privacy and control) ### Why LXC Containers? - More efficient than VMs (less overhead) - Native Proxmox integration - Easier backups and snapshots - Better resource utilization - Faster boot times - Still provides isolation ### Why Shared PostgreSQL? - Single database server to manage - Easier backups (one dump for all databases) - Resource efficiency (connection pooling) - Simpler monitoring - Adequate for home lab scale - Can migrate to separate instances later if needed --- ## Reference Links ### Tools & Services - **Claude Code**: https://docs.claude.com/en/docs/claude-code - **Authentik**: https://goauthentik.io/ - **n8n**: https://n8n.io/ - **RustDesk**: https://rustdesk.com/ - **Obsidian**: https://obsidian.md/ - **Prometheus**: https://prometheus.io/ - **Grafana**: https://grafana.com/ ### Documentation Created - CLAUDE.md - Repository guidance for Claude Code - RUNBOOK.md - Operational procedures - DISASTER-RECOVERY.md - Recovery procedures - SERVICES.md - Service configuration templates - IMPROVEMENTS.md - Infrastructure recommendations - MONITORING.md - Monitoring setup guide - infrastructure-audit.md - Infrastructure audit checklist - Infrastructure-Implementation-Checklist.md - Complete deployment checklist ### Automation Scripts - backup-proxmox.sh - VM/container backups - backup-vps.sh - VPS configuration backups - health-check.sh - Service health monitoring - cert-check.sh - SSL certificate expiration - tunnel-monitor.sh - Gerbil tunnel monitoring - resource-report.sh - Weekly resource reports --- ## Next Immediate Actions 1. **Review and finalize architecture decisions** - Confirm domain names - Decide on LXC vs Docker - Plan network/VLAN layout 2. **Start with Claude Code project structure** ```bash cd ~/proxmox-infrastructure claude /init Create comprehensive project structure... ``` 3. **Fill out infrastructure audit checklist** - Current VPS details - Proxmox network configuration - Available IP addresses - DNS provider details 4. **Set up Obsidian for documentation** - Install on Mac Pro - Import implementation checklist - Begin checking off tasks as completed 5. **Begin Phase 1: Planning & Preparation** - Document current state - Make final decisions - Create project scaffolding --- **Status**: Ready to begin implementation! **Excitement Level**: 🚀🚀🚀 **Last Updated**: 2025-10-28