# Infrastructure TODO List

**Created:** 2025-12-29
**Last Updated:** 2025-12-29
**Status:** Active development tasks

This document tracks all incomplete infrastructure tasks and future improvements.

---

## ✅ Completed Items

### 1. Fix Home Assistant Public Domain Access

**Status**: ✅ COMPLETED (2025-12-29)

**What was done**:
1. Updated Caddy to use HTTPS backend for Home Assistant
2. Added VPS WireGuard IP (10.0.8.1) to Home Assistant's `trusted_proxies`
3. Verified bob.nianticbooks.com is accessible

**Result**: All 5 public domains now working:
- ✅ freddesk.nianticbooks.com → Proxmox
- ✅ bob.nianticbooks.com → Home Assistant
- ✅ ad5m.nianticbooks.com → 3D Printer
- ✅ auth.nianticbooks.com → Authentik SSO
- ✅ bible.nianticbooks.com → Bible reading plan

### 2. Deploy RustDesk ID Server

**Status**: ✅ COMPLETED (2025-12-25)

**What was deployed**:
1. ID Server (hbbs) on main-pve LXC 123 at 10.0.10.23
2. Relay Server (hbbr) on VPS at 66.63.182.168:21117
3. Generated encryption key pair
4. Verified client connectivity

**Result**: RustDesk fully operational
- ✅ ID Server (hbbs): 10.0.10.23 ports 21115, 21116, 21118
- ✅ Relay Server (hbbr): VPS port 21117
- ✅ Public Key: `sfYuCTMHxrA22kukomb/RAKYyUgr8iaMfm/U4CFLfL0=`
- ✅ Client Configuration: ID Server `66.63.182.168`, Key included
- ✅ Version: 1.1.14 (both servers)

**Documentation**:
- SERVICES.md - Service inventory and health checks
- guides/RUSTDESK-DEPLOYMENT-COMPLETE.md - Complete deployment guide

---

## Medium Priority

### 3. Deploy Prometheus + Grafana Monitoring

**Status**: ✅ DISCOVERED - Already deployed (2025-12-29)

**Current State**:
- **Location**: 10.0.10.25 (responding to ping)
- **Grafana**: Port 3000 ✅ Running (redirects to /login)
- **Prometheus**: Port 9090 ✅ Running
- **Deployment Method**: TBD (need to investigate)

**Remaining Configuration Tasks**:
1. Document deployment method (Docker Compose, systemd, VM/Container type)
2. Configure PostgreSQL database on 10.0.10.20 for Grafana (if not already done)
3. Set up Authentik SSO for Grafana (verify first: item 6.2 lists Grafana OAuth2 as configured)
4. Configure Prometheus monitoring targets:
   - Proxmox nodes (via node_exporter)
   - VPS (WireGuard tunnel metrics)
   - PostgreSQL
   - Home Assistant
   - Other services
5. Import Grafana dashboards:
   - Proxmox overview
   - PostgreSQL metrics
   - Network metrics
6. Set up alerting (email/Slack)
7. Optionally add Caddy public route

**Priority**: Low-Medium (services running, configuration needed)

**Note**: This was discovered during the infrastructure audit. The basic services are operational, but monitoring targets and dashboards still need configuration.

---

## Low Priority (Cleanup)

### 4. Remove Deprecated VMs

**Objective**: Reclaim resources from unused services

**Status**: ⏸️ Deferred - Non-critical

#### 4.1 Remove Spoolman VM

**Current State**:
- IP: 10.0.10.71 (allocated but not in use)
- Reason: Bambu printer incompatible, service no longer needed

**Steps**:
1. Verify no dependencies: `pct status <vmid>` or `qm status <vmid>`
2. Back up if needed: `vzdump <vmid> --storage backup`
3. Stop VM/container: `pct stop <vmid>` or `qm stop <vmid>`
4. Delete: `pct destroy <vmid>` or `qm destroy <vmid>`
5. Remove Pangolin route (if exists)
6. Update IP-ALLOCATION.md to mark 10.0.10.71 as available
7. Update documentation

**Priority**: Low
**Estimated Time**: 15 minutes

#### 4.2 Remove Authelia VM

**Current State**:
- IP: 10.0.10.112 (allocated but not in use)
- Reason: Replaced by Authentik SSO

**Steps**:
1. Verify Authentik is working for all services
2. Back up Authelia config for reference (if needed)
3. Stop VM/container: `pct stop <vmid>` or `qm stop <vmid>`
4. Delete: `pct destroy <vmid>` or `qm destroy <vmid>`
5. Update IP-ALLOCATION.md to mark 10.0.10.112 as available (or remove from list)
6. Update documentation

**Priority**: Low
**Estimated Time**: 15 minutes

---

## Future Enhancements

### 5. n8n + Claude Code Advanced Features

**Objective**: Enhance n8n and Claude Code integration

**Status**: ✅ Basic integration working, advanced features optional

**Remaining Optional Tasks** (from MIGRATION-CHECKLIST.md 6.4):
- [ ] Session management workflow (UUID generation, multi-turn conversations)
- [ ] Slack integration (Slack → n8n → Claude Code → Slack)
- [ ] Tool deployment with `--dangerously-skip-permissions` flag
- [ ] Error handling (network disconnect, invalid commands)
- [ ] Resource monitoring during heavy Claude operations
- [ ] Production hardening:
  - SSH timeout configuration
  - Output length limits
  - Logging for Claude executions
  - Error notifications
  - Optional Caddy route for public n8n access (with Authentik SSO)

**Reference**:
- MIGRATION-CHECKLIST.md section 6.4
- N8N-CLAUDE-STATUS.md

**Priority**: Low (nice-to-have, basic functionality working)
**Estimated Time**: 2-4 hours for each feature

---

### 6. Home Assistant Enhancements

#### 6.1 Configure Local HTTPS Certificates

**Objective**: Use local CA certificates for internal HTTPS access

**Status**: ⏸️ Deferred (CA setup complete, deployment pending)

**Details**:
- CA already set up (HTTPS-SETUP-STATUS.md from 2025-12-06)
- Certificates generated for services
- Need to deploy certificates to Home Assistant and other services

**Steps** (from HTTPS-SETUP-STATUS.md):
1. Copy certificates to Home Assistant:
   ```bash
   scp ~/certs/bob.crt ~/certs/bob.key root@10.0.10.24:/config/ssl/
   ```
2. Update Home Assistant configuration:
   ```yaml
   http:
     ssl_certificate: /config/ssl/bob.crt
     ssl_key: /config/ssl/bob.key
     server_port: 8123
   ```
3. Restart Home Assistant
4. Trust CA on client devices

**Note**: Current setup uses local CA certificate. Public domain uses Caddy with Let's Encrypt.
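
Before copying the files in step 1 of section 6.1, it can help to confirm that the certificate chains to the local CA, is not close to expiry, and matches its private key. The following is a minimal sketch using standard `openssl` subcommands. The function name `check_cert` and the CA path `~/certs/ca.crt` are hypothetical, since this document does not name the CA file; the 30-day threshold is also an arbitrary choice.

```bash
#!/usr/bin/env bash
# Pre-flight check for a certificate/key pair before copying it to Home Assistant.
# Assumption: the caller supplies the CA certificate path; ~/certs/ca.crt in the
# usage line below is a hypothetical name, not taken from the source documents.

check_cert() {
  local cert="$1" key="$2" ca="$3"
  local cert_pub key_pub

  # 1. The certificate must chain to the local CA.
  if ! openssl verify -CAfile "$ca" "$cert" >/dev/null 2>&1; then
    echo "chain verification failed"
    return 1
  fi

  # 2. The certificate must remain valid for at least 30 more days.
  if ! openssl x509 -in "$cert" -noout -checkend $((30 * 24 * 3600)) >/dev/null 2>&1; then
    echo "certificate expires within 30 days"
    return 1
  fi

  # 3. The private key must belong to the certificate: compare public keys.
  cert_pub=$(openssl x509 -in "$cert" -noout -pubkey 2>/dev/null)
  key_pub=$(openssl pkey -in "$key" -pubout 2>/dev/null)
  if [ -z "$cert_pub" ] || [ "$cert_pub" != "$key_pub" ]; then
    echo "key does not match certificate"
    return 1
  fi

  echo "ok"
}

# Usage (hypothetical CA path):
#   check_cert ~/certs/bob.crt ~/certs/bob.key ~/certs/ca.crt
```

If the function prints anything other than `ok`, fixing the certificate first avoids restarting Home Assistant with a pair it cannot load.
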

**Priority**: Low (HTTPS already working with local CA cert)
**Estimated Time**: 30 minutes

#### 6.2 Integrate More Services with Authentik SSO

**Objective**: Single sign-on for additional services

**Status**: 📋 Planned

**Completed**:
- ✅ Proxmox (all 3 hosts)
- ✅ Grafana (OAuth2 configured)

**Not Possible**:
- ❌ n8n (requires Enterprise license for OIDC/SSO)

**Pending**:
- [ ] Home Assistant (complex - requires proxy provider or LDAP)
- [ ] Other services as they're deployed

**Priority**: Low (manual login acceptable for now)
**Estimated Time**: 1-2 hours per service

---

### 7. Backup Strategy Completion

**Objective**: Implement full 3-tier backup system

**Status**: ✅ Tier 1 complete, Tier 2-3 planned

**Current State** (from CLAUDE.md):
- ✅ Tier 1 (Local/OMV NFS): Fully operational
  - PostgreSQL backups: Daily 2:00 AM
  - Proxmox VM/container backups: Daily 2:30 AM
  - Retention: 7 days daily, 4 weeks weekly, 3 months monthly

**Remaining Tiers**:
- [ ] Tier 2: Off-site external drives (manual rotation)
- [ ] Tier 3: Backblaze B2 cloud storage (automated)

**Reference**:
- guides/HOMELAB-BACKUP-STRATEGY.md
- guides/BACKUP-QUICK-START.md

**Priority**: Medium (Tier 1 provides good protection, Tier 2-3 for disaster recovery)
**Estimated Time**: 2-4 hours for Tier 3 cloud setup

---

### 8. Monitoring & Alerting

**Objective**: Proactive monitoring of infrastructure health

**Status**: 📋 Planned (prerequisite: Prometheus + Grafana configuration, see item 3)

**Components**:
- [ ] Service uptime monitoring
- [ ] Resource utilization (CPU, RAM, disk)
- [ ] Network connectivity (WireGuard tunnel status)
- [ ] Backup success/failure alerts
- [ ] Certificate expiration warnings
- [ ] Disk space alerts (OMV storage)

**Alerting Methods**:
- Email
- Slack/Discord webhook
- Home Assistant notifications

**Priority**: Medium (blocked until Prometheus monitoring targets are configured, see item 3)
**Estimated Time**: 2-3 hours (after Prometheus targets are configured)

---

### 9. Clean Up and Archive Old Documentation

**Objective**: Remove or archive outdated status documents

**Status**: 📋 Pending

**Files to Archive or Update**:

1. **wireguard-setup-progress.md**
   - Status: Outdated (from November 2025)
   - Contains old troubleshooting info that's no longer relevant
   - WireGuard now operational (verified 2025-12-29)
   - Action: Archive to `docs/archive/` or delete

2. **HTTPS-SETUP-STATUS.md**
   - Status: Partially outdated (from December 6, 2025)
   - CA setup complete, but local cert deployment not done
   - Services using Caddy with Let's Encrypt for public access
   - Action: Archive or update with current HTTPS status

3. **N8N-CLAUDE-STATUS.md**
   - Status: Partially outdated
   - Basic integration complete
   - Many "TODO" items that are now optional
   - Action: Archive or consolidate into SERVICES.md

**Priority**: Low
**Estimated Time**: 30 minutes

---

## Documentation Maintenance

### 10. Keep Documentation Updated

**Objective**: Maintain accurate infrastructure documentation

**Regular Tasks**:
- [ ] Update SERVICES.md when services change
- [ ] Update IP-ALLOCATION.md for new devices
- [ ] Update MIGRATION-CHECKLIST.md for completed phases
- [ ] Update INFRASTRUCTURE-TODO.md (this file) as tasks are completed
- [ ] Update CLAUDE.md when architecture changes

**Frequency**: As changes occur
**Priority**: Ongoing

---

## Quick Reference: IP Addresses Still Available

### Reserved but Unused (Available for new services):
- 10.0.10.6-9 (infrastructure expansion)
- 10.0.10.11-12, 10.0.10.14-19 (management)
- 10.0.10.26 (production services)
- 10.0.10.28 (was ESPHome - now runs as HA add-on, IP available)
- 10.0.10.31-39 (IoT devices)
- 10.0.10.41-49 (utility services)

### No Longer Available (now in use):
- 10.0.10.23 (RustDesk ID server - deployed, see item 2)
- 10.0.10.25 (Prometheus/Grafana - deployed, see item 3)

### To Be Reclaimed (after cleanup):
- 10.0.10.71 (Spoolman - to be removed)
- 10.0.10.112 (Authelia - to be removed)

---

## Notes

- All critical infrastructure is operational (verified 2025-12-29)
- WireGuard tunnel stable and functional
- All 5 public domains working (Home Assistant access fixed 2025-12-29, see item 1)
- PostgreSQL shared database serving multiple services
- Authentik SSO integrated with Proxmox cluster
- Automated backups operational (Tier 1 local/NFS)

**Next High-Value Tasks**:
1. ✅ ~~Fix Home Assistant public domain~~ - COMPLETED
2. ✅ ~~Discover/Document Prometheus + Grafana~~ - COMPLETED
3. ✅ ~~Discover/Document RustDesk~~ - COMPLETED
4. Configure Prometheus monitoring targets and Grafana dashboards
5. Clean up deprecated VMs (Spoolman, Authelia)

---

**Last Updated**: 2025-12-29
**Updated By**: Fred (with Claude Code)