Infrastructure TODO List
Created: 2025-12-29
Last Updated: 2025-12-29
Status: Active development tasks
This document tracks all incomplete infrastructure tasks and future improvements.
✅ Completed Items
1. Fix Home Assistant Public Domain Access
Status: ✅ COMPLETED (2025-12-29)
What was done:
- Updated Caddy to use HTTPS backend for Home Assistant
- Added VPS WireGuard IP (10.0.8.1) to Home Assistant's trusted_proxies
- Verified bob.nianticbooks.com is accessible
Result: All 5 public domains now working:
- ✅ freddesk.nianticbooks.com → Proxmox
- ✅ bob.nianticbooks.com → Home Assistant
- ✅ ad5m.nianticbooks.com → 3D Printer
- ✅ auth.nianticbooks.com → Authentik SSO
- ✅ bible.nianticbooks.com → Bible reading plan
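To re-verify the five public routes after future Caddy or WireGuard changes, a check along these lines may help. It is a minimal sketch: the function names are illustrative, and treating 401/403 as "route works, login required" is an assumption about how the backends respond.

```shell
# The five public domains listed above.
DOMAINS="freddesk.nianticbooks.com
bob.nianticbooks.com
ad5m.nianticbooks.com
auth.nianticbooks.com
bible.nianticbooks.com"

# classify_status CODE: prints OK for 2xx/3xx, AUTH for 401/403, FAIL otherwise.
# Assumption: a redirect or an auth challenge still means the route itself works.
classify_status() {
  case "$1" in
    2??|3??) echo "OK" ;;
    401|403) echo "AUTH" ;;
    *)       echo "FAIL" ;;
  esac
}

# check_domains: request each domain over HTTPS and print one line per domain.
check_domains() {
  local d code
  for d in $DOMAINS; do
    # -s silent, -o discard body, -w print only the status code.
    # curl reports 000 when no HTTP response was received.
    code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 10 "https://$d/")
    printf '%-28s %s %s\n' "$d" "$code" "$(classify_status "$code")"
  done
}

# Usage: check_domains
```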
2. Deploy RustDesk ID Server
Status: ✅ COMPLETED (2025-12-25)
What was deployed:
- ID Server (hbbs) on main-pve LXC 123 at 10.0.10.23
- Relay Server (hbbr) on VPS at 66.63.182.168:21117
- Generated encryption key pair
- Verified client connectivity
Result: RustDesk fully operational
- ✅ ID Server (hbbs): 10.0.10.23 ports 21115, 21116, 21118
- ✅ Relay Server (hbbr): VPS port 21117
- ✅ Public Key: sfYuCTMHxrA22kukomb/RAKYyUgr8iaMfm/U4CFLfL0=
- ✅ Client Configuration: ID Server 66.63.182.168, Key included
- ✅ Version: 1.1.14 (both servers)
Documentation:
- SERVICES.md - Service inventory and health checks
- guides/RUSTDESK-DEPLOYMENT-COMPLETE.md - Complete deployment guide
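A quick reachability check for the RustDesk ports listed above could look like the following sketch. It assumes `nc` (netcat) with `-z` support is installed, and the function names are illustrative. It tests TCP only, so the UDP side of port 21116 is not covered.

```shell
# RustDesk endpoints as listed above, one host:port per line.
rustdesk_targets() {
  cat <<'EOF'
10.0.10.23:21115
10.0.10.23:21116
10.0.10.23:21118
66.63.182.168:21117
EOF
}

# port_open HOST PORT: succeeds if a TCP connection opens within 3 seconds.
port_open() {
  nc -z -w 3 "$1" "$2" >/dev/null 2>&1
}

# check_rustdesk: print open/closed for every target.
check_rustdesk() {
  local target host port
  rustdesk_targets | while IFS= read -r target; do
    host="${target%:*}"
    port="${target#*:}"
    if port_open "$host" "$port"; then
      echo "open   $target"
    else
      echo "closed $target"
    fi
  done
}

# Usage (from a host that can reach both the LAN and the VPS): check_rustdesk
```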
Medium Priority
3. Deploy Prometheus + Grafana Monitoring
Status: ✅ DISCOVERED - Already deployed (2025-12-29)
Current State:
- Location: 10.0.10.25 (responding to ping)
- Grafana: Port 3000 ✅ Running (redirects to /login)
- Prometheus: Port 9090 ✅ Running
- Deployment Method: TBD (need to investigate)
Remaining Configuration Tasks:
- Document deployment method (Docker Compose, systemd, VM/Container type)
- Configure PostgreSQL database on 10.0.10.20 for Grafana (if not already done)
- Set up Authentik SSO for Grafana
- Configure Prometheus monitoring targets:
  - Proxmox nodes (via node_exporter)
  - VPS (WireGuard tunnel metrics)
  - PostgreSQL
  - Home Assistant
  - Other services
- Import Grafana dashboards:
  - Proxmox overview
  - PostgreSQL metrics
  - Network metrics
- Set up alerting (email/Slack)
- Optionally add Caddy public route
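As a starting point for the monitoring targets task, the scrape section of prometheus.yml can be generated from a list of targets. This is a sketch under assumptions: `render_scrape_config` is a hypothetical helper, the ports are the usual defaults for node_exporter (9100) and postgres_exporter (9187), and the exporters are assumed to be installed on the targets. The Proxmox node addresses are not recorded in this file, so they are passed in as arguments rather than guessed.

```shell
# render_scrape_config NODE_TARGETS PG_TARGET
#   NODE_TARGETS: space-separated host:port list for node_exporter
#   PG_TARGET:    host:port for postgres_exporter
# Prints a scrape_configs section on stdout.
render_scrape_config() {
  local t
  echo "scrape_configs:"
  echo "  - job_name: node"
  echo "    static_configs:"
  echo "      - targets:"
  for t in $1; do
    echo "          - '$t'"
  done
  echo "  - job_name: postgres"
  echo "    static_configs:"
  echo "      - targets:"
  echo "          - '$2'"
}

# Usage (placeholder node names, PostgreSQL host from this document):
#   render_scrape_config "NODE1:9100 NODE2:9100" "10.0.10.20:9187" > scrape.yml
#   promtool check config scrape.yml   # validate before merging into prometheus.yml
```

Home Assistant and the VPS WireGuard metrics are left out here because each needs its own exporter or integration to be chosen first.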
Priority: Low-Medium (services running, configuration needed)
Note: This was discovered during the infrastructure audit. The basic services are operational, but monitoring targets and dashboards need configuration.
Low Priority (Cleanup)
4. Remove Deprecated VMs
Objective: Reclaim resources from unused services
Status: ⏸️ Deferred - Non-critical
4.1 Remove Spoolman VM
Current State:
- IP: 10.0.10.71 (allocated but not in use)
- Reason: Bambu printer incompatible, service no longer needed
Steps:
- Verify no dependencies: pct status <VMID> or qm status <VMID>
- Backup if needed: vzdump <VMID> --storage backup
- Stop VM/container: pct stop <VMID> or qm stop <VMID>
- Delete: pct destroy <VMID> or qm destroy <VMID>
- Remove Pangolin route (if exists)
- Update IP-ALLOCATION.md to mark 10.0.10.71 as available
- Update documentation
Priority: Low
Estimated Time: 15 minutes
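The removal steps can be wrapped in a small helper that defaults to a dry run, so the exact commands can be reviewed before anything is destroyed. This is a sketch: `run` and `remove_guest` are illustrative names, the storage name `backup` is taken from the step above, and the VMID still has to be looked up first.

```shell
# run CMD...: print the command when DRY_RUN=1 (the default), otherwise execute it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# remove_guest TOOL VMID
#   TOOL is pct (LXC container) or qm (VM); VMID is the numeric Proxmox guest ID.
# Stops at the first failing step so a failed backup never leads to a destroy.
remove_guest() {
  local tool="$1" vmid="$2"
  case "$tool" in
    pct|qm) ;;
    *) echo "usage: remove_guest pct|qm VMID" >&2; return 2 ;;
  esac
  case "$vmid" in
    ''|*[!0-9]*) echo "VMID must be numeric" >&2; return 2 ;;
  esac
  run "$tool" status "$vmid"          || return 1
  run vzdump "$vmid" --storage backup || return 1
  run "$tool" stop "$vmid"            || return 1
  run "$tool" destroy "$vmid"         || return 1
}

# Usage:
#   remove_guest pct <VMID>              # dry run, prints the four commands
#   DRY_RUN=0 remove_guest pct <VMID>    # actually executes them
```

The route removal and the IP-ALLOCATION.md update remain manual steps.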
4.2 Remove Authelia VM
Current State:
- IP: 10.0.10.112 (allocated but not in use)
- Reason: Replaced by Authentik SSO
Steps:
- Verify Authentik is working for all services
- Backup Authelia config for reference (if needed)
- Stop VM/container: pct stop <VMID> or qm stop <VMID>
- Delete: pct destroy <VMID> or qm destroy <VMID>
- Update IP-ALLOCATION.md to mark 10.0.10.112 as available (or remove from list)
- Update documentation
Priority: Low
Estimated Time: 15 minutes
Future Enhancements
5. n8n + Claude Code Advanced Features
Objective: Enhance n8n and Claude Code integration
Status: ✅ Basic integration working, advanced features optional
Remaining Optional Tasks (from MIGRATION-CHECKLIST.md 6.4):
- Session management workflow (UUID generation, multi-turn conversations)
- Slack integration (Slack → n8n → Claude Code → Slack)
- Tool deployment with --dangerously-skip-permissions flag
- Error handling (network disconnect, invalid commands)
- Resource monitoring during heavy Claude operations
- Production hardening:
  - SSH timeout configuration
  - Output length limits
  - Logging for Claude executions
  - Error notifications
- Optional Caddy route for public n8n access (with Authentik SSO)
Reference:
- MIGRATION-CHECKLIST.md section 6.4
- N8N-CLAUDE-STATUS.md
Priority: Low (nice-to-have, basic functionality working)
Estimated Time: 2-4 hours for each feature
6. Home Assistant Enhancements
6.1 Configure Local HTTPS Certificates
Objective: Use local CA certificates for internal HTTPS access
Status: ⏸️ Deferred (CA setup complete, deployment pending)
Details:
- CA already set up (HTTPS-SETUP-STATUS.md from 2025-12-06)
- Certificates generated for services
- Need to deploy certificates to Home Assistant and other services
Steps (from HTTPS-SETUP-STATUS.md):
- Copy certificates to Home Assistant:
  scp ~/certs/bob.crt ~/certs/bob.key root@10.0.10.24:/config/ssl/
- Update Home Assistant configuration:
  http:
    ssl_certificate: /config/ssl/bob.crt
    ssl_key: /config/ssl/bob.key
    server_port: 8123
- Restart Home Assistant
- Trust CA on client devices
Note: Current setup uses local CA certificate. Public domain uses Caddy with Let's Encrypt.
Priority: Low (HTTPS already working with local CA cert)
Estimated Time: 30 minutes
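Before copying the certificate to Home Assistant, it may be worth confirming that the certificate and key belong together and that the certificate is not close to expiry. A minimal sketch, assuming `openssl` is available on the machine holding ~/certs; the function names are illustrative.

```shell
# days_to_seconds DAYS: convert days to seconds for openssl's -checkend option.
days_to_seconds() {
  echo $(( $1 * 86400 ))
}

# cert_valid_for_days CERT DAYS: succeeds if CERT is still valid DAYS from now.
cert_valid_for_days() {
  openssl x509 -in "$1" -noout -checkend "$(days_to_seconds "$2")" >/dev/null
}

# cert_matches_key CERT KEY: succeeds if both files carry the same public key.
cert_matches_key() {
  local cert_pub key_pub
  cert_pub=$(openssl x509 -in "$1" -noout -pubkey) || return 1
  key_pub=$(openssl pkey -in "$2" -pubout) || return 1
  [ "$cert_pub" = "$key_pub" ]
}

# Usage:
#   cert_matches_key ~/certs/bob.crt ~/certs/bob.key && echo "pair OK"
#   cert_valid_for_days ~/certs/bob.crt 30 || echo "expires within 30 days"
```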
6.2 Integrate More Services with Authentik SSO
Objective: Single sign-on for additional services
Status: 📋 Planned
Completed:
- ✅ Proxmox (all 3 hosts)
- ✅ Grafana (OAuth2 configured)
Not Possible:
- ❌ n8n (requires Enterprise license for OIDC/SSO)
Pending:
- Home Assistant (complex - requires proxy provider or LDAP)
- Other services as they're deployed
Priority: Low (manual login acceptable for now)
Estimated Time: 1-2 hours per service
7. Backup Strategy Completion
Objective: Implement full 3-tier backup system
Status: ✅ Tier 1 complete, Tier 2-3 planned
Current State (from CLAUDE.md):
- ✅ Tier 1 (Local/OMV NFS): Fully operational
  - PostgreSQL backups: Daily 2:00 AM
  - Proxmox VM/container backups: Daily 2:30 AM
  - Retention: 7 days daily, 4 weeks weekly, 3 months monthly
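For reference, the retention policy above maps onto vzdump's prune-backups option roughly as follows. This is a sketch that only prints the command line. How the existing Tier 1 jobs are actually configured is not recorded here, and the storage name `backup` is borrowed from the VM removal steps in this file.

```shell
# prune_spec DAILY WEEKLY MONTHLY: value for vzdump's --prune-backups option.
prune_spec() {
  echo "keep-daily=$1,keep-weekly=$2,keep-monthly=$3"
}

# backup_cmd VMID STORAGE: print (not run) a vzdump command with the
# 7 daily / 4 weekly / 3 monthly retention from this document applied.
backup_cmd() {
  echo "vzdump $1 --storage $2 --prune-backups $(prune_spec 7 4 3)"
}

# Usage: backup_cmd <VMID> backup
```

Proxmox's `pvesm prune-backups` command with its dry-run option should show what a given keep policy would delete before it is enforced.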
Remaining Tiers:
- Tier 2: Off-site external drives (manual rotation)
- Tier 3: Backblaze B2 cloud storage (automated)
Reference:
- guides/HOMELAB-BACKUP-STRATEGY.md
- guides/BACKUP-QUICK-START.md
Priority: Medium (Tier 1 provides good protection, Tier 2-3 for disaster recovery)
Estimated Time: 2-4 hours for Tier 3 cloud setup
8. Monitoring & Alerting
Objective: Proactive monitoring of infrastructure health
Status: 📋 Planned (Prometheus + Grafana are already running, see item 3; prerequisite is configuring monitoring targets)
Components:
- Service uptime monitoring
- Resource utilization (CPU, RAM, disk)
- Network connectivity (WireGuard tunnel status)
- Backup success/failure alerts
- Certificate expiration warnings
- Disk space alerts (OMV storage)
Alerting Methods:
- Slack/Discord webhook
- Home Assistant notifications
Priority: Medium (blocked by Prometheus target configuration, see item 3)
Estimated Time: 2-3 hours (after Prometheus targets are configured)
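Until Prometheus alerting is configured, a cron-driven script could cover the disk space and webhook pieces. The sketch below assumes a Slack incoming webhook whose URL is stored in a hypothetical `SLACK_WEBHOOK_URL` environment variable. The function names and the 90% limit are illustrative.

```shell
# json_escape TEXT: escape backslashes and double quotes for a JSON string.
# Limitation: newlines and other control characters are not handled.
json_escape() {
  printf '%s' "$1" | sed 's/\\/\\\\/g; s/"/\\"/g'
}

# slack_payload TEXT: build the JSON body for a Slack incoming webhook.
slack_payload() {
  printf '{"text":"%s"}' "$(json_escape "$1")"
}

# notify_slack TEXT: POST the message to the webhook in SLACK_WEBHOOK_URL.
notify_slack() {
  curl -s -X POST -H 'Content-type: application/json' \
    --data "$(slack_payload "$1")" "$SLACK_WEBHOOK_URL"
}

# disk_used_pct PATH: usage of the filesystem holding PATH as an integer percent.
disk_used_pct() {
  df -P "$1" | awk 'NR==2 { sub(/%/, "", $5); print $5 }'
}

# over_threshold USED_PCT LIMIT_PCT: succeeds when usage is at or above the limit.
over_threshold() {
  [ "$1" -ge "$2" ]
}

# Usage (mount path is a placeholder for the OMV storage mount):
#   used=$(disk_used_pct /path/to/omv/mount)
#   over_threshold "$used" 90 && notify_slack "OMV storage at ${used}%"
```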
9. Cleanup and Archive Old Documentation
Objective: Remove or archive outdated status documents
Status: 📋 Pending
Files to Archive or Update:
- wireguard-setup-progress.md
  - Status: Outdated (from November 2025)
  - Contains old troubleshooting info that's no longer relevant
  - WireGuard now operational (verified 2025-12-29)
  - Action: Archive to docs/archive/ or delete
- HTTPS-SETUP-STATUS.md
  - Status: Partially outdated (from December 6, 2025)
  - CA setup complete, but local cert deployment not done
  - Services using Caddy with Let's Encrypt for public access
  - Action: Archive or update with current HTTPS status
- N8N-CLAUDE-STATUS.md
  - Status: Partially outdated
  - Basic integration complete
  - Many "TODO" items that are now optional
  - Action: Archive or consolidate into SERVICES.md
Priority: Low
Estimated Time: 30 minutes
Documentation Maintenance
10. Keep Documentation Updated
Objective: Maintain accurate infrastructure documentation
Regular Tasks:
- Update SERVICES.md when services change
- Update IP-ALLOCATION.md for new devices
- Update MIGRATION-CHECKLIST.md for completed phases
- Update INFRASTRUCTURE-TODO.md (this file) as tasks are completed
- Update CLAUDE.md when architecture changes
Frequency: As changes occur
Priority: Ongoing
Quick Reference: IP Addresses Still Available
Reserved but Unused (Available for new services):
- 10.0.10.6-9 (infrastructure expansion)
- 10.0.10.11-12, 10.0.10.14-19 (management)
- 10.0.10.23 (RustDesk ID server: in use since 2025-12-25, see item 2; no longer available)
- 10.0.10.25 (Prometheus/Grafana: in use, see item 3; no longer available)
- 10.0.10.26 (production services)
- 10.0.10.28 (was ESPHome - now runs as HA add-on, IP available)
- 10.0.10.31-39 (IoT devices)
- 10.0.10.41-49 (utility services)
To Be Reclaimed (after cleanup):
- 10.0.10.71 (Spoolman - to be removed)
- 10.0.10.112 (Authelia - to be removed)
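Before assigning one of these addresses, the ranges can be expanded and probed. A sketch with illustrative function names follows. The `ping` flags assume Linux iputils, and a silent address is only a hint that it is free, since hosts may drop ICMP.

```shell
# expand_range SPEC: SPEC is a.b.c.N or a.b.c.N-M; prints one address per line.
expand_range() {
  local prefix="${1%.*}" last="${1##*.}" start end i
  start="${last%-*}"   # part before the dash (or the whole octet)
  end="${last#*-}"     # part after the dash (or the whole octet)
  i="$start"
  while [ "$i" -le "$end" ]; do
    echo "$prefix.$i"
    i=$((i + 1))
  done
}

# responds IP: succeeds if the address answers a single ping within 1 second.
responds() {
  ping -c 1 -W 1 "$1" >/dev/null 2>&1
}

# report_range SPEC: print "in use" or "silent" for each address in SPEC.
report_range() {
  local ip
  for ip in $(expand_range "$1"); do
    if responds "$ip"; then
      echo "$ip in use"
    else
      echo "$ip silent"
    fi
  done
}

# Usage: report_range 10.0.10.41-49
```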
Notes
- All critical infrastructure is operational (verified 2025-12-29)
- WireGuard tunnel stable and functional
- All 5 public domains working (Home Assistant HTTPS backend fixed 2025-12-29, see item 1)
- PostgreSQL shared database serving multiple services
- Authentik SSO integrated with Proxmox cluster
- Automated backups operational (Tier 1 local/NFS)
Next High-Value Tasks:
- ✅ Fix Home Assistant public domain - COMPLETED
- ✅ Discover/Document Prometheus + Grafana - COMPLETED
- ✅ Discover/Document RustDesk - COMPLETED
- Configure Prometheus monitoring targets and Grafana dashboards
- Cleanup deprecated VMs (Spoolman, Authelia)
Last Updated: 2025-12-29
Updated By: Fred (with Claude Code)