Funky (OpenClaw) 0682c79580 Initial infrastructure documentation - comprehensive homelab reference

2026-02-23 03:42:22 +00:00

Infrastructure TODO List

Created: 2025-12-29
Last Updated: 2025-12-29
Status: Active development tasks

This document tracks incomplete infrastructure tasks and future improvements, and keeps a record of recently completed items.

✅ Completed Items

1. Fix Home Assistant Public Domain Access

Status: ✅ COMPLETED (2025-12-29)

What was done:

Updated Caddy to use HTTPS backend for Home Assistant
Added VPS WireGuard IP (10.0.8.1) to Home Assistant's trusted_proxies
Verified bob.nianticbooks.com is accessible

Result: All 5 public domains now working:

✅ freddesk.nianticbooks.com → Proxmox
✅ bob.nianticbooks.com → Home Assistant
✅ ad5m.nianticbooks.com → 3D Printer
✅ auth.nianticbooks.com → Authentik SSO
✅ bible.nianticbooks.com → Bible reading plan
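
The five routes above can be re-checked with a short script. This is a minimal sketch, not the check that was originally run: `http_status` and `check_domains` are hypothetical helper names, and treating any status below 500 as "working" is an assumption made because several of these services answer with a redirect or a login wall.

```python
import urllib.error
import urllib.request

# Hostnames taken from the list above.
PUBLIC_DOMAINS = [
    "freddesk.nianticbooks.com",
    "bob.nianticbooks.com",
    "ad5m.nianticbooks.com",
    "auth.nianticbooks.com",
    "bible.nianticbooks.com",
]


def http_status(host, timeout=10):
    """Return the HTTP status code for https://<host>/, or None if unreachable."""
    try:
        with urllib.request.urlopen(f"https://{host}/", timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code  # the server answered, e.g. 401/403 from an auth wall
    except (urllib.error.URLError, OSError):
        return None  # DNS failure, refused connection, TLS error, timeout


def check_domains(domains, fetch=http_status):
    """Map each domain to True if it answered with a status below 500."""
    results = {}
    for host in domains:
        status = fetch(host)
        results[host] = status is not None and status < 500
    return results
```

Calling `check_domains(PUBLIC_DOMAINS)` from a machine outside the LAN exercises the full path through the VPS, Caddy, and the WireGuard tunnel. The `fetch` parameter exists only so the logic can be tried without network access.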

2. Deploy RustDesk ID Server

Status: ✅ COMPLETED (2025-12-25)

What was deployed:

ID Server (hbbs) on main-pve LXC 123 at 10.0.10.23
Relay Server (hbbr) on VPS at 66.63.182.168:21117
Generated encryption key pair
Verified client connectivity

Result: RustDesk fully operational

✅ ID Server (hbbs): 10.0.10.23 ports 21115, 21116, 21118
✅ Relay Server (hbbr): VPS port 21117
✅ Public Key: sfYuCTMHxrA22kukomb/RAKYyUgr8iaMfm/U4CFLfL0=
✅ Client Configuration: ID Server 66.63.182.168, Key included
✅ Version: 1.1.14 (both servers)
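
As a rough health check for the ports listed above, a TCP connect probe is usually enough. The sketch below is an illustration under assumptions: `tcp_port_open` and `probe` are hypothetical names, and the probe covers TCP only, so the UDP side of port 21116 is not exercised.

```python
import socket

# TCP ports from the list above. RustDesk also uses 21116 over UDP,
# which this TCP-only probe does not test.
HBBS_PORTS = (21115, 21116, 21118)
HBBR_PORTS = (21117,)


def tcp_port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def probe(host, ports):
    """Return {port: reachable} for every port in ports."""
    return {port: tcp_port_open(host, port) for port in ports}
```

For example, `probe("10.0.10.23", HBBS_PORTS)` checks the ID server from inside the LAN, and `probe("66.63.182.168", HBBR_PORTS)` checks the relay on the VPS.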

Documentation:

SERVICES.md - Service inventory and health checks
guides/RUSTDESK-DEPLOYMENT-COMPLETE.md - Complete deployment guide

Medium Priority

3. Deploy Prometheus + Grafana Monitoring

Status: ✅ DISCOVERED - Already deployed (2025-12-29)

Current State:

Location: 10.0.10.25 (responding to ping)
Grafana: Port 3000 ✅ Running (redirects to /login)
Prometheus: Port 9090 ✅ Running
Deployment Method: TBD (need to investigate)

Remaining Configuration Tasks:

Document deployment method (Docker Compose, systemd, VM/Container type)
Configure PostgreSQL database on 10.0.10.20 for Grafana (if not already done)
Verify Authentik SSO for Grafana (section 6.2 lists Grafana OAuth2 as already configured)
Configure Prometheus monitoring targets:
- Proxmox nodes (via node_exporter)
- VPS (WireGuard tunnel metrics)
- PostgreSQL
- Home Assistant
- Other services
Import Grafana dashboards:
- Proxmox overview
- PostgreSQL metrics
- Network metrics
Set up alerting (email/Slack)
Optionally add Caddy public route
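
One way to manage the monitoring targets above is Prometheus file-based service discovery (`file_sd_configs`), which reads JSON files containing target groups. The sketch below generates such a file. It is an assumption about how the targets could be organised, not the current configuration: the host names are placeholders, `build_file_sd` and the `group` label are hypothetical, and 9100 is only the default node_exporter port (other exporters listen elsewhere).

```python
import json


def build_file_sd(targets_by_group, port=9100):
    """Build a Prometheus file_sd target-group list.

    targets_by_group maps a group name to a list of hostnames or IPs.
    Each group becomes one entry with "targets" and "labels" keys.
    """
    groups = []
    for group, hosts in sorted(targets_by_group.items()):
        groups.append({
            "targets": [f"{host}:{port}" for host in hosts],
            "labels": {"group": group},
        })
    return groups


# Hypothetical inventory: replace the placeholder hosts with the real nodes.
EXAMPLE_INVENTORY = {
    "proxmox": ["pve-node-1.example", "pve-node-2.example"],
}

# JSON text that could be written to a file referenced by file_sd_configs.
file_sd_json = json.dumps(build_file_sd(EXAMPLE_INVENTORY), indent=2)
```

Prometheus re-reads file_sd files when they change, so adding a host would not need a restart.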

Priority: Low-Medium (services running, configuration needed)

Note: This was discovered during the infrastructure audit. The basic services are operational, but monitoring targets and dashboards need configuration.

Low Priority (Cleanup)

4. Remove Deprecated VMs

Objective: Reclaim resources from unused services

Status: ⏸️ Deferred - Non-critical

4.1 Remove Spoolman VM

Current State:

IP: 10.0.10.71 (allocated but not in use)
Reason: Bambu printer incompatible, service no longer needed

Steps:

Verify no dependencies: pct status <VMID> or qm status <VMID> (this reports run state only, so also confirm nothing still points at 10.0.10.71)
Backup if needed: vzdump <VMID> --storage backup
Stop VM/container: pct stop <VMID> or qm stop <VMID>
Delete: pct destroy <VMID> or qm destroy <VMID>
Remove Pangolin route (if exists)
Update IP-ALLOCATION.md to mark 10.0.10.71 as available
Update documentation
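
Because the VMID and guest type are not recorded here, it can help to print the exact commands before running anything. The sketch below only builds the command list from the steps above and executes nothing. `removal_plan` is a hypothetical helper, and the storage name `backup` is taken from the vzdump step.

```python
def removal_plan(vmid, kind, backup_storage="backup"):
    """Return the shell commands, in order, for retiring one guest.

    kind is "lxc" (managed with pct) or "vm" (managed with qm).
    Nothing is executed; the caller reviews and runs the commands.
    """
    tools = {"lxc": "pct", "vm": "qm"}
    if kind not in tools:
        raise ValueError(f"kind must be one of {sorted(tools)}, got {kind!r}")
    tool = tools[kind]
    return [
        f"{tool} status {vmid}",
        f"vzdump {vmid} --storage {backup_storage}",
        f"{tool} stop {vmid}",
        f"{tool} destroy {vmid}",
    ]
```

For example, `print("\n".join(removal_plan(999, "lxc")))` lists the four commands for a container, where 999 is a placeholder VMID.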

Priority: Low

Estimated Time: 15 minutes

4.2 Remove Authelia VM

Current State:

IP: 10.0.10.112 (allocated but not in use)
Reason: Replaced by Authentik SSO

Steps:

Verify Authentik is working for all services
Backup Authelia config for reference (if needed)
Stop VM/container: pct stop <VMID> or qm stop <VMID>
Delete: pct destroy <VMID> or qm destroy <VMID>
Update IP-ALLOCATION.md to mark 10.0.10.112 as available (or remove from list)
Update documentation

Priority: Low

Estimated Time: 15 minutes

Future Enhancements

5. n8n + Claude Code Advanced Features

Objective: Enhance n8n and Claude Code integration

Status: ✅ Basic integration working, advanced features optional

Remaining Optional Tasks (from MIGRATION-CHECKLIST.md 6.4):

Session management workflow (UUID generation, multi-turn conversations)
Slack integration (Slack → n8n → Claude Code → Slack)
Tool deployment with --dangerously-skip-permissions flag
Error handling (network disconnect, invalid commands)
Resource monitoring during heavy Claude operations
Production hardening:
- SSH timeout configuration
- Output length limits
- Logging for Claude executions
- Error notifications
- Optional Caddy route for public n8n access (with Authentik SSO)
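
Several of the hardening items above (timeouts, output length limits, session UUIDs) are generic process-handling concerns. The sketch below shows one possible shape for them. It does not call Claude Code or n8n itself: the command is supplied by the caller, and `new_session_id`, `run_limited`, and the default limits are assumptions chosen for illustration.

```python
import subprocess
import uuid


def new_session_id():
    """Generate a random UUID string to tag a multi-turn conversation."""
    return str(uuid.uuid4())


def run_limited(argv, timeout_s=300, max_chars=4000):
    """Run a command with a timeout and cap the captured output length.

    Returns (ok, text). ok is False on timeout or a non-zero exit code;
    text is stdout on success, stderr on failure.
    """
    try:
        proc = subprocess.run(
            argv, capture_output=True, text=True, timeout=timeout_s
        )
    except subprocess.TimeoutExpired:
        return False, f"timed out after {timeout_s}s"
    text = proc.stdout if proc.returncode == 0 else proc.stderr
    if len(text) > max_chars:
        text = text[:max_chars] + "\n[output truncated]"
    return proc.returncode == 0, text
```

The `(ok, text)` pair maps naturally onto an error-notification branch in a workflow: a false `ok` can trigger the alert, while the truncated text keeps messages within size limits.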

Reference:

MIGRATION-CHECKLIST.md section 6.4
N8N-CLAUDE-STATUS.md

Priority: Low (nice-to-have, basic functionality working)

Estimated Time: 2-4 hours for each feature

6. Home Assistant Enhancements

6.1 Configure Local HTTPS Certificates

Objective: Use local CA certificates for internal HTTPS access

Status: ⏸️ Deferred (CA setup complete, deployment pending)

Details:

CA already set up (HTTPS-SETUP-STATUS.md from 2025-12-06)
Certificates generated for services
Need to deploy certificates to Home Assistant and other services

Steps (from HTTPS-SETUP-STATUS.md):

Copy certificates to Home Assistant:

scp ~/certs/bob.crt ~/certs/bob.key root@10.0.10.24:/config/ssl/

Update Home Assistant configuration:

http:
  ssl_certificate: /config/ssl/bob.crt
  ssl_key: /config/ssl/bob.key
  server_port: 8123

Restart Home Assistant
Trust CA on client devices

Note: Current setup uses local CA certificate. Public domain uses Caddy with Let's Encrypt.

Priority: Low (HTTPS already working with local CA cert)

Estimated Time: 30 minutes

6.2 Integrate More Services with Authentik SSO

Objective: Single sign-on for additional services

Status: 📋 Planned

Completed:

✅ Proxmox (all 3 hosts)
✅ Grafana (OAuth2 configured)

Not Possible:

❌ n8n (requires Enterprise license for OIDC/SSO)

Pending:

Home Assistant (complex - requires proxy provider or LDAP)
Other services as they're deployed

Priority: Low (manual login acceptable for now)

Estimated Time: 1-2 hours per service

7. Backup Strategy Completion

Objective: Implement full 3-tier backup system

Status: ✅ Tier 1 complete, Tier 2-3 planned

Current State (from CLAUDE.md):

✅ Tier 1 (Local/OMV NFS): Fully operational
- PostgreSQL backups: Daily 2:00 AM
- Proxmox VM/container backups: Daily 2:30 AM
- Retention: 7 days daily, 4 weeks weekly, 3 months monthly
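
To make the retention rule concrete, the sketch below picks which backup dates survive under a 7 daily / 4 weekly / 3 monthly policy. It is a simplified model, assuming one backup per day and keeping the newest backup in each ISO week and calendar month. It is not necessarily the pruning logic that Proxmox or the backup scripts apply, and `select_backups_to_keep` is a hypothetical name.

```python
from datetime import date, timedelta


def select_backups_to_keep(backup_dates, daily=7, weekly=4, monthly=3):
    """Return the set of backup dates to keep under a simple tiered policy."""
    newest_first = sorted(set(backup_dates), reverse=True)
    keep = set()

    def keep_newest_per_bucket(bucket_key, limit):
        # Walk from newest to oldest, keeping the first (newest) backup
        # seen in each bucket until `limit` buckets have been covered.
        seen = []
        for d in newest_first:
            key = bucket_key(d)
            if key in seen:
                continue
            if len(seen) == limit:
                break
            seen.append(key)
            keep.add(d)

    keep_newest_per_bucket(lambda d: d, daily)
    keep_newest_per_bucket(lambda d: tuple(d.isocalendar())[:2], weekly)
    keep_newest_per_bucket(lambda d: (d.year, d.month), monthly)
    return keep


# Example input: 120 consecutive daily backups ending 2025-12-29.
example_dates = [date(2025, 12, 29) - timedelta(days=n) for n in range(120)]
example_keep = select_backups_to_keep(example_dates)
```

In this model the tiers overlap, so the newest backup counts toward the daily, weekly, and monthly quota at once.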

Remaining Tiers:

Tier 2: Off-site external drives (manual rotation)
Tier 3: Backblaze B2 cloud storage (automated)

Reference:

guides/HOMELAB-BACKUP-STRATEGY.md
guides/BACKUP-QUICK-START.md

Priority: Medium (Tier 1 provides good protection, Tier 2-3 for disaster recovery)

Estimated Time: 2-4 hours for Tier 3 cloud setup

8. Monitoring & Alerting

Objective: Proactive monitoring of infrastructure health

Status: 📋 Planned (Prometheus + Grafana are already running; prerequisite is the target and dashboard configuration in item 3)

Components:

Service uptime monitoring
Resource utilization (CPU, RAM, disk)
Network connectivity (WireGuard tunnel status)
Backup success/failure alerts
Certificate expiration warnings
Disk space alerts (OMV storage)
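
For the certificate expiration item, the standard library can already turn a certificate's `notAfter` field into a day count. The sketch below is one possible building block, with `days_until_expiry`, `needs_renewal`, and the 14-day warning window all being assumptions rather than settled choices.

```python
import ssl
from datetime import datetime, timezone


def days_until_expiry(not_after, now=None):
    """Whole days left before a certificate's notAfter timestamp.

    not_after uses the format found in ssl.SSLSocket.getpeercert(),
    for example "Jan  5 09:34:43 2018 GMT".
    """
    expires = datetime.fromtimestamp(
        ssl.cert_time_to_seconds(not_after), tz=timezone.utc
    )
    now = now or datetime.now(timezone.utc)
    return (expires - now).days


def needs_renewal(not_after, warn_days=14, now=None):
    """True if the certificate expires in fewer than warn_days days."""
    return days_until_expiry(not_after, now) < warn_days
```

The `now` parameter is there so the calculation can be checked against fixed dates. In normal use it is omitted.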

Alerting Methods:

Email
Slack/Discord webhook
Home Assistant notifications
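
Slack and Discord webhooks both accept a JSON POST, differing mainly in the field that carries the message. The sketch below builds such a request without sending it. `build_webhook_request` is a hypothetical helper and the webhook URL is whatever the chat service issues.

```python
import json
import urllib.request


def build_webhook_request(url, message, kind="slack"):
    """Build (but do not send) a JSON POST for a chat webhook.

    Slack incoming webhooks read the "text" field; Discord webhooks
    read "content".
    """
    field = {"slack": "text", "discord": "content"}[kind]
    body = json.dumps({field: message}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Sending is then a single call, `urllib.request.urlopen(req, timeout=10)`, which keeps the network step separate from message construction.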

Priority: Medium (blocked by Prometheus target and dashboard configuration, item 3)

Estimated Time: 2-3 hours (after Prometheus targets are configured)

9. Cleanup and Archive Old Documentation

Objective: Remove or archive outdated status documents

Status: 📋 Pending

Files to Archive or Update:

wireguard-setup-progress.md
- Status: Outdated (from November 2025)
- Contains old troubleshooting info that's no longer relevant
- WireGuard now operational (verified 2025-12-29)
- Action: Archive to docs/archive/ or delete
HTTPS-SETUP-STATUS.md
- Status: Partially outdated (from December 6, 2025)
- CA setup complete, but local cert deployment not done
- Services using Caddy with Let's Encrypt for public access
- Action: Archive or update with current HTTPS status
N8N-CLAUDE-STATUS.md
- Status: Partially outdated
- Basic integration complete
- Many "TODO" items that are now optional
- Action: Archive or consolidate into SERVICES.md
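
If the archive option is chosen, the move itself is mechanical. The sketch below assumes the three files sit at the repository root and that `docs/archive/` is the destination named above. `archive_docs` is a hypothetical helper, and inside a git checkout `git mv` would be the more usual tool.

```python
import shutil
from pathlib import Path


def archive_docs(repo_root, filenames, archive_dir="docs/archive"):
    """Move the named files under repo_root into archive_dir.

    Returns the names of the files actually moved; missing files are skipped.
    """
    root = Path(repo_root)
    dest = root / archive_dir
    dest.mkdir(parents=True, exist_ok=True)
    moved = []
    for name in filenames:
        src = root / name
        if src.is_file():
            shutil.move(str(src), str(dest / src.name))
            moved.append(name)
    return moved
```

For example, `archive_docs(".", ["wireguard-setup-progress.md"])` moves that one file and leaves the others for a later decision.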

Priority: Low

Estimated Time: 30 minutes

Documentation Maintenance

10. Keep Documentation Updated

Objective: Maintain accurate infrastructure documentation

Regular Tasks:

Update SERVICES.md when services change
Update IP-ALLOCATION.md for new devices
Update MIGRATION-CHECKLIST.md for completed phases
Update INFRASTRUCTURE-TODO.md (this file) as tasks are completed
Update CLAUDE.md when architecture changes

Frequency: As changes occur

Priority: Ongoing

Quick Reference: IP Addresses Still Available

Reserved but Unused (Available for new services):

10.0.10.6-9 (infrastructure expansion)
10.0.10.11-12, 10.0.10.14-19 (management)
10.0.10.26 (production services)
10.0.10.28 (was ESPHome - now runs as HA add-on, IP available)
10.0.10.31-39 (IoT devices)
10.0.10.41-49 (utility services)

Previously Reserved, Now in Use (no longer available):

10.0.10.23 (RustDesk ID server, deployed 2025-12-25)
10.0.10.25 (Prometheus/Grafana, found running 2025-12-29)

To Be Reclaimed (after cleanup):

10.0.10.71 (Spoolman - to be removed)
10.0.10.112 (Authelia - to be removed)
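
The range notation used in this section (for example `10.0.10.6-9`) can be expanded programmatically when picking an address for a new service. The sketch below is an illustration: `expand_range` and `first_free` are hypothetical helpers, and the list of addresses already in use would have to come from IP-ALLOCATION.md.

```python
import ipaddress


def expand_range(spec):
    """Expand "10.0.10.6-9" or "10.0.10.26" into a list of IPv4Address objects."""
    base, _, last_octet = spec.partition("-")
    first = ipaddress.IPv4Address(base)
    if not last_octet:
        return [first]
    prefix = base.rsplit(".", 1)[0]
    last = ipaddress.IPv4Address(f"{prefix}.{last_octet}")
    if last < first:
        raise ValueError(f"range end before start: {spec}")
    return [ipaddress.IPv4Address(n) for n in range(int(first), int(last) + 1)]


def first_free(specs, in_use):
    """Return the first address in the listed ranges that is not in in_use."""
    taken = {ipaddress.IPv4Address(ip) for ip in in_use}
    for spec in specs:
        for addr in expand_range(spec):
            if addr not in taken:
                return addr
    return None
```

For example, `first_free(["10.0.10.41-49"], in_use=[])` returns the first utility-service address.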

Notes

All critical infrastructure is operational (verified 2025-12-29)
WireGuard tunnel stable and functional
All 5 public domains working (Home Assistant HTTPS backend fixed 2025-12-29)
PostgreSQL shared database serving multiple services
Authentik SSO integrated with Proxmox cluster
Automated backups operational (Tier 1 local/NFS)

Next High-Value Tasks:

✅ Fix Home Assistant public domain - COMPLETED
✅ Discover/Document Prometheus + Grafana - COMPLETED
✅ Discover/Document RustDesk - COMPLETED
Configure Prometheus monitoring targets and Grafana dashboards
Cleanup deprecated VMs (Spoolman, Authelia)

Last Updated: 2025-12-29
Updated By: Fred (with Claude Code)
