Initial infrastructure documentation - comprehensive homelab reference
495
infrastructure/BRAINSTORM.md
Normal file
@@ -0,0 +1,495 @@
# Infrastructure Brainstorming Session

**Date**: 2025-10-28
**Status**: Planning Phase

---

## Initial Claude Code Discovery

I watched this video (https://youtu.be/MsQACpcuTkU?si=2h5VUlgtIcpLbP1v), took the presenter's word as fact, and subscribed to Claude Pro. I need to set this up on my 2013 Mac Pro running Sequoia (via OpenCore Legacy Patcher). Please outline the steps and process for making this happen.

### Claude Code Interest - The /init Command

To clarify, I mean Claude Code specifically; I am most intrigued by the `/init` command.

**Setup Requirements:**
- Homebrew installation
- Claude Code CLI tool
- Authentication: sign in with the Claude Pro subscription, or use an Anthropic API key
- Note: Claude Pro ≠ API access (API usage is billed separately)

---

## Infrastructure Expansion Plans

### Current Environment

**VPS:**
- 2 cores / 4GB RAM
- Running: Pangolin reverse proxy with Gerbil tunnels (WireGuard-based)
- Concern: RAM and CPU usage limits

**Home Lab (Proxmox):**
- **DL380p**: 32 cores, 96GB RAM (main cluster node)
- **i5**: 8 cores, 8GB RAM (secondary cluster node)
- **OMV**: 12TB storage node

**Development Machine:**
- Mac Pro 2013 running Sequoia (via OpenCore Legacy Patcher)

### Proposed New Services

1. **RustDesk Server** - Self-hosted remote desktop
2. **n8n** - Workflow automation platform
3. **Authentik** - Single Sign-On (SSO) platform
4. **Obsidian LiveSync** - Self-hosted note synchronization

---

## Architecture Decision: Hybrid Approach

### VPS (Lightweight Services Only)
- Pangolin reverse proxy (existing)
- Gerbil tunnels (existing, WireGuard-based)
- RustDesk relay server (hbbr) - ~30-50MB RAM for NAT traversal only

**Reasoning**: Keep VPS lightweight to avoid resource constraints

### DL380p Proxmox (Heavy Lifting)
- PostgreSQL (shared database server)
- Authentik SSO with WebAuthn support
- n8n workflow automation
- RustDesk ID server (hbbs) - handles registration and signaling
- Prometheus + Grafana monitoring
- Obsidian CouchDB sync server

**Reasoning**: Abundant resources (32 cores, 96GB RAM) available for all services

---

## Authentik SSO - Core Requirements

### WebAuthn/FIDO2 Hardware Authentication

**Critical Requirement**: Device-specific hardware 2FA

**Supported Devices:**
- iPhone with Face ID (biometric authentication)
- Windows 11 laptop with Windows Hello (fingerprint/face/PIN)
- No YubiKey required (but supported if needed later)

**Security Features:**
- Phishing-resistant (WebAuthn binds each credential to the site's domain)
- Each device has a unique cryptographic key
- Keys stored in the device's Secure Enclave (iPhone) or TPM (Windows)
- Individual devices can be revoked if lost or stolen
- TOTP as backup MFA method
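
The phishing resistance listed above comes from the server checking the origin that the browser writes into the signed client data. Below is a minimal Python sketch of that single check. It assumes the placeholder origin `https://auth.yourdomain.com`; the function names are hypothetical, and Authentik performs this step (plus signature verification) internally, so this is only an illustration of the idea.

```python
import base64
import hmac
import json

# Hypothetical relying-party origin, based on the placeholder domain in these notes.
EXPECTED_ORIGIN = "https://auth.yourdomain.com"


def b64url_decode(value: str) -> bytes:
    """Decode unpadded base64url, the encoding used for the challenge field."""
    return base64.urlsafe_b64decode(value + "=" * (-len(value) % 4))


def client_data_is_valid(client_data_json: bytes, expected_challenge: bytes,
                         expected_origin: str = EXPECTED_ORIGIN,
                         expected_type: str = "webauthn.get") -> bool:
    """Check the type, origin, and challenge fields of clientDataJSON."""
    data = json.loads(client_data_json)
    challenge = b64url_decode(str(data.get("challenge", "")))
    return (
        data.get("type") == expected_type
        and data.get("origin") == expected_origin
        and hmac.compare_digest(challenge, expected_challenge)
    )


def example_client_data(origin: str, challenge: bytes) -> bytes:
    """Build the JSON a browser would produce for a login on `origin`."""
    encoded = base64.urlsafe_b64encode(challenge).rstrip(b"=").decode()
    payload = {"type": "webauthn.get", "challenge": encoded, "origin": origin}
    return json.dumps(payload).encode()
```

A login page served from a look-alike domain produces client data with that domain as the origin, so the comparison fails even if the user approves the Face ID or Windows Hello prompt.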

### Integration Targets

**Priority 1 (Critical):**
- Proxmox VE (OpenID Connect)
- n8n (OAuth2)
- Pangolin admin dashboard (if supported)

**Priority 2 (Nice to have):**
- Grafana (OAuth2)
- Home Assistant (OAuth2)
- Any future services

**SSO Policies:**
- External access (via Pangolin): WebAuthn REQUIRED
- Internal network access: WebAuthn preferred, TOTP acceptable
- Admin operations: Always require WebAuthn
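
The three SSO policies can be read as a small decision table. The Python sketch below spells that table out; the `LoginAttempt` type and `is_allowed` function are hypothetical names used for illustration, and in practice the rules would be expressed as Authentik flows and policies rather than custom code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LoginAttempt:
    external: bool      # True when the request arrives via Pangolin
    admin_action: bool  # True for admin operations
    method: str         # "webauthn" or "totp"


def is_allowed(attempt: LoginAttempt) -> bool:
    """Apply the SSO policies: WebAuthn always passes, TOTP only for
    non-admin access from the internal network."""
    if attempt.method == "webauthn":
        return True
    if attempt.method == "totp":
        return not attempt.external and not attempt.admin_action
    return False
```

For example, `is_allowed(LoginAttempt(external=True, admin_action=False, method="totp"))` is rejected, while the same attempt from the internal network is accepted.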

---

## Network Architecture

### Flow Diagram
```
Internet → VPS (Pangolin Reverse Proxy)
              ↓
    Gerbil Tunnel (WireGuard)
              ↓
    DL380p Proxmox Home Lab
              ↓
    Authentik SSO ←→ All Services
              ├─→ n8n
              ├─→ RustDesk (hbbs)
              ├─→ Grafana
              ├─→ Proxmox Web UI
              └─→ Home Assistant (future)
```

### Service Endpoints
- `auth.yourdomain.com` → Authentik SSO
- `n8n.yourdomain.com` → n8n workflows
- `grafana.yourdomain.com` → Monitoring dashboards
- `obsidian.yourdomain.com` → Note sync (CouchDB)

---

## Implementation Strategy: 8 Phases

### Phase 1: Planning & Preparation
- Document current infrastructure
- Make architecture decisions (LXC vs Docker, shared vs separate PostgreSQL)
- Create project structure with Claude Code
- Plan network layout and port assignments

### Phase 2: Infrastructure Foundation on Proxmox
- Deploy PostgreSQL 15 (shared database server)
- Network and port planning
- Reserve static IPs for all services

### Phase 3: Deploy Core Services on Proxmox
- Authentik SSO with WebAuthn/FIDO2 support
- n8n workflow automation
- RustDesk ID server (hbbs)

### Phase 4: VPS Configuration
- RustDesk relay server (hbbr) - lightweight
- Update Pangolin reverse proxy routes
- DNS record creation
- SSL certificate management

### Phase 5: SSO Integration & WebAuthn Enrollment
- Configure Authentik OAuth2/OIDC providers
- Integrate Proxmox with OpenID Connect
- Integrate n8n with OAuth2
- Enroll all personal devices (iPhone, Windows laptop)
- Set up TOTP backup

### Phase 6: Monitoring, Security & Hardening
- Deploy Prometheus + Grafana monitoring stack
- Security hardening (firewall rules, Fail2ban, SSL)
- WebAuthn policies and device management
- Configure alerts

### Phase 7: Backup, Documentation & Testing
- Comprehensive backup solution to OMV (NFS)
- Complete infrastructure documentation
- Testing and validation procedures
- Disaster recovery drills

### Phase 8: Future Integrations
- Home Assistant integration with Authentik
- Obsidian LiveSync deployment
- Additional services as needed

---

## Resource Allocation Plan

### Proxmox DL380p Services

| Service | Cores | RAM | Storage | Purpose |
|---------|-------|-----|---------|---------|
| PostgreSQL | 2 | 4GB | 20GB | Shared database for all services |
| Authentik | 2 | 3GB | 30GB | SSO platform with WebAuthn |
| n8n | 4 | 4GB | 40GB | Workflow automation |
| RustDesk (hbbs) | 2 | 2GB | 10GB | Remote desktop ID server |
| Monitoring | 2 | 4GB | 50GB | Prometheus + Grafana |
| Obsidian Sync | 2 | 2GB | 50GB | CouchDB for note synchronization |
| **Total** | **14** | **19GB** | **200GB** | |
| **Available** | **18/32** | **77GB/96GB** | - | Still plenty of headroom! |
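
The totals and headroom in this table can be re-derived with a few lines of Python, which is handy when an allocation changes. This is only a sketch: the figures are copied from the table, and the names `ALLOCATIONS`, `totals`, and `headroom` are made up for the example.

```python
# Per-service allocations copied from the table: (cores, RAM in GB, storage in GB).
ALLOCATIONS = {
    "PostgreSQL": (2, 4, 20),
    "Authentik": (2, 3, 30),
    "n8n": (4, 4, 40),
    "RustDesk (hbbs)": (2, 2, 10),
    "Monitoring": (2, 4, 50),
    "Obsidian Sync": (2, 2, 50),
}
HOST_CORES = 32   # DL380p
HOST_RAM_GB = 96  # DL380p


def totals(allocations):
    """Sum cores, RAM, and storage across all services."""
    cores = sum(c for c, _, _ in allocations.values())
    ram_gb = sum(r for _, r, _ in allocations.values())
    storage_gb = sum(s for _, _, s in allocations.values())
    return cores, ram_gb, storage_gb


def headroom(allocations):
    """Return the cores and RAM left on the host after all allocations."""
    cores, ram_gb, _ = totals(allocations)
    return HOST_CORES - cores, HOST_RAM_GB - ram_gb


if __name__ == "__main__":
    print("allocated:", totals(ALLOCATIONS))
    print("headroom:", headroom(ALLOCATIONS))
```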

### VPS Resource Usage

| Service | Cores | RAM | Purpose |
|---------|-------|-----|---------|
| Pangolin | ~1 | ~2GB | Reverse proxy |
| Gerbil | ~0.5 | ~256MB | WireGuard tunnels |
| RustDesk (hbbr) | ~0.5 | ~128MB | NAT traversal relay |
| **Total** | **~2** | **~2.4GB** | |
| **Limit** | **2** | **4GB** | RAM within safe limits ✅; CPU estimate leaves no headroom |
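
As a rough cross-check, the VPS estimates can be compared against the targets listed later under Success Metrics (<50% CPU, <3GB RAM). The Python sketch below does that arithmetic; it assumes "~2GB" means 2048 MB and treats the core estimates as sustained load, which is probably pessimistic, and the names are hypothetical.

```python
VPS_CORES = 2
VPS_RAM_MB = 4096

# Estimates from the table: (cores, RAM in MB). "~2GB" is taken as 2048 MB.
ESTIMATES = {
    "Pangolin": (1.0, 2048),
    "Gerbil": (0.5, 256),
    "RustDesk (hbbr)": (0.5, 128),
}


def usage(estimates):
    """Sum the estimated cores and RAM (MB) across VPS services."""
    cores = sum(c for c, _ in estimates.values())
    ram_mb = sum(r for _, r in estimates.values())
    return cores, ram_mb


def meets_targets(estimates, cpu_target=0.50, ram_target_mb=3072):
    """Compare estimated usage with the <50% CPU / <3GB RAM targets."""
    cores, ram_mb = usage(estimates)
    return {
        "cpu": cores / VPS_CORES < cpu_target,
        "ram": ram_mb < ram_target_mb,
    }
```

With the table's numbers the RAM target is met, but the CPU estimate equals the full two cores, so the CPU figures are worth measuring on the real VPS before relying on them.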

---

## Obsidian Implementation Details

### Why Obsidian for Infrastructure Documentation?
- Native markdown checkbox support
- Real-time sync across all devices (Mac, Windows, iPhone)
- Self-hosted sync (no subscription needed)
- Can store infrastructure checklist, notes, diagrams
- Works offline
- End-to-end encryption (optional LiveSync setting)

### Obsidian LiveSync Architecture
- CouchDB server on Proxmox (backend)
- Obsidian apps on all devices (clients)
- Self-hosted sync via Pangolin reverse proxy
- Database: `obsidian-vault`
- Backup to OMV storage

### Device Setup
1. Mac Pro: Primary documentation device
2. Windows 11 Laptop: Access from work/travel
3. iPhone: Mobile access to infrastructure notes and checklists

### Integration with Infrastructure Project
- Implementation checklist (190+ tasks) stored in Obsidian
- Real-time updates across devices as tasks are completed
- Can attach network diagrams, screenshots, configs
- Version history via CouchDB replication

---

## Security Considerations

### Authentication Layers
1. **Network Level**: Gerbil tunnel encryption (WireGuard)
2. **Application Level**: Authentik SSO with WebAuthn
3. **Device Level**: Hardware-based authentication (Face ID, Windows Hello)
4. **Backup Level**: TOTP authenticator app

### Firewall Strategy
- VPS: Only expose Pangolin ports (80, 443, Gerbil tunnel port) and the RustDesk relay port (hbbr, 21117/TCP by default)
- Proxmox: Internal network only, no direct external access
- LXC containers: Isolated, only necessary inter-container communication
- Fail2ban on Authentik and VPS SSH

### Backup Security
- Daily backups to OMV (12TB NFS storage)
- Weekly and monthly rotation
- PostgreSQL dumps (compressed)
- Authentik media and config backups
- n8n workflow backups (credentials encrypted)
- RustDesk encryption keys (CRITICAL)
- Grafana dashboards
- Off-site backup optional (cloud via rclone)
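
One way to picture the daily/weekly/monthly rotation is as a function that decides which backup dates survive pruning. The Python sketch below is an assumption-heavy illustration: the retention counts (7 daily, 4 weekly, 6 monthly) are not specified anywhere in these notes, and `select_backups_to_keep` is a hypothetical helper, not part of any backup tool.

```python
from datetime import date, timedelta


def select_backups_to_keep(backup_dates, today, daily=7, weekly=4, monthly=6):
    """Return the set of backup dates to retain.

    daily:   every backup from the last `daily` days
    weekly:  newest backup in each of the most recent `weekly` ISO weeks
    monthly: newest backup in each of the most recent `monthly` months
    """
    newest_first = sorted(set(backup_dates), reverse=True)
    keep = {d for d in newest_first if (today - d).days < daily}

    newest_per_week = {}
    for d in newest_first:
        # Key is (ISO year, ISO week); setdefault keeps the newest date seen.
        newest_per_week.setdefault(tuple(d.isocalendar())[:2], d)
    keep.update(list(newest_per_week.values())[:weekly])

    newest_per_month = {}
    for d in newest_first:
        newest_per_month.setdefault((d.year, d.month), d)
    keep.update(list(newest_per_month.values())[:monthly])
    return keep


if __name__ == "__main__":
    today = date(2025, 10, 28)
    history = [today - timedelta(days=n) for n in range(90)]
    for kept in sorted(select_backups_to_keep(history, today)):
        print(kept.isoformat())
```

Anything not in the returned set would be a candidate for deletion on the OMV share; the critical RustDesk keys are better kept outside any automatic pruning.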

### Certificate Management
- Let's Encrypt via Pangolin
- Automated renewal
- HSTS headers enabled
- TLS 1.3 enforcement
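
A small check like the following could confirm that renewal and TLS 1.3 enforcement are actually working. It is a sketch using only the Python standard library; the hostname passed to `fetch_not_after` would be one of the placeholder endpoints from these notes, and both function names are hypothetical.

```python
import socket
import ssl
from datetime import datetime, timezone


def days_until_expiry(not_after: str, now: datetime) -> int:
    """Days between `now` and a certificate's notAfter timestamp.

    `not_after` uses the format returned by getpeercert(),
    for example "Jan  1 00:00:00 2026 GMT".
    """
    expires = datetime.fromtimestamp(ssl.cert_time_to_seconds(not_after), tz=timezone.utc)
    return (expires - now).days


def fetch_not_after(hostname: str, port: int = 443, timeout: float = 5.0) -> str:
    """Connect with TLS 1.3 as the minimum version and return notAfter.

    The handshake fails if the server only offers older TLS versions,
    which doubles as a check of the TLS 1.3 enforcement goal.
    """
    context = ssl.create_default_context()
    context.minimum_version = ssl.TLSVersion.TLSv1_3
    with socket.create_connection((hostname, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=hostname) as tls:
            return tls.getpeercert()["notAfter"]
```

Usage would look like `days_until_expiry(fetch_not_after("auth.yourdomain.com"), datetime.now(timezone.utc))`, with an alert if the result drops below some threshold such as 14 days (an arbitrary choice).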

---

## Development Approach: Claude Code Usage

### Primary Use Cases
1. Generate complete deployment scripts for each service
2. Create LXC container configurations
3. Generate Docker Compose files
4. Create backup automation scripts
5. Generate comprehensive documentation
6. Create testing and validation scripts

### Example /init Commands

**PostgreSQL Deployment:**
```
/init Create PostgreSQL 15 deployment for Proxmox LXC container with:
- Debian 12 base
- Separate databases for authentik, n8n, rustdesk, grafana
- Optimized for 4GB RAM
- Backup scripts to NFS mount
```

**Authentik with WebAuthn:**
```
/init Create Authentik SSO server deployment for Proxmox LXC with WebAuthn/FIDO2 support:
- Docker Compose setup
- External PostgreSQL connection
- WebAuthn enrollment flows
- OAuth2/OIDC provider configurations
- Integration templates for Proxmox, n8n, Grafana
```

**Complete Infrastructure:**
```
/init Create comprehensive project structure for self-hosted infrastructure:
- Folder organization for all services
- Deployment phase documentation
- Environment templates
- Backup automation
- Monitoring dashboards
- Security hardening checklists
```

---

## Timeline Estimate

### Week 1: Foundation (Phases 1-3)
- Day 1-2: Planning and documentation
- Day 3-4: PostgreSQL and network setup
- Day 5-7: Deploy Authentik, n8n, RustDesk on Proxmox

### Week 2: Integration (Phases 4-5)
- Day 1-2: VPS services and Pangolin configuration
- Day 3-5: SSO integration and WebAuthn enrollment
- Day 6-7: Testing and troubleshooting

### Week 3: Finalization (Phases 6-7)
- Day 1-3: Monitoring, security hardening, backup automation
- Day 4-5: Complete documentation
- Day 6-7: Comprehensive testing and disaster recovery drill

### Week 4+: Expansion (Phase 8)
- Home Assistant integration
- Obsidian LiveSync deployment
- Additional services as needed

**Note**: This is a methodical, careful rollout. No rushing. Test each phase thoroughly before proceeding.

---

## Success Metrics

### Technical Metrics
- All services accessible externally via SSO
- WebAuthn works on all enrolled devices
- No single service exceeding allocated resources
- VPS CPU/RAM usage under control (<50% / <3GB)
- Backups running successfully (100% success rate)
- All monitoring dashboards populated with data
- Zero unplanned downtime during deployment

### User Experience Metrics
- Single sign-on across all services
- Face ID / Windows Hello authentication works seamlessly
- No password fatigue (SSO handles everything)
- Mobile access to all services via Authentik
- Infrastructure documentation accessible from any device (Obsidian)
- Fast response times (<2s for service access)

### Security Metrics
- All external access requires WebAuthn
- No default passwords remaining
- Fail2ban protecting critical services
- SSL certificates valid and auto-renewing
- Audit logging enabled in Authentik
- Regular backup verification (monthly)

---

## Open Questions / Decisions Needed

### To Decide Before Starting:
- [ ] Confirm domain names to use (auth.domain.com, n8n.domain.com, etc.)
- [ ] LXC containers vs Docker VMs? (Recommendation: LXC for efficiency)
- [ ] Shared PostgreSQL or separate instances? (Recommendation: Shared)
- [ ] Separate VLAN for services? (Recommendation: Yes, if possible)
- [ ] Let's Encrypt via Pangolin or internal CA? (Recommendation: Let's Encrypt)
- [ ] Off-site backup strategy? (Cloud, second location, etc.)

### To Document During Setup:
- [ ] IP addresses assigned to each service
- [ ] Database credentials (store securely)
- [ ] OAuth Client IDs and secrets
- [ ] Authentik admin credentials
- [ ] RustDesk encryption keys (CRITICAL!)
- [ ] Backup schedule and retention
- [ ] Emergency access procedures

---

## Lessons Learned / Notes

### Why Hybrid Architecture?
- VPS is resource-constrained (2 cores / 4GB RAM)
- DL380p has abundant resources (32 cores / 96GB RAM)
- Gerbil tunnels already provide secure connectivity
- Minimizes VPS costs while maximizing home lab utilization
- Services stay responsive (no resource contention on VPS)

### Why Authentik over Alternatives?
- **vs Keycloak**: Much lighter weight (Keycloak needs 1-2GB+ RAM)
- **vs Authelia**: More feature-complete, better app support
- Native WebAuthn/FIDO2 support
- Modern UI
- Active development
- Good documentation
- Self-hosted (privacy and control)

### Why LXC Containers?
- More efficient than VMs (less overhead)
- Native Proxmox integration
- Easier backups and snapshots
- Better resource utilization
- Faster boot times
- Still provides isolation

### Why Shared PostgreSQL?
- Single database server to manage
- Easier backups (one dump for all databases)
- Resource efficiency (connection pooling)
- Simpler monitoring
- Adequate for home lab scale
- Can migrate to separate instances later if needed
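
To make the shared-server idea concrete, each service can get its own role and database on the one PostgreSQL instance. The Python sketch below only generates the SQL text; the service names come from the PostgreSQL `/init` example earlier in these notes, while `bootstrap_sql` is a hypothetical helper and passwords are deliberately left out (they would be set separately, for instance with `\password` in `psql`).

```python
# Service names taken from the PostgreSQL /init example in these notes.
SERVICES = ["authentik", "n8n", "rustdesk", "grafana"]


def bootstrap_sql(services):
    """Return CREATE ROLE / CREATE DATABASE statements, one pair per service.

    Each service owns its database, so a later move to a separate
    instance only involves dumping and restoring that one database.
    """
    statements = []
    for name in services:
        if not name.isidentifier():
            # Guard against names that are unsafe to embed in SQL.
            raise ValueError(f"unsafe service name: {name!r}")
        statements.append(f'CREATE ROLE "{name}" LOGIN;')
        statements.append(f'CREATE DATABASE "{name}" OWNER "{name}";')
    return statements


if __name__ == "__main__":
    print("\n".join(bootstrap_sql(SERVICES)))
```

The printed statements could be piped into `psql` as the `postgres` superuser inside the database container.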

---

## Reference Links

### Tools & Services
- **Claude Code**: https://docs.claude.com/en/docs/claude-code
- **Authentik**: https://goauthentik.io/
- **n8n**: https://n8n.io/
- **RustDesk**: https://rustdesk.com/
- **Obsidian**: https://obsidian.md/
- **Prometheus**: https://prometheus.io/
- **Grafana**: https://grafana.com/

### Documentation Created
- CLAUDE.md - Repository guidance for Claude Code
- RUNBOOK.md - Operational procedures
- DISASTER-RECOVERY.md - Recovery procedures
- SERVICES.md - Service configuration templates
- IMPROVEMENTS.md - Infrastructure recommendations
- MONITORING.md - Monitoring setup guide
- infrastructure-audit.md - Infrastructure audit checklist
- Infrastructure-Implementation-Checklist.md - Complete deployment checklist

### Automation Scripts
- backup-proxmox.sh - VM/container backups
- backup-vps.sh - VPS configuration backups
- health-check.sh - Service health monitoring
- cert-check.sh - SSL certificate expiration
- tunnel-monitor.sh - Gerbil tunnel monitoring
- resource-report.sh - Weekly resource reports

---

## Next Immediate Actions

1. **Review and finalize architecture decisions**
   - Confirm domain names
   - Decide on LXC vs Docker
   - Plan network/VLAN layout

2. **Start with Claude Code project structure**
   ```bash
   cd ~/proxmox-infrastructure
   claude
   # Then, at the Claude Code prompt (not in the shell), enter:
   #   /init Create comprehensive project structure...
   ```

3. **Fill out infrastructure audit checklist**
   - Current VPS details
   - Proxmox network configuration
   - Available IP addresses
   - DNS provider details

4. **Set up Obsidian for documentation**
   - Install on Mac Pro
   - Import implementation checklist
   - Begin checking off tasks as completed

5. **Begin Phase 1: Planning & Preparation**
   - Document current state
   - Make final decisions
   - Create project scaffolding

---

**Status**: Ready to begin implementation!
**Excitement Level**: 🚀🚀🚀

**Last Updated**: 2025-10-28