# Infrastructure Runbook

This runbook provides step-by-step procedures for common operational tasks in your infrastructure.

## Table of Contents
- [Pangolin Reverse Proxy Operations](#pangolin-reverse-proxy-operations)
- [Gerbil Tunnel Management](#gerbil-tunnel-management)
- [Proxmox Operations](#proxmox-operations)
- [SSL/TLS Certificate Management](#ssltls-certificate-management)
- [Network Troubleshooting](#network-troubleshooting)
- [Security Procedures](#security-procedures)
- [Backup Operations](#backup-operations)
- [Emergency Contacts](#emergency-contacts)
- [Additional Resources](#additional-resources)

---

## Pangolin Reverse Proxy Operations

### Add a New Route
```bash
# 1. SSH into VPS
ssh user@your-vps-ip

# 2. Edit Pangolin configuration
sudo nano /path/to/pangolin/config.yml

# 3. Add new route configuration
# domain.example.com -> backend:port

# 4. Test configuration
sudo pangolin config test

# 5. Reload Pangolin
sudo systemctl reload pangolin
# OR
sudo pangolin reload

# 6. Verify route is active
curl -I https://domain.example.com
```
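A proxy that has just been reloaded may need a few seconds before it serves a new route, so a single `curl` can fail spuriously. The sketch below wraps the verification step in a small retry loop. `retry` is a hypothetical helper written for this runbook, not a Pangolin command, and the attempt count and delay in the example are illustrative.

```bash
# retry ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it succeeds or ATTEMPTS runs out.
# Returns 0 on the first success, 1 if every attempt failed.
retry() {
    local attempts delay i
    attempts=$1
    delay=$2
    shift 2
    i=1
    while [ "$i" -le "$attempts" ]; do
        if "$@"; then
            return 0
        fi
        # Sleep between attempts, but not after the last one
        if [ "$i" -lt "$attempts" ]; then
            sleep "$delay"
        fi
        i=$((i + 1))
    done
    return 1
}

# Example: poll the new route up to 5 times, 2 seconds apart.
# -f makes curl exit non-zero on HTTP error statuses (4xx/5xx).
# retry 5 2 curl -fsSI -o /dev/null https://domain.example.com
```

Because `retry` only looks at the exit status, the same helper works for any other check in this runbook that can be expressed as a command.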

### Remove a Route
```bash
# 1. Edit configuration and comment out or remove the route
sudo nano /path/to/pangolin/config.yml

# 2. Test configuration
sudo pangolin config test

# 3. Reload Pangolin
sudo systemctl reload pangolin

# 4. Verify route is removed (the request should now fail or return an error status)
curl -I https://domain.example.com
```

### View Pangolin Logs
```bash
# Real-time logs (follow both files at once; a second tail -f would never start)
sudo tail -f /var/log/pangolin/access.log /var/log/pangolin/error.log

# Search for specific domain
sudo grep "domain.example.com" /var/log/pangolin/access.log

# Show the last 100 lines of the error log
sudo tail -n 100 /var/log/pangolin/error.log
```

### Restart Pangolin Service
```bash
# Check status
sudo systemctl status pangolin

# Restart
sudo systemctl restart pangolin

# Verify it's running
sudo systemctl is-active pangolin
```

---

## Gerbil Tunnel Management

### Check Active Tunnels
```bash
# On VPS - check listening Gerbil server (sudo is needed to see process names)
sudo ss -tlnp | grep gerbil

# On home lab - check active tunnel connections
gerbil status
# OR (pgrep does not match itself, unlike "ps aux | grep")
pgrep -af gerbil
```

### Start a Tunnel
```bash
# On home lab machine
gerbil connect --name tunnel-name \
  --local localhost:PORT \
  --remote VPS_IP:REMOTE_PORT \
  --auth-key /path/to/auth.key

# Or, if the tunnel is set up as a systemd service
sudo systemctl start gerbil-tunnel-name
```

### Stop a Tunnel
```bash
# If running as service
sudo systemctl stop gerbil-tunnel-name

# If running manually
pkill -f "gerbil.*tunnel-name"
```

### Restart a Tunnel
```bash
sudo systemctl restart gerbil-tunnel-name

# Verify tunnel is active
gerbil status tunnel-name
# OR (the leading colon avoids matching the number inside other fields)
ss -tn | grep ":REMOTE_PORT"
```

### Debug Tunnel Connection Issues
```bash
# 1. Check if local service is running
curl http://localhost:LOCAL_PORT

# 2. Check if tunnel process is running
pgrep -af gerbil

# 3. Check tunnel logs
journalctl -u gerbil-tunnel-name -n 50

# 4. Test VPS endpoint
# On VPS:
curl http://localhost:REMOTE_PORT

# 5. Check firewall on VPS
sudo ufw status
sudo iptables -L -n | grep REMOTE_PORT
```
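The debugging steps above walk the path from the local service to the VPS one hop at a time, so the first failing step shows where the tunnel is broken. A minimal sketch of that idea follows. `check` is a hypothetical helper, and the commands in the commented example reuse the placeholders from this section rather than a verified Gerbil interface.

```bash
# check DESCRIPTION COMMAND [ARGS...]
# Runs COMMAND quietly and prints PASS or FAIL next to DESCRIPTION.
# Returns success or failure so that calls can be chained with &&.
check() {
    local description
    description=$1
    shift
    if "$@" >/dev/null 2>&1; then
        printf 'PASS  %s\n' "$description"
    else
        printf 'FAIL  %s\n' "$description"
        return 1
    fi
}

# Example chain for the home lab side; it stops at the first failing hop.
# LOCAL_PORT and gerbil-tunnel-name are the placeholders used above.
# check "local service answers" curl -fsS "http://localhost:LOCAL_PORT" &&
# check "tunnel unit is active" systemctl is-active --quiet gerbil-tunnel-name
```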

---

## Proxmox Operations

### Create a New VM
```bash
# Via Proxmox web UI: https://PROXMOX_IP:8006

# Via CLI on Proxmox node:
qm create VMID --name vm-name --memory 2048 --cores 2 --net0 virtio,bridge=vmbr0

# Allocate and attach a 32 GiB disk on the local-lvm storage
qm set VMID --scsi0 local-lvm:32

# Set boot order
qm set VMID --boot order=scsi0

# Start VM
qm start VMID
```

### Create a New Container (LXC)
```bash
# Download template
# TEMPLATE_FILE is the exact file name printed by "pveam available",
# for example ubuntu-22.04-standard_22.04-1_amd64.tar.zst
pveam update
pveam available --section system
pveam download local TEMPLATE_FILE

# Create container (8 GiB root disk on local-lvm; adjust to your storage)
pct create CTID local:vztmpl/TEMPLATE_FILE \
  --hostname ct-name \
  --memory 1024 \
  --cores 2 \
  --rootfs local-lvm:8 \
  --net0 name=eth0,bridge=vmbr0,ip=dhcp

# Start container
pct start CTID

# Enter container
pct enter CTID
```

### Stop/Start VM or Container
```bash
# VM operations
qm stop VMID       # Hard stop (like pulling the power cord)
qm start VMID      # Start
qm shutdown VMID   # Graceful shutdown
qm reboot VMID     # Reboot
qm status VMID     # Check status

# Container operations (same meaning as the VM commands)
pct stop CTID
pct start CTID
pct shutdown CTID
pct reboot CTID
pct status CTID
```

### Migrate VM Between Nodes
```bash
# Online migration (VM stays running)
# Add --with-local-disks if the VM's disks are on node-local storage
qm migrate VMID target-node --online

# Offline migration
qm migrate VMID target-node

# Check migration status
qm status VMID
```

### Check Resource Usage
```bash
# Overall cluster resources
pvesh get /cluster/resources

# Specific node resources
pvesh get /nodes/NODE_NAME/status

# VM resource usage
qm status VMID --verbose

# Storage usage
pvesm status
```

### Backup VM or Container
```bash
# Backup VM
vzdump VMID --storage STORAGE_NAME --mode snapshot

# Backup container
vzdump CTID --storage STORAGE_NAME

# List backups (without --content, every volume on the storage is listed)
pvesm list STORAGE_NAME --content backup
```

### Restore from Backup
```bash
# vzdump file names include a timestamp; copy the exact name from the backup listing

# Restore VM
qmrestore /path/to/backup/vzdump-qemu-VMID-YYYY_MM_DD-HH_MM_SS.vma.zst VMID

# Restore container
pct restore CTID /path/to/backup/vzdump-lxc-CTID-YYYY_MM_DD-HH_MM_SS.tar.zst
```

---

## SSL/TLS Certificate Management

### Request New Let's Encrypt Certificate
```bash
# Install certbot if needed
sudo apt install certbot

# Request certificate (HTTP-01 challenge; port 80 must be free and reachable)
sudo certbot certonly --standalone -d domain.example.com

# Request wildcard certificate (DNS-01 challenge; certbot prompts for a TXT record)
sudo certbot certonly --manual --preferred-challenges dns -d "*.example.com"

# Certificates are stored in: /etc/letsencrypt/live/domain.example.com/
```

### Renew Certificates
```bash
# Dry run to test renewal
sudo certbot renew --dry-run

# Renew all certificates that are due for renewal
sudo certbot renew

# Renew specific certificate
sudo certbot renew --cert-name domain.example.com

# Check that the auto-renewal timer is active
sudo systemctl status certbot.timer
```

### Check Certificate Expiration
```bash
# Check local certificate
sudo certbot certificates

# Check remote certificate
echo | openssl s_client -servername domain.example.com -connect domain.example.com:443 2>/dev/null | openssl x509 -noout -dates

# List expiry dates for all local certificates (the output includes days remaining)
sudo certbot certificates | grep "Expiry Date"
```
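Listing expiry dates still leaves the 30-day filtering to the reader. One way to automate it, sketched here under the assumption that certificates live in certbot's default `/etc/letsencrypt/live` layout, is `openssl x509 -checkend`, which exits non-zero when a certificate expires within the given number of seconds. The function names `days_to_seconds` and `expiring_certs` are hypothetical.

```bash
# days_to_seconds DAYS
# Prints DAYS expressed in seconds.
days_to_seconds() {
    echo $(( $1 * 86400 ))
}

# expiring_certs DAYS [LIVE_DIR]
# Prints the path of every cert.pem under LIVE_DIR that expires within DAYS days.
# LIVE_DIR defaults to certbot's standard location; reading it usually needs root.
expiring_certs() {
    local days live_dir cert
    days=$1
    live_dir=${2:-/etc/letsencrypt/live}
    for cert in "$live_dir"/*/cert.pem; do
        # Skip the unexpanded pattern when no certificate exists
        [ -e "$cert" ] || continue
        # -checkend exits non-zero when the certificate expires within the window
        if ! openssl x509 -checkend "$(days_to_seconds "$days")" -noout -in "$cert" >/dev/null; then
            echo "$cert"
        fi
    done
}
```

In a root shell, `expiring_certs 30` prints the certificates that expire within the next 30 days and prints nothing when none are due.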

### Deploy Certificate to Service
```bash
# Copy certificate to service location
sudo cp /etc/letsencrypt/live/domain.example.com/fullchain.pem /path/to/service/cert.pem
sudo cp /etc/letsencrypt/live/domain.example.com/privkey.pem /path/to/service/key.pem

# Set permissions
sudo chmod 644 /path/to/service/cert.pem
sudo chmod 600 /path/to/service/key.pem

# Reload service
sudo systemctl reload service-name
```
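Copies made by hand go stale after the next renewal, because certbot only updates the files under `/etc/letsencrypt/live/`. Certbot runs executable scripts placed in `/etc/letsencrypt/renewal-hooks/deploy/` after each successful renewal and passes the renewed certificate's directory in `RENEWED_LINEAGE`. The sketch below shows a copy step that such a hook could call. `deploy_cert` is a hypothetical helper, and the destination path and service name are the placeholders used above.

```bash
# deploy_cert LINEAGE_DIR DEST_DIR
# Copies fullchain.pem and privkey.pem from a certbot lineage directory
# into DEST_DIR as cert.pem and key.pem, with the permissions used above.
deploy_cert() {
    local lineage dest
    lineage=$1
    dest=$2
    # install copies the file and sets its mode in one step
    install -m 644 "$lineage/fullchain.pem" "$dest/cert.pem" &&
    install -m 600 "$lineage/privkey.pem" "$dest/key.pem"
}

# Inside a deploy hook script, the call could look like this:
# deploy_cert "$RENEWED_LINEAGE" /path/to/service && systemctl reload service-name
```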

---

## Network Troubleshooting

### Check Network Connectivity
```bash
# Ping test
ping -c 4 8.8.8.8

# DNS resolution
nslookup domain.example.com
dig domain.example.com

# Trace route
traceroute domain.example.com
mtr domain.example.com
```

### Check Open Ports
```bash
# Check listening ports (sudo shows the owning process)
sudo ss -tlnp
sudo netstat -tlnp   # legacy alternative; requires the net-tools package

# Check if specific port is open
sudo ss -tlnp | grep ":PORT"
nc -zv localhost PORT

# Check firewall rules
sudo ufw status numbered
sudo iptables -L -n -v
```

### Test Service Availability
```bash
# HTTP/HTTPS test
curl -I https://domain.example.com
curl -v https://domain.example.com

# Test specific port
nc -zv host PORT
telnet host PORT

# Check service status
sudo systemctl status service-name
```

### Check Network Interface Status
```bash
# List all interfaces
ip addr show
ip link show

# Check interface statistics
ip -s link show eth0

# Restart interface
# Run both steps as one command: over SSH, taking the interface down on its
# own cuts the session before the interface can be brought back up
sudo sh -c 'ip link set eth0 down && ip link set eth0 up'
```

---

## Security Procedures

### Update SSH Key
```bash
# Generate new SSH key (-f sets the file name used in the steps below)
ssh-keygen -t ed25519 -f ~/.ssh/new_key -C "description"

# Copy to server
ssh-copy-id -i ~/.ssh/new_key.pub user@server

# Test new key
ssh -i ~/.ssh/new_key user@server

# Update SSH config
nano ~/.ssh/config
```

### Review Failed Login Attempts
```bash
# Check auth logs (the unit is named sshd instead of ssh on some distributions)
sudo grep "Failed password" /var/log/auth.log
sudo journalctl -u ssh -n 100

# Check fail2ban status (if installed)
sudo fail2ban-client status sshd
```
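A raw list of failed attempts is hard to scan on a busy host, so grouping the attempts by source address is often more useful. The sketch below assumes the usual OpenSSH message format, `Failed password for USER from ADDRESS port N ssh2`. `failed_logins_by_ip` is a hypothetical helper name.

```bash
# failed_logins_by_ip [LOG_FILE...]
# Reads sshd log lines from the given files (or stdin) and prints a count
# per source address, most frequent first.
failed_logins_by_ip() {
    awk '/Failed password/ {
        # The address is the field that follows the word "from"
        for (i = 1; i < NF; i++) if ($i == "from") { print $(i + 1); break }
    }' "$@" | sort | uniq -c | sort -rn
}
```

For example, `sudo cat /var/log/auth.log | failed_logins_by_ip` prints one line per address, each prefixed with its number of failed attempts.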

### Update Firewall Rules
```bash
# Add new rule
sudo ufw allow PORT/tcp
sudo ufw allow from IP_ADDRESS to any port PORT

# Remove rule by specification
sudo ufw delete allow PORT/tcp
# OR by number (list the rules first to find NUMBER)
sudo ufw status numbered
sudo ufw delete NUMBER

# Reload firewall
sudo ufw reload
```
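
### Security Updates
```bash
# Check for updates
sudo apt update
apt list --upgradable

# Install all pending updates (apt upgrade is not limited to security fixes)
sudo apt upgrade -y
# For security updates only, with its default configuration:
# sudo unattended-upgrade   (requires the unattended-upgrades package)

# Restart services that still use outdated libraries
sudo needrestart -r a

# Reboot if required (for example after a kernel update)
[ -f /var/run/reboot-required ] && sudo reboot
```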

---

## Backup Operations

### Manual Backup
```bash
# Backup specific VM/Container
vzdump VMID --storage STORAGE_NAME --mode snapshot --compress zstd

# Backup configuration files (sudo so that root-only files are readable)
sudo tar -czf "config-backup-$(date +%Y%m%d).tar.gz" /etc/pangolin /etc/gerbil

# Backup to remote location
rsync -avz /path/to/data/ user@backup-server:/path/to/backup/
```

### Verify Backup
```bash
# List backup contents
tar -tzf backup.tar.gz | less

# Check backup integrity
tar -tzf backup.tar.gz > /dev/null && echo "OK" || echo "CORRUPTED"

# Review the vzdump log for errors (the file name includes the backup timestamp)
cat /path/to/backup/vzdump-qemu-VMID-YYYY_MM_DD-HH_MM_SS.log
```
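The integrity check above covers one archive at a time. When a directory holds several dated archives, the same `tar -tzf` test can be looped over all of them, as in this sketch. `verify_archives` is a hypothetical helper, and it only confirms that each archive can be read end to end, not that its contents are the ones expected.

```bash
# verify_archives DIR
# Tests every .tar.gz file in DIR and prints OK or CORRUPTED per archive.
# Returns non-zero if at least one archive failed the test.
verify_archives() {
    local dir archive status
    dir=$1
    status=0
    for archive in "$dir"/*.tar.gz; do
        # Skip the unexpanded pattern when the directory has no archives
        [ -e "$archive" ] || continue
        if tar -tzf "$archive" >/dev/null 2>&1; then
            echo "OK         $archive"
        else
            echo "CORRUPTED  $archive"
            status=1
        fi
    done
    return "$status"
}
```

The non-zero return value makes the function usable from cron or a monitoring check, for example `verify_archives /path/to/backup || echo "backup check failed"`.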

### Restore Specific Files
```bash
# Extract specific file from backup
tar -xzf backup.tar.gz path/to/specific/file

# Restore from rsync backup
rsync -avz user@backup-server:/path/to/backup/ /path/to/restore/
```

---

## Emergency Contacts

- Infrastructure Owner: _______________
- Network Administrator: _______________
- VPS Provider Support: _______________
- DNS Provider Support: _______________

## Additional Resources

- Pangolin Documentation: _______________
- Gerbil Documentation: _______________
- Proxmox Documentation: https://pve.proxmox.com/pve-docs/
- Internal Wiki: _______________