Prometheus Alert Investigation - Feb 3, 2026
🔍 Investigation Summary
Time: 1:58 PM CST
Investigator: OpenClaw (Funky)
Scope: 4 hosts showing as DOWN in Prometheus
📊 Findings
1. pve-router (10.0.10.2) - Proxmox Host
Status: ⚠️ Host UP, Monitoring DOWN
Issue: node_exporter not responding on port 9100
✅ ICMP ping: Responding (0.4ms latency)
❌ node_exporter (port 9100): Timeout after 2 seconds
Diagnosis:
- Host is online and reachable
- node_exporter service likely stopped or not installed
- This is your office Proxmox host (i5)
Action Required:
ssh root@10.0.10.2
systemctl status prometheus-node-exporter
systemctl start prometheus-node-exporter
systemctl enable prometheus-node-exporter
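The two checks behind this finding (ICMP ping, then a request to port 9100 with a 2-second timeout) can be reproduced with a small script. This is a minimal sketch, not the exact commands used during the investigation: the function names `classify_host` and `probe_host` are made up for illustration, and it assumes Linux `ping` (iputils) and `curl` are available on the machine running the check.

```shell
# Turn two probe exit codes (0 = success) into a one-line verdict.
classify_host() {
  local ping_rc=$1 port_rc=$2
  if [ "$ping_rc" -ne 0 ]; then
    echo "host down"
  elif [ "$port_rc" -ne 0 ]; then
    echo "host up, exporter down"
  else
    echo "ok"
  fi
}

# Probe one host: a single ping, then an HTTP request to node_exporter.
# Both probes give up after 2 seconds.
probe_host() {
  local host=$1 ping_rc=0 port_rc=0
  ping -c 1 -W 2 "$host" >/dev/null 2>&1 || ping_rc=$?
  curl -s -o /dev/null --max-time 2 "http://$host:9100/metrics" || port_rc=$?
  echo "$host: $(classify_host "$ping_rc" "$port_rc")"
}

# Usage:
#   probe_host 10.0.10.2
```

A host that drops ICMP would be reported as "host down" even if it is running, so treat that verdict as a hint rather than proof.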
2. vps-gaming (51.222.12.162) - OVH Gaming VPS
Status: ⚠️ Host UP, Monitoring DOWN
Issue: node_exporter not responding on port 9100
✅ ICMP ping: Responding (23.8ms latency - normal for OVH Canada)
❌ node_exporter (port 9100): Timeout after 2 seconds
Diagnosis:
- Host is online (WireGuard VPN likely working)
- node_exporter either not installed or firewall blocking port 9100
- Provider: OVH (deadeyeg4ming.vip)
Action Required:
ssh root@51.222.12.162
# Check if installed
systemctl status prometheus-node-exporter
# If not installed
apt update && apt install prometheus-node-exporter -y
# Check firewall
ufw status
ufw allow 9100/tcp # If using UFW
3. OpenClaw Gateway (10.0.10.41) - CT 130
Status: 🔴 Host DOWN / Missing node_exporter
Issue: Container reachable but node_exporter not installed
Note: This is the OpenClaw container (me!) - node_exporter should be installed for self-monitoring.
Action Required:
ssh root@10.0.10.41
apt update && apt install prometheus-node-exporter -y
systemctl enable --now prometheus-node-exporter
4. Available Container (10.0.10.42) - CT 131
Status: 🟢 Available for use
Issue: Container available but not yet deployed
Note: This container is available for future use.
🎯 Priority Action Items
Critical (Affects Real Monitoring)
- Fix pve-router node_exporter - This is a production Proxmox host
- Fix vps-gaming node_exporter - This is your WireGuard VPN endpoint
Low Priority (Game Servers)
- Decide on minecraft-forge - Start if needed, or remove from Prometheus config
- Decide on minecraft-stoneblock - Start if needed, or remove from Prometheus config
🔧 Quick Fix Commands
For pve-router (10.0.10.2)
ssh root@10.0.10.2 "apt update && apt install prometheus-node-exporter -y && systemctl enable --now prometheus-node-exporter"
For vps-gaming (51.222.12.162)
ssh root@51.222.12.162 "apt update && apt install prometheus-node-exporter -y && systemctl enable --now prometheus-node-exporter && ufw allow 9100/tcp"
Clean Up Prometheus Config (Remove Game Servers)
If you don't want to monitor stopped game servers:
ssh root@10.0.10.25
nano /etc/prometheus/prometheus.yml
# Comment out or remove the minecraft targets (10.0.10.41, 10.0.10.42)
systemctl reload prometheus
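A typo in prometheus.yml can make the reload fail, so it is probably worth validating the file first. The sketch below wraps the two steps; `safe_reload` is a made-up helper name, and it assumes `promtool` (shipped alongside Prometheus) is on the PATH of the Prometheus host.

```shell
# Validate the config, and reload Prometheus only if validation passes.
safe_reload() {
  local config=${1:-/etc/prometheus/prometheus.yml}
  if promtool check config "$config"; then
    systemctl reload prometheus
  else
    echo "config invalid, not reloading" >&2
    return 1
  fi
}

# Usage (on the Prometheus host, after editing the file):
#   safe_reload
```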
📈 Expected Outcome
After fixes:
- ✅ 2/4 hosts back online (pve-router, vps-gaming)
- ✅ Only real infrastructure monitored
- ✅ No false positive alerts
- ✅ Inbox stays clean
Time to fix: ~5 minutes total
🚨 Current Alert Status
These hosts are NOT firing critical Discord alerts yet because:
- They're in "pending" state (the down condition has held for less than 2 minutes)
- Our alert rule waits 2+ minutes before it fires
If you don't fix them, you'll get Discord alerts in:
- ~1-2 minutes from now (they've been down for a while already)
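To see whether the alerts are still pending or already firing, the Prometheus HTTP API can be queried at `/api/v1/alerts`. The sketch below counts alerts per state with plain text tools. It assumes Prometheus is listening on 10.0.10.25 at the default port 9090 (the port is not confirmed in this report) and that the API returns compact JSON; `count_alert_states` is a hypothetical helper name. The text matching is crude, and `jq` would be more robust if it is installed.

```shell
# Reads the JSON from /api/v1/alerts on stdin and prints one
# "state=count" line per alert state (e.g. pending, firing).
count_alert_states() {
  grep -o '"state":"[a-z]*"' | cut -d'"' -f4 | sort | uniq -c | awk '{print $2 "=" $1}'
}

# Usage (host and port are assumptions):
#   curl -s http://10.0.10.25:9090/api/v1/alerts | count_alert_states
```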
Notes
- pve-router and vps-gaming are real issues - these should be monitored
- The Minecraft servers being down is probably intentional - you don't run them 24/7
- Consider removing game servers from Prometheus if you don't want to track them
Let me know if you want me to:
- Fix the node_exporters remotely (if I have SSH access)
- Remove game servers from Prometheus config
- Both!