I use some batch scripts in my proxmox installation. They are in cron.hourly and daily checking for virus and ram/CPU load of my LXC containers. An email is send on condition.
What are your tipps or solution without unnecessary load on disc io or CPU time. Lets keep it simple.
Edit: a lot of great input about possible solutions. In addition TIL “that keep it simple” means a lot of different things to people.😉
I’m in a similar boat except I just do everything on standard Docker containers but so do use Telegraf, Influx, and Grafana for everything. I’ve gone mostly to Discord notifications on any alerts. If I run into any problem scenarios, I figure out how to monitor it and add it via Telegraf and add an alert. I’m still just using Grafana alerts but it works fine for my home lab.
Even better if I can automate fixes to those problems. One of the best things I did was monitoring all of my network devices and all major hops. If I have internet or network issues, I know exactly where the problem is without having to troubleshoot. Lots of dpinger and shell scripts to input data to Telegraf.