Enhancement #743

Updated by Marc Dequènes about 3 years ago

With exporters gaining TLS support there is no obvious major problem left and we can do some testing. 

 I've started a new playbook and role to experiment and so far it is working well. 

 Some thoughts, in no particular order: 
 * Zabbix is hard to configure, since everything has to be done in its slow UI 
 * certain features are slow to come (#495, native systemd support, LLD web checks…) 
 * no service autodetection anymore, but I found it very easy to map "features" to inventory groups or variables; it's now easy to manually disable or force-enable a check if needed 
 * "there is a nice feature coming to help split the config":https://github.com/prometheus/prometheus/issues/8543, but in the meantime I might be able to use _file_sd_configs_ and avoid passing the inventory directly into the role to work around the problem 
 * I would have preferred it if Grafana were packaged in Debian, but in the end it's very handy to make use of their dashboard libraries and avoid spending hours and hours designing every little graph. Currently only LXD stats have no dashboard (the only one available is for some exporter, but LXD can now be polled directly). 
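
 To illustrate the _file_sd_configs_ idea, here is a minimal sketch and not what the role actually does. It assumes the inventory can be reduced to a "group to hosts" mapping, and it writes the JSON target list that _file_sd_configs_ reads. The group names, the @service@ label, the @disabled@ override and the port table are all hypothetical; the ports are the usual exporter defaults and would need checking against the real setup.

```python
import json

# Hypothetical mapping from inventory group to exporter port.
# These are the exporters' usual default ports (assumption); adjust as needed.
EXPORTER_PORTS = {
    "node": 9100,
    "bind": 9119,
    "postfix": 9154,
    "apache": 9117,
    "postgres": 9187,
}


def build_file_sd(inventory, disabled=None):
    """Turn {group: [hosts]} into a list of file_sd target groups.

    `disabled` is an optional set of (group, host) pairs to skip,
    mimicking a manual per-host "disable this check" override.
    """
    disabled = disabled or set()
    entries = []
    for group in sorted(inventory):
        port = EXPORTER_PORTS.get(group)
        if port is None:
            continue  # no known exporter for this group
        targets = [
            f"{host}:{port}"
            for host in sorted(inventory[group])
            if (group, host) not in disabled
        ]
        if targets:
            # "service" is an illustrative label name, not a Prometheus built-in.
            entries.append({"targets": targets, "labels": {"service": group}})
    return entries


def write_file_sd(path, entries):
    """Write the target groups as JSON for file_sd_configs to pick up."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(entries, fh, indent=2)
        fh.write("\n")


if __name__ == "__main__":
    example_inventory = {
        "node": ["a.example.org", "b.example.org"],
        "apache": ["a.example.org"],
    }
    print(json.dumps(build_file_sd(example_inventory), indent=2))
```

 The generated file would then be referenced from a scrape job through the @files@ list of _file_sd_configs_, so the Prometheus configuration itself no longer needs to know about the inventory.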

 What we have so far: 
 * basic node metrics; all the hardware goodies (temperature etc.) seem to be there too 
 * poller stats 
 * Bind 
 * Postfix 
 * Apache 
 * PG 

 I was able to set up several exporters and borrow various alerts from https://awesome-prometheus-alerts.grep.to/ but, even if we have more than before in certain areas, I'd like to check whether we're missing something important (compared to our Zabbix installation): 
 * time sync is checked but NTPd stats are missing; there is an exporter but it is not packaged 
 * no maps, but while they were cute they were also utterly useless 
 * ProFTPd, but I'm not sure it's worth it now 
 * SNMP checks for my internal switches, more out of curiosity 
 * SNMP checks for my printer, but I don't use it very often so it's not critical 
 * OpenLDAP stats, more out of curiosity 
 * MDA, this is important 
 * MySQL, also important 


 What I plan to look at: 
 * make the role generic and split it from our main repo (and use it at OSCI) 
 * generation of alerter contacts and alert methods (Matrix, XMPP, Mail) 
 * blackbox, maybe to replace smokeping? add checks for certs, DNSSEC, etc. 
 * grafana base config generation 
 * MySQL exporter 
 * SNMP for my internal switches 
 * could we make certain graphs public? (like pings etc?) 
