A few weeks ago Munin was flooding me with alerts about I/O latency, read or write, going too high. Ironically, it turned out Munin itself was the culprit, and the solution was to use rrdcached.
Let's look at a Munin graph that shows pretty well the difference rrdcached made:
Around week 21, when I deployed this optimization, writes dropped quite a lot and reads stabilized.
Here is the step-by-step guide to using rrdcached (on Gentoo).
Configure rrdcached by editing /etc/conf.d/rrdcached:
RRCACHE_ARGS="-s munin -l unix:/run/rrdcached.sock -j /var/lib/rrdcached/journal/ -F -b /var/lib/munin/ -B -w 1800 -z 900"
The options changed or added from the defaults are:
- Change the socket group to munin (-s munin)
- Put the socket under /run instead of /var/run (-l unix:/run/rrdcached.sock)
- Change the database folder to /var/lib/munin (-b /var/lib/munin/)
- Set the write timeout to 30 minutes (-w 1800) and the delay to 15 minutes (-z 900)
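To make the flags easier to read, here is a minimal sketch that assembles the same argument string from named parts. The lowercase variable names are made up for illustration and are not used by the init script; only RRCACHE_ARGS is.

```shell
#!/bin/sh
# Sketch: build the rrdcached argument string from named parts.
# The lowercase variable names are hypothetical and exist only for readability.
socket_group="munin"                        # -s: group that owns the UNIX socket
socket_path="/run/rrdcached.sock"           # -l: address the daemon listens on
journal_dir="/var/lib/rrdcached/journal/"   # -j: where the update journal is kept
base_dir="/var/lib/munin/"                  # -b: base directory of the RRD files
write_timeout=$((30 * 60))                  # -w: seconds before cached values are written
write_delay=$((15 * 60))                    # -z: maximum random delay, in seconds, added to writes

# -F flushes all pending updates at shutdown; -B restricts writes to the base directory.
RRCACHE_ARGS="-s ${socket_group} -l unix:${socket_path} -j ${journal_dir} -F -b ${base_dir} -B -w ${write_timeout} -z ${write_delay}"

echo "RRCACHE_ARGS=\"${RRCACHE_ARGS}\""
```

The printed line should match the one in /etc/conf.d/rrdcached above, with 1800 and 900 being 30 and 15 minutes expressed in seconds.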
Note that the USER, GROUP, MODE and MAXWAIT variables present in the configuration file are not actually used. The Gentoo ebuild needs some love; see bug https://bugs.gentoo.org/show_bug.cgi?id=450674
Now start the service:
# rc-update add rrdcached default
# /etc/init.d/rrdcached start
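Once the service is up, a quick sanity check is to confirm that the socket exists at the configured path. This is only a sketch; the helper name socket_ready is made up for illustration.

```shell
#!/bin/sh
# Sketch: check that rrdcached created its UNIX socket.
# socket_ready is a hypothetical helper, not part of rrdcached or Munin.
socket_ready() {
    # -S is true only if the path exists and is a socket
    [ -S "$1" ]
}

if socket_ready /run/rrdcached.sock; then
    echo "rrdcached socket is present"
else
    echo "rrdcached socket is missing" >&2
fi
```

If the socket is missing, the daemon's arguments in /etc/conf.d/rrdcached are the first thing to double-check.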
Tell Munin to use rrdcached
- Change /var/run/rrdcached.sock to /run/rrdcached.sock
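The step above does not name the file being edited. Assuming it is Munin's master configuration, /etc/munin/munin.conf, and that the setting is the rrdcached_socket directive shipped commented out with the /var/run path, the change could be sketched like this. The snippet works on a scratch file, so nothing on the system is modified.

```shell
#!/bin/sh
# Sketch only: the file name and the commented-out default line are assumptions.
scratch=$(mktemp)
printf '%s\n' '#rrdcached_socket /var/run/rrdcached.sock' > "$scratch"

# Uncomment the directive and point it at the socket under /run.
updated=$(sed 's|^#* *rrdcached_socket .*|rrdcached_socket /run/rrdcached.sock|' "$scratch")
echo "$updated"

rm -f "$scratch"
```

On a real system the same sed expression would be applied to a backed-up copy of the actual configuration file rather than to a scratch file.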
# /etc/init.d/munin-node restart
# /etc/init.d/spawn-fcgi.munin-cgi-html restart
# /etc/init.d/spawn-fcgi.munin-cgi-graph restart
This could be done better
It would be best, for security, to run the rrdcached service as its own user. Again, see bug https://bugs.gentoo.org/show_bug.cgi?id=450674
There is one little problem when using rrdcached, see this graph:
It looks a little weird at the end, doesn’t it? Check the Current column: there are some -nan values. Apparently, when using rrdcached, Munin fails to get the latest values.
Guess I should check for alternative solutions to Munin…