Munin, how to reduce IO with rrdcached

A few weeks ago Munin started flooding me with alerts about IO latency, read or write, going too high. Ironically, it turned out Munin itself was the culprit, and the solution was to use rrdcached, a daemon that collects updates to RRD files in memory and writes them to disk in batches.

Here is a Munin graph that shows quite well the difference rrdcached made:


Around week 21, when I deployed this optimization, writes dropped quite a lot and reads stabilized.

What follows is a step-by-step guide to setting up rrdcached (on Gentoo).

Set up rrdcached

Configure rrdcached by editing /etc/conf.d/rrdcached:

RRCACHE_ARGS="-s munin -l unix:/run/rrdcached.sock -j /var/lib/rrdcached/journal/ -F -b /var/lib/munin/ -B -w 1800 -z 900"

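The options changed or added from the defaults are:

  • Set the group of the socket to munin (-s munin)
  • Put the socket under /run instead of /var/run (-l unix:/run/rrdcached.sock)
  • Change the database folder to /var/lib/munin (-b /var/lib/munin/)
  • Set the write timeout to 30 minutes (-w 1800) and the random write delay to at most 15 minutes (-z 900)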
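To get a feel for what the two timing values mean in practice, here is a rough back-of-the-envelope calculation. It assumes Munin's default 5-minute update interval (not stated above) and that rrdcached adds a random delay of up to -z seconds on top of -w; treat the result as an estimate, not a measurement.

```shell
#!/bin/sh
# Rough estimate of how many Munin updates rrdcached batches per disk write.
# Assumption: Munin updates every 300 s (its default interval).
update_interval=300
write_timeout=1800   # -w: seconds a value may stay cached before being written
max_delay=900        # -z: upper bound of the random extra delay

min_batch=$((write_timeout / update_interval))
max_batch=$(((write_timeout + max_delay) / update_interval))

echo "Updates batched per write: $min_batch to $max_batch"
```

Under these assumptions each RRD file is written once per several updates rather than on every update, which would account for the drop in writes.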

Note that the USER, GROUP, MODE and MAXWAIT variables present in the configuration file are not actually used. The Gentoo ebuild needs some love, see bug

Now start the service:

# rc-update add rrdcached default
# /etc/init.d/rrdcached start
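To check that the daemon is actually listening, one option is to send it the STATS command over the socket. This is a minimal sketch that assumes socat is installed (it is not part of the setup above); the helper name rrdcached_stats is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: query rrdcached statistics over its UNIX socket.
# Assumes socat is available; the socket path matches the -l option above.
rrdcached_stats() {
    sock="$1"
    # Bail out early if the path is not a socket (daemon not running?).
    [ -S "$sock" ] || { echo "no socket at $sock" >&2; return 1; }
    # STATS prints counters such as QueueLength; QUIT closes the session.
    printf 'STATS\nQUIT\n' | socat - "UNIX-CONNECT:$sock"
}

rrdcached_stats /run/rrdcached.sock || echo "rrdcached is not reachable"
```

If the daemon answers, the reply should list counters such as the current queue length and the number of updates received and written.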

Tell Munin to use rrdcached

Edit /etc/munin/munin.conf:

  • Change the rrdcached socket path from /var/run/rrdcached.sock to /run/rrdcached.sock, uncommenting the line if needed
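As a sketch, the same change can be scripted with sed. This assumes the setting is named rrdcached_socket, as in stock Munin 2.x configuration files, and that GNU sed is in use; the function name set_rrdcached_socket is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: point Munin at the rrdcached socket.
# Assumes the directive is named rrdcached_socket and may be commented out.
set_rrdcached_socket() {
    conf="$1"
    sock="$2"
    # Uncomment the directive if needed and replace its value.
    sed -i -E "s|^#?[[:space:]]*rrdcached_socket[[:space:]].*|rrdcached_socket $sock|" "$conf"
}

# Usage (as root):
#   set_rrdcached_socket /etc/munin/munin.conf /run/rrdcached.sock
```

Keeping a backup copy of munin.conf before running it is a sensible precaution.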

And restart:

# /etc/init.d/munin-node restart
# /etc/init.d/spawn-fcgi.munin-cgi-html restart
# /etc/init.d/spawn-fcgi.munin-cgi-graph restart

This could be done better

For security, it would be best to run the rrdcached service as its own user. Again, see bug


There is one small problem when using rrdcached; see this graph:


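It looks a little weird at the end, doesn’t it? Check the Current column: there are some -nan values. Apparently, when using rrdcached, Munin fails to get the latest values, presumably because they are still sitting in the cache and have not been written to the RRD files yet.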

Guess I should check for alternative solutions to Munin…







