As noted in other posts, I’m writing a monitoring application. Because this is at best a part time effort that needs to be done quickly, I’ve made some technology choices that emphasize rapid development: the app is a Rails app, and I’m storing statistics with RRD.
I really like RRD. As I’ve mentioned before, it gets me out of the business of drawing graphs and storing data, both of which are hard problems I’d rather not solve. But I was having some issues with it when I tried to store values.
How I thought RRD worked
It seemed pretty simple. I thought I would create an RRD file (wow, lots of parameters, wonder what they mean?), then update it whenever I had a number. And the values would be graphed. And all would be good. But when I did all of the above, I noticed that the values being graphed were not the values I was storing. Hmmm. Time to figure out what some of those parameters mean.
How RRD actually works
Pretty well, actually, because it was designed by some smart people to store numbers that come in at any time and average them across an interval defined when the file is created. Well, that’s one of the ways RRD works. It can also store counters, store the results of functions applied to raw values, and store the derivative of the line being graphed. I was using it in the simplest case, to store a value.
What I didn’t realize is that if my values were updated between interval boundaries (the interval length is known as the ‘step’), they would be averaged across that interval. If no update arrived within the specified ‘heartbeat’ period, RRD would store an ‘unknown’ value. A good explanation of how this works is found here (in an SNMP monitoring solution).
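To make the ‘step’ and ‘heartbeat’ parameters concrete, here is a minimal Ruby sketch that builds the argument list for `rrdtool create`. The file name, data source name, and all the numbers are hypothetical values picked for illustration, not the settings my app uses.

```ruby
# Hypothetical values for illustration only.
STEP_SECONDS = 300        # interval across which RRD averages incoming values
HEARTBEAT_SECONDS = 600   # longest allowed gap between updates before 'unknown'

# Build the argument list for `rrdtool create`.
def rrd_create_args(path, step:, heartbeat:, rows:)
  [
    "rrdtool", "create", path,
    "--step", step.to_s,
    # DS:<name>:<type>:<heartbeat>:<min>:<max>, where U means "no limit"
    "DS:value:GAUGE:#{heartbeat}:U:U",
    # RRA:<consolidation>:<xff>:<steps per row>:<rows kept>
    "RRA:AVERAGE:0.5:1:#{rows}"
  ]
end

args = rrd_create_args("stats.rrd",
                       step: STEP_SECONDS,
                       heartbeat: HEARTBEAT_SECONDS,
                       rows: 288)
puts args.join(" ")
```

With these made-up numbers, 288 rows of one 5 minute step each would keep one day of data at full resolution.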
That is actually the way graphing in a loosely coupled environment _should_ work. The reason I was seeing strange numbers was that several of my insertions were falling within the same interval. That may be rational, but it doesn’t jibe with the numbers I’m trying to (a) display and (b) alert on.
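Here is a rough Ruby sketch of why two insertions in the same interval produce a number that matches neither of them. It is a simplified model of my understanding, not RRD’s actual code: I assume each value covers the time since the previous update, and that the stored number is the time-weighted average over the step. The method name and sample data are made up.

```ruby
# Simplified model (an assumption, not RRD's real implementation):
# each sample's value applies to the time since the previous update.
# samples: array of [epoch_seconds, value], sorted by time.
def step_average(samples, step_start, step, last_update)
  step_end = step_start + step
  weighted_sum = 0.0
  covered = 0
  prev_time = last_update

  samples.each do |time, value|
    from = [prev_time, step_start].max
    to = [time, step_end].min
    if to > from
      weighted_sum += value * (to - from)
      covered += to - from
    end
    prev_time = time
  end

  # No coverage at all means there is nothing to average.
  covered.zero? ? nil : weighted_sum / covered
end

# Two updates inside one 300 second step: 10 at t=100, 40 at t=300.
average = step_average([[100, 10], [300, 40]], 0, 300, 0)
puts average
```

In this model the value 10 covers 100 seconds and the value 40 covers 200 seconds, so the stored average is (10 * 100 + 40 * 200) / 300 = 30, which is neither of the numbers I inserted.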
How I got my app to work with RRD
The key is that RRD expects to average values across a time interval. So, in essence, I have to make sure that only one data point is inserted per interval. I do this by munging the insertion time passed to RRD (I still keep the original insert time for reporting purposes).
I insert the data point at the end of the interval, so if I have a 5 minute interval and I receive an update at 21:52:34, my actual insertion is at 21:55:00. The next value will be inserted at 22:00:00. If the interval was 1 minute, I would have inserted at 21:53:00, and the next value would be inserted at 21:54:00.
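The rounding described above can be sketched in a few lines of Ruby. This is an illustration rather than the code from my app: the method name `interval_end` and the date are hypothetical, and I assume that an update landing exactly on a boundary keeps its own timestamp.

```ruby
# Round a timestamp up to the end of its interval.
# Assumption: a time exactly on a boundary is left unchanged.
def interval_end(time, step_seconds)
  epoch = time.to_i
  rounded = (epoch + step_seconds - 1) / step_seconds * step_seconds
  Time.at(rounded).utc
end

received = Time.utc(2010, 1, 1, 21, 52, 34)   # hypothetical date

five_minute = interval_end(received, 300)
one_minute = interval_end(received, 60)

puts five_minute.strftime("%H:%M:%S")
puts one_minute.strftime("%H:%M:%S")
```

The rounded time, as epoch seconds, is what would be handed to `rrdtool update` in its `timestamp:value` argument, while the original `received` time is kept separately for reporting.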
More fun with RRD
I’m sure that more fun awaits. I have not hit the point where round robin averaging kicks in, and my ‘default’ values are based on my current (mis)understanding of RRD. I’ll update this post so I don’t repeat history.