moving to blogspot

October 9, 2008

For the last 2 1/2 years I’ve been working on a series of prototypes that have morphed into Evri, thanks to lots of really smart people, the work they brought, and the work they’ve done since. Evri’s main goal is to create a ‘data graph of the web’, where you can find the best media for specific people, places, and things, as well as navigate from one entity to the next via the relationship between the two.

Gee, that sounds like Semantic Web, and Semantic Web has not really shown itself to be Useful. Well at Evri we’ve been focused on the user experience, and while we’ve got a ways to go, we feel that providing the site as well as the tools to access the underlying data store is important.

One of these tools is the Evri content recommendation widget, which looks up all entities in your blog post, shows connections between them, and recommends related media for those entities. Unfortunately the hosted version of WordPress is very restrictive when it comes to widgets, so I can’t put the Evri widget in my blog.

So, I’m moving my blog to Waving Not Drowning, where I’ve embedded the widget. I’ll still use this blog: it represents a year and a lot of knowledge of things that I forget, frequently. I’ll continue taking ‘notes to self’ in the new blog.


Mac launchd and launchctl — the OSX alternative to cron

August 28, 2008

I was revisiting my metrics project, having used the first one as the prototype to refine requirements (nothing works better at getting real requirements out of people than showing them something that doesn’t quite do what they want).

When it came time to test a monitor, I tried to get one running under cron and it didn’t actually work for me. I can’t remember if cron has ever worked for me on a Mac, but I didn’t have the time to figure out why and how. It was time to make the jump to launchd.

Launchd is billed as an init.d, /etc/rc, xinetd, .profile, and crontab replacement, i.e. it can launch scripts at system startup, user login, or on a specified interval.

My use case was to do something cron-like. This was not entirely straightforward: there is a difference between StartCalendarInterval (to run things at a specified calendar date and time, or every minute if no value is specified) and StartInterval (to run things at a specified interval, similar to specifying */5 for every 5 minutes in cron).

Programs are loaded into launchd with launchctl. They are specified as plist files with a pretty simple key/value and/or key/dictionary of values XML format. Here is my .plist file for running something every 5 minutes:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple Computer//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
     <dict>
       <key>Label</key>
       <string>com.evri.metrics.deployment.cron</string>
       <key>ProgramArguments</key>
       <array>
          <string>/opt/local/bin/ruby</string>
          <string>/Users/arunjacob/hypertext/metrics_monitor/lib/tasks/deployment_monitoring/deployment_monitor_driver.rb</string>
          <string>deployment_aggregator</string>
       </array>
       <key>StandardErrorPath</key>
       <string>/dev/null</string>
       <key>StandardOutPath</key>
       <string>/dev/null</string>
       <key>StartInterval</key>
       <integer>300</integer>
       <key>RunAtLoad</key>
       <true/>
     </dict>
</plist>

Note that in key/value parlance, StartInterval takes an integer which specifies the number of seconds. If I wanted to run something every day at a specified time, I would use StartCalendarInterval, which takes a dictionary element that contains time fields such as Hour and Minute:

<?xml version="1.0" encoding="UTF-8"?>
	<!DOCTYPE plist PUBLIC "-//Apple Computer//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
	<plist version="1.0">
	<dict>
		<key>Label</key>
		<string>com.apple.periodic-daily</string>
		<key>ProgramArguments</key>
		<array>
			<string>/usr/sbin/periodic</string>
			<string>daily</string>
		</array>
		<key>LowPriorityIO</key>
		<true/>
		<key>Nice</key>
		<integer>1</integer>
		<key>StartCalendarInterval</key>
		<dict>
			<key>Hour</key>
			<integer>3</integer>
			<key>Minute</key>
			<integer>15</integer>
		</dict>
	</dict>
	</plist>

Note the difference between StartCalendarInterval syntax and StartInterval syntax: StartCalendarInterval takes a dict structure that contains key/value pairs. In other words, it takes a hash. You can also use arrays, as shown in the value for the ProgramArguments key. Just make sure your keys have the correct kind of values, as specified here.
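To make the 'correct kind of values' point concrete, here is a minimal Ruby sketch that renders a hash as a launchd plist, choosing the XML element from the Ruby type. The helper names (xml_escape, plist_value, launchd_plist), the label, and the script path are all made up for illustration; none of this ships with launchd.

```ruby
# Hypothetical helpers (not part of launchd or any library): render a Ruby
# hash as launchd plist XML, picking the element type from the Ruby type.

# Escape the characters that would break the XML.
def xml_escape(text)
  text.gsub("&", "&amp;").gsub("<", "&lt;").gsub(">", "&gt;")
end

# Render one value; dicts and arrays recurse with deeper indentation.
def plist_value(value, depth = 1)
  pad = "  " * depth
  case value
  when true, false
    "#{pad}<#{value}/>\n"
  when Integer
    "#{pad}<integer>#{value}</integer>\n"
  when String
    "#{pad}<string>#{xml_escape(value)}</string>\n"
  when Array
    items = value.map { |item| plist_value(item, depth + 1) }.join
    "#{pad}<array>\n#{items}#{pad}</array>\n"
  when Hash
    entries = value.map do |key, item|
      "#{pad}  <key>#{xml_escape(key.to_s)}</key>\n" + plist_value(item, depth + 1)
    end.join
    "#{pad}<dict>\n#{entries}#{pad}</dict>\n"
  else
    raise ArgumentError, "no plist type for #{value.inspect}"
  end
end

# Wrap the rendered dict in the standard plist header and footer.
def launchd_plist(job)
  header = <<~XML
    <?xml version="1.0" encoding="UTF-8"?>
    <!DOCTYPE plist PUBLIC "-//Apple Computer//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
    <plist version="1.0">
  XML
  header + plist_value(job) + "</plist>\n"
end

# Example job: run a (made-up) script every 300 seconds.
interval_job = {
  "Label" => "com.example.every-five-minutes",
  "ProgramArguments" => ["/usr/bin/ruby", "/path/to/script.rb"],
  "StartInterval" => 300,
  "RunAtLoad" => true
}
puts launchd_plist(interval_job)
```

Swapping the "StartInterval" entry for "StartCalendarInterval" => { "Hour" => 3, "Minute" => 15 } gives the daily form, and anything the sketch has no plist type for (a Float, say) raises instead of silently writing a bad value.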


More Rails-tarded ness: named resources

August 21, 2008

I was showing my monitoring app to a co-worker, who wanted to access some of the resources by URLs that contained their names. Hey, that actually makes sense! He wants to refer to resources by their actual names — brilliant. Unfortunately for my lazy ass, this is a departure from the standard rails resource routing conventions, where

map.resources :{controller name}

automagically generates routing like this:

/controller name/:id

I wanted to have both approaches, mainly because I’m lazy and don’t want to rework my code that navigates back to these resources by ID. My first attempt at doing this was to put a custom named resource in front of my default map.resources statement:

map.named_monitor_instances 'monitor_instances/:name', :controller => 'monitor_instances', :action => 'show_named_monitors'

This resulted in me getting a ‘missing template for show_named_monitors’ message, which was fair: the new action had no view of its own, and I didn’t want to render the same view in another erb file.

The best solution I’ve found for having it both ways is by realizing that the default route :id parameter is just a parameter, and can contain a name as well as a number. Other named routes can be quite specific about what they contain, but the default route is pretty forgiving. I modified the controller code to look like this:

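begin
  # A numeric id is found directly.
  @monitor_instance = MonitorInstance.find(params[:id])
rescue ActiveRecord::RecordNotFound
  # No record with that id, so treat the parameter as a name instead.
  @monitor_instance = MonitorInstance.find_by_name(params[:id])
end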

to catch the case where find(‘foo’) raises because no record has that id, and then try to find foo by name. Graceful? No. Elegant? Not really. I’m sure this level of rails-tardedness will get me flamed by Rails Zealots who think I’ve gone and dicked up a perfectly elegant solution. But is it easy? Hell to the Yeah it is.
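For comparison, here is a sketch of a different technique: decide up front whether the parameter looks like an id, instead of rescuing a failed lookup. It is plain Ruby with an in-memory list standing in for the MonitorInstance table; the Monitor struct, the MONITORS data, and find_monitor are all hypothetical.

```ruby
# Hypothetical stand-in for the MonitorInstance table.
Monitor = Struct.new(:id, :name)
MONITORS = [Monitor.new(1, "deployment"), Monitor.new(2, "search")]

# Look up by id when the parameter is all digits, otherwise by name.
# Returns nil when nothing matches.
def find_monitor(id_or_name)
  if id_or_name.to_s =~ /\A\d+\z/
    MONITORS.find { |monitor| monitor.id == id_or_name.to_i }
  else
    MONITORS.find { |monitor| monitor.name == id_or_name }
  end
end

p find_monitor("2")
p find_monitor("deployment")
```

The tradeoff is the same either way: a monitor whose name is all digits can never be reached by name, because it will always be read as an id.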


The Rails-tarded way to inject related model info into your form

July 13, 2008

I’m 100% sure that there is an elegant, rails-tastic way to do this, but I’m short on time and want to remember this approach, because all things considered, it doesn’t seem that bad.

I’m trying to create a set of objects that can nest. These MonitorGroup objects can contain other MonitorGroup objects, so I can create a hierarchy of things I want to monitor. The model definition for MonitorGroup is:

class MonitorGroup < ActiveRecord::Base
  has_many :monitor_instances
  has_many :monitor_groups
  belongs_to :monitor_group
  belongs_to :status

  validates_presence_of :name, :status
  validates_uniqueness_of :name
end

Note that the ‘nesting’ is indicated by the belongs_to :monitor_group association, which points a MonitorGroup back at its parent MonitorGroup.

In order to actually get this to happen, I need to have a way to add child MonitorGroup objects to a parent. I do this in the parent’s edit page as follows:

<%= link_to "Add Child Monitor Group", {:action => "new", :parent_monitor_group_id => @monitor_group.id} %>

The parent_monitor_group_id gets put into the params of the new page: basically this link_to builds a link like this:

http://localhost:3000/monitor_groups/new?parent_monitor_group_id=10

So if the user is editing the parent monitor group, they have the option to add a child monitor group. This request gets routed to the new method of the MonitorGroupsController, where I unpack the parameter and look up the associated MonitorGroup:

def new
  # this is needed so that the form can access model properties.
  @monitor_group = MonitorGroup.new
  parent_monitor_group_id = params[:parent_monitor_group_id]
  unless parent_monitor_group_id.nil?
    # attach the parent so the form can carry its id along
    @monitor_group.monitor_group = MonitorGroup.find(parent_monitor_group_id)
    logger.debug("parent monitor name = #{@monitor_group.monitor_group.name}, id = #{@monitor_group.monitor_group.id}")
  end

  respond_to do |format|
    format.html # new.html.erb
    format.xml { render :xml => @monitor_group }
  end
end

So far, so good. I’ve now created the MonitorGroup object that is going to drive the form_for block I use in the new MonitorGroup page, with its parent attached if the parent_monitor_group_id was passed in the params hash.

In order to access this value in the form that I was going to submit, I needed to embed it using fields_for. This process is described quite well here. The main difference between the standard use of fields_for and the way I’m using it is that I want to pass this variable as a hidden field, instead of one that requires input.

Long story short: I was able to embed the id of the parent MonitorGroup in the MonitorGroup new page like this:

<% form_for(@monitor_group) do |f| %>
  <% fields_for(:monitor) do |mon| %>
    <% if @monitor_group.monitor_group %>
      <%= hidden_field "parent_monitor", :id, {:value => @monitor_group.monitor_group.id} %>
    <% end %>
  <% end %>
  <%# the rest of the form's fields go here %>
<% end %>

A couple of things to note: the hidden field gets translated to the following html:

<input id="parent_monitor_id" name="parent_monitor[id]" type="hidden" value="9" />

which in turn means that the params passed to the create method of the MonitorGroupsController look like this:

Parameters: {"commit"=>"Create", "authenticity_token"=>"xxx", "action"=>"create", "controller"=>"monitor_groups", "monitor_group"=>{"name"=>"child_mon", "description"=>"child monitor"}, "parent_monitor"=>{"id"=>"9"}}

and I access the monitor ID in the create method as follows:

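# parent_monitor is only present when the hidden field was rendered
parent_id = params[:parent_monitor] && params[:parent_monitor][:id]
unless parent_id.nil?
  @monitor_group.monitor_group = MonitorGroup.find(parent_id)
end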

This seems kind of klugey, but it works, and I’ve got a deadline. Any Rails Gods lurking out there, please show me the elegant concise way of doing this?!?
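Here is a small plain-Ruby sketch of the unpacking step, run against a hash shaped like the Parameters dump above. It assumes string keys (real Rails params accept symbols or strings), and parent_id_from is a hypothetical helper name.

```ruby
# Hypothetical helper: pull the parent id out of a params-style hash,
# returning nil when the hidden field was not submitted or was empty.
def parent_id_from(params)
  parent = params["parent_monitor"] || {}
  id = parent["id"]
  (id.nil? || id.to_s.empty?) ? nil : id.to_i
end

with_parent = {
  "monitor_group" => { "name" => "child_mon", "description" => "child monitor" },
  "parent_monitor" => { "id" => "9" }
}
without_parent = { "monitor_group" => { "name" => "top_level" } }

p parent_id_from(with_parent)
p parent_id_from(without_parent)
```

Returning nil for the no-parent case keeps the create method down to a single check before it calls MonitorGroup.find.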


Metrics Fast ‘n Easy, Part II: actually using RRDTool from Ruby

June 14, 2008

Continued from part I:

Now that the RRD bundle is installed in Ruby’s default load path, I require RRD and access the convenience methods. The methods basically pass all parameters in as strings, which is fine, but I don’t like thinking of time and values as strings if I can avoid it. So I wrote a wrapper class that allows me to pass in values as typed options, and then casts them to the internal strings.
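A minimal sketch of that kind of wrapper, assuming made-up class and option names (RrdDefinition, create_args) and an arbitrary default RRA. It only builds the string arguments and never calls RRD itself, so it runs without the binding installed.

```ruby
# Hypothetical wrapper (class and option names are illustrative): holds
# typed options and renders the string arguments that RRD.create expects.
class RrdDefinition
  def initialize(name:, start:, step:, dataset:, type:, heartbeat:,
                 min: nil, max: nil,
                 rra: { cf: "AVERAGE", xff: 0.5, steps: 1, rows: 288 })
    @name, @start, @step = name, start, step
    @dataset, @type, @heartbeat = dataset, type, heartbeat
    @min, @max, @rra = min, max, rra
  end

  # Arguments in the order used by the RRD.create call shown later in this post.
  def create_args
    [
      @name,
      "--start", @start.to_i.to_s, # Time -> epoch seconds -> string
      "--step", @step.to_s,
      "DS:#{@dataset}:#{@type}:#{@heartbeat}:#{bound(@min)}:#{bound(@max)}",
      "RRA:#{@rra[:cf]}:#{@rra[:xff]}:#{@rra[:steps]}:#{@rra[:rows]}"
    ]
  end

  private

  # rrdtool writes an unknown bound as "U".
  def bound(value)
    value.nil? ? "U" : value.to_s
  end
end

definition = RrdDefinition.new(
  name: "deployments", start: Time.at(1_213_401_600), step: 300,
  dataset: "deploys", type: "GAUGE", heartbeat: 600, min: 0
)
p definition.create_args
```

With the binding loaded, the result could be splatted into the call as RRD.create(*definition.create_args).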

Here are a couple of notes about creating, updating, and rendering RRD graphs using the built-in Ruby binding.

Creating an RRD graph

At create time, the first parameter is the name of the file, minus the .rrd extension (create will puke if you specify the extension). The start time is expressed in seconds; my code below takes it as a Time object and converts it to seconds. The step time is the minimal amount of time an update can occur at. In other words, if your step is 50 seconds and you try to update at 10 seconds, you get an error.

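The DS option defines a dataset as follows:

DS:[name]:[data source type]:[heartbeat]:[min value or unknown]:[max value or unknown]

The data source type is one of GAUGE, COUNTER, DERIVE, or ABSOLUTE. The heartbeat is the maximum number of seconds allowed between updates before the value is recorded as unknown. An unknown min or max is written as U. More explanation of the suitable types is found here.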

RRD.create(
name,
"--start", "#{@start.to_i}",
"--step", "#{@step}",
"DS:#{@dataset}:#{@type}:#{@heartbeat}:#{@min}:#{@max}",
"RRA:#{@collapse_method}:#{@xff}:#{@collapse_steps}:#{@collapse_rows}")

In the example above, I only create a single dataset. You can create 1..N; I’m sure N has an upper limit, but I haven’t found it specified anywhere. Also, I believe the data set name is restricted to 19 characters in length. The RRA section syntax is as follows:
RRA:AVERAGE | MIN | MAX | LAST:xff:steps:rows
The first field picks how values are collapsed: by averaging them, by taking the min or max, or by keeping the last one. The xff value limits how many unknown values can go into a collapsed datapoint, by setting the maximum ratio of unknown values to the total number of values being collapsed. The steps value specifies the number of datapoints collapsed into one, and the rows value specifies how many collapsed datapoints to keep. So RRA is where you really get a chance to limit the size of the RRD file.
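The steps and rows arithmetic is easier to see with numbers. This sketch assumes a 300 second step and an RRA that collapses 12 datapoints per row and keeps 168 rows; rra_retention is a hypothetical helper, not part of the RRD binding.

```ruby
# Hypothetical helper: how much time one collapsed row covers, and how
# much history the whole RRA keeps, given the step chosen at create time.
def rra_retention(step_seconds, steps, rows)
  seconds_per_row = step_seconds * steps
  { seconds_per_row: seconds_per_row, total_seconds: seconds_per_row * rows }
end

hourly = rra_retention(300, 12, 168)
puts "one row per #{hourly[:seconds_per_row]} seconds"
puts "#{hourly[:total_seconds] / 86_400} days of history kept"
```

Twelve 300 second datapoints collapse into one hourly row, and 168 hourly rows cover a week, so the file holds a fixed 168 rows for this RRA no matter how long the monitor runs.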

Updating an RRD Graph

Once the graph has been created, it exists as the file you specified using the name parameter above. You update it with time:value statements. In the code below, I’m updating from an array of times and an array of values:
# simple update of multiple values
def update(times, values)
  # pair each time with its value and send one time:value update per pair
  times.zip(values).each do |time, value|
    RRD.update(@name, "#{time.to_i}:#{value}")
  end
end

In the code above, as for all RRD operations, you specify the name of the RRD file you want to operate on in the first parameter.

Note that you cannot update a graph with a time less than its start time or a time that is less than the last time + the step time specified at creation.

Displaying an RRD Graph

Graph display is the most complex operation with RRD. I’m not going to go into all of the details: some really good examples are found here.

I’ve taken the simplest approach to displaying a graph:

RRD.graph(
renderedFile,
"--title", title,
"--start", start.to_i.to_s,
"--end", finish.to_i.to_s,
"--interlace",
"--imgformat", "PNG",
"--width=#{width}",
"DEF:a=#{@name}:#{@dataset}:AVERAGE",
"LINE1:a#0022e9:#{@dataset}")

Unlike the update method, the name of the actual desired graph is the first parameter, not the name of the RRD file. The RRD file to load is specified in the DEF line. You can specify multiple DEF values to display datasets from different RRD files. You will need to specify the way you want each dataset rendered: in the above example, I define a value a with the DEF statement and then reference it in the LINE1 statement that follows it.

In order to render data, you will need to specify how you want to display it (as a line, as an area under a line, as a tick mark, etc.). More details about how to define data sets, including creating datasets via the CDEF statement, are found in the graph data documentation. Details about how to display data are in the rrdgraph method documentation. The format of the DEF, CDEF, LINE statements is RPN, i.

Make sure to specify start and end in a way that shows values as you would like to see them, i.e. make sure your latest value is in the specified start and end range.
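Since each dataset needs its own DEF and its own drawing statement, plotting several datasets means generating those arguments in pairs. This sketch does that in plain Ruby; the Series struct, graph_series_args, the file names, and the colors are all made up for illustration, and AVERAGE is assumed as the consolidation function, as in the example above.

```ruby
# Hypothetical description of one plotted series.
Series = Struct.new(:rrd_file, :dataset, :color, :legend)

# Build one DEF and one LINE1 argument per series, naming the values
# v0, v1, ... so each LINE1 refers to the DEF that precedes it.
def graph_series_args(series_list)
  series_list.each_with_index.flat_map do |series, index|
    vname = "v#{index}"
    [
      format("DEF:%s=%s:%s:AVERAGE", vname, series.rrd_file, series.dataset),
      format("LINE1:%s#%s:%s", vname, series.color, series.legend)
    ]
  end
end

series = [
  Series.new("deployments.rrd", "deploys", "0022e9", "deploys"),
  Series.new("errors.rrd", "errors", "e90022", "errors")
]
puts graph_series_args(series)
```

The resulting array could be splatted onto the end of an RRD.graph call in place of the single DEF and LINE1 lines.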