The Rails-tarded way to inject related model info into your form

July 13, 2008

I’m 100% sure that there is an elegant, rails-tastic way to do this, but I’m short on time and want to remember this approach, because all things considered, it doesn’t seem that bad.

I’m trying to create a set of objects that can nest. These MonitorGroup objects can contain other MonitorGroup objects, so I can create a hierarchy of things I want to monitor. The model definition for MonitorGroup is:

class MonitorGroup < ActiveRecord::Base

  has_many :monitor_instances
  has_many :monitor_groups      # children
  belongs_to :monitor_group     # parent
  belongs_to :status

  validates_presence_of :name, :status
  validates_uniqueness_of :name

end

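Note that the ‘nesting’ comes from the belongs_to :monitor_group line: each MonitorGroup can point at a parent MonitorGroup, and has_many :monitor_groups gives the parent its children.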

In order to actually get this to happen, I need to have a way to add child MonitorGroup objects to a parent. I do this in the parent’s edit page as follows:

<%= link_to "Add Child Monitor Group", {:action=>"new",:parent_monitor_group_id=>} %>

The parent_monitor_group_id gets put into the params of the new page: basically this link_to builds a link like this:
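As a rough sketch of the shape of that link, assuming a RESTful /monitor_groups/new path and a parent id of 9 (both are assumptions, as is the helper name new_child_path), the query string can be built in plain Ruby:

```ruby
require "uri"

# Hypothetical helper: the path and the parameter layout are assumptions
# made for illustration, not output copied from the real app.
def new_child_path(parent_id)
  query = URI.encode_www_form("parent_monitor_group_id" => parent_id)
  "/monitor_groups/new?#{query}"
end
```

Calling new_child_path(9) gives a path whose query string carries the parent id, which is what ends up in params[:parent_monitor_group_id] on the new page.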


So if the user is editing the parent monitor group, they have the option to add a child monitor group. This request gets routed to the new method of the MonitorGroupsController, where I unpack the parameter and look up the associated parent MonitorGroup:

def new

  # this is needed so that the form can access model properties.
  @monitor_group = MonitorGroup.new
  parent_monitor_group_id = params[:parent_monitor_group_id]
  if parent_monitor_group_id != nil
    @monitor_group.monitor_group = MonitorGroup.find(parent_monitor_group_id)
    logger.debug("parent monitor name = #{@monitor_group.monitor_group.name}, id = #{@monitor_group.monitor_group.id}")
  end

  respond_to do |format|
    format.html # new.html.erb
    format.xml { render :xml => @monitor_group }
  end

end

So far, so good. I’ve now created the MonitorGroup object that is going to be used to drive the form_for block I use in the new MonitorGroup page, if the parent_monitor_group_id is passed in the params hash.

In order to access this value in the form that I was going to submit, I needed to embed it using fields_for; this process is described quite well here. The main difference between the standard use of fields_for and the way I’m using it is that I want to pass this value as a hidden field, instead of one that requires input.

Long story short: I was able to embed the id of the parent MonitorGroup in the MonitorGroup new page like this:

<% form_for(@monitor_group) do |f| %>

  <% fields_for(:monitor) do |mon| %>

    <% if @monitor_group.monitor_group != nil %>
      <%= hidden_field "parent_monitor", :id, {:value => @monitor_group.monitor_group.id} %>
    <% end %>

  <% end %>

<% end %>


A couple of things to note: the hidden field gets translated to the following html:

<input id="parent_monitor_id" name="parent_monitor[id]" type="hidden" value="9" />,

which in turn means that the params passed to the create method of the MonitorGroupsController look like this:

Parameters: {"commit"=>"Create", "authenticity_token"=>"xxx", "action"=>"create", "controller"=>"monitor_groups", "monitor_group"=>{"name"=>"child_mon", "description"=>"child monitor"}, "parent_monitor"=>{"id"=>"9"}}

and I access the monitor ID in the create method as follows:

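# params[:parent_monitor] is absent when the group has no parent
parent = params[:parent_monitor]
parent_id = parent && parent[:id]
if parent_id != nil
  @monitor_group.monitor_group = MonitorGroup.find(parent_id)
end
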

This seems kind of klugey, but it works, and I’ve got a deadline. Any Rails Gods lurking out there, please show me the elegant concise way of doing this?!?
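
One small piece that can be tested outside Rails is pulling the parent id out of the params. This is a minimal sketch, assuming a plain Hash with string keys shaped like the Parameters dump above; the helper name parent_id_from is hypothetical and not part of the app.

```ruby
# Hypothetical helper: returns the parent id as an Integer, or nil when
# the hidden field was not submitted (a top-level group with no parent).
def parent_id_from(params)
  parent = params["parent_monitor"]
  return nil if parent.nil?

  id = parent["id"]
  return nil if id.nil? || id.to_s.empty?

  id.to_i
end
```

With this shape, the create method only has to do the find when the helper returns something other than nil.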


Is Search Really Broken?

July 7, 2008

It’s July 2008, and I’m looking around wondering how many people think search is broken. Actually, I’m searching around wondering how many people think search is broken. And, using this method, I’ve been able to deduce the following (from the first page of results, no less):

  1. search is broken because publishers have to put additional metadata/markup on the pages/sites they want found, in addition to sitemaps, no-follow, robots.txt, etc.
  2. search is broken because when I want to find something, I have to look in so many different places.
  3. search is broken because there needs to be a human editorial overlay in order to achieve useful result precision.
  4. search is broken because search applications are not telepathic, i.e. they cannot perceive context and other subtle metadata in a user search request.

All of the above is more or less accurate (though IMO #1 is kind of whiny and #4 is unrealistic), but at the same time there is good evidence that search actually does work for most people most of the time. And search, as a knowledge acquisition paradigm, has become incredibly ingrained in people’s usage patterns. At Evri the #1-with-a-bullet question we always get asked is “where is search?”

I don’t think that a yes/no answer to the question ‘Is Search Broken’ does that question, or the technology behind it, any justice at all. I can say that from the perspective of a software engineer, the fact that I can ask a question at any time of the day and get an answer back in hundreds of milliseconds (COMCASTically of course) is just amazing. The relative precision and especially the recall is mind blowing. The rumor/FUD about Google infrastructure is scary and exciting at the same time. The fact that TF/IDF works as well as it does proves that Simplicity does equal Elegance.

When I take a more philosophical viewpoint of search, I have to say that the way I take search for granted and have completely outsourced my long term memory is very scary. I have voluntarily ceded control of information in my head and traded it for the ability to retrieve that information. Which gives me a lot more apparent bandwidth, as long as there is a computer nearby 🙂

Still, while the implementation, and more importantly, the functionality of search is something so powerful that I cannot function without it, I do see some challenges ahead for the traditional search model:

  1. Content and the traffic generated by users wanting to access that content are growing: specifically, traffic is expected to grow at 46 percent annually between now and 2012, while the amount of content available continues to skyrocket upward.
  2. Content is morphing from text based documents to include videos and audio. Search is not keeping up. Currently, most video/audio content is not considered ‘searchable’, other than by associated metadata. There are some attempts to change this, e.g. Delve Networks for video search, but by and large search cannot, in its current incarnation, treat video, audio, or image data as equivalent content to documents.
  3. People are turning to less machine driven means of finding out information — Mahalo is an example of editorialized search, and Wikipedia, while not exactly a search engine, can be used like one if you use Firefox.

The last two points taken together are pretty interesting, because in this age of massive document recall, people are veering towards precision, and precision across media types. People want content (video, text, image) to be fused together into a single result page, not a result set. The Lance Armstrong Wikipedia Page and the Lance Armstrong Mahalo result page provide a much more readable information set than the Lance Armstrong Google result page.

Implicit in those edited result sets is that the information is one click closer — the salient facts about Lance are front and center, not (just) in the first returned document. Search has made us very good at ‘search, inspect, reject, repeat’, in which we sift through keyword results and painstakingly evaluate the returned documents like ancient priests must have sifted through tea leaves, or entrails, or whatever their search engine result set was. Edited page result sets present that information in a page view that we can actually browse.

I don’t think editorialized pages are the right answer, mainly because they cannot possibly scale, and the amount of human effort required to keep them up to date is massive. I think the next internet scale ‘killer app’ must provide complete fusion of search results across media types into a single, focused page per entity, that changes in response to real world events and doesn’t rely on an army of editors to keep it up to date. That’s definitely a holy grail, but one worth shooting for if we want to have any hope of keeping the internet as useful as it is today, if not more so.

Evri Goes Beta!

June 26, 2008

Whew! It’s been an exciting couple of weeks as we’ve gone through our first launch, but it’s official: Evri has a real beta release, and we’re telling the world.

It’s been two and a half years since I started down this road, building our very first, very crude prototype in a couple of weeks, and walking into the first demo knowing that we were onto something big, even if we didn’t know what that really meant. I consider myself very lucky to have had the opportunity to dive headfirst into such an engrossing, challenging space over the last couple of years. It’s been a great ride with a great team!

Neil’s blog post says it much better than I could ever rehash it (but I’ll try anyway): we are building a data graph of the web, a set of people, places, things, concepts, connected by specific relationships harvested from relevant content. The company motto: “search less. understand more.” is our way of saying that we have something truly disruptive: this is not search gone to eleven, or Wikipedia on steroids, but something much more powerful that gets around the current, completely disjoint user experience of “search, then browse, then search some more, then forget what you were originally searching for”.

Check it out! Sign up, and, most importantly, tell us what you think, so we can make it (even) better.

The Cron of Tab

June 19, 2008

Damn, setting up crons was supposed to be a walk in the park, so much so that I didn’t even budget mental energy for it! Maybe that was the problem…

Anyways, lessons learned from setting up simple cron jobs.

  1. How-Tos are good, but man is better.
  2. crontab -e will either load your current crontab, or load a blank template for you.
  3. crontab -r will kill your current crontab. Which is why it’s nice to keep the output of crontab -l, which lists your crontab jobs, in a backup file. Because unless you like setting up crons, you don’t want to be left high and dry without a backup.
  4. try running your jobs before putting them in the crontab. Or, if you’re like me, do crontab -l, cut and paste, and figure out why /usr/local/binruby is not the command you want (/usr/local/bin/ruby was what I was looking for).
  5. to make sure your jobs are running, tail -f /var/log/syslog. Note that this doesn’t tell you if they’re crap or not.
  6. Or, append your output to a log, and check to see that the log is growing.
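
Item 6 can be as simple as having the job itself write a timestamped line on every run. A minimal sketch, assuming a hypothetical helper named append_run_marker and a log path of your choosing:

```ruby
require "time"

# Hypothetical job helper: append one timestamped line per run, so a
# growing log file shows that cron is actually invoking the script.
def append_run_marker(log_path, message, now = Time.now)
  File.open(log_path, "a") do |log|
    log.puts("#{now.utc.iso8601} #{message}")
  end
end
```

Like the syslog check, this only tells you the job ran, not whether what it did was any good.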

I’m still kind of cruxing about how I’m going to run this on a deployed app. I think I’m going to have to get ops to add deploy_user, and add the crons on deploy_user’s behalf. Of course, I’ll worry about that when I can actually smoothly install mod_rails on my semi jacked up box. I must be the only tard in the universe who screwed up a mod_rails install, but more on that tomorrow.

Tomorrow (er, 1 week later)

In the mad rush to launch (more later), I forgot to update this page, which is bad because this is my scratchpad that has more than once saved me from repeating a painful process. Summary: anyone installing mod rails should take 4 minutes and view the railscast. I was missing the following in my conf file:

LoadModule passenger_module /usr/local/lib/ruby/gems/1.8/gems/passenger-1.0.5/ext/apache2/
RailsSpawnServer /usr/local/lib/ruby/gems/1.8/gems/passenger-1.0.5/bin/passenger-spawn-server
RailsRuby /usr/local/bin/ruby

Metrics Part IV: RRDTool on Ubuntu 7.04 (Feisty)

June 19, 2008

The production instance of this metrics server is going to run on Ubuntu Feisty, which comes installed with rrdtool 1.2, but the ruby bindings I want (need) to use bind to 1.3. So this is how to install rrdtool on Feisty.

(1) download the source:

curl > rrdtool.tar.gz

Then follow the instructions on the rrdbuild page; basically:


sudo make

sudo make install

However, in order to get configure to complete, I needed to install a couple of dependencies, pango and xml-2. I like configure, it’s very good about telling you what is missing and where to get it. And the rrdbuild page is also great at specifying exactly how to install the missing packages.

I figured I was up and running at that point. I built the ruby bindings from {src dir}/bindings/ruby, but when I tried to run the files I had been running on my Mac, I got:

/usr/local/lib/ruby/site_ruby/1.8/x86_64-linux/ cannot open shared object file: No such file or directory - /usr/local/lib/ruby/site_ruby/1.8/x86_64-linux/ (LoadError)


How can a file that exists, /usr/local/lib/ruby/site_ruby/1.8/x86_64-linux/, not be found? Was it a permissions thing? I tried building as sudo, same thing. Then I made sure that the file actually existed, just to check my head. Yes, it’s there. Yes, I’m getting the same error. Wait. Could it be the ? I try a

ldconfig -v | grep rrd

and only get Hmmm. OK, desperate times…I edit /etc/, and add


to the path. That worked. The ops team is not going to be super thrilled about the amount of jackassery it took to get this up and running, which is why I’m documenting it here. Because they’ll make me maintain it. Just like I would if I were in their shoes 🙂

Metrics Fast ‘n Easy, part III: accessing Rails goodies outside of a Rails app

June 18, 2008

The basic architecture of this (very simple) metrics gathering and display application is:

  1. Scripts run as crons from within the rails directory. Every minute, the cron wakes up and checks the database for the last time run and the poll interval. If they are
  2. They update RRD files and generate .pngs that reside in the rails /public dir.
  3. they update the latest value, and the last time run.
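
The "wake up every minute and check" step boils down to a small time comparison. This is a sketch under assumptions: poll_due? is a hypothetical name, and treating a job that has never run as due is my guess at sensible behavior, not something taken from the real scripts.

```ruby
# Hypothetical check: is a poll due, given the last time it ran and the
# poll interval in seconds? A job that has never run counts as due.
def poll_due?(last_run_at, interval_seconds, now = Time.now)
  return true if last_run_at.nil?

  (now - last_run_at) >= interval_seconds
end
```

The cron can then fire every minute for every script, and each script decides for itself whether to do any work.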

The scripts and the rails app intersect at two points:

  • the interval and last time polled
  • the generated PNG file.

I may choose to store and display the last collected value, but that will happen after getting feedback from my customers (the development, operations, and product teams).

Since the script has to check the interval and the last time polled, it needs access to the database. I naturally wanted to use the ActiveRecord classes that I use in the rails app, and I also wanted access to rails environment variables, like RAILS_ROOT.

By requiring environment.rb:

require File.dirname(__FILE__) + '/../../config/environment.rb'

I was able to get Rails-like behavior in my scripts.

Metrics Fast ‘n Easy, Part II: actually using RRDTool from Ruby

June 14, 2008

Continued from part I:

Now that the RRD bundle is installed in Ruby’s default load path, I require RRD and access the convenience methods. The methods basically pass all parameters in as strings, which is fine, but I don’t like thinking of time and values as strings if I can avoid it. So I wrote a wrapper class that allows me to pass in values as typed options, and then casts them to the internal strings.

A couple of notes about creating, updating, and rendering RRD graphs using the built in Ruby binding.

Creating an RRD graph

At create time, the first parameter is the name of the file, minus the .rrd extension (create will puke if you specify the extension). The start time is expressed in seconds; my code below takes it as a Time object and converts it to seconds. The step time is the minimal amount of time an update can occur at; in other words, if your step is 50 seconds and you try to update at 10 seconds, you get an error.

The DS option defines a dataset as follows:

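DS:[name]:[data source type]:[heartbeat]:[min value or unknown]:[max value or unknown]. The data source type is one of GAUGE, COUNTER, DERIVE, or ABSOLUTE, and the heartbeat is the maximum number of seconds allowed between updates before the value is recorded as unknown. More explanation of the suitable data source types is found here.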

"--start", "#{@start.to_i}",
"--step", "#{@step}",

In the example above, I only create a single dataset. You can create 1..N; I’m sure N has an upper limit, but I haven’t found it specified anywhere. Also, I believe the data set name is restricted to at most 19 characters in length. The RRA section syntax is as follows:
RRA:AVERAGE | MIN | MAX | LAST:xff:steps:rows
where the collapsing is done by averaging values, taking the min or max, or keeping the last value. The xff value limits how much unknown data can be collapsed: it is the maximum fraction of a collapsed interval that may be unknown while the collapsed value still counts as known. The steps value specifies the number of datapoints collapsed, and the rows value specifies how many collapsed datapoints to keep. So RRA is where you really get a chance to limit the size of the RRD file.
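
To illustrate the typed-values-to-strings idea, here is a minimal sketch of helpers that build the DS, RRA, and option strings described above. The names ds_arg, rra_arg, and create_args are hypothetical, this is not my actual wrapper class, and nothing here calls the RRD binding; it only builds the array of strings you would hand to it.

```ruby
# Hypothetical helpers: turn typed Ruby values into RRD create strings.

# DS:name:type:heartbeat:min:max, with "U" standing in for unknown limits.
def ds_arg(name, type, heartbeat, min = nil, max = nil)
  "DS:#{name}:#{type}:#{heartbeat}:#{min.nil? ? 'U' : min}:#{max.nil? ? 'U' : max}"
end

# RRA:consolidation function:xff:steps:rows
def rra_arg(consolidation, xff, steps, rows)
  "RRA:#{consolidation}:#{xff}:#{steps}:#{rows}"
end

# start may be a Time or a number of seconds; step is in seconds.
def create_args(start, step, data_sources, archives)
  ["--start", start.to_i.to_s, "--step", step.to_s] + data_sources + archives
end
```

For example, one row per minute for a day would be rra_arg("AVERAGE", 0.5, 1, 1440) with a 60 second step.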

Updating an RRD Graph

Once the graph has been created, it exists as the file you specified using the name parameter above. You update it with time:value statements: in the code below, I’m updating an array of time:value statements:
# simple update of multiple values
def update(times, values)

for i in 0..times.length-1


In the code above, as for all RRD operations, you specify the name of the RRD file you want to operate on in the first parameter.
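
The time:value pairing itself is easy to sketch without touching RRD. Assuming parallel arrays of times and values, and a hypothetical helper name update_args, something like this builds the strings that an update call would receive:

```ruby
# Hypothetical helper: pair each time with its value as "seconds:value".
# Times may be Time objects or numbers of seconds.
def update_args(times, values)
  unless times.length == values.length
    raise ArgumentError, "times and values must be the same length"
  end

  times.zip(values).map { |time, value| "#{time.to_i}:#{value}" }
end
```

Each resulting string would then be passed to the update call along with the RRD file name.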

Note that you cannot update a graph with a time less than its start time or a time that is less than the last time + the step time specified at creation.

Displaying an RRD Graph:

Graph display is the most complex operation with RRD. I’m not going to go into all of the details: some really good examples are found here.

I’ve taken the simplest approach to displaying a graph:

"--title", title,
"--start", start.to_i.to_s,
"--end", finish.to_i.to_s,
"--imgformat", "PNG",

Unlike the update method, the first parameter is the name of the image file you want generated, not the name of the RRD file. The RRD file to load is specified in the DEF line. You can specify multiple DEF values to display datasets from different RRD files. You will need to specify the way you want each dataset rendered: in the above example, I define a value a with the DEF statement that I reference in the following LINE statement:

In order to render data, you will need to specify how you want to display it (as a line, as area under a line, as a tick mark, etc). More details about how to define data sets, including creating datasets via the CDEF statement, are found in the graph data documentation. Details about how to display data are in the rrdgraph method documentation. CDEF expressions are written in RPN (reverse Polish notation).

Make sure to specify start and end in a way that shows values as you would like to see them, i.e. make sure your latest value is in the specified start and end range.
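
As a sketch of how the DEF and LINE pieces fit together with the options shown earlier, here are some string builders. The helper names (def_arg, line_arg, graph_args) and the file and dataset names in the usage note are hypothetical, and nothing here calls the RRD binding; it only assembles the argument array.

```ruby
# Hypothetical helpers: build the string arguments for a graph call.

# DEF:vname=rrdfile:ds-name:consolidation function
def def_arg(vname, rrd_file, ds_name, consolidation = "AVERAGE")
  "DEF:#{vname}=#{rrd_file}:#{ds_name}:#{consolidation}"
end

# LINE<width>:vname#rrggbb:legend
def line_arg(vname, hex_color, legend, width = 1)
  format("LINE%d:%s#%s:%s", width, vname, hex_color, legend)
end

# image_file comes first, then the options, then the DEF/LINE elements.
def graph_args(image_file, title, start, finish, elements)
  raise ArgumentError, "finish must be after start" unless finish.to_i > start.to_i

  [image_file,
   "--title", title,
   "--start", start.to_i.to_s,
   "--end", finish.to_i.to_s,
   "--imgformat", "PNG"] + elements
end
```

A call such as graph_args("latency.png", "Latency", one_hour_ago, now, [def_arg("a", "latency.rrd", "latency"), line_arg("a", "FF0000", "latency")]) keeps the latest value inside the start and end range as long as now is the end.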