moving to blogspot

October 9, 2008

For the last 2 1/2 years I’ve been working on a series of prototypes that have morphed (thanks to lots of really smart people, the work they brought, and the work they’ve done since) into Evri. Evri’s main goal is to create a ‘data graph of the web’, where you can find the best media for specific people, places, and things, as well as navigate from one entity to the next via the relationship between the two.

Gee, that sounds like Semantic Web, and Semantic Web has not really shown itself to be Useful. Well at Evri we’ve been focused on the user experience, and while we’ve got a ways to go, we feel that providing the site as well as the tools to access the underlying data store is important.

One of these tools is the Evri content recommendation widget, which looks up all entities in your blog post, shows connections between those entities, and recommends related media for them. Unfortunately the hosted version of WordPress is very restrictive when it comes to widgets, so I can’t put the Evri widget in my blog.

So, I’m moving my blog — to Waving Not Drowning, where I’ve embedded the widget. I’ll still use this blog, it represents a year and a lot of knowledge — of things that I forget, frequently. I’ll continue taking ‘notes to self’ in the new blog.


I guess this is why you should never confuse ambition with intelligence

September 26, 2008

I used to think that all people with outsize ambition were very intelligent, and used their intelligence to rise above the rest of us. Even when GWB ‘won’, and when he won for real 4 years later, I just thought it was Satan Dick Cheney being the puppet master. That was before I saw Sarah Palin in action. Go ahead. Watch this, it’s a powerful object lesson: you can be a mayor and a governor and a mom (a juggling act if there ever has been one), and you can be photogenically catty and spin funny pitbull jokes when the teleprompter is rolling, but those qualities don’t translate to the raw intelligence needed to be in the #2 slot.

In fact, the more she talks without the backup of a teleprompter, the worse I feel. As much as I hate to admit it, she’s got a decent shot of being one heartbeat away from the presidency. That said, I think (and fervently hope) that watching the VP debate is going to be like watching a slow motion train wreck. Hopefully Biden won’t let his predilection for talking way too much get in the way of putting Palin out to pasture.

If they win I’m outta here. Canada is looking pretty good right now.

Evri Beta goes Open!

September 24, 2008

A big day here at Evri, as we have taken the password protection off and opened ourselves up to the world. When we started looking at the problem of managing information on the web almost 3 years ago as a tiny research team of 2, our main goal was to make processing information easier for ourselves. We were stuck in an endless loop of keyword search -> sift through results -> alter keyword search -> forget what we were looking for in the first place.

Fast forward to now and Evri is an incredibly talented team who have gone far beyond the initial proof of concept prototypes and have delivered an intuitive and easy to use site that lets you find the content you want to find about the things you care about. Along the way I’ve been exposed to the real problems and solutions inherent in making a real product from a raw prototype, and I’ve got to say it’s been a great ride so far, and with the open beta we have just crossed the starting line…now it’s real!

Instead of writing about what we do, which is best summed up here, I encourage you to visit the site and poke around. If you have a blog, try installing the widget. Note that my blog, which is hosted by WordPress, cannot run the widget, but this is a general WordPress problem, and there are known work-arounds that we are investigating. Stay Tuned!

RRD and averages

September 15, 2008

As noted in other posts, I’m writing a monitoring application. Because this is at best a part time effort that needs to be done quickly, I’ve made some technology choices that emphasize rapid development: the app is a Rails app, and I’m storing statistics with RRD.

I really like RRD. As I’ve mentioned before, it gets me out of the business of drawing graphs and storing data, both of which are hard problems I’d rather not solve. But I was having some issues with it when I tried to store values.

How I thought RRD worked

It seemed pretty simple. I thought I would create an RRD file (wow, lots of parameters, wonder what they mean?), then update it whenever I had a number. And the values would be graphed. And all would be good. But when I did all of the above, I noticed that the values being graphed were not the values I was storing. Hmmm. Time to figure out what some of those parameters mean.

How RRD actually works

Pretty well, actually, because it was designed by some smart people to store numbers that come in at any time, and to average those numbers across an interval defined when the RRD file is created. Well, that’s one of the ways RRD works. It can also store counters, store the results of methods applied to raw values, and store the derivative of the line being graphed. I was using it in the simplest case, to store a value.

What I didn’t realize is that if my values were updated between interval boundaries (the interval length is known as the ‘step’), they would be averaged across that interval. If no update arrived within the specified ‘heartbeat’, RRD would store an ‘unknown’ value. A good explanation of how this works is found here (in an SNMP monitoring solution).

That is actually the way graphing in a loosely coupled environment _should_ work. The reason that I was seeing strange numbers was that several of my insertions were falling within the same interval. Which may be rational, but doesn’t jibe with the numbers I’m trying to (a) display and (b) alert on.

How I got my app to work with RRD

The key for this app is that it is expecting an average value across a time interval. So in essence I have to make sure that only one data point is inserted per interval. I do this by munging the time of insertion in the RRD graph (I still keep the original insert time for purposes of reporting).

I insert the data point at the end of the interval, so if I have a 5 minute interval and I receive an update at 21:52:34, my actual insertion is at 21:55:00. The next value will be inserted at 22:00:00. If the interval was 1 minute, I would have inserted at 21:53:00, and the next value would be inserted at 21:54:00.
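The rounding described above can be sketched in a few lines of plain Ruby. This is only an illustration, not the code from the app: the name interval_end is made up here, and it assumes timestamps are integer Unix seconds and that interval boundaries fall on multiples of the step counted from the epoch.

```ruby
# Hypothetical helper (not from the app): round a Unix timestamp up to the
# end of the interval it falls in. `step` is the interval length in seconds.
def interval_end(timestamp, step)
  # Integer division floors, so adding (step - 1) first rounds up.
  # A timestamp that already sits on a boundary is left where it is.
  ((timestamp + step - 1) / step) * step
end

received = Time.utc(2008, 9, 15, 21, 52, 34).to_i

# 5 minute interval, then 1 minute interval
puts Time.at(interval_end(received, 300)).utc.strftime("%H:%M:%S") # 21:55:00
puts Time.at(interval_end(received, 60)).utc.strftime("%H:%M:%S")  # 21:53:00
```

The original receive time would still be kept separately for reporting, and only the rounded value handed to RRD.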

More fun with RRD

I’m sure that more fun awaits. I have not hit the point where round robin averaging kicks in, and my ‘default’ values are based on my current (mis)understanding of RRD. I’ll update this post so I don’t repeat history.

ActiveResource as a Web Servicification Tool

September 11, 2008

That’s right Web Servicification. Servicifying, just like the subtle art of Strategery, is an oft derided but subtly powerful part of my toolkit.

Or at least it is now. I have been working on a monitoring application so that we could figure out when things were going pear shaped, as opposed to finding out after the fact. This monitoring application was somewhat novel in that it successfully got me out of doing hard things of little quantifiable value and let me focus on doing easy things of much greater value (see above goal). Using RRD is a good example of outsourcing a hard problem (what to do with all that data?!?) to something that handled it for me. Not an especially hard leap to make, thanks to lots of SysAdmins who feel exactly the same way, but still, I’m really happy that I’m not collapsing data to keep my disk footprint somewhat finite.

This whole strategy of ‘doing more with less effort’ is really fun, and I’m searching for something non geeky to try it in. If I get the same efficiency boost in my personal life I’ll have enough time to write a best seller, become a kickboxing champion, or both.

In the context of my (geeky) monitoring app, Web Servicification is something else that gets me out of a couple of fairly hard to do things. Wait, let me back up a bit. Web Servicification with ActiveResource gets me out of a couple of fairly hard to do things.

The first is writing a web service. Go ahead and sneer at the difficulties of writing an XML consuming service, but until you’ve rolled your own in Java, you just don’t have that Juan Valdez “I wrote this one bean at a time” feeling.

The second hard thing this gets me out of is writing monitors for a bunch of heterogeneous systems. We’ve got some things implemented in Java, some in Ruby, some in Perl, etc. I either slap a bunch of web interfaces on all of those systems, then write some centralized code to poll those interfaces, or I slap a web interface on my monitor app and let other people figure out which statistics are meaningful to them, and how often they should be updated.

A Web Service by Default

Support for ActiveResource clients comes more or less enabled by default in Rails 2.0. Every method in a default generated controller can be accessed either by the UI or by a REST action. Here is an example controller generated for one of my resources.

# GET /samples
# GET /samples.xml
def index
  # ...
  respond_to do |format|
    format.html # index.html.erb
    format.xml  { render :xml => @monitor_instances }
  end
end

# GET /monitor_instances/foo
# GET /samples/1
# GET /samples/1.xml
def show
  # ...
end

# GET /samples/new
# GET /samples/new.xml
def new
  # ...
end

# GET /samples/1/edit
def edit
  # ...
end

# POST /samples
# POST /samples.xml
def create
  # ...
end

# PUT /samples/1
# PUT /samples/1.xml
def update
  # ...
end

# DELETE /samples/1
# DELETE /samples/1.xml
def destroy
  # ...
end

Notice that the standard REST verbs are implemented in the same methods that handle Rails application page requests. That’s pretty cool, and it means that you’ve got basic CRUD from the get go. The secret is in the respond_to block (shown in index above), which returns either a page or XML content depending on the requested format. If the request ends in .xml, it’s assumed to be REST, otherwise it’s assumed to be a standard page request.

Routing for both REST and page based requests is provided in routes.rb:

map.resources :monitor_instances

provides routing access to the default methods defined above.

Accessing the Default Web Service

ActiveResource::Base is the class that abstracts the wire format and provides basic CRUD access to the resource. To access the MonitorInstance objects defined above, I could do the following:

class MonitorInstance < ActiveResource::Base
  # define what you need in here (at a minimum, self.site, the base URL of the Rails app)
end


The ActiveResource based MonitorInstance acts similarly to an ActiveRecord based MonitorInstance:

monitor_instance = MonitorInstance.create(:name=>monitor_name,:monitor_instance_id=>,:frequency_id=>,:status_id=>@status_by_name['good'].id,:monitor_type_id=>@monitor_type_by_name["stand_alone"].id)

creates a monitor with the parameters as specified above.

MonitorInstance.delete( OR


removes the monitor instance.

monitor_instance = MonitorInstance.find(1)

finds me the monitor instance with an ID of 1.

removes that Monitor. So far, so good.

Find (not by ID)

What if I want to find something by a secondary attribute, like name? The default rails app expects qualifying parameters to be passed in a params hash:

monitor = MonitorInstance.find(:first,:params=>{:name=>monitor_name})
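As far as I understand it, a find with a :params hash turns that hash into a query string on the collection URL, and :first then takes the first element of whatever list comes back. The sketch below only builds that kind of path with the standard library so the request shape is visible. The helper collection_path_for is hypothetical and is not part of ActiveResource.

```ruby
require "uri"

# Hypothetical helper, not part of ActiveResource: builds the kind of path
# that find(:first, :params => {...}) ends up requesting from the Rails app.
def collection_path_for(resource, params = {})
  path = "/#{resource}.xml"
  return path if params.empty?
  # Query parameters are form-encoded and appended to the collection path.
  "#{path}?#{URI.encode_www_form(params)}"
end

puts collection_path_for("monitor_instances", :name => "foo")
```

Note that the controller on the other end still has to look at params[:name] and filter on it. A freshly generated index action returns every record regardless of the query string.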

My MonitorInstances can be nested under other MonitorInstances. In the Rails app model, each MonitorInstance model specifies that it  belongs_to :monitor_instance.

This doesn’t quite have a counterpart in the ActiveResource world. ActiveResource is concerned with abstracting basic access to web based resources, and associations are not available via that abstraction layer. When I want to find a nested MonitorInstance, I do the following:

def get_monitor(monitor_name, parent_name = nil)
  if parent_name != nil
    # look up the parent first, then scope the search by its ID
    parent = get_monitor(parent_name)
    @logger.debug("finding first instance of monitor #{monitor_name} under #{parent_name}")
    monitor = MonitorInstance.find(:first, :params => {:name => monitor_name, :monitor_instance_id => parent.id})
  else
    @logger.debug("finding first instance of monitor #{monitor_name}")
    monitor = MonitorInstance.find(:first, :params => {:name => monitor_name})
  end
  monitor
end


So I need to first get the parent resource, then make a request with the parent ID in the params hash, as in the first find call above.

Updating — Avoid Non Writeable Parameters!

It was hard to find any updating doc that didn’t just say “to update, just invoke the ActiveResource-derived object save method”. Which sounds great in theory, but didn’t work, because the default implementation of save sends all attributes, even those that are considered immutable, to the web service endpoint. For instance, my MonitorInstance class has an id field that is immutable. That field is sent along with all other (mutable) fields. There is a method in ActiveResource to remove all immutable/protected attributes, but that method calls an undefined logger object to notify you that you are trying to modify an immutable attribute, and an exception is raised.

To get around this, I stripped the immutable attribute (the id) out of the incoming params hash in the controller update method (in the Rails app):

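class StatisticsController < ApplicationController

  # PUT /statistics/1
  # PUT /statistics/1.xml
  def update
    @statistic = Statistic.find(params[:id])

    # strip the immutable id before it reaches update_attributes
    if params[:statistic][:id] != nil
      logger.debug("removing ID from input params!")
      params[:statistic].delete(:id)
    end

    respond_to do |format|
      if @statistic.update_attributes(params[:statistic])
        # ...
      end
    end
  end
end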
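The same idea in isolation, as plain Ruby that runs without Rails. This is a sketch under assumptions: the helper name strip_immutable and the list of keys are made up for illustration, and a real controller would operate on the params hash rather than a plain Hash.

```ruby
# Assumed list of attributes a client should never be able to overwrite.
IMMUTABLE_KEYS = ["id", "created_at", "updated_at"]

# Hypothetical helper: return a copy of an attribute hash without the
# immutable keys. Keys are compared as strings so symbols work too.
def strip_immutable(attributes, immutable = IMMUTABLE_KEYS)
  attributes.reject { |key, _value| immutable.include?(key.to_s) }
end

incoming = { "id" => "1", "name" => "cpu load", "value" => "0.75" }
p strip_immutable(incoming)
```

Returning a copy instead of deleting in place keeps the original hash around for logging, at the cost of one extra object per request.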





Nested Resources

The StatisticsController above handles all posts to Statistics resources, which are 1..N measurements associated with a monitor. In order to enforce that kind of scoping in the request path, I need to update the monitor_instances routes to scope the statistics routes:

#map.resources :statistics

map.resources :monitor_instances, :has_many => [:statistics]

In the statistics controller, I now need to always be aware of the ‘owner’ MonitorInstance. I do this by adding a before_filter, a method that gets invoked prior to every method being called:

before_filter :find_monitor_instance

This before_filter corresponds to the find_monitor_instance method, which returns the appropriate MonitorInstance:


def find_monitor_instance
  @monitor_instance = MonitorInstance.find(params[:monitor_instance_id])
end

Now I have an attribute that I can refer to in my controller. Note that in all of the controller methods that handle both REST and page requests, I need to scope my model requests/updates with the @monitor_instance variable:

def index
  @statistics = Statistic.find(:all, :conditions => {:monitor_instance_id => @monitor_instance.id})

  respond_to do |format|
    format.html # index.html.erb
    format.xml  { render :xml => @statistics }
  end
end

I can also take advantage of Rails path freebies. For instance, after a create request for a statistic, I redirect to the appropriate monitor_instance scoped path like this:

flash[:notice] = 'Statistic was successfully created.'
format.html { redirect_to(monitor_instance_statistic_path(@monitor_instance,@statistic)) }
format.xml  { render :xml => @statistic.to_xml, :status => :created, :location => monitor_instance_statistic_path(@monitor_instance,@statistic) }

monitor_instance_statistic_path generates a path that looks like {path to server}/monitor_instances/1/statistics/3.html or .xml depending on the requested output format.

Some Helpful Links:

ActiveResource RDoc

REST + ActiveResource

Comments from this Railscast

Mac launchd and launchctl — the OSX alternative to cron

August 28, 2008

I was revisiting my metrics project, having used the first one as the prototype to refine requirements (nothing works better at getting real requirements out of people than showing them something that doesn’t quite do what they want).

When it came time to test a monitor, I tried to get one running under cron and it didn’t actually work for me. I can’t remember if cron has ever worked for me on a Mac, but I didn’t have the time to figure out why and how. It was time to make the jump to launchd.

Launchd is billed as an init.d, /etc/rc, xinetd, .profile, and crontab replacement, i.e. it can launch scripts at system startup, user login, or on a specified interval.

My use case was to do something cron-like. This was not entirely straightforward: there is a difference between using StartCalendarInterval (to run things on a specified date, or every minute if no value is specified) and StartInterval (to run things at a specified interval, similar to specifying */5 for every 5 minutes in cron).

Programs are loaded into launchd with launchctl. They are specified as plist files with a pretty simple key/value and/or key/dictionary of values XML format. Here is my .plist file for running something every 5 minutes:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple Computer//DTD PLIST 1.0//EN" "">
<plist version="1.0">

Note that in key value parlance, StartInterval takes an integer which specifies the # of seconds. If I wanted to run something every day at a specified time, I would use StartCalendarInterval, which takes a dictionary element that contains time intervals.

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple Computer//DTD PLIST 1.0//EN" "">
<plist version="1.0">

Note the difference between StartCalendarInterval syntax and StartInterval syntax: StartCalendarInterval takes a dict structure that contains key/value pairs. In other words it takes a hash. You can also use arrays, as in the value for the ProgramArguments key. Just make sure your keys have the correct kind of values, as specified here.
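To make the two shapes concrete, here is a small Ruby sketch that generates a plist for either case. It is not the plist from this post: the function launchd_plist, the label and the script path are all made up, and the generator assumes none of the values contain XML special characters. Label, ProgramArguments, StartInterval and StartCalendarInterval (with Hour and Minute) are real launchd keys.

```ruby
# Hypothetical generator. `schedule` is either an Integer (seconds, emitted
# as StartInterval) or a Hash (emitted as a StartCalendarInterval dict).
def launchd_plist(label, program_arguments, schedule)
  args = program_arguments.map { |arg| "    <string>#{arg}</string>" }.join("\n")
  schedule_xml =
    if schedule.is_a?(Integer)
      "  <key>StartInterval</key>\n  <integer>#{schedule}</integer>"
    else
      entries = schedule.map { |k, v| "    <key>#{k}</key>\n    <integer>#{v}</integer>" }.join("\n")
      "  <key>StartCalendarInterval</key>\n  <dict>\n#{entries}\n  </dict>"
    end
  <<~PLIST
    <?xml version="1.0" encoding="UTF-8"?>
    <!DOCTYPE plist PUBLIC "-//Apple Computer//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
    <plist version="1.0">
    <dict>
      <key>Label</key>
      <string>#{label}</string>
      <key>ProgramArguments</key>
      <array>
    #{args}
      </array>
    #{schedule_xml}
    </dict>
    </plist>
  PLIST
end

# Every 5 minutes (label and path are placeholders)
puts launchd_plist("com.example.monitor", ["/usr/bin/ruby", "/path/to/monitor.rb"], 300)

# Every day at 03:15
puts launchd_plist("com.example.monitor", ["/usr/bin/ruby", "/path/to/monitor.rb"], { "Hour" => 3, "Minute" => 15 })
```

The output would be saved to a .plist file and loaded with launchctl load followed by the path to that file.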

More Rails-tarded ness: named resources

August 21, 2008

I was showing my monitoring app to a co-worker, who wanted to access some of the resources by URLs that contained their names. Hey, that actually makes sense! He wants to refer to resources by their actual names — brilliant. Unfortunately for my lazy ass, this is a departure from the standard rails resource routing conventions, where

map.resources :{controller name}

automagically generates routing like this:

/controller name/:id

I wanted to have both approaches, mainly because I’m lazy and don’t want to rework my code that navigates back to these resources by ID. My first attempt at doing this was to put a custom named route in front of my default map.resources statement:

map.named_monitor_instances 'monitor_instances/:name', :controller=>'monitor_instances', :action=>'show_named_monitors'

This resulted in me getting a ‘missing template for show_named_monitors’ message, which was fine. I didn’t want to render the same view in another erb file.

The best solution I’ve found for having it both ways is by realizing that the default route :id parameter is just a parameter, and can contain a name as well as a number. Other named routes can be quite specific about what they contain, but the default route is pretty forgiving. I modified the controller code to look like this:

begin
  @monitor_instance = MonitorInstance.find(params[:id])
rescue ActiveRecord::RecordNotFound
  @monitor_instance = MonitorInstance.find_by_name(params[:id])
end

to catch the case where find('foo') fails and then try to find foo by name. Graceful? No. Elegant? Not really. I’m sure this level of rails-tardedness will get me flamed by Rails Zealots who think I’ve gone and dicked up a perfectly elegant solution. But is it easy? Hell to the Yeah it is.
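For comparison, here is a variation on the same lookup that checks what the parameter looks like up front instead of waiting for the first find to fail. It is a different technique from the one above, written as plain Ruby against a made-up in-memory list (Monitor, MONITORS and find_monitor are all hypothetical) so it runs without a database.

```ruby
# Stand-in for the model: an in-memory list instead of a table.
Monitor = Struct.new(:id, :name)
MONITORS = [Monitor.new(1, "disk"), Monitor.new(2, "load")]

# Treat the :id parameter as a numeric id when it consists only of digits,
# otherwise fall back to a lookup by name. Returns nil when nothing matches.
def find_monitor(id_or_name)
  if id_or_name.to_s =~ /\A\d+\z/
    MONITORS.find { |m| m.id == id_or_name.to_i }
  else
    MONITORS.find { |m| m.name == id_or_name.to_s }
  end
end

p find_monitor("1")
p find_monitor("load")
```

The trade-off is that a monitor whose name is all digits can never be reached by name this way, which the fall-through version does not suffer from.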

The Rails-tarded way to inject related model info into your form

July 13, 2008

I’m 100% sure that there is an elegant, rails-tastic way to do this, but I’m short on time and want to remember this approach, because all things considered, it doesn’t seem that bad.

I’m trying to create a set of objects that can nest. These MonitorGroup objects can contain other MonitorGroup objects, so I can create a hierarchy of things I want to monitor. The model definition for MonitorGroup is:

class MonitorGroup < ActiveRecord::Base

  has_many :monitor_instances
  has_many :monitor_groups
  belongs_to :monitor_group
  belongs_to :status

  validates_presence_of :name, :status
  validates_uniqueness_of :name

end


Note that the ‘nesting’ is indicated by the belongs_to :monitor_group line.

In order to actually get this to happen, I need to have a way to add child MonitorGroup objects to a parent. I do this in the parent’s edit page as follows:

<%= link_to "Add Child Monitor Group", {:action=>"new",:parent_monitor_group_id=>} %>

The parent_monitor_group_id gets put into the params of the new page: basically this link_to builds a link like this:


So if the user is editing the parent monitor group, they have the option to add a child monitor group. This request gets routed to the new method of the MonitorGroupsController, where I unpack the parameter and look up the associated Monitor:

def new

  # this is needed so that the form can access model properties.
  @monitor_group = MonitorGroup.new
  parent_monitor_group_id = params[:parent_monitor_group_id]
  if parent_monitor_group_id != nil
    @monitor_group.monitor_group = MonitorGroup.find(parent_monitor_group_id)
    logger.debug("parent monitor name = #{}, id = #{}")
  end

  respond_to do |format|
    format.html # new.html.erb
    format.xml { render :xml => @monitor_group }
  end
end



So far, so good. I’ve now created the MonitorGroup object that is going to be used to drive the form_for block I use in the new MonitorGroup page, if the parent_monitor_group_id is passed in the params hash.

In order to access this value in the form that I was going to submit, I needed to embed it using fields_for. This process is described quite well here. The main difference between the standard use of fields_for and the way I’m using it is that I want to pass this variable as a hidden field, instead of one that requires input.

Long story short: I was able to embed the id of the parent MonitorGroup in the MonitorGroup new page like this:

<% form_for(@monitor_group) do |f| %>
  <% fields_for(:monitor) do |mon| %>
    <% if @monitor_group.monitor_group != nil %>
      <%= hidden_field "parent_monitor", :id, {:value => @monitor_group.monitor_group.id} %>
    <% end %>
  <% end %>
<% end %>


A couple of things to note: the hidden field gets translated to the following html:

<input id="parent_monitor_id" name="parent_monitor[id]" type="hidden" value="9" />

which in turn means that the params passed to the create method of the MonitorGroupsController look like this:

Parameters: {"commit"=>"Create", "authenticity_token"=>"xxx", "action"=>"create", "controller"=>"monitor_groups", "monitor_group"=>{"name"=>"child_mon", "description"=>"child monitor"}, "parent_monitor"=>{"id"=>"9"}}

and I access the monitor ID in the create method as follows:

parent_id = params[:parent_monitor][:id]
if parent_id != nil
  @monitor_group.monitor_group = MonitorGroup.find(parent_id)
end
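One thing to watch with this kind of access: when the form is submitted without a parent, the hidden field is never rendered, params[:parent_monitor] is nil, and indexing into it raises. A minimal sketch of a more forgiving lookup, using a plain Hash with string keys in place of the Rails params object (the helper name parent_id_from is made up):

```ruby
# Hypothetical helper: pull the parent id out of a params-like hash,
# returning nil when the hidden field was not submitted at all.
def parent_id_from(params)
  parent = params["parent_monitor"]
  parent && parent["id"]
end

with_parent = {
  "monitor_group"  => { "name" => "child_mon", "description" => "child monitor" },
  "parent_monitor" => { "id" => "9" }
}
without_parent = {
  "monitor_group" => { "name" => "top_mon", "description" => "top level monitor" }
}

p parent_id_from(with_parent)
p parent_id_from(without_parent)
```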


This seems kind of klugey, but it works, and I’ve got a deadline. Any Rails Gods lurking out there, please show me the elegant concise way of doing this?!?

Is Search Really Broken?

July 7, 2008

It’s July 2008, and I’m looking around wondering how many people think search is broken. Actually, I’m searching around wondering how many people think search is broken. And, using this method, I’ve been able to deduce the following (from the first page of results, no less):

  1. search is broken because publishers have to put additional metadata/markup on the pages/sites they want found, in addition to sitemaps, no-follow, robots.txt, etc.
  2. search is broken because when I want to find something, I have to look in so many different places.
  3. search is broken because there needs to be a human editorial overlay in order to achieve useful result precision.
  4. search is broken because search applications are not telepathic, i.e. they cannot perceive context and other subtle metadata in a user search request.

All of the above is more or less accurate (though IMO #1 is kind of whiny and #4 is unrealistic), but at the same time there is good evidence that search actually does work for most people most of the time. And search, as a knowledge acquisition paradigm, has become incredibly ingrained in people’s usage patterns. At Evri the #1-with-a-bullet question we always get asked is “where is search?”

I don’t think that a yes/no answer to the question ‘Is Search Broken’ does that question, or the technology behind it, any justice at all. I can say that from the perspective of a software engineer, the fact that I can ask a question at any time of the day and get an answer back in hundreds of milliseconds (COMCASTically of course) is just amazing. The relative precision and especially the recall is mind blowing. The rumor/FUD about Google infrastructure is scary and exciting at the same time. The fact that TF/IDF works as well as it does proves that Simplicity does equal Elegance.

When I take a more philosophical viewpoint of search, I have to say that the way I take search for granted and have completely outsourced my long term memory is very scary. I have voluntarily ceded control of information in my head and traded it for the ability to retrieve that information. Which gives me a lot more apparent bandwidth, as long as there is a computer nearby 🙂

Still, while the implementation, and more importantly, the functionality of search is something so powerful that I cannot function without it, I do see some challenges ahead for the traditional search model:

  1. Content and the traffic generated by users wanting to access that content are growing: specifically, traffic is expected to grow at 46 percent annually between now and 2012, while the amount of content available continues to skyrocket upward.
  2. Content is morphing from text based documents to include videos and audio. Search is not keeping up. Currently, most video/audio content is not considered ‘searchable’, other than by associated metadata. There are some attempts to change this, e.g. Delve Networks for video search, but by and large search cannot, in its current incarnation, treat video, audio, or image data as equivalent content to documents.
  3. People are turning to less machine driven means of finding out information — Mahalo is an example of editorialized search, and Wikipedia, while not exactly a search engine, can be used like one if you use Firefox.

The last two points taken together are pretty interesting, because in this age of massive document recall, people are veering towards precision, and precision across media types. People want content (video, text, image) to be fused together into a single result page, not a result set. The Lance Armstrong Wikipedia Page and the Lance Armstrong Mahalo result page provide a much more readable information set than the Lance Armstrong Google result page.

Implicit in those edited result sets is that the information is one click closer — the salient facts about Lance are front and center, not (just) in the first returned document. Search has made us very good at ‘search, inspect, reject, repeat’, in which we sift through keyword results and painstakingly evaluate the returned documents like ancient priests must have sifted through tea leaves, or entrails, or whatever their search engine result set was. Edited page result sets present that information in a page view that we can actually browse.

I don’t think editorialized pages are the right answer, mainly because they cannot possibly scale, and the amount of human effort required to keep them up to date is massive. I think the next internet scale ‘killer app’ must provide complete fusion of search results across media types into a single, focused page per entity, that changes in response to real world events and doesn’t rely on an army of editors to keep it up to date. That’s definitely a holy grail, but one worth shooting for if we want to have any hope of keeping the internet as useful — if not more useful — than it is today.

Evri Goes Beta!

June 26, 2008

Whew! It’s been an exciting couple of weeks as we’ve gone through our first launch, but it’s official: Evri has a real beta release, and we’re telling the world.

It’s been two and a half years since I started down this road, building our very first, very crude prototype in a couple of weeks, and walking into the first demo knowing that we were onto something big, even if we didn’t know what that really meant. I consider myself very lucky to have had the opportunity to dive headfirst into such an engrossing, challenging space over the last couple of years. It’s been a great ride with a great team!

Neil’s blog post says it much better than I could ever rehash it (but I’ll try anyway): we are building a data graph of the web, a set of people, places, things, concepts, connected by specific relationships harvested from relevant content. The company motto: “search less. understand more.” is our way of saying that we have something truly disruptive: this is not search gone to eleven, or Wikipedia on steroids, but something much more powerful that gets around the current, completely disjoint user experience of “search, then browse, then search some more, then forget what you were originally searching for”.

Check it out! Sign up, and, most importantly, tell us what you think, so we can make it (even) better.