Coding By Intention

I’ve been awake since 3AM, thinking profound thoughts. At around 5 I realized that I had created a blog to capture what I was thinking, and at 5:15 I finally rolled out of bed and fired up the Mac.

Now that I’m more of a manager and less of a dev, I relish the opportunity to actually cut some code every once in a while. Usually I take the lower-priority tasks that the team needs done and leave the more involved coding to people who can actually sit down and do it as part of their day job 🙂 But in a startup that doesn’t always happen. Case in point: we are writing a Wikipedia scraper to try to extract some of the infobox table information from Wikipedia pages. The infobox is the table on the right-hand side of a page that lists factoids about the page entity.

The whole reason we are doing this rather late in the game is that we’ve tried to outsource web page scraping and the results have been, well, typical. After realizing that correctly scraping data is a core competency required to deliver on our final vision, I had one of the newer devs, Phil, take a look at writing a scraper that works directly against the live pages (yes, I know I’m supposed to use the Wikipedia static dumps, but right now we’re in pure prototype phase, not hitting the damn thing all day).

Phil and I thought that basing something on Hpricot would be smart: we had both used it before and found it extraordinarily useful. In the tradition of lazy Ruby programmers everywhere, we even checked out scrubyt and mechanize, but they weren’t quite what we needed. So, while I was sitting in meetings trying to come up with requirements for the larger application, Phil went off and wrote some really nice, generic scraping code that took full advantage of the sweet bits of Ruby.

To summarize, we needed to extract information and map it to specific properties: given the text on a page and some hints about where that text would be, we needed to create a hash. I would have done this in a very circa-1999 way, creating a driver object that farmed out the scraping to domain-specific sub-objects. Phil did something that I consider quite brilliant: he created a base class with a bunch of class-level (static) methods that let the user declare, in a subclass, what they are trying to scrape and what they want to map it to, like this:

class Scraper < PageScraper
  scrape_rule :property, "{some regex that is a rule}"
  scrape_rule(:another_prop, "{an xpath expression}") { some cleanup code }
  scrape_rule :one_more_prop, "{a css search path}", "{a regex to apply to matched element text}"
end

Implementation aside, this is clean, useful code that is inherently concise and descriptive. I’m not a language bigot, but that doesn’t mean I don’t appreciate it when a language makes it easier for me to get my job done!
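For readers curious how a class-level macro like scrape_rule might work, here is a minimal sketch. It is not Phil’s code: MiniScraper, Rule, PersonScraper and the sample text are all hypothetical names invented for illustration, and only regex rules are supported so that the example runs on the Ruby standard library alone (no Hpricot, XPath or CSS).

```ruby
# Hypothetical sketch: MiniScraper stands in for PageScraper and handles
# regex rules only. None of these names come from the original code.
class MiniScraper
  Rule = Struct.new(:name, :pattern, :cleanup)

  # Each subclass keeps its own list of declared rules.
  def self.rules
    @rules ||= []
  end

  # Class-level macro: record what to look for and which property it maps to.
  def self.scrape_rule(name, pattern, &cleanup)
    rules << Rule.new(name, pattern, cleanup)
  end

  # Apply every declared rule to the text and build the property hash.
  def self.scrape(text)
    rules.each_with_object({}) do |rule, result|
      match = rule.pattern.match(text)
      next unless match

      # Prefer the first capture group; fall back to the whole match.
      value = match[1] || match[0]
      result[rule.name] = rule.cleanup ? rule.cleanup.call(value) : value
    end
  end
end

# The declaration reads as a list of intentions, not as scraping mechanics.
class PersonScraper < MiniScraper
  scrape_rule :name, /Name:\s*(.+)/
  scrape_rule(:born, /Born:\s*(\d{4})/) { |year| year.to_i }
end

person = PersonScraper.scrape("Name: Norman Mineta\nBorn: 1931")
puts person.inspect
```

Note the parentheses around the arguments when a rule takes a { } block: without them Ruby tries to attach the braces to the last argument instead of to the scrape_rule call.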

Long story short, working on this code has helped me focus on coding by intention, which is something that comes naturally to smart people (Phil) and something that the rest of us can learn, given the opportunity.

I’ve extended the example above by adding the ability to scrape repetitive blocks of HTML into ‘object’ templates, where properties are subgrouped. This is handy when you are trying to get all of the positions that Norman Mineta has ever held: you want to group things like job title, time span, acting President, etc. together. Phil had already handled the notion of setting scope; my addition allows us to iterate through a chunk of repetitive (i.e. table) HTML. It looks like this:

require "date"

class BlockScraper < PageScraper
  # Each node matched by the XPath is scraped into its own group of properties.
  scrape_loop :job_title, "//div[1]" do
    scrape_rule(:held_position, /position (.*)/i) { |e| Date.parse(e) }
    scrape_rule :acting_president, /President\w(.*)/mi
  end
end

In any case, the code above makes it pretty easy for the user to ‘say what they mean’, which to me is beautiful.
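As a rough idea of how a nested declaration like scrape_loop could be wired up, here is a minimal sketch under some loud assumptions. LoopScraper, RuleSet, PositionScraper and the sample markup are hypothetical and made up for illustration, and repeated blocks are located with a regex instead of XPath so that the example needs nothing beyond the Ruby standard library. The one real technique on display is instance_eval, which lets the scrape_rule calls inside the do ... end block land on a collector object.

```ruby
# Hypothetical sketch: LoopScraper stands in for PageScraper. Repeated blocks
# are found with a regex rather than XPath, so no HTML parser is required.
class LoopScraper
  # Collects the scrape_rule calls made inside a scrape_loop block.
  class RuleSet
    def initialize
      @rules = []
    end

    def scrape_rule(name, pattern, &cleanup)
      @rules << [name, pattern, cleanup]
    end

    # Build one property hash from a single block of text.
    def apply(text)
      @rules.each_with_object({}) do |(name, pattern, cleanup), row|
        match = pattern.match(text)
        next unless match

        value = match[1] || match[0]
        row[name] = cleanup ? cleanup.call(value) : value
      end
    end
  end

  def self.loops
    @loops ||= []
  end

  # Class-level macro: the nested rules are evaluated against a RuleSet.
  def self.scrape_loop(name, block_pattern, &definition)
    rule_set = RuleSet.new
    rule_set.instance_eval(&definition)
    loops << [name, block_pattern, rule_set]
  end

  # For each loop, find every repeated block and scrape it into its own hash.
  def self.scrape(text)
    loops.each_with_object({}) do |(name, block_pattern, rule_set), result|
      # scan returns arrays when the pattern has a capture group.
      blocks = text.scan(block_pattern).map { |m| m.is_a?(Array) ? m.first : m }
      result[name] = blocks.map { |block| rule_set.apply(block) }
    end
  end
end

class PositionScraper < LoopScraper
  scrape_loop :positions, %r{<tr>(.*?)</tr>}m do
    scrape_rule :title, /Title: ([^;<]+)/
    scrape_rule(:start_year, /From: (\d{4})/) { |year| year.to_i }
  end
end

# Made-up sample markup with two repeated rows.
html = "<tr>Title: Mayor; From: 1971</tr><tr>Title: Secretary; From: 2001</tr>"
positions = PositionScraper.scrape(html)[:positions]
puts positions.inspect
```

Each repeated row comes back as its own hash, which is the grouping of related properties that the loop construct is meant to provide.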

4 Responses to Coding By Intention

  1. cttmmd says:

    What does coding by intention mean? I’m not a software engineer so I’m not sure how “coding by intention” is different from the normal coding. Or is that a management term?

  2. arunxjacob says:

    Management term? Hell no. Check out this guy’s definition: http://clintshank.javadevelopersjournal.com/coding_by_intention.htm

    Basically, it’s top-down development. You start out by saying “what do I want to do?” and take it from there. Note that it goes hand in hand with TDD, which really allows you to think clearly about how you would use a method in addition to how you would break it.

  3. […] Software Without Coding August 31st, 2007 — cttmmd Wow! So that is what coding by intention means.  I’ve been using that concept indirectly without knowing it.  I did not know that […]

  4. […] Naturally we targeted Wikipedia as the best seed source to get us started. After writing some very specialized table scrapers in Ruby, we decided to take another approach and just ingest the entire Wikipedia […]
