Language Zealots

August 31, 2007

I think people are xenophobic by nature. Racism, sexism, being exclusionary, are all built into the human genome. Which makes it a foregone conclusion that in the world of software development, there are language zealots. These are the people running around claiming that Language X is better than Language Y for reasons 1, 2, and 3.

After listening to me rave about Ruby for hours on end, many people have concluded that I am a Language Zealot, a Ruby Jihadi bent on destroying all Java-using infidels. Nothing could be further from the truth: I’m happy to use the best language for the situation. I’ve coded C on embedded set-top boxes due to processor and memory constraints, Java for server-side processes now that no one codes in C anymore :), and Perl when I had to rip through a thousand files quickly.

I’ve quickly become enamored with Ruby primarily because it lets me get my job done better — right now, as a manager, I tend to do more scripting that isn’t performance bound, and Ruby, unlike Perl, lets me do that in a clean, maintainable, dare I say potentially elegant way.

People use languages because they are inherently useful, somewhere, and in those specific situations using another language would be painful, usually from a performance perspective. After all, we are in the business of producing usable applications, not writing elegant algorithms. I would gladly code in C again if I had to write a Ruby extension; I just haven’t found a really good reason to yet. I may trip all over memory allocation (it’s been a while), but I would also hope that my exposure to other languages ends up making my C easier to understand and maintain than it would have been had I only known C.

This is because in learning a new language, I have had to apply the same underlying principles (expressions, error handling, I/O) that I knew from my previous language(s), and in doing so I understand those concepts better. I think that the first part of coding is really understanding what you are trying to do: if you know that, the code will be concise and maintainable in any syntax.

Coding By Intention

August 31, 2007

I’ve been awake since 3AM, thinking profound thoughts. At around 5 I realized that I had created a blog to capture what I was thinking, and at 5:15 I finally rolled out of bed and fired up the Mac.

Now that I’m more of a manager and less of a dev, I relish the opportunity to actually cut some code every once in a while. Usually I take the lower priority tasks that the team needs done and leave the more involved coding to people that can actually sit down and do it as part of their day job 🙂 But in a startup that doesn’t always happen. Case in point: we are writing a Wikipedia scraper to try and extract some of the infobox table information from Wikipedia pages. The infobox table is that table off to the right hand side that lists factoids about the page entity.

The whole reason we are doing this rather late in the game is that we’ve tried to outsource web page scraping and the results have been, well, typical. After realizing that correctly scraping data is a core competency that is required to deliver on our final vision, I had one of the newer devs, Phil, take a look at writing a scraper that goes right up against the page (yes, I know I’m supposed to use the wikipedia static dumps, but right now we’re in pure prototype phase, not hitting the damn thing all day).

Phil and I thought that basing something on Hpricot would be smart, since we had both used it before and found it extraordinarily useful. In the tradition of lazy Ruby programmers everywhere, we even checked out scrubyt and mechanize, but they weren’t quite what we needed. So, while I was sitting in meetings trying to come up with requirements for the larger application, Phil went off and wrote some really nice, generic, scraping code that took full advantage of the sweet bits of Ruby.

To summarize, we needed to extract information and map it to specific properties: we needed to create a hash, given text on a page and some hints about where that text would be. I would have done this in a very circa 1999 way, creating a driver object that farmed out the scraping to domain specific sub objects. Phil did something that I consider quite brilliant: he created a class that implemented a bunch of static methods that allow the user to declare a class that lists what they are trying to scrape and what they want to map it to, like this:

class Scraper < PageScraper
  scrape_rule :property, "{some regex that is a rule}"
  scrape_rule(:another_prop, "{an xpath expression}") { some cleanup code }
  scrape_rule :one_more_prop, "{a css search path}", {a regex to apply to matched element text}
end

Implementation aside, this is clean, useful code, which is inherently concise and descriptive. I’m not a language bigot, but that doesn’t mean I don’t appreciate it when a language makes it easier for me to get my job done!
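
For readers wondering how class-level declarations like these can work at all, here is a minimal sketch. It is not Phil's code, which isn't shown in this post. The method names scrape_rule and PageScraper come from the example above; everything else (the rules list, the scrape method, PersonScraper, the sample text) is assumed for illustration. To stay self-contained it handles only regex rules against plain text, with no Hpricot, XPath or CSS lookups.

```ruby
# Hypothetical stand-in for the PageScraper base class (an assumption,
# not the original implementation).
class PageScraper
  # Each subclass keeps its own list of declared rules.
  def self.rules
    @rules ||= []
  end

  # Class-level "macro": remember a property name, a regex,
  # and an optional cleanup block.
  def self.scrape_rule(property, pattern, &cleanup)
    rules << [property, pattern, cleanup]
  end

  # Run every rule against the text and build a property => value hash.
  def self.scrape(text)
    rules.each_with_object({}) do |(property, pattern, cleanup), result|
      match = pattern.match(text)
      next unless match                 # rule did not match: skip the property
      value = match[1] || match[0]      # prefer the first capture group
      result[property] = cleanup ? cleanup.call(value) : value
    end
  end
end

# Declaring a scraper now reads like a list of intentions.
class PersonScraper < PageScraper
  scrape_rule :name, /Name:\s*(.+)/
  scrape_rule(:born, /Born:\s*(\d{4})/) { |year| year.to_i }
end

SAMPLE_TEXT = "Name: Norman Mineta\nBorn: 1931\n"
PERSON = PersonScraper.scrape(SAMPLE_TEXT)
```

The trick is simply that scrape_rule is an ordinary method called while the class body is being evaluated, so each call stores a rule on the class for later use.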

Long story short, working on this code has helped me focus on coding by intention, which is something that comes naturally to smart people (Phil) and something that the rest of us can learn, given the opportunity.

I’ve extended the example above by adding the ability to scrape repetitive blocks of markup into ‘object’ templates, where properties are subgrouped. This is handy when you are trying to get all of the positions that Norman Mineta has ever held: you want to group things like job title, time span, acting President, etc. together. Phil had already handled the notion of setting scope; my addition was done to allow us to iterate through a chunk of repetitive (i.e. table) HTML. It looks like this:

class BlockScraper < PageScraper
  scrape_loop :job_title, "//div[1]" do
    scrape_rule(:held_position, /position (.*)/i) { |e| Date.parse(e) }
    scrape_rule :acting_president, /President\w(.*)/mi
  end
end

In any case, the code above makes it pretty easy for the user to ‘say what they mean’, which to me is beautiful.
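
As a rough illustration of how a loop declaration like scrape_loop might be wired up, here is a small self-contained sketch. It is an assumption-laden stand-in, not the code described in this post: RepeatingScraper, PositionScraper, the loops table and the sample rows are all invented for the example, and a regex that matches each repeated chunk is used in place of the XPath scope so that nothing beyond the standard library is needed.

```ruby
require "date"

# Hypothetical base class: rules declared inside a scrape_loop block are
# applied once per repeated chunk of markup.
class RepeatingScraper
  # name => [block_pattern, rules], kept per subclass.
  def self.loops
    @loops ||= {}
  end

  # Collect the rules declared inside the block under +name+.
  # block_pattern should match one repeated chunk and have no capture groups.
  def self.scrape_loop(name, block_pattern)
    @current_rules = []
    yield
    loops[name] = [block_pattern, @current_rules]
  ensure
    @current_rules = nil
  end

  # In this sketch a rule is only valid inside a scrape_loop block.
  def self.scrape_rule(property, pattern, &cleanup)
    raise ArgumentError, "scrape_rule outside scrape_loop" unless @current_rules
    @current_rules << [property, pattern, cleanup]
  end

  # Returns { loop_name => [ { property => value, ... }, ... ] }.
  def self.scrape(text)
    loops.each_with_object({}) do |(name, (block_pattern, rules)), result|
      result[name] = text.scan(block_pattern).map { |chunk| apply_rules(rules, chunk) }
    end
  end

  # Apply each rule to a single chunk and build one grouped hash.
  def self.apply_rules(rules, chunk)
    rules.each_with_object({}) do |(property, pattern, cleanup), row|
      match = pattern.match(chunk)
      next unless match
      value = match[1] || match[0]
      row[property] = cleanup ? cleanup.call(value) : value
    end
  end
end

class PositionScraper < RepeatingScraper
  scrape_loop :positions, /<tr>.*?<\/tr>/m do
    scrape_rule :job_title, /<td class="title">(.*?)<\/td>/
    scrape_rule(:took_office, /<td class="from">(.*?)<\/td>/) { |text| Date.parse(text) }
  end
end

# Made-up rows, purely for illustration.
SAMPLE_ROWS = <<~HTML
  <tr><td class="title">Engineer</td><td class="from">2001-01-25</td></tr>
  <tr><td class="title">Manager</td><td class="from">2005-06-01</td></tr>
HTML
POSITIONS = PositionScraper.scrape(SAMPLE_ROWS)[:positions]
```

Each table row comes back as its own small hash, which is the "subgrouped properties" idea: one object per repeated block rather than one flat hash per page.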

what, me blog

August 16, 2007

I’ve been meaning to blog (again) for a while now, but I suppose everyone says that. I was thinking that I could use this blog as more of a ‘notes to self’ kind of thing, to rant, rave, philosophize, diarize, etc. But it never seems to happen (without some kind of discipline). Hey, I just categorized this blog entry as about blogging, is that Meta or what?