Software Design Lessons from the School of Hard Knocks

For the first ten years of my software engineering career, I was really into ‘being creative’: using cool and complex data structures, multithreading for the hell of it, in short reveling in complexity and in being one of the few people who could understand what I was doing. In other words, I was making mountains out of molehills and enjoying the resulting drama. That went on until I stopped enjoying the drama: memory leaks that took 3 days to manifest, complex infrastructure (EJB 2.0, anyone?) that took large tomes to master, and the joy of having to remember the inspiration behind some arbitrarily complex code while at a customer site with a very angry customer looking over my shoulder.

I had an inkling that what I was doing was wrong, but only in a vague way. When I learned design patterns, I began to sprinkle them liberally through my code like a bad cook overuses oregano. When I wrote multithreaded programs, I waxed poetic in comments about the locking order of my mutexes. All the while I wrote hundreds of lines of code, as if I could overwhelm any problem domain with sheer volume.

There were some telltale signs that this approach wasn’t working. I’m not proud to have introduced a memory leak in Java by getting cute with multiple references to the same object. I spent more time looking at the code, trying to understand it, than actually implementing a bug fix. I had a project canned because no one else could ramp up on the infrastructure, let alone the business logic.

That was what finally broke me — the sheer cost of making small changes in the code I had written. I was spending inordinate amounts of time working with brittle code and making it do things that should have been easy but felt unholy — and I had no one else to blame.

So I did a reset, and started hanging out with people who talked about red-green refactoring and what that actually implied. I started coding in iterations instead of death marches, and stopped trying to create the ultimate solution.

The other day I was telling a friend about this enlightening experience, and over beer we distilled it into a set of lessons:

(1) The first design is going to be chucked, so don’t spend so goddamn long trying to get it right.

(2) It’s OK to not know what the ultimate long term design is. If you think about it, there isn’t an ultimate long term design for any piece of software that actually gets used: requirements always change, and different solutions emerge. Flickr grew out of a gaming startup.

(3) Premature optimization is evil. OK, that’s a Knuth-ism, but when I see code like this:

    if (condition met) {
        synchronized (some object) {
            if (condition met) {
                ...
            }
        }
    }

and the explanation is “I put the if statement outside of the synchronize because synchronize statements are expensive”, I have to ask why. Specifically:

  • How expensive is the synchronize statement? Did you quantitatively prove that this was the biggest bottleneck in your code? Really?
  • Can a redesign be done in some other way that avoids this kind of obfuscation?
  • If not, can you comment the complexity so that the next poor bastard doesn’t want to strangle you? I know you think he will want to worship you, but I’ve never seen that happen. Actually, I have seen people want to worship other developers, but only when those developers write code that is easy to understand and maintain.
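Here is one way to answer the second question, sketched in Java under a big assumption: that the condition being double-checked is "has this object been created yet", which is where I see this pattern most often. The class name ExpensiveConfig and the INSTANCES_CREATED counter are made up for illustration. The JVM already guarantees that a class is initialized once and thread-safely, so a nested holder class gives you lazy creation with no visible locking at all.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical lazily created singleton; stands in for whatever object
// the double-checked lock was guarding.
final class ExpensiveConfig {
    // Counts constructor calls so the "created once" claim can be checked.
    static final AtomicInteger INSTANCES_CREATED = new AtomicInteger();

    private ExpensiveConfig() {
        INSTANCES_CREATED.incrementAndGet();
    }

    // Holder is not initialized until getInstance() first touches it.
    // Class initialization is thread-safe by the language spec, so there
    // is no explicit lock and no second check to explain in a comment.
    private static final class Holder {
        static final ExpensiveConfig INSTANCE = new ExpensiveConfig();
    }

    static ExpensiveConfig getInstance() {
        return Holder.INSTANCE;
    }
}
```

If the condition is something other than lazy creation, a plain synchronized method is the boring baseline to measure before reaching for anything cleverer.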

(4) Failing a test is much better than not failing a nonexistent test. It’s better to know where you suck than to pretend that you don’t.

(5) If you believe (4), then try this on for size: it’s easier to write a test first than it is to write code and then write a test. Why? Because you are limited to validating the interface signature, and you know nothing about its implementation. So you test as if you didn’t know whether args are being checked, whether return values can be null, or whether pre-conditions and post-conditions will be honored. Then, when you write the code and it fails those specific tests, you fix it until they pass. And you know something about the quality of your code.
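A minimal sketch of what that looks like, with everything in it invented for illustration: a hypothetical PortParser.parse(String) whose only known fact, at test-writing time, is its signature. I would normally use a test framework such as JUnit; plain checks are used here only to keep the example dependency-free. The test class comes first because it was written first.

```java
// Hypothetical tests, written from the signature `static int parse(String)`
// alone, before any implementation existed.
final class PortParserTest {
    static void runAll() {
        expect(PortParser.parse("8080") == 8080, "parses a valid port");
        expect(PortParser.parse(" 443 ") == 443, "tolerates surrounding whitespace");
        expectRejected(null);     // do not assume callers check for null
        expectRejected("");       // or for empty input
        expectRejected("http");   // or that the text is numeric
        expectRejected("0");      // below the valid range
        expectRejected("65536");  // above the valid range
    }

    private static void expect(boolean ok, String what) {
        if (!ok) throw new AssertionError("failed: " + what);
    }

    private static void expectRejected(String input) {
        try {
            PortParser.parse(input);
        } catch (IllegalArgumentException expected) {
            return; // rejection is the behavior under test
        }
        throw new AssertionError("expected rejection of: " + input);
    }
}

// Hypothetical implementation, written second and only until the tests pass.
final class PortParser {
    static int parse(String text) {
        if (text == null) throw new IllegalArgumentException("port is null");
        int port;
        try {
            port = Integer.parseInt(text.trim());
        } catch (NumberFormatException e) {
            throw new IllegalArgumentException("not a number: " + text, e);
        }
        if (port < 1 || port > 65535) {
            throw new IllegalArgumentException("out of range: " + port);
        }
        return port;
    }
}
```

Calling PortParserTest.runAll() throws an AssertionError on the first broken expectation and returns quietly otherwise.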

(6) You don’t know squat about your system until it goes live. Let me take that back. If you have unit and integration tests, you know it passes those, and that’s about all you know. If you’ve done load testing, then you have validated your system under a specific load, and that’s it. No one has the time to accurately simulate live traffic, so stop pretending.

(7) Because of (6), you need to instrument any system bound for production so that you can see when it is failing and actually do something about it.
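The smallest version of instrumentation I can think of, as a sketch rather than a recommendation: a hypothetical OperationMetrics class that wraps a call and counts attempts, failures, and elapsed time. The class and method names are assumptions of mine, and a real system would export these numbers to whatever monitoring it already has instead of keeping them in memory.

```java
import java.util.concurrent.atomic.LongAdder;
import java.util.function.Supplier;

// Hypothetical in-process metrics for a single operation.
final class OperationMetrics {
    private final LongAdder calls = new LongAdder();
    private final LongAdder failures = new LongAdder();
    private final LongAdder totalNanos = new LongAdder();

    // Runs the task and records the outcome and elapsed time either way.
    // Failures are counted and then rethrown, never swallowed.
    <T> T record(Supplier<T> task) {
        long start = System.nanoTime();
        calls.increment();
        try {
            return task.get();
        } catch (RuntimeException e) {
            failures.increment();
            throw e;
        } finally {
            totalNanos.add(System.nanoTime() - start);
        }
    }

    long calls() { return calls.sum(); }

    long failures() { return failures.sum(); }

    // Fraction of recorded calls that threw; 0.0 before any call.
    double failureRate() {
        long n = calls.sum();
        return n == 0 ? 0.0 : (double) failures.sum() / n;
    }

    // Mean wall-clock time per call in milliseconds; 0.0 before any call.
    double meanMillis() {
        long n = calls.sum();
        return n == 0 ? 0.0 : totalNanos.sum() / 1_000_000.0 / n;
    }
}
```

Wrapping a call looks like metrics.record(() -> service.lookup(id)), where service and id are placeholders. The part that matters is the second half of the lesson: something has to watch failureRate() and alert a human when it moves.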

(8) Less code = better code. The happiest part of my job is deleting lots of code that I had just written because I realized there is a simpler way to do things. Today I nuked 4 classes and replaced their functionality with about 10 lines of code. That I can understand.

