It’s July 2008, and I’m looking around wondering how many people think search is broken. Actually, I’m searching around wondering how many people think search is broken. And, using this method, I’ve been able to deduce the following (from the first page of results, no less):
- search is broken because publishers have to add extra metadata/markup to the pages/sites they want found, on top of sitemaps, nofollow, robots.txt, etc.
- search is broken because when I want to find something, I have to look in so many different places.
- search is broken because there needs to be a human editorial overlay in order to achieve useful result precision.
- search is broken because search applications are not telepathic, i.e. they cannot perceive context and other subtle metadata in a user search request.
All of the above is more or less accurate (though IMO #1 is kind of whiny and #4 is unrealistic), but at the same time there is good evidence that search actually does work for most people most of the time. And search, as a knowledge acquisition paradigm, has become incredibly ingrained in people’s usage patterns. At Evri the #1-with-a-bullet question we always get asked is “where is search?”
I don't think that a yes/no answer to the question 'Is Search Broken?' does that question, or the technology behind it, any justice at all. I can say that from the perspective of a software engineer, the fact that I can ask a question at any time of day and get an answer back in hundreds of milliseconds (COMCASTically of course) is just amazing. The relative precision, and especially the recall, is mind-blowing. The rumor/FUD about Google's infrastructure is scary and exciting at the same time. The fact that TF-IDF works as well as it does proves that Simplicity does equal Elegance.
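To show why TF-IDF's simplicity is so striking, here is a minimal sketch of the idea: a term matters more in a document the more often it appears there (TF), and matters less overall the more documents contain it (IDF). The function name and the toy corpus are my own illustration, not anyone's production code; real engines layer normalization, stemming, and much else on top.

```python
import math
from collections import Counter

def tf_idf_scores(query_terms, docs):
    """Score each document against a query with basic TF-IDF.

    docs is a list of token lists; query_terms is a list of tokens.
    Toy sketch only -- no stemming, smoothing, or length normalization.
    """
    n_docs = len(docs)
    # Document frequency: how many documents contain each term at least once.
    df = Counter()
    for doc in docs:
        for term in set(doc):
            df[term] += 1
    scores = []
    for doc in docs:
        tf = Counter(doc)
        score = 0.0
        for term in query_terms:
            if df[term] == 0:
                continue  # term appears nowhere; contributes nothing
            idf = math.log(n_docs / df[term])
            score += (tf[term] / len(doc)) * idf
        scores.append(score)
    return scores

docs = [
    "lance armstrong tour de france".split(),
    "tour dates for the band".split(),
    "armstrong moon landing".split(),
]
print(tf_idf_scores(["lance", "armstrong"], docs))
```

The first document, which contains both query terms, outscores the third (one term) and the second (none); that one weighting trick, scaled up, is a surprisingly large share of what makes keyword search work.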
When I take a more philosophical viewpoint of search, I have to say that the way I take search for granted, and have completely outsourced my long-term memory to it, is very scary. I have voluntarily ceded control of information in my head and traded it for the ability to retrieve that information. Which gives me a lot more apparent bandwidth, as long as there is a computer nearby 🙂
Still, while the implementation, and more importantly, the functionality of search is something so powerful that I cannot function without it, I do see some challenges ahead for the traditional search model:
- Content and the traffic generated by users wanting to access that content are growing: specifically, traffic is expected to grow at 46 percent annually between now and 2012, while the amount of content available continues to skyrocket.
- Content is morphing from text-based documents to include video and audio, and search is not keeping up. Currently, most video/audio content is not considered 'searchable' other than by its associated metadata. There are some attempts to change this, e.g. Delve Networks for video search, but by and large search cannot, in its current incarnation, treat video, audio, or image data as content equivalent to documents.
- People are turning to less machine-driven means of finding information — Mahalo is an example of editorialized search, and Wikipedia, while not exactly a search engine, can be used like one if you use Firefox.
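The first point is easy to undersell. A 46 percent annual rate compounds, so a quick back-of-the-envelope calculation (my arithmetic, applied to the figure quoted above) shows what that means over the four years to 2012:

```python
# Compound growth: 46% annual traffic growth over 2008 -> 2012.
rate = 0.46
years = 4
multiplier = (1 + rate) ** years
print(round(multiplier, 1))  # roughly 4.5x the traffic in four years
```

In other words, the infrastructure behind search has to absorb something like a 4.5x load increase over that window, before you even account for the growth in content to index.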
Those last two points taken together are pretty interesting, because in this age of massive document recall, people are veering towards precision, and precision across media types. People want content — video, text, image — to be fused together into a single result page, not a result set. The Lance Armstrong Wikipedia page and the Lance Armstrong Mahalo result page provide a much more readable information set than the Lance Armstrong Google result page.
Implicit in those edited result sets is that the information is one click closer — the salient facts about Lance are front and center, not (just) in the first returned document. Search has made us very good at ‘search, inspect, reject, repeat’, in which we sift through keyword results and painstakingly evaluate the returned documents like ancient priests must have sifted through tea leaves, or entrails, or whatever their search engine result set was. Edited page result sets present that information in a page view that we can actually browse.
I don’t think editorialized pages are the right answer, mainly because they cannot possibly scale, and the amount of human effort required to keep them up to date is massive. I think the next internet-scale ‘killer app’ must provide complete fusion of search results across media types into a single, focused page per entity, one that changes in response to real-world events and doesn’t rely on an army of editors to keep it up to date. That’s definitely a holy grail, but one worth shooting for if we want to have any hope of keeping the internet as useful as it is today, if not more so.