Twitter’s “Real-Time Search” Ain’t That Hard

March 4, 2009

Google CEO Eric Schmidt’s dismissal of Twitter as a poor man’s email was petty and comical, but no more amusing than the blogosphere’s obsession with the wonders of “real-time search”. The dominant narrative in the echo chamber seems to be that Google is in danger of being usurped because it has written off this segment of the information seeking market.

I actually see some merit to this narrative: search engines in general, and Google in particular, could do a lot to improve their alerting tools. But I want to get something straight: from a technical perspective, real-time search, at least as Twitter implements it, is not that hard.

Let me try to explain this in terms that hopefully do not require a technical background.

Twitter’s search interface offers a simple search box. If users do not use any operators, then the search results are those tweets containing all of the words the user enters. In logical terms, it is as if the terms were combined with a logical AND (e.g., information seeking). In fact, Twitter supports a few Boolean logic operators, so that it is possible to combine terms with OR (e.g., tunkelang OR dtunkelang) and to use the minus sign (-) for negation (e.g., dtunkelang -published). Twitter also supports quotes as a way of requiring two or more words to occur as a phrase (e.g., “noisy channel”). Finally, Twitter’s advanced search page offers a few additional filters.
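To make those query semantics concrete, here is a minimal Python sketch that checks whether a single tweet matches a query. It is an illustration under simplifying assumptions, not Twitter’s implementation: it scans the tweet directly instead of using an index, the tokenizer (lowercased runs of letters, digits, #, @ and _) is an assumption, phrases use straight ASCII quotes, OR is assumed to join the clauses on either side of it into one group of alternatives, and the helper names `tokenize`, `contains_phrase`, `parse` and `matches` are hypothetical.

```python
import re


def tokenize(text):
    """Lowercase and split into word tokens (a simplifying assumption)."""
    return re.findall(r"[a-z0-9#@_]+", text.lower())


def contains_phrase(tokens, phrase):
    """True if the tokens in `phrase` occur consecutively in `tokens`."""
    n = len(phrase)
    return any(tokens[i:i + n] == phrase for i in range(len(tokens) - n + 1))


def parse(query):
    """Parse a query into OR-groups that must all match (the implicit AND).

    Each clause is (negated, phrase_tokens); a bare word is a one-word phrase.
    """
    groups = []
    join_with_previous = False
    for item in re.findall(r'-?"[^"]*"|\S+', query):
        if item == "OR":
            join_with_previous = True
            continue
        negated = item.startswith("-")
        body = item[1:] if negated else item
        clause = (negated, tokenize(body.strip('"')))
        if join_with_previous and groups:
            groups[-1].append(clause)  # an alternative within the same group
        else:
            groups.append([clause])  # a new group, ANDed with the others
        join_with_previous = False
    return groups


def matches(tweet, query):
    """True if the tweet satisfies every group of the parsed query."""
    tokens = tokenize(tweet)

    def clause_holds(clause):
        negated, phrase = clause
        # A negated clause holds exactly when the phrase is absent.
        return contains_phrase(tokens, phrase) != negated

    return all(any(clause_holds(c) for c in group) for group in parse(query))
```

For example, `matches("dtunkelang published a book", "dtunkelang -published")` is False, while `matches("The Noisy Channel blog", '"noisy channel"')` is True.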

But what is important is that Twitter only supports one sort, reverse order by date. This vastly simplifies the requirements for Twitter’s inverted index. For those of you unfamiliar with an inverted index, it is much like an index at the back of a book (remember those relics? don’t forget to buy mine!) that associates each word with a list of the documents (in this case, tweets) in which it occurs.
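A toy version of that book-style index might look like the Python sketch below. The names `tokenize`, `build_index`, `intersect` and `search_all` are hypothetical, and tweet IDs are assumed to be assigned in arrival order. Each word maps to an ascending list of tweet IDs (its postings list), and an all-words query, the implicit AND, intersects those lists with a single linear merge.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase and split into word tokens (a simplifying assumption)."""
    return re.findall(r"[a-z0-9#@_]+", text.lower())


def build_index(tweets):
    """Map each word to the ascending list of IDs of the tweets that contain it."""
    index = defaultdict(list)
    for tweet_id, text in enumerate(tweets):  # IDs follow arrival order
        for word in set(tokenize(text)):  # set(): record each tweet once per word
            index[word].append(tweet_id)
    return index


def intersect(a, b):
    """Intersect two ascending postings lists with a single linear merge."""
    i = j = 0
    common = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            common.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return common


def search_all(index, words):
    """IDs of tweets containing every word (the implicit AND)."""
    result = None
    for word in words:
        postings = index.get(word, [])
        result = list(postings) if result is None else intersect(result, postings)
    return result or []
```

Because IDs are handed out in arrival order, every postings list is already sorted when it is built, which is what lets the merge run in a single pass.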

Since Twitter users can only see search results sorted by date, the inverted index presumably maintains its lists in date order. Doing so makes it trivial to add new content, since all additions are at the end of the lists. Moreover, as the index grows, there’s a natural way to partition it into smaller chunks: time-slicing. The problem is, as computer scientists say, embarrassingly parallel.
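The date-ordered, time-sliced design can be sketched as follows. This is a minimal illustration under stated assumptions, not a description of Twitter’s actual system: the one-hour `SLICE_SECONDS`, the `Shard` and `TimeSlicedIndex` classes, and the assumption that tweets arrive in timestamp order are all hypothetical. New tweets are only ever appended, each slice is searched on its own, and results come back newest first.

```python
import re
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

SLICE_SECONDS = 3600  # hypothetical slice width: one shard per hour of tweets


def tokenize(text):
    """Lowercase and split into word tokens (a simplifying assumption)."""
    return re.findall(r"[a-z0-9#@_]+", text.lower())


class Shard:
    """Inverted index for one time slice, with postings kept in timestamp order."""

    def __init__(self):
        self.postings = defaultdict(list)  # word -> [(timestamp, tweet_id), ...]

    def add(self, timestamp, tweet_id, text):
        # Tweets are assumed to arrive in time order, so appending keeps each list sorted.
        for word in set(tokenize(text)):
            self.postings[word].append((timestamp, tweet_id))

    def search(self, words):
        """IDs of tweets in this slice containing every word, newest first."""
        if not words:
            return []
        # Walk the shortest postings list backwards (newest to oldest)
        # and keep entries that also appear in every other list.
        lists = sorted((self.postings.get(w, []) for w in words), key=len)
        others = [set(p) for p in lists[1:]]
        return [tid for ts, tid in reversed(lists[0])
                if all((ts, tid) in s for s in others)]


class TimeSlicedIndex:
    """Routes each tweet to the shard for its time slice."""

    def __init__(self):
        self.shards = {}  # slice number -> Shard

    def add(self, timestamp, tweet_id, text):
        slice_no = timestamp // SLICE_SECONDS
        if slice_no not in self.shards:
            self.shards[slice_no] = Shard()
        self.shards[slice_no].add(timestamp, tweet_id, text)

    def search(self, query, k=10):
        """Up to k matching tweet IDs, newest first (implicit AND over query words)."""
        words = tokenize(query)
        newest_first = [self.shards[s] for s in sorted(self.shards, reverse=True)]
        # Shards are searched independently of one another, so this step parallelizes trivially.
        with ThreadPoolExecutor() as pool:
            per_shard = list(pool.map(lambda shard: shard.search(words), newest_first))
        # Slices cover disjoint time ranges, so concatenating them newest-first
        # already yields a correctly ordered result list; no merge is needed.
        hits = [tid for shard_hits in per_shard for tid in shard_hits]
        return hits[:k]
```

The thread pool only illustrates that the per-slice searches are independent; in CPython it will not speed up this CPU-bound toy, and a larger system would presumably spread slices across machines. A fuller version could also stop visiting older slices once it has found k hits.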

I’m not trying to suggest that real-time search (or alerting, as it used to be called in ancient pre-Twitter times) isn’t valuable. But if there is an entry barrier, it is surely not a technical one. Rather, it’s a human one: Twitter’s great achievement, much like Wikipedia’s, is one of human computation: its users supply the content that makes it valuable. Twitter may be much smaller than Facebook, but its single-minded focus on micro-blogging has made it incredibly efficient at what it does, the noise from follower-whores notwithstanding. Twitter’s strength comes from the loyalty of its users.

But this strength is also a vulnerability. As Twitter looks into ways to monetize the attention of its users, it has to be extraordinarily careful not to alienate them. Loyalty is a two-way street.
