Wolfram/Alpha and the future of search

September 27, 2010

The New York Times article on the arm of my chair is about the plight of poor workers in South Africa. With a few key words, Google could help me find the article. But then it would be up to me to process the information. In the third paragraph, it tells of a woman who earns $36 a week, $21 less than the minimum wage. If the article were “computable,” I could ask it about the minimum wage in South Africa, and a search engine, or whatever you want to call it, would answer: $57.

Stephen Wolfram, the physicist, author, entrepreneur and founder of the Wolfram/Alpha computational knowledge engine, was speaking at MIT last week about computational knowledge. In the past, computers could process only information in structured databases. But the overwhelming majority of data we produce today is unstructured, most of it words. (Multimedia, too, of course, but here I’m focusing on words.) Traditional search engines help us find documents in that mountain of words. But they do very little to distill those words into knowledge, or to answer our questions.

The challenge in the coming years, Wolfram said, is to make more of these files and documents computable. That would enable systems like Wolfram/Alpha to digest them, and to use them to produce answers and analysis. He compared the transition ahead to one we’ve already been through. A couple of decades ago, most people used computers to create paper documents. It was such an improvement over typewriters. But then we began to see the value in digital files. They could be emailed, forwarded, posted on the Web, cut-and-pasted (in the digital sense). And they could be searched. Documents on paper, by comparison, seemed marooned.

The next transition, according to Wolfram, will be to make written information computable. If a document isn’t formatted so that computers can read, summarize and extract information from it, it will seem like a dead end, he predicted. His team at Wolfram/Alpha is busy importing and curating large sets of data. From my experience on their “knowledge engine,” it appears that much of the data comes from the realm of facts and figures: population numbers, stock market performance, birthdays, etc. But the way Wolfram sees it, more of us will produce information in a style (or on templates) that will make it computable, and machines like his will eventually be able to answer all sorts of questions. In a sense, an early stage of this pre-processing is already happening: an entire industry is formatting Web pages to make them more searchable.

Still, the idea of knowledge organizing itself for machines, it seems to me, is a limited approach to the problem. It’s akin to building game preserves. How can you be sure that your structured world reflects the truth in the wilds beyond the fences? The untamed world outside of Wolfram’s mathematical domain is the big and chaotic realm of language. There, Wolfram/Alpha appears handicapped. If you type even a moderately complex question into Wolfram/Alpha, such as “What is the largest university within 100 miles of Portland, Or?” it’s stumped. The system appears to have primitive language capabilities. No surprise, then, that Wolfram wants the world to make its information computable.

This leads me to wonder which approach is more likely to master knowledge. Will it be one that requires that knowledge be simplified and structured so that machines can digest it? Or will it be a linguistically savvy system that can digest virtually anything? I’ll bet on the linguistic omnivores, including Google and IBM. The problem they face, mastering language, is a bear. Language is frightfully complex. But they’re making progress. And their approach requires less work from the public. That’s usually a winning formula.