SmartData Collective

Tough Analytics? Watson to the Rescue

Barry Devlin

“Quickly, Watson, get your service revolver!” Is Watson about to put business intelligence out of its misery? Is the good doctor about to surpass Sherlock Holmes in his ability to solve life’s enduring mysteries? Or are we in jeopardy of falling into another artificial intelligence rabbit hole?

Yes, I know. Although I haven’t found a reference to prove it, I’m pretty sure that IBM Watson, the computer that recently won “Jeopardy!”, is named after one of the founding fathers of IBM (Thomas J. Watson Sr. or Jr.) rather than Sherlock Holmes’ sidekick. But the questions above remain highly relevant.

IBM Watson is, of course, an interesting beast. The emphasis in the popular press has been on the physical technology specs: 10 refrigerator-sized cabinets containing approximately 3,000 CPU cores, 15 TB of RAM and 500 GB of disk, running at about 80 teraflops and cooled by two industrial air-conditioning units. But in comparison to some of today’s “big data” implementations, IBM Watson is pretty insignificant. eBay, for example, is running up to 20 petabytes of storage. As of 2010, Facebook’s Hadoop cluster was running on 2,300 servers with over 150,000 cores and 64 TB of memory between them. The world’s current (Chinese) supercomputer champion is running at 2.5 petaflops.
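To put "pretty insignificant" into numbers, the figures quoted above can be turned into simple ratios. This is only a back-of-the-envelope sketch: it assumes decimal units (1 PB = 1,000 TB, 1 TB = 1,000 GB) and takes the approximate figures in the text at face value.

```python
# Figures quoted in the text, converted to common units.
# Assumption: decimal units (1 PB = 1000 TB, 1 TB = 1000 GB).
TB_PER_PB = 1000
GB_PER_TB = 1000
TERAFLOPS_PER_PETAFLOP = 1000

watson = {
    "cores": 3_000,
    "ram_tb": 15,
    "disk_tb": 500 / GB_PER_TB,
    "teraflops": 80,
}

# How many times larger each quoted system is than Watson on one dimension.
comparisons = {
    "eBay storage vs. Watson disk": (20 * TB_PER_PB) / watson["disk_tb"],
    "Facebook cores vs. Watson cores": 150_000 / watson["cores"],
    "Facebook memory vs. Watson RAM": 64 / watson["ram_tb"],
    "Top supercomputer vs. Watson speed": (2.5 * TERAFLOPS_PER_PETAFLOP) / watson["teraflops"],
}

for label, factor in comparisons.items():
    print(f"{label}: about {factor:,.1f}x")
```

The ratios compare one dimension at a time, so they say nothing about how well each system is tuned for its workload, which is where the rest of this discussion is headed.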

On the other hand, a perhaps more telling comparison is to the size and energy consumption of the human brain that Watson beat, but certainly did not outclass, in the quiz show!

However, what’s really more interesting from a business intelligence viewpoint is the information stored, the architecture employed and the effort expended in optimizing the processing and population of the information store.  

We know from the type of knowledge needed in Jeopardy! and, indeed, from the possible future applications of the technology discussed by IBM that the raw information input to the system was largely unstructured, or soft information, as I prefer to call it.  During the game, Watson was disconnected from the Internet, so its entire knowledge base was only 500 GB in size.  This suggests the use of some very effective artificial intelligence and learning techniques to condense a much larger natural language information base to a much more compact and usable structure prior to the game.  Over a period of more than four years, IBM researchers developed DeepQA, a massively parallel, probabilistic, evidence-based architecture that enables Watson to extract and structure meaning from standard textbooks, encyclopedias and other documents.  When we recall that the natural language used in such documents contains implicit meaning, is highly contextual, and often ambiguous or imprecise, we can begin to appreciate the scale of the achievement.  A wide variety of AI techniques, such as temporal reasoning, statistical paraphrasing, and geospatial reasoning, were used extensively in this process.
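As a rough illustration of what "probabilistic" and "evidence-based" could mean in practice, here is a toy sketch in which several independent evidence scores for each candidate answer are combined into a single confidence value, and an answer is offered only if that confidence clears a threshold. This is not IBM's DeepQA code: the candidate names, scorer names, weights, bias and threshold are all invented for illustration, and the thread pool merely gestures at the parallel scoring the architecture is described as using.

```python
import math
from concurrent.futures import ThreadPoolExecutor

# Hypothetical evidence scores in [0, 1] for two candidate answers.
# Scorer names and all numbers are invented for this sketch.
CANDIDATES = {
    "candidate_a": {"type_match": 0.2, "passage_support": 0.6, "temporal_fit": 0.5},
    "candidate_b": {"type_match": 0.9, "passage_support": 0.7, "temporal_fit": 0.8},
}
WEIGHTS = {"type_match": 3.0, "passage_support": 2.0, "temporal_fit": 1.0}
BIAS = -3.0
THRESHOLD = 0.5  # below this confidence, decline to answer


def confidence(evidence):
    """Combine weighted evidence scores into a probability-like value."""
    z = BIAS + sum(WEIGHTS[name] * score for name, score in evidence.items())
    return 1.0 / (1.0 + math.exp(-z))  # logistic function


def rank(candidates):
    """Score every candidate concurrently and sort by confidence, best first."""
    with ThreadPoolExecutor() as pool:
        scores = list(pool.map(confidence, candidates.values()))
    return sorted(zip(candidates, scores), key=lambda pair: pair[1], reverse=True)


ranked = rank(CANDIDATES)
best, best_conf = ranked[0]
answer = best if best_conf >= THRESHOLD else None
print(ranked, answer)
```

The point of the sketch is the shape of the computation: no single piece of evidence decides the answer, and the system can decline to answer when its combined confidence is low.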

Dr. David Ferrucci, leader of the research project, states that no database of questions and answers was used nor was a formal model of the world created in the project.  However, he does say that structured data and knowledge bases were used as background knowledge for the required natural language processing.  It makes sense to me that such knowledge, previously gathered from human experts, would be needed to contextualize and disambiguate the much larger natural language sources as Watson pre-processed them.  Watson’s success in the game suggests to me that IBM have succeeded in using existing human expertise, probably gathered in previous AI tools, to seed a much larger automated knowledge mining process.  If so, we are on the cusp of an enormous leap in our ability to reliably extract meaning and context from soft information and to use it in ways long envisaged by proponents of artificial intelligence.

What this means for traditional business intelligence is a moot point. Our focus and experience are directed mainly towards structured, or hard, data. By definition, such data has already been processed to remove or minimize ambiguity in context or content by creating and maintaining a separate metadata store, as I’ve described elsewhere.

However, there is no doubt that the major growth area for business intelligence over the coming years is soft information, which, according to IDC, is growing at a compound annual rate of over 60%, about three times as fast as hard information, and which already accounts for over 95% of the information stored in enterprises. It is in this area, I believe, that Watson will make an enormous impact as the technology, already based on the open-source Apache UIMA (Unstructured Information Management Architecture), moves from research to full-fledged production. There already exists a significant pent-up demand to gain business advantage by mining and analyzing such information. Progress in releasing the value tied up in soft information has been slowed by a lack of appropriate technology. That is something that Watson and its successors will certainly change.
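The growth figures are easier to appreciate when compounded over a few years. The sketch below treats "over 60%" as exactly 60% and "over 95%" as exactly 95%, and reads "three times as fast" literally to infer a 20% rate for hard information; that inferred rate is my assumption, not a figure from IDC.

```python
SOFT_CAGR = 0.60            # "over 60%" in the text, treated as exactly 60%
HARD_CAGR = SOFT_CAGR / 3   # assumption: "three times as fast" read literally
SOFT_SHARE_NOW = 0.95       # "over 95%" in the text, treated as exactly 95%


def growth_multiple(rate, years):
    """How many times larger a volume becomes after compounding for `years`."""
    return (1 + rate) ** years


def soft_share_after(years):
    """Soft information's share of the total after `years` of compounding."""
    soft = SOFT_SHARE_NOW * growth_multiple(SOFT_CAGR, years)
    hard = (1 - SOFT_SHARE_NOW) * growth_multiple(HARD_CAGR, years)
    return soft / (soft + hard)


for years in (1, 3, 5):
    print(
        f"after {years} yr: soft x{growth_multiple(SOFT_CAGR, years):.2f}, "
        f"hard x{growth_multiple(HARD_CAGR, years):.2f}, "
        f"soft share {soft_share_after(years):.1%}"
    )
```

At a sustained 60% rate, the volume of soft information grows roughly tenfold in five years, which is why a lack of suitable technology becomes more costly with every year that passes.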

While I have focused so far on the knowledge/information aspects of Watson, probably the most relevant aspect for BI experts, there is one other key feature of the technology that should be emphasized. That is Watson’s ability to parse and understand the sort of questions posed in everyday English, with all their implicit assumptions and inherent context. Despite appearances to the contrary in the game show, Watson was not responding to the spoken questions from the quiz master; the computer had no audio input, so it received as text the exact same questions that the human contestants heard. In fact, speech recognition technology has also advanced significantly to the stage where very high levels of accuracy can be achieved. (As an aside, I use this technology myself extensively and successfully for all my writing…) The opportunities that this affords in simplifying business users’ communication with computers are immense.

It seems likely that over the next few years this combination of technologies will empower business users to ask the sort of questions that they’ve always dreamed of, and perhaps haven’t even dreamed of yet. They will gain access, albeit indirectly, to a store of information far in excess of what any human mind can hope to amass in a lifetime. And they will receive answers based directly on the sum total of all that information, seeded by the expertise of renowned authorities in their respective fields and analyzed by highly structured and logic-based methods.

Of course, there is the danger that if a given answer happens to be incorrect, it is difficult to see how the business user would discover that error or be able to figure out why it had been generated.

And that, as Sherlock Holmes never said, is far from “Elementary, my dear Watson!”
