© 2008-25 SmartData Collective. All Rights Reserved.

Why Nobody Is Actually Analyzing Unstructured Data

Bill Franks

Unstructured data has become a popular topic lately because so many big data sources are unstructured. However, an important nuance is often missed: virtually no analytics directly analyze unstructured data.

Unstructured data may be an input to an analytic process, but when it comes time to do any actual analysis, the unstructured data itself isn’t utilized. “How can that be?” you ask. Let me explain…

Let’s start with the example of fingerprint matching. If you watch shows like CSI, you see investigators match fingerprints all the time. A fingerprint image is entirely unstructured and can be fairly large if the image is of high quality. So, when police on TV or in real life match fingerprints, do they compare the actual images to find a match? No. What they do is first identify a set of important points on each print. Then, a map or polygon is created from those points. It is the map or polygon created from the prints that is actually matched.

More important is the fact that the map or polygon is fully structured and small in size, even though the original prints were not. While unstructured prints are an input to the process, the actual analysis to match them up doesn’t use the unstructured images, but rather structured information extracted from them.
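To make the idea concrete, here is a minimal sketch of matching on structure extracted from a print rather than on the image. It is not how real fingerprint systems work: the point coordinates are made up, and in place of the "map or polygon" it swaps in a simple, hypothetical signature (the sorted distances between every pair of key points), which stays the same when a print is shifted or rotated. The names `distance_signature`, `signature_mismatch`, and `rotate` are illustrative only.

```python
import math
from itertools import combinations

# Hypothetical key points: (x, y) coordinates a feature extractor might
# find on a print. Real systems record more (e.g. ridge direction);
# this sketch keeps positions only.
Point = tuple[float, float]


def distance_signature(points: list[Point]) -> list[float]:
    """Sorted pairwise distances: a small, fully structured summary
    that does not change if the print is moved or rotated."""
    return sorted(math.dist(a, b) for a, b in combinations(points, 2))


def signature_mismatch(sig_a: list[float], sig_b: list[float]) -> float:
    """Mean absolute difference between two signatures (lower = closer)."""
    if len(sig_a) != len(sig_b):
        return math.inf
    return sum(abs(a - b) for a, b in zip(sig_a, sig_b)) / len(sig_a)


def rotate(points: list[Point], degrees: float,
           dx: float = 0.0, dy: float = 0.0) -> list[Point]:
    """Rotate and shift points, simulating the same print scanned differently."""
    t = math.radians(degrees)
    return [(x * math.cos(t) - y * math.sin(t) + dx,
             x * math.sin(t) + y * math.cos(t) + dy) for x, y in points]


crime_scene = [(10, 12), (40, 15), (25, 50), (60, 45), (35, 80)]
on_file = rotate(crime_scene, 30, dx=5, dy=-3)  # same print, repositioned
other = [(12, 10), (44, 30), (20, 55), (70, 40), (30, 90)]  # different print

sig = distance_signature(crime_scene)
print(signature_mismatch(sig, distance_signature(on_file)))  # close to 0
print(signature_mismatch(sig, distance_signature(other)))    # noticeably larger
```

The comparison only ever touches two short lists of numbers; the images (here, the point lists standing in for them) are needed only for the extraction step.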

An example everyone will appreciate is the analysis of text. Consider the now-popular approach of social media sentiment analysis. Are tweets, Facebook postings, and other social comments directly analyzed to determine their sentiment? Not really. The text is first parsed into words or phrases. Then, each word or phrase is flagged as good, bad, or neutral.

In a simple example, perhaps a “good” word gets a “1”, a “bad” word gets a “-1”, and a “neutral” word gets a “0”. The sentiment of the posting is determined by the sum of the individual word or phrase scores. Therefore, the sentiment score itself is created from fully structured numeric data that was derived from the initially unstructured source text. Any further analysis on trends or patterns in sentiment is based fully on the structured, numeric summaries of the text, not the text itself.
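The simple scoring scheme above can be sketched in a few lines. The lexicon here is a tiny, hypothetical word list; production sentiment tools use far larger lexicons and handle phrases and negation, which this sketch ignores.

```python
import re

# Hypothetical lexicon: +1 for a "good" word, -1 for a "bad" word.
# Any word not listed is treated as neutral and scores 0.
LEXICON = {"love": 1, "great": 1, "fast": 1,
           "hate": -1, "broken": -1, "slow": -1}


def tokenize(text: str) -> list[str]:
    """Parse raw text into lowercase words (the extraction step)."""
    return re.findall(r"[a-z']+", text.lower())


def sentiment_score(text: str) -> int:
    """Sum of per-word scores; only this integer feeds later analysis."""
    return sum(LEXICON.get(word, 0) for word in tokenize(text))


posts = [
    "I love this phone, the camera is great!",        # scores 2
    "Delivery was slow and the box arrived broken.",  # scores -2
    "It is a phone.",                                 # scores 0
]
for post in posts:
    print(sentiment_score(post), post)
```

Any trend or pattern analysis downstream would work on the integer scores, not on the posts themselves.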

This same logic applies across the board. If you’re going to build a propensity model to predict customer behavior, you’re going to have to transform your unstructured data into structured, numeric extracts. That’s what the vast majority of analytic algorithms require. An argument can be made that extracting structured information from an unstructured source is a form of analysis itself. However, my point is simply that the final analysis, which is what started the process of acquiring the unstructured data to begin with, does not use the unstructured data. It uses the structured information that has been extracted from it. This is an important nuance.
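As a sketch of what "structured, numeric extracts" can look like for a propensity model, the example below turns free-text notes into fixed-length rows of term counts (a basic bag-of-words encoding). The notes, customer IDs, and the `VOCABULARY` of candidate predictor terms are all hypothetical; a real project would choose its own extraction method and features.

```python
import re
from collections import Counter

# Hypothetical free-text notes keyed by customer ID (the unstructured input).
notes = {
    "c1": "Asked about cancel options. Price too high.",
    "c2": "Happy with service, asked about upgrade.",
    "c3": "Wants to cancel. Cancel before renewal.",
}

# Hypothetical terms an analyst picked as candidate predictors.
VOCABULARY = ["cancel", "price", "upgrade", "happy"]


def tokenize(text: str) -> list[str]:
    """Split text into lowercase words."""
    return re.findall(r"[a-z']+", text.lower())


def to_feature_row(text: str) -> list[int]:
    """Turn one note into a fixed-length row of term counts."""
    counts = Counter(tokenize(text))
    return [counts[term] for term in VOCABULARY]


# One structured row per customer, one column per vocabulary term.
rows = {cid: to_feature_row(text) for cid, text in notes.items()}

print("customer", VOCABULARY)
for cid, row in rows.items():
    print(cid, row)
```

A standard algorithm such as logistic regression or a decision tree would then be trained on rows like these (typically joined with other structured customer attributes), never on the raw notes.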

One reason it is important is that it gets to the heart of how to handle unstructured big data sources in the long run. Clearly, some new tools can be useful to aid in the initial processing of unstructured data. However, once the information extraction step is complete, you’re left with a set of data that is fully structured and, typically, much smaller than what you had when you started. This makes the information much easier to incorporate into analytic processes and standard tools than most people think.

Through an appropriate information extraction process, a big data source can shrink to a much more manageable size and format. At that point, you can proceed with your analytics as usual. For this reason, the thought of using unstructured data really shouldn’t intimidate people as much as it often does.
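A minimal sketch of that hand-off, assuming the simple word-scoring approach described earlier: each raw post is reduced to a small `(day, score)` record, and from then on an ordinary SQL engine (here Python's built-in `sqlite3`, used as a stand-in for any standard tool) does the analysis. The posts, dates, and lexicon are hypothetical.

```python
import re
import sqlite3

# Hypothetical lexicon, as in the simple sentiment example.
LEXICON = {"love": 1, "great": 1, "hate": -1, "broken": -1}


def sentiment_score(text: str) -> int:
    """Sum of +1/-1/0 word scores for one post."""
    return sum(LEXICON.get(w, 0) for w in re.findall(r"[a-z']+", text.lower()))


# Hypothetical raw feed of (day, post text); a stand-in for a large source.
raw_posts = [
    ("2024-01-01", "I love the new release, great work"),
    ("2024-01-01", "The app is broken again"),
    ("2024-01-02", "Great support today"),
    ("2024-01-02", "I hate waiting on hold"),
    ("2024-01-02", "Love it"),
]

# Information extraction: each post becomes one small structured record.
extracted = [(day, sentiment_score(text)) for day, text in raw_posts]

# From here on the text is no longer needed; plain SQL is enough.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE sentiment (day TEXT, score INTEGER)")
con.executemany("INSERT INTO sentiment VALUES (?, ?)", extracted)
daily = con.execute(
    "SELECT day, COUNT(*), SUM(score) FROM sentiment GROUP BY day ORDER BY day"
).fetchall()
con.close()

print(daily)  # [('2024-01-01', 2, 1), ('2024-01-02', 3, 1)]
```

Five free-text posts collapse into two summary rows; with a real feed the reduction from raw text to a per-day table is far larger, which is what makes the extracted data easy to load into existing analytic tools.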

Originally published by the International Institute for Analytics

Bill Franks is Chief Analytics Officer for The International Institute For Analytics (IIA). Franks is also the author of Taming The Big Data Tidal Wave and The Analytics Revolution. His work has spanned clients in a variety of industries for companies ranging in size from Fortune 100 companies to small non-profit organizations. You can learn more at http://www.bill-franks.com.

Follow us on Facebook

Latest News

Hidden AI, a risk?
Hidden AI, Real Risk: A Governance Roadmap For Mid-Market Organizations
Artificial Intelligence Exclusive Infographic
unusual trading activity
Signal Or Noise? A Decision Tree For Evaluating Unusual Trading Activity
Analytics Exclusive Infographic
Ai agents
AI Agent Trends Shaping Data-Driven Businesses
Artificial Intelligence Exclusive Infographic
Why Businesses Are Using Data to Rethink Office Operations
Why Businesses Are Using Data to Rethink Office Operations
Big Data Exclusive

Stay Connected

1.2KFollowersLike
33.7KFollowersFollow
222FollowersPin

You Might also Like

Unlocking the Potential of ‘Big Data’ in the Market Research industry

3 Min Read

How HP finds Sweet Spot customers in its data trove

5 Min Read

BI and unstructured data continued…

5 Min Read

O Knowledge Graph, Where Art Thou?

4 Min Read

SmartData Collective is one of the largest & trusted community covering technical content about Big Data, BI, Cloud, Analytics, Artificial Intelligence, IoT & more.

giveaway chatbots
How To Get An Award Winning Giveaway Bot
Big Data Chatbots Exclusive
ai is improving the safety of cars
From Bolts to Bots: How AI Is Fortifying the Automotive Industry
Artificial Intelligence

Quick Link

  • About
  • Contact
  • Privacy
Follow US
© 2008-25 SmartData Collective. All Rights Reserved.
Welcome Back!

Sign in to your account

Username or Email Address
Password

Lost your password?