Cookies help us display personalized product recommendations and ensure you have great shopping experience.

By using this site, you agree to the Privacy Policy and Terms of Use.
Accept
SmartData CollectiveSmartData Collective
  • Analytics
    AnalyticsShow More
    data mining to find the right poly bag makers
    Using Data Analytics to Choose the Best Poly Mailer Bags
    12 Min Read
    data analytics for pharmacy trends
    How Data Analytics Is Tracking Trends in the Pharmacy Industry
    5 Min Read
    car expense data analytics
    Data Analytics for Smarter Vehicle Expense Management
    10 Min Read
    image fx (60)
    Data Analytics Driving the Modern E-commerce Warehouse
    13 Min Read
    big data analytics in transporation
    Turning Data Into Decisions: How Analytics Improves Transportation Strategy
    3 Min Read
  • Big Data
  • BI
  • Exclusive
  • IT
  • Marketing
  • Software
Search
© 2008-25 SmartData Collective. All Rights Reserved.
Reading: Why Nobody Is Actually Analyzing Unstructured Data
Share
Notification
Font ResizerAa
SmartData CollectiveSmartData Collective
Font ResizerAa
Search
  • About
  • Help
  • Privacy
Follow US
© 2008-23 SmartData Collective. All Rights Reserved.
SmartData Collective > Analytics > Why Nobody Is Actually Analyzing Unstructured Data
AnalyticsCollaborative Data

Why Nobody Is Actually Analyzing Unstructured Data

BillFranks
BillFranks
5 Min Read
SHARE

Unstructured data has been a very popular topic lately since so many big data sources are unstructured. However, an important nuance is often missed – the fact is that virtually no analytics directly analyze unstructured data. 

Unstructured data may be an input to an analytic process, but when it comes time to do any actual analysis, the unstructured data itself isn’t utilized. “How can that be?” you ask. Let me explain…

Unstructured data has been a very popular topic lately since so many big data sources are unstructured. However, an important nuance is often missed – the fact is that virtually no analytics directly analyze unstructured data. 

More Read

How to Balance the Five Analytic Dimensions
Get the Most Out of Your Oracle Application
First Look – FICO Insurance Fraud Manager
5 Ways to Improve Organizational Learning with Big Data Analytics
Google Uses Web Searches to Track Flu’s Spread

Unstructured data may be an input to an analytic process, but when it comes time to do any actual analysis, the unstructured data itself isn’t utilized. “How can that be?” you ask. Let me explain…

Let’s start with the example of fingerprint matching. If you watch shows like CSI, you see them match up fingerprints all the time. A fingerprint image is totally unstructured and also can be fairly large in size if the image is of high quality. So, when police on TV or in real life go to match fingerprints, do they match up actual images to find a match? No. What they do is first identify a set of important points on each print. Then, a map or polygon is created from those points. It is the map or polygon created from the prints that is actually matched.

More important is the fact that the map or polygon is fully structured and small in size, even though the original prints were not. While unstructured prints are an input to the process, the actual analysis to match them up doesn’t use the unstructured images, but rather structured information extracted from them.

 

 

An example everyone will appreciate is the analysis of text. Let’s consider the now popular approach of social media sentiment analysis. Are tweets, Facebook postings, and other social comments directly analyzed to determine their sentiment? Not really. The text is parsed into words or phrases. Then, those words and phrases are flagged as good or bad.

In a simple example, perhaps a “good” word gets a “1”, a “bad” word gets a “-1”, and a “neutral” word gets a “0”. The sentiment of the posting is determined by the sum of the individual word or phrase scores. Therefore, the sentiment score itself is created from fully structured numeric data that was derived from the initially unstructured source text. Any further analysis on trends or patterns in sentiment is based fully on the structured, numeric summaries of the text, not the text itself.

This same logic applies across the board. If you’re going to build a propensity model to predict customer behavior, you’re going to have to transform your unstructured data into structured, numeric extracts. That’s what the vast majority of analytic algorithms require. An argument can be made that extracting structured information from an unstructured source is a form of analysis itself. However, my point is simply that the final analysis, which is what started the process of acquiring the unstructured data to begin with, does not use the unstructured data. It uses the structured information that has been extracted from it. This is an important nuance.

One reason it is important is that it gets to the heart of how to handle unstructured big data sources in the long run. Clearly, some new tools can be useful to aid in the initial processing of unstructured data. However, once the information extraction step is complete, you’re left with a set of data that is fully structured and, typically, much smaller than what you had when you started. This makes the information much easier to incorporate into analytic processes and standard tools than most people think.

Through an appropriate information extraction process, a big data source can shrink to a much more manageable size and format. At that point, you can proceed with your analytics as usual. For this reason, the thought of using unstructured data really shouldn’t intimidate people as much as it often does.

Originally published by the International Institute for Analytics

TAGGED:unstructured data
Share This Article
Facebook Pinterest LinkedIn
Share
ByBillFranks
Follow:
Bill Franks is Chief Analytics Officer for The International Institute For Analytics (IIA). Franks is also the author of Taming The Big Data Tidal Wave and The Analytics Revolution. His work has spanned clients in a variety of industries for companies ranging in size from Fortune 100 companies to small non-profit organizations. You can learn more at http://www.bill-franks.com.

Follow us on Facebook

Latest News

data mining to find the right poly bag makers
Using Data Analytics to Choose the Best Poly Mailer Bags
Analytics Big Data Exclusive
data science importance of flexibility
Why Flexibility Defines the Future of Data Science
Big Data Exclusive
payment methods
How Data Analytics Is Transforming eCommerce Payments
Business Intelligence
cybersecurity essentials
Cybersecurity Essentials For Customer-Facing Platforms
Exclusive Infographic IT Security

Stay Connected

1.2kFollowersLike
33.7kFollowersFollow
222FollowersPin

You Might also Like

Analytics: Not About Saving Time

7 Min Read

SAS and Socialytics

5 Min Read

Robotic folder: mastery in a domain

4 Min Read
Structured Data vs Unstructured Data
AnalyticsBig DataData ManagementData MiningHadoopMapReduceMarketingSocial DataStatisticsUnstructured DataWeb Analytics

A Quick Guide to Structured and Unstructured Data

7 Min Read

SmartData Collective is one of the largest & trusted community covering technical content about Big Data, BI, Cloud, Analytics, Artificial Intelligence, IoT & more.

giveaway chatbots
How To Get An Award Winning Giveaway Bot
Big Data Chatbots Exclusive
AI chatbots
AI Chatbots Can Help Retailers Convert Live Broadcast Viewers into Sales!
Chatbots

Quick Link

  • About
  • Contact
  • Privacy
Follow US
© 2008-25 SmartData Collective. All Rights Reserved.
Go to mobile version
Welcome Back!

Sign in to your account

Username or Email Address
Password

Lost your password?