A Quick Guide to Structured and Unstructured Data
Big data has opened doors never before considered by many businesses. The idea of utilizing unstructured data for analysis has in the past been far too expensive for most companies to consider. Thanks to technologies such as Hadoop, unstructured data analysis is becoming more common in the business world.
Business owners may be wondering if the use of unstructured data could give them valuable insights as well. Answering this question starts with understanding the difference between structured and unstructured data.
First, I would like to refer to an illustration that provides a quick snapshot of structured versus unstructured data.
Photo Credit: The Executive’s Guide to Big Data & Apache Hadoop written by Robert D. Schneider; Page 9
I would like to add even further context to the illustration by adding the definition of unstructured data:
“Unstructured data refers to information that either does not have a pre-defined data model and/or is not organized in a predefined manner.”
In short, unstructured data is not useful when forced into a schema or table. I’ll use email as an example. Certain values from an email fit into a table: sender, recipient, email body, and so on. Although you can have a column for the email body, the free text stored in that column would be of little use when analyzed as a table value. What questions could analysts ask of all data entries in the “email body” column? Could they be answered with an ordinary table query? The answer is no.
When looking at the illustration, it's obvious that social media plays a heavy role in unstructured data. According to Pew Research, 73% of online adults use a social networking site. One way many businesses use this data is to gauge brand sentiment.
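The email example can be sketched in a few lines of Python. This is only an illustration, assuming a made-up message and a hypothetical helper named `email_to_row`; it uses the standard library `email` module to pull out the fields that fit a table, and shows that the body remains a single block of free text.

```python
from email import message_from_string

# Hypothetical sample message, invented for illustration.
RAW_EMAIL = """\
From: alice@example.com
To: support@example.com
Subject: Billing question

Hi, I was charged twice this month and nobody has called me back.
"""


def email_to_row(raw):
    """Split an email into table-friendly fields plus the free-text body."""
    msg = message_from_string(raw)
    return {
        "sender": msg["From"],        # fits a column: one value per email
        "recipient": msg["To"],       # fits a column
        "subject": msg["Subject"],    # short text, still easy to filter on
        "body": msg.get_payload().strip(),  # free text: no predefined model
    }


row = email_to_row(RAW_EMAIL)
print(row["sender"])
print(row["body"])
```

Filtering or counting by `sender` is a simple table query. Asking why the customer is unhappy means reading the `body`, which is where text analysis tools come in.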
In addition to social media there are many other common forms of unstructured data:
- Word Docs, PDFs and Other Text Files - Books, letters, other written documents, audio and video transcripts
- Audio Files - Customer service recordings, voicemails, 911 phone calls
- Presentations - PowerPoints, SlideShares
- Videos - Police dash cam, personal video, YouTube uploads
- Images - Pictures, illustrations, memes
- Messaging - Instant messages, text messages
In all these instances, the data can provide compelling insights. Using the right tools, unstructured data can add a depth to data analysis that couldn’t be achieved otherwise.
I would like to use customer service audio and transcripts as an example. Structured data that's gathered in a customer service scenario could include the following:
- Number of customer inquiries
- Category of complaint
- How quickly the problem was resolved
- Customer service rating via consumer feedback
All this data is helpful, but it's missing enhancement from its unstructured data counterpart. By looking at customer service audio in tandem with structured data insights, a company might discover the following:
The Genesis of the Problem - What is causing a problem in the technical or billing department? Is the customer confused because they weren’t guided effectively? Is there an issue across certain regions, age groups or technical abilities?
Better Consumer Feedback - Instead of a star rating, businesses can see why they got that rating in the first place. Was the consumer frustrated with the communication ability of the rep? Does the involvement of a supervisor lead to a better experience? What is the general tone of the dialogue between reps and customers?
Insight into Speed to Problem Resolution - What kinds of problems are taking extensive timeframes to resolve? Are the customer service reps trained adequately to handle common problems? Is there a logical system to get the customer to the right person as fast as possible to resolve their problem?
All these insights connect with a structured data counterpart. The unstructured data enhances a business’ ability to derive greater insight from the data sets.
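As a rough sketch of that connection, the snippet below joins a few structured ticket records with their call transcripts and averages the customer rating for each theme found in the text. Everything here is an assumption made for illustration: the records, the transcripts, the `THEMES` keyword lists and the function names are invented, and the naive keyword matching stands in for the speech-to-text and text analytics tools a real project would use.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical structured records: one row per customer inquiry.
TICKETS = [
    {"id": 1, "category": "billing", "minutes_to_resolve": 42, "rating": 2},
    {"id": 2, "category": "billing", "minutes_to_resolve": 8, "rating": 5},
    {"id": 3, "category": "technical", "minutes_to_resolve": 55, "rating": 1},
]

# Hypothetical unstructured counterpart: call transcripts keyed by ticket id.
TRANSCRIPTS = {
    1: "I was transferred three times and had to ask for a supervisor.",
    2: "Thanks, the rep fixed the duplicate charge right away.",
    3: "I was transferred twice and I am still confused about the setup.",
}

# Assumed keyword lists; a real system would use proper text analysis.
THEMES = {
    "transferred": ("transferred",),
    "confusion": ("confused", "unclear"),
    "supervisor": ("supervisor", "manager"),
}


def tag_themes(text):
    """Return the set of themes whose keywords appear in the text."""
    lowered = text.lower()
    return {
        theme
        for theme, words in THEMES.items()
        if any(word in lowered for word in words)
    }


def average_rating_by_theme(tickets, transcripts):
    """Join tickets to transcripts by id and average the rating per theme."""
    ratings = defaultdict(list)
    for ticket in tickets:
        for theme in tag_themes(transcripts.get(ticket["id"], "")):
            ratings[theme].append(ticket["rating"])
    return {theme: mean(values) for theme, values in ratings.items()}


summary = average_rating_by_theme(TICKETS, TRANSCRIPTS)
for theme, rating in sorted(summary.items()):
    print(theme, rating)
```

The star rating alone says a call went badly. Pairing it with themes from the transcript hints at why, for example that calls involving repeated transfers tend to score low in this invented sample.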
Unstructured data is a valuable piece of the data pie for any business. Tools that are widely accessible today can help businesses use this data to its greatest potential.
In contrast to unstructured data, structured data is data that can be easily organized. Despite its simplicity, most experts in today’s data industry estimate that structured data accounts for only 20% of the data available. It is clean, ready for analysis and usually stored in databases.
Today, big data tools and apps have allowed for the exploration of structured data that was once too expensive to gather and store. Some examples of structured data:
- Sensor Data - GPS data, manufacturing sensors, medical devices
- Point-of-Sale Data - Credit card information, location of sale, product information
- Call Detail Records - Time of call, caller and recipient information
- Web Server Logs - Page requests, other server activity
- Input Data - Any data entered into a computer: age, zip code, gender, etc.
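To show why structured data is considered easy to organize and analyze, here is a minimal sketch that loads a few point-of-sale rows into an in-memory SQLite table and totals revenue by location. The rows, the `sales` table layout and the `revenue_by_location` function are assumptions invented for this example, not taken from any real system.

```python
import sqlite3

# Hypothetical point-of-sale rows: (location, product, amount).
SALES = [
    ("Austin", "coffee", 3.50),
    ("Austin", "bagel", 2.25),
    ("Denver", "coffee", 3.75),
]


def revenue_by_location(rows):
    """Load rows into an in-memory table and total the amount per location."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE sales (location TEXT, product TEXT, amount REAL)"
        )
        conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
        cursor = conn.execute(
            "SELECT location, SUM(amount) FROM sales "
            "GROUP BY location ORDER BY location"
        )
        return dict(cursor.fetchall())
    finally:
        conn.close()


totals = revenue_by_location(SALES)
print(totals)
```

Because every row has the same predefined columns, a single query answers the question. No comparable one-line query exists for a folder of call recordings or email bodies.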
Although it's outnumbered by its unstructured brother, structured data has always played and will always play a critical role in data analytics. It functions as a backbone to critical business insights. Without structured data, it is difficult to know where to find insights hiding in your unstructured data sets.
Structured and unstructured data are very different. Regardless of their differences, they work in tandem in any effective big data operation. Companies wishing to make the most of their data should use tools that utilize the benefits of both.