© 2008-25 SmartData Collective. All Rights Reserved.
It’s data, Jim, but not as we know it – Part 1: What the echo of the Big Bang tells us about the nature of information

TeradataEMEA

Possibly I am just turning into a grumpy old man in middle age, but there are two words that, when used together, annoy me beyond almost all reason – yes, even more than the “p-word” that has featured in two of my previous posts: “unstructured” and “data.”

Despite what some vendors – and some commentators, who really should know better – would have you believe, there is nothing remotely formless or “unstructured” about “new” types of data, like image files, audio files, text-based documents, XML documents and so on. Of course, for the most part these data hardly qualify as “new,” either, but don’t indulge my pedantry by getting me started down that road.

Data is merely information that has been encoded in some way, and the only truly “unstructured data” is “noise”: random signals, representative of little more than a system in equilibrium with its environment. A picture, a song, the complete works of Shakespeare – these are all forms of information and they are emphatically not “unstructured.”
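One rough way to see the difference between structure and noise is compressibility: a general-purpose compressor such as zlib can only shrink data that contains regularities it can exploit. The sketch below illustrates that idea and is not taken from the original post; the repeated line of Shakespeare is an arbitrary, deliberately redundant example, and `compression_ratio` is a hypothetical helper.

```python
import os
import zlib

def compression_ratio(data: bytes) -> float:
    """Compressed size divided by original size (lower means more redundancy)."""
    return len(zlib.compress(data, 9)) / len(data)

# Structured: English text with plenty of regularity (here, deliberate repetition).
structured = ("To be, or not to be, that is the question: " * 200).encode("utf-8")
# Noise: the same number of bytes drawn from the operating system's random source.
noise = os.urandom(len(structured))

print(f"structured text: {compression_ratio(structured):.3f}")
print(f"random noise:    {compression_ratio(noise):.3f}")
```

Compressibility is only a proxy. A GIF’s image data is already LZW-compressed, so zlib will barely shrink it, even though, as argued here, the file as a whole is highly structured.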

To see the truth of this, take, for example, a GIF file (make sure that it is one that you don’t much care about, or a copy of one that you do) and open it with a text editor. Now mess with and/or delete some of the bytes at random, save the adulterated file and then try to open it with your normal picture editing or viewing software.
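If a text editor feels too blunt an instrument, the same experiment can be scripted. This is a minimal sketch under stated assumptions: `corrupt_copy` is a hypothetical helper, it always writes to a separate output file so the original is never touched, and it changes bytes in place rather than deleting them, so the file length stays the same.

```python
from __future__ import annotations

import random
from pathlib import Path

def corrupt_copy(src: str, dst: str, n_bytes: int = 10, seed: int | None = None) -> list[int]:
    """Write a copy of src to dst with n_bytes distinct, randomly chosen bytes altered.

    Returns the (sorted) offsets that were changed. src itself is never modified.
    """
    data = bytearray(Path(src).read_bytes())
    rng = random.Random(seed)
    offsets = sorted(rng.sample(range(len(data)), k=min(n_bytes, len(data))))
    for i in offsets:
        # XOR with a non-zero value guarantees that each chosen byte really changes.
        data[i] ^= rng.randint(1, 255)
    Path(dst).write_bytes(data)
    return offsets
```

A call such as `corrupt_copy("holiday.gif", "holiday_damaged.gif", n_bytes=20, seed=7)` (file names hypothetical) produces a damaged copy to open in your viewer. Hits near the start of the file tend to land in the header and make it unreadable, while hits further in tend to garble the picture instead.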

In fact, a GIF file is highly structured and includes meta-data that, for example, describes a colour table, the width and height of the image in pixels, whether the image is animated or still, etc., etc. Alongside this meta-data sit the bytes that encode the bitmap itself, and the file closes with an end-of-file marker. Monkey with this file structure and you risk reducing the value of the data that it contains to peanuts; monkey with the actual data payload and you likewise corrupt the file, either so that it can’t be read or so that it represents a different or a degraded image. Repeat this experiment with just about any multimedia file type and you will get the same result: either a corrupt file that cannot be read correctly or one that is no longer an accurate representation of the original object. These data are not only structured; the nature of that structure is critical to their correct interpretation.
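The fixed-layout part of that structure is easy to read directly. The sketch below follows the GIF87a/GIF89a layout as I understand it: a 6-byte signature and version, then a 7-byte Logical Screen Descriptor (little-endian width and height, a packed flags byte, a background colour index and an aspect-ratio byte), then an optional global colour table, with the file ending in the trailer byte `0x3B`. `parse_gif_header`, `GifHeader` and `TINY_GIF` are hypothetical names for illustration. The sketch does not decode the compressed image data, and it does not try to detect animation, which is signalled by later blocks rather than by a single header flag.

```python
import struct
from dataclasses import dataclass

@dataclass
class GifHeader:
    version: str                      # "87a" or "89a"
    width: int                        # logical screen width in pixels
    height: int                       # logical screen height in pixels
    has_global_colour_table: bool
    global_colour_table_entries: int  # number of RGB triples in the table
    background_colour_index: int
    has_trailer: bool                 # does the file end with the 0x3B trailer byte?

def parse_gif_header(data: bytes) -> GifHeader:
    """Read the signature, Logical Screen Descriptor and colour-table size of a GIF."""
    if len(data) < 13 or data[:3] != b"GIF" or data[3:6] not in (b"87a", b"89a"):
        raise ValueError("not a GIF file (bad signature or version)")
    # Two little-endian unsigned 16-bit integers followed by three single bytes.
    width, height, packed, bg_index, _aspect = struct.unpack("<HHBBB", data[6:13])
    has_gct = bool(packed & 0x80)                  # top bit: global colour table present
    entries = 2 ** ((packed & 0x07) + 1) if has_gct else 0
    if len(data) < 13 + 3 * entries:
        raise ValueError("file truncated inside the global colour table")
    return GifHeader(
        version=data[3:6].decode("ascii"),
        width=width,
        height=height,
        has_global_colour_table=has_gct,
        global_colour_table_entries=entries,
        background_colour_index=bg_index,
        has_trailer=data[-1:] == b"\x3b",
    )

# A tiny 1x1 GIF written out byte by byte, for illustration only.
TINY_GIF = (
    b"GIF89a\x01\x00\x01\x00\x80\x00\x00\xff\xff\xff\x00\x00\x00"
    b"!\xf9\x04\x01\x00\x00\x00\x00,\x00\x00\x00\x00\x01\x00\x01\x00\x00"
    b"\x02\x02D\x01\x00;"
)

print(parse_gif_header(TINY_GIF))
```

Damaging even one of the first three bytes is enough to make this parser reject the file outright, which is a small-scale version of the experiment described above.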

And of course it’s not just the “wrapper” that has structure; the structure of the data itself is critical. Most people would interpret the statement “Dave didn’t marry Sue because she was rich” as meaning that Dave and Sue were married, but that Dave’s motivation for their union was not financial. Conversely, the statement “Dave didn’t marry Sue, because she was rich” would probably be interpreted as meaning that Dave and Sue did not marry and that it was the difference in their circumstances that got in the way. A single structural element – one comma – makes a big difference to our interpretation of the “same” data. Suppose that during their courtship Dave tells Sue “I love you”; the structure of this sentence is identical to the structure of the sentence “I want you” (subject-verb-object, I think, but if I am mistaken and there are any linguists out there reading this, please feel free to correct me), but the two statements may or may not be synonymous (although I hear that Dave is a good guy, so perhaps we should give him the benefit of the doubt).

In fact, even apparently random noise can convey meaning. Tune a radio telescope to the microwave range of the electromagnetic spectrum and you will hear a faint hum, directionally uniform to 1 part in 500. This is, quite literally, a distant reverberation of the “Big Bang” in which the Universe was created, and it confirms that the Universe was indeed once hot and dense, as the Big Bang theory says it must have been. That’s important information, as historically there have been other theories of the origin of the Universe that don’t assume an explosive beginning.

From measurements of the cosmic microwave background radiation, as it is called, physicists and astronomers are able either to infer or to calculate directly many other essential truths about the Universe, including the speed at which our galaxy is moving (600 kilometres per second towards the constellation of Leo, in case this answer is one day all that stands between you and the “Who Wants to Be a Millionaire?” prize money). It turns out that there is an awful lot of important information encoded in that apparently random noise.
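The two figures quoted here are connected by the first-order Doppler effect: an observer moving at speed $v$ through the background sees it slightly hotter in the direction of motion and slightly cooler behind, where $\theta$ is the angle from the direction of motion and $T_0$ is the average temperature. Taking the 600 km/s figure at face value (published values differ depending on whether the Sun, the Milky Way or the Local Group of galaxies is meant), a back-of-the-envelope check gives:

$$
T(\theta) \approx T_0\left(1 + \frac{v}{c}\cos\theta\right),
\qquad
\frac{\Delta T}{T_0} \approx \frac{v}{c} = \frac{600\ \text{km/s}}{3.0\times 10^{5}\ \text{km/s}} = 2\times 10^{-3} = \frac{1}{500}
$$

This matches the “1 part in 500” directional variation in the hum.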

Back on Earth, less exotic “new” types of data are increasingly interesting to the commercial and government organizations that most of us serve. We should probably call these “multimedia data,” “non-record-based data” or “non-relational data.” Actually, I’m not crazy about “non-relational” either. Whilst this data is typically not relational in the accepted sense (the ordering of the bytes that define the bitmap in a GIF file is important, for example), it can, after all, be accommodated in tables in a relational database using BLOB and CLOB (binary and character large object) columns. So long as we regard these objects themselves as atomic, it seems to me these data are as relational as any other attribute of an entity. Things clearly get more complex if we want to examine or “query” the objects themselves (“select all of the pictures in which the sky is red”), but let’s not go there for now.
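The BLOB idea can be shown with SQLite through Python’s built-in `sqlite3` module. This is only a sketch: SQLite is used because it ships with Python, type names and size limits for large-object columns vary between database products, and the `picture` table, its columns and `build_db` are hypothetical. The query filters on ordinary “square” columns and hands the image bytes back untouched, which is the atomic treatment described above. Asking which pictures have a red sky would need code that interprets the bytes, which no plain SQL predicate here does.

```python
import sqlite3

# Hypothetical schema: ordinary "square" attributes plus the picture itself,
# stored as a single opaque BLOB value.
SCHEMA = """
CREATE TABLE picture (
    picture_id INTEGER PRIMARY KEY,
    caption    TEXT    NOT NULL,
    width_px   INTEGER NOT NULL,
    height_px  INTEGER NOT NULL,
    image      BLOB    NOT NULL
)
"""

def build_db(pictures):
    """Create an in-memory database and load (caption, width, height, bytes) tuples."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    conn.executemany(
        "INSERT INTO picture (caption, width_px, height_px, image) VALUES (?, ?, ?, ?)",
        pictures,
    )
    conn.commit()
    return conn

if __name__ == "__main__":
    placeholder = b"GIF89a" + bytes(20)  # stand-in bytes, not a viewable image
    conn = build_db([("sunset", 640, 480, placeholder), ("thumbnail", 64, 48, placeholder)])
    # The predicate uses ordinary columns; the BLOB is returned byte-for-byte, never inspected.
    for caption, size, image in conn.execute(
        "SELECT caption, length(image), image FROM picture WHERE width_px >= ?", (100,)
    ):
        print(caption, size, image == placeholder)
```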

My recent travelling companion and the main attraction on the “CTO Road Show” that we took on tour across the EMEA region in June – Teradata CTO Stephen Brobst – refers to “non-traditional data types” versus “record-based” or “square” data. These are definitions that I can live with. And I’m sure that engineering PhD Stephen will sleep easier for knowing that the flunky from marketing considers his use of technical vocabulary to be correct and not in the least aggravating!

 

Martin Willcox
