
It’s data, Jim, but not as we know it – Part 1: What the echo of the Big Bang tells us about the nature of information

TeradataEMEA
Last updated: 2009/07/27 at 9:02 PM

Possibly I am just turning into a grumpy old man in my middle-age, but there are two words that when used together annoy me beyond almost all reason – yes, even more than the “p-word” that has featured in two of my previous posts: “unstructured” and “data.”

Despite what some vendors – and some commentators, who really should know better – would have you believe, there is nothing remotely formless or “unstructured” about “new” types of data, like image files, audio files, text-based documents, XML documents and so on. Of course for the most part these data hardly qualify as “new,” either, but don’t indulge my pedantry by getting me started down that road.

Data is merely information that has been encoded in some way and the only truly “unstructured data” is “noise”; random signals, representative of nothing much more than a system in equilibrium with its environment. A picture, a song, the complete works of Shakespeare – these are all forms of information and they are emphatically not “unstructured.”

To see the truth of this, take, for example, a GIF file (make sure that it is one that you don’t much care about, or a copy of one that you do) and open it with a text editor. Now mess with and/or delete some of the bytes at random, save the adulterated file and then try and open it with your normal picture editing or viewing software.

In fact a GIF file is highly structured. Its header and the blocks that follow carry meta-data that includes, for example, the width and height of the image in pixels; a colour table; whether the image is animated or still; etc., etc. All this meta-data is followed by the bytes that encode the actual bitmap and, finally, an end-of-file marker. Monkey with this file structure and you risk reducing the value of the data that it contains to peanuts; monkey with the actual data payload and you will likewise either corrupt the file so that it can’t be read or change it so that it represents a different or a degraded image. Repeat this experiment with just about any multimedia file type and you will get the same result – either a corrupt file that cannot be read correctly or one that is no longer an accurate representation of the original object. These data are not only structured; the nature of that structure is critical to their correct interpretation.
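For readers who would rather not sacrifice a real picture, the experiment can be approximated in a few lines of Python. This is only a rough sketch: the byte layout follows my reading of the GIF89a header (signature, then width and height as little-endian 16-bit integers, then a packed flags byte), the 1x1 sample image is assembled by hand, and the names `parse_gif_header` and `is_readable` are made up for this illustration rather than taken from any library.

```python
import struct

# A hand-assembled 1x1 GIF, used purely for illustration.
SAMPLE_GIF = bytes([
    0x47, 0x49, 0x46, 0x38, 0x39, 0x61,  # signature "GIF89a"
    0x01, 0x00, 0x01, 0x00,              # width = 1, height = 1 (little-endian)
    0x80, 0x00, 0x00,                    # packed flags, background index, aspect ratio
    0x00, 0x00, 0x00, 0xFF, 0xFF, 0xFF,  # global colour table: black, white
    0x2C, 0x00, 0x00, 0x00, 0x00,        # image descriptor: separator, left, top
    0x01, 0x00, 0x01, 0x00, 0x00,        # image descriptor: width, height, flags
    0x02, 0x02, 0x44, 0x01, 0x00,        # compressed pixel data
    0x3B,                                # trailer (end-of-file marker)
])


def parse_gif_header(data: bytes) -> dict:
    """Read the fixed-size fields at the start of a GIF (hypothetical helper)."""
    if len(data) < 13:
        raise ValueError("too short to be a GIF")
    signature = data[:6]
    if signature not in (b"GIF87a", b"GIF89a"):
        raise ValueError("bad signature: not a GIF")
    width, height, packed, _background, _aspect = struct.unpack("<HHBBB", data[6:13])
    has_table = bool(packed & 0x80)
    return {
        "version": signature[3:].decode("ascii"),
        "width": width,
        "height": height,
        "global_colour_table_entries": 2 ** ((packed & 0x07) + 1) if has_table else 0,
        "ends_with_trailer": data[-1:] == b"\x3b",
    }


def is_readable(data: bytes) -> bool:
    """True if the header parses; a stand-in for 'my viewer can open it'."""
    try:
        parse_gif_header(data)
        return True
    except ValueError:
        return False


info = parse_gif_header(SAMPLE_GIF)

# Monkey with the structure: one changed byte in the signature and the
# file is no longer recognised at all.
corrupted = b"X" + SAMPLE_GIF[1:]

# Monkey with the meta-data: the file still parses, but it now claims to
# describe a different image from the one the payload encodes.
resized = SAMPLE_GIF[:6] + struct.pack("<H", 500) + SAMPLE_GIF[8:]

print(info)
print("corrupted readable?", is_readable(corrupted))
print("resized width:", parse_gif_header(resized)["width"])
```

A real viewer checks far more than this sketch does, but the point carries over: every field sits at a fixed position with a fixed meaning, which is about as far from “unstructured” as data gets.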

And of course it’s not just the “wrapper” that has structure; the structure of the data itself is critical. Most people would interpret the statement “Dave didn’t marry Sue because she was rich” as meaning that Dave and Sue were married, but that Dave’s motivation for their union was not financial. Conversely, the statement that “Dave didn’t marry Sue, because she was rich” would probably be interpreted as meaning that Dave and Sue did not marry and that it was the difference in their circumstances that got in the way. A single structural element – one comma – makes a big difference to our interpretation of the “same” data. Suppose that during their courtship Dave tells Sue “I love you”; the structure of this sentence is identical to the structure of the sentence “I want you” (subject-verb-object, I think, but if I am mistaken and there are any linguists out there reading this, please feel free to correct me), but the two statements may or may not be synonymous (although I hear that Dave is a good guy, so perhaps we should give him the benefit of the doubt).

In fact, even apparently random noise can convey meaning. Tune a radio telescope to the microwave range of the electromagnetic spectrum and you will hear a faint hum, directionally uniform to 1 part in 500. This is quite literally a distant reverberation of the “Big Bang” in which the Universe was created and which confirms that the Universe was indeed once hot-and-dense, as the Big Bang theory demands that it must have been. That’s important information, as historically there have been other theories of the origin of the Universe that don’t assume an explosive beginning.

From measurements of the cosmic microwave background radiation, as it is called, physicists and astronomers are able either to infer or to calculate directly many other essential truths about the Universe, including the speed at which our galaxy is moving (600 kilometres per second towards the constellation of Leo, in case this answer is one day all that stands between you and the “Who Wants to Be a Millionaire?” prize money). It turns out that there is an awful lot of important information encoded in that apparently random noise.

Back on Earth, less exotic “new” types of data are increasingly interesting to the commercial and government organizations that most of us serve. We should probably call these “multimedia data,” “non-record-based data” or “non-relational data.” Actually, I’m not crazy about “non-relational” either; whilst these data are typically not relational in the accepted sense – the ordering of the bytes that define the bitmap in a GIF file is important, for example – they can, after all, be accommodated in tables in a relational database using BLOB and CLOB objects. So long as we regard these objects themselves as atomic, it seems to me that these data are as relational as any other attribute of an entity. Things clearly get more complex if we want to examine or “query” the objects themselves (“select all of the pictures in which the sky is red”), but let’s not go there for now.
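To make the BLOB point concrete, here is a minimal sketch using Python's built-in `sqlite3` module and an in-memory database. The `picture` table, its columns and the placeholder bytes are all invented for this example; any relational database with a BLOB type should behave in much the same way.

```python
import sqlite3

# An in-memory database; the table and column names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE picture ("
    " id INTEGER PRIMARY KEY,"
    " title TEXT NOT NULL,"
    " image BLOB NOT NULL)"
)

# Placeholder bytes standing in for the contents of a real image file.
payload = b"GIF89a" + bytes(range(16))

conn.execute(
    "INSERT INTO picture (title, image) VALUES (?, ?)",
    ("sunset", payload),
)

# The image is treated as one atomic attribute of the row: we can select
# it, measure it and join on the row's other columns, but SQL cannot see
# inside it.
title, size, stored = conn.execute(
    "SELECT title, length(image), image FROM picture WHERE title = ?",
    ("sunset",),
).fetchone()

print(title, size, stored == payload)
conn.close()
```

Notice that nothing here can answer “in which pictures is the sky red?”; that would need either extra descriptive columns or code that opens each object and looks inside it.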

My recent travelling companion and the main attraction on the “CTO Road Show” that we took on tour across the EMEA region in June – Teradata CTO Stephen Brobst – refers to “non-traditional data types” versus “record-based” or “square” data. These are definitions that I can live with. And I’m sure that engineering PhD Stephen will sleep easier for knowing that the flunky from marketing considers his use of technical vocabulary to be correct and not in the least aggravating!

 

Martin Willcox

TAGGED: data quality, unstructured data