Welcome to the Big Data Economy
So much of our American identity is tied up in the iconic image: the ability to move freely, to experience a wider array of goods than when we were regionally locked by limited infrastructure, and the opportunity to pursue wildly different dreams than were possible before these paths to prosperity were laid. For example, there’s a 76.6% chance that you or a loved one uses a car as your primary source of transportation to and from work. Considering that nearly 130 million people were employed in 2010, that’s a lot of cars buzzing around every day. Just a century ago this number was impossible. Cars were expensive, roads were spotty, and the morning commutes were much shorter than those seen in modern American cities today.
The 20th century witnessed the shift from a rail-centric transportation system to one dominated by the automobile, a shift that opened the way for people and goods to travel wider, more distant paths on extensive public road projects.
The advent of the Internet in the latter half of the 20th Century as an accessible, user-friendly apparatus created a new highway: the Digital Highway. Whereas the Interstate Highway System exists as an enabling infrastructure for moving people and physical goods from one point to another along varying stretches of road, the Digital Highway has allowed for the proliferation of unlimited amounts of data—data that travels with far fewer physical limits than what it takes for your car to get from point A to B. Since highways were built, American consumerism has been fed like a wild beast that drove its size to proportions previously unseen, but there are always barriers and limits to what can be created and consumed. There are only so many resources in the world, so many items that can be dreamed up, so many people who are able to obtain and use them, and only so much room to store these items while not in use or once they have become waste.
When it comes to the Data Economy, there are no such limits.
The Internet and all the amazing, mundane, useful, and incredibly useless things we do with it have created an invisible world where data can grow at rates virtually unconfined by the laws of physics. Sure, there’s only so much storage for all that data. But as storage options grow cheaper and sensors and other data collectors create increasingly large amounts of data, there’s no stopping the snowball. The potential for data to grow is presumably limitless, and that prospect is not always easy to grasp. History has shown that humans fear what we don’t know: what we can’t see, what we can’t wrap our minds around, and what can’t be sensed.
The best news is that we don’t need to fear the data-driven Digital Highway.
Power of Information
If you have ever had to drive through the massive Dallas-Fort Worth metropolitan area, you’ve probably been on one of the five stacked bridge decks of the massive High Five Interchange. It’s a tangled mass of controlled chaos. This major infrastructure achievement replaced a clover-leaf design left over from the Eisenhower period and was finished in 2005 because there was simply too much demand from many streams of traffic for the previous system to hold.
The current world of data is not unlike the High Five Interchange. It is a massive undertaking necessitated by demand. Since humanity began to record the world around us and strive to make sense of it, we’ve used words and numbers to collect our memory and track numbers pertinent to our lives. This road has been, for the most part, a fairly linear one. Until Gutenberg introduced the first modern printing press in 1450, knowledge had to be written or carved into stone by hand and preserved from the elements. The book allowed collected knowledge both to be protected from the elements and to be disseminated with less error than the previous expensive, manually intensive copying of manuscripts.
In the centuries since, literacy rates had to catch up to the slow and steady growth of recorded knowledge that came with easier means of disseminating it.
History does not see literacy rates catch up to affordable access to information until the early 20th century, the same century that witnessed a massive growth of available goods, services, and transformative new technologies like the Internet. In a world that constantly demands more, including disparate types of consumable information, the Internet has become a multi-layered deck of services that are constantly being used to create more and more data. While the highest point of the High Five Interchange might be scary to approach, mounting this artificial hill is necessary for a lot of traffic to travel efficiently to where it needs to go. As data’s size grows, we will have to place more trust in the infrastructures that we have built for ourselves to be efficient, well-engineered pieces of a well-oiled machine.
There’s no denying that the world has seen enormous change in the last 100 years. The years since cars and planes were first invented have been momentous, and the growth of data that has come with the introduction of a consumer-based Internet has created what a lot of people and businesses picture as a daunting, insurmountable mountain of information.
The current world of data is forcing businesses, governments, and other organizations to reassess how they are functioning and how they can use the data they have collected so rapidly. As with most shifts in thought and function, there ends up being more good than harm done, even if the world spins in transition.
The essential rule to remember about data is this: don’t panic.
The Challenge of Understanding Big Data
The challenge in our relationship with data comes into play because it’s an idea that lives in the realm of the imagination, and we tend not to trust things that we cannot easily see and understand. With the exception of the computer screen you use to access specific information that you have queried or created, the Internet and its infrastructure make up an invisible landscape of data. The veins of the Internet’s body are fiber cables buried underground. They run along power lines that most of us forget exist. These cables run from homes to large buildings full of servers that are like brains that store and manage the flow of massive amounts of data every second. While that infrastructure is very much a thing of the physical world, you cannot walk up to these buildings and see the actual data housed there.
It’s invisible. This is where the challenge of understanding massive data sets comes from.
Just how much data does there have to be to be considered “massive”? Studies suggest that nearly all of the data that has been created in human history has materialized in the last two years, and the speed at which it will continue to grow is staggering. The amount of information swirling around in those unseen wires is measured in units that dazzle the layperson. While most regular computer users think of hundreds of gigabytes as being a lot of storage space, the average supervisor of large data sets encounters measurements like petabytes and zettabytes: 1,048,576 and 1,099,511,627,776 gigabytes, respectively, when counted in binary multiples of 1,024. With big, scary-sounding words and numbers like petabyte and zettabyte, the knee-jerk way to talk about and understand this type of data is to conflate it with hype and fear.
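For readers who want to check those figures, here is a minimal sketch of the arithmetic. It assumes the binary convention, in which each unit is 1,024 times the one below it; the constant and function names are invented for this illustration.

```python
# Unit sizes in bytes under the binary convention (an assumption here):
# each step up the ladder is a factor of 1,024.
GIGABYTE = 1024 ** 3
PETABYTE = 1024 ** 5
ZETTABYTE = 1024 ** 7


def in_gigabytes(size_in_bytes: int) -> int:
    """Express a size given in bytes as a whole number of gigabytes."""
    return size_in_bytes // GIGABYTE


print(f"1 petabyte  = {in_gigabytes(PETABYTE):,} gigabytes")
print(f"1 zettabyte = {in_gigabytes(ZETTABYTE):,} gigabytes")
```

Under the decimal convention, where each step is a factor of 1,000, the same units come out as round figures instead: one million and one trillion gigabytes.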
Using Data to Create Value
With any new infrastructure or system come doubt, fear, confusion, excitement, anxiety, frustrations and awe. The thing to accept is that this magnitude of data will continue to be mostly invisible and we will be required to press our ability to think in the abstract. This presents a new opportunity to realize that while the magnitude is huge, most of the data that exists in this invisible landscape is worthless on its own. Once you understand that something is both invisible and mostly worthless without some addition, it’s easy to move past the shock and awe of it and come to grips with how to use the data to create value.
The best way to move past the misunderstanding of this invisible force is to picture what it is made up of and where it’s coming from. Since the invention of the Internet, a number of different data streams have developed that are augmented by businesses’ data and the data coming from mobile devices.
If the Internet is an information superhighway, there are multiple lanes that represent different types of data.
- Website data: the collection of information from and about websites that includes the information on the website, web traffic data, rankings, where the website is linked in, and so on.
- Social Media data: this data set encompasses all posts, messages, and emails, how they’re interconnected and shared, the geolocational data around them, and how often they’re shared.
- Mobile data: comes from your device’s interactions—usage times, data use, where the device has been, and any other actions that you have performed on your device.
- Machine data: generated by computers and other sensors.
Different Types of Data
There are also many other types of data that are not mentioned above, primarily because they are not nearly as big an instigator in the recent boom of data’s invisible cloud or have not yet been digitized. A macro-level list of these types of data includes, but certainly is not limited to, the following:
- Governmental data: this information encompasses the various branches of governmental entities and the recorded information about the places they are governing and their constituency.
- Business data: businesses have collected data about purchases, inventory, and other transaction information in varying forms for quite some time. This data can include sales lead information, pertinent customer data, inventory information, and purchasing patterns. This is an incredibly varied set of data.
- Scientific data: physicists, geographers, ecologists, biologists, and chemists are among the largest creators of new data, generating profound amounts of it as science moves toward the more ambitious experiments and investigations that modern technology allows for.
- Historical data: the annals of human history largely reside in books or other forms of record, a number of which are either currently digitized or will become so as electronic archives grow in importance and funding.
All of these varying types of data populate the invisible landscape of the data superhighway, whether they are currently digitized or not. The amount of digitized data will only grow as these varying forms of data make their way into this ether. As these varying types of data are considered, something becomes very evident about them: whereas most of human existence was spent dealing with one type of data, numerical, we have moved into a brave new realm of data creation that exists around so much more than strings of numbers. Two oppositional pairs help describe the difference:
- Structured data: the image most of us recall when we think of data is a spreadsheet populated with numbers that reflect certain information. This is one type of structured data. Structured data is data that is stored methodically and in an orderly fashion.
- Unstructured data: one of the most relevant examples of unstructured data is Twitter. The stream of tweets really has no rhyme or reason to it, other than the fact that it is a stream of words. Unstructured data is composed of photos, text, documents, videos, and emails, to name a few types.
- Numerical: this type of data is exactly what it sounds like—data in numbers. This type of data is what we have analyzed for thousands of years with mathematics.
- Non-numeric: yet another part of an oppositional pair with a name that does a lot of the explaining itself, non-numeric data is any data that is not a number, such as words and pictures.
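To make the two pairs concrete, here is a small sketch. The sample rows and the sentence are invented for illustration, and the helper names are hypothetical. It shows why structured, numerical data is easy to compute with, while unstructured text needs some processing before it yields even a simple measurement.

```python
# Hypothetical sample data, invented purely for illustration.
structured_rows = [
    {"item": "widget", "units_sold": 12, "price": 4.50},
    {"item": "gadget", "units_sold": 3, "price": 19.99},
]
unstructured_text = "Just saw the new gadget in the store, looks great!"


def is_numeric(value) -> bool:
    """True for ints and floats; bool is excluded although it subclasses int."""
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def split_by_kind(row: dict) -> tuple:
    """Separate one structured row into numerical and non-numeric fields."""
    numeric = {key: value for key, value in row.items() if is_numeric(value)}
    non_numeric = {key: value for key, value in row.items() if not is_numeric(value)}
    return numeric, non_numeric


numeric, non_numeric = split_by_kind(structured_rows[0])

# Structured, numerical fields can be fed straight into arithmetic.
revenue = sum(row["units_sold"] * row["price"] for row in structured_rows)

# Unstructured text has no fields to address; even a simple word count
# means imposing some structure on it first (here, splitting on whitespace).
word_count = len(unstructured_text.split())

print(numeric, non_numeric, round(revenue, 2), word_count)
```

The orderly rows can be summed directly because every value sits in a named, predictable place. The sentence only becomes measurable once we decide what to count.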
Consider these lists in the context of your life or the business that you do every day. The data in your day might consist of the geolocational information your cell phone gives out on the way to work, the tweets of articles sent out that are relevant to your industry (or not), the financial data created by the bills paid, and could end with the information your Netflix account collected on the shows you watched and rated.
This ballooning world of invisible data probably means many things to many people, but how do we create some kind of value from it all? The answer lies in collaboration.
This is the first chapter of a new eBook, Big Data Economy, that details the 4 ways the future of data is cleaner, leaner, and smarter than its storied past.
Emcien CEO Radhika Subramanian is a seasoned entrepreneur with decades of experience helping organizations utilize the insight buried within their data. Numerous associations recognize her as an innovator in analytics, with a proven track record with global giants such as Porsche, John Deere, NCR, Dell and more. Tweet to @RadhikaAtEmcien & join her Big Data Apps LinkedIn group.