The Case for Data Hoarding
Do we really need all of this data?
It’s a question that I’ve heard more than a few times in my consulting career, especially as organizations have moved from legacy systems to more contemporary equivalents. Most of these improvements mean that organizations can consolidate data sources and, at least in theory, store all of their data in one place.
Now, generally speaking, hoarding is not a healthy thing, and I don’t need reality shows to tell me as much. With respect to data management, though, is it really detrimental to an organization?
Amber Simonsen seems to think so. Simonsen is the PMP Executive Project Manager of Pierce Transit and talks about the perils of data hoards. “What begins as an innocent desire to keep relevant information close at hand can turn into an unhealthy obsession that plagues IT departments and Records Managers in organizations everywhere,” Simonsen warns.
It’s a fair point, but think for a minute about data storage costs. To say that they’ve dropped over the last three decades is the acme of understatement. Consider the following chart:
For more on the methodology used to derive these numbers, click here.
So, the cost of data hoarding has dropped exponentially, even as the amount of data available has risen exponentially. If you do the math, you’ll discover that organizations can store much more information these days than they could even five years ago, and at significantly lower cost. As a result, it’s hard to buy the “it’s too expensive to store this data” argument.
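For readers who want to do that math themselves, here is a minimal sketch. It assumes the total bill is simply volume times unit price, and that both change at constant annual rates. The function names and every number in it are hypothetical placeholders of mine, not figures taken from the chart, so swap in your own.

```python
def total_storage_cost(volume_gb: float, cost_per_gb: float) -> float:
    """Total cost of keeping volume_gb of data at a given price per GB."""
    return volume_gb * cost_per_gb


def project(volume_gb, cost_per_gb, volume_growth, price_decline, years):
    """Return one (year, volume_gb, cost_per_gb, total_cost) row per year.

    volume_growth and price_decline are annual rates expressed as
    fractions (0.40 means 40%). Both are assumed constant, which is a
    simplification.
    """
    rows = []
    for year in range(years + 1):
        rows.append(
            (year, volume_gb, cost_per_gb,
             total_storage_cost(volume_gb, cost_per_gb))
        )
        volume_gb *= 1 + volume_growth   # more data every year
        cost_per_gb *= 1 - price_decline  # cheaper storage every year
    return rows


if __name__ == "__main__":
    # Placeholder inputs only: 1,000 GB at 1.00 per GB, with data growing
    # 40% a year and the unit price falling 50% a year.
    for year, volume, unit_cost, total in project(1000.0, 1.0, 0.40, 0.50, 5):
        print(f"year {year}: {volume:10.1f} GB at {unit_cost:.4f}/GB "
              f"= {total:8.2f}")
```

With these placeholder rates the total bill is multiplied by 1.4 × 0.5 = 0.7 each year, so it shrinks even as the hoard grows. In general, the bill falls whenever (1 + growth) × (1 − decline) is below 1, and it rises otherwise.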
Of course, just because an organization can store a great deal of data doesn’t mean that it should. Data storage is a continuum, not a binary. What’s more, CIOs can actively decide not to store certain types of information for all sorts of reasons. Perhaps the most important consideration for an organization is what it uses to access and analyze new forms of data. If you’re trying to cram Big Data into Small Data solutions, then data gathering and storage (never mind hoarding) is going to pose a significant problem. As I write in my new book, you can’t write SQL statements against petabytes of unstructured data and expect to see meaningful results. New tools are needed to make sense of Big Data.
Data hoarding only makes sense if two conditions are met:
- Your organization has deployed the right tools, such as Hadoop, NoSQL, and columnar databases.
- Your organization actually does something with that data.
If that’s the case, hoard away.
What say you?