Heal the Heartbreak of Data Sprawl with a Data Catalog

By Andrew Ahn

Your security analytics team wants a copy of your production database so they can look for fraudulent accounts. Your accounts payable department wants an extract it can analyze to improve supply chain efficiency. Your sales manager wants all your customer records so he can merge them with his Salesforce.com data. And your database administrator is using both snapshots and two full backups just to be sure all the data is safe.

Contents
  • Data Sprawl Happens when Data is Needlessly Duplicated
  • Data Sprawl Leads to Organizations Falling Out-of-Sync
  • Data Catalogs Plus Strong Data Governance Policies are the Solution

Data Sprawl Happens when Data is Needlessly Duplicated

What you’ve got is a typical data sprawl problem in the making. That’s what happens when organizations – for whatever reasons – create multiple copies of production data. There’s always a good reason for each copy to be created, but collectively they become a mess.

Data sprawl is becoming a real problem as business users increasingly want to analyze data themselves, within the context of big data. International Data Corp. has estimated that up to 60% of total storage capacity is now dedicated to accommodating copy data, and that the total cost of copy data storage will top $50 billion next year. Yet it estimates that fewer than 20% of organizations have copy management standards. Gartner analyst Dave Russell says many companies keep between 30 and 40 copies of business data.

Data Sprawl Leads to Organizations Falling Out-of-Sync

In addition to the obvious toll that data sprawl takes on infrastructure and performance, data integrity becomes a real problem. For example, a salesperson who updates a customer record in the CRM system may leave it out of sync with the same record in the customer database. A database administrator who restores the wrong backup may overwrite production data with old information.

Numerous companies are developing costly technology-based solutions to the copy sprawl problem, but for many customer organizations, the simplest and most cost-effective approach is good data governance grounded in a data catalog.

An enterprise data catalog maintains a single directory of all the data the company owns. This can include not only production data, but also backups, extracts, and summaries. Production data can be “fingerprinted” with a unique signature so that out-of-date copies never inadvertently make their way into mission-critical applications. Similarly, copies and extracts can be tagged according to their intended use. A catalog can even improve data integrity by ensuring that data marked with certain meta tags is never overwritten.
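
To make the fingerprinting and tagging ideas concrete, here is a minimal sketch in Python. It is an illustration only, not a description of how any particular catalog product works: the `DataCatalog` class, the `do-not-overwrite` tag name, and the choice of a SHA-256 content hash as the "unique signature" are all assumptions made for the example.

```python
import hashlib
from dataclasses import dataclass, field


def fingerprint(rows):
    """Return a SHA-256 hex digest over the rows, independent of row order."""
    digest = hashlib.sha256()
    for line in sorted(repr(tuple(row)) for row in rows):
        digest.update(line.encode("utf-8"))
        digest.update(b"\n")
    return digest.hexdigest()


@dataclass
class CatalogEntry:
    name: str
    signature: str
    tags: set = field(default_factory=set)


class DataCatalog:
    """Hypothetical in-memory catalog; a real one would persist its entries."""

    PROTECTED_TAG = "do-not-overwrite"  # assumed tag name, for illustration

    def __init__(self):
        self.entries = {}

    def register(self, name, rows, tags=()):
        """Record a data set with its fingerprint and intended-use tags."""
        entry = CatalogEntry(name, fingerprint(rows), set(tags))
        self.entries[name] = entry
        return entry

    def is_current(self, production_name, candidate_rows):
        """True if the candidate copy matches the production fingerprint."""
        return self.entries[production_name].signature == fingerprint(candidate_rows)

    def can_overwrite(self, name):
        """False for any data set carrying the protected tag."""
        return self.PROTECTED_TAG not in self.entries[name].tags


# Made-up sample data: the copy has drifted from production.
production = [(1, "Acme", "NY"), (2, "Globex", "CA")]
stale_copy = [(1, "Acme", "NJ"), (2, "Globex", "CA")]

catalog = DataCatalog()
catalog.register("customers", production, tags={"production", "do-not-overwrite"})
catalog.register("customers_sales_extract", stale_copy, tags={"extract", "sales-analysis"})

print(catalog.is_current("customers", production))   # True
print(catalog.is_current("customers", stale_copy))   # False
print(catalog.can_overwrite("customers"))            # False
```

An application could call a check like `is_current` before loading a copy, and a restore script could call `can_overwrite` before writing to a target. A content hash only says that two data sets differ, not which one is newer, so a real catalog would likely also track timestamps or lineage.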

Data Catalogs Plus Strong Data Governance Policies are the Solution

Use of a data catalog should be combined with good governance practices. For example, employees need to know which data is okay for analytical use and which shouldn’t be touched, and which data sets are copies rather than new, relevant data. Database administrators need clear parameters on how to restore backed-up data sets. One way to make data governance both effective and enjoyable is to encourage business users to join in the process by tagging their own data through a crowdsourced data quality program.
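
One possible shape for a crowdsourced tagging program is sketched below in Python. The rule that a tag only counts once at least two different users have proposed it is an assumption made for the example, as are the `CrowdTags` class, the data set name, and the user names.

```python
from collections import defaultdict


class CrowdTags:
    """Hypothetical tag store: a tag is accepted once enough distinct users propose it."""

    def __init__(self, min_votes=2):
        self.min_votes = min_votes
        # data set name -> normalized tag -> set of users who proposed it
        self._votes = defaultdict(lambda: defaultdict(set))

    def propose(self, dataset, tag, user):
        """Record one user's vote for a tag; repeat votes by the same user count once."""
        self._votes[dataset][tag.strip().lower()].add(user)

    def accepted(self, dataset):
        """Return the tags that have reached the vote threshold, sorted by name."""
        tags = self._votes.get(dataset, {})
        return sorted(t for t, users in tags.items() if len(users) >= self.min_votes)


tags = CrowdTags(min_votes=2)
tags.propose("customers_2016_extract", "Copy", "alice")
tags.propose("customers_2016_extract", "copy", "bob")
tags.propose("customers_2016_extract", "ok-for-analytics", "alice")
tags.propose("customers_2016_extract", "copy", "alice")  # repeat vote, ignored

print(tags.accepted("customers_2016_extract"))  # ['copy']
```

Requiring agreement from more than one person is one simple way to keep a single mistaken tag from steering other users wrong. The threshold is a policy decision for the governance team, not something the catalog dictates.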

Using a data catalog eases the infrastructure penalty of data sprawl by reducing the incidence of orphaned data. It can also reduce the burden on database administrators while actually increasing responsiveness to business user requests. For example, the sales manager who needs customer records can use a catalog to find a satisfactory database that already exists in another department and avoid joining a backlog of IT job tickets.
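
The sales manager's lookup could be as simple as a tag search. The Python sketch below assumes a catalog that is just a list of entries with a name, an owning department, and a set of tags. The `Dataset` type, the `find_datasets` function, and all the sample names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dataset:
    name: str
    owner: str
    tags: frozenset


def find_datasets(catalog, required_tags):
    """Return the data sets that carry every required tag, sorted by name."""
    wanted = frozenset(required_tags)
    return sorted((d for d in catalog if wanted <= d.tags), key=lambda d: d.name)


# Made-up catalog contents for illustration.
catalog = [
    Dataset("crm_customers", "sales", frozenset({"customer", "copy"})),
    Dataset("customers_master", "finance",
            frozenset({"customer", "production", "ok-for-analytics"})),
    Dataset("supplier_invoices", "accounts-payable",
            frozenset({"supplier", "ok-for-analytics"})),
]

matches = find_datasets(catalog, {"customer", "ok-for-analytics"})
print([d.name for d in matches])  # ['customers_master']
```

Here the search turns up an existing customer data set in another department that is already cleared for analysis, so no new extract and no IT ticket are needed.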

Businesses shouldn’t suffer because of too much internal demand for data. The solution isn’t to deny requests with an agility-killing gatekeeping process, but to better understand what data you have so that it will be more useful. The curation and governance that a proper catalog can provide are the cure for data sprawl and the path to a data-driven company.

Andrew Ahn is Vice President of Product Management for Waterline Data. He is an Apache Atlas committer and was the lead at Hortonworks for Hadoop governance strategy. Prior work includes product and governance responsibilities at ICE/NYSE Euronext, spanning 12 countries and 23 market centers.
