© 2008-25 SmartData Collective. All Rights Reserved.

Why a Mere 300 Exabytes Will Give Us a Headache [VIDEO]

Datafloq
Although 90% of the available data in the world was created in the last two years, that still leaves a lot of ‘old data’. In 2010 and 2011 we created a total of 3 Zettabytes of data. A very simplified calculation then puts the amount of ‘old data’ at approximately 0.3 Zettabytes, or 300 Exabytes. Compared to the 2.5 Exabytes of data that we currently create every day, that looks like nothing to worry about. Unfortunately, that is wrong. Those 300 Exabytes of data will give us headaches and sleepless nights, and they will cost a lot of energy and money.
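
The simplified calculation can be reproduced in a few lines of Python. This is only a back-of-the-envelope sketch: the figures are the ones quoted above, decimal units (1 Zettabyte = 1000 Exabytes) are assumed, and the "stricter" variant is an added illustration of what the 90% share implies if taken literally.

```python
# Back-of-the-envelope check of the 'old data' estimate.
# Figures come from the article; decimal units (1 ZB = 1000 EB) are assumed.
EB_PER_ZB = 1000

recent_zb = 3.0       # data created in 2010 and 2011
recent_share = 0.90   # share of all available data that this represents
daily_eb = 2.5        # data currently created per day

# Simplified version used in the text: old data is roughly 10% of 3 ZB.
old_simple_eb = recent_zb * (1 - recent_share) * EB_PER_ZB

# Stricter version: if 3 ZB is 90% of the total, the total is 3 / 0.9.
total_zb = recent_zb / recent_share
old_strict_eb = (total_zb - recent_zb) * EB_PER_ZB

# How many days of current data creation the simplified figure equals.
days_equivalent = old_simple_eb / daily_eb

print(f"simplified: {old_simple_eb:.0f} EB")
print(f"stricter:   {old_strict_eb:.0f} EB")
print(f"equals about {days_equivalent:.0f} days of current data creation")
```

Either way the order of magnitude is the same: a few hundred Exabytes, or roughly four months of data creation at the current rate.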
 
Why? Because a large percentage of those 300 Exabytes resides in legacy systems that are incompatible with modern technology. We cannot switch off those systems and we cannot simply import the data into modern Hadoop platforms. Banks and insurance companies in particular have many legacy systems, some of which have been in place for decades. Due to the many mergers and acquisitions in the finance world, banks sometimes have dozens of separate legacy systems. As Karl Flinders writes in his article, one bank even had 40 different legacy systems. These aging, cobbled-together legacy systems can often be found in payment and credit card systems, ATMs and branch or channel solutions. The headaches these legacy systems cause are illustrated by Deutsche Bank, whose big data plans are being held back by its legacy systems.

Banks are not the only ones that have to deal with legacy systems; the car industry has them too. Ford Motor Company has data centres running on software that is 30 or 40 years old. The pharmaceutical industry, the travel industry and the public sector also have to deal with legacy systems. Replacing these legacy systems is almost impossible. Flinders refers to it as “changing the engines on a Boeing 747 while in flight”.

However hard it may seem, it is not impossible, as the Commonwealth Bank of Australia has shown. In the past 5 years it has replaced the bank’s entire core system, moved most of its services into the cloud and developed many apps and innovations that put the bank at the forefront of innovation.

Legacy systems consist of traditional relational database management systems, often running on old and slow machines that cannot handle much data at once. Hence, most of these legacy systems process their data at night, and it can take some time to query the data needed. Real-time processing and analysis of data in legacy systems is impossible. We have to look for solutions that let us continue to use that old data.

One way to deal with legacy systems and big data is to replace a company’s entire legacy system. Apart from the massive risks involved in such an operation, there are also a lot of costs involved, so it is not very likely that many organisations will take up this strategy.

As such, it is important to find ways to have new innovative technologies that allow real-time analysis of various data sets to co-exist with the legacy systems. These systems from the terabyte era still contain valuable (historical) information. There are several ways to keep and use the historical data in the data warehouses:

  1. Macro-batching of the data into the new big data solutions on a periodic timescale, for example every night. This data can then be used together with the ‘new’ data.
  2. Sending periodic summaries of the data in the data warehouse, so that the data in those warehouses can be used without continuous querying. Only when specific information is required is the data warehouse queried and the data retrieved.

These solutions will enable the analysis of both unstructured and structured legacy data within a single integrated architectural framework. Such a platform allows the legacy data to remain within the existing data warehouses and at the same time enables near real-time analyses.
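
The two approaches listed above can be sketched as follows. This is a minimal illustration rather than a reference implementation: an in-memory SQLite database stands in for the legacy data warehouse, a plain Python list stands in for the new big data platform, and the `transactions` table and the function names `macro_batch` and `periodic_summary` are hypothetical.

```python
import sqlite3
from datetime import date


def make_legacy_warehouse() -> sqlite3.Connection:
    """In-memory SQLite database standing in for a legacy data warehouse."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE transactions "
        "(id INTEGER PRIMARY KEY, day TEXT, branch TEXT, amount REAL)"
    )
    conn.executemany(
        "INSERT INTO transactions (day, branch, amount) VALUES (?, ?, ?)",
        [
            ("2013-05-01", "north", 120.0),
            ("2013-05-01", "south", 80.0),
            ("2013-05-02", "north", 50.0),
        ],
    )
    conn.commit()
    return conn


def macro_batch(conn: sqlite3.Connection, day: date) -> list[dict]:
    """Option 1: copy one day's rows in a single batch, e.g. from a nightly job."""
    rows = conn.execute(
        "SELECT id, day, branch, amount FROM transactions "
        "WHERE day = ? ORDER BY id",
        (day.isoformat(),),
    ).fetchall()
    return [dict(zip(("id", "day", "branch", "amount"), row)) for row in rows]


def periodic_summary(conn: sqlite3.Connection) -> dict[str, float]:
    """Option 2: send only an aggregate, so the warehouse is not queried continuously."""
    rows = conn.execute(
        "SELECT day, SUM(amount) FROM transactions GROUP BY day ORDER BY day"
    ).fetchall()
    return dict(rows)


if __name__ == "__main__":
    warehouse = make_legacy_warehouse()
    big_data_store: list[dict] = []  # stand-in for the new platform
    big_data_store.extend(macro_batch(warehouse, date(2013, 5, 1)))
    print(len(big_data_store), "rows batched")
    print(periodic_summary(warehouse))
```

In both cases the legacy system is only touched at scheduled moments, which matches the way these systems already process their data at night.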

Using middleware to enhance systems and replacing the hardware that supports them is, however, not ideal. Another problem for legacy systems is that, with the overall acceptance of big data, a larger percentage of the IT budget will go to big data projects, leaving less money for the legacy systems. In turn, employees who are able to work with the legacy systems become scarce and thus expensive.

If such a trend continues for too long, there is a danger that the legacy system will fail one day, landing the company in a lot of trouble. The later organisations start replacing these legacy systems, or at least try to make them compatible with big data technologies, the more expensive and difficult it will be.

A less risky but still expensive solution could be to develop a specific algorithm that can transfer millions of lines of old data into modern distributed file systems. Until all data works correctly in the new distributed file systems, the old and new systems can co-exist. A paper by Mariano Ceccato and others explains how they developed an algorithm to infer a structured data model from a legacy data system, in an attempt to restructure that legacy system into an up-to-date and usable data model.
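
The co-existence idea can be illustrated with a small sketch: copy records in chunks and only treat the legacy copy as retirable once the new copy has been verified. This is not the algorithm from the paper mentioned above; the record format, the helper names `chunks`, `digest` and `migrate`, and the use of plain lists in place of a legacy system and a distributed file system are all assumptions made for illustration.

```python
import hashlib
from typing import Iterable, Iterator


def chunks(records: Iterable[str], size: int) -> Iterator[list[str]]:
    """Yield successive lists of at most `size` records."""
    batch: list[str] = []
    for record in records:
        batch.append(record)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def digest(records: Iterable[str]) -> str:
    """Order-sensitive SHA-256 fingerprint of a sequence of records."""
    h = hashlib.sha256()
    for record in records:
        h.update(record.encode("utf-8"))
        h.update(b"\n")
    return h.hexdigest()


def migrate(legacy: list[str], target: list[str], chunk_size: int = 2) -> bool:
    """Copy records chunk by chunk, then verify that the copy matches the source."""
    for batch in chunks(legacy, chunk_size):
        target.extend(batch)
    return digest(legacy) == digest(target)


legacy_rows = ["1;alice;100", "2;bob;250", "3;carol;75"]  # hypothetical legacy records
new_store: list[str] = []                                 # stand-in for the new storage
ok = migrate(legacy_rows, new_store)
print("safe to retire legacy copy:", ok)
```

Until the verification step succeeds for every data set, both copies stay in place, which is the co-existence period described above.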

Real transformative insights can only come when all data is used, including the data from legacy systems with incompatible data formats. Therefore, the data in legacy systems will eventually need to be transferred to a massively scalable storage system, which will then replace the critical search, calculation and reporting functions of those legacy systems.

In the end, the ambition for any organisation with legacy systems should be to truly retire these systems, as companies will not be able to support them forever. If, in the meantime, they integrate that legacy data into one platform that aggregates the data, they can already reap the benefits of historical data and create truly valuable insights.

Finally, I came across the video below from EMC, which gives a great explanation of how to deal with legacy systems:

Copyright Big Data Startups 2013. You may share using our article tools. Please don’t cut articles from BigData-Startups.com and redistribute by email or post to the web.