We are experiencing an information revolution that provides companies with unprecedented opportunities to capitalize on the mountains of data being generated today. This data is being generated from a variety of sources unlike anything we have previously seen. Because of the business opportunities that can emerge from harnessing data sources like mobile devices and social media, business analysts are demanding fresher data faster. Larger datasets combined with new business demands are creating challenges for IT departments across all major industries around the globe.
As IT professionals work to optimize IT infrastructures to manage ‘Big Data,’ they have focused on business intelligence and data analytics strategies to try to close the gap between IT and the business user. Although data integration, and more specifically ETL (extract, transform, and load) processing, is at the center of this information revolution, its significance as a critical component of the ‘Big Data’ engine is often overlooked. One of the primary reasons is that data integration remains largely isolated from business users, with minimal communication and collaboration.
Another problem is that many data integration platforms do not scale. Scalability is increasingly important for meeting today’s demands for fresher, near real-time information. When platforms do not scale, processing and integrating data at the speed the business requires becomes exponentially more expensive, and each new data source that must be acquired adds complexity to companies’ existing environments.
In fact, a recent research report by the analyst firm Enterprise Strategy Group found data integration complexity was the number one data analytics challenge, cited by more than 270 survey respondents. Perhaps even more alarming, BeyeNETWORK surveyed more than 350 IT and business professionals earlier this year and found that 68 percent of them believe that data integration tools are impeding their organization’s ability to achieve strategic business objectives. Why does this disconnect exist and how can IT professionals get their data integration strategies back on track?
A good starting point is to examine more closely the ETL process, the core of data integration. ETL was originally conceived as a means to extract data from multiple sources, transform it to make it consumable (commonly by sorting, joining and aggregating the data), and ultimately load and store it within a data warehouse. However, as demands on IT grew and legacy ETL tools failed to scale to meet evolving business requirements, organizations started performing transformations outside of the ETL environment by moving them into the database. This practice is commonly referred to as ELT (extract, load, transform). Hand-coded solutions and other workarounds have also been used. To illustrate just how prevalent this has become, the BeyeNETWORK survey found that only 31 percent of respondents say their data transformations take place in their ETL tool today.
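The difference between the two approaches can be made concrete with a minimal sketch. It is an illustration only, not how any particular ETL product works: the sample data, the table names, and the use of Python with an in-memory SQLite database as a stand-in for the data warehouse are all assumptions made for this example. In the ETL path, the join, aggregation and sort happen before anything is loaded; in the ELT path, raw data is loaded into staging tables first and the database does the transformation.

```python
import sqlite3

# Hypothetical source data standing in for two source systems.
ORDERS = [(1, "acme", 120.0), (2, "globex", 75.5), (3, "acme", 30.0)]
CUSTOMERS = [("acme", "EMEA"), ("globex", "APAC")]


def etl(conn):
    """ETL: join, aggregate and sort outside the database, then load the result."""
    region_of = dict(CUSTOMERS)  # join key: customer -> region
    totals = {}
    for _order_id, customer, amount in ORDERS:
        region = region_of[customer]                       # join
        totals[region] = totals.get(region, 0.0) + amount  # aggregate
    rows = sorted(totals.items())                          # sort
    conn.execute("CREATE TABLE sales_by_region_etl (region TEXT, total REAL)")
    conn.executemany("INSERT INTO sales_by_region_etl VALUES (?, ?)", rows)
    return rows


def elt(conn):
    """ELT: load raw data into staging tables, then transform inside the database."""
    conn.execute("CREATE TABLE stg_orders (id INTEGER, customer TEXT, amount REAL)")
    conn.execute("CREATE TABLE stg_customers (customer TEXT, region TEXT)")
    conn.executemany("INSERT INTO stg_orders VALUES (?, ?, ?)", ORDERS)
    conn.executemany("INSERT INTO stg_customers VALUES (?, ?)", CUSTOMERS)
    conn.execute(
        "CREATE TABLE sales_by_region_elt AS "
        "SELECT c.region AS region, SUM(o.amount) AS total "
        "FROM stg_orders o JOIN stg_customers c ON o.customer = c.customer "
        "GROUP BY c.region"
    )
    return conn.execute(
        "SELECT region, total FROM sales_by_region_elt ORDER BY region"
    ).fetchall()


conn = sqlite3.connect(":memory:")  # stand-in for a data warehouse
etl_rows = etl(conn)
elt_rows = elt(conn)
print(etl_rows == elt_rows)
```

Both paths produce the same summary table. What differs is where the work runs and what must be stored: the ELT path keeps two staging tables in the database and spends database resources on the join and aggregation, which is the staging cost the article goes on to discuss.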
While ELT and hand coding provided organizations with a much-needed fix and some temporary relief, these approaches are typically unable to scale in a cost-effective manner. They can also create significant challenges around ongoing maintenance and governance. As a result, IT departments within several companies have initiated projects called “ETL 2.0” to achieve a long-term, sustainable solution to their data integration challenges.
In essence, ETL 2.0 is about enabling a high-performance, highly efficient ETL approach that gives the business greater control over data integration rather than leaving it limited by the technology. ETL 2.0 allows IT organizations to bring the “T” out of the database and back into the ETL tool, reducing the cost and complexity of staging data. It should not only accelerate existing data integration environments where organizations have already made significant investments, but also enhance emerging ‘Big Data’ frameworks such as Hadoop. ETL 2.0 promises to help organizations improve control of their data integration efforts and so reduce complexity and total cost of ownership.
Across all industries, a seismic shift is happening and businesses are under greater pressure than ever before to make faster, better business decisions. Fortunately, more data sources and a movement towards collaborative business environments are providing new opportunities for organizations to grow and thrive in today’s marketplace. ETL 2.0 is about redefining the way organizations perform data integration and allowing them to leverage ‘Big Data’ for competitive advantage. When aligned with strategic business objectives, data integration can help businesses innovate and create new revenue streams.