The Driving Force Behind Big Data: Data Connectivity
In most organizations, stakeholders maintain the perspective that Big Data offers tremendous benefits to the enterprise, especially when it comes to more agile business intelligence and analytics. Unfortunately, the days of complete visibility into Big Data are numbered – there is simply too much of it. While we may see companies promoting fancy strategies for managing ‘fire hose data’, only the ones focused on analytics will get close to creating meaning from the massive deluge. As a result, companies are looking to plug into new advancements in relational and non-relational programming frameworks that support the processing of large data sets. Data connectivity components, such as drivers, help enterprise organizations effectively satisfy the bulk data access requirements for a broad array of use cases.
While Big Data offers real value in the form of enhanced business intelligence, it also presents significant challenges for IT organizations, particularly when it comes to the data connectivity and integration infrastructure. Technologies such as Hadoop and Map-R struggle to maintain access integration points, and to manage and process petabytes of data. At the same time, they carry a significant risk of making existing applications and skill sets obsolete.
Business expectations are escalating quickly as data velocity accelerates, but in many circumstances, IT does not yet have the advanced data connectivity architecture in place to effectively import and export the growing volume of Big Data. Nor does it possess the functionality to integrate and transform a wide variety of Big Data formats. In short, companies lack the flexible, scalable data infrastructure needed to exploit Big Data for critical business insights that translate into a competitive advantage. Moreover, the inability to seamlessly assimilate the volume, variety and velocity associated with Big Data introduces significant risk. In a worst-case scenario, operational visibility muddies, compliance becomes haphazard, customer service levels diminish and revenues tumble.
Before organizations contend with negative impacts such as reduced visibility or loss of revenue opportunities, they must first consider the more specific use cases associated with Big Data. Primarily, these cases center on the often overlooked but critical area of database connectivity and the growing requirements for high-performance import and export of bulk data loads.
In a well-written, well-tuned application, more than 90 percent of data access time is spent in middleware. And data connectivity middleware plays a critical role in how the application client, network and database resources are utilized.
In any bulk load scenario, database connectivity is the cornerstone of performance. Over the years, technology vendors have made great strides in database optimization as well as in the performance of processors and other hardware-based server components. As a result, the performance bottleneck has moved to the database middleware – the software drivers that provide connectivity between applications and databases.
The most popular commercial databases all include data connectivity components – ODBC and JDBC drivers or ADO.NET data providers – at no additional charge. The open source community, too, offers data connectivity software. Attracted to the price tag, architects often use these free or open source components by default when connecting a particular database to various applications.
By choosing the ‘free’ options, architects are using drivers that have not been retooled for today’s business data volumes. And when working with Big Data and bulk loading, the use of such ‘free’, but performance-limiting, data connectivity components can actually cost organizations more than they anticipate.
In fact, when bulk loading Big Data, database driver performance becomes a critical risk factor if the data connectivity middleware is not designed to be streamlined and efficient.
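To make the cost of the data access path concrete, the following is a minimal sketch rather than a benchmark of any particular driver. It assumes Python's built-in sqlite3 module as a stand-in for a database driver, and the `events` table, the function names and the batch size are all hypothetical. It loads the same rows twice: once with a statement and commit per row, and once in batches.

```python
import sqlite3
import time

INSERT_SQL = "INSERT INTO events (id, payload) VALUES (?, ?)"


def load_row_by_row(conn, rows):
    """Insert one row per call and commit after every row."""
    for row in rows:
        conn.execute(INSERT_SQL, row)
        conn.commit()


def load_in_batches(conn, rows, batch_size=1000):
    """Insert rows with executemany and commit once per batch."""
    for start in range(0, len(rows), batch_size):
        batch = rows[start:start + batch_size]
        conn.executemany(INSERT_SQL, batch)
        conn.commit()


def timed_load(loader, rows):
    """Run a loader against a fresh in-memory database.

    Returns the number of rows stored and the elapsed seconds.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, payload TEXT)")
    started = time.perf_counter()
    loader(conn, rows)
    elapsed = time.perf_counter() - started
    count = conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]
    conn.close()
    return count, elapsed


# Hypothetical sample data: 5,000 small rows.
rows = [(i, f"payload-{i}") for i in range(5000)]

slow_count, slow_seconds = timed_load(load_row_by_row, rows)
fast_count, fast_seconds = timed_load(load_in_batches, rows)

print(f"row-by-row: {slow_count} rows in {slow_seconds:.4f}s")
print(f"batched:    {fast_count} rows in {fast_seconds:.4f}s")
```

Both loaders store the same data; only the way the application drives the connectivity layer differs. The timings will vary by machine and say nothing about a networked commercial database, where per-call overhead is typically far larger than with an in-memory store.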
Multiple Driver Uses
Drivers help enterprise organizations effectively satisfy the bulk data access requirements for a broad array of use cases. In doing so, they simplify the data access architecture; save important resources for other tasks; and improve operational performance. Examples of their use include the following:
- Data Warehousing – Drivers provide a fast, high-performance way to load bulk data into an Oracle, DB2, Sybase, or SQL Server-based data warehouse while avoiding data latency issues.
- Data Migration – Drivers support the extract and load steps of a migration, streaming bulk data directly from one database into another and thus avoiding the need to hold the full data set in memory.
- Data Replication – Drivers can be used to load needed data into relational database tables. This is a fast approach that provides the added benefit of storing the data as a relational database table easily accessed by reporting or BI applications.
- Disaster Recovery – This is all about making sure that when a failure occurs, the backup database you are working with is as close to the original set of data as possible. Drivers help ensure that any bulk data is quickly and easily replicated into disaster recovery databases.
- Cloud Data Publication – In cloud-based computing, efficient network usage is critical, so performance matters a great deal when moving bulk data files or database tables into a cloud-based database. Industry-standard drivers allow developers to quickly and easily build a simple program that publishes bulk data into the cloud.
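The streaming approach described under Data Migration can be sketched in a few lines. This is an illustration under stated assumptions, not a vendor's bulk load API: it uses Python's built-in sqlite3 module in place of a commercial driver, and the `orders` table, the `stream_copy` function and the batch size are hypothetical. Rows are read from the source in fixed-size batches and written straight to the target, so only one batch is held in application memory at a time.

```python
import sqlite3

SCHEMA = "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
INSERT_SQL = "INSERT INTO orders (id, customer, amount) VALUES (?, ?, ?)"


def stream_copy(source, target, batch_size=500):
    """Copy every row of orders from source to target in fixed-size batches.

    Returns the number of rows copied.
    """
    read_cursor = source.cursor()
    read_cursor.execute("SELECT id, customer, amount FROM orders ORDER BY id")
    copied = 0
    while True:
        # fetchmany returns at most batch_size rows, and an empty list at the end.
        batch = read_cursor.fetchmany(batch_size)
        if not batch:
            break
        target.executemany(INSERT_SQL, batch)
        target.commit()
        copied += len(batch)
    return copied


# Hypothetical source database with 2,000 sample rows.
source = sqlite3.connect(":memory:")
source.execute(SCHEMA)
source.executemany(
    INSERT_SQL,
    [(i, f"customer-{i % 10}", i * 1.5) for i in range(1, 2001)],
)
source.commit()

# Empty target database with the same schema.
target = sqlite3.connect(":memory:")
target.execute(SCHEMA)

copied = stream_copy(source, target)
target_count = target.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
print(copied, target_count)
```

The same read-a-batch, write-a-batch loop applies to the replication, disaster recovery and cloud publication cases above; what changes is the target connection. The batch size is a tuning choice that trades memory use against the number of round trips.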
Big Data is here to stay – there is no denying it. And as enterprise organizations attempt to reap the benefits of Big Data, they must come to grips with the inherent limitations of most of the existing data connectivity tools on the market today. Looking to drivers for connectivity is a good first step in deploying an advanced data architecture that enables the seamless and uninterrupted flow of Big Data throughout the enterprise.
Jesse Davis is the Director of research and development at Progress DataDirect, a division of Progress Software. He has a passion for developing highly motivated and productive engineering teams that love to research new technologies and apply their findings to build high quality products that exceed expectations.