Big Intelligence: BI Meets Big Data, with Apache Drill

September 18, 2015

Welcome to a whole new world of data exploration: a world where SQL specialists are first-class citizens and no longer have to wait weeks or months before they can access new datasets; a world where IT does not have to be a bottleneck in preparing and maintaining schemas for the BI user; a world where data scientists are free to follow the information trail wherever it may lead them.

The old way of doing analytics is set in stone. Datasets are predefined and schemas are fixed. When an analysis uncovers something interesting, digging deeper means starting from scratch, and even that can begin only after IT has prepared an updated set of models and schemas.

All of this limits a business analyst’s or data scientist’s ability to discover the odd causal links that can turn into actionable insights, or the anomalies that may point to a niche group worth targeting. The bottom line: when data analysis is restricted to predefined boundaries, we don’t increase our intelligence at a pace that would set us apart from the competition.

Drill Makes Big Data Live up to Its Greatest Potential

Apache Drill sets business analysts and data scientists free. The open source community has greatly refined the original features of Google’s Dremel, with enhanced capabilities including the extensibility of its architecture, overall agility, support for ANSI SQL, optional schema handling, and the ability to efficiently handle modern data structures and nested data (such as JSON and Parquet).

Apache Drill enables data analysts to explore data without having to ask their IT counterparts to define schemas or create new ETL processes. As analysts delve into the data, Apache Drill’s engine discovers the source schemas and automatically adjusts query plans. Querying self-describing data and processing complex data types as you go provides an entirely new way of wringing every useful bit and byte of business intelligence from big data. Data sources such as Hadoop, HBase, and MongoDB can be queried using ANSI SQL semantics to glean new insights at the speed of thought.
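To make the idea concrete, here is a minimal Python sketch of querying a raw JSON file through Drill without defining a schema first. It assumes a Drill instance running locally with its REST interface on the default port 8047 and the `dfs` storage plugin enabled. The file path and column names are hypothetical, and the helper functions are illustrative rather than part of any Drill client library.

```python
import json
import urllib.request

# Assumption: a local Drill instance exposing its REST API on the default port.
DRILL_URL = "http://localhost:8047/query.json"


def build_query(path, columns, limit=10):
    """Build a Drill SQL query over a raw file; no table or schema is declared.

    Nested fields are addressed with dotted names through the table alias,
    for example t.geo.city for {"geo": {"city": ...}}.
    """
    cols = ", ".join(f"t.{c}" for c in columns)
    return f"SELECT {cols} FROM dfs.`{path}` t LIMIT {limit}"


def build_request(sql, url=DRILL_URL):
    """Wrap the SQL text in the JSON body that Drill's REST endpoint expects."""
    body = json.dumps({"queryType": "SQL", "query": sql}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_query(sql, url=DRILL_URL):
    """Send the query and return the result rows (requires a running Drill)."""
    with urllib.request.urlopen(build_request(sql, url)) as resp:
        return json.load(resp).get("rows", [])


if __name__ == "__main__":
    # Hypothetical clickstream file with a nested "geo" object.
    sql = build_query("/data/clicks.json", ["user_id", "geo.city"], limit=5)
    print(sql)
    # run_query(sql) would execute it against a live Drill instance.
```

The point of the sketch is that the query names a file path and nested fields directly; nothing about the file's structure is declared ahead of time.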

Actionable insight comes from seeing the correlations across multiple, apparently unrelated data sources, including blog posts, sensors, clickstreams, customer interaction records, videos, transaction data, competitive analysis, and much more. Because no schema has to be prepared first, Apache Drill lets end users bring SQL queries to such datasets much more rapidly than any other SQL engine on Hadoop. The less time it takes to derive value from big data, the greater the potential for a business to reach more customers and address their needs.

Often a good mix of historical, near-real-time, and real-time information provides the best insights, enabling companies to predict customer needs, target consumers in advance, set budgets, spot supply chain issues before they become problems, and manage risk far more effectively, among other activities. While Hadoop provides the right platform to store large amounts of data in one place, Apache Drill provides the right set of SQL capabilities to mine that data easily and rapidly using familiar BI tools as the front end.

Drilling Straight to the Top

Big data is rapidly becoming mission-critical across all industries and government agencies. And, of course, the more we rely on big data, the more we require dependability in the solutions that we use to explore it. No one wants to devote time to learning how to use a tool only to have it languish, uncared for, in perpetual development limbo.

Apache Drill has solidified its status as a key technology for SQL analytics on big data. The Apache Software Foundation announced in December 2014 that it had promoted Drill to a top-level project, where it joins other illustrious projects such as Apache Hadoop and the Apache HTTP Server (httpd, the world’s most popular web server). Drill’s promotion to a top-level project is an acknowledgement of the strength of its community of users and developers.

Apache Drill enables business analysts and data scientists to discover what their big data is trying to tell them. You can see how this works right now by trying a developer Sandbox with Apache Drill.