To Query or Not to Query: That Is the Question

November 2, 2015

The growth of the digital economy has produced torrents of data. The trend will only continue, because data is the language of technology. As companies rely more heavily on technology, both the data they create and their need to analyze it will increase.

The growth of data has given rise to a class of problems that we call, for lack of a better term, big data analytics. The common requirements for solving this class of problems, loosely, are:

  • What is in my data?
  • What are some outcomes that I can track? (Machine failure, network slowdown, etc.)
  • What indicators are related to these outcomes?
  • How can I respond to these indicators and influence these outcomes?

The broad approach to these kinds of problems is search-based or query-based analytics. The approach is rooted in traditional statistics, where a central tenet of the scientific method is hypothesis testing. If we do not know what’s in the data, we propose a hypothesis and then use queries, or questions, to piece a solution together.

A result of this lineage is modern business intelligence: ad hoc analysis designed to answer a single business question. The answer to this question is typically a statistical model, analytic report, or other type of data summary delivered on demand to the business user.

SAS, the reigning giant of statistical modeling software, defines big data analytics as “(T)he process of examining big data to uncover hidden patterns, unknown correlations and other useful information that can be used to make better decisions.”

But the number of possible queries in a data set is very large.

http://www.numberempire.com/combinatorialcalculator.php
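
To give a rough sense of that scale, here is a minimal sketch in Python. It assumes, purely for illustration, that a "query" is nothing more than a choice of which columns to examine together. Real queries also involve filters, joins, and aggregations, so the true count would be larger still. The function name is hypothetical.

```python
from math import comb


def count_column_subsets(n_columns: int, max_size: int) -> int:
    """Count the non-empty subsets of columns with at most max_size members.

    Assumption: one candidate "query" = one subset of columns to look at.
    """
    return sum(comb(n_columns, k) for k in range(1, max_size + 1))


if __name__ == "__main__":
    for n in (10, 50, 100):
        up_to_three = count_column_subsets(n, 3)
        any_size = count_column_subsets(n, n)  # equals 2**n - 1
        print(f"{n} columns: {up_to_three} subsets of size <= 3, "
              f"{any_size} subsets of any size")
```

Even when limited to three columns at a time, the count grows with the cube of the number of columns; with no limit it doubles with every column added.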

Analysts and data scientists continue to discover new ways to store more data and make our queries run faster, but the additional complexity of more data very quickly outpaces our ability to create more and better queries.

Gartner stated at a recent conference,

“Data is inherently dumb. It doesn’t do anything unless you know how to use it, how to act on it, because algorithms is where the real value lies. Algorithms define action.”

The No Query approach requires that the algorithm itself compute the candidate queries and rank them by relevance (much as Google’s PageRank algorithm ranks web pages).
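
The idea of generating candidate queries and ranking them can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the method behind any particular product and not PageRank: each candidate "query" is simply "how is this column related to the outcome?", and relevance is approximated by the absolute Pearson correlation. The function names and the sensor readings are made up for the example.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no signal
    return cov / sqrt(var_x * var_y)


def rank_indicators(table, outcome):
    """Score every non-outcome column against the outcome, best first.

    Assumption: relevance = |Pearson correlation| with the outcome column.
    """
    target = table[outcome]
    scores = [
        (name, abs(pearson(values, target)))
        for name, values in table.items()
        if name != outcome
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Made-up machine readings; "failure" is the outcome being tracked.
readings = {
    "temperature": [60, 65, 70, 80, 90, 95],
    "vibration": [3, 1, 4, 1, 5, 9],
    "fan_speed": [5, 5, 5, 5, 5, 5],
    "failure": [0, 0, 0, 1, 1, 1],
}

if __name__ == "__main__":
    for name, score in rank_indicators(readings, "failure"):
        print(f"{name}: {score:.2f}")
```

The analyst never writes a query here: the code enumerates every candidate indicator and surfaces the strongest ones first. A real system would need a richer space of candidate queries and a more robust relevance measure than a single correlation.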