Big Data’s Big Flip-Flop

March 18, 2014

Analytics Matters with Bill Franks

It wasn’t too long ago that many people were proclaiming the decline, if not the death, of the SQL language and of relational database technology in general. As a refresher, relational technology stores data in rows and columns, and relational data is accessed through Structured Query Language (SQL). For a couple of years, the Hadoop and non-relational crowds mounted a full frontal assault on relational approaches. The overhead of placing data into predefined rows and columns was deemed too great compared with storing data in a non-relational environment.

In non-relational environments, users are free to use a wide range of programming languages to analyze data in any format. The data is typically stored simply as files, with no assumed format or relationships. This approach has its merits, but it also has its limitations.

In case you hadn’t noticed, a huge flip-flop has occurred. Many of the same people and organizations that were recently dismissing the entire concept of relational environments and SQL are now racing to … wait for it … add SQL-style interfaces on top of non-relational platforms like Hadoop! Let’s first take a look at how the flip-flop came about and then discuss why it is a good thing.

One big, mistaken assumption in the case against relational technologies is that they are inflexible and can’t handle unexpected questions or poorly formatted data, and that a non-relational platform is therefore required to be nimble. It is important to distinguish between an inherent shortcoming of a relational system and a shortcoming in how that system is implemented. That distinction is critical to understanding the flip-flop.

It is true that many organizations, particularly large ones, not only had many relational systems in place but also locked them down very tightly. It was, in fact, difficult for users to ask new questions or to gain access to enough computing resources. However, this was due to the policies placed on top of relational technology rather than to the technology itself. It is entirely possible to load and query data in a relational environment that isn’t in third normal form, that hasn’t been formally modeled, and that isn’t yet clean. I spent years doing this.
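As an illustration of that point, here is a minimal sketch using Python’s built-in `sqlite3` module (chosen here only because it is self-contained; any relational engine would do). It loads hypothetical raw records, which are denormalized, inconsistently formatted, and partly invalid, into a loose staging table with no keys or constraints, and then uses SQL to clean and aggregate them at query time. The table name `raw_orders` and the helper `load_and_query` are illustrative assumptions.

```python
import sqlite3

# Hypothetical raw records: denormalized (customer name repeated on every row,
# no separate customer table), inconsistent casing and whitespace, and one
# non-numeric amount. Nothing has been modeled or cleaned yet.
RAW_ROWS = [
    ("1001", "Alice ", "20.00"),
    ("1002", "alice", "7.50"),
    ("1003", "BOB", "5.00"),
    ("1004", "carol", "n/a"),
]


def load_and_query(rows):
    conn = sqlite3.connect(":memory:")
    # A loose staging table: every column is plain TEXT, with no keys or constraints.
    conn.execute("CREATE TABLE raw_orders (order_id TEXT, customer TEXT, amount TEXT)")
    conn.executemany("INSERT INTO raw_orders VALUES (?, ?, ?)", rows)
    # Clean at query time: normalize names and keep only amounts that start
    # with a digit (a crude validity check), then aggregate per customer.
    query = """
        SELECT LOWER(TRIM(customer)) AS customer_name,
               SUM(CAST(amount AS REAL)) AS total
        FROM raw_orders
        WHERE amount GLOB '[0-9]*'
        GROUP BY customer_name
        ORDER BY customer_name
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


print(load_and_query(RAW_ROWS))
```

Nothing here required a formal data model or a cleansing step before loading; the cleanup lives in the query and can be refined as understanding of the data improves, which is the spirit of the sandbox approach described below.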

The concept of an analytic sandbox or discovery environment centers on freeing users from traditional IT-imposed access limits and allowing them to explore and experiment with data in a relational environment. Granted, not all types of data can be handled in a relational system, but most common business data sources can be.

Like any solution, relational approaches are very good for many problems and are not as good for others. The same can be said about non-relational environments. Analytic professionals like me have always used a mix of environments because it isn’t about one approach being better or worse, but about which fits a given problem best. To me, SQL is the new kid on the block because when I started out, SQL did not exist! Over time, I added SQL processing into the mix where it made sense. It ended up making sense a huge proportion of the time, but not all of the time.

Recently, some organizations have tried to do too much with non-relational platforms. In many cases, this has led to inefficient processes that take more time to create, manage, and run than standard SQL approaches. Luckily, most of those who were looking to put up SQL’s tombstone have come to recognize their error.

It is terrific for the industry that the flip-flop around relational technologies has occurred. Having a mix of capabilities is a good thing; it isn’t a zero-sum game where only one approach can win. Facebook realized that implementing SQL-style processing outside an environment built for it meant wasting time and money reinventing something that already existed and worked just fine. As a result, Facebook added a large relational environment to its mix, because certain types of processing simply work better that way.

I’ll be participating in a virtual event on March 27 called Data Discovery In Action. Registration is free. The focus of the event will be on how to combine various processing paradigms and analytic techniques to maximize your organization’s ability to discover and deploy new, high-impact analytics. There will be discussion of both relational and non-relational approaches, which is how it should be!

Many of us who have spent years developing advanced analytic processes were surprised to see relational technologies and SQL getting beaten up so badly. It never made sense to kill SQL, and I’ll forgive those who were misguided in their attempts to do so. After all, it can’t help but sting a little to have to pull an about-face and execute a flip-flop, as politicians are known to do. But sometimes executing a flip-flop is the right thing to do.