Death Of The Relational Database

July 4, 2009

A recent entry by Tony Bain in his excellent ‘Innovations in Data Management’ blog caught me a little by surprise. In it he talks about the NoSQL movement – the group of people and organisations that say we can do without the RDBMSs from the likes of Oracle, Microsoft and IBM. This is new to me.

In a nutshell, the argument runs as follows: Massively scalable databases exist that are not relational and they power some of the biggest sites on the internet – Amazon, Google and Facebook to name three.

For those who don’t know about NoSQL, take a look at the post by ACM blogger Michael Stonebraker or a nice summary from Computerworld. You can’t find anything about NoSQL on Wikipedia yet – that’s how new it is. Stonebraker constructs a convincing argument in favour of ‘the death of the RDBMS’:
  • For data warehouses, a column store beats a row store by approximately a factor of 50 on typical business intelligence queries. The reason is because column stores read only the columns of interest to the query and not all of them. In addition, compression is more effective in a column store. Since the legacy systems are all row stores, they are vulnerable to competition from the newer column stores.
  • For online transaction processing (OLTP), a lightweight main memory DBMS beats a row store by a factor of 50. Leveraging main memory and the fact that no DBMS application will send a message to a human user in the middle of a transaction, allows an OLTP DBMS to run transactions to completion with no resource contention or locking overhead. 
  • In XML, where the current major vendors have spent a great deal of energy extending their engines, it is claimed that specialized engines, such as Mark Logic or Tamino, run circles around the major vendors. 
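
The column-store point in the first bullet is easy to see with a toy example. The sketch below is only an illustration under stated assumptions: the table, its column names and the two helper functions are made up for this post, and counting "values read" in memory is a stand-in for disk I/O. It does not reproduce Stonebraker's factor of 50, and it ignores compression entirely. It only shows why a query over one column touches far less data when each column is stored separately.

```python
# Toy comparison of a row layout and a column layout for SUM(amount).
# All names and data here are hypothetical; "values_read" is a crude
# stand-in for the I/O a real engine would perform.

COLUMNS = ("order_id", "customer", "region", "amount")

rows = [
    (1, "acme", "north", 120),
    (2, "bolt", "south", 80),
    (3, "acme", "south", 200),
    (4, "crux", "north", 50),
]


def sum_from_row_store(table, column):
    """Scan whole rows: every field of every row is read."""
    idx = COLUMNS.index(column)
    total = 0
    values_read = 0
    for row in table:
        values_read += len(row)  # the full row is fetched, wanted or not
        total += row[idx]
    return total, values_read


def to_column_store(table):
    """Pivot the rows into one list per column."""
    return {name: [row[i] for row in table] for i, name in enumerate(COLUMNS)}


def sum_from_column_store(store, column):
    """Read only the single column the query asks for."""
    values = store[column]
    return sum(values), len(values)


row_total, row_reads = sum_from_row_store(rows, "amount")
col_total, col_reads = sum_from_column_store(to_column_store(rows), "amount")

print("row store:   ", row_total, "from", row_reads, "values read")
print("column store:", col_total, "from", col_reads, "values read")
```

Both layouts give the same answer, but the row layout reads every field of every row while the column layout reads one value per row. With the dozens or hundreds of columns of a typical warehouse table, the gap widens accordingly.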

The argument then runs that if you don’t need performance like this, you can just get an open source RDBMS for free. I am beginning to agree (even the Cortex is MySQL) and if free is too hard for you, the price of SQL Server is good value for most organisations these days.

So what are the reasons for spending big dollars on an RDBMS? Here are the reasons I can think of for a large company:
  • Nobody ever got sacked for buying Oracle or DB2.
  • The IT specialists have built their career and expertise on a specific vendor’s product line. What’s in it for them to support a change that they see as undermining that?
  • Who’s going to hire someone with Voldemort or MongoDB experience?
  • For most applications, the RDBMS can do the job – so what if the company spends $500,000 more on hardware to do it? Don’t forget, the hardware guys are also comfortable with running the big RDBMS on ‘their’ boxes.
  • A surprising number of data warehouse developers lack the skills to really understand the differences pointed out by the NoSQL people. Besides – they’re not the ones paying for the infrastructure they use.
I am happy to concede that none of these reasons are technical. Politics is very real in larger enterprises and you don’t stay long if you ignore this fact. In the past I have been lucky to hire some very good developers because of their frustration with life in a large corporate data shop.

Anyway, the NoSQL revolutionaries got together recently and you can read/view/listen to the presentations on Johan Oskarsson’s site (he is a developer for Last.fm in London). The presentation on the Cassandra database by Avinash Lakshman of Facebook, for example, has some interesting stats comparing Cassandra and MySQL.

It is curious that the NoSQL people don’t call their solutions databases. Instead, they are a “highly available key-value store” (Amazon) and a “distributed storage system for managing structured data” (Google). At least MongoDB does describe itself as “a high-performance, open source, schema-free document-oriented database.” Not exactly as snappy a label as RDBMS, and I wonder why CDBMS (Columnar DBMS) isn’t good enough.

Check out NoSQL – it could be useful to you.
