By using this site, you agree to the Privacy Policy and Terms of Use.
Accept
SmartData Collective
  • Analytics
    AnalyticsShow More
    data analytics in sports industry
    Here’s How Data Analytics In Sports Is Changing The Game
    6 Min Read
    data analytics on nursing career
    Advances in Data Analytics Are Rapidly Transforming Nursing
    8 Min Read
    data analytics reveals the benefits of MBA
    Data Analytics Technology Proves Benefits of an MBA
    9 Min Read
    data-driven image seo
    Data Analytics Helps Marketers Substantially Boost Image SEO
    8 Min Read
    construction analytics
    5 Benefits of Analytics to Manage Commercial Construction
    5 Min Read
  • Big Data
  • BI
  • Exclusive
  • IT
  • Marketing
  • Software
Search
© 2008-23 SmartData Collective. All Rights Reserved.
Reading: The Problem with the Relational Database (Part 2) – Predictability
Share
Notification Show More
Latest News
data analytics in sports industry
Here’s How Data Analytics In Sports Is Changing The Game
Big Data
data analytics on nursing career
Advances in Data Analytics Are Rapidly Transforming Nursing
Analytics
data analytics reveals the benefits of MBA
Data Analytics Technology Proves Benefits of an MBA
Analytics
anti-spoofing tips
Anti-Spoofing is Crucial for Data-Driven Businesses
Security
ai in software development
3 AI-Based Strategies to Develop Software in Uncertain Times
Software
Aa
SmartData Collective
Aa
Search
  • About
  • Help
  • Privacy
Follow US
© 2008-23 SmartData Collective. All Rights Reserved.
SmartData Collective > Analytics > Predictive Analytics > The Problem with the Relational Database (Part 2) – Predictability
Predictive Analytics

The Problem with the Relational Database (Part 2) – Predictability

TonyBain
Last updated: 2009/05/25 at 12:53 AM
TonyBain
9 Min Read
SHARE
  • Introduction
  • The Problem with the Relational Database (Part 1 ) –The Deployment Model

I met with a friend of mine in New York recently who runs a credit card processing end-point company. They specifically built their business around a non-relational database platform and feel they would have major issues had they chosen to build their business around a traditional relational database.

Contents
Cost-Based OptimizationExample:

In this business they process hundreds of transactions a second, this means that hundreds of people every second are standing somewhere in the world at a checkout making a purchase. And if the transaction is delayed, a lot of people are left standing and waiting. Worse still, if the transaction is delayed too long, the transaction fails and the customer not only has to retry their transaction, they are likely embarrassed and probably somewhat annoyed.

The relational database certainly has the capability to scale to many hundreds of transactions a second. But the reasons for their move away from the relational database aren’t specifically scale related but instead come down to predictability and consistency…

More Read

predictive analytics helps Albanian bitcoin investors

Albanian Bitcoin Investors Tap the Power of Predictive Analytics

Predictive Analytics Improves Trading Decisions as Euro Rebounds
Can Predictive Analytics Help Traders Navigate Bitcoin’s Volatility?
Perks of Predictive Analytics for Businesses Big and Small
How can CIOs Build Business Value with Business Analytics?

  • Introduction
  • The Problem with the Relational Database (Part 1 ) –The Deployment Model

I met with a friend of mine in New York recently who runs a credit card processing end-point company. They specifically built their business around a non-relational database platform and feel they would have major issues had they chosen to build their business around a traditional relational database.

In this business they process hundreds of transactions a second, this means that hundreds of people every second are standing somewhere in the world at a checkout making a purchase. And if the transaction is delayed, a lot of people are left standing and waiting. Worse still, if the transaction is delayed too long, the transaction fails and the customer not only has to retry their transaction, they are likely embarrassed and probably somewhat annoyed.

The relational database certainly has the capability to scale to many hundreds of transactions a second. But the reasons for their move away from the relational database aren’t specifically scale related but instead come down to predictability and consistency.

Relational databases are not predictable or reliable in terms of consistent performance for a number of reasons.

Firstly, every query uses a different amount of resources. A query could consume 1 or 2 I/O’s or 1 or 2 million I/O’s depending on how the query is written, what data is selected and factors such as how the database is indexed. Performance is further varied by how the database is maintained (fragmentation). What makes matters more complex is that different predicate values for a query can have vastly given data distributions. The same query executed with different constants can have vastly different resource requirements.

Because every query has a different “footprint,” running a query in isolation does not provide indicative statistics on how that query will perform under concurrent load. In fact is become impossible to predict the exact execution duration of a relational database query as its performance will be dependent on what else is being executed at the exact moment it is.

Cost-Based Optimization

Because queries can have complex execution steps behind them with multiple paths to achieving the result, one neat thing that was added to the relational database was cost-based optimization. This takes the control of “how” a query is resolved away from the developer and instead the relational database server works this out at runtime using a complex series of algorithms. This was hugely important for simplifying the use of relational databases. But it is bad for predictability.

Cost-based optimization can get it wrong. It is much less common now than ten years ago, but every DBA has still come across cases when the optimizer chooses the wrong “query plan” and the query execution takes a long time for seemingly no reason. But worse for predictability is that cost-based optimization is dynamic and looks at a lot of factors at run time to determine which plan is best. These factors can change meaning that query performance can change (often without any external schema/code modifications). This dynamic ability can cause headaches to DBAs who are often tasked with resolving performance issues when suddenly the database is “going slow”.

Example:

Ok let’s take a real world look at this problem. I have gathered some stats from a real, production (SQL Server) relational database. This database is around 500GB in size and processes ~700 transactions per second 24 x 7.  This data is just a couple of minutes of captured activity.

The first graph I am showing here is the # of queries based on resource usage. To simplify things resource usage in my example = CPU TIME + # READS + # WRITES. 

Because queries in relational databases can have vastly different resource usage profiles, we get a very wide distribution of resource usage by query (note the logarithmic scale).
Queriesbyresourceusage  

We can see that most of the queries executing on this database use <50 “resources” (CPU + DISK) but at the same time a smaller number of queries can use anything up to 22750 x this. That is a massive distribution. 

Problems with predictability can start to occur when a number of high resource queries randomly end up running concurrently. This restricts resources available to other queries and therefore the execution times of other queries execute at that time are affected.

We can see this by looking at the execution duration of ONLY the queries that have used 50 resources (CPU+DISK) or less. They have a similar resource usage profile, so in theory they should all have a similar execution duration, right?
Querydurationlessthan50resources

Unfortunately no. This graph is showing the execution times for only the queries that used 50 resources or less. While we can see that most of these queries executed in 100ms or less, many of these queries took a lot longer than this. In fact there was a wide distribution up to 5297 x this base execution time (529 seconds). All these queries used roughly the same amount of server resources yet their execution times were massively different.

The reasons for this could be many. Firstly, it could be resource contention.  Many of the queries taking longer could be doing so because they are being executed at the same time as a number of high resource queries. Or alternatively, it could be because of resource concurrency issues. These queries may be delayed because they are waiting for blocking queries to complete. Or it could be because of internal database management issues (indexes updating), or it could be due to concurrent disk issues (log backups, check-pointing) and so on and so on.

These are in fact all issues that DBAs deal with and investigate on a daily basis. The best you can do is to try and optimize as many of the reoccurring queries as possible. However, in doing this it still isn’t possible to accurately predict performance at run time, or ensure consistency of execution time at run time. This is bad news when you are trying to use the relational database as a platform for applications where consistency of performance is paramount.

Reblog this post [with Zemanta]


Link to original postInnovations in information management

TonyBain May 25, 2009
Share this Article
Facebook Twitter Pinterest LinkedIn
Share

Follow us on Facebook

Latest News

data analytics in sports industry
Here’s How Data Analytics In Sports Is Changing The Game
Big Data
data analytics on nursing career
Advances in Data Analytics Are Rapidly Transforming Nursing
Analytics
data analytics reveals the benefits of MBA
Data Analytics Technology Proves Benefits of an MBA
Analytics
anti-spoofing tips
Anti-Spoofing is Crucial for Data-Driven Businesses
Security

Stay Connected

1.2k Followers Like
33.7k Followers Follow
222 Followers Pin

You Might also Like

predictive analytics helps Albanian bitcoin investors
Blockchain

Albanian Bitcoin Investors Tap the Power of Predictive Analytics

9 Min Read
benefits of data analytics for financial management
Predictive Analytics

Predictive Analytics Improves Trading Decisions as Euro Rebounds

10 Min Read
predictive analytics can help bitcoin traders predict future price movements
Blockchain

Can Predictive Analytics Help Traders Navigate Bitcoin’s Volatility?

8 Min Read
predictive analytics
Predictive Analytics

Perks of Predictive Analytics for Businesses Big and Small

6 Min Read

SmartData Collective is one of the largest & trusted community covering technical content about Big Data, BI, Cloud, Analytics, Artificial Intelligence, IoT & more.

data-driven web design
5 Great Tips for Using Data Analytics for Website UX
Big Data
ai in ecommerce
Artificial Intelligence for eCommerce: A Closer Look
Artificial Intelligence

Quick Link

  • About
  • Contact
  • Privacy
Follow US

© 2008-23 SmartData Collective. All Rights Reserved.

Removed from reading list

Undo
Go to mobile version
Welcome Back!

Sign in to your account

Lost your password?