The problem with the RDBMS (Part 3) – Let’s Get Real

TonyBain
Last updated: 2009/06/15 at 1:39 AM

The Passage of Time (image by ToniVC via Flickr)

  • Introduction
  • The Problem with the Relational Database (Part 1) – The Deployment Model
  • The Problem with the Relational Database (Part 2) – Predictability

    The two primary trends in data management that have been happening for as long as I can remember are:

    1. The volume of data we expect to produce and consume is growing rapidly
    2. The expected delay between data production and consumption is decreasing rapidly

    We have seen ‘typical’ database volumes grow from MB through GB to the point where TB databases are common and PB databases are the “big guys”. At the same time, expectations around the timeliness of response from these databases have also changed. What used to be a monthly report became weekly, then daily, and now it is not uncommon to have near real-time expectations for data retrieval and analysis. We have been on a continual path towards the point where data is consumed at the moment it is created, either in raw form or in an aggregated or otherwise processed state.

    At the other end of the application stack, our ability to move more data around faster has led to new styles of applications that give users near-immediate access to data as it is created. Popular consumer web examples of such applications include Facebook, Twitter and FriendFeed.

    But at the moment these applications aren’t real time; they are near real time. There is a delay of some form between data creation and consumption, which may be very short or several minutes long depending on the particular application and its current workload. These delays may seem irrelevant for the apps mentioned above, but the difference between “near real-time” and “real-time” can have a significant impact on application functionality. I am sure we have all been frustrated when checking in at the airport and choosing a seat, only to get “sorry, that seat is no longer available” after clicking the OK button to confirm the selection.

    The Problem with the RDBMS

    The problem with the traditional RDBMS is that it is not a real-time system. It is poll based: a query is constructed and submitted, and the results are returned to the application. This may happen very quickly, perhaps taking only a few ms to execute and return a resultset. The problem, of course, is that the data is only “valid” for the exact moment when the query was executed. From that moment onwards the data becomes stale, and numerous changes could be made to the data within the RDBMS while the extracted resultset is being processed.

    NOTE: Yes, I am aware that the disconnected approach is the modern one and that a server-side cursor approach used to be common. We moved away from server-side results processing for scalability reasons, but even with server-side resultset processing you weren’t automatically notified when the data changed.

    Using my example above, while I am deciding whether I want a window or an aisle seat, or whether it is better to have a middle seat at the front of the plane or an aisle seat at the back, the underlying data set could be receiving numerous updates. When I finally make my selection the dataset could be completely invalid, requiring me to start the whole process again.
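
    The seat scenario can be sketched in a few lines. This is only an illustration under my own assumptions: the seats table, its columns and the helper functions are hypothetical, and an in-memory SQLite database stands in for whatever RDBMS an airline would actually use.

```python
import sqlite3

# Hypothetical schema: one row per seat, taken = 0 (free) or 1 (booked).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE seats (seat TEXT PRIMARY KEY, taken INTEGER NOT NULL)")
conn.executemany("INSERT INTO seats VALUES (?, 0)", [("1A",), ("1B",), ("1C",)])
conn.commit()


def available_seats(db):
    """Poll the database once and return a snapshot of the free seats."""
    rows = db.execute("SELECT seat FROM seats WHERE taken = 0 ORDER BY seat")
    return [row[0] for row in rows]


def book(db, seat):
    """Try to claim a seat; succeeds only if the seat is still free right now."""
    cur = db.execute(
        "UPDATE seats SET taken = 1 WHERE seat = ? AND taken = 0", (seat,)
    )
    db.commit()
    return cur.rowcount == 1


# Passenger A polls and is shown every seat as free.
snapshot = available_seats(conn)

# While A deliberates, passenger B books seat 1A.
b_ok = book(conn, "1A")

# A now picks from the stale snapshot and is rejected.
a_ok = book(conn, snapshot[0])

# Only a fresh poll reveals what is really available.
fresh = available_seats(conn)

print("A saw:", snapshot)
print("B booked 1A:", b_ok)
print("A booked", snapshot[0], ":", a_ok)
print("Actually free:", fresh)
```

    Nothing told passenger A that the snapshot had gone stale; the only way to find out was to submit another query.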

    While this is a very simplistic example, the issue is that the trend towards real time in the user experience layer is not supported by the current mechanisms for interfacing with an RDBMS. While we are seeing AJAX and similar techniques used to provide an interface which can update data in real time, underneath, that data is likely still being collected by polled queries running intermittently.

    Real time & Efficiency

    One solution to this problem may be simply to run our polling cycles at such a high rate that the difference between real time and near real time becomes indistinguishable. This is possible, but of course it comes at a high cost in terms of scalability.

    Let me use a fictitious example to highlight this. Imagine a Twitter-like messaging system. To give its users a real-time-like experience, the system sets a 2 second polling cycle for all client update queries.

    For the purpose of this example, let us assume that we have 1 million users. Those 1 million users have different usage profiles; for this example let us assume that:

    • 50% of users get 1 message a day
    • 20% of users get 10 messages a day
    • 15% of users get 30 messages a day
    • 10% of users get 200 messages a day
    • 4% of users get 1000 messages a day
    • 1% of users get 5000 messages a day

    Ok, a couple more assumptions:

    • To poll and retrieve an empty result requires 5 “resources” (CPU, DISK, NETWORK)
    • To poll and retrieve a message requires 50 “resources” (CPU, DISK, NETWORK)

    Now let’s compare a system which polls the database every 2 seconds with an alternative system in which messages are pushed from the database to the client as they are created.

    % User Base   Replies per day    Poll Resources   Push Resources   Push % of Poll
             50                 1   108,025,000,000       25,000,000             0.0%
             20                10    43,300,000,000      100,000,000             0.2%
             15                30    32,625,000,000      225,000,000             0.7%
             10               200    22,600,000,000    1,000,000,000             4.4%
              4              1000    10,640,000,000    2,000,000,000            18.8%
              1              5000     4,660,000,000    2,500,000,000            53.6%
            100                     221,850,000,000    5,850,000,000             2.6%

    With the above distribution, a 2 second polling cycle has a resource requirement roughly 38x that of a push-based database. This overhead is obviously a major cost and a significant limitation on the upper level of scalability possible.
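
    For anyone who wants to check the arithmetic, here is a small sketch that recomputes the figures. The constant and function names are mine. It assumes that every user polls once every 2 seconds all day at 5 resources per poll, and that each delivered message costs 50 resources on top of that, which is the reading that reproduces the numbers in the table.

```python
SECONDS_PER_DAY = 24 * 60 * 60
USERS = 1_000_000
POLL_INTERVAL_S = 2      # polling cycle from the example
EMPTY_POLL_COST = 5      # "resources" per poll
MESSAGE_COST = 50        # "resources" per delivered message

# (share of user base in %, messages per user per day)
PROFILES = [(50, 1), (20, 10), (15, 30), (10, 200), (4, 1000), (1, 5000)]


def daily_costs(share_pct, msgs_per_day, interval_s=POLL_INTERVAL_S):
    """Return (poll_cost, push_cost) per day for one slice of the user base."""
    users = USERS * share_pct // 100
    polls_per_user = SECONDS_PER_DAY // interval_s
    # Push only pays when a message actually exists.
    push = users * msgs_per_day * MESSAGE_COST
    # Polling pays for every cycle, plus the cost of each message retrieved.
    poll = users * polls_per_user * EMPTY_POLL_COST + push
    return poll, push


rows = [daily_costs(share, msgs) for share, msgs in PROFILES]
total_poll = sum(poll for poll, _ in rows)
total_push = sum(push for _, push in rows)

for (share, msgs), (poll, push) in zip(PROFILES, rows):
    print(f"{share:>3}% {msgs:>5} {poll:>16,} {push:>14,} {100 * push / poll:5.1f}%")
print(f"total      {total_poll:>16,} {total_push:>14,} {100 * total_push / total_poll:5.1f}%")
print(f"poll / push = {total_poll / total_push:.1f}x")
```

    The totals come to 221,850,000,000 resources for polling against 5,850,000,000 for push, a ratio of roughly 38x. Passing a different interval_s to daily_costs shows how quickly the polling cost moves with the cycle length.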

    So What to Do?

    I will address the resolution path for the limitations of the RDBMS properly when I complete this series in my summing-up post. However, specific to this issue, there are a couple of things happening which you should be aware of.

    Firstly, traditional RDBMS vendors are trying to shoehorn some form of push-based result notification into existing database platforms. For example, SQL Server 2005 and above has query notifications, and Oracle and MySQL have something similar (please post in the comments). Current implementations are rudimentary and not suitable for large-scale deployment (they are meant more as a global cache “refresh” event than a user-specific resultset update).

    Also worth watching: a couple of startups have identified the real-time trend that is happening in Silicon Valley, and have also identified that existing RDBMSs aren’t going to be able to support this trend in their current form. They are focusing on re-architecting the RDBMS to be push based rather than pull based. GroovyCorp, with its SQL Switch product, is an organization that I have been speaking to recently. Groovy is the furthest down this particular road that I am aware of, with a real-time push-based RDBMS being launched next month.
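
    To make the push versus pull distinction concrete, here is a toy in-process sketch. It is not the API of SQL Server query notifications, SQL Switch or any other product; the MessageStore class and its methods are hypothetical and exist only to show where the work happens in each model.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List


class MessageStore:
    """Toy in-memory stand-in for a message table (hypothetical, not a real RDBMS)."""

    def __init__(self) -> None:
        self._inbox: DefaultDict[str, List[str]] = defaultdict(list)
        self._subscribers: DefaultDict[str, List[Callable[[str], None]]] = defaultdict(list)
        self.queries_served = 0

    def poll(self, user: str) -> List[str]:
        """Pull model: every call costs a query, whether or not anything is new."""
        self.queries_served += 1
        pending, self._inbox[user] = self._inbox[user], []
        return pending

    def subscribe(self, user: str, callback: Callable[[str], None]) -> None:
        """Push model: register interest once and wait to be called."""
        self._subscribers[user].append(callback)

    def insert(self, user: str, text: str) -> None:
        """Deliver to subscribers immediately, or queue for the next poll."""
        callbacks = self._subscribers.get(user)
        if callbacks:
            for callback in callbacks:
                callback(text)
        else:
            self._inbox[user].append(text)


store = MessageStore()

# Pull: alice polls ten times to pick up a single message.
polled: List[str] = []
for tick in range(10):
    if tick == 7:
        store.insert("alice", "hello")
    polled.extend(store.poll("alice"))

# Push: bob subscribes once and receives the message on insert.
pushed: List[str] = []
store.subscribe("bob", pushed.append)
store.insert("bob", "hello")

print("queries served for alice:", store.queries_served)
print("polled:", polled, "pushed:", pushed)
```

    Both users end up with the same message, but the pull client issued ten queries to get it while the push client issued none.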
     
