SmartData Collective
© 2008-23 SmartData Collective. All Rights Reserved.
VectorWise

TonyBain
Last updated: 2009/08/01 at 1:27 AM


I was fortunate enough to speak with Marcin Zukowski earlier about VectorWise. If you missed it, VectorWise came out of stealth mode a day or two ago. They have announced a joint partnership with Ingres and essentially are claiming impressive analytic RDBMS performance gains on conventional hardware.

To start with, a key message that needs to be communicated here is that this is not a product announcement. Ingres and VectorWise have announced a partnership in which they of course plan to build products together, but today those products are still in the works.

VectorWise is a spin out of CWI based on research that was undertaken by Marcin and others, research that centered on MonetDB.  Explaining the essence of VectorWise is difficult because it is largely internal DBMS data storage & processing logic, but I will have a go.

The modern RDBMS is based around design principles that stem from general purpose OLTP roots and historical hardware architectures (this is partially true even for some of the newest analytic platforms). These design principles in a nutshell focus on the fact that disk is slow & CPU is fast. Data is located by seeks or partial scans of the disk and then cached. Row-by-row (tuple-by-tuple) operators process that data, passing the output of each operator to the next as part of a query’s execution plan until ultimately producing the result.
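To make the tuple-by-tuple model concrete, here is a minimal sketch in Python. The operator names, the `(order_id, amount)` row layout and the sample data are all hypothetical, chosen only for illustration; no real engine is this simple.

```python
from typing import Iterable, Iterator, Tuple

Row = Tuple[int, float]  # hypothetical (order_id, amount) tuple


def scan(table: Iterable[Row]) -> Iterator[Row]:
    """Leaf operator: hands rows to its parent one at a time."""
    for row in table:
        yield row


def select_amount_over(rows: Iterator[Row], threshold: float) -> Iterator[Row]:
    """Filter operator: one predicate evaluation per row."""
    for row in rows:
        if row[1] > threshold:
            yield row


def sum_amount(rows: Iterator[Row]) -> float:
    """Root operator: pulls rows through the plan and accumulates a total."""
    total = 0.0
    for _, amount in rows:
        total += amount
    return total


# Hypothetical sample data; the plan is scan -> select -> aggregate.
table = [(1, 10.0), (2, 250.0), (3, 75.5), (4, 300.0)]
result = sum_amount(select_amount_over(scan(table), 100.0))
```

Every row crosses every operator boundary individually, so the per-row overhead is paid once per operator per tuple.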

Traditionally I/O is the main bottleneck, so to make the database faster you add more I/O bandwidth. Today, disk requirements may be up to 100x the actual capacity needs, because many disks are necessary to achieve the I/O bandwidth that an analytical RDBMS implementation needs in order to perform. Even though RDBMSs may parallelize query operators across cores, this typically works by partitioning data between cores, and each core is still processing on a tuple-by-tuple basis.

Conventional wisdom? Well, maybe. You see, disk is only really “slow” when it is doing random seeks. Give a disk something sequential to do, on the other hand, and things are very different. Modern disks are able to sequentially scan in the range of 150MB per second. An array of 10 disks should therefore be able to return sequentially read data in the range of 1GB per second.

When it comes to databases, column based storage has been found to effectively structure data for a) high levels of compression and b) sequential access. VectorWise makes use of both of these to help it achieve high levels of sequential I/O. The problem now, however, is that disk may no longer be the bottleneck. While we can get 1GB a second sequentially off disk relatively easily & cheaply, processing tuple-by-tuple at this rate is very difficult. As it turns out, an RDBMS may only achieve a data processing rate of 50MB a second per CPU core. This makes CPU processing limitations a big bottleneck for analytic data sets. Assuming the above figures, we would need over 20 cores to keep up with 10 disks (and of course CPU cores don’t scale linearly).
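The arithmetic behind those figures can be checked in a few lines. This sketch uses only the rates quoted above and assumes 1GB = 1000MB and perfectly linear scaling, which is optimistic.

```python
import math

# Figures quoted in the article.
DISK_SEQ_MB_S = 150    # sequential scan rate of one disk
NUM_DISKS = 10
CORE_TUPLE_MB_S = 50   # tuple-by-tuple processing rate of one core

# Assumption: 1GB = 1000MB, and both disks and cores scale linearly.
array_mb_s = DISK_SEQ_MB_S * NUM_DISKS                      # raw array rate
cores_for_1gb = math.ceil(1000 / CORE_TUPLE_MB_S)           # cores to absorb 1GB/s
cores_for_array = math.ceil(array_mb_s / CORE_TUPLE_MB_S)   # cores for the raw rate

print(array_mb_s, cores_for_1gb, cores_for_array)
```

At the rounded 1GB a second the answer is 20 cores. At the raw 1.5GB a second that ten 150MB-a-second disks add up to, it is 30, which is why "over 20" is a fair summary.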

If we step out of the database world for the moment into the world of high end computer games, or high end scientific processing, we find their use of current CPU technology is much more advanced than what we are used to. They are using newer CPU extensions (MMX, SSE, SSE2, the SSE3 instructions introduced with Prescott, etc) to parallelize & pipeline computation within a CPU’s core, meaning they are processing orders of magnitude more instructions per core than what a traditional RDBMS typically has been able to. The exact details are too low level to discuss here (many of the research papers are available online), but it is fair to say that modern CPU architectures contain advanced features that to date haven’t effectively been exploited by database vendors.

Enter VectorWise. Their aim is to marry storage technologies which allow high levels of sequential I/O to occur with query processing logic which is designed for modern CPU architectures. Rather than process tuple-by-tuple, they are processing “vectors” (groups of tuples), leveraging modern CPU extensions and high levels of on-chip cache to allow the CPU to achieve higher data processing throughput. The result is that instead of the 50MB a second of a tuple-by-tuple approach, VectorWise are able to achieve processing rates in the range of 500MB-1GB a second per core in some situations. This means processing rates of 8GB a second or more could be possible with relatively low end hardware.
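As a rough illustration of the vector-at-a-time idea, here is a sketch in the same style as the tuple-by-tuple model. It is not VectorWise’s actual design: the vector size, the function names and the use of a “selection vector” of qualifying positions are my own assumptions for illustration, and Python cannot show the SIMD or cache effects that make the real thing fast.

```python
from array import array
from typing import Iterator, List

VECTOR_SIZE = 1024  # hypothetical; small enough for a vector to stay in CPU cache


def scan_vectors(column: array, size: int = VECTOR_SIZE) -> Iterator[array]:
    """Leaf operator: hands out one slice (vector) of a column at a time."""
    for start in range(0, len(column), size):
        yield column[start:start + size]


def select_greater_than(vec: array, threshold: float) -> List[int]:
    """Primitive: one tight loop over the vector, returning qualifying positions."""
    return [i for i, value in enumerate(vec) if value > threshold]


def sum_selected(vec: array, selection: List[int]) -> float:
    """Primitive: sum only the positions named by the selection vector."""
    return sum(vec[i] for i in selection)


# Hypothetical sample column; a tiny vector size is used so several vectors flow.
amounts = array("d", [10.0, 250.0, 75.5, 300.0, 120.0])
total = 0.0
for vec in scan_vectors(amounts, size=2):
    selection = select_greater_than(vec, 100.0)
    total += sum_selected(vec, selection)
```

The operator-to-operator overhead is now paid once per vector rather than once per tuple, and each primitive is a simple loop over a contiguous array, which is the kind of code a compiler and CPU can pipeline well.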

“In some situations” is the key point to stress here. This obviously isn’t a blanket gain that applies to all analytic data sets, workloads and query requirements. Just what those situations are, and how well the approach actually applies to real world data sets and queries, will be the key to their technology’s success. I wouldn’t expect to see too many specific examples of this until a product beta appears. But the theory is that VectorWise can offer high levels of processing capability with existing mainstream hardware. At this point VectorWise isn’t even focusing on MPP; instead they are single node focused. If their scalability claims pan out, you can imagine how this could allow a single node solution to be competitive with existing low to mid scale MPP solutions that are based on a more conventional query processing architecture.

This isn’t the only trick up VectorWise’s sleeve. They are also leveraging research around column based storage, compression, piggy-backed (shared) scans and so on. Much of the research that has been adopted by VectorWise is referenced from their web site.
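For readers unfamiliar with shared scans, the general idea is that several concurrent queries ride on a single pass over the data instead of each reading it separately. The sketch below shows that general idea only; the class names and block layout are hypothetical, and it says nothing about how VectorWise schedules its scans.

```python
from typing import List, Sequence


class SumQuery:
    """Hypothetical query: total of all values."""

    def __init__(self) -> None:
        self.result = 0.0

    def consume(self, block: Sequence[float]) -> None:
        self.result += sum(block)


class CountOverQuery:
    """Hypothetical query: number of values above a threshold."""

    def __init__(self, threshold: float) -> None:
        self.threshold = threshold
        self.result = 0

    def consume(self, block: Sequence[float]) -> None:
        self.result += sum(1 for value in block if value > self.threshold)


def shared_scan(blocks: List[List[float]], queries: list) -> int:
    """Read each block once and hand it to every attached query."""
    blocks_read = 0
    for block in blocks:
        blocks_read += 1
        for query in queries:
            query.consume(block)
    return blocks_read


# Hypothetical column stored as three blocks.
blocks = [[10.0, 250.0], [75.5, 300.0], [120.0]]
total, big = SumQuery(), CountOverQuery(100.0)
blocks_read = shared_scan(blocks, [total, big])
```

Two queries are answered with three block reads rather than six, which matters when sequential I/O bandwidth is the resource being shared.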

So VectorWise have impressive technology. Why then partner with Ingres rather than a larger vendor (or go it alone)? Marcin offers a few reasons. Firstly, as academics they feel strongly that open source is cool, so this path was greatly preferred over a relationship with a non-open vendor. Secondly, Ingres will allow them to deliver their technology in an uncompromised fashion. Marcin mentioned that if they had partnered with one of the big three vendors, that vendor’s existing product strategies and investments would likely have meant their ideas could only have been implemented in partial form. Ingres on the other hand is going to allow them more of a green field. And of course, a partnership with Ingres makes sense from a go-to-market perspective, as Ingres already has a worldwide reputation, a global customer base, sales & marketing capabilities etc.

Marcin confirmed that Ingres have an exclusive license to their technology, and first option to acquire them for a certain period of time.  This allows Ingres to really invest in the relationship without the fear of the carpet being pulled out from under them. 

VectorWise clearly are applying innovative research to analytical RDBMS requirements. But as interesting as the technology sounds, the proof of the pudding will be how well these design principles translate to real-world analytical processing requirements in mainstream product form. This remains to be seen, but Ingres and their community clearly have high hopes.

VectorWise is clearly differentiated when compared with a traditional mainstream RDBMS running on mainstream hardware. However, in the current market we have lots of different approaches to the problems described. Kickfire, for example, use their own SQL Chip processor to increase data processing rates, and other appliance vendors are using FPGAs etc for similar purposes. The relative effectiveness of these different approaches still needs to be examined, but a mainstream hardware approach has obvious benefits.



TonyBain August 1, 2009