Copyright © 2009 James Taylor. Visit the original article at First Look – Truviso.
I got a second chance to chat with the folks at Truviso recently. Truviso was founded after a professor and his PhD student at Berkeley went back to the fundamentals of data management and predicted that, in a world of highly interconnected objects, it would be necessary to eliminate the batch-centric database process of “store first, query later”. Truviso is the result, focused on providing analysis in real time and continuously. The system is based on PostgreSQL, which allows stream processing to be integrated with traditional queries.
Truviso takes the position that this ability to process data without storing it first is critical. Not only is data warehouse volume growing even faster than Moore’s law, at up to 173% per year according to published research by Richard Winter, but what Truviso calls “Net-centric” companies (ones that live in and benefit from the network) have data volumes growing at 300-1000% per year. Some of these companies are handling terabytes of data each day. Truviso believes it has reinvented data management and analysis for high data growth, data-intensive businesses. By basing the product on an open source platform, they hope to deliver revolutionary technology within an evolutionary approach. The product has three main pieces:
- Core continuous query processing engine
Built inside PostgreSQL, so it can be used just like a database, e.g. to support queries.
- Java integration platform
A container for connectors: plumbing such as listening to a feed or getting data. Cisco routers, for instance, can be queried for a chunk of data that describes what has happened recently.
- Flex-based visualization environment
They built their own dashboard because most commercial ones use polling and this is inefficient for a stream processing engine. They used Adobe Flex because it had a nice way to handle data updates and offers a portable, user-friendly experience.
When streams of data come in, they are processed by queries before being persisted. Streams are defined in (mostly) standard SQL; Truviso adds a small statement to tell the engine to handle streaming data. This WINDOW statement tells the engine how to analyze the stream of data coming in. A stream is an unbounded sequence of records, and the window operators turn these streams into pseudo-tables. For instance, you could specify “VISIBLE 5 sec ADVANCE 5 sec” to get 5-second non-overlapping windows in your “table”, or “LANDMARK ADVANCE 2 sec” to get a window with a fixed start that is updated every 2 seconds. Windows can use properties of the data, such as a group by, and can limit the table size by time or by number of rows. All the results can be stored by channeling them into a persistent table.
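To make the two window examples concrete, here is a minimal Python sketch of how such windows could carve up a stream. It is not Truviso's implementation or syntax: the `windows` function, its parameters, and the sample records are hypothetical, and it replays a finite, already-sorted list rather than an unbounded stream. I have also assumed half-open windows, where a record stamped exactly at the closing time belongs to the next window.

```python
def windows(records, advance, visible=None):
    """Yield (close_time, rows) for each window over a finite stream.

    records: list of (timestamp_seconds, value) pairs, sorted by timestamp.
    advance: how often a window is emitted, in seconds.
    visible: how far back each window looks, in seconds. None means a
             landmark window that always starts at time 0 (an assumption).
    """
    if not records:
        return
    last_ts = records[-1][0]
    close = advance
    # Keep emitting windows until one has covered the last record.
    while close - advance <= last_ts:
        start = 0 if visible is None else close - visible
        rows = [value for ts, value in records if start <= ts < close]
        yield close, rows
        close += advance


# Hypothetical sample stream: (timestamp, value)
records = [(0, "a"), (1, "b"), (4, "c"), (6, "d"), (9, "e")]

# Analogous to "VISIBLE 5 sec ADVANCE 5 sec": non-overlapping windows.
tumbling = list(windows(records, advance=5, visible=5))
# -> [(5, ['a', 'b', 'c']), (10, ['d', 'e'])]

# Analogous to "LANDMARK ADVANCE 2 sec": fixed start, growing window.
landmark = list(windows(records, advance=2))
```

In the landmark case each successive window contains everything seen since the start, so the final window holds all five records.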
The engine runs queries all the time, so it needs a continuous query optimization framework, which the company built. When new queries are defined, Truviso automatically folds them into the existing plan to build an overall plan. By reusing elements of queries already being run, they can achieve super-linear query scalability, in some ways similar to the way the Rete network manages this for rules. The stream processing engine that combines the queries is very fast, handling many hundreds of thousands of rows per second.
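The idea of folding queries into one overall plan can be illustrated with a toy Python sketch. This is only my own illustration of sharing work between queries, not Truviso's optimizer: the `SharedPlan` class, its methods, and the filter names are all hypothetical, and sharing is decided simply by giving two queries the same filter name.

```python
class SharedPlan:
    """Toy shared plan: queries that use the same filter share one evaluation."""

    def __init__(self):
        self.filters = {}     # filter name -> predicate function
        self.queries = {}     # query name -> (filter name, list of results)
        self.evaluations = 0  # number of predicate calls made so far

    def add_query(self, query_name, filter_name, predicate):
        # Fold the new query into the plan: if a filter with this name is
        # already running, reuse it instead of registering a second copy.
        self.filters.setdefault(filter_name, predicate)
        self.queries[query_name] = (filter_name, [])

    def push(self, record):
        # Evaluate each distinct filter once per record...
        passed = {}
        for name, predicate in self.filters.items():
            passed[name] = predicate(record)
            self.evaluations += 1
        # ...then fan the outcome out to every query that depends on it.
        for filter_name, results in self.queries.values():
            if passed[filter_name]:
                results.append(record)


plan = SharedPlan()
plan.add_query("q1", "big", lambda r: r > 10)
plan.add_query("q2", "big", lambda r: r > 10)  # shares the "big" filter with q1
plan.add_query("q3", "even", lambda r: r % 2 == 0)

for record in [4, 12, 15]:
    plan.push(record)
```

Three queries over three records cost six predicate evaluations here rather than nine, because `q1` and `q2` share a single filter.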
Because the streams are handled using an extension to SQL, queries can hit streams and tables in combination. This allows customers to take existing queries and reports and rapidly re-implement them against streams. Indeed, this is one of the primary use cases for their early adopters. They also provide a “time-travel”, Tivo-like interface for analysis. While interest in streaming data analysis, and efforts to address it, have been increasing, Truviso’s approach of leveraging standard SQL allows streaming data to be combined with staged/tabular data. Their focus on delivering massive performance scalability for businesses not necessarily doing “real-time” is also notable. Truviso’s approach is well worth a look when considering the data processing and analysis needs of a data-intensive business.
Truviso’s core use cases are around continuous analysis (though not necessarily real-time), scalability, and headroom for data growth. These matter in particular to companies in the business of delivering digital services, where effective and timely use of data translates into direct business and customer benefits.
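As a rough picture of what combining a stream with a table means, the following Python sketch enriches each incoming record with a row from a stored lookup table. It is an assumption-laden illustration rather than Truviso SQL: the `join_stream_with_table` function, the `customers` table, and the `clicks` stream are all made up for the example.

```python
def join_stream_with_table(stream, table, key):
    """Yield each stream record merged with its matching table row, if any.

    stream: iterable of dict records arriving over time.
    table:  dict mapping a key value to a dict of stored columns.
    key:    name of the field in each record used to look up the table.
    """
    for record in stream:
        row = table.get(record[key])
        if row is not None:  # inner-join behavior: unmatched records are dropped
            yield {**record, **row}


# Hypothetical stored table and incoming stream
customers = {1: {"name": "Acme"}, 2: {"name": "Globex"}}
clicks = [
    {"customer_id": 1, "url": "/a"},
    {"customer_id": 3, "url": "/b"},  # no matching customer row
]

enriched = list(join_stream_with_table(clicks, customers, "customer_id"))
```

Because the function is a generator, records are joined one at a time as they arrive, without first being collected into a table of their own.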