Analytics, Big Data, Inside Companies, Modeling, Predictive Analytics

First Look: Via Science

James Taylor
Last updated: 2013/06/15 at 2:04 AM
6 Min Read

I got a chance to catch up with Via Science recently. Via Science is focused on “big math” for “big data.” They see big data companies spanning data collection to search/storage to analysis/visualization. Big Math, to Via Science, is “the use of leading edge mathematics on very large supercomputing platforms to make better predictions, recommendations, and explanatory models of how the world works directly from data,” and this is where they see themselves. Math, they say, is what makes data useful: it turns GPS data into directions and purchase history into recommendations, and it delivers scale. Big Math skills are, however, not widespread in businesses. Given the volume of data and this shortage of skills, business people need tools to make data useful. Via Science applies Bayesian statistics and Bayesian networks on a big computing platform (a supercomputer) to process both big data and small data.
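For reference, here is what Bayesian updating looks like at its smallest scale (an illustrative Python sketch, not Via Science's code): because the posterior is a full distribution rather than a point estimate, the same machinery is meaningful on twenty observations or twenty million.

    # Minimal beta-binomial example of Bayesian updating.
    # Generic illustration only -- not Via Science's platform.
    from scipy import stats

    alpha_prior, beta_prior = 2, 2          # weak prior centered near 0.5
    successes, trials = 7, 20               # "small data" works fine

    # Conjugate update: Beta(a + successes, b + failures)
    posterior = stats.beta(alpha_prior + successes,
                           beta_prior + (trials - successes))

    print(f"posterior mean: {posterior.mean():.3f}")
    print(f"95% credible interval: {posterior.interval(0.95)}")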

Today Via Science builds models for clients, and the results are deployed as generated Java code, embedded both in operational decision management systems and in more traditional decision support systems. They have subsidiaries dedicated to healthcare and quantitative trading customers, and they are now expanding beyond those sectors into retail, CPG, energy, and telecommunications.

They have a strong focus on cause-and-effect relationships, which they use to make predictions, detect anomalies, and build explanatory models. Even when causation is not essential, the approach provides rich information about correlation and improves accuracy. They claim three distinct differentiators:

  • Handle big data and small data
Big data is generally long: lots of records. But many real-world problems involve wide data: lots of columns and relatively few rows. Long data is hard to process because you might have hundreds of millions of records that reflect what has happened and must be analyzed to predict what might happen next. Wide data is hard because you might have tens of thousands of variables for each record; this is hard to process even with only a few thousand rows, because traditional frequentist statistics requires many more rows than columns to make accurate predictions (see the sketch after this list).
  • Causality, describing the way data interacts and leads to an outcome.
The network of causality, how the variables interact and in what “direction,” lets you ask why something happened (looking back) and what would happen if something changed (looking forward).
  • Handling volatility and uncertainty in data.
Standard statistical approaches work well for normal distributions and for historical trends that continue; their approach can detect events or make predictions outside the bounds of historical analysis.
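To make the wide-data point concrete, here is an illustrative sketch (generic numpy, not Via Science's method) of why frequentist fitting breaks when columns outnumber rows, while a Bayesian prior, shown here in its ridge-regression form, still yields stable estimates:

    # Why wide data (many more columns than rows) breaks ordinary fitting.
    # Generic demonstration -- not Via Science's algorithm.
    import numpy as np

    rng = np.random.default_rng(0)
    n, p = 50, 2000                           # 50 rows, 2,000 columns
    X = rng.normal(size=(n, p))
    true_w = np.zeros(p)
    true_w[:5] = [2.0, -1.5, 1.0, 0.5, -0.5]  # only a few variables matter
    y = X @ true_w + rng.normal(scale=0.1, size=n)

    # Ordinary least squares needs X'X to be invertible, but its rank is
    # at most n = 50, far below p = 2000: the problem is underdetermined.
    XtX = X.T @ X
    print("rank of X'X:", np.linalg.matrix_rank(XtX))

    # A Gaussian prior on the weights (the Bayesian reading of ridge
    # regression) makes the posterior mean well defined even when p >> n.
    lam = 1.0
    w_post = np.linalg.solve(XtX + lam * np.eye(p), X.T @ y)
    print("recovered leading weights:", np.round(w_post[:5], 2))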

The math behind this, Bayesian statistics and Bayesian networks, is well understood; the challenge is scaling it. The basis for this is the work of Judea Pearl, who developed a branch of mathematics for determining cause-and-effect relationships from data. Via Science's hosted platform, REFS™, applies Pearl's work and Bayesian networks at massive scale in three steps (a simplified sketch of the whole pipeline follows the list):

Enumeration, where a very large number of candidate networks is developed and evaluated to see whether each can explain part of the picture. Fragmentary networks are created from input, output, and state variables, using many different possible mathematical relationship types, and a probability score for each fragment is calculated using the full distribution of the data.
Optimization combines these fragments into an optimal assembly. An ensemble Bayesian network is created using a Monte Carlo algorithm: networks are initially selected at random, and random changes are made to the model at each stage of the optimization to prevent local maxima from becoming a problem. Large numbers of networks are generated, each with a different result and a weight reflecting the likelihood that it is correct. Very large numbers of networks might be generated and perhaps the best 1,000 selected; these are combined to produce a weighted mean and standard deviation, which is what the deployable code is generated from.
Finally, simulation is run on the ensemble of models to see how much impact a change to each variable has. This gives you a sense of the causal impact of each variable; this testing and simulation is done specifically for each model.
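The pattern of those three steps can be shown in miniature. The sketch below is a deliberately simplified analogue, not the REFS™ algorithm: it enumerates linear-model “fragments” scored by BIC in place of Bayesian network fragments scored on a supercomputer, keeps the best-scoring candidates in place of the Monte Carlo search, and perturbs one variable at a time as a crude stand-in for causal simulation.

    # Toy analogue of enumerate -> optimize -> simulate. NOT REFS(TM).
    import itertools
    import numpy as np

    rng = np.random.default_rng(1)
    n, p = 200, 6
    X = rng.normal(size=(n, p))
    y = 3 * X[:, 0] - 2 * X[:, 2] + rng.normal(scale=0.5, size=n)

    def fit_and_score(cols):
        """Least-squares fit on a column subset; returns (weights, BIC)."""
        A = X[:, list(cols)]
        w, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ w
        bic = n * np.log(max(resid @ resid / n, 1e-12)) + len(cols) * np.log(n)
        return w, bic

    # 1. Enumeration: score many small candidate "fragments".
    candidates = [c for k in (1, 2, 3)
                  for c in itertools.combinations(range(p), k)]
    scored = {c: fit_and_score(c) for c in candidates}

    # 2. Optimization: keep the 10 best fragments (standing in for the
    #    Monte Carlo search) and weight them by relative score.
    best = sorted(scored, key=lambda c: scored[c][1])[:10]
    bics = np.array([scored[c][1] for c in best])
    weights = np.exp(-0.5 * (bics - bics.min()))
    weights /= weights.sum()

    def ensemble_predict(x):
        """Weighted mean and std of predictions across the ensemble."""
        preds = np.array([scored[c][0] @ x[list(c)] for c in best])
        mean = weights @ preds
        return mean, np.sqrt(weights @ (preds - mean) ** 2)

    # 3. Simulation: nudge each variable and watch the ensemble response,
    #    a crude analogue of per-variable causal impact.
    base = ensemble_predict(np.zeros(p))[0]
    for j in range(p):
        x1 = np.zeros(p)
        x1[j] = 1.0
        print(f"variable {j}: impact {ensemble_predict(x1)[0] - base:+.2f}")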

This runs both on Amazon EC2 and on their own supercomputer (130,000+ CPU cores). The resulting models are executable locally. You can find more information about Via Science at www.viascience.com.

Copyright © 2013 http://jtonedm.com James Taylor

TAGGED: Big Math, Via Science