Business Intelligence / Predictive Analytics

Open Source Analytics Reaches Main Street (and Some Other Trends in Analytics)

Editor SDC
Last updated: 2009/05/12 at 10:44 PM
8 Min Read

This is the first of three posts about systems, applications, services and architectures for building and deploying analytics. Sometimes this is called analytic infrastructure. This post is primarily directed at the analytic infrastructure needs of companies. Later posts will look at analytic infrastructure for the research community.

In this first post of the series, we discuss five important trends impacting analytic infrastructure.

Trend 1. Open source analytics has reached Main Street. R, which was first released in 1996, is now the most widely deployed open source system for statistical computing. A recent article in the New York Times estimated that over 250,000 individuals use R regularly. Dice News has created a video called “What’s Up with R” to inform job hunters using their services about R. In the language of Geoffrey A. Moore’s book Crossing the Chasm, R has reached “Main Street.”

Some companies still either ban the use of open source software or require an elaborate approval process before open source software can be used. Today, if a company does not allow the use of R, it puts the company at a competitive disadvantage.

Trend 2. The maturing of open, standards-based architectures for analytics. Many of the common applications used today to build statistical models are stand-alone applications designed to be used by a single statistician. It is usually a challenge to deploy the model produced by the application into operational systems. Some applications can express statistical models as C++ or SQL, which makes deployment easier, but it can still be a challenge to transform the data into the format expected by the model.

The Predictive Model Markup Language (PMML) is an XML language for expressing statistical and data mining models that was developed to provide an application-independent and platform-independent mechanism for importing and exporting models. PMML has become the dominant standard for statistical and data mining models. Many applications now support PMML.

By using these applications, it is possible to build an open, modular, standards-based environment for analytics. With this type of open analytic environment, it is quicker and less labor-intensive to deploy new analytic models and to refresh currently deployed models.
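
To make this concrete, here is a minimal sketch, added for illustration rather than taken from the post, that inspects a PMML file using only Python's standard library. The file name model.pmml is an assumption; the DataDictionary and DataField elements it looks for are part of the PMML standard.

    # Inspect the data dictionary of a PMML model file using only the
    # Python standard library. The file name "model.pmml" is a placeholder
    # for a model exported from any PMML-compliant tool.
    import xml.etree.ElementTree as ET

    def local_name(tag):
        # Drop the XML namespace, e.g. "{http://www.dmg.org/PMML-4_1}DataField" -> "DataField"
        return tag.split("}", 1)[-1]

    root = ET.parse("model.pmml").getroot()

    # The DataDictionary declares every field the model expects, with its type.
    for elem in root.iter():
        if local_name(elem.tag) == "DataField":
            print(elem.get("name"), elem.get("optype"), elem.get("dataType"))

    # The top-level model element (TreeModel, RegressionModel, ...) identifies
    # the kind of model being exchanged.
    for child in root:
        if local_name(child.tag).endswith("Model"):
            print("model type:", local_name(child.tag))

Because the model travels as plain XML, the application that scores it does not need to be the application that trained it.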

Disclaimer: I’m one of the many people who have been involved in the development of the PMML standard.

Trend 3. The emergence of systems that simplify the analysis of large datasets. Analyzing large datasets is still very challenging, but with the introduction of Hadoop, there is now an open source system supporting MapReduce that scales to thousands of processors.

The significance of Hadoop and MapReduce is not only the scalability, but also the simplicity. Even programmers with no prior experience can have their first Hadoop job running on a large cluster within a day, and most find it much easier and quicker to use MapReduce and its generalizations than to develop and implement an MPI job, which is currently the most common programming model for clusters.
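
To illustrate how little code a first Hadoop job requires, here is a word-count style mapper and reducer written in Python for Hadoop Streaming; the example is a sketch added here for illustration, not something from the original post.

    # mapper.py -- emit one (word, 1) pair per word read from stdin.
    # Test locally with: cat input.txt | python mapper.py | sort | python reducer.py
    import sys

    for line in sys.stdin:
        for word in line.split():
            print(word + "\t1")

    # reducer.py -- sum the counts for each word. Hadoop Streaming delivers the
    # mapper output sorted by key, so identical words arrive consecutively.
    import sys

    current_word, count = None, 0
    for line in sys.stdin:
        word, value = line.rstrip("\n").split("\t", 1)
        if word != current_word:
            if current_word is not None:
                print(current_word + "\t" + str(count))
            current_word, count = word, 0
        count += int(value)
    if current_word is not None:
        print(current_word + "\t" + str(count))

Hadoop handles the input splitting, distribution, sorting, and fault tolerance; the programmer writes only these two small scripts.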

Trend 4. Cloud-based data services. Over the next several years, cloud-based services will begin to impact analytics significantly. A later post in this series will show how simple it is to use R in a cloud, for example. Although there are security, compliance and policy issues to work out before it becomes common to use clouds for analytics, I expect that these and related issues will be worked out over the next several years.

Cloud-based services provide several advantages for analytics. Perhaps the most important is elastic capacity — if 25 processors are needed for one job for a single hour, then these can be used for just the single hour and no more. This ability of clouds to handle surge capacity is important for many groups that do analytics. With the appropriate surge capacity provided by clouds, modelers can be more productive, and this can be accomplished in many cases without requiring any capital expense. (Third party clouds provide computing capacity that is an operating and not a capital expense.)
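
As a sketch of what elastic capacity looks like in practice, the snippet below rents 25 machines for one job and releases them when it finishes. It uses the boto3 SDK for Amazon EC2, which is not mentioned in the post; the AMI ID, instance type, and job-submission function are placeholders.

    # Rent 25 machines for the duration of one job, then release them.
    # Illustrative sketch using the boto3 SDK; credentials are assumed to be
    # configured in the environment.
    import boto3

    def run_the_job(instance_ids):
        # Placeholder: submit the analytics job to the newly started workers.
        pass

    ec2 = boto3.client("ec2")

    # Start 25 worker instances (placeholder AMI with the analytics stack installed).
    reservation = ec2.run_instances(
        ImageId="ami-0123456789abcdef0",
        InstanceType="m5.large",
        MinCount=25,
        MaxCount=25,
    )
    instance_ids = [i["InstanceId"] for i in reservation["Instances"]]

    try:
        run_the_job(instance_ids)
    finally:
        # Terminate everything when the job is done, so the hour of capacity
        # is an operating expense rather than a capital one.
        ec2.terminate_instances(InstanceIds=instance_ids)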

Trend 5. The commoditization of data. Moore’s law applies not only to CPUs, but also to the chips used in all of the digital devices that produce data. The result is that the cost to produce data has been falling for some time, and the cost to store it has been falling as well.

Indeed, more and more datasets are being offered for free. For example, end-of-day stock quotes from Yahoo, gene sequence data from NCBI, and public data sets hosted by Amazon, including data from the U.S. Census Bureau, are all available for free.

The significance to analytics is that the cost to enrich data with third party data, which often produces better models, is falling. Over time, more and more of this data will be available in clouds, so that the effort to integrate this data into modeling will also decrease.
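
As a small illustration of such enrichment (the file names and the ZIP-code join key below are assumptions made for the example), third-party data is typically joined onto in-house records before a model is trained:

    # Join a free third-party data set onto in-house records before modeling.
    # The file names and the zip_code join key are illustrative assumptions.
    import pandas as pd

    customers = pd.read_csv("customers.csv")        # in-house records, one row per customer
    census = pd.read_csv("census_by_zip.csv")       # free demographic data keyed by ZIP code

    # A left join keeps every customer and adds the census columns where available.
    enriched = customers.merge(census, on="zip_code", how="left")
    enriched.to_csv("customers_enriched.csv", index=False)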

For related posts, see: blog.rgrossman.com

TAGGED: analytic infrastructure, cloud computing, pmml, r