Prototyping Cloud Analytic Applications

July 27, 2010

Cloud computing is changing the way that companies build and deploy their analytic solutions. With cloud computing, resources are available on demand, scale elastically, and can be self-provisioned. Taking advantage of this flexibility sometimes requires developing new analytic infrastructure and new analytic algorithms, which, in turn, requires some experimentation. This process can usually benefit from an external perspective.

The fastest way forward is to use a public cloud, bring in external experts, and run some quick experiments and prototypes. For many companies, however, this is where a problem arises. It is quite common these days for companies to have policies that prohibit placing proprietary data, or data that can identify customers, on public clouds. Giving third parties access to this data is usually quite difficult as well.

One practical approach is to replace actual data with simulated data and, instead of public clouds, to use private clouds operated by third parties. This requires data simulators that produce realistic data. For example, large data sets are rarely normally distributed; more often they follow power laws or similar heavy-tailed distributions.
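To make the point about distributions concrete, here is a minimal sketch of a simulator that draws heavy-tailed values from a Pareto (power-law) distribution and compares them with normally distributed values. The function names, the "transaction amount" framing, and all parameter values (alpha, minimum, mean, stdev) are assumptions chosen for illustration; they do not describe any particular simulator.

```python
import random
import statistics


def simulate_power_law_amounts(n, alpha=1.5, minimum=10.0, seed=0):
    """Draw n amounts from a Pareto (power-law) distribution.

    alpha and minimum are illustrative assumptions, not fitted values.
    """
    rng = random.Random(seed)
    # paretovariate(alpha) returns values >= 1, so scale by the minimum amount.
    return [minimum * rng.paretovariate(alpha) for _ in range(n)]


def simulate_normal_amounts(n, mean=30.0, stdev=10.0, seed=0):
    """Draw n amounts from a normal distribution, for comparison only."""
    rng = random.Random(seed)
    return [rng.gauss(mean, stdev) for _ in range(n)]


def tail_ratio(values):
    """Ratio of the largest value to the median; large for heavy tails."""
    return max(values) / statistics.median(values)


heavy = simulate_power_law_amounts(100_000)
light = simulate_normal_amounts(100_000)

print("power-law max/median:", round(tail_ratio(heavy), 1))
print("normal    max/median:", round(tail_ratio(light), 1))
```

In the power-law sample a few extreme values dwarf the typical one, while the normal sample stays close to its median. A prototype tested only on normally distributed data would never encounter the skew and hot spots that this kind of data can cause.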

As a reminder, a private cloud is a cloud that is used exclusively by a single organization. It may be managed by the organization or by a third party, and it may exist on premises (an in-house private cloud) or off premises (a third-party private cloud). In contrast, in a public cloud, the cloud infrastructure is made available to the general public, or a large group, and is owned by an organization selling cloud services (a cloud service provider). In this post, we assume that private third-party clouds are also single-tenant clouds; that is, only one client’s data is on the cloud at a time and the cloud is sanitized between uses by different clients.

In more detail, one approach for moving your analytics to clouds is:

  • use simulated data produced by realistic simulators, instead of actual data;
  • supplement in-house expertise with third-party experts who specialize in analytics and cloud computing;
  • use third-party private clouds instead of public clouds to decrease risk or perceived risk;
  • experiment with different analytic approaches and different analytic infrastructures;
  • agree on APIs up front and transfer technology by transferring code that uses these APIs.
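The last step in the list above can be sketched as follows: both parties agree on a small data-access interface, the prototype is written against that interface using simulated data, and the code is later run in-house against actual data by supplying a different implementation. Everything here (`RecordSource`, `SimulatedSource`, `total_amount`, and the record fields) is a hypothetical name invented for this sketch, not an existing API.

```python
import random
from typing import Dict, Iterator, Protocol


class RecordSource(Protocol):
    """Hypothetical API agreed on up front: anything that yields records."""

    def records(self) -> Iterator[Dict[str, object]]:
        ...


class SimulatedSource:
    """Simulated stand-in used on the third-party private cloud."""

    def __init__(self, n: int, seed: int = 0) -> None:
        self.n = n
        self.seed = seed

    def records(self) -> Iterator[Dict[str, object]]:
        rng = random.Random(self.seed)
        for i in range(self.n):
            # Synthetic identifiers and heavy-tailed amounts; no customer data.
            yield {
                "customer_id": f"sim-{i}",
                "amount": 10.0 * rng.paretovariate(1.5),
            }


def total_amount(source: RecordSource) -> float:
    """Analytic code written only against the agreed API."""
    return sum(float(r["amount"]) for r in source.records())


print(round(total_amount(SimulatedSource(1000)), 2))
```

Because `total_amount` depends only on the agreed interface, transferring the technology amounts to handing over that code; an in-house class with the same `records` method would supply the actual data.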

We have found this approach works well. We would be interested in hearing your experiences.

Full disclosure: Open data operates private clouds and has developed software that provides simulated data for a variety of industries, including financial services. It also provides consulting services using simulated data on private clouds, so that companies can rapidly explore the use of cloud computing to develop innovative applications, especially analytic applications.