Public vs. Private Cloud: How to Integrate Your Data Across Both

Private clouds are a natural extension of the virtualization revolution of the late 1990s and early 2000s, and they give organizations the ability to quickly create virtual machine environments, whether they are running vSphere, OpenStack, CloudStack or some other technology. Ultimately, though, a private cloud rests on a capital investment in bare-metal hardware in a data center you are responsible for.

Public clouds—including Amazon Web Services, Microsoft Azure, Google Cloud Platform and other IaaS market players—enable an organization to lease virtual machines across the Internet for hours or even minutes at a time. Utilizing this pay-as-you-go model can be especially helpful for workloads with unpredictable demands so that an organization can handle peaks without the underutilized capacity that would otherwise come during slower periods.

Many organizations are opting for a hybrid approach, using private cloud in certain situations and public cloud in others. A key consideration with such a strategy is how to handle data that may be spread over multiple physical locations connected by the public Internet, along with the latency and security concerns that follow.

Everybody Has Everything: Cross-Internet Master/Master Replication

The simplest approach to data replication across a public and a private cloud, or across multiple public clouds, is the same one used for exclusively internal use cases: master/master replication. Keeping data replicated and synchronized across multiple locations helps every site work from the same, current copy. The nuance is that replication traffic now runs across the public Internet and requires additional security measures, such as encrypting it in transit.

Replication latency is a strong consideration here as well. Depending on how an application uses its data, it may need to proceed with caution, because replication traffic is now carried over much larger distances and a change written at one site can take noticeably longer to appear at the others.
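
To make the distance concern concrete, here is a minimal Python sketch that estimates a lower bound on round-trip time between two sites. It assumes signals travel through optical fiber at roughly 200,000 km per second (about two-thirds of the speed of light in a vacuum). The function name and the 4,000 km example distance are illustrative only, and real replication lag will be higher once routing, queuing and database commit time are added.

```python
# Approximate signal speed in optical fiber (an assumption for this sketch).
SPEED_IN_FIBER_KM_PER_S = 200_000.0


def min_round_trip_ms(distance_km: float) -> float:
    """Return a physical lower bound on round-trip time, in milliseconds."""
    if distance_km < 0:
        raise ValueError("distance_km must not be negative")
    # Out and back, converted from seconds to milliseconds.
    return 2 * distance_km * 1000.0 / SPEED_IN_FIBER_KM_PER_S


if __name__ == "__main__":
    # Hypothetical 4,000 km path between a private data center and a public cloud region.
    print(f"{min_round_trip_ms(4000):.1f} ms minimum round trip")
```

Even this best case is far above the sub-millisecond round trips typical inside a single data center, which is why an application that waits on remote acknowledgement for each write may need to be redesigned.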

Single Version of the Truth Approach

Alternatively, should data gravity issues prevent a master/master replication approach, data may have to reside in a single place while multiple front ends access it from wherever they might be running. Security concerns remain the same, and a more formally structured REST API sitting in front of the single data source can help address them, but replication latency concerns get replaced by transactional latency between the data source and the consuming application layer. Those transactional concerns can often be mitigated with creative caching approaches on the consumption end so that requests for data back at the single source can be minimized.
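
One simple caching approach on the consuming side is a time-to-live (TTL) cache. The Python sketch below is illustrative rather than a recommended implementation: the TTLCache class, the fetch callable standing in for a call to the REST API, and the 30-second TTL are all assumptions made for the example. It is also not thread-safe and accepts that cached values may be stale for up to the TTL.

```python
import time
from typing import Any, Callable, Dict, Tuple


class TTLCache:
    """Cache values read from a remote single source for a limited time."""

    def __init__(
        self,
        fetch: Callable[[str], Any],
        ttl_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch      # hypothetical call to the remote data source
        self._ttl = ttl_seconds
        self._clock = clock      # injectable so expiry can be tested
        self._entries: Dict[str, Tuple[float, Any]] = {}  # key -> (expires_at, value)

    def get(self, key: str) -> Any:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now < entry[0]:
            return entry[1]      # still fresh: no trip to the single source
        value = self._fetch(key)  # missing or expired: go back to the source
        self._entries[key] = (now + self._ttl, value)
        return value

    def invalidate(self, key: str) -> None:
        """Drop a cached value, for example after the application writes to it."""
        self._entries.pop(key, None)


if __name__ == "__main__":
    def fetch_from_source(key: str) -> str:
        # Stand-in for a slow cross-Internet request.
        return f"value-for-{key}"

    cache = TTLCache(fetch_from_source, ttl_seconds=30)
    print(cache.get("customer/42"))  # fetched from the source
    print(cache.get("customer/42"))  # served from the local cache
```

The TTL is the trade-off to tune: a longer value saves more round trips to the single source, while a shorter value limits how out of date a front end can be.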

Avoiding Hybrid Cloud Data Issues with Workload Placement Guidelines

Another way to avoid data issues across public and private clouds is to simply choose one or the other based on workload type and not have any particular workload straddle both. Some workloads have steady demand or sensitive data, which makes them better suited to the firewalled, fixed-capacity confines of a private cloud. Financial analytics and Human Resources workloads are good examples.

Other workloads see wide variations in demand and have publicly viewable data that make them a great fit for the elasticity of the public cloud. A customer-facing marketing website or customer analytics that have been sanitized to remove Personally Identifiable Information are typical candidates.

So, instead of choosing both for a particular application, establish guidelines for your entire portfolio of applications and decide to run each individual application on one or the other depending upon demand variability and data sensitivity.
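
Guidelines like these can be written down as a small rule so that every application in the portfolio is judged the same way. The Python sketch below is one possible encoding, not a standard: the Workload fields, the peak-to-average demand ratio as a measure of variability, and the threshold of 3.0 are assumptions chosen for illustration, and each organization would set its own.

```python
from dataclasses import dataclass
from enum import Enum


class Cloud(Enum):
    PRIVATE = "private"
    PUBLIC = "public"


@dataclass(frozen=True)
class Workload:
    name: str
    peak_to_average_demand: float  # e.g. 1.2 is steady, 8.0 is very spiky
    has_sensitive_data: bool


def place(workload: Workload, spiky_threshold: float = 3.0) -> Cloud:
    """Pick one cloud per workload from data sensitivity and demand variability."""
    # Sensitive data stays behind the firewall regardless of demand.
    if workload.has_sensitive_data:
        return Cloud.PRIVATE
    # Non-sensitive workloads with highly variable demand benefit from elasticity.
    if workload.peak_to_average_demand >= spiky_threshold:
        return Cloud.PUBLIC
    # Steady demand fits fixed private capacity.
    return Cloud.PRIVATE


if __name__ == "__main__":
    portfolio = [
        Workload("hr-payroll", 1.1, has_sensitive_data=True),
        Workload("marketing-site", 6.0, has_sensitive_data=False),
    ]
    for w in portfolio:
        print(w.name, "->", place(w).value)
```

In this sketch sensitivity is checked first, so a workload with both sensitive data and spiky demand still lands in the private cloud. Whether that ordering is right is itself a policy decision.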

The Choice Is Yours

Every organization has its own challenges, strengths and key performance indicators, so no single choice is the right one for everyone. Some applications should be deployed across multiple clouds in a hybrid fashion. Replicating data using long-tested master/master methods can be successful in such situations when security concerns are met. Single data source techniques can also prove useful, especially when a REST API is placed in front of the source and data caching is used on the consuming side. An equally valid approach is to opt against single-application hybrids and instead set guidelines for how demand and data sensitivity thresholds dictate which applications get deployed where.