Design Goals for Developing Distributed Applications

In my last post (Wide open spaces) we discussed the elegance of using space based architecture platforms based on their simplicity and power.

In my last post (Wide open spaces) we discussed the elegance of using space based architecture platforms based on their simplicity and power. Compared to other models for developing distributed applications, it offers simpler design, savings in development and debugging effort, and more robust results that are easier to maintain and integrate. Recall, this model combines and integrates distributed caching, content-based distributed messaging, and parallel processing into a powerful architecture within a grid computing framework.

That was a mouthful. You may want to read that last sentence again carefully. And think about what this means to you as a professional practitioner. More importantly, how this may change the way you think about application platforms in general.

Before diving into this important concept, I think it is always good idea to express our stated design goals right up front – and use these to guide the inevitable trade-offs and decisions that will need to be made along this journey. So let’s get started with a few design goals I’m comfortable with. I’m sure there are more, but this represents a good start.

The platform’s ability to scale must be completely transparent

The architecture should be based on technology that can be deployed across a grid of commodity hardware nodes, providing a scalable and adaptable platform that supports high-volume, high-performance processing. The resulting platform should be tolerant of failure in individual nodes, can be matched to changing volumes easily by increasing (or decreasing) the number of processing nodes and, by virtue of its decoupled business logic, is extendible and adaptable to evolve as the business landscape changes.

Unlike conventional application server models, our elastic application platform should not require application developers to do anything different in their code in order to scale. The developer uses a simple API that provides a vast key-value data store that looks like a large shared memory space. Underneath the covers distributed caching features of the application platform spread the data across multiple servers (e.g. using a sophisticated hash algorithm). The application developer should remain unaware of the underlying implementation that distributes the data across the servers on his behalf. In brief, the goal of the grid-enabled middleware is designed to hide complexities of partitioning, distributing, and load balancing.

The platform provides resiliency by design

Applications must be available to customers and expected service level objectives must be met. The business cannot afford a single point of failure to impact customer access to other features and functions of the customer applications suite otherwise available. The platform should operate continuously and needs to be highly resilient to avoid any interruption in processing. This means that the application suite cannot have any single point of failure in the software, hardware, or network. High Availability (HA) is a basic requirement. Failing services and application components will continue on different backup servers, without service disruption.

Distributed data caches are resilient by design because they should automatically replicate data stored in the cache to one or more backup servers, guided by the policies defined by an administrator and executed in a consistent controlled manner. If one server fails, then another server provides the data (the more replicas, the more resilient the cache). Note, distributed data caches can be vulnerable to data center outages if all the compute servers are located in the same physical data center. To address this weakness, the distributed caching mechanism should offer special WAN features to replicate and recover data across multiple physical locations. The improvement in resilience reduces the risk of expensive system down-time due to hardware or software failure, allowing the business to continue operating albeit with reduced performance, during partial outages. An added benefit of this architecture composed of discrete units working together would enable rapid development and a controlled introduction of new features in response to changing requirements without the need for a big-bang rollout approach.

The platform is prepared to meet demanding performance requirements

A performance characteristic of distributed caches is that they store data in fast-access memory rather than on disk, although backing store on disk may be an option. Since this data spans multiple servers, there is no bottleneck or single point of failure. Using this advanced elastic application platform provides a means to ensure that cached data will tend to be on the same server where application code is processing, reducing network latency. We can do this by implementing a “near-cache” concept that places data on the server running the application using that data or by directly managing application code execution in the platform, placing adjacent code and data in cache nodes that are on the same server.

The platform needs to support robust integratation with other data sources

Most distributed caching platforms offer read-through, write-through, and write-behind features to synchronize data in the cache with external data sources. Rather than the developer having to write the code that does this, an administrator configures the cache to automatically read or write to a database or other external data source whenever an application performs a data operation in the cache. Data is an asset that is valuable. Sharing this asset across the platform improves the ability to support better data enrichment, improve accuracy and meet business goals.

The platform’s application workload is by nature distributed

For elastic application platforms offering distributed code execution we should consider the nature of the workload the applications will present to servers. If we can divide the workload into units that naturally fit into the distribution schemes as offered then the greater sophistication of the distributed code execution capability can be just what’s needed to turn a troublesome, resource intensive application into one that performs well and meets expectations.

Specific application responsibilities that repeat (or are redundant) across the application architecture should be separated out in the application architecture. Shared global or common use application functional solutions are sometimes referred to as “cross-cutting concerns” and forward the key principle of “separation of concerns”. The platform should support component designs which minimize coupling. The law of Demeter (Principle of Least Knowledge or only know your neighbor applies). The platform should promote loose coupling by minimizing:

dependency between modules (e.g. shared global variables)
discouraging content coupling (one module relying on another’s content)
protocol or format dependencies
control based coupling where one program controls another’s behavior
Non-traceable message coupling which can lead to a dynamic spaghetti-like results impossible to manage

There are other goals I have not addressed here which we should all be familiar with to include:

Desire to BUY vs. Build and Maintain
Remain Technology and Vendor Independent
Promote Interoperability
Meet security and privacy needs

So, now we have a better idea of the design goals we are going to try to achieve. I think it is always important to take these goals to the next step in the high-level specification in order to begin quantifying how we will meet these into actionable objectives. Remember our original strategy which has driven our design goals. The design goals now should be used to create quantifiable objectives we can plan and measure progress to.