As delivery of cloud computing services becomes a way of life, it is timely to look at where vendors like us should be concentrating their efforts to ensure very high uptime, or, if you like, to ensure that outages over long periods are measured in seconds and minutes only.
The main job of a software vendor is, of course, to design software that not only can be deployed in a fully redundant configuration but is also well engineered in the first place, so that it doesn't fail.
But from a user's point of view, a number of other things also need to be in place to ensure very high uptime. These include:
- a ‘no single point of failure’ (redundant) hardware architecture
- connections to multiple bearer networks
- appropriate levels of system monitoring
- strict change control
Without these features, even the best call center software in the world may simply fail to deliver. But let’s focus on what software vendors can do.
An outage can cost not only revenue but reputation, too. And if the outage is caused by a software component within your platform, even a fully redundant hardware architecture will not protect you.
Here are two complementary software approaches to minimising downtime:
- Process separation
i.e. dividing server-side components into discrete services, each delivering a specific function. This minimises the complexity of each component, ring-fences failure-prone operations, and therefore reduces both failure rates and failure cost. One application of this might be a database proxy that contains the code to manage database transaction failure. It publishes interfaces so that other services can take advantage of its capabilities. This means that only one application has to implement the complex code for managing database transaction failure, while its capabilities can be used by all the other applications.
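To make the database proxy idea concrete, here is a minimal sketch of such a service in Python. All names (`DatabaseProxy`, `TransientDBError`, `execute`) are hypothetical, and the retry-with-backoff policy is just one plausible way a proxy might ring-fence transaction failure on behalf of its callers:

```python
import time

class TransientDBError(Exception):
    """Stand-in for a driver-specific transient failure (e.g. a deadlock)."""

class DatabaseProxy:
    """One service owns all transaction-failure handling.

    Other services call `execute` instead of talking to the database
    directly, so the complex retry/recovery logic lives in one place.
    """
    def __init__(self, max_retries=3, backoff_seconds=0.0):
        self.max_retries = max_retries
        self.backoff_seconds = backoff_seconds

    def execute(self, transaction):
        """Run `transaction` (a callable), retrying on transient errors."""
        last_error = None
        for attempt in range(1, self.max_retries + 1):
            try:
                return transaction()
            except TransientDBError as err:
                last_error = err
                # Back off a little longer on each successive attempt.
                time.sleep(self.backoff_seconds * attempt)
        raise last_error
```

A calling service never sees the retry machinery; it simply passes its transaction callable to `execute` and gets back either a result or a final, non-recoverable error.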
- Process distribution
i.e. running one activity across several physical processes. This could be across multiple discrete instances (multi-instancing) or multiple connected instances (clustering):
Multi-instancing allows software components to be installed multiple times on the same computer, with each instance operating simultaneously but independently; it also allows different types of data to be associated with different instances, e.g. tenant data in a multi-tenant environment. By multi-instancing, the load on each service is reduced, and with it both the likelihood and the cost of failure.
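As a sketch of the multi-tenant case, each instance can be configured to own a subset of tenants, with a small routing function directing work to the right instance. The instance names and tenant identifiers below are entirely hypothetical, and a real deployment would load this map from configuration rather than hard-code it:

```python
# Hypothetical instance map: each service instance owns a set of tenants,
# so one tenant's load (or failure) is isolated from the others.
INSTANCE_TENANTS = {
    "svc-instance-1": {"acme", "globex"},
    "svc-instance-2": {"initech"},
}

def instance_for_tenant(tenant):
    """Route a request to the instance that owns this tenant's data."""
    for instance, tenants in INSTANCE_TENANTS.items():
        if tenant in tenants:
            return instance
    raise KeyError(f"no instance configured for tenant {tenant!r}")
```

Because each instance handles only its own tenants, a fault triggered by one tenant's data affects that instance alone rather than the whole service.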
A cluster consists of a set of connected servers (physical or virtual) that work together so that in many respects they can be viewed as a single system.
Performance, capacity and availability can be scaled up across multiple systems at a fraction of the cost it would take to achieve in a single system.
Further protection can be provided by making one node of the cluster redundant and maintaining it as a 'hot standby'. This entails deploying N+1 servers to deliver N servers' worth of capacity. Deploying a redundant cluster requires running a central control process, or script, to make bridging decisions.
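The control process described above can be sketched as a simple monitoring loop: probe each active node, and if one has failed, promote the hot standby into its place. This is a minimal illustration only; the class and method names are hypothetical, and real control processes must also handle fencing, split-brain avoidance, and standby repair:

```python
class ClusterController:
    """Minimal sketch of an N+1 control process (names hypothetical)."""

    def __init__(self, active_nodes, standby_node):
        self.active = list(active_nodes)   # the N working nodes
        self.standby = standby_node        # the +1 hot standby

    def check(self, is_alive):
        """Probe each active node with `is_alive` (e.g. a heartbeat).

        If a node has failed and a standby is available, swap the
        standby into its slot and return the failed node's name.
        """
        for i, node in enumerate(self.active):
            if not is_alive(node) and self.standby is not None:
                failed, self.active[i] = node, self.standby
                self.standby = None  # capacity is now exactly N; repair needed
                return failed
        return None
```

After a swap, the cluster still delivers N servers' worth of capacity, but the standby slot is empty until the failed node is repaired and reinstated.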
High availability is a must-have for cloud deployments, and with the right design and some careful planning, it is certainly achievable (without astronomical expense).
Stay tuned as we look in more depth at delivering high availability in future blogs.