We have put many man-years of effort into reengineering to meet the challenges of the cloud. Here are some of those challenges.
- Failure-proof protocols. A distributed system requires core protocols in which everything is a request rather than a command, and services that treat failures and timeouts as normal operation. In addition, transport failure is to be expected, so the protocol needs to be transport-independent and implement queueing and session recovery.
- Dynamic provisioning of application and media services. To deploy to the cloud and scale out, you cannot rely on manually deploying services to specific hosts. You need an architecture that deploys services automatically across your VM estate, load-balancing them while also minimising network footprint.
- Automatic failover and recovery for I/O operations. When people talk about failover, they usually mean service and server-level failure. However, most operational failures at scale are related to I/O. Database services (even those on bare-metal servers) get overloaded and drop sessions. Sytel’s service design provides I/O recovery and database rollforward as standard.
- Centralised logging. If you have dynamically deployed services, you need to centralise service logging so that when a service degrades (usually because of external factors) the cause can be identified and corrected quickly. A design that supports this can also take advantage of the low-cost storage services offered by cloud providers.
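To make the first point in the list more concrete, here is a minimal sketch of a request that treats transport failure and timeout as ordinary outcomes rather than exceptions to the caller. It is an illustration only, not Sytel's protocol: the names `send_request`, `Reply`, `Outcome` and the idea of passing transports as plain callables are all assumptions made for the example.

```python
import time
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Optional, Sequence, Tuple


class Outcome(Enum):
    ACCEPTED = "accepted"    # the service agreed to the request
    REJECTED = "rejected"    # the service declined (it is a request, not a command)
    TIMED_OUT = "timed_out"  # no transport delivered a reply in time


@dataclass
class Reply:
    outcome: Outcome
    attempts: int
    transport: Optional[str] = None


# Hypothetical transport shape: a callable that returns True (accepted) or
# False (rejected), or raises ConnectionError / TimeoutError on failure.
Transport = Callable[[dict], bool]


def send_request(
    payload: dict,
    transports: Sequence[Tuple[str, Transport]],
    max_attempts: int = 3,
    backoff_s: float = 0.0,
) -> Reply:
    """Try each transport in turn; failure is an expected result, not a crash."""
    attempts = 0
    for round_no in range(max_attempts):
        for name, transport in transports:
            attempts += 1
            try:
                accepted = transport(payload)
            except (ConnectionError, TimeoutError):
                continue  # expected: fall through to the next transport
            outcome = Outcome.ACCEPTED if accepted else Outcome.REJECTED
            return Reply(outcome, attempts, name)
        # Every transport failed this round: back off, then retry.
        time.sleep(backoff_s * (2 ** round_no))
    return Reply(Outcome.TIMED_OUT, attempts)
```

The caller always receives a `Reply` and decides what to do with a rejection or a timeout, so the failure path is exercised as routinely as the success path. A real protocol would also need the queueing and session recovery mentioned in the list, which this sketch leaves out.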
The astute reader will note that the list above does not mention microservices or service orchestration. Both are necessary to achieve linear scaling and reliability in the cloud in general, but Cloud Contact Center has some unique challenges which orthodox cloud designs cannot overcome.
The ACD, or in Sytel’s case, ASD (Automatic Session Distributor) engine has to be stateful in order to make real-time decisions concerning large amounts of resource. Consider that an ASD has to load-balance in real time across a series of queues, and that with a true ASD this load-balancing is multidimensional. It is not feasible to retrieve all this information from a persistent store every time a decision is needed, when sessions are being dequeued to agents many times per second.
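As a rough illustration of why the state has to live in memory, the sketch below keeps queues and idle agents in process and makes each dequeuing decision from that state alone. It is a toy model under stated assumptions, not Sytel's ASD: the class `InMemoryDistributor`, the scoring rule (queue priority multiplied by the oldest session's wait time) and the skills model are all hypothetical, and only two dimensions are balanced.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, Dict, List, Optional, Set, Tuple


@dataclass
class SessionQueue:
    name: str
    priority: float  # assumed weighting; higher means more urgent
    waiting: Deque[Tuple[str, float]] = field(default_factory=deque)  # (session_id, enqueued_at)


class InMemoryDistributor:
    """Holds all decision state in process; nothing is read from a store."""

    def __init__(self) -> None:
        self.queues: Dict[str, SessionQueue] = {}
        self.idle_agents: Dict[str, Set[str]] = {}  # agent_id -> queue names served

    def add_queue(self, name: str, priority: float) -> None:
        self.queues[name] = SessionQueue(name, priority)

    def enqueue(self, queue_name: str, session_id: str, now: float) -> None:
        self.queues[queue_name].waiting.append((session_id, now))

    def agent_idle(self, agent_id: str, skills: Set[str]) -> None:
        self.idle_agents[agent_id] = set(skills)

    def _agent_for(self, queue_name: str) -> Optional[str]:
        # First idle agent able to serve this queue, if any.
        for agent_id, skills in self.idle_agents.items():
            if queue_name in skills:
                return agent_id
        return None

    def dispatch(self, now: float) -> List[Tuple[str, str]]:
        """Assign sessions to agents until no serviceable session remains."""
        assignments: List[Tuple[str, str]] = []
        while True:
            best = None  # (score, queue, agent_id)
            for q in self.queues.values():
                if not q.waiting:
                    continue
                agent_id = self._agent_for(q.name)
                if agent_id is None:
                    continue
                # Two dimensions at once: queue priority and longest wait.
                score = q.priority * (now - q.waiting[0][1])
                if best is None or score > best[0]:
                    best = (score, q, agent_id)
            if best is None:
                return assignments
            _, q, agent_id = best
            session_id, _ = q.waiting.popleft()
            del self.idle_agents[agent_id]
            assignments.append((session_id, agent_id))
```

Every call to `dispatch` touches only dictionaries and deques, which is what makes it plausible to run many times per second. The price is that this state disappears if the process fails, which is the single-point-of-failure problem the next paragraph refers to.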
Solving this real-time problem for the cloud requires a different approach from the traditional cloud design for eliminating single points of failure. We will discuss this in a later topic.