In the 5th and penultimate blog in our Designing for The Cloud series, we take a look at the challenges faced when designing for scale.
Consider this dichotomy:
- To make software components deliver at scale, the information dependencies between them must be minimised, or nothing happens in a timely fashion.
- But this then limits the ability of a platform to solve complex problems of resource allocation, such as load-balancing across ACD queues for service level adherence and blending.
Solving this problem is the holy grail of cloud contact center design.
Most people reading this statement will think ‘that is obviously a programmer writing this blog’ and ‘I don’t want to know. It’s your job to solve these problems!’ You’d be right on both counts. You’ll forgive me for writing a longer blog on this subject, but there is a story to tell that will resonate with those brave souls who deploy cloud solutions for their contact center business.
The reason the dichotomy matters has to do with the approach software development teams take to architecting their software. Much of the software we see in the market uses designs that follow microservice patterns, providing (in theory) infinite scale.
Following microservices patterns
If you write an ACD following microservice patterns, you end up producing ‘actors’ to model the activity of queues, agents and media sessions. These actors are self-contained and persist their own state, so they can be disposed of when not in use and reinstated at will. This model allows the ACD to be deployed in distributed fashion, because all of the components are small, self-contained and can be reinstated on failure or when server resources need to be added. Such an ACD has linear and potentially infinite scale and is inherently reliable.
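As a rough sketch of what that looks like (the names and shape here are invented for illustration, not any particular vendor’s design), each agent or queue becomes a small, self-contained object whose state can be snapshotted and restored on any node:

```python
import json
from dataclasses import dataclass, field, asdict

@dataclass
class AgentActor:
    """A self-contained 'agent' actor: all of its state lives inside it,
    so it can be disposed of when idle and reinstated on any node."""
    agent_id: str
    skills: list
    active_sessions: list = field(default_factory=list)
    max_sessions: int = 1

    def offer(self, session_id: str) -> bool:
        # Accept the session only if the agent has spare capacity.
        if len(self.active_sessions) >= self.max_sessions:
            return False
        self.active_sessions.append(session_id)
        return True

    def snapshot(self) -> str:
        # Persist state so the actor can be torn down and re-created later.
        return json.dumps(asdict(self))

    @classmethod
    def restore(cls, blob: str) -> "AgentActor":
        return cls(**json.loads(blob))
```

A queue actor looks much the same: it owns its own list of waiting sessions and nothing else, which is exactly what makes the whole thing easy to distribute and restart.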
What’s not to love about this? If you work with an ACD offering from one of the more recent entrants to the cloud ACD market, it will likely have this sort of architecture. These platforms are generally easy to get to grips with, have clean APIs and plenty of add-on functionality that your business could use. But…
The problem of scale
A big clue that all may not be well in paradise comes when you take such a platform and use it to manage lots of queues, sessions across multiple channels, and agents handling multiple media sessions at once.
You will have to do this sort of resource management. The business wants:
- omnichannel
- multiple live sessions at once
- bots to do some of the work
- 100% utilisation of human resource
- possibly integrating with some form of WFM for real-time adherence
You’ll find that the nice simple model your vendor offers doesn’t automatically manage resource conflicts (an agent with skills for multiple queues) or load-balance queues to SLA, but it does provide you with all the APIs you need to do this yourself.
It isn’t that difficult to do, so you roll up your sleeves and get on with writing some code. It works in test and you then deploy…
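The DIY version tends to look something like this, a minimal sketch against a hypothetical REST API (every endpoint and field name below is invented for illustration): poll the queues, poll the agents, push assignments back.

```python
import requests

BASE = "https://acd.example.com/v1"   # hypothetical vendor API

def rebalance(tenant: str, token: str) -> None:
    headers = {"Authorization": f"Bearer {token}"}

    # One read per collection, then one write per assignment. The amount of
    # state to pull and compare grows with queues x agents.
    queues = requests.get(f"{BASE}/{tenant}/queues", headers=headers).json()
    agents = requests.get(f"{BASE}/{tenant}/agents",
                          params={"state": "available"}, headers=headers).json()

    # Serve the longest-waiting queue first, respecting skills and capacity.
    for queue in sorted(queues, key=lambda q: q["oldest_wait_secs"], reverse=True):
        for agent in agents:
            if queue["skill"] in agent["skills"] and agent["free_slots"] > 0:
                requests.post(f"{BASE}/{tenant}/assignments", headers=headers,
                              json={"queue_id": queue["id"], "agent_id": agent["id"]})
                agent["free_slots"] -= 1
                break
```

Run on a timer or wired to webhooks, this behaves perfectly well at the volumes a test environment throws at it.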
And here is where you run into what grey-haired ACD programmers call ‘the busy lamp bug’. On old PBXs, a receptionist console would have a busy lamp display. A company with 50 people might have one receptionist whose console has 50 lights. When the company doubles in size you might need two receptionists to manage the phone load, each with a console of 100 lights. Double the size of the system and you quadruple the amount of work the PBX has to do to set all those lamps.
The same thing happens when you try to do ACD resource management for many queues, channels and sessions-per-agent via an API. The load grows quadratically rather than linearly, so the promise of linear scaling is in fact a fairy tale.
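Some back-of-the-envelope arithmetic shows why. Assuming, purely for illustration, one queue per ten agents and a balancer that reads every agent’s state for every queue on each cycle:

```python
# Illustrative only: per-cycle status reads for a balancer that checks every
# agent against every queue, with roughly one queue per ten agents.
for agents in (100, 200, 400, 800):
    queues = agents // 10
    reads = agents * queues
    print(f"{agents:>4} agents, {queues:>2} queues -> {reads:>6,} reads per cycle")
```

Double the agents and the per-cycle API load quadruples, just as doubling the PBX quadrupled the lamp work.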
If you stick with your cloud provider, you’ll be nickel-and-dimed for all those API calls; remember, they’re not free. And as call and media session volumes grow, there will come a time when your service provider chokes on the runaway growth in API calls, leading to denial of service. That gets both the CFO and the COO upset.
Suddenly the old-school ACD that got mothballed because it couldn’t do multisession and was a pain to integrate with the rest of your systems doesn’t look so bad, but clearly you can’t go backwards.
The Sytel solution
So how does Sytel solve this problem?
The first thing to say is that as good cloud citizens we decompose services and follow a microservice model in our design.
Even our ACD (ASD®) engine follows an actor model, which provides separation of concerns and maintainability, but crucially we run all of this in a single process for each tenant, which means that state does not have to be cached. We also have our own automated load-balancing, again separated out for maintainability but run in-process alongside the ASD machinery.
We do some other clever stuff with just-in-time compiled rule processing to allow business-specific rules to run at native code speeds.
Examples of such rules might be
- using an agent’s average time to respond to chats to determine how many concurrent chat sessions they should handle, or
- setting house policy on whether or not users can be presented with email or chat when they are active on a voice or video session.
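As a toy analogy only (this is not how Sytel implements it), the first of those rules might be expressed as a small expression compiled once up front, so it runs without being re-parsed every time an agent is evaluated:

```python
# Toy analogy: compile a business rule once, then evaluate it cheaply per agent.
CHAT_CAP_RULE = "3 if avg_response_secs < 20 else (2 if avg_response_secs < 45 else 1)"
_compiled = compile(CHAT_CAP_RULE, "<chat_cap_rule>", "eval")

def max_concurrent_chats(avg_response_secs: float) -> int:
    # Faster responders are trusted with more concurrent chat sessions.
    return eval(_compiled, {"__builtins__": {}}, {"avg_response_secs": avg_response_secs})

print(max_concurrent_chats(15))   # -> 3
print(max_concurrent_chats(60))   # -> 1
```

A real rule engine compiling to native code is doing something far more sophisticated, but the principle of paying the translation cost once, ahead of time, is the same.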
Our design imposes per-tenant limits, but those limits are huge compared with the real-world limits that API load places on cloud ACD solutions that don’t provide automatic resource management and load-balancing. One of our end-user customers is in the process of deploying and ramping up a 7,000-seat tenant as part of a multitenant system. We load-test to 10,000 agents / 30,000 sessions per tenant.
How we’ve come to this design is also important. It didn’t happen by accident and it didn’t happen overnight. We take the Kaizen approach to ACD development, which means we focus on standardisation, maintainability and extensibility. Peer review and diversity of skills matter; our core team is made up of math and stats gurus, games programmers, compiler authors and real-time software specialists.
And we never stop learning – a key philosophy for developers at Sytel. We use lessons and techniques developed over 25 years of continuous improvement. And that equips us to climb mountains.