The NFV World Congress in San Jose was all about disrupting the current state of telecommunications. So I wasn't too surprised to hear what was, in my opinion, a very controversial comment, one that ran contrary to a core architectural principle of telecommunications networks. It basically questioned telecom's obsession with carrier grade and five 9's. The question raised was, "Why is the telecommunications industry so intent on delivering five 9's when no one is really willing to pay for it? Spending billions of dollars on this capability does not really make sense when no one is willing to pay a premium." It's true that "five 9's" has become a cornerstone of telecom availability, and vendors and service providers alike throw the term around quite liberally. However, with the worlds of IT and telecom colliding, I believe there are markedly different perceptions of, and opinions on, the need for five 9's availability. Are the perceptions that far apart, though?
For me, one of the motivating factors behind five 9's was the 1988 Hinsdale Central Office fire, at the time the largest telecommunications disaster in US history. The fire knocked out a good portion of the special circuits and central office equipment supporting not only local services but also banking, airline ticket reservations, ATMs and more. The Hinsdale office was a single geographic point of failure, and outages of some major systems lasted for weeks. Enough history, though. The point is that "availability" and "reliability" took center stage, along with concern over single points of network failure.
With Network Functions Virtualization (NFV), the concept of High Availability (HA) for virtualized network functions (VNFs) changes. It shifts from physical active-standby or active-active deployments of an application or platform to virtualized applications that rely on the inherent services and capabilities of the underlying NFV infrastructure (NFVI) layer. When a failure occurs, catastrophic or otherwise, in the underlying hardware or in the application, the affected traffic is redirected to a new instance or a load-sharing instance of that application, either in the same data center or in another one. So even though the NFVI layer takes responsibility for providing HA, you still need enough spare virtual resources available to support it.
Admittedly, some public cloud providers don't promise five 9's availability (a little over 5 minutes of downtime a year), nor do they count scheduled downtime or maintenance when calculating their SLAs. This seems to conflict directly with the telco mantra of five 9's, but an N+X approach (N instances to carry the load plus X spares, where X is the number of component failures that can be tolerated) can provide a pretty solid foundation for meeting customer expectations of availability. There are challenges, though, as discussions in the ETSI NFV ISG point out. For example, a VNF has to scale out dynamically and rapidly in response to a failure or a burst in traffic load. Also, some VNFs perform stateful processing of flows, and that state must be replicated across instances; otherwise, an instance failure could disrupt service.
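To make the availability arithmetic concrete, here is a short Python sketch that converts an availability figure into expected downtime per year and estimates the availability of an N+X pool with a simple k-out-of-n model. The function names and the model are illustrative assumptions, not a Dialogic or ETSI method: the sketch assumes every instance has the same availability, failures are independent, failover is instantaneous, and the service stays up as long as at least N of the N+X instances are running.

```python
from math import comb

# Year of 365.25 days (averaging in leap years), expressed in minutes.
MINUTES_PER_YEAR = 365.25 * 24 * 60


def downtime_minutes_per_year(availability: float) -> float:
    """Expected minutes of downtime per year for an availability fraction."""
    return (1.0 - availability) * MINUTES_PER_YEAR


def n_plus_x_availability(n: int, x: int, a: float) -> float:
    """Hypothetical helper: P(at least n of n + x instances are up).

    Illustrative k-out-of-m model. Assumes every instance has the same
    availability `a`, failures are independent, and failover is instant.
    """
    m = n + x
    # Sum the binomial probabilities of exactly k instances being up, k >= n.
    return sum(comb(m, k) * a**k * (1.0 - a) ** (m - k) for k in range(n, m + 1))


if __name__ == "__main__":
    print(f"five 9's -> {downtime_minutes_per_year(0.99999):.2f} min/year")
    # Four instances needed for the load, each with only three 9's.
    for spares in (0, 1, 2):
        pool = n_plus_x_availability(4, spares, 0.999)
        print(f"N=4, X={spares}: availability {pool:.7f}, "
              f"{downtime_minutes_per_year(pool):.1f} min/year down")
```

Under these assumptions, four instances with only three 9's each and a single spare (N=4, X=1) already reach roughly five 9's as a pool. The independence assumption is the weak point: a shared building, power feed or software defect causes correlated failures, as Hinsdale showed, which is one reason to spread instances across data centers.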
NFV and cloud-based applications represent a genuine technology turning point that will profoundly affect:
- How underlying infrastructure is deployed for end-to-end service delivery
- How services and new network functions are rolled out
- The way network functions are architected, chained together and instantiated to create a service, and
- The capabilities of virtualized network functions, which can now leverage the elasticity and resiliency of the cloud (for example, N+X redundancy)
Dialogic has taken a comprehensive approach to NFV and VNF implementation, focusing on software modularity and application decomposition to make better use of the elasticity and scalability of the NFVI.
What do you think? Is the concept of five 9’s a thing of the past or do we have new tools to implement the next generation of high reliability in a cloud environment? Let us know what you think by tweeting us @Dialogic.