SD-WAN offers many benefits over traditional WANs, including better performance, enhanced visibility and control, greater agility, and cost savings. Perhaps its most notable benefit is improved reliability, which it achieves by actively using multiple transport networks simultaneously and dynamically steering traffic around network failures as they occur. While most of us have an intuitive sense of why this boosts reliability, many are unfamiliar with the mathematics that quantify it. This blog post demystifies that math and illustrates SD-WAN's reliability boost with real-world examples. The results can be eye-opening.
Mapping Out the Math
When considering the reliability of a wide area network, the key quantity is the network availability at each site. We can think of network availability in two ways: historically and predictively.
Historically, the availability of a network service over some time period is simply the amount of time that service was “up” and available during that period divided by the total time period.
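As a minimal sketch, the historical calculation can be written in Python. The function name `historical_availability` and the outage figures in the usage line are hypothetical, chosen only for illustration, and the sketch assumes outages do not overlap and all fall inside the observation period.

```python
from collections.abc import Iterable


def historical_availability(period_hours: float, outage_hours: Iterable[float]) -> float:
    """Share of an observation period during which the service was up.

    Assumes outages do not overlap and all fall inside the period.
    """
    if period_hours <= 0:
        raise ValueError("period_hours must be positive")
    downtime = sum(outage_hours)
    if not 0 <= downtime <= period_hours:
        raise ValueError("total outage time must lie within the period")
    return (period_hours - downtime) / period_hours


# Hypothetical example: a 720-hour month with outages of 3 h and 1.5 h.
print(f"{historical_availability(720, [3.0, 1.5]):.2%}")
```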
Predictively, we can calculate the expected availability of a network service from its mean time between failures (MTBF) and its mean time to repair/restore (MTTR). Expected availability is MTBF (the average uptime between outages) divided by MTBF + MTTR (the average length of one full cycle of uptime plus the outage that ends it): A = MTBF / (MTBF + MTTR).
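A minimal Python sketch of the predictive formula follows. The name `expected_availability` is hypothetical, and MTBF and MTTR are assumed to be in the same unit (hours here).

```python
def expected_availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Expected (steady-state) availability: MTBF / (MTBF + MTTR)."""
    if mtbf_hours <= 0 or mttr_hours < 0:
        raise ValueError("MTBF must be positive and MTTR non-negative")
    return mtbf_hours / (mtbf_hours + mttr_hours)


# Hypothetical service: fails every 99 h on average, takes 1 h to restore.
print(expected_availability(99.0, 1.0))  # 0.99
```

In this hypothetical case the service is, on average, up 99 of every 100 hours.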
Mathematically, the expected availability is the probability that the service will be available when it's needed, so the closer it is to one, the more reliable the service. Expected availability, like any probability, is often expressed as a percentage.
Let's look at a real-world example. The MTBF of a typical broadband circuit might be around 600 hours (25 days). The MTTR for broadband can be lengthy, perhaps around 12 hours. Plugging these figures into the formula above yields an expected availability of about 98% (600 / 612 ≈ 0.9804). On average, this equates to 14.1 hours of downtime in a 30-day (720-hour) month.
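The broadband figures can be reproduced with the sketch below. It assumes a 30-day (720-hour) month, which matches the downtime quoted above; the helper names are hypothetical.

```python
HOURS_PER_MONTH = 720  # assumption: a 30-day month, which reproduces the figures in the text


def expected_availability(mtbf_hours: float, mttr_hours: float) -> float:
    return mtbf_hours / (mtbf_hours + mttr_hours)


def monthly_downtime_hours(availability: float) -> float:
    # Unavailability (1 - A) is the expected fraction of time spent down.
    return (1.0 - availability) * HOURS_PER_MONTH


broadband = expected_availability(mtbf_hours=600, mttr_hours=12)
print(f"broadband availability: {broadband:.2%}")  # 98.04%
print(f"expected downtime: {monthly_downtime_hours(broadband):.1f} h/month")  # 14.1 h/month
```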
For many businesses, broadband's potential downtime is unacceptable, prompting them to procure more reliable (and more expensive) services such as MPLS or DIA over carrier-grade Ethernet. The MTBF for these services might be 2400 hours (100 days), and the MTTR might be 4 hours, yielding an expected availability of about 99.8% (2400 / 2404 ≈ 0.9983). On average, this equates to about 1.2 hours of downtime in a 30-day month, a significant improvement over broadband.
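A similar sketch, under the same 30-day-month assumption, puts the two single-circuit cases side by side. It computes downtime directly from the unavailability MTTR / (MTBF + MTTR); the helper name is hypothetical.

```python
HOURS_PER_MONTH = 720  # assumption: a 30-day month


def monthly_downtime_hours(mtbf_hours: float, mttr_hours: float) -> float:
    # Unavailability MTTR / (MTBF + MTTR) times the hours in a month.
    return mttr_hours / (mtbf_hours + mttr_hours) * HOURS_PER_MONTH


broadband = monthly_downtime_hours(mtbf_hours=600, mttr_hours=12)
carrier_grade = monthly_downtime_hours(mtbf_hours=2400, mttr_hours=4)
print(f"broadband:     {broadband:.1f} h/month")      # 14.1 h/month
print(f"carrier-grade: {carrier_grade:.1f} h/month")  # 1.2 h/month
print(f"ratio: {broadband / carrier_grade:.1f}x")    # 11.8x
```

With these example figures, a carrier-grade circuit accumulates roughly one-twelfth of the monthly downtime of a broadband circuit.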
But with SD-WAN we can do even better. The key is SD-WAN's ability to use multiple network services simultaneously. If those services are diverse and independent of one another, a location goes entirely out of service only when all of its underlying services are down at the same time. Let's take a closer look at the math behind this.
Since the expected availability A of a service is the probability that it is available for use, 1 − A is the probability that it is not. If two services fail independently, the probability that both are unavailable at the same time is the product of their individual probabilities of being unavailable, (1 − A1)(1 − A2). One minus that product is the probability that at least one service is available. This gives the expected availability of a location with two independent network services: A_site = 1 − (1 − A1)(1 − A2).
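The same rule extends to any number of services: the site is down only if every service is down. Here is a minimal Python sketch, assuming failures are fully independent (shared conduits, power feeds, or upstream providers would violate that assumption); `combined_availability` is a hypothetical name.

```python
from math import prod


def combined_availability(*availabilities: float) -> float:
    """Availability of a site that stays up while at least one of its services is up.

    Assumes the services fail independently, so the probability that all are
    down is the product of their unavailabilities (1 - A_i).
    """
    for a in availabilities:
        if not 0.0 <= a <= 1.0:
            raise ValueError("each availability must be between 0 and 1")
    return 1.0 - prod(1.0 - a for a in availabilities)


# Hypothetical example: two independent links, each available 90% of the time.
print(combined_availability(0.9, 0.9))
```

In this hypothetical example, two 90% links combine to roughly 99%, because both are down together only 0.1 × 0.1 = 1% of the time.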
Let's apply this probability math to the quantities from our previous examples. Consider a site served by two independent and diverse broadband services (e.g., one cable and one 4G cellular, a combination frequently seen in the retail and hospitality industries). Plugging in the broadband availability value from above, we find the expected availability of the combination to be 99.96%. On average, this works out to only 16.6 minutes of downtime per month, substantially better than a single carrier-grade MPLS or DIA circuit. Remarkably, combining two relatively unreliable services results in something very reliable!
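The two-broadband figure can be checked with a few lines of Python. The sketch assumes both circuits share the broadband MTBF and MTTR from above, fail independently, and are measured over a 30-day month.

```python
HOURS_PER_MONTH = 720  # assumption: a 30-day month
MINUTES_PER_HOUR = 60

a_broadband = 600 / (600 + 12)        # one broadband circuit, about 98%
p_both_down = (1 - a_broadband) ** 2  # independence assumed: unavailabilities multiply
a_site = 1 - p_both_down

print(f"site availability: {a_site:.2%}")  # 99.96%
print(f"expected downtime: {p_both_down * HOURS_PER_MONTH * MINUTES_PER_HOUR:.1f} min/month")  # 16.6 min/month
```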
Now consider a site with one MPLS or DIA circuit over telco-grade Ethernet and one cable broadband circuit, a configuration commonly seen in hybrid WANs. Plugging in the corresponding availability values from above, we find the expected availability of this combination to be 99.9967%. On average, this works out to only 1.4 minutes of downtime per month, a figure small enough to satisfy all but the most demanding enterprises.
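The hybrid case works the same way. This sketch keeps each underlay's MTBF and MTTR from the earlier examples in a dictionary (the keys are hypothetical labels), multiplies their unavailabilities under the independence assumption, and again assumes a 30-day month.

```python
from math import prod

HOURS_PER_MONTH = 720  # assumption: a 30-day month


def unavailability(mtbf_hours: float, mttr_hours: float) -> float:
    # Expected fraction of time the service is down: MTTR / (MTBF + MTTR).
    return mttr_hours / (mtbf_hours + mttr_hours)


# (MTBF, MTTR) in hours for each underlay, taken from the examples above.
underlays = {"mpls_or_dia": (2400, 4), "cable_broadband": (600, 12)}

p_all_down = prod(unavailability(*v) for v in underlays.values())
print(f"site availability: {1 - p_all_down:.4%}")  # 99.9967%
print(f"expected downtime: {p_all_down * HOURS_PER_MONTH * 60:.1f} min/month")  # 1.4 min/month
```

Adding another entry to the dictionary extends the same calculation to a site with three underlays.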
Optimal Cost Efficiency with SD-WAN
SD-WAN's ability to incorporate low-cost broadband services into its underlay without compromising reliability is helping businesses reduce the overall cost per Mb of their WANs. Many that have historically relied on higher-cost telco-grade services like DIA and MPLS are replacing or augmenting those services with lower-cost broadband.
And businesses with strict uptime requirements are replacing expensive secondary telco-grade networks with economical, diverse broadband at a fraction of the cost per Mb, with negligible impact on site availability. These cost savings, along with the reduction in costly downtime, are key drivers in the business case for adopting SD-WAN.
About GTT
GTT connects people across organizations, around the world, and to every application in the cloud. Our clients benefit from an outstanding service experience built on our core values of simplicity, speed, and agility. GTT owns and operates a global Tier 1 internet network and provides a comprehensive suite of cloud networking services. We also offer a complementary portfolio of managed services, including managed SD-WAN from leading technology vendors.