Internet Core is Complicating Cloud-Era Connectivity

by Adam Gervin

The cloud era has been anything but simple for businesses. In fact, a lot of the challenges they face may be attributed to how hard it is for network operations teams to consistently provide reliable connectivity among their employees, assets, applications, and services.

So the opportunity is to simplify reliable end-to-end WAN for businesses, large and small. That's why we were at ONUG in NYC (OK, we were there for the bagels too). We saw an amazing presentation by Steve Garson of SD-WAN Experts, titled "Measuring Internet Core Variability."

Why amazing?

We've been blogging about the Internet, how it's generally great, but perhaps problematic when it's used as part of an SD-WAN solution for mission-critical business applications.

We've told you about the "good parts" version of the Internet — separating Access from Core. And we've implicated the Core as the primary cause of unpredictability.

We are always amazed when others produce corroborative evidence. At ONUG, others = Steve Garson of SD-WAN Experts.

Let's take a look at the first part of Steve's presentation.

SD-WAN Experts used three third-party tools: Cedexis for long-haul measurement, and Catchpoint and Speedtest for last-mile (Access) measurement.

Steve chose response time (i.e., send + wait) as the metric, because it excludes one-time events like DNS lookup and connection setup, and it's a better measure of real Internet response than ping.
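To make "send + wait" concrete, here is a minimal sketch of how response time could be pulled out of a per-request timing breakdown. The RequestTiming class and its field names are hypothetical, chosen for illustration; they are not the output format of Cedexis, Catchpoint, or Speedtest, and the numbers are made up.

```python
from dataclasses import dataclass


@dataclass
class RequestTiming:
    """Hypothetical per-request timing breakdown, in milliseconds."""
    dns_ms: float      # one-time name lookup (excluded from response time)
    connect_ms: float  # one-time connection setup (excluded)
    send_ms: float     # time spent sending the request
    wait_ms: float     # time waiting for the first byte of the reply


def response_time_ms(timing: RequestTiming) -> float:
    """Response time as described above: send + wait only."""
    return timing.send_ms + timing.wait_ms


# Made-up sample: DNS and connect are large but do not count.
sample = RequestTiming(dns_ms=35.0, connect_ms=80.0, send_ms=2.0, wait_ms=140.0)
print(f"Response time: {response_time_ms(sample)} ms")
```

Leaving DNS and connect out keeps a slow first lookup from being mistaken for a slow network path.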

In his first series of tests, Steve calculated Core performance by subtracting Access performance from long-haul performance. Of course, performance varies over time. I'm no mathematician, but I seem to remember that when a whole is the sum of two parts that each vary, the variance of the whole (variance = standard deviation, squared) is equal to the sum of the variances of the parts, plus two times the covariance (a measure of how much the two parts vary together).

I think it's a safe assumption that, in general, performance variance of the last-mile is wholly independent of core variance, meaning covariance = 0. Steve makes this assumption, which means the long-haul variance = core variance + access variance. Rearranged, core variance = long-haul variance - access variance. Whew. We are done with the math.
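For readers who prefer code to algebra, the decomposition above can be sketched in a few lines of Python. This is only an illustration under the same zero-covariance assumption: the function name and the sample values are hypothetical, and the numbers are not Steve's measurements.

```python
from statistics import pvariance


def core_variance_share(long_haul_ms, access_ms):
    """Fraction of long-haul variance attributed to the Core.

    Assumes Access and Core response times are independent
    (covariance = 0), so var(long haul) = var(core) + var(access).
    """
    var_long = pvariance(long_haul_ms)
    if var_long == 0:
        raise ValueError("long-haul samples show no variance")
    var_access = pvariance(access_ms)
    var_core = var_long - var_access  # core variance by subtraction
    return var_core / var_long


# Hypothetical response-time samples in milliseconds (illustrative only).
long_haul = [180.0, 260.0, 210.0, 340.0, 190.0, 300.0]
access = [20.0, 22.0, 19.0, 21.0, 20.0, 18.0]

share = core_variance_share(long_haul, access)
print(f"Core share of long-haul variance: {share:.1%}")
```

With real measurements the subtraction can come out slightly negative if the samples are noisy or the independence assumption does not hold, so a result outside the 0 to 1 range is a sign to revisit the data rather than a finding.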

Steve collected data from servers originating in San Jose, London, Tokyo, Sydney, and Virginia. He looked at the performance to end users in Bangalore, Washington D.C., Tokyo, London, Melbourne, and San Francisco. Here is a table of the raw data:

The results are pretty clear. For long hauls, the vast majority of response variance occurred in the Internet Core, not the last-mile. In fact, 99.5% of response variance happened in the Internet Core. That means that your business traffic, over long hauls, is experiencing the vast majority of dropped packets, jitter, etc., as a result of the Internet Core. Even with the lovely software-defined benefits of SD-WAN at your corporate edge.

99.5% of long-haul variance happens in the Internet Core. Why?

It's largely an issue of economics. Internet Access networks receive 3 to 10 times the investment of the Internet Core, most of it coming from customers. The Core, by contrast, is built on least-cost peering and routing.

Steve went on to provide additional tests of Internet Core performance, and we will cover them in upcoming blogs.

But for now, we know one thing for sure. If you want to simplify reliable end-to-end WAN so your business can hum, you can't stop at SD-WAN. You need to replace the Internet Core with something far more reliable. You could use something overpriced and rigid like MPLS. But you really want something that's SD-WAN friendly, flexible, and affordable. You may not know it yet, but you want a software-defined core (SD-CORE).

Jennifer English at TechTarget has done a nice job reviewing Steve's presentation. Her conclusion — you may not be able to rely on SD-WAN + Internet, but you no longer have to pay for expensive solutions like MPLS. SD-CORE might be just the thing for simplifying the reliability of your SD-WAN.