Uptime is one of the most vital aspects of connectivity in the modern enterprise. From the LAN to the WAN, maintaining users’ access to mission-critical apps and services can make or break a business.

There are myriad statistics and online calculators that help quantify the impact of downtime. Gartner, for example, has pegged the cost at $5,600 USD per minute. While this number will obviously vary depending on your industry, the scope of the outage, and a number of other variables, the takeaway is clear: downtime hurts business productivity. For this reason, IT teams across the globe put plans in place to prevent downtime from occurring and to recover rapidly when it does.

Downtime mitigation strategies usually entail some form of redundancy and failover. Too often in the field of WAN connectivity, failover is complex and requires some level of manual interaction to execute. Fortunately for users of premium, cloud-based SD-WAN (a.k.a. SDWaaS), features like self-healing and Intelligent Last Mile Management (ILMM) enable truly automatic failover and a proactive approach to WAN uptime management.

In this piece, we’ll dive into the details of the self-monitoring, self-healing, and ILMM features offered by SDWaaS and explain how they can help optimize uptime for your enterprise WAN.

Self-monitoring & self-healing

Backed by SLAs, with multiple Points of Presence (POPs) across the globe, and a backbone supported by multiple tier 1 Internet Service Providers (ISPs), premium cloud-based SDWaaS is inherently robust. Simply given those features, SDWaaS is capable of matching or exceeding the performance and uptime of other WAN solutions. When you add in HA (high availability) features like self-monitoring and self-healing, you begin to understand how SDWaaS separates itself from the pack.

To understand the benefits of self-monitoring and self-healing, it is useful to understand the legacy WAN failover paradigm. With traditional WAN failover solutions, there are often dedicated security devices, with specific rulesets and policies, at each physical location. When a failover occurs, these rulesets and policies must be updated, often by specific Infosec personnel within the enterprise. This manual process has a non-trivial amount of friction (what if the security engineer is unavailable for an hour? what if something goes awry with the rule change?) that can extend the duration of an outage.

Similarly, when it is time for the failover to switch back, the changes may need to be undone. Manual processes are often required because the security and networking portions of traditional WAN solutions are often two (or more) disjointed, unintegrated systems.

SDWaaS mitigates the need for manual processing and the risk of security-policy-based conflicts by automating the failover process, including dynamic updates of relevant security policies. Failover can be this simple and this easy to automate with SDWaaS because the security and networking infrastructure are both baked into one holistic, hosted traffic management platform. With SDWaaS, when one POP fails, failover to the next closest POP occurs seamlessly, without the need for manual intervention.
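
To make the "next closest POP" idea concrete, here is a minimal Python sketch of the selection logic. It is an illustration only, not how any particular SDWaaS provider implements failover: the Pop class, the POP names, and the use of round-trip time as the measure of "closeness" are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Pop:
    name: str       # hypothetical POP label
    rtt_ms: float   # assumed measure of "closeness": round-trip time from the site
    healthy: bool   # result of the most recent health check


def select_pop(pops: List[Pop]) -> Optional[Pop]:
    """Return the closest healthy POP, or None if no POP is healthy."""
    candidates = [p for p in pops if p.healthy]
    return min(candidates, key=lambda p: p.rtt_ms, default=None)


pops = [
    Pop("pop-a", rtt_ms=12.0, healthy=True),
    Pop("pop-b", rtt_ms=25.0, healthy=True),
    Pop("pop-c", rtt_ms=60.0, healthy=True),
]

print(select_pop(pops).name)  # pop-a

# Simulate a failure of the closest POP: the next closest one is chosen,
# with no manual rule change in between.
pops[0].healthy = False
print(select_pop(pops).name)  # pop-b
```

Because the choice is recomputed from the current health state, switching back once the original POP recovers needs no separate "undo" step in this sketch.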

Intelligent last mile management

Thus far, we have reviewed the topic of maximizing uptime as it relates to blackouts (the complete loss of connectivity). However, brownouts (instances where the quality of connectivity significantly degrades) can be just as painful to operations. Within an enterprise WAN, one of the hardest areas to manage can be the last mile (a.k.a. the first mile). This is because the quality of infrastructure, the reliability of a connection, and other external variables (e.g. weather) can vary significantly from region to region.

Intelligent Last Mile Management, or ILMM, not only helps mitigate blackouts that occur in the last mile, it also helps proactively address brownouts (think high levels of jitter, packet loss, or latency, or issues with specific services). ILMM does this by performing dynamic, granular, and intelligent monitoring of performance upstream and downstream of the ISP. This allows enterprises to identify and resolve issues within the last mile significantly faster than the traditional support-ticket and change-request processes common with legacy WAN solutions like MPLS (Multiprotocol Label Switching). Some of the specific features of ILMM that help differentiate it from other, less sophisticated solutions include:

  • Continuous last mile profiling. By profiling performance with ILMM, enterprises are able to benefit from a baseline of metrics like latency, packet loss, and jitter. Having this baseline enables the creation of a dynamic last mile performance model that can be used to proactively address minor issues before they become major sources of downtime.
  • Infrastructure service monitoring. Traditional monitoring solutions often only monitor uptime by using ICMP (ping) to determine if a routing device at the ISP or another endpoint is up or down. ILMM is able not only to monitor ICMP, but also to identify outages in commonly used network services like HTTP and DNS. This additional visibility greatly increases an enterprise’s ability to identify outages that do not have their root cause at the router level (e.g. a misbehaving application).
  • Pinpoint identification. By monitoring the entire connection from location to ISP, ILMM is able to accurately and rapidly identify the cause of problems. This minimizes finger pointing between users and providers, and reduces mean time to resolution.
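
The features above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the ILMM implementation: the function names, the sample latencies, and the "mean plus three standard deviations" threshold are all hypothetical choices, and the probe results are passed in as booleans rather than measured on a real network.

```python
from statistics import mean, stdev
from typing import List


def is_brownout(baseline_ms: List[float], sample_ms: float, k: float = 3.0) -> bool:
    """Flag a latency sample more than k standard deviations above the baseline mean.

    Needs at least two baseline samples so a standard deviation can be computed.
    """
    threshold = mean(baseline_ms) + k * stdev(baseline_ms)
    return sample_ms > threshold


def classify_outage(icmp_ok: bool, dns_ok: bool, http_ok: bool) -> str:
    """Use per-service probe results to narrow down where a problem sits."""
    if not icmp_ok:
        return "link or router unreachable"
    if not (dns_ok and http_ok):
        # Ping succeeds but a service fails: the root cause is above the router level.
        return "service-level outage"
    return "healthy"


# Hypothetical baseline of last mile latency samples, in milliseconds.
baseline = [20.0, 22.0, 21.0, 19.0, 23.0, 20.0, 21.0, 22.0]

print(is_brownout(baseline, 22.5))  # False: within normal variation
print(is_brownout(baseline, 80.0))  # True: well above the baseline

print(classify_outage(icmp_ok=True, dns_ok=False, http_ok=True))  # service-level outage
```

A ping-only monitor would report the last case as "up", which is the visibility gap that service-level monitoring is meant to close.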

Self-monitoring, self-healing, and ILMM enable an automated path to maximum uptime

Uptime is vital to any organization, and rapid detection of and recovery from failures is essential to maximizing it. By taking the manual processes and friction out of failover and redundancy and enabling granular monitoring of the last mile, SDWaaS with self-monitoring, self-healing, and ILMM allows enterprises to benefit from robust, reliable, and high-performance WAN connectivity at significant cost savings when compared with MPLS.