Part 2 of 2: Service Assurance: What Happens in Pakistan Stays in Pakistan (Mu Dynamics blog)
by Thomas Maufer on 07 March 2008 - 01:34:24 PM
Service Assurance for Next-Generation Networks is a Layer 2-7 Problem
(Part 2 - Bugs Happen)
Note: This outage has been covered elsewhere in the media. The purpose of this blog is not to point fingers, but to learn lessons about how to build processes that "crash test" new protocol software in advance of, and during, deployment, as part of a life cycle approach that minimizes down time, thereby maximizing customer satisfaction -- especially important business metrics in next-generation IP-enabled services like IMS (including VoIP) and IPTV.
---
Newer protocols (like those used for IPTV) are more complex, and new services that are expected to generate revenue depend on them even though they are relatively untested. That is a dangerous combination. The more complex the protocols get, the more likely it is that unintentional misbehavior will occur. This is a much more fragile version of what we saw at layer-3, where changing a router configuration had spectacular unintended consequences.
The Pakistan vs. YouTube example shows how a service interruption at the network layer can affect the network's ability to deliver IPTV (or any other IP-based application!). Conventional wisdom says that layer-3 is pretty well-baked and stable, yet configuration complexity undermines even the most battle-tested code. Next-gen IP service protocols must be stable too, but they are new, so it's hard to know how much faith to place in them. In the case of the YouTube incident, the router code didn't crash; in fact, the routers did exactly as instructed. Still, the IPTV service was interrupted. Even non-next-gen TV can be interrupted, so it should be clear that Service Assurance testing really isn't optional. With next-gen network protocols like RTP, SIP, SCTP, RTSP, and HTTP, customization is rampant, many active protocol "dialects" exist, and product updates and configuration changes are frequent, so it's easy to see a nearly unlimited number of possible problems.
There is a way to expose the root of many of these software implementation flaws: Proactive Service Assurance validation. By actively attacking the software
implementations of the protocols within the network components used to
build next-generation IP networks, service providers can identify
weaknesses (and 0-day vulnerabilities) before they affect service, not
just once but as part of an ongoing process of discovery and
remediation. This Service Assurance process is already a best practice
in leading cable MSOs and large telecommunications providers in North
America, and is rapidly spreading across Asia and Europe as well.
Service Assurance consists of attacks against network protocols and products. These attacks are repeatable and exercise several dimensions of the software's behavior: 1) Can it tolerate invalid inputs? The real world is full of them. How do these invalid inputs affect the device's ability to handle "good" traffic? 2) Can the target handle a low rate of invalid inputs, but not a high rate? 3) Is the target vulnerable to well-known failure modes? All of these questions affect the suitability of a device for production deployment, where it can either bring in revenue or cause downtime and lose customers.
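To make the first question concrete, here is a minimal sketch of a repeatable invalid-input campaign. Everything in it is an assumption made for illustration: `parse_request_line` is a hypothetical toy target that loosely imitates a SIP request line, and `mutate` and `run_campaign` are made-up helper names, not part of any real product or tool. The sketch corrupts a known-good message, records whether the target rejects it cleanly, accepts it, or fails in an unexpected way, and then checks that "good" traffic is still handled. It does not model input rate (question 2) or a library of known failure modes (question 3).

```python
import random

# A known-good message for the hypothetical target (illustrative only).
GOOD_REQUEST = "INVITE sip:alice@example.com SIP/2.0"


def parse_request_line(line):
    """Hypothetical toy target: parse 'METHOD URI VERSION' or raise ValueError."""
    parts = line.split(" ")
    if len(parts) != 3:
        raise ValueError("expected three space-separated fields")
    method, uri, version = parts
    if not method.isalpha() or not method.isupper():
        raise ValueError("bad method")
    if not uri.startswith("sip:"):
        raise ValueError("bad URI scheme")
    if version != "SIP/2.0":
        raise ValueError("unsupported version")
    return {"method": method, "uri": uri, "version": version}


def mutate(line, rng):
    """Return a corrupted copy of line using one randomly chosen mutation."""
    choice = rng.randrange(4)
    if choice == 0:
        # Truncate the message at a random position.
        return line[: rng.randrange(len(line))]
    if choice == 1:
        # Blow up one field to an unusually large size.
        return line.replace("alice", "A" * rng.randrange(1000, 5000))
    if choice == 2:
        # Drop the first field delimiter.
        return line.replace(" ", "", 1)
    # Overwrite one character with a NUL byte.
    i = rng.randrange(len(line))
    return line[:i] + "\x00" + line[i + 1:]


def run_campaign(target, seed, count):
    """Send `count` corrupted inputs; a fixed seed makes the run repeatable."""
    rng = random.Random(seed)
    results = {"rejected": 0, "accepted": 0, "fault": 0, "good_failures": 0}
    for _ in range(count):
        bad = mutate(GOOD_REQUEST, rng)
        try:
            target(bad)
            results["accepted"] += 1   # target tolerated the odd input
        except ValueError:
            results["rejected"] += 1   # clean, expected rejection
        except Exception:
            results["fault"] += 1      # unexpected failure worth investigating
        # After every bad input, confirm that good traffic still works.
        try:
            target(GOOD_REQUEST)
        except Exception:
            results["good_failures"] += 1
    return results


if __name__ == "__main__":
    print(run_campaign(parse_request_line, seed=1, count=200))
```

Because the random generator is seeded, running the same campaign twice produces the same sequence of corrupted inputs, which is what makes a failure reproducible once it is found. A real target would be a network device reached over the wire rather than a local function, so the "fault" and "good_failures" checks would have to be replaced by whatever health probes that device supports.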
Carriers often have straightforward
"success" measurements for deployment: Service-level agreements (SLAs;
in other words, uptime commitments); end-to-end latency; response
time; throughput (packets per second, calls per second, etc.). If the
network is anything BUT highly reliable, available, and secure, then
those weak, unreliable components undermine the carrier's ability to
deliver the level of service promised to customers. When outages
happen, regardless of location, they are not only embarrassing but also costly.
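As a rough illustration of how little room an uptime commitment leaves, the sketch below converts an uptime percentage into a yearly downtime budget. The percentages used are assumed examples, not figures from any particular carrier's SLA, and `downtime_budget_minutes` is a hypothetical helper name. The calculation also assumes a 365-day year and ignores any maintenance windows a real SLA might exclude.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, assuming a 365-day year


def downtime_budget_minutes(uptime_percent):
    """Return the minutes of downtime per year allowed by an uptime commitment."""
    if not 0 <= uptime_percent <= 100:
        raise ValueError("uptime_percent must be between 0 and 100")
    return MINUTES_PER_YEAR * (100 - uptime_percent) / 100


if __name__ == "__main__":
    # Assumed example commitments, for illustration only.
    for pct in (99.9, 99.99, 99.999):
        print(f"{pct}% uptime allows {downtime_budget_minutes(pct):.1f} minutes/year")
```

At 99.999% uptime the budget is a little over five minutes per year, so a single outage of the kind described in Part 1 could consume it many times over.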
Carriers employing Service Assurance encounter fewer software flaws in their unique combination of network components. They also know how the network behaves when connected to live, ugly traffic representative of real-world conditions. Demand for next-gen IP networks using IMS and IPTV is growing rapidly, and the networks that best tolerate invalid inputs, that scale the best, and that retain the most customers will be the ones that are the most profitable.