How CloudFabrix Telco Service Assurance Uses Multi-Protocol and Multi-Layer Correlation to Improve Service Delivery

Telco service providers face a unique set of challenges when it comes to service delivery. Their networks are complex and heterogeneous, and they need to be able to correlate events from multiple protocols and layers in order to quickly identify and resolve problems.

Service providers demand more open technologies and platforms that can solve multiple use cases, and are future-proof, versatile, and adaptable to evolving organizational needs. They also want platforms that enable rapid prototyping and iterative advancement to realize new initiatives and have low barriers to entry in terms of learning curve and ability to customize. This is a tall order, but it is essential for service providers to stay ahead of the curve and deliver the best possible service to their customers.

CloudFabrix Telco Service Assurance addresses these challenges with a powerful multi-protocol and multi-layer correlation engine. This engine can correlate events from SNMP, syslog, and other protocols, as well as from different layers of the network, such as the physical layer, the data link layer, and the network layer.

This allows CloudFabrix to quickly identify the root cause of problems, even in complex networks. This can help service providers improve service delivery by reducing the mean time to repair (MTTR), improving customer satisfaction, and reducing costs.

CloudFabrix Telco Service Assurance is a powerful tool that can help telco service providers solve a variety of network problems. It can correlate events from different sources, such as SNMP traps, syslog events, and network monitoring tool alerts, to quickly identify the root cause of the problem. It can automate the tasks involved in resolving problems, such as sending notifications to the right people and triggering corrective actions.

In this blog, we’ll take a closer look at how CloudFabrix can be used to solve a specific network problem: an interface down condition in an OSPF network. We’ll show you how CloudFabrix can correlate events from different sources to quickly identify the root cause of the problem, and how it can automate the tasks involved in resolving the problem.

Scenario: Interface goes down in an OSPF-connected network

An interface goes down in an OSPF-connected network. This can happen for a variety of reasons, such as a physical link failure, a configuration error, or a software bug. When an interface goes down, it can disrupt traffic flow and cause other problems in the network.

In this scenario, you would typically receive SNMP traps and syslog events such as the following:

Typical SNMP traps and Syslog events in the above network link failure scenario

SNMP traps:

  • linkDown: Physical interface
  • ospfLinkDown: For OSPF interface
  • ospfNbrLinkDown: For OSPF neighbor

Syslog Events:

  • %LINEPROTO-5-UPDOWN
  • %LINK-5-CHANGED
  • %OSPF-5-ADJCHG

Upon link recovery:

  • %LINK-3-UPDOWN
  • %LINEPROTO-5-UPDOWN

How does CloudFabrix Telco Service Assurance handle this Scenario?

  1. CloudFabrix receives events from different sources, such as SNMP traps, syslog events, and network monitoring tool alerts.
  2. CloudFabrix correlates these events to identify the interface that is down and the cause of the problem.
  3. CloudFabrix sends notifications to the right people and triggers corrective actions, such as restarting the interface or changing the configuration.

CloudFabrix Telco Service Assurance: Network Correlation Scenario Example

The correlation involves three key steps:

  • CloudFabrix uses its multi-protocol and multi-layer correlation engine to correlate events from different sources.
  • CloudFabrix uses its automated root cause analysis to identify the root cause of the problem.
  • CloudFabrix uses its customizable dashboards to visualize the results of the correlation and root cause analysis.

CloudFabrix Telco Service Assurance can be used to quickly identify the root cause of an interface down condition in an OSPF network. CloudFabrix can correlate events from different sources, such as SNMP traps, syslog events, and network monitoring tool alerts, to identify the interface that is down and the cause of the problem.

For example, CloudFabrix can correlate an SNMP trap for a link-down event with a Syslog event for a configuration change to the interface. This would indicate that the interface went down due to a configuration change.
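
To make this example concrete, here is a minimal Python sketch of pairing a link-down trap with a nearby configuration-change syslog on the same device. It is an illustration under stated assumptions, not the CloudFabrix correlation engine: the `Event` fields, the `probable_cause` function, and the 60-second window are hypothetical, and the sketch assumes the configuration change is reported as a Cisco `%SYS-5-CONFIG_I` message.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional

CONFIG_CHANGE = "SYS-5-CONFIG_I"  # assumed config-change syslog code


@dataclass(frozen=True)
class Event:
    source: str          # "snmp" or "syslog"
    device: str          # hostname or management IP
    kind: str            # trap name or syslog message code
    timestamp: datetime


def probable_cause(
    link_down: Event,
    events: List[Event],
    window: timedelta = timedelta(seconds=60),
) -> Optional[Event]:
    """Return the config-change syslog on the same device that is closest
    in time to the link-down trap, if one falls within the window."""
    candidates = [
        e for e in events
        if e.source == "syslog"
        and e.kind == CONFIG_CHANGE
        and e.device == link_down.device
        and abs(link_down.timestamp - e.timestamp) <= window
    ]
    return min(
        candidates,
        key=lambda e: abs(link_down.timestamp - e.timestamp),
        default=None,
    )


if __name__ == "__main__":
    t0 = datetime(2024, 1, 1, 12, 0, 0)
    trap = Event("snmp", "router-1", "linkDown", t0)
    log = [
        Event("syslog", "router-1", CONFIG_CHANGE, t0 + timedelta(seconds=5)),
        Event("syslog", "router-2", CONFIG_CHANGE, t0),
    ]
    print(probable_cause(trap, log))
```

The window is applied on both sides of the trap because the configuration message may be logged shortly before or shortly after the interface changes state.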

Key Capabilities

There are a few key capabilities that make this service assurance scenario possible:

  • Contextualize: The ability to understand the context of the alert/event data: which application and node are affected, how the node is connected to other nodes, what the physical and/or logical topology looks like, and what the impact of the node's failure is.
  • Enrichment: Enhancing the quality of the raw data with context gleaned from native discovery and via integrations with domain controllers, element managers, and monitoring tools.

Particularly for Cisco IOS/IOS-XE/IOS-XR Syslog events, enriching raw Syslog data with recommended actions can accelerate incident triaging, reduce MTTR, and even enable auto-remediation.

  • Correlation: The ability to group related alerts into a root situation using a multi-dimensional correlation engine that can correlate on any combination of these four dimensions:
    • ML cluster-based: Group alerts belonging to the same cluster, trained via unsupervised ML clustering.
    • Time dimension: Group alerts within a given time window.
    • Topology dimension: Group alerts that originate from any node that belongs to a common parent entity (like a business service or an application), from nodes that are neighbors in a network, or from nodes across networks (e.g., 5G core and edge networks).
    • Horizontal grouping: Group alerts belonging to a certain functional domain, such as network or storage. This is helpful when the functions are organizationally carved out to be performed by different teams (either in-house or outsourced).
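
As a rough illustration of combining two of these dimensions, the Python sketch below groups alerts that share a topology parent and fall into the same time bucket. It is a simplified sketch, not the CloudFabrix engine: the `Alert` fields, the `parent_of` mapping, and the 300-second window are all hypothetical, and fixed buckets are used for brevity, so two alerts close in time can still land in adjacent buckets.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class Alert:
    node: str            # node that raised the alert
    message: str
    epoch_seconds: int   # alert time as a Unix timestamp


def group_alerts(
    alerts: Iterable[Alert],
    parent_of: Dict[str, str],
    window_seconds: int = 300,
) -> Dict[Tuple[str, int], List[Alert]]:
    """Group alerts by (topology parent, fixed time bucket)."""
    groups: Dict[Tuple[str, int], List[Alert]] = defaultdict(list)
    for alert in alerts:
        # Topology dimension: fall back to the node itself when no
        # parent entity (service, application) is known.
        parent = parent_of.get(alert.node, alert.node)
        # Time dimension: integer bucket index for the alert's timestamp.
        bucket = alert.epoch_seconds // window_seconds
        groups[(parent, bucket)].append(alert)
    return dict(groups)


if __name__ == "__main__":
    topology = {"router-1": "core-service", "router-2": "core-service"}
    alerts = [
        Alert("router-1", "linkDown", 1000),
        Alert("router-2", "%OSPF-5-ADJCHG", 1010),
        Alert("switch-9", "linkDown", 1020),
    ]
    for key, members in group_alerts(alerts, topology).items():
        print(key, [a.node for a in members])
```

A production engine would more likely use sliding windows and a real topology graph; the point here is only that each dimension contributes one component of the grouping key.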

Your Next Step with Telco Service Assurance

In this blog, we covered a few aspects of Telco Service Assurance and walked through an example of effective event correlation in a network topology. In addition to Service Assurance, CloudFabrix, with its RDA platform, enables many use cases for Telco Service Providers, including Performance Management, Bulkstats, Mobility and Optical network assurance, Single Pane of Glass (SPoG) insights, Use Case analytics, and more.

Reach out to our experts for a free no-obligation consultation or to learn more about Telco Service Assurance.

Stay tuned for more updates and announcements from us in this space. Get started with Telco Service Assurance on Robotic Data Automation Fabric (RDAF) by signing up for free.

Tejo Prayaga
Tejo Prayaga is a high-growth Product Management & Marketing leader. Tejo has extensive experience helping enterprises build, scale, and market innovative products and solutions that use modern technologies like Data Automation, Artificial Intelligence, Machine Learning, Microservices, Cloud Services, and more. Startup geek, Ex-Cisco, MBA, Speaker, and Toastmaster!! https://www.linkedin.com/in/tprayaga