Receive Only Actionable Alerts Through Event Correlation And Avoid Fault Storms

Many network monitoring solutions overload the NOC with redundant and insignificant alerts. The sheer volume can be overwhelming and makes it hard to tell which alert to address first. Getting that priority wrong can lead to outages, user frustration, or SLA violations.

Event Storm

NerveCenter can reduce alert storms and present operators with only actionable alerts. With NerveCenter, operators can customize alerts, correlate events with other information, and define a unique sequence of events over time that triggers a single, truly meaningful alert. Because IT staff see only actionable alerts, they can use their time efficiently, which reduces IT infrastructure costs and improves service availability and network performance.

It all starts with NerveCenter's unique architecture, which sets it apart from other IT operations management products. At the heart of NerveCenter is its correlation engine. NerveCenter uses finite state machines (or alarms) that define the operational states to be detected. These state machines enable NerveCenter to correlate data from multiple sources over time, including data read directly from databases. As it detects the operational states, it can apply these advanced correlation types to the data:

Suppressive: suppression of secondary faults or events.

Sequence: sequence of events.

Temporal: timing and persistence of events.

Associative: association of separate events into a single event.

Forward chaining: all of the above types can be combined in an advanced, forward-chaining model (such as sequence detection + association + suppression).

NerveCenter can suppress repetitive faults and alerts. For example, once it has alerted that a server is down, NerveCenter can suppress alerts on subsequent polls for as long as the issue persists.
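The idea of alerting once and then staying quiet can be illustrated with a small generic sketch. This is not NerveCenter code or its configuration format; the class name `RepeatSuppressor`, the `on_poll` method, and the node name `db01` are all hypothetical, chosen only to show the two-state logic.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class RepeatSuppressor:
    """Alert once when a node goes down; stay silent while it remains down."""

    down: set[str] = field(default_factory=set)  # nodes already alerted on

    def on_poll(self, node: str, is_up: bool) -> str | None:
        if not is_up:
            if node in self.down:
                return None  # already alerted; suppress the repeat
            self.down.add(node)
            return f"ALERT: {node} is down"
        if node in self.down:
            # Recovery: forget the node so a future outage alerts again.
            self.down.remove(node)
            return f"CLEAR: {node} is back up"
        return None


suppressor = RepeatSuppressor()
# Hypothetical poll results: three failed polls, then a recovery.
polls = [("db01", False), ("db01", False), ("db01", False), ("db01", True)]
alerts = [msg for node, up in polls if (msg := suppressor.on_poll(node, up))]
```

In this sketch the three failed polls yield a single alert followed by a single clear message, rather than one alert per poll.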

Custom logic can be incorporated to alert on a sequence of events. If a known pattern of events has historically led to a network issue, logic can be built into the state model to watch for that sequence and alert appropriately.
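As a rough illustration of sequence detection, the sketch below advances a small state machine one step each time the next expected event arrives, and fires only when the whole pattern completes inside a time window. It is a generic example, not NerveCenter's state model; the class `SequenceDetector`, the event names, and the 600-second window are assumptions made for the example.

```python
from __future__ import annotations


class SequenceDetector:
    """Fire when events arrive in a given order within a time window."""

    def __init__(self, pattern: list[str], window_s: float) -> None:
        self.pattern = pattern
        self.window_s = window_s
        self.index = 0          # position of the next expected event
        self.started_at = 0.0   # timestamp of the first matched event

    def on_event(self, name: str, ts: float) -> bool:
        # Drop a partial match that has outlived the window.
        if self.index > 0 and ts - self.started_at > self.window_s:
            self.index = 0
        # Events that are not the next expected step are simply ignored.
        if name == self.pattern[self.index]:
            if self.index == 0:
                self.started_at = ts
            self.index += 1
            if self.index == len(self.pattern):
                self.index = 0  # reset so the pattern can be detected again
                return True
        return False


# Hypothetical pattern: a fan failure, then rising temperature, then throttling.
detector = SequenceDetector(["fan_failure", "temp_high", "cpu_throttle"], window_s=600)
events = [("fan_failure", 0), ("link_flap", 30), ("temp_high", 120), ("cpu_throttle", 300)]
fired = [detector.on_event(name, ts) for name, ts in events]
```

Here only the final event completes the pattern, so a single alert would be raised for the whole sequence; the unrelated `link_flap` event neither advances nor resets the match.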

If traffic on an interface is being monitored and a capacity threshold of 80% is considered alertable, most network monitoring solutions will alert each time the threshold is crossed, even when the breach was caused by transient traffic and volume quickly dropped back to acceptable levels. NerveCenter can react to the initial threshold breach by polling the interface at a faster interval to determine whether the issue is momentary or persistent. Only a persistent breach results in an actionable alert.
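A persistence check of this kind might look like the following minimal sketch. It assumes a caller-supplied function that returns current interface utilization as a percentage; the function name `breach_is_persistent`, the three confirmation polls, and the five-second fast interval are illustrative choices, not NerveCenter settings.

```python
from __future__ import annotations

import time
from typing import Callable


def breach_is_persistent(
    read_utilization: Callable[[], float],
    threshold: float = 80.0,
    confirm_polls: int = 3,
    fast_interval_s: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Run after an initial breach: re-poll quickly and confirm it persists.

    Returns True only if every follow-up reading is still at or above the
    threshold; a single reading below it marks the breach as transient.
    """
    for _ in range(confirm_polls):
        sleep(fast_interval_s)
        if read_utilization() < threshold:
            return False  # traffic dropped back; no alert needed
    return True


# Hypothetical follow-up readings; sleeping is stubbed out for the demo.
sustained = iter([91.0, 88.0, 93.0])
spike = iter([84.0, 41.0, 39.0])
alert_on_sustained = breach_is_persistent(lambda: next(sustained), sleep=lambda s: None)
alert_on_spike = breach_is_persistent(lambda: next(spike), sleep=lambda s: None)
```

Passing `sleep` as a parameter keeps the sketch testable without real delays; in practice the default `time.sleep` would space out the faster polls.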

NerveCenter can suppress faults caused by another fault by associating separate events into a single event. If a router goes down and makes a server cluster unreachable, so that the servers merely appear to be down, NerveCenter can associate the events into one actionable alert.
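One way to picture this association is with a simple dependency map, as in the sketch below. It assumes each device has at most one upstream device and that the map contains no cycles; the function `correlate_outages` and the device names are hypothetical and do not reflect how NerveCenter models topology.

```python
from __future__ import annotations


def correlate_outages(down: set[str], upstream: dict[str, str]) -> dict[str, list[str]]:
    """Map each root-cause device to the down devices its outage explains."""

    def root_of(node: str) -> str:
        # Walk toward the core while the next hop upstream is also down.
        while upstream.get(node) in down:
            node = upstream[node]
        return node

    incidents: dict[str, list[str]] = {}
    for node in sorted(down):
        root = root_of(node)
        incidents.setdefault(root, [])
        if node != root:
            incidents[root].append(node)  # secondary fault, folded into the root
    return incidents


# Hypothetical topology: two web servers sit behind router rtr-a.
upstream = {"web01": "rtr-a", "web02": "rtr-a", "db01": "rtr-b"}
incidents = correlate_outages({"rtr-a", "web01", "web02"}, upstream)
```

With this input the three down devices collapse into one incident keyed on `rtr-a`, so an operator would see a single alert naming the router and listing the servers it made unreachable.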
