4 System Management Functions
PvD
Alarm Reporting Function
Recommendation X.733.
Overview
- CMIP Notification
- Event Type
- Event Information
- List of Probable Causes
CMIP Notification
X.733 Alarm Reporting is primarily about the alarm as a CMIP message. As alarm messages are CMIP 'Event Reports', some parameters are generic CMIP:
- Invoke identifier (mandatory field): unique identifier to distinguish this notification from other notifications or operations;
- Mode (mandatory field): whether the reception of the notification is to be confirmed or non-confirmed;
- Managed Object class (mandatory field);
- Managed Object instance (mandatory field);
- Event Type (mandatory field): in this case various Alarm types (see section Event Type below);
- Event time (optional field);
- Event information (optional field): the alarm-specific parameters (see section Event Information below).
So the CMIP message already identifies the originating object (i.e. the object issuing an alarm).
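The parameter list above can be pictured as a simple record type. The following is only a minimal sketch: the class and field names (`EventReport`, `AlarmType`, `originator`) are illustrative assumptions, not the ASN.1 definitions from the Recommendations, and real CMIP encodings are considerably richer.

```python
from dataclasses import dataclass, field
from datetime import datetime
from enum import Enum
from typing import Any, Dict, Optional


class AlarmType(Enum):
    """The five alarm types that can appear in the Event Type field."""
    COMMUNICATIONS = "communications"
    QUALITY_OF_SERVICE = "quality of service"
    PROCESSING_ERROR = "processing error"
    EQUIPMENT = "equipment"
    ENVIRONMENTAL = "environmental"


@dataclass
class EventReport:
    """Hypothetical in-memory view of a CMIP event report carrying an alarm."""
    # Mandatory generic CMIP parameters.
    invoke_id: int
    confirmed: bool                      # Mode: confirmed or non-confirmed
    managed_object_class: str
    managed_object_instance: str
    event_type: AlarmType
    # Optional generic CMIP parameters.
    event_time: Optional[datetime] = None
    event_information: Dict[str, Any] = field(default_factory=dict)

    def originator(self) -> str:
        """Identify the object issuing the alarm from the generic fields alone."""
        return f"{self.managed_object_class}:{self.managed_object_instance}"


report = EventReport(
    invoke_id=1,
    confirmed=False,
    managed_object_class="transmissionLink",   # made-up class name
    managed_object_instance="link-1",          # made-up instance name
    event_type=AlarmType.COMMUNICATIONS,
)
print(report.originator())
```

Note that the originating object is available without looking inside Event information at all, which is the point made in the sentence above.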
An 'Alarm record' is a managed object class derived from the Event log record object class defined in CCITT Rec. X.721 | ISO/IEC 10165-2.
The Alarm record object class represents information stored in logs as a result of receiving an event report whose event type is one of the alarm types defined in X.733 (see section Event Type below).
Event Type
Event type consists of one of the following Alarm types:
- Communications alarm: principally associated with the procedures and/or processes required to convey information from one point to another;
- Quality of Service alarm: principally associated with a degradation in the quality of a service;
- Processing error alarm: principally associated with a software or processing fault;
- Equipment alarm: principally associated with an equipment fault;
- Environmental alarm: principally associated with a condition relating to an enclosure in which the equipment resides.
See also the predefined List of Probable Causes below.
Event Information
Event information (fields are optional unless explicitly stated otherwise):
- Probable cause (mandatory): [X.721] already defines a list (see section List of Probable Causes below) which can be extended using ASN.1.
- Specific problems: a set of object identifiers which identifies further refinements to the Probable cause of the alarm.
- Perceived severity (mandatory): six severity levels provide an indication of how it is perceived that the capability of the managed object has been affected (see section Alarm Severity below).
- Backed-up status: when present, a boolean that specifies whether the object emitting the alarm has been backed up, in which case services provided to the user have not been disrupted {i.e. this also has consequences for Alarm Severity}.
- Back-up object (conditional): present when the Backed-up status parameter has the value true. This parameter specifies the managed object instance that is providing back-up services for the managed object to which the notification pertains. This parameter is useful, for example, when the back-up object is from a pool of objects any of which may be dynamically allocated to replace a faulty object.
The Back-up object parameter is related to the Back-up object relationship attribute defined in X.732. The value of this parameter shall be the same as the Back-up object attribute value when the alarm is emitted.
- Trend indication: when present, specifies the current severity trend of the managed object. If present it indicates that there are one or more "outstanding" alarms which have not been cleared, and pertain to the same managed object as that to which this "current" alarm pertains. The Trend indication parameter has three possible values:
  - more severe: the Perceived severity in the current alarm is higher than that reported in any of the outstanding alarms;
  - no change: the Perceived severity reported in the current alarm is the same as the most severe of any of the outstanding alarms;
  - less severe: there is at least one outstanding alarm of a severity higher than that in the current alarm.
- Threshold information (conditional): shall be present when the alarm is a result of crossing a threshold. It consists of four subparameters:
  - triggered threshold: the identifier of the threshold attribute that caused the notification;
  - threshold level: in the case of a gauge the threshold level specifies a pair of threshold values, the first being the value of the crossed threshold, and the second its corresponding hysteresis; in the case of a counter the threshold level specifies only the threshold value;
  - observed value: the value of the gauge or counter which crossed the threshold;
  - arm time: for a gauge threshold, the time at which the threshold was last re-armed, namely the time after the previous threshold crossing at which the hysteresis value of the threshold was exceeded, thus again permitting generation of notifications when the threshold is crossed. For a counter threshold, the later of the time at which the threshold offset was last applied, or the time at which the counter was last initialized (for resettable counters).
- Notification identifier: when present, provides an identifier for the notification, which may be carried in the Correlated notifications parameter (see next item below) of future notifications. Notification identifiers must be chosen to be unique across all notifications of a particular managed object throughout the time that correlation is significant.
A Notification identifier may be reused if there is no requirement that the previous notification using that Notification identifier be correlated with future notifications. Generally, Notification identifiers should be chosen to ensure uniqueness over as long a time as is feasible for the managed system.
- Correlated notifications: when present, contains a set of Notification identifiers and, if necessary, their associated managed object instance names. This set is defined to be the set of all notifications to which this notification is considered to be correlated. The source object instance shall be present if the correlated event report is from a managed object instance other than the one in which the Correlated notifications parameter appears. {How correlation should be performed is not defined by ITU.}
- State change definition: when present, is used to indicate a state transition associated with the alarm. If the managed object class definition includes state change notifications, it shall also emit a state change notification.
- Monitored attributes: when present, defines one or more attributes of the managed object and their corresponding values at the time of the alarm. Managed object definers may specify the set of attributes which are of interest, if any. This allows, for example, the timely reporting of changing conditions prevalent at the time of the alarm.
- Proposed repair actions: when present, is used if the cause is known and the system being managed can suggest one or more solutions (such as switch in standby equipment, retry, replace media). This parameter is a set of object identifiers (registered using the procedures defined for ASN.1 Object Identifier values [X.208]); the possible values are specified by the object class definer.
Two values with the following semantics have been assigned to this parameter:
  - no repair action required: to indicate that the manager is not required to initiate any repair action because it is not the manager's responsibility;
  - repair action required: to indicate that the manager is required to initiate repair action to correct the problem reported in the alarm report. This value also indicates that no specific repair action is proposed by the agent system.
- Additional text: when present, allows a free-form text description to be reported. Understanding the format or semantics of this field is not required to interpret the notification.
- Additional information: when present, allows the inclusion of a set of additional information in the event report. It is a series of data structures each of which contains three items of information: an identifier, a significance indicator, and the problem information.
  - The identifier subparameter carries a registered object identifier which defines the data type of the information subparameter. The data type must be understood by the managing system in order for the contents of the information subparameter to be parsed. Additional identifiers may be registered using the procedures defined for ASN.1 object identifier values in X.208.
  - The significance subparameter is a boolean value which is set to true if the receiving system must be able to parse the contents of the information subparameter for the event report to be fully understood. Even if the Additional information parameter is not fully understood, an event report indication shall be issued to the user. Indication that the Additional information parameter is not fully understood is a local matter.
  - The information subparameter carries information about the event. This information can be parsed if the identifier is understood.
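The Threshold information item above describes how a gauge threshold disarms after a crossing and is re-armed through its hysteresis value. A minimal sketch of that behaviour follows, assuming a "high" threshold whose hysteresis is expressed as a lower re-arm level. The names `GaugeHighThreshold`, `ThresholdInfo` and `rearm_level` are illustrative assumptions and do not come from X.721 or X.733.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class ThresholdInfo:
    """The four subparameters of Threshold information."""
    triggered_threshold: str
    threshold_level: Tuple[float, float]   # (crossed threshold, hysteresis)
    observed_value: float
    arm_time: float


class GaugeHighThreshold:
    """Hypothetical high threshold on a gauge, with a lower re-arm level."""

    def __init__(self, name: str, level: float, rearm_level: float, now: float = 0.0):
        if rearm_level > level:
            raise ValueError("re-arm level must not exceed the threshold level")
        self.name = name
        self.level = level
        self.rearm_level = rearm_level
        self.armed = True
        self.arm_time = now

    def update(self, value: float, now: float) -> Optional[ThresholdInfo]:
        """Feed one gauge sample; return Threshold information on a crossing."""
        if self.armed and value >= self.level:
            # Crossing while armed: notify once, then stay silent until re-armed.
            self.armed = False
            return ThresholdInfo(self.name, (self.level, self.rearm_level),
                                 value, self.arm_time)
        if not self.armed and value <= self.rearm_level:
            # The gauge passed the hysteresis value: re-arm and record the time.
            self.armed = True
            self.arm_time = now
        return None


threshold = GaugeHighThreshold("cpuLoadHigh", level=90.0, rearm_level=80.0)
samples = [(1.0, 85.0), (2.0, 92.0), (3.0, 95.0), (4.0, 85.0), (5.0, 79.0), (6.0, 91.0)]
crossings = [info for t, v in samples if (info := threshold.update(v, t)) is not None]
for info in crossings:
    print(info)
```

With these samples the gauge stays above the threshold at time 3.0 without a second notification, and the second crossing reports the arm time at which the gauge dropped to the re-arm level.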
Alarm Severity
The Perceived severity field in Event information can have the following values:
- indeterminate: indicates that the severity level cannot be determined.
- warning: indicates the detection of a potential or impending service affecting fault, before any significant effects have been felt. Action should be taken to further diagnose (if necessary) and correct the problem in order to prevent it from becoming a more serious service affecting fault.
- minor: indicates the existence of a non-service affecting fault condition and that corrective action should be taken in order to prevent a more serious (for example, service affecting) fault. Such a severity can be reported, for example, when the detected alarm condition is not currently degrading the capacity of the managed object.
- major: indicates that a service affecting condition has developed and an urgent corrective action is required. Such a severity can be reported, for example, when there is a severe degradation in the capability of the managed object and its full capability must be restored.
- critical: indicates that a service affecting condition has occurred and an immediate corrective action is required. Such a severity can be reported, for example, when a managed object becomes totally out of service and its capability must be restored.
- cleared: indicates the clearing of one or more previously reported alarms. This alarm clears all alarms for this managed object that have the same Alarm type, Probable cause and Specific problems (if given). Multiple associated notifications may be cleared by using the Correlated notifications parameter (see the corresponding Event Information item).
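The clearing rule and the Trend indication values described under Event Information both operate on the set of outstanding alarms of a managed object. The sketch below illustrates the two rules under stated assumptions: the numeric ranking of severities (in particular the position of indeterminate) is a choice made for this example, and the names `Alarm`, `trend_indication` and `apply_clear` are hypothetical.

```python
from enum import IntEnum
from typing import FrozenSet, List, NamedTuple, Optional


class Severity(IntEnum):
    # Assumed ordering for comparison; the rank of INDETERMINATE is a local choice.
    CLEARED = 0
    INDETERMINATE = 1
    WARNING = 2
    MINOR = 3
    MAJOR = 4
    CRITICAL = 5


class Alarm(NamedTuple):
    managed_object: str
    alarm_type: str
    probable_cause: str
    severity: Severity
    specific_problems: Optional[FrozenSet[str]] = None


def trend_indication(current: Alarm, outstanding: List[Alarm]) -> Optional[str]:
    """Compare the current alarm with outstanding alarms of the same object."""
    severities = [a.severity for a in outstanding
                  if a.managed_object == current.managed_object]
    if not severities:
        return None  # no outstanding alarms, so the parameter is absent
    worst = max(severities)
    if current.severity > worst:
        return "more severe"
    if current.severity == worst:
        return "no change"
    return "less severe"


def apply_clear(clear: Alarm, outstanding: List[Alarm]) -> List[Alarm]:
    """Drop alarms matching object, Alarm type, Probable cause and, if given,
    Specific problems of the clearing notification."""
    def matches(a: Alarm) -> bool:
        return (a.managed_object == clear.managed_object
                and a.alarm_type == clear.alarm_type
                and a.probable_cause == clear.probable_cause
                and (clear.specific_problems is None
                     or a.specific_problems == clear.specific_problems))
    return [a for a in outstanding if not matches(a)]


outstanding = [
    Alarm("link-1", "communications", "Loss of signal", Severity.MAJOR),
    Alarm("link-1", "equipment", "Power problem", Severity.MINOR),
    Alarm("link-2", "communications", "Loss of signal", Severity.CRITICAL),
]
new_alarm = Alarm("link-1", "communications", "Degraded signal", Severity.WARNING)
print(trend_indication(new_alarm, outstanding))

clear = Alarm("link-1", "communications", "Loss of signal", Severity.CLEARED)
remaining = apply_clear(clear, outstanding)
print(len(remaining))
```

Clearing through the Correlated notifications parameter is not covered here; that would need the Notification identifiers of the alarms to be stored as well.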
In practice, the Severity of a fault is less straightforward. When a section of a transmission link is cut, the severity will probably be Critical. However, when that section is protected, as in an SDH ring, it will be Minor at most. If there is no ring protection, but paths are potentially protected on an individual basis, it is unclear what severity this fault should get. See also (MFA) Fault Management Alarm surveillance.
List of Probable Causes
[X.721] already defines a list which can be extended using ASN.1 (most systems do that).
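Alarm type | Probable cause |
Communications | Loss of signal |
Communications | Loss of frame |
Communications | Framing error |
Communications | Local node transmission error |
Communications | Remote node transmission error |
Communications | Call establishment error |
Communications | Degraded signal |
Communications | Communications subsystem failure |
Communications | Communications protocol error |
Communications | LAN error |
Communications | DTE-DCE interface error |
Quality of service | Response time excessive |
Quality of service | Queue size exceeded |
Quality of service | Bandwidth reduced |
Quality of service | Retransmission rate excessive |
Quality of service | Threshold crossed |
Quality of service | Performance degraded |
Quality of service | Congestion |
Quality of service | Resource at or nearing capacity |
Processing error | Storage capacity problem |
Processing error | Version mismatch |
Processing error | Corrupt data |
Processing error | CPU cycles limit exceeded |
Processing error | Software error |
Processing error | Software program error |
Processing error | Software program abnormally terminated |
Processing error | File error |
Processing error | Out of memory |
Processing error | Underlying resource unavailable |
Processing error | Application subsystem failure |
Processing error | Configuration or customization error |
Equipment | Power problem |
Equipment | Timing problem |
Equipment | Processor problem |
Equipment | Dataset or modem error |
Equipment | Multiplexer problem |
Equipment | Receiver failure |
Equipment | Transmitter failure |
Equipment | Receive failure |
Equipment | Transmit failure |
Equipment | Output device error |
Equipment | Input device error |
Equipment | I/O device error |
Equipment | Equipment malfunction |
Equipment | Adapter error |
Environmental | Temperature unacceptable |
Environmental | Humidity unacceptable |
Environmental | Heating/ventilation/cooling system problem |
Environmental | Fire detected |
Environmental | Flood detected |
Environmental | Toxic leak detected |
Environmental | Leak detected |
Environmental | Pressure unacceptable |
Environmental | Excessive vibration |
Environmental | Material supply exhausted |
Environmental | Pump failure |
Environmental | Enclosure door open |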
Most alarm handlers provide additional functionality, such as keeping a list of active alarms sorted by severity or area, counting repeated alarms, etc.
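As an illustration of such handler functionality, here is a minimal sketch of an active alarm list that counts repeated alarms and sorts by severity. It is not taken from any particular product: the class `ActiveAlarmList`, the choice of (managed object, probable cause) as the key, and the sort rank of indeterminate are all assumptions made for this example.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Assumed sort order, most urgent first; where indeterminate belongs is a local choice.
SEVERITY_RANK = {"critical": 0, "major": 1, "minor": 2, "warning": 3, "indeterminate": 4}

Key = Tuple[str, str]  # (managed object, probable cause)


class ActiveAlarmList:
    """Hypothetical alarm handler state: active alarms with repeat counts."""

    def __init__(self) -> None:
        self._counts: Counter = Counter()
        self._severity: Dict[Key, str] = {}

    def report(self, managed_object: str, probable_cause: str, severity: str) -> None:
        key = (managed_object, probable_cause)
        if severity == "cleared":
            # A clear removes the alarm from the active list.
            self._counts.pop(key, None)
            self._severity.pop(key, None)
            return
        if severity not in SEVERITY_RANK:
            raise ValueError(f"unknown severity: {severity}")
        self._counts[key] += 1            # count repeated alarms
        self._severity[key] = severity    # keep the latest reported severity

    def sorted_alarms(self) -> List[Tuple[str, str, str, int]]:
        rows = [(obj, cause, self._severity[(obj, cause)], count)
                for (obj, cause), count in self._counts.items()]
        return sorted(rows, key=lambda r: (SEVERITY_RANK[r[2]], r[0], r[1]))


alarms = ActiveAlarmList()
alarms.report("fan-3", "Heating/ventilation/cooling system problem", "minor")
alarms.report("link-1", "Loss of signal", "critical")
alarms.report("link-1", "Loss of signal", "critical")
alarms.report("link-2", "Degraded signal", "major")
for row in alarms.sorted_alarms():
    print(row)
```

Sorting by area instead would only require a different sort key, for example a prefix of the managed object name.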
See also (Management Functional Area) Fault Management.