Monday, August 1, 2016

How to Clear the COMMUN_FAIL Alarm on the OptiX OSN 6800?

Boards on an OptiX OSN 6800 NE malfunction, and therefore the system reports the COMMUN_FAIL alarm.

Fault Type

COMMUN_FAIL
SUBRACK_LOOP

Symptom

Multiple boards on a newly deployed OptiX OSN 6800 NE report the COMMUN_FAIL alarm at the same time. The NE subrack is the master subrack.
On the NMS, the historical alarm record shows that the SUBRACK_LOOP alarm is reported before the COMMUN_FAIL alarm.

Cause Analysis

Possible causes for the problem on a single NE are as follows:
  • Boards on the NE are in the cold or warm reset state.
  • The network cables that cascade subracks do not meet relevant requirements.
  • Boards on the NE are malfunctioning.

Procedure

  1. On the NMS, check the COMMUN_FAIL alarm parameter. The parameter is 0x01 0x00 0x03, indicating that inter-board ETH communication fails.
  2. Check the historical alarms. The NE previously reported the SUBRACK_LOOP alarm, which was cleared one minute later. The SUBRACK_LOOP alarm indicates a loop on the network ports between subracks. Such a loop can cause a broadcast storm on the network and block some communication ports.
  3. Remove and then reinsert the AUX board. After the board starts up, the alarm is cleared and is not reported again.

Result

The problem is resolved.
In this maintenance case, the OptiX OSN 6800 works in master-slave mode. The internal network ports of the subracks were connected in a way that formed a loop. The loop caused an Ethernet broadcast storm and blocked some communication ports, so communication on the affected boards failed. If a COMMUN_FAIL alarm is reported together with a SUBRACK_LOOP alarm on a live network, check the network cables between subracks. If the SUBRACK_LOOP alarm is cleared but the COMMUN_FAIL alarm persists, perform a cold reset on the AUX board.
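
The handling rule above can be written as a small decision function. This is only a sketch of the logic in this case, not an NMS feature: the function name, its boolean inputs, and the returned strings are hypothetical, and the last branch simply points back to the other causes listed under Cause Analysis.

```python
def recommend_action(commun_fail_active: bool,
                     subrack_loop_active: bool,
                     subrack_loop_in_history: bool) -> str:
    """Suggest a next step from the alarm states (hypothetical helper)."""
    if not commun_fail_active:
        return "no action needed"
    if subrack_loop_active:
        # COMMUN_FAIL accompanied by SUBRACK_LOOP: suspect the cascading cables.
        return "check the network cables between subracks"
    if subrack_loop_in_history:
        # SUBRACK_LOOP cleared, but COMMUN_FAIL persists.
        return "perform a cold reset on the AUX board"
    # No loop involved: fall back to the remaining causes (board reset state, board faults).
    return "check whether boards are resetting or faulty"


# The situation in this case: the loop alarm was cleared, COMMUN_FAIL remained.
print(recommend_action(True, False, True))
```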

Reference Information

Meanings of possible values of the COMMUN_FAIL alarm parameter are as follows:
  • 0x01 0x00 0x01 indicates that channel 1 of RS485 fails.
  • 0x01 0x00 0x02 indicates that channel 2 of RS485 fails.
  • 0x01 0x00 0x03 indicates that ETH communication between boards fails.
  • 0x01 0x00 0x04 indicates that emergency ETH communication between subracks fails.
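
For quick lookups, the parameter values above can be kept in a small table. The sketch below assumes the parameter is available as a text string such as "0x01 0x00 0x03"; the dictionary and function names are illustrative, and the meanings are copied from the list above.

```python
# Meanings taken from the list above; names are illustrative only.
COMMUN_FAIL_PARAMS = {
    (0x01, 0x00, 0x01): "channel 1 of RS485 fails",
    (0x01, 0x00, 0x02): "channel 2 of RS485 fails",
    (0x01, 0x00, 0x03): "ETH communication between boards fails",
    (0x01, 0x00, 0x04): "emergency ETH communication between subracks fails",
}


def describe_commun_fail(param: str) -> str:
    """Decode a parameter string such as '0x01 0x00 0x03'."""
    key = tuple(int(token, 16) for token in param.split())
    return COMMUN_FAIL_PARAMS.get(key, "unknown parameter value")


print(describe_commun_fail("0x01 0x00 0x03"))
```

Values that are not in the list fall through to "unknown parameter value" instead of being guessed.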

