Sunday, April 10, 2016

Warning of Service Interruption on a Multiplex Section

Warning of Service Interruption on a Multiplex Section Resulting from Simultaneous SD Clearing in OptiX NGSDH and OCS Products

Product Family: MSTP      Product Model: OSN 1500/OSN 2500/OSN 3500 II/OSN 7500

[Problem Description]
Triggering conditions
SD conditions at the two ends of a span on the same MSP ring are simultaneously cleared.
Symptoms
After SD conditions are cleared, some sites on the MSP ring do not recover and APS-INDI does not end.
Services on the MSP ring are interrupted.
Identification method
When an NE meets the following conditions, the problem addressed in this document occurs on the NE:
1. The NE runs a version specified in the preceding table, and ring MSP is configured for the NE.
2. After SD conditions are cleared, some sites on the MSP ring do not recover and APS-INDI does not end. In addition, services on the MSP ring are interrupted.
3. K bytes 0x8xxx, 0x5xxx, and 0x0xxx are sent cyclically while the problem persists, as shown in the following event log:
#0x90406:cfg-get-rmsevent:1;
                                  MSSPR-EVENT-LOG                                 
    PG-ID  EVENT-NO  EVENT-VALUE     EVENT-PARA  DATE-TIME            TIME-STAMP  
    1      570       K_SENDS         0x512a      2012-4-29 12:33:15   0x022ebdd5  
    1      571       K_DIR           0x0002      2012-4-29 12:33:15   0x022ebdeb  
    1      572       STATE_TRANS     0x2405      2012-4-29 12:33:15   0x022ebe76  
    1      573       K_RECEIVED      0x0010      2012-4-29 12:33:15   0x022ebeb7  
    1      574       K_DIR           0x0002      2012-4-29 12:33:15   0x022ebebb  
    1      575       XC_EXECUTE      0x0200      2012-4-29 12:33:15   0x022ec323  
    1      576       K_SENDS         0x0020      2012-4-29 12:33:15   0x022ec3bf  
    1      577       K_DIR           0x0002      2012-4-29 12:33:15   0x022ec3d6  
    1      578       K_SENDS         0x0120      2012-4-29 12:33:15   0x022ec440  
    1      579       K_DIR           0x0000      2012-4-29 12:33:15   0x022ec450  
    1      580       STATE_TRANS     0x2100      2012-4-29 12:33:15   0x022ec4d3  
    1      581       K_RECEIVED      0x821a      2012-4-29 12:33:15   0x022ef3e3  
    1      582       K_DIR           0x0002      2012-4-29 12:33:15   0x022ef3e9  
    1      583       K_SENDS         0x1121      2012-4-29 12:33:15   0x022ef71e  
    1      584       K_DIR           0x0000      2012-4-29 12:33:15   0x022ef723  
    1      585       K_SENDS         0x8129      2012-4-29 12:33:15   0x022ef7aa  
    1      586       K_DIR           0x0002      2012-4-29 12:33:15   0x022ef7bf  
    1      587       STATE_TRANS     0x0408      2012-4-29 12:33:15   0x022ef846  
    1      588       K_RECEIVED      0x1212      2012-4-29 12:33:15   0x022ef8f9  
    1      589       K_DIR           0x0000      2012-4-29 12:33:15   0x022ef8fe  
    1      590       T2_START        0x0000      2012-4-29 12:33:15   0x022ef977  
    1      591       K_SENDS         0x5122      2012-4-29 12:33:15   0x022efa09  
    1      592       K_DIR           0x0000      2012-4-29 12:33:15   0x022efa0d  

Check whether a WDM device exists between the transmission devices and whether OLP is configured for the WDM device.
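
The K-byte check in item 3 can be scripted against saved output of the event log query. The following is a minimal sketch, not a vendor tool: the function names are hypothetical, the column layout is assumed to match the excerpt above, and the test is deliberately crude (it only checks that K_SENDS values starting with 0x8, 0x5, and 0x0 all appear in the captured window).

```python
import re

# Assumed row layout, taken from the excerpt above:
#   PG-ID  EVENT-NO  EVENT-VALUE  EVENT-PARA  DATE-TIME  TIME-STAMP
ROW = re.compile(r"^\s*(\d+)\s+(\d+)\s+(\S+)\s+0x([0-9a-fA-F]{4})\s")


def sent_top_nibbles(log_text):
    """Return the top hex digit of every K_SENDS value, in log order."""
    nibbles = []
    for line in log_text.splitlines():
        m = ROW.match(line)
        if m and m.group(3) == "K_SENDS":
            nibbles.append(int(m.group(4), 16) >> 12)
    return nibbles


def matches_symptom(log_text):
    """Crude heuristic: 0x8xxx, 0x5xxx and 0x0xxx were all sent in this window."""
    return {0x8, 0x5, 0x0} <= set(sent_top_nibbles(log_text))


# A few rows copied from the log excerpt above.
SAMPLE = """\
    1      570       K_SENDS         0x512a      2012-4-29 12:33:15   0x022ebdd5
    1      571       K_DIR           0x0002      2012-4-29 12:33:15   0x022ebdeb
    1      576       K_SENDS         0x0020      2012-4-29 12:33:15   0x022ec3bf
    1      585       K_SENDS         0x8129      2012-4-29 12:33:15   0x022ef7aa
    1      591       K_SENDS         0x5122      2012-4-29 12:33:15   0x022efa09
"""

if __name__ == "__main__":
    print(sent_top_nibbles(SAMPLE))
    print(matches_symptom(SAMPLE))
```

A positive result only indicates that the log is consistent with condition 3. Conditions 1 and 2 still have to be confirmed separately.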

[Root Cause]
The SD condition at one end of the span is cleared shortly after the SD condition at the other end is cleared, and the long path around the ring is long enough to add noticeable delay. As a result, the second SD condition is cleared after a site has received the K byte sent on the short path but before it receives the K byte sent on the long path, so the same NE receives two different K bytes in the same direction. The standard does not provide a field in the K byte sent on the long path to indicate whether the sending end is the end that triggered the switchover or the end that is responding to it. An NE therefore sends a response whenever the request received on the long path differs from its local request. In this scenario, the NE at each end receives two different requests from the long path and responds to both because it assumes that the opposite NE sent them. The result is status flapping.
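
The timing condition described above can be expressed as a simple window check. This is an illustrative model only, not the protocol implementation: the function name, the parameters, and the millisecond values are hypothetical, and the model assumes that the end clearing first sends its K byte on both paths at the moment its SD condition clears.

```python
def second_clear_in_window(t_first_clear, t_second_clear, short_delay, long_delay):
    """Return True if the second SD clearing falls inside the risky window.

    All values are in milliseconds (hypothetical units for this sketch).
    The window opens when the far end receives the K byte on the short
    path and closes when it receives the K byte on the long path.
    """
    short_arrival = t_first_clear + short_delay
    long_arrival = t_first_clear + long_delay
    return short_arrival < t_second_clear < long_arrival


if __name__ == "__main__":
    # Hypothetical delays: 1 ms on the short path, 20 ms on the long path.
    print(second_clear_in_window(0, 5, 1, 20))   # second clearing inside the window
    print(second_clear_in_window(0, 30, 1, 20))  # second clearing after the window
```

In this model, the longer the long path is relative to the short path, the wider the window becomes, which matches the role the long path plays in the description above.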

[Impact and Risk]
The MSP ring is not in the normal state and its carried services are interrupted.

[Measures and Solutions]
Recovery measures
Restart the multiplex section protocol throughout the ring.
Workarounds
Workaround 1: Do not use SD as a condition to trigger an MSP switchover.
Advantage: When SD occurs, the problem addressed in this document does not occur.
Disadvantage: If SD occurs on a link and affects services, the services are interrupted and no protection switching is triggered.

Workaround 2: Do not use SD as a condition to trigger an MSP switchover. Reduce the SF threshold to the current SD value.
Advantage: When SD conditions at the two ends of a span are simultaneously cleared, the problem addressed in this document does not occur.
Disadvantage: If SD occurs on multiple sites on the same MSP ring, multiple SF switchovers occur. As a result, isolated areas occur, resulting in multiplex section squelching and service interruption.


Workaround 3: Set the MSP hold-off time (applicable to scenarios with OLP configured).
Advantage: If OLP triggers a switchover for the intermediate WDM device, no MSP switchover will occur. Therefore, the problem addressed in this document does not occur.
Disadvantage: If the hold-off time is set to 100 ms and no OLP switchover is triggered, an MSP switchover takes about 150 ms, which exceeds the allowed time.
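
The arithmetic behind the disadvantage of Workaround 3 can be sketched as follows. Both constants are assumptions that the notice itself does not state: a switching time of roughly 50 ms after the hold-off expires (inferred from 150 ms minus the 100 ms hold-off) and an allowed switching time of 50 ms (the figure commonly cited for MSP rings).

```python
ASSUMED_SWITCH_MS = 50   # assumption: switching time once the hold-off expires
ASSUMED_ALLOWED_MS = 50  # assumption: allowed protection switching time


def total_switch_time_ms(hold_off_ms, switch_ms=ASSUMED_SWITCH_MS):
    """Total time from fault detection to completed MSP switchover."""
    return hold_off_ms + switch_ms


def exceeds_allowed(hold_off_ms, allowed_ms=ASSUMED_ALLOWED_MS):
    """True if the hold-off pushes the switchover past the allowed time."""
    return total_switch_time_ms(hold_off_ms) > allowed_ms


if __name__ == "__main__":
    print(total_switch_time_ms(100))
    print(exceeds_allowed(100))
```

Under these assumptions, any non-zero hold-off time exceeds the allowed time whenever OLP does not switch, so the workaround trades switching speed for protection against the flapping problem.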


Preventive measures
OCS:
Upgrade the NEs involved to V1R6C03SPC200+SPH206 or a later V1R6C03 version, or to V1R6C05SPC201 or a later version.
NGSDH:
For NEs running V100R008C02SPC500, install SPH505 or a later patch version.
For NEs running V100R008C02SPC200, install SPH203 or a later patch version.
Upgrade NGSDH NEs running V100R009 or V100R010 to V100R010C03SPC203 or a later version.
Upgrade NGSDH NEs running V200R011 or V200R012 to V200R012C01 or a later version.
Material handling after replacement
None

[Inspector Applicable or Not]
None

[Rectification Scope and Time Requirements]
None

[Rectification Instructions]
None

[Attachment]
None

