Notice on Precaution for DCN Channel Allocation Limitations in OptiX OSN 8800
Summary: For an OptiX OSN 8800 NE of a version earlier than V100R007C00, when more than 1024 DCN channels are allocated, there is a relatively high probability that the system control board of the NE is unexpectedly reset. In addition, when the system control board undergoes active/standby switching or a reset, there is a certain probability that the DCN communication between the NE and the peer NE is interrupted.
[Problem Description]
Trigger condition:
Condition 1: The NE version is earlier than V100R007C00.
Condition 2: The number of DCN channels in a single subrack exceeds 1024. This condition is generally present in scenarios where service boards are fully configured. Examples of such scenarios include:
TN54THA boards are fully configured in an OptiX OSN 8800 T32 subrack.
Too many TOM boards are configured in an OptiX OSN 8800 T64 subrack of a version earlier than V100R006C00.
Too many TN54THA and 8-port TOM/TOX/NO2 boards are configured in an OptiX OSN 8800 T64 subrack of V100R006C00 or a later version.
Condition 3: So many service boards are configured in a single subrack that the required DCN channels exceed the allocation capability of the system control board.
For example, in an OptiX OSN 8800 T32 subrack equipped with the TN52SCC board, 19 or more 8-port TOM/TOX/NO2 boards are configured. Each optical port on a TOM/TOX/NO2 board needs one 3-byte channel, one 9-byte channel, and one 18- or 24-byte channel, so 19 boards require a total of 152 (19 x 8) channels of each channel type. The TN52SCC board supports a maximum of 150 channels each for the 3-byte and 9-byte channel types and 100 channels for the 18- or 24-byte channel type. Therefore, the TN52SCC board cannot provide one 3-byte channel, one 9-byte channel, and one 18- or 24-byte channel for every optical port on these service boards.
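The Condition 3 arithmetic can be checked with a small Python sketch. It uses only the TN52SCC figures quoted above for the 3-byte and 9-byte channel types. The 18-/24-byte type is left out because the notice does not make clear whether its 100-channel limit applies to each width or to both widths combined. The function and constant names are illustrative and are not part of any Huawei tool.

```python
# Sketch of the Condition 3 check for 8-port TOM/TOX/NO2 boards on a TN52SCC.
# Capacities come from the notice; the 18-/24-byte type is omitted because
# its limit is ambiguous in the source (per width or combined).

PORTS_PER_BOARD = 8

TN52SCC_CAPACITY = {
    "3-byte": 150,
    "9-byte": 150,
}


def channels_required(num_boards: int) -> int:
    """Each optical port needs one channel of every channel type."""
    return num_boards * PORTS_PER_BOARD


def exceeded_types(num_boards: int, capacity: dict[str, int]) -> list[str]:
    """Return the channel types whose capacity num_boards boards would exceed."""
    need = channels_required(num_boards)
    return [kind for kind, limit in capacity.items() if need > limit]


def min_boards_to_exceed(capacity: dict[str, int]) -> int:
    """Smallest board count that exceeds the tightest channel type."""
    return min(capacity.values()) // PORTS_PER_BOARD + 1


if __name__ == "__main__":
    print(channels_required(19))                   # 152
    print(exceeded_types(19, TN52SCC_CAPACITY))    # ['3-byte', '9-byte']
    print(exceeded_types(18, TN52SCC_CAPACITY))    # []
    print(min_boards_to_exceed(TN52SCC_CAPACITY))  # 19
```

The computed minimum of 19 boards agrees with the "19 or more" threshold stated in the example.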
The following table lists the DCN channel allocation capability of each type of
system control board.
Symptom:
Symptom 1: When both conditions 1 and 2 are present, there is a relatively high
probability that the system control board of the NE frequently experiences unexpected
resets after the DCN channels on tributary boards are disabled in batches.
Symptom 2: When both conditions 1 and 3 are present, either of the following results may occur after a switchover or reset of the system control board of the NE:
Result 1: Some peer NEs of the NE appear in the routing table of the NE but cannot be logged in to.
Result 2: Some peer NEs do not appear in the routing table of the NE.
Identification method:
For an NE that has symptom 1, determine whether the NE is involved in this precaution as follows:
Run the :cm-get-newbdinfo command to query the DCN channel allocation on the NE. Based on the command output, calculate the number of DCN channels in each subrack.
If the number of DCN channels in each subrack does not exceed 1024, run the :cm-get-tti command to query the FE_DCN channel allocation on the NE. For each subrack, add the number of channels obtained using this command to the number of channels obtained using the :cm-get-newbdinfo command, and then check whether the sum exceeds 1024.
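The two-step check above reduces to a per-subrack sum compared against 1024. The following Python sketch assumes the per-subrack counts have already been tallied from the :cm-get-newbdinfo and :cm-get-tti outputs. How output rows map to subracks is not covered by the notice, so that part is left to the caller. The function name and sample counts are hypothetical.

```python
# Sketch of the symptom 1 check: per-subrack DCN channel totals against 1024.
# ASSUMPTION: the caller has already counted, per subrack, the channels shown by
# :cm-get-newbdinfo and the FE_DCN channels shown by :cm-get-tti; how output
# rows map to subracks is not covered by the notice and is left to the caller.

CHANNEL_LIMIT = 1024


def subracks_over_limit(newbdinfo_counts: dict[int, int],
                        fe_dcn_counts: dict[int, int],
                        limit: int = CHANNEL_LIMIT) -> dict[int, int]:
    """Return {subrack: total channels} for every subrack above the limit."""
    over = {}
    for subrack in sorted(set(newbdinfo_counts) | set(fe_dcn_counts)):
        # Checking the sum covers both steps of the procedure: if the
        # :cm-get-newbdinfo count alone exceeds the limit, so does the sum.
        total = newbdinfo_counts.get(subrack, 0) + fe_dcn_counts.get(subrack, 0)
        if total > limit:
            over[subrack] = total
    return over


if __name__ == "__main__":
    # Hypothetical counts for a two-subrack NE.
    print(subracks_over_limit({1: 1000, 2: 600}, {1: 40, 2: 20}))  # {1: 1040}
```

Summing unconditionally is a simplification. Because the FE_DCN count is never negative, it gives the same verdict as checking the :cm-get-newbdinfo count first.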
For an NE that has result 1 of symptom 2, determine whether the NE is involved in this precaution as follows:
Check the consistency of the MAC connection information between every two adjacent NEs along the path from the gateway NE to the unreachable NE. In the network shown in the following figure, assume that NE A (NE ID: 0x9270e) is the gateway NE, and NE B (NE ID: 0x94e60) and NE C are subtending NEs. NE C cannot be logged in to, and NE B turns out to be the faulty NE. In this case, check the MAC connection information consistency between NEs A and B and between NEs B and C.
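The hop-by-hop check can be expressed as a walk along the path. This Python sketch assumes the DST-ID lists from :cm-get-maccon on each NE have already been collected into sets. Following the reasoning in the example, an NE that lacks the MAC connection to its neighbor is reported as the faulty NE. The function name is hypothetical.

```python
# Sketch of the hop-by-hop MAC connection consistency check described above.
# ASSUMPTION: maccon maps each NE ID to the set of DST-IDs that :cm-get-maccon
# lists on that NE; collecting that data is outside this sketch.

from typing import Optional


def find_faulty_ne(path: list[int], maccon: dict[int, set[int]]) -> Optional[int]:
    """Walk from the gateway NE (path[0]) toward the unreachable NE and return
    the first NE that is missing the MAC connection to its neighbor on the path,
    or None if every hop is consistent."""
    for near, far in zip(path, path[1:]):
        if far not in maccon.get(near, set()):
            return near
        if near not in maccon.get(far, set()):
            return far
    return None


if __name__ == "__main__":
    NE_A, NE_B = 0x9270E, 0x94E60  # IDs from the example
    maccon = {
        NE_A: {0x94E7F, 0x94E60},           # NE A lists NE B
        NE_B: {0x94E25, 0x94E23, 0x926FA},  # NE B does not list NE A
    }
    print(hex(find_faulty_ne([NE_A, NE_B], maccon)))  # 0x94e60 (NE B)
```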
Step 1 Use the Navigator to log in to NE A and check the MAC connection information on
NE A.
The MAC connection of NE B can be found on NE A. As shown in the following command
output, the NE ID of NE B is 0x94e60 and the DCN channel used for the MAC connection
of NE B is 676.
#A:szhw [][][2014-06-17 12:06:12+08:00]>
:cm-get-maccon
MAC-CONNECT
DST-ID BOARD-ID FIBER-ID MODE SCC-NO
0x00094e7f 255-255 0 auto 2040
0x00094e60 59 1 auto 676
On NE A, query the communication route information of NE B. In the queried route
information, the peer-end DCN channel of NE B is 140. This channel number
may not be the actual channel number since channel numbers cannot be completely
displayed in versions earlier than V100R007C00.
#A:szhw [][][2014-06-17 12:06:11+08:00]>
:cm-get-eccroute
ECC-ROUTE
DST-ID DXC-ID DISTANCE LEVEL MODE SCC-NO PEER-SCCNO
0x00094e60 0x00094e60 0 4 auto 676 140
NE A sets up a communication connection with NE B through the GCC12_18 channel on
optical port 1 of the board in slot 59.
#A:szhw [][][2014-06-17 12:06:17+08:00]>
:cm-get-newbdinfo
ECC-BDINFO
BID SUB-CARD PORT PORT-STATE CHAN-TYPE LINK-CHAN CHAN-ACCESS INIT-STACK ACT-STACK NEG CHAN-STATE PROTECT P-Bid P-Pid
59 255 1 port-enable GCC12_18 676 OK hwecc hwecc unused ok 0 0 0
Step 2 Use the Navigator to log in to NE B and check the MAC connection information on NE B.
The MAC connection of NE A (NE ID: 0x9270e) cannot be found on NE B.
#B:szhw [][][2014-06-17 13:23:49+08:00]>
:cm-get-maccon
MAC-CONNECT
DST-ID BOARD-ID FIBER-ID MODE SCC-NO
0x00094e25 55 1 auto 651
0x00094e23 64 2 auto 684
0x000926fa 21 3 auto 415
Based on the preceding information, NE B is determined to be the faulty NE. In practice, if the MAC connection information on NEs A and B is consistent, repeat the preceding steps to check the MAC connection information consistency between NEs B and C.
----End
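Steps 1 and 2 can also be sketched as text parsing. The following Python sketch embeds the sample outputs from the steps above. It maps each DST-ID to its SCC-NO and each LINK-CHAN to the board, port, and channel type that carry it. It assumes whitespace-separated columns with each data row on a single line. The function names are hypothetical, not part of the Navigator.

```python
# Sketch: parse the :cm-get-maccon and :cm-get-newbdinfo tables shown in
# Steps 1 and 2, then locate the board, port, and channel type that carry the
# MAC connection to a peer. ASSUMPTION: columns are whitespace-separated and
# each data row sits on one line.

MACCON_A = """\
MAC-CONNECT
DST-ID BOARD-ID FIBER-ID MODE SCC-NO
0x00094e7f 255-255 0 auto 2040
0x00094e60 59 1 auto 676
"""

MACCON_B = """\
MAC-CONNECT
DST-ID BOARD-ID FIBER-ID MODE SCC-NO
0x00094e25 55 1 auto 651
0x00094e23 64 2 auto 684
0x000926fa 21 3 auto 415
"""

NEWBDINFO_A = """\
ECC-BDINFO
BID SUB-CARD PORT PORT-STATE CHAN-TYPE LINK-CHAN CHAN-ACCESS INIT-STACK ACT-STACK NEG CHAN-STATE PROTECT P-Bid P-Pid
59 255 1 port-enable GCC12_18 676 OK hwecc hwecc unused ok 0 0 0
"""


def parse_maccon(output: str) -> dict[int, int]:
    """Map DST-ID (as int) -> SCC-NO; title and header lines are skipped."""
    table = {}
    for line in output.splitlines():
        fields = line.split()
        if len(fields) == 5 and fields[0].lower().startswith("0x"):
            table[int(fields[0], 16)] = int(fields[4])
    return table


def parse_newbdinfo(output: str) -> dict[int, tuple[int, int, str]]:
    """Map LINK-CHAN -> (BID, PORT, CHAN-TYPE) from 14-column data rows."""
    table = {}
    for line in output.splitlines():
        fields = line.split()
        if len(fields) == 14 and fields[0].isdigit():
            table[int(fields[5])] = (int(fields[0]), int(fields[2]), fields[4])
    return table


if __name__ == "__main__":
    NE_A, NE_B = 0x9270E, 0x94E60
    on_a, on_b = parse_maccon(MACCON_A), parse_maccon(MACCON_B)
    scc = on_a[NE_B]                          # 676
    print(parse_newbdinfo(NEWBDINFO_A)[scc])  # (59, 1, 'GCC12_18')
    print(NE_A in on_b)                       # False: NE B is the faulty NE
```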
For an NE that has result 2 of symptom 2, the method of determining whether the NE is
involved in this precaution is as follows:
On the NE, run the :cm-get-sccchaninfo command to check the DCN channel allocation on the system control board of each subrack. In the command output, if IDLE-NUM in a row is 0, all channels of the channel width (CHAN-WIDTH) in that row are allocated. In the following example, all 3-byte DCN channels on the system control board in slot 85 are allocated.
#9-49136:szhw [][][2014-09-19 14:45:32+08:00]>
>>> cm-get-sccchaninfo:85
CPU-CHAN-INFO
BID CHAN-WIDTH CHAN-TOTAL USED-NUM IDLE-NUM
85 1 24 0 24
85 3 300 300 0
85 9 300 21 279
85 18 200 21 179
85 24 200 0 200
85 80 100 0 100
Total records :6
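Scanning this output for exhausted channel widths is easy to script. This Python sketch embeds the sample above and assumes each data row holds five integers in the column order shown. It also checks that USED-NUM plus IDLE-NUM equals CHAN-TOTAL, as a guard against misparsed rows. The function name is hypothetical.

```python
# Sketch: flag exhausted channel widths in :cm-get-sccchaninfo output.
# ASSUMPTION: data rows are five whitespace-separated integers in the order
# BID, CHAN-WIDTH, CHAN-TOTAL, USED-NUM, IDLE-NUM, as in the sample output.

SAMPLE = """\
CPU-CHAN-INFO
BID CHAN-WIDTH CHAN-TOTAL USED-NUM IDLE-NUM
85 1 24 0 24
85 3 300 300 0
85 9 300 21 279
85 18 200 21 179
85 24 200 0 200
85 80 100 0 100
Total records :6
"""


def exhausted_widths(output: str) -> list[tuple[int, int]]:
    """Return (BID, CHAN-WIDTH) for each row whose IDLE-NUM is 0."""
    exhausted = []
    for line in output.splitlines():
        fields = line.split()
        if len(fields) != 5 or not all(f.isdigit() for f in fields):
            continue  # skip titles, headers, and the record count
        bid, width, total, used, idle = map(int, fields)
        if used + idle != total:
            raise ValueError(f"inconsistent row: {line!r}")
        if idle == 0:
            exhausted.append((bid, width))
    return exhausted


if __name__ == "__main__":
    print(exhausted_widths(SAMPLE))  # [(85, 3)]
```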
[Root Cause]
The root cause of symptom 1 is as follows: Disabling the DCN channels on tributary boards causes static memory to be overwritten. Consequently, the system control board is frequently reset.
The root cause of result 1 of symptom 2 is as follows: There is a low probability that
MAC information is lost when the logic of the system control board frequently processes
concurrent MAC messages. Consequently, the routing information on the NE and that on
the peer NE are inconsistent.
The root cause of result 2 of symptom 2 is as follows: The system control board has a
limited capability of allocating DCN channels. After the system control board undergoes
a reset or active/standby switching, DCN channels are re-allocated. It is possible that
no DCN channel is allocated to a path to which DCN channels were allocated before the
reset or active/standby switching. When this occurs, the peer NE connected to the NE
through the path will not be present in the routing table of the local NE if the path
is the only available path between the two NEs.
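The re-allocation mechanism behind result 2 can be illustrated with a deliberately simplified model. The notice does not describe the real allocation algorithm. This Python sketch therefore just assumes first-come, first-served allocation from a fixed pool, with a request order that can change after a reset. All path and NE names are hypothetical.

```python
# Toy model of result 2 of symptom 2 (NOT the real allocation algorithm):
# a fixed pool of channels is granted to paths in request order, and a peer
# NE stays in the routing table only while at least one of its paths has one.


def allocate(requests: list[str], pool_size: int) -> set[str]:
    """Grant one channel to each path, in order, until the pool is empty."""
    return set(requests[:pool_size])


def unreachable_peers(peers: dict[str, set[str]], granted: set[str]) -> set[str]:
    """Peers none of whose paths received a channel."""
    return {peer for peer, paths in peers.items() if not paths & granted}


if __name__ == "__main__":
    # Hypothetical: three paths compete for two channels of one width.
    peers = {"NE-X": {"path-1"}, "NE-Y": {"path-2"}, "NE-Z": {"path-3", "path-1"}}
    before = allocate(["path-1", "path-2", "path-3"], pool_size=2)
    after = allocate(["path-3", "path-1", "path-2"], pool_size=2)  # order changed
    print(unreachable_peers(peers, before))  # set()
    print(unreachable_peers(peers, after))   # {'NE-Y'}
```

NE-Z stays reachable after its path-3 loses out, because it still has path-1. Only NE-Y, whose single path lost its channel, drops out. This matches the "only available path" condition stated above.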
[Impact and Risk]
For symptom 1, the memory of the system control board is overwritten and therefore the
system control board is frequently reset.
For symptom 2, after the system control board on the local NE undergoes active/standby switching or a reset, the NMS cannot reach or log in to the local NE or its downstream NEs.
[Measures and Solutions]
Recovery measures:
For symptom 1, when more than 1024 DCN channels are allocated in a subrack:
If the system control board is not yet experiencing unexpected resets, send the database of the faulty NE back to the R&D department to modify the DCN configuration, and then import the modified database to the live network.
If the system control board is already experiencing frequent unexpected resets, send the NE database saved before the resets began back to the R&D department to modify the DCN configuration, clear the database by setting the DIP switches on the system control board as described in the product manuals, and then import the modified database to the live network.
For symptom 2, when so many service boards are configured in a single subrack that the required DCN channels exceed the allocation capability of the system control board, take either of the following measures:
Disable the allocated but unused DCN channels, especially the DCN channels on tributary boards.
As shown in the following figure, if the DCC Resources value of a DCN channel is Obtained Already, the DCN channel is allocated.
Note that after you disable a DCN channel, all allocated DCN channels on the optical
port where the DCN channel resides will be disabled.
Delete unused DCN channels.
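Disabling one DCN channel disables every allocated channel on the same optical port, so it helps to check port by port before disabling anything. The Python sketch below is a minimal illustration under stated assumptions. The channel list and its in-use flag are hypothetical inputs, since the notice does not say how usage is determined. Ports are identified by hypothetical (slot, port) tuples.

```python
from collections import Counter

# Each entry: ((slot, port), channel width in bytes, in_use flag).
# ASSUMPTION: the in_use flag is supplied by the operator; it is not a field
# of any command output shown in this notice.
Channel = tuple[tuple[int, int], int, bool]


def ports_safe_to_disable(channels: list[Channel]) -> list[tuple[int, int]]:
    """Ports whose allocated channels are all unused, so disabling one channel
    (and with it the whole port) cuts no live DCN traffic."""
    busy: dict[tuple[int, int], bool] = {}
    for port, _width, in_use in channels:
        busy[port] = busy.get(port, False) or in_use
    return sorted(port for port, is_busy in busy.items() if not is_busy)


def freed_by_width(channels: list[Channel], ports: list[tuple[int, int]]) -> dict[int, int]:
    """Count the channels of each width released by disabling the given ports."""
    selected = set(ports)
    return dict(Counter(width for port, width, _ in channels if port in selected))


if __name__ == "__main__":
    channels = [
        ((21, 1), 3, False), ((21, 1), 9, True),   # port carries live traffic
        ((22, 2), 3, False), ((22, 2), 9, False),  # fully unused port
    ]
    safe = ports_safe_to_disable(channels)
    print(safe)                            # [(22, 2)]
    print(freed_by_width(channels, safe))  # {3: 1, 9: 1}
```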
Workarounds:
Delete or disable unused DCN channels, especially the DCN channels on tributary boards, before the allocated DCN channels exceed the allocation capability of the system control board.
Preventive measures:
For symptom 1, upgrade the NE to OptiX OSN 8800 V100R007C00 or a later version.
For symptom 2, upgrade the NE to OptiX OSN 8800 V100R007C00 or a later version. If the
current NE version is earlier than OptiX OSN 8800 V100R006C01SPC500, you can also
upgrade the NE to OptiX OSN 8800 V100R006C01SPC500 and then install the OptiX OSN 8800
V100R006C01SPC500SPH520 patch.
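The version checks in this notice (Condition 1 and the preventive measures) can be sketched as a Python comparison of version strings. The ordering rule is an assumption: components are compared numerically in the order V, R, C, SPC, SPH, and a missing SPC or SPH component counts as 0. The function names are hypothetical. The patch option is offered only for versions earlier than V100R006C01SPC500, which follows the notice literally.

```python
import re

# Version strings such as V100R006C01SPC500SPH520.
# ASSUMPTION: versions order by (V, R, C, SPC, SPH) numerically, and a missing
# SPC or SPH component sorts before any present one (treated as 0).
_VERSION_RE = re.compile(r"V(\d+)R(\d+)C(\d+)(?:SPC(\d+))?(?:SPH(\d+))?")


def parse_version(text: str) -> tuple[int, ...]:
    """Turn a version string into a comparable tuple of integers."""
    match = _VERSION_RE.fullmatch(text.strip().upper())
    if match is None:
        raise ValueError(f"unrecognized version string: {text!r}")
    return tuple(int(part or 0) for part in match.groups())


FIXED_RELEASE = parse_version("V100R007C00")
PATCH_BASE = parse_version("V100R006C01SPC500")


def preventive_options(current: str) -> list[str]:
    """List the preventive measures from this notice that apply to an NE version."""
    version = parse_version(current)
    if version >= FIXED_RELEASE:
        return []  # Condition 1 is not met
    options = ["symptoms 1 and 2: upgrade to V100R007C00 or later"]
    if version < PATCH_BASE:
        options.append("symptom 2: upgrade to V100R006C01SPC500, then install SPH520")
    return options


if __name__ == "__main__":
    for ver in ("V100R006C00", "V100R006C01SPC500", "V100R007C00"):
        print(ver, preventive_options(ver))
```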