Sunday, March 13, 2016

Service Unavailability of H806GPBH & H806GPBD Boards of MA5600T Products

When a large number of users are connected to the H806GPBH/H806GPBD boards, 
the DDR3 read operation occasionally becomes abnormal and the external DDR3 cache 
returns incorrect data, which corrupts service packets. As a result, slow Internet 
access, dialup failure, and board reset may occur.
Product Line: Access network
Product Family: OLT
Model: MA5680T & MA5683T, MA5600T & MA5603T

[Problem Description]

Trigger conditions:
1. A large number of users are connected to the H806GPBH/H806GPBD boards, traffic is 
heavy, or a traffic burst occurs. The problem is more likely to be triggered when the 
number of users exceeds 300, and the probability rises as the number of users grows.
2. The device runs a patch version earlier than V800R008SPC321, V800R010SPC111, 
V800R011SPC109, V800R012SPC106, or V800R013C00SPC205.
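Trigger condition 2 can be sketched as a simple version comparison. The snippet below is a minimal illustration, not a vendor tool: the function name `is_affected` is hypothetical, and it assumes that SPC patch numbers compare numerically within one release and that hot patches (SPH/HP) can be ignored for this check.

```python
import re

# Fixed SPC levels listed in trigger condition 2, keyed by release.
FIXED_SPC = {"R008": 321, "R010": 111, "R011": 109, "R012": 106, "R013": 205}


def is_affected(software_version, spc_patch):
    """Return True if the SPC level is below the fixed level for the release,
    False if it is at or above it, and None if the release is not listed.

    Assumption: SPC numbers are ordered numerically within a release.
    """
    release = re.search(r"V800(R\d{3})", software_version)
    spc = re.fullmatch(r"SPC(\d+)", spc_patch.strip())
    if not release or not spc:
        raise ValueError("unrecognized version or patch string")
    fixed = FIXED_SPC.get(release.group(1))
    if fixed is None:
        return None
    return int(spc.group(1)) < fixed


if __name__ == "__main__":
    # Values taken from a "display patch all" output: R011 with SPC100.
    print(is_affected("MA5600V800R011C00", "SPC100"))
```

A release that is not in the list returns None rather than a guess, so the operator still has to check such versions by hand.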
Symptom:
Users connected to the H806GPBH/H806GPBD boards encounter slow Internet access or 
dialup failures. The board may even reset.
Location method 1 (manual):
When an H806GPBH or H806GPBD board encounters any fault mentioned above, check the 
DDR cache through the transparent channel. Then, determine the problem based on the 
read/write result.
Step 1 Check whether the OLT and board versions are earlier than V800R008SPC321, 
V800R010SPC111, V800R011SPC109, V800R012SPC106, or V800R013C00SPC205.
MA5600T(config)#display patch all
   Software Version:MA5600V800R011C00
   SPC100
   SPH103
   HP1102
   ------------------------------------------------------------------------
   Current Patch State:
   ------------------------------------------------------------------------
   Patch Name        Patch State     Delivery     Attribute     Dependency
   ------------------------------------------------------------------------
   SPC100            running         common       cold patch    NO
   SPH103            running         common       hot patch     NO
   HP1102            running         common       hot patch     NO
   ------------------------------------------------------------------------
   Total:3
   Patches in the system cannot be rolled back
Step 2 Enter the transparent channel of the board.
MA5680T(config)#diagnose                                                          
MA5680T(diagnose)%%su                                                           
   Challenge:E8BUH36K                                                            
   Please input password:                --- password (can be obtained using a password generation tool)      
MA5680T(su)%%transparent on 0/slotid                      ---slotid indicates the slot ID of the board.                   
Serial redirect function is enabled now!    
Step 3 Run the following three groups of commissioning commands consecutively. 
If all three return values of a command group are incorrect, the DDR3 partition 
related to this group is faulty. If the results of one or more groups of commands 
indicate a fault, the problem may occur. To ensure accuracy, keep the interval 
between the three groups of commands short.
MA5680T(su)%%tm set indirect-reg 0x48000040 0x55555555            
 Write register 0x48000040 0x55555555 successfully!                
MA5680T(su)%%tm dis indirect-reg 0x48000040 0x1                   
0x40000040: 55555555                                                   
MA5680T(su)%%tm set indirect-reg 0x48000040 0xaaaaaaaa                          
 Write register 0x48000040 0xaaaaaaaa successfully!                             
MA5680T(su)%%tm dis indirect-reg 0x48000040 0x1                                 
0x40000040: aaaaaaaa                                                            
MA5680T(su)%%tm set indirect-reg 0x48000040 0xffffffff                          
 Write register 0x48000040 0xffffffff successfully!                             
MA5680T(su)%%tm dis indirect-reg 0x48000040 0x1                                 
0x40000040: ffffffff                                                            
                                                                                
MA5680T(su)%%tm set indirect-reg 0x58000040 0x55555555                        
 Write register 0x58000040 0x55555555 successfully!                             
MA5680T(su)%%tm dis indirect-reg 0x58000040 0x1                              
0x50000040: 55555555                                                            
MA5680T(su)%%tm set indirect-reg 0x58000040 0xaaaaaaaa                          
 Write register 0x58000040 0xaaaaaaaa successfully!                             
MA5680T(su)%%tm dis indirect-reg 0x58000040 0x1                                 
0x50000040: aaaaaaaa                                                            
MA5680T(su)%%tm set indirect-reg 0x58000040 0xffffffff                          
 Write register 0x58000040 0xffffffff successfully!                             
MA5680T(su)%%tm dis indirect-reg 0x58000040 0x1                                
0x50000040: ffffffff                                                            
                                                                                
MA5680T(su)%%tm set indirect-reg 0x68000040 0x55555555                          
 Write register 0x68000040 0x55555555 successfully!                             
MA5680T(su)%%tm dis indirect-reg 0x68000040 0x1                                 
0x60000040: 55555555                                                            
MA5680T(su)%%tm set indirect-reg 0x68000040 0xaaaaaaaa                          
 Write register 0x68000040 0xaaaaaaaa successfully!                             
MA5680T(su)%%tm dis indirect-reg 0x68000040 0x1                                 
0x60000040: aaaaaaaa                                                            
MA5680T(su)%%tm set indirect-reg 0x68000040 0xffffffff                          
 Write register 0x68000040 0xffffffff successfully!                             
MA5680T(su)%%tm dis indirect-reg 0x68000040 0x1                                 
0x60000040: ffffffff

If "Read register fail errorcode" is displayed during the test, the problem is confirmed.
MA5680T(su)%%tm display indirect-reg 0x68000040 1                                  
0x68000040:                                                                     
 Read register fail errorcode=1082982587!    ---This problem can be identified as long as this output is displayed.
----End
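The decision rule in Step 3 can be sketched as a small transcript checker. This is a minimal sketch under stated assumptions, not a vendor tool: the function name `analyze_transcript` is hypothetical, the input is assumed to be a saved copy of the transparent-channel session, and each read-back is matched to the preceding `tm set` by order, because the address echoed in the read-back line can differ from the one written (as in the sample output above).

```python
import re
from collections import defaultdict

# Matches the write command and captures the address and the written pattern.
SET_RE = re.compile(r"tm set indirect-reg (0x[0-9a-fA-F]+) (0x[0-9a-fA-F]+)")
# Matches a read-back line such as "0x40000040: 55555555".
READ_RE = re.compile(r"^\s*0x[0-9a-fA-F]+:\s*([0-9a-fA-F]{8})\s*$")


def analyze_transcript(transcript):
    """Return (faulty_addresses, read_fail_seen).

    An address is reported as faulty only if every read-back recorded for it
    differs from the value written, following the rule stated in Step 3.
    """
    results = defaultdict(list)  # written address -> list of "read-back ok" flags
    read_fail_seen = False
    pending = None               # (address, written value) awaiting a read-back
    for line in transcript.splitlines():
        if "Read register fail" in line:
            read_fail_seen = True
            continue
        m = SET_RE.search(line)
        if m:
            pending = (m.group(1).lower(), int(m.group(2), 16))
            continue
        m = READ_RE.match(line)
        if m and pending:
            addr, written = pending
            results[addr].append(int(m.group(1), 16) == written)
            pending = None
    faulty = [a for a, oks in results.items() if oks and not any(oks)]
    return faulty, read_fail_seen
```

If either the list of faulty addresses is non-empty or the read-failure flag is set, the board matches the fault signature described in this section.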
Location method 2 (PMI tool):
If a board shows the symptoms above, upgrade the preventive maintenance inspection 
(PMI) tool using the package attached in this document and then perform PMI to identify 
the problem. Install the required patch if the following PMI result is displayed: 
"Detected DDR error. Solution: Update to SPC321 if is R8; update to R11SPC109 
if is R11. Should you have any question, please contact R&D, Zhouhao 00140882."

[Root Cause]

When packet traffic is heavy, there is a small probability that the interval at which the 
FPGA reads and writes the DDR3 is too short to meet the DDR3 timing requirement. As a result, 
the DDR3 becomes abnormal and packets are corrupted, causing slow Internet access, dialup 
failure, or even board reset.

[Impact and Risk]

This problem occurs with a low probability and may cause slow Internet access, dialup 
failure, and occasional board reset, which will affect live-network services.

[Measures and Solutions]

Recovery measures:
This problem is triggered occasionally and can be rectified by resetting the affected board.
Because the problem can recur and trigger further DDR3 exceptions and related problems, 
install the required patch on the faulty board.
Preventive measures:
None

