Summary: When an OptiX PTN 950 NE is configured with two system control boards
(active and standby), the license modules on the system control boards accumulate the board
debugging information. The accumulated debugging information exhausts the space of the flash
memory, leading to a failure to save the configuration data and back up the NE database.
[Problem Description]
Trigger conditions:
The OptiX PTN 950 NE uses two system control boards which run the V100R002C01SPC100
NE software.
Symptom:
The standby system control board exhausts the flash memory space twice as fast as the active
system control board. Therefore, the fault first occurs on the standby system control board.
During the periodic backup of the NE database, if the flash memory has insufficient free space, the
system control board reports a DBMS_ERROR alarm. The fault can also lead to memory leaks. If the
leaked memory reaches the threshold, a MEM_OVER alarm is reported.
Incremental files are generated when configuration data is delivered or when dynamic services are
configured. If the flash space becomes insufficient during the process of generating incremental files,
the system control board may fail to generate these incremental files. When this occurs, the NE is reset.
During an NE upgrade, when the NE detects that the free space on the flash memory is insufficient to
save the new software, the NE upgrade fails.
Identification method:
The NE type is OptiX PTN 950 and the NE version is V100R002C01SPC100.
The NE uses two system control boards.
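The identification rule can be sketched as a small check over an NE inventory. This is only an illustration: the NeRecord structure and its field names are hypothetical and do not come from any Huawei tool; only the NE type, version, and board count are taken from this notice.

```python
from dataclasses import dataclass

# Values taken from the identification method in this notice.
AFFECTED_TYPE = "OptiX PTN 950"
AFFECTED_VERSION = "V100R002C01SPC100"


@dataclass
class NeRecord:
    """Hypothetical inventory record; field names are assumptions."""
    ne_type: str
    version: str
    control_boards: int


def is_affected(ne: NeRecord) -> bool:
    """Return True if the NE matches both identification conditions."""
    return (
        ne.ne_type == AFFECTED_TYPE
        and ne.version == AFFECTED_VERSION
        and ne.control_boards == 2
    )


if __name__ == "__main__":
    sample = NeRecord("OptiX PTN 950", "V100R002C01SPC100", 2)
    print(is_affected(sample))
```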
[Root Cause]
The license module of the system control board software has a defect in data processing. Due to this
defect, the debugging information about the active and standby system control boards continuously
accumulates, exhausting the space of the flash memory. The debugging information consumes an
additional 1 MB of flash space every 20 days on the active system control board and every 10 days
on the standby system control board. (The debugging information itself has no impact on device
functions or applications.)
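To make the growth rates concrete, the following sketch estimates how long a given amount of free flash lasts and how much space the debugging information consumes over a period. The rates come from this notice; the free-space figure in the example is a hypothetical value, since the notice does not state how much flash is free on a board.

```python
# Days needed for the debugging information to consume 1 MB of flash,
# as stated in the root cause description.
DAYS_PER_MB = {"active": 20, "standby": 10}


def days_until_full(free_mb: float, role: str) -> float:
    """Estimated days until free_mb of flash is consumed on the given board."""
    return free_mb * DAYS_PER_MB[role]


def growth_mb(days: float, role: str) -> float:
    """Estimated flash space (MB) consumed after the given number of days."""
    return days / DAYS_PER_MB[role]


if __name__ == "__main__":
    free_mb = 10  # hypothetical free space, for illustration only
    for role in ("active", "standby"):
        print(role, days_until_full(free_mb, role), growth_mb(90, role))
```

With the assumed 10 MB of free space, the standby board would fill in about 100 days and the active board in about 200 days. Over 90 days the file grows by roughly 9 MB on the standby board and 4.5 MB on the active board.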
[Impact and Risk]
The active and standby system control boards run out of the flash space.
When the NE database is being backed up, if the free space on the flash memory becomes insufficient,
the database will fail to be backed up. The new configuration then cannot be saved and may be lost
once the NE is reset. The fault can also lead to memory leaks. If the leaked memory reaches the
threshold, a MEM_OVER alarm is reported. Once memory is exhausted, the system control board is reset.
If the free space on the flash memory becomes insufficient during the process of generating incremental
files, then the files will fail to be generated, causing the NE to be reset.
During an NE upgrade, if the free space is insufficient, the NE will fail to load the new software and as
a result the NE upgrade will fail.
[Measures and Solutions]
Recovery measures:
1 When a DBMS_ERROR or MEM_OVER alarm is reported, do as follows:
1.1 Run the following commands to check whether there is an HBUMSG.TXT file on the active and
standby system control boards:
:sftm-show-dir:7,"ofs1/license"
:sftm-show-dir:8,"ofs1/license"
1.2 If there is an HBUMSG.TXT file on the two boards, delete the file by running the following
commands:
:sftm-delete-file:7,"ofs1/license/HBUMSG.TXT"
:sftm-delete-file:8,"ofs1/license/HBUMSG.TXT"
1.3 Manually back up the database:
:dbms-copy-all:drdb,fdb;
If the alarm is MEM_OVER, perform a soft reset of the system control board that reports the alarm:
:cfg-reset-board:$the slot of the alarm control board,soft
2 When the system control board is reset, do as follows:
If data fails to be synchronized between the active and standby system control boards, run the
following command on the active system control board:
:sm-set-nebusy:0,0,0,0,none
2.1 Run the following commands to check whether there is an HBUMSG.TXT file on the active
and standby system control boards:
:sftm-show-dir:7,"ofs1/license"
:sftm-show-dir:8,"ofs1/license"
2.2 If there is an HBUMSG.TXT file on the two boards, delete the file by running the following
commands:
:sftm-delete-file:7,"ofs1/license/HBUMSG.TXT"
:sftm-delete-file:8,"ofs1/license/HBUMSG.TXT"
2.3 If the new configuration data is lost, the related services may be affected. Reconfigure the
related services.
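The check-and-delete steps above can be assembled programmatically before being pasted into a command-line session. The sketch below only builds the command strings shown in this notice; it does not connect to an NE. It assumes that 7 and 8 are the slots of the two system control boards, as the commands above suggest, and the function names are hypothetical.

```python
# Values copied from the recovery commands in this notice.
CONTROL_BOARD_SLOTS = (7, 8)  # assumed to be the two system control board slots
LICENSE_DIR = "ofs1/license"
DEBUG_FILE = "HBUMSG.TXT"


def check_commands(slots=CONTROL_BOARD_SLOTS):
    """Commands that list the license directory on each board."""
    return [f':sftm-show-dir:{slot},"{LICENSE_DIR}"' for slot in slots]


def delete_commands(slots_with_file):
    """Commands that delete the debugging file on the boards where it was found."""
    return [
        f':sftm-delete-file:{slot},"{LICENSE_DIR}/{DEBUG_FILE}"'
        for slot in slots_with_file
    ]


if __name__ == "__main__":
    for cmd in check_commands():
        print(cmd)
    # Suppose the listing showed the file on both boards.
    for cmd in delete_commands(CONTROL_BOARD_SLOTS):
        print(cmd)
```

Generating the delete commands only for the slots where the listing actually showed the file avoids issuing a delete against a board that has no HBUMSG.TXT.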
Workarounds:
Use the SmartKit Inspector to periodically perform preventive maintenance inspections (PMIs) of
NEs of the specified version (V100R002C01SPC100). You are advised to perform the PMI every
three months.
On the live network, the version of the SmartKit Inspector is V100R006C00. Before performing a
PMI, use the automatic upgrade function to upgrade the tool to the CN-PTN Box=P004 patch version.
Later versions of the Inspector include a similar check item.
After installing the CN-PTN Box=P004 patch, load the R6FP005BP004_PreUpgrading_checking_Template.xml file to
the SmartKit Inspector to perform the PMI.
Preventive measures:
Install the V100R002C01SPH102 patch or upgrade the NE to a later version.