Summary: In PW redundancy scenarios for PWE3 services, when multiple points of failure occur, a software processing error may leave a PW Up while the services it carries, both low-rate and high-rate, are interrupted.
[Problem Description]
Application scenario:
The following network topology is used as an example for low-rate services.
The HVPN or Layer 2+Layer 3 services of the IP RAN standard solution are configured on the ATN devices that function as CSG1 and CSG2.
The following network topology is used as an example for high-rate services.
The high-rate services of the IP RAN standard solution are configured on the ATN devices that function as CSG1 and CSG2.
Trigger conditions:
Configuration examples:
The following example shows how to configure PW redundancy (independent mode) for low-rate services.
interface Serial0/2/0:0
mpls l2vc 3.3.3.3 pw-template tdm 100
mpls l2vc 4.4.4.4 pw-template tdm 101
mpls l2vpn redundancy independent
mpls l2vpn stream-dual-receiving
mpls l2vpn oam-mapping
The following example shows how to configure PW redundancy (master/slave mode) for high-rate services.
interface Ethernet0/3/0.2
vlan-type dot1q 14
mpls l2vc 3.3.3.3 500 control-word raw
mpls l2vc 4.4.4.4 501 control-word raw secondary
mpls l2vpn redundancy master
mpls l2vpn reroute delay 500
mpls l2vpn stream-dual-receiving
mpls l2vpn arp-dual-sending
The problem occurs when any one of the following conditions is met. Within each condition, the listed events must occur in the order shown:
Condition 1:
PW redundancy (independent mode) is deployed on the CSGs, and E-APS is deployed on the RSGs. Both the primary and secondary PWs go Up.
An E-APS switchover is performed on the RSGs, and service traffic on CSG1 is switched from the primary PW to the secondary PW.
The secondary PW goes Down.
The BFD session that monitors the primary PW flaps.
Condition 2:
PW redundancy (master/slave mode or independent mode) is deployed on the CSGs, and the primary PW goes Up but the secondary PW goes Down.
The BFD session that monitors the primary PW flaps.
Condition 3:
PW redundancy (independent mode) is deployed on the CSGs, and E-APS is deployed on the RSGs. Both the primary and secondary PWs go Up.
The BFD session that monitors the secondary PW goes Down.
The BFD session that monitors the primary PW goes Down.
The BFD session that monitors the primary PW goes Up.
The BFD session that monitors the secondary PW goes Up.
Symptom:
Symptom 1: The primary PW is Up, the BFD session that monitors the primary PW flaps, and services are interrupted.
Symptom 2: The primary PW is Up, the BFD session that monitors the primary PW goes Down, and services are interrupted.
Identification method:
The following configuration is used as an example in this document.
[HUAWEI-diagnose]display status pw interface Serial 0/2/0:0
Check BEARER-GROUP 1 success
Check BEARER 1024 success
Check NHI 225 success
Check INTF 132 success
Check VC_AND_SWAP 2 success
Check INSEGMENT 24 success
Check SUBCARD_NHLFE 24
Card:1 success
Check FW_OS2OS3CFT 0
Card:1 success
//Note: Serial 0/2/0:0 is the interface to which the faulty PW is bound. In normal conditions, each field in the command output is displayed as "success".
If the command output contains no information or an error message is displayed in the command output, the problem has occurred.
Example 1: The command output contains no information. Run the display status pw interface interface-number command in the diagnostic view to view the PW status.
[HUAWEI]diagnose
[HUAWEI-diagnose]display status pw interface Serial 0/2/0:0
[HUAWEI-diagnose]
//Note: When the PW that is bound to the interface Serial 0/2/0:0 is faulty, no information is displayed in the command output.
Example 2: An error message is displayed in the command output. Run the display status pw interface interface-number command in the diagnostic view to view the PW status.
[HUAWEI]diagnose
[HUAWEI-diagnose]display status pw interface Serial 0/2/0:0
Check BEARER-GROUP 1
ulBasePtr 2047(!= 1024) ERROR
ulCount 254(!= 0) ERROR
fail
//Note: When the PW that is bound to the interface Serial 0/2/0:0 is faulty, the command output contains "ERROR" and "fail".
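The identification rule above (no output, or output containing "ERROR" or "fail", indicates the problem) can be applied to captured command output with a short script. The following is a minimal sketch, not a Huawei tool: the function name classify_pw_status and its return labels are hypothetical, and it assumes that prompt lines and echoed commands begin with "[" as they do in the examples in this document.

```python
def classify_pw_status(output: str) -> str:
    """Classify captured output of 'display status pw interface ...'.

    Returns one of three hypothetical labels:
      "no-output" - nothing was printed after the command (Example 1)
      "error"     - a line contains the word ERROR or fail (Example 2)
      "ok"        - output is present and contains neither word
    """
    lines = []
    for raw in output.splitlines():
        line = raw.strip()
        # Assumption: prompts and echoed commands start with "[",
        # for example "[HUAWEI-diagnose]display status pw ...".
        if not line or line.startswith("["):
            continue
        lines.append(line)

    if not lines:
        return "no-output"

    for line in lines:
        words = line.split()
        if "ERROR" in words or "fail" in words:
            return "error"
    return "ok"


# Usage with the output from Example 2 in this document.
sample = """[HUAWEI-diagnose]display status pw interface Serial 0/2/0:0
Check BEARER-GROUP 1
ulBasePtr 2047(!= 1024) ERROR
ulCount 254(!= 0) ERROR
fail"""
print(classify_pw_status(sample))  # prints "error"
```

Both "no-output" and "error" mean the problem has occurred. The sketch only reports that neither fault indicator is present when it returns "ok"; it does not verify that every check line ends in "success".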
[Root Cause]
In PW redundancy scenarios, the ATN device software does not take multiple points of failure into account. If any of the preceding trigger conditions is met, the bottom-layer PW entries fail to be delivered, which interrupts services.
[Impact and Risk]
When the problem occurs, the PW services are interrupted.
[Measures and Solutions]
Recovery measures:
Run the shutdown command and then the undo shutdown command on the AC interfaces of the CSGs so that the PW entries are redelivered. The PW services then return to normal.
Preventive action:
When a CSG is connected to an ASG, do not deploy BFD for PW if the secondary PW is unavailable. Deploy BFD for PW only after the secondary PW becomes available.
Rectify any link fault as soon as it occurs to prevent multiple points of failure.
Solutions:
For ATN950B V200R001C02SPC300, install ATN950B V200R001SPH008, which is to be released in late September 2013.
For ATN950B V200R002C00SPC300, install ATN950B V200R002SPH002, which is to be released in late September 2013.
For ATN910I V200R002C00SPC300, install ATN910I V200R002SPH005, which is to be released in late September 2013.
For ATN910 V200R002C00SPC300, install ATN910 V200R002SPH005, which is to be released in late September 2013.
Upgrade ATN910I V200R002C00SPC100 to ATN910I V200R002C00SPC300, and install ATN910I V200R002SPH005.
Upgrade ATN910 V200R002C00SPC100 to ATN910 V200R002C00SPC300, and install ATN910 V200R002SPH005.
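For an inventory with mixed products and versions, the list above can be expressed as a lookup table. The following is a minimal sketch: the names SOLUTIONS and recommended_action are hypothetical, the entries are transcribed from the list above, and the ATN910I SPC100 entry assumes that the patch meant is the same ATN910I V200R002SPH005 named for SPC300.

```python
# (product, running version) -> action, transcribed from the Solutions list.
SOLUTIONS = {
    ("ATN950B", "V200R001C02SPC300"): "install ATN950B V200R001SPH008",
    ("ATN950B", "V200R002C00SPC300"): "install ATN950B V200R002SPH002",
    ("ATN910I", "V200R002C00SPC300"): "install ATN910I V200R002SPH005",
    ("ATN910", "V200R002C00SPC300"): "install ATN910 V200R002SPH005",
    # Assumption: the patch for ATN910I after the upgrade is the ATN910I one.
    ("ATN910I", "V200R002C00SPC100"):
        "upgrade to ATN910I V200R002C00SPC300, then install ATN910I V200R002SPH005",
    ("ATN910", "V200R002C00SPC100"):
        "upgrade to ATN910 V200R002C00SPC300, then install ATN910 V200R002SPH005",
}


def recommended_action(product, version):
    """Return the listed action, or None if the pair is not covered here."""
    return SOLUTIONS.get((product.strip().upper(), version.strip().upper()))


print(recommended_action("ATN910", "V200R002C00SPC100"))
```

A result of None only means that the product and version pair is not in the list above; it says nothing about whether that version is affected.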