RAN10 KPI Troubleshooting Guide
INTERNAL
BOM Code
Product Name
RAN10 KPI Troubleshooting Guide
Intended audience
Product Version
V100R010
Document Version
V1.0
Department
UMTS Maintenance and Development Department
Prepared by
KPI Team of the UMTS Maintenance Department
Date
Reviewed by
Date
Reviewed by
Date
Approved by
Date
2009-3-6
Huawei Technologies Co., Ltd. All rights reserved 2009-03-06
2017-3-25
Huawei Confidential
Revision History
Version | Date | Description | Author
V1.0 | 2009-1-20 | The draft was completed. | Shen Yueping, Qian Jin, and Wu Yanwen
V1.0 | 2009-3-6 | The document was revised. | Shen Yueping, Qian Jin, and Wu Yanwen
Contents
1 Analysis Methodology of KPI-Related Problems
1.1 Problem Discussion
1.2 Narrowing the Scope
1.3 Locking the Scenario
1.4 Drive Test On Site
1.5 Reproducing the Mirroring Environment
1.6 Problem Analysis and Summary
2 RRC Access Success Rate (Service/Non-Service)
2.1 KPI Definition
2.2 Influence Factors
2.3 Analysis Process
2.4 List of Problem Information
3 RAB Access Success Rate (AMR/PS/VP/HSPA)
3.1 KPI Definition
3.2 Influence Factors
3.3 Analysis Process
3.4 List of Problem Information
4 Handover Success Rate (SHO/HHO)
4.1 Problems Related to Soft Handover Success Rate
4.1.1 KPI Definition
4.1.2 Influence Factors
4.1.3 Analysis Process
4.1.4 Cases of Soft Handover Failure
4.2 Problems Related to Hard Handover Success Rate
4.2.1 KPI Definition
4.2.2 Influence Factors
4.2.3 Analysis Process
4.2.4 Cases of Inter-Frequency Hard Handover Failure
4.3 List of Problem Information
5 Problems Related to Call Drop (AMR/PS/VP/HSPA)
5.1 KPI Definition
5.2 Influence Factors
5.3 Analysis Process
5.4 Cases of Call Drop
5.5 List of Problem Information
6 Inter-RAT Interoperability
6.1 Inter-RAT Handover from WCDMA to GSM (CS Domain)
6.1.1 KPI Definition
6.1.2 Influence Factors
6.1.3 Analysis Process
6.2 Inter-RAT Handover from GSM to WCDMA (CS Domain)
6.2.1 KPI Definition
6.2.2 Influence Factors
6.2.3 Analysis Process
6.3 Inter-RAT Handover from WCDMA to GPRS (PS Domain)
6.3.1 KPI Definition
6.3.2 Influence Factors
6.3.3 Analysis Process
6.4 Inter-RAT Handover from GPRS to WCDMA (PS Domain)
6.4.1 KPI Definition
6.4.2 Analysis Process
6.5 List of Problem Information
7 Information Collection
7.1 Performance data of RNC
7.1.1 Purpose
7.1.2 Information to Be Collected
7.1.3 Method
7.2 RNC CHR/PCHR
7.2.1 Purpose
7.2.2 Information to Be Collected
7.2.3 Method
7.3 RNC IOS Tracing
7.3.1 Purpose
7.3.2 Information to Be Collected
7.3.3 Method
7.4 RNC IFTS/CDT (User Plane) Tracing
7.4.1 Purpose
7.4.2 Information to Be Collected
7.4.3 Method
7.5 Standard Signaling Tracing on the RNC
7.5.1 Purpose
7.5.2 Information to Be Collected
7.5.3 Method
7.6 UE QXDM LOG
7.6.1 Purpose
7.6.2 Information to Be Collected
7.6.3 Method
7.7 Real-Time Performance Monitoring of RNC
7.7.1 Purpose
7.7.2 Information to Be Collected
7.7.3 Method
7.8 RNC Script Configuration
7.8.1 Purpose
7.8.2 Information to Be Collected
7.8.3 Method
7.9 Operation Log of RNC
7.9.1 Purpose
7.9.2 Information to Be Collected
7.9.3 Method
7.10 Alarm Information on RNC
7.10.1 Purpose
7.10.2 Information to Be Collected
7.10.3 Method
7.11 Node B Configuration Script
7.11.1 Purpose
7.11.2 Information to Be Collected
7.11.3 Method
7.12 Node B CHR
7.12.1 Purpose
7.12.2 Information to Be Collected
7.12.3 Method
7.13 Node B Alarm
7.13.1 Purpose
7.13.2 Information to Be Collected
7.13.3 Method
7.14 Node B CDT
7.14.1 Purpose
7.14.2 Information to Be Collected
7.14.3 Method
7.15 Checking Whether Any Neighboring Cells are not Configured
7.15.1 Enabling Call Trace for Missing Neighboring Cell Detection Tracing
7.15.2 Stopping the MNCDT
7.15.3 Reporting the Missing Neighboring Cell Message
7.16 Soft Failure of DSP
7.17 Terminal Troubleshooting
Figures
Figure 1 Impact of the PS service upon the RTWP when some neighboring cells are not configured
Figure 2 Impact of the CS service upon the RTWP when the neighboring cell is not configured
Figure 3 Satellite map of the BTS
Figure 4 Traced RTWP waveform
Figure 5 Change in the number of subscribers of the cell
Figure 6 Signal quality of neighboring cells
Figure 7 Impact of the burst of a large number of RRC connection requests upon the RTWP
Figure 8 Cell audit message
Figure 9 Cell signal quality
Figure 10 Measurement report
Figure 11 CIO offset parameter
Figure 12 Relation between RSCP fading and Ec/N0 fading
Figure 13 Comparison of handover parameters
Figure 14 Flow on CS inter-RAT handover out of 3G
Figure 15 Relocation Required message
Figure 16 Relocation Command message
Figure 17 Handover Request ACK message
Figure 18 Flow on CS handover-in
Figure 19 Signaling of CS inter-RAT handover-in
Figure 20 Relocation_Request message
Figure 21 Flow on PS inter-RAT handover out of
Figure 22 Flow on LAU/RAU after the UE accesses the 2G cell
Figure 23 Querying the workarea of the BAM
Figure 24 Exporting the CHR log (by running the COL LOG command)
Figure 25 Types of objects to be traced
Figure 26 IOS Tracing dialog box
Figure 27 MoreInfo dialog box
Figure 28 Type of trace object
Figure 29 Configuration page of CDT parameters
Figure 30 Configuration page of IFTS parameters
Figure 31 Configuration page of user-plane tracing
Figure 32 Configuration page of performance monitoring
Figure 33 Uu interface tracing
Figure 34 Iub interface tracing
Figure 35 Iur interface tracing
Figure 36 Querying the DSP code of the CN
Figure 37 Configuring the QPST port
Figure 38 Connecting the equipment ports
Figure 39 Enabling log tracing
Figure 40 Real-time performance monitoring
Figure 41 NC script configuration
Figure 42 Exporting the operation log by running the EXP LOG command
Figure 43 Alarm box of the LMT
Figure 44 Exporting the alarms
Figure 45 Exporting the NodeB configuration file through the M2000
Figure 46 Data Config File Transfer
Figure 47 FTP upload
Figure 48 Setting the CHR level of the NodeB
Figure 49 NodeB CHR reporting switch
Figure 50 Querying the alarm information
Figure 51 Saving the alarm information
Figure 52 Alarm box of the NodeB LMT
Figure 53 Modifying the properties of the monitor items of the NodeB CDT
Figure 54 Enabling CDT tracing of the NodeB cells
Figure 55 Basic setting
Figure 56 Setting other monitor items
Figure 57 Enabling call trace to check whether any neighboring cells are not configured
Figure 58 Configuration interface of intra-frequency MNCDT
Figure 59 MNCDT window
Figure 60 Intra-frequency measurement control after the intra-frequency MNCDT is enabled
Figure 61 Configuration interface of inter-frequency MNCDT
Figure 62 Configuration interface of inter-RAT MNCDT
Figure 63 Message tracing for the missing intra-frequency neighboring cells
Figure 64 Reported message about the missing intra-frequency neighboring cells
Figure 65 Message tracing for the missing inter-frequency neighboring cells
Figure 66 Reported message about the missing inter-frequency neighboring cells
Figure 67 Message about the missing inter-RAT neighboring cell
Figure 68 Analyzing the soft failure of the DSP through the CHR log
Figure 69 Resetting the DSP
Figure 70 Analyzing the special UEID through the CHR log
Tables
Table 1 Indicators related to RRC setup failure
Table 2 Cell traffic count - Analysis of power congestion
Table 3 Number of top congested cells
Table 4 Indicator of cell CE congestion
Table 5 Number of CEs consumed by the DCH service
Table 6 Number of CEs consumed by the HSUPA service
Table 7 Analysis of cell code congestion indicators
Table 8 Analysis of transmission congestion indicators
Table 9 Indicators of CS RAB setup failure
Table 10 Indicators of PS RAB setup failure
Table 11 Indicators of PS RB setup failure
Table 12 Flow on RB setup failure because of invalid configuration
Table 13 Models of the known UEs that have invalid configuration
Table 14 Indicators related to soft handover failure
Table 15 Indicators related to inter-frequency hard handover failure
Table 16 Inter-frequency handover failure
Table 17 CS call drop rate
Table 18 Requirements for the EcIo and Ec threshold
Table 19 Requirements of IP-based networking for the transmission quality
Table 20 Indicators related to CS call drop
Table 21 Indicators related to PS call drop
Table 22 Indicators related to CS inter-RAT handover-out failure
Table 23 Indicators related to CS inter-RAT handover-in failure
Table 24 Indicators related to PS inter-RAT handover-out failure
The document describes the troubleshooting methods for the KPI-related problems in the commercial WCDMA networks, thus providing reference for the network maintenance personnel.
1 Analysis Methodology of KPI-Related Problems
1.1 Problem Discussion
If the customer, network planning personnel, or customer service personnel report some KPI-related problems, you need to collect the related information and understand the problems and needs of the field personnel.
It is important to know the background of the problems, especially for KPI-related problems that occur in commercial networks. Collect more information about the problems by phone and by email. First, ascertain the urgency and importance of the problems so that you can plan appropriate measures.
Determine whether the problems are known problems according to the collected information (for example, problem description and version).
For the KPI-related problems, you need to obtain the version information first. Some problems are known problems or related to known bugs, which have been analyzed and solved.
Therefore, the troubleshooting personnel can first obtain the bug information or release notes of the version to eliminate the impacts of known problems upon the KPIs.
According to the time at which the KPIs changed, determine whether the problems are caused by network operations or parameter modifications, and focus the analysis on the impact of network adjustments on the KPIs.
1.2 Narrowing the Scope
After understanding the problems clearly, analyze the general KPI data and the KPI data of the top N cells, compare the normal KPI data with the abnormal KPI data, and thus find the main causes (performance counters) that affect the KPIs. For example, if the call drop rate increases, analyze the call drop rate of the top N cells and compare their normal KPIs with their abnormal KPIs. You may find that a single cell is abnormal and that its call drop rate has increased. By analyzing the abnormal data of the cell and comparing the normal KPIs with the abnormal KPIs, you can determine whether this cell is the primary cause of the problem. If yes, the analysis of the KPI-related problems of the network can focus on the abnormal cells.
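The top N comparison described above can be sketched in a few lines of Python. This is only an illustration and not part of any RNC tool: the function name, the per-cell dictionaries, and the sample call drop rates are assumptions made for the example.

```python
from typing import Dict, List, Tuple


def top_n_degraded_cells(
    normal: Dict[str, float], abnormal: Dict[str, float], n: int = 3
) -> List[Tuple[str, float]]:
    """Rank cells by the increase in call drop rate between two periods."""
    deltas = []
    for cell_id, abnormal_rate in abnormal.items():
        baseline = normal.get(cell_id)
        if baseline is None:
            # No baseline for this cell, so no comparison is possible.
            continue
        deltas.append((cell_id, abnormal_rate - baseline))
    # Largest increase first.
    deltas.sort(key=lambda item: item[1], reverse=True)
    return deltas[:n]


# Hypothetical call drop rates per cell (fraction of calls dropped).
normal_period = {"cell_a": 0.010, "cell_b": 0.010, "cell_c": 0.020}
abnormal_period = {"cell_a": 0.011, "cell_b": 0.050, "cell_c": 0.020}
worst = top_n_degraded_cells(normal_period, abnormal_period, n=2)
```

A cell that dominates this ranking is a candidate for the focused analysis described in the following sections; the ranking alone does not prove that the cell is the primary cause.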
1.3 Locking the Scenario
Identify the main scenarios that affect the KPIs by analyzing the PCHR log.
Compared with the performance data, the PCHR log records more detailed information, including the 15 key signaling messages before the call is released. By analyzing the PCHR log, you can collect and analyze the exceptional information about the KPI-related problems and identify the common features of the signaling flow.
Based on the PCHR log, you can obtain the subscriber information and terminal information, and thus judge whether the problems are caused by the poor performance of the terminal of a specific subscriber or a specific UE type.
Once you have preliminary analysis results, you can request the field personnel to enable IOS tracing of the top cells to learn the scenarios and details of the problems.
IOS tracing is an effective troubleshooting means. As a kind of cell-level tracing, it can trace the subscriber information of a cell and provide massive amounts of information. However, it can trace only several cells and a limited number of subscribers. To achieve an optimal effect, you need a preliminary understanding of the problem before enabling IOS tracing.
Analyze the causes of the KPI-related problems in depth through the CHR, together with the IOS tracing and the PCHR.
Normally, you can find some abnormalities through IOS tracing and obtain more detailed internal print information about the RNC by analyzing the CHR in the corresponding time range. In addition, you can associate the CHR with the PCHR bills and thus analyze the internal abnormal process, for example, whether the problem is Soft Failure of DSP. The analyzed problem scenarios are determined as the main scenarios that affect the KPIs.
1.4 Drive Test On Site
Normally, you can determine the causes that affect the KPIs through the preceding steps. If you cannot determine the causes, request the field personnel to arrange a drive test. The expense of a drive test is high. Before conducting the drive test, therefore, you need to analyze the problems thoroughly and determine the main cells or scenarios where the problems occur. Through the drive test, you can learn the behaviors and signaling of the UE, which are vital to analyzing some UE-related problems.
1.5 Reproducing the Mirroring Environment
Both field drive tests and mirroring in the HQ are means of reproducing the problems. If the problems can be reproduced in the HQ, the expense is lower and the problems can be analyzed more clearly. However, it is difficult to simulate the field scenarios (transmission status and signal status) in the HQ. Therefore, reproduce the problems in the HQ to verify problems caused by parameter modifications. For problems that occur in the existing network, a field drive test is usually required. Note that you must have a preliminary analysis of the problems before reproducing them (unless the problems are extremely urgent). Otherwise, the reproduction is blind.
1.6 Problem Analysis and Summary
First, determine whether the analyzed problem is the main factor that influences the KPIs. This point is important, because many factors affect the KPIs of the network. You need to clearly identify the major factors that cause the KPI changes or affect the KPIs. Second, summarize the problems and share the related experience in a timely manner.
2 RRC Access Success Rate (Service/Non-Service)
2.1 KPI Definition
RRC setup success rate = (Number of Successful RRC Setups) / (Number of RRC Connection Attempts)
VS.RRC.SuccConnEstab.Rate = <RRC.SuccConnEstab.sum> / <VS.RRC.AttConnEstab.Cell>
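As a worked example of the definition above, the following sketch computes the rate from the two counter values. The function name and the sample counts are assumptions; the counters themselves are the ones named in the formula.

```python
from typing import Optional


def rrc_setup_success_rate(succ_conn_estab: int, att_conn_estab: int) -> Optional[float]:
    """Return RRC.SuccConnEstab.sum / VS.RRC.AttConnEstab.Cell.

    Returns None when there were no attempts, because the ratio is undefined.
    """
    if att_conn_estab <= 0:
        return None
    return succ_conn_estab / att_conn_estab


# Hypothetical counter values for one cell and one measurement period.
rate = rrc_setup_success_rate(succ_conn_estab=990, att_conn_estab=1000)
```

Returning None for a period without attempts keeps idle cells from being reported as a 0% success rate.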
2.2 Influence Factors
The process of RRC connection setup includes the following steps:
1. The UE sends the RRC Connection Request message through the RACH.
2. The RNC sends the RRC Connection Setup message through the FACH.
3. If the RRC is established on the DCH, the UE sends the RRC Connection Setup CMP message through the uplink dedicated channel after the downlink dedicated channel is set up and synchronized.
4. If the RRC is established on the CCH, the UE directly sends the RRC Connection Setup CMP message through the RACH.
The RRC connection setup fails in the following scenarios:
The UE sends the RRC Connection Request message, but the RNC does not receive the message.
The RNC receives the RRC Connection Request message sent by the UE and delivers the RRC Connection Setup message, but the UE does not receive the RRC Connection Setup message.
The RNC receives the RRC Connection Request message sent by the UE, and delivers the RRC Connection Reject message.
The UE receives the RRC Connection Setup message, but does not send the RRC Connection Setup CMP message.
The UE sends the RRC Connection Setup CMP message, but the RNC does not receive the message.
Usually, the problems related to RRC setup success rate are found through the performance counter of the RNC or users’ complaints (or drive test). In the scenario where the UE sends the RRC Connection Request message but the RNC does not receive the message, the problem can be found only through users’ complaints or drive test. In other scenarios, the problem can be found through the performance counter. Usually, RRC setup failure is caused by the following factors:
Uplink RACH
Downlink coverage
Cell reselection parameter
Downlink synchronization
Uplink synchronization
Resource congestion
Equipment abnormality
Resource congestion includes power resource congestion, CE resource congestion, code resource congestion, and transmission resource congestion. For a problem caused by resource congestion, first check the actual utilization of resources and verify that the congestion thresholds and configurations are correct. For a problem caused by the other factors, no response is received over the air interface during RRC setup. Generally, no reply over the Uu interface (UU Noreply) is the main cause of RRC connection setup failure.
2.3 Analysis Process
1. Discussing the Problem, Ascertaining the Problem Background and Product Version, and Excluding the Impacts of Known Bugs
Determine the time at which the RRC setup success rate decreases severely, analyze whether the problem is caused by network adjustment, and focus on the impacts of network adjustment. Obtain the known bug information about the corresponding version (you can ask the product contact person or look up similar problems at other sites), and determine whether the problem is a known problem.
2. Analyzing the Main Scenarios in Which RRC Setup Fails
Analyze the change in the causes of RRC access failure through the performance counters on the RNC, and determine which factor causes the decline of the RRC setup success rate. Table 1 lists the causes of RRC access failure defined by the performance counters:
Table 1 Indicators related to RRC setup failure
Measurement Item | Sub Items
RRC.FailConnEstab.Cong | VS.RRC.Rej.Power.Cong, VS.RRC.Rej.UL.CE.Cong, VS.RRC.Rej.DL.CE.Cong, VS.RRC.Rej.Code.Cong, VS.RRC.Rej.ULIUBBandCong, VS.RRC.Rej.DLIUBBandCong
VS.RRC.FailConnEstab | VS.RRC.Rej.RL.Fail, RRC.FailConnEstab.Cong, VS.RRC.Rej.AAL2.Fail, RRC.FailConnEstab.NoReply
3. Analyzing the Main Causes of RRC Access Failure in Depth
VS.RRC.Rej.Power.Cong
The RNC RRM makes the power admission decision. If the decision is an uplink or downlink admission denial, the RNC RRM initiates RRC setup rejection. In RAN10 and earlier versions, the power admission policy for RRC is as follows:
If the RRC Connection Request is caused by an emergency call, detach, or registration, the RRC connection is allowed directly.
If the RRC Connection Request has any other cause, the RRC connection is admitted according to the OLC threshold if the OLC is enabled, and is allowed directly if the OLC is disabled.
Therefore, VS.RRC.Rej.Power.Cong occurs when the OLC is enabled in the network and the network load is high enough to cause congestion.
If the RRC setup success rate decreases because this counter suddenly becomes large, find the top N cells that cause power congestion and then query the changes in the maximum RTWP (VS.MaxRTWP) and maximum TCP (VS.MaxTCP) of those cells. If the RTWP increases severely, the problem is caused by uplink power congestion. If the TCP increases severely, the problem is caused by downlink power congestion.
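The admission policy above can be read as a small decision function. The sketch below is an interpretation only, not the RNC implementation: the cause strings are hypothetical labels, and the comparison of a normalized load against the OLC threshold is an assumption about what "according to the OLC threshold" means.

```python
# Establishment causes that are always admitted (hypothetical label names).
ALWAYS_ADMITTED_CAUSES = {"emergency_call", "detach", "registration"}


def admit_rrc_request(
    cause: str, olc_enabled: bool, load: float, olc_threshold: float
) -> bool:
    """Sketch of the RRC power admission policy described for RAN10 and earlier."""
    if cause in ALWAYS_ADMITTED_CAUSES:
        return True
    if not olc_enabled:
        # Without OLC, the request is allowed directly.
        return True
    # Assumption: the request is admitted while the load is below the threshold.
    return load < olc_threshold


# Hypothetical requests at 95% load with an OLC threshold of 90%.
emergency = admit_rrc_request("emergency_call", True, 0.95, 0.90)
ordinary = admit_rrc_request("originating_call", True, 0.95, 0.90)
```

Under this reading, rejections counted by VS.RRC.Rej.Power.Cong can occur only in the last branch, which matches the statement that the counter increases only when the OLC is enabled and the load is high.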
RTWP
The RTWP increases for the following reasons:
− High traffic
− External interference
− Some neighboring cells are not configured.
− Cells are reestablished.
− The equipment is abnormal.
For uplink power congestion (the RTWP increases), determine the cause by analyzing the performance data, and then propose appropriate solutions. Table 2 lists the related performance counters.
Table 2 Cell traffic count: analysis of power congestion
Item | DL | UL
RB Number: CS Erlang | VS.AMR.Ctrl.DL12.2 | VS.AMR.Ctrl.UL12.2
 | VS.RB.DLConvCS.64 | VS.RB.ULConvCS.64
RB Number: PS Erlang | VS.RB.DLInterPS.8 | VS.RB.ULInterPS.8
 | VS.RB.DLInterPS.16 | VS.RB.ULInterPS.16
 | VS.RB.DLInterPS.32 | VS.RB.ULInterPS.32
 | VS.RB.DLInterPS.64 | VS.RB.ULInterPS.64
 | VS.RB.DLInterPS.128 | VS.RB.ULInterPS.128
 | VS.RB.DLInterPS.144 | VS.RB.ULInterPS.144
 | VS.RB.DLInterPS.256 | VS.RB.ULInterPS.256
 | VS.RB.DLInterPS.384 | VS.RB.ULInterPS.384
 | VS.RB.DLBkgPS.8 | VS.RB.ULBkgPS.8
 | VS.RB.DLBkgPS.16 | VS.RB.ULBkgPS.16
 | VS.RB.DLBkgPS.32 | VS.RB.ULBkgPS.32
 | VS.RB.DLBkgPS.64 | VS.RB.ULBkgPS.64
 | VS.RB.DLBkgPS.128 | VS.RB.ULBkgPS.128
 | VS.RB.DLBkgPS.144 | VS.RB.ULBkgPS.144
 | VS.RB.DLBkgPS.256 | VS.RB.ULBkgPS.256
 | VS.RB.DLBkgPS.384 | VS.RB.ULBkgPS.384
RB Number: HSPA User | VS.HSDPA.UE.Mean.Cell | VS.HSUPA.UE.Mean.Cell
Throughput: R99 PS Throughput | VS.PS.Int.Kbps.DL8 | VS.PS.Int.Kbps.UL8
 | VS.PS.Int.Kbps.DL16 | VS.PS.Int.Kbps.UL16
 | VS.PS.Int.Kbps.DL32 | VS.PS.Int.Kbps.UL32
 | VS.PS.Int.Kbps.DL64 | VS.PS.Int.Kbps.UL64
 | VS.PS.Int.Kbps.DL128 | VS.PS.Int.Kbps.UL128
 | VS.PS.Int.Kbps.DL144 | VS.PS.Int.Kbps.UL144
 | VS.PS.Int.Kbps.DL256 | VS.PS.Int.Kbps.UL256
 | VS.PS.Int.Kbps.DL384 | VS.PS.Int.Kbps.UL384
 | VS.PS.Bkg.Kbps.DL8 | VS.PS.Bkg.Kbps.UL8
 | VS.PS.Bkg.Kbps.DL16 | VS.PS.Bkg.Kbps.UL16
 | VS.PS.Bkg.Kbps.DL32 | VS.PS.Bkg.Kbps.UL32
 | VS.PS.Bkg.Kbps.DL64 | VS.PS.Bkg.Kbps.UL64
 | VS.PS.Bkg.Kbps.DL128 | VS.PS.Bkg.Kbps.UL128
 | VS.PS.Bkg.Kbps.DL144 | VS.PS.Bkg.Kbps.UL144
 | VS.PS.Bkg.Kbps.DL256 | VS.PS.Bkg.Kbps.UL256
 | VS.PS.Bkg.Kbps.DL384 | VS.PS.Bkg.Kbps.UL384
Throughput: HSPA Traffic | VS.HSDPA.MeanChThroughput | VS.HSUPA.MeanChThroughput
 | VS.HSDPA.MeanChThroughput.TotalBytes | VS.HSUPA.MeanChThroughput.TotalBytes
Power | VS.MeanTCP | VS.MeanRTWP
 | VS.MaxTCP | VS.MaxRTWP
 | VS.MinTCP | VS.MinRTWP
Call Attempt Times | VS.RRC.AttConnEstab.Cell | VS.RRC.AttConnEstab.Cell
 | VS.RAB.AttEstab.AMR | VS.RAB.AttEstab.AMR
 | VS.RAB.AttEstCS.Conv.64 | VS.RAB.AttEstCS.Conv.64
 | VS.RAB.AttEstabPS.Cell | VS.RAB.AttEstabPS.Cell
 | VS.HSDPA.RAB.AttEstab | VS.HSDPA.RAB.AttEstab
 | VS.HSUPA.RAB.AttEstab | VS.HSUPA.RAB.AttEstab
The following section describes the judgment methods and solution suggestions in different scenarios where the RTWP increases:
High traffic causes the rise in the RTWP
Symptom:
− The RTWP increases abnormally when traffic is busy.
− Uplink power admission is rejected during peak traffic hours.
− The RTWP gradually returns to normal as traffic decreases.
− The corresponding traffic is high, that is, about 80 equivalent Erlang (this is not a necessary condition).
Judgment method: Through the performance data, analyze whether the RTWP increases while traffic increases.
Solution:
− It is recommended that TRXs be added in hotspot areas.
− If TRXs cannot be added within a short period, run the following command to enable the uplink LDR function:
ADD CELLALGOSWITCH: NBMLdcAlgoSwitch=UL_UU_LDR-1;
The LDR function can relieve the congestion caused by high traffic but cannot eliminate it. The LDR function sacrifices the QoS of some users for the access success rate of new users.
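One way to check whether the RTWP rises together with traffic is to correlate the two series taken from the performance data. The sketch below is an assumption about how such a check could be done, not a procedure from the guide; the hourly samples are invented, and the Pearson coefficient is computed by hand so that the example needs only the standard library.

```python
import math
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("a constant series has no defined correlation")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical hourly samples for one cell: equivalent Erlang and VS.MaxRTWP (dBm).
hourly_erlang = [10.0, 25.0, 40.0, 60.0, 80.0]
hourly_max_rtwp_dbm = [-104.0, -102.5, -101.0, -99.0, -97.0]
r = pearson(hourly_erlang, hourly_max_rtwp_dbm)
```

A coefficient close to 1 is consistent with the high-traffic scenario. A high RTWP with a weak correlation suggests that one of the other causes in this section should be checked.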
External interference causes the rise in the RTWP
Symptom: After the preceding cause is excluded, external interference causes the rise in the RTWP in two scenarios:
− The RTWP of a cell is abnormal at regular intervals.
− For the cells in an area, the coverage directions are basically the same and the problem occurs at the same time and on the same frequency.
1) When traffic is low, the RTWP of multiple cells in the same area increases to different degrees in the same time segment, and the symptom lasts for more than 20 minutes.
2) If you trace the waveform of the abnormal RTWP (by enabling the RTWP tracing task of the cells), you can find that the RTWP varies gently and has no remarkable fluctuation after it increases.
Judgment method:
− Analyze the performance data: In an area or cell where RRC power congestion occurs, check whether the problem occurs in the same time segment. Use the minimum granularity to query the performance data. During the congestion period, check whether the traffic of the cell increases sharply and thus causes the RTWP to increase abnormally (to exclude the traffic increase as the cause).
− Analyze the PCHR log: Filter out all the records of RRC admission denial caused by RTWP congestion, and determine when the congestion occurred (to the minute and second).
− Analyze the geographical distribution: Query the geographical distribution information about the cells. If the coverage directions of the cells are the same, it is probable that external interference causes the problem.
Solution: If the problem is caused by external interference, capture the evidence by scanning the antenna interface and explain the cause to the customer.
The RTWP increases abnormally because some neighboring cells are not configured
There are two such scenarios:
− A Huawei neighboring cell is not configured.
− The cells of other vendors are not configured with the Huawei neighboring cell.
Symptom:
− In such scenarios, the RTWP increases abnormally because of the mobility of subscribers. The problem occurs at random, in the time segments during which subscribers move frequently.
− The more frequently the RTWP increases abnormally, the more frequently RRC congestion and service congestion occur.
Figure 2 shows the impact of different types of services upon the RTWP of the cells (data source: O2 in Germany).
PS service: The Nokia cell is not configured with the Huawei neighboring cell. In the Nokia cell, a PS 384/384 service is initiated and data is uploaded.
Figure 2 Impact of the PS service upon the RTWP when some neighboring cells are not configured
AMR service: Cell1: 311. Cell2: 312. Cell1 and Cell2 are intra-frequency cells. Cell 311 is not configured with the neighbor relation with Cell 312.
Figure 3 Impact of the CS service upon the RTWP when the neighboring cell is not configured
Judgment method:
− Analyze the performance data of the congested cells, and find the distribution of the times at which power congestion occurs.
− Trace and analyze the RTWP and number of subscribers of the cells in real time.
− Analyze whether the congested cells lack neighboring cell configurations through the NASTAR (or through the neighboring cell configuration analysis of the intelligent network optimization in the PCHR log).
− Check whether any neighboring cells are not configured according to the preceding analysis result and the engineering map information.
Solution: It is recommended that the corresponding neighbor relation be configured.
Case: O2 BPCR case 1
During the period of 2009-01-12 to 2009-01-18, the following cells are severely congested: 23690, 43692, 24104, 23696, 3686, 23678, and 23701 of Cluster UMTS_S0048_4.
Table 3 Number of top congested cells
CellId | CellName | Time (As hour) | VS.RRC.Rej.Power.Cong | VS.LCC.OverCongNumUL | VS.LCC.OverCongTimUL
23690 | 509310690S2 | 2009-1-12 20:00:00 | 335 | 3 | 680
24104 | 509311104S-2 | 2009-1-19 17:00:00 | 318 | 3 | 520
43692 | 509310692S-3 | 2009-1-15 12:00:00 | 238 | 1 | 320
24104 | 509311104S-2 | 2009-1-19 18:00:00 | 168 | 7 | 415
23690 | 509310690S2 | 2009-1-18 19:00:00 | 101 | 1 | 250
3686 | 509310686S-1 | 2009-1-16 14:00:00 | 89 | 1 | 260
23690 | 509310690S2 | 2009-1-15 13:00:00 | 63 | 2 | 145
23678 | 509310678S2 | 2009-1-18 20:00:00 | 39 | 1 | 75
43692 | 509310692S-3 | 2009-1-17 21:00:00 | 38 | 3 | 95
3686 | 509310686S-1 | 2009-1-16 12:00:00 | 37 | 4 | 155
23701 | 509310701S2 | 2009-1-14 18:00:00 | 26 | 1 | 40
23690 | 509310690S2 | 2009-1-15 20:00:00 | 22 | 3 | 135
23678 | 509310678S2 | 2009-1-13 18:00:00 | 19 | 1 | 40
23690 | 509310690S2 | 2009-1-15 11:00:00 | 18 | 1 | 60
43692 | 509310692S-3 | 2009-1-14 10:00:00 | 13 | 1 | 5
3678 | 509310678S1 | 2009-1-13 18:00:00 | 12 | 1 | 65
23696 | 509310696S2 | 2009-1-18 18:00:00 | 12 | 1 | 220
23945 | 509310945S2 | 2009-1-12 19:00:00 | 11 | 1 | 25
44104 | 509311104S-3 | 2009-1-19 16:00:00 | 8 | 1 | 15
23690 | 509310690S2 | 2009-1-14 18:00:00 | 7 | 1 | 85
23691 | 509310691S-2 | 2009-1-19 18:00:00 | 7 | 2 | 25
43685 | 509310685S3 | 2009-1-12 6:00:00 | 6 | 1 | 15
3701 | 509310701S1 | 2009-1-16 18:00:00 | 5 | 1 | 35
3673 | 509310673S1 | 2009-1-19 14:00:00 | 4 | 1 | 30
3686 | 509310686S-1 | 2009-1-19 15:00:00 | 4 | 1 | 20
3696 | 509310696S1 | 2009-1-17 19:00:00 | 4 | 2 | 75
23675 | 509310675S2 | 2009-1-18 13:00:00 | 4 | 1 | 20
23685 | 509310685S2 | 2009-1-16 12:00:00 | 4 | 1 | 40
23690 | 509310690S2 | 2009-1-17 19:00:00 | 4 | 1 | 30
23696 | 509310696S2 | 2009-1-17 12:00:00 | 4 | 1 | 65
23696 | 509310696S2 | 2009-1-17 18:00:00 | 4 | 1 | 55
23696 | 509310696S2 | 2009-1-17 19:00:00 | 4 | 6 | 155
43685 | 509310685S3 | 2009-1-13 15:00:00 | 4 | 1 | 25
44104 | 509311104S-3 | 2009-1-19 20:00:00 | 4 | 1 | 15
3675 | 509310675S1 | 2009-1-16 17:00:00 | 3 | 1 | 35
23676 | 509310676S2 | 2009-1-12 9:00:00 | 3 | 3 | 75
23685 | 509310685S2 | 2009-1-13 15:00:00 | 3 | 2 | 190
43678 | 509310678S3 | 2009-1-13 18:00:00 | 3 | 1 | 15
43685 | 509310685S3 | 2009-1-12 7:00:00 | 3 | 1 | 15
44104 | 509311104S-3 | 2009-1-19 18:00:00 | 3 | 1 | 10
23690 | 509310690S2 | 2009-1-16 13:00:00 | 2 | 1 | 30
43676 | 509310676S3 | 2009-1-18 13:00:00 | 2 | 1 | 35
23676 | 509310676S2 | 2009-1-12 8:00:00 | 1 | 1 | 15
The satellite map of the BTS shows that Nokia neighboring cells are near Cells 23690, 43692, and 24104. Therefore, it is suspected that the Nokia neighboring cells have a great impact upon the RTWP of the Huawei BTS.
Figure 4 Satellite map of the BTS
4. On the RNC where the cluster is located, select the top 20 severely congested NodeBs for RTWP tracing and find the RTWP waveform generated at the congestion time. During the period of 16:00 to 17:00, the abnormal RTWP waveform of Cell 44587 is traced, as shown in Figure 1.
Figure 1 Traced RTWP waveform
Figure 2 shows the change in the number of subscribers of the corresponding cell.
Figure 2 Change in the number of subscribers of the cell
5. Query the neighboring cell configuration in the configuration script. Cell 44587 is configured with Nokia neighboring cells: Site 175, Site 730, and Cell 43176 of RNC 525. In addition, the signal quality information about the Nokia neighboring cells is exported to the PCHR log. As a result, the RTWP of the Huawei cell is raised by 10 dB when the subscriber is in a Nokia cell.
Figure 1 Signal quality of neighboring cells
The RTWP increases abnormally because cells are reestablished
For equipment or transmission reasons, cells are reestablished. When the cells are enabled again, a large number of RRC connection requests are generated because of cell reselection over different subsystems. This burst of RRC connection requests causes the RTWP to increase abnormally within a short period. As a result, when the uplink power admission algorithm is enabled, some RRC connection requests are rejected because of power congestion. If the cell has a large number of subscribers, the rise in the RTWP lasts for a longer period. As verified in the lab, the RTWP increases to an abnormal level if there are a large number of RRC connection requests. For details, see Figure 2.
Figure 2 Impact of the burst of a large number of RRC connection requests upon the RTWP
Symptom:
− Within two or three minutes after cells are reestablished, the RTWP fluctuates continuously.
− When both the admission algorithm and the OLC algorithm are enabled, a large number of RRC connection requests with the cause of cell reselection over different subsystems are rejected, and other types of service requests are also rejected.
Judgment method:
− Cell reestablishment alarm: Query the system alarms and check whether any of the following alarms is generated in the time segment when the RTWP increases abnormally: Uplink CPRI Interface Abnormal, SAAL Link Unavailable / SCTP Link Down, or Cell Unavailable.
− Analyze the PCHR data. Power congestion mainly occurs within a time segment of 2 to 5 minutes, and more than 60% of the power-congestion subscribers undergo cell reselection over different subsystems.
Solution:
− Disable the uplink admission algorithm or the OLC algorithm. If both the uplink admission algorithm and the OLC algorithm are enabled, the access success rate decreases severely. If the algorithms are disabled, you can avoid the RRC connection failure caused by power congestion.
− Analyze the cause of cell reestablishment, and try to lower the occurrence frequency of cell reestablishment in the network.
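The 60% criterion used in the PCHR analysis for this scenario can be checked with a short script once the establishment causes of the rejected requests have been exported. This is a sketch under assumptions: the cause label is a hypothetical stand-in for what the guide calls cell reselection over different subsystems, and the record list is invented.

```python
from typing import Iterable


def reselection_share(
    causes: Iterable[str], reselection_cause: str = "inter_subsystem_cell_reselection"
) -> float:
    """Fraction of power-congestion rejections whose cause is cell reselection."""
    cause_list = list(causes)
    if not cause_list:
        return 0.0
    hits = sum(1 for cause in cause_list if cause == reselection_cause)
    return hits / len(cause_list)


# Hypothetical causes of ten requests rejected for power congestion.
rejected = ["inter_subsystem_cell_reselection"] * 7 + ["originating_call"] * 3
share = reselection_share(rejected)
matches_reestablishment_pattern = share > 0.60
```

The share is meaningful only for records that fall inside the short congestion window after the cell comes back into service, so the records should be filtered by time first.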
The RTWP increases abnormally because the equipment is abnormal
Symptom:
− When traffic is not high, the RTWP of one or two sites increases stably by more than 10 dB. The symptom lasts for more than 60 minutes.
− After the RTWP increases, it varies gently and has no remarkable fluctuation.
− The minimum RTWP (VS.MinRTWP) always remains at a high level.
Judgment method:
− Analyze the performance data. If cells are congested, check the VS.MinRTWP value of the cells.
− Query the system alarms and check whether any board-related alarms exist. Process the alarms first.
− Trace the RTWP and number of subscribers of the cells in real time.
− If the real-time tracing result shows that the RTWP of the cells is continuously abnormal while the number of subscribers is small, you can ascertain the cause on site. If all the preceding causes are excluded, you can suspect that the problem is caused by the equipment. Collect the related information and submit it to the Maintenance Department.
Solution: Collect the related information according to the following checklist, and ask the R&D personnel to further analyze the problem.
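The symptom of a stable rise of more than 10 dB that lasts for more than 60 minutes can be screened for in exported VS.MinRTWP samples. The following is a minimal sketch, not a tool from the guide: the baseline of -105 dBm, the 15-minute measurement period, and the sample values are assumptions chosen for the example.

```python
from typing import Sequence


def sustained_rtwp_rise(
    min_rtwp_dbm: Sequence[float],
    baseline_dbm: float,
    period_minutes: int,
    rise_db: float = 10.0,
    min_duration_minutes: int = 60,
) -> bool:
    """True if the minimum RTWP stays more than rise_db above the baseline
    for longer than min_duration_minutes of consecutive periods."""
    longest = current = 0
    for value in min_rtwp_dbm:
        if value - baseline_dbm > rise_db:
            current += 1
            longest = max(longest, current)
        else:
            # The run of raised samples is broken, so start counting again.
            current = 0
    return longest * period_minutes > min_duration_minutes


# Hypothetical VS.MinRTWP samples (dBm), one per 15-minute period.
samples = [-105.0, -93.0, -92.5, -93.0, -92.0, -93.5, -104.0]
suspect = sustained_rtwp_rise(samples, baseline_dbm=-105.0, period_minutes=15)
```

A positive result only narrows the search. Traffic, alarms, and the number of subscribers still need to be checked as described above before the equipment is suspected.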
TCP
The TCP increases for the following reasons:
− High traffic
− Other causes
For downlink power congestion (the TCP increases), analyze the performance data, judge whether the problem is related to the rise in traffic, and then propose appropriate solutions.
So far, the commercial networks have not encountered the scenario where RRC admission failure is caused by excessively high TCP. Troubleshooting experience in this area is currently insufficient. The related contents will be added subsequently.
VS.RRC.Rej.UL.CE.Cong/VS.RRC.Rej.DL.CE.Cong
The RNC RRM makes the admission decision. These counters measure the number of RRC connection rejections that occur either because admission is denied for insufficient uplink or downlink CE resources, or because the NodeB returns CE Congestion when the RNC delivers the RL_SETUP message.
For the RL_Fail caused by the NodeB returning CE Congestion, RAN10 and earlier versions have the following defect:
The CE capability of the NodeB is constrained by both the license configuration and the hardware specifications. At present, the NodeB reports IUB_INTERFACE_CELL_SYNC_NOT_SUPP and IUB_INTERFACE_CELL_SYNC_ADJ_NOT_SUPP to the RNC if the CE licenses are not enough. The RNC adds the two cause values to the VS.RRC.Rej.UL.CE.Cong counter and the VS.RRC.Rej.DL.CE.Cong counter respectively. However, the NodeB reports RADIO_RESOURCES_NOT_AVAILABLE to the RNC if the actual hardware capability (CE resource) of the NodeB is not enough. The RNC then adds this cause value to the VS.RRC.Rej.RL.Fail counter rather than to the corresponding CE congestion counter.
You can observe the CE capability reported by the NodeB through the Iub NBAP signaling.
Figure 3 Cell audit message
The CE license configuration of the NodeB should be lower than the hardware capability of the NodeB. Why, then, can the hardware capability be insufficient when the license capability is not congested? The reason is as follows: In RAN10 and earlier versions, the NodeB reports the CE capability to the RNC as configured licenses × 110%, regardless of the hardware capability. Therefore, the reported license-based capability can exceed the hardware capability, which is not reasonable.
The subsequent versions will make the following improvements.
Improvements on the NodeB:
If the CE licenses are not enough, the NodeB reports one of the following causes through the Iub interface:
CELL_SYN_NOT_SUPP: The uplink CE licenses are not enough.
CELL_SYN_ADJ_NOT_SUPP: The downlink CE licenses are not enough.
The two cause values keep the design of the RAN10 version.
If the hardware CE capability is not enough, the NodeB reports one of the following causes through the Iub interface:
UL_RADIO_RESOURCES_NOT_AVAILABLE: The uplink hardware CE capability is not enough, the uplink logical resources (for example, FPID and CcTrchID) are not enough, and uplink subscribers are allocated.
DL_RADIO_RESOURCES_NOT_AVAILABLE: The downlink hardware CE capability is not enough, and the downlink logical resources (for example, FPID and CcTrchID) are not enough.
Improvements on the RNC:
Both CELL_SYN_NOT_SUPP and UL_RADIO_RESOURCES_NOT_AVAILABLE reported by the NodeB are considered as uplink CE insufficiency.
Both CELL_SYN_ADJ_NOT_SUPP and DL_RADIO_RESOURCES_NOT_AVAILABLE reported by the NodeB are considered as downlink CE insufficiency.
In addition, access failures with the preceding four causes are excluded from the VS.RRC.Rej.RL.Fail counter.
It is improbable that the NodeB reports RADIO_RESOURCES_NOT_AVAILABLE because of the insufficiency of other resources (for example, hardware resources). Therefore, you can basically determine that the problem is caused by the insufficiency of CEs.
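The difference between the RAN10 counting and the planned counting can be summarized as two lookup tables. The sketch below restates the mapping described above and is not RNC code. In particular, mapping "uplink CE insufficiency" and "downlink CE insufficiency" to the two CE congestion counters in the later versions is an assumption drawn from the wording of the improvement, and the helper name is hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable

UL_CE = "VS.RRC.Rej.UL.CE.Cong"
DL_CE = "VS.RRC.Rej.DL.CE.Cong"
RL_FAIL = "VS.RRC.Rej.RL.Fail"

# RAN10 and earlier: hardware CE shortage is counted as a radio link failure.
RAN10_MAPPING: Dict[str, str] = {
    "IUB_INTERFACE_CELL_SYNC_NOT_SUPP": UL_CE,
    "IUB_INTERFACE_CELL_SYNC_ADJ_NOT_SUPP": DL_CE,
    "RADIO_RESOURCES_NOT_AVAILABLE": RL_FAIL,
}

# Subsequent versions (assumed counter targets): all four causes count as CE congestion.
LATER_MAPPING: Dict[str, str] = {
    "CELL_SYN_NOT_SUPP": UL_CE,
    "UL_RADIO_RESOURCES_NOT_AVAILABLE": UL_CE,
    "CELL_SYN_ADJ_NOT_SUPP": DL_CE,
    "DL_RADIO_RESOURCES_NOT_AVAILABLE": DL_CE,
}


def tally_counters(causes: Iterable[str], mapping: Dict[str, str]) -> Counter:
    """Count how many reported causes land in each performance counter."""
    return Counter(mapping[cause] for cause in causes if cause in mapping)


# Hypothetical causes reported by a NodeB in RAN10.
ran10_counts = tally_counters(
    ["RADIO_RESOURCES_NOT_AVAILABLE", "IUB_INTERFACE_CELL_SYNC_NOT_SUPP"],
    RAN10_MAPPING,
)
```

In RAN10, the hardware shortage therefore appears under VS.RRC.Rej.RL.Fail, which is why that counter needs to be read together with the two CE congestion counters.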
Because of the preceding defects, the number of CE congestions is not accurate. When analyzing VS.RRC.Rej.UL.CE.Cong and VS.RRC.Rej.DL.CE.Cong, you also need to consider VS.RRC.Rej.RL.Fail.
The common causes of CE congestion are as follows:
− High traffic
− The residual CEs maintained by the NodeB are not consistent with those maintained by the RNC.
In case of CE congestion, analyze the top N congested cells through the performance data, judge the causes of CE congestion, and propose appropriate solutions.
Table 4 Indicator of cell CE congestion
Item | DL | UL
Traffic: CS Erlang | VS.AMR.Ctrl.DL12.2 | VS.AMR.Ctrl.UL12.2
 | VS.RB.DLConvCS.64 | VS.RB.ULConvCS.64
Traffic: PS Erlang | VS.RB.DLInterPS.8 | VS.RB.ULInterPS.8
 | VS.RB.DLInterPS.16 | VS.RB.ULInterPS.16
 | VS.RB.DLInterPS.32 | VS.RB.ULInterPS.32
 | VS.RB.DLInterPS.64 | VS.RB.ULInterPS.64
 | VS.RB.DLInterPS.128 | VS.RB.ULInterPS.128
 | VS.RB.DLInterPS.144 | VS.RB.ULInterPS.144
 | VS.RB.DLInterPS.256 | VS.RB.ULInterPS.256
 | VS.RB.DLInterPS.384 | VS.RB.ULInterPS.384
 | VS.RB.DLBkgPS.8 | VS.RB.ULBkgPS.8
 | VS.RB.DLBkgPS.16 | VS.RB.ULBkgPS.16
 | VS.RB.DLBkgPS.32 | VS.RB.ULBkgPS.32
 | VS.RB.DLBkgPS.64 | VS.RB.ULBkgPS.64
 | VS.RB.DLBkgPS.128 | VS.RB.ULBkgPS.128
 | VS.RB.DLBkgPS.144 | VS.RB.ULBkgPS.144
 | VS.RB.DLBkgPS.256 | VS.RB.ULBkgPS.256
 | VS.RB.DLBkgPS.384 | VS.RB.ULBkgPS.384
HSPA Traffic | VS.HSDPA.UE.Mean.Cell | VS.HSUPA.UE.Mean.Cell
 | VS.HSDPA.MeanChThroughput | VS.HSUPA.MeanChThroughput
 | VS.HSDPA.MeanChThroughput.TotalBytes | VS.HSUPA.MeanChThroughput.TotalBytes
Call Attempt Times | VS.RRC.AttConnEstab.Cell | VS.RRC.AttConnEstab.Cell
 | VS.RAB.AttEstab.AMR | VS.RAB.AttEstab.AMR
 | VS.RAB.AttEstCS.Conv.64 | VS.RAB.AttEstCS.Conv.64
 | VS.RAB.AttEstabPS.Cell | VS.RAB.AttEstabPS.Cell
 | VS.HSDPA.RAB.AttEstab | VS.HSUPA.RAB.AttEstab
Congestion | VS.LCC.LDR.Num.DLCE | VS.LCC.LDR.Num.ULCE
 | VS.LCC.LDR.Time.DLCE | VS.LCC.LDR.Time.ULCE
 | VS.RAB.FailEstPs.DLCE.Cong | VS.RAB.FailEstPs.ULCE.Cong
 | VS.RAB.FailEstCs.DLCE.Cong | VS.RAB.FailEstCs.ULCE.Cong
 | VS.RRC.Rej.DL.CE.Cong | VS.RRC.Rej.UL.CE.Cong
 | VS.RRC.Rej.RL.Fail | VS.RRC.Rej.RL.Fail
CE Used Number | VS.LC.DLCreditUsed.CELL | VS.LC.ULCreditUsed.CELL
 | VS.LC.DLCreditUsed.CELL.Max | VS.LC.ULCreditUsed.CELL.Max
 | VS.LC.DLCreditUsed.CELL.Min | VS.LC.ULCreditUsed.CELL.Min
CE Used Number (NodeB Count) | VS.DLCE.Mean.Shared | VS.ULCE.Mean.Shared
 | VS.DLCE.Max.Shared | VS.ULCE.Max.Shared
Table 5 lists the number of CEs consumed by different services:
Table 5 Number of CEs consumed by the DCH service
Direction | Spreading Factor | Number of CEs Consumed | Corresponding Credits Consumed | Typical Traffic Class
DL | 256 | 1 | 1 | 3.4 kbit/s SRB
UL | 256 | 1 | 2 | 3.4 kbit/s SRB
DL | 128 | 1 | 1 | 13.6 kbit/s SRB
UL | 64 | 1 | 2 | 13.6 kbit/s SRB
DL | 128 | 1 | 1 | 12.2 kbit/s AMR
UL | 64 | 1 | 2 | 12.2 kbit/s AMR
DL | 32 | 2 | 2 | 64 kbit/s VP
UL | 16 | 3 | 6 | 64 kbit/s VP
DL | 64 | 1 | 1 | 32 kbit/s PS
UL | 32 | 1.5 | 3 | 32 kbit/s PS
DL | 32 | 2 | 2 | 64 kbit/s PS
UL | 16 | 3 | 6 | 64 kbit/s PS
DL | 16 | 4 | 4 | 128 kbit/s PS
UL | 8 | 5 | 10 | 128 kbit/s PS
DL | 8 | 8 | 8 | 384 kbit/s PS
UL | 4 | 10 | 20 | 384 kbit/s PS
Table 6 Number of CEs consumed by the HSUPA service
Direction | Spreading Factor | HSUPA Phase 1 | HSUPA Phase 2 | Typical Traffic Class
UL | 64 | 1+1+1 | 1 | -
UL | 32 | 1+1+1.5 | 1.5 | 64 kbit/s
UL | 16 | 1+1+3 | 3 | 128 kbit/s
UL | 8 | 1+1+5 | 5 | 256 kbit/s
UL | 4 | 1+1+10 | 10 | 384 kbit/s
UL | 2 x SF4 | 1+1+20 | 20 | 1.45 Mbit/s
UL | 2 x SF2 | Not supported | 32 | 2.04 Mbit/s
UL | 2 x SF2 + 2 x SF4 | Not supported | 48 | 5.76 Mbit/s
The following section describes the judgment methods and solution suggestions in different scenarios of CE congestion:
High traffic causes CE congestion
Symptom:
− The CE Used Number is large and approaches the license capability.
− CE admission denial occurs during peak traffic hours.
− CE congestion disappears gradually as traffic decreases.
Judgment method:
− Analyze the performance data of the congested cells, find the NodeB to which the congested cells belong, and obtain the KPI counters of all the cells of the NodeB.
− Query the total number of CEs consumed by all cells under the NodeB (through the performance data of the RNC) and the CE count measured by the NodeB, and check whether they reach the upper limit of CE capability (uplink: license × 110% − UlHoCeResvSf; downlink: license × 110% − DlHoCeCodeResvSf). UlHoCeResvSf and DlHoCeCodeResvSf are configured through the RNC MML.
− Calculate the equivalent number of consumed CEs from the number of RBs of each cell under the NodeB, and check whether it reaches the upper limit of CE capability.
− Check whether CE congestion disappears gradually while traffic (CE Used Number, RBs, and HSPA subscribers) decreases gradually.
If all the preceding conditions are met, you can basically determine that high traffic causes CE congestion.
: You can take the following measures: −
If the CE-based LDR function is not enabled in the existing network, you can consider enabling the CE LDR algorithm to relieve the impacts of CE congestion.
−
If CE-based LDR function is enabled in the existing network, you can check whether VS.LCC.LDR.Num.DLCE, VS.LCC.LDR.Num.ULCE, VS.LCC.LDR.Time.DLCE, and VS.LCC.LDR.Time.ULCE are validated through the following performance data. If the preceding indicators do not measure the count and duration in LDR state, it indicates that the equipment does not enter the LDR state. The possible causes are as follows:
A)
The NodeB reports the CE capability according to the standard of configured licenses 110%. If the configured licenses 110% LDR threshold (UlLdrCreditSfResThd/DlLdrCreditSfResThd) exceeds the hardware capability of the NodeB, the equipment can never enter the LDR state. The RNC triggers the LDR function by judging whether the difference between the CE capability reported by the NodeB (configured licenses 110%) and the number of currently consumed CEs reaches the LDR threshold.
B)
The functions of the product are defective.
−
A)
If you enable HSUPA DCCC, you must configure HSUPA admission to be based on MBR access.
B)
If you enable dynamic CEs of the NodeB, you must disable HSUPA DCCC and configure HSUPA admission to be based on GBR access.
−
When the HSUPA function is enabled in the existing network, you can enable the dynamic CE function of the NodeB or HSUPA DCCC function if uplink CE congestion is severe. Note the following points:
Expand the capacity, purchase CEs, or add TRXs.
The residual CEs maintained by the NodeB are not consistent with those maintained by the RNC
Features:
− The CE Used Number is not high enough to reach the license capability.
− CE admission denial occurs even if traffic is not high.
Judgment methods:
− Analyze the performance data of the congested cells, find the NodeB to which the congested cells belong, and obtain the performance data of all the cells of the NodeB.
− Query the total number of CEs consumed by all cells under the NodeB and the CE Count measured by the NodeB, and check whether they are below the upper limit of CEs.
− Calculate the equivalent number of consumed CEs from the number of RBs of each cell under the NodeB (query the number of RBs of each cell through the performance data, and then calculate the number of consumed CEs according to the CE consumption rules), and check whether it is below the upper limit of CEs.
− If all the preceding conditions are met, you can basically determine that the residual CEs maintained by the NodeB are not consistent with those maintained by the RNC (the former is less than the latter). The possible cause is NodeB CE leakage.
Solution suggestions:
In case of NodeB CE leakage, contact the Maintenance Department for further analysis.
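The equivalent CE calculation mentioned in the judgment methods above can be sketched as follows. This is a minimal illustration, not a product tool: the per-RB CE values are copied from Table 2, while the function names, the traffic-class keys, and the way the reserved CEs are passed in are assumptions made for the example. The RB counts per traffic class would come from the performance data.

```python
# CEs consumed per radio bearer, copied from Table 2: class -> (DL CEs, UL CEs).
CE_PER_RB = {
    "3.4 kbit/s SRB": (1, 1),
    "13.6 kbit/s SRB": (1, 1),
    "12.2 kbit/s AMR": (1, 1),
    "64 kbit/s VP": (2, 3),
    "32 kbit/s PS": (1, 1.5),
    "64 kbit/s PS": (2, 3),
    "128 kbit/s PS": (4, 5),
    "384 kbit/s PS": (8, 10),
}


def estimate_ce(rb_counts):
    """Return (DL CEs, UL CEs) for a dict of traffic class -> number of RBs."""
    dl = sum(CE_PER_RB[cls][0] * n for cls, n in rb_counts.items())
    ul = sum(CE_PER_RB[cls][1] * n for cls, n in rb_counts.items())
    return dl, ul


def ce_upper_limit(license_ce, reserved_ce):
    """Upper limit of CE capability: license x 110% minus the reserved CEs.

    reserved_ce is a hypothetical input: the CEs that correspond to the
    UlHoCeResvSf or DlHoCeCodeResvSf setting.
    """
    return license_ce * 1.1 - reserved_ce


if __name__ == "__main__":
    # Hypothetical NodeB totals (sum over all cells of the NodeB).
    rbs = {"12.2 kbit/s AMR": 10, "384 kbit/s PS": 2}
    dl_used, ul_used = estimate_ce(rbs)
    print("DL CEs:", dl_used, "UL CEs:", ul_used)
    print("UL limit:", ce_upper_limit(license_ce=128, reserved_ce=8))
```

If the estimated consumption is far below the limit while CE admission denial still occurs, the result points to the inconsistency scenario rather than to high traffic.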
VS.RRC.Rej.RL.Fail: During RRC connection setup, the NodeB determines that the setup fails. The possible cause is that the internal resources (hardware CE capability and logical resources) of the NodeB are not enough.
The hardware CE capability is not enough
Features:
− The CE Used Number is large and approaches the upper limit of CEs.
− In peak hours, RL Reject occurs more frequently.
− RL Reject returns to normal gradually as traffic decreases.
Judgment methods:
− Analyze the performance data of the RL Reject cells, find the NodeB to which the RL Reject cells belong, and obtain the performance data of all the cells of the NodeB.
− Query the total number of CEs consumed by all cells under the NodeB and the CE Count measured by the NodeB, and check whether they approach the upper limit of the hardware CE capability of the NodeB.
If all the preceding conditions are met, you can basically determine that the problem is caused by the constraint of the hardware specifications of the NodeB. The NodeB reports the CE capability as configured licenses × 110% regardless of the hardware specifications. If license × 110% − UlHoCeResvSf or license × 110% − DlHoCeCodeResvSf exceeds the hardware capability of the NodeB, the problem occurs.
Solution suggestions:
− In the subsequent R11 version, the hardware specifications are taken into account when the NodeB reports the CE capability, so the problem does not occur.
− To avoid the problem, you can decrease the number of configured licenses. As a result, the impacts of congestion can be relieved through the LDR function.
Other internal resources of the NodeB are not enough
The probability of occurrence is low. Feed back the occurrence (if available) to the R&D department for analysis.
VS.RRC.Rej.Code.Cong
RRC setup rejection is mainly caused by the insufficiency of code resources. In a high-traffic scenario (for example, indoor micro-cell coverage), code resources may not be enough, and you need to expand the capacity. Query the following counters to determine whether the problem is caused by high traffic.
Table 4 Analysis of cell code congestion indicators
Item                 DL
Traffic, CS Erlang   VS.AMR.Ctrl.DL12.2
                     VS.RB.DLConvCS.64
Traffic, PS Erlang   VS.RB.DLInterPS.8
                     VS.RB.DLInterPS.16
                     VS.RB.DLInterPS.32
                     VS.RB.DLInterPS.64
                     VS.RB.DLInterPS.128
                     VS.RB.DLInterPS.144
                     VS.RB.DLInterPS.256
                     VS.RB.DLInterPS.384
                     VS.RB.DLBkgPS.8
                     VS.RB.DLBkgPS.16
                     VS.RB.DLBkgPS.32
                     VS.RB.DLBkgPS.64
                     VS.RB.DLBkgPS.128
                     VS.RB.DLBkgPS.144
                     VS.RB.DLBkgPS.256
                     VS.RB.DLBkgPS.384
Congestion           VS.RAB.FailEstCs.Code.Cong
                     VS.RAB.FailEstPs.Code.Cong
                     VS.RRC.Rej.Code.Cong
Solution suggestions:
− Check the code setting of the HSDPA. The following configuration is recommended:
ADD CELLHSDPA: AllocCodeMode=Manual, HsPdschCodeNum=1; /// The RNC is statically configured with one HS-PDSCH code.
SET MACHSPARA: DYNCODESW=OPEN; /// Enable the dynamic code switch of the NodeB.
− Expand the capacity.
VS.RRC.Rej.ULIUBBandCong/VS.RRC.Rej.DLIUBBandCong
RRC setup failure is mainly caused by the transmission congestion on the Iub interface. You can check the traffic and transmission configuration of the cells, and thus judge whether the problem is caused by the insufficiency of transmission resources.
Table 5 Analysis of transmission congestion indicators
Item                                DL                                        UL
Congestion                          VS.RRC.Rej.DLIUBBandCong                  VS.RRC.Rej.ULIUBBandCong
                                    VS.RAB.FailEstab.CS.DLIUBBand.Cong        VS.RAB.FailEstab.CS.ULIUBBand.Cong
                                    VS.RAB.FailEstab.PS.DLIUBBand.Cong        VS.RAB.FailEstab.PS.ULIUBBand.Cong
Iub bandwidth utility ratio, ATM    VS.AAL2PATH.PVCLAYER.TXBYTES              VS.AAL2PATH.PVCLAYER.RXBYTES
                                    VS.QAAL2.AllocedFwd.AAL2BitRate           VS.QAAL2.AllocedBwd.AAL2BitRate
                                    VS.QAAL2.AllocedMaxFwd.AAL2BitRate.Value  VS.QAAL2.AllocedMaxBwd.AAL2BitRate.Value
Iub bandwidth utility ratio, IP     VS.IPPATH.IPLAYER.TXBYTES                 VS.IPPATH.IPLAYER.RXBYTES
                                    OS.ANI.IP.AllocedFwd                      OS.ANI.IP.AllocedBwd
The following section describes several important concepts about Iub admission:
− Iub bandwidth admission is based on the allocated bandwidth regardless of the actual traffic.
− In versions later than RAN10, the bandwidth is allocated for the PS service according to GBR × Active Factor.
− RAN10 provides counters for both the actual traffic and the allocated bandwidth of the Iub interface, but they need to be converted. The admission is based on the PVC traffic consumed by the user, so all traffic needs to be converted to the PVC layer.
The following section describes several important counters:
− Actual traffic on the Iub interface, taking the downlink as an example:
ATM (kbps): SUM(VS.AAL2PATH.PVCLAYER.TXBYTES) × 8 / 3600 / 1000
Meaning: Add up the traffic of all AAL2PATHs of the Iub interface and divide the sum by the measurement time to obtain the actual traffic (kbps). The traffic is measured at the PVC layer, so it does not need to be converted.
IP (kbps): SUM(VS.IPPATH.IPLAYER.TXBYTES) × 8 / 3600 / 1000
Meaning: Add up the traffic of all IPPATHs of the Iub interface and divide the sum by the measurement time to obtain the actual traffic (kbps). The traffic is measured at the IP layer, so it does not need to be converted.
− Allocated bandwidth of the Iub interface, taking the downlink as an example:
ATM (kbps): VS.QAAL2.AllocedFwd.AAL2BitRate × 53 / 48 / 1000
Meaning: Convert the allocated bandwidth of the QAAL2 adjacent node corresponding to the NodeB. The conversion factor from the AAL2 layer to the PVC layer is 53/48.
IP (kbps): OS.ANI.IP.AllocedFwd / 1000
Meaning: OS.ANI.IP.AllocedFwd is measured at the IP layer, so it does not need to be converted.
Generally, the allocated bandwidth should be close to the actual traffic, which means that the activation factor is configured appropriately. If there is a great difference between them, you can optimize the configuration of the activation factor. If Iub congestion causes RRC access failure, the reason is usually that traffic increases or that the activation factor is not configured reasonably. Therefore, you need to increase the Iub bandwidth or optimize the configuration of the activation factor. The following section gives the judgment method and solution suggestions:
Features:
− The allocated bandwidth of the Iub interface is high and is close to the configured bandwidth.
Judgment methods:
− Measure the actual traffic and allocated bandwidth (average value per hour) of the Iub interface through the performance data. If the allocated bandwidth is high and is close to the configured transmission bandwidth, the transmission bandwidth may be congested.
Solution suggestions:
− If the actual traffic is close to the allocated bandwidth, high traffic causes the transmission congestion. The first consideration is to expand the capacity and increase the bandwidth of the Iub interface.
− If the actual traffic is low but the allocated bandwidth is high, the problem is caused by the inappropriate setting of the activation factor. You can reduce the activation factor appropriately to raise the transmission utilization.
− Other possible optimization means are to modify the service GBR and to change the FP mode to the Silent mode. However, the two means are not recommended.
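The counter conversions and the comparison described above can be sketched as follows. This is an illustration only: the function names and the two ratio thresholds (0.9 and 0.8) are assumptions chosen for the example, not values given by the product, and the counter values are assumed to be hourly statistics exported from the performance data.

```python
def actual_kbps(tx_bytes_per_path, period_s=3600):
    """Actual downlink traffic: sum of the TXBYTES counters of all paths, in kbps."""
    return sum(tx_bytes_per_path) * 8 / period_s / 1000


def atm_allocated_kbps(alloced_fwd_aal2_bit_rate):
    """Allocated ATM bandwidth converted from the AAL2 layer to the PVC layer (53/48)."""
    return alloced_fwd_aal2_bit_rate * 53 / 48 / 1000


def ip_allocated_kbps(alloced_fwd):
    """Allocated IP bandwidth; measured at the IP layer, so only the unit changes."""
    return alloced_fwd / 1000


def diagnose(actual, allocated, configured, near_full=0.9, near_alloc=0.8):
    """Classify an Iub congestion case. Both thresholds are hypothetical."""
    if allocated < near_full * configured:
        return "allocated bandwidth is not close to the configured bandwidth"
    if actual >= near_alloc * allocated:
        return "high traffic: expand the Iub bandwidth"
    return "activation factor too high: reduce it"


if __name__ == "__main__":
    # Hypothetical hourly counters for one NodeB with two AAL2 paths.
    actual = actual_kbps([450_000_000, 450_000_000])
    allocated = atm_allocated_kbps(1_920_000)
    print(actual, allocated, diagnose(actual, allocated, configured=2200))
```

The thresholds should be tuned to the network; the point of the sketch is only that the allocated bandwidth is compared with the configured bandwidth first, and the actual traffic is compared with the allocated bandwidth second.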
VS.RRC.Rej.AAL2.Fail: The AAL2 Path setup fails on the Iub interface because the transmission is abnormal. Such setup failure does not frequently occur in the existing network. If such cause leads to KPI deterioration, feed back the problem to the R&D department.
RRC.FailConnEstab.NoReply
The NoReply scenarios are as follows:
− Uu NoReply is caused by cell reselection over different subsystems.
− The RNC receives the RRC Connection Request message sent by the UE and delivers the RRC Connection Setup message, but the UE does not receive the RRC Connection Setup message (excluding the part of cell reselection over different subsystems).
− The UE receives the RRC Connection Setup message, but does not send the RRC Setup Complete message.
− The UE sends the RRC Setup Complete message, but the RNC does not receive the message.
It is difficult to judge whether the UE receives the RRC Connection Setup message from the CHR log or performance data alone. To obtain a definite result, you must conduct a drive test. You can, however, obtain a preliminary analysis result from the CHR log or performance data before conducting the drive test. The following section describes the judgment methods and solution suggestions in different scenarios:
Uu NoReply is caused by cell reselection over different subsystems
Features:
− In the PCHR log, an access success record of the same subscriber exists close to the time of the access failure.
− The analysis data of Germany's O2 and Spain's VDF shows that this part accounts for about 40% of the total RRC access failure count.
Judgment methods:
You can directly obtain the count and proportion of cell reselections over different subsystems by following the operation guide. The operation guide has been prepared but is not attached here because of its size. Alternatively, analyze the problem as follows: For an RRC access failure recorded in the PCHR log, you can determine that the problem is caused by cell reselection over different subsystems under the following circumstances:
− The last access of the corresponding subscriber is normal.
− The RL Release time is later than the time of the current access failure.
Alternatively,
− The next access of the corresponding subscriber is normal.
− The difference between the time of the next normal access and the time of the current access failure is less than (N300 + 1) × T300.
In addition, the cell of access failure and the cell of access success are not in the same subsystem.
Solution suggestions:
− In the RNC RAN11 050, the RRC access failure caused by cell reselection over different subsystems is not counted as Uu NoReply.
− Provide a clarification report for the customer, explaining the impacts of cell reselection over different subsystems and excluding such impacts.
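The PCHR judgment criteria above can be sketched as a small filter. This is a hedged illustration: the record layout (dictionaries with time, subsystem, ok, and rl_release_time fields), the function name, and the default N300 and T300 values are assumptions for the example, and the sketch assumes that the "not in the same subsystem" condition applies to both alternatives.

```python
from datetime import datetime, timedelta


def is_cross_subsystem_reselection(failure, previous=None, following=None,
                                   n300=3, t300_s=2.0):
    """Return True if an RRC access failure matches the reselection criteria.

    failure:   {"time": datetime, "subsystem": int}
    previous:  last access of the same subscriber, with "ok" and "rl_release_time"
    following: next access of the same subscriber, with "ok"
    n300 and t300_s are illustrative defaults; use the values configured on the RNC.
    """
    def other_subsystem(access):
        return access["subsystem"] != failure["subsystem"]

    # Alternative 1: the last access was normal and its RL Release
    # happened after the current access failure.
    if (previous and previous["ok"] and other_subsystem(previous)
            and previous["rl_release_time"] > failure["time"]):
        return True

    # Alternative 2: the next access is normal and follows the failure
    # within (N300 + 1) x T300.
    if following and following["ok"] and other_subsystem(following):
        window = timedelta(seconds=(n300 + 1) * t300_s)
        gap = following["time"] - failure["time"]
        return timedelta(0) <= gap < window

    return False


if __name__ == "__main__":
    t0 = datetime(2009, 3, 6, 10, 0, 0)
    failure = {"time": t0, "subsystem": 1}
    nxt = {"time": t0 + timedelta(seconds=5), "subsystem": 2, "ok": True}
    print(is_cross_subsystem_reselection(failure, following=nxt))
```

Dividing the number of failures flagged this way by the total RRC access failure count gives the proportion to report to the customer.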
The RNC receives the RRC Connection Request message sent by the UE and delivers the RRC Connection Setup message, but the RNC does not receive the RRC Setup Complete message.
Judgment methods:
− First exclude the RRC access failure caused by cell reselection over different subsystems through the PCHR log.
− If Uu NoReply is not caused by cell reselection over different subsystems, distinguish the following scenarios and then analyze the problem further:
The RNC receives the RRC Connection Request message sent by the UE and delivers the RRC Connection Setup message, but the UE does not receive the RRC Connection Setup message (excluding the part of cell reselection over different subsystems).
The UE receives the RRC Connection Setup message, but does not send the RRC Setup Complete message.
The UE sends the RRC Setup Complete message, but the RNC does not receive the message.
For details about the judgment methods and solution suggestions, see the following sections. The most direct method, however, is to conduct a drive test and analyze the signaling. Therefore, the following sections define only several common judgment criteria, which are not absolute.
The RNC receives the RRC Connection Request message sent by the UE and delivers the RRC Connection Setup message, but the UE does not receive the RRC Connection Setup message (excluding the part of cell reselection over different subsystems).
Features:
Through the IOS, you can find that the RRC Connection Setup message is sent repeatedly on the Uu interface (based on N300). The possible causes are as follows:
− The FACH coverage is poor.
− The cell selection and reselection parameters are not set reasonably.
− The equipment is abnormal or packets are lost during the transmission.
Judgment methods:
− Analyze the Ec/N0 information reported by the UE in the RRC Connection Request message (you can obtain the Ec/N0 information through the PCHR log). If the Ec/N0 value is lower than -12 dB (the default value), the problem is caused by poor coverage.
− If the monitored set in the RRC Connection Request message contains better cells, the problem may be caused by cell reselection.
− If the Ec/N0 reported by the UE in the RRC Connection Request message is higher than -7 dB, the equipment is abnormal or packets are lost during the transmission (which seldom occurs).
Solution suggestions:
− If the problem is caused by poor coverage, you can take appropriate measures to enhance the coverage, for example, add sites to fill the blind spots and adjust the engineering parameters. If you cannot enhance the coverage, you can raise the RACH power appropriately. During the adjustment, you need to consider the PCPICH Ec/Io coverage of the existing network. For example, if the pilot Ec/Io in the coverage area is higher than -12 dB after network optimization, you can ensure the access success rate of the UE in the 3G idle state as long as the power ratio of the common channels is configured to ensure that the Ec/Io is higher than -12 dB. If the UE reselects to GSM when the pilot Ec/Io is lower than -12 dB, you can ensure the RRC setup success rate of the UE in a weak-signal coverage area after cell reselection over different subsystems as long as the power ratio of the common channels is configured to ensure that the Ec/Io is higher than -14 dB.
− If the cell selection and reselection parameters are not set reasonably, you can modify such parameters to speed up cell selection and reselection.
− If the Ec/N0 value is ideal but the RRC Connection Setup message is not received, feed back the symptom to the R&D department.
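The three judgment criteria above can be applied to PCHR records with a small triage function. This is a rough sketch: the function name and the order in which the criteria are checked are assumptions, and the thresholds assume that the 12 dB and 7 dB values quoted above are negative Ec/N0 values (-12 dB and -7 dB), as Ec/N0 is in practice.

```python
def triage_setup_not_received(ecn0_db, better_cell_in_monitored_set,
                              poor_thd_db=-12.0, good_thd_db=-7.0):
    """Suggest a probable cause when the UE does not receive RRC Connection Setup.

    ecn0_db: Ec/N0 reported by the UE in the RRC Connection Request message.
    better_cell_in_monitored_set: True if the request lists a better cell.
    """
    if ecn0_db < poor_thd_db:
        return "poor coverage"
    if better_cell_in_monitored_set:
        return "cell reselection"
    if ecn0_db > good_thd_db:
        return "equipment abnormal or transmission packet loss"
    # Between the two thresholds the criteria do not point to a single cause.
    return "inconclusive: conduct a drive test"


if __name__ == "__main__":
    for sample in [(-15.0, False), (-9.0, True), (-5.0, False), (-9.0, False)]:
        print(sample, "->", triage_setup_not_received(*sample))
```

The result is only a preliminary classification; as stated above, a drive test with signaling analysis remains the most direct method.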
The RRC CONNECTION SETUP message is carried by the FACH. The UE sends the RRC CONNECTION REQUEST message through the RACH after the preamble of the PRACH is received at the UTRAN side, and the power of the preamble is used as the benchmark. The transmit power of the preamble can increase continuously until the UE receives a response (restricted by the maximum count of preamble retransmissions). In some poor-coverage areas, an imbalance may occur between the RACH coverage and the FACH coverage. As a result, the RRC setup request sent by the UE can be received at the UTRAN side, but the UE cannot receive the RRC Connection Setup message sent by the RNC.
The UE receives the RRC Connection Setup message, but does not send the RRC Setup Complete message.
Features:
− Through the IOS, you can find that the RRC Connection Setup message is sent only a few times on the Uu interface and that the sending count does not reach the count specified by N300.
− If RRC access is based on the DCH, the RL Restore message is not found on the Iub interface.
− If RRC access is based on the DCH, the transmit power of the UE is low.
If both feature 1 and feature 2 (or feature 3) are present, it is probable that the UE receives the RRC Connection Setup message but does not send the RRC Setup Complete message. If RRC access is based on the CCH and feature 1 appears, it is probable that the UE receives the RRC Connection Setup message but does not send the RRC Setup Complete message, or that the UE sends the RRC Setup Complete message but the RNC does not receive the message.
The possible causes are as follows:
− Downlink synchronization fails.
− The UE is abnormal.
If the RRC Setup Complete message is sent through the DCH, the UE does not send it on the uplink until the downlink is synchronized, in accordance with the description of synchronization procedure A in 3GPP TS 25.214 (The UE shall not transmit on uplink until higher layers consider the downlink physical channel established). Section 25.214 also gives the following description: The UE establishes downlink chip and frame synchronization of DPCCH, using the P-CCPCH timing and timing offset information notified from UTRAN. Frame synchronization can be confirmed using the frame synchronization word.
Therefore, if the UE cannot synchronize the physical downlink channel, the cause may be related to the power of the common channels or the initial power of the downlink DPCCH. The power of the common channels is determined when the cells are configured. Except for the power of the PCPICH, the power of the other channels is relative to that of the PCPICH. The power of the downlink DPCCH is sent to the NodeB by the RNC in the RL SETUP REQ message. The power is estimated by using the open-loop power algorithm. The formula is as follows:
\[ P_{TxInitial} = \frac{R}{W} \cdot \left(\frac{E_b}{N_o}\right)_{DL} \cdot \left[ \frac{CPICH\_Tx\_power}{(E_c/N_o)_{CPICH}} - \alpha \cdot P_{txTotal} \right] \]
where α refers to the downlink orthogonalization factor, and (Ec/No)CPICH refers to the coverage status at the UE location.
Besides estimating, you can also judge whether the downlink is synchronized from the TPC received by the UE and the transmit power of the UE. Section 25.214 gives the following description: UTRAN shall start the transmission of the downlink DPCCH and may start the transmission of DPDCH if any data is to be transmitted. The initial downlink DPCCH transmit power is set by higher layers. Downlink TPC commands are generated as described in 5.1.2.2.1.2. Therefore, the downlink DPCCH is transmitted after the RL is established. In accordance with the preceding description, the pattern of the downlink TPC command word is n instances of “01” followed by one “1”.
The value of n is sent to the NodeB by the RNC when cells are established. The parameter name is DlTpcPattern01Count. If the UE can resolve the downlink DPCCH, the power rises by 1 dB every 2n+1 slots until the NodeB judges that the uplink channel is synchronized. If the downlink is synchronized, the transmit power of the UE should increase from the minimum value to a high value within 1 second. If the transmit power of the UE shows no sign of increasing, you can basically determine that the physical downlink channel of the UE is not synchronized.
If the UE can normally receive the TPC and raise the power according to the TPC, you can determine that the UE is abnormal.
Judgment methods:
− Check whether the transmit power of the UE increases to the maximum value. If the transmit power does not increase, the downlink is not synchronized.
− If the transmit power of the UE increases but the RRC Setup Complete message does not appear on the uplink, the UE is abnormal.
Solution suggestions:
− If the downlink is not synchronized, you can raise the power of the PCPICH or raise the initial transmit power of the downlink DPCH. The RNC does not provide a parameter for controlling the initial transmit power of the downlink DPCH separately, but can only control the minimum transmit power of the DPCH. By configuring the minimum transmit power parameter of the DPCH, you can control its initial transmit power.
− It is improbable that the UE is abnormal. If the UE is really abnormal, you can provide a clarification report or ask the IOT team for the related test results.
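To get a feel for how the open-loop estimate of the initial downlink power reacts to coverage, the calculation can be sketched in a few lines. This sketch assumes the formula has the form P = (R/W) × (Eb/No)DL × [CPICH_Tx_power / (Ec/No)CPICH − α × PtxTotal], evaluated in linear units; the function name and all input values are hypothetical examples, not product defaults.

```python
import math


def db_to_linear(value_db):
    """Convert dB (or dBm) to a linear ratio (or mW)."""
    return 10 ** (value_db / 10)


def initial_dl_power_dbm(bit_rate_bps, ebno_dl_db, cpich_tx_dbm, cpich_ecno_db,
                         alpha, ptx_total_dbm, chip_rate_cps=3.84e6):
    """Open-loop estimate of the initial downlink power, in dBm.

    alpha is the downlink orthogonalization factor as used in the formula above.
    """
    bracket_mw = (db_to_linear(cpich_tx_dbm) / db_to_linear(cpich_ecno_db)
                  - alpha * db_to_linear(ptx_total_dbm))
    if bracket_mw <= 0:
        raise ValueError("inputs give a non-positive power; check Ec/No and alpha")
    power_mw = (bit_rate_bps / chip_rate_cps) * db_to_linear(ebno_dl_db) * bracket_mw
    return 10 * math.log10(power_mw)


if __name__ == "__main__":
    # Hypothetical example: 13.6 kbit/s SRB at two different coverage levels.
    for ecno in (-8.0, -14.0):
        p = initial_dl_power_dbm(13600, 5.0, 33.0, ecno, 0.5, 40.0)
        print(f"Ec/No {ecno} dB -> initial DL power {p:.1f} dBm")
```

A lower reported Ec/No yields a higher initial power, which is why an over-optimistic Ec/No report in a poor-coverage area can leave the downlink DPCCH too weak for the UE to synchronize.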
The UE sends the RRC Setup Complete message, but the RNC does not receive the message.
Features:
− Through the IOS, you can find that the RRC Connection Setup message is sent only a few times on the Uu interface and that the sending count does not reach the count specified by N300.
− If RRC access is based on the DCH, the RL Restore message is found on the Iub interface.
− If RRC access is based on the CCH, the RACH has many bit errors.
If both feature 1 and feature 2 (or feature 3) are present, it is probable that the UE sends the RRC Setup Complete message but the RNC does not receive the message.
The possible causes are as follows:
− The RACH has bit errors.
− Uplink synchronization fails.
− Packets are lost during the transmission.
Judgment methods:
− If RRC access is based on the CCH, it is possible that the RACH has bit errors. You can check the VS.ULBler.PSNrt.Rach8 and VS.MeanRTWP values. If VS.ULBler.PSNrt.Rach8 or VS.MeanRTWP is high, it is possible that uplink interference (high RTWP) causes bit errors on the RACH and thus the RNC cannot receive the RRC Setup Complete message correctly.
− If RRC access is based on the DCH, it is possible that the uplink is not synchronized. You can check whether the RL Restore Indication is available on the Iub interface. If not, it is possible that the initial transmit power of the dedicated uplink channel is relatively low.
− If the RL Restore message is available but the RNC cannot receive the RRC Setup Complete message correctly, it is possible that packets are lost during the transmission or the equipment is abnormal. You need to feed back the symptom to the R&D department.
Solution suggestions:
− If the RACH has bit errors and the RTWP is extremely high, eliminate the uplink interference according to the RTWP Check List.
− If the problem is caused by the failure of uplink synchronization (which is improbable), you can raise the Constant Value of the dedicated channel, thus raising the initial transmit power of the uplink DPCCH of the UE. In addition, the problem is related to the setting of the initial target value of the uplink SIR, which has a great impact on the uplink synchronization at the time of initial link establishment. If the parameter is set to an extremely large value, the link initially established for the UE may cause excessive uplink interference. If the parameter is set to an extremely small value, the time of uplink synchronization is prolonged and initial synchronization may even fail. The parameter is an RNC-level parameter and has a great impact on network performance. Therefore, you need to modify the parameter with caution.
The RRC CONNECTION SETUP COMPLETE message is sent through the DPCH, and the UE calculates the initial power of the uplink DPCCH according to the received IE "DPCCH_Power_offset" and the measured CPICH_RSCP value:
DPCCH_Initial_power = DPCCH_Power_offset − CPICH_RSCP
DPCCH_Power_offset = Primary CPICH DL TX Power + UL Interference + Constant Value
The Constant Value parameter can be configured on the background. If the Constant Value parameter is set to an extremely low value, it is possible that the transmit power of the UE is not enough when the UE sends the RRC CONNECTION SETUP COMPLETE message. However, the problem usually does not occur under the current default parameter setting (in the V13C03B151 version, the default value is -20).
− If the RL Restore message is received but the RRC Setup Complete message is not available, it is possible that packets are lost during the transmission or the equipment is abnormal. You need to feed back the symptom to the R&D department.
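The effect of the Constant Value on the initial uplink DPCCH power can be checked with a short calculation based on the two relations above. The function names and the example inputs (CPICH power, UL interference, and RSCP) are hypothetical; only the default Constant Value of -20 is taken from the text.

```python
def dpcch_power_offset(pcpich_tx_dbm, ul_interference_dbm, constant_value_db):
    """DPCCH_Power_offset = Primary CPICH DL TX Power + UL Interference + Constant Value."""
    return pcpich_tx_dbm + ul_interference_dbm + constant_value_db


def dpcch_initial_power(pcpich_tx_dbm, ul_interference_dbm, constant_value_db,
                        cpich_rscp_dbm):
    """DPCCH_Initial_power = DPCCH_Power_offset - CPICH_RSCP, in dBm."""
    offset = dpcch_power_offset(pcpich_tx_dbm, ul_interference_dbm, constant_value_db)
    return offset - cpich_rscp_dbm


if __name__ == "__main__":
    # Hypothetical cell: 33 dBm PCPICH, -105 dBm UL interference, RSCP of -100 dBm.
    for constant_value in (-20, -16):
        power = dpcch_initial_power(33, -105, constant_value, -100)
        print(f"Constant Value {constant_value} dB -> initial DPCCH power {power} dBm")
```

Raising the Constant Value by a given number of dB raises the initial uplink DPCCH power by the same amount, which is the lever described in the solution suggestion above.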
2.4 List of Problem Information
Checklist for KPI Troubleshooting-2.4 .xls
3 RAB Access Success Rate (AMR/PS/VP/HSPA)
3.1 KPI Definition
RAB setup success rate = (RAB setup success count)/(RAB attempt count)
VS.RAB.SuccEstabCS.AMR.Cell.Rate = /
VS.RAB.SuccEstabPS.Cell.Rate = ( + + + )/( + + + )
3.2 Influence Factors
The process of RAB connection setup includes the following steps:
1. The CN sends the RAB ASSIGNMENT REQUEST message to the RNC through the Iu interface.
2. After receiving the RAB ASSIGNMENT REQUEST message, the RNC determines that it needs to establish a new RAB. The RNC first performs resource admission.
3. If resource admission fails, the RNC returns the RAB ASSIGNMENT RESPONSE message to the CN.
4. If resource admission is successful, the RNC sends the RADIO BEARER SETUP message to the UE. If the radio bearer setup fails, the UE returns the RADIO BEARER SETUP FAILURE message to the RNC. If the RNC receives the RADIO BEARER SETUP FAILURE message or no response, it returns the RAB ASSIGNMENT RESPONSE message to the CN.
RAB setup fails under the following scenarios:
The RNC receives the RAB ASSIGNMENT REQUEST message, and the admission of code, CE or power resource fails.
The RNC receives the RAB ASSIGNMENT REQUEST message. The admission of system resources (for example, the memory) fails.
After receiving the RAB ASSIGNMENT REQUEST message, the RNC sends the RADIO BEARER SETUP message to the UE, but does not receive the RADIO BEARER SETUP COMPLETE message sent by the UE.
After receiving the RAB ASSIGNMENT REQUEST message, the RNC sends the RADIO BEARER SETUP message to the UE and receives the RADIO BEARER SETUP FAILURE message sent by the UE.
Usually, RAB setup failure is caused by the following factors:
Resource congestion
Downlink coverage
Downlink synchronization
Uplink synchronization
The equipment is abnormal.
RAB parameters unsupported
Resource congestion includes power resource congestion, CE resource congestion, code resource congestion, and transmission resource congestion. For a problem caused by resource congestion, you need to first check the actual utilization of resources, and then analyze the correctness of the congestion thresholds and configurations. The problems related to downlink coverage and downlink synchronization mainly occur when RAB setup fails in DRD scenarios.
3.3 Analysis Process
1. Discussing the Problem, Ascertaining the Problem Background and Product Version, and Excluding the Impacts of Known Bugs
Ask the field personnel to feed back the related information, obtain the known bug information about the corresponding version (you can ask the related contact person of the product or ask about similar problems at other sites), and determine whether the problem is a known problem. Determine the time at which the RAB setup success rate changed, analyze whether the problem is caused by network adjustment, and focus on the impacts of network adjustment.
2. Narrowing the Analysis Scope, Analyzing Whether the Problem Occurs in Only One or Two Cells, and Analyzing Whether the Top N Cells are Representative
Analyze the change of the causes of RAB access failure through the performance counters on the RNC, and analyze which factor causes the decline of the RAB setup success rate.
Table 1 Indicators of CS RAB setup failure Measurement Item Level 1
Sub Items
Sub Items
Level 2
Level 3
Sub Items Level 4
VS.RAB.FailEstCs.Power.Cong VS.RAB.FailEstCs.Code.Cong VS.RAB.FailEstab.CS.DLIUBBand.Cong
Sub Items
Level 2
Level 3 VS.RAB.FailEstabCS.Cong
VS.RAB.FailEstabCS.RNL
Level 1
Sub Items VS.RAB.FailEstCS.Unsp
Measurement Item
Sub Items Level 4
VS.RAB.FailEstab.CS.ULIUBBand.Cong VS.RAB.FailEstCs.ULCE.Cong VS.RAB.FailEstCs.DLCE.Cong
VS.RAB.FailEstabCS.Unsp.Other VS.RAB.FailEstCS.RIPFail VS.RAB.FailEstCS.Relo VS.RAB.FailEstabCS.RNL.Other VS.RAB.FailEstabCS.TNL VS.RAB.FailEstabCS.other.CELL
Table 2 Indicators of PS RAB setup failure Measurement Item
Sub Items Level
Sub Items Level 3
2
VS.RAB.FailEstPS.Unsp
VS.RAB.FailEstPS.RNL
Level 1 VS.RAB.FailEstPs.Power.Cong VS.RAB.FailEstPs.Code.Cong VS.RAB.FailEstab.PS.DLIUBBand.Cong VS.RAB.FailEstab.PS.ULIUBBand.Cong VS.RAB.FailEstPs.ULCE.Cong VS.RAB.FailEstPs.DLCE.Cong VS.RAB.FailEstabPS.Unsp.Other VS.RAB.FailEstPS.RIPFail VS.RAB.FailEstPS.Par VS.RAB.FailEstPS.Relo VS.RAB.FlEstPS.RNL.Other VS.RAB.FailEstPS.TNL VS.RAB.FailEstPS.NResAvail
VS.RAB.FailEstabPS.Other.Cell
3.
Analyzing the Causes of RAB Failure Deeply
VS.RAB.FailEstCs.Power.Cong/VS.RAB.FailEstPs.Power.Cong
The RNC RRM makes the power admission decision. If uplink or downlink admission is denied, the RNC RRM rejects the RAB setup. Power congestion occurs when the power admission switch is enabled (by running the ADD CELLALGOSWITCH:; command) and network load is high. If the RAB setup success rate decreases because the counter value increases suddenly, find the top N cells that cause power congestion and then query the changes of the maximum RTWP (VS.MaxRTWP) and maximum TCP (VS.MaxTCP) of the top N cells. If the RTWP increases severely, the problem is caused by uplink power congestion. If the TCP increases severely, the problem is caused by downlink power congestion. For details about the causes of the rise in the RTWP and TCP, judgment methods, and solution suggestions, see the section Analyzing the Main Causes of RRC Access Failure Deeply.
VS.RAB.FailEstCs.ULCE.Cong/VS.RAB.FailEstCs.DLCE.Cong/VS.RAB.FailEstPs.ULCE.Cong/VS.RAB.FailEstPs.DLCE.Cong
The RNC RRM makes the admission decision. These counters measure the RAB rejections that occur because admission is denied for insufficient uplink or downlink CE resources, or because the NodeB returns CE Congestion when the RNC delivers the RL_SETUP message. The common causes of CE congestion are as follows:
− High traffic
− The residual CEs maintained by the NodeB are not consistent with those maintained by the RNC.
For details about the analysis methods and solution suggestions, see the section VS.RRC.Rej.UL.CE.Cong/VS.RRC.Rej.DL.CE.Cong.
VS.RAB.FailEstCs.Code.Cong/VS.RAB.FailEstPs.Code.Cong
RAB setup rejection here is mainly caused by insufficient code resources. In a high-traffic scenario (for example, indoor micro-cell coverage), code resources may not be enough and the capacity needs to be expanded. Query Table 4 to determine whether the problem is caused by high traffic. The handling suggestions are as follows: −
Check the code setting of the HSDPA. The following configuration is recommended:
ADD CELLHSDPA: AllocCodeMode=Manual, HsPdschCodeNum=1; /// The RNC is statically configured with one HS-PDSCH code.
SET MACHSPARA: DYNCODESW=OPEN; /// Enable the dynamic code switch of the NodeB.
−
Expand the capacity
VS.RAB.FailEstab.CS.DLIUBBand.Cong/VS.RAB.FailEstab.CS.ULIUBBand.Cong/VS.RAB.FailEstab.PS.DLIUBBand.Cong/VS.RAB.FailEstab.PS.ULIUBBand.Cong
RAB setup failure here is mainly caused by transmission congestion on the Iub interface. Check the traffic and transmission configuration of the cells to judge whether the problem is caused by insufficient transmission resources. For details about the related counts, see Table 5. For details about the analysis methods and solution suggestions, see the section VS.RRC.Rej.ULIUBBandCong/VS.RRC.Rej.DLIUBBandCong.
VS.RAB.FailEstabCS.Unsp.Other/VS.RAB.FailEstabPS.Unsp.Other
The RAB setup failures counted here include the following: −
The RNC does not support the RAB setup required by the QoS parameters.
−
RRM admission fails. This does not include the failures caused by CE congestion, code congestion, Iub congestion, or power congestion.
The common cause is as follows: Insufficient NodeB resources lead to RL Recfg failure. Through the IOS tracing of the top N cells, you can judge whether the RL Recfg Fail on the Iub interface is caused by “RADIO_RESOURCES_NOT_AVAILABLE”. The insufficient resources can be CE hardware resources or other resources.
The CE Hardware Resources are not Enough
Symptoms: −
The CE Used Number is large and approaches the upper limit of CEs.
−
In peak hours, Unsp.Other occurs more frequently.
−
Unsp.Other gradually returns to normal as traffic decreases.
−
The signaling on the Iub interface shows that RL Recfg Fail is caused by “RADIO_RESOURCES_NOT_AVAILABLE”.
Analysis: −
Analyze the performance data of the Unsp.Other cells, find the NodeB to which the cells belong, and obtain the performance data of all the cells of that NodeB.
−
Query the total number of CEs consumed by all cells under the NodeB and the CE count measured by the NodeB, and check whether they approach the upper limit of the hardware CE capability of the NodeB.
If all the preceding conditions are met, you can basically determine that the problem is caused by the hardware specifications of the NodeB. The NodeB reports the CE capability according to the standard of the configured licenses 110% regardless of the hardware specifications. If License110% UlHoCeResvSf or license110% DlHoCeCodeResvSf exceeds the hardware capability of the NodeB, the problem occurs.
Solution:
−
In the subsequent R11 version, the hardware specifications are taken into account when the NodeB reports the CE capability. Then, the problem does not occur.
−
To avoid the problem, you can decrease the number of configured licenses. As a result, the impacts of congestion can be relieved through the LDR function.
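The CE check described above (sum the CEs consumed by all cells under a NodeB and compare the total with the hardware CE capability) can be sketched in Python. This is a minimal sketch, not a product tool: the function name, the input layout, and the 90% warning ratio are illustrative assumptions, and the per-cell CE usage is assumed to have been exported from the performance data beforehand.

```python
from collections import defaultdict

def nodebs_near_ce_limit(cell_ce_usage, nodeb_hw_ce, ratio=0.9):
    """Return NodeBs whose summed CE usage approaches the hardware CE capability.

    cell_ce_usage: iterable of (nodeb_id, cell_id, ce_used) tuples
                   (hypothetical export format)
    nodeb_hw_ce:   dict mapping nodeb_id -> hardware CE capability
    ratio:         assumed warning ratio, to be tuned per network
    """
    totals = defaultdict(int)
    for nodeb_id, _cell_id, ce_used in cell_ce_usage:
        totals[nodeb_id] += ce_used

    flagged = {}
    for nodeb_id, used in totals.items():
        capacity = nodeb_hw_ce.get(nodeb_id)
        # Skip NodeBs with unknown capacity instead of guessing a value.
        if capacity and used >= ratio * capacity:
            flagged[nodeb_id] = (used, capacity)
    return flagged
```

A NodeB that appears in the result and also hosts the Unsp.Other cells is a candidate for the hardware CE limitation described in this section.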
Unsp.Other Failure Caused by other Factors
Collect the performance data, CHR log, and IOS information about the top N cells, and return the information to the R&D department for analysis.
VS.RAB.FailEstCS.RIPFail/ VS.RAB.FailEstPS.RIPFail
When the RNC sends the CN a RAB ASSIGNMENT RESPONSE message that reports RAB assignment failure, the indicator is measured in the best cell of the UE if the failure cause value is “Failure in the Radio Interface Procedure”. When analyzing such failures, consider the cause of RB setup failure to analyze the cause of RIPFail more deeply. Table 1 lists the related counts.
Table 1 Indicators of PS RB setup failure
Measurement Item | Description
VS.FailRBSetup.CfgUnsup | Configuration unsupported
VS.FailRBSetup.PhyChFail | Physical channel failure
VS.FailRBSetup.CellUpd | Cell update occurred
VS.FailRBSetup.IncCfg | Invalid configuration
VS.FailRBSetup.NoReply | No reply
The following section describes the judgment methods and solution suggestions in different RIPFail scenarios:
VS.FailRBSetup.CfgUnsup
In the RB setup phase, the UE returns the RB setup failure message with the cause value “Configuration unsupported”. Usually, the failure occurs because the UE capability does not support the RB setup. For example, the UE receives the RB setup request for the VP service (as the calling or called party) while it is using a 128-kbps downlink data service. Most terminals do not support a concurrent VP service and high-speed PS service on the downlink. Therefore, the UE directly returns the RB setup failure message with the cause value “Configuration unsupported”. If such failures increase or become the main factor that affects the RAB access success rate, analyze the distribution of the UEs that undergo the failure according to the PCHR and determine whether the failures concentrate on specific subscribers. If yes, it indicates that those UEs are defective.
VS.FailRBSetup.PhyChFail
In the RB setup phase, the UE returns the RB setup failure message with the cause value “Physical channel failure”: after the UE receives the RB SETUP message, the downlink DPDCH cannot be synchronized. For details about the synchronization-related problems, see the section RRC.FailConnEstab.NoReply. The optimization measures are as follows: If the downlink is not synchronized, you can raise the power of the PCPICH or raise the initial transmit power of the downlink DPCH. The RNC does not provide a separate parameter for the initial transmit power of the downlink DPCH. It can only control the minimum transmit power of the DPCH. By configuring the minimum transmit power parameter of the DPCH, you can therefore control its initial transmit power.
VS.FailRBSetup.CellUpd
At present, no customer has encountered RAB access failure with this cause. If such a failure occurs, return the problem to the R&D department for analysis.
VS.FailRBSetup.IncCfg
In the RB setup phase, the UE returns the RB setup failure message with the cause value “Invalid configuration”. If such failures increase or become the main factor that affects the RAB access success rate, analyze the distribution of the UEs that undergo the failure according to the PCHR. The current analysis shows that some terminals incorrectly report RB setup failure with “Invalid configuration” in the following flow: After receiving the RB SETUP message and before returning the RB SETUP COMPLETE message, the UE returns the RRC_RB_SETUP_FAIL (invalid configuration) message to the RNC if it receives the RRC_DL_DIR_TRANSF (Disconnect) message. For details, see the following figure.
Table 2 Flow on RB setup failure because of invalid configuration
On the Iu interface, the scenario is Normal Release, so the failure should not be considered RAB access failure, as shown in the following figure.
Table 3 Models of the known UEs that have invalid configuration
IMEI (IMSI) | UE Type | Produced by
35170801.429053.0 (262073937151768) | K800C/K800i | SonyEricsson
35159602.421420.0 (262074970737150) | W880i | SonyEricsson
35342701.649470.0 (262074905086910) | K800C/K800i | SonyEricsson
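To see whether failures concentrate on a particular UE model, the IMEIs taken from the PCHR can be grouped by their first 8 digits, which identify the vendor and model of the UE. The following Python sketch shows one way to do this. The function names and the input list are hypothetical; only the dotted IMEI format is taken from the table above.

```python
from collections import Counter

def imei_tac(imei):
    """Return the first 8 digits of an IMEI, ignoring separators such as dots."""
    digits = "".join(ch for ch in imei if ch.isdigit())
    if len(digits) < 8:
        raise ValueError(f"IMEI too short: {imei!r}")
    return digits[:8]

def failures_by_model(imeis):
    """Count failure records per 8-digit model prefix, most frequent first."""
    return Counter(imei_tac(i) for i in imeis).most_common()
```

If one prefix dominates the result, terminal compatibility is the first thing to examine.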
VS.FailRBSetup.NoReply
In the RB setup phase, the RNC delivers the RB SETUP message but does not receive any response. Therefore, the RNC considers that RB setup fails. The main causes are as follows: −
The downlink SRB1 is abnormal, so the UE does not receive the RB SETUP message. The RNC RLC is reset or RbSetupRspTmr times out.
−
The UE receives the RB SETUP message and returns the RB SETUP COMPLETE message. However, the NodeB cannot demodulate the RB SETUP COMPLETE message because the uplink SRB2 is abnormal.
If the UE receives the RB SETUP message but the downlink cannot be synchronized, the UE returns the RB SETUP FAILURE message with the cause value of “Physical channel failure”. In this case, the setup failure is considered as VS.FailRBSetup.PhyChFail rather than VS.FailRBSetup.NoReply.
The judgment methods and solution suggestions in different scenarios are as follows: First trace the IFTS (L2 DATA Report Timer=100s) of the top N cells to capture the abnormal signaling. Through signaling analysis, check whether the uplink SRB2 of the UE sends new data packets after the RB SETUP message is delivered. If yes, you can conclude that the UE received the RB SETUP message and returned the RB SETUP COMPLETE message. If the uplink or downlink BER is high, analyze whether the problem is related to the DCH activation time, that is, whether the activation time of the UE is inconsistent with that of the NodeB (note: the problem occurs only in Sony Ericsson UEs). The primary cause is uplink interference or poor downlink coverage. Especially in the dual-TRX DRD scenario, the Ec/N0 difference between TRXs is great because of the imbalance of coverage between carriers. If all the cases of RB setup timeout occur in the DRD scenario, you can raise the access success rate by optimizing the DRD parameters and controlling how often DRD occurs. If both TRXs support R99/HSPA, you can optimize the DRD parameters by using the DRD algorithm based on load balance.
−
Enable the DRD algorithm switch for the HSDPA service (LdbDRDSwitchHSDPA)
−
Raise the DRD offset for the HSDPA service (LdbDRDOffsetHSDPA)
−
Lower the DRD power remainder threshold for the HSDPA service (LdbDRDLoadRemainThdHSDPA)
−
Modify the parameter TIMERPOLL of the RLC layer (to the optimized value: 120) to increase the SRB retransmission opportunities
−
Raise the DRD EcNo threshold.
VS.RAB.FailEstCS.Relo/VS.RAB.FailEstPS.Relo
The RNC receives a RAB setup request while it is initiating relocation and does not process the request. The problem is caused by nested procedures and rarely occurs. It depends on the timing of subscriber behaviors and is usually handled in the core network.
VS.RAB.FailEstabCS.TNL/VS.RAB.FailEstPS.TNL
RAB setup fails because the transmission bearer cannot be established. The problem rarely occurs. If it occurs, collect the performance data, CHR log, and IOS information about the top N cells and return the information to the R&D department for analysis.
VS.RAB.FailEstPS.Par
The RNC considers that the parameters delivered by the core network are invalid. The problem rarely occurs. If it occurs, trace the IOS data of the top N cells and return the data to the R&D department so that the detailed RAB setup information can be analyzed.
Other
If the failures counted by VS.RAB.FlEstPS.RNL.Other, VS.RAB.FailEstabPS.Other.Cell, VS.RAB.FailEstabCS.RNL.Other, or VS.RAB.FailEstabCS.other.CELL increase, collect the performance data, CHR log, and IOS data of the top N cells and return the information to the R&D department for analysis.
3.4 List of Problem Information
Checklist for KPI Troubleshooting-3.4.xls
4 Handover Success Rate (SHO/HHO)
In an actual commercial network, handover problems are closely related to call drop. In most cases, handover failure leads to call drop. Therefore, this chapter describes the call drop caused by handover in the sections about handover success rate. Chapter 6 only describes the call drop that is not caused by handover failure. By scenario, handover is categorized into soft handover (including softer handover), intra-frequency hard handover, inter-frequency hard handover, and inter-RAT handover. By service, handover is categorized into CS (AMR and VP), PS R99, HSDPA, and HSUPA. This chapter categorizes handover by scenario and describes the success rate of soft handover and inter-frequency hard handover. Intra-frequency hard handover seldom occurs. It only occurs when soft handover is not supported, for example:
Handover between intra-frequency cells of different RNCs when no Iur interface is available
Handover when the Iur interface is available but the Iur interface resources are not enough
Handover because of the control of the rate threshold of the PS service of the cells
The inter-RAT interoperations involve the UMTS, GSM, and CN. Therefore, Chapter 7 describes them separately.
4.1 Problems Related to Soft Handover Success Rate
4.1.1 KPI Definition
The following section defines the soft handover success rate as a basis for the analysis.
1. Soft Handover Success Rate of CS Service and PS R99 Service
VS.SHO.Success.Cell.Rate = ( + )/( + )
2. Change Success Rate of HSDPA Serving Cell
VS.HSDPA.ServCellChg.Succ.Rate = /
3. Change Success Rate of HSUPA Serving Cell
VS.HSUPA.SHO.ServCellChg.Succ.Ratio = /
4.1.2 Influence Factors
The following factors affect the soft handover success rate:
1. Some Neighboring Cells are not Configured
During the initial optimization, call drop is mainly caused by missing neighboring cell configurations. You can check whether intra-frequency neighboring cells are missing by using the following methods:
Method 1: Observe the EcIo of the active set recorded by the UE and the Best Server EcIo recorded by the scanner before the call drop. If the EcIo recorded by the UE is poor but the Best Server EcIo recorded by the scanner is good, check whether the Best Server scrambling code recorded by the scanner appears in the neighboring cell list of the latest intra-frequency measurement control message before the call drop. If it does not, you can determine that some neighboring cells are not configured.
Method 2: If the UE accesses a cell immediately after the call drop and the scrambling code of that cell differs from the scrambling code at the time of the call drop, you can also suspect that some neighboring cells are not configured. To confirm, search backward from the call drop position for the latest intra-frequency measurement control message and check its neighboring cell list.
Method 3: Some UEs report the Detected Set information. If the Detected Set information before the call drop contains the corresponding scrambling code, you can also determine that some neighboring cells are not configured.
Missing neighboring cells can cause call drop, but redundant neighboring cells also affect network performance. For example, the intra-frequency measurement load of the UE increases and, in serious cases, cells cannot be added to the neighboring cell list. Therefore, you also need to pay attention to redundant neighboring cells when analyzing handover problems.
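Method 1 can be expressed as a small check on drive test data. The Python sketch below is illustrative only: the function name, the argument layout, and the EcIo thresholds for “poor” and “good” are assumptions and must be adapted to the criteria used in the network under test.

```python
def missing_neighbor_suspected(ue_active_ecio_db, scanner_best_ecio_db,
                               scanner_best_psc, neighbor_pscs,
                               poor_ecio_db=-14.0, good_ecio_db=-10.0):
    """Apply Method 1 to one sample taken shortly before a call drop.

    ue_active_ecio_db:    best active-set EcIo recorded by the UE
    scanner_best_ecio_db: Best Server EcIo recorded by the scanner
    scanner_best_psc:     scrambling code of the scanner Best Server
    neighbor_pscs:        scrambling codes in the latest intra-frequency
                          measurement control message
    poor_ecio_db / good_ecio_db: assumed thresholds, not product defaults
    """
    ue_poor = ue_active_ecio_db <= poor_ecio_db
    scanner_good = scanner_best_ecio_db >= good_ecio_db
    not_listed = scanner_best_psc not in set(neighbor_pscs)
    return ue_poor and scanner_good and not_listed
```

A True result only raises the suspicion; the measurement control message should still be checked manually as described in Method 2.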
2. Pilot Pollution
Usually, pilot pollution is defined as follows: There are too many strong pilots at a point, but there is no primary pilot that is strong enough. Therefore, the following definitions must be settled when the pilot pollution criteria are laid down. −
Definition of “strong pilot”
−
Definition of “too many”
−
Definition of “there is no primary pilot that is strong enough”
Definition of “strong pilot”
Whether a pilot is strong is determined by its absolute strength, which is measured through its RSCP. If the RSCP of the pilot exceeds a certain threshold, the pilot is a strong pilot, that is,
$CPICH\_RSCP > Th_{RSCP\_Absolute}$
Definition of “too many”
Whether there are too many pilots at a point is determined by the number of pilots. If the number of pilots at a point exceeds a certain threshold, there are too many pilots at the point, that is,
$CPICH\_Number > Th_N$
Definition of “there is no primary pilot that is strong enough”
Whether there is a primary pilot that is strong enough is determined by the relative strength of the pilots at the point. If the difference between the signal strength of the strongest pilot and the signal strength of the $(Th_N + 1)$th strongest pilot at a point is below a certain threshold, there is no primary pilot strong enough at the point, that is,
$(CPICH\_RSCP_{1st} - CPICH\_RSCP_{(Th_N+1)th}) < Th_{RSCP\_Relative}$
Pilot pollution exists at a point if both of the following conditions are met:
There are more than $Th_N$ pilots that meet the condition $CPICH\_RSCP > Th_{RSCP\_Absolute}$.
$(CPICH\_RSCP_{1st} - CPICH\_RSCP_{(Th_N+1)th}) < Th_{RSCP\_Relative}$
Assume that $Th_{RSCP\_Absolute} = -95$ dBm, $Th_N = 3$, and $Th_{RSCP\_Relative} = 5$ dB. The criteria for pilot pollution are then as follows:
There are more than 3 pilots that meet the condition $CPICH\_RSCP > -95$ dBm.
$(CPICH\_RSCP_{1st} - CPICH\_RSCP_{4th}) < 5$ dB
If both conditions are met, you can determine that pilot pollution exists.
3. The Parameters of the Soft Handover Algorithm are not Set Correctly
You can adjust the handover algorithm to solve two types of problems: handover too late and ping-pong handover. Judging from the signaling flow, handover too late has the following symptom: For the CS service, the UE does not receive the active set update message (for intra-frequency hard handover, the physical channel reconfiguration message). The cause is as follows: The EcIo of the source cell decreases sharply after the UE sends the measurement report, and by the time the RNC sends the active set update message, the UE has switched off its transmitter because of downlink loss of synchronization. From the UE side, the active set update message is never received. For the PS service, either the active set update message is not received or TRB reset occurs before the handover. Judging from signals, handover too late has the following symptoms: −
Corner effect: The EcIo of the source cell decreases sharply, and the EcIo of the target cell increases sharply (increase to a high value suddenly).
−
Pinpoint effect: The EcIo of the source cell decreases sharply for some time and then increases, and the EcIo of the target cell increases sharply within a short period.
Judging from the signaling flow, the UE reports the 1A or 1C measurement report of the neighboring cell before the call drop, and the RNC receives the measurement report and delivers the active set update message, but the UE does not receive it.
Ping-pong handover has the following two symptoms: −
The dominant cell changes quickly: Two or more cells become the dominant cell alternately. The dominant cell has desirable RSCP and EcIo, and each cell acts as the dominant cell for a short period.
−
There is no dominant cell: There are multiple cells, the RSCP is normal, the RSCP difference between the cells is not great, and the EcIo of each cell is poor.
Judging from the signaling flow, you can see the following symptom: After a cell is deleted, the 1A event of the cell is immediately reported, and the active set update message sent by the RNC cannot be received, thus causing the failure.
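The “no dominant cell” symptom closely resembles the pilot pollution criteria given in item 2 (more than a threshold number of strong pilots, and a small RSCP gap between the strongest pilot and the next pilot beyond that number). The following Python sketch applies those two conditions to the CPICH RSCP values measured at one point. The default thresholds are the example values from item 2 (−95 dBm, 3, and 5 dB); the function name and input format are assumptions.

```python
def is_pilot_pollution(rscp_dbm, th_abs_dbm=-95.0, th_n=3, th_rel_db=5.0):
    """Return True if the pilots measured at one point meet both criteria.

    rscp_dbm:   list of CPICH RSCP values (dBm), one per detected pilot
    th_abs_dbm: absolute RSCP threshold for a "strong" pilot
    th_n:       number of strong pilots that must be exceeded
    th_rel_db:  relative threshold between the 1st and (th_n + 1)th pilot
    """
    strong = sorted((r for r in rscp_dbm if r > th_abs_dbm), reverse=True)
    # Condition 1: more than th_n pilots exceed the absolute threshold.
    if len(strong) <= th_n:
        return False
    # Condition 2: the strongest pilot is less than th_rel_db above
    # the (th_n + 1)th strongest pilot, so no pilot is dominant.
    return (strong[0] - strong[th_n]) < th_rel_db
```

For example, four pilots at −80, −82, −83, and −84 dBm meet both conditions, whereas replacing the weakest with −90 dBm leaves a 10 dB gap and the point is no longer classified as polluted.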
4. The Equipment (Including the UE) is Abnormal
Check whether there are abnormal alarms on the alarm subsystem, analyze the traced messages, and determine at which step of the flow soft handover fails by decoding the failure message. You can ask the local customer service and engineering personnel to determine whether the equipment is abnormal. Note that abnormal UE behavior and unstable transmission quality are also common factors that lower the handover success rate. In a newly released version, handover failure may also be caused by the version quality. If the problem is not a known problem, you must ask the R&D personnel to participate in the analysis.
4.1.3 Analysis Process
1. Discussing the Problem, Ascertaining the Problem Background and Product Version, and Ruling Out the Possibility of Known Bugs
First ask the field personnel to feed back the related information and symptoms of the problem, and then obtain the information about the known bugs of the corresponding version by asking the contact persons of the RNC and NodeB or by referring to the Release Notes. In this way, you can determine whether the problem is a known version problem, for example, whether soft handover failure is caused by abnormal power control in the version. Determine the time at which the handover success rate changed, analyze whether the problem is caused by network adjustment (for example, adding or relocating sites), and focus on the impacts of the adjustment. Experience shows that the soft handover success rate of a mature commercial network seldom deteriorates suddenly. If the KPI deteriorates severely in many areas, the usual causes are as follows: The network is newly built or relocated, so some neighboring cells are not configured. During the relocation, interoperation takes place between the Iur interface of the local RNC and the Iur interface of the peer RNC. Therefore, the latest actions performed on the network are critical information.
2. Narrowing Down the Analysis Scope, Analyzing Whether the Low Handover Success Rate Is Caused by Certain Cells, and Analyzing the Performance Data About Soft Handover Failure
If the preceding causes are ruled out, you need to analyze the performance data. The performance data is one of the most important information sources for network optimization and also the main evaluation criterion for network performance. The handover-related performance data can be obtained at the RNC level and at the cell level. The RNC-level data reflects the handover performance of the whole network, and the cell-level data helps you locate the faulty cells.
The flow of soft handover includes the preparation process and the air-interface process. The preparation process runs from the handover decision to the completion of RL setup. The air-interface process is the update of the active set.
Check whether the soft handover success rate of the entire network and of the cells in busy hours complies with the standard. If not, analyze the change in the handover success rate and handover failure count of the cells to judge whether the problem is caused by the deterioration of certain cells. Compare the trend of the top N cells in the handover success rate and handover failure count with the trend of the entire network to judge whether the top N cells are representative of the entire network. If they are, you can focus on the top N cells. Then, determine the main cause for the deterioration of the handover success rate (or for its non-compliance with the standard), that is, find the dominant failure cause counter. Table 1 lists the main cause counters for soft handover failure.
Table 1 Indicators related to soft handover failure
Indicator | Description
SHO.FailRLAddUESide.CfgUnsup | Number of soft handover RL failures of the cells (the cause value is “Configuration Unsupported”). The UE considers that the active set update contents of adding or deleting links by the RNC are not supported. Basically, the scenario does not occur in a commercial network.
SHO.FailRLAddUESide.Isr | Number of soft handover RL failures of the cells (the cause value is incompatible simultaneous reconfiguration). The UE reports that the soft or softer handover process of adding or deleting links by the RNC is not compatible with other concurrent processes. The RNC ensures serial processing during the flow processing. The problem is caused mainly because the processing of some UEs is defective.
SHO.FailRLAddUESide.InvCfg | Number of soft handover RL failures of the cells (the cause value is invalid configuration). The UE considers that the active set update contents of adding or deleting links by the RNC are invalid. Basically, the scenario does not occur in a commercial network.
SHO.FailRLAddUESide.NoReply | The RNC does not receive the response to the active set update command of adding or deleting links. It is the main cause of soft or softer handover failure in the network, and mainly occurs in areas where the coverage quality is poor or the handover area is small. You need to first consider RF optimization. It is also possible that the equipment is abnormal.
Other | Soft handover failure caused by other factors
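Whether the top N cells are representative of the network-wide deterioration can be checked by comparing failure counts from two periods. The Python sketch below computes how much of the network-wide increase in soft handover failures is contributed by the N cells with the largest increase. The function name, the dictionary input format, and the choice of N are assumptions for illustration.

```python
def increase_explained_by_top_n(before, after, n=10):
    """Return (top-n cells by failure increase, their share of the network increase).

    before / after: dicts mapping cell_id -> failure count in the reference
                    period and in the deteriorated period (hypothetical export)
    """
    cells = set(before) | set(after)
    delta = {c: after.get(c, 0) - before.get(c, 0) for c in cells}
    network_delta = sum(delta.values())
    top = sorted(delta.items(), key=lambda kv: kv[1], reverse=True)[:n]
    if network_delta <= 0:
        # The network did not deteriorate overall, so no share is defined.
        return top, 0.0
    return top, sum(d for _, d in top) / network_delta
```

A share close to 1 suggests that the analysis can concentrate on the top N cells; a small share suggests a network-wide cause.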
For the change failure of the HSDPA/HSUPA serving cell, no dedicated cause counter is available. There is only one failure scenario: After the RNC sends the PHYSICAL CHANNEL RECONFIGURATION message to the UE, the RNC does not receive the PHYSICAL CHANNEL RECONFIGURATION COMPLETE message returned by the UE. You need to analyze such failure through the CHR log and signaling tracing.
3. Analyzing the Main Scenarios of Handover Failure
After you have basically identified the top N cells of handover failure and the performance counter of handover failure, analyze the CHR log and IOS data. The analysis procedure is as follows: −
Through the CHR analysis (by using the OMSTAR tool), judge whether soft handover failure concentrates on certain UEs (for details about the analysis methods, see Chapter 7). If you can determine that the problem is caused by the UEs of a few IMSIs, query the IMEIs of those IMSIs. The IMEI is a 15-digit number and is the hardware identification mark of the UE. The first 8 digits indicate the vendor and model of the UE. If you can determine that the problem usually occurs in the UE of a certain model, focus on terminal compatibility.
−
Enable the IOS tracing of the top N cells to obtain the signaling of air-interface failure. Through the signaling, you can infer whether the problem is an uplink problem or downlink problem and analyze the failure scenarios from the IOS signaling (for example, whether the problem is related to encryption or a specific flow).
−
When finding the main failure scenarios, you can consider enabling the IFTS tracing (with the L2 user plane) to further analyze the failure cause.
−
IFTS tracing enables you to trace the detailed information at the CDT user plane level. For the handover failure in the soft handover preparation phase (RL_ADD/RL_SETUP), for example, the resource request failure, you can obtain the detailed print information about such failure. In the Uu Noreply scenario where the ASU signaling message is sent but the ASU CMP message is not received, you can also obtain the valid RLC-layer information from the user-plane message. As a result, you can determine which of the following factors causes handover failure:
−
The UE downlink does not receive the ASU message.
−
The UE uplink returns the ASU CMP message, but the RNC does not receive the ASU CMP message. You can also trace the downlink BLER, RSCP, and Ec/No at the same time to know the quality of the uplink/downlink signals at the time when the problem occurs, thus helping you analyze the preceding RF problems (including the pilot pollution, corner effect, and pinpoint effect).
IFTS tracing is similar to IOS tracing in that it selects eligible subscribers in the specified cell at random for tracing. IFTS tracing enables you to trace the detailed information at the CDT user-plane level, thus facilitating deep analysis. However, only one subscriber can be selected in each cell at a time. If the KPIs (for example, the handover success rate and call drop rate) change only slightly, IFTS tracing is not effective and you can hardly capture valid data. Usually, you can trace the IOS information to learn the main flow of the problem, and then trace the IFTS data.
4. Conducting Drive Tests on Site, and Analyzing the Causes Deeply
Normally, you can determine the causes of the general air-interface RF problems, soft handover failure in the preparation phase, and FP synchronization failure by analyzing the preceding CHR log, performance data, IOS data, and IFTS data. If the signal quality is good during the handover and the Uu Noreply problem still occurs, you need to obtain more data. Sometimes, you need to conduct drive tests on site, reproduce the handover failure, and obtain the QXDM log (including the L2 and L1 information) of the UE for analysis. The drive test must be well targeted. That is, you need to determine the main scenarios or top N cells that affect the handover success rate. For example, you can determine the handover that can be reproduced more easily: −
Handover from cell A to cell B
−
Ping-pong handover between cell A and cell B
You can also decide whether to perform FTP download or to ping with small packets. This gives a high probability of reproducing the handover failure during the drive test and of capturing the main failure signaling. While conducting the drive test, enable NodeB CDT tracing and RNC CDT tracing and analyze their signaling. Generally, Uu Noreply has the following symptoms:
−
The RNC does not send the ASU signaling to the NodeB effectively. For example, packets are lost during the Iub transmission.
−
After receiving the ASU signaling, the NodeB does not send the ASU signaling through the air interface successfully.
−
After the ASU signaling is sent from the air interface, the UE does not receive the ASU signaling. This case is rare when the signal quality is good.
−
The UE does not send the ASU CMP message. This is a UE bug, and barely occurs in existing commercial networks.
−
The UE sends the ASU CMP message, but the NodeB does not demodulate or decode the message successfully.
−
The NodeB sends data, but the RNC L3 does not receive the data. For example, packets are lost during the Iub transmission.
Only the 3rd and 4th symptoms may be caused by UE anomaly. If packets are lost on the Iub interface, check the transmission quality through the IPPM or VCLPM. If the transmission quality is poor, solve the transmission problem and check the effect. The other symptoms are mostly caused by internal defects of the product and require support from the R&D department.
4.1.4 Cases of Soft Handover Failure
1. Problem Description
In June 2006, subscribers of the PCCW office in Hong Kong complained that calls were easily dropped when they exited a tunnel. The performance data showed that the call drop rate of the entire network had not increased greatly, so the complaint was a single-point complaint. The technical personnel conducted a drive test on site and captured the data at the RNC side and UE side. The analysis showed that the signals outside the tunnel were strong but the UE did not report the 1A event, thus causing the call drop.
2. Problem Analysis
The analysis shows that the signal quality is poor in cell 486 of the active set, the signal quality is constantly good in cell 472 of the monitored set, and the conditions for reporting the 1A event are met. For details, see Figure 1.
Figure 1 Cell signal quality
The UE never reports the 1A event. Finally, the call drops because the signal quality of cell 486 becomes extremely poor. Why does the UE not report the 1A event? The possible causes are checked as follows:
−
It is originally suspected that some neighboring cells are not configured, but cell 472 appears in the monitoring set. Therefore, the problem is not caused by a missing neighboring cell.
−
The configured soft handover thresholds are queried, but no anomaly is found.
−
Is the UE abnormal? During the test, the UE can report the 1A event for other cells, as shown in the measurement report in Figure 2.
Figure 2 Measurement report
In addition, the Huawei 636 UE encounters a similar problem, which indicates that the problem is not limited to a single UE. Why does the UE not report the 1A event for cell 472 when cell 486 is the serving cell? Viewing the measurement control information again reveals a difference between neighboring cell 472 and the other neighboring cells: where cell 472 is configured as a neighboring cell of cell 486, CIO is set to 10; for the other neighboring cells, CIO is set to 0.
Figure 3 CIO offset parameter
Does the CIO configuration cause the problem? According to the protocol, the cell individual offset (CIO) is used for event evaluation: the CIO value is added to the measured CPICH value of the cell, and the sum is used in the event evaluation process of the UE. The current measurement decision uses Ec/N0, whose value ranges from –24 dB to 0 dB. Here CIO is set to 10, that is, 5 dB. When the signal quality of cell 472 is good, the sum may overflow in the UE calculation and cause an incorrect decision. The terminal development personnel were asked about the impact of the CIO configuration on UE measurement reporting, and they obtained the following answer from Qualcomm: in the 636 UE, the CIO configuration triggers a bug when the UE reports the measurement report; in the 526 UE, the problem has been solved.
3. Conclusion
If CIO is set to an extremely large value, the UE does not report the 1A event, which is a UE bug. When CIO is configured, the UE calculation may overflow during event evaluation at the time of the measurement decision, which causes the decision error. Qualcomm confirms that the Huawei 636 UE has the bug and that the problem has been solved in the Huawei 526 UE.
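The effect of CIO on event evaluation in this case can be illustrated with a minimal sketch. It assumes a simplified 1A condition (weighting factor taken as 0), a CIO step of 0.5 dB as implied by "10, that is, 5 dB", and an Ec/N0 range of –24 dB to 0 dB. The function names, the reporting range, and the sample Ec/N0 values are hypothetical and are not taken from the UE or RNC implementation.

```python
# Sketch only: simplified event 1A check with a cell individual offset (CIO).
ECNO_MIN_DB, ECNO_MAX_DB = -24.0, 0.0  # assumed CPICH Ec/N0 range


def cio_to_db(cio_setting: int) -> float:
    """Convert a CIO setting in 0.5 dB steps to dB (10 -> 5.0 dB)."""
    return cio_setting * 0.5


def event_1a_met(neighbor_ecno_db, best_ecno_db, cio_setting,
                 reporting_range_db=3.0, hysteresis_db=0.0):
    """True if the CIO-adjusted neighbor enters the 1A reporting range."""
    adjusted = neighbor_ecno_db + cio_to_db(cio_setting)
    return adjusted >= best_ecno_db - (reporting_range_db - hysteresis_db / 2)


def exceeds_ecno_range(neighbor_ecno_db, cio_setting):
    """True if measured Ec/N0 plus CIO is above the top of the Ec/N0 range."""
    return neighbor_ecno_db + cio_to_db(cio_setting) > ECNO_MAX_DB


# Hypothetical values: a strong neighbor (-4 dB) with CIO = 10 gives +1 dB,
# which is outside the range a UE might expect to handle.
print(event_1a_met(-4.0, -12.0, 10), exceeds_ecno_range(-4.0, 10))
```

A check like `exceeds_ecno_range` could be used to flag neighbor relations whose CIO pushes the evaluated quantity out of range for UEs known to have the bug.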
4.2 Problems Related to Hard Handover Success Rate
4.2.1 KPI Definition
1. Hard Handover Success Rate of CS Service and PS R99 Service
VS.HHO.InterFreq.Out.Cell.Rate = /
VS.HHO.InterFreq.In.Cell.Rate = /
2. Change Success Rate of HSDPA Serving Cell (Inter-Frequency Handover)
VS.HSDPA.ServCellChg.Succ.Rate = /
3. Change Success Rate of HSUPA Serving Cell (Inter-Frequency Handover)
VS.HSUPA.SHO.ServCellChg.Succ.Ratio = /
4.2.2 Influence Factors
The following factors affect the hard handover success rate:
1. Some Neighboring Cells Are Not Configured
As with soft handover failure, missing neighboring cell configuration is one of the common causes of inter-frequency hard handover failure. For details about the troubleshooting method, see Chapter 7.
2. The Inter-Frequency Handover Threshold or Compression Mode Threshold Parameters Are Set Improperly, So Handover Does Not Occur in a Timely Manner
Inter-frequency measurement may use the compression mode (some UEs, for example some Motorola terminals, have dual receivers and can measure inter-frequency signals without enabling the compression mode). When the UE enters the CELL_DCH state or the best cell is updated, the measurement of the 2D and 2F events needs to be configured if the inter-frequency handover algorithm is enabled and the best cell has an inter-frequency neighboring cell list. The absolute thresholds of the 2D and 2F events are the enabling and disabling thresholds of inter-frequency measurement. The CPICH Ec/No or RSCP measurement quantity and threshold are used according to the location properties of the best cell in the active set. If the measured quality falls below the enabling threshold, the 2D event is reported and periodical inter-frequency measurement is enabled through a decision. If the quality of the active set rises above the disabling threshold, the 2F event is reported and inter-frequency measurement is disabled.
The compression mode usually affects link quality and system capacity. Therefore, it is recommended that inter-frequency measurement not be enabled unless necessary. If the enabling threshold of the compression mode is set extremely low, the compression mode is difficult to enable. As a result, call drop occurs in the existing network because hard handover is triggered too late.
3. The Inter-Frequency Measurement Quantity Is Not Selected Correctly, So Inter-Frequency Measurement Cannot Be Initiated in a Timely Manner
Sometimes, a commercial network encounters the following inter-frequency handover failure: when the UE moves toward an inter-frequency cell, the compression mode is never enabled to initiate inter-frequency measurement, and the UE accesses the inter-frequency cell only after the call drops. Querying the cell configuration shows that the cell is configured as a TRX center cell. That is, the 2D event, 2F event, and inter-frequency measurement use Ec/N0 as the measurement quantity.
The measured value of the pilot Ec/N0 depends on two factors: the RSCP of the pilot signal and the downlink interference. For the WCDMA system, the downlink interference mainly consists of the downlink signals of the intra-frequency cells (the current cell and neighboring cells) and the background noise. The downlink interference from the intra-frequency cells is subject to path loss and slow fading, similar to the fading of the wanted signal (for example, the CPICH RSCP) received by the UE. At the coverage edge of a TRX, when the UE moves from the TRX cell in use to another TRX cell, the CPICH RSCP and the interference fade at almost the same speed (the background noise is not affected by path loss, so the CPICH RSCP fades slightly faster). Therefore, the CPICH Ec/N0 received by the UE changes extremely slowly. Both simulation and field tests show that the CPICH Ec/N0 can still be about –12 dB when the CPICH RSCP received by the UE is about –110 dBm.
Figure 1 Relation between RSCP fading and Ec/N0 fading
If Ec/N0 is used as the measurement quantity of the 2D event, the 2D event has probably still not been triggered by the time the call drops. As a result, inter-frequency measurement is not started. In this case, you need to configure the cell as a TRX edge cell and use the RSCP as the measurement quantity of the 2D and 2F events so that inter-frequency measurement is initiated in time. In RAN10 and later versions, the RNC uses the Both mode (Ec/N0 and RSCP) as the inter-frequency measurement quantity by default, which solves the problem fundamentally. In versions earlier than RAN10 (for example, V29 and V18), the inter-frequency measurement quantity must be set correctly.
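The 2D/2F enabling and disabling logic and the difference between the measurement quantities can be sketched as follows. This is an illustration under stated assumptions, not the RNC algorithm: the RSCP thresholds are the default values quoted in the case in section 4.2.4 (2D: –95 dBm, 2F: –92 dBm), the Ec/N0 thresholds are hypothetical, and the Both mode is assumed to keep measurement enabled when either quantity has triggered the 2D event.

```python
# Sketch only. RSCP thresholds follow the defaults quoted in this guide;
# the Ec/N0 thresholds below are hypothetical illustration values.
RSCP_2D_DBM, RSCP_2F_DBM = -95.0, -92.0
ECNO_2D_DB, ECNO_2F_DB = -14.0, -12.0  # assumed


def next_state(measuring, value, thd_2d, thd_2f):
    """Hysteresis: below thd_2d (2D) starts measurement, above thd_2f (2F) stops it."""
    if not measuring and value < thd_2d:
        return True
    if measuring and value > thd_2f:
        return False
    return measuring


def inter_freq_measurement_on(samples, mode="both"):
    """samples: list of (ecno_db, rscp_dbm). Returns the final measurement state."""
    ecno_on = rscp_on = False
    for ecno, rscp in samples:
        ecno_on = next_state(ecno_on, ecno, ECNO_2D_DB, ECNO_2F_DB)
        rscp_on = next_state(rscp_on, rscp, RSCP_2D_DBM, RSCP_2F_DBM)
    if mode == "ecno":
        return ecno_on
    if mode == "rscp":
        return rscp_on
    return ecno_on or rscp_on  # assumed behavior of the Both mode


# UE moving to the coverage edge: RSCP falls to -110 dBm, Ec/N0 only to -12 dB.
edge_path = [(-9.0, -85.0), (-10.0, -98.0), (-12.0, -110.0)]
for mode in ("ecno", "rscp", "both"):
    print(mode, inter_freq_measurement_on(edge_path, mode))
```

With these sample values the Ec/N0-only configuration never starts inter-frequency measurement, whereas the RSCP and Both configurations do, which matches the failure scenario described above.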
4.2.3 Analysis Process
Inter-frequency handover failure can basically be analyzed in the same way as soft handover failure, especially when the handover success rate of a commercial network decreases. The usual analysis method is as follows:
1. Discussing the Problem, Ascertaining the Problem Background and Product Version, and Ruling Out the Possibility of Known Bugs
As with soft handover failure, you need to rule out known bugs, find out the recent actions performed on the network (for example, relocation and upgrade), and compare the script used before the problem occurred with the script used after it occurred.
2. Narrowing Down the Analysis Scope, Analyzing Whether the Low Handover Success Rate Is Caused by Certain Cells, and Analyzing the Performance Data About Inter-Frequency Handover Failure
When analyzing the performance data, check whether the inter-frequency handover failures of the top N cells account for the majority of the inter-frequency handover failures in the entire network, and determine which failure counters dominate the inter-frequency handover failures in the network.
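The top N check can be sketched as follows. This is a minimal illustration: the function name and the per-cell failure counts are hypothetical, and the share above which the top N cells count as "the majority" is left to the analyst.

```python
def top_n_share(failures_by_cell, n):
    """Return (top_cells, share): the n worst cells and their fraction of all failures."""
    total = sum(failures_by_cell.values())
    if total == 0:
        return [], 0.0
    ranked = sorted(failures_by_cell.items(), key=lambda kv: kv[1], reverse=True)
    top = ranked[:n]
    return [cell for cell, _ in top], sum(count for _, count in top) / total


# Hypothetical inter-frequency handover failure counts per cell.
cells, share = top_n_share({"A": 50, "B": 30, "C": 10, "D": 10}, n=2)
print(cells, share)
```

A high share suggests focusing on the listed cells; a low share suggests a network-wide cause.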
Table 1 lists the performance counters related to inter-frequency hard handover failure.
Table 1 Indicators related to inter-frequency hard handover failure
| Number | Indicator | Description |
| --- | --- | --- |
| 1 | VS.HHO.InterFreqOut.CfgUnsupp | Configuration unsupported |
| 2 | VS.HHO.InterFreqOut.PyhChFail | Physical channel failure |
| 3 | VS.HHO.InterFreqOut.FailUSR | Incompatible simultaneous reconfiguration |
| 4 | VS.HHO.InterFreqOut.CellUpdt | Cell update occurred |
| 5 | VS.HHO.InterFreqOut.CfgInvalid | Invalid configuration |
| 6 | VS.HHO.InterFreqOut.NoReply | No response on the air interface |
| 7 | VS.HHO.InterFreqOut.DLCodeRej | Failure of inter-frequency hard handover out of the current cell because of the failure of downlink code resource allocation |
| 8 | VS.HHO.InterFreqOut.ULAdmsnDeny | Failure of inter-frequency hard handover out of the current cell because of the uplink admission denial |
| 9 | VS.HHO.InterFreqOut.DLAdmsnDeny | Failure of inter-frequency hard handover out of the current cell because of the downlink admission denial |
| 10 | Other | Failure of inter-frequency hard handover because of other factors |
Indicators 1 to 5 share the following feature: during inter-frequency hard handover, after the RNC receives the PHYSICAL CHANNEL RECONFIGURATION FAILURE message returned by the UE, the counter that corresponds to the cause value carried in the message is incremented in the best cell of the UE before the handover. These problems seldom occur in commercial networks. They indicate that some parameters configured on the UE are not compatible with those configured in the network.
Indicator 6 has the following feature: during inter-frequency hard handover, the RNC starts a timer to wait for the response from the UE after sending the PHYSICAL CHANNEL RECONFIGURATION message; if the timer expires before the RNC receives the response, the counter is incremented in the best cell of the UE before the handover. Such problems often occur in commercial networks. A substantial part of them occur where the signal quality of the handover area is poor. Therefore, first check the signal quality of the current cell and target cell at the time of handover and optimize the RF. It is also possible that the equipment is abnormal.
Indicators 7 to 9 share the following feature: when the RNC receives the measurement report sent by the UE, it initiates the inter-frequency hard handover request if the decision conditions are met. After entering the inter-frequency hard handover flow, the RNC makes the admission decision for the target cell. If hard handover fails because cell admission fails, the counter that corresponds to the cause of the admission failure is incremented in the best cell of the UE before the handover. For such problems, first check whether the resources of the target cell are really congested. If a resource is congested, expand the capacity as soon as possible. If no obvious resource congestion occurs but the admission of hard handover fails, suspect a network bug and trace the CDT information to see what causes the admission failure.
3. Analyzing the Main Scenarios of Handover Failure
As with soft handover failure, after identifying the top N cells and the main failure counters, you also need to analyze the CHR log and the IOS or IFTS data.
−
If the RNC receives the PHYSICAL CHANNEL RECONFIGURATION FAILURE message returned by the UE, you can basically determine in which scenario some parameters configured on the UE are not compatible with those configured on the network according to the failure cause fed back by the UE. For such problems, you can basically suspect the compatibility of the UE. You can use the CHR data to analyze the corresponding IMSIs (for details, see Chapter 7) and inquire the customer about the corresponding IMEIs of the IMSIs, thus judging whether the problems are caused by the terminals of a specific model.
−
If no response is received on the air interface, first judge whether the signal quality of the current cell and target cell was normal when the problem occurred. If both the Ec/No of the current cell and the Ec/No of the target cell are lower than –13 dB, it is possible that the UE did not receive the PHYSICAL CHANNEL RECONFIGURATION message delivered by the network. In this case, you can preliminarily determine that the problem is caused by poor coverage, and you need to optimize the network coverage. If the signal quality is good at the time of handover but no response is received on the air interface, query the IFTS user-plane information to further determine which of the following occurs:
−
The UE does not receive the PHYSICAL CHANNEL RECONFIGURATION message delivered by the network.
−
The RNC does not receive the PHYSICAL CHANNEL RECONFIGURATION CMP message returned by the UE.
As with the no-response symptom on the air interface during soft handover failure, these problems may be caused by quality defects of the network version. To analyze the problem in depth, you may need to obtain more UE and NodeB data.
−
For admission denial of the target cell during hard handover, trace the detailed IFTS/CDT data to find the cause of the admission denial.
4. Conducting Drive Tests on Site, and Analyzing the Causes Deeply
Except for the scenario in which no response is received on the air interface when the signal quality is good, the preceding analysis is basically sufficient to determine the cause. If no response is received on the air interface when the signal quality is good, the network equipment or the UE processing may be abnormal. Therefore, the field personnel need to conduct drive tests in the top N cells to obtain the RNC CDT, NodeB CDT, and UE QXDM log data for detailed analysis. The purpose is to determine at which step the error occurs:
−
RNC
−
Iub transmission
−
NodeB
−
UE
The R&D personnel need to focus on analyzing such problems.
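The analysis process in this section can be summarized as a small triage sketch that maps a failure counter from Table 1 to the next step described above. The function name and the returned strings are hypothetical. The –13 dB Ec/No limit is the value given in this section, and the counter names are copied from Table 1.

```python
# Sketch only: next step for an inter-frequency hard handover failure counter.
UE_REPORTED = {
    "VS.HHO.InterFreqOut.CfgUnsupp", "VS.HHO.InterFreqOut.PyhChFail",
    "VS.HHO.InterFreqOut.FailUSR", "VS.HHO.InterFreqOut.CellUpdt",
    "VS.HHO.InterFreqOut.CfgInvalid",
}
ADMISSION = {
    "VS.HHO.InterFreqOut.DLCodeRej", "VS.HHO.InterFreqOut.ULAdmsnDeny",
    "VS.HHO.InterFreqOut.DLAdmsnDeny",
}
NO_REPLY = "VS.HHO.InterFreqOut.NoReply"
POOR_ECNO_DB = -13.0  # Ec/No limit quoted in this section


def triage(counter, src_ecno_db=None, tgt_ecno_db=None):
    """Return a short description of the suggested next analysis step."""
    if counter in UE_REPORTED:
        return "check UE compatibility (map IMSI to IMEI from CHR data)"
    if counter in ADMISSION:
        return "check target-cell congestion; trace IFTS/CDT if not congested"
    if counter == NO_REPLY:
        if src_ecno_db is None or tgt_ecno_db is None:
            return "collect Ec/No of the current and target cells"
        if src_ecno_db < POOR_ECNO_DB and tgt_ecno_db < POOR_ECNO_DB:
            return "poor coverage: optimize RF"
        return "good signal: query IFTS user-plane data, then drive test"
    return "other: analyze CHR log"


print(triage(NO_REPLY, src_ecno_db=-15.0, tgt_ecno_db=-14.0))
```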
4.2.4 Cases of Inter-Frequency Hard Handover Failure
1. Problem Description
In February 2009, the field personnel of the Orange site in Moldova reported that two KPI-related problems occurred after the NodeB was upgraded to V110 053 (note: the local time was the evening of February 6th, and the NodeB was upgraded in the morning of February 7th).
−
The inter-frequency handover success rate decreased by 1%.
The performance data shows that the main failure counter is VS.HHO.InterFreqOut.NoReply. In addition, no obvious top N cells stand out. Therefore, the problem is a global problem and is not caused by certain cells.
Table 1 Inter-frequency handover failure
| Cell Group | Time (by day) | VS.HHO.InterFreq.Fail.Cell.Rate |
| --- | --- | --- |
| 053 Cluster | 2009-2-4 | 0.64% |
| 053 Cluster | 2009-2-5 | 0.91% |
| 053 Cluster | 2009-2-6 | 0.91% |
| 053 Cluster | 2009-2-7 | 1.96% |
| 053 Cluster | 2009-2-8 | 1.63% |
| 053 Cluster | 2009-2-9 | 1.87% |
−
The CS call drop rate increased by 0.5%, from 0.6% to an average of 1.1%.
Table 2 CS call drop rate
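As a worked check of the reported 1% change, the daily values of Table 1 (inter-frequency handover failure) can be averaged before and after the upgrade. This is a minimal sketch; the choice of 2009-2-7 as the first "after" day is an assumption based on the upgrade date given above.

```python
from statistics import mean

# Daily VS.HHO.InterFreq.Fail.Cell.Rate (%) of the 053 Cluster, from Table 1.
daily_fail_rate = {
    "2009-2-4": 0.64, "2009-2-5": 0.91, "2009-2-6": 0.91,
    "2009-2-7": 1.96, "2009-2-8": 1.63, "2009-2-9": 1.87,
}
UPGRADE_DAY = "2009-2-7"  # assumed first day counted as "after the upgrade"

days = list(daily_fail_rate)  # dicts preserve insertion order
split = days.index(UPGRADE_DAY)
before = [daily_fail_rate[d] for d in days[:split]]
after = [daily_fail_rate[d] for d in days[split:]]

mean_before = round(mean(before), 2)
mean_after = round(mean(after), 2)
delta = round(mean_after - mean_before, 2)
print(mean_before, mean_after, delta)
```

The average failure rate rises by about one percentage point, which is consistent with the reported decrease of the success rate.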
2. Problem Analysis
−
Analyze the CS call drop rate through the performance data and CHR data. You can basically determine that the problem is caused by a sharp increase of the VS.RAB.Loss.PS.RF.UuNoReply value. Ask the field personnel to trace the IOS data. The IOS analysis shows 17 call drops, 3 ASU timeouts, and 8 failures after the compression mode is enabled, which is a high proportion. You need to focus on the failure scenario after the compression mode is enabled. The failure scenario is basically the same as the scenario in the following figure. After the compression mode is enabled, the signal quality of the current cell is poor (in the following figure, the 1A measurement report is received after the compression mode is enabled, and the Ec/No of both the current cell and the neighboring cell is about –15 dB). Subsequently, the signal quality of the inter-frequency neighboring cells is not measured. Because of the poor signal quality, synchronization fails and the call drops. At this point, the CS call drop can basically be associated with the worsening of the inter-frequency handover success rate. That is, the CS call drop count increases mainly because of the failure of inter-frequency hard handover.
−
Now the following questions need to be answered: Why is the signal quality so poor after the compression mode is enabled? Why does the call drop because the air interface loses synchronization before the inter-frequency cell signals are even measured? Does the signal quality fluctuate dramatically? Is the compression mode enabled too late? The real-time measurement data of the downlink RSCP and Ec/No was not traced together with the IOS data, so it is difficult to tell whether the signal quality fluctuates dramatically. You can only ask the field personnel to trace such data next time. To determine whether the compression mode is enabled at an appropriate time, query the measurement control messages of the compression mode for inter-frequency handover. In RAN10, the Both measurement quantity is used, so there are two measurement control messages. For the first message, the measurement quantity is Ec/No, and the configured 2D and 2F thresholds and delay are the default values. For the second message, the measurement quantity is RSCP, and the configured 2D and 2F thresholds are –100 dBm and –97 dBm respectively, both of which are 5 dB lower than the default values (2D: –95 dBm and 2F: –92 dBm). This is the reason why inter-frequency hard handover is initiated too late and the call drop count increases.
−
At this point, the cause of the problem can basically be determined. However, the field personnel only upgraded the NodeB, and the upgrade did not change any RNC parameter. Why do the field personnel report that the indicator deteriorated after the NodeB was upgraded? Querying the previous script and the operation log reveals a record of modifying the inter-frequency handover thresholds on the day of the upgrade, which makes the real cause clear. The field personnel report that the modification was performed by the customer without the knowledge of the customer service personnel.
[375083], [ admin], [ 1], [ 172.16.106.48], [ 18670], [ Y2009M02D07H11N36S39], [ Y2009M02D07H11N36S40], [ 1], [ 0], [ 1], [SET INTERFREQHOCOV: InterFreqCSThd2DRSCP=-100, InterFreqCSThd2FRSCP=-97, InterFreqR99PsThd2DRSCP=-100, InterFreqHThd2DRSCP=-100, InterFreqR99PsThd2FRSCP=-97, InterFreqHThd2FRSCP=-100, TargetFreqCsThdRscp=-97, TargetFreqR99PsThdRscp=-97, TargetFreqHThdRscp=-97;]
Figure 1 Comparison of handover parameters
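The comparison of the logged command with the default thresholds can be sketched as follows. The parser is a hypothetical helper, not an RNC or LMT tool. The defaults (2D: –95 dBm, 2F: –92 dBm) are the RSCP values quoted in the analysis above, and applying them to the CS, R99 PS, and H parameters alike is an assumption made for illustration.

```python
import re

DEFAULTS_DBM = {"2D": -95, "2F": -92}  # RSCP defaults quoted in this case

CMD = ("SET INTERFREQHOCOV: InterFreqCSThd2DRSCP=-100, InterFreqCSThd2FRSCP=-97, "
       "InterFreqR99PsThd2DRSCP=-100, InterFreqHThd2DRSCP=-100, "
       "InterFreqR99PsThd2FRSCP=-97, InterFreqHThd2FRSCP=-100, "
       "TargetFreqCsThdRscp=-97, TargetFreqR99PsThdRscp=-97, TargetFreqHThdRscp=-97;")


def parse_mml_params(command):
    """Return {name: int} for NAME=VALUE pairs; accepts '-' or an en dash as minus."""
    pairs = re.findall(r"(\w+)=([-\u2013]?\d+)", command)
    return {name: int(value.replace("\u2013", "-")) for name, value in pairs}


def deviations(params):
    """Return {name: delta_db} for 2D/2F RSCP thresholds that differ from the default."""
    out = {}
    for name, value in params.items():
        for event, default in DEFAULTS_DBM.items():
            if name.endswith(f"Thd{event}RSCP") and value != default:
                out[name] = value - default
    return out


for name, delta in deviations(parse_mml_params(CMD)).items():
    print(f"{name}: {delta:+d} dB relative to default")
```

Running such a comparison on the operation log after any KPI change could reveal threshold modifications that were not communicated.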
3. Conclusion
The real cause of the problem is as follows: the customer modified the inter-frequency hard handover thresholds without Huawei's prior consent, so the compression mode is enabled too late and inter-frequency hard handover is initiated too late. To restore the inter-frequency hard handover success rate and CS call drop rate to the level before the NodeB upgrade, it is recommended that the customer set the parameters back to the default values.
4.3 List of Problem Information
Checklist for KPI Troubleshooting-4.3 .xls
5 Problems Related to Call Drop (AMR/PS/VP/HSPA)
The call drop rate is a key indicator for assessing the network performance, and is also among the first concerns of the customer. If the call drop rate increases, the problem is usually urgent. In a broad sense, the call drop rate includes the call drop rate of the CN and the call drop rate of the UTRAN. You need to focus on the call drop rate at the UTRAN side. Chapter 5 focuses on the KPIs related to the call drops at the UTRAN side, but not the call drops caused by handover failure.
5.1 KPI Definition
Formulas on the call drop rate (cell-level indicator):
VS.PS.Call.Drop.Cell.Rate = ( + ) / ( + + )
VS.CS.AMR.Call.Drop.Cell.Rate = / ( + )
VS.CS.VP.Call.Drop.Cell.Rate = / ( + )
5.2 Influence Factors
A great diversity of factors may cause call drop in a radio network. This chapter focuses on handover-unrelated call drop.
1. Poor Coverage
For voice services, call drop may be caused by poor coverage when the CPICH EcIo is lower than –14 dB and the RSCP is lower than –100 dBm. Usually, poor coverage means that the RSCP is poor. Table 1 lists the requirements for the planned outdoor EcIo and Ec (the data comes from the network planning result of an operator and is for reference only).
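Table 1 Requirements for the EcIo and Ec threshold
| Service | Bit rate of service (kbit/s) | DL EbNo (dB) | EcIo threshold (dB) | Ec threshold (dBm) |
| --- | --- | --- | --- | --- |
| CS 12.2 | 12.2 | 8.7 | –13.3 | –103.1 |
| CS 64 | 64 | 5.9 | –11.9 | –97.8 |
| PS 64 | 64 | 5.1 | –12.7 | –98.1 |
| PS 128 | 128 | 4.5 | –13.3 | –95.3 |
| PS 384 | 384 | 4.6 | –10.4 | –90.6 |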
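The planning thresholds of Table 1 can be used as a lookup when checking drive test samples. The sketch below is an illustration only: the function name and sample values are hypothetical, and the Ec threshold is assumed to be an RSCP value in dBm.

```python
# Sketch only: planned thresholds from Table 1, keyed by service.
# Each entry is (EcIo threshold in dB, Ec threshold in dBm, assumed unit).
PLANNING = {
    "CS 12.2": (-13.3, -103.1),
    "CS 64": (-11.9, -97.8),
    "PS 64": (-12.7, -98.1),
    "PS 128": (-13.3, -95.3),
    "PS 384": (-10.4, -90.6),
}


def coverage_shortfall(service, ecio_db, ec_dbm):
    """Return the quantities that are below the planned threshold for the service."""
    ecio_thd, ec_thd = PLANNING[service]
    short = []
    if ecio_db < ecio_thd:
        short.append("EcIo")
    if ec_dbm < ec_thd:
        short.append("Ec")
    return short


# Hypothetical sample: EcIo -11 dB and Ec -88 dBm measured for a PS 384 service.
print(coverage_shortfall("PS 384", -11.0, -88.0))
```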
To determine whether the uplink or downlink coverage is poor, query the dedicated channel power of the uplink and downlink before the call drop.
You can basically determine that call drop is caused by poor uplink coverage under the following circumstances:
−
The uplink transmit power increases to the maximum value before call drop.
−
The uplink BLER is poor, or the single-subscriber tracing data recorded by the RNC shows that the NodeB reports the failure.
You can basically determine that call drop is caused by poor downlink coverage under the following circumstances:
−
The downlink transmit power increases to the maximum value before call drop.
−
The downlink BLER is poor.
If the uplink and downlink are balanced and there is no interference in either direction, the uplink and downlink transmit power are limited at the same time. In this case, you do not need to strictly determine which is limited first. In case of severe imbalance between the uplink and the downlink, you can preliminarily determine that interference exists in the limited direction.
To determine whether the problem is caused by poor coverage, the simplest method is to directly observe the traced measurement data. If both the RSCP and EcIo of the best cell are low, you can determine that the problem is caused by poor coverage.
Poor coverage has the following causes:
−
Sites are not enough.
−
Sectors are connected incorrectly.
−
Sites are switched off because of power amplifier faults.
In some indoor areas, excessive penetration loss can also cause poor coverage. Incorrect sector connection and site outage because of faults also occur in existing networks. In such cases, for example, the coverage of other cells is also poor at the point of call drop. You need to distinguish these causes from one another.
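The uplink/downlink judgment described above can be written as a small decision helper. This is a sketch under the assumption that each symptom has already been reduced to a yes/no observation from the traced data; the function name and return strings are hypothetical.

```python
def limited_direction(ul_tx_at_max, ul_bler_poor, dl_tx_at_max, dl_bler_poor):
    """Classify which direction is limited before the call drop."""
    uplink = ul_tx_at_max or ul_bler_poor
    downlink = dl_tx_at_max or dl_bler_poor
    if uplink and downlink:
        return "both: balanced links, likely coverage-limited"
    if uplink:
        return "uplink: poor uplink coverage or uplink interference"
    if downlink:
        return "downlink: poor downlink coverage or downlink interference"
    return "neither: check transmission and equipment"


# Hypothetical observation: only the downlink power reached its maximum.
print(limited_direction(False, False, True, True))
```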
2. Call Drop Caused by Interference
Both uplink interference and downlink interference cause call drop.
Normally, you can basically determine that the problem is caused by downlink interference if call drop occurs when the CPICH RSCP of the active set is higher than –85 dBm and the combined EcIo of the active set is lower than –13 dB. If handover is not initiated in time, it is also possible that the RSCP of the serving cell is good but its EcIo is poor, while both the RSCP and EcIo of the cells in the monitoring set are good. If the uplink RTWP exceeds the normal value (–107 dBm to –105 dBm) by 10 dB and the interference lasts longer than 2 to 3 seconds, call drop may occur. This problem must be given priority.
Usually, downlink interference refers to pilot pollution, that is, more than three cells satisfy the handover conditions in the coverage area. Signal fluctuation then causes frequent replacement of the active set or change of the best cell. When the overall quality of the active set is not good (the CPICH EcIo usually fluctuates around –10 dB), handover fails easily and causes SRB reset or TRB reset.
Uplink interference raises the uplink transmit power of the UE in connected mode. As a result, the excessive BLER causes SRB reset or TRB reset, or the call drops because synchronization is lost. Additionally, at the time of handover, the newly established link cannot be synchronized because of the uplink interference. Uplink interference comes from outside or inside the system. In most scenarios, it comes from outside the system.
Usually, the uplink and downlink are balanced if there is no interference, that is, both the uplink transmit power and downlink transmit power are close to their maximum values before call drop. In case of downlink interference, the uplink transmit power is low or the uplink BLER converges, but the downlink transmit power reaches its maximum value and the downlink BLER does not converge. In case of uplink interference, the analogous symptom appears in the opposite direction. You can use this method to analyze the actual problem.
3. Abnormal Transmission
The call drop rate may increase because of the following factors:
−
The transmission equipment is abnormal.
−
Packets are lost.
−
Delay jitter exists.
Given the all-IP trend of the networks, the IP-based networking of some sites cannot provide high QoS. As a result, bursts of packet loss occur and cause call drop. Usually, the QoS of an IP-based commercial network needs to meet the requirements in Table 1. Otherwise, the call drop rate may increase or the HSPA service rate may be low because of the poor transmission quality. A simple way to measure the transmission quality is to enable IPPM measurement on the RNC LMT. For details, refer to the RAN10 Transmission Troubleshooting Guide.
Table 1 Requirements of IP-based networking for the transmission quality
4. Equipment (Including the UE) Anomaly
After the preceding causes are excluded, you need to suspect that the equipment is abnormal. For example, the UE is abnormal, or there is a compatibility problem. Therefore, query the logs and alarms of the equipment to further analyze the cause of call drop.
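The coverage and interference rules of thumb in this section can be combined into a first-pass classifier. This is a sketch, not a diagnostic tool: the function name is hypothetical, the limits are the values quoted in this section, and two interpretations are assumptions (the RTWP limit is taken as the upper end of the normal range plus 10 dB, and the duration limit as 3 seconds).

```python
# Sketch only: first-pass classification of a call drop from traced measurements.
RTWP_NORMAL_MAX_DBM = -105.0   # upper end of the normal range quoted above
RTWP_RISE_DB = 10.0
RTWP_MIN_DURATION_S = 3.0      # assumed: upper end of "2 to 3 seconds"


def classify_call_drop(rscp_dbm, ecio_db, rtwp_dbm=None, rtwp_duration_s=0.0):
    """Return the list of suspected causes for one call drop."""
    causes = []
    if rscp_dbm < -100.0 and ecio_db < -14.0:
        causes.append("poor coverage")
    if rscp_dbm > -85.0 and ecio_db < -13.0:
        causes.append("downlink interference")
    if (rtwp_dbm is not None
            and rtwp_dbm >= RTWP_NORMAL_MAX_DBM + RTWP_RISE_DB
            and rtwp_duration_s > RTWP_MIN_DURATION_S):
        causes.append("uplink interference")
    return causes or ["check transmission and equipment"]


# Hypothetical samples.
print(classify_call_drop(-105.0, -16.0))
print(classify_call_drop(-80.0, -15.0))
print(classify_call_drop(-90.0, -8.0, rtwp_dbm=-93.0, rtwp_duration_s=5.0))
```

When none of the radio conditions match, the sketch falls back to the transmission and equipment checks described in items 3 and 4.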
5.3 Analysis Process
1. Discussing the Problem, Ascertaining the Problem Background and Product Version, and Ruling Out the Possibility of Known Bugs
First ask the field personnel to report the related information and symptoms of the problem, and then obtain the information about the known bugs of the corresponding version (by asking the contact persons of the RNC and NodeB or referring to the Release Notes). In this way, you can determine whether the problem is caused by a known version problem, for example, whether abnormal call drop is caused by memory leakage in a version. Determine the time at which the call drop rate changed suddenly, analyze whether the problem is caused by network adjustment (for example, adding or relocating sites, or upgrading the version), and focus on the impacts of the network adjustment. Previous experience shows the following points:
−
For a newly built commercial network, call drop is mostly caused because some neighboring cells are not configured or the coverage quality is poor.
−
For a relocated network, call drop is mostly caused because some neighboring cells are not configured or the interoperability of the Iur interface of the relocated network is poor.
−
For an upgraded network, call drop is mostly caused because the new version (including the new hardware and new functions) has some defects.
−
For a stable commercial network, a sudden increase in the call drop rate is improbable. If it does occur, a soft failure may have occurred in the equipment DSP, or the transmission may be abnormal on a large scale.
Therefore, the latest actions performed on the network are critical information.
2. Narrowing the Scope, Analyzing Whether the High Call Drop Rate Is Caused by One or Two Cells, and Analyzing the Main Count Distribution of Call Drop
If the preceding factors are excluded, first analyze the performance data. Firstly, analyze the change in the call drop rate and call drop count of the cells to judge whether the problem is caused by the performance degradation of one or two cells. Secondly, analyze the trend of the call drop rate and call drop count of the top N cells and compare it with the trend of the entire network to judge whether the top N cells are representative. If the top N cells are representative of the entire network, focus the analysis on the top N cells. Then, determine the main reasons why the call drop rate increases (or is not up to standard). The following describes the main reasons based on the traffic measurement counters, taking the CS and PS services as examples.
Table 1 lists the counters for the causes of CS call drop:
Table 1 Indicators related to CS call drop
| Indicator (Level 1) | Sub-indicator (Level 2) |
| --- | --- |
| VS.RAB.Loss.CS.RF | VS.RAB.Loss.CS.RF.RLCRst |
| | VS.RAB.Loss.CS.RF.ULSync |
| | VS.RAB.Loss.CS.RF.UuNoReply |
| | VS.RAB.Loss.CS.RF.Oth |
| VS.RAB.Loss.CS.Abnorm | VS.RAB.RelReqCS.OM |
| | VS.RAB.RelReqCS.UTRANgen |
| | VS.RAB.RelReqCS.RABPreempt |
| | VS.RAB.Loss.CS.Aal2Loss |
| | VS.RAB.Loss.CS.Congstion.CELL |
| | VS.Call.Drop.CS.Other |
For the CS service, the common causes of call drop are as follows: −
VS.RAB.Loss.CS.RF: Abnormal release because the link loses synchronization. The coverage quality is poor (for example, the signal quality of the current cell is poor, some neighboring cells are not configured, or the handover area is small), so the UE switches off the transmitter abnormally or the uplink demodulation loses synchronization. To solve the problem, you need to improve the coverage quality. This cause is frequently encountered in newly built or relocated networks.
−
VS.RAB.Loss.CS.RF.RLCRst: Link release because of the downlink SRB reset. The coverage quality is poor (for example, some neighboring cells are not configured, or the handover area is small). To solve the problem, you need to improve the coverage quality. In an initial network, the cause is frequently encountered.
−
VS.RAB.Loss.CS.RF.UuNoReply: The number of RABs released by the RNC because of the cause Failure in the Radio Interface Procedure. The failure is usually caused by imbalance between the uplink coverage and downlink coverage or by fast signal change. You need to trace the IOS data or query the CHR log to further analyze the cause of Uu NoReply.
−
VS.RAB.Loss.CS.Aal2Loss: The RNC initiates abnormal release after finding that the AAL2 Path on the IU CS interface is abnormal. The scenario is seldom encountered in practice. The corresponding alarm information is generated under the scenario. It is possible that the transmission equipment is faulty or the RNC version has some defects.
−
VS.Call.Drop.CS.Other: Call drop because of other causes. Many causes of call drop (for example, flow interaction timeout or cell update failure) are not counted separately and are counted into OTHER. In practice, call drops caused by flow interaction timeout and cell update failure account for a high proportion, so a large share of call drops falls under OTHER. You need to further analyze the causes through the CHR log.
−
VS.RAB.RelReqCS.OM: The CS link is released because of operation and maintenance work (for example, the cell is blocked). Call drop because of this cause is normal.
−
VS.RAB.RelReqCS.UTRANgen: Number of RABs of the CS domain to be released in the cell for the UTRAN Generated Reason. In practice, the scenario is seldom encountered.
−
VS.RAB.RelReqCS.RABPreempt: The CS link is released because of high-priority preemption. Such call drop occurs when the load is high and resources are insufficient. You need to determine whether to expand the capacity according to the call drop count.
Table 2 lists the counters for the causes of PS call drop:
Table 2 Indicators related to PS call drop
| Indicator (Level 1) | Sub-indicator (Level 2) | Sub-indicator (Level 3) |
| --- | --- | --- |
| VS.RAB.Loss.PS.Abnorm | VS.RAB.RelReqPS.OM | |
| | VS.RAB.RelReqPS.RABPreempt | |
| | VS.RAB.Loss.PS.GTPULoss | |
| | VS.RAB.Loss.PS.Congstion.CELL | |
| | VS.Call.Drop.PS.Other | |
| VS.RAB.Loss.PS.RF | VS.RAB.Loss.PS.RF.RLCRst | VS.RAB.Loss.PS.SRBReset |
| | | VS.RAB.Loss.PS.TRBReset |
| | VS.RAB.Loss.PS.RF.ULSync | |
| | VS.RAB.Loss.PS.RF.UuNoReply | |
| | VS.RAB.Loss.PS.RF.Oth | |
In terms of the count values, the PS service is similar to the CS service. The differences between them are as follows:
−
Their CN interfaces are not the same.
−
The PS service involves the TRB reset.
The following analyzes the causes of call drop:
−
VS.RAB.Loss.PS.RF: Abnormal release because the link loses synchronization. The coverage quality is poor (for example, some neighboring cells are not configured, or the handover area is small), so the UE switches off its transmitter abnormally or the uplink demodulation loses synchronization. To solve the problem, improve the coverage quality. In an initial network, call drops because of this cause occur frequently.
−
VS.RAB.Loss.PS.SRBReset: Link release because of the downlink SRB reset. The coverage quality is poor (for example, some neighboring cells are not configured, or the handover area is small). To solve the problem, you need to improve the coverage quality. In an initial network, the call drop because of this cause occurs frequently.
−
VS.RAB.Loss.PS.TRBReset: Link release because of the downlink TRB reset. The coverage quality is poor (for example, some neighboring cells are not configured, or the handover area is small). To solve the problem, you need to improve the coverage quality. In an initial network, the cause is frequently encountered.
−
VS.RAB.Loss.PS.GTPULoss: The RNC initiates abnormal release after finding that the GTPU on the IU PS interface is abnormal. In practice, the scenario is seldom encountered. It is usually caused by equipment faults or defects.
−
PS_RAB_DROP_OTHER: Call drop because of other causes. Many causes of call drop (for example, a procedure interaction timeout or a cell update failure) are not counted separately and are counted under OTHER. In practice, call drops caused by procedure interaction timeout and cell update failure account for a high proportion, so OTHER often makes up a large share of call drops. Analyze these causes further through the CHR log.
Generally, the main causes of call drop are RLC Reset, UU Noreply, and Other. Such causes usually result from poor coverage or product defects. To analyze them further, query the CHR log and trace the IOS or IFTS data.
3. Locking the Problem Scenarios, and Determining the Main Scenarios Where the Call Drop Rate Goes Up or Is Not Up to Standard
First, determine the main causes of call drop through the performance data. If the performance data alone is not enough to determine the causes, analyze the IOS data and CHR log as follows:
−
Filter out the logs about the main causes of call drop among the PCHR logs, and analyze the signal quality of the call drop active sets of all cells (or top N cells), and thus judge whether call drop is caused by poor quality.
−
Through the PCHR logs, analyze the subscribers who undergo call drop because of the main causes in all cells or the top N cells before and after the call drop rate is changed, and judge whether one or two subscribers or the performance of the UE of a specific brand affects the call drop rate. If yes, you can enable CDT tracing or IOT test to further analyze the compatibility.
−
Enable the IOS tracing of top N cells, obtain the signaling about the main causes, and determine the main scenarios of call drop from the signaling (check whether the call drop is related to a specific flow, for example, softer handover and DRD). In addition, analyze the fundamental cause of call drop by querying the CHR logs generated in the corresponding time segment.
−
If you cannot determine the fundamental cause only through the IOS data, you can consider enabling IFTS tracing (with the L2 user plane; you can deeply analyze the call drop caused by SRB or TRB reset) after finding the main problem scenarios.
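The PCHR checks in the steps above come down to grouping call drop records by cell and by subscriber. The following Python sketch illustrates the idea. It assumes the PCHR logs have already been exported to records with the hypothetical fields `cell_id`, `imsi`, and `cause`; the real PCHR format is not described in this guide, and the sample records are illustrative only.

```python
from collections import Counter

def top_n(records, key, cause, n=3):
    """Count call drops with the given cause per `key` ('cell_id' or 'imsi')
    and return the n most frequent values with their counts."""
    counts = Counter(r[key] for r in records if r["cause"] == cause)
    return counts.most_common(n)

def subscriber_share(records, cause, n=2):
    """Fraction of drops with the given cause that come from the n worst subscribers."""
    matching = [r for r in records if r["cause"] == cause]
    if not matching:
        return 0.0
    worst = Counter(r["imsi"] for r in matching).most_common(n)
    return sum(count for _, count in worst) / len(matching)

# Illustrative records only; field names and cause labels are assumptions.
records = [
    {"cell_id": 101, "imsi": "A", "cause": "SRB_RESET"},
    {"cell_id": 101, "imsi": "A", "cause": "SRB_RESET"},
    {"cell_id": 102, "imsi": "A", "cause": "SRB_RESET"},
    {"cell_id": 103, "imsi": "B", "cause": "UU_NO_REPLY"},
]
print(top_n(records, "cell_id", "SRB_RESET"))
print(subscriber_share(records, "SRB_RESET", n=1))
```

A share close to 1 for one or two subscribers suggests a UE-specific problem rather than a cell or network problem.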
If you determine that the main cause lies in the transmission network layer, focus on the transmission quality. Check the alarm information for transmission-related alarms. If the problem is caused by the transmission quality, transfer the problem directly to the NodeB team (transmission team).
4. Conducting Drive Test on Site, and Analyzing the Causes Deeply
If you still cannot determine the causes through the CHR log, performance data, and IFTS data, conduct a drive test on site, reproduce the call drop signaling under the main scenarios, and obtain the logs of the UE. The UE logs are especially needed if no response is received on the air interface. The drive test should be well targeted: determine the main scenarios or top N cells of call drop first, so that the drive test is likely to reproduce the problem and capture the main failure signaling. Enable NodeB CDT tracing and RNC CDT tracing during the drive test, and analyze their signaling information.
5.4 Cases of Call Drop
In November 2008, the StarHub site in Singapore was in the Beta phase of RAN10. The field personnel reported that the PS call drop rate went up sharply after the EBBC cards of most sites in the existing network were activated on November 18th. As shown below, RNC402 serves as a typical case. Before the EBBC cards were activated, the PS call drop rate remained at about 0.3%. After the EBBC cards were activated, the PS call drop rate rose sharply to 1.2%.
By querying the network actions performed before and after the problem occurs, you can preliminarily determine that the problem is related to the activation of the EBBC cards. In addition, the Beta version is new, so no similar known problems can be found. The analysis of the performance data shows that no obvious top N cells are available. Therefore, the problem is an entire-network problem. You can find that call drop is mostly caused by SRB reset and TRB reset, which account for 94%.
The IOS data and IFTS data traced on site show that abnormal call drop mainly occurs under the following two scenarios:
Scenario 1: After the new measurement control message is delivered upon completion of soft handover, the L2 of the RNC does not receive the L2 ACK message sent by the UE, which causes an SRB reset. The SRB resets in this scenario account for 70% of total SRB resets.
Scenario 2: After the active set update (ASU) message is delivered upon completion of soft handover, the L2 of the RNC does not receive the L2 ACK message sent by the UE, which causes an SRB reset. The SRB resets in this scenario account for 30% of total SRB resets.
Why is the L2 ACK message not received after the RNC delivers the Meas Ctrl or ASU message? For the SRB reset, you cannot determine the fundamental cause only through the data at the RNC side. Even if you can obtain the user-plane data through IFTS tracing, you can find only the following symptom: After the downlink delivers the Meas Ctrl or ASU message, the L2 ACK message is not received; therefore, the last PDU with the Poll is retransmitted repeatedly and subsequently, the RESET PDU is retransmitted repeatedly. However, you cannot determine which of the following occurs:
The downlink data cannot reach the UE.
The UE returns an acknowledgement on the uplink, but the acknowledgement cannot reach the RNC.
Therefore, the field personnel conducted a drive test and captured the CDT data and Probe data (tracing the L1 and L2 data). After the Probe data is converted into QCAT data, you can see that the serving grant (SG) authorized for the uplink HSUPA of the UE is extremely small. As a result, although the RLC layer of the UE returns the L2 ACK message for the data of the RNC, the physical layer does not have enough grant to send it. As shown in the following figure, the authorized SG of the HSUPA at the UE side is lowered step by step from 8 to 7 and finally to 4 because the SG DOWN message is received repeatedly.
As you know, the authorized SG should be at least 8 if the HSUPA uplink sends a 336-bit PDU (336 bits + 18-bit TB header = 354 bits). If the authorized SG is lower than 8, the data cannot be sent through the uplink. Finally, the data is retransmitted through the downlink until the maximum count is reached, that is, the reset is initiated.
TB Index | TB Size | MAC-e Data Rate (kbps) | After Coding | RLC PDU Num | RLC Data Rate (kbps) | SF | Eqv Ch Num | BtEd/BtC | Ref ETPR | SG LUPR
0 | 18 | 1.8 | 138 | 0 | 0 | 256 | 1 | 0.2199707 | 0.0484 | 0
1 | 186 | 18.6 | 642 | 0 | 0 | 32 | 1 | 0.7071068 | 0.5 | 5
2 | 204 | 20.4 | 696 | 0 | 0 | 32 | 1 | 0.7405316 | 0.5484 | 6
3 | 354 | 35.4 | 1146 | 1 | 32 | 32 | 1 | 0.9755065 | 0.9516 | 8
4 | 372 | 37.2 | 1200 | 1 | 32 | 32 | 1 | 1 | 1 | 8
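The relationship between the PDU size and the required SG can be checked with a small lookup over the table above. The following Python sketch is only an illustration: it reads the SG column as 0, 5, 6, 8, 8 for TB indexes 0 to 4, which is an interpretation of the extracted table, and the function names are hypothetical.

```python
# (tb_index, tb_size_bits, sg) as read from the table above (interpretation of the extract).
E_TFC_TABLE = [
    (0, 18, 0),
    (1, 186, 5),
    (2, 204, 6),
    (3, 354, 8),
    (4, 372, 8),
]
TB_HEADER_BITS = 18  # TB header size quoted in the text

def min_grant_for_pdu(pdu_bits, table=E_TFC_TABLE, header_bits=TB_HEADER_BITS):
    """Return the SG of the smallest TB that can carry one PDU plus header, or None."""
    needed = pdu_bits + header_bits
    for _, tb_size, sg in sorted(table, key=lambda row: row[1]):
        if tb_size >= needed:
            return sg
    return None

def can_send(pdu_bits, granted_sg):
    """True if the granted SG is high enough to send one PDU of the given size."""
    required = min_grant_for_pdu(pdu_bits)
    return required is not None and granted_sg >= required

print(min_grant_for_pdu(336))  # a 336-bit PDU needs the 354-bit TB
print(can_send(336, 4))        # the SG of 4 seen in the case
```

With the SG lowered to 4, the 336-bit PDU that carries the L2 ACK cannot be sent, which matches the symptom observed on site.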
One question remains. To prevent the SG from dropping without limit and causing SRB or TRB resets because packets cannot be sent through the uplink, the SG is set to 8, and the authorized SG should not be lowered because of insufficient resources. Why, then, is the authorized SG lowered continuously after it is set to 8? This problem is a defect of the product implementation, so the R&D personnel need to participate in the analysis. Finally, the NodeB development personnel found the bug. When the dynamic CEs are activated and the SRB over HSPA switch is enabled, the following problem occurs in some weak-signal areas: if there is CE congestion and the uplink E-DPDCH handled by the decoding DSP has NACK information, the downlink DSP delivers the RG Down command by mistake and the SG of the UE is lowered to 4. The fix for this NodeB defect has been incorporated into RAN10 B053.
5.5 List of Problem Information
Checklist for KPI Troubleshooting-5.5.xls
6 Inter-RAT Interoperability
Inter-RAT interoperability involves many different NEs, and failures are mainly caused by incorrect or inconsistent parameter settings between NEs. When analyzing such problems, communicate fully with the customer, the core network personnel, and the GSM personnel to obtain the related information and avoid wasted effort.
6.1 Inter-RAT Handover from WCDMA to GSM (CS Domain)
6.1.1 KPI Definition
Definition of the RNC-level indicators:
VS.SRELOC.SuccPrep.IRHOCS.Rate = VS.SRELOC.SuccPrep.IRHOCS / VS.SRELOC.AttPrep.IRHOCS
VS.IRATHO.SuccCSOut.RNC.Rate = VS.IRATHO.SuccCSOut.RNC / VS.IRATHO.AttCSOut.RNC
Definition of the cell-level indicators:
VS.IRATHO.SuccRelocPrepOutCS.Cell.Rate = /
VS.IRATHO.SuccOutCS.Cell.Rate = /
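The RNC-level indicators above are plain ratios of a success counter to an attempt counter. As a minimal sketch, the following Python code computes both rates from a dictionary of counter values. The sample values are made up for illustration, and the function names are hypothetical.

```python
def success_rate(successes, attempts):
    """Return successes / attempts, or None when there were no attempts."""
    if attempts == 0:
        return None
    return successes / attempts

def cs_irat_out_rates(counters):
    """Compute the two RNC-level CS inter-RAT handover-out rates defined above."""
    return {
        "VS.SRELOC.SuccPrep.IRHOCS.Rate": success_rate(
            counters["VS.SRELOC.SuccPrep.IRHOCS"],
            counters["VS.SRELOC.AttPrep.IRHOCS"],
        ),
        "VS.IRATHO.SuccCSOut.RNC.Rate": success_rate(
            counters["VS.IRATHO.SuccCSOut.RNC"],
            counters["VS.IRATHO.AttCSOut.RNC"],
        ),
    }

# Made-up counter values for one measurement period.
sample = {
    "VS.SRELOC.SuccPrep.IRHOCS": 950,
    "VS.SRELOC.AttPrep.IRHOCS": 1000,
    "VS.IRATHO.SuccCSOut.RNC": 900,
    "VS.IRATHO.AttCSOut.RNC": 950,
}
rates = cs_irat_out_rates(sample)
print(rates)
```

Comparing the two rates shows whether the loss is mainly in relocation preparation or in handover implementation.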
6.1.2 Influence Factors
Figure 1 Flow on CS inter-RAT handover out of 3G
The handover process includes the following two processes:
Relocation preparation process: The SRNC sends the RELOCATION REQUIRED message to the CN. The message contains such information as the relocation type, relocation reason, source PLMN, source LAC, source SAC, destination PLMN, and destination LAC. The CN interacts with the GSM network by forwarding the request to the GSM MSC, and the related resources are prepared. After the GSM-related resources are prepared, the CN sends the RELOCATION COMMAND message to the SRNC. The message contains the layer 3 information element, which carries the information about the resources allocated by the GSM. If the allocation of all or some resources fails, the CN sends the RELOCATION PREPARATION FAILURE message to the SRNC.
Handover implementation process: The RNC delivers the HANDOVER FROM UTRAN COMMAND message to the UE. The message carries the RAB ID, activation time, GSM frequency, and the GSM message in the form of a bit string. After the UE accesses the GSM, the CN sends the IU RELEASE COMMAND message, instructing the RNC to release the resources of the UE in the WCDMA system.
Relocation preparation failure is mainly caused by the following:
The 2G equipment is abnormal or resources are not enough.
The CN parameters are not configured reasonably.
The configurations of GSM neighboring cells are not consistent with actual parameters.
Handover implementation failure is mainly caused by the following:
The parameters of 2G neighboring cells are not configured correctly.
The 2G encryption algorithm is not consistent with the 3G encryption algorithm.
There exists side-channel interference in 2G cells.
The handover threshold is not set reasonably.
6.1.3 Analysis Process
1. Discussing the Problem and Ascertaining the Problem Background and Product Version
When the problem occurs, determine the key time at which the success rate changed, and find out the recent adjustments of the 2G access network, 3G access network, and CN. Analyze the impact on the KPIs of the key actions performed at that time.
2. Determining the Main Scenarios
First, measure the relocation preparation success rate and the handover implementation success rate at the RNC level and cell level according to the performance data of the RNC. Determine which procedure causes the inter-RAT handover-out success rate to decrease, and check whether the success rate decreases across the entire network or only in some cells. If the problem occurs only in one or two cells, it is likely related to the configuration of the GSM neighboring cells. Second, analyze which cause leads to the decrease of the inter-RAT handover-out success rate. Table 1 lists the failure causes defined by the performance counters.
Table 1 Indicators related to CS inter-RAT handover-out failure
Indicator (Level1): VS.SRELOC.FailPrep.IRATCSOut (Relocation preparation failure)
Sub-indicator (Level2): VS.SRELOC.Fail.IRATCSOutNRpl, VS.SRELOC.Fail.IRATCSOutCanc, VS.SRELOC.Fail.IRATCSOutTexp, VS.SRELOC.Fail.IRATCSOutTfai, VS.SRELOC.Fail.IRATCSOutTOve, VS.IRATHO.PrepFailCSOut.UkwnRNC, VS.IRATHO.PrepFailCSOut.NoRsrc, VS.IRATHO.PrepFaiCSInTgtOveL, VS.IRATHO.PrepFailCSOutReqinfnotavai
Indicator (Level1): VS.IRATHO.FailCSOut.RNC (Handover implementation failure)
Sub-indicator (Level2): VS.IRATHO.FailCSOut.CfgUnRNC, VS.IRATHO.FailCSOut.PhyFaRNC
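To decide which failure cause to analyze first, the sub-indicators in the table can be ranked by their share of all failures. The following Python sketch shows one way to do this. The counter values are invented for illustration, and the function name is hypothetical.

```python
def cause_shares(failure_counts):
    """Rank failure counters by count and return (name, share of total) pairs."""
    total = sum(failure_counts.values())
    if total == 0:
        return []
    ranked = sorted(failure_counts.items(), key=lambda item: item[1], reverse=True)
    return [(name, count / total) for name, count in ranked]

# Invented values for three of the relocation preparation failure counters.
prep_failures = {
    "VS.SRELOC.Fail.IRATCSOutTexp": 10,
    "VS.IRATHO.PrepFailCSOut.UkwnRNC": 60,
    "VS.IRATHO.PrepFailCSOut.NoRsrc": 30,
}
for name, share in cause_shares(prep_failures):
    print(f"{name}: {share:.0%}")
```

The cause with the largest share is the natural starting point for the case-by-case analysis that follows.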
3. Analyzing the Causes Case by Case
VS.SRELOC.Fail.IRATCSOutNRpl / VS.SRELOC.Fail.IRATCSOutTexp
After sending the RELOCATION REQUIRED message, the SRNC starts a timer to wait for the RELOCATION COMMAND message. If the RELOCATION COMMAND message is not received before the timer expires, the SRNC sends the RELOCATION CANCEL message and increments the counter.
−
Check whether the RNC links and MSC links are normal.
−
Check the CN configuration, especially the transmission parameters of the 2G MSC/VLR, for example, the data of the MTP layer, data of the SCCP layer, and inter-MSC trunk data.
−
Query the CN configuration, and check whether inter-RAT handover is allowed.
−
Trace and analyze the MSC/BSS signaling. Ask the CN personnel and 2G personnel to attend the analysis.
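The timer behavior behind VS.SRELOC.Fail.IRATCSOutNRpl and VS.SRELOC.Fail.IRATCSOutTexp can be sketched as a small event loop. This is a simplified model, not the RNC implementation: the timer value, the event list format, and the returned strings are assumptions made for illustration.

```python
def relocation_preparation(events, timer_s):
    """Model the SRNC after it sends RELOCATION REQUIRED at t = 0.

    events: list of (time_s, message) tuples received from the CN.
    timer_s: hypothetical value of the relocation preparation timer.
    """
    for time_s, message in sorted(events):
        if time_s > timer_s:
            break  # anything after expiry is too late
        if message == "RELOCATION COMMAND":
            return "proceed with handover"
        if message == "RELOCATION PREPARATION FAILURE":
            return "preparation failed"
    return "timeout: send RELOCATION CANCEL"

print(relocation_preparation([(2.0, "RELOCATION COMMAND")], timer_s=10.0))
print(relocation_preparation([], timer_s=10.0))
```

A reply that arrives after the timer expires is counted as a timeout, which is why slow or misrouted CN signaling shows up in these counters.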
VS.SRELOC.Fail.IRATCSOutCanc
After requesting the handover preparation, the RNC receives the release command sent by the CN. The usual causes are as follows:
−
The inter-RAT handover request is initiated during a signaling procedure (for example, location update). The location update is complete before the handover procedure is complete, so the CN initiates the release.
−
The subscriber who sets up the call hangs up during the handover preparation, so the CN initiates the release.
Although the handover is not complete, both cases are normal overlaps of procedures.
VS.SRELOC.Fail.IRATCSOutTfai
The relocation fails in the target CN/RNC or in the system. The usual causes are as follows:
−
The CN configurations are not correct.
−
The BSS does not support the relocation.
: −
Check the CN negotiation data.
−
Check whether the BSS allows inter-RAT handover-in.
−
Check whether the configurations of GSM neighboring cells are consistent with the actual parameters. The BTS may fail to find the target cell. If the problem occurs only to one or two cells, you can trace the IOS data, determine whether the relocation failure occurs only to one or two target cells, and check the parameters of the GSM neighboring cells of the target cells.
Figure 1 Relocation Required message
The message carries the address information about the BSC of the access network that expects to provide services for the subscriber. Usually, the address is the CGI (global cell id=PLMN + LAC + CELL ID) of the target cell. The message also carries the information about the current cell, that is, PLMN + (LAC and SAC).
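Checking whether the GSM neighboring cells configured on the RNC match the actual GSM cells is a comparison keyed on the CGI. The following Python sketch illustrates the check. The record layout, field names, and sample values are assumptions for illustration and do not reflect an RNC or BSC export format.

```python
from typing import NamedTuple

class GsmCell(NamedTuple):
    plmn: str        # MCC + MNC
    lac: int
    cell_id: int
    bcch_arfcn: int

def cgi(cell):
    """Build the CGI key (PLMN + LAC + CELL ID) used to match cells."""
    return f"{cell.plmn}-{cell.lac}-{cell.cell_id}"

def find_mismatches(rnc_neighbors, gsm_cells):
    """Return (CGI, problem) pairs for neighbors that do not match the GSM side."""
    actual = {cgi(cell): cell for cell in gsm_cells}
    problems = []
    for neighbor in rnc_neighbors:
        real = actual.get(cgi(neighbor))
        if real is None:
            problems.append((cgi(neighbor), "CGI not found on the GSM side"))
        elif real.bcch_arfcn != neighbor.bcch_arfcn:
            problems.append((cgi(neighbor), "BCCH ARFCN differs"))
    return problems

# Illustrative data only.
gsm_side = [GsmCell("52505", 100, 1, 45), GsmCell("52505", 100, 2, 60)]
rnc_side = [GsmCell("52505", 100, 1, 45), GsmCell("52505", 100, 2, 61),
            GsmCell("52505", 101, 3, 70)]
print(find_mismatches(rnc_side, gsm_side))
```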
VS.IRATHO.PrepFailCSOut.UkwnRNC
The target RNC is unknown. This is the main cause of relocation failure. Usually, the reason is that the MSC cannot find the route to the 2G cells.
:
−
Check the CN configuration. It is possible that the LAI of the 2G target cell is not configured on the MSC.
−
Check the consistency of the parameters of GSM neighboring cells configured on the RNC.
VS.IRATHO.PrepFailCSOut.NoRsrc
No resources are available. Usually, the BSC has no resources available for the access of the UE, or the 2G MSC has no information about the target cells.
:
−
Check the resource utilization of the 2G BSS. It is possible that no channel is available because the channel is occupied by another subscriber.
−
Check the status of the target cell. The target cell may be faulty.
−
Check the mapping between the target cell and 2G MSC on the 3G MSC.
VS.IRATHO.FailCSOut.CfgUnRNC
The handover is not supported by the configuration. Usually, the UE does not accept the HANDOVER FROM UTRAN COMMAND message delivered by the RNC because of an incorrect RNC signaling format, incompatibility of the UE, or incorrect configuration of the encryption parameters.
:
The encryption parameters are not set correctly −
Trace the IOS data of the top N cells, and query the encryption algorithm.
Figure 2 Relocation Command message
Check whether the parameter of the encryption algorithm on the BSC is consistent with that carried by the Relocation Command message. In the 3G system, the encryption process is required. In the 2G system, the encryption process is optional. Therefore, the 2G system can optionally send an encryption-related parameter when the UE is handed over from the UMTS to the GSM. If the 2G system does not send an encryption-related parameter, the MSOFTX3000 uses the default handover configuration to reestablish a Cipher Mode Setting parameter and sends the parameter to the RNC through the Relocation Command message. If the 2G system sends an encryption parameter carrying the chosen encryption algorithm, the MSOFTX3000 uses the chosen encryption algorithm to establish the Cipher Mode Setting parameter and sends the parameter to the RNC through the Relocation Command message.
−
If they are not consistent, further trace the CN signaling and query the encryption parameter received by the MSC.
Figure 3 Handover Request ACK message
: Modify the handover parameter configuration of the 2G LAC on the MSOFTX3000, so that the encryption parameter carried by the CN to the RNC is consistent with the encryption parameter used by the 2G system.
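The selection rule described for the MSOFTX3000 (use the algorithm chosen by the 2G system if one is sent, otherwise fall back to the default handover configuration of the 2G LAC) can be written as a short check. The following Python sketch is a simplified model. The function names are hypothetical, and the algorithm labels are example values only.

```python
def cipher_for_relocation_command(chosen_by_2g, default_for_lac):
    """Algorithm the MSC places in the Cipher Mode Setting parameter.

    chosen_by_2g: algorithm sent by the 2G system, or None if it sent none.
    default_for_lac: default handover configuration of the 2G LAC on the MSC.
    """
    return chosen_by_2g if chosen_by_2g is not None else default_for_lac

def encryption_consistent(algorithm_on_bsc, chosen_by_2g, default_for_lac):
    """True if the algorithm sent to the RNC matches the one used on the BSC."""
    sent_to_rnc = cipher_for_relocation_command(chosen_by_2g, default_for_lac)
    return sent_to_rnc == algorithm_on_bsc

# The BSC uses A5/1 but sends no chosen algorithm; the LAC default is A5/0.
print(encryption_consistent("A5/1", None, "A5/0"))
# After the LAC default on the MSC is changed to A5/1.
print(encryption_consistent("A5/1", None, "A5/1"))
```

The first call models the failure case, and the second models the state after the handover parameter of the 2G LAC is corrected.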
UE compatibility −
Trace the IOS data of the top N cells, obtain the failure flow, and analyze whether there exists a typical scenario, for example, some flow interactions cause the UE to return the message of Unsupported Configuration.
−
Obtain the IMSIs of the terminals through the CHR or PCHR log. If the problem mainly occurs in one or two terminals, it indicates that the problem is caused by the UE. Then, inquire the customer about the corresponding IMEI of the IMSI, and query the type of the failed terminal.
−
If conditions permit, verify the problem in the HQ. Alternatively, ask the field personnel to conduct drive test.
Incorrect RNC signaling format −
Trace the IOS data of the top N cells, and capture the failed cells.
−
Compare the HO_FROM_UTRAN_CMD_GSM generated at the failure time with the signaling generated at the time of normal handover. A usual problem is as follows: The handover command does not carry the encryption indication. If this problem occurs, you need to modify the handover parameter configuration of the 2G LAC on the MSC.
The ETSI GSM PHASE I protocol has a defect: The handover command does not carry the encryption information. The ETSI GSM PHASE II protocol has rectified the defect. However, the GSM devices of lots of vendors have not rectified the defect in accordance with the ETSI GSM PHASE II protocol. If the CN does not reestablish the encryption for the RNC, a format error occurs.
: If the 2G BSC does not send the Chosen Encryption Algorithm parameter, configure the handover parameter of the 2G LAC on the MSOFTX3000.
For other problems, directly collect the related information and feed back the information to the R&D department for analysis.
VS.IRATHO.FailCSOut.PhyFaRNC Inter-RAT handover implementation failure is mainly caused as follows: 1) 2) 3)
After receiving the Handover From UTRAN command, the UE attempts to access the system on the BTS. The UE repeatedly sends the Handover Access message to the BTS through the FACCH, starts the T3124 timer (the default is 320 ms), and stops sending the message when it receives the PHY INFO message. If the timer expires, the UE returns to the old UTRAN channel and reports a physical channel failure.
: −
Check the parameter configuration of the GSM neighboring cells. For example, if the BCCHARFCN is not configured correctly, the cell in the measurement report that reaches the handover threshold is not the actual cell accessed by the UE. As a result, the signal quality of the actual handover cell does not satisfy the handover requirements and thus the handover fails.
−
Check whether the unreasonable setting of the handover threshold causes the easy handover but poor signal quality of the 2G cell.
−
Check whether the handover failure is caused because the encryption algorithms are not consistent.
−
If you still cannot solve the problem, ask the 2G personnel to attend the analysis.
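The T3124 behavior described for VS.IRATHO.FailCSOut.PhyFaRNC can be expressed as a simple check. The following Python sketch is a simplified model of the UE side and is an illustration only. The 320 ms default is taken from the text; the function name and return strings are hypothetical.

```python
T3124_MS = 320  # default T3124 value quoted in the text

def handover_access_result(phy_info_at_ms, t3124_ms=T3124_MS):
    """Outcome of the GSM access attempt after Handover From UTRAN.

    phy_info_at_ms: time at which PHY INFO arrives after the first
    Handover Access message, or None if it never arrives.
    """
    if phy_info_at_ms is not None and phy_info_at_ms < t3124_ms:
        return "access succeeded"
    return "return to UTRAN: physical channel failure"

print(handover_access_result(120))
print(handover_access_result(None))
```

Anything that delays or prevents PHY INFO, such as access to the wrong cell or poor 2G signal quality, ends in the physical channel failure counted by this indicator.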
6.2 Inter-RAT Handover from GSM to WCDMA (CS Domain)
6.2.1 KPI Definition
Definition of the RNC-level indicators:
VS.IRATHO.PrepSuccCSIn.RNC.Rate = VS.IRATHO.PrepSuccCSIn.RNC / VS.IRATHO.PrepAttCSIn.RNC
VS.IRATHO.SuccExecCSIn.RNC.Rate = VS.IRATHO.SuccExecCSIn.RNC / VS.IRATHO.AttExecCSIn.RNC
Definition of the cell-level indicators:
VS.IRATHO.SuccRelocPrepInCS.Cell.Rate = < VS.IRATHO.PrepSuccCSIn > / < IRATHO.AttIncCS >
VS.IRATHO.SuccInCS.Cell.Rate = < IRATHO.SuccIncCS > / < IRATHO.AttIncCS >
6.2.2 Influence Factors
Figure 1 Flow on CS handover-in
The figure shows the signaling among the UE, NodeB, target RNC, CN, MSC, BSC, and BTS in the following order:
1. Handover Required (BSSMAP)
2. Prepare Handover (MAP/E)
3. Relocation Request (RANAP)
4. Relocation Request Ack. (RANAP)
5. Prepare Handover Response (MAP/E)
6. Handover Command (BSSMAP)
7. Handover Command (RR)
8. Relocation Detect (RANAP)
9. DCCH: Handover Complete (RRC)
10. Relocation Complete (RANAP)
11. Send End Signal Request (MAP/E)
12. Clear Command (BSSMAP)
13. Clear Complete (BSSMAP)
14. Send End Signal Response (MAP/E)
After receiving the RADIO LINK RESTORE INDICATION message, the RNC sends the RELOCATION DETECT message to the MSC Server, notifying it that the UE is being handed over from the GSM to the WCDMA. The UE sends the HANDOVER TO UTRAN COMPLETE message to indicate that the handover is complete. If the UE cannot complete the handover, the UE reports the handover failure to the GSM. After receiving the HANDOVER TO UTRAN COMPLETE message, the RNC sends the RELOCATION COMPLETE message to the MSC Server, indicating that the handover is complete. Additionally, the RNC performs mobility management, UE capability enquiry, and security mode control for the UE.
The relocation preparation failure is mainly caused by the following:
The 3G cell resources are not enough.
Parameters are not configured correctly.
The handover failure is mainly caused for the following reasons:
The radio air interface is abnormal.
Parameters are not configured correctly.
6.2.3 Analysis Process
1. Discussing the Problem and Ascertaining the Problem Background and Product Version
When the problem occurs, determine the key time at which the success rate changed, and find out the recent adjustments of the 2G access network, 3G access network, and CN. Analyze the impact on the KPIs of the key actions performed at that time.
2. Determining the Main Scenarios
First, measure the inter-RAT CS handover-in success rate at the RNC level and cell level according to the performance data of the RNC, and determine whether the success rate decreases across the entire network or only in some cells. Second, analyze which cause leads to the decrease of the inter-RAT handover-in success rate. Table 1 lists the failure causes defined by the performance counters.
Table 1 Indicators related to CS inter-RAT handover-in failure
Indicator (Level1): VS.SRELOC.FailPrep.IRATCSIn
Sub-indicator (Level2): VS.IRATHO.PrepFaiCSInCongRNC, VS.IRATHO.PrepFaiCSInTfailRN, VS.IRATHO.PrepFaiCSInTunsRNC
Indicator (Level1): VS.IRATHO.Incoming.Fail.RNC
3. Analyzing the Causes Case by Case
VS.IRATHO.PrepFaiCSInCongRNC
VS.IRATHO.FailExecCSIn.NRply
IRATHO.FailIncCS.ResUnavail
The relocation failure message is received, and the cause value is Resource Unavailable, that is, the admission fails. The common resources include the power, codes, CEs, and Iub transmission.
:
−
Analyze the success rate of the cells from the performance data, and obtain the list of top N cells.
−
By the utilization of various resources, analyze the resource limitation of the top N cells. For details about the analysis method, see the section of the analysis of resource congestion upon RRC setup failure.
IRATHO.FailIncCS.TRNCSysFailReloc / VS.IRATHO.PrepFaiCSInTfailRN
The relocation fails in the target system or RNC.
:
−
Check the parameter configuration of the CN.
−
Check the configuration of the 3G neighboring cells on the BSC.
VS.IRATHO.FailExecCSIn.NRply
The handover fails because the UE does not respond.
: −
Analyze the success rate of the cells from the performance data, and obtain the list of top N cells.
−
Check the neighboring cell parameters configured on the 2G cells for the top N cells, and ensure that the target cells of the handover are correct.
−
If the handover failure rate is high, directly trace the IFTS data of the top N cells. Otherwise, it is recommended that you conduct drive test and trace the CDT and Probe data.
−
Analyze whether the uplink is synchronized through the CDT or IFTS data, that is, whether the RNC receives the RL_RESTORE_IND message. If the uplink synchronization indication is not received, you need to further determine whether the transmit power of the UE increases and reaches the maximum value.
Figure 1 Signaling of CS inter-RAT handover-in
If the transmit power of the UE increases to the maximum value, the downlink is synchronized but the uplink synchronization fails. A possible cause is that the transmit power of the dedicated uplink channel is relatively low.
If the transmit power of the UE does not increase, the downlink synchronization fails. A possible cause is that the minimum power of the downlink DPCCH is configured to an extremely small value.
−
If the synchronization indication is received, the failure occurs because the RNC does not receive the HO_UTRAN_CMP message. Possible causes are that the encryption algorithms are not consistent or that packets are lost during the transmission. To check the encryption algorithms, compare the encryption parameter that the CN carries to the RNC with the encryption algorithm carried in the Handover to UTRAN Command message that the BSC delivers to the UE.
Figure 2 Relocation_Request message
To determine whether packets are lost during the transmission, capture the necessary information about the site and feed back the information to the R&D department for analysis.
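The checks for a handover-in failure without a UE response form a small decision tree. The following Python sketch restates that tree so that the branch order is explicit. The function name, input flags, and diagnosis strings are hypothetical; the inputs would come from the CDT or IFTS data.

```python
def diagnose_handover_in_no_reply(rl_restore_ind_received, ue_tx_power_at_max):
    """Map the two observations from the traces to the likely failure point."""
    if rl_restore_ind_received:
        # Uplink is synchronized, so HANDOVER TO UTRAN COMPLETE was lost or never sent.
        return ("handover complete message missing: check encryption "
                "algorithm consistency or transmission packet loss")
    if ue_tx_power_at_max:
        # The UE ramps up its power, so it is synchronized on the downlink.
        return ("uplink synchronization failed: check the uplink "
                "dedicated channel power")
    return ("downlink synchronization failed: check the minimum "
            "downlink power setting")

print(diagnose_handover_in_no_reply(False, True))
```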
6.3 Inter-RAT Handover from WCDMA to GPRS (PS Domain)
6.3.1 KPI Definition
Definition of the RNC-level indicators:
VS.IRATHO.SuccPSOutUTRAN.RNC.Rate = VS.IRATHO.SuccPSOutUTRAN.RNC / VS.IRATHO.AttPSOutUTRAN.RNC
Definition of the cell-level indicators:
VS.IRATHO.SuccOutPSUNTRAN.Cell.Rate = /
6.3.2 Influence Factors
Figure 1 Flow on PS inter-RAT handover out of
The handover failure is mainly caused for the following reasons:
The neighboring cell parameters are not configured correctly.
The CN configuration is not correct or the CN configuration does not support the handover.
There exists interference in the 2G cell.
6.3.3 Analysis Process
1. Discussing the Problem and Ascertaining the Problem Background and Product Version
When the problem occurs, determine the key time at which the success rate changed, and find out the recent adjustments of the 2G access network, 3G access network, and CN. Analyze the impact on the KPIs of the key actions performed at that time.
2. Determining the Main Scenarios
First, measure the inter-RAT PS handover-out success rate at the RNC level and cell level according to the performance data of the RNC, and determine whether the success rate decreases across the entire network or only in some cells. Second, analyze which cause leads to the decrease of the inter-RAT handover-out success rate. Table 1 lists the failure causes defined by the performance counters.
Table 1 Indicators related to PS inter-RAT handover-out failure
Indicator (Level1): VS.IRATHO.PSOut.FailPS
Sub-indicator (Level2): VS.IRATHO.PSOut.CfgUnsup, VS.IRATHO.PSOut.PhyCHFail, VS.IRATHO.PSOut.Unpec, VS.IRATHO.PSOut.NoReply
3. Analyzing the Causes Case by Case
VS.IRATHO.PSOut.PhyCHFail / IRATHO.FailOutPSUTRAN.PhyChFail
After receiving the CELL CHANGE ORDER FROM UTRAN message, the UE starts the T309 timer. The T309 timer is stopped if the UE sets up a connection in the new cell. If the T309 timer expires, the UE returns to the original 3G cell and sends the CCO failure message.
:
−
Check the configuration of the GSM neighboring cell parameters. If the parameters are not configured correctly, the access is initiated in an incorrect target cell.
−
Check whether the status and KPIs of the target cell are normal.
−
Check the resource utilization of the target cell, and determine whether the access failure is caused by the insufficiency of resources.
−
Check whether there is strong interference in the GSM radio environment. Downlink interference affects the reading of the downlink SI information. Uplink interference prevents uplink signaling, for example, the Channel Request message, from being sent successfully.
VS.IRATHO.PSOut.NoReply / VS.IRATHO.CCO.FailOutPSUTRAN.Nrply
After sending the CCO message, the RNC starts the Trelocoverall timer. The timer is stopped when the RNC receives the CCO failure message from the UE or the IU RELEASE CMD message (the cause value is Normal release) from the SGSN. If the timer expires, the RNC sends the IU RELEASE REQUEST message to the SGSN. If the RNC receives the SRNS Context Req message while the timer is running, it restarts the timer.
: −
Check whether the 2G SGSN allows the handover-in.
−
Determine the top N cells according to the performance data, conduct drive test, trace the UE LOG, and check whether the RAU is complete after the UE accesses the 2G cell.
Figure 1 Flow on LAU/RAU after the UE accesses the 2G cell
: If it takes an extremely long period to complete the RAU, the possible cause is the radio environment of the target cell. Therefore, if the UE can complete the RAU in the 2G cell, you can extend the Trelocoverall timer to raise the inter-RAT handover success rate.
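The Trelocoverall handling described for VS.IRATHO.PSOut.NoReply, including the restart on SRNS Context Req, can be modeled with a moving deadline. The following Python sketch is a simplified illustration, not the RNC implementation. The timer values, event format, and return strings are assumptions.

```python
def cco_outcome(events, trelocoverall_s):
    """Model the RNC after it sends the CCO message at t = 0.

    events: list of (time_s, message) tuples seen by the RNC.
    trelocoverall_s: hypothetical value of the Trelocoverall timer.
    """
    deadline = trelocoverall_s
    for time_s, message in sorted(events):
        if time_s > deadline:
            break  # the timer expired before this message arrived
        if message == "CCO FAILURE":
            return "handover failed: UE returned to 3G"
        if message == "IU RELEASE COMMAND":
            return "handover succeeded"
        if message == "SRNS CONTEXT REQUEST":
            deadline = time_s + trelocoverall_s  # the timer is restarted
    return "timeout: RNC sends IU RELEASE REQUEST"

slow_rau = [(15.0, "IU RELEASE COMMAND")]
print(cco_outcome(slow_rau, trelocoverall_s=10.0))  # RAU takes too long
print(cco_outcome(slow_rau, trelocoverall_s=20.0))  # extended timer
```

The second call shows why extending the timer raises the measured success rate when the UE does complete the RAU in the 2G cell, only slowly.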
6.4 Inter-RAT Handover from GPRS to WCDMA (PS Domain) 6.4.1 KPI Definition VS.IRATHO.SuccPSInUE.RNC.Rate= VS.IRATHO.SuccPSInUE.RNC / VS.IRATHO.AttPSInUE.RNC
6.4.2 Analysis Process The inter-RAT handover-in success rate in the PS domain is consistent with the RRC setup success rate with the cause value of cell reselection over different subsystems. For details about the analysis of the PS inter-RAT handover-in problems, see the section of the RRC setup success rate.
6.5 List of Problem Information Checklist for KPI Troubleshooting-5.10 .xls
7 Information Collection
7.1 Performance Data of RNC
7.1.1 Purpose
To determine the KPIs that are deteriorated
To know the magnitude, trend, and scope of KPI changes
To provide guidance for subsequent IOS tracing, drive tests, and feedback of the CHR data (for example, which subrack to collect the data from)
7.1.2 Information to Be Collected
Performance data generated within one week before the KPIs are changed
All performance data generated after the problem occurs
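The collection period implied by the two items above can be computed as follows. This is a minimal sketch; the function name is hypothetical, and the dates in any call are examples only.

```python
from datetime import datetime, timedelta


def collection_window(kpi_change: datetime, now: datetime):
    """Return (start, end) of the period whose performance data is needed."""
    if now < kpi_change:
        raise ValueError("the KPI change cannot lie in the future")
    # One week before the KPI change, plus everything up to the present.
    return kpi_change - timedelta(days=7), now
```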
7.1.3 Method
1. Collecting the Information Through the RNC by Using FTP
The BAM of the RNC automatically collects and saves the performance data. Field personnel can therefore log in to the RNC BAM and obtain the desired performance data by using FTP. On the RNC BAM, the performance data is saved in the following path:
V2 platform: \BAM\VersionA (VersionB)\FTP\MeasResult
V1 platform: \BAM\\FTP\MeasResult
You can query the workarea directory of the BAM by running the LST BAMAREA command.
Figure 1 Querying the workarea of the BAM
2. Collecting the Information from the M2000
The M2000 periodically retrieves the performance data from the BAM of the RNC and saves it. Therefore, the performance data of the existing network can also be obtained from the M2000 by using FTP:
Log in to the M2000 by using FTP, ftp://(M2000 IP). Enter the FTP username and password of the M2000. The performance data is saved in the following folder: ftp://(M2000 IP)/ftproot/pm/
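The FTP retrieval described above can be scripted. The sketch below uses the Python standard library `ftplib`; the helper name is hypothetical, the host and credentials in the usage comment are placeholders, and the code assumes that the folder contains plain files only (subfolders would need a recursive walk).

```python
from ftplib import FTP
from pathlib import Path

PM_DIR = "/ftproot/pm"   # folder named in the text above


def download_pm_files(ftp, dest: Path, remote_dir: str = PM_DIR) -> list:
    """Download every entry of remote_dir into dest and return the local paths."""
    dest.mkdir(parents=True, exist_ok=True)
    ftp.cwd(remote_dir)
    saved = []
    for name in ftp.nlst():
        target = dest / Path(name).name
        with open(target, "wb") as fh:
            ftp.retrbinary(f"RETR {name}", fh.write)
        saved.append(target)
    return saved


# Typical use (placeholder address and credentials):
# with FTP("192.0.2.10") as ftp:
#     ftp.login("username", "password")
#     download_pm_files(ftp, Path("pm_data"))
```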
7.2 RNC CHR/PCHR
7.2.1 Purpose
To view the classification of main KPIs (for example, handover and call drop rate) through full record
To check whether soft failure occurs in the DSP, whether the problem occurs in the terminals of the same model, and whether there exists identical print information about internal errors
7.2.2 Information to Be Collected
Data generated before and after the problem occurs
Data of the corresponding subrack if the field personnel determine that the problem is a single-subrack problem
7.2.3 Method
1. Collecting the information from the RNC by using FTP
For the RNC of the RAN10 version, the CHR log and PCHR log are merged into one file. Log in to the RNC BAM (remotely, locally, or through FTP). Then, you can obtain the CHR data on the RNC in the following path: X:\Bsc6800\BAM\Common\Famlog\fmt
2. Collecting the information through the COL LOG command
By running the COL LOG command, you can export the CHR log, alarm log, and operation log at the same time. This method is recommended if you need to send the information back for analysis.
Figure 1 Exporting the CHR log (by running the COL LOG command)
The exported file is named FixInfo_Host.zip. After decompressing the file, you can obtain the operation information. By default, the exported file is saved in the following directory:
V2 platform: \BAM\VersionA\FTP or \BAM\VersionB\FTP
V1 platform: \BAM\\FTP
You can query the workarea directory of the BAM by running the LST BAMAREA command.
7.3 RNC IOS Tracing
7.3.1 Purpose
To determine the processes that cause the problem
7.3.2 Information to Be Collected
Trace the faulty top N cells
7.3.3 Method
In the Maintenance window of the operation and maintenance system, click Trace Management and select the types of objects to be traced.
Figure 1 Types of objects to be traced
Double-click IOS. Then, the IOS Tracing dialog box is displayed.
Figure 2 IOS Tracing dialog box
Set the related parameters in the dialog box. For the traced events, Select Default is usually sufficient. In special cases, you can select other events or select Select All. A maximum of 50 calls and 32 cells can be traced at a time, and you can start a maximum of eight tasks. The trace tasks occupy a large number of resources. It is recommended that you create the trace tasks when the system is idle or that you run only one IOS trace at a time. If the system is busy, a running trace task may be terminated automatically.
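The limits stated above can be checked before a task is created. The following Python sketch is an illustration only: the function name is hypothetical, and it assumes that the call and cell limits apply to a single task while the task limit applies to the number of tasks already running.

```python
MAX_CALLS = 50   # calls traced at a time
MAX_CELLS = 32   # cells traced at a time
MAX_TASKS = 8    # IOS trace tasks that can be started


def check_ios_task(cell_ids, max_calls: int, running_tasks: int) -> list:
    """Return a list of limit violations; an empty list means the task fits."""
    problems = []
    if len(set(cell_ids)) > MAX_CELLS:
        problems.append(f"more than {MAX_CELLS} cells requested")
    if max_calls > MAX_CALLS:
        problems.append(f"more than {MAX_CALLS} calls requested")
    if running_tasks >= MAX_TASKS:
        problems.append(f"{MAX_TASKS} tasks are already running")
    return problems
```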
Click More Info. Then, you can set the browsing and saving of messages, as shown in Figure 3. Pay attention to two parts in Figure 3. In the area marked with 1, select the desired traffic classes, for example, the BE service or streaming service. In the area marked with 2, set the RAB properties to trace the events selectively. This filtering is especially useful when you analyze a specific problem.
Figure 3 MoreInfo dialog box
Click OK to start the tracing task.
7.4 RNC IFTS/CDT (User Plane) Tracing
7.4.1 Purpose
To deeply analyze the data (mainly the L2 data) of a typical scenario
To extract the TCP data and voice Wav data
7.4.2 Information to Be Collected
User-plane tracing and L2 measurement
Visibility of the performance monitoring item
Top N faulty cells for IFTS tracing
7.4.3 Method
Trace the internal printed messages of the RNC by modifying the script file: Use UltraEdit-32 or Notepad to open the text file RncTestConfig.xml in the corresponding directory of the LMT version used for tracing, for example, D:\HWLMT\adaptor\clientadaptor\RNC\BSC6800V100R008C01B082\style\defaultstyle\locale\en_US\rnctest\RncTestConfig.xml. Set the value of each parameter to 1.
After modifying the script, start the LMT of the corresponding RNC version and log in to the BAM. In the Maintenance window of the operation and maintenance system, click Trace Management and select the types of objects to be traced. For both IFTS tracing and CDT tracing, you need to select CDT.
Figure 1 Type of trace object
Double-click CDT. Then, the dialog box for setting the task parameters is displayed. If you select UEID in the CDT Match Type box, the tracing task is CDT tracing. On the UE ID tab, enter the IMSI of the UE to be traced and select the saving path. The default saving path is X:\HW LMT\client\output\RNC\BSC6810V200R010C01B061\trace. You can also set a custom saving path and file name. If the tracing period is extremely long, multiple files are generated, and the file names generally end with "-1", "-2", and so on. Click OK to start the tracing task. The CDT data of up to two UEs can be traced simultaneously. However, the sum of the number of started CDTs and the number of standard-interface tasks of the UE cannot exceed six.
Figure 2 Configuration page of CDT parameters
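The two CDT limits above can be expressed as a single check. This is a minimal sketch with a hypothetical function name; it only restates the limits given in the text.

```python
def can_start_cdt(running_cdts: int, running_std_tasks: int) -> bool:
    """True if one more CDT task fits within the limits stated above."""
    after_start = running_cdts + 1
    # At most two UEs under CDT, and CDTs plus standard-interface tasks <= 6.
    return after_start <= 2 and after_start + running_std_tasks <= 6
```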
If you select IFTS in the CDT Match Type box, the tracing task is IFTS tracing. On the UE ID tab, you can set the tracing period. The value 0 indicates that the tracing period is not restricted. In addition, you need to set the ID of the cell to be traced, select an SPU subsystem, and select the traffic classes or RRC setup causes. Finally, click OK to start the tracing task.
Figure 3 Configuration page of IFTS parameters
For CDT tracing or IFTS tracing, it is recommended that the user-plane tracing be attached. If necessary, you need to attach the performance monitor tracing.
Figure 4 Configuration page of user-plane tracing
On the Other tab, set the information about the user-plane tracing. Generally, select Periodically Data Report (set to 2s) and set L2 Data Report Time(s) to 100s. Do not select AUTO_PACKET_GENERATE. On the Monitor tab, you can select the performance monitor items to be traced.
Figure 5 Configuration page of performance monitoring
7.5 Standard Signaling Tracing on the RNC
7.5.1 Purpose
Mainly analyze the problems of RRC access on the Uu interface.
Analyze the Iub/Iur-specific problems.
7.5.2 Information to Be Collected
Trace the related interface signaling on a case-by-case basis.
7.5.3 Method
On the Maintenance page of the RNC LMT, choose Trace Management > Interface Trace Task, double-click the corresponding interface, and configure the tracing task. Then, you can start the interface tracing task. Standard signaling tracing includes message tracing of the Uu, Iub, Iur, and Iu interfaces.
1. Uu Interface Tracing
On the Maintenance page of the RNC LMT, choose Trace Management > Interface Trace Task, and double-click UU Interface. Then, the following interface is displayed. Click OK on the interface.
Figure 1 Uu interface tracing
When tracing the cell configuration, enter the cells to be traced in the format of R1:C1/C2;R2:C1/C2, for example, 174:101/102;175:201. Here, 174 and 175 are RNC IDs, and 101, 102, and 201 are cell IDs. For Tracing message type, it is recommended that you select Select All if you are not sure of the problem. You can also select the appropriate tracing message types as needed.
2. Iub Interface Tracing
On the Maintenance page of the RNC LMT, choose Trace Management > Interface Trace Task, and double-click IUB Interface. Then, the following interface is displayed. You can choose to trace the specified NodeB or all NodeBs.
Figure 1 Iub interface tracing
3. Iur Interface Tracing
IUR interface tracing enables you to trace the messages between the current RNC and all neighboring RNCs or between the current RNC and a specific neighboring RNC. To trace the messages between the current RNC and a specific RNC, run the LST N7DPC command to query the DSP code of the neighboring RNC. On the Maintenance page of the RNC LMT, choose Trace Management > Interface Trace Task, and double-click IUR Interface. Then, the following interface is displayed. To trace the messages of the specified DSP, enter the DSP code. Note that the DSP code must be in the hexadecimal format. Finally, click OK.
Figure 1 Iur interface tracing
4. Iu Interface Tracing
IU interface tracing enables you to trace the messages between the current RNC and all CNs or between the current RNC and a specific CN. To trace the messages between the current RNC and a specific CN, run the LST N7DPC command to query the DSP code of the specified CN, as shown in the following figure.
Figure 1 Querying the DSP code of the CN
On the Maintenance page of the RNC LMT, choose Trace Management > Interface Trace Task, and double-click IU Interface. Then, the following interface is displayed. To trace the messages of the specified DSP, enter the DSP code. Note that the DSP code must be in the hexadecimal format. Finally, click OK.
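Two inputs of the interface tracing dialogs have strict formats: the Uu cell list (R1:C1/C2;R2:C1/C2) and the hexadecimal DSP code. The following Python sketch shows one way to check and produce them before typing them into the LMT. The function names are hypothetical. The guide does not state whether the dialog expects a prefix in front of the hexadecimal digits, so the sketch returns bare digits.

```python
def parse_cell_list(text: str) -> dict:
    """Parse 'R1:C1/C2;R2:C1/C2' into {rnc_id: [cell_id, ...]}."""
    result = {}
    for group in text.split(";"):
        rnc, sep, cells = group.strip().partition(":")
        if not sep or not cells:
            raise ValueError(f"expected RNC:CELL[/CELL...], got {group!r}")
        result.setdefault(int(rnc), []).extend(int(c) for c in cells.split("/"))
    return result


def format_cell_list(cells: dict) -> str:
    """Inverse of parse_cell_list."""
    return ";".join(
        f"{rnc}:{'/'.join(str(c) for c in ids)}" for rnc, ids in cells.items()
    )


def dsp_code_hex(code: int) -> str:
    """Render a DSP code known as an integer in upper-case hexadecimal digits."""
    if code < 0:
        raise ValueError("a DSP code cannot be negative")
    return format(code, "X")
```

Parsing the string before use catches typing errors such as a missing colon, which would otherwise only show up as an empty trace.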
7.6 UE QXDM LOG
7.6.1 Purpose
To analyze the problems related to signaling flow by querying the RNC data, and analyze the KPI problems caused by the specific terminals and under specific scenarios
To analyze the problems related to power control
To analyze the user-plane problems
7.6.2 Information to Be Collected
During the drive test, obtain the log at the UE side according to the log at the network side.
7.6.3 Method
Install the QPST and QXDM software. (Note: To use the QXDM, you need to activate it online or apply for a license and activate it manually.) Install the data card driver, insert the data card into the port, and look up the port number in the Windows Device Manager. Configure the QPST: Choose QPST Configuration > Port and click the Add New Port button at the lower right corner. Then, the Add New Port dialog box is displayed. Select the data card port found in the previous step. If you add the port successfully, you can view the information about the port in the QPST Configuration window.
Figure 1 Configuring the QPST port
Choose Qxdm Option > Communication, select the port found in the Device Manager, and enable log tracing.
Figure 2 Connecting the equipment ports
Choose Qxdm Options > Logging View Configuration, select the message items to be traced on the Message Packets and Log packets tabs, set the saving path of the tracing files on the Misc tab, and click OK.
Figure 3 Enabling log tracing
Choose Options > Logging or press the ALT+L shortcut key to start the tracing. Press the ALT+L shortcut key again to stop the tracing.
7.7 Real-Time Performance Monitoring of RNC
7.7.1 Purpose
To know the signal quality of the air interface on the uplink and downlink before and after call drop and handover
7.7.2 Information to Be Collected
7.7.3 Method
Real-time performance monitoring includes connection performance monitoring, cell performance monitoring, link performance monitoring, and board resource monitoring. During the troubleshooting, connection performance monitoring and cell performance monitoring are often used. Log in to the RNC LMT, and choose Realtime Performance Monitoring > Connection performance monitoring on the Maintenance tab. In the displayed dialog box, select the items to be monitored and the directory for saving the file, and click OK. Cell performance monitoring, link performance monitoring, and board resource monitoring are configured in the same way.
Figure 1 Real-time performance monitoring
7.8 RNC Script Configuration
7.8.1 Purpose
To know the network configuration, neighbor relation between cells, switch setting, and parameter setting from the configuration script and file
7.8.2 Information to Be Collected
The configuration script generated before and after the problem occurs.
7.8.3 Method
Run the EXP INNERCFGMML command on the RNC LMT to export the configuration data on the BAM as an MML script file, and save the script file in the default or specified path.
Figure 1 RNC script configuration
Extract the configuration script from the corresponding directory. By default, the configuration script is saved in the following path:
V2 platform: \BAM\VersionA\FTP or \BAM\VersionB\FTP
V1 platform: \BAM\\FTP
You can query the workarea directory of the BAM by running the LST BAMAREA command.
7.9 Operation Log of RNC
7.9.1 Purpose
To know the main suspicious operations performed before and after the problem occurs
7.9.2 Information to Be Collected
Logs generated before and after the problem occurs
7.9.3 Method
1. Collecting the information through the EXP LOG command
Run the EXP LOG command on the RNC LMT to export the operation log generated in a certain time segment on the RNC.
Figure 1 Exporting the operation log by running the EXP LOG command
By default, the operation log data is saved in the following path:
V2 platform: \BAM\VersionA\FTP or \BAM\VersionB\FTP
V1 platform: \BAM\\FTP
You can query the workarea directory of the BAM by running the LST BAMAREA command.
2. Collecting the information through the COL LOG command
You can also export the CHR log, alarm information, and operation log together by running the COL LOG MML command on the LMT. This method is recommended if you need to collect all of these data at the same time. For details, see 7.2.3.
7.10 Alarm Information on RNC
7.10.1 Purpose
To check whether any alarm information related to the problem exists, for example, intermittent transmission interruption or a high DSP utilization rate
7.10.2 Information to Be Collected
Alarm information generated before and after the problem occurs
7.10.3 Method
1. Saving the alarms from the alarm box of the LMT
Open the Alarm Browsing window on the LMT, select the alarm to be saved, and save the alarm as a csv file, html file, or txt file. You can define the saving directory and filename yourself.
Figure 1 Alarm box of the LMT
2. Collecting the information through the EXP ALMLOG command
Run the EXP ALMLOG MML command on the LMT and set the related parameters. Then, you can obtain the corresponding alarm log. You can save the alarm log as a csv file or txt file.
Figure 1 Exporting the alarms
Alarm Severity: It is recommended that you keep the default value so that the LMT returns all types of alarm log information.
Returned Records: Preferably, set the parameter to a value larger than 500 so that the LMT returns as many records as possible, including the alarms generated when the problem occurred.
Filename: The filename is the system time at which the alarm is exported.
By default, the exported file is saved in the following directory:
V2 platform: \BAM\VersionA\FTP or \BAM\VersionB\FTP\ExportAlmLog
V1 platform: \BAM\\FTP\ExportAlmLog
You can query the workarea directory of the BAM by running the LST BAMAREA command.
3. Collecting the information through the COL LOG command
You can run the COL LOG command to export the CHR log, operation log, and alarm information at the same time. For details about data collection, see 7.2.3.
7.11 Node B Configuration Script
7.11.1 Purpose
To have a general knowledge of the basic data and algorithm of the NodeB
To check the configuration data of the RNC for consistency
7.11.2 Information to Be Collected
Configuration script generated before and after the problem occurs
Typical site or faulty site
7.11.3 Method
1. Collecting the information from the M2000
Run the ULD CFGFILE MML command on the NodeB through the M2000 to upload the configuration file to the FTP server.
Figure 1 Exporting the NodeB configuration file through the M2000
Log in to the FTP server through an FTP client, and obtain the configuration data under the specified path.
2. Collecting the information from the NodeB LMT
On the Maintenance page of the NodeB LMT, choose Service Navigation > Software Management, and select Data Config File Transfer.
Figure 1 Data Config File Transfer
Double-click Data Config File Transfer, and the following dialog box is displayed. Select Upload (NodeB to FTP Server), set Compress Flag to Compress, and select a saving path for the exported file.
Figure 2 FTP upload
Set the related parameters of the FTP server. You can use the current built-in FTP server or the specified FTP server. After setting the FTP server, click OK. Then, the LMT collects the configuration script of the NodeB.
7.12 Node B CHR
7.12.1 Purpose
To check whether data arrives at the NodeB
To measure the number of packets that are transmitted successfully or discarded through a dedicated channel or a common channel
To measure the number of Preambles received by the PRACH
7.12.2 Information to Be Collected
Data generated in the time segment during which the problem occurs
7.12.3 Method
Log in to the NodeB LMT, and run an MML command to enable the CHR function of the NodeB. By default, the CHR function of the NodeB is disabled. If the CHR function is already enabled, you can skip this step.
Figure 1 Setting the CHR level of the NodeB
Collect the NodeB CHR logs. Within a period after the problem occurs, download the CHR logs of the NodeB through the M2000, RNC FtpServer, or local FtpServer of the NodeB. Irrespective of the mode you use, you must ensure that the CHR function is enabled.
Figure 2 NodeB CHR reporting switch
7.13 Node B Alarm
7.13.1 Purpose
Check whether any alarm information related to the problem exists, for example, intermittent transmission interruption.
7.13.2 Information to Be Collected
The alarm information generated before and after the problem occurs.
7.13.3 Method
1. Collecting the information from the M2000
On the default interface of the M2000, you can query four types of alarms respectively by choosing Fault Query.
Figure 1 Querying the alarm information
Select all NodeBs, BBUs, or RRUs, and select all levels and types. Specify the corresponding time range. Click Query. Click Save to save the alarm information as a TXT file.
Figure 2 Saving the alarm information
Query the other three types of alarms in the same way, save the alarm information of each type separately, and retrieve the alarm files from the selected folders.
2. Saving the alarm information from the alarm box of the LMT
Open the Alarm Browsing window on the LMT, select the alarm to be saved, and save the alarm as a csv file, html file, or txt file. You can define the saving directory and filename yourself.
Figure 1 Alarm box of the NodeB LMT
7.14 Node B CDT
7.14.1 Purpose
To analyze the service-related problems
7.14.2 Information to Be Collected
During the drive test, collect the logs at the network side and at the RNC side.
7.14.3 Method
Open the TraceTask.ini file in the following path: Disk letter: \HW LMT\adaptor\clientadaptor\NodeB\Version number\style\defaultstyle\conf\trace. Find the property page label to be modified (including the Iub and Uu interfaces of the user, and the Iub and Uu interfaces of the cells), and set the check marks of the monitor items to be traced to 0 or 1. The modification rules are as follows:
Name of monitor item = Check mark, ID of monitor item,
Parameter 1 is required or not, maximum of Parameter 1, minimum of Parameter 1, default of Parameter 1
Parameter 2 is required or not, maximum of Parameter 2, minimum of Parameter 2, default of Parameter 2
Parameter 3 is required or not, maximum of Parameter 3, minimum of Parameter 3, default of Parameter 3
Value of check mark: 1, ticked on the interface; 0, not ticked on the interface by default; 2, not displayed on the interface.
Blue fonts: Except for the fields that indicate whether a parameter is required, the contents in blue font can remain blank, but their positions must be reserved.
Whether the parameter is required: 0, not required; 1, required; 2, non-editable on the interface.
For details, see Figure 1.
Figure 1 Modifying the properties of the monitor items of the NodeB CDT
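The modification rules above can be applied with a short script instead of manual editing. The Python sketch below is illustrative only. It assumes that every field, including the three parameter groups, is separated by a comma, which gives 14 fields after the equals sign; this count is inferred from the rule text and should be checked against a real TraceTask.ini. The item name and values in any call are made up.

```python
from dataclasses import dataclass


@dataclass
class MonitorItem:
    name: str
    check_mark: int   # 1 ticked, 0 not ticked, 2 not displayed
    item_id: str
    params: list      # three (required, maximum, minimum, default) tuples


def parse_monitor_item(line: str) -> MonitorItem:
    """Split one 'name = ...' line into its fields (assumed comma-separated)."""
    name, _, value = line.partition("=")
    fields = [f.strip() for f in value.split(",")]
    if len(fields) != 14:
        raise ValueError(f"expected 14 fields, got {len(fields)}")
    check_mark = int(fields[0])
    if check_mark not in (0, 1, 2):
        raise ValueError("check mark must be 0, 1, or 2")
    params = [tuple(fields[i:i + 4]) for i in range(2, 14, 4)]
    return MonitorItem(name.strip(), check_mark, fields[1], params)


def set_check_mark(line: str, mark: int) -> str:
    """Return the line with only the check mark replaced; other positions stay."""
    name, _, value = line.partition("=")
    _, _, rest = value.partition(",")
    return f"{name.strip()} = {mark},{rest}"
```

Because blank fields keep their commas, the positions of the remaining values are preserved when the check mark is changed.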
On the LMT, double-click CDT Cell Tracing, set the IDs of logical cells to be traced, select the monitor items to be traced on the Iub/Uu page, and click OK.
Figure 2 Enabling CDT tracing of the NodeB cells
Double-click User Tracing. The trace method can be the initial link establishment time, CRNCID, or IMSI. If Trace Method is set to Chain Time, note that the entered time must be consistent with the time of the BTS.
Figure 3 Basic setting
If you trace a specified IMSI, run the MOD NODEB: NodeBId = xxx, NodebTraceSwitch=ON command on the RNC (xxx indicates the NodeB ID). When you enable user tracing on the NodeB LMT, set Trace Method to IMSI, and enter the corresponding IMSI.
Figure 4 Setting other monitor items
Select the corresponding CDT monitor items on the IUB interface and UU interface respectively.
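When IMSI tracing is needed on several NodeBs, the MOD NODEB command quoted above can be generated for each NodeB ID. The Python sketch below only formats the command text given in this section; the function name and the NodeB IDs in any call are hypothetical.

```python
def nodeb_trace_commands(nodeb_ids) -> list:
    """Build one MOD NODEB command per NodeB ID, in ascending ID order."""
    return [
        f"MOD NODEB: NodeBId={nodeb_id}, NodebTraceSwitch=ON"
        for nodeb_id in sorted(set(nodeb_ids))
    ]
```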
7.15 Checking Whether Any Neighboring Cells are not Configured
Missing neighboring cell detection checks whether the neighbor relation is configured for all neighboring cells, and thus enables you to find the missing neighboring cells.
Missing neighboring cell detection has three types: intra-frequency, inter-frequency, and inter-RAT. The three types of detection are independent of each other. After the LMT delivers the MNCDT message, all cells in the RNC undergo the missing neighboring cell detection independently.
Intra-frequency detection: Set the Trigger Condition of the 1A event in the intra-frequency measurement control to Monitored Set plus Detected Set. Then, the UE reports the measured detected set.
Inter-frequency detection: You need to set the frequency to be detected and the range of scrambling codes. A maximum of 32 neighboring cells can be measured in the inter-frequency measurement control, but a cell is usually configured with fewer neighboring cells. Therefore, when the RNC enables the inter-frequency detection, the cells specified for the missing neighboring cell detection are used to fill the measurement object list up to 32 neighboring cells.
Inter-RAT detection: You need to configure the network color code, BTS color code, band indication, and frequency range to be detected. As with the inter-frequency detection, the cells specified for the missing neighboring cell detection are used to fill the measurement object list up to 32 neighboring cells.
7.15.1 Enabling Call Trace for Missing Neighboring Cell Detection Tracing
As with interface tracing, you enable the call trace for missing neighboring cell detection tracing (MNCDT) on the LMT, as shown in Figure 1.
Figure 1 Enabling call trace to check whether any neighboring cells are not configured
Double-click MNCDT, and the following interface is displayed.
Figure 2 Configuration interface of intra-frequency MNCDT
On the configuration interface, you can select three types of MNCDT: intra-frequency, inter-frequency, and inter-RAT. Figure 2 shows the configuration interface of intra-frequency MNCDT.
2. Enable intra-frequency MNCDT
For intra-frequency MNCDT, you do not need to set any parameters. Set Detection Type to Intra Freq, and click OK. Then, the following window is displayed.
Figure 1 MNCDT window
After you enable the intra-frequency MNCDT, the intra-frequency measurement control message contains the following information: The TriggerCondition of the 1A event is detectedSetAndMonitoredSetCells.
Figure 2 Intra-frequency measurement control after the intra-frequency MNCDT is enabled
3. Enable inter-frequency MNCDT
Set Detection Type to Inter Freq. The following configuration interface is displayed.
Figure 1 Configuration interface of inter-frequency MNCDT
Uplink UARFCN: Uplink UARFCN of the cell that undergoes the MNCDT
Downlink UARFCN: Downlink UARFCN of the cell that undergoes the MNCDT
Start of Primary Scrambling Code: Minimum scrambling code that undergoes the MNCDT
End of Primary Scrambling Code: Maximum scrambling code that undergoes the MNCDT
Constraints:
−
The user must ensure that the uplink UARFCN and downlink UARFCN match each other. Because the protocol does not stipulate the relationship between them, and to keep the system scalable, the system does not check this relationship.
−
The End of Primary Scrambling Code must be greater than or equal to the Start of Primary Scrambling Code.
Click OK. Then, the MNCDT window is displayed, as shown in Figure 1.
Observe the measurement control: After you enable the inter-frequency MNCDT, the cell list of the inter-frequency measurement control includes some cells in the MNCDT range.
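The filling of the measurement object list up to 32 cells, together with the scrambling code constraint above, can be illustrated as follows. This Python sketch is not the RNC algorithm: the function name is hypothetical, and the choice to add candidate codes in ascending order is an assumption, because the guide only states that some cells in the MNCDT range are included. The 0 to 511 bound reflects the range of UMTS primary scrambling codes.

```python
MAX_MEASURED_CELLS = 32   # limit of the measurement control, from the text
MAX_PSC = 511             # primary scrambling codes run from 0 to 511


def build_measurement_list(configured_pscs, start_psc: int, end_psc: int) -> list:
    """Configured neighbors first, then detection candidates, capped at 32."""
    if not 0 <= start_psc <= end_psc <= MAX_PSC:
        raise ValueError("need 0 <= start <= end <= 511")
    cells = list(dict.fromkeys(configured_pscs))[:MAX_MEASURED_CELLS]
    for psc in range(start_psc, end_psc + 1):
        if len(cells) >= MAX_MEASURED_CELLS:
            break
        if psc not in cells:
            cells.append(psc)   # candidate used to fill the free positions
    return cells
```

A cell with many configured neighbors leaves few free positions, so a wide scrambling code range may need several detection runs with different Start and End values.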
4. Enable inter-RAT MNCDT
Set Detection Type to Inter RAT. The following configuration interface is displayed.
Figure 1 Configuration interface of inter-RAT MNCDT
Network color code (NCC): The NCC of the cell that undergoes the MNCDT.
BTS color code (BCC): The BCC of the cell that undergoes the MNCDT.
Frequency Indicator: The DCS 1800 and PCS 1900 have some overlapped frequency numbers. Therefore, the frequency indicator is mainly used for the band indication of the overlapped frequency numbers.
Start of BCCH ARFCN: The minimum frequency number that undergoes the MNCDT.
End of BCCH ARFCN: The maximum frequency number that undergoes the MNCDT.
Click OK. Then, the MNCDT window is displayed.
Observe the measurement control: After you enable the inter-RAT MNCDT, the cell list of the inter-RAT measurement control includes some cells in the MNCDT range.
7.15.2 Stopping the MNCDT
As with interface tracing, you can stop the corresponding MNCDT by closing the message tracing window.
7.15.3 Reporting the Missing Neighboring Cell Message
After the UE sends a measurement report on the missing neighboring cells and the measurement report meets the handover requirements, the RNC displays the information about the missing neighboring cells in the missing neighboring cell message tracing window. For example, Figure 1 shows the message tracing window for the missing intra-frequency neighboring cells. The window displays the following information: Serial number, generation time, cell ID (ID of best cell), standard message type, URNTI, and message contents.
Figure 1 Message tracing for the missing intra-frequency neighboring cells
As shown in Figure 1, 16 messages are reported. Double-click a message, and a window showing the contents of the message is displayed.
Figure 2 Reported message about the missing intra-frequency neighboring cells
The fields of the reported message about the missing intra-frequency neighboring cells (only the MNCDT-related fields) are described as follows:
ulRnti: U-RNTI of the UE.
ucActSetNum: Number of cells in the active set.
ausActCellId: Array of the cell IDs of the active set; the first cell is the best cell.
ucDetectCellNum: Number of detected missing neighboring cells.
ausSrimbleCode: Array of the scrambling codes of the detected missing neighboring cells.
Figure 3 Message tracing for the missing inter-frequency neighboring cells
Double-click the message. The following window is displayed.
Figure 4 Reported message about the missing inter-frequency neighboring cells
The fields are described as follows:
ulRnti: U-RNTI of the UE.
ucActSetNum: Number of cells in the active set.
ausActCellId: Array of the cell IDs of the active set; the first cell is the best cell.
ucDetectCellNum: Number of detected missing neighboring cells.
usUlUarFcn: Uplink UARFCN of the detected missing inter-frequency neighboring cell.
usDlUarFcn: Downlink UARFCN of the detected missing inter-frequency neighboring cell.
usPsc: Scrambling code of the detected missing neighboring cell.
Figure 5 shows the messages about the missing inter-RAT neighboring cells.
Figure 5 Message about the missing inter-RAT neighboring cell
ucNCC refers to the network color code of the reported missing neighboring cell, and ucBcc refers to the BTS color code of the reported missing neighboring cell. ucInterRatBandInd refers to the band indicator, and usInterRatArfcn refers to the frequency number.
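Once several intra-frequency reports have been collected, they can be ranked to see which missing neighbor is reported most often for which best cell. The Python sketch below assumes that each report has already been transcribed into a dictionary keyed by the field names listed above. This dictionary layout and the function name are assumptions; the LMT export format is not described in this guide.

```python
from collections import Counter


def rank_missing_neighbors(reports) -> list:
    """Count how often each (best cell, scrambling code) pair was reported."""
    counts = Counter()
    for report in reports:
        best_cell = report["ausActCellId"][0]    # first entry is the best cell
        for psc in report["ausSrimbleCode"]:
            counts[(best_cell, psc)] += 1
    return counts.most_common()                  # most frequent pair first
```

Pairs at the top of the ranking are the first candidates to add as neighbor relations.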
7.16 Soft Failure of DSP
If access failures or call drops are concentrated on a certain DSP within a short period, the problem may be caused by a soft failure of the DSP. To check for a soft failure of the DSP, query the CHR log: Import the CHR log to the tool, and select a DSP log. If you find that the problem mainly occurs on the same CPU ID within a period, the problem may be caused by a soft failure of the DSP.
Figure 1 Analyzing the soft failure of the DSP through the CHR log
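The check for failures concentrated on one CPU ID can also be done on exported records. The following Python sketch is a rough illustration: the function name is hypothetical, the input is assumed to be one CPU ID per failure record from the period of interest, and both thresholds are arbitrary example values rather than recommended settings.

```python
from collections import Counter


def suspect_cpu_ids(failure_cpu_ids, share_threshold=0.5, min_failures=20) -> list:
    """Return CPU IDs that account for a large share of the failures."""
    counts = Counter(failure_cpu_ids)
    total = sum(counts.values())
    if total < min_failures:
        return []      # too few failures to draw a conclusion
    return [cpu for cpu, n in counts.most_common() if n / total >= share_threshold]
```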
Use the CPUID tool to find the DSP ID that corresponds to the CPU ID, and then solve the problem by resetting the DSP. The tool has a V1 version and a V2 version, corresponding to the RNC platforms. Use the version that matches your platform.
CPU ID.rar
Enter the hexadecimal CPUID in the specified field. Click From CPUID to obtain the corresponding DSP ID. Run the RST DSP command on the LMT to reset the DSP.
Figure 2 Resetting the DSP
7.17 Terminal Troubleshooting
A KPI-related problem may be caused by UE compatibility. Therefore, you need to check the compatibility of the UE. Normally, first check whether the problem is associated with a single IMSI. If the problem occurs only on a single IMSI, you can suspect that the problem is caused by a specific terminal model. By associating the IMSI with the IMEI, you can find the terminal type of the faulty UE.
Figure 1 Analyzing the special UEID through the CHR log
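The association of IMSI and IMEI can be taken one step further by grouping the failures by terminal model. The first eight digits of an IMEI form the Type Allocation Code (TAC), which identifies the model. The Python sketch below assumes that the failed IMSIs and an IMSI-to-IMEI mapping have already been extracted from the CHR log; the function name and data layout are hypothetical, and any numbers used in a call are made up.

```python
from collections import Counter


def failures_by_tac(failed_imsis, imsi_to_imei: dict) -> list:
    """Count failure records per TAC (first eight IMEI digits)."""
    counts = Counter()
    for imsi in failed_imsis:
        imei = imsi_to_imei.get(imsi)
        counts[imei[:8] if imei else "unknown"] += 1
    return counts.most_common()   # most affected model first
```

If one TAC dominates the result while many different IMSIs contribute to it, the problem is more likely related to the terminal model than to a single subscriber.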