LTE Architecture, KPIs and Troubleshooting
Summary
This document provides information on the EPS architecture and its elements, and on the basic KPIs (Accessibility, Retainability, Integrity and Mobility), including how to troubleshoot cells that perform poorly on those KPIs in order to improve overall network performance. The reasons and remedies given for poor KPIs are based on experience and are not limited to the ones suggested in the subsequent slides. Similarly, targets and thresholds may vary from operator to operator. The specific KPI equations used in the slides are for the Ericsson system and are built using Ericsson L11 PM counters.
Index
1. EPS Network Architecture
1.1 Network elements
1.2 Network interfaces
2. Accessibility
2.1 Random Access
2.2 RRC
2.3 S1 Signaling Connection
2.4 E-RAB
2.5 KPI formula and equation
3. Retainability
3.1 UE Session Time
3.2 MME Initiated E-RAB & UE Context Release with counters description
3.3 RBS Initiated E-RAB & UE Context Release with counters description
3.4 MME & RBS Initiated E-RAB Release Flow Chart
3.5 MME & RBS Initiated UE Context Release Flow Chart
4. Integrity
4.1 Latency
4.2 Throughput
4.3 Packet Loss
5. Mobility
5.1 Intra LTE Intra MME Intra eNodeB
5.2 Intra LTE Intra MME Inter eNodeB (X2 based handover)
5.3 Inter LTE Inter MME (S1 based handovers)
5.4 IRAT (Inter Radio Access Technology)
5.5 Inter Frequency Mobility
5.6 Counters
5.7 KPI formula and equations
6. Availability
6.1 Counters
6.2 KPI formula and equation
EPS Network Architecture
EPS = E-UTRAN + EPC
EPS: Evolved Packet System
E-UTRAN: Evolved UTRAN
EPC: Evolved Packet Core
Network Elements
Evolved UTRAN (eNodeB): The eNodeB supports the LTE air interface.
Mobility Management Entity (MME): The MME manages mobility, UE identities and security parameters.
Serving Gateway (SGW): The Serving Gateway is the node that terminates the interface towards E-UTRAN. For each UE associated with the EPS, at a given point in time, there is one single Serving Gateway.
PDN Gateway (PGW): The PGW is the node that terminates the SGi interface towards the PDN. If a UE is accessing multiple PDNs, there may be more than one PGW for that UE. The PGW provides the UE with connectivity to external packet data networks by being the point of exit and entry of traffic for the UE. The PGW performs policy enforcement, packet filtering for each user, charging support, lawful interception and packet screening.
PCRF: The PCRF is the policy and charging control element.
Network Interfaces
S1-C: Reference point for the control plane protocol between E-UTRAN and MME.
S1-U: Reference point between E-UTRAN and Serving GW for the per bearer user plane tunneling and inter eNodeB path switching during handover.
S5: It provides user plane tunneling and tunnel management between Serving GW and PDN GW. It is used for Serving GW relocation due to UE mobility and if the Serving GW needs to connect to a non-collocated PDN GW for the required PDN connectivity.
S6a: It enables transfer of subscription and authentication data for authenticating/authorizing user access to the evolved system (AAA interface) between MME and HSS.
Gx: It provides transfer of (QoS) policy and charging rules from PCRF to Policy and Charging Enforcement Function (PCEF) in the PDN GW. The interface is based on the Gx interface.
Gxa: It provides transfer of (QoS) policy information from PCRF to the Trusted Non-3GPP accesses.
Gxc: It provides transfer of (QoS) policy information from PCRF to the Serving Gateway.
S9: It provides transfer of (QoS) policy and charging control information between the Home PCRF and the Visited PCRF in order to support local breakout function.
S10: Reference point between MMEs for MME relocation and MME to MME information transfer.
S11: Reference point between MME and Serving GW.
SGi: It is the reference point between the PDN GW and the packet data network. The packet data network may be an operator external public or private packet data network or an intra operator packet data network, e.g. for provision of IMS services. This reference point corresponds to Gi for 3GPP accesses.
X2: The X2 reference point resides between the source and target eNodeB.
General Call Flow
2. Accessibility (CSSR or Call Setup Success Rate)
Accessibility includes the RRC, S1 and E-RAB establishment success rates. For cells reporting poor CSSR over a continuous span of time, the success rate of each sub-metric must be analyzed individually.
Target value of CSSR: 98%
Reasons and Remedy for Low CSSR:
• Poor coverage. Decreasing the parameter qRxLevMin can resolve this.
• UE camping in the wrong cell. Tuning the cell reselection parameters can resolve this.
• High UL interference.
• Admission reject due to lack of licenses. A license capacity upgrade or offloading the traffic can resolve this.
Call Setup
Steps Involved in UE initiated call setup (complementary to call setup diagram)
The UE reads the system information broadcast in the cell and performs DL/UL synchronization. The UE then requests RRC connection setup. Once this is completed, the eNB forwards the NAS Service Request to the MME in the Initial UE Message. The MME then carries out the authentication process (optional) and requests the eNB to establish the S1 UE context. The eNB then activates the security functions. Radio bearers are then set up to support the EPS bearers, using RRC Connection Reconfiguration messages. After successfully establishing the bearers, the eNB responds to the MME with Initial Context Setup Response. The MME then sends a Modify Bearer Request to update the SGW with the IP address etc. for the DL of the user plane.
Detailed
The accessibility process can be broadly divided into 4 steps:
1) Random Access
2) RRC Connection setup
3) S1 Signalling setup
4) E-RAB Establishment
2.1 Random Access
• In the LTE network, the UE uses the random access process to gain access to cells for the following reasons:
  • Initial access to the network from the idle state
  • Regaining access to the network after a radio link failure
  • As part of the handover process, to gain timing synchronization with a new cell
  • Before uplink data transfers when the UE is not time synchronized with the network
• Two types of RA procedures are defined in the standard for FDD:
  • CBRA (Contention Based Random Access)
  • CFRA (Contention Free Random Access)
The main counters for this scenario are the following:
• pmRaAttCbra
• pmRaSuccCbra
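The two counters above can be combined into a contention based random access success rate. The snippet below is a minimal sketch, assuming the success rate is simply successes divided by attempts for one reporting period; the function name and this ratio are illustrative assumptions and are not taken from the Ericsson KPI definitions.

```python
from typing import Optional


def ra_success_rate(pm_ra_att_cbra: int, pm_ra_succ_cbra: int) -> Optional[float]:
    """Return the CBRA success rate in percent.

    Assumption: the rate is pmRaSuccCbra / pmRaAttCbra for one reporting
    period. Returns None when there were no attempts, so that an idle cell
    is not reported as a 0% cell.
    """
    if pm_ra_att_cbra <= 0:
        return None
    return 100.0 * pm_ra_succ_cbra / pm_ra_att_cbra


# Example with made-up counter values for one cell and one period.
rate = ra_success_rate(pm_ra_att_cbra=200, pm_ra_succ_cbra=190)
print(f"CBRA success rate: {rate:.1f}%")
```

Returning None for a period without attempts keeps idle cells out of a worst-cell ranking.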
2.2 RRC
• RRC connection establishment is used to make the transition from RRC Idle mode to RRC Connected mode.
• The RRC connection establishment procedure can be of 2 types:
  1) UE initiated: for example, the UE triggers RRC connection establishment if the end user starts an application to browse the internet or to send an email.
  2) Network initiated: for example, the network triggers the RRC connection establishment procedure by sending a Paging message. This could be used to allow the delivery of an incoming SMS or notification of an incoming voice call.
RRC Connection Establishment Counters
Random Access and RRC setup
2.3 S1 Signaling Connection Establishment
• The UE sends the RRC CONNECTION SETUP COMPLETE message to the eNodeB. (Message contents: PLMN, NAS message i.e. Attach Request) The RRC connection is now completed.
• The eNodeB sends the INITIAL UE MESSAGE to the MME. (Message contents: Attach Request message; the UE is identified by IMSI or GUTI in this message)
• The MME sends AUTHENTICATION INFORMATION REQUEST to the HSS. (Message contents: request for the auth vectors Kasme, RAND, AUTN, XRES, used for security between UE and network, and also the IMSI to identify the UE)
• The HSS responds to the MME with AUTHENTICATION INFORMATION ANSWER. (Message contents: answers to the requested auth vectors for the corresponding IMSI)
• The MME sends AUTHENTICATION REQUEST to the UE.
S1 Signaling Establishment Counters
S1 Signalling Establishment message flow
2.4 E-RAB establishment
• The MME sends the INITIAL CONTEXT SETUP REQUEST message to the eNodeB. (Message contents: info on the first E-RAB, security algorithm and security keys)
• The eNodeB sends the SECURITY MODE COMMAND message to the UE. (The eNodeB applies security information to the message)
• The UE responds with the SECURITY MODE COMPLETE message. (Data between eNodeB and UE is now encrypted)
• The eNodeB allocates resources and configures the UE by sending the RRC CONNECTION RECONFIGURATION message.
• The UE responds with RRC CONNECTION RECONFIGURATION COMPLETE.
• The eNodeB sends INITIAL CONTEXT SETUP RESPONSE to the MME. (Process complete)
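When troubleshooting a failed E-RAB setup it can help to find the first message of the sequence above that is missing from a signalling trace. The snippet below is a minimal sketch of that idea, assuming the trace is available as a plain list of message names in time order; `ERAB_SETUP_FLOW` and `first_missing_step` are hypothetical names used for illustration only.

```python
from typing import Iterable, Optional

# Expected message order as listed in the slide: (sender, receiver, message).
ERAB_SETUP_FLOW = [
    ("MME", "eNodeB", "INITIAL CONTEXT SETUP REQUEST"),
    ("eNodeB", "UE", "SECURITY MODE COMMAND"),
    ("UE", "eNodeB", "SECURITY MODE COMPLETE"),
    ("eNodeB", "UE", "RRC CONNECTION RECONFIGURATION"),
    ("UE", "eNodeB", "RRC CONNECTION RECONFIGURATION COMPLETE"),
    ("eNodeB", "MME", "INITIAL CONTEXT SETUP RESPONSE"),
]


def first_missing_step(observed_messages: Iterable[str]) -> Optional[str]:
    """Return the first expected message not found, in order, in the trace.

    Other messages may be interleaved in the trace; only the relative order
    of the expected messages is checked. Returns None if the flow completed.
    """
    trace = iter(observed_messages)
    for _sender, _receiver, expected in ERAB_SETUP_FLOW:
        # any() consumes the iterator up to and including the match, so the
        # next expected message is searched for only in the remaining trace.
        if not any(message == expected for message in trace):
            return expected
    return None


# Example: the UE never answers the RRC CONNECTION RECONFIGURATION.
trace = [
    "INITIAL CONTEXT SETUP REQUEST",
    "SECURITY MODE COMMAND",
    "SECURITY MODE COMPLETE",
    "RRC CONNECTION RECONFIGURATION",
]
print(first_missing_step(trace))
```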
E-RAB Establishment Success Rate Counters
ERAB establishment procedure
CSSR KPI Formula and Equation
Pseudo Formula:
CSSR = RRC Connection Establishment SR * S1 Signalling Connection Establishment SR * Initial E-RAB Establishment SR
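The pseudo formula can be sketched in a few lines. The example below assumes that each sub-metric is supplied as a ratio between 0 and 1; the function names are illustrative, and the input values are made up to show how three individually acceptable stages can still fall short of the 98% target.

```python
def cssr_percent(rrc_sr: float, s1_sr: float, erab_sr: float) -> float:
    """CSSR in percent as the product of the three stage success ratios."""
    return 100.0 * rrc_sr * s1_sr * erab_sr


def weakest_stage(rrc_sr: float, s1_sr: float, erab_sr: float) -> str:
    """Name the stage with the lowest success ratio, to analyze it first."""
    stages = {"RRC": rrc_sr, "S1": s1_sr, "E-RAB": erab_sr}
    return min(stages, key=stages.get)


# Made-up ratios for one cell: each stage is at 99% or better.
cssr = cssr_percent(rrc_sr=0.99, s1_sr=0.995, erab_sr=0.99)
print(f"CSSR = {cssr:.2f}%")  # about 97.52%, below a 98% target
print("Analyze first:", weakest_stage(0.99, 0.995, 0.97))
```

Because the stages multiply, a cell can miss the target even when no single stage looks bad, which is why each sub-metric is analyzed individually.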
3. Retainability
Retainability is measured as abnormal releases per second, i.e. the number of abnormal releases normalized by the time that the UE is active. An active UE here is a UE that has transmitted data in UL / DL during the last 100 ms. Retainability can also be expressed as the percentage of abnormal releases out of the total established calls, known as the Dropped Call Rate.
Reasons and Remedies for poor Retainability:
• Missing neighbor relations. Fine tuning the neighbor list would be the solution.
• Poor radio conditions. Physical optimization or an eNodeB health check can help.
• Badly tuned handover parameters. Fine tune the HO parameters so that handovers are performed in an optimized way, i.e. avoiding delayed HO, too early HO and frequent ping-pong.
• Admission reject due to lack of licenses. A license capacity upgrade or offloading of the eNodeB can help.
UE Session Time
UE session time is the accumulated active session time in a cell for the measurement period, i.e. the number of session seconds aggregated over the UEs in the cell. A UE is said to be "in session" if any data on a DRB (UL or DL) has been transferred during the last 100 ms.
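The 100 ms rule can be illustrated with a small calculation. The sketch below is a simplified model, not the way the eNodeB actually steps the counter: it assumes the DRB transfer times of one UE are known in milliseconds and counts every instant that falls within 100 ms after a transfer as session time. The function name and the input format are assumptions made for illustration.

```python
from typing import Iterable


def session_time_seconds(transfer_times_ms: Iterable[float], window_ms: float = 100.0) -> float:
    """Approximate session time of one UE from its DRB transfer timestamps.

    Each transfer keeps the UE "in session" for window_ms. Overlapping
    windows are merged so that no instant is counted twice.
    """
    total_ms = 0.0
    window_end = None  # end of the most recent "in session" window
    for t in sorted(transfer_times_ms):
        if window_end is None or t >= window_end:
            total_ms += window_ms               # a new, separate window
        else:
            total_ms += t + window_ms - window_end  # extend the current window
        window_end = t + window_ms
    return total_ms / 1000.0


# Transfers at 0 ms and 50 ms overlap; the one at 400 ms starts a new window.
print(session_time_seconds([0, 50, 400]))  # 0.25 seconds
```

Summing this value over all UEs in a cell gives the aggregated session seconds described above.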
Ue Session time counters
Call releases can be classified as follows:
1) MME initiated E-RAB release
2) MME initiated UE Context release
3) eNodeB initiated E-RAB release
4) eNodeB initiated UE Context release
MME Initiated E-RAB & UE Context Release counters
Counter: Description
pmErabRelAbnormalMmeAct: The total number of abnormal E-RAB Releases initiated by the MME when there was data in either the UL or DL buffers.
pmErabRelMmeAct: The total number of E-RAB Releases initiated by the MME, excluding successful HO. The counter is stepped regardless of whether data was or was not lost in UL/DL buffers.
pmUeCtxtRelAbnormalMmeAct: The total number of abnormal UE Context Releases initiated by the MME when there was data in either the UL or DL buffers.
pmUeCtxtRelMme: The total number of UE Context Releases initiated by the MME, excluding successful HO. The counter is stepped regardless of whether data was or was not lost in UL/DL buffers.
pmUeCtxtRelMmeAct: The total number of UE Context Releases initiated by the MME, excluding successful HO, when there was data in either the UL or DL buffers.
RBS Initiated E-RAB & UE Context Release counters
Counter: Description
pmErabRelAbnormalEnb: The total number of abnormal E-RAB Releases initiated by the RBS. The counter is stepped regardless of whether data was or was not lost in UL/DL buffers.
pmErabRelAbnormalEnbAct: The total number of abnormal E-RAB Releases initiated by the RBS when there was data in either the UL or DL buffers.
pmErabRelNormalEnb: The total number of normal E-RAB Releases initiated by the RBS. The counter is stepped regardless of whether data was or was not lost in UL/DL buffers.
pmErabRelNormalEnbAct: The total number of normal E-RAB Releases initiated by the RBS when there was data in either the UL or DL buffers.
pmUeCtxtRelAbnormalEnb: The total number of abnormal UE Context Releases initiated by the RBS. The counter is stepped regardless of whether data was or was not lost in UL/DL buffers.
pmUeCtxtRelAbnormalEnbAct: The total number of abnormal UE Context Releases initiated by the RBS when there was data in either the UL or DL buffers.
pmUeCtxtRelNormalEnb: The total number of normal UE Context Releases initiated by the RBS. The counter is stepped regardless of whether data was or was not lost in UL/DL buffers.
pmUeCtxtRelNormalEnbAct: The total number of normal UE Context Releases initiated by the RBS when there was data in either the UL or DL buffers.
KPI Formula and equations
Call Drops Per Second
Pseudo Formula: (Number of abnormally released E-RABs with data in any of the buffers) / (Active E-RAB Time)
Equation
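The pseudo formula for call drops per second can be sketched as follows. This is an illustration under stated assumptions, not the Ericsson equation itself: the numerator is taken as the sum of the two "abnormal with data" E-RAB release counters described in the tables above, and the active E-RAB time is passed in as a plain number of seconds because its counter is not named in this section.

```python
from typing import Optional


def call_drops_per_second(
    pm_erab_rel_abnormal_enb_act: int,
    pm_erab_rel_abnormal_mme_act: int,
    active_erab_time_s: float,
) -> Optional[float]:
    """Abnormal E-RAB releases with buffered data per second of active time.

    Assumption: RBS initiated and MME initiated abnormal releases are simply
    added. Returns None when there was no active time in the period.
    """
    if active_erab_time_s <= 0:
        return None
    abnormal = pm_erab_rel_abnormal_enb_act + pm_erab_rel_abnormal_mme_act
    return abnormal / active_erab_time_s


# Made-up values: 4 abnormal releases over 8000 s of active session time.
print(call_drops_per_second(3, 1, 8000.0))  # 0.0005 drops per second
```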
4. Integrity
4.1 Latency
Average IP Latency in DL Direction (ms)
Latency Counters
4.2 Throughput
Throughput is the speed at which packets can be transferred once the first packet has been scheduled on the air interface. Unit: kbps
The threshold for throughput is defined by the network policies and practices and also depends on the design parameters.
Low throughput causes in the downlink for LTE networks:
• Downlink interference: Cells with downlink interference are those whose CQI values are low. If a CQI report shows low CQI values, downlink interference might be the cause of low throughput. Common sources of interference: inter-modulation interference, cell jammers and wireless microphones.
• BLER values: Larger BLER values are an indication of a bad RF environment. Threshold value: 10%. Typical causes of bad BLER are downlink interference and bad coverage (holes in the network, etc.).
• MIMO parameters: Identify the transmission mode of the network. There are seven transmission modes as shown in the table below:
Adjust the SINR thresholds for transition between transmission modes as recommended by the OEM. Request the link level simulations the OEM used to set these thresholds and check whether the conditions under which the values were calculated apply to your network. Otherwise, update them if the parameters are settable and not restricted.
• Low Demand: If the maximum number of RRC connections active per cell is close or equal to the maximum number of RRC connections supported, then the cause for low throughput is load. A high number of scheduled users per TTI does not necessarily mean that demand is the cause for low throughput.
• Scheduler type: Find the scheduler types your OEM supports and select the one best suited to the type of cell you are investigating. Examples of schedulers are: round robin, proportional fairness, maximum C/I, equal opportunity, etc. OEMs allow you to switch the scheduler in your network but recommend one in particular. The wrong scheduler may be the reason for bad throughput.
• CQI reporting parameters:
  a) Identify whether CQI reporting in the network is periodic, aperiodic or both.
  b) Verify the frequency of CQI reporting and the maximum number of users supported per second.
  c) If the value is too small compared with the maximum number of RRC active connections, increase the values of the parameters CQIConfigIndex and RIConfigIndex.
  d) Enable aperiodic CQI reporting, if not used.
  e) Slow CQI reporting frequencies may give bad channel estimations that prevent the eNodeB from scheduling the appropriate amount of data and the correct Modulation and Coding Schemes to the UE.
• Other: VSWR, backhaul capacity
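The first checks in the list above (CQI, BLER and load) lend themselves to a simple screening step before the parameter review. The snippet below is a minimal sketch of such a screen. Only the 10% BLER threshold comes from the text; the CQI threshold of 7 and the load ratio of 0.9 are hypothetical placeholders, as are the function and argument names, and all of them should be replaced with operator specific values.

```python
from typing import List


def low_throughput_suspects(
    avg_cqi: float,
    dl_bler_percent: float,
    rrc_connected_max: int,
    rrc_supported_max: int,
    cqi_threshold: float = 7.0,            # hypothetical placeholder
    bler_threshold_percent: float = 10.0,  # threshold quoted in the text
    load_ratio: float = 0.9,               # hypothetical "close to maximum"
) -> List[str]:
    """Return the likely causes of low DL throughput for one cell."""
    suspects = []
    if avg_cqi < cqi_threshold:
        suspects.append("downlink interference (low CQI)")
    if dl_bler_percent > bler_threshold_percent:
        suspects.append("bad RF environment (high BLER)")
    if rrc_supported_max > 0 and rrc_connected_max / rrc_supported_max >= load_ratio:
        suspects.append("load (RRC connections near maximum)")
    return suspects


# Made-up cell statistics for illustration.
for cause in low_throughput_suspects(avg_cqi=5.0, dl_bler_percent=12.0,
                                     rrc_connected_max=95, rrc_supported_max=100):
    print(cause)
```

An empty result does not clear the cell; it only means the scheduler, MIMO and CQI reporting settings are the next things to review.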
DL DRB Traffic Volume Counters
UL DRB Traffic Volume Counters
KPI formulae and equation
4.3 Packet Loss
Two types of Packet Loss:
• The rate of congestion related packet losses (for example, packets that get dropped due to active queue management functionality).
• The rate of non-congestion related packet losses (packets that get lost in transmission, for example, discarded by some link layer receiver due to CRC failure).
Downlink Packet Error Loss Rate [%]:
Uplink Packet Loss Rate [%]:
Counters
5. Mobility
HOSR (Handover Success Rate) is the measure of Mobility. Target: 98%.
Reasons and remedy for poor Mobility:
• Missing neighbor relations
  • Fine tuning the neighbor list (NL) is the solution
• Poor radio conditions
  • Physical optimization or site health check
• Badly tuned handover parameters
  • Fine tune the HO parameters to a tradeoff between too early and delayed handovers
  • Tune the handover hysteresis and TTT (time-to-trigger) parameters to avoid excessive ping-pong handovers
(Parameter tuning may vary from case to case and should be planned as per requirement. Values may differ for different terrain, under different traffic conditions etc.)
Besides this, the coverage overlap needs to be carefully planned. Too much cell overlap may result in interference or low cell edge throughput. Little or no overlap may result in more dropped calls. Overlap can be optimized with parameters and physical changes.
Network Architecture Configuration
Triggering Events for sending Measurement reports to eNodeB
Measuring quantity
The user equipment uses two alternative types of measurements in the cell evaluation process:
• Reference Signal Received Power (RSRP), representing the mean measured power per reference signal
• Reference Signal Received Quality (RSRQ), providing an indication of the reference signal quality
LTE mobility can be broadly divided into "Intra-LTE mobility" and "Inter-LTE mobility" (inter-working with 2G/3G and CDMA2000).
Classification of HOs
• Intra-LTE Handover - within one MME pool
  • Intra-eNodeB
  • Inter-eNodeB
• Inter LTE Handover - Inter MME pool
• Inter-RAT Handover
• Inter Frequency Handovers
5.1 Intra LTE Intra MME Intra eNodeB
Handovers between the cells of the same eNodeB.
5.2 Intra LTE Intra MME Inter eNodeB (X2 based handover)
As long as the UE moves between eNBs that belong to the same pooling area where the UE is currently registered, the handovers are executed via the X2 interface. However, the HO can proceed over S1 if X2 is not defined between the source and target eNodeBs.
Intra MME Handover Network Architecture Configuration
5.3 Inter LTE Inter MME (S1 Based Handovers)
When the UE moves between eNBs that belong to different pooling areas, the handover procedure has to be executed via the S1 interface.
5.4 IRAT (Inter Radio Access Technology)
Inter-working between LTE (E-UTRAN) and:
• UTRAN
• GERAN
• CDMA2000
5.5 Inter Frequency Mobility
The Inter Frequency Mobility consists of Coverage Triggered Inter Frequency Handover and Coverage Triggered Inter Frequency Session Continuity.
5.6 Counters
Intra Frequency Handover Preparation Counters
Intra Frequency Handover Execution Counters
Inter Frequency Handover Preparation Counters
Inter Frequency Handover Execution Counters
5.7 KPI Formula and equations
Pseudo formula: 100 * (Succ HO Prep / HO Prep Att) * (Succ HO Exec / HO Exec Att) %
EUTRAN Mobility Success Rate [%]:
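The pseudo formula for the mobility success rate can be sketched as below. The argument names are generic stand-ins for the handover preparation and execution counters, which are not listed by name in this section, so treat them as assumptions rather than Ericsson counter names.

```python
from typing import Optional


def mobility_success_rate(
    ho_prep_succ: int,
    ho_prep_att: int,
    ho_exec_succ: int,
    ho_exec_att: int,
) -> Optional[float]:
    """Handover success rate in percent: preparation SR times execution SR.

    Returns None when either phase had no attempts in the period.
    """
    if ho_prep_att <= 0 or ho_exec_att <= 0:
        return None
    return 100.0 * (ho_prep_succ / ho_prep_att) * (ho_exec_succ / ho_exec_att)


# Made-up values: 1000 preparation attempts, 990 prepared, 980 executed.
hosr = mobility_success_rate(990, 1000, 980, 990)
print(f"HOSR = {hosr:.2f}%")
```

Keeping the two ratios separate before multiplying makes it easy to see whether the loss is in the preparation phase (target side admission, missing neighbor relations) or in the execution phase (radio conditions).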
6. Availability
Partial cell availability (node restarts excluded)
This KPI measures system performance. Since the KPI is measured by the eNodeB, it does not include time when the eNodeB is down, i.e. node restart time is excluded. The length of time in seconds that a cell is available for service is defined as cell availability.
6.1 Counters
The main counters for cell downtime are:
pmCellDowntimeAuto - Length of time the cell has been disabled due to a fault
pmCellDowntimeMan - Length of time the cell has been disabled due to the Administrative State of the cell
(The counter is only incremented when the eNodeB is operational)
6.2 KPI Equation
M - number of cells
N - number of reporting periods
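With M cells and N reporting periods, cell availability can be sketched as the share of the total observed time that is not covered by the two downtime counters. The snippet below is an illustration under stated assumptions: the reporting period length of 900 seconds and the choice to subtract both automatic and manual downtime are assumptions, not values taken from this document.

```python
from typing import Optional, Sequence


def cell_availability_percent(
    pm_cell_downtime_auto_s: Sequence[float],
    pm_cell_downtime_man_s: Sequence[float],
    num_cells: int,
    num_periods: int,
    period_s: float = 900.0,  # assumed length of one reporting period
) -> Optional[float]:
    """Availability in percent over M cells and N reporting periods.

    The two sequences hold one downtime value (in seconds) per cell and
    period. Returns None when no time was observed.
    """
    total_s = num_cells * num_periods * period_s
    if total_s <= 0:
        return None
    down_s = sum(pm_cell_downtime_auto_s) + sum(pm_cell_downtime_man_s)
    return 100.0 * (total_s - down_s) / total_s


# One cell, two periods: 90 s of fault downtime, then 90 s of manual lock.
print(cell_availability_percent([90, 0], [0, 90], num_cells=1, num_periods=2))  # 90.0
```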
References: Google Aircom SoW Document Ericsson Counter Reference LTE L11 KPIs tutorial by CHAVANKUMAR T C