Version: 2.0 October 2009
SAP Standard Root Cause Analysis Whitepaper
Active Global Support SAP AG
© 2009 SAP AG
Root Cause Analysis Version: 2.0
SAP ® Standard Root Cause Analysis
Change history:
Version   Date           Changes
1.0       April 2007     Original version
1.1       January 2008   New version of chapter 5.1 (Methodology); restructuring of chapter 5.3 (People); additional information in chapter 6 (How to measure the success of the Implementation)
2.0       October 2009   Changed structure and content of chapters 4 and 5; minor changes to chapters 1, 2, 3 and 6
Table of Contents
1 Management Summary
2 Application Life-Cycle Management
3 Root Cause Analysis Standard at a Glance
  3.1 Goal
  3.2 Scope
  3.3 Benefits
4 What is the Basic Concept of Standard Root Cause Analysis
  4.1 Triggers of Root Cause Analysis
  4.2 Cross Component Analysis
  4.3 Component Specific Analysis
    4.3.1 Server Side Analysis
    4.3.2 Client side analysis
    4.3.3 Analysis through Software Vendor (SAP or Partner)
  4.4 Follow-up Activities
  4.5 Architecture of Root Cause Analysis
5 How to Implement the Root Cause Analysis Standard?
  5.1 Installation and Configuration of Root Cause Analysis Scenario
    5.1.1 Prerequisites
    5.1.2 Configuration
  5.2 Tools
  5.3 People
    5.3.1 E2E Solution Operations – Core Knowledge
    5.3.2 Technical Core Competence Courses
    5.3.3 Technical Expert Competence Workshops
6 How to Measure the Success of the Implementation
1 Management Summary
Customers’ heterogeneous IT landscapes running mission-critical applications have become increasingly complex during the last decade. Finding the root cause of an incident in those environments can be challenging. This creates the need for a structured approach to isolate the component causing the problem. The approach must be supported by tools that help customers do this as efficiently as possible. The standard Root Cause Analysis (RCA) defines how to perform a root cause analysis across different support levels and different technologies. The basic idea behind Root Cause Analysis is to determine where and why a problem occurred. Root Cause Analysis is not only an E2E Standard defined by SAP; it is a procedure based on SAP best practices with a set of tools shipped with SAP Solution Manager. This paper outlines the basic concept as well as the implementation methodology of the SAP Support Standard Root Cause Analysis. The last chapter explains how to measure the success of an implementation of this standard.
2 Application Life-Cycle Management
Companies expect their IT departments to run mission-critical business applications smoothly, without business disruptions and at low cost, and to adapt them easily to new requirements. It is the mission of Application Life-Cycle Management (ALM) to achieve this. SAP’s ALM portfolio consists of processes, tools, services, and best practices to manage SAP and non-SAP solutions throughout the entire application life-cycle. For details about the complete portfolio, please refer to http://service.sap.com/alm.
According to the IT Infrastructure Library (ITIL), the application management life cycle comprises six phases:
- Functional and non-functional requirements are collected and evaluated during the requirements phase.
- In the design phase, the findings from the requirements phase are used to specify how the application or IT operation processes are to function, and which IT applications should be used to map the processes.
- In the build and test phase, a system landscape is set up and configured to implement and test the planned scenarios and processes.
- The deploy phase is the transition from a pre-production environment to production operation.
- The operate phase groups tasks that are performed after system startup to ensure the availability and stability of the solution. These tasks include activities such as system administration, system monitoring, business process monitoring, message processing (Service Desk), root cause analysis, issue management, and service delivery.
- The optimize phase collects key figures and data from the live solution to reduce costs or improve performance.
ALM processes span the six phases to ensure stable operation of the IT solution while enabling accelerated innovation. Optimizing these processes reduces costs and ensures the highest quality of IT operation.
Typically, multiple teams are involved in the ALM processes (see Figure 2.1). They belong to the key organizational areas Business Unit and IT. The names of the organizations differ from company to company, but their functions are equivalent. For example, a program management office communicates business requirements to the IT organization, decides on the financing of development and operations, and ensures that the requirements are implemented. On the technical side, the application management team is in direct contact with the business units. It is responsible for implementing the business requirements and providing support to end users. Business process operation covers the monitoring and support of the business applications, their integration, and the automation of jobs. SAP technical operation is responsible for the general administration of systems and system diagnostics. Further specialization is possible within these organizations. For example, in larger organizations there may be separate experts for different applications within SAP technical operations.
Figure 2.1: Organizational model for application life-cycle management
Two things are key to optimizing the collaboration of the groups involved: a common infrastructure, and a clear definition of the collaboration processes, including the activities involved, responsibilities, and service levels. The infrastructure is provided by SAP Solution Manager as a collaboration platform. It provides role-based access, via work centers, to all functions required (provided either by SAP Solution Manager itself or by integrated tools). It also provides all related information centrally, so that all stakeholders involved have easy access to the information they require.
Many customers have defined collaboration processes. SAP has leveraged the experience of these customers, and of its own application life-cycle management experts, to create best-practice descriptions of important ALM processes. These documents are published as E2E Solution Operations standards in SAP Service Marketplace at http://service.sap.com/supportstandards. Customers can refer to these standards when optimizing their own IT processes.
With Run SAP, SAP provides a methodology for the implementation of the End-to-End Solution Operations standards. The road map for Run SAP guides customers through defining the scope of the operations to be implemented, preparing a detailed plan, doing the setup, and running SAP solutions. Moreover, it helps to find the right strategy and tools to implement ALM. The road map describes not only what needs to be implemented but also how it needs to be implemented, in the form of implementation methodology documents and best-practice documents.
Currently, SAP provides the following standards:
- Solution Documentation and Solution Documentation for Custom Development define the documentation and reporting required for the customer solution
- Incident Management describes the incident resolution process
- Remote Supportability contains five basic requirements that have to be met to optimize the supportability of customer solutions
- Root Cause Analysis defines how to perform root cause analysis, end-to-end, across support levels and technologies
- Exception Handling and Business Process and Interface Monitoring explain how to define a model and procedures to manage exceptions and error situations during daily business operations, and how to monitor and supervise mission-critical business processes
- Job Scheduling Management explains how to manage the planning, scheduling and monitoring of background jobs
- Data Integrity and Transactional Consistency avoids data inconsistencies and safeguards data synchronization across applications in distributed system landscapes
- Data Volume Management defines how to manage data growth
- Change Management enables efficient and punctual implementation of changes with minimal risks
- Test Management describes the test management methodology and approach for functional, scenario, integration and technical system tests of SAP-centric solutions
- System Monitoring covers monitoring and reporting of the technical status of IT solutions
- System Administration describes how to administer SAP technology to run a customer solution efficiently
- Custom Code Management describes the basic concepts of custom code operation and optimization
- Security describes basic activities to set up, maintain and evolve security measures for the operation and organization of SAP solutions
- Upgrade guides customers and technology partners through upgrade projects
Out of this list, this white paper describes the standard for Root Cause Analysis.
3 Root Cause Analysis Standard at a Glance
In today’s distributed, heterogeneous customer IT environments, accessible through diverse devices and multiple channels, analyzing the root cause of an incident requires a systematic top-down approach. SAP has developed the solution operations standard Root Cause Analysis to address this. This standard mainly consists of an analysis roadmap and tools, which support both customers and SAP Support consultants during a resolution process.
For instance, if an end user experiences a problem while maintaining his bank account data in the corporate portal, the cause may be on the client PC (e.g. the browser), in the network or somewhere in the server environment, which itself might comprise different instances of varying technologies. In this example the client request in question first hits an SAP NetWeaver Portal (based on SAP AS Java), then reaches an SAP ERP system (based on SAP AS ABAP) via an RFC call, and finally results in a SQL statement which retrieves information from the ERP database. The performance problem or functional defect might have occurred in any of those systems. SAP’s root cause analysis tools help to identify the specific part of the system that caused the error. The standard Root Cause Analysis offers a systematic analysis approach and tools for the resolution of incidents, which is especially valuable in distributed mission-critical customer environments.
3.1 Goal
When an issue affects production, the central goal of the customer’s IT team is to provide
- an immediate corrective action (workaround), which restores service operations as quickly as possible and affects end users minimally
- a complete solution to the issue at hand by isolating the area of concern
Additionally, with respect to operation, SAP’s root cause analysis tools are designed to reduce the number of resources needed in each step of the resolution process. An IT Generalist with core competence in root cause analysis, who involves a Component Expert, is usually enough to investigate an issue and pin it down. Finally, critical malfunctions can be avoided through proactive root cause analysis. Examples of such investigations are the regular study of EarlyWatch Alerts (EWAs) and a deep analysis of problems discovered through integration validation prior to a go-live.
3.2 Scope
SAP’s standard Root Cause Analysis consists of
- Roadmaps for a systematic top-down analysis
- Tools for each task in cross-component (end-to-end) analysis and component-specific analysis. By definition, a cross-component analysis involves several systems or technology stacks, whereas component-specific analysis deals with one system or technology stack.
- A dedicated support user, who is only assigned read-only rights and thereby ensures safe system access for SAP and customers
- An open diagnostic infrastructure with hubs for all kinds of diagnostic data (e.g. workload, exceptions, technical configuration or traces). SAP not only progressively adds new SAP technologies, applications and OEMs to this open infrastructure, but also integrates certain products of SAP-focused independent software vendors.
- A training and certification program for Root Cause Analysis, covering both the analysis roadmap and the tools
- Knowledge transfer for experts in a certain technology or system area, for example:
  o SAP NetWeaver Application Server ABAP (SAP NetWeaver AS ABAP)
  o SAP NetWeaver Application Server Java (SAP NetWeaver AS Java)
  o SAP NetWeaver Business Warehouse (SAP NetWeaver BW)
  o SAP NetWeaver Process Integration (SAP NetWeaver PI)
  o SAP NetWeaver Portal
  o SAP ERP Core Component (SAP ECC)
  o SAP Customer Relationship Management (SAP CRM)
  o Databases
  o SAP client diagnostics
- Run SAP scope assessment
3.3 Benefits
Overall, Root Cause Analysis works towards simplifying the problem resolution process within an IT environment and reducing the total cost of ownership. Benefits of this standard and of SAP’s preferred tools for RCA are:
- Ensured continuous business availability: Root Cause Analysis helps to accelerate the problem resolution process. As a result, introducing SAP’s RCA methodology generally leads to increased availability of the IT solution.
- Reduced costs for support experts: The targeted top-down approach of RCA supports a one-step dispatching of issues from an IT Generalist to a Component Expert. This reduces the overall problem-resolution time and the number of resources involved in the investigation. Progressive data aggregation and unified display of diagnostics data across applications and technologies drastically reduce the level of specialization required to isolate the area of concern.
- Reduced license costs: Supporting RCA tools offered by SAP are part of the standard maintenance contract and come at no additional fee. Appropriate tools for root cause analysis are available off the shelf, which avoids a time-consuming selection process for analysis tools. SAP also provides a structured and proven resolution roadmap.
- One safe access channel to all systems: Root Cause Analysis provides one safe and central access channel to the customer’s landscape. If required, an investigation is continued on the system concerned using a predefined support user (SAPSUPPORT), who is only assigned read-only rights. Collected workload and exception data is displayed in unified views, thereby abstracting data from the underlying technology stack. This supports the structured top-down analysis approach, as generalists and experts start investigating at one common point.
- Empowers the customer to solve problems himself: Nobody knows the customer’s SAP landscape as well as the customer himself. E2E Root Cause Analysis provides expert tools which enable a customer to solve problems quickly, thereby reducing overall resolution time.
- Data foundation for monitoring and IT reporting: Diagnostics in SAP Solution Manager forms the technological foundation for SAP’s next-generation application monitoring. As a result, implementing Diagnostics is already one essential step towards this upcoming functionality. Furthermore, data collected by Diagnostics is reused for automated IT reporting.
4 What is the Basic Concept of Standard Root Cause Analysis
Following SAP’s collaboration model between business and IT, the Application Management team is the owner of the Root Cause Analysis process. The Application Management team is the central contact point of business departments for all IT-related topics regarding their business processes. End Users and Key Users address issues directly to the Application Management office. Furthermore, this unit coordinates the implementation of new business processes or IT scenarios. If required, other IT units such as Custom Development, Business Process Operations and SAP Technical Operations perform a root cause analysis on the system responsible for the incident and apply a corrective action. Efficient collaboration between those teams is required to optimize operations of SAP-centric solutions. This involves the definition of processes, responsibilities and Service Level Agreements (SLAs), and agreement on key performance indicators (KPIs). SAP’s End-to-End Root Cause Analysis is a systematic top-down approach which avoids time-consuming, untargeted and intuition-based analyses.
4.1 Triggers of Root Cause Analysis
[Figure: triggers of Root Cause Analysis. Labels: Technical Monitoring and Alerting; Incident Mgmt; Proactive Quality Assurance; Analyze]
End Users and Key Users play a crucial role as they are the recipients of services provided by the IT organization. Key Users are the first point of contact in case of problems reported by End Users. Apart from troubleshooting, Key Users provide detailed feedback to members of Application Management about ongoing IT operations.
A Root Cause Analysis is triggered either by an incident reported by a Key or End User or by an alert (solution monitoring). SAP provides monitoring for both SAP Technical Operation (E2E System Monitoring) and SAP Business Process Operation (E2E Business Process Monitoring). Their sole aim is to proactively detect errors and performance bottlenecks before they affect business continuity. Alerts are triggered based on thresholds and notify the appropriate contacts within the IT team. The resolution of an alert should be documented in an incident created by the recipient of that alert. The problem has to be recorded (if possible), described, categorized and prioritized via a message in the customer’s service desk system. Incidents opened by Key or End Users are sent to first-level support.
Furthermore, proactive quality assurance tasks like integration validation tests or the check of EarlyWatch Alerts might trigger an RCA. Errors mentioned in an EWA and marked by a yellow or red light should be investigated in a proactive RCA. Those errors are analyzed directly by the appropriate Component Expert and generally do not involve first-level support. Investigations are continually documented by the expert. In most cases the outcome results in a change, and the documentation is added to the customer’s solution database.
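The threshold-based triggering described above can be illustrated with a minimal sketch. It is not SAP’s implementation: the metric names, threshold values and the `Threshold` class are hypothetical, and only the yellow/red rating idea is taken from the text.

```python
from dataclasses import dataclass

# Hypothetical rating names; the whitepaper only mentions yellow and red lights.
GREEN, YELLOW, RED = "green", "yellow", "red"


@dataclass(frozen=True)
class Threshold:
    """Illustrative threshold pair for one metric (all names are assumptions)."""
    metric: str
    yellow_at: float  # value at or above which the rating turns yellow
    red_at: float     # value at or above which the rating turns red


def rate(value: float, threshold: Threshold) -> str:
    """Map a measured value to a traffic-light rating."""
    if value >= threshold.red_at:
        return RED
    if value >= threshold.yellow_at:
        return YELLOW
    return GREEN


def findings_for_rca(measurements: dict, thresholds: list) -> list:
    """Return (metric, rating) pairs rated yellow or red, i.e. candidates for a proactive RCA."""
    findings = []
    for t in thresholds:
        if t.metric not in measurements:
            continue  # no data collected for this metric
        rating = rate(measurements[t.metric], t)
        if rating != GREEN:
            findings.append((t.metric, rating))
    return findings


# Hypothetical metrics and values, for illustration only.
thresholds = [
    Threshold("dialog_response_time_ms", yellow_at=1000, red_at=2000),
    Threshold("abap_dumps_per_day", yellow_at=10, red_at=50),
]
measurements = {"dialog_response_time_ms": 1350.0, "abap_dumps_per_day": 4.0}
print(findings_for_rca(measurements, thresholds))
# prints [('dialog_response_time_ms', 'yellow')]
```

In this sketch each finding would then be documented as an incident by the alert recipient, as the text requires.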
4.2 Cross Component Analysis
First-level support attempts to clarify a reported problem and searches the customer’s solution database and SAP Notes. If a solution is not found, the Application Management team is involved. In the case of priority one problems, the main goal should be to resolve the issue as fast as possible without destroying logs which might help to analyze the problem further afterwards. A temporary solution or workaround might be applied. In all other circumstances, e.g. on development or quality assurance systems, it is important to drill down into the issue while changing as little as possible in the environment. Otherwise, side effects might lead to a wrong analysis path, thereby wasting time and resources.
When an incident reaches the Application Management team, it is handled by an IT Generalist first. The IT Generalist is the mediator between application and technology. He integrates monitoring and administration as a whole and is able to answer detailed questions regarding the customer’s IT landscape. Additionally, he has detailed knowledge of dependencies between different software components and their effect on core business processes. Therefore, he is best placed to classify the incident and to gauge whether the error is caused by the interaction of several systems or is confined to a specific system. In the latter case the IT Generalist routes the incident to the relevant Component Expert or Technical System Owner.
It is important to understand the big picture before starting a deeper RCA. This requires checking software component versions and recent changes first. End user requests spanning several systems are extremely difficult to track. As a consequence, locating an error or performance bottleneck is time-consuming. Reducing the overall complexity of such situations accelerates the analysis process.
For those situations, research should start from a central analysis tool, which contains up-to-date information about all systems and ideally displays this information in unified views on exceptions and system workload. This centralized approach best suits the needs of the IT Generalist and supports him during the resolution process. Besides up-to-date information on all systems involved, it might become necessary to compare system workload or configuration at different points in time. As a consequence, detailed historical workload information and configuration snapshots of the customer’s system landscape have to be stored in a central database. This data should be usable by component experts as well, so that generalists and experts share one common overview of diagnostic data. Furthermore, if it is possible to reproduce and trace a problem, this path should be taken, as it further speeds up the overall analysis. The main goal of this first analysis step is to isolate the problem-causing component and then involve the appropriate component expert. Read-only rights combined with a dedicated support user, who is unique across all systems, will additionally help to achieve this.
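Comparing configuration at different points in time, as described above, amounts to a diff of two stored snapshots. The following is a minimal sketch under stated assumptions: snapshots are modeled as flat key/value dictionaries, and the parameter names and values are hypothetical rather than real SAP profile parameters.

```python
def diff_snapshots(before: dict, after: dict) -> dict:
    """Return parameters that were added, removed or changed between two snapshots.

    Each entry maps a parameter name to an (old, new) pair; None marks a
    parameter that is absent from the respective snapshot.
    """
    changes = {}
    for key in sorted(before.keys() | after.keys()):
        old, new = before.get(key), after.get(key)
        if old != new:
            changes[key] = (old, new)
    return changes


# Hypothetical snapshots of one system taken on two different days.
snapshot_monday = {
    "work_processes.dialog": "20",
    "jvm.max_heap_mb": "2048",
    "kernel.patch_level": "150",
}
snapshot_friday = {
    "work_processes.dialog": "20",
    "jvm.max_heap_mb": "1024",
    "trace.level": "3",
}

for name, (old, new) in diff_snapshots(snapshot_monday, snapshot_friday).items():
    print(f"{name}: {old} -> {new}")
```

The same function could compare two systems at one point in time, for example development against production, since it only needs two snapshots in the same format.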
[Figure: labels SAP Support; Custom Development; Component Expert; IT Infrastructure; SAP Technical Operations]
4.3 Component Specific Analysis
The IT Component Expert is the counterpart of the IT Generalist in the Application Management team. He has in-depth knowledge of one or several components, technologies or system types (e.g. SAP Application Server, TREX or MDM). Usually, the IT Component Expert is the last person to contact in case of an incident before opening a support message at SAP. As a consequence, he analyzes the problem further. If the IT Component Expert cannot solve the issue himself, SAP Support has to get involved via a customer message in SAP Service Marketplace. Such a message is opened by the expert, as he is best able to describe the problem and provide the most qualified answers. In the majority of cases, the use of the same toolbox by both parties leads to a lower number of message roundtrips and encourages interaction between customer and SAP.
As another outcome, Custom Development might be contacted because the problem has been located in custom-developed code. Additionally, the IT Infrastructure team helps with questions concerning the underlying operating system. The IT Infrastructure unit provides the underlying OS platform for the company’s application servers and, if necessary, adjusts the OS to the requirements of the application server running on top. Services of this department include configuration and monitoring of the operating system and hardware. Network management and analysis are also covered by the IT Infrastructure team. Overall, a strong interaction between Custom Development, IT Infrastructure, SAP Support and the IT Component Expert is required (see Figure 2.1).
4.3.1 Server Side Analysis
Detailed system analysis must be supported by appropriate tools. In the case of SAP AS ABAP this functionality is already built in (e.g. transactions ST03N or STAD). For SAP AS Java, the log viewer and system health reports might not be enough to drill down into an error. An IT Component Expert may require further information on memory usage, exceptions occurring per second, and garbage collection activity (e.g. gained from Wily Introscope). Application Management has to ensure that the available analysis tools support the different technology stacks in use equally.
4.3.2 Client side analysis
In any case, a professional RCA toolbox must support both server-side and client-side analysis. Client-side analysis is crucial, as modern web applications no longer consist of pure HTML but make extensive use of plug-ins and JavaScript. For example, today’s antivirus programs observe the execution of JavaScript with the help of heuristic methods. This behavior of antivirus tools might interrupt or slow down JavaScript code. Without the help of a specific tool, such problems might only be analyzed by a process of elimination, which is in most cases time-consuming and resource-intensive.
4.3.3 Analysis through Software Vendor (SAP or Partner)
As mentioned earlier, if a customer is not able to solve a problem himself, SAP or partners may have to get involved on request. In order to provide efficient support, experts need access to the customer’s IT solution landscape. The Remote Supportability standard (see separate whitepaper) describes the requirements. Moreover, the standard Solution Documentation explains which kind of solution information customers should make available to enable SAP consultants to address issues in time.
4.4 Follow-up Activities
Applying corrective actions usually involves a change, which is triggered by the relevant Component Expert. Changes to productive systems have to be tracked and approved by change request management. Additional follow-up activities may be necessary. Those are initiated by the Technical System Owner in coordination with the IT Strategist.
The result of the RCA process is stored in the customer’s Solution Database. The Component Expert is responsible for the documentation process. If several experts have contributed to the solution, this process is mostly led by the Technical System Owner (TSO). The TSO is the person in charge of a specific SAP system (including the database). He applies major changes to the specific system and ensures professional system documentation.
The documentation should contain the final corrective action, not the analysis path. Intermediate workarounds can also be added if they have helped to restore productive operations quickly. If a customer message in SAP Service Marketplace has been opened, its number should be quoted as well. This becomes increasingly important if the applied correction results in instability of other components and a follow-up message is created. Knowing as much as possible about the history or origin of a problem is always appreciated by SAP Active Global Support. Finally, it should be possible to resolve any new occurrence of the same issue quickly and reliably by applying the recorded documentation. Ideally, the problem is avoided proactively in the future, e.g. by changing or adding alert thresholds.
4.5 Architecture of Root Cause Analysis
E2E Root Cause Analysis in SAP Solution Manager is based on a central diagnostics database that is populated with data by Diagnostics agents running on each managed system. These agents are delivered preconfigured by SAP. The data required to isolate a problem-causing component (e.g. critical log entries, dumps or queue errors) is continuously collected from all SAP systems. The information is kept uniform across all technologies and is available from one central console in SAP Solution Manager. E2E Diagnostics supports root cause analysis of components implemented in ABAP, Java or C/C++, or running on the Microsoft .NET framework.
E2E Root Cause Analysis in SAP Solution Manager standardizes and systematically aggregates
- Performance and resource metrics
- Changes to software (code), configuration, or content
- Exceptions, such as logs and dumps (program terminations)
Furthermore, the information is condensed, correlated, aggregated, and made available for comprehensive IT reporting. Exceptions are reflected in unified statistic views, from where component-specific log and dump viewers are directly accessible. Technical configuration, such as system properties, is tracked daily to detect recent changes and inconsistencies between systems (e.g. development and production).
E2E Diagnostics is an open infrastructure with hubs for integrating non-SAP components. The openness of E2E Diagnostics is particularly underscored by the integration of Wily Introscope as an OEM component. Cross-component and component-specific diagnostic tools are centrally accessible from SAP Solution Manager. They can be invoked from any SAP workplace once a customer opens a remote connection to SAP, thereby allowing customers and partners to use the same standardized SAP tools.
SAP’s E2E Standard Root Cause Analysis encompasses many different tools. Figure 4.3 illustrates the relation between the RCA standard, BMC Appsight, Solution Manager Diagnostics and Wily Introscope.
First of all, the standard E2E Root Cause Analysis differentiates between client-side and server-side analysis. Client-side analysis might be carried out using the third-party tool BMC Appsight. Appsight enables the analysis of client performance in combination with user interaction. SAP Solution Manager includes the license for the recording agent of Appsight ("Black Box") and the console of Appsight (analysis application). Although data is constantly sent from Wily Introscope to SAP Solution Manager, no data exchange happens between BMC Appsight and SAP Solution Manager.
[Figure 4.3: End-to-End Root Cause Analysis. Client side analysis: BMC Appsight. Server side analysis: Solution Manager Diagnostics infrastructure, consisting of the managed systems (AS ABAP connected via RFC and Diagnostics agent, AS Java connected via Diagnostics agent), CA Wily Introscope, and SAP Solution Manager with its SAP Business Intelligence InfoCube.]
Server-side analysis is usually carried out using the Root Cause Analysis work center. It is based on a central diagnostics database. For this purpose, SAP Solution Manager has a built-in Business Warehouse that is populated with data by a Diagnostics agent running on each satellite system. Agents are preconfigured and delivered by SAP. Those agents and Wily Introscope continuously collect exceptions (such as critical log entries, dumps and errors), configuration snapshots and workload data from each satellite system. The information is kept uniform across all stacks and is available from one central console in SAP Solution Manager. Furthermore, the information is condensed, correlated and aggregated and made available for comprehensive IT reporting. Exceptions are reflected in unified statistics of high-severity log entries and dumps. Component-specific log and dump viewers can be accessed from there. Technical configuration, such as system properties and snapshots of the technical system configuration, is tracked daily to expose inconsistencies between development, quality assurance and production environments, and to detect any recent changes that may have been applied to the technical configuration of the production landscape.
In SAP’s RCA context, the third-party tool Wily Introscope is mainly used for the collection of performance metrics from SAP Application Server Java. The data preserved by Wily Introscope is continuously read by Solution Manager. However, in most instances a deep Application Server Java analysis is carried out directly using a front-end application of Wily Introscope.
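The daily configuration tracking described above amounts to comparing snapshots of the same parameters across systems. The following is a minimal sketch of that idea in Python, not the mechanism Solution Manager actually uses; the function name `diff_snapshots` and the parameter names in the usage note are hypothetical and chosen only for illustration.

```python
from typing import Dict, Optional, Tuple

# A snapshot maps a configuration parameter name to its value.
# This flat structure is an assumption made for the sketch.
Snapshot = Dict[str, str]


def diff_snapshots(
    reference: Snapshot, other: Snapshot
) -> Dict[str, Tuple[Optional[str], Optional[str]]]:
    """Return every parameter whose value differs between two snapshots.

    The result maps the parameter name to (reference value, other value).
    None marks a parameter that is missing on that side.
    """
    differences: Dict[str, Tuple[Optional[str], Optional[str]]] = {}
    for name in sorted(reference.keys() | other.keys()):
        left = reference.get(name)
        right = other.get(name)
        if left != right:
            differences[name] = (left, right)
    return differences
```

Called with a development snapshot such as `{"heap_size_mb": "2048", "trace_enabled": "true"}` and a production snapshot such as `{"heap_size_mb": "4096"}`, the function reports both the changed value and the parameter that exists on only one side. The same comparison applied to two snapshots of one system taken on different days would expose recent changes.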
5 How to Implement the Root Cause Analysis Standard?
Customers may employ any tools that are suitable and familiar to the experts who have to carry out a root cause analysis. However, to ensure that no additional harm is done during the analysis process, it is recommended to establish tools that prevent write access. For SAP environments, SAP recommends using End-to-End Root Cause Analysis in SAP Solution Manager, known as Solution Manager Diagnostics. This chapter explains the prerequisites and the steps necessary to implement RCA.
5.1 Installation and Configuration of Root Cause Analysis Scenario
End-to-End Root Cause Analysis requires an SAP Solution Manager 7.0 dual-stack system (AS ABAP and AS Java). The setup of the Root Cause Analysis scenario has been simplified over the last years. While the installation requires OS access, the configuration can be performed almost completely from SAP Solution Manager.
5.1.1 Prerequisites
SAP recommends running SAP Solution Manager on Unicode (see also the customer letter at http://service.sap.com/Unicode). All new SAP Solution Manager systems must be installed on Unicode. Installation guides can be found on SAP Service Marketplace at http://service.sap.com/instguides → SAP Components → SAP Solution Manager. For customers’ SAP Solution Manager installations that have been upgraded from previous releases and have not yet been migrated to Unicode, SAP recommends migrating the ABAP part to Unicode. Should this not be possible, SAP will support non-Unicode installations until the customer has completed the Unicode conversion.
Minimum required support package levels for Diagnostics in SAP Solution Manager as well as for managed systems are documented in SAP note 1010428.
Although the SAP Solution Manager setup procedure configures a dedicated Solution Manager System Landscape Directory (SLD), one central SLD containing all SAP systems should already be in place beforehand. Up-to-date system information (e.g. software components, patch levels, hostnames) is crucial for the Diagnostics infrastructure.
The Wily Introscope Enterprise Manager needs to be installed on the SAP Solution Manager host or on a separate host (see http://service.sap.com/diagnostics, section Installation and Configuration → Wily Introscope Installation Guide).
Furthermore, as a rule of thumb, one Diagnostics agent has to be installed per virtual host of each managed system. For new NetWeaver installations, this is usually no longer required, as an agent is automatically installed as part of the system installation process. The Diagnostics data provisioning layer does not rely on a monolithic agent. Instead, it is built on a two-level agent architecture. This allows the deployment of Diagnostics agent applications (e.g. the Wily Introscope agent) from SAP Solution Manager, which takes place during the managed system configuration phase and in the event of a Solution Manager update. A Diagnostics Agent Setup Guide is available on SAP Service Marketplace (see http://service.sap.com/diagnostics, section Installation and Configuration → Diagnostics Agent Setup Guide).
Finally, certain ports between the SAP Solution Manager and the managed systems must be opened. These ports are listed in the following tables:
Connection established from host(s) (Src. Host) | to host (Dest. Host) | “Service” on dest. host / Protocol | Service port example / Format
Outside | Diagnostics Server | J2EE engine / HTTP | Ex: 50100 / 500
DMZ | Diagnostics Server | ITS / HTTP | Default: 8000
Diagnostics Server | Diagnostics Server | IGS / HTTP | Ex: 41080 / 480
ALL Managed systems (Diag. Agent) | Diagnostics Server | J2EE engine / P4 | Ex: 50104 / 504
ALL Managed systems (Diag. Agent) | Diagnostics Server | Message srv. / HTTP (not 36XX) | Ex: 8101 / 81
ALL managed systems (Wily Agent) | Diagnostics Server | Introscope Enterprise Manager / TCP / IP | Default: 6001

Connection established from host (Src. Host) | to host(s) (Dest. host(s)) | “Service” on dest. host(s) / Protocol | Service port example / Format
Outside | ALL Managed System | J2EE engine / HTTP | Ex: 50200 500
Diagnostics Server | ALL Managed System | RFC |
ALL managed System (Diag. Agent) | ASSOCIATED Managed System | J2EE engine / P4 | Ex: 50204 504
5.1.2 Configuration
Currently, three methods for the configuration of Diagnostics in SAP Solution Manager are available. First, the customer can handle the whole configuration process alone by reading the documentation provided and executing the necessary steps. As a second option, SAP offers a package of remote training sessions called Expert Guided Implementation. Additionally, an SAP Support Consultant can perform the setup on-site together with members of the customer’s staff.
5.1.2.1 Guided configuration using SOLMAN_SETUP
As of SAP Solution Manager SP18, a new guided web-based configuration mechanism is available. It is accessible via transaction SOLMAN_SETUP. Figure 5.1 shows the configuration wizard’s main screen.
Calling the transaction SOLMAN_SETUP opens the new browser-based configuration wizard. The guided setup procedure is almost self-explanatory, as on-screen help is directly included. Further information can be found in a tutorial on Solution Manager. Additional help is available on help.sap.com → SAP Solution Manager. The setup of Root Cause Analysis is part of the configuration scenario Basic Configuration of SAP Solution Manager. The scenario Initial Configuration is only required for new installations of SAP Solution Manager.
Basic Configuration automatically implements the latest corrections and performs major customization steps. After Initial and Basic Configuration have been completed, the following functionality is ready to use:
- Root Cause Analysis in SAP Solution Manager – SAP’s preferred tool for carrying out RCA
- Maintenance Optimizer – enables the customer to download Support Packages and Enhancement Packages
- EarlyWatch Alerts – weekly generated system health reports for ABAP and Java based SAP systems
- Service Desk – basic functionality, e.g. sending of messages to SAP via SAP Solution Manager
- Expert on Demand Session readiness
- Business Blueprint and Configuration – basic functionality, e.g. generation of customer business blueprint documents and configuration guides for SAP solutions
After all activities of Initial and Basic Configuration have been performed, managed systems can be connected to SAP Solution Manager. The scenario Managed System Configuration guides the administrator through the connection process. It must be executed once for each system. As a prerequisite, a Diagnostics agent must have been installed for this system, and the system has to fulfill the minimum software requirements for Diagnostics in SAP Solution Manager.
Again, this procedure is almost fully automated, as only connection parameters and logon data have to be entered. However, one instance profile parameter for ABAP and one for Java based systems have to be set manually. The most important automated steps are:
- managed system prerequisite check
- creation of RFC connections (from Solution Manager to the managed system and back)
- creation of communication users
- scheduling of EarlyWatch Alerts
- scheduling of Diagnostics extractor jobs
- Wily Introscope instrumentation (in the presence of Java based systems)
Finally, each AS Java instance has to be restarted to activate the new instance parameter and the Wily Introscope agent. For ABAP based systems, only the Internet Communication Manager needs a restart. For productive systems, the restarts are best carried out sequentially during non-working hours, thereby minimizing the impact on productive operations.
The correct setup of Diagnostics in SAP Solution Manager and the connection of managed systems can be verified via the Diagnostics Self-check, which is available in the Root Cause Analysis Work Center.
In case of advanced security requirements, the SAP Solution Manager Security Guide (available via http://service.sap.com/instguides → SAP Components → SAP Solution Manager) and the Root Cause Analysis User Administration Guide (download on http://service.sap.com/diagnostics) offer detailed information.
5.1.2.2 Expert Guided Implementation
Expert Guided Implementations are offered by SAP Active Global Support to support customers and partners during the activation of SAP Solution Manager scenarios. The customer receives direct guidance from an SAP Solution Manager expert and does not have to rely on SAP guides alone. Each guided implementation consists of several web sessions. The main benefits and deliverables of an Expert Guided Implementation are:
- Each step of the scenario implementation is shown and explained by an SAP expert on a sandbox system.
- Members of the customer’s staff receive direct knowledge transfer from the SAP expert and may ask dedicated questions in each session.
Between sessions, the customer is given time to execute the demonstrated steps on the customer’s own SAP Solution Manager system. This delivery format enables the customer to work directly on that system, so that issues specific to the customer’s landscape can be addressed in the upcoming session. The overall goal is to execute all relevant implementation steps during the delivery time of the Expert Guided Implementation. After the delivery, the configured scenario should be ready for productive usage, and the customer should have the knowledge to use and maintain its Root Cause Analysis infrastructure.
Prerequisites and a time schedule for the delivery can be found on SMP at http://service.sap.com/alm-services → Expert Guided Implementation.
5.1.2.3 Solution Manager Starter Pack
The SAP Solution Manager Starter Pack is applicable if direct assistance with the configuration and usage of SAP Solution Manager is needed. It provides a skilled SAP resource who works together with members of the customer’s staff on the configuration of SAP Solution Manager. In this way, key knowledge is shared. Additionally, a basic workshop helps broaden the usage and understanding of SAP Solution Manager functions. Detailed information regarding the Solution Manager Starter Pack can be found on SMP at http://service.sap.com/alm-tools → SAP Solution Manager → Services → Starter Pack.
5.2 Tools
The SAP Standard Root Cause Analysis comprises tools for client-side and server-side analysis. The central and most important one is called Root Cause Analysis. It has been fully integrated into the work center methodology of Solution Manager, thereby offering one central starting point for the analysis of errors, for both the customer and SAP. The tool completely fulfills the requirements of the SAP E2E Standard Root Cause Analysis. The navigation concept of the Work Center RCA follows a top-down analysis approach. Analysis usually starts at the End-to-End Analysis section, continues with a deeper look inside the application server, might be followed by a host analysis and may end up at database level.
Additionally, the Common Tasks section provides quick links to common maintenance tasks (e.g. setup and check tools). The detailed section displays all SAP systems connected to SAP Solution Manager and provides access to the specific analysis tools. The administrator can customize the system selection displayed by restricting the list to the systems of concern or by ordering them by system type. The most important tools are found in the End-to-End Analysis section:
- Exception Analysis allows centralized analysis of all exceptions thrown in the managed systems. This includes not only ABAP dumps and ABAP Syslog errors but also Java application errors collected from the default trace, among others. Specific log and dump viewers are accessible from the E2E Exception Analysis section, too. In the presence of functional problems, E2E Exception Analysis offers a unified view of the exceptions of all systems and additionally displays them using different time diagrams.
- Workload Analysis aggregates server-side performance statistics of managed systems to identify general server-side performance bottlenecks, such as sizing issues. If a customer faces a performance problem, E2E Workload Analysis might be the tool to start with.
- Change Analysis tracks all changes (e.g. technical configuration, code, content) that are applied to the managed systems. This information is especially useful if a few ad-hoc changes result in a disruption of a productive system, as it is possible to compare different systems and generate a report that contains the results. This approach identifies the problem by comparison rather than by drilling down, which is faster and easier in most cases.
- Trace Analysis isolates a single user request through a complete landscape, providing trace information on each of the involved system parts. The measurement is started at the end user’s interface (Internet Explorer or SAP GUI). With the help of an individual correlation ID, each request is traced throughout the SAP server landscape. This function enables the customer to quickly identify the component causing the problem with just a few clicks.
The section System Analysis comprises the tools Change Reporting and Log Viewer. Host Analysis offers access to predefined file system folders (File System Browser) and enables SAP Support to execute certain read-only commands via the OS Command Console. The fourth and last section, Database Analysis, provides access to the DBA Cockpit.
Finally, SAP recommends creating the user SAPSUPPORT based on the shipped standard roles in SAP Solution Manager and in all managed systems. This configuration step is highly automated and integrated into SOLMAN_SETUP. The standard role assignment for SAPSUPPORT gives members of the customer’s support team and SAP employees only read access to diagnostic data.
The Diagnostics infrastructure of SAP Solution Manager partially includes technology from CA WilyTech, which is called Introscope. Wily Introscope is mainly used for SAP AS Java analysis and is shipped preconfigured with a right-to-view license, which covers the tool’s main diagnostic functionalities. Introscope uses byte code instrumentation (BCI) technology to collect and integrate performance statistics at the code level for Java and .NET components without having to access the source code.
Additionally, BMC (AppSight) supplements the toolbox with front-end analysis functionality. A black box recorder is used to record client-side user activity. Afterwards, the collected information can be analyzed using the Appsight analysis application or sent to SAP via a customer message. Recording profiles exist for all SAP client applications (e.g. SAP GUI, Internet Explorer).
In addition to adding new SAP technologies, applications and OEMs to this open infrastructure, SAP integrates those independent software vendors (ISVs) for which the company holds a maintenance contract with customers.
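The principle behind Trace Analysis, following one request through several components by means of a correlation ID, can be illustrated with a small Python sketch. It is a simplified model and not the data format used by Solution Manager; the record type `TraceSegment`, its fields and the function `slowest_component` are hypothetical names introduced for this example.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, Optional, Tuple


@dataclass(frozen=True)
class TraceSegment:
    """One measured step of a request inside a single component (assumed model)."""
    correlation_id: str
    component: str
    duration_ms: float


def slowest_component(
    segments: Iterable[TraceSegment], correlation_id: str
) -> Optional[Tuple[str, float]]:
    """Return (component, total time in ms) for the component that spent
    the most time on the request with the given correlation ID.

    Returns None if no segment carries that correlation ID.
    """
    totals: Dict[str, float] = defaultdict(float)
    for segment in segments:
        if segment.correlation_id == correlation_id:
            totals[segment.component] += segment.duration_ms
    if not totals:
        return None
    return max(totals.items(), key=lambda item: item[1])
```

Because every segment carries the same correlation ID as the request that caused it, segments recorded on different systems can be joined without relying on timestamps or user names. Summing per component then points to the part of the landscape where the request spent most of its time.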
5.3 People
The E2E Solution Operations Curriculum helps customers to train their specialized teams and stakeholders efficiently on operations topics. Individuals learn about SAP standards, which describe best practices, the usage of tools, and collaboration between different roles. The E2E Solution Operations Curriculum is structured along different knowledge levels.
5.3.1 E2E Solution Operations – Core Knowledge
Two E2E Solution Operations trainings provide an overview of E2E Solution Operations. Their target groups are teams and stakeholders involved in E2E Solution Operations.
- E2E050 – E2E Solution Scope and Documentation – This course explains the solution concept in SAP Solution Manager and outlines the solution documentation process for SAP centric scenarios.
- E2E040 – Run SAP – End-to-End Solution Operations – The course E2E040 is aimed at customer IT management and project managers, as it covers a general Run SAP introduction, a detailed explanation of the different standards and a Run SAP adoption and planning compendium. It describes the critical success factors important to E2E Solution Operations.
5.3.2 Technical Core Competence Courses
Technical core competence courses explain regular and important system administration tasks in detail. They consist of demos and exercises aimed at Application Management, SAP Technical Operations, Business Process Operations and Custom Development. The course relevant to the standard Root Cause Analysis is E2E100 - E2E Root Cause Analysis. E2E100 not only teaches the usage of certain SAP preferred tools for root cause analyses (especially Solution Manager Diagnostics and Wily Introscope), but also outlines a best practice top-down analysis path. Prerequisites for this course are expertise in SAP basis administration and a basic understanding of SAP Solution Manager. The target audience ranges from Solution Architects and Application Management to Technical Consultants. The five-day training ends with a certification exam on the topics taught. Upon successful completion, the participant receives the certificate Application Management Expert – Root Cause Analysis. The main topics covered by E2E100 are:
- E2E Change Analysis
- E2E Workload Analysis
- E2E Trace Analysis
- E2E Exception Analysis
- Application and data inconsistency analysis
5.3.3 Technical Expert Competence Workshops
Technical Expert Competence Workshops are individual onsite customer workshops, which are performed using the customer’s systems and held by a component-specific expert from SAP. During a session, the expert explains basic and specialized tasks regarding Solution Operation of certain SAP systems such as SAP NetWeaver Portal, SAP NetWeaver Process Integration (SAP NetWeaver PI), SAP NetWeaver Business Warehouse (SAP NetWeaver BW), or SAP Customer Relationship Management (SAP CRM). The workshops are targeted at members of Application Management (e.g. IT Component Expert and Technical System Owner). Workshops can be booked via SAP Service Marketplace at http://service.sap.com/servicecatalog; the service name is System Administration. A detailed description can be found at http://service.sap.com/diagnostics → Expert Competence → System Administration. Supplementary workshops on BMC Appsight and Wily Introscope can also be ordered.
6 How to Measure the Success of the Implementation
Several options exist to measure the success of the implementation of Root Cause Analysis and to highlight mid-term improvements. A snapshot should be taken beforehand to obtain an initial status. First of all, SAP EarlyWatch Alerts (EWAs) provide regular and automatic monitoring of predefined KPIs of SAP systems. EWAs allow the evaluation of the current situation in the areas of stability, performance, and solution quality. After the Standard Root Cause Analysis has been implemented, SAP EarlyWatch Alerts can be used for a final evaluation by comparing certain KPIs with the initial snapshot. The following additional KPIs should be taken into account when measuring the success of an implementation:
Indicator | Target
Corrective action plan for all priority 1 messages | Available within four hours
Messages to SAP are pre-clarified and findings of the customer’s root cause analysis are included in the message text | (Close to) no roundtrips between customer and SAP
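The two indicators in the table can be computed from message data once it is exported from the service desk. The Python sketch below shows one possible calculation. The record type `Message` and its fields are assumptions made for this example and do not correspond to a documented SAP data structure.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional, Sequence


@dataclass(frozen=True)
class Message:
    """A support message, reduced to the fields the two KPIs need (assumed model)."""
    priority: int
    created: datetime
    action_plan_at: Optional[datetime]  # None if no corrective action plan exists yet
    roundtrips: int  # number of customer <-> SAP roundtrips


def action_plan_rate(
    messages: Sequence[Message], limit: timedelta = timedelta(hours=4)
) -> float:
    """Share of priority 1 messages with a corrective action plan within the limit."""
    priority_one = [m for m in messages if m.priority == 1]
    if not priority_one:
        return 1.0  # nothing to miss, so the target counts as met
    on_time = sum(
        1
        for m in priority_one
        if m.action_plan_at is not None and m.action_plan_at - m.created <= limit
    )
    return on_time / len(priority_one)


def average_roundtrips(messages: Sequence[Message]) -> float:
    """Mean number of roundtrips per message; 0.0 for an empty sequence."""
    if not messages:
        return 0.0
    return sum(m.roundtrips for m in messages) / len(messages)
```

Comparing both values for the period before and after the implementation gives a simple measure of improvement: the action plan rate should approach 1.0 and the average number of roundtrips should approach zero.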
Copyright 2009 SAP AG. All rights reserved.
SAP, R/3, SAP NetWeaver, Duet, PartnerEdge, ByDesign, SAP Business ByDesign, and other SAP products and services mentioned herein as well as their respective logos are trademarks or registered trademarks of SAP AG in Germany and other countries. Business Objects and the Business Objects logo, BusinessObjects, Crystal Reports, Crystal Decisions, Web Intelligence, Xcelsius, and other Business Objects products and services mentioned herein as well as their respective logos are trademarks or registered trademarks of Business Objects S.A. in the United States and in other countries. Business Objects is an SAP company. All other product and service names mentioned are the trademarks of their respective companies.
Data contained in this document serves informational purposes only. National product specifications may vary. These materials are subject to change without notice. These materials are provided by SAP AG and its affiliated companies (“SAP Group”) for informational purposes only, without representation or warranty of any kind, and SAP Group shall not be liable for errors or omissions with respect to the materials. The only warranties for SAP Group products and services are those that are set forth in the express warranty statements accompanying such products and services, if any. Nothing herein should be construed as constituting an additional warranty.
This document is not subject to your license agreement or any other agreement with SAP. SAP has no obligation to pursue any course of business outlined in this document or to develop or release any functionality mentioned in this document. This document and SAP's strategy and possible future developments are subject to change and may be changed by SAP at any time for any reason without notice.