ITIL SERVICE OPERATION
Technical Management
Objectives
Help plan, implement and maintain a stable technical infrastructure to support business processes
Well-designed, resilient and cost-effective topology
Keep infrastructure in optimum condition
Diagnose and resolve technical failures
Benedito, Christian, Kara, Abrahams, Peters, Smith, Nombewu
Purpose
Undertake activities and processes to manage and deliver services at the levels agreed with business users and customers.
The ongoing management of the technology that is used to deliver and support services.
Scope
Value
Service Operation covers all areas of service delivery, including:
services (internal, external and customer/user)
service management processes (see next slide)
technology
Effective Service Operation processes and functions help organizations to: reduce the impact and frequency of outages provide access to standard service
Objectives
Deliver the service as agreed in the SLA
Reduce both the number and impact of outages
Control access to IT services
The activities needed to fulfil a request will vary depending upon exactly what is being requested. Note that ultimately it will be up to each organization to decide and document which service requests it will handle through the request fulfilment process.
Types of communication:
Routine operational communication
Between shifts
Projects
Performance reporting
Communication related to change, exception & emergency
Training on new or customized processes and service designs
Communication of strategy, design, and transition to service operation teams
take place as a result of an incident report help prevent the incident from recurring or provide a workaround if avoidance is impossible
Technical Management is treated in ITIL as a "function". It plays an important role in the management of the IT infrastructure.
STEP 9
Pro-active Problem Management
Analyzes incident records to identify underlying causes of incidents. Analysis of previous incidents may reveal a trend or pattern that was not apparent when each incident occurred.
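The trend analysis described above can be sketched in a few lines. This is only an illustration: the incident records, the field layout, and the `min_count` threshold are hypothetical and would come from the organization's own incident tool.

```python
from collections import Counter

# Hypothetical closed-incident records: (incident id, affected CI, category).
incidents = [
    ("INC-1", "mail-server-01", "performance"),
    ("INC-2", "router-07", "hardware"),
    ("INC-3", "mail-server-01", "performance"),
    ("INC-4", "mail-server-01", "performance"),
    ("INC-5", "router-07", "software"),
]


def recurring_patterns(records, min_count=3):
    """Return (CI, category) pairs that occur at least min_count times."""
    counts = Counter((ci, category) for _, ci, category in records)
    return [pair for pair, n in counts.items() if n >= min_count]


# Each pair returned is a candidate for a new problem record.
candidates = recurring_patterns(incidents)
```

With the sample data, only the mail server's repeated performance incidents cross the assumed threshold, which is the kind of pattern that is not visible when each incident is handled on its own.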
Raise known error record – Problem with a documented cause and workaround stored in KEDB
STEP 10
Super user
IT Operation Analysis
Recruited from business to take on some IT responsibilities
Facilitate communication between IT and business
Reinforce user expectations about agreed service levels
Training for users in their area
Support for minor incidents
Involved with new releases and roll-outs
IT Operator
Scope:
ACCESS MANAGEMENT
Access management is effectively the execution of the policies in information security. AM grants the right to use a service; ensuring that the service is available at the agreed times is the responsibility of availability management. AM is a process that is executed by all technical and application management functions and is usually not a separate function.
ALERT
Purpose
Manage events through their lifecycle. Event management is the basis for operational monitoring and control.
EVENT MANAGEMENT
and actions. Efficiently respond to requests for granting access to services. Oversee access to services and ensure rights being provided are not improperly used.
Objectives:
Improved customer service, perception and satisfaction
Increased accessibility through a single point of contact
Better quality and faster turnaround
Improved teamwork and communication
Enhanced focus and proactive approach to service provision
Reduced negative business impact
Better management infrastructure and control
Improved usage of IT support resources
More meaningful management
Objectives
A service desk is a functional unit made up of a dedicated number of staff responsible for dealing with a variety of service activities, usually made via telephone calls, web interface or automatically reported infrastructure events.
Role:
Logging all relevant incidents
Providing first line investigation and diagnosis
Resolving incidents at first contact
Escalating incidents that cannot be resolved within the agreed timescale
Keeping users informed of progress
Closing all resolved incidents
Conducting customer surveys
Communicating with users
Detect changes of state for the management of a CI and IT service
Determine control action for events and ensure these are communicated to the appropriate functions
Provide the trigger to execute many service operation processes and operation management activities
Compare actual operating performance and behavior against design standards and SLAs
Provide a basis for service assurance, reporting and improvement
Scope
Follow the sun
24-hour coverage
low costs
Virtual Service Desk
Use technology and tools to give impression of single service desk.
Maintenance of the status of day to day processes and activities
PRINCIPLES AND BASIC CONCEPTS
Information Technology Operations Control consists of:
Print & Output
Backup & Restore
Job Scheduling
Problem: The unknown cause of one or more incidents.
To manage the lifecycle of all problems from first identification. Seeks to minimize the adverse impact of incidents.
Objectives
Scope:
STEP 6 Categorise Problem – Record service/component affected
STEP 4
STEP 5 Prioritize Problem – ID importance of incident based on impact and urgency
Centralised Service Desk
Local Service desk
Different time zones
specialised groups of users
VIP status of users
Higher volume of calls
Higher skill levels
Problem models
Incidents vs problems
Reactive and proactive problem management:
Reactive: process activities are triggered in reaction to an incident that has taken place
Proactive: process activities are triggered by activities seeking to improve services
Prevent problems and resulting incidents from happening
Eliminate recurring incidents
Minimize the impact of incidents that cannot be prevented
Includes the activities required to diagnose the root cause of incidents. Will also maintain information about problems and the appropriate workarounds and resolutions
Process activities, methods and techniques.
Problem detection
Suspicion or detection of a cause of one or more incidents by the service desk
Analysis of incident
Notification of supplier or controller
Management staff should be held partially accountable for their contribution to the technical architecture and manageability design of applications
A single change management process for both groups
Focus on integrating functionality and manageability requirements
Problem prioritization.
Can system be recovered? How much will it cost? How long will it take to fix the problem?
A known error is defined as a problem with a documented root cause and workaround. The known error record should identify the problem record it relates to and document the status of actions being taken to resolve the problem.
Workarounds
Problem Logging
User details
Service details
Equipment details
Date/time initially logged
Incident description
Incident:
An unplanned interruption to an IT service or reduction in the quality of an IT service
Impact – Indication of impact is often the number of users being affected
Objectives: To support the organization's business processes. These objectives are achieved through:
Applications that are well designed, resilient and cost-effective
The required functionality being available to achieve the required business outcome
Organization of adequate technical skills
In some cases it may be possible to find a workaround to the incidents. When a workaround is found, it is important that the problem record remains open. In some cases there may be multiple workarounds.
Scope:
A clear mapping of development and management activities throughout the lifecycle
APPLICATION MANAGEMENT
Urgency – Refers to how quickly the business needs a resolution to an incident
Resolved – A resolution has been put in place for the incident, but normal service operation has not yet been validated
Closed – User or business has agreed that the incident has been resolved
Development teams should be held partially accountable for design flaws that create operational outages
Role:
Custodian of technical knowledge and expertise
Provides the actual resources to support the service lifecycle
Providing guidance to IT operations on how to carry out the ongoing operational management of applications
The integration of the application management lifecycle
Raising a known error record.
In Progress – Incident in progress of being investigated
Major incidents are handled through a separate procedure, with shorter timescales and greater urgency. The definition of what constitutes a major incident must be agreed and ideally mapped onto the overall incident prioritization scheme.
App Development vs Management
A single interface to the business for all stages of the business lifecycle, common requirements and specific-setting process
Priority – To agree and allocate an appropriate prioritization code to an incident; this code determines how the incident is handled both by support tools and support staff
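A common way to derive the prioritization code is a lookup on impact and urgency. The sketch below assumes three levels for each and priority codes 1 (highest) to 5 (lowest); the scales, the codes, and the matrix values are illustrative assumptions, since each organization agrees its own scheme.

```python
# Assumed three-level scales; not prescribed by the source material.
LEVELS = ("high", "medium", "low")

# Assumed matrix: (impact, urgency) -> priority code, 1 = highest priority.
PRIORITY_MATRIX = {
    ("high", "high"): 1,
    ("high", "medium"): 2,
    ("medium", "high"): 2,
    ("high", "low"): 3,
    ("medium", "medium"): 3,
    ("low", "high"): 3,
    ("medium", "low"): 4,
    ("low", "medium"): 4,
    ("low", "low"): 5,
}


def priority_code(impact, urgency):
    """Look up the priority code for a given impact and urgency."""
    if impact not in LEVELS or urgency not in LEVELS:
        raise ValueError("impact and urgency must be 'high', 'medium' or 'low'")
    return PRIORITY_MATRIX[(impact, urgency)]
```

Keeping the mapping in a table rather than in nested conditions makes it easy to review with the business and to change when the agreed scheme changes.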
Incident Tracking: Incidents should be tracked throughout their lifecycle to support proper handling and reporting on the status of incidents.
Open – Incident recognized but not yet assigned to a support resource
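Tracking an incident through the four statuses named in this deck (Open, In Progress, Resolved, Closed) can be modelled as a small state machine. The allowed transitions below are an assumption made for illustration, including the path from Resolved back to In Progress when validation fails; the names `Status`, `ALLOWED` and `advance` are hypothetical.

```python
from enum import Enum


class Status(Enum):
    OPEN = "Open"
    IN_PROGRESS = "In Progress"
    RESOLVED = "Resolved"
    CLOSED = "Closed"


# Assumed transitions; a real tool may allow more (for example, reopening).
ALLOWED = {
    Status.OPEN: {Status.IN_PROGRESS},
    Status.IN_PROGRESS: {Status.RESOLVED},
    Status.RESOLVED: {Status.CLOSED, Status.IN_PROGRESS},
    Status.CLOSED: set(),
}


def advance(current, target):
    """Return the new status, or raise if the transition is not allowed."""
    if target not in ALLOWED[current]:
        raise ValueError(f"cannot move from {current.value} to {target.value}")
    return target
```

For example, `advance(Status.OPEN, Status.IN_PROGRESS)` succeeds, while trying to move a Closed incident to any other status raises an error.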
PROBLEM MANAGEMENT
Purpose:
STEP 3
Principles and Basic Concepts of Incident Management
Swift application of operational skills to diagnose and resolve any IT failures that occur.
Console Management
Workarounds – Temporary way of overcoming difficulty
Problem Investigation and Diagnosis – Diagnose root cause
Configuration Items (CI)
Some are included because they need to stay in a constant state
Some are included because their status needs to change frequently
Environmental conditions
Software license monitoring
Security
Normal Activity
Performance
Log Problem – Raise record with details of problem
Organizational Structure/Types of service desks.
Regular scrutiny and improvements to achieve improved service at reduced costs
Maintenance
STEP 7
The purpose is to allow storage of previous knowledge of incidents and problems. The known error record should hold exact details. It is essential that any data put into the database can be quickly and accurately retrieved. Care should be taken to avoid duplication of records.
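The requirements above (exact details, quick retrieval, no duplicate records) can be sketched as a small in-memory store. The class names, fields, and the rule of one record per problem are assumptions for illustration; a real KEDB would live in the service management tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KnownError:
    problem_id: str   # the problem record this known error relates to
    root_cause: str
    workaround: str
    status: str = "open"


class KnownErrorDatabase:
    """Minimal in-memory sketch of a KEDB keyed by problem id."""

    def __init__(self):
        self._records = {}

    def add(self, record):
        # Assumed duplication rule: at most one record per problem.
        if record.problem_id in self._records:
            raise ValueError(f"duplicate record for {record.problem_id}")
        self._records[record.problem_id] = record

    def search(self, keyword):
        """Return records whose cause or workaround mentions the keyword."""
        keyword = keyword.lower()
        return [
            r for r in self._records.values()
            if keyword in r.root_cause.lower() or keyword in r.workaround.lower()
        ]
```

A service desk analyst could then call `search("printer")` during first-line diagnosis to find a documented workaround before escalating.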
Any change of state that has significance for the management of a configuration item (CI) or IT service. Events are recognized through notifications created by an IT service, CI or monitoring tool.
Define & Explain Event
Purpose: The purpose of access management is to provide the right to be able to use a service or group of services.
Objectives: Manage access to services based on policies
IT OPERATIONS MANAGEMENT
Known-Error Database
Alert:
a notification that a threshold has been reached, something has changed, or a failure has occurred
a means of acquiring human intervention
often created and managed by system management tools
STEP 2
Major Problem Review – Reflect on major problems as part of training for support staff or proactive problem management
Service Desk Services IT Operations Manager
Detect Problem – Reactive or proactive detection (triggers in Notes)
Problem Closure – Check that all events are recorded
Exception: A notification that a service or component is operating abnormally. Action is usually required. E.g. a router failing
Handles incidents, resolving as many as possible, where the resolution is straightforward
Owns incidents that are escalated to other support groups for resolution
Reports problems to the problem management staff members
Handles service requests
Provides information to users
Communicates with the business about major incidents, upcoming changes, and so on
Manages requests for change on the user’s behalf if required
Manages the performance of third-party maintenance providers
Monitors incidents and service requests against the targets in the SLA
Updates the CMS as required
Gathers availability figures, based on incident data
STEP 1
STEP 8
Technical Management activities embedded in other processes are shown there, with responsibility assigned to the Technical Analyst role.
Shift Leader
PROBLEM MANAGEMENT PROCESS FLOW
Resolution and Recovery – Cause removed and service restored
Reactive Problem Management
Event Types
Management Positions
Objectives
Request fulfilment is the process responsible for managing the lifecycle of all service requests from the users.
Many Technical Management activities are embedded in various ITIL processes but not all Technical Management activities. For this reason, at IT Process Maps we decided to introduce a Technical Management process as part of the ITIL Process Map which contains the Technical Management activities not covered in any other ITIL process.
Informational: A notification that something expected and normal has happened and that no action is required. E.g. scheduled backup has completed normally
Warning: A notification that a pre-defined threshold has been reached. Action may or may not be required E.g. 5% hard disk capacity available
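The three event types in this deck (informational, warning, exception) can be illustrated with the disk-capacity example. The function below is a sketch: the 5% default comes from the example above, while treating a completely full disk as an exception and the function name itself are assumptions.

```python
def classify_disk_event(free_percent, warning_threshold=5.0):
    """Map a free-disk-space reading to one of the three event types."""
    if not 0.0 <= free_percent <= 100.0:
        raise ValueError("free_percent must be between 0 and 100")
    if free_percent == 0.0:
        # Assumed rule: no free space means the component is operating abnormally.
        return "exception"
    if free_percent <= warning_threshold:
        # The pre-defined threshold has been reached; action may be required.
        return "warning"
    # Normal operation; no action required.
    return "informational"
```

In practice the monitoring tool would raise these events, and event management would decide the control action for each type.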
Maintain user and customer satisfaction
Source and deliver the components of requested standard services
Assist with general information, complaints or comments
REQUEST FULFILMENT
Role of Communication in Service Operation
Formal request from a user for something to be provided. E.g. password changes, access to printers, PC moves
Scope
Roles
Technical manager/team leader
leadership, control and decision making for the team
providing technical knowledge and leadership
ensuring training, awareness and experience levels are maintained
performing line management
reporting to senior management on technical issues as required
Technical analyst/architect
determining evolving needs of users, sponsors, stakeholders
establishing system requirements
defining and maintaining knowledge about systems dependencies
performing cost benefit analyses
developing operational models that will optimize resource utilization and maximize performance
configuring the infrastructure to deliver consistent and reliable performance
defining all the tasks required to manage the infrastructure
Role: All communication must have an intended purpose or a resultant action. Any means of communication can be used as long as stakeholders understand when and where communication will take place.
Problem Categorization
Problems should be categorized in the same way as incidents. The true nature of the problem must be easily traced.
Techniques of Incident Management
Functional Escalation
Management Escalation
Hierarchic Escalation
Methods of Incident Management
Incident Identification – Work can only begin when it is known that an incident has occurred
Incident Logging – All relevant information about the incident must be logged and date/time stamped
Incident Categorization – An incident categorization code must be allocated so the exact type of incident is recorded
Incident Prioritization – Allocate an appropriate prioritization code to determine how the incident is handled
Incident Closure – Service desk to check that the incident is resolved and that users are satisfied
The purpose of Incident Management is to restore normal service operation as quickly as possible and minimize the adverse impact on business operations, thus ensuring agreed levels of service quality are maintained.
Incident Models:
Activities of incident management
INCIDENT MANAGEMENT
Purpose:
An incident model is a way of predefining the steps that should be taken to handle a process in an agreed way.
Steps that should be taken to handle the incident
Chronological order these steps should be taken in
Responsibilities
Precautions to be taken
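The elements of an incident model listed above (steps, their chronological order, responsibilities, precautions) map naturally onto a small data structure. This is a sketch only: the class names and the password-reset example are hypothetical and not taken from any ITIL tool.

```python
from dataclasses import dataclass, field


@dataclass
class Step:
    order: int        # position in the chronological sequence
    action: str
    responsible: str  # role responsible for carrying out the step


@dataclass
class IncidentModel:
    name: str
    steps: list = field(default_factory=list)
    precautions: list = field(default_factory=list)

    def ordered_steps(self):
        """Return the steps in the order they should be carried out."""
        return sorted(self.steps, key=lambda s: s.order)


# Hypothetical model for a standard, frequently recurring incident.
password_lockout = IncidentModel(
    name="Account locked out",
    steps=[
        Step(2, "Unlock account", "Service desk"),
        Step(1, "Verify user identity", "Service desk"),
        Step(3, "Confirm with user and close", "Service desk"),
    ],
    precautions=["Never disclose a password over the phone"],
)
```

Calling `password_lockout.ordered_steps()` returns the steps sorted by their `order` value, so the model can be stored in any order and still be followed chronologically.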
Incident Management includes any event which disrupts, or could disrupt, a service. This includes events which are communicated directly by users, either through the service desk or through an interface from event management to incident management tools.
Interfaces:
Objectives:
Ensure that standardized methods and procedures are used
Increase visibility and communication of incidents to business and IT support staff
Enhance business perception of IT through use of a professional approach in resolving and communicating incidents
Align incident management activities and priorities with those of the business
Maintain user satisfaction with quality of IT services
Service Design
Service level management – Input for SLA
Information security management – Security related incidents
Capacity management – Trigger for performance monitoring
Availability management – Availability of IT services
Service Transition
Service Asset and Configuration Management – ID faulty equipment
Change Management – Workarounds need an RFC
Service Operation
Problem Management – Investigate and resolve underlying cause
Access Management – Unauthorized access attempts