Capacity Planning Issues - a Dynamic Situation: Case Study Industry-TFT Data Center, Mumbai
SPECIALIZATION: E-Business
Prin. L.N. Welingkar Institute of Management Development & Research
Year of Submission: September 2013
ACKNOWLEDGEMENT
I take this opportunity to express my profound gratitude and deep regards to my guide, Mr. Dhiman Das, for his exemplary guidance, monitoring, and constant encouragement throughout the course of this thesis. The blessing, help, and guidance given by him from time to time shall carry me a long way in the journey of life on which I am about to embark. I also take this opportunity to express a deep sense of gratitude to Mr. N. Utekar, Manager, TFT, for his cordial support, valuable information, and guidance, which helped me in completing this task through its various stages. I am obliged to the staff members of TFT for the valuable information provided by them in their respective fields, and I am grateful for their cooperation during the period of my assignment. Lastly, I thank the Almighty, my parents, and my friends for their constant encouragement, without which this assignment would not have been possible.
Regards
APPENDIX – I CERTIFICATE FROM THE GUIDE
This is to certify that the project work titled "Capacity Planning Issues - a Dynamic Situation: TFT DC" is a bona fide work carried out by Aniruddha Deshmukh, DPGD/OC11/0493, a candidate for the Post Graduate Diploma examination of the Welingkar Institute of Management, under my guidance and direction.
NAME: DHIMAN DAS
ORGANIZATION: SCHNEIDER
DESIGNATION: MANAGER
ADDRESS: 4th Floor, Electra, Wing 'A', Prestige Tech Park, Exora Business Park, Marathahalli, Sarjapur Outer Ring Road, Bangalore - 560103
TELEPHONE: +91 9900417661
TABLE OF CONTENTS
Executive Summary .......... 5
1. Introduction .......... 6-10
2. Project Overview .......... 11-18
3. Research Methodology .......... 19-27
4. Data Analysis .......... 28-35
5. Research Findings .......... 36-58
6. Recommendations & Conclusions .......... 59
7. Bibliography .......... 60
Executive Summary
Capacity management is a critical step between simple server consolidation or virtualization and creating the internal infrastructure-as-a-service cloud that enterprises are currently focused on building.

In an internal private cloud, the organization pays for everything. Unlike in an external public cloud, where capacity is open-ended, the organization has to pay for total capacity, not just the capacity that is being used right now. Virtualization does not create capacity or make capacity less expensive. Server virtualization is an important enabler of internal and external cloud computing, but it alone does not make a cost-effective cloud service.

Turn data center management outside-in. Cloud computing is associated with delivering IT as a service. Assessing the infrastructure for capacity management and planning starts with the business and ends with a model for total cost to serve and capacity management across service tiers. Use Schneider's Capacity Planning Data Collection to assess infrastructure based on dependencies, interdependencies, criticality, and business priority.

Manage capacity by service tiers for cost efficiency. Not all services require the same capacity. Examine variable capacity costs for each tier to see how savings might be realized without compromising service levels.

Take a gas gauge approach to capacity planning. Once pools of reserve capacity are established, future capacity acquisitions are based on service maintenance rather than application addition.

Capacity management is a process, not a product. Look to system management and internal cloud management tools with an eye to how they might automate your capacity management practice.
Chapter 1
INTRODUCTION
Cloud computing, and capacity planning for it, is a relatively new concept but has its roots in many not-so-new technologies. One of the main tenets of cloud computing is the sharing of computing resources among a community of users. A successful predecessor of the idea is the Condor project, started in 1988 at the University of Wisconsin-Madison. The project was motivated by the observation that a high percentage of the capacity of users' workstations sits idle while their owners are away from their offices or doing other tasks such as reading or talking on the phone. These idle cycles can be harvested by the Condor system and made available to users who need more computing power than is available at their local workstations. Another technology related to cloud computing is grid computing. The grid is defined as a hardware and software infrastructure that provides dependable, consistent, pervasive, and inexpensive access to high-end computational capabilities. It became the main computing paradigm for resource-intensive scientific applications and, more recently, for commercial applications. Fred Douglis points out that although grid computing and cloud computing are closely related, they are truly distinct. Resource allocation issues are crucial to the performance of applications on the grid; heuristic techniques have been developed for optimal allocation of resources (computing, network, service providers, and secondary storage) in grid computing. One of the technologies that has enabled cloud computing is virtualization, because it allows for easy isolation of applications within the same hardware platform and easy migration for purposes of load balancing. Isolation is important for security, and load balancing is important for performance. Service-oriented architectures (SOA) and Web services are also an important development for building clouds that provide services as opposed to just computing resources. This paper discusses the concepts of cloud computing as well as its advantages and disadvantages. To provide a more concrete example of the benefits of cloud computing, the paper shows results of experiments conducted on PlanetLab, a cloud infrastructure widely used in academia. The paper then discusses how cloud users can optimally select the values of Service Level Agreements (SLAs) to be negotiated with cloud providers in order to maximize their utility subject to cost constraints; a numeric example is discussed in detail. The rest of the paper is organized as follows. The next section discusses the definition of cloud computing as well as its advantages and disadvantages, and briefly describes some examples of cloud computing platforms. The section that follows presents the results of experiments carried out with PlanetLab. Subsequent sections discuss capacity planning issues as they apply to cloud computing, and Section 6 presents some concluding remarks.

The term virtualization refers to the creation of virtual machines, virtual networks, and virtual disks (logical images of physical resources) that can execute work and then be returned to a shared resource pool. Cloud computing makes use of virtualization, but it also focuses on allowing computing to be delivered as a service (for instance, software-as-a-service [SaaS], infrastructure-as-a-service [IaaS], and platform-as-a-service [PaaS]). Building virtual and cloud environments leads to more efficient systems utilization, lower operating costs (especially if service management functions are in place), improved availability, and lower testing/deployment costs. The management of virtualized/cloud environments, however, presents several new challenges for information technology (IT) managers and administrators. Instead of managing only physical servers, IT managers and administrators are now being called upon to:

1. Manage both physical and virtual servers (sometimes hundreds or thousands of virtual machines or network/storage devices);
2. Troubleshoot and tune applications in order to meet performance requirements; and,
3. Ensure that there is enough capacity to execute jobs within both physical and virtual server, network, and storage environments.

In a report entitled A Closer Look at IBM Systems Director VMControl (http://www.clabbyanalytics.com/uploads/VMControlReportFinalFinal.pdf), it is described how IT managers/administrators can use IBM's VMControl product offering to manage both physical and virtualized server environments using a common management interface (this report addresses point 1 above). In a report entitled IBM Tivoli Composite Application Manager: IBM's Application Performance Management Environment, found at Clabby Analytics' web site at http://www.clabbyanalytics.com/uploads/ITCAMFinal.pdf, we described how IT managers/administrators can use IBM's Composite Application Manager to troubleshoot and tune applications in virtualized and/or cloud environments (this report addresses point 2 above). In this report (which addresses point 3 above), we describe how IT managers/administrators can determine their current and projected capacity utilization using IBM's Tivoli Monitoring for Virtual Servers. We start with a description of trends in the virtualization and cloud marketplaces, followed by a discussion of how this product is competitively positioned. We then take a closer look at how the product works (features/functions). We conclude by recommending that IT managers and administrators (particularly those involved in the management of virtual machines) evaluate IBM Tivoli Monitoring for Virtual Servers as a means to simplify cloud and virtual server management, improve cloud performance, and improve alignment with business goals.
Market Trends/Competitive Positioning

From a systems perspective, we see two types of clouds evolving. The first is a homogeneous cloud architecture based on Intel x86 multi-core servers. The second is a heterogeneous cloud architecture that focuses on running workloads (applications or groups of related applications) on systems that have been designed and optimized to best service those workloads. As we travel around the world, we see a lot of activity in the homogeneous x86 cloud market space. What we have observed is that many IT buyers start their cloud journey by experimenting with virtualization on their desktops. Once they learn the basics of virtualization, they move into the server space, initially consolidating multiple smaller servers onto larger x86 multi-core servers, and then virtualizing servers in order to increase utilization rates, reduce management costs, reduce software costs, and improve infrastructure resiliency. Virtualization, however, presents new challenges for IT managers and administrators because, as mentioned in the Introduction, they become responsible for managing not only physical systems but also potentially hundreds or thousands of virtual machines. We have also noticed that, in order to manage these virtual machines effectively, IT managers and administrators tend to buy new infrastructure and management tools from the company that provides their virtual machine hypervisor (a hypervisor is code that allows operating system images to share underlying processors). The way we see it, customers who purchase infrastructure management tools from hypervisor vendors tend to create "virtualization silos", and these silos make it difficult to holistically manage critical applications and business services. Within the x86 world, we often see VMware (Windows and Linux) and Windows (Hyper-V) silos within the same organization. Then there are separate silos for UNIX environments, and sometimes a separate silo for mainframe environments. To optimally manage virtual machines, it is necessary to break down the silos so that IT managers/administrators can assign workloads to whatever virtual resources are available. We are also not big believers in the homogeneous platform (usually x86-only) approach to virtualization and cloud computing, because we believe that no single processor architecture handles all jobs optimally. In fact, certain heavy I/O workloads perform best on mainframe architecture; compute-intensive tasks perform best on Power Systems; and fast threaded applications perform best on x86 architecture. An entire site is devoted to this concept of "workload optimization" (visit www.workloadoptimization.com for more details). IT buyers who choose to take a heterogeneous approach to the management of their cloud environments are able to greatly increase their overall computing efficiency by matching the right workloads to the right servers. To do this, however, requires that IT managers and administrators have greater insight into their virtual or cloud environments, preferably using the same console and management environment to manage across heterogeneous server types. There are only a few
vendors that make cross-platform physical/virtual systems management tools, most notably IBM, CA Technologies, Schneider, and OpenView.

Capacity Planning: The Role of IT Optimize Monitoring for Virtual Servers
As we pointed out in the Introduction, in order to build an efficient virtualized/cloud computing environment, IT managers/administrators must be able to identify resource bottleneck issues and overcome them. Capacity planning tools help IT managers/administrators understand their capacity utilization, and troubleshoot and overcome capacity-related problems. Capacity planning tools should:

• monitor and report on current resource utilization and performance;
• support what-if analysis;
• forecast future capacity needs (for example, a firm may model demand growth and find that it needs to add 500 servers over an ensuing 6-month period); and,
• govern resource usage (for example, to keep license spend within budget; to ensure that certain applications are not co-located with other applications; or to ensure that service level agreements [SLAs] are met).

Capacity planning tools are vital when it comes to building efficient cloud architectures; a small illustrative sketch of the what-if idea follows.
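To make the what-if idea concrete, here is a minimal sketch in Python. It is illustrative only, not taken from any vendor tool: the pool sizes, the VM profile, and the 20% headroom are hypothetical numbers.

# Hypothetical what-if sketch: how many more VMs of a given profile
# fit into the remaining capacity of a resource pool?

def vms_that_fit(total, used, vm_profile, headroom=0.20):
    """Return how many more VMs fit, keeping a utilization headroom.

    total, used, vm_profile: dicts keyed by resource name,
    e.g. {"cpu_ghz": ..., "ram_gb": ..., "disk_tb": ...}.
    headroom: fraction of total capacity kept free per resource.
    """
    fits = []
    for resource, capacity in total.items():
        usable = capacity * (1 - headroom) - used[resource]
        fits.append(int(usable // vm_profile[resource]))
    return max(0, min(fits))  # the scarcest resource is the limit

pool_total = {"cpu_ghz": 400.0, "ram_gb": 2048.0, "disk_tb": 100.0}
pool_used  = {"cpu_ghz": 250.0, "ram_gb": 1100.0, "disk_tb": 60.0}
vm         = {"cpu_ghz": 2.0,   "ram_gb": 8.0,    "disk_tb": 0.25}

print(vms_that_fit(pool_total, pool_used, vm))  # 35, limited by CPU

Real products layer trend data, contention modeling, and policy on top of this; the point of the sketch is only that what-if analysis reduces to comparing remaining capacity against a workload profile, resource by resource.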
A closer look at Schneider's IT Optimize for Virtual Servers shows that it has the capabilities described above (see Figure 1).
Figure 1: The Role of IT Optimize Monitoring for Virtual Servers in Cloud Architecture
CHAPTER 2
PROJECT OVERVIEW
You cannot afford to assume that the data center has unlimited capacity; this is even more true for the internal cloud.
This research is designed for:
• CIOs or IT directors
• IT infrastructure / data center managers
• Internal utility infrastructure / cloud evangelists
This research will provide you with:
• An understanding of why the "lost art" of capacity management is more critical than ever in consolidated proto-cloud infrastructures.
• A process and workbook for cataloging and assessing current capacity in light of the needs of the business.
• A process checklist for capacity management with links to relevant additional resources and tools at IT Optimize.
• A gas gauge model for capacity planning based on reserve capacity and maintenance of service.
Capacity management ensures IT capacity cost-effectively meets business requirements. A capacity management process will reduce infrastructure waste while providing a framework for future acquisitions planning and accurate cost accounting.
Section in Brief: This section will help you:
• Understand why capacity management is a critical activity between consolidation and internal cloud.
• Put virtualization in its proper place as a tactical enabler, rather than a management strategy.
• See how capacity management prepares the infrastructure for a cloudy future, and aids in ongoing consolidation and virtualization.
Focus on capacity management to optimize cost effectiveness & service, both now and for an increasingly cloudy future
External public clouds will play a role in the future of corporate IT, but right now most IT departments are focusing on developing the internal cloud.

An internal cloud is… infrastructure-as-a-service (IaaS) delivered from internal IT resources. Consolidation and virtualization play a role in building an internal cloud, just as they do in external IaaS in a public cloud service (e.g. Amazon Web Services).

Capacity management is important because… virtualization does not create capacity, nor does it automatically make all capacity cost-measurable and cost-effective. A capacity management strategy will enable a move from infrastructure-as-asset management to infrastructure-as-service management. Benefits will include:
• The capability to document current capacity.
• The ability to plan capacity in advance.
• The ability to estimate the impact of new apps and modifications.
• Cost savings through elimination of over-provisioning, and through planned spending rather than reactive spending.
• Service and spending optimized to match business needs.
[Figure: Where IT departments are focusing cloud efforts. Focus on the internal cloud before external: 43%; implementing only internal cloud solutions: 33%; focus on the external cloud before internal: 12%; implementing only external cloud solutions: 12%.]

Most IT departments engaged in consolidation and virtualization are focused on internal cloud development first. A third (33%) will focus only on the internal cloud. Interest in the external cloud remains strong, but implementation is in its early days. Most expect the external cloud's role to become more important 3-5 years from now.
Capacity management (infrastructure analysis & planning) is recommended to optimize service tiers
• Capacity management practices lead to greater success in infrastructure consolidation/virtualization projects.
• Having developed service tiers in infrastructure was the strongest predictor of overall success in consolidation.
• A capacity management process, such as inventorying resources annually, was also a predictor of success, especially in managing virtual server sprawl, security assurance, and business continuity.
• Cost accounting and capacity planning were not predictors of current success. However, as we shall see, efficient capacity planning and cost accounting are not direct inputs, but outcomes of capacity management.
Where this solution set fits: capacity management is a critical part of the larger picture of building the internal cloud

This set is one of a series dedicated to building converged utility infrastructure. All these sets reference the IT layer cake model of consolidation and our three laws of utility/cloud investment.

A management process that starts with business needs, works through capacity optimization, and ends with a plan for tiered service pooling adheres to Info-Tech's three laws because it relates capacity management directly to servicing the needs of the business. The IT layer cake model for the internal cloud shows how infrastructure layers and virtualization all contribute to service; an additional element is efficient management of capacity across the layers.
IT Layer Cake Model
How do you slice this cake?
Related sets that address aspects of building an internal cloud
Build an Optimized Infrastructure-as-a-Service Internal Cloud
Mitigate Costs & Maximize Value with a Consolidated Network Storage Strategy
Evaluate a Backup Architecture Strategy
Build a Server Acquisition Strategy for the Internal Cloud
Craft a Converged Data Center Network Strategy
Select a Consolidated Storage Platform
Compare & contrast the clouds – for the internal cloud, your enterprise pays for everything and shoulders all the risk
It hurts to be alone – total ownership of limited capacity imposes an expensive box that can be invisible to the business
Unused capacity costs are ongoing overhead for the internal cloud.
In an internal infrastructure-as-a-service cloud, the enterprise pays for all capacity, not just a share of a larger third-party pool. Justifying the IT spend for total capacity is difficult when the business is used to a one-to-one relationship between an application and a hardware purchase. Risk mitigation is a significant component of total cost.
In the external cloud, the third party provider is responsible for risk mitigation of the capacity it rents (availability, recoverability, security). In the internal cloud, IT bears this responsibility. Significant cost drivers are the hardware and data redundancy that are needed to mitigate risk.
When capacity limits are reached, physical infrastructure needs to be acquired ad hoc.
• The public cloud is open-ended. The third-party provider maintains a practically unlimited pool of capacity that is available on demand. In the private cloud, capacity is limited.
• Concern about "hitting the wall" of internal capacity limits leads to over-provisioning. Acquiring more capacity than is needed means wasted spending and maintenance time.
CHAPTER 3
RESEARCH METHODOLOGY
Draw a clear line from business need through software & hardware needs – transparency is not the same as invisibility
The goal of capacity management is to optimize performance and efficiency of the current infrastructure, to plan for future capacity requirements, and to justify the financial investment in the infrastructure. The classic steps in capacity management are:

1. Analyze current capacity: find out how apps are currently provisioned and what the performance and availability requirements are for each one.
2. Optimize the infrastructure to ensure the most efficient use of existing capacity.
3. Analyze the impact of new or updated apps on capacity.
4. Analyze demand to model service requirements of the infrastructure and predict future growth in demand.
5. Develop a capacity plan that relates future growth in capacity to maintenance of service levels.
Capacity Management vs. Resource Planning
Rediscover the lost art of capacity management & planning after decades of inefficient distributed processing
An Overview of Capacity
Capacity management in IT matured in the mainframe environment, where resources were costly and it took considerable time to upgrade. Applications needed to be provisioned from a share of the centrally maintained and expensive compute resource. Resource partitions needed to be rigidly cost justified and cost managed because of the high cost of the total capacity. Expanding capacity in this environment was expensive and time consuming.
This is now…
As data centers transitioned to a distributed environment supported by inexpensive UNIX, Linux, and Windows servers, a brute-force approach to provisioning became the norm. Cheap industry-standard servers could be assigned to provision specific new or expanding applications or services. Capacity management and planning skills atrophied in companies accustomed to this "throw some more hardware at it" approach. Unregulated distributed processing bred increased complexity through unregulated server sprawl, and waste in poorly utilized silos of processing and storage.
Server virtualization does not equal cloud – the internal cloud is the end of a journey that begins with server CAPEX savings
Server virtualization mitigates waste of distributed servers through better resource utilization and process agility, but virtualization is an enabling tactic, not an infrastructure model.
Organizations typically embark on server virtualization to realize immediate capital savings from reduced server hardware footprint, through consolidation. However, as more of the server infrastructure is virtualized, further benefits – such as improvements in provisioning agility and service availability – begin to emerge. A managed internal cloud is the end of this journey that begins with a simple need to save money on server acquisition. To realize these benefits, management capability of both the underlying capacity as well as the virtualized abstraction layer is critical.
The internal cloud is not a product that will be delivered out of a box. It will be developed over time, enabled by consolidation, standardization, virtualization, and capacity management that focuses on service delivery to the business.
Wrap up consolidation efforts and focus on capacity management for the entire infrastructure
Savings of time and money on servers only increase as consolidation progresses. However, other layers of the infrastructure do not see the same success. Similarly, management benefits are mainly in server instances.
What this means…
• Server CAPEX reduction is the greatest benefit of consolidation through virtualization.
• Virtualization does not lead directly to savings in facility, storage, or network costs.
• Organizations that were more than 50% virtualized generally agreed that all types of management took fewer man-hours due to consolidation.
• However, increased virtualization had the biggest impact on server management. Organizations that were more virtualized spent significantly fewer man-hours on server instance management.
• Careful management planning for the entire data center will optimize facility costs, storage costs, network costs, and management complexity.
Virtualization and Cloud Management Using Capacity Planning
First, it is important to note that IT Optimize Monitoring for Virtual Servers has been designed for capacity planning in x86 environments, with specific focus on managing virtual servers across VMware, KVM, Citrix XenServer, and NetApp environments. Leveraging the same tooling and infrastructure, Tivoli provides these capabilities (monitoring and capacity planning) for other platforms such as Hyper-V, Power Systems, and System z environments; these tools can be found in related offerings such as ITCAM for Microsoft Applications. When evaluating Schneider IT Optimize Monitoring for Virtual Servers, it becomes readily apparent that this product provides three basic functions. It helps IT managers/administrators: 1. manage performance and risk; 2. plan and schedule; and 3. optimize operations. As a risk manager, it gathers data from a wide variety of sources to construct a composite picture of virtual server behavior. It alerts the user to current and future performance bottlenecks, enabling IT managers and administrators to take corrective action before end users are impacted. Using the risk management facilities, IT managers and administrators can easily find out whether their systems' resources are being overloaded, and they can model when their physical resources will reach their limits. Historical data can also reveal whether there have been any significant changes in a given environment, helping managers and administrators troubleshoot and/or tune their servers for optimal performance. From a planning and scheduling perspective, Schneider IT Optimize Monitoring for Virtual Servers can be used to conduct what-if analysis in order to, for instance, help determine how many additional workloads can be added to a given server environment, or to predict how much capacity will be needed to handle future workloads.

Using these facilities, IT managers and administrators can model what would happen if 100 more virtual machines were added to a given environment, or use predictive facilities to model how many more virtual machines can be added. Schneider ITO Monitoring for Virtual Servers also offers rich capacity analysis and reporting facilities. These reports enable IT managers and administrators to right-size their virtual machines, examine performance changes, and adjust workload placement. This facility can also be used to identify performance trends, aiding in workload balancing.
The product roadmap for this solution shows near term enhancements that will allow the tool to additionally provide benchmarking data for physical servers and hypervisors. It will also provide customers with the option to input or import data for custom tags (for example, associations of VMs to application or environment type), and input information about business and IT policies.
The collected data can then be analyzed and categorized into correlation groups (stable versus unstable workloads, or test versus development servers), and this information can be used to generate optimization plans.

These optimization plans will provide specific recommendations for placement, and simulate the benefits of rightsizing workloads, reducing the physical resources used, upgrading server technologies, and balancing workloads across clusters and data centers.
The figure below illustrates the three basic functions of Schneider ITO Monitoring for Virtual Servers. In addition to the robust capacity planning and management capabilities described above, this product also provides performance and availability monitoring for the health of the virtual environment, covering both physical and virtual resources.
Figure: Virtual Servers Monitoring – Functions and Activities
One of the biggest pain points for IT managers and administrators who have been asked to manage hundreds, if not thousands, of virtual machines is a lack of integrated management tools and utilities. Having to go to one console to perform backup and restore tasks; then having to launch another application to manage virtual machine sprawl; then having to launch another application to manage mobile partitioning places too much operational burden on IT managers and administrators. Products that offer integrated management facilities and automated reporting facilities can greatly simplify the management of physical and virtual server environments.
ITO Monitoring for Virtual Servers provides a highly integrated dashboard view of virtual machine activity within a given cloud environment. The next phase in cloud development is greater automation — a phase that focuses on flexible delivery models and self-service. But this phase is also about automated management — or, more specifically, integrated service management.
Tools and utilities such as ITO Monitoring for Virtual Servers are used in this phase to drive down management costs. After (or along with) automation, enterprises
need to find ways to optimize the use of their computing resources. One way to do this is to return unused resources to virtualized server/storage/networking pools where those resources can be found and utilized.
This is a necessary control for cloud providers, as end users do not readily give up the use of their virtual machines. This policy has been implemented in IBM's own development and test cloud. Additionally, to make optimal use of these pools, it is necessary to understand which workloads should be run on which types of servers. ITO Monitoring for Virtual Servers also assists in making this type of determination.
After consolidating, virtualizing, standardizing, automating, and sharing resources, the next logical step is to completely automate the relationship between systems and workloads, enabling workloads to dynamically find the resources they need to execute.
As we conducted our research for this report, we found that ITO has a broad portfolio of products that touch every one of these steps. For enterprises that are looking to move beyond the virtualization phase of their cloud journey, and are looking for tools and utilities that will help monitor virtual machine behaviors within a cloud environment, ITO Monitoring for Virtual Servers represents a logical step for managing capacity and controlling virtual machine activities.
CHAPTER 4 DATA ANALYSIS
Start capacity management now to optimize current infrastructure and boost success in ongoing consolidation

2011 is the year that most companies doing consolidation will cross the line to having more than 50% of their infrastructure virtualized. Many have already crossed that line.
What this means…
• On the journey from tactical server consolidation to internal cloud management, enterprises are at a point where management is going to matter more than infrastructure effectiveness. With a majority of workloads virtualized, virtual infrastructure is increasingly core infrastructure.
• Enterprises have likely moved beyond the low-hanging fruit of server consolidation (such as test, dev, and non-critical servers) to virtualizing more mission-critical and resource-demanding workloads.
• However, a significant proportion of workloads will remain un-virtualized for the immediate future. An infrastructure-as-a-service management model will need to account for all server workloads.
• Capacity management correlates with consolidation and virtualization success. In addition to orienting toward IT as a service, capacity management will help deal with an increasingly virtualized, consolidated infrastructure.
Avoid "virtual server sprawl" & boost success in areas such as business continuity & security with capacity management

Without an idea of the cost and appropriate provisioning of capacity, the benefits from reducing the complexity of physical server management are eradicated by virtual sprawl.
Sprawl is alive and well in our organization. Virtualization has allowed application and business teams to buy additional dev/test/staging environments where they haven't been able to afford them before. They're using the same budgets they had before, they're just buying more servers with them now.
Virtual server sprawl happens when the business loses sight of infrastructure requirements and costs of running a virtual machine. Fast and easy server deployment becomes confused with cheap server deployment.
Negative impacts of virtual server sprawl include:
• Wasted capacity. Resource-consuming virtual machines are running that nobody is using or accountable for. Capacity waste is especially seen in storage, where high-end SAN space is being eaten by multiple virtual machine instances.
• Performance degradation. As more virtual machines are added to the system, the performance of all virtual machines degrades as more workloads contend for the same resources.
• Unplanned capacity additions. As virtual sprawl increases and available resources decrease, there is demand to add more physical capacity.

Having a capacity management plan significantly reduces concerns about virtual sprawl.
This section will help you:
• Turn data center management outside-in. Cloud computing is associated with delivering IT as a service. Assessing the infrastructure for capacity management and planning starts with the business and ends with a model for total cost to serve and capacity management across service tiers.
• Assess infrastructure based on dependencies, interdependencies, criticality, and business priority as an input for capacity planning.
Turn infrastructure management outside-in – work from business needs through app requirements to total capacity requirements
Think like a service provider rather than an asset manager if you are going to offer infrastructure-as-a-service from a utility infrastructure or internal cloud.
Process Map for Developing a Capacity Plan
The data center is traditionally seen as a room full of assets (servers, networks, and storage arrays) that need to be fed and cared for (with appropriate power, cooling, and configuration management).

A capacity management view of the data center starts outside, with the service requirements of the customer, then works through all of the infrastructure assets needed to deliver expected service levels.
The total cost of application, storage, server, network, and facilities is the total cost of the service being rendered to the business. This is the total cost to serve or the total cost of all capacity.
Finally, a capacity management strategy looks at how total cost of capacity can be mitigated. The key question is how much capacity is good enough to maintain service now and in the immediate future.
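As a simple illustration of the "total cost to serve" idea above, the cost of a service is the sum of its allocated costs at each infrastructure layer. The layer names below follow the text; the figures are hypothetical.

# Hypothetical total-cost-to-serve sketch: the cost of a service is the
# sum of its allocated costs across every infrastructure layer.

annual_layer_costs = {          # costs allocated to one business service
    "application": 40_000,     # licenses, support
    "server":      25_000,     # share of physical/virtual server capacity
    "storage":     18_000,     # share of SAN capacity and backup
    "network":      7_000,     # share of switching and bandwidth
    "facilities":  35_000,     # share of power, cooling, floor space
}

total_cost_to_serve = sum(annual_layer_costs.values())
print(f"Total cost to serve: ${total_cost_to_serve:,}")  # $125,000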
Understand that hardware is capacity – customer service drives the process, but capacity management is about hardware
Some say that IT hardware doesn’t matter in a cloud. However, while hardware doesn’t matter to the customer as much as the software and service, hardware is a key concern for the service provider.
The value of IT to the business comes from how apps and data serve business needs. The business will have priorities as to which apps and data are more or less valuable, based on the relative criticality of the business processes they support. IT infrastructure is seen by the business as the capacity to run the apps and store the data. The unit cost of this capacity includes the cost per unit of processing or storage, but also the additional cost of mitigating risk (e.g. meeting uptime and security requirements). Capacity based on standardized hardware components is not a competitive differentiator. However, automated tools for efficiently allocating capacity to apps, monitoring capacity utilization, and tracking total costs can make one internal cloud more efficient and less expensive than another.
Determine service level requirements based on business need
Optimal performance requirements + criticality to the business + future growth potential = total service requirements.
CHAPTER 5 RESEARCH FINDINGS
Calculate the total cost of service by accounting for requirements at each layer of the physical infrastructure
• Total capacity requirements are what is needed to meet the current performance, future need, and availability/recovery goals of all applications and services.
• Future need is covered by standby capacity held ready above what is currently being used by the system.
• Availability/recovery is typically enabled through redundancy. This redundancy can be:
  • Component redundancy (dual power supplies, dual NICs)
  • Full resource redundancy (redundant servers, storage arrays, switches, power supplies, UPS, cooling)
  • Data redundancy (data snapshots, mirrors, backup copies)
• If demand exceeds the capacity available for planned growth and/or does not leave enough redundant capacity, service levels will be compromised. Either additional physical capacity will need to be added, or another workload will need to be removed from the pool of available capacity.

A minimal worked sketch of this arithmetic follows.
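The relationship between used capacity, growth headroom, and redundancy reserve can be expressed in a few lines. This is a sketch under simplified assumptions (redundancy held as a flat fraction of total capacity); the percentages are hypothetical, not recommendations.

# Hypothetical sketch: total capacity requirement as the sum of what is
# in use, headroom reserved for growth, and capacity reserved for
# redundancy/failover.

def total_capacity_required(in_use, growth_pct, redundancy_pct):
    """Capacity needed to cover current use, growth headroom, and a
    redundancy reserve, in the same units as in_use."""
    base = in_use * (1 + growth_pct)
    # Redundancy is reserved as a fraction of total capacity, so solve
    # total = base + redundancy_pct * total for total:
    return base / (1 - redundancy_pct)

# 100 capacity units in use, 20% growth headroom, 30% redundancy reserve
print(round(total_capacity_required(100, 0.20, 0.30), 1))  # 171.4

Provisioning beyond this total eats into the redundancy reserve, which is exactly the trade-off described in the case study below.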
Seek balance in provisioning – service is a function of adequate capacity for operation, growth, and redundancy
Case study: City has fully redundant capacity for availability and stays "a server ahead" for future need
The Situation
• Municipal services for a mid-sized northern U.S. city. They have been replacing traditional servers with virtualized infrastructure; they are nearly fully virtualized now.
• The challenge is to manage adequate capacity to maintain service levels (availability and performance) now and as needs ramp up.

Actions
• In communicating service to the organization, a "server" is still the unit of measure, as it is in a distributed hardware environment; however, a "server" is now not a physical entity but a package of capacity, which is derived from physical infrastructure.
• The cost of a "server" (unit of capacity) shown to the organization includes the physical resources used by the virtual machine plus the cost of redundant physical resources. This overhead is the cost of guaranteeing maximum availability.
• The organization seeks to stay a physical server ahead of current capacity requirements to facilitate growth in capacity requirements.
Example: How much a "server" costs in a virtual infrastructure environment

[Figure: the cluster's capacity split into Usable Capacity and Failover Capacity.]

Let's say the total physical infrastructure costs $100,000 and can support a maximum of 100 virtual servers ($1,000 per VM). But for redundancy, half of the capacity is reserved. The number of virtual servers actually available for provisioning is therefore 50, at a cost of $2,000 per server. Because the physical redundancy guarantees higher availability in a clustered virtual environment, the $2,000 server cost includes higher service features like guaranteed uptime. When demand reaches 50, additional virtual servers can be added without new investment, but the reduction in failover capacity will cause performance and availability to suffer.
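The arithmetic in this example is easy to parameterize. A minimal sketch, using the case-study numbers (which are illustrative):

# Sketch of the case-study arithmetic: reserving failover capacity
# raises the effective cost per provisionable virtual server.

def cost_per_vm(total_infra_cost, max_vms, failover_fraction):
    """Effective cost per VM after reserving a fraction of capacity
    for failover/redundancy."""
    provisionable = int(max_vms * (1 - failover_fraction))
    return total_infra_cost / provisionable, provisionable

cost, available = cost_per_vm(100_000, 100, 0.50)
print(f"{available} provisionable VMs at ${cost:,.0f} each")
# -> 50 provisionable VMs at $2,000 each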
Establish a systems management team to gather baseline information on current capacity and to develop a capacity plan
Capacity assessment requires the combined input of the professionals who manage each layer of the infrastructure, because all layers of the infrastructure contribute to service levels. Consolidation of infrastructure is not just consolidation of physical boxes and data, but also consolidation of skills and personnel. Establish a systems management team with representation from all the individual technology silos to work together. Train them in advance to use any new tools specific to a consolidated environment. Head the team with a sponsor who has control over every aspect of IT, and who has the influence and HR skills necessary to manage a diverse team.
"We're transitioning the staff to a different methodology that's about planning for strategic growth. I sent my staff for training 30 days before we got the first new server in, because the challenges and complications with new tools are pretty huge. But once [the team] has gotten there, the world is wonderful for us. The server guys are saying 'this is great stuff!' because they're able to very quickly meet demand – increasing size, upgrading an app – whatever it is, they can meet those demands a lot more efficiently."
Inventory apps to bring order to capacity coordination chaos
Capacity management can benefit the enterprise regardless of where you are on the consolidation/virtualization curve. Use Info-Tech's comprehensive discovery tool to collect data on the current allocation of capacity to apps, and group apps/capacity by criticality.
An app inventory that provides a clear depiction of the current environment should:
• Document how each app connects to other apps.
• Demonstrate dependencies of apps on infrastructure components.
• Describe the criticality of each app and how much downtime each can afford.
• Provide a starting point for analysis and planning for appropriate current and future capacity.

A simple sketch of such an inventory record follows.
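As an illustration only (the field names and apps below are hypothetical, not taken from any Info-Tech workbook), an inventory entry might capture dependencies and criticality like this:

# Hypothetical app-inventory record: enough structure to trace
# dependencies and group apps by criticality.

inventory = {
    "order-entry": {
        "depends_on_apps": ["inventory-db", "payment-gateway"],
        "infrastructure": ["vm-cluster-a", "san-tier1", "core-switch-1"],
        "criticality": "gold",        # business-critical
        "max_downtime_hours": 1,
    },
    "intranet-wiki": {
        "depends_on_apps": [],
        "infrastructure": ["vm-cluster-b", "san-tier3"],
        "criticality": "bronze",      # best effort
        "max_downtime_hours": 72,
    },
}

# Group apps by criticality to see where capacity and redundancy matter most
by_tier = {}
for app, meta in inventory.items():
    by_tier.setdefault(meta["criticality"], []).append(app)
print(by_tier)  # {'gold': ['order-entry'], 'bronze': ['intranet-wiki']}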
Identify dependencies to assess total capacity requirements
Capacity management is a process, not a product. Look to system management and internal cloud management tools with an eye to how they might automate your capacity management practice.
Using your capacity assessment workbook as a starting point, follow the best practice steps of developing a capacity plan
Plan the plan: use a Business Plan & Process Checklist to get buy-in for the process and track results
The Internal Cloud Business Plan will help build a business plan for the enterprise as well as document business justifications for any additional projects that are connected to implementations, such as virtualization, shared storage, and network convergence. The goal is to get all the pieces in place for an overall strategy. The resulting document is therefore intended for initial project scoping and for future reuse, as more consolidation strategies are defined.
Use the Capacity Management Process to track your organization's progress in developing your internal cloud. Additional activities and checkpoints can be added to the checklist, and others removed, to customize it to your situation.

Analyze current capacity: compare current provisioning to application & business need
Align your catalog of apps and dependencies with business expectations of performance and criticality.
Optimize the infrastructure: plan to create service tiers to optimize your capacity investment
Resist the temptation to treat infrastructure as one-size-fits-all. It has been found that the practice of tiering capacity by service levels significantly impacts consolidation success.
In assessing current capacity, you have seen that not all apps have the same business criticality and performance requirements. In planning infrastructure, look to tiering services by groupings of capacity requirements.
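A toy sketch of such a grouping rule follows: the tier names echo the gold/silver/bronze language used later in this section, but the thresholds are hypothetical.

# Hypothetical tier-assignment rule: map each app's criticality and
# downtime tolerance to a service tier.

def service_tier(criticality, max_downtime_hours):
    """Return a tier name from simple, illustrative thresholds."""
    if criticality == "mission-critical" or max_downtime_hours <= 1:
        return "gold"    # fully redundant, fastest storage
    if criticality == "important" or max_downtime_hours <= 24:
        return "silver"  # partial redundancy, mid-tier storage
    return "bronze"      # best effort, cheapest capacity

apps = [
    ("order-entry", "mission-critical", 1),
    ("reporting", "important", 8),
    ("intranet-wiki", "low", 72),
]
for name, crit, hours in apps:
    print(name, "->", service_tier(crit, hours))
# order-entry -> gold, reporting -> silver, intranet-wiki -> bronze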
Hardware is capacity. Service is a function of performance and redundancy. Through the systems planning team, look for opportunities for service tiering at every level.
Start with consolidated storage: for many, service tiers are synonymous with storage tiers

Storage can be the most expensive part of a consolidated infrastructure, but it need not be treated as a single monolithic entity. For storage service tiering, look to matching the fastest (and most expensive) disk with the most critical processes and data. Variable redundancy – disk, data, and device (including backup) – also defines a service tier. Storage virtualization can also boost utilization and lower costs across tiers.

In networks, variable bandwidth and port and switch redundancy impact classes of service

One way variable storage tiers have been architected is to have tier-one storage use faster Fibre Channel ports and switches while a secondary tier uses Ethernet and iSCSI for storage traffic. Converged networking in 10 Gigabit Ethernet holds promise for reducing network complexity while improving the performance of both servers and storage through better I/O and I/O management. In converged I/O, variable service becomes a matter of policy rather than hardware.
In servers, look at on-board redundancy and processing architecture

The server is the base "unit of capacity" in a consolidated infrastructure, but server pricing can vary depending on the class of processor, the number of processors, and other on-board redundancy such as dual power supplies. Form factor advances such as blades also increase density and reduce footprint.

Calculate the impact of tiering on power and cooling, and examine redundancy needs within the facilities

Facilities are 40% of the total cost of the infrastructure. Efficiencies in all the above layers will have an impact on the load requirements of the data center. Also look for opportunities to vary facilities redundancy for each service tier (see the case study below). The Solution Set Renovate the Data Center has significant value even if you are not currently renovating. The set has detailed tools for capturing and optimizing facilities costs, including the Power Requirements Calculator and the Standby Power Supply Calculator.
Use a consolidation TCO comparison for a big-picture view of total costs for each infrastructure layer

Detailed TCO analysis is best left to strategies for each infrastructure layer. However, this tool can provide a big-picture snapshot of cost comparison across infrastructure layers. Exploring opportunities to tier services in infrastructure layers will yield total cost savings opportunities. In the following case, for example, a mid-sized professional data services firm estimated potential savings of more than $20,000 per year from facilities service tiering alone. Several of the Solution Sets for planning individual infrastructure layers (storage, network, facilities) have detailed TCO comparison calculators. For a big-picture, at-a-glance comparison across layers, use the Infrastructure TCO Comparison Tool. Using examples and data from case studies, this tool was developed to illustrate the most common TCO comparisons:
• TCO of the existing infrastructure vs. TCO of your proposed project.
• TCO of multiple proposed projects (e.g. build a new facility vs. co-location).
Case study: Application of server tiers produces potential facility & TCO savings for this mid-sized organization
A data services company was planning a renovation of their 100-square-foot data center. They explored the idea of tiering their facilities according to criticality, and calculated cost savings of $22,827 per year from doing so.

These savings reflect facilities costs alone. Service tiering can achieve even more savings in areas such as server CAPEX, network costs, and reduced time needed to manage physical infrastructure.
Analyze the impact of new or updated apps. Pursue a policy of virtualization first for agile provisioning
Virtualize unless otherwise. Virtualization is a tactic for enabling more efficient and agile provisioning. All new or updated workloads should be evaluated for virtual hosting.
A gold, silver, or bronze service tier represents a baseline: what is good enough to provision a given workload in line with its performance and criticality requirements. At the server level, a service tier can include both native (non-virtual) servers and clusters of servers that have been partitioned for virtualization. Taking a "virtualize unless otherwise" approach, new and updated apps should be assessed for hosting on the virtualized tier. Updates can include needs for new levels of performance and capacity. Legacy apps on end-of-life hardware should also be evaluated for migration to the virtual tier. In order to assess the impact of new workloads on capacity, careful assessment of requirements is needed. Use the Application Assessment Checklist (modified from Appleton Ideas) as a template for developing your own.

A trigger for virtualizing core production workloads in several companies has been the realization that performance and availability (service) for secondary workloads in their virtual server environment was better than that for primary workloads in a non-virtual environment.
Analyze demand to model service requirements; identify trends to forecast future business and new workloads
With current capacity under control, begin looking to the future of the business, and how growth will change the capacity needed to fuel the required workloads.
Forecast business activity: growth in the business will mean more transactional processing. If growth translates into more staff, it may also translate into more users of applications. Include increased demand in the analysis of requirements for new and updated applications. Monitor and analyze capacity requirements over time. "From a capacity standpoint, we hit a wall of CPU saturation before we realized where the practical limit was. We learned, with a bit of pain, to use software to model a trend line telling us 'you're going to hit a wall at this time next year unless you add capacity.'"
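That kind of trend-line warning can be approximated with a simple linear fit. The sketch below uses synthetic monthly figures; real tools also account for seasonality and workload mix.

# Hypothetical trend-line sketch: fit a line to monthly peak CPU
# utilization and estimate when it crosses a saturation threshold.

import numpy as np

months = np.arange(12)                      # last 12 months
peak_cpu = np.array([52, 54, 57, 58, 61, 63, 66, 67, 70, 73, 75, 78.0])

slope, intercept = np.polyfit(months, peak_cpu, 1)   # % per month, base %
threshold = 90.0                                     # saturation "wall"

months_to_wall = (threshold - peak_cpu[-1]) / slope
print(f"Trend: +{slope:.1f}%/month; ~{months_to_wall:.0f} months "
      f"until {threshold:.0f}% CPU saturation")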
Develop a capacity plan: Use a reserve capacity model for management & planning
• The capacity reservation model tiers capacity according to agility, reliability, control, and cost.
• The idea of "reservation" reintroduces the importance of justification for capacity usage. Capacity is not open-ended but reserved for certain kinds of workloads.
• Reserve capacity enables business units to order IT services as they would from a managed service provider (including an external infrastructure-as-a-service cloud). IT can also show its entire capability in terms of units (server instances) that can be supported at each level (see the city case study above for an example).
• Adding a workload to a capacity tier counts against available capacity, a limited resource. Accommodating the addition may require spending to increase capacity, or removal/retirement of another workload to free up capacity.
• Each time a unit of capacity from one of the three tiers is provisioned out to the business, it is removed from the pool of available capacity. The remaining capacity can be monitored as a "gas gauge", a planning point for bringing additional capacity online.
• The gas gauge approach avoids ad hoc hardware purchases and avoids over-provisioning, and over-spending, as capacity is brought online at each level commensurate with projected need.

A sketch of this gas gauge bookkeeping follows.
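As an illustration of the gas gauge idea (the tier name, pool size, and reorder threshold are all hypothetical):

# Hypothetical "gas gauge" sketch: track reserved capacity per tier and
# flag when remaining capacity falls below a planning threshold.

class CapacityPool:
    def __init__(self, tier, total_units, reorder_at=0.25):
        self.tier = tier
        self.total = total_units        # e.g. provisionable server instances
        self.used = 0
        self.reorder_at = reorder_at    # fraction remaining that triggers planning

    def provision(self, units):
        if self.used + units > self.total:
            raise RuntimeError(f"{self.tier}: not enough reserve capacity")
        self.used += units

    def gauge(self):
        remaining = self.total - self.used
        low = remaining <= self.total * self.reorder_at
        return remaining, low

gold = CapacityPool("gold", total_units=50)
gold.provision(40)
remaining, needs_planning = gold.gauge()
print(remaining, needs_planning)  # 10 True -> time to plan new capacity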
Capacity management is a process, not a tool or a product. Look to tools to help automate tasks but have a process in place first
• For many (37%), the tools for managing a capacity planning process include a pad and pen, a whiteboard, and a spreadsheet. These are a perfectly legitimate tool set for working with your systems team, recording usage information, and planning the future.
• A plurality of respondents (47%) use a combination of point tools, such as vendor-specific storage management tools combined with virtual infrastructure management such as VMware vCenter. These provide visibility into the system from application through virtual and physical infrastructure, as well as dynamic provisioning capabilities.
• Another tool set, helpful for modeling and monitoring the impact of new and updated workloads on the virtual environment (as outlined in planning step 3), includes capacity planning and monitoring tools from CiRBA, VMware, Microsoft, PlateSpin, and vKernel.
• Only 5% use comprehensive software that automates management across the entire consolidated infrastructure. These tended to be organizations with the largest proportion of virtualized infrastructure.
[Figure: Tools used for capacity planning. Manual methods: 37%; some software (point tools): 47%; comprehensive software: 5%; none: 10%. Software overall: 52%.]
However, organizations that did use comprehensive software were the most successful in their consolidation efforts.
(Dependent variable: average success percentile.)
Case study: This manufacturer has deployed tiered services & capacity monitoring as it closes on a goal of 99% virtualization
The client: Manufacturer of specialty paper products. They have been virtualizing infrastructure for over five years, and now over 96% of their servers are virtual. The plan is to have 99.5% of the infrastructure virtualized as soon as possible, only avoiding virtualization when hardware limitations absolutely prevent it.
Good news: the advice laid out in this Solution Set works as well in practice as it does in theory. With a solid capacity management plan in place, the organization reports success in realizing benefits and almost no pitfalls in their comprehensive consolidation efforts.
Prepare for a future of hybrid clouds & cloud bursting
The external cloud will continue to develop and mature as the enterprise focuses on internal cloud development. Look for future management solutions to span internal and external clouds. This Solution Set has focused on internal cloud capacity management because, for most (76%), internal cloud development comes first. However, opportunities in the external public cloud will continue to develop and mature over the next three to five years. Opportunities include:

• Cloud bursting: available and appropriately redundant capacity is maintained in a public cloud to absorb spikes in demand for capacity from internally hosted applications (a toy sketch follows).
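A toy sketch of the bursting decision, with a hypothetical reserve floor on internal capacity:

# Hypothetical cloud-bursting sketch: place a workload internally while
# reserve capacity lasts; overflow to a public cloud pool during spikes.

def place_workload(units_needed, internal_free, reserve_floor=10):
    """Return (internal_units, burst_units), keeping a reserve floor
    of internal capacity untouched."""
    internal_available = max(0, internal_free - reserve_floor)
    internal_units = min(units_needed, internal_available)
    burst_units = units_needed - internal_units
    return internal_units, burst_units

print(place_workload(25, internal_free=30))  # (20, 5): burst 5 units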
CHAPTER 6 RECOMMENDATIONS & CONCLUSION
Capacity management is a critical process
In an internal cloud, the organization bears the full burden of all capacity – used or unused.
Virtualization does not create capacity. Its benefits can only be fully realized with careful capacity management. Begin capacity management now to prepare for an increasingly cloudy future.
Start with business needs
• Turn data center management outside-in; think like a service provider delivering IaaS to the business.
• Determine total service requirements as the total of performance, criticality, and growth.
• Gather a team and document apps and infrastructure to prepare for advanced capacity management.
Follow the five steps towards developing a capacity plan
• Analyze current capacity, optimize the infrastructure, analyze impact, determine demand, then develop a plan that takes future growth into account.
• Think of capacity as a gas gauge, and divide it into tiers for optimum success.
• Consider automation tools, but make sure there is a process in place for automation to have benefit.
CHAPTER 7 BIBLIOGRAPHY
Schneider Electric India IT Business Pvt. Ltd. – white papers on data centres: www.schneider-electric.com/.../white-papers/white-papers-data-centres

American Power Conversion: http://www.apcmedia.com/salestools/PDON-8PGK8J

Wikipedia – Electronic commerce: http://en.wikipedia.org/wiki/Electronic_commerce

VMware Inc. – Cloud Computing: http://www.vmware.com/cloud-computing.html and http://www.vmware.com/in/products/vcloud-suite/

VMware white papers on virtualization

IBM white papers on virtualization & cloud computing