Wednesday, March 28, 2012

Capacity Planning


Overview
Capacity planning is the process of planning for growth and forecasting peak usage periods in order to meet system and application capacity requirements. It involves extensive performance testing to establish the application's resource utilization and transaction throughput under load. First, you measure the number of visitors the site currently receives and how much demand each user places on the server, and then you calculate the computing resources (CPU, RAM, disk space, and network bandwidth) that are necessary to support current and future usage levels. This How To describes two methodologies for capacity planning:

  1. Transaction cost analysis. Transaction cost analysis calculates the cost of the most important user operations of an application in terms of a limiting resource. The resource can be CPU, memory, disk, or network. You can then identify how many simultaneous users can be supported by your hardware configuration, or which resource needs to be upgraded to support an increasing number of users and by how much.
  2. Predictive analysis. Predictive analysis forecasts the future resource utilization of your application based on past performance. To perform predictive analysis, you must have historical data available for analysis.

Note:   The sample application referred to in this How To is not an actual application, and the data used is not based on any actual test results. They are used only to illustrate the concepts in the discussion.


Transaction Cost Analysis
The process of using transaction cost analysis for capacity planning consists of the following steps:

1. Compile a user profile: Compiling a user profile means understanding your business volumes and usage patterns. Generally, you obtain usage information by analyzing log files.
2. Execute discrete tests: Execute tests on specific user operations based on the profiles created in the previous step.
3. Measure the cost of each operation: Using the performance data captured in the previous step, calculate the cost of each user operation.
4. Calculate the cost of an average user profile: Calculate the cost of an average user profile by assuming a fixed period of activity for an average user (for example, 10 minutes).
5. Calculate site capacity: Based on the cost of each user profile, calculate the maximum number of users supported by the site.
6. Verify site capacity: Verify site capacity by running a script that reflects the user profile with an increasing number of users and then comparing the results against those obtained in previous steps.

The next sections describe each of these steps.
Step 1: Compile a User Profile
Compile a user profile from the existing production traffic data. The main resource for identifying user operations is the Internet Information Services (IIS) log files. The components extracted from usage profiles are as follows: 
•A list of user profiles.
•The average duration of a user session.
•The total number of operations performed during the session.
•The frequency with which users perform each operation during the session.
To compile a user profile 
1. Identify the number of user requests for each page and the respective percentages. 
The number of user requests for each page can be extracted from the log files. Divide the number of requests for each page by the total number of requests to get the percentage. 
Table 1 illustrates a sample profile. 

Table 1: User Requests per Page 
---------------------------------------------------------------------
ID  URI                       Number of requests   Percentage
---------------------------------------------------------------------
1   /MyApp/login.aspx                     18,234          35%
2   /MyApp/home.aspx                      10,756          20%
3   /MyApp/logout.aspx                     9,993          19%
4   /MyApp/SellStock.aspx                  4,200           8%
5   /MyApp/BuyStock.aspx                   9,423          18%
---------------------------------------------------------------------
Total                                     52,606         100%
---------------------------------------------------------------------
2. Identify the logical operations and number of requests required to complete the operation. 
A user operation can be thought of as a single complete logical operation that can consist of more than one request. For example, the login operation might require three pages and two requests. The total number of operations performed in a given time frame can be calculated by using the following formula: 
Number of operations = Number of requests / Number of requests per operation 
The Requests per operation column in Table 2 shows how many times the page was requested for a single operation. 
Table 2: User Requests per Operation 
-------------------------------------------------------------------------------------------------------
ID  URI                       Number of requests   Requests per operation   Number of operations
-------------------------------------------------------------------------------------------------------
1   /MyApp/login.aspx                     18,234                        2                  9,117
2   /MyApp/logout.aspx                     9,993                        1                  9,993
3   /MyApp/SellStock.aspx                  4,200                        2                  2,100
4   /MyApp/BuyStock.aspx                   9,423                        3                  3,141
-------------------------------------------------------------------------------------------------------
Total  n/a                                41,850                        8                 24,351
-------------------------------------------------------------------------------------------------------

3. Identify the average user profile, session length, and operations per session. 
You can analyze the IIS log files to calculate the average user session length and the number of operations an average user performs during the session. The session length for the sample application was calculated as 10 minutes from the IIS logs, and the average user profile for the sample application is shown in Table 3. 
Table 3: Average User Profile 
----------------------------------------------------------------------------------
Operation      Number of operations executed during an average session
----------------------------------------------------------------------------------
Login                                      1
SellStock                                  3
BuyStock                                   2
Logout                                     1
----------------------------------------------------------------------------------
For more information about identifying user profiles, see "Workload Modeling" in Chapter 16, "Testing .NET Application Performance." 
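
The arithmetic behind Tables 1 and 2 can be sketched in a few lines of Python. This is only an illustration: the request counts are the sample figures from the tables above, and the function and variable names are made up for this example. Extracting the counts from real IIS log files is not shown.

```python
# Request counts per page, copied from Table 1 (sample data, not real results).
requests_per_page = {
    "/MyApp/login.aspx": 18234,
    "/MyApp/home.aspx": 10756,
    "/MyApp/logout.aspx": 9993,
    "/MyApp/SellStock.aspx": 4200,
    "/MyApp/BuyStock.aspx": 9423,
}

# Requests needed to complete one operation, copied from Table 2.
requests_per_operation = {
    "/MyApp/login.aspx": 2,
    "/MyApp/logout.aspx": 1,
    "/MyApp/SellStock.aspx": 2,
    "/MyApp/BuyStock.aspx": 3,
}


def request_percentages(counts):
    """Return each page's share of all requests, as a percentage."""
    total = sum(counts.values())
    return {uri: 100.0 * n / total for uri, n in counts.items()}


def operations_per_page(counts, per_operation):
    """Number of operations = number of requests / requests per operation."""
    return {uri: counts[uri] / per_operation[uri] for uri in per_operation}


percentages = request_percentages(requests_per_page)
operations = operations_per_page(requests_per_page, requests_per_operation)

for uri, pct in percentages.items():
    print(f"{uri:<24} {pct:5.1f}%")
for uri, ops in operations.items():
    print(f"{uri:<24} {ops:8.0f} operations")
```

Rounding the printed percentages to whole numbers gives the values listed in Table 1.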

Step 2:  Execute Discrete Tests
Run discrete tests for each user operation identified in Step 1 for a load at which your system reaches maximum throughput. For example, you need to run separate tests for Login, BuyStock, and SellStock operations. The test script only fires the requests for a dedicated user operation.
The procedure for executing the tests consists of the following tasks: 

•Set up the environment with the minimum number of servers possible. Make sure that the architecture of your test setup mirrors your production environment as closely as possible.
•Create a test script that loads only the operation in consideration without firing any redundant requests.
•Define the point at which your system reaches maximum throughput for the user profile. You can identify this point by monitoring the ASP.NET Applications\Requests/Sec counter for an ASP.NET application while increasing the load on the system. Identify the point at which Requests/Sec reaches a maximum value.
•Identify the limiting resource against which the cost needs to be calculated for a given operation. List the performance counters you need to monitor to identify the costs. For example, if you need to identify the cost of CPU as a resource for any operation, you need to monitor the counters listed in Table 4. 

Table 4: Performance Counters Used to Identify Cost
----------------------------------------------------------------------------
Object                  Counter              Instance
----------------------------------------------------------------------------
Processor               % Processor Time     _Total
ASP.NET Applications    Requests/Sec         Your virtual directory
----------------------------------------------------------------------------
Note   Requests/Sec will be used to calculate the processor cost per request.

•Run load tests for a duration that stabilizes the throughput of the application. The duration can be somewhere between 15 and 30 minutes. Stabilizing the throughput helps create a valid, equal distribution of the resources over a range of requests.
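
Finding the point of maximum throughput can be sketched as follows. The load levels and Requests/Sec readings below are invented purely for illustration (they are not test results), and the function name is hypothetical. In practice the readings would come from the ASP.NET Applications\Requests/Sec counter sampled at each load level.

```python
# Hypothetical (made-up) samples: (simulated users, observed Requests/Sec).
samples = [
    (50, 120.0),
    (100, 230.0),
    (200, 390.0),
    (300, 441.0),
    (400, 436.0),
    (500, 410.0),
]


def max_throughput_point(load_samples):
    """Return the (users, requests_per_sec) sample with the highest throughput."""
    return max(load_samples, key=lambda sample: sample[1])


users, requests_per_sec = max_throughput_point(samples)
print(f"Throughput peaks at {requests_per_sec} Requests/Sec with {users} users")
```

Beyond the peak, adding load no longer raises Requests/Sec, so the discrete tests should be run at or near that load level.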

Output
The output from executing this series of steps for each scenario would be a report like the following:
Number of CPUs = 2
CPU speed = 1.3 GHz
Table 5 shows a sample report for the results of the load tests.

Table 5: Load Test Results
----------------------------------------------------------------------------------------------------
User operation   Processor\% Processor Time   ASP.NET Applications\Requests/Sec
----------------------------------------------------------------------------------------------------
Login                                   90%                                 441
SellStock                               78%                                 241
BuyStock                                83%                                 329
Logout                                  87%                                 510
----------------------------------------------------------------------------------------------------
Step 3: Measure the Cost of Each Operation
Measure the cost of each operation in terms of the limiting resource identified in Step 2. Measuring the operation cost involves calculating the cost per request and then calculating the cost per operation. Use the following formulas for these tasks: 


•Cost per request: You can calculate the cost in terms of processor cycles required for processing a request by using the following formula: 
Cost (Mcycles/request) = ((number of processors x processor speed) x processor use) / number of requests per second 
For example, using the values identified for the Login operation in Step 2, where processor speed is 1.3 GHz or 1,300 Mcycles/sec, processor usage is 90 percent, and Requests/Sec is 441, you can calculate the page cost as: 
((2 x 1,300 Mcycles/sec) x 0.90) / (441 Requests/Sec) = 5.31 Mcycles/request 
•Cost per operation: You can calculate the cost for each operation by using the following formula: 
Cost per operation = (number of Mcycles/request) x number of pages for an operation 
The cost of the Login operation, which involves three pages, calculated with the unrounded cost per request of 5.306 Mcycles, is: 
5.306 x 3 = 15.92 Mcycles 
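
The two formulas above translate directly into code. The following is a minimal sketch with illustrative function names. It uses the Login figures from Step 2 and carries the unrounded cost per request into the second formula.

```python
def cost_per_request(num_cpus, cpu_mcycles_per_sec, cpu_utilization, requests_per_sec):
    """Cost (Mcycles/request) = (CPUs x speed x utilization) / Requests/Sec."""
    return num_cpus * cpu_mcycles_per_sec * cpu_utilization / requests_per_sec


def cost_per_operation(request_cost, pages_in_operation):
    """Cost per operation = Mcycles/request x number of pages for the operation."""
    return request_cost * pages_in_operation


# Login: 2 CPUs at 1,300 Mcycles/sec, 90 percent utilization, 441 Requests/Sec, 3 pages.
login_request_cost = cost_per_request(2, 1300, 0.90, 441)
login_cost = cost_per_operation(login_request_cost, 3)

print(f"{login_request_cost:.2f} Mcycles/request")
print(f"{login_cost:.2f} Mcycles per Login operation")
```

Rounded to two decimal places, the operation cost agrees with the 15.92 Mcycles shown for Login in Table 6.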

If you cannot separate out independent functions in your application and need one independent function as a prerequisite to another, run the common function individually and then subtract its cost from the cost of each dependent function. For example, if the BuyStock operation requires the user to log in first, calculate the cost of login separately and then subtract the cost of login from the measured cost of the BuyStock operation. 

Therefore the cost of a single BuyStock operation can be calculated as follows: 
Single cost of BuyStock operation = Total cost of BuyStock – Cost of Login operation 
The cost of a single BuyStock operation is: 
39.36 – 15.92 = 23.44 Mcycles 

Table 6 shows the cost of each user operation in a sample application using the following scenario. 
CPU Speed = 1300 MHz 
Number of CPUs = 2 
Overall CPU Mcycles = 2,600 
Table 6: Cost per Operation for Login, SellStock, BuyStock, and Logout Operations 
----------------------------------------------------------------------------------------------------------------------------------
User        CPU %         Total net      ASP.NET        Number of   Operation cost   # pages          Cost of single
operation   utilization   CPU Mcycles    Requests/Sec   requests    (Mcycles)        without login    operation (Mcycles)
----------------------------------------------------------------------------------------------------------------------------------
Login       90%           2,340.00       441            3           15.92            3                15.92
SellStock   78%           2,028.00       241            5           42.07            2                26.16
BuyStock    83%           2,158.00       329            6           39.36            3                23.44
Logout      87%           2,262.00       510            5           22.18            2                 6.26
----------------------------------------------------------------------------------------------------------------------------------
The operation cost needs to be measured separately for each tier of an application. 
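
As a cross-check, the last column of Table 6 can be recomputed from the measured utilization and throughput. The sketch below assumes, as in the sample scenario, that every operation other than Login includes a login as a prerequisite. The names are illustrative only.

```python
TOTAL_MCYCLES = 2 * 1300  # two CPUs at 1,300 MHz each

# (operation, CPU utilization, Requests/Sec, number of requests), from Table 6.
measurements = [
    ("Login", 0.90, 441, 3),
    ("SellStock", 0.78, 241, 5),
    ("BuyStock", 0.83, 329, 6),
    ("Logout", 0.87, 510, 5),
]


def single_operation_costs(rows, total_mcycles, prerequisite="Login"):
    """Return {operation: cost of a single operation in Mcycles}.

    The prerequisite operation's cost is subtracted from every other
    operation, because each of those test scripts had to log in first.
    """
    gross = {
        name: total_mcycles * utilization / requests_per_sec * num_requests
        for name, utilization, requests_per_sec, num_requests in rows
    }
    base = gross[prerequisite]
    return {
        name: cost if name == prerequisite else cost - base
        for name, cost in gross.items()
    }


costs = single_operation_costs(measurements, TOTAL_MCYCLES)
for name, cost in costs.items():
    print(f"{name:<10} {cost:6.2f} Mcycles")
```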

Step 4:  Calculate the Cost of an Average User Profile
The behavior of actual users can cause random crests and troughs in resource utilization. However, over time these variations even out statistically to average behavior. The user profile you compiled in Step 1 reflects average user behavior. To estimate capacity, you need to assume an average user and then calculate the cost in terms of the limiting resource identified in Step 2.
As shown in Table 7, during a ten-minute session, an average user needs 147.52 Mcycles of CPU on the server. The cost per second can be calculated as follows:
Average cost of profile in Mcycles/sec = Total cost for a profile / session length in seconds
Thus, the average cost for the profile shown in Table 7 is:
147.52/600 = 0.245 Mcycles/sec
This value can help you calculate the maximum number of simultaneous users your site can support.

Table 7: Cost of an Average User Profile
-----------------------------------------------------------------------------------------------------------------------------
Average user   Number of operations executed   Cost per operation   Total cost per operation
profile        during an average session       (Mcycles)            (Mcycles)
-----------------------------------------------------------------------------------------------------------------------------
Login                                      1                15.92                      15.92
SellStock                                  3                26.16                      78.47
BuyStock                                   2                23.44                      46.87
Logout                                     1                 6.26                       6.26
-----------------------------------------------------------------------------------------------------------------------------
Total                                                                                 147.52
-----------------------------------------------------------------------------------------------------------------------------
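
The profile cost can be recomputed with a short sketch like the one below. It starts from the per-operation costs already rounded to two decimal places, so the total comes out a few hundredths of an Mcycle away from the table, which appears to have been built from unrounded costs. The variable names are illustrative.

```python
# Cost of a single operation in Mcycles (rounded values from Table 6).
operation_cost = {"Login": 15.92, "SellStock": 26.16, "BuyStock": 23.44, "Logout": 6.26}

# Operations executed during an average session (Table 3).
average_profile = {"Login": 1, "SellStock": 3, "BuyStock": 2, "Logout": 1}

SESSION_SECONDS = 10 * 60  # ten-minute average session

# Total cost of the profile, then the average cost per second of session time.
profile_cost = sum(operation_cost[op] * count for op, count in average_profile.items())
profile_cost_per_sec = profile_cost / SESSION_SECONDS

print(f"Profile cost: {profile_cost:.2f} Mcycles per session")
print(f"Average cost: {profile_cost_per_sec:.3f} Mcycles/sec")
```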

Step 5: Calculate Site Capacity
Calculating site capacity involves knowing how many users your application can support on specific hardware and what your site's future resource requirements are. To calculate these values, use the following formulas: 
•Simultaneous users with a given profile that your application can currently support. After you determine the cost of the average user profile, you can calculate how many simultaneous users with a given profile your application can support given a certain CPU configuration. The formula is as follows: 
Maximum number of simultaneous users with a given profile = (number of CPUs) x (CPU speed in Mcycles/sec) x (maximum CPU utilization) / (cost of user profile in Mcycles/sec) 
Therefore, the maximum number of simultaneous users with a given profile that the sample application can support is: 
(2 x 1300 x 0.75)/0.245 = 7,959 users 

•Future resource estimates for your site. Calculate the scalability requirements for the finite resources that need to be scaled up as the number of users visiting the site increases. Prepare a chart that gives you the resource estimates as the number of users increases. 
Based on the formulas used earlier, you can calculate the number of CPUs required for a given number of users as follows: 
Number of CPUs = ((Number of users) x (Total cost of user profile in Mcycles/sec)) / ((CPU speed in MHz) x (Maximum CPU utilization)) 
If you want to plan for 10,000 users for the sample application and have a threshold limit of 75 percent defined for the processor, the number of CPUs required is: 
(10,000 x 0.245) / ((1.3 x 1,000) x 0.75) = 2.51 processors 
Because processors come in whole units, round the result up; in this case, plan for three processors. 
Your resource estimates should also factor in the impact of possible code changes or functionality additions in future versions of the application. These versions may require more resources than estimated for the current version. 
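
Both capacity formulas, and a simple resource-estimate chart, can be sketched as follows. The function names and the list of user counts are assumptions made for this illustration. The profile cost of 0.245 Mcycles/sec and the 75 percent utilization ceiling are the sample values used above.

```python
import math


def max_simultaneous_users(num_cpus, cpu_mhz, max_utilization, profile_cost_per_sec):
    """Users supported = (CPUs x speed x max utilization) / profile cost per second."""
    return num_cpus * cpu_mhz * max_utilization / profile_cost_per_sec


def cpus_required(num_users, profile_cost_per_sec, cpu_mhz, max_utilization):
    """CPUs needed = (users x profile cost per second) / (speed x max utilization)."""
    return num_users * profile_cost_per_sec / (cpu_mhz * max_utilization)


PROFILE_COST = 0.245  # Mcycles/sec for the average user profile
CPU_MHZ = 1300
MAX_UTILIZATION = 0.75

current_capacity = max_simultaneous_users(2, CPU_MHZ, MAX_UTILIZATION, PROFILE_COST)
print(f"Current capacity: {int(current_capacity):,} simultaneous users")

# Resource-estimate chart: exact CPU count and the whole number to provision.
for users in (5000, 10000, 20000, 40000):
    exact = cpus_required(users, PROFILE_COST, CPU_MHZ, MAX_UTILIZATION)
    print(f"{users:>6,} users -> {exact:5.2f} CPUs (provision {math.ceil(exact)})")
```

The chart is linear in the number of users only because this model treats CPU as the single limiting resource. It does not account for contention or for other resources becoming the bottleneck at higher loads.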

Step 6: Verify Site Capacity
Run the load tests to verify that the transaction cost analysis model accurately predicts your application capacity and future requirements.
Verify the calculated application capacity by running load tests with the same characteristics you used to calculate transaction cost analysis. The verification script is simply a collection of all transaction cost analysis measurement scripts, aggregated and run as a single script.
The actual values and the estimated values should vary by an acceptable margin of error. The acceptable margin of error may vary depending on the size of the setup and the budget constraints. You do not need to run load tests each time you perform transaction cost analysis. However, the first few iterations should confirm that transaction cost analysis is the correct approach for estimating the capacity of your application.
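
The comparison of estimated and measured capacity can be expressed as a relative-error check. In the sketch below, the measured figure of 7,400 users and the 10 percent default margin are hypothetical values chosen only to show the calculation. Choose a margin that suits your own setup and budget.

```python
def within_margin(estimated, measured, margin=0.10):
    """True if measured differs from estimated by at most `margin` (a fraction)."""
    return abs(measured - estimated) / estimated <= margin


estimated_users = 7959  # from the transaction cost analysis model
measured_users = 7400   # hypothetical load-test result, for illustration only

relative_error = abs(measured_users - estimated_users) / estimated_users
print(f"Relative error: {relative_error:.1%}")
print("Model accepted" if within_margin(estimated_users, measured_users) else "Model rejected")
```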


Predictive Analysis
Predictive analysis involves the following steps: 

1. Collect performance data: Collect performance data for the application in production over a period of time. 
2. Query the existing historical data: Query the historical data based on what you are trying to analyze or predict. 
3. Analyze the historical performance data: Use mathematical equations to analyze the data to understand the resource utilization over a period of time. 
4. Predict the future requirements: Predict the future resource requirements based on the mathematical model prepared in Step 3. 
The next sections describe each of these steps.

Step 1: Collect Performance Data
The performance data for the application needs to be collected over a period of time. The greater the time duration, the greater the accuracy with which you can predict a usage pattern and future resource requirements.
The performance counters and other performance data to be collected are based on your performance objectives related to throughput, latency, and resource utilization. The performance counters are collected to verify that you are able to meet your performance objectives and your service level agreements. For information about which counters to look at, see Chapter 15, "Measuring .NET Application Performance."
Be careful not to collect more than the required amount of data. Monitoring any application incurs overhead that may not be desirable beyond certain levels for a live application.
You might further instrument the code to analyze custom performance metrics. One of the tools available for storing and analyzing this performance data in large quantities is Microsoft Operations Manager (MOM).

Step 2:  Query the Existing Historical Data
Query the historical data based on what you are trying to analyze. If your application is CPU bound, you might want to analyze CPU utilization over a period of time. For example, you can query the data for the percentage of CPU utilization for the last 40 days during peak hours (9:00 A.M.–4:00 P.M.), along with the number of connections established during the same period.
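
A query of this kind can be sketched in Python over samples that have already been loaded into memory. The sample records, the function name, and the in-memory representation are all assumptions made for illustration. In practice the data would come from whatever store holds your collected counters.

```python
from collections import defaultdict
from datetime import date, datetime, timedelta


def daily_peak_average(samples, end_date, days=40, start_hour=9, end_hour=16):
    """Average CPU % per day during peak hours, for the `days` days ending at end_date.

    samples: iterable of (timestamp, cpu_percent) pairs.
    Returns a list of (day, average_cpu_percent) sorted by day.
    """
    first_day = end_date - timedelta(days=days - 1)
    per_day = defaultdict(list)
    for timestamp, cpu_percent in samples:
        in_window = first_day <= timestamp.date() <= end_date
        in_peak_hours = start_hour <= timestamp.hour < end_hour
        if in_window and in_peak_hours:
            per_day[timestamp.date()].append(cpu_percent)
    return [(day, sum(values) / len(values)) for day, values in sorted(per_day.items())]


# Hypothetical samples, invented for this example.
samples = [
    (datetime(2012, 3, 27, 8, 30), 20.0),  # before peak hours, ignored
    (datetime(2012, 3, 27, 10, 0), 60.0),
    (datetime(2012, 3, 27, 15, 0), 70.0),
    (datetime(2012, 3, 28, 12, 0), 66.0),
    (datetime(2012, 1, 1, 12, 0), 50.0),   # older than 40 days, ignored
]

observations = daily_peak_average(samples, end_date=date(2012, 3, 28))
for day, average in observations:
    print(day, f"{average:.1f}%")
```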

Step 3: Analyze the Historical Performance Data
Before you analyze the historical performance data, you must be clear about what you are trying to predict. For example, you may be trying to answer the question, "What is the trend of CPU utilization during peak hours?"
Analyze the data obtained by querying the database. The data obtained for a given time frame results in a pattern that can be defined by a trend line. The pattern can be as simple as a linear growth of the resource utilization over a period of time. This growth can be represented by an equation for a straight line:
y = mx + b
where b is the y-intercept (the value of y when x is 0), m is the slope of the line, and x is an input. To find when the utilization reaches a given level, you would solve for x given y:
x = (y – b)/m
For the example in Step 2, the trend line is:
y = 0.36x + 53
where y is the CPU utilization and x is the number of observations. Figure 1 shows the trend for this example.

Figure 1: Trend of CPU utilization


Choosing the correct trend line is critical and depends on the nature of the source data. Some common behaviors can be described by polynomial, exponential, or logarithmic trend lines. You can use Microsoft Excel or other tools for trend line functions for analysis.
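
For the linear case, the trend line can also be fitted with ordinary least squares, which is the usual method behind a spreadsheet's linear trend line. The sketch below is a plain-Python illustration. The 40 data points are synthetic, generated to lie exactly on the example line so that the fit recovers its slope and intercept. Real observations would scatter around the line.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = m*x + b; returns (m, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Synthetic observations for days 1..40, placed exactly on y = 0.36x + 53.
days = list(range(1, 41))
cpu_percent = [0.36 * day + 53 for day in days]

m, b = fit_line(days, cpu_percent)
print(f"y = {m:.2f}x + {b:.0f}")
```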

Step 4:  Predict Future Requirements
Using the trend lines, you can predict the future requirements. The predicted resource requirements assume that the current trend would continue into the future.
For example, consider the trend line mentioned in Step 3. Assuming you do not want the CPU utilization to increase beyond 75 percent on any of the servers, you would solve for x as follows:
x = (y – 53)/0.36
Therefore:
x = (75 – 53)/0.36 = 61.11
Based on the current trends, your system reaches 75 percent maximum CPU utilization when x = 61.11. Because the x axis shows daily measurements taken from the peak usage hours of 9:00 A.M. to 4:00 P.M., one observation corresponds to one day. Because there are 40 observations in this example, your system will reach 75 percent CPU utilization in the following number of days:
61.11 – 40 = 21.11
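
The same prediction can be written as a small function. This is a sketch with illustrative names, using the trend line and the 40 existing observations from the example.

```python
def observations_until(threshold, slope, intercept):
    """Solve y = slope*x + intercept for x at y = threshold."""
    return (threshold - intercept) / slope


OBSERVATIONS_SO_FAR = 40  # one observation per day

x_at_threshold = observations_until(75, slope=0.36, intercept=53)
days_remaining = x_at_threshold - OBSERVATIONS_SO_FAR

print(f"Threshold reached at x = {x_at_threshold:.2f}")
print(f"Days remaining: {days_remaining:.2f}")
```

The estimate holds only as long as the linear trend continues, so it is worth recomputing as new observations arrive.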







