Tuesday, August 26, 2008

First find the bottleneck before start making improvements in your application.

Every developer has been once or more in a situation in which people ask them to improve the performance of his\her application. What is the best approach? In this blog I will explain my approach to efficiently start to improve the performance of your application. The biggest improvements can only be made in the main bottlenecks. First I will start what you should NOT do:

  1. Directly start improving your code for the parts you expect it is the main bottleneck.
  2. Make ‘improvements’ in your applications without measuring the improvement. How can you know if it is really an improvement?
  3. Discussing if the correct hardware is used. The fact is that somebody is complaining about your application. Based on the facts in log files you can prove if incorrect hardware is used.

What should you do before making changes in your code? Find the main bottlenecks. The analysis of an application should be done on facts, not assumptions.

  1. Ask the people who complain to measure the performance of your application with tools like Performance monitor and SQL Profiler.
  2. If possible, ask for a backup to reproduce the scenario in which the performance is bad.
  3. Think from a user perspective. How long is a user waiting until the application is finished? I will call this Time Spend. Assume you are the user. Will you be happy and enthusiastic with this Time spend? The answer should be Yes. If the answer is No, you can start the challenge to get happy and enthusiastic.
  4. Start analyzing the SQL profile log. In the SQL profile log you can focus on 3 different types of bottlenecks:
    a) Long running queries.
    b) Big number of small queries
    c) Application processing time. (Time between queries)

Now I will describe how to find one of these 3 bottlenecks.

  1. Save the SQL Profile to a table, so we are able to analyze the data with the query analyzer.
  2. Execute next query to get some statistics of the SQL profile. In this example I have saved the profile to table [Profile]


    DECLARE @AnalyzeTime VARCHAR(20)
    SET @AnalyzeTime = '2008-08-25 16:01:30'
    -- Profile statistics of application
    SELECT SUM(DATEDIFF(ms, A.Starttime,A.Endtime)) AS SUMQueryTime,
    SUM(DATEDIFF(ms, B.Endtime,A.Starttime)) AS SUMTimeDiffPrevQuery
    ,DATEDIFF(ms, MIN(A.Starttime),Max(A.Endtime)) AS TimeSpend, Count(*) As NumberOfQueries
    FROM [Profile] A
    INNER JOIN [Profile] B ON A.Rownumber = B.Rownumber + 1
    WHERE A.TEXTDATA IS NOT NULL
    AND A.starttime > @AnalyzeTime

    All times are milliseconds.
    SUMQueryTime = Total duration of all traces SQL queries
    SUMTimeDiffPrevQuery = Total time between different queries. This is the time that SQL server is waiting after finishing Query(X) to start Query(X+1)
    TimeSpend = Total duration between Start time of the first query and the end time of the last query.
    NumberOfQueries = Total number of queries in this log


    Analyze the result of the query in previous step. I will use some examples

    Example 1:

    SUMQueryTime: 237
    SUMTimeDiffPrevQuery : 3009
    TimeSpend: 3246
    NumberOfQueries: 203

    Total time user the user is waiting is 3246 ms. Most time is spend between different queries 92% ( 3009 ms out of 3246 ms). Main bottleneck is the time spend between different queries. What is the application doing between the different queries. Optimize SQL queries is not useful. Assume you can improve your SQL queries with 90 %. This is really nice, however the improvement from a user perspective is only 213 ms (90 % of 237ms) which is an improvement of 6 %.(213 out of 3246).

    Use next query to find all gaps between Query(X) and Query(X+1)

    DECLARE @AnalyzeTime VARCHAR(20)
    SET @AnalyzeTime = '2008-08-25 16:01:30'
    -- Time difference with previous query > 0
    SELECT B.Textdata as FirstQuery,A.Textdata as SecondQuery,DATEDIFF(ms, B.Endtime,A.Starttime) AS TimeDiffFirstSecondQ,A.Starttime, A.Endtime
    FROM [Profile] A
    INNER JOIN [Profile] B ON A.Rownumber = B.Rownumber + 1
    WHERE DATEDIFF(ms, B.Endtime,A.Starttime) > 0
    AND A.starttime > @AnalyzeTime


    Example 2:

    SUMQueryTime: 15500
    SUMTimeDiffPrevQuery: 100
    TimeSpend: 15600
    NumberOfQueries: 376

    Total time user is waiting is 15,6 seconds. 99% (15500/15600) is spent by SQL server to execute the 376 queries. Use next query to find the most long running query.

    DECLARE @AnalyzeTime VARCHAR(20)
    SET @AnalyzeTime = '2008-08-25 16:01:30'
    -- Long queries
    SELECT TOP 25 A.Textdata,DATEDIFF(ms, A.Starttime,A.Endtime) AS QueryTime,A.Starttime, A.Endtime
    FROM [Profile] A
    WHERE DATEDIFF(ms, A.Starttime,A.Endtime) > 0
    AND A.starttime > @AnalyzeTime
    ORDER BY DATEDIFF(ms, A.Starttime,A.Endtime) DESC



    Example 3:

    SUMQueryTime: 5000
    SUMTimeDiffPrevQuery: 2000
    TimeSpend: 7000
    NumberOfQueries: 4500

    71% of time is spend by SQL server (5000/7000). However, after running the query to retrieve the long running queries, you see that the longest query in this log file takes 50 ms to execute. Optimize a lot of individual queries will take a lot of time to do. Because of the big number of queries in relation with the total Time Spend, a better approach is to avoid queries. Every query you can avoid will save SUMQueryTime but it will also save SumTimeDiffPrevQuery. Some ideas to lower the number of queries:
    · Retrieve data of multiple queries into one. For instance instead of retrieve every order line in a separate query, retrieve all order line in one query.
    · Is every query really need for the scenario of the customer. For instance is the retrieved description of a general ledger account displayed in the user interface.
    · Is it possible to cache some data at the client side? For instance default currency of the administration. Data retrieved from cache will save a query on the database.

    Now you know your main bottlenecks and can start spending your time on these bottlenecks. In most situation the 80/20 rule will apply. 80 percent of the performance can be improved by 20 % of your code.