Wednesday, February 4, 2009

Guidelines to write well performing queries.

During my daily work I have done a lot of performance analyzes at customer sites. One part of a performance analyze is analyzing resource intensive queries on the database. For instance CPU usage, disk I/O. Sometimes these queries can be improved by rewriting the query without losing functionality. This blog contains the most common mistakes I have seen in bad performing queries.
Image credit: Dave Ward



  1. Avoid use of SELECT * in SQL queries. Retrieve only the columns you need. This will avoid bookmark lookups and will result in less data traffic from SQL server to client. Do you buy everything in the supermarket and at home throw away what you do not need?
    Example:
    SELECT ResourceID, FullName
    FROM Resources

    Instead of
    SELECT *
    FROM Resources
     
  2. Create only result sets when necessary. This will avoid unnecessary data retrieval. If you want to know if something exists it is not interesting to know how much time it exists.
    Example:
    IF EXISTS (SELECT 1 FROM Resources WHERE RESID > 1005)

    Instead of
    IF EXISTS (SELECT COUNT(*) FROM Resources WHERE RESID > 1005)


  3. Look to the execution plan of your query. It should contain an Index Seek instead of an Index Scan or Table Scan.
        
  4. Use parameterized queries for queries which will be used a lot. This will save unnecessary query plan generations and the Procedure cache is used more efficient.
    Example:
    SELECT FullName, PhoneNumber
    FROM Resources
    WHERE RESID=@P1

    Instead of
    SELECT FullName, PhoneNumber
    FROM Resources
    WHERE RESID=1005
     
  5. Write query with correct data type. Check data type of a column via sp_help <tablename>
    Example:Column ID Datatype INT
    SELECT Fullname
    FROM Resources
    WHERE RESID = 1005

    Instead of
    SELECT Fullname
    FROM Resources
    WHERE RESID = '1005'
     
  6. Avoid SQL roundtrips. Combine queries into one query. Buying a bottle of beer and chips bag at once is always faster than first buying a bottle of beer and go home and then go back to the supermarket to buy a chips bag.
    Example:
    Query 1: SELECT FullName FROM Resources WHERE ID = 1005
    Query 2: SELECT PhoneNumber FROM Resources WHERE ID = 1005

    Combined Query: SELECT FullName, PhoneNumber FROM Resources WHERE ID = 1005
     
  7. Remove JOINS which are not part of the SELECT or WHERE clause.

    SELECT A.Column2, A.Column3
    FROM Table1 A
    INNER JOIN Table2 B ON A.Column1 = B.Column5
    WHERE A.Column4 = 20

    Remove INNER JOIN Table2 B ON A.Column1 = B.Column5

    New query:
    SELECT A.Column2, A.Column3
    FROM Table1 A
    WHERE A.Column4 = 20
     
  8. JOIN a table on columns which are part of an index. In most cases one of the 2 columns is the primary key column.

    SELECT C.Name
    FROM Activities A
    INNER JOIN Contacts C ON C.ID A.ContactID
    WHERE A.TYPE = 10010002

    C.ID is the primary key column of table Contacts.
    A.ContactID is part of an index on table Activities.
     
  9. Avoid a lot of OR statements in the WHERE CLAUSE. Use a UNION. Because of the 3 different columns in the WHERE clause the clustered index will be used.

    Example: Table with Clustered index on ID and 3 indexes on: City, Type and the last one on Birthday. Next query will result in a clustered table scan.
    SELECT FullName
    FROM Resources
    WHERE City = 'Delft'
        OR Type = 1
        OR BirthDay > '2000-01-01'

    Next query will use the 3 indexes:
    SELECT FullName
    FROM Resources
    WHERE City = 'Delft'
    UNION ALL
    SELECT FullName
    FROM Resources
    WHERE Type = 1
    UNION ALL
    SELECT FullName
    FROM Resources
    WHERE BirthDay > '2000-01-01'
  10. Avoid ORDER BY columns which does not change the ORDER BY. This is unnecessary overhead.
    Example: Tabel Resources with Primary key on ID and Index 1 on RESID and Index 2 on FullName
    SELECT FullName FROM Resources
    WHERE City = 'Delft'
    ORDER BY RESID, FullName

    Remove the FullName in the ORDER BY. This will avoid an Index Scan on index FullName and a Hash Match
  11. Configure your SQL Server management Studio to connect Read Uncommitted to the database. By default SQL Server management Studio will connect Read Committed. By connecting Read Uncommitted, you will avoid that you lock colleagues or will be locked by other colleagues.  In SQL Server management Studio select: Tools, Options, Query Execution, SQL Server, Advanced, SET TRANSACTION ISOLATION LEVEL: Read Uncommitted).
     
  12. Test your queries on a database with enough content in it. Performance of a query on a table with only 10 records is always fast. It will only guarantee that the syntax of the query is correct. You can't make any conclusions of this query regarding performance. If your application should perform on a table with 1 million records. Test your queries on a table with 1 million records.
     
  13. Avoid using temporary tables. Creation of temporary tables can cost a lot of disk I/O.
     
  14. Use SET NOCOUNT ON at the beginning of SQL batches, Stored Procedures and Triggers.
     
  15. Define on every table a primary key. If you do not configure, SQL Server will do it in the background. This is additional overhead.
     
  16. Incomplete or over complete WHERE CLAUSES. Read the database documentation of the application to understand the business logic. Knowing the business logic you can define the correct WHERE clause.
     
  17. UPDATE multiple records in one query. For instance one update statement for 10 order lines:

    UPDATE OrderLines
    SET Modified = '2009-02-02 10:00:00.002'
    WHERE ID in(14235,14236,14237,14238,14239,14240,14241,14242,14243,14244)

    Instead of
    UPDATE OrderLines
    SET Modified = '2009-02-02 10:00:00.002'
    WHERE ID = 14235

    UPDATE OrderLines
    SET Modified = '2009-02-02 10:00:00.003'
    WHERE ID = 14236
    .
    .
    .
    UPDATE OrderLines
    SET Modified = '2009-02-02 10:00:00.013'
    WHERE ID = 14244

2 comments:

Jelle said...

Although I am not programmer at all, it's very informative. One small suggestion:

Currently your display of code is as follows:

Example:
SELECT FullName, PhoneNumber
FROM Resources
WHERE RESID=@P1

Instead of
SELECT FullName, PhoneNumber
FROM Resources
WHERE RESID=1005

Is it possible to display 'good' and 'bad' coding differently (if 'good' and 'bad' are words you want to use in this context, but anyway). Starting with current uses of code ('bad code') and then suggesting 'good code' - and adding matching titles in red and green, for example (sorry 'bout the clumsy color tags, it's not allowed by blogger):

[red]bad[/red]:
SELECT FullName, PhoneNumber
FROM Resources
WHERE RESID=1005

[green]good[/green]:
SELECT FullName, PhoneNumber
FROM Resources
WHERE RESID=@P1

Using this layout it is much easier to do a quickscan of your very informative page and it is immediately clear what you consider good code. I really like your use of images by the way, very effective.

Greetings Jelle

André van de Graaf said...

Jelle,

Thanks for the useful suggestion.
I will try to use this in my next blog posts.

Gr. André