Monday, March 30, 2009

Running a SQL Profiler trace via SQL Server Management Studio (SSMS)

SQL Profiler is a tool which every respected DBA should use. It is a tool to analyze performance, analyze what a kind of queries are executed by an application etc.... Without the SQL profiler application, you can still make profile traces. This is possible via SQL Server Management Studio (SSMS).

Easiest way to do this is to start SQL Profiler and define the events,columns and filters you want to profile.  Now you can export the trace definition. (File,Export, Script Trace Definition, For SQL 2005-2008)


Now I will show you a script which I uses a lot with some additional comments and an automatic stop of the profile trace of 1 minute.
-- start script to run a profiler trace
-- Create a Queue
declare @rc int
declare @TraceID int
declare @maxfilesize bigint
set @maxfilesize = 5

-- Extra from default export
DECLARE @EndTime DATETIME
DECLARE @OutputFileName NVARCHAR(256)
SET @OutputFileName = 'C:\Traces\tracefile'

-- Set the time the profile trace should run in minutes
SET @EndTime = DATEADD(mi,1,getdate()) -- trace will stop after 1 minute

-- Please replace the text InsertFileNameHere, with an appropriate
-- filename prefixed by a path, e.g., c:\MyFolder\MyTrace. The .trc extension
-- will be appended to the filename automatically. If you are writing from
-- remote server to local drive, please use UNC path and make sure server has
-- write access to your network share

EXEC @rc = sp_trace_create @TraceID output, 0, @OutputFileName, @MaxFileSize, @EndTime
if (@rc != 0) goto error

-- Client side File and Table cannot be scripted
-- Set the events
declare @on bit
set @on = 1
-- 15: Endtime
exec sp_trace_setevent @TraceID, 10, 15, @on
-- 16: Reads
exec sp_trace_setevent @TraceID, 10, 16, @on
-- 9:ClientProcessID
exec sp_trace_setevent @TraceID, 10, 9, @on
-- 17:Writes
exec sp_trace_setevent @TraceID, 10, 17, @on
-- 2:BinaryData
exec sp_trace_setevent @TraceID, 10, 2, @on
-- 10:ApplicationName
exec sp_trace_setevent @TraceID, 10, 10, @on
-- 18: CPU
exec sp_trace_setevent @TraceID, 10, 18, @on
-- 3:DatabaseID
exec sp_trace_setevent @TraceID, 10, 3, @on
-- 11:LoginName
exec sp_trace_setevent @TraceID, 10, 11, @on
-- 12:Spid
exec sp_trace_setevent @TraceID, 10, 12, @on
-- 13:Duration
exec sp_trace_setevent @TraceID, 10, 13, @on
-- 6:NTUserName
exec sp_trace_setevent @TraceID, 10, 6, @on
-- 14:Starttime
exec sp_trace_setevent @TraceID, 10, 14, @on

exec sp_trace_setevent @TraceID, 12, 15, @on
exec sp_trace_setevent @TraceID, 12, 16, @on
exec sp_trace_setevent @TraceID, 12, 1, @on
exec sp_trace_setevent @TraceID, 12, 9, @on
exec sp_trace_setevent @TraceID, 12, 17, @on
exec sp_trace_setevent @TraceID, 12, 6, @on
exec sp_trace_setevent @TraceID, 12, 10, @on
exec sp_trace_setevent @TraceID, 12, 14, @on
exec sp_trace_setevent @TraceID, 12, 18, @on
exec sp_trace_setevent @TraceID, 12, 3, @on
exec sp_trace_setevent @TraceID, 12, 11, @on
exec sp_trace_setevent @TraceID, 12, 12, @on
exec sp_trace_setevent @TraceID, 12, 13, @on

-- Set the Filters
declare @intfilter int
declare @bigintfilter bigint

-- TraceID 3 = DatabaseID
-- 0 = equal
set @intfilter = 8
exec sp_trace_setfilter @TraceID, 3, 0, 0, @intfilter

exec sp_trace_setfilter @TraceID, 10, 0, 7, N'SQL Server Profiler - 53952710-77cd-4820-9710-533bb536e38b'
exec sp_trace_setfilter @TraceID, 10, 0, 7, N'SQL Server Profiler - c8d5b4e9-a748-4c71-8cce-4c105fa2141f'

-- TraceID 12 = SPID
-- 1: SPID <> 51
set @intfilter = 51
exec sp_trace_setfilter @TraceID, 12, 0, 1, @intfilter

-- TraceID 13 = Duration
-- 4: Duration >= 2000000
set @bigintfilter = 2000000
exec sp_trace_setfilter @TraceID, 13, 0, 4, @bigintfilter

-- TraceID 16 = Reads
-- 3: < 5000 Reads
set @bigintfilter = 5000
exec sp_trace_setfilter @TraceID, 16, 0, 3, @bigintfilter

-- TraceID 18 = CPU
-- 2: CPU > 100
set @intfilter = 100
exec sp_trace_setfilter @TraceID, 18, 0, 2, @intfilter

-- Set the trace status to start
exec sp_trace_setstatus @TraceID, 1
-- display trace id for future references
select TraceID=@TraceID

goto finish
error:
select ErrorCode=@rc
finish:
go

-- End Script to run a profiler trace

Additional information about all events and columns can be found in the books online. Search for: sp_trace_setstatus,  sp_trace_setfilter

After starting the trace you get a TraceID. This TraceID is needed if you want to stop the trace before the endtime has occured. For instance, you want to stop a trace after 5 minutes but the @Endtime is configured for 30 minutes.

With next statement you can stop the trace for traceID 6:

-- Stop the trace: sp_trace_setstatus 'TraceID',0
sp_trace_setstatus 6,0
-- Close the trace sp_trace_setstatus 'Traceid',2
sp_trace_setstatus 6,2

After the trace is completed you can import the trace file in a SQL table for futher analyze:

CREATE TABLE TraceResults (

TextData VARCHAR(4000),
Duration INT,
Reads INT,
Writes INT,
CPU INT,
StartTime DATETIME,
ProcedureName VARCHAR(100)
)
GO

-- store traced files in a table
INSERT INTO TraceResults
(TextData, Duration, Reads, Writes, CPU, StartTime)
SELECT TextData, Duration/1000, Reads, Writes, CPU, StartTime
FROM fn_trace_gettable('c:\traces\tracefile.trc',1)

Wednesday, March 25, 2009

How to measure performance of your SQL Server?

Image credit: Lauren Nelson


In situations where I need to analyze the performance of a SQL Server I used a checklist to get an overview of the configuration of the SQL Server. This will costs 10 minutes to fill in but it will is a good start to find the bottleneck in the SQL Server configuration. In this blog you will find a checklist for the hardware configuration and database settings. At the end you will find the Windows Performance counters and a SQL profile trace definition I uses as starting point.

Performance problems can consist of one or more of the following:


  • Hardware problems: Hardware components could be malfunctioning. Faulty components can cause severe performance problems.
  • Hardware capacity: You might be exceeding the capacity of the system components. You might need post capacity planning or to reconfigure existing hardware.
  • Application problems: SQL statements might be inefficient, causing excessive usage of system resources

Hardware problems:
Check the event log for errors; check the event log of your storage system

Hardware capacity:

The three most common components on your system that require tuning:
  1. Processor
  2. Disk subsystem
  3. Memory

  1. Processor: A general rule is that if your processor utilization stays continuously at 80 percent or above, or if it peaks often at this rate, you might have a CPU bottleneck. Performance counter: Processor object, % Processor Time counter.
  2. Disk subsystem: SQL server uses the following three categories of files for databases: 
    • Primary data files. A primary data file contains startup information for a database, points to other files used by the database, stores system tables and objects, and can also store database data and objects. Each database has exactly one primary file. (File extension .MDF)
    • Secondary data files. Secondary files are optional for each database. They can be used to hold data and objects, such as tables and indexes, which are not in the primary file. A database might need one or more secondary files placed on separate disks to spread data across the disks. (File extension .NDF)
    • Log files. A log file holds all the transaction log information for the database and cannot be used to hold any other data. (File extension .LDF). For performance reasons it is recommended to place the LDF file on a separate hard disk. The LDF should be the only file on this hard disk, otherwise the advantage of a separate hard disk for log operations is lost.
    • You can combine two or more disks into a RAID array. The main characteristic of a RAID array is that physical disk drives are combined to form a logical disk drive, which is a virtual disk drive. When configuring logical drives you need to select the RAID level:
    • RAID 0. Does not support redundancy. There is no fault tolerance, so if one disk fails, all data will be lost. RAID 0 is not recommended for storing SQL Server data files.
    • RAID 1. Also known as mirroring. Use RAID 1 when data fits on one disk drive. Use RAID 1 for your operating system disk. It can be time consuming to rebuild an OS in the event of a failure. Since the OS usually fits on one disk, RAID 1 is a good choice. Use RAID 1 for the transaction log. Typically the SQL Server transaction log can fit on one disk drive. In addition, the transaction log performs mostly sequential writes. Only rollback operations cause reads from the transaction log. Thus you can achieve a high rate of performance by isolating the transaction log to its own RAID I volume. Use write caching on RAID I volumes. Because RAID 1 writes will not finish until both writes have been done, you can improve performance of writes by using a write cache. When you use a write cache be sure that it is backed up by a battery.
    • RAID 5. Any disk volume that does more than 10 percent writes is not a good candidate for RAID. Use write caching on RAID 5 volumes. Since a RAID 5 write is not complete until two reads and two writes have been performed, you can improve the response time of writes through the use of a write cache. When you use a write cache, be sure that it is backed up with a battery. However, the write cache is not a cure for overdriving your disk drives. You must still stay within the capacity of those disks. As you can see, RAID 5 is economical, but at a performance price.
    • RAID 10. RAID 10 is a combination of RAD 0 and RAID 1. RAID 10 involves mirroring a disk stripe. Each disk has an exact duplicate, but each disk contains only a part of the data. This configuration gives you the fault—tolerant advantages of RAID 1 with the convenience and performance advantages of RAID 0. Use RAID 10 whenever the array experiences more than 10 percent writes. RAID 5 does not perform well with large numbers of writes. Use RAID 10 when performance is critical. Since RAID 10 supports split seeks, performance is very good. Use write caching on RAID 10 volumes. Since RAID 10 writes are not complete until both writes are done, you can improve the performance of writes by using a write cache. Write caching is only safe when used in conjunction with caches that are backed up with batteries.
    To determine the load placed on the individual disk drives in the system, you must perform some calculations based on next performance counters:
  • Physical Disk Disk Reads/sec
  • Physical Disk Disk Writes/sec

    With next formulas you can determine how many I/Os are actually going to each disk in the array. RAID 0: I/Os per Disk = (Reads + Writes) Number of Disks
    RAID 1: I/Os per Disk = (Reads + (2 * Writes))/2
    RAID 5: I/Os per Disk = (Reads + (4 * Writes))/ Number of Disks
    RAID 10: I/Os per Disk = (Reads + (2 * Writes))/ Number of Disks

RAID level PerformanceFault ToleranceCost
RAID 0BestNo fault toleranceMost economical
RAID 1GoodGoodMost expensive
RAID 5Good reads slow writesOkMost economical Fault tolerant
RAID 10GoodExcellentMost expensive



How to calculate the maximum I/O per second for next disk specifications:
SpecificationValueNotes
Disk capacity18 GbUnformatted disk capacity
Rotational speed15.000 rpmSpeed at which the disk is spinning
Transfer rate40 M/b secSpeed of the SCSI bus
Average seek time3,9 ms (read)
4,5 ms (write)
Times it takes to seek between tracks during random I/Os
A random I/O on a typical system takes approximately 4.2 ms ((seek time read + seek time write)/2) for the disk to seek to where the data is held and an additional 2 ms in rotational latency, for a total of 6.2 ms. This gives a theoretical maximum of 161 I/Os per second (since 6.2 ins can occur 161 times per second). As noted earlier, if you run a disk drive at more than 85 percent of its capacity, queuing occurs. Therefore, the maximum recommended I/O rate is 137 I/Os per second. Taking into account overhead in the controller, a general rule is to drive these disk drives at no more than 125 1/ Os per second. So the closer the number
of actually I/O’s per second gets to maximum capacity the longer the latencies get. The disk queue lengths will growth which results in performance lose.
Memory:
It is best to dedicate your database server to SQL Server applications only, if possible. That allows SQL Server to use as much memory as possible in the system without having to share it with other applications. SQL Server 2008 supports the maximum support memory of the operating system.

Operating System
SQL Server 2008
Windows Server 2003/2008 X86
32 GB
Windows Server 2003/2008 X64 Web or Standard
32 GB
Windows Server 2003/2008 X64 Enterprise or Datacenter
2 TB

 
To allow applications on Windows 2003/2008 X86 to address more than 4 GB, these editions support Address Windowing Extensions (AWE). AWE allows physical memory pages above the standard 4 GB memory space to be acquired by applications. To allow AWE with Windows 2000, you must add the /PAE flag to the Boot.ini file.  If you have between 4 GB and 16 GB in your system, you can also add the /3GB flag to the Boot file to allow only 1 GB of virtual memory space
for the operating system, instead of 2 GB

 
Amount of system memoryFlag to add to the boot.ini file
4 GB/3GB
8 GB/3GB /PAE
16 GB/3GB /PAE
Above 16 GB (16 GB to 64 GB)/PAE

 
You must enable the AWE enabled option by setting it to I (its default is 0). Restart SQL Server for the
setting to take effect. The AWE option is an advanced option. You must have show advanced options set to 1 to view the current values of advanced options or to change an advanced option using sp_configure. To configure show advanced op/ions, use the following statement: sp_configure “show advanced options”, 1
To enable AWE use the following statement: EXEC sp_configure 'awe enabled', '1'
For more details about enable and configure advanced options see the books online of SQL server.

 
Before you can analyze the performance monitor log files you need to know the hardware configuration of the SQL server.

Hardware configuration:
Number of CPHU’s
CPU MHz
Physical RAM amount
If physical memory is more than 4 GB, what is the contents of the
BOOT.INI
Total number of physical drives in each array
RAID level of array used for SQL server databases
Hardware versus Software RAID
Location of operating system
Location of SQL server executables
Location of SWAP file
Location of Tempdb
Location of System databases
Location of user databases (MDF)
Location of log files (LDF)
What is the size of all user databases
How many free disk space is available on each array
Is write back cache in Disk controller On or Off
Speed of disk drives (RPM)
What is the average seek time of the hard disk
What is the rotational latency of the hard disk
What is the speed of the network card
Are the network cards hard coded for Speed/Duplex
How many database are attached in SQL server
Is this Physical server dedicated to SQL server
Is “NTFS data file encryption and compression” turned off
Is the Windows 2003 server configured as stand alone server
Is the “application response” setting, set to “Optimize Performance for
background services”

 

 


Location of operating system.
For best performance, operating system should be on a disk array that does not
include the SQL server data files.

Location of SQL server executables.

The location of the SQL server executables, is not critical, as long as they are not located on the same array as the SQL server data files.

Location of the swap file.

The location of the swap file is not critical, as long as they are not located on the same array as the SQL server data files.

Location of the Tempdb.

If the tempdb is heavily used, consider moving it to an array of its own, either RADI 1 or RAID 10, to boost disk I/O performance. Avoid RAID 5 arrays as they can be slow when writing data.

Location of User databases.
For best performance, user databases should be located on their own array (RAID 1,5 or 10), separate from all other data files, including log files.

How many free disk space is available ?
While the performance effect isn’t huge, it is important that all of your disk arrays have at least 20% of free space. This is because NTFS needs extra space to work efficiently. If space is not available, then NTFS is not able to function at its full capacity and performance can degrade. It also leads to more disk fragmentation.

Is write back cache in Disk Controller on or off ?
Write back cache will boost the performance of SQL server, however you should have a controller that offers battery backup. Otherwise you will get a corrupt database in case of a power failure. This is because SQL server already thinks that the data is stored, while it is still in cache. All data in cache is cleared after a power failure.

Are the network cards hard coded for speed/Duplex ?

It is fairly common for a network card to auto-sense incorrectly, setting a less than optimum speed or duplex setting, which can significantly hurt network performance. So set manually the card’s speed and duplex setting.

Is this physical server dedicated to SQL server ?

SQL server should run on a dedicated physical server, not shared with other application software. When you share SQL server with other software, you force SQL server to fight over physical resources, which make it much more difficult to tune your server for optimum SQL server performance.

Database configuration settings:
Database configuration settingDefault valueCurrent value, if other than default specify reason.
Auto_closeOff
Auto_create_statisticsOn
Auto_update_statisticsOn
Auto_shrinkOff
Database auto growOn
Transaction log auto growOn
Recovery modelFull

 
Auto_close:
By default set to OFF, only useful to set On when you have a lot of SQL databases on your server which are used incidental. When set to ON, the database is closed and shut down cleanly when the last user of the database exits and all processes in the database complete, thereby freeing any resources. The database reopens automatically when a user tries to use the database again. If the database was shut down cleanly, the database is not reopened until a user tries to use the database the next time SQL Server is restarted. When set to OFF, the database remains open even if no users are currently using the database. The Auto_close option should not be used for databases accessed by an application that repeatedly makes and breaks connections to SQL Server. The overhead of closing and reopening the database between each connection will impair performance.


Auto_create_ statistics:

When set to On, statistics are automatically created on columns used in a predicate. Adding statistics improves query performance because the SQL Server query optimizer can better determine how to evaluate a query.

Auto_update_statitics:

When set to On, existing statistics are automatically updated when the statistics become out-of-date because the data in the tables has changed.

Auto_shrink: 

By default set to Off, because mostly every database will growth in size during usages.


Database auto grow:

By default set to On, the databases can growth whenever necessary. There is only one restriction, you need to have diskspace available on the array where the database is stored.

Transaction log auto grow:  By default set to On,

The transaction log can growth whenever necessary. There is only one restriction, you need to have diskspace available on the array where the transaction log is stored.


Recovery model
:
Can be set to simple if only full back ups are made and no transaction log back ups are made.
Use the performance monitor to collect data of the SQL server. Define a trace log with next counters and set the interval time on 1 minute.


Performance monitor counters:
MemoryAvailable Bytes
MemoryPage faults/sec
Network InterfaceCurrent Bandwidth
Network InterfaceBytes Sent/sec
Network InterfaceBytes Received/sec
Physical DiskAvg. Disk sec/Read
Physical DiskAvg. Disk sec/Write
Physical DiskRead queue length
Physical DiskWrite queue length
Physical DiskAvg. Disk Reads/sec
Physical DiskAvg. Disk Writes/sec
Processor% Processor Time
ProcessorProcessor Queue length
ServerServer sessions
SQL Server: Cache managerCache Hit ratio
SQL Server: DatabasesTransactions/sec
SQL Server: General statisticsUser connections
SystemProcessor Queue length

 
Explanation of performance monitor counters:

Memory
·       Available bytes  The amount of free space in memory
·       Page faults/sec The number of page faults per second for code pages and data pages, averaged
over the interval period.

Network interface
·       Current Bandwidth The current size of the line
·       Bytes Sent/sec The number of bytes sent by the system per second, averaged over the interval
period.
·       Bytes Received/sec The number of bytes received by the system per second, averaged over the
interval period.

Physical disk
·       Avg. Disk sec/Read The average time (in milliseconds) a read operation takes. This time is
important because prolonged read and write operations indicate an over utilized
disk.
·       Avg. Disk sec/Write The average time (in milliseconds) a write operation takes. This time is
important because prolonged read and write operations indicate an over utilized
disk.
·       Read queue length The actual queue length for read operations. A disk queue of 2 (per disk) is the maximum recommended value for this counter. If the MDF file stored on 5 disks the maximum allowed queue length is 10 (5*2).
·       Write queue length The actual queue length for write operations. A disk queue of 2 (per disk) is
the maximum recommended value for this counter. If the MDF file stored on 5 disks the maximum allowed queue length is 10 (5*2).
·      Avg. Disk Reads/sec Avg. Disk sec/Read is the average time, in seconds, of a read of data from the disk.
·       Avg. Disk Writes/sec Avg. Disk sec/Read is the average time, in seconds, of a read of data from the disk.

Processor
·       % Processor Time The percentage of time the processor was busy. If more than 80 % you can have a CPU bottleneck.

Server
·       Server sessions The number of user sessions currently going on within the server.

SQL Server: Cache Manager
·       Cache Hit ratio The percentage of time a record was found in cache. The recommended cache hit
ratio is 90 percent or more.

SQL Server: Databases
·       Transactions/sec The number of transactions started for the database. These transactions come in the form of requests from client machines that are serviced by the database.

SQL Server: General statistics
·       User connections The number of users connected to the database.

System
·       Processor Queue length Processor Queue Length is the number of threads in the processor queue.  There is a single queue for processor time even on computers with multiple processors.  Unlike the disk counters, this counter counts ready threads only, not threads that are running.  A sustained processor queue of  greater than two threads generally indicates processor congestion.  This counter displays the last observed value only; it is not an average.

 

Application problems:

Use the SQL profiler to search for queries which results in a lot of reads or
writes. Define a trace file with

Events: Stored procedures: RPC Completed
            TSQL: SQL: Batchcompleted
Selected Data   Groups: Reads
                        Columns:          Eventclass
                                                Textdata
                                                ApplicationName
                                                NTusername
                                                Loginname
                                                CPU
                                                Duration
                                                Writes
                                                Starttime

Filter:    Reads Greater than or equal 100.000
            Database ID: the database ID of the specific database you want to profile.
Analyze this profile to see if queries can be optimized