Tuesday, July 20, 2010

How to retrieve numbers of records in a table with high performance without index or tablescans?


Photo Credit: dodge challenger1

The most common way to retrieve the number of records in table is done with a query like:
select count(*) from tableX where column1 = 'value'

Using this kind of queries will result in a (clustered) index scan. On tables with millions of records this can still take a while to execute and can result in unnecessary overhead on the SQL server. SQL server stores a lot of statistics of your tables. These statistics are used by the optimizer to generate the most efficient queryplan. You can use these statistics to retrieve the number of records in your database. This is always faster then to query the table itself.
SQL server uses his own mechanisme to update the statistics, so it can happen that there is a small mismatch between the number of records in a table and the number of records stored in the statistics. For big tables this is not always so important. For instance to display the growth of your tables.

You can use next query to retrieve the number of records per tables via the statistics:

SELECT sysobjects.name,rowcnt
FROM sysobjects
INNER JOIN sysindexes ON sysobjects.id=sysindexes.id
WHERE xtype='U' AND indid IN (0,1)
AND sysobjects.name = 'Your table'

To check for the number of records for a column with a specific value use next query. This query wil return the number of records per column value. Available columns are columns which exists as the first column in an index. Example you have a table: X with the column: Type. The column type is used as the first column in an index. Next query will return all existing values for the column type with the number of records.

DECLARE @TableName Nvarchar(200)
SET @TableName = 'Your Table'

DECLARE ColumnStatistics CURSOR FOR
SELECT Tbl.name AS TableName, SSID.name as IndexName ,
         INDEX_COL( tbl.[name], idk.indid, 1 ) AS IndexColumn
FROM SYSINDEXkeys idk
INNER JOIN SYSOBJECTS tbl ON idk.[id] = tbl.[id]
INNER JOIN SYSINDEXES SSID on SSID.[ID] = TBL.[id] and ssid.indid = IDK.indid
WHERE tbl.ID = (SELECT ID FROM SYSOBJECTS WHERE name = @Tablename)
           AND keyno = 1

OPEN ColumnStatistics

DECLARE @IndexName Nvarchar(200)
DECLARE @IndexColumn Nvarchar(200)

FETCH NEXT FROM ColumnStatistics INTO @TableName,@IndexName, @IndexColumn
WHILE @@FETCH_STATUS = 0
BEGIN

CREATE TABLE #TempTable (ColumnValue nvarchar(200),INT1 INT,EQ_ROWS INT,
    INT3 INT,INT4 INT,TableName nvarchar(200))
INSERT INTO #TempTable (ColumnValue, int1,EQ_ROWS,int3,int4)
EXEC ('dbcc show_statistics(' +@TableName+','+@IndexName+') with histogram')

SELECT @TableName AS Tablename,@IndexColumn AS IndexColumn,ColumnValue, EQ_ROWS AS Records

FROM #Temptable
DROP TABLE #TempTable

FETCH NEXT FROM ColumnStatistics INTO @TableName,@IndexName, @IndexColumn
END
CLOSE ColumnStatistics
DEALLOCATE ColumnStatistics

How to check if and when the statistics are updated?

execute next query:
EXEC sp_autostats 'Your table'



Check for the column AUTOSTATS. The value should be set to ON.
Check for the column Last Updated to see when the statistics for the index is updated for the last time.


Enjoy using the statistics.

No comments: