Showing posts with label statistics. Show all posts
Showing posts with label statistics. Show all posts

Tuesday, January 4, 2011

Statistics of KeepItSimpleAndFast.com


Starting with the year 2011, I want to share you some statistics of my blog. I started my blog 2,5 years ago. As of today, I'm pretty happy to see that the number of visitors to my blog is still increasing. Here some statistics of my blog:
  • 100% increase in the number of visitors last year.
  • 2200 visitors per month.
  • Visitors from 137 countries in the last year.
  • 78 % of traffic coming via search engines.
  • Total of 123 blogposts.
  • Browser usage: Internet Explorer 55%, Firefox 25% Chrome 16%. Internet explore share is decrease. Chrome is taking market share from Internet Explorer and Firefox.
  • Most popular content is related to application errors.
    http://www.keepitsimpleandfast.com/2009/08/report-definition-has-invalid-target.html
    http://www.keepitsimpleandfast.com/2010/02/compressed-zipped-folder-is-invalid-or.html

    What can we learn from this? Errors from a technical perspective are nice for the developers, however for the user it's unclear what they need to do. People are interested to know what they should do to fix it.
  • Most popular search words are the error messages of my most popular content.

Thursday, December 16, 2010

What triggers the update of statistics in my SQL 2008 database, when are the statistics out of date ?


Microsoft SQL Server 2008 collects statistical information about indexes and column data stored in the database. These statistics are used by the SQL Server query optimizer to choose the most efficient plan for retrieving or updating data. By default, SQL Server 2008 also creates and updates statistics automatically, when such an operation is considered to be useful. Sometimes it can happen that your statistics are not representative for your current data distribution which can result in a not efficient query plan. SQL Server 2008 determines whether to update statistics based on changes to column modification counters (colmodctrs).


If the statistics is defined on a regular table, it is out of date if:
  • The table size has gone from 0 to >0 rows.
  • The number of rows in the table when the statistics were gathered was 500 or less, and the colmodctr of the leading column of the statistics object has changed by more than 500 since then.
  • The table had more than 500 rows when the statistics were gathered, and the colmodctr of the leading column of the statistics object has changed by more than 500 + 20% of the number of rows in the table when the statistics were gathered.
  • For filtered statistics, the colmodctr is first adjusted by the selectivity of the filter before these conditions are tested. For example, for filtered statistics with predicate selecting 50% of the rows, the colmodctr is multiplied by 0.5.
One limitation of the automatic update logic is that it tracks changes to columns in the statistics, but not changes to columns in the predicate. If there are many changes to the columns used in predicates of filtered statistics, consider using manual updates to keep up with the changes.

More detailed information about database statistics can be found in this whitepaper: Statistics used by the query optimizer in Microsoft SQL Server 2008.

Tuesday, July 20, 2010

How to retrieve numbers of records in a table with high performance without index or tablescans?


Photo Credit: dodge challenger1

The most common way to retrieve the number of records in table is done with a query like:
select count(*) from tableX where column1 = 'value'

Using this kind of queries will result in a (clustered) index scan. On tables with millions of records this can still take a while to execute and can result in unnecessary overhead on the SQL server. SQL server stores a lot of statistics of your tables. These statistics are used by the optimizer to generate the most efficient queryplan. You can use these statistics to retrieve the number of records in your database. This is always faster then to query the table itself.
SQL server uses his own mechanisme to update the statistics, so it can happen that there is a small mismatch between the number of records in a table and the number of records stored in the statistics. For big tables this is not always so important. For instance to display the growth of your tables.

You can use next query to retrieve the number of records per tables via the statistics:

SELECT sysobjects.name,rowcnt
FROM sysobjects
INNER JOIN sysindexes ON sysobjects.id=sysindexes.id
WHERE xtype='U' AND indid IN (0,1)
AND sysobjects.name = 'Your table'

To check for the number of records for a column with a specific value use next query. This query wil return the number of records per column value. Available columns are columns which exists as the first column in an index. Example you have a table: X with the column: Type. The column type is used as the first column in an index. Next query will return all existing values for the column type with the number of records.

DECLARE @TableName Nvarchar(200)
SET @TableName = 'Your Table'

DECLARE ColumnStatistics CURSOR FOR
SELECT Tbl.name AS TableName, SSID.name as IndexName ,
         INDEX_COL( tbl.[name], idk.indid, 1 ) AS IndexColumn
FROM SYSINDEXkeys idk
INNER JOIN SYSOBJECTS tbl ON idk.[id] = tbl.[id]
INNER JOIN SYSINDEXES SSID on SSID.[ID] = TBL.[id] and ssid.indid = IDK.indid
WHERE tbl.ID = (SELECT ID FROM SYSOBJECTS WHERE name = @Tablename)
           AND keyno = 1

OPEN ColumnStatistics

DECLARE @IndexName Nvarchar(200)
DECLARE @IndexColumn Nvarchar(200)

FETCH NEXT FROM ColumnStatistics INTO @TableName,@IndexName, @IndexColumn
WHILE @@FETCH_STATUS = 0
BEGIN

CREATE TABLE #TempTable (ColumnValue nvarchar(200),INT1 INT,EQ_ROWS INT,
    INT3 INT,INT4 INT,TableName nvarchar(200))
INSERT INTO #TempTable (ColumnValue, int1,EQ_ROWS,int3,int4)
EXEC ('dbcc show_statistics(' +@TableName+','+@IndexName+') with histogram')

SELECT @TableName AS Tablename,@IndexColumn AS IndexColumn,ColumnValue, EQ_ROWS AS Records

FROM #Temptable
DROP TABLE #TempTable

FETCH NEXT FROM ColumnStatistics INTO @TableName,@IndexName, @IndexColumn
END
CLOSE ColumnStatistics
DEALLOCATE ColumnStatistics

How to check if and when the statistics are updated?

execute next query:
EXEC sp_autostats 'Your table'



Check for the column AUTOSTATS. The value should be set to ON.
Check for the column Last Updated to see when the statistics for the index is updated for the last time.


Enjoy using the statistics.

Friday, August 28, 2009

Index statistics and suggestions of a table in one query.

To optimize the index structure of tables in your database, you need to get information how these indexes are used. On the other hand you want to know if new indexes can improve the performance of your database. Index usage statistics and index suggestions are stored by the SQL Server engine as of SQL 2005.  With next query you can combine the index usages statistics and suggestions in one query. It will help you to improve the performance of your database.

Be aware to look very critical to all index suggestions. Suggestions have no relation to each other. Therefor do not add all index suggestions at the same. Add one index and analyze the results after a while and re-execute the query.

To use this script specify the table you want to analyze in the second line. In this example the table GBKMUT is analyzed.  If the table you specified does not exist, it will return all statistics and suggestions of all table in your database. A lot of success with improving your database.

DECLARE @ObjectID int

SELECT @ObjectID = OBJECT_ID('GBKMUT')

;WITH preIndexAnalysis
AS (
SELECT
OBJECT_SCHEMA_NAME(t.object_id) as schema_name
,t.name as table_name
,COALESCE(i.name, 'N/A') as index_name
,CASE WHEN i.is_unique = 1 THEN 'UNIQUE ' ELSE '' END + i.type_desc as type_desc
,NULL as impact
,ROW_NUMBER()
OVER (PARTITION BY i.object_id ORDER BY i.is_primary_key desc, ius.user_seeks + ius.user_scans + ius.user_lookups desc) as ranking
,ius.user_seeks + ius.user_scans + ius.user_lookups as user_total
,COALESCE(CAST(100 * (ius.user_seeks + ius.user_scans + ius.user_lookups)
/(NULLIF(SUM(ius.user_seeks + ius.user_scans + ius.user_lookups)
OVER(PARTITION BY i.object_id), 0) * 1.) as decimal(6,2)),0) as user_total_pct
,ius.user_seeks
,ius.user_scans
,ius.user_lookups
,STUFF((SELECT ', ' + QUOTENAME(c.name)
FROM sys.index_columns ic
INNER JOIN sys.columns c ON ic.object_id = c.object_id AND ic.column_id = c.column_id
WHERE i.object_id = ic.object_id
AND i.index_id = ic.index_id
AND is_included_column = 0
ORDER BY index_column_id ASC
FOR XML PATH('')), 1, 2, '') AS indexed_columns
,STUFF((SELECT ', ' + QUOTENAME(c.name)
FROM sys.index_columns ic
INNER JOIN sys.columns c ON ic.object_id = c.object_id AND ic.column_id = c.column_id
WHERE i.object_id = ic.object_id
AND i.index_id = ic.index_id
AND is_included_column = 1
ORDER BY index_column_id ASC
FOR XML PATH('')), 1, 2, '') AS included_columns
,i.object_id
,i.index_id
,STUFF((SELECT ', ' + QUOTENAME(ic.column_id,'(')
FROM sys.index_columns ic
WHERE i.object_id = ic.object_id
AND i.index_id = ic.index_id
AND is_included_column = 0
ORDER BY index_column_id ASC
FOR XML PATH('')), 1, 2, '') AS indexed_columns_compare
,COALESCE(STUFF((SELECT ', ' + QUOTENAME(ic.column_id, '(')
FROM sys.index_columns ic
WHERE i.object_id = ic.object_id
AND i.index_id = ic.index_id
AND is_included_column = 1
ORDER BY index_column_id ASC
FOR XML PATH('')), 1, 2, ''), SPACE(0)) AS included_columns_compare
FROM sys.tables t
INNER JOIN sys.indexes i ON t.object_id = i.object_id
LEFT OUTER JOIN sys.dm_db_index_usage_stats ius ON i.object_id = ius.object_id AND i.index_id = ius.index_id AND ius.database_id = db_id()
WHERE t.object_id = @ObjectID OR @ObjectID IS NULL
UNION ALL
SELECT
OBJECT_SCHEMA_NAME(mid.object_id) as schema_name
,OBJECT_NAME(mid.object_id) as table_name
,'--MISSING--'
,'--NONCLUSTERED--'
,(migs.user_seeks + migs.user_scans) * migs.avg_user_impact as impact
,0 as ranking
,migs.user_seeks + migs.user_scans as user_total
,NULL as user_total_pct
,migs.user_seeks
,migs.user_scans
,0 as user_lookups
,COALESCE(equality_columns + ', ', SPACE(0)) + COALESCE(inequality_columns, SPACE(0)) as indexed_columns
,included_columns
,mid.object_id
,NULL
,NULL
,NULL
FROM sys.dm_db_missing_index_details mid
INNER JOIN sys.dm_db_missing_index_groups mig ON mid.index_handle = mig.index_handle
INNER JOIN sys.dm_db_missing_index_group_stats migs ON mig.index_group_handle = migs.group_handle
WHERE database_id = db_id()
AND mid.object_id = @ObjectID OR @ObjectID IS NULL
)
SELECT schema_name
,table_name
,index_name
,type_desc
,impact
,user_total
,user_total_pct
,CAST(100 * (user_seeks + user_scans + user_lookups)
/(NULLIF(SUM(user_seeks + user_scans + user_lookups)
OVER(PARTITION BY schema_name, table_name), 0) * 1.) as decimal(6,2)) as estimated_percent
,user_seeks
,user_scans
,user_lookups
,indexed_columns
,included_columns
,STUFF((SELECT ', ' + index_name AS [data()]
FROM preIndexAnalysis iia
WHERE ia.object_id = iia.object_id
AND ia.index_id <> iia.index_id
AND ia.indexed_columns_compare = iia.indexed_columns_compare
AND ia.included_columns_compare = iia.included_columns_compare
FOR XML PATH('')), 1, 2, '') AS duplicate_indexes
,STUFF((SELECT ', ' + index_name AS [data()]
FROM preIndexAnalysis iia
WHERE ia.object_id = iia.object_id
AND ia.index_id <> iia.index_id
AND (ia.indexed_columns_compare LIKE iia.indexed_columns_compare + '%'
OR iia.indexed_columns_compare LIKE ia.indexed_columns_compare + '%')
AND ia.indexed_columns_compare <> iia.indexed_columns_compare
FOR XML PATH('')), 1, 2, '') AS overlapping_indexes
FROM preIndexAnalysis ia
ORDER BY schema_name, table_name, ROW_NUMBER() OVER (PARTITION BY schema_name, table_name ORDER BY user_total desc, ranking), index_name