Wednesday, December 31, 2008

The new world of work

Enjoy this nice video about the new world of work. Here are some of the nice quotes from this video:
  • Technology has flattened our world.
  • People are loyal to their skill, not their employer.
  • Our people are our greatest asset...
  • People used to learn things, now they use Google.
  • We are currently preparing students for jobs and technologies that don't yet exist ... to solve problems we don't even know yet
Happy viewing.

Saturday, December 27, 2008

8 reasons why I use a SaaS accounting solution for my financial administration

When my first child was born, I started doing my own private bookkeeping in Exact for Windows. Until then, I only knew the amount of money in my bank account every month; I had no idea how much money I spent on different things like food, clothes, housing, holidays, my car, etc. I started nine years ago with Exact for Windows. Four years ago, I switched to my first SaaS solution: the alpha version of our current SaaS solution Exact Online (http://www.exactonline.nl/, Dutch only). For the last two years, I have been a very happy user of Exact Online. A lot of people ask me: why are you using a SaaS solution for this? Is it safe to store your data somewhere else? Of course, before I started using Exact Online, I asked myself the same questions. Why did I decide to use it?

  1. I realized that a lot of my 'private' data is already stored somewhere in the cloud. It's all about TRUST. Everybody does their banking via the Internet. Web shopping is very popular. Invoices from my telecom provider can only be retrieved via the Internet. In the Netherlands, the government has started with the Electronic Patient Dossier: your medical file, available via the web for doctors. So a lot of my data is already in the cloud and I'm happy with it. It's a moving train that will never stop.
  2. At home I have multiple PCs in a small network. My previous financial bookkeeping solution was installed on one of these PCs, so it could only be used from that single PC. This is not a problem if you are alone, but with more people in the family it can happen that the financial bookkeeping PC is in use. With Exact Online, you can use it from every PC with a browser, not only at home but everywhere in the world.
  3. Backup of my data. How often do you make a backup of all your home PCs? I hardly ever did. With the hosted solution, I have no worries about that.
  4. Availability of your data. A few months ago, I visited my dentist and she asked for the insurance numbers of my kids. I started Exact Online from the dentist's PC to retrieve the insurance numbers. When I travel with my family, all my e-tickets and reservations are stored in Exact Online. If I need them, they are available to me.
  5. Functionality that is an advantage of a SaaS solution. I will give one example: direct automatic import of your bank statements during the night. Exact Online imports my bank statements directly from my bank. I do not have to download my bank statements and import them manually.
  6. I'm working for the research and innovation center of Exact in Delft. I'm in favor of the principle: eat your own dog food.
  7. With the current financial crisis, what is the impact on my financial situation? In one overview, I can see all my revenue and costs in comparison with previous months and years.
  8. Last but not least, Exact Online complies with my personal statement: Keep It Simple and Fast.

Wednesday, December 24, 2008

Identifying the performance bottleneck is already 50% of the solution.

In my previous blog post I argued that product managers should visit customers; you get useful feedback on how customers are using your product. This week, as Principal Research Engineer, I did a consult to find the bottleneck in the Globe environment of a customer. It was an interesting consult with useful feedback. I will summarize some experiences:
  1. SQL 2000 (x86) was installed on Windows Server 2003 (x64). The server has 8 GB of memory, but only 2 GB was used by the 32-bit version of SQL Server. It is possible to allocate more memory for SQL Server (x86), but only on a 32-bit operating system. To use all memory on Windows Server 2003 (x64), install an x64 version of SQL Server. After installing SQL 2005 (x64), all server memory was used. The more memory SQL Server can use, the better the performance. (A quick configuration check is sketched below this list.)
  2. An application took a while to display its information. With SQL Profiler, one query was identified as the bottleneck. Two suggestions to improve the query will be investigated; hopefully these suggestions are feasible. The development organization is currently analyzing them.
  3. The Intrastat application almost always resulted in a query timeout. Also in this case, SQL Profiler was used to identify the slow query. The query itself was correct; however, in the query plan a clustered index scan was executed on the biggest transaction table, GBKMUT. After creating an index, the application executed within 2 minutes. This index will be added to the index structure of GBKMUT for our logistic Globe customers.
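
For point 1, a quick way to check whether the memory configuration is the problem is sketched below. These are standard SQL Server commands; this check was not part of the original consult:

-- Is SQL Server 32 bit or 64 bit, and which edition?
SELECT @@VERSION AS SqlServerVersion, SERVERPROPERTY('Edition') AS Edition

-- How much memory is SQL Server allowed to use?
EXEC sp_configure 'show advanced options', 1
RECONFIGURE
EXEC sp_configure 'max server memory (MB)'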


Conclusion: Identifying the performance bottleneck is already 50% of the solution. In this case: a wrong SQL Server installation, a query which can be improved, and an improvement of the standard index structure of Globe. This consult resulted in a win-win situation. For the customer, the performance of his Globe environment has improved. For Exact, all our customers can benefit from the improvements in the application and index structure.

Monday, December 15, 2008

Customer experiences and social media: A product manager should visit customers every 2 weeks.

Last week, I did two invoiced customer visits. In the past I have done a lot of customer visits as a support expert. During these troubleshooting consults, I had one mission: solve the issue and make the customer happy. When I arrived at the customer site, the atmosphere was already set by the things which had happened in the past. You start the visit with a meeting with the director, ICT manager, accountant and one or more users. They all have a story to tell. You only NEED to listen. When they have finished their story, you need to summarize it in 2 sentences. If you are able to do that, you already have 50% of the solution. The customer wants the acknowledgement that YOU understand his situation. With an invoiced visit, at the request of the customer, the atmosphere is different. It's more positive. The software is running and they need some specific help or advice. The conversation is much easier; however, YOU need to listen and to observe. During your consult you will use your software together with the customer. Look critically at how these users are using your software. Is this the way you designed it? If not, do not say: the customer is using it wrong. For some reason, it is not clear to the customer. Find the reason and change the software, so the customer will use it in the right way. Sometimes you are surprised by what a user is doing. Ask him why he is doing this. He will tell you about a workaround which he accepted in the past. This is a nice opportunity to improve the software. The customer experience will be great if you can tell your customer, one week after your visit, that his 'accepted workaround' is fixed in the next release.


As a product manager, it will be your lucky day when this customer uses social media to tell this great customer experience to the community. This is a win-win situation. More and more people will use social media. One day it will happen that you get positive feedback on your product in a Google search. This is the best advertisement you can get for your product.

Positive experiences told by YOUR happy customer to the community.



As mentioned in one of my previous blogs, What is the URL of your Corporate Homepage?, the homepage of your company is Google, Yahoo or MSN. And as mentioned in You can't stop it anyway, once people become more active in social media, there is no way you can control how things go: who will write, who will find and who will comment on these experiences. This will happen even more with negative experiences.



As a product manager, take the opportunity to visit one customer every 2 weeks. Do not only talk, but really do something with your software together with the user. You want to see the 'accepted workarounds'. Remember: think from a user perspective instead of a technical perspective. If you do these kinds of visits for free, that is already the first positive customer experience. It will cost some time, but you will get paid back for it.

Do you agree with this? Please let me know YOUR opinion.

Sunday, December 14, 2008

Optimize index structure: Online nonclustered index defragmentation

In SQL 2005 a lot of statistical information is stored about the indexes in your database. This is the third post in a series about optimizing your index structure.

Previous posts in this series:
  1. Which indexes are not used in my database? Index usage statistics and disk space usage.
  2. Defragmentation of clustered indexes.
 
Tables in a database can contain a lot of nonclustered indexes. These indexes can get fragmented after a while, which will occur when a lot of records are inserted and deleted. Fragmented indexes have a negative impact on performance when these indexes are read.
 
Online index defragmentation can be done in 2 different ways:
1) The ALTER INDEX ... REBUILD WITH (ONLINE = ON) statement. An online rebuild can only be performed on the Enterprise edition of SQL Server.
2) DBCC INDEXDEFRAG. Deprecated (it will be removed in a future version of SQL Server), but it runs on SQL 2005 Standard and Enterprise edition.

The next script is an online operation which can be executed on SQL 2005 Standard and Enterprise edition. It will have an impact on the performance of the database; therefore, do not schedule this script during peak usage of the system.
 
SET NOCOUNT ON

DECLARE @Table NVARCHAR(512)
DECLARE @Index NVARCHAR(512)
DECLARE @Rebuild NVARCHAR(4000)

-- Cursor over all nonclustered indexes with more than 10% fragmentation.
DECLARE DB CURSOR FOR
SELECT SO.Name, SI.Name
FROM SYS.dm_db_index_physical_stats (DB_ID(), NULL, NULL, NULL, 'LIMITED') IPS
INNER JOIN SYS.objects SO ON SO.object_id = IPS.object_id
INNER JOIN SYS.indexes SI ON SI.index_id = IPS.index_id AND SI.object_id = IPS.object_id
-- Exclude the clustered index (index_id = 1); heaps (index_id = 0) are excluded below.
AND SI.Index_id <> 1
WHERE IPS.avg_fragmentation_in_percent > 10.0 AND IPS.index_id > 0
ORDER BY SO.Name

OPEN DB
FETCH NEXT FROM DB INTO @Table, @Index
WHILE @@FETCH_STATUS = 0
BEGIN
-- Defragment the index online; the table and index names are passed as quoted strings.
SET @Rebuild = 'DBCC INDEXDEFRAG (0, ''' + @Table + ''', ''' + @Index + ''')'

PRINT @Rebuild
EXEC SP_EXECUTESQL @Rebuild
FETCH NEXT FROM DB INTO @Table, @Index
END

CLOSE DB
DEALLOCATE DB
SET NOCOUNT OFF
 

To run this script on SQL Server 2008 change the syntax of @Rebuild to:
SET @Rebuild = 'ALTER INDEX ' + @Index + ' ON ' + @Table + ' REBUILD WITH (ONLINE = ON)'

On big tables this script can take a while to execute. In the software solutions of my company Exact Software, we have Exact Globe and Exact Synergy Enterprise. In these solutions the following tables can take a while to defragment: GBKMUT, BacoAppLog, Absences, ItemNumbers, EBCDataLog, WordReferences, BacoSettings, AMUTAK, AMUTAS, ORSRG, ORKRG, FRSRG, FRKRG, ORHSRG, ORHKRG, FRHSRG, FRHKRG.
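
If you want to check in advance how badly fragmented one of these big tables is, a small check like the one below can help (a sketch; replace GBKMUT with the table you are interested in):

SELECT OBJECT_NAME(IPS.object_id) AS TableName, SI.name AS IndexName,
IPS.avg_fragmentation_in_percent, IPS.page_count
FROM SYS.dm_db_index_physical_stats (DB_ID(), OBJECT_ID('GBKMUT'), NULL, NULL, 'LIMITED') IPS
INNER JOIN SYS.indexes SI ON SI.object_id = IPS.object_id AND SI.index_id = IPS.index_id
WHERE IPS.index_id > 0
ORDER BY IPS.avg_fragmentation_in_percent DESC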

Wednesday, December 10, 2008

Optimize index structure: Defragmentation of clustered indexes

In SQL 2005 a lot of statistical information is stored about the indexes in your database. This is the second post in a series about optimizing your index structure.
A clustered index determines the physical order of the data in a table. Because the clustered index dictates the physical storage order of the data, a table can contain only one clustered index. Fragmentation affects disk performance and the effectiveness of the SQL Server read-ahead manager. The read-ahead manager can read fewer records from a fragmented clustered index; a non-fragmented clustered index maximizes the number of records that can be retrieved in one disk IO.
With the next script you can rebuild the clustered indexes which have a fragmentation of more than 10%.

Be aware that a rebuild of a clustered index results in a table lock; it is an offline operation. Execute this script when you know that the database is not being used by other applications. Depending on the size of your database it can take a while to execute.

To only generate the SQL statements, without rebuilding the clustered indexes directly, comment out the execute line in the script:
-- EXEC SP_EXECUTESQL @Rebuild

The script can also be used for all nonclustered indexes; however, rebuilding nonclustered indexes can be done as an online operation. In another post I will provide a script to rebuild nonclustered indexes online.

SET NOCOUNT ON
DECLARE @Table NVARCHAR(512)
DECLARE @Index NVARCHAR(512)
DECLARE @Rebuild NVARCHAR(4000)

-- Cursor over all clustered indexes (index_id = 1) with more than 10% fragmentation.
DECLARE DB CURSOR FOR
SELECT SO.Name, SI.Name
FROM SYS.dm_db_index_physical_stats (DB_ID(), NULL, NULL, NULL, 'LIMITED') IPS
INNER JOIN SYS.objects SO ON SO.object_id = IPS.object_id
INNER JOIN SYS.indexes SI ON SI.index_id = IPS.index_id AND SI.object_id = IPS.object_id
-- Select only clustered indexes (index_id = 1).
AND SI.Index_id = 1
WHERE IPS.avg_fragmentation_in_percent > 10.0 AND IPS.index_id > 0
ORDER BY SO.Name

OPEN DB
FETCH NEXT FROM DB INTO @Table, @Index
WHILE @@FETCH_STATUS = 0
BEGIN
-- Offline rebuild of the clustered index; bracket the names to handle special characters.
SET @Rebuild = 'ALTER INDEX [' + @Index + '] ON [' + @Table + '] REBUILD'

PRINT @Rebuild
EXEC SP_EXECUTESQL @Rebuild
FETCH NEXT FROM DB INTO @Table, @Index
END

CLOSE DB
DEALLOCATE DB
SET NOCOUNT OFF


Previous post in this series:
Which indexes are not used in my database? SQL 2005 Index usage statistics and disk space usage.

Monday, December 8, 2008

Ultimate guide to upgrading to SQL Server 2008

Microsoft has published a guide to upgrading to SQL Server 2008; please refer to the SQL Server 2008 Upgrade Technical Reference Guide. It is a big Word document of 490 pages and a good starting point for your own upgrade plan from SQL 2000/2005 to SQL 2008.

A successful upgrade to SQL Server 2008 should be smooth and trouble-free. To achieve that smooth transition, you must plan sufficiently for the upgrade and match your plan to the complexity of your database application. Otherwise, you risk costly and stressful errors and upgrade problems. Like all IT projects, planning for every contingency and then testing your plan gives you confidence that you will succeed. If you ignore the planning process, you increase the chances of running into difficulties that can derail and delay your upgrade. This document covers the essential phases and steps involved in upgrading existing SQL Server 2000 and 2005 instances to SQL Server 2008 by using best practices. These include preparation tasks, upgrade tasks, and post-upgrade tasks.

  • Chapter 1 gives an overview of the technical issues and decisions that are involved in an upgrade to SQL Server 2008, as well as recommendations for planning and deploying an upgrade.
  • Chapter 2 addresses issues related to upgrading to SQL Server 2008 Management Tools.
  • Chapters 3 through 8 focus on upgrade issues for SQL Server relational databases.
  • Chapter 9 addresses upgrading to SQL Server 2008 Express.
  • Chapters 10 through 14 focus on upgrading to SQL Server 2008 Business Intelligence components: Analysis Services, Data Mining, Integration Services, and Reporting Services.
  • Chapter 15 addresses the implications of upgrading to SQL Server 2008 for other Microsoft applications and platforms.
  • Appendix 1 contains a table of allowed SQL Server 2008 version and edition upgrade paths.
  • Appendix 2 contains an upgrade planning checklist.

Friday, December 5, 2008

What is the URL of your Corporate Homepage ?

Last month I viewed a nice video in which Jeremiah Owyang interviews Bob Pearson about why DELL uses social media. A presentation of this interview can be found here; it takes 5 minutes to listen to. In this interview Bob made some nice quotes. Bob has a really nice answer to the question:

What is the URL of your Corporate Homepage?

The real homepage is Google, MSN or Yahoo. People go there to search. Type in your brand name and the result set is your new homepage. The web reflects reality; you can't ignore it. If there is something negative, your reaction should be: how do we solve the problem?

Social media is becoming more popular. As a company you CAN'T ignore it. Monitor what the community is saying about your company and act on it. It is not a question of whether you want to act on it; you need to act on it. The community will decide how they think about your company.

Wednesday, December 3, 2008

Optimize index structure: Which indexes are not used in my database? SQL 2005 Index usage statistics and disk space usage.

In SQL 2005 a lot of information is stored about how indexes are used and which indexes could be useful to create. This is the first post in a series about optimizing your index structure.

In SQL 2005 you can retrieve index usage statistics for all indexes in your database. This is a really good feature which allows you to tune the index structure of your database. Every database is used in its own way. For a standard software package like Exact Globe, an index structure is shipped with the product. This index structure is not always optimal; it depends on how you are using the software. For instance, an index for cost units is only useful if you make use of cost units in your application.
With the index usage statistics you can see how many times an index is used. The easiest ones to spot are the indexes which are never used for reading but are only updated.

This script displays the index usage statistics for indexes on the table 'Gbkmut' which are never read.

-- Indexes with only updates and no reads
SELECT object_name(s.object_id) AS ObjectName, ISNULL(i.name, '(Heap)') AS IndexName,
user_seeks AS IndexSeeks, user_scans AS IndexScans, user_lookups AS LookUps,
user_seeks + user_scans + user_lookups AS TotalReads, user_updates AS Updates,
CASE WHEN (user_seeks + user_scans - user_updates < 0) THEN 'Updates > Reads'
WHEN (user_seeks < user_scans) THEN 'Scans > Seeks'
ELSE ' '
END AS Warning
FROM sys.dm_db_index_usage_stats s
INNER JOIN sys.indexes i ON i.object_id = s.object_id AND i.index_id = s.index_id
WHERE s.database_id = DB_ID() AND objectproperty(s.object_id, 'IsUserTable') = 1
AND object_name(s.object_id) = 'gbkmut'
AND (user_seeks + user_scans + user_lookups) = 0

The result set of this query contains indexes which can possibly be deleted. Monitor these indexes for a period in which you know that all functionality in your software package is used.

Please be aware that index usage statistics are stored in dynamic management views. After a restart of SQL Server, the index usage statistics are reset to 0. Index statistics for a table are also reset to 0 when index changes are executed on that table.

With the next query you can see how much disk space the indexes of table 'Gbkmut' are using.
-- Index sizes of a table
SELECT Name as Index_name, Type_desc AS IndexType, Space_used_in_mb = (page_count * 8.0 / 1024.0)
FROM SYS.INDEXES I
INNER JOIN SYS.DM_DB_INDEX_PHYSICAL_STATS(db_id(), object_id('gbkmut'), null, null, null) IP
ON I.object_id = IP.[object_id] and I.[index_id] = IP.[index_id]
ORDER BY I.index_id
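
The introduction mentioned that SQL 2005 also records which indexes could be useful to create. A minimal sketch of how to look at these suggestions with the missing-index DMVs is shown below; treat the output as hints, not as indexes to create blindly:

-- Suggested (missing) indexes for the current database
SELECT TOP 25 mid.statement AS TableName, mid.equality_columns, mid.inequality_columns,
mid.included_columns, migs.user_seeks, migs.avg_user_impact
FROM sys.dm_db_missing_index_group_stats migs
INNER JOIN sys.dm_db_missing_index_groups mig ON mig.index_group_handle = migs.group_handle
INNER JOIN sys.dm_db_missing_index_details mid ON mid.index_handle = mig.index_handle
WHERE mid.database_id = DB_ID()
ORDER BY migs.user_seeks * migs.avg_user_impact DESC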

Tuesday, November 25, 2008

Virtualization with SQL Server. Think before you start.

Sometimes I get the question whether it is useful to virtualize your SQL Servers. This is a valid question. There are a lot of reasons to virtualize your server park, so why not your SQL Servers? I have some tips for when you plan to start virtualizing SQL Server:

  • Check whether your SQL Server version is supported by Microsoft on the virtualization platform you are going to use. Read this official blog from Microsoft PSS SQL. It explains the support policies from Microsoft regarding SQL Server running in a hardware virtualization environment.
  • Do you have experience with virtualization of other applications? If not, start with some 'non business critical' applications.
  • Read the best practices of SQL 2008 on Hyper-V.
  • With the use of virtualization, a new dimension of hardware capacity is introduced: the configuration of the resources of the Hyper-V machine. I will explain with one example:
    Assume 4 servers are virtualized on one physical box with 2 processors and Hyper-V installed. Within Hyper-V you have configured that each guest operating system is allowed to use a maximum of 25% of the CPU power of the Hyper-V machine. The first guest operating system has a CPU utilization of 100% and the other 3 guest operating systems have a CPU utilization of 0%. The physical machine then has a CPU utilization of 25%. In the first guest operating system you might conclude that you have a CPU bottleneck; however, this may be caused by a wrong Hyper-V configuration. Do the other 3 guest operating systems really need 25% of the CPU power of the Hyper-V machine?
    Besides the regular performance counters of your guest operating system, you need to monitor Hyper-V performance to get a true measure of resource usage.
  • Convince yourself that you are able to find the performance bottleneck in a non-virtualized SQL Server environment. If not, improve your skills in finding performance bottlenecks before implementing virtualization.
  • Realize that virtualization will make your server configuration more complex.

Tuesday, November 11, 2008

Exact DashBoard Actie: Simple and Fast

As you can see in the title of my blog, I like the principle Keep It Simple and Fast. A product with a lot of features gets very complex; a small number of features results in a simple application. This week my company introduced a nice solution: the Exact Dashboard. It is a straightforward installation which creates a dashboard. This dashboard contains nice reports to give you better insight into the financial situation of your company.



Have a look at it to experience my principle: Keep It Simple and Fast.

Tuesday, November 4, 2008

Triggers in SaaS accounting solution Exact Online

In Conor Cunningham's blog on SQL Server he asks for companies who have experience with stored procedures and triggers. Therefore I want to share some of our experiences with the way we have implemented triggers. My company Exact Software has built Exact Online, a Software as a Service accounting solution. Currently this solution, with more than 6000 subscriptions, is available in the Netherlands and Belgium. Exact Online is built on the latest .NET Framework. All data is stored in multiple SQL 2005 databases. We make use of triggers to update the balance table in which we store the Balance and Profit & Loss sheet. This balance table is based on the financial transactions created by the users.

Why did we choose to use triggers to update the balance table?
Advantages:
  1. Consistency between the transaction table and the balance table.
  2. High performance due to the efficiency of SQL Server triggers.
  3. Apart from the triggers themselves, no additional code is needed to keep the balance table up to date.

Disadvantages:
  1. Chance of locking errors when 2 transactions update different transaction records which relate to the same balance record.

Implementation issues:
  1. Define 3 triggers: one for insert, one for update and one for delete. A single trigger for insert, update and delete resulted in locking errors in our solution.
  2. Start every trigger with the statement IF @@ROWCOUNT = 0 RETURN. This is done for performance reasons: if no inserts, updates or deletes took place at all, do not do anything (see the sketch at the end of this post).
  3. To limit locking situations:
    1) Keep database transactions as short as possible.
    2) Do not delete the balance record when all underlying transactions are deleted. In most situations this balance record will be used again when new transactions are created.
  4. In the past we used indexed views; however, for performance reasons we moved to the trigger solution.
We have been using the triggers on the balance table in the Exact Online solution for more than 3 years. To this day we are very enthusiastic about the performance and consistency of this implementation. According to the statistics of SQL 2005, the triggers are fired 7 times per second on every database.
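
To give an idea of what such an insert trigger looks like, here is a minimal sketch. The table and column names (FinTrans, Balance, GLAccount, Period, Amount) are hypothetical examples, not the actual Exact Online schema:

-- Hypothetical, simplified insert trigger that keeps a balance table up to date.
CREATE TRIGGER trg_FinTrans_Insert ON FinTrans
AFTER INSERT
AS
BEGIN
    -- Implementation issue 2: do nothing when no rows were inserted.
    IF @@ROWCOUNT = 0 RETURN
    SET NOCOUNT ON

    -- Add the inserted amounts to the existing balance records.
    UPDATE B
    SET B.Amount = B.Amount + I.Amount
    FROM Balance B
    INNER JOIN (SELECT GLAccount, Period, SUM(Amount) AS Amount
                FROM inserted GROUP BY GLAccount, Period) I
    ON B.GLAccount = I.GLAccount AND B.Period = I.Period

    -- Create balance records that do not exist yet.
    INSERT INTO Balance (GLAccount, Period, Amount)
    SELECT I.GLAccount, I.Period, SUM(I.Amount)
    FROM inserted I
    WHERE NOT EXISTS (SELECT 1 FROM Balance B
                      WHERE B.GLAccount = I.GLAccount AND B.Period = I.Period)
    GROUP BY I.GLAccount, I.Period
END

The update and delete triggers follow the same pattern, subtracting the deleted amounts and adding the inserted ones.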

Monday, November 3, 2008

Compressing content is a nice example of an application built from a technical perspective instead of a user perspective.

This weekend I encountered a nice example of my previous blog: the difference between solutions from a technical perspective and from a user perspective.

I started my father's laptop. It took 20 minutes to start. He told me that this had been happening for a while; he had no idea what had happened. Every morning he started his laptop before he took a shower, and after the shower his laptop was 'ready' to log on. I thought he had a virus or something like that. After a while of looking into why it took so long to start, I found the root cause.

Compressed files in the Windows and Program Files directories.

All files which are needed to load the operating system and to start his applications were compressed. After decompressing these files, the PC started within 3 minutes. Still long, but much better than it was. I can't wait for Windows 7 to start really fast.

How did it happen?
In the past the C: drive was almost out of disk space, while his D: drive was empty (50 GB available). The operating system had suggested compressing data. From a technical perspective this is correct: you need disk space. However, a suggestion to move data from the C: drive to the D: drive would have been much better. If data has to be compressed on the C: drive, the application should only compress user data instead of operating system files or program files.

Conclusion: Compressing data can be useful but should not be applied to all files on the disk. With a little bit more logic in the application, a user-friendly compression application can be made which adds value for a user. You can't expect a user to know what the impact is of compressing operating system or program files.

Wednesday, October 15, 2008

Log data outside your database.

As described in my previous blog about building a server environment that's scalable and reliable, one of the ideas is to log outside your database. In this blog I will describe the advantages of doing this.
Logging is essential to manage your infrastructure. Within an application you can divide the data you have into 2 types:
1) Transactional data. For instance, a financial entry, customer information, ....
2) Log data. For instance, the date and time that the financial entry was created, the time spent creating the financial entry, .....

What are the advantages of logging data outside the database, so that you have only transactional data in your database?
1) No bottleneck in your database for inserting log records.
2) The database transaction log will be much smaller, because most insert and update actions are done on log tables. The backup of the transaction log will be smaller and faster, including the restore if needed.
3) Backups of log data can be less frequent than backups of transactional data. Loss of log data will not hurt the users; it will hurt the manageability of the system.
4) Better use of the data cache on the server. The cache is only filled with transactional data; log records are not part of the data cache anymore.

Of course every advantage has its own disadvantage. For instance:
1) Separate database(s) for log records, which results in more databases to manage.
2) If log data is written to local files, how do you aggregate these local files into one log table? You need to build a mechanism for it (a possible sketch is given below).
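
A minimal sketch of what such a separate logging setup could look like. All names are hypothetical, and BULK INSERT is only one possible mechanism to aggregate local log files into a central table:

-- Separate database that only contains log data.
CREATE DATABASE AppLog
GO
USE AppLog
GO
CREATE TABLE dbo.RequestLog (
CreatedAt DATETIME NOT NULL,
Source NVARCHAR(128) NOT NULL,
DurationMs INT NULL,
Message NVARCHAR(1024) NULL
)
GO
-- Nightly job: load the local log file of a web server into the central log table.
-- The file layout must match the column order of the table.
BULK INSERT dbo.RequestLog
FROM '\\webserver01\logs\requests_20081015.txt'
WITH (FIELDTERMINATOR = '\t', ROWTERMINATOR = '\n')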

Friday, October 10, 2008

Building a server environment that’s scalable and reliable

Yesterday I joined a web cast on http://www.fastcompany.tv about building a server environment that's scalable and reliable. In this session the architects of Technorati and FriendFeed talked about their experiences. They had some interesting quotes about building reliable and scalable applications.
1) Logging is essential to estimate future usage of the system. However, logging in the database is too IO intensive. Log on the web server and aggregate the log files of all web servers into a separate database.
2) Data which is mostly static does not need to be retrieved from the database. For instance, a top 10 list of most popular documents. Store these lists locally on the web server; local file IO is always faster than database retrieval. Build a mechanism to distribute the top 10 list to all your web servers.
3) It should be possible to take every new feature back. If a new feature does not work, or has too big an impact on the performance of the system, you should be able to remove this feature.
4) Push new code to specific user groups. When you have 700 web servers you are not able to update all these web servers at the same time.
5) If data is not in the cache, show nothing; otherwise the system can't handle the load. If 10,000 users start retrieving this unavailable cached data from the server, your server will die. Build a mechanism to check if cached data is available; if not, update your cache so your application can make use of it.
6) Measure user performance geographically.
7) During the design of functionality: keep it simple. It's already complex enough to manage, deploy and maintain.

The conclusions I took from this web cast:

It is Near Real Time.

It's all about partitioning of data. These quotes made me realize that we are still thinking too traditionally in the way we develop applications. Think in big numbers.

Monday, October 6, 2008

The difference between solutions from a technical perspective and from a user perspective.

In my daily work I have seen a lot of applications which are brilliant from a functional point of view. However, they are built from a technical point of view. What I mean is: the application is doing what it should do, but it is not an 'easy to use' application for the average user. When I test software, I try to test as a user with no affinity for technology. These users use the software to do their job, not because they like to know all the features of the software. For these users, software is like a car. They have a driver's license and need to drive from A to B. The car should always start and bring them from A to B, without the need to have knowledge about how the engine works.

I use 2 different people in my direct environment as a reflection point:
1) My wife. She graduated from university but has no affinity with technology. She always sticks to the defaults. She is afraid to do something wrong if she makes changes to the default settings. Most settings are too complex; she has no idea what the impact is of changing these settings.
2) My father. He started using a PC 25 years ago with the spreadsheet program VisiCalc. He used VisiCalc because it saved him time in making financial reports and estimates for his boss. Later on he used Lotus 1-2-3 and currently Excel. However, he is still a user and not interested in advanced features. I get a lot of phone calls to assist him with his computer, printer, wireless router and even with Excel.

They both do their job very well and need to use the PC to do it. When I get a question from them, I ask myself: why are they asking this question? Why is the software not clear? How can we avoid this kind of question? Be very critical of yourself to understand the root cause of the question. Once I had answered these kinds of questions, I understood why they ask them. In most situations the application is built from a technical perspective instead of a user perspective.

Example: My father creates an email with some pictures of my son. The email is sent, but he gets an email back from my internet provider: the email you sent is too big to deliver. My father does not know what to do and picks up the phone to call me. He has 2 questions: What did I do wrong? How can I send the pictures to you? The technical solution is simple: resize the pictures in the email from 3 MB to 50 KB. This makes the email small enough to be sent successfully. From a user perspective you can ask the following questions:

How does my father know when an email message is too big?
How does my father know that you can resize a picture to a smaller format?
How does my father know how to resize a picture?

What could be a solution:
1) Explain to the user what the maximum size of an email is and redirect the user to a help file in which it is explained how to resize pictures.
2) Resize pictures in emails automatically to a smaller format before sending. In 95% of the situations, pictures are sent for viewing and not for printing, so a resized picture is enough.

Solution 1 is a technical solution; in some situations it will work, but it can still be too complex for people like my father.

Solution 2 is a solution from a user perspective. My father is not aware that the pictures are resized. He is happy that the pictures of my son are sent successfully.

When thinking about the solution you should be VERY critical in finding the root cause of the situation. In this case: why do people attach pictures to emails? Is it for printing or for viewing? In this case I think it is 95% for viewing, so in 95% of the situations a resized picture is acceptable. From a user perspective, the supplier of the email software should improve the software to resize pictures automatically and only send the original file size if requested by the user.

Friday, September 19, 2008

Optimize index structure: How to find duplicate indexes?

This is the fourth post in a series about optimizing your index structure.

Previous posts in this series:
  1. Which indexes are not used in my database? Index usage statistics and disk space usage.
  2. Defragmentation of clustered indexes.
  3. Online nonclustered index defragmentation.
Every database will have some indexes on its tables to retrieve information efficiently. After a long period it can happen that different people have added indexes which already exist. This is not useful and results in unnecessary overhead, for example extra disk space usage and slower updates, inserts and deletes.
With the next script you can retrieve all duplicate indexes in your database.


-- Helper view: one row per index with its first 16 key columns.
CREATE VIEW vw_index_list AS
SELECT tbl.[name] AS TableName,
idx.[name] AS IndexName,
INDEX_COL( tbl.[name], idx.indid, 1 ) AS col1,
INDEX_COL( tbl.[name], idx.indid, 2 ) AS col2,
INDEX_COL( tbl.[name], idx.indid, 3 ) AS col3,
INDEX_COL( tbl.[name], idx.indid, 4 ) AS col4,
INDEX_COL( tbl.[name], idx.indid, 5 ) AS col5,
INDEX_COL( tbl.[name], idx.indid, 6 ) AS col6,
INDEX_COL( tbl.[name], idx.indid, 7 ) AS col7,
INDEX_COL( tbl.[name], idx.indid, 8 ) AS col8,
INDEX_COL( tbl.[name], idx.indid, 9 ) AS col9,
INDEX_COL( tbl.[name], idx.indid, 10 ) AS col10,
INDEX_COL( tbl.[name], idx.indid, 11 ) AS col11,
INDEX_COL( tbl.[name], idx.indid, 12 ) AS col12,
INDEX_COL( tbl.[name], idx.indid, 13 ) AS col13,
INDEX_COL( tbl.[name], idx.indid, 14 ) AS col14,
INDEX_COL( tbl.[name], idx.indid, 15 ) AS col15,
INDEX_COL( tbl.[name], idx.indid, 16 ) AS col16,
dpages,
used,
rowcnt
FROM SYSINDEXES idx
INNER JOIN SYSOBJECTS tbl ON idx.[id] = tbl.[id]
WHERE indid > 0
AND INDEXPROPERTY( tbl.[id], idx.[name], 'IsStatistics') = 0

GO

-- Each pair of duplicate indexes is listed twice, once in each direction.
SELECT l1.tablename,
l1.indexname,
l2.indexname AS duplicateIndex,
l1.col1, l1.col2, l1.col3, l1.col4, l1.col5, l1.col6, l1.col7, l1.col8,
l1.col9, l1.col10, l1.col11, l1.col12, l1.col13, l1.col14, l1.col15, l1.col16,
l1.dpages,l1.used,l1.rowcnt
FROM vw_index_list l1
INNER JOIN vw_index_list l2 ON l1.tablename = l2.tablename
AND l1.indexname <> l2.indexname
AND l1.col1 = l2.col1
AND COALESCE(l1.col2,'') = COALESCE(l2.col2,'')
AND COALESCE(l1.col3,'') = COALESCE(l2.col3,'')
AND COALESCE(l1.col4,'') = COALESCE(l2.col4,'')
AND COALESCE(l1.col5,'') = COALESCE(l2.col5,'')
AND COALESCE(l1.col6,'') = COALESCE(l2.col6,'')
AND COALESCE(l1.col7,'') = COALESCE(l2.col7,'')
AND COALESCE(l1.col8,'') = COALESCE(l2.col8,'')
AND COALESCE(l1.col9,'') = COALESCE(l2.col9,'')
AND COALESCE(l1.col10,'') = COALESCE(l2.col10,'')
AND COALESCE(l1.col11,'') = COALESCE(l2.col11,'')
AND COALESCE(l1.col12,'') = COALESCE(l2.col12,'')
AND COALESCE(l1.col13,'') = COALESCE(l2.col13,'')
AND COALESCE(l1.col14,'') = COALESCE(l2.col14,'')
AND COALESCE(l1.col15,'') = COALESCE(l2.col15,'')
AND COALESCE(l1.col16,'') = COALESCE(l2.col16,'')
ORDER BY
l1.tablename,
l1.indexname

GO

DROP VIEW Vw_index_list

Tuesday, August 26, 2008

First find the bottleneck before you start making improvements in your application.

Every developer has been in a situation, once or more, in which people ask them to improve the performance of his or her application. What is the best approach? In this blog I will explain my approach to efficiently start improving the performance of your application. The biggest improvements can only be made in the main bottlenecks. First I will start with what you should NOT do:

  1. Directly start improving your code for the parts you expect to be the main bottleneck.
  2. Make 'improvements' in your application without measuring the improvement. How can you know if it is really an improvement?
  3. Discuss whether the correct hardware is used. The fact is that somebody is complaining about your application. Based on the facts in log files you can prove whether incorrect hardware is used.

What should you do before making changes in your code? Find the main bottlenecks. The analysis of an application should be based on facts, not assumptions.

  1. Ask the people who complain to measure the performance of your application with tools like Performance Monitor and SQL Profiler.
  2. If possible, ask for a backup to reproduce the scenario in which the performance is bad.
  3. Think from a user perspective. How long is a user waiting until the application is finished? I will call this Time Spend. Assume you are the user: would you be happy and enthusiastic with this Time Spend? The answer should be yes. If the answer is no, you can start the challenge to make the user happy and enthusiastic.
  4. Start analyzing the SQL Profiler log. In the SQL Profiler log you can focus on 3 different types of bottlenecks:
    a) Long running queries.
    b) A big number of small queries.
    c) Application processing time (the time between queries).

Now I will describe how to find each of these 3 bottlenecks.

  1. Save the SQL Profiler trace to a table, so we are able to analyze the data with the query analyzer.
  2. Execute the next query to get some statistics from the trace. In this example I have saved the trace to the table [Profile].


    DECLARE @AnalyzeTime VARCHAR(20)
    SET @AnalyzeTime = '2008-08-25 16:01:30'
    -- Profile statistics of application
    SELECT SUM(DATEDIFF(ms, A.Starttime,A.Endtime)) AS SUMQueryTime,
    SUM(DATEDIFF(ms, B.Endtime,A.Starttime)) AS SUMTimeDiffPrevQuery
    ,DATEDIFF(ms, MIN(A.Starttime),Max(A.Endtime)) AS TimeSpend, Count(*) As NumberOfQueries
    FROM [Profile] A
    INNER JOIN [Profile] B ON A.Rownumber = B.Rownumber + 1
    WHERE A.TEXTDATA IS NOT NULL
    AND A.starttime > @AnalyzeTime

    All times are in milliseconds.
    SUMQueryTime = Total duration of all traced SQL queries.
    SUMTimeDiffPrevQuery = Total time between queries. This is the time SQL Server is idle after finishing Query(X) and before starting Query(X+1).
    TimeSpend = Total duration between the start time of the first query and the end time of the last query.
    NumberOfQueries = Total number of queries in this log.


    Analyze the result of the query in the previous step. I will use some examples.

    Example 1:

    SUMQueryTime: 237
    SUMTimeDiffPrevQuery : 3009
    TimeSpend: 3246
    NumberOfQueries: 203

    The total time the user is waiting is 3246 ms. Most of the time is spent between queries: 92% (3009 ms out of 3246 ms). The main bottleneck is the time spent between queries; what is the application doing between the queries? Optimizing the SQL queries is not useful here. Assume you can improve your SQL queries by 90%. That would be really nice; however, the improvement from a user perspective is only 213 ms (90% of 237 ms), which is an improvement of 6% (213 out of 3246).

    Use the next query to find all gaps between Query(X) and Query(X+1):

    DECLARE @AnalyzeTime VARCHAR(20)
    SET @AnalyzeTime = '2008-08-25 16:01:30'
    -- Time difference with previous query > 0
    SELECT B.Textdata as FirstQuery,A.Textdata as SecondQuery,DATEDIFF(ms, B.Endtime,A.Starttime) AS TimeDiffFirstSecondQ,A.Starttime, A.Endtime
    FROM [Profile] A
    INNER JOIN [Profile] B ON A.Rownumber = B.Rownumber + 1
    WHERE DATEDIFF(ms, B.Endtime,A.Starttime) > 0
    AND A.starttime > @AnalyzeTime


    Example 2:

    SUMQueryTime: 15500
    SUMTimeDiffPrevQuery: 100
    TimeSpend: 15600
    NumberOfQueries: 376

    The total time the user is waiting is 15.6 seconds. 99% (15500/15600) is spent by SQL Server executing the 376 queries. Use the next query to find the longest running queries:

    DECLARE @AnalyzeTime VARCHAR(20)
    SET @AnalyzeTime = '2008-08-25 16:01:30'
    -- Long queries
    SELECT TOP 25 A.Textdata,DATEDIFF(ms, A.Starttime,A.Endtime) AS QueryTime,A.Starttime, A.Endtime
    FROM [Profile] A
    WHERE DATEDIFF(ms, A.Starttime,A.Endtime) > 0
    AND A.starttime > @AnalyzeTime
    ORDER BY DATEDIFF(ms, A.Starttime,A.Endtime) DESC



    Example 3:

    SUMQueryTime: 5000
    SUMTimeDiffPrevQuery: 2000
    TimeSpend: 7000
    NumberOfQueries: 4500

    71% of the time is spent by SQL Server (5000/7000). However, after running the query to retrieve the long running queries, you see that the longest query in this log file takes only 50 ms to execute. Optimizing a lot of individual queries will take a lot of time. Because of the big number of queries in relation to the total Time Spend, a better approach is to avoid queries. Every query you can avoid saves SUMQueryTime, but it also saves SUMTimeDiffPrevQuery. Some ideas to lower the number of queries:
    · Combine the data of multiple queries into one. For instance, instead of retrieving every order line in a separate query, retrieve all order lines in one query.
    · Is every query really needed for the scenario of the customer? For instance, is the retrieved description of a general ledger account actually displayed in the user interface?
    · Is it possible to cache some data on the client side? For instance, the default currency of the administration. Data retrieved from the cache saves a query on the database.

    Now you know your main bottlenecks and can start spending your time on them. In most situations the 80/20 rule applies: 80 percent of the performance improvement can be achieved in 20 percent of your code.

Thursday, July 17, 2008

Why Keep It Simple and Fast?

In the past I have seen a lot of applications which do not perform very well. The root cause of these performance issues is related to the following factors:
· Functionality
· Number of records to retrieve from the database
· Database model


Functionality. A lot of applications start as simple applications. Later on a lot of functionality is added, which results in a very complex application with a lot of settings. With every new setting, the matrix of all different combinations gets more complex. The challenge for the functional designer is to design an application which is still easy to understand and easy to use. The more options you have, the bigger the challenge; in the end this is mission impossible. The technical designer has the same challenge: how can I develop an application which supports all options, is secure and has acceptable performance? The more options you have, the more complex your queries will be. In most situations, the performance of a complex query is not as good as the performance of a simple query.

Number of records to retrieve. In a database with a small number of records, the performance is mostly acceptable; the impact of a table scan on a small table is not so big. However, after a while the number of records in your table grows. A table scan is no longer acceptable because its impact on performance is too big. Complex queries on tables with a lot of records result in complex query plans. Generating the query plan can take some time, but the execution of this query plan takes even more.

Database model. During the technical design of a new application, a database model is made which supports the functionality of that application in the best way. Later on, new functionality is added and built on the current database model. This is not always optimal; the best way to support the new functionality might be another database model. However, I can imagine why a redesign is not always possible. Redesigning your database model requires you to rewrite the current code and to convert the data to the new database model. This can have a huge impact on customers with a lot of records in the tables you need to convert.

The challenge is and will always be: find the perfect balance between functionality and performance, in such a way that the user can use your application without any help. Great functionality which does not perform will never be used.