Wednesday, October 15, 2008

Log data outside your database.

As described in my previous post about building a server environment that's scalable and reliable, one of the ideas is to log outside your database. In this post I will describe the advantages of doing this.
Logging is essential for managing your infrastructure. Within an application you can divide the data you have into 2 types:
1) Transactional data. For instance, a financial entry, customer information, ...
2) Log data. For instance, the date and time at which the financial entry was created, the time spent creating the financial entry, ...

What are the advantages of logging data outside the database? Doing so means you will have only transactional data in your database.
1) Inserting log records no longer creates a bottleneck in your database.
2) The database transaction log will be much smaller, because most insert and update actions are done on log tables and those tables are no longer in the database. Backups of the transaction log will be smaller and faster, and so will a restore if one is needed.
3) Log data can be backed up less frequently than transactional data. Losing log data will not hurt the users. It will only hurt the manageability of the system.
4) Better use of the data cache on the server. The cache is filled only with transactional data. Log records are no longer part of the data cache.

Of course, every advantage has its own disadvantage. For instance:
1) Separate database(s) for log records, which results in more databases to manage.
2) If log data is written to local files, these local files have to be aggregated into one log table. You need to build a mechanism for that.
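
The aggregation mechanism mentioned in the last point could look roughly like the sketch below. It is only an illustration under my own assumptions: log records are written as JSON lines, each web server has its own `.log` file, and SQLite stands in for the separate log database. The names `write_log`, `aggregate_logs` and `app_log` are made up for this example.

```python
import json
import sqlite3
import time
from pathlib import Path


def write_log(log_file: Path, event: str, duration_ms: float) -> None:
    """Append one log record as a JSON line to a local file (no database call)."""
    record = {"ts": time.time(), "event": event, "duration_ms": duration_ms}
    with log_file.open("a", encoding="utf-8") as handle:
        handle.write(json.dumps(record) + "\n")


def aggregate_logs(log_dir: Path, log_db: sqlite3.Connection) -> int:
    """Load every *.log file in log_dir into one log table; return rows loaded."""
    log_db.execute(
        "CREATE TABLE IF NOT EXISTS app_log "
        "(server TEXT, ts REAL, event TEXT, duration_ms REAL)"
    )
    loaded = 0
    for log_file in sorted(log_dir.glob("*.log")):
        with log_file.open(encoding="utf-8") as handle:
            # The file name (without extension) is assumed to identify the server.
            rows = [
                (log_file.stem, rec["ts"], rec["event"], rec["duration_ms"])
                for rec in map(json.loads, handle)
            ]
        log_db.executemany("INSERT INTO app_log VALUES (?, ?, ?, ?)", rows)
        log_db.commit()
        # Rename after a successful load so the file is not imported twice.
        log_file.rename(log_file.with_suffix(".done"))
        loaded += len(rows)
    return loaded
```

The application only ever calls `write_log`, so the transactional database is never touched for logging. A scheduled job can call `aggregate_logs` on the collected files whenever that is convenient.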

Friday, October 10, 2008

Building a server environment that’s scalable and reliable

Yesterday I joined a webcast about building a server environment that's scalable and reliable. In this session the architects of Technorati and Friendfeed talked about their experiences. They made some interesting points about building reliable and scalable applications.
1) Logging is essential to estimate future usage of the system. However, logging in the database is too IO intensive. Log on the web server and aggregate the log files of all web servers in a separate database.
2) Data that is mostly static does not need to be retrieved from the database. For instance, a top 10 list of the most popular documents. Store these lists locally on the web server. Local file IO is always faster than database retrieval. Build a mechanism to distribute the top 10 list to all your web servers.
3) It should be possible to take back every new feature. If a new feature does not work, or has too big an impact on the performance of the system, you should be able to remove it.
4) Push new code to specific user groups. When you have 700 web servers, you cannot update all of them at the same time.
5) If data is not in the cache, show nothing; otherwise the system can't handle the load. If 10,000 users start retrieving this uncached data from the server, your server will die. Build a mechanism to check whether cached data is available. If it is not, update your cache so your application can make use of it.
6) Measure user performance per geographic region.
7) When designing functionality: keep it simple. It's already complex enough to manage, deploy and maintain.
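
Point 5 can be sketched in a few lines. This is my own minimal interpretation, not what the speakers showed: a request that misses the cache gets nothing back and only registers the key, and a separate background job loads the missing keys once. The class name `ShowNothingCache` and the `load` callback (the slow lookup, for instance a database query) are assumptions for this example.

```python
import threading
from typing import Callable, Dict, Optional, Set


class ShowNothingCache:
    """Serve only what is cached; a miss returns None and is refreshed later."""

    def __init__(self, load: Callable[[str], str]) -> None:
        self._load = load  # hypothetical slow lookup, e.g. a database query
        self._data: Dict[str, str] = {}
        self._pending: Set[str] = set()
        self._lock = threading.Lock()

    def get(self, key: str) -> Optional[str]:
        """Called per request: never touches the slow source."""
        with self._lock:
            if key in self._data:
                return self._data[key]
            # Remember the miss; a set keeps 10,000 misses down to one entry.
            self._pending.add(key)
            return None

    def refresh_pending(self) -> int:
        """Called by one background job: load each missing key exactly once."""
        with self._lock:
            keys = list(self._pending)
            self._pending.clear()
        for key in keys:
            value = self._load(key)  # slow call happens outside the lock
            with self._lock:
                self._data[key] = value
        return len(keys)
```

With this split, 10,000 users asking for the same missing item lead to one call to `load` instead of 10,000. The price is that those users see nothing until the background job has run, which is exactly the trade-off described in point 5.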

The conclusions I drew from this webcast:

It is Near Real Time.

It's all about partitioning of data. These quotes made me realize that we are still thinking too traditionally in the way we develop applications. Think in big numbers.

Monday, October 6, 2008

The difference between solutions from a technical perspective and from a user perspective.

In my daily work I have seen a lot of applications that are brilliant from a functional point of view. However, they are built from a technical point of view. What I mean is that the application does what it should do, but it is not an 'easy to use' application for the standard user. When I test software, I try to test as a user with no affinity for technology. These users use the software to do their job, not because they like to know all the features of the software. For these users, software is like a car. They have a driver's license and need to drive from A to B. The car should always start and bring them from A to B, without them needing to know how the engine works.

I use 2 different people close to me as reference points:
1) My wife. She graduated from university but has no affinity with technology. She always sticks to the defaults. She is afraid of doing something wrong if she changes the default settings. Most settings are too complex, and she has no idea what the impact of changing them is.
2) My father. He started using a PC 25 years ago with the spreadsheet program VisiCalc. He used VisiCalc because it saved him time making financial reports and estimates for his boss. Later on he used Lotus 123 and currently he uses Excel. However, he is still a user and not interested in advanced features. I get a lot of phone calls to assist him with his computer, printer, wireless router and even Excel.

They both do their jobs very well and need to use the PC to do so. When I get a question from them, I ask myself: Why are they asking this question? Why is the software not clear? How can we avoid this kind of question? Be very critical of yourself to understand the root cause of the question. Once I have answered these kinds of questions, I understand why they were asked. In most situations the application is built from a technical perspective instead of a user perspective.

Example: My father creates an email with some pictures of my son. The email is sent, but he gets an email back from my internet provider: the email you sent is too big to deliver. My father does not know what to do and picks up the phone to call me. He has 2 questions: What did I do wrong? How can I send the pictures to you? The technical solution is simple: resize the pictures in the email from 3 MB to 50 KB. This makes the email small enough to be sent successfully. From a user perspective you can ask the following questions:

How does my father know when an email message is too big?
How does my father know that you can resize a picture to a smaller format?
How does my father know how to resize a picture?

What could a solution be?
1) Explain to the user what the maximum size of an email is and point the user to a help file that explains how to resize pictures.
2) Resize pictures in emails automatically to a small format before sending. In 95% of the situations, pictures are sent for viewing and not for printing, so a resized picture is enough.

Solution 1 is a technical solution. In some situations it will work, but it can still be too complex for people like my father.

Solution 2 is a solution from a user perspective. My father is not aware that the pictures are resized. He is happy that the pictures of my son were sent successfully.

When thinking about the solution you should be VERY critical in finding the root cause of the situation. In this case: Why do people attach pictures to emails? Is it for printing or for viewing? In this case I think 95% is for viewing, so in 95% of the situations a resized picture is acceptable. From a user perspective, the supplier of the email software should improve the software to resize pictures automatically and send the original size only if the user requests it.
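
To make the behaviour I have in mind a bit more concrete, here is a rough sketch of the decision the email software could take before sending. Everything in it is an assumption on my side: the 2 MB limit (the real limit depends on the provider), the names `prepare_pictures` and `Prepared`, and the `shrink` function, which stands for whatever image library the software would use to make a viewing-size copy.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

MAX_EMAIL_BYTES = 2_000_000  # assumed limit; the real one depends on the provider


@dataclass
class Prepared:
    attachments: List[bytes]
    warning: Optional[str]  # text to show the user before sending, if any


def encoded_size(pictures: List[bytes]) -> int:
    """Estimate the size in the email: base64 turns every 3 bytes into 4 characters."""
    return sum((len(p) + 2) // 3 * 4 for p in pictures)


def prepare_pictures(
    pictures: List[bytes],
    shrink: Callable[[bytes], bytes],  # hypothetical resize step
    send_original: bool = False,
    max_bytes: int = MAX_EMAIL_BYTES,
) -> Prepared:
    if not send_original:
        # Default path: the user does nothing, pictures are shrunk for viewing.
        return Prepared([shrink(p) for p in pictures], None)
    if encoded_size(pictures) > max_bytes:
        # Tell the user before sending instead of letting the provider bounce it.
        return Prepared(
            list(pictures),
            "These pictures are too big to send at original size. "
            "Send smaller copies instead?",
        )
    return Prepared(list(pictures), None)
```

The default needs no decision from the user, which is the point of solution 2. Only a user who explicitly asks for the original size can run into the size limit, and then the software says so before the email leaves.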