Wednesday, February 27, 2013

Memory management in Power Pivot: Column oriented databases.


Power Pivot is a perfect personal Business Intelligence tool. It is simple to use and the performance of the Power Pivot engine is really great. To better understand this engine, so you can even better make use of it, I will explain how this engine is working.

Row oriented versus column oriented databases.

All traditional relational databases, including SQL Server, are row oriented databases. They store data in tables row by row. The row of a table is the main unit of storage. Indexes are used to point to all columns of a certain row. It depends on the definition of the index which records belongs to this index.

A column-oriented database, like Power Pivot, uses a different approach. Every column is considered as a separate entity. Data is stored for every column in a separate way. I will explain this with an example.
 
ID
Car
Engine
Color
1 Audi A4 Petrol Silver
2 Audi A4 Gazole Red
3 Audi A4 Gazole Blue
4 BMW Petrol Silver
5 BMW Gazole Silver
6 BMW Gazole Red
7 Mercedes Gazole Blue

 Every column will have it's own sorted dictionary with all distinct values and a bitmap index references the actual values of each item in the column by using a zero-based index to the dictionary. Next table will show the dictionary values and index values.
 
Column
Dictionary
Values
ID
32,23,10,43,57,65,71
2,1,0,3,4,5,6
Car
Audi,BMW,Mercedes
0,0,0,1,1,1,2
Engine
Petrol, Gazole
0,1,1,0,1,1,1
Color
Silver, Red, Blue
0,1,2,0,0,1,2

As you can see, the dictionary can be the most expansive part of the index. Especially if a high number of distinct values exists in a column. The lower the number of distinct values in a column the smaller the size of dictionary for this column. This will make the value bitmap index more efficient.

The xVelocity engine, which is implemented on Power Pivot, is an in-memory database. This means that it has been designed and optimized assuming that the whole database is loaded in memory. Data is compressed in memory and dynamically uncompressed during each query. Because all data is kept in memory it is essential to be critical which data to import in your Power Pivot sheet. For instance customer data can be useful like, country, state. However street name is not efficient. Every customer will have a unique address which will result in a big dictionary without a low number of distinct values. It will have a high number of distinct values.

Enjoy the power of Power Pivot.

No comments: