Thursday, February 28, 2013

Performance tips for your Power Pivot sheet


Power Pivot is a really good personal Business Intelligence tool with great performance. However, as with every tool, there are tips to optimize performance. In Power Pivot you define a BISM (Business Intelligence Semantic Model); please take the following tips into consideration when designing your BISM model:

  • Use views to import data in Power Pivot. The view contains the business logic of how the data is stored in your database. If your business logic changes, you only need to change the views; the Power Pivot sheet will still work. See the first view sketch after this list.
  • Use logical column names in the views, for instance [Account code] instead of debnr. Everybody should understand what kind of content is stored in each column.
  • Import only the columns you really need; avoid SELECT * FROM MyView1. As described in my previous blog post, Memory management in Power Pivot, all data is kept in memory. Every column that is not used consumes memory that cannot be used for other purposes.
  • Import columns that are useful for analytical purposes, for instance for customer data: Account code, Country, State. Columns like street name are not very useful: as described here, such a column creates a lot of distinct values in its dictionary, which has a negative impact on performance.
  • Import DateTime columns as two separate columns: one Date column and one Time column. If the time portion is not useful for your analysis, do not import it at all (see the transaction view sketch after this list).
  • Import master data in separate tabs, for instance all item attributes in one tab, and use the item key in all transactional tabs. Link the item key from the transactional tab to the item key of the Item master tab.
  • Reduce the number of rows to import. If you analyze at month level, group the data in the view to the level you want, for instance group by Date and Item and sum the Amount. This saves a lot of rows to import. Of course, this is not always possible, because you may not want to lose the granularity of analysis. The transaction view sketch after this list shows this grouping.
  • Reduce the number of rows to import by selecting only the subset you are going to analyze. For instance, your database contains financial transactions as of financial year 2008. If you only need to analyze the current and previous year, import only the last two years.
  • Optimize column data types. A column with few distinct values is lighter than a column with a high number of distinct values. This is important for measures as well, since they are also possible quantitative attributes. If the measure you are storing is a float and is the result of a calculation, consider reducing the number of digits to be imported. This reduces the size of the dictionary, and possibly also the number of distinct values (see the rounding sketch after this list).
  • Avoid high-cardinality columns. Columns with unique IDs like invoice numbers are very expensive. Sometimes you can skip such columns and use the COUNTROWS function instead of DISTINCTCOUNT, as shown in the DAX sketch after this list.
  • Use measures instead of calculated columns if possible. A calculated column is stored in the model just like an imported column; a measure, by contrast, is calculated at query time (see the last sketch after this list).
  • In case you need to store a measure in a calculated column, consider reducing the number of digits of the calculation.
  • Normalizing data doesn't have a big effect on the size of the resulting database. However, it might have a strong impact on both processing time and the memory required to process the data. The key is to find the right balance; a star schema is in most situations the right balance.
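
To illustrate the first three tips, here is a minimal sketch of such a view, assuming a SQL Server source. The table and column names (Debtors, debnr, land, staat) are made up for the example:

    -- A view that exposes only the columns needed for analysis
    -- and renames cryptic source columns to logical names.
    CREATE VIEW dbo.vw_Customers
    AS
    SELECT
        debnr AS [Account code],  -- logical name instead of debnr
        land  AS [Country],
        staat AS [State]
    FROM dbo.Debtors;

In Power Pivot you then import dbo.vw_Customers instead of the table, so a change in the underlying table only requires a change in the view.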
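
The tips about splitting DateTime columns, grouping and importing only a subset can also be handled in the view. A sketch, again with assumed table and column names (Transactions, transdate, itemcode, amount):

    -- Keeps only the date portion, only the current and previous year,
    -- and groups the rows to date/item level before import.
    CREATE VIEW dbo.vw_TransactionsByDayItem
    AS
    SELECT
        CAST(transdate AS date) AS [Date],   -- time portion dropped
        itemcode                AS [Item code],
        SUM(amount)             AS [Amount]
    FROM dbo.Transactions
    WHERE YEAR(transdate) >= YEAR(GETDATE()) - 1  -- last two years only
    GROUP BY CAST(transdate AS date), itemcode;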
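
If a calculated float must be imported, rounding it in the view keeps the dictionary small. A sketch with assumed column names:

    -- Rounding the result of a calculation to two digits reduces the
    -- number of distinct values in the column's dictionary.
    SELECT ROUND(amount * exchangerate, 2) AS [Amount]
    FROM dbo.Transactions;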
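
For the high-cardinality tip, a DAX sketch, assuming a Sales table that contains one row per invoice (table and column names are made up):

    -- Expensive: needs the unique InvoiceNumber column in the model
    Invoices := DISTINCTCOUNT ( Sales[InvoiceNumber] )

    -- Cheaper: count the rows instead; the InvoiceNumber column
    -- does not need to be imported at all
    Invoices := COUNTROWS ( Sales )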
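
Finally, the difference between a calculated column and a measure, again as a DAX sketch with assumed column names:

    -- Calculated column: evaluated row by row and stored in the model,
    -- consuming memory just like an imported column
    Sales[Line amount] = Sales[Quantity] * Sales[Price]

    -- Measure: nothing is stored; the result is calculated at query time
    Total amount := SUMX ( Sales, Sales[Quantity] * Sales[Price] )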
Enjoy these tips, and make your Power Pivot sheets even more powerful.
