Thursday, February 28, 2013

Performance tips for your Power Pivot sheet


Power Pivot is a really good personal Business Intelligence tool with great performance. However, as with every tool, there are ways to optimize that performance. In Power Pivot you define a BISM (Business Intelligence Semantic Model); please take the following tips into consideration when designing your BISM model:

  • Use views to import data into Power Pivot. A view contains the business logic of how the data is stored in your database. If your business logic changes, you only need to change the views; the Power Pivot sheet will keep working.
  • Use logical column names in the views, for instance [Account code] instead of debnr. Everybody should understand what kind of content is stored in each column.
  • Import only the columns you really need; avoid SELECT * FROM MyView1. As described in my previous blog post, Memory management in Power Pivot, all data is kept in memory. Every column that is not used consumes memory that cannot be used for other purposes.
  • Import columns that are useful for analytical purposes, for instance for customer data: account code, country, state. Columns like street name are not so useful. As described there, such a column creates a lot of distinct values in its dictionary, which has a negative impact on performance.
  • Import DateTime columns as 2 separate columns: one Date column and one Time column. If the time portion is not useful for your analysis, do not import it at all.
  • Import master data in separate tabs, for instance all item attributes in one tab, and use the item key in all transactional tabs. Link the item key from the transactional tab to the item key of the item master tab.
  • Reduce the number of rows to import. If you analyze at month level, group the data in the view to the level you need, for instance by grouping on date and item and summing the amount. This saves a lot of rows to import. Of course, this is not always possible, because you may not want to lose the granularity of your analysis.
  • Reduce the number of rows to import by selecting only the subset you are going to analyze. For instance, your database contains financial transactions as of financial year 2008; if you only need to analyze the current and previous year, import only the last 2 years.
  • Optimize column data types. A column with few distinct values is lighter than a column with a high number of distinct values. This is also important for measures, which can themselves be treated as quantitative attributes. If the measure you are storing is a float and is the result of a calculation, consider reducing the number of digits to be imported. This reduces the size of the dictionary, and possibly also the number of distinct values.
  • Avoid high-cardinality columns. Columns with unique IDs, like invoice numbers, are very expensive. Sometimes you can skip these columns and use the COUNTROWS function instead of DISTINCTCOUNT.
  • Use measures instead of calculated columns if possible. A calculated column is stored like an imported column; this does not apply to measures, which are calculated at query time.
  • If you do need to store a measure in a calculated column, consider reducing the number of digits of the calculation.
  • Normalizing data doesn't have a big effect on the size of the resulting database. However, it can have a strong impact on both processing time and the memory required to process the data. The key is to find the right balance; in most situations a star schema is that balance.
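Several of these tips can be combined in a single import view. A minimal sketch, using hypothetical table and column names (dbo.SalesTransactions, debnr, InvoiceDate) that stand in for whatever your own schema uses:

```sql
-- Hypothetical import view applying several tips at once:
-- logical column names, a limited column set, date without time,
-- pre-grouped rows, and only the last 2 years of data.
CREATE VIEW dbo.SalesForPowerPivot
AS
SELECT
    debnr                     AS [Account code],   -- logical name instead of debnr
    CAST(InvoiceDate AS date) AS [Invoice date],   -- date only, time portion dropped
    ItemCode                  AS [Item code],
    SUM(Amount)               AS [Amount]          -- grouped to reduce the row count
FROM dbo.SalesTransactions
WHERE InvoiceDate >= DATEADD(year, -2, GETDATE()) -- only the subset you will analyze
GROUP BY debnr, CAST(InvoiceDate AS date), ItemCode;
```

In the Power Pivot import wizard you would then simply select this view instead of the underlying tables.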
Enjoy these tips, and make your Power Pivot sheets even more powerful.

Wednesday, February 27, 2013

Memory management in Power Pivot: Column oriented databases.


Power Pivot is a perfect personal Business Intelligence tool. It is simple to use and the performance of the Power Pivot engine is really great. To help you understand this engine better, so you can make even better use of it, I will explain how it works.

Row oriented versus column oriented databases.

All traditional relational databases, including SQL Server, are row-oriented databases. They store data in tables row by row; the row of a table is the main unit of storage. Indexes are used to point to all columns of a certain row, and the definition of the index determines which records belong to it.

A column-oriented database, like Power Pivot, uses a different approach. Every column is treated as a separate entity, and the data of every column is stored separately. I will explain this with an example.
 
ID   Car        Engine   Color
1    Audi A4    Petrol   Silver
2    Audi A4    Gazole   Red
3    Audi A4    Gazole   Blue
4    BMW        Petrol   Silver
5    BMW        Gazole   Silver
6    BMW        Gazole   Red
7    Mercedes   Gazole   Blue

Every column has its own sorted dictionary containing all distinct values, and a bitmap index references the actual value of each row by using a zero-based index into the dictionary. The next table shows the dictionary and index values.
 
Column   Dictionary               Values
ID       32,23,10,43,57,65,71     2,1,0,3,4,5,6
Car      Audi,BMW,Mercedes        0,0,0,1,1,1,2
Engine   Petrol,Gazole            0,1,1,0,1,1,1
Color    Silver,Red,Blue          0,1,2,0,0,1,2

As you can see, the dictionary can be the most expensive part of the index, especially if a column contains a high number of distinct values. The lower the number of distinct values in a column, the smaller the dictionary for that column, and the more efficient the value bitmap index.
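Before importing a column, you can already check in the source database how large its dictionary would become. A sketch against a hypothetical dbo.Cars table holding the example data above:

```sql
-- Estimate dictionary sizes before importing: the distinct count per
-- column equals the number of dictionary entries that column would need.
SELECT
    COUNT(*)               AS [Total rows],            -- 7
    COUNT(DISTINCT Car)    AS [Car dictionary size],   -- 3 (Audi A4, BMW, Mercedes)
    COUNT(DISTINCT Engine) AS [Engine dictionary size],-- 2 (Petrol, Gazole)
    COUNT(DISTINCT Color)  AS [Color dictionary size]  -- 3 (Silver, Red, Blue)
FROM dbo.Cars;
```

A column whose distinct count approaches the total row count (like an invoice number) is a candidate to leave out of the import.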

The xVelocity engine, which is implemented in Power Pivot, is an in-memory database. This means it has been designed and optimized on the assumption that the whole database is loaded in memory. Data is compressed in memory and dynamically uncompressed during each query. Because all data is kept in memory, it is essential to be critical about which data you import into your Power Pivot sheet. For instance, customer attributes like country and state can be useful, but street name is not efficient: every customer has a unique address, which results in a big dictionary with a high number of distinct values.

Enjoy the power of Power Pivot.

Monday, February 25, 2013

How to create XML with a TSQL query?

In this blog post I will describe how you can generate an XML file using TSQL statements. For instance, for data migrations you may need to export data from your SQL database so it can be imported via XML into another system.

The solution is really simple: add "FOR XML" to your SELECT query.

Example 1:
SELECT res_id, sur_name, first_name
FROM Humres
WHERE Res_id > 0
FOR XML PATH ('Resource')
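The original screenshot of the result is no longer available, so here is what this query produces with some hypothetical resource data (SQL Server returns it as one unformatted string; it is wrapped here for readability):

```xml
<Resource><res_id>1</res_id><sur_name>Smith</sur_name><first_name>John</first_name></Resource>
<Resource><res_id>2</res_id><sur_name>Jones</sur_name><first_name>Mary</first_name></Resource>
```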
The "FOR XML" clause always needs to be completed with either the "AUTO" or the "PATH" directive:
  • When using the "AUTO" directive, every column in the SELECT query is handled as an attribute in a single element per record.
  • When using the "PATH('<path name>')" directive, the XML path can be set. Every record starts with its own parent element, labelled as defined in the "PATH" directive, and every column in the SELECT query is handled as a child element.
In this example the "PATH" directive is used, since it allows better control.
 
Example 2:
The next step would be to use custom column names for the XML elements (rather than "res_id", "sur_name", etc.) and to include the resource number as an attribute of the "Resource" element.
 
SELECT res_id AS '@number',
 RTRIM(sur_name) AS 'LastName',
 RTRIM(first_name) AS 'FirstName'
FROM Humres
WHERE Res_id > 0
FOR XML PATH ('Resource')
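With the same hypothetical data as above, this query produces (wrapped for readability):

```xml
<Resource number="1"><LastName>Smith</LastName><FirstName>John</FirstName></Resource>
<Resource number="2"><LastName>Jones</LastName><FirstName>Mary</FirstName></Resource>
```

Note that res_id has become the "number" attribute, while the aliased columns have become elements.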
Explanation:
  • Using a "@" in the column name results in an attribute instead of an element.
  • The RTRIM command trims the value (removing the extra spaces at the end of the value).
  • XML is in many cases case sensitive, so make sure to use the correct attribute and element names.
Example 3:
A final step in creating a basic XML file is to include the root element. A root element can be included by simply adding "ROOT('<root name>')" to the FOR XML clause of the SQL query.
 
SELECT res_id AS '@number',
 RTRIM(sur_name) AS 'LastName',
 RTRIM(first_name) AS 'FirstName'
FROM Humres
WHERE Res_id > 0
FOR XML PATH ('Resource'), ROOT('Resources')
Please note:
Sometimes a second root element is needed, for instance in Exact. Since the XML formatting of SQL queries only allows one root element, this can only be handled by using sub queries (or by adding the second root element manually to the output file). The use of sub queries is explained in the following paragraphs.
 
Example 4: Creating child-elements
There are two ways of generating child elements as part of your parent element (which is defined in the "PATH" directive and applies to every record).
Example 4.1 Child elements with 1:1 relationship
The first way can only be used when there is a 1:1 relationship between the parent element (in our example the resource record) and the child element (in the example below the title record). In this case the child element can be generated by including the element name in the column names (in the SELECT section):

SELECT
h.res_id as '@number',
RTRIM(h.sur_name) as 'LastName',
RTRIM(h.first_name) as 'FirstName',
RTRIM(p.predcode) as 'Title/@code',
p.aan_oms as 'Title/Description',
p.aanhef as 'Title/Salutation'
FROM humres h
LEFT JOIN pred p ON h.predcode = p.predcode
WHERE res_id > 0
FOR XML PATH('Resource'), ROOT('Resources')
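With hypothetical data, each resource element now contains a nested Title element (indented here for readability):

```xml
<Resources>
  <Resource number="1">
    <LastName>Smith</LastName>
    <FirstName>John</FirstName>
    <Title code="MR">
      <Description>Mister</Description>
      <Salutation>Dear Mr.</Salutation>
    </Title>
  </Resource>
</Resources>
```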
Explanation:
  • A LEFT JOIN on the table "pred" has been included to get the prefix data.
  • Correlation names "h" and "p" have been included to easily refer to the correct tables (in this case "humres" and "pred").
  • By including a forward slash ("/") in the custom column names, a child element is generated. The child element name is defined on the left side of the forward slash.
  • Multiple forward slashes can be used in a column name to create deeper child element levels.
Example 4.2 Child elements with 1:N relationship
In case a 1:N relationship exists, such as one customer having multiple contacts, the child elements should be generated by a sub query which gets all matching records. In our example, using the resources table, the resource is the parent element and the roles are the child elements. The sub query should get all roles linked to the resource and generate the corresponding child elements.
First, create the sub query with an XML mark-up to get all roles:
SELECT
r.RoleID as '@code',
r.RoleLevel as '@level',
rd.Description as 'Description'
FROM humres h2
LEFT JOIN HRRoles r ON h2.res_id = r.EmpID
LEFT JOIN HRRoleDefs rd ON r.RoleID = rd.ID
FOR XML PATH('Role'), ROOT('Roles')
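Run on its own with some hypothetical role data, this sub query returns one Roles list covering all resources (it is not yet filtered per resource; indented for readability):

```xml
<Roles>
  <Role code="MGR" level="1"><Description>Manager</Description></Role>
  <Role code="HR" level="2"><Description>HR assistant</Description></Role>
</Roles>
```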
Explanation:
  • The query gets all resources (humres), linked to the roles (hrroles) and the role details (hrroledefs).
  • The correlation name for humres is set to "h2", since this query will become a sub query in a query where "h" already exists.
Next, this query needs to become part of the main query. This is done by including it in the SELECT section of the main query and by making sure the sub query only returns the roles of the specific resource.
 
SELECT
h.res_id as '@number',
RTRIM(h.sur_name) as 'LastName',
RTRIM(h.first_name) as 'FirstName',
RTRIM(p.predcode) as 'Title/@code',
p.aan_oms as 'Title/Description',
p.aanhef as 'Title/Salutation',
(
SELECT
r.RoleID as '@code',
r.RoleLevel as '@level',
rd.Description as 'Description'
FROM humres h2
LEFT JOIN HRRoles r ON h2.res_id = r.EmpID
LEFT JOIN HRRoleDefs rd ON r.RoleID = rd.ID
WHERE h2.res_id = h.res_id
FOR XML PATH('Role'), ROOT('Roles'), TYPE
)
FROM humres h
LEFT JOIN pred p ON h.predcode = p.predcode
WHERE res_id > 0
FOR XML PATH('Resource'), ROOT('Resources')
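Putting it all together with the same hypothetical data, each resource now carries its own nested Roles list (indented here for readability):

```xml
<Resources>
  <Resource number="1">
    <LastName>Smith</LastName>
    <FirstName>John</FirstName>
    <Title code="MR">
      <Description>Mister</Description>
      <Salutation>Dear Mr.</Salutation>
    </Title>
    <Roles>
      <Role code="MGR" level="1">
        <Description>Manager</Description>
      </Role>
    </Roles>
  </Resource>
</Resources>
```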
Explanation:
  • In the WHERE section of the sub query, the filter "h2.res_id = h.res_id" has been added to make sure only the roles of the current resource are taken.
  • In the FOR XML section, the directive "TYPE" has been added. If it is left out, the result of the sub query is alphanumeric (varchar) instead of XML, so the nested XML would be printed as an escaped text string.
  • Note that in this case it is "legal" for a sub query to return multiple records and fields. In standard (non-XML) SELECT queries, a sub query in the SELECT section may only return one record and one field.
Example 5. XML file format
When running a SELECT query with XML directives, the XML output is in Unicode (UTF-8) encoding by default.
When saving the output file, the file will use this encoding. Therefore, make sure to select an encoding type supported by the target application. One of the encoding types supported by Exact is "Western European (Windows)".
 
Enjoy creating your own XML files via TSQL.