
Thursday, February 28, 2013

Performance tips for your Power Pivot sheet


Power Pivot is a really good personal Business Intelligence tool with great performance. However, as with every tool, there are tips to optimize performance. In Power Pivot you define a BISM (Business Intelligence Semantic Model). Please take the following tips into consideration during the design of your BISM model:

  • Use views to import data into Power Pivot. The view contains the business logic of how the data is stored in your database. If changes are made to your business logic, you only need to change the views; the Power Pivot sheet will still work.
  • Use logical column names in the views. For instance [Account code] instead of debnr. Everybody should understand what kind of content is stored in each column.
  • Import only the columns you really need and avoid SELECT * FROM MyView1. As described in my previous blog post Memory management in Power Pivot, all data is kept in memory. Every column which is not used consumes memory that cannot be used for other purposes.
  • Import only columns which are useful for analytical purposes. For instance, for customer data: Account code, Country, State. Columns like street name are not so useful; as described here, they create a lot of distinct values in the dictionary for that column, which has a negative impact on performance.
  • Import DateTime columns as two separate columns: one Date column and one Time column. If the time portion is not useful for your analysis, do not import it at all.
  • Import master data into separate tabs. For instance, put all item attributes in one tab and use the item key in all transactional tabs. Link the item key of the transactional tabs to the item key of the Item master tab.
  • Reduce the number of rows to import. If you analyze on month level, group the data in the view to the level you need, for instance by grouping on Date and Item and summing the Amount. This saves a lot of rows to import (a sketch of such a view follows this list). Of course, this is not always possible because you may not want to lose the granularity of analysis.
  • Reduce the number of rows to import by selecting only the subset you are going to analyze. For instance, your database contains financial transactions as of financial year 2008. If you only need to analyze the current and previous year, import only the last 2 years.
  • Optimize column data types. A column with few distinct values will be lighter than a column with a high number of distinct values. This is also important for measures, which are themselves possible quantitative attributes. If the measure you are storing is a float and is the result of a calculation, consider reducing the number of digits to be imported. This reduces the size of the dictionary, and possibly also the number of distinct values.
  • Avoid high-cardinality columns. Columns with unique IDs like invoice numbers are very expensive. Sometimes you can skip these columns and use the COUNTROWS function instead of DISTINCTCOUNT.
  • Use measures instead of calculated columns if possible. A calculated column is stored like an imported column, whereas a calculated measure is calculated at query time.
  • In case you need to store a measure in a calculated column, consider reducing the number of digits of the calculation.
  • Normalizing data doesn't have a big effect on the size of the resulting database. However, it might have a strong impact on both processing time and the memory required to process the data. The key is to find the right balance; a star schema is in most situations the right balance.
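As an illustration, here is a minimal sketch of an import view that applies several of the tips above: logical column names, only the columns needed, a date without the time portion, grouping to reduce rows, a 2-year filter and a rounded measure. The table and column names are hypothetical, so adjust them to your own database.

CREATE VIEW dbo.vw_SalesForPowerPivot
AS
SELECT
    CAST(t.TransactionDate AS date) AS [Transaction date],   -- Date only, no time portion
    t.ItemCode                      AS [Item code],           -- Link to the Item master tab
    t.DebtorNumber                  AS [Account code],        -- Logical name instead of debnr
    ROUND(SUM(t.Amount), 2)         AS [Amount]               -- Fewer digits, smaller dictionary
FROM dbo.SalesTransactions AS t
WHERE t.TransactionDate >= DATEADD(YEAR, -2, GETDATE())      -- Import only the last 2 years
GROUP BY
    CAST(t.TransactionDate AS date),
    t.ItemCode,
    t.DebtorNumber;

Importing from such a view keeps the business logic in the database; if the logic changes, only the view needs to be updated and the Power Pivot sheet keeps working.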
Enjoy it, to make your Power Pivot sheets even more powerful.

Wednesday, February 27, 2013

Memory management in Power Pivot: Column oriented databases.


Power Pivot is a perfect personal Business Intelligence tool. It is simple to use and the performance of the Power Pivot engine is really great. To help you understand this engine better, so you can make even better use of it, I will explain how it works.

Row oriented versus column oriented databases.

All traditional relational databases, including SQL Server, are row-oriented databases. They store data in tables row by row; the row of a table is the main unit of storage. Indexes are used to point to all columns of a certain row, and the definition of the index determines which records belong to it.

A column-oriented database, like Power Pivot, uses a different approach. Every column is considered as a separate entity. Data is stored for every column in a separate way. I will explain this with an example.
 
ID | Car      | Engine | Color
 1 | Audi A4  | Petrol | Silver
 2 | Audi A4  | Gazole | Red
 3 | Audi A4  | Gazole | Blue
 4 | BMW      | Petrol | Silver
 5 | BMW      | Gazole | Silver
 6 | BMW      | Gazole | Red
 7 | Mercedes | Gazole | Blue

 Every column has its own sorted dictionary with all distinct values, and a bitmap index references the actual value of each item in the column by using a zero-based index into the dictionary. The next table shows the dictionary values and index values.
 
Column | Dictionary           | Values
ID     | 32,23,10,43,57,65,71 | 2,1,0,3,4,5,6
Car    | Audi,BMW,Mercedes    | 0,0,0,1,1,1,2
Engine | Petrol,Gazole        | 0,1,1,0,1,1,1
Color  | Silver,Red,Blue      | 0,1,2,0,0,1,2
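
To get a feeling for how values are replaced by small dictionary indexes, the following T-SQL sketch rebuilds such an index for the Color column, assuming a hypothetical Cars table containing the 7 example rows above. The real xVelocity dictionary order is an internal detail, so the actual indexes may differ, but the principle of mapping each distinct value to a small integer code is the same.

SELECT
    ID,
    Color,
    DENSE_RANK() OVER (ORDER BY Color) - 1 AS ColorDictionaryIndex   -- Zero-based code per distinct value
FROM dbo.Cars
ORDER BY ID;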

As you can see, the dictionary can be the most expensive part of the index, especially if a column has a high number of distinct values. The lower the number of distinct values in a column, the smaller the dictionary for that column, and the more efficient the value bitmap index.

The xVelocity engine, which is implemented in Power Pivot, is an in-memory database. This means that it has been designed and optimized assuming that the whole database is loaded in memory. Data is compressed in memory and dynamically uncompressed during each query. Because all data is kept in memory, it is essential to be critical about which data you import into your Power Pivot sheet. For instance, customer attributes like country and state can be useful, but street name is not efficient: almost every customer has a unique address, which results in a big dictionary with a high number of distinct values.
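
Before importing a column, you can get an idea of the dictionary size it will produce by checking its number of distinct values. A quick sketch, assuming a hypothetical Customers table with illustrative column names:

SELECT
    COUNT(*)                   AS TotalRows,
    COUNT(DISTINCT Country)    AS DistinctCountries,   -- Few distinct values, small dictionary
    COUNT(DISTINCT [State])    AS DistinctStates,      -- Few distinct values, small dictionary
    COUNT(DISTINCT StreetName) AS DistinctStreetNames  -- Almost as many values as rows, expensive
FROM dbo.Customers;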

Enjoy the power of Power Pivot.

Friday, July 1, 2011

More tips to improve performance of SSRS reports.

In the past I have written about performance in combination with SQL Server Reporting Services; see Analyze performance of your reporting services reports by using SSRS statistics and Tips to improve performance of MS Reporting service reports (SSRS). In this blog post I will give some more tips to analyze and improve the performance of your SQL 2008 R2 SSRS reports.

To analyze the performance of your SSRS report (RDL) you need to understand what has an impact on the total time to generate the report. The total time to generate a reporting server report (RDL) can be divided into 3 elements:
  1. Time to retrieve the data (TimeDataRetrieval).
  2. Time to process the report (TimeProcessing)
  3. Time to render the report (TimeRendering)  
Total time = (TimeDataRetrieval) + (TimeProcessing) + (TimeRendering)

As of SQL Server 2008 R2, these 3 performance components are logged every time a deployed report is executed. This information can be found in ExecutionLog3 in the ReportServer database. You can use the following query:

SELECT TOP 10 Itempath,parameters,
     TimeDataRetrieval + TimeProcessing + TimeRendering as [total time],
     TimeDataRetrieval, TimeProcessing, TimeRendering,
     ByteCount, [RowCount],Source, AdditionalInfo
FROM ExecutionLog3
ORDER BY Timestart DESC


1. Itempath
This is the location and name of the executed report (RDL)

2. Parameters
The parameter values used to execute the report.

3. TimeDataRetrieval
The number of milliseconds spent interacting with data sources and data extensions for all data sets in the main report and all of its subreports. This value includes:
  • Time spent opening connections to the data source
  • Time spent reading data rows from the data extension
TimeDataRetrieval is the time needed for SQL Server to retrieve the data of all datasets in your report. Keep in mind that every dataset defined in your report will be retrieved, even if it is not used to display data, so remove datasets which are not used. Datasets are executed in parallel, by making use of multiple database connections. In the next example you will see that my report started at 2011-06-29 14:16:42.677 and ended at 2011-06-29 14:16:46.757, so the total time to retrieve the data is 4 seconds and 80 milliseconds. However, the summed duration of the individual datasets is much higher: 235 + 1242 + 2442 + 3470 + 3678 + 4069 = 15 seconds and 136 milliseconds.


In this example I have a server with multiple CPUs. Every dataset is retrieved using a separate database connection (see SPID 61, 63, 64, 65, 66, 67, 68). Executing these datasets over multiple database connections saves a lot of time. TimeDataRetrieval contains the duration of the longest dataset.

4. TimeProcessing
The number of milliseconds spent in the processing engine for the request. This value includes:
  • Report processing bootstrap time 
  • Tablix processing time (e.g. grouping, sorting, filtering, aggregations, subreport processing), but excludes on-demand expression evaluations (e.g. TextBox.Value, Style.*)
  • ProcessingScalabilityTime (see AdditionalInfo column)
Processing time can be very high if you have datasets with a large number of records on which a lot of grouping and sorting has to be done.



5. TimeRendering
The number of milliseconds spent after the Rendering Object Model is exposed to the rendering extension. This value includes:

  • Time spent in renderer
  • Time spent in pagination modules
  • Time spent in on-demand expression evaluations (e.g. TextBox.Value, Style.*). This is different from prior releases of SQL Server, where TimeProcessing included all expression evaluation.
  • PaginationScalabilityTime (see AdditionalInfo column)
  • RenderingScalabilityTime (see AdditionalInfo column)
 6. ByteCount
Total number of bytes received from all datasets in the report.

7. RowCount
Total number of records received from all datasets in the report.

8. Source
Specifies the type of the execution. It is one of the following values: Live, Cache, Snapshot, History, AdHoc, Session, Rdce
  • Live indicates live executed dataset queries.
  • Cache indicates a cached execution, i.e. dataset queries are not executed live.
  • AdHoc indicates either a dynamically generated report model based drillthrough report, or a Report Builder 2.0 report that is previewed on a client utilizing the report server for processing and rendering.
  • Session indicates a subsequent request within an already established session (e.g. the initial request is to view page 1, and the subsequent request is to export to Excel with the current session state).
  • Rdce indicates a Report Definition Customization Extension (RDCE; a new feature in RS 2008) that can dynamically customize a report definition before it is passed to the processing engine upon report execution.
The information in ExecutionLog3 will help you to find the bottleneck in your report; a sketch of a query that aggregates these timing components per report follows the list below. What to do in the situation of:
  1. High TimeDataRetrieval. 
    • Remove unused datasets from your report. Every dataset will be executed, even if it is not used to display data.
    • Use SQL Profiler to analyze the data set query to see where you can improve your data set. See Guidelines to write well performing queries.
    • Check whether the RowCount (the number of records returned by all datasets) is not too big. Ask yourself whether all data is needed in the report. Data which is not needed should not be retrieved.
    • A dataset is executed only once and can be used for multiple report parts like a tablix, chart, list, sparkline, indicator, data bar, map or gauge. Check whether you can combine two or more almost identical datasets into one.
  2. High TimeProcessing.
    • Processing the data is done sequentially. The more report parts like a tablix, chart, list, sparkline, indicator, data bar, map or gauge you put on your report, the more time it will take to process the report.
    • Minimize the number of records your dataset has to process. Do you need all data, or can you group and sort the data in your dataset? This will reduce the result set, which results in a faster processing time. In most situations the SQL Server engine will group data much faster than Reporting Services.
  3. High TimeRendering.
    • Be critical if all rendering is needed.
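As announced above, here is a sketch of a query that aggregates the ExecutionLog3 timing components per report, so you can see at a glance which reports are slow and which component (data retrieval, processing or rendering) dominates. The 30-day filter is just an example.

SELECT
    ItemPath,
    COUNT(*)                                                AS Executions,
    AVG(TimeDataRetrieval)                                  AS AvgTimeDataRetrieval,
    AVG(TimeProcessing)                                     AS AvgTimeProcessing,
    AVG(TimeRendering)                                      AS AvgTimeRendering,
    AVG(TimeDataRetrieval + TimeProcessing + TimeRendering) AS AvgTotalTime
FROM ExecutionLog3
WHERE TimeStart >= DATEADD(DAY, -30, GETDATE())
GROUP BY ItemPath
ORDER BY AvgTotalTime DESC;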
Enjoy it, to deliver a great user experience to your users by improving the performance of your reports.

Thursday, June 9, 2011

Dashboard design rules, the do's and don'ts


If you are going to build dashboards you need to read the book Information Dashboard Design, written by Stephen Few. Dashboards are very popular because they can be very powerful. However, this potential is rarely realized. A dashboard should directly tell you what you need to know; if that is not the case, the dashboard will never be used again. This book will teach you the visual design skills you need to create dashboards that communicate clearly, rapidly and compellingly. It will explain how to:
  • Avoid the thirteen mistakes common to dashboard design
  • Provide viewers with the information they need quickly and clearly
  • Apply what we know about visual perception to the visual presentation of information
  • Minimize distractions, clichés, and unnecessary embellishments that create confusion
  • Organize business information to support meaning and usability
  • Create an aesthetically pleasing viewing experience
  • Maintain consistency of design to provide accurate interpretation
  • Optimize the power of dashboard technology by pairing it with visual effectiveness
The book starts with the definition of a dashboard. Without knowing the definition, you do not know what to build.

Visual display
of
the most important information needed to achieve one or more objectives
which
fits entirely on a single computer screen
so it can be
monitored at a glance


I will share some observations I had while reading the book, which I did with a lot of pleasure.
  • We do not see with our eyes. We see with our brains. If you need to think about how to read the data, you have not used the best display media.
  • Dashboards display information needed to achieve objectives. What do you prefer? 1) A nice, fancy dashboard which will not be re-used. 2) A functional dashboard which helps you to reach your goal.
  • A dashboard fits on a single computer screen. No scroll bars etc.
  • Colors have a function. Do not use them just because they look nice. Colors should tell you something, for instance grab attention.
  • Blank space is better than meaningless decoration. Do not use pictures or logos; it is a waste of valuable space. If you need to use a logo, make it small and visually subtle, and place it somewhere out of the way.
  • Use gridlines with care. Mostly they do nothing but distract from the data.
  • Think about what you want to show or compare. Based on that you need to choose the best display media. Do not use a pie chart because it looks nice. Mostly a bar graph is much better.
  • 3D Graphs are nice but 2D are easier to read and understand.
  • Dashboards are used to monitor information at a glance. Information is abbreviated in the form of summaries or exceptions.
  • Numbers should not be center-justified in the columns. Right-justified is easier to compare when scanning up and down a column.
  • The drop shadows on a graph are visual fluff. These elements serve only to distract.
  • In general Keep It Simple.
To test your dashboard design skills you can do the Graph Design IQ Test on the website of the writer Stephen Few.

Enjoy reading the book and do not forget to do the Graph Design IQ Test

Thursday, May 6, 2010

Tips to improve performance of MS Reporting service reports (SSRS).

I often get questions from people about how to improve the performance of SSRS reports. Of course every report is different, but some tips can be applied to every report. Assume we have a report which takes X seconds to display all data on your screen. The total time to generate an SSRS report can be split into 2 main parts:

  1. Time to execute all queries on the SQL Server. Use the SQL Profiler to measure the duration of all executed queries. (Y seconds)
  2. Time to generate the report on the reporting server, based on the result set of the executed queries. (Z seconds)
Because we know the total time to execute the report and the total duration of all executed queries, you can calculate the time to generate the report (Z seconds). (X = Y + Z). After you know the values for X, Y and Z, you can start focusing on the biggest part.

  • Use the SQL Profiler to see which queries are executed when the report is generated. Sometimes you will see more queries being executed than you expected. Every dataset in the report will be executed. A lot of times new datasets are added while building a report. Check if all datasets are still being used, for instance datasets for available parameter values. Remove all datasets which are not used anymore.
  • A dataset contains more columns than are used in the tablix\list. A lot of times you will see datasets with the following syntax:

    SELECT * FROM TableX WHERE Column1 = 'Something'.

    This is easy when you start building the report: it retrieves all columns of the table. However, not all columns are useful, and retrieving them creates a lot of overhead. Assume you use 3 columns in the tablix\list. Change the syntax of the dataset to select only these 3 columns. This can save bookmark lookups and it will save disk I/O on the SQL Server. So you get a dataset with the following syntax:

    SELECT Column1, Column5, Column8 FROM TableX WHERE Column1 = 'Something'
  • The ORDER BY in the dataset differs from the ORDER BY in the tablix\list. You need to decide where the data will be sorted. It can be done within SQL Server with an ORDER BY clause in the dataset, or by the Reporting Services engine. It is not useful to do it in both places. If an index is available, use the ORDER BY in your dataset.
  • Use the SQL Profiler to measure the performance of all datasets (Reads, CPU and Duration). Use the SQL Server Management Studio (SSMS) to analyze the execution plan of every dataset.
  • Avoid datasets that return a lot of records, for instance more than 250. Bigger result sets are more like a data export (dump). A lot of times data is grouped in the report without a drill-down option. In that scenario, do the GROUP BY in your dataset (see the sketch after this list). This will save a lot of data transfer from the SQL Server and it will save the reporting server engine from grouping the result set.
  • Rendering of the report can take a while if the result set is very big. Look very critically at whether such a big result set is necessary. If details are used in only 5% of the situations, create another report to display the details. This will avoid the retrieval of all details in 95% of the situations.
  • Use the database documentation of your applications to understand how to retrieve information from the database. For Exact Globe click here to find the database documentation. For Exact Synergy Enterprise click here to find the database documentation.
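As mentioned above, here is a sketch of a dataset query that groups the data on the SQL Server instead of in the report. The SalesTransactions table, its columns and the date filter are hypothetical; the idea is to return one row per group rather than every detail record.

SELECT
    YEAR(TransactionDate)  AS [Year],
    MONTH(TransactionDate) AS [Month],
    ItemCode               AS [Item code],
    SUM(Amount)            AS [Amount]
FROM dbo.SalesTransactions
WHERE TransactionDate >= '2010-01-01'
GROUP BY
    YEAR(TransactionDate),
    MONTH(TransactionDate),
    ItemCode
ORDER BY [Year], [Month], [Item code];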
Enjoy building high performing reports.