Wednesday, 23 September 2009

MDX: Performance Guide

MDX Performance Hints
Here are some simple steps you can take to ensure that you get the best performance from your calculations in SQL Server Analysis Services. Some of these exploit new functions or features in SQL Server Analysis Services 2005 while others apply to both this release and SQL Server Analysis Services 2000.


Use your attributes!
In SQL Server Analysis Services 2000, the filter function was a common means of selecting members from a set whose member property satisfied some condition; for example male customers could be expressed as :

Filter(Customer.name.Name.members, Customer.name.currentmember.properties(“Gender”) = “Male”)


Don’t do this in SQL Server Analysis Services 2005. Instead, create an attribute hierarchy Customers.Gender and do this:

(Customers.Gender.Male, Measures.Sales)

The attribute hierarchy can be made not visible if they should not be seen by end-users and still be available in MDX expressions.

The average sales to male customers can be expressed as this:

Avg(Customer.Name.Name.members*Customers.Gender.Male, Measures.Sales)


Why is .properties so much slower? It’s because getting the value of a member forces each member to be retrieved so the property value can be retrieved and queried. Exists is much faster because internal storage structures can be exploited directly


Use Is Comparison Operator
When comparing members, compare the objects using the IS comparison operator. Don’t do this:

Iif( [Currency Code].currentmember.Name = “USA”], …)

Do this:

Iif( [Currency Code].currentmember IS [Currency Code].[USA], …)

Translating the members to strings takes time and doesn’t need to be done.


Use NonEmpty Function

The NonEmpty function (new in SQL Server Analysis Services 2005) is optimized for removing empty tuples. So instead of doing this to get the customers who bought an Xbox,
Filter(Customer.Name.members, not IsEmpty( ([Measures].[Unit Sales], [Product].[Name].[Xbox])

Do this:

NonEmpty (Customer.Name.members, ([Measures].[Unit Sales], [Product].[Name].[Xbox]))

Note that Filter(, Not Isempty()) is equivalent to NonEmpty(,) and the engine does the optimization automatically.


Another example. Instead of this:

Count(Filter([year and month].[Month],Not IsEmpty(([Project Count], [Layer].[Layer].[web])) or Not IsEmpty(([Project Count], [Layer].[Layer].[server]))))

Do this:

Count(NonEmpty ([year and month].[Month],{([Project Count], [Layer].[Layer].[web]), ([Project Count], [Layer].[Layer].[server])})

Calculation Non Empty Behavior
Whether or not an expression resolves to null is important for two major reasons. First, most client applications use the “non empty” key word in a query. If you can tell the engine that you know an expression will evaluate to null, then needn’t be computed and can be eliminated from the query results before the expression is evaluated..

Second, the calculation engine can use the knowledge of a calculation’s non empty behavior even when the “non empty” keyword is not used. If a cell’s expression evaluates to null, then it needn’t be computed during query evaluation.

Note that the current distinction between how the engine uses an expression’s non empty behavior is really an artifact of the engine’s design and we’re working to eliminate the distinction in a future release. For now, we make this distinction because one or both optimizations is made depending how the calculation property non_empty_behavior (NEB) is defined - we’ll call the first “Non Empty” and the second “Calculation Engine”

When an expression’s NEB is defined, the author is guaranteeing the result is null when the NEB is null and not null when NEB is not null. This information is used internally by the calculation engine to build the query plan.

What can/should be specified in Non_Empty_Behavior (“NEB”) clauses is first and foremost determined by correctness/semantics aspects, before taking into account any performance goals.

Calculation type NEB applies to
NEB expressions recognized
Calculation Engine SP2 support
Non Empty SP2 support
Example
Calculated Measure
Constant measure reference
Yes
Yes
With Member Measures.DollarSales As Measures.Sales / Measures.ExchangeRate,
NEB = Measures.Sales
Calculated Measure
{List of two or more constant measure references}
No
Yes
With Member Measures.Profit As Measures.Sales – Measures.Cost,
NEB = {Measures.Sales, Measures.Cost}
Any (calculated member, script assignment, calculated cell)
Constant tuple reference
Constant single-measure set
Yes
No
Scope [Measures].[store cost];
This = iif( [Measures].[Exchange Rate]>0, [Measures].[Store Cost]/[Measures].[Exchange Rate], null );
Non_Empty_Behavior(This) = [Measures].[Store Cost];
End Scope;

Here’s a summary of representative, common cases for using NEB. Note that defining NEB is never necessary to achieve the correct calculation results but defining it incorrectly can result in incorrect results. It should only be used if correctness implications are well understood and performance is poor without it.

Measures.M1 + or – Measures.M2
- In general, MUST specify both measures in NEB, for correctness reasons. Using SP2 is required for this.
- In particular, if both measures belong to the same measure group, MAY specify just one of them in NEB (could have better performance).

Measures.M1 * Measures.M2
- In general, one cannot specify any correct NEB for this expression because the result is null when either M1 or M2 is null.
- However, if it is guaranteed that one of the measures is never null (e.g. a currency exchange rate), then the other measure may be used in NEB.
- In particular, if it guaranteed that, for any given cell, either both measures are null, or both are non-null (e.g. they belong to the same measure group), then one may specify both measures in NEB (SP2 required), or specify a single measure.

Measures.M1 / Measures.M2
- The first measure (the numerator, M1) must be used when defining NEB.

Be very careful. This property is generally abused by those that use it. If this property is defined incorrectly, query results can be incorrect.


Don’t use LookupCube
It’s expensive. For any non-trivial usage, create a single cube that has the values you need.

Conditional Calculations
The IIF function can play havoc with existing engine calculation optimizations – the cause of the problem is that the optimizations can’t be brought to bear without knowing the expression that applies to a cell when the cell is being evaluated. We’re looking at the resolving this in a future release but in the meantime there may be some steps you can take.

Conditional Calculations and Scoping
The IIF is sometimes used to scope a calculation on a set of members by relying on the condition; for example:

This =
IIF([account].[Account Type].currentmember IS [Account].[Account Type].[Flow],
,
)

Don’t do this. This can expressed differently with much better performance:

Scope ([Account].[Account Type].[Flow]);
This = ;
End Scope;

Scope ([Account].[Account Type].[Account Type].members –
[Account].[Account Type].[Flow]);
This = ;
End Scope;


Conditional Calculations and NULL
If you can, having one of the arguments as NULL can significantly improve performance:

IIF ( , , NULL)

If this can’t be done, oddly enough this expression can be much faster (but only in AS2005 SP2 and later) and is worthy of experimentation:

IIF ( , , NULL) + IIF ( NOT , NULL, )

(If is non-trivial – consider pulling it out as a calculated member.)

Conditional Calculations and Non Empty Behavior
If the expression has a simple non empty behavior (a subject matter discussed elsewhere in this document), then exploit it. For example, this expression flips the sign of an expression based on a condition:

IIF (, , -)

If has a non empty behavior, than do this:

* IIF(, 1, -1), non_empty_behavior =

Single Branch Conditions
If you must use a condition and the calculation only applies when a condition evaluates to true, do not do something like this to have the cell retain its value:

This = IIF(
,
,
CalculationPassValue(measures.currentmember,-1,relative)
);

But do this instead:

IF THEN this = ;


Use ValidMeasure to Reduce Calculation Space
In SQL Server Analysis Services 2005, each measure group has the property IgnoreUnrelatedDimensions that defines how dimensions unrelated the measure are treated – either the dimension is ignored, or moving off the default member results ion a null value for the measure.

When defining calculations, don’t rely on this behavior but use the validmeasure function. For example:

scope(leaves([Time]), [Currency].[Currency].members - [Currency].[Currency].[Currency].[USA]
Scope [Measures].[store Sales];

This = iif( isempty(validmeasure([Measures].[Exchange Rate])), null, [Measures].[Store Sales]/validmeasure([Measures].[Exchange Rate]));
End Scope;



Use Sum or Aggregate instead of Additions
Instead of doing this:

Create Member measures.x as
(Sales, Country.USA) + (Sales, Country.Canada) + (Sales, Country.Mexico)…

Do this:

Create Member measures.x as Sum({Country.USA, Country.Canada, Country.Mexico}, Sales)


Clump Hierarchies in Big Crossjoins
If you can, keep hierarchies from the same dimension together. Why? Because internally, the engine does an exists between adjacents sets in a crossjoin. If a hierarchy from a different dimension is inserted between two hierarchies from the same dimension, this exists function happens afterwards and not before. This can greatly expand the space and impact performance.

Caching Calculation Results
The formula engine has an internal cache to reuse calculation results. But to cache something, the result has to be an addressable cell (or tuple) in the cube space.

For example, I have an application that calculates the ratio of a value to the maximum that value is over a set and renders the result as a histogram. The first way I tried to do this was with a calculation something like this:

with
member measures.y as
measures.[unit sales]
/
max(customers.[name].[name].members, measures.[unit sales])
select
measures.y on 0,
customers.[name].[name].members on 1
from sales

This took about 15 seconds on my laptop. The calculation is inefficient in that the maximum value does not change over the set but the expression forces its reevaluation as many times as there are customers (and there are over 10,000 of them). This approach is much better:

with
member measures.x as
max(customers.[name].[name].members, measures.[unit sales])
member measures.y as
measures.[unit sales]
/
(measures.x,[Customers].[Customers].[All Customers] )
select
measures.y on 0,
customers.[name].[name].members on 1
from sales

This takes less than a second! Over an order of magnitude performance improvement! Fun!

Now here’s a quiz – why must the expression for measures.y contain the member [Customers].[Customers].[All Customers] in the denominator? In other words, why can’t I do this:
with
member measures.x as
max(customers.[name].[name].members, measures.[unit sales])
member measures.y as
measures.[unit sales]
/
(measures.x)
select
measures.y on 0,
customers.[name].[name].members on 1
from sales


It is because we want to reference a cached value. Without [Customers].[Customers].[All Customers] overwriting the customer dimension’s attributes, the cell changes with a reference to the customer’s name and the calculation goes back to being reevaluated in each cell.

In the next release, we plan to do this automatically. But for now you have to give the engine a hint that the calculation doesn’t change.

http://www.sqlserveranalysisservices.com/OLAPPapers/MDX%20Performance%20Hintsv1.htm

Tuesday, 22 September 2009

SQL: Row Number Example

SELECT
FirstName,
LastName,
[Row Number]=ROW_NUMBER() OVER(ORDER BY SalesYTD DESC), SalesYTD
FROM x

Monday, 21 September 2009

Thursday, 17 September 2009

SSAS: Attribute relationship

Attribute relationship example
Setting proper attribute relationships for the attributes of a dimension is essential for query performance and usability. As Mosha wrote in his blog, it might be the most important advice for cube designers to set the attribute relationship correctly. The full article can be viewed here and it is very detailed.
I still saw some cases recently where the improper use of attribute relationship settings caused wrong query results (which might even be worse than a low performance), so I decided to post a wrap up of this topic here.
First, you need to know that setting attribute relationship has these benefits:
Better query performance
More efficient methods for the cube to store the data of the hierarchies
Easier way to write MDX queries (as the dependant attribute is correctly positioned and can be referenced e.g. by currentmember), see below for an example
Dependant attribute may be used as metadata (especially helpful when the dependant attribute has it property "Enable attribute hierarchy" set to "false")
Higher ranked attribute may be used as "Order-By"-attribute.
Let me provide an example for the MDX query. Think of a sample cube with a date dimension containing two attributes for month and monthname and think about the following query:
with member MonthMember as [Dim Time].[Monthname].currentmember.NAME
select MonthMember on rows, [Dim Time].[Month].[Month] on columns
from [TestCube]
if there is no relationship between the month and monthname-attribute the monthname-attribute is not changed when selecting the month-attribute and therefor resides its default which is 'All'.

But if there is a relationship, binding the monthname-attribute to the month-attribute, the monthname-attribute is automatically set to the corresponding member:

This shows how query syntax is influenced by the proper use of attribute relationship. In the first example we had to try something with the linkmember() MDX-function to achieve the same result. So designing the attribute relationship in a proper way leads to easier to understand queries with a better performance.
The data was loaded from a very simple date dimension table like shown below:

The attribute relationship can be modeled in BIDS when designing a dimension. After creating the dimension using the wizard, all attributes are associated to the key attribute of the dimension as shown below:

The attribute 'Month' and 'Monthname' have no relationship but are simply tied to the attribute "Dim Time' which is the key attribute for our simple time dimension. If you build a hierarchy, like the YearMonth-hierarchy in the screenshot above, you'll notice a yellow warning sign informing you that there is no relationship between Month and Year. You'll get further information when placing the mouse coursor on the warning sign:

Whenever you see this yellow warning sign we're talking of a non-natural hierarchy.
The relationship can easily be set by dragging the 'Year' attribute to the 'Month' attribute. Now the same dimension looks like this:

Notice that the yellow warning sign in the hierarchy title has disappeared. In this case we're talking of a natural hierarchy.
But be warned: This doesn't mean that your dimension is healthy now. For our simple example we actually made a big mistake. If you check the dimension in the dimension browser, you'll see something like this:

There are no more months for the years 2006 to 2008 anymore! This also means that no fact data will be displayed for those months and that the aggregation will be wrong.
If you model an attribute relationship like Month -> Year you have to be absolutely sure that every month corresponds to exactly one year. You can verify this using a simple SQL query like the one below:
select count(Year) from dbo.DimTime group by Month having count(Year)>1
This query should return no rows at all but in our case it returns

So every month in our source table is mapped to four years which breaks our attribute relationship above.
If you're having BIDS Helper installed on your machine you can also do a dimension health check from the dimension context menu.

In our case the result looks like this:

You can clearly see the rows that break our relationship.
So in this simple example it would have been a bad advice to simply set the attribute relationship as we did above. You should always check the relationship in your source data before designing such a relationship.
And even if the users say that the relationship is unique, you shouldn't rely on that. For example a product might belong to a unique product group. But what happens if the association changes over time and you want to keep track of the changes using a slowly changing dimension of type 2. This can also easily break up your attribute relationship. And as the MDX results might differ after changing the relationship, this could also have an influence on existing reports. So it's a good advise to carefully design your hierarchies before you start creating reports, because it's not difficult turning a non-natural hierarchy into a natural one.
SQL Server 2008:
http://www.sqlskills.com/blogs/stacia/post/SQL-Server-2008-Attribute-Relationship-Designer.aspx


http://oraylis-olap.spaces.live.com/Blog/cns!61F98448A5E17D57!228.entry?wa=wsignin1.0

SSRS: External Images Issue and Configuring the Unattended Execution Account

If the External image does not display you will have to check 2 actions.

Firstly, check the image url is valid and can be accessed through an IE page.

If this is the case you must check you have specified a Unattended Execution Account otherwise it will run as anonymous. In order to do this go to the report configuration and connect to the particular report instance. Under Execution account specify an account and password then click apply.

Now try again.