Running Totals over Arbitrary Sort in MDX

February 16, 2010
Tags:

A few months ago I was sitting in on an MDX class Stacia Misner was teaching.  The students were playing with their MDX and wanted to know if it was possible to have a calculation that would show the running sum over products, ordered by the gross profit margin; i.e. we would list the product subcategories on rows, sorted by their gross profit margin, and then have a running total down this list.  This mirrored a calculation they had to have in their business.

Stacia (aka mom :-) ) was a bit busy teaching the class and didn’t want to get too bogged down on this problem (although she REALLY REALLY wanted to figure it out) so she tossed it to me after some quick initial attempts were unsuccessful.  She had never seen anyone do this, and knew that MDX had rolling sums over time, but this was over something other than time.  This is what I came up with.

This is using the Adventure Works cube.


with

set Q as
Nonempty(
Order(
[Product].[Subcategory].[Subcategory].Members
,[Measures].[Reseller Gross Profit Margin], ASC)
,[Measures].[Reseller Sales Amount])

member [Measures].[level] as rank([Product].[SubCategory].CurrentMember,Q)

member [Measures].[Running] as Sum(
Head(
Q
, ([Measures].[Level],[Product].[Subcategory].CurrentMember)
)
,[Measures].[Reseller Sales Amount]
)


select {[Measures].[Running]
,[Measures].[level]
, [Measures].[Reseller Sales Amount]
, [Measures].[Reseller Gross Profit Margin]}
on columns,

Q
on rows

from [Adventure Works]


I found it easier to just go ahead and create my row set with a with statement, rather than keep repeating it throughout the query.  It made things neater.  The set Q is straightforward enough; I have a nonempty set of Product Subcategories, and I order them by the gross profit margin as requested.  NB:  the sort is the key thing here.  This is where you decide how you want your total to “roll”, so to speak.

Next, I created a measure that gave me the rank of my current subcategory out of the overall set Q.  Yay rank() function!

Finally, I create my running measure which will calculate the running total.  What I figured out is, once I have my set ordered correctly, the running sum is just the sum of all the previous rows plus the current row.  So if I can isolate my set down to those rows, I can just apply the sum over that.  Well, thanks to the ordering I set, I can, thanks to the Head() function.  I take the Head() of my set Q, where my integer value is [Measures].[Level].  So I have gotten exactly the rows I want the running sum of.  I take the sum of that, and voila!  Running total.

Just a simple query, but people seemed excited to have it, so I record it here for posterity.

0

Calculating Non-Numerical Medians in MDX

January 22, 2010
Tags:

MDX offers many powerful statistical functions, however they have their limitations.  One of the things I’ve had to learn this past year is what types of statistical functions are available, and what they can and can’t do.  And when they can’t do what I want, I’ve had to come up with alternatives.

One such case of this came up with calculating medians.  In case you need a reminder of what a median is, the median of a set of data is the datapoint that falls in the middle, dependent on some ordering.  If there are an odd number of datapoints, it’s the point in the exact middle, if it’s an even number of points, it’s the average of the two points in the middle.  For example, if you have the set of points {4,1,3,5,6} the median point (ordered by magnitude) is 4.  If the set was {4,1,3,5} then the median point is the average of 3 and 4, or 3.5.

There is a median function in MDX, with syntax Median(<<set>>, <<numeric expression>>) which would handle our above example easily.  However, what happens if we need the median of a set of something other than numbers.  Perhaps we want the median name ordered alphabetically.  Or in the case that sparked this point, what if we wanted the median date of an event?

We have to create our own calculation, based on what we know about medians. I will actually show and explain two ways of doing this.  The first way works regardless of the data type, so long as you have an ordering you wish to use.  The second way is specifically for date types (and is a bit more clever in my opinion), but you may be able to use the idea behind it to come up with other examples.

The nice thing about using dates is they are already ordered if you have your Date dimension built correctly, so we don’t even have to specify an ordering.  If you have to though, you can always choose to use Order() to order your set however you choose to.

I will be using the AdventureWorks 2008 cube.  Here is our query:

with
member [Measures].[MedianDate] as

Tail(
Head(
Nonempty(([Ship Date].[Calendar].[Date],
[Product].[Product Categories].CurrentMember)
, [Measures].[Order Count])
,Nonempty(([Ship Date].[Calendar].[Date],
[Product].[Product Categories].CurrentMember)
, [Measures].[Order Count]).Count/2+1)
).Item(0).Item(0).Name

select [Measures].[MedianDate] on columns,

Nonempty([Product].[Product Categories].Children
, [Measures].[Order Count]) on rows

from [Adventure Works]

Let’s dissect our calculation.  We have our set of Ship Dates, sliced by the Product Categories member that is on the row.  The set is already ordered since Dates are ordered in the cube already.  So which date, of this set do we want?  Well, we want the middle date.  So we count the number of data points and divide by 2.  We add the one because since we are using the Head() function, if our count is odd, our count divided by 2 will be .5 less than the ordinal of the middle number (I’ll leave this as an exercise to the reader ;) ).  Since Head() just takes the integer part, if we didn’t add 1 we’d be taking the number one below the median number.  We then take the Tail() of this set in order to get down to one date.  The Item(0).Item(0).Name bit is just to get us down to the part we’re interested in displaying – the date.

The fun bit about using the Product Categories hierarchy is that you can change the level easily in the query and see how the median date changes with it.

Now, technically, the above calculation isn’t a TRUE median.  Do you see why?  We are taking the middle value, but only if the set is odd.  If the set is even, then there is no middle data point and we should take the average of the two points in the middle.  For the purposes of the case that inspired this blog, I made a decision that it made more sense to have the median be a data point within the set, which it is not guaranteed to be if the set is even.  However, while it is more code, it should be easy enough based on the above query to figure out how to find the other data point you would need if the set is even, and to take the “average” date of those two dates using the DateDiff() function and working from there.

Speaking of DateDiff(), this leads me to the second way we can calculate the median date.  The reason we didn’t use the Median() function in the first example is because the Median() function requires a numerical expression, which date isn’t.  But we can transform date into an integer that preserves the order of the dates, thanks to the DateDiff() function.

If we pick some arbitrary anchor date, then take the difference in date between that anchor point and our data point date, then we will get an integer and those integers will be in the same order as the dates.  For example, if our anchor point is 11/30/2009 and our set of data points is the dates in December 2009, then the difference between 12/1/2009 and 11/30/2009 is 1, the difference between 12/2/2009 and 11/30/2009 is 2, etc.  We can then apply the Median() function to that set of integers, then once we have that integer we can figure out what date it is by adding the median integer to our anchor point with DateAdd().

Here is our query:

with
member [Measures].[DateInt] as
DateDiff(“d”, CDate(“1/1/2007″), [Ship Date].[Calendar].CurrentMember.Name)

member [Measures].[MedianDate] as

DateAdd(“d”,CDate(“1/1/2007″),
Median(
Nonempty(([Ship Date].[Calendar].[Date],
[Product].[Product Categories].CurrentMember)
,[Measures].[Order Count])
,[Measures].[DateInt]))

select [Measures].[MedianDate] on columns,

Nonempty([Product].[Product Categories].Children
, [Measures].[Order Count]) on rows

from [Adventure Works]

Our first calculated measure, DateInt, gives us our integer, using the anchor point 1/1/2007.  Then, we take the same set we saw before in the first query, and take our median using the DatInt measure.  Then we add that number to our anchor point and voila!
If you run the above queries side by side you’ll notice they give you the same results except for on the Components row, which displays 12/24/2006 12:00:00 PM.  This is a dead giveaway that this set was even and the average of the middle dates resulted in something plus half a day.  A clear illustration of the difference between our true median calculation in this example, versus our first example where we cheated a little bit.
0

Passing Multi-Value SQL Parameters to an MDX Query in Reporting Services

January 21, 2010
Tags:

Or How I Learned to Stop Worrying and Love String Manipulation ;) .

To kick off this blog with some real content I’m starting small.  I recently did some reporting work where I changed the main dataset queries from using a relational database to using a cube.  It sped up performance and resolved a lot of calculations that were nasty in T-SQL.  However, when I went to implement the new query in the report I ran into a small obstacle.  Our parameter lookups were all in T-SQL and needed to stay that way because of some security we had implemented.  So I needed to be able to take the parameter value from the database and translate it into a form that the MDX query could handle.

Let’s look at a quick example.  I’ll be using the AdventureWorksDW2008R2 Database (which isn’t much different – if at all – from the AdventureWorksDW2008 Database) and I’ll be using the AdventureWorks 2008 Analysis Services project.  These are both available online from codeplex.

Generally speaking, our problem isn’t so hard.  In our case, the value was an ident that was the same as the member key for the corresponding attribute.  Let’s say our report needed ProductCategory as a parameter.  The ProductCategoryKey in the relational database for Bikes is 1, and our cube member for Bikes is [Product].[Category].&[1].  So we can transform our parameter value to pass into our mdx query like so:

“[Product].[Category].&”+Cstr(Parameters!ProductCategory.Value) + “]”

Simple!  But the plot thickens…

I needed to be able to pass multiple values, which meant I had to take some extra steps before I could parse my string.  If ProductCategory is a multi-value parameter, then Parameters!ProductCategory.Value is an object – an array of values.  So we need to create a single string out of that array, format that string, and then break it up again back into an array since that is what the MDX query is expecting.  Whew!

Here’s the punchline:

Split(“[Product].[Category].&["+Replace(Join(Parameters!ProductCategory.Value,"],”)+”]”,”,”,”,[Product].[Category].&["),",")

That's a little bit of a mouthful!  What's happening here?

Join() is a handy little function that takes our array and joins the values into a string.  We can delimit the values with a string, in this case "],”, like so:

Join(Parameters!ProductCategory.Value, “],”)

Bikes, Components, and Clothing for our parameter values, our string after using the join function will be “1],2],3″.  We can then add an ending bracket to the end via simple string concatenation so now we have “1],2],3]”.

Applying the Replace() function, we replace “,” with “,[Product].[Category].&[" and now we have "1],[Product].[Category].&[2],[Product].[Category].&[3]“.  Finish with one final string concatenation in the front and we have “[Product].[Category].&1],[Product].[Category].&[2],[Product].[Category].&[3]“

Unfortunately we can’t just pass this into our MDX query.  It’s expecting an array object, not a string!  But luckily, the Split() function is the counterpart to Join() and breaks up our string into that array object we want via the commas as a delimiter.

Overall, not a difficult task, but it took me enough time to figure it out that I don’t want to have to re-derive the solution.  Now you don’t have to either!

1

Welcome!

December 3, 2009

Welcome to my blog!

A little bit about me:  I joined Data Inspirations in June 2008.  I graduated on a Friday in Boston, flew to Vegas on Saturday, and started work on Monday!  Initially all I could do was data entry, but in the ensuing time I’ve learned enough T-SQL, MDX, SSRS, SSAS, SSIS, and Sharepoint to get myself into trouble (and sometimes, luckily, back out of trouble).

As a newcomer to the BI world, I was inundated with information, much of it way over my head.  I spent a lot of time on google trying to find answers to questions that had simple answers – simple if you knew what you were doing!  It seems that often the simple stuff just isn’t deemed important or interesting enough to be written about very often, and I hope to change that.

I’m still learning, but I also would like to share some of what I have already learned with you.  If nothing else, by writing it down hopefully I’ll remember what I’ve learned!

0