MSBIC – the B stands for Beignets!

June 22, 2010
Tags: ,

Mmmmmmm beignets.

This year I went to the Microsoft BI Conference in food heaven, New Orleans. In between stuffing my craw with crawfish, I went to a bunch of sessions and actually learned a thing or two! A very productive week, indeed.

The theme of the conference was definitely PowerPivot. It’s here, it’s free, get it, use it, love it, blah blah blah. For the past year it’s all been demos and betas of PowerPivot, but now that it’s been released it was very interesting to see it in action, and to see how the story has changed.

When PowerPivot was first introduced, there seemed to be the notion that this could be a replacement for Analysis Services. And/or that one could use PowerPivot to prototype Analysis Services cubes. Well, the first session I went to cleared THAT right up. PowerPivot does not replace Analysis Services. PowerPivot can’t replace Analysis Services. While PowerPivot uses the Analysis Services engine, and while it can be seen as an Analysis Services cube by such things as Management Studio after it has been deployed via Excel Services, the similarities end there. Dave Wicker did a great job illuminating the differences between the two products at the end of his session, “Building Custom Extensions to the PowerPivot Management Dashboard.”

The PowerPivot engine reads DAX natively. There is nowhere where DAX input is translated to MDX to be read by the engine, it’s all DAX all the time. And while DAX can do some cool things, it’s no MDX. If I ever get around to writing my post about how to do correlations with DAX that will be made clear. There is no correlation function in DAX, there’s no standard deviation function in DAX, there aren’t a lot of things in DAX that exist in MDX and this makes a lot of things….trickier…in DAX that are quite easy in MDX.

PowerPivot also creates its measures in a different place. In Analysis Services, the measures are all created in one place. And they can be grouped via measure groups. In PowerPivot you can have measures all over the place, associated with various tables.

PowerPivot also has limitations when it comes to establishing relations between tables. There can only be one relationship between tables. Many to Many relations are not supported (although there are some workarounds). There are no KPIs. And hierarchies don’t exist. The lack of hierarchies is a big big BIG thing (in my opinion) that limits the power of PowerPivot. Analysis Services, of course, has all these things.

And of course, PowerPivot is all about its in-memory columns and its storage engine. So long as something is in the storage engine, it’s great. You can sort billions of rows (plus seven)! But as soon as you ask it to something in a formula via calculation…well…not so fast. And a lot of things can only be done at runtime (read: dynamic correlations).

So, the new story seems to be that PowerPivot does not replace Analysis Services. You can’t prototype off it, because it is just so very different. But that doesn’t mean that it is worse! Only different.

I attended a really great session by Dan Bulos, “So Many Tools, So Little Time” that did a fantastic job of going over all the main BI tools, and discussing their strengths and weaknesses, and when is the “right” time to use the “right” tool. Unfortunately, the real answer is “it depends,” but thanks to Dan’s talk we were given some great guidelines to, well, guide our decision.

PowerPivot is brand new, and the inclination is to figure out what all is possible in PowerPivot, but as Dan so keenly pointed out, just because something is possible in a tool, doesn’t mean you should use that tool.

I really do think PowerPivot is going to come in extremely handy for users. I already know of some people for whom PowerPivot is perfect and will add a lot of value to their day to day work. It’s ability to combine various datasets is amazing for on-the-fly analysis, and for all those things that, as much as we wish they would, simply don’t belong in a cube.

It will be interesting to see what the future holds for PowerPivot. We got a glimpse of it via the BI Keynote. Some of the things currently missing in PowerPivot will be showing up in the next version, like KPIs, the ability to use PowerPivot in BIDS, and various other new extensions. At the end of the day, we have to remember this is a V1 product, but a really really cool one, and it can only get better.

0

A Treatise on the ALL() Function in DAX – or – ‘Is it all er nuthin’ with you?’

April 7, 2010

With all the hubbub surrounding PowerPivot, I have been given the task of learning DAX: not an easy task when the product is still pre-release and there aren’t any step-by-step books on hand! With a list of functions and a vague understanding of how they work, plus powerpivotpro’s very helpful posts, I decided to set myself a problem that I could hopefully solve with DAX. I noticed that in the long list of DAX functions, standard deviation and correlation were missing. As I have had to use these fairly frequently in MDX (along with various derivations of such), I decided I would try to set up a correlation measure in a powerpivot pivot table. In the process, I discovered some interesting nuances to DAX, one of which is the topic of this blog post. It should be fun for ALL()!

According to this handy dandy list of DAX functions, the ALL() function is simple enough. The syntax is ALL(table_or_column) and it returns “all the values in a table, or all the values in a column, ignoring any filters that might have been applied.” Let’s see it in action.

I have created a powerpivot workbook that I have loaded with some AdventureWorks data. On rows I have product subcategories and product, on columns I have OrderQuantity and some measures I created, CountRowsAll and CountRowsSON. I’ve filtered my dataset down to the year 2006, the United States sales region, and the product Category Accessories.


The CountRowsAll measure counts the rows in the FactResellerSales table and has the formula:
=CountRows(All(FactResellerSales))*if([Sum of OrderQuantity]>0, 1, [Sum of OrderQuantity])

The CountRowsSON measure counts the rows in the SalesOrderNumber column of the FactResellerSales table and has the formula:
=CountRows(All(FactResellerSales[SalesOrderNumber]))*if([Sum of OrderQuantity]>0, 1, [Sum of OrderQuantity])

What about the if() statement?  Well, because the count of the rows will be the same for all products and product subcategories, we would get a LOT of rows in our pivot table.  The if() statement multiplies our row count by 1 if the OrderQuantity is > 0 (not null), else it multiplies it by the OrderQuantity (which should be null).  There is no “nothing” or “null” object I could find in DAX so doing it this way is a cheat that helps me eliminate extraneous product rows that otherwise have no data.  It keeps my table looking neat and tidy.

When a run a simple SQL query on my AdventureWorks database I find that the FactResellerSales table does indeed have 60855 rows.  Now, notice that the CountRowsSON does NOT have 60855 rows even though it is pulling from the same table.  This suggests that the ALL() function does not bring back duplicates, it is a distinct ALL function.  This is confirmed when we query the database for distinct SalesOrderNumbers from FactResellerSales.

Now I will create two new measures, that will sum up the OrderQuantity over All(FactResellerSales) and All(FactResellerSales[SalesOrderNumber]).

The first measure is called CountAllSales and has the formula:
=SUMX(All(FactResellerSales),FactResellerSales[Sum of OrderQuantity])*if([Sum of OrderQuantity]>0, 1, [Sum of OrderQuantity])

The second measure is called CountAllSalesSON and has the formula:
=SUMX(All(FactResellerSales[SalesOrderNumber]),FactResellerSales[Sum of OrderQuantity])*if([Sum of OrderQuantity]>0, 1, [Sum of OrderQuantity])

Our table now looks like this:

AllFigure2


CountAllSales has the same number for every row, which is to be expected. It is the sum of all OrderQuantities over the whole FactResellerSales table.  However, our CountAllSalesSON measure matches our Sum of OrderQuantity measure!  It changes depending on the row!  But we know that the All(FactResellerSales[SalesOrderNumber]) is bringing back all the SalesOrderNumbers from the FactResellerSales table for each row, so what is going on here?  I thought the ALL function ignored all filters?

Well, it does!  But our measure, Sum of OrderQuantity, doesn’t.  The answer lies in context.  When we bring back the full FactResellerSales table, we are bringing back all the information for each row, productkey, orderdate, everything.  So the Sum of OrderQuantity is figured on each row using all that information.  But when we bring back just the SalesOrderNumber column, we don’t have all that other info, and the measure needs that.  So it figures it out from what other context is available, namely the product values on rows and the filters applied to our pivottable.

This is just like when we have default members in MDX.  The following MDX query displays the same order quantity data as in the second pivot table:

with
member Measures.OrderQuantityAll
as ([Measures].[Reseller Order Quantity], [Product].[Product Categories].[All], [Date].[Calendar Year].[All], [Sales Territory].[Sales Territory Country].[All])

select {Measures.OrderQuantityAll, Measures.[Reseller Order Quantity]}
on columns,
Nonempty([Product].[Product Categories].[Accessories].Children,Measures.[Reseller Order Quantity])
on rows

from [Adventure Works]

where([Date].[Calendar Year].&[2006],[Sales Territory].[Sales Territory Country].&[United States])

When we defined Measures.OrderQuantityAll, we overrode the context of the Product CurrentMember, the Date CurrentMember, and the Sales Territory CurrentMember that are defined elsewhere in the query, be it on rows or in the where clause.  Those would otherwise be our default members.  When we pull the full FactResellerSales table with the ALL function in our pivot table, we are doing something similar.  We now have all that info and that overrides what would otherwise be our default members – the context on rows.  When we just pull the SalesOrderNumber column, we don’t have any of that info and so the pivot table uses the context on rows.

One more thing I want to note – what happens when we put SalesOrderNumber on columns?  We get this:

Notice that while Sum or Order Quantity changes with the SalesOrderNumber, CountAllSalesSON doesn’t.  Because we are summing over all the SalesOrderNumbers.

When is all this useful?  Remember when I said I started playing with DAX by calculating standard deviations and correlations?  In order to do those calculations, you need the means over the set, and then you need to find the difference between each individual data point and that mean.  Replace CountAllSalesSON with an Average instead of a Sum, and you have a way to define that mean for each row, which allows us to then calculate our standard deviations and correlations.  I will go into further detail in a future post.

4

Running Totals over Arbitrary Sort in MDX

February 16, 2010
Tags:

A few months ago I was sitting in on an MDX class Stacia Misner was teaching.  The students were playing with their MDX and wanted to know if it was possible to have a calculation that would show the running sum over products, ordered by the gross profit margin; i.e. we would list the product subcategories on rows, sorted by their gross profit margin, and then have a running total down this list.  This mirrored a calculation they had to have in their business.

Stacia (aka mom :-) ) was a bit busy teaching the class and didn’t want to get too bogged down on this problem (although she REALLY REALLY wanted to figure it out) so she tossed it to me after some quick initial attempts were unsuccessful.  She had never seen anyone do this, and knew that MDX had rolling sums over time, but this was over something other than time.  This is what I came up with.

This is using the Adventure Works cube.


with

set Q as
Nonempty(
Order(
[Product].[Subcategory].[Subcategory].Members
,[Measures].[Reseller Gross Profit Margin], ASC)
,[Measures].[Reseller Sales Amount])

member [Measures].[level] as rank([Product].[SubCategory].CurrentMember,Q)

member [Measures].[Running] as Sum(
Head(
Q
, ([Measures].[Level],[Product].[Subcategory].CurrentMember)
)
,[Measures].[Reseller Sales Amount]
)


select {[Measures].[Running]
,[Measures].[level]
, [Measures].[Reseller Sales Amount]
, [Measures].[Reseller Gross Profit Margin]}
on columns,

Q
on rows

from [Adventure Works]


I found it easier to just go ahead and create my row set with a with statement, rather than keep repeating it throughout the query.  It made things neater.  The set Q is straightforward enough; I have a nonempty set of Product Subcategories, and I order them by the gross profit margin as requested.  NB:  the sort is the key thing here.  This is where you decide how you want your total to “roll”, so to speak.

Next, I created a measure that gave me the rank of my current subcategory out of the overall set Q.  Yay rank() function!

Finally, I create my running measure which will calculate the running total.  What I figured out is, once I have my set ordered correctly, the running sum is just the sum of all the previous rows plus the current row.  So if I can isolate my set down to those rows, I can just apply the sum over that.  Well, thanks to the ordering I set, I can, thanks to the Head() function.  I take the Head() of my set Q, where my integer value is [Measures].[Level].  So I have gotten exactly the rows I want the running sum of.  I take the sum of that, and voila!  Running total.

Just a simple query, but people seemed excited to have it, so I record it here for posterity.

0

Calculating Non-Numerical Medians in MDX

January 22, 2010
Tags:

MDX offers many powerful statistical functions, however they have their limitations.  One of the things I’ve had to learn this past year is what types of statistical functions are available, and what they can and can’t do.  And when they can’t do what I want, I’ve had to come up with alternatives.

One such case of this came up with calculating medians.  In case you need a reminder of what a median is, the median of a set of data is the datapoint that falls in the middle, dependent on some ordering.  If there are an odd number of datapoints, it’s the point in the exact middle, if it’s an even number of points, it’s the average of the two points in the middle.  For example, if you have the set of points {4,1,3,5,6} the median point (ordered by magnitude) is 4.  If the set was {4,1,3,5} then the median point is the average of 3 and 4, or 3.5.

There is a median function in MDX, with syntax Median(<<set>>, <<numeric expression>>) which would handle our above example easily.  However, what happens if we need the median of a set of something other than numbers.  Perhaps we want the median name ordered alphabetically.  Or in the case that sparked this point, what if we wanted the median date of an event?

We have to create our own calculation, based on what we know about medians. I will actually show and explain two ways of doing this.  The first way works regardless of the data type, so long as you have an ordering you wish to use.  The second way is specifically for date types (and is a bit more clever in my opinion), but you may be able to use the idea behind it to come up with other examples.

The nice thing about using dates is they are already ordered if you have your Date dimension built correctly, so we don’t even have to specify an ordering.  If you have to though, you can always choose to use Order() to order your set however you choose to.

I will be using the AdventureWorks 2008 cube.  Here is our query:

with
member [Measures].[MedianDate] as

Tail(
Head(
Nonempty(([Ship Date].[Calendar].[Date],
[Product].[Product Categories].CurrentMember)
, [Measures].[Order Count])
,Nonempty(([Ship Date].[Calendar].[Date],
[Product].[Product Categories].CurrentMember)
, [Measures].[Order Count]).Count/2+1)
).Item(0).Item(0).Name

select [Measures].[MedianDate] on columns,

Nonempty([Product].[Product Categories].Children
, [Measures].[Order Count]) on rows

from [Adventure Works]

Let’s dissect our calculation.  We have our set of Ship Dates, sliced by the Product Categories member that is on the row.  The set is already ordered since Dates are ordered in the cube already.  So which date, of this set do we want?  Well, we want the middle date.  So we count the number of data points and divide by 2.  We add the one because since we are using the Head() function, if our count is odd, our count divided by 2 will be .5 less than the ordinal of the middle number (I’ll leave this as an exercise to the reader ;) ).  Since Head() just takes the integer part, if we didn’t add 1 we’d be taking the number one below the median number.  We then take the Tail() of this set in order to get down to one date.  The Item(0).Item(0).Name bit is just to get us down to the part we’re interested in displaying – the date.

The fun bit about using the Product Categories hierarchy is that you can change the level easily in the query and see how the median date changes with it.

Now, technically, the above calculation isn’t a TRUE median.  Do you see why?  We are taking the middle value, but only if the set is odd.  If the set is even, then there is no middle data point and we should take the average of the two points in the middle.  For the purposes of the case that inspired this blog, I made a decision that it made more sense to have the median be a data point within the set, which it is not guaranteed to be if the set is even.  However, while it is more code, it should be easy enough based on the above query to figure out how to find the other data point you would need if the set is even, and to take the “average” date of those two dates using the DateDiff() function and working from there.

Speaking of DateDiff(), this leads me to the second way we can calculate the median date.  The reason we didn’t use the Median() function in the first example is because the Median() function requires a numerical expression, which date isn’t.  But we can transform date into an integer that preserves the order of the dates, thanks to the DateDiff() function.

If we pick some arbitrary anchor date, then take the difference in date between that anchor point and our data point date, then we will get an integer and those integers will be in the same order as the dates.  For example, if our anchor point is 11/30/2009 and our set of data points is the dates in December 2009, then the difference between 12/1/2009 and 11/30/2009 is 1, the difference between 12/2/2009 and 11/30/2009 is 2, etc.  We can then apply the Median() function to that set of integers, then once we have that integer we can figure out what date it is by adding the median integer to our anchor point with DateAdd().

Here is our query:

with
member [Measures].[DateInt] as
DateDiff(“d”, CDate(“1/1/2007″), [Ship Date].[Calendar].CurrentMember.Name)

member [Measures].[MedianDate] as

DateAdd(“d”,CDate(“1/1/2007″),
Median(
Nonempty(([Ship Date].[Calendar].[Date],
[Product].[Product Categories].CurrentMember)
,[Measures].[Order Count])
,[Measures].[DateInt]))

select [Measures].[MedianDate] on columns,

Nonempty([Product].[Product Categories].Children
, [Measures].[Order Count]) on rows

from [Adventure Works]

Our first calculated measure, DateInt, gives us our integer, using the anchor point 1/1/2007.  Then, we take the same set we saw before in the first query, and take our median using the DatInt measure.  Then we add that number to our anchor point and voila!
If you run the above queries side by side you’ll notice they give you the same results except for on the Components row, which displays 12/24/2006 12:00:00 PM.  This is a dead giveaway that this set was even and the average of the middle dates resulted in something plus half a day.  A clear illustration of the difference between our true median calculation in this example, versus our first example where we cheated a little bit.
0

Passing Multi-Value SQL Parameters to an MDX Query in Reporting Services

January 21, 2010
Tags:

Or How I Learned to Stop Worrying and Love String Manipulation ;) .

To kick off this blog with some real content I’m starting small.  I recently did some reporting work where I changed the main dataset queries from using a relational database to using a cube.  It sped up performance and resolved a lot of calculations that were nasty in T-SQL.  However, when I went to implement the new query in the report I ran into a small obstacle.  Our parameter lookups were all in T-SQL and needed to stay that way because of some security we had implemented.  So I needed to be able to take the parameter value from the database and translate it into a form that the MDX query could handle.

Let’s look at a quick example.  I’ll be using the AdventureWorksDW2008R2 Database (which isn’t much different – if at all – from the AdventureWorksDW2008 Database) and I’ll be using the AdventureWorks 2008 Analysis Services project.  These are both available online from codeplex.

Generally speaking, our problem isn’t so hard.  In our case, the value was an ident that was the same as the member key for the corresponding attribute.  Let’s say our report needed ProductCategory as a parameter.  The ProductCategoryKey in the relational database for Bikes is 1, and our cube member for Bikes is [Product].[Category].&[1].  So we can transform our parameter value to pass into our mdx query like so:

“[Product].[Category].&”+Cstr(Parameters!ProductCategory.Value) + “]”

Simple!  But the plot thickens…

I needed to be able to pass multiple values, which meant I had to take some extra steps before I could parse my string.  If ProductCategory is a multi-value parameter, then Parameters!ProductCategory.Value is an object – an array of values.  So we need to create a single string out of that array, format that string, and then break it up again back into an array since that is what the MDX query is expecting.  Whew!

Here’s the punchline:

Split(“[Product].[Category].&["+Replace(Join(Parameters!ProductCategory.Value,"],”)+”]”,”,”,”,[Product].[Category].&["),",")

That's a little bit of a mouthful!  What's happening here?

Join() is a handy little function that takes our array and joins the values into a string.  We can delimit the values with a string, in this case "],”, like so:

Join(Parameters!ProductCategory.Value, “],”)

Bikes, Components, and Clothing for our parameter values, our string after using the join function will be “1],2],3″.  We can then add an ending bracket to the end via simple string concatenation so now we have “1],2],3]”.

Applying the Replace() function, we replace “,” with “,[Product].[Category].&[" and now we have "1],[Product].[Category].&[2],[Product].[Category].&[3]“.  Finish with one final string concatenation in the front and we have “[Product].[Category].&1],[Product].[Category].&[2],[Product].[Category].&[3]“

Unfortunately we can’t just pass this into our MDX query.  It’s expecting an array object, not a string!  But luckily, the Split() function is the counterpart to Join() and breaks up our string into that array object we want via the commas as a delimiter.

Overall, not a difficult task, but it took me enough time to figure it out that I don’t want to have to re-derive the solution.  Now you don’t have to either!

1

Welcome!

December 3, 2009

Welcome to my blog!

A little bit about me:  I joined Data Inspirations in June 2008.  I graduated on a Friday in Boston, flew to Vegas on Saturday, and started work on Monday!  Initially all I could do was data entry, but in the ensuing time I’ve learned enough T-SQL, MDX, SSRS, SSAS, SSIS, and Sharepoint to get myself into trouble (and sometimes, luckily, back out of trouble).

As a newcomer to the BI world, I was inundated with information, much of it way over my head.  I spent a lot of time on google trying to find answers to questions that had simple answers – simple if you knew what you were doing!  It seems that often the simple stuff just isn’t deemed important or interesting enough to be written about very often, and I hope to change that.

I’m still learning, but I also would like to share some of what I have already learned with you.  If nothing else, by writing it down hopefully I’ll remember what I’ve learned!

0