Microsoft OLAP by Mosha Pasumansky
Feed: http://sqlblog.com/blogs/mosha/rss.aspx

Good Bye BI

Wed, 30 Dec 2009 03:23:00 GMT

"Every new beginning comes from some other beginning's end" ( Seneca ) I spent big chunk of my professional career working on what we call today BI. From Panorama, coding some of the innovative cutting-edge (back then) algorithms, to Microsoft with mission...(read more)(image)



WolframAlpha

Fri, 15 May 2009 08:52:25 GMT

Integration of BI and Search became a popular subject a couple of years ago. No doubt it was heavily influenced by the “Google revolution”. Search is a hugely successful application. It features a simple and natural UI - free-form text - yet it is extremely powerful and gives access to pretty much all of the world's freely available information (i.e. on the Internet). That sounds very close to the mission of BI. So several BI vendors tried to jump on the Search bandwagon and integrate search-like functionality into their products. None were very successful with it – it didn't catch on.

Having worked on BI for quite a while, and more recently on Search, I grew more and more skeptical that meaningful integration of BI and Search was possible. It just seemed too difficult to do anything non-trivial (and remembering the English Query fiasco didn't help, even though its last version could talk to SSAS cubes).

But then today I saw WolframAlpha, and I was completely awed by it. If it can do even half of the things you see in the promotional video – it is pretty amazing. And with Stephen Wolfram's genius behind it – I have no doubt it really can do all of that. It launches tomorrow, so we will have a chance to try it out for real; in the meantime, watch this screencast of what it promises to do: http://www.wolframalpha.com/screencast/introducingwolframalpha.html.




Deep dive to MDX presentation - slides

Wed, 31 Dec 2008 22:54:45 GMT

During the SQL PASS 2008 summit, I gave a full-day preconference seminar, “Deep Dive to MDX”. The evaluation forms have now arrived, and the scores for the content, relevance and depth of the seminar are very good. The scores are not as high, however, for the time allocated (people argued that it should have been 2 days instead of one) and for the materials (since nobody received any materials). Both points are valid: one day turned out to be not enough, and perhaps two days wouldn't have been enough either, since I got through less than half of the material I had prepared. Also, due to some issues, I couldn't hand out the presentation material – slides and MDX queries/calculations. These issues are cleared up now, and I will start uploading the slides from the presentation. I don't suppose they will be very useful for people who didn't attend the session, since the slides really only contain the major talking points and the rest was delivered verbally, but they probably will be very useful for people who did attend.

There was time to cover the following subjects:

  • Overall Architecture
  • Subcubes
  • Sonar
  • Storage Engine query plan
  • Storage Engine cache
  • Formula Engine query plan
    • Block mode
  • Formula Engine cache
  • Sets

A more detailed Table of Contents can be found here.

I will do a series of posts, uploading one subject at a time. The first subject is “Overall Architecture”, which gives a high-level picture of the components discussed further in the presentation. The slides are here.

Slightly unrelated announcement: MDX Studio 0.4.11 was also released today, with multiple UI enhancements, docked windows (Visual Studio style) etc. Versions for SSAS 2005 and SSAS 2008 are available, and can be downloaded from here.




AS2008 MDX: subselects and CREATE SUBCUBE in non-visual mode

Tue, 04 Nov 2008 10:22:16 GMT

There were not very many changes to MDX syntax in AS2008, and Vidas Matelis described most of them in his blog here (for his list of all changes in AS2008 check this blog entry). I just noticed that there is at least one more change which Vidas didn't include, but which is somewhat important: the ability to define CREATE SUBCUBE and subselects in non-visual mode.

To understand this, we need to understand what CREATE SUBCUBE does. It really does two things:

1. Implicit Exists between the set in CREATE SUBCUBE and the query axes and query named sets, including top-level set functions (for more details see the blog entry "Slicer and axis interaction in MDX Part 2 - Implicit Exists" – even though it talks about the WHERE clause, everything from that entry applies equally to subselects and CREATE SUBCUBE as well).
2. Applies visual totals to the values of physical measures, even within expressions, if there are no coordinate overwrites.

Let's look into this deeper. Consider the following example:

  CREATE SUBCUBE [Adventure Works] AS
  (SELECT
    { [Customer].[Customer Geography].[City].&[Redmond]&[WA]
     ,[Customer].[Customer Geography].[City].&[Seattle]&[WA] } ON 0
   FROM [Adventure Works])

We are creating a filter to include only two cities – Redmond and Seattle. Now if we send the following query:

  WITH
    MEMBER [Measures].[Gross Margin] AS
      [Measures].[Internet Gross Profit] / [Measures].[Internet Sales Amount]
     ,FORMAT_STRING = 'Percent'
  SELECT
    { [Measures].[Internet Sales Amount]
     ,[Measures].[Internet Gross Profit]
     ,[Measures].[Gross Margin] } ON 0
   ,[Customer].[City].MEMBERS ON 1
  FROM [Adventure Works]

                  Internet Sales Amount   Internet Gross Profit   Gross Margin
  All Customers   $153,989.23             $64,075.99              41.61%
  Redmond         $78,824.37              $33,150.36              42.06%
  Seattle         $75,164.86              $30,925.63              41.14%

We will observe two things. First, we only get two cities on the row axis – that's the implicit Exists part. Second, the values for the physical measure Internet Sales Amount, the cube-defined calculated measure Internet Gross Profit, and the query-defined calculated measure Gross Margin are all visual totals, i.e. they are all computed using data for only the two cities defined by the CREATE SUBCUBE command.

AS2008 adds the capability to keep only the implicit Exists behavior, and to return real totals as opposed to visual totals. The syntax is described in the MSDN documentation, and in our example it will be (note the NON VISUAL keywords):

  CREATE SUBCUBE [Adventure Works] AS NON VISUAL
  (SELECT
    { [Customer].[Customer Geography].[City].&[Redmond]&[WA]
     ,[Customer].[Customer Geography].[City].&[Seattle]&[WA] } ON 0
   FROM [Adventure Works])

Now if we execute the same query, the result is different:

                  Internet Sales Amount   Internet Gross Profit   Gross Margin
  All Customers   $29,358,677.22          $12,080,883.65          41.15%
  Redmond         $78,824.37              $33,150.36              42.06%
  Seattle         $75,164.86              $30,925.63              41.14%

The implicit Exists still happens – we see only two cities – but there are no visual totals anymore. Both physical and calculated measures are now computed as if there were no CREATE SUBCUBE. The same functionality also applies to subselects (documented on MSDN here).

Why is this important? Remember that Excel 2007 uses subselects to implement multiselect functionality. However, since subselects only supported visual-totals mode before AS2008, the Excel team was forced to remove the ability to show non-visual totals in PivotTables. This caused user dissatisfaction, since users wanted to have control over the numbers in PivotTables, just like with previous versions of Excel and Analysis Services.
This problem is well documented, for example[...]
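
For completeness, since the post only spells out the CREATE SUBCUBE form: here is a minimal hedged sketch of what the equivalent NON VISUAL subselect could look like (my own illustration for the same two-city filter, not a quote from the post):

  SELECT
    { [Measures].[Internet Sales Amount] } ON 0
   ,[Customer].[City].MEMBERS ON 1
  FROM NON VISUAL
    (SELECT
      { [Customer].[Customer Geography].[City].&[Redmond]&[WA]
       ,[Customer].[Customer Geography].[City].&[Seattle]&[WA] } ON 0
     FROM [Adventure Works])

As with the CREATE SUBCUBE example, the idea is that the two-city restriction still drives implicit Exists, while the totals are no longer forced to be visual.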



Custom filters with MDX subcubes

Tue, 04 Nov 2008 09:30:43 GMT

Omniture Discover is not a general-purpose OLAP client tool. Rather, it is a specialized application that lets users explore data from Omniture's Web Analytics product. But it does look and feel like an OLAP browser – which is not surprising, since the “A” in OLAP stands for “Analysis”, and the product is all about analyzing web site performance. The Omniture model feels like a typical cube too: there are dimensions (called segments), there are measures (called metrics), and there are drilldowns, pivots, aggregations etc.

However, there is something not typical about this client tool – the way it handles slicers. Usually, an OLAP client provides a UI element called a slicer or filter or page field or something similar, where the user can choose how the current view should be sliced. The UI for the slicer differs between applications, from a classic drop-down list to a calendar control for the Time dimension to something else, but there is always something to choose the current slice. Omniture Discover doesn't have this. Instead, it uses a different paradigm. The user is expected to define “custom segments”, which really are dimension filters, but they are given names and the list of these segments is stored in a special folder. Then the user can apply these predefined filters to any report view. The definitions of the segments are very flexible – they can really be arbitrary expressions – anything from “Last 3 days” (a filter on Time) to “Users referred by www.live.com through a paid search campaign who bought a product in a specific category” (a filter on the Referer, Action and Product dimensions) etc. Dimensions used in the custom segment definition can be included in the report or omitted from it – the filter is applied in either case. The only limitation is that only one such segment can be applied to the report at a time – if the user wants to combine two custom segments, he needs to create a new one.

At first I was taken aback by how unusual this approach was, but then I saw that it could come in handy sometimes. So this is an interesting approach – how could we add this capability to existing Analysis Services client tools? Well, the only way to extend existing client UI functionality is through actions, so it is clear we will need to use actions here, but how can we implement custom segments in MDX? Actually, MDX has a perfect solution for that – the CREATE SUBCUBE statement. In fact, this is exactly what CREATE SUBCUBE was invented for in the first place – to be able to apply arbitrary custom filters to any MDX query. CREATE SUBCUBE sets this filter in the session state, and all MDX queries executed after it are affected by it.

Let's take a few examples from the Adventure Works 2008 cube to demonstrate this approach in action:

1. Filtering on a named set. In this example, we want to limit analysis to the top 50 customers. Fortunately, the Adventure Works cube already has a named set defined inside the MDX Script which defines these customers. So our CREATE SUBCUBE will be fairly simple:

  CREATE SUBCUBE [Adventure Works] AS
  (SELECT [Top 50 Customers] ON 0 FROM [Adventure Works])

The only tricky thing to remember here is how actions work in Analysis Services. The action expression is an MDX expression which, in the case of the STATEMENT action type, should return a string representing the MDX statement to be executed by the client tool. Therefore, in the action expression field we need to put a string with the statement inside it, not the statement itself. This will become clearer in example 3.
Here is what the action definition will look like:

  Name           : Top 50 Customers
  Target type    : Cells
  Target object  : All cells
  Type           : Statement
  Expression     : "CREATE SUBCUBE [Adventure Works] AS (SELECT [Top 50 Customers] ON 0 FROM [Adventure Works])"
  Invocation     : Interactive
  Caption        : Top 50 Customers
  Caption is MDX : false

This action now works fine inside the cube browser – once executed, the [...]
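
As a further illustration (my own hedged sketch, not one of the post's numbered examples), a “Last 3 days”-style custom segment could be expressed as a subcube over the trailing dates of the Date hierarchy, assuming the Adventure Works [Date].[Calendar] hierarchy:

  // Hypothetical "Last 3 days" segment: keep only the last three Date members
  // present in the cube (an approximation of "last 3 days")
  CREATE SUBCUBE [Adventure Works] AS
  (SELECT
     Tail([Date].[Calendar].[Date].MEMBERS, 3) ON 0
   FROM [Adventure Works])

Wrapped in quotes as the Expression of a STATEMENT action, just like the Top 50 Customers example above, this would give client tools a reusable “Last 3 days” filter.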



Gross margin - dense vs. sparse block evaluation mode in MDX

Sun, 02 Nov 2008 01:23:35 GMT

Gross margin (also known as gross profit margin or gross profit rate) is defined as (Revenue – Cost of Sales)/Revenue. In terms of the Adventure Works sample database, we can write this in MDX as:

  [Gross Margin] = ([Measures].[Internet Sales Amount] - [Measures].[Internet Total Product Cost])
                   / [Measures].[Internet Sales Amount];

While this expression is simple enough, it might be tempting to try to optimize it. If we look at the evaluation tree for this expression, we will see 5 nodes in it: fetching values for Internet Sales Amount twice – because it appears twice in the expression – fetching values for Total Cost, one minus operation and one division operation. Accessing Internet Sales Amount twice is not really a problem, because the second access is going to come from cache anyway, but one might think that removing this extra operation would still improve things, if only a little. So perhaps we can rewrite the formula using a simple math equivalence:

  (a-b)/a = 1 – b/a

The first problem with this rewrite is that it is not always correct. In math it relies on the fact that a/a = 1, but that is only true when a <> 0. The result of a/a is undefined when a = 0. In MDX, 0/0 will also result in an undefined number, usually formatted as “-1.#IND”. But the more interesting question is what happens in MDX when a is empty, i.e. has a value of NULL. In MDX NULL/NULL = NULL, therefore when both Sales and Cost are empty, i.e. there is no record in the fact table, the two formulas are going to give different results:

  (a-b)/a = NULL  when a = NULL and b = NULL
  1 – b/a = 1     when a = NULL and b = NULL

So at the very least, someone first needs to set an exact definition of gross margin at the coordinates where no sales occurred. Is gross margin 100% there, or is it empty as well? But I want to direct your attention to the performance aspect of this change. Remember, it seemed that 1-b/a would perform a little bit better – is it really so? Let's compare both approaches side by side in AS2005 using MDX Studio.

  // Gross margin using 1-b/a
  WITH
    MEMBER [Gross Margin] AS
      1 - [Measures].[Internet Total Product Cost] / [Measures].[Internet Sales Amount]
     ,FORMAT_STRING = 'Percent'
    MEMBER [Max Gross Margin] AS
      Max
      (
        (
          [Customer].[Customer Geography].[Customer]
         ,[Product].[Product Categories].[Subcategory]
         ,[Date].[Calendar].[Calendar Year]
        )
       ,[Measures].[Gross Margin]
      )
  SELECT [Measures].[Max Gross Margin] ON 0
  FROM [Adventure Works];

  Time             : 6 sec 640 ms
  Calc covers      : 6
  Cells calculated : 1
  Sonar subcubes   : 2
  NON EMPTYs       : 0
  Autoexists       : 1
  EXISTINGs        : 0
  SE queries       : 2
  Cache hits       : 3
  Cache misses     : 1
  Cache inserts    : 1
  Cache lookups    : 4
  Memory Usage KB  : 88840

  WITH
    MEMBER [Gross Margin] AS
      ([Measures].[Internet Sales Amount] - [Measures].[Internet Total Product Cost])
      / [Measures].[Internet Sales Amount]
     ,FORMAT_STRING = 'Percent'
    MEMBER [Max Gross Margin] AS
      Max
      (
        (
          [Customer].[Customer Geography].[Customer]
         ,[Product].[Product Categories].[Subcategory]
         ,[Date].[Calendar].[Calendar Year]
        )
       ,[Measures].[Gross Margin]
      )
  SELECT [Measures].[Max Gross Margin] ON 0
  FROM [Adventure Works];

  Time             : 234 ms
  Calc covers      : 7
  Cells calculated : 1
  Sonar subcubes   : 1
  NON EMPTYs       : 0
  Autoexists       : 0
  EXISTINGs        : 0
  SE queries       : 3
  Cache hits       : 3
  Cache misses     : 1
  Cache inserts    : 1
  Cache lookups    : 4
  Memory Usage KB  : 0

The results are shockingly different: almost 7 seconds for the “optimized” approach, and a mere 234 ms for the original approach. Let's try to understand why. First, let's compare the number of SE queries. It is 3 in the (a-b)/a approach, and 2 [...]
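
To make the NULL-handling difference above concrete, here is a minimal side-by-side sketch (my own illustration, not from the post; slice it to any coordinate you know has no fact rows, which I leave unspecified since it depends on the data):

  WITH
    MEMBER [Measures].[GM Subtract] AS
      ([Measures].[Internet Sales Amount] - [Measures].[Internet Total Product Cost])
      / [Measures].[Internet Sales Amount]
    MEMBER [Measures].[GM OneMinus] AS
      1 - [Measures].[Internet Total Product Cost] / [Measures].[Internet Sales Amount]
  SELECT
    { [Measures].[Internet Sales Amount]
     ,[Measures].[GM Subtract]
     ,[Measures].[GM OneMinus] } ON 0
  FROM [Adventure Works]
  // Add a WHERE clause slicing to an empty coordinate:
  // [GM Subtract] returns NULL there, while [GM OneMinus] returns 1 (100%)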



Optimizing order of sets in MDX crossjoins

Fri, 24 Oct 2008 10:00:05 GMT

For scalar values, a*b is the same as b*a (although we saw that the performance of the two can differ in MDX in the “Performance of multiplication in MDX” article). But if a and b are sets, then obviously the results are different, since the order of tuples in the resulting set will be different. But does it matter if we then perform some other operation on top of the result which doesn't depend on the order of the tuples? Semantically the answer is no, but there can be a big difference in performance. Let's take the following example:

  WITH MEMBER [Measures].x AS
    Count
    (
      (
        [Customer].[City].[City]
       ,[Customer].[Gender].[Gender]
       ,[Customer].[Education].[Education]
       ,[Product].[Subcategory].[Subcategory]
       ,[Product].[Color].[Color]
       ,[Product].[Size].[Size]
      )
    )
  SELECT x ON 0 FROM [Adventure Works]

This query runs in 78 ms – no problem. But if we make an innocent-looking change – move one set to another position:

  WITH MEMBER [Measures].x AS
    Count
    (
      (
        [Customer].[City].[City]
       ,[Product].[Subcategory].[Subcategory]
       ,[Customer].[Gender].[Gender]
       ,[Customer].[Education].[Education]
       ,[Product].[Color].[Color]
       ,[Product].[Size].[Size]
      )
    )
  SELECT x ON 0 FROM [Adventure Works]

Now this query takes 13 seconds, and also grabs significant memory. So what happened? To understand this better, let's go back to the first query and pay closer attention to what MDX Studio tells us about the execution. We notice that the Perfmon tab reports 2 autoexists, and the hierarchical profiler also shows 2 NON EMPTY events (each one corresponding to an autoexist operation). Why does it report 2? The query has just a single crossjoin, so there should have been only 1 autoexist. What happens here is that the crossjoin detects that the set it needs to apply autoexists to can be neatly split into two parts – the first part related to the Customer dimension, and the second part to the Product dimension. Then it can apply the following formula:

  JOIN( c1, c2, c3, p1, p2, p3 ) = CROSSJOIN( INNERJOIN(c1,c2,c3), INNERJOIN(p1,p2,p3) )

I.e. it can run an inner join against each dimension table separately, and then do a full cross join between the results. An inner join against a single dimension table is very efficient, since even in the worst case we don't need to go deeper than the dimension key to which all other attributes relate – so this isn't going to take additional memory or much time. However, in the second case the equation doesn't hold anymore, since the order of dimensions inside the crossjoin is mixed. So some other, much less efficient algorithm will be needed, probably a loop join, with the crossjoin now being pushed to the earlier stages. This is very inefficient and is also going to take lots of additional memory.

Conclusion: inside a crossjoin, cluster all the sets from the same dimension together – this will greatly benefit performance.[...]



Optimizing MDX aggregation functions

Wed, 22 Oct 2008 22:32:01 GMT

One of the most significant changes in Analysis Services 2008 was improved performance of MDX queries and calculations. In particular, the query optimizer can choose block (a.k.a. subspace) computation mode in the query plan more often than in AS2005, and using block/subspace computation mode usually brings performance orders of magnitude better than without it. However, even in AS2008 the query optimizer is not always capable of using block mode. The “Performance Improvements for MDX in SQL Server 2008 Analysis Services” whitepaper documents which MDX constructs are optimized and which are not, and MDX Studio automates the process of analyzing MDX, figuring out which fragments are not optimized, and offering advice on how to optimize them. In this article we will cover techniques related to the optimization of aggregation functions in MDX, i.e. Sum, Min, Max, Aggregate and (in AS2008) Avg. All examples were done on the AS2008 version of Adventure Works (some of these techniques will work with AS2005, but others won't).

1. Aggregating over Filter

Scenario: we want to compute average sales for the products which increased their sales since the same date last month. The normal approach is to use the Filter function to determine which products grew their sales, and then apply Avg to the resulting set. Applying this to every day in the year 2003, we get:

  WITH
    MEMBER [Measures].AvgGrowingProducts AS
      Avg
      (
        Filter
        (
          [Product].[Product].[Product].MEMBERS
         ,[Measures].[Sales Amount] >
            ([Measures].[Sales Amount], ParallelPeriod([Date].[Calendar].[Month]))
        )
       ,[Measures].[Sales Amount]
      )
  SELECT
    [Measures].AvgGrowingProducts ON 0
   ,Descendants
    (
      [Date].[Calendar].[Calendar Year].&[2003]
     ,[Date].[Calendar].[Date]
    ) ON 1
  FROM [Adventure Works];

Running this query took more than 7 seconds on my laptop, and the culprit is clear from the perfmon counters – there were 295011 cells calculated, which is a sign of cell-by-cell iteration as opposed to working in block mode. And if we run the “Analyze” function of MDX Studio, it will tell us why this is happening: the function ‘Filter’ was used inside an aggregation function – this disables block mode. How do we get rid of Filter? I have actually written about this before, in “Optimizing Count(Filter(...)) expressions in MDX”, with respect to the Count function, but the same technique can be applied to any aggregation function, including Avg (in AS2008). The idea is simple: even though Filter reduces the set of products, in order to compute the Filter in the first place we really have to check the sales of every single product, so we may as well just run Avg over all products, but with a smart expression which returns NULL when the product should not be included – and since Avg ignores NULLs, this will work correctly. Here is how the rewritten MDX looks:

  WITH
    MEMBER [Measures].Growth AS
      IIF
      (
        [Measures].[Sales Amount] >
          ([Measures].[Sales Amount], ParallelPeriod([Date].[Calendar].[Month]))
       ,[Measures].[Sales Amount]
       ,NULL
      )
     ,FORMAT_STRING = 'Currency'
    MEMBER [Measures].AvgGrowingProducts AS
      Avg
      (
        [Product].[Product].[Product].MEMBERS
       ,[Measures].Growth
      )
  SELECT
    [Measures].AvgGrowingProducts ON 0
   ,Descendants
    (
      [Date].[Calendar].[Calendar Year].&[2003]
     ,[Date].[Calendar].[Date]
    ) ON 1
  FROM [Adventure Works];

Now it executes in a mere 200 ms, and the number of cells calculated is 365 – exactly the same as the number of cells in the result, which is the best theoretical value we can get!

2. Aggregating over NonEmpty

Scenario: compute average sales for all products.
The solution is straightforward:

  WITH
    MEMBER [Measures].AvgProductSales AS
      Avg
      (
        [Product].[Product].[Product].MEMBERS
       ,[Measures].[Sales Amount]
      ) [...]
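
Going back to the Count(Filter(...)) reference in section 1: purely as a hedged sketch of the same rewrite applied to a count (my illustration, not taken from the referenced post), a filtered count can be expressed as a Sum over an indicator expression:

  // Instead of Count(Filter(<set>, <condition>)), sum an indicator that is
  // 1 when the condition holds and NULL otherwise (Sum ignores NULLs)
  WITH
    MEMBER [Measures].GrowingProductCount AS
      Sum
      (
        [Product].[Product].[Product].MEMBERS
       ,IIF
        (
          [Measures].[Sales Amount] >
            ([Measures].[Sales Amount], ParallelPeriod([Date].[Calendar].[Month]))
         ,1
         ,NULL
        )
      )
  SELECT [Measures].GrowingProductCount ON 0
  FROM [Adventure Works];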



Score another point for XMLA and MDX

Tue, 14 Oct 2008 19:21:00 GMT

While some people are wondering whether ‘XMLA is dead’ and calling to ‘Resurrect the XMLA Council’, the reality in the field is quite different. There are, of course, plenty of XMLA clients, with new ones being born, and on the server side XMLA providers are also gaining in numbers. Today the open-source OLAP server Palo announced support for XMLA and MDX.

Actually, they announced even more – also support for OLE DB for OLAP (which is where XMLA came from). Of course, the biggest driver seems to be compatibility with Excel PivotTables, but whatever is driving it, it should be pretty clear to everyone that XMLA is not dead – it is alive and well, even though there have been no changes to the standard since 2003. Maybe it means we did such a good job on the standard that XMLA 1.1 is enough for everything :)




Get most out of partition slices

Tue, 14 Oct 2008 09:36:33 GMT

Setting the partition slice has always been an important optimization technique in Analysis Services. Every presentation talked about it and every whitepaper mentioned it; for example, the Microsoft SQL Server 2000 Analysis Services Performance Guide contains a chapter appropriately named “Define the Data Slice for Each Partition”. Here is a quote from it:

“If the data slice value for a partition is set properly, Analysis Services can quickly eliminate irrelevant partitions from the query processing and significantly reduce the amount of physical I/O and processor time needed for many queries issued against MOLAP and HOLAP partitions. Caution: Creating a partition without setting the data slice is not a good practice, and can result in considerable overhead being added to the system (artificially increasing response times). Without the data slice, Analysis Services cannot limit a query to the appropriate partitions and must scan each partition even if zero cells will be returned. The data slice enables Analysis Services to determine which partitions contain data relevant to the query”

However, after AS2005 was released, a new tip made it into presentations: “For MOLAP partitions there is no need to specify the slice property, it is detected automatically”. Even though I briefly debunked it before here, this rumor just kept popping up, and it even made it as far as the official “OLAP Design Best Practices for Analysis Services 2005” document – here is the quote:

“For MOLAP partitions, you do not have to specify the slice because the server will be able to figure out the relevant partitions after the data has been processed”

Even worse, the official documentation on data slices goes as far as stating:

“Data slices are applicable only to objects that use the ROLAP storage mode”

But this is completely untrue, and many people have already discovered that. This blog entry by Jesse Orosz has a great description of 4 reasons why setting the partition slice in MOLAP is important. In my blog I want to focus on #3 from his list – automatic slice detection is not perfect.

Here is how automatic slice detection works: while building indexes for the partition, for every attribute it determines the minimum and maximum data id values in this attribute (for a more detailed explanation see the section “Partition Slice” in the Microsoft SQL Server 2005 Analysis Services book, page 355). It is possible to discover which values exactly Analysis Services detected, either the hard way as described here, or the easy way as described here. In either case, it is clear that since slice autodetection operates with ranges of data ids, it is entirely possible for a partition to contain only two attribute members, one with the lowest possible data id (2) and another with the highest possible data id – and as a result the range will cover all possible values, i.e. it will be useless. On the other hand, if for a certain attribute the data in the partition contains only a single member's data id, then the range will be very effective, since such a range is equivalent to a slice on a single member.

So this must be the root of all the confusion in documentation and whitepapers. Prior to AS2005, the partition slice could only be a single member. And specifying a partition slice of a single member is really not necessary in MOLAP, since it will always be automatically discovered by autoslice (well, except for the other 3 reasons that Jesse lists in his blog).
But starting with AS2005, it is possible to specify MDX sets for the partition slice, and that's exactly the case where setting the partition slice manually makes sense. The Analysis Services Stored Procedures Project on CodePlex features a PartitionHealthCheck function, which shows the overlaps[...]
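
As a small hedged illustration (mine, not from the post): the partition's Slice property accepts an MDX expression, so a partition holding, say, two calendar years of data could be sliced explicitly with a set such as the following (member keys assumed from Adventure Works):

  // Example value for a partition's Slice property
  { [Date].[Calendar Year].&[2003], [Date].[Calendar Year].&[2004] }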



Screencast of “MDX Studio to the rescue” presentation

Sat, 11 Oct 2008 23:50:01 GMT

During the MS BI Conference 2008 I gave a presentation called “Got MDX problems? MDX Studio to the rescue!”. Given how the theme of self-service BI was at the center of the conference, I decided to apply some self-service BI to analyze the performance of the presentation. The conference website provided all the evaluation data through Excel Services running in SharePoint, but as a true self-service power user, I immediately exported all the data to Excel to do the analysis on my machine. Fortunately, even with 3000 participants, 120 sessions and 8 questions in each evaluation form, the total number of rows was well within today's Excel's ability to handle, so I didn't need to install Gemini to deal with millions of rows yet.

The overall score was 3.94 (out of 5), barely above the median of 3.87. By this overall score my session ranked at a mediocre 50 out of 120. Slicing the data by individual question and comparing it to the rest of the sessions, I got some insights. The good news was that I did really well on what I consider the most important question in the evaluation: “How applicable/relevant was the content of this session to the requirements of your job”. Here my session ranked 10th with a score of 4.45. It scored even better on “How knowledgeable was the speaker on the subject” – rank 9 with a score of 4.68. But what pulled the total numbers down was “How effective was the speaker in presenting the information”. Here I ranked 86th with a score of 3.64, below both median and average.

                  Overall   Speaker was     Content applicable   Speaker presented
                            knowledgeable   to the job           effectively
  Highest         4.79      4.88            5.00                 4.75
  MDX Studio      3.94      4.68            4.45                 3.64
  Median          3.87      4.33            4.00                 3.9
  Avg             3.80      4.23            3.90                 3.83
  Lowest          2.61      2.5             2.33                 2.7
  Rank            50        9               10                   86
  Percentile      42.37%    7.63%           8.47%                72.88%
  Percent below   57.63%    92.37%          91.53%               27.12%

So the conclusion is that the content was very valuable and useful, and I had all the knowledge, but I clearly had a problem presenting this content and knowledge effectively. Now I needed to drill down into that area, so I went over all the comments, trying to correlate them with this low score. This revealed 3 areas that appeared repeatedly throughout the comments:

1. Some people were not able to make it to the session because the room was too small and filled to capacity well before the session started. A few of these people filled in the evaluation forms giving low satisfaction scores. While I feel it is a little unfair to score a session that one didn't attend, on the other hand I totally understand their frustration, and they didn't have any other mechanism to give feedback other than filling in the evaluation forms.

2. By far the biggest complaint was that the computer screen in the room was too small, and it was difficult or impossible to see what was on it. Pretty much everybody gave this feedback – even people who gave me high scores. And the same sentiment was repeated again and again in the comments section of my blog as well.

3. Lastly, I did not speak very clearly, so some people struggled not only to see what was on the screen, but also to understand what I was saying.

OK, now I understood the root cause of the problem. But this isn't enough; with all the talk about “Actionable BI”, the real value will come from taking the corrective action. But how can I correct[...]



MS BI Conference 2008 is over

Thu, 09 Oct 2008 04:10:44 GMT

The conference is over, and I am back home. Of course, unlike some other folks who will spend 11 hours flying over the Atlantic to get home, I only spent 11 minutes driving over Lake Washington. Well, it's time to reflect on the conference; here are some of my thoughts:

1. No doubt, Project Gemini was the main focus and center of the conference. I was overhearing people talking about it all the time - in the hallways, on escalators, during lunch etc. Both the mainstream press and the blogosphere had plenty of coverage too. Pretty much everybody who stopped to have a chat with me talked about Gemini. I think I have heard the full spectrum of opinions, from absolute excitement and “ship it now” comments to full dismissal and “you are going to create spreadmart hell” comments (even though Gemini claims completely the opposite – to solve the spreadmart hell). Anyway, I think we need some more time for the dust to settle. One thing everybody seems to agree on is that the underlying technology is very powerful (even though almost no details were shared on this technology). Amir's presentation yesterday was fantastic, especially since he pretty much had to repeat the same demo steps as in the keynote (but with 100 million rows instead of 20 million – yet performance was exactly the same – instantaneous). Unfortunately, his presentation won't be available online, but in 6 weeks everybody will get it on DVD. I am curious to see the response from analysts – there were quite a few at the conference – especially from Nigel Pendse.

2. The technical content was much better this year than during last year's BI conference. Pretty much every session I attended was great. The level of content was on par with, or sometimes even better than, what we usually get at PASS and TechEd. (I didn't go to any session in the “Business Value” track, nor to keynotes except the first one – I heard not-so-great opinions about those.) But the experience with chalk talks was horrible. The rooms were way too small, and people lined up well in advance to get into the chalk talk they wanted, but many of them were not able to – the doors were closed when capacity was reached. I really felt sorry for the people who spent lots of money and traveled from far away only to not be able to attend the session they wanted. Of course, all these sessions were recorded, but the whole point of attending the conference is to get into live presentations. So I didn't even try to get into any of these chalk talks, in order to give other people a better chance.

3. Big thanks to all the people with whom I got a chance to chat. It felt great to get all the “thanks – I read your blog” comments. Correlating the number of hits on my blog with the number of people I talked to – I think I now pretty much know every reader of my blog by name :)

4. On a more negative side, my chalk talk about MDX Studio today didn't go well at all. It must have been one of the worst presentations I have ever done. People started to take seats in the room an hour before the time, and 30 minutes before the session the room was already completely full. I became very nervous, because when I saw it I was afraid that the session might not live up to the expectations (after all, people skipped another session just to get a seat at mine), and this is exactly what happened. On top of it, the screen in the room was very small, and most people couldn't even see anything on it, which, of course, didn't help a session that spent almost all of its time in demos.
So, while the overall evaluation score for the session was OK and above average, I got some comments like “worst session in the conference” :( I apologize for that.[...]



Watch announcements from MS BI Conference in video recording

Tue, 07 Oct 2008 07:27:39 GMT

Hopefully this is the last live blog post for today :) I keep getting a constant stream of questions about both project Madison and project Gemini. The Ted Kummert keynote where both announcements were made was recorded and can be viewed online – hopefully this will answer many questions about both announcements.

Steven Elop & Ted Kummert keynote recording: Low Medium High

For project Madison (DATAllegro, scale out DW) - scroll to 1 hour 4 minutes

For project Gemini and the SQL Server “Kilimanjaro” release (self-service BI, column-oriented, in-memory) – scroll to 1 hour 16 minutes and enjoy the “Fairy Tale” story and Donald's demo. “Fairy Tale” is great, by the way – it is pretty funny, but it resonates extremely well with anybody who has been doing real BI projects.




So what is exactly “Project Gemini”

Mon, 06 Oct 2008 23:44:53 GMT

In the couple of hours since the announcement here at the MS BI conference, I have already talked with a few people and seen a few threads and comments on the Web, and (as expected) there is plenty of confusion about what “Project Gemini” is and isn't, etc. Here are the facts:

1. Project Gemini is Analysis Services

2. It is the next version of Analysis Services, which will ship before SQL11, as part of SQL Server “Kilimanjaro”

3. It uses column-oriented processing and in-memory storage as its core technological principles

Out of all the coverage on the Web that I have seen so far, the best explanation comes from Doug Henschen's article “Microsoft Will Bow In-Memory Analysis, Appliance-Ready Database in 2010” in Intelligent Enterprise. Here is the key quote from this article:

“It's essentially another storage mode for Microsoft SQL Server Analysis Services with access via MDX, so existing applications will be able to take advantage of the performance enhancements”

Just to repeat it in other words: existing applications that work today with Analysis Services through XMLA and MDX will be able to work with Gemini, because Gemini is Analysis Services. In fact, in the “SQL 2008 and beyond” session today, Thierry demonstrated how Report Builder worked with Gemini seamlessly. And because column-oriented (and in-memory) technologies are used as the foundation of the storage engine, performance and scalability with data volume are much better.

To read more about column oriented processing: http://www.dbms2.com/category/database-theory-practice/columnar-database-management/

To read more about in-memory databases: http://www.dbms2.com/category/memory-centric-data-management/in-memory-dbms/




The big news from MS BI Conference are out

Mon, 06 Oct 2008 21:08:55 GMT

So there were two sets of announcements just made during the Ted Kummert keynote.

1. Project “Madison” – a shared-nothing MPP scale-out version of SQL Server, powered by DATAllegro technology. We saw a live demo of 24 instances running a 150 TB data warehouse, returning queries into Reporting Services in seconds. This is great news, but everybody could probably predict that this was going to happen sooner or later after the DATAllegro acquisition was announced. Well, it is sooner rather than later.

2. SQL Server “Kilimanjaro”, to be released in the first half of 2010, driven by project “Gemini” technology. This is the big, big news I referred to earlier. While Ted in his presentation and Donald during the demo focused on the self-service part of it, I want to draw attention to the technology powering it. Just repeating what Ted and Donald said: this is Analysis Services using column-oriented and in-memory technologies, running inside Excel (first demo) and in SharePoint (second demo). This is what allows loading all these tens of millions of rows into Excel and fully manipulating them in a split second, building pivot tables on the fly, inferring data relationships etc. As Ted said, the work on this project has been ongoing for quite some time now, well before SQL 2008 shipped, and this is the first public showing of the technology. Much more detail is coming during this session – don't miss it tomorrow!

Update:

Some more coverage here:

Richard Tkachuk: http://blogs.msdn.com/sqlcat/archive/2008/10/06/project-gemini-building-models-and-analysing-data-from-excel.aspx

Marco Russo: http://sqlblog.com/blogs/marco_russo/archive/2008/10/06/microsoft-bi-conference-2008-announcements-on-stage.aspx
