SQL Configure: Confusion on Cube Design

Hello everyone. I'm having a bit of trouble getting my head around designing a new cube and I'm hoping this forum can help. I have a database which contains the following simplified structure:

tbl_Panelist:

panelist_id

tbl_Question:

question_id

tbl_Answer:

answer_id

tbl_Result_Set:

panelist_id

question_id

answer_id

I'm trying to design a cube which will allow analyzing of the counts of how many panelists answered each question by each answer. For example if Q1 has possible answers of A,B,C and Q2 has possible answers of X,Y, I'd like to be able to browse the cube and see

X Y

A 10 5

B 3, 2

indicating that 10 people answered Q1 with A and Q2 with X. My current thinking has been to create a view for each question and create a dimension off of that. Is this the correct method in a case like this? Any help would be greatly appreciated. Thank you.

Alex Levin

Principal Consultant

Fifth Marker Consulting, LLC

alex.levin@.fifthmarker.com

Hello Alex,

I was able to create an AS solution for your case simply by using the cube wizard. Basically you create 3 dimensions - Panelists, Questions, Answers bound to the respective tables and tell the cube wizard that your fact table is ResultSet, while including the existing dimensions. The resulting cube will provide additional analysis by Panelists, if needed.

If you write me at this address: andrewgaATnetzeroDOTcom i will send you a zip file with the AS solution, which you can deploy onto your server.

Andrew

|||

Thank you for responding. That is exactly what I attempted initially. However, I could not have different Questions act as different dimensions through that technique. I should say that at this point, due to the views, I am not having problems creating the dimensions. My current problem lies with attempting to create measures that get "hit" by every dimension.

To make this even more complicated, a single person can answer some of the questions with multiple answers. Thus, I need to have two different measures, one for a raw count of the answers, and the other with a distinct count of each panelist's answers.

Finally, I should also mention that I am using SQL Server 2005 and SSAS 2005 to do this project. Thanks again for any help that could be provided.

Alex Levin

Principal Consultant

Fifth Marker Consulting, LLC

alex.levin@.fifthmarker.com

|||

Hello Alex,

> distinct count of each panelist's answers.

Could you please clarify? When you browse this measure and panelists dimension, do you want to see how many *questions* each panelist answered with at least one answer? Also, do you need it as a measure, which should also behave nice when user browses not panelists dimension, or all you need is some MDX query, which would fetch this information (calc measure within one MDX query)?

I am asking because in my test i created a Count measure bound to the row of the fact table. When i browse the panelists dimension and that measure i do get the numbers of answers provided by each panelist. I suppose you gave it "raw answers" name.

Andrew

|||

I'm not exactly sure of the correct terminology here, so perhaps a continuation of my first example will explain what I'm looking for. Using the same table structure mentioned previously, suppose three panelists, p1, p2, and p3. We'll also suppose two questions, q1 and q2. Finally, q1 has possible answers A-E and q2 has possible answers W-Z. Here are the responses of each:

P1 A, C, D, W, Y, Z

P2 A, B, W, Z

P3 A, G, X, Z

Using this data, I would like to be able to build a pivottable where I can have both Q1 and Q2 as seperate dimensions. Keep in mind, in my actual project, there are closer to 40 questions, rather than just 2. For now I'd just like to focus on one measure as there are other factors involved in how the questions are answered. I would expect the results of this pivottable to looks something like:

W X Y Z

A 2 1 1 3

B 1 0 0 1

C 1 0 1 1

D 1 0 1 1

E 0 0 0 0

The trouble I seem to be running into is that when I try and examine dimesions with a granularity based on the answerID, I am getting no "hits". When there is a one to one relationship between the answer and the panelist, I am able to have a dimension with a granularity of the panelistID. In this case I get the results exactly as expected. I can even examine a granularity of panelist against a granularity of answer and get the result I expect. It's the answer to answer where I am currently having problems. Thanks again for your help.

Alex Levin

Principal Consultant

Fifth Marker Consulting, LLC

alex.levin@.fifthmarker.com

|||

Just to move this along a bit more, the crux of the problem that I am having is that for questions (dimensions) that allow multiple answers by the same person I cannot browse those two dimensions in the pivottable. I have now changed their granularity to be a collection of the answerID and the panelistID. While this seems to be a step closer, I still cannot get "hits" off of them. If this were a SQL query, I would just do an intersect on the results of selects of the panelist per question query. Once again, any help would be greatly appreciated. I'd be happy to further clarify what I've done if any of this is unclear. Thank you.

Alex Levin

Principal Consultant

Fifth Marker Consulting, LLC

alex.levin@.fifthmarker.com

|||

Hello Alex,

I have just sent you my version of the solution but that one was based on the first post.

Your second post needs clarifications. In your matrix of the results, which you would like to obtain, let's discuss the left-top cell.

A 2

From your description, A and W are the instances of answers. Basically you have put the same dimension "Answers" on columns and rows axes. This can't be.

Anyway, how the value of 2 was computed? From your description the only correlation i can find is "the number of panelists, who gave both answers".

|||

Hello Andrew. Thank you for taking the time to help me. I will attempt to clear this up a bit. A and W are exactly what I need to create a pivottable on as they represent the results of two different questions. I have attempted to overcome this through the use of views containing the results of each question. So, to continue with the example above, I have in my database and data source view:

View_Q1 - panelist_ID, question_ID, answer_ID where question_ID = Q1

View_Q2 - panelist_ID, question_ID, answer_ID where question_ID = Q2

Keep in mind I'm simplifying this as of course you'd want the friendly names and some other information in addition to the ID's for actually building the dimensions. I have then created a dimension from each view table and used my master results table as the Measures table.

This all works fine when comparing questions with a 1-1 (one to one) panelist to answer relationship and even a single 1-M (one to many) question compared to a 1-1 table. My problem comes from comparing two 1-M questions. I've set the granularity in these cases to be the collection of answer and panelist. Perhaps using views is completely the wrong approach, but it certainly seems like I'm pretty close. The example above is of two 1-M questions.

Finally, to answer your second question, 2 is the count of panelists who answered both A and W. In this example that would actually be panelists P1 and P2. I'm only concerned with the count of panelists, and not with actually identifiying the panelists in any way. Does that explain what I am trying to accomplish or is there still something else that I can clear up? Once again, thank you for your help.

Alex Levin

Principal Consultant

Fifth Marker Consulting, LLC

alex.levin@.fifthmarker.com

|||

Hello Alex,

It might take some time till i will be able to think how to resolve your problem by means of OLAP, but having read your goal i wanted to ask some questions if you were really determined to solve it with OLAP.

The model of the cube should be good enough to resolve not just one report. Although it is also OK to build a cube for the sake of one report, it does not seem people do that.

In your case, what would be the value of your measure for the tuple (AnswersDim1.All, AnswersDim2.All)? Is it the number of panelists answered all questions?

What would be the values for the tuple (AnswersDim1.A, AnswersDim2.All) and (AnswersDim1.All, AnswersDim2.A)?

Suppose you make both dimensions non-aggregatable. You would not need to think about the 2 questions above, but would need to think about the default members for the 2 dimensions. Are you planning to run MDX queries like:

select [measures].yourmeasure on 0,

AnswersDim1.members on 1

from Cube

Notice, that this query would need to fetch tuples like (AnswersDim1.currentmember, AnswersDim2.defaultmember).

Being not advanced in OLAP modeling personally, if my goal was to produce just that matrix report you described, i would have been done in one day by writing a C++ or C# program, which would open forward-only row-set for a view joining your tables and building in-memory structures based on STL (C++) or Collections.Generics (C#), calculating your goal, writing the result as a SQL table and binding the report to that already calculated table.

Of course, i would think more if the number of panelists were huge. I suppose the number of questions and possible answers is not big, because making the report the way you described would make it unusable.

Andrew

|||

Unfortuantely, I fear that I must use OLAP for this project. The business requirements are that analysists must be able to view the responses to any combination of questions in a pivottable in Excel. To give you an idea of exactly how much data we're talking about here, there are currently about 60 questions with about 2-15 answers each. Currently there are a few thousand panelists, but this is expected to eventually reach into the hundreds of thousands. The database structure was designed to allow for this expected increase in data and for the easy ability to change questions and answers as time goes on.

You are correct in noticing that my aggregations are fairly meaningless in this project. There are some questions which have groupings of answers, but for the most part, each question only has one level in its heirarchy. Perhaps I can turn off aggregations in these cases if that will help.

To answer your questions specifically, Q1Dim.All,Q2Dim.All (I assume that's what you meant) would be 15 which represents the total number of responses to Q1 + the total number of responses to Q2. Had these both been 1-1 questions, that total would represent the union of panelists who answered each of the questions. Q1Dim.A, Q2Dim.All would be7 and Q2Dim.A, Q1Dim.All would be 5.

As I said at the outset, this is my first project using Analysis Server so it would not be surprising if I am going about this the wrong way or if I have missed some fundamental underlying concept. I was not expecting to have to write any MDX queries in this phase. What role does the defaultmember property play here?

Finally, as best as I can tell the crux of my problem may rest on being able to create a measure which will find results within a dimension using the collection of answerid and panelistid but find results between dimensions just using the panelistid. Does that sound correct to you based on your understanding of what I am trying to accomplish? Again, thanks for your time and interest on my behalf.

Alex Levin

Principal Consultant

Fifth Marker Consulting, LLC

alex.levin@.fifthmarker.com

|||

I think I've moved forward a bit on this problem. As I'm not an expert on MDX, I've written this in SQL code. Hopefully someone out there can help me write an MDX based measure which will be the equivalent:

with a1 as (select distpanelistid from dbo.tbl_Survey_Results where surveyanswerid = 40 intersect select distpanelistid from dbo.tbl_Survey_Results where surveyanswerid = 491)

select count(distpanelistid) from a1

In this case I'm specifically getting the count for answers 40 and 491. I need the MDX to be of whatever answers I am currently browsing in the pivottable. Any MDX experts out there know this one?

Alex Levin

Principal Consultant

Fifth Marker Consulting, LLC

alex.levin@.fifthmarker.com

|||Yet another person raised similar task (topic: Need help designing the Cube). I did it in MSAccess, while i see that you made it with SQL. I also provided some thoughts why i failed to do it with OLAP.|||

Ok, after much effort and with some amazing help from Matt Burr with the Analysis Server group at Microsoft, I have a solution to this problem. I am going to quote from the wrap up email provided by Matt. If anyone has any questions or would like a copy of the sample cube and db which was created for this, I would be happy to provide them. Thanks to Andrew Garbuzov for his help on this issue as well. From Matt Burr:

Resolution

In the end, the solution involved creating views against your source “fact” table to represent individual query dimensions. Then, we created the different query dimensions from these views. We also created a Panelist dimension that served two purposes: (1) it contains the data that we actually will count and (2) it serves to help join together the various question dimensions so that navigating one dimension effectively navigates the other dimensions.

Next, we created measures based on each of the question dimensions; these simply counted the unique number of participants that provided any given answer to a question, but their primary value is that they serve as a sort of “hook” that we can use to relate the various question dimensions back to the Panelist dimension, and from there out to the other question dimensions.

We then used the Dimension Usage tab to ensure that the Panelist dimension related to the measures that we had created from the question dimensions, and thus related to the question dimensions (so that navigating or “filtering” a given question dimension subsequently navigated/filtered the Panelist dimension), and we also created many-to-many relationships from each of the question dimensions to each of the other question dimensions (their relationship with the Panelist dimension facilitated this), so that navigating/filtering any one of the question dimensions consequently navigated/filtered all of the other question dimensions, based on a shared set of panelists that provided certain answers to both of the questions.

Finally, we created a calculated member that counts the distinct number of participants that exist in the Panelist dimension, which will have been navigated/filtered by your choice of question dimensions. This calculated member tells you what you wished to know: the number of panelists that answered x for one question AND y for another.

Alex Levin

Principal Consultant

Fifth Marker Consulting, LLC

alex.levin@.fifthmarker.com

Thursday, 8 March 2012

Confusion on Cube Design

No comments:

Post a Comment

SQL Configure

Blog Archive

About Me