Tuesday, March 26, 2013

Got Data from People? Take Dan Ariely's Coursera course.

A Beginner's Guide to Irrational Behavior started yesterday.  One might not immediately think that such a course would be relevant for statistical modeling.  Well, it is if your statistical modeling uses people as informants.  If the data come from individuals responding to inquiries, then it would help if we understood the response generation process. 

For example, the reading list from the first week includes the article "How actions create - not just reveal - preferences."  For some reason this makes perfect sense in machine learning when we are programming our driverless car.  Well, people were the first processors of big data, and our survival depended on our ability to structure input to serve our needs.  We see what we must in order to act, and the rest is background.  Our statistical models should do the same.

I recommend this course because it is a fun introduction covering many of the examples that you must work your way through in order to understand how much of what we take as immediate and direct requires learning and processing.  What if that positive manifold, the halo effect, was not measurement error but a real manifestation of a greedy algorithm supporting action in an uncertain environment?  If such were the case, then I would not want my statistical model to control for halo effects because the halo is the driver of behavior.  Similarly, I would not use self-reports in my statistical model if the self-reports were after-the-fact justifications of choices made for other reasons.  These are the issues raised by Dan Ariely's Coursera course.  They ought to be addressed in our model specification.

Sunday, March 24, 2013

Does It Make Sense to Segment Using Individual Estimates from a Hierarchical Bayes Choice Model?

I raise this question because we see calls for running segmentation with individual estimates from hierarchical Bayes choice models without any mention of the complications that might accompany such an approach.  Actually, all the calls seem to come from those using MaxDiff to analyze data from incomplete block designs.  For example, if you were to run a search for "MaxDiff segmentation," you would find many in the marketing research community arguing for a two-stage clustering: first, hierarchical Bayes estimation of individual-level MaxDiff parameters, and then those estimates used as observations in some type of cluster analysis.  However, there is no reason to limit the discussion to MaxDiff, since the same issues that arise with segmentation using MaxDiff estimates apply to all hierarchical Bayes estimates.

So, where's the problem?  There would be no question had we given each respondent a sufficient number of "choice sets" to estimate a complete set of parameters separately for each individual.  To refresh our memories, we are trying to measure the individual-level importance of an array of attributes or features (e.g., the attributes of a fast food restaurant).   Let us say that we had 16 attributes, and we wanted to form choice sets with four attributes in each block (e.g., clean bathrooms, reasonable prices, tastes good, short wait).  Each block is a choice set where the respondent would be shown the four attributes and asked which they wanted most and which they wanted least.  A balanced incomplete block design could be formed with 20 blocks.

I have written a function that uses the R package AlgDesign to generate an incomplete block design.  As you can see below, it is called as IBD(nalt, nperblock, nblock, niter) and takes four arguments:  number of total alternatives, number of alternatives per block, number of blocks, and number of iterations.  I have given a default value of 1000 for the number of iterations; however, you may need to increase that number depending on the complexity of your design (e.g., niter=5000).  The command IBD(16, 4, 20) will create an incomplete block design for 16 attributes, presented four attributes per block, over 20 blocks.  You should note that the set.seed() function is needed in order to obtain the same design each time, and that you might need to try more than one seed if the first does not generate a balanced design.

library(AlgDesign)

IBD <- function(nalt, nperblock, nblock, niter = 1000) {
  # Search for an optimal blocked design: nblock blocks of nperblock alternatives
  BIB <- optBlock(~., withinData = factor(1:nalt),
                  blocksizes = rep(nperblock, nblock), nRepeats = niter)
  design <- matrix(BIB$rows, nrow = nblock, ncol = nperblock, byrow = TRUE)
  # Co-occurrence counts: diagonal = number of appearances,
  # off-diagonal = number of blocks each pair shares
  counts <- crossprod(table(rep(1:nblock, each = nperblock), BIB$rows))
  list(design, counts)
}

IBD(16, 4, 20)

> IBD(16,4,20)
[[1]]
      [,1] [,2] [,3] [,4]
 [1,]    6   12   13   14
 [2,]    3    6   10   16
 [3,]    1    3    9   13
 [4,]    2    8   13   15
 [5,]    4    6    8   11
 [6,]    7   10   11   13
 [7,]    8    9   14   16
 [8,]    2    3    4   12
 [9,]    7   12   15   16
[10,]    1    8   10   12
[11,]    3   11   14   15
[12,]    5    9   11   12
[13,]    4    5   13   16
[14,]    2    6    7    9
[15,]    3    5    7    8
[16,]    2    5   10   14
[17,]    1    5    6   15
[18,]    1    2   11   16
[19,]    1    4    7   14
[20,]    4    9   10   15

[[2]]
    
     1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
  1  5 1 1 1 1 1 1 1 1  1  1  1  1  1  1  1
  2  1 5 1 1 1 1 1 1 1  1  1  1  1  1  1  1
  3  1 1 5 1 1 1 1 1 1  1  1  1  1  1  1  1
  4  1 1 1 5 1 1 1 1 1  1  1  1  1  1  1  1
  5  1 1 1 1 5 1 1 1 1  1  1  1  1  1  1  1
  6  1 1 1 1 1 5 1 1 1  1  1  1  1  1  1  1
  7  1 1 1 1 1 1 5 1 1  1  1  1  1  1  1  1
  8  1 1 1 1 1 1 1 5 1  1  1  1  1  1  1  1
  9  1 1 1 1 1 1 1 1 5  1  1  1  1  1  1  1
  10 1 1 1 1 1 1 1 1 1  5  1  1  1  1  1  1
  11 1 1 1 1 1 1 1 1 1  1  5  1  1  1  1  1
  12 1 1 1 1 1 1 1 1 1  1  1  5  1  1  1  1
  13 1 1 1 1 1 1 1 1 1  1  1  1  5  1  1  1
  14 1 1 1 1 1 1 1 1 1  1  1  1  1  5  1  1
  15 1 1 1 1 1 1 1 1 1  1  1  1  1  1  5  1
  16 1 1 1 1 1 1 1 1 1  1  1  1  1  1  1  5

The first element [[1]] of the list above shows the attributes in each of the 20 blocks.  The second element [[2]] displays how often the 16 attributes occur together in the same block.  That is, every attribute appears in five blocks (diagonal values), and each pair of attributes appears together in exactly one block (off-diagonal values).  I wrote this function in order to quickly generate incomplete block designs and check their "balance."   The idea came from the surveyanalysis.org website, which is full of interesting R code and well-written discussions of advanced statistical procedures.  You should try this IBD function with other well-known incomplete block designs, such as IBD(6, 3, 10) and IBD(9, 5, 18).

Had every respondent answered all 20 blocks, we could provide MaxDiff counts for each respondent independent of all the other respondents.  To see how this works, suppose that the 16 attributes are in order so that the attribute labeled "1" is the most preferred and the attribute labeled "16" is the least preferred.  When we look at the 20 blocks in [[1]], "best" is always the attribute in the first column and "worst" is always the attribute in the last column.  It should be clear why we needed to ask both best and worst.   If we had not asked for the selection of the worst attribute from each choice set, the last eight attributes (numbered 9 - 16) would have been tied for last place.

Although this configuration does not perfectly rank order all 16 attributes, it comes very close.  For example, the most preferred attribute #1 is selected as best five times, followed by attribute #2, which is selected four times, and so on.  Had we also asked for second best (or, equivalently, second worst), we would have recovered our ranking of all the attributes.  Of course, we could have saved ourselves a lot of time and effort by simply asking for a rank ordering of all 16 attributes in the first place (i.e., select the best of 16, the best of the remaining 15, etc.).
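The counting argument in the last two paragraphs can be checked directly in R.  A short sketch, hardcoding the 20-block design printed above and assuming the attributes are ranked 1 (most preferred) through 16 (least preferred):

```r
# The 20-block design from the IBD(16, 4, 20) output above (one row per block)
design <- matrix(c(
   6, 12, 13, 14,
   3,  6, 10, 16,
   1,  3,  9, 13,
   2,  8, 13, 15,
   4,  6,  8, 11,
   7, 10, 11, 13,
   8,  9, 14, 16,
   2,  3,  4, 12,
   7, 12, 15, 16,
   1,  8, 10, 12,
   3, 11, 14, 15,
   5,  9, 11, 12,
   4,  5, 13, 16,
   2,  6,  7,  9,
   3,  5,  7,  8,
   2,  5, 10, 14,
   1,  5,  6, 15,
   1,  2, 11, 16,
   1,  4,  7, 14,
   4,  9, 10, 15), nrow = 20, ncol = 4, byrow = TRUE)

# If attribute 1 is most preferred and 16 least preferred, the "best" pick
# in each block is the row minimum and the "worst" pick is the row maximum
best  <- apply(design, 1, min)
worst <- apply(design, 1, max)
table(best)   # attribute 1 is best in 5 blocks, attribute 2 in 4 blocks, ...
table(worst)  # attribute 16 is worst in 5 blocks
```

Tabulating best reveals that attributes 9 through 16 never appear as a best pick, which is exactly why the worst selections are needed to separate them.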

Nevertheless, 20 blocks might be somewhat demanding for a respondent to complete in a single interview, so instead each respondent would be given only a handful of blocks, too few to obtain anything like the stable rank ordering of the attributes achieved with the full set of 20 balanced blocks.  As a result, we need hierarchical Bayes estimation to "borrow strength" from the other respondents in order to provide individual-level estimates.  Explaining exactly how this is accomplished takes some time, but in the end we would discover that hierarchical Bayes needs to make some type of simplifying assumption concerning the distribution of attribute weights across all the respondents, most often that each respondent comes from the same multivariate normal distribution of attribute weights.  That is, one way to "borrow" from the other respondents is to assume that all the respondents are alike in the sense that they all belong to the same population with a common set of mean attribute values.
 
We are not claiming that every individual assigns the same value or weight to every attribute, but that the weights given by everyone in the group can be described by a multivariate normal distribution with a common vector of means.  If we let individual respondents be represented by the subscript h, then we might use the following notation to indicate that the individual attribute weights come from the same multivariate normal distribution:  βh ~ Normal(β̄, Σβ).  Clearly, this would not be the case if our respondents came from different segments.  Thus, in order to run the hierarchical Bayes estimation we assume that all the respondents come from a common multivariate normal distribution, and then in order to run a segmentation, we claim that the respondents actually belong to different segments from different distributions.  Those advocating segmentation using the estimates from a hierarchical Bayes model seem to have forgotten the underlying assumptions that they made in order to obtain the estimates in the first place.
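A small simulation makes the single-population assumption concrete.  The mean vector and covariance below are made-up illustrative values, not estimates from any model:

```r
library(MASS)  # for mvrnorm (ships with R as a recommended package)

set.seed(1234)
beta.bar   <- c(0, -2, 0.8, 0.1, 1.4, 1.2)  # hypothetical population means
Sigma.beta <- diag(0.5, 6)                  # hypothetical covariance matrix

# Every respondent h draws a weight vector from the same distribution:
# beta_h ~ Normal(beta.bar, Sigma.beta)
beta.h <- mvrnorm(n = 100, mu = beta.bar, Sigma = Sigma.beta)
dim(beta.h)       # 100 respondents by 6 attribute weights
colMeans(beta.h)  # close to beta.bar, as the common-population prior implies
```

Individuals differ, but every row is a draw from one distribution with one mean vector, which is precisely what a subsequent claim of distinct segments contradicts.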

Even more troubling, the approach is self-defeating.  Individual estimates from hierarchical Bayes are pooled or weighted averages of individual and group effects.  We call this Bayesian shrinkage.  It might help to think of it as a form of regression to the mean, where predictions for extreme scores are "pulled" toward the regression line or moving mean.  If segmentation were your ultimate objective, you would be better served by incorporating multiple groups in your hierarchical model.  But this would not be a two-stage clustering strategy.

In a previous post, I showed how to use the rhierMnlRwMixture function from the R package bayesm.  The "Mixture" in the function name refers to the treatment of heterogeneity in the hierarchical model.  In the R code for that example, the number of components in the normal mixture was set to one (ncomp=1), meaning that we were assuming a single multivariate normal distribution.  Had we believed that there were segments, we could have set ncomp equal to the number of segments.  The book Bayesian Statistics and Marketing provides a formal overview of this approach.  Google Books provides online much of the relevant text covering "mixtures of normals" from Sections 3.9 and 5.5.

Wednesday, March 6, 2013

Let's Do Some Hierarchical Bayes Choice Modeling in R!

It can be difficult to work your way through hierarchical Bayes choice modeling.  There is just too much that is new to learn.  If nothing else, one gets lost in all the ways that choice data can be collected and analyzed.  Then there is all this output from the estimation procedure that you have probably never seen before.  What is this Markov chain Monte Carlo (MCMC)?  What if I don't want to become a Bayesian?  How does one get started?

We can reduce the steepness of the curve by learning how to use one function from the R package bayesm to analyze one type of choice data.  Hopefully, once you have stepped through a simple example, you will no longer be completely confused when reading introductions and tutorials.  My claim is not that you will understand everything, but that you will stop feeling so overwhelmed.

Your choice:  Brand A at some $ versus Brand B at some $ versus No Thank You

This is a common choice setting.  The researcher wants to learn about price sensitivity between two brands in the market.  Why not generate a number of choice sets with both brands at varying prices, along with the option to refuse to buy either brand?  The last option is needed because customers will leave the market if prices are too high.  We will keep it simple by using three price points for Brand A and three price points for Brand B and crossing the two pricing factors.  That is, we will use a complete factorial design with the 3x3=9 cells obtained from the table with Brand A prices as the rows and Brand B prices as the columns.

Below is what I had in mind:  nine choice sets with three alternatives in each choice set.  Price 1 is the lowest price, and Price 3 is the highest.  The price points can be different for the two brands because we will be estimating separate effects for Brand A's and Brand B's prices (called alternative-specific effects).  The proportions show the likelihood of a respondent selecting each alternative in each choice set.  You should note that Brand A seems to be the preferred brand with higher choice probabilities, except in Choice Set #7 where Brand A is at its highest price and Brand B is at its lowest price.  You should also notice that this is a main-effects design:  the likelihood of selecting Brand B remains the same within each price level of Brand A, so the pattern "0.3 - 0.2 - 0.1" is repeated for each of Brand A's price levels.  A similar pattern can be seen for Brand A's prices within Brand B's price levels (0.5 - 0.4 - 0.3).

Choice   Pricing Level          Proportion Selecting Option
Set      Brand A    Brand B     Brand A   Brand B   Neither
  1      Price 1    Price 1       0.5       0.3       0.2
  2      Price 1    Price 2       0.5       0.2       0.3
  3      Price 1    Price 3       0.5       0.1       0.4
  4      Price 2    Price 1       0.4       0.3       0.3
  5      Price 2    Price 2       0.4       0.2       0.4
  6      Price 2    Price 3       0.4       0.1       0.5
  7      Price 3    Price 1       0.3       0.3       0.4
  8      Price 3    Price 2       0.3       0.2       0.5
  9      Price 3    Price 3       0.3       0.1       0.6
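As a quick sketch, the nine cells of the complete factorial can be generated with expand.grid (the column names below are my own):

```r
# Complete 3x3 factorial: cross Brand A's and Brand B's price levels.
# expand.grid varies its first factor fastest, so list Brand B first and
# then reorder the columns to match the table above.
cells <- expand.grid(BrandB.price = paste("Price", 1:3),
                     BrandA.price = paste("Price", 1:3))[, 2:1]
cells  # nine choice sets, one row per cell
```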


Randomly Generating the Data So You Can Step Through the Analysis

Choice modeling is an "as if" statistical model.  It is assumed that people behave as if they assigned utility to the attribute levels varied in a choice design.  Products are attribute bundles, so the value of the product is a function of the utilities of its component attribute levels.  One should be able to look at the proportions in the above table and infer what utilities would have produced such data.  That is, Brand A tends to be selected more often than Brand B, therefore Brand A must have a higher utility than Brand B.  Similarly, one can observe price sensitivity in the decreasing proportions selecting each brand as that brand's price increases.

My goal was to keep it simple with a 3x3 complete factorial with varying choice set probabilities that would "reveal" the underlying utilities.  Here is the R code to generate 100 respondents making choices from the above nine sets.  This code was written to be easy to follow.  It creates a NULL object "choice" outside of the loop to store the dataset.  The loop repeats 100 times, row binding a new respondent to choice.  Data for only 100 respondents were created because I wanted to minimize the run time for the hierarchical Bayes estimation.  Each of the nine choice sets is called a cell.  On each pass through the loop all nine cells are reset to NULL, and a new choice is made using the probabilities from the above table.

 # set seed in order to replicate
  
 set.seed(362013)
 #create structure for choice data
 choice<-NULL
  
 #simulates 100 respondents and 9 choice sets
 for (i in 1:100) {
   cell1<-NULL
   cell2<-NULL
   cell3<-NULL
   cell4<-NULL
   cell5<-NULL
   cell6<-NULL
   cell7<-NULL
   cell8<-NULL
   cell9<-NULL
  
   cell1<-rbind(cell1,which.max(rmultinom(1, size = 1, prob = c(0.5,0.3,0.2))))
   cell2<-rbind(cell2,which.max(rmultinom(1, size = 1, prob = c(0.5,0.2,0.3))))
   cell3<-rbind(cell3,which.max(rmultinom(1, size = 1, prob = c(0.5,0.1,0.4))))
   cell4<-rbind(cell4,which.max(rmultinom(1, size = 1, prob = c(0.4,0.3,0.3))))
   cell5<-rbind(cell5,which.max(rmultinom(1, size = 1, prob = c(0.4,0.2,0.4))))
   cell6<-rbind(cell6,which.max(rmultinom(1, size = 1, prob = c(0.4,0.1,0.5))))
   cell7<-rbind(cell7,which.max(rmultinom(1, size = 1, prob = c(0.3,0.3,0.4))))
   cell8<-rbind(cell8,which.max(rmultinom(1, size = 1, prob = c(0.3,0.2,0.5))))
   cell9<-rbind(cell9,which.max(rmultinom(1, size = 1, prob = c(0.3,0.1,0.6))))
  
  row<-cbind(cell1,cell2,cell3,cell4,cell5,cell6,cell7,cell8,cell9)
  choice<-rbind(choice,row)
 }
  
 choice
 apply(choice,2,table)
 choice_df<-data.frame(cbind(1:100,choice))
 names(choice_df)<-c("id","set1","set2","set3","set4","set5","set6","set7","set8","set9")
  
 choice_df
  

I used the R function rmultinom to generate the multinomial choice using the table proportions.  The resulting dataset has the following format, showing each respondent as a row and the alternative that was selected in each choice set (1=Brand A, 2=Brand B, and 3=Neither).

id  set1  set2  set3  set4  set5  set6  set7  set8  set9
 1     2     2     1     3     1     3     1     3     2
 2     1     3     1     1     3     3     1     2     3
 3     1     3     3     3     3     1     1     3     1
 4     3     2     1     1     3     1     1     3     1
 5     1     1     3     1     1     1     1     3     3


One should always begin by looking at the counts for each of the choice sets.  Had we not known how these data were generated, we could still observe a pattern.  Brand A is selected more often than Brand B.  Price sensitivity for Brand A is clear with Price 1 > Price 2 > Price 3.  Although there is some price inconsistency for Brand B between Sets 4 and 5, the marginal totals indicate that Brand B was selected 70 times at Price 1 and 60 times at Price 2.  Exploratory analysis of the raw counts is always useful; however, it becomes more difficult when the design is not balanced (e.g., when attribute levels do not appear equally often or when attributes are correlated).

Option  set1  set2  set3  set4  set5  set6  set7  set8  set9
  1       58    51    58    47    39    41    37    35    46
  2       26    21     8    15    18     9    29    21     5
  3       16    28    34    38    43    50    34    44    49
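The Brand B marginals quoted above can be recomputed directly from the count table.  A quick check, with the counts hardcoded from the output:

```r
# Choice counts by option (rows) and choice set (columns), from the table above
counts <- matrix(c(58, 51, 58, 47, 39, 41, 37, 35, 46,
                   26, 21,  8, 15, 18,  9, 29, 21,  5,
                   16, 28, 34, 38, 43, 50, 34, 44, 49),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(option = 1:3, set = paste0("set", 1:9)))

# Brand B (option 2) is at Price 1 in sets 1, 4, 7; Price 2 in sets 2, 5, 8;
# and Price 3 in sets 3, 6, 9
sum(counts[2, c(1, 4, 7)])  # 70 selections at Price 1
sum(counts[2, c(2, 5, 8)])  # 60 selections at Price 2
sum(counts[2, c(3, 6, 9)])  # 22 selections at Price 3
```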


On to the Design Matrices

So far, we have a data frame with the categorical dependent variable called choice.  Now we need to attach a design matrix for each of the nine choice sets.  It might be easiest to explain by presenting the R code.  Design #1 is for the first choice set.  There are six columns for the six parameters to be estimated in this design:  2 degrees of freedom (df) for the three alternatives, plus 2 df for Brand A's price and 2 df for Brand B's price.  I have used dummy coding so that the third alternative, Neither, is the base with only 0s.  The columns, in order, represent the effects for Brand A, Brand B, Brand A Price 1, Brand A Price 2, Brand B Price 1, and Brand B Price 2.  Brands A and B appear in every choice set; only the effects for price vary.

 design1<-matrix(c(
  1,0,1,0,0,0,
  0,1,0,0,1,0,
  0,0,0,0,0,0),
 nrow=3,ncol=6,byrow=TRUE)
 design1
  
 design2<-matrix(c(
  1,0,1,0,0,0,
  0,1,0,0,0,1,
  0,0,0,0,0,0),
 nrow=3,ncol=6,byrow=TRUE)
 design2
  
 design3<-matrix(c(
  1,0,1,0,0,0,
  0,1,0,0,0,0,
  0,0,0,0,0,0),
 nrow=3,ncol=6,byrow=TRUE)
 design3
  
 design4<-matrix(c(
  1,0,0,1,0,0,
  0,1,0,0,1,0,
  0,0,0,0,0,0),
 nrow=3,ncol=6,byrow=TRUE)
 design4
  
 design5<-matrix(c(
  1,0,0,1,0,0,
  0,1,0,0,0,1,
  0,0,0,0,0,0),
 nrow=3,ncol=6,byrow=TRUE)
 design5
  
 design6<-matrix(c(
  1,0,0,1,0,0,
  0,1,0,0,0,0,
  0,0,0,0,0,0),
 nrow=3,ncol=6,byrow=TRUE)
 design6
  
 design7<-matrix(c(
  1,0,0,0,0,0,
  0,1,0,0,1,0,
  0,0,0,0,0,0),
 nrow=3,ncol=6,byrow=TRUE)
 design7
  
 design8<-matrix(c(
  1,0,0,0,0,0,
  0,1,0,0,0,1,
  0,0,0,0,0,0),
 nrow=3,ncol=6,byrow=TRUE)
 design8
  
 design9<-matrix(c(
  1,0,0,0,0,0,
  0,1,0,0,0,0,
  0,0,0,0,0,0),
 nrow=3,ncol=6,byrow=TRUE)
 design9
  

Creating a List Structure for bayesm

We will be using the function rhierMnlRwMixture from the R package bayesm to estimate the parameters of our choice model, and bayesm uses a list structure to store the y (choices), the X (design matrices), and the z (covariates).  We will not be discussing covariates in this post, but you should be aware that bayesm offers much greater capability than we will be covering.  For instance, the function's name, r-hier-Mnl-Rw-Mixture, suggests a hierarchical multinomial random walk with mixtures of normal distributions.  We, however, will be keeping it simple.

The following code will create the list structures that bayesm needs.  It is somewhat more general than we need for this example given that every individual sees the same nine choice sets.  Briefly, id is used to select only those rows with the same id value.  This allows us to have multiple rows for each respondent, as we might if the data were in long rather than wide format.  The list will be stored in lgtdata.  Then, we loop through each respondent to create a list of 100 with y and X.

 id=levels(as.factor(choice_df[,1]))
 lgtdata=NULL
  
 for (i in 1:100)
 {
  respdata=choice_df[choice_df[,1]==id[i],]
  ty<-NULL
  tdesign<-NULL
  ty=c(ty,respdata$set1)
  tdesign=rbind(tdesign,design1)
  ty=c(ty,respdata$set2)
  tdesign=rbind(tdesign,design2)
  ty=c(ty,respdata$set3)
  tdesign=rbind(tdesign,design3)
  ty=c(ty,respdata$set4)
  tdesign=rbind(tdesign,design4)
  ty=c(ty,respdata$set5)
  tdesign=rbind(tdesign,design5)
  ty=c(ty,respdata$set6)
  tdesign=rbind(tdesign,design6)
  ty=c(ty,respdata$set7)
  tdesign=rbind(tdesign,design7)
  ty=c(ty,respdata$set8)
  tdesign=rbind(tdesign,design8)
  ty=c(ty,respdata$set9)
  tdesign=rbind(tdesign,design9)
  lgtdata[[i]]=list(y=ty,X=as.matrix(tdesign))
 }
  

To verify that the code functioned as intended, you need to view the elements of lgtdata.  You index the list as lgtdata[[1]] for the first respondent and lgtdata[[100]] for the last respondent.  Here are the data for the first respondent.  The design matrices have been stacked in $X, and the choices have been placed in the vector $y.  Once you tell rhierMnlRwMixture that there are three alternatives per choice set, $X will be subdivided and matched with $y.

$y
[1] 2 2 1 3 1 3 1 3 2

$X
      [,1] [,2] [,3] [,4] [,5] [,6]
 [1,]    1    0    1    0    0    0
 [2,]    0    1    0    0    1    0
 [3,]    0    0    0    0    0    0
 [4,]    1    0    1    0    0    0
 [5,]    0    1    0    0    0    1
 [6,]    0    0    0    0    0    0
 [7,]    1    0    1    0    0    0
 [8,]    0    1    0    0    0    0
 [9,]    0    0    0    0    0    0
[10,]    1    0    0    1    0    0
[11,]    0    1    0    0    1    0
[12,]    0    0    0    0    0    0
[13,]    1    0    0    1    0    0
[14,]    0    1    0    0    0    1
[15,]    0    0    0    0    0    0
[16,]    1    0    0    1    0    0
[17,]    0    1    0    0    0    0
[18,]    0    0    0    0    0    0
[19,]    1    0    0    0    0    0
[20,]    0    1    0    0    1    0
[21,]    0    0    0    0    0    0
[22,]    1    0    0    0    0    0
[23,]    0    1    0    0    0    1
[24,]    0    0    0    0    0    0
[25,]    1    0    0    0    0    0
[26,]    0    1    0    0    0    0
[27,]    0    0    0    0    0    0

Finally We Can Run the Analysis

We need to set a couple of values and pass those values along with the data to the function.  R=20000 is the number of MCMC draws, and keep=10 is the thinning parameter telling the function to keep every 10th draw.

 mcmc=list(R=20000,keep=10)
  
 library(bayesm)
 out=rhierMnlRwMixture(Data=list(p=3,lgtdata=lgtdata),Prior=list(ncomp=1),Mcmc=mcmc)
 plot(out$loglike,type="l")
 trace<-t(apply(out$betadraw,c(2,3),mean))
 matplot(trace, type="l")
 beta.501_2000<-apply(out$betadraw[,,501:2000], 2, mean)
 beta.1001_2000<-apply(out$betadraw[,,1001:2000], 2, mean)
 beta.1501_2000<-apply(out$betadraw[,,1501:2000], 2, mean)
 cbind(beta.501_2000,beta.1001_2000,beta.1501_2000)
  

The rhierMnlRwMixture function took about 10 minutes on my PC to complete the 20,000 draws.  The coefficients for the parameters in our design matrix are stored in out$betadraw, which is a 100 x 6 x 2000 array.  The first two dimensions, the 100 x 6, seem reasonable:  if hierarchical Bayes yields individual-level beta weights, then we should have six coefficients for each of the 100 respondents.  But we do not have one estimate of each coefficient per respondent.  The third dimension holds the retained MCMC draws:  20,000 draws per respondent, of which we kept every 10th, for 2,000 in total.

I have included a number of lines after rhierMnlRwMixture that require some time to explain because they involve issues about whether the MCMC draws have entered a high-probability region (often called convergence).  Appendix A from the book Bayesian Statistics and Marketing reviews these commands when discussing another example.  This chapter by Rossi and Allenby also describes this same study.  Both readings are challenging, but neither is overly technical.

However, I do want to finish with the last four lines of code.  I want to examine the aggregate-level estimates for my six parameters after averaging over respondents.  If I discard the first 500 draws, I will average draws 501 to 2000.  Thus, the three apply operations discard a different number of initial draws (500, 1000, and 1500).  One of the conditions for entering a high-probability region is that the estimates should not fluctuate widely.  Let's take a look.

 
                   beta.501_2000   beta.1001_2000   beta.1501_2000
Brand A                -0.08           -0.08            -0.04
Brand B                -1.99           -2.02            -2.01
Brand A, Price 1        0.76            0.74             0.69
Brand A, Price 2        0.11            0.09             0.05
Brand B, Price 1        1.39            1.43             1.43
Brand B, Price 2        1.16            1.19             1.18

We do not see much fluctuation, but do these coefficients reflect what we knew when we generated the data?  Given the dummy coding, the coefficients appear reasonable.  The brands are ordered as expected, given that "Neither" was the base alternative with a coefficient of zero (Neither = Brand A > Brand B).  The highest price, Price 3, is also coded as all zeros, so price sensitivity seems to run in the correct direction.  But these coefficients do not represent linear effects, and we should be reluctant to reach general conclusions.  We need a simulator to assess the impact of changing prices for the two brands.
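To illustrate what such a simulator might look like, here is a minimal multinomial logit share calculation using the beta.501_2000 means from the table above.  This aggregate sketch ignores the heterogeneity in the individual draws; a proper simulator would compute shares per respondent per draw and average:

```r
# Posterior means from the beta.501_2000 column above
beta <- c(BrandA = -0.08, BrandB = -1.99,
          A.Price1 = 0.76, A.Price2 = 0.11,
          B.Price1 = 1.39, B.Price2 = 1.16)

# Scenario: Brand A at Price 2 versus Brand B at Price 1 (design4-style coding)
X <- matrix(c(1, 0, 0, 1, 0, 0,    # Brand A, Price 2
              0, 1, 0, 0, 1, 0,    # Brand B, Price 1
              0, 0, 0, 0, 0, 0),   # Neither (base alternative)
            nrow = 3, byrow = TRUE)

u <- as.vector(X %*% beta)      # utilities for the three alternatives
shares <- exp(u) / sum(exp(u))  # multinomial logit choice shares
round(shares, 2)                # 0.40 0.21 0.39
```

Even with Brand A at its middle price and Brand B at its lowest, Brand A holds the larger share, consistent with the choice proportions we used to generate the data.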

As you might recall, I began this post with the hope that it would help the reader to run their first hierarchical Bayes choice model using R.  With that in mind, I tried to stay focused on running the analysis.  For those wanting to understand the details, Kenneth Train provides an updated online textbook and some very blurry RealPlayer videos of a 2001 course.  The learning curve is still very steep.  Perhaps with the ability to run the analysis in R, you will be motivated to start the climb.

Friday, March 1, 2013

Counts may be ratio, but not importance

I can see from those of you who have contacted me that there is still some confusion about the claims made by Sawtooth that MaxDiff estimates can be converted to ratio-scale probabilities.  Many of you seem to believe that if attribute A has a MaxDiff score of 20 and attribute B has a MaxDiff score of 10, then A must be twice as important as B.  A quick example should convince you that this is not the case.

We can all agree that weight is measured on a ratio scale.  A 20 pound object is twice as heavy as a 10 pound object.  Below you will find three sets of 10 objects each.  The 10 objects in the three sets have very different weights, but they have the same rank order.  Now, if MaxDiff yielded a ratio scale that measured weight (importance) and summed to 100, we would expect to see the MaxDiff scores in the columns adjacent to the weights.  That is, I simply rescaled the weights of the objects so that they summed to 100 (100/55, 100/228 and 100/145 for each set, respectively).

However, we have a problem.  The three sets would always produce the same maximum difference data.  Construct any choice set of four objects (e.g., B, E, G, and J).  B would be worst, and J would be best in all three sets.  Moreover, this would be true for any choice set with any block size.  MaxDiff uses only rank order information to determine best and worst.  Therefore, regardless of the size of the differences among the weights (importance), we will always get the same maximum difference selections and the same MaxDiff scores.

The MaxDiff claim of ratio scaling refers to the number of times an object is selected from the choice sets.  But selection is determined solely by rank order.  As we have just seen, magnitudes of the differences play no role whatsoever.  When my clients spend the time to work through this example, their conclusion is always the same:  "Why am I spending all this time and money if I am only getting a rank ordering based on stated importance?"  Why indeed?
 
 
            Set #1                Set #2                Set #3
Object   Ounces   MaxDiff     Ounces   MaxDiff     Ounces   MaxDiff
  A           1      1.82          1      0.44          1      0.69
  B           2      3.64          2      0.88          2      1.38
  C           3      5.45          5      2.19          3      2.07
  D           4      7.27          7      3.07          4      2.76
  E           5      9.09         12      5.26          5      3.45
  F           6     10.91         21      9.21          6      4.14
  G           7     12.73         42     18.42          7      4.83
  H           8     14.55         43     18.86          8      5.52
  I           9     16.36         45     19.74          9      6.21
  J          10     18.18         50     21.93        100     68.97
Sum          55    100.00        228    100.00        145    100.00
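The whole argument fits in a few lines of R:  rescale each set of weights so it sums to 100, and confirm that the rank orders, and hence every best/worst selection, are identical across the three sets:

```r
# The three sets of object weights (ounces) from the table above
set1 <- 1:10
set2 <- c(1, 2, 5, 7, 12, 21, 42, 43, 45, 50)
set3 <- c(1:9, 100)

# Rescale each set of "importances" so it sums to 100
rescale <- function(w) round(100 * w / sum(w), 2)
rescale(set1)  # 1.82  3.64 ... 18.18
rescale(set2)  # 0.44  0.88 ... 21.93
rescale(set3)  # 0.69  1.38 ... 68.97

# Best/worst picks depend only on rank order, which is the same in all sets
identical(order(set1), order(set2))  # TRUE
identical(order(set1), order(set3))  # TRUE
```

Very different magnitudes, identical rank orders, and therefore identical MaxDiff data.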