Wednesday, February 27, 2013

The MaxDiff Killer: Rank-Ordered Logit Models

Compared to MaxDiff (Sawtooth Software), rank-ordered logit modeling:
  • simplifies data collection without needing additional software to generate experimental designs
  • reduces respondent burden, making the task easier and seemingly faster
  • collects more data from individuals (complete ranking data for all alternatives)
  • eliminates the expense of buying proprietary software
  • enables us to be creative and mimic the purchase process
  • makes data analysis faster and less demanding
  • estimates the impact of both respondent characteristics and alternative-specific variables
  • opens the door to using other R packages for additional analysis and graphical displays
I will not repeat my critique of MaxDiff scaling published in a previous post with the pithy title "Warning: Sawtooth's MaxDiff Is Nothing More Than a Technique for Rank Ordering Features!"   Instead, I want to bring to your attention an alternative to MaxDiff with all the positive features I have just listed. 

An Example Using Brand Preference

We start with an example from the mlogit package by Yves Croissant.  We will be looking at Section 2.8 of the mlogit vignette.  My goal is rather limited because I only wish to introduce and demonstrate the technique.  As with all modeling, there are issues raised by the violation of assumptions that require more complex analyses.  In fact, rank-ordered logit yields important insights into marketing behavior when it is extended to handle the complications of the marketplace.  However, we will begin by trying to understand the basic model and showing how it might be applied in marketing research.

The mlogit vignette illustrates rank-ordered logit using data from a gaming platform study with 91 Dutch students.  A preprint of the article describing this study can be obtained here.  If you read this preprint, you should note that its focus is latent class modeling of the underlying heterogeneity in ranking capabilities.  However, since this is an introduction, we will concentrate on the specific study and the sections discussing ROL (rank-ordered logit).

Although we are not provided with all the details of the data collection procedure, it is clear that students were shown six different gaming platforms, including a regular PC.  They were told to assume that they did not have a game platform but were in the market considering a purchase.  Students were asked to rank order the six platforms and then to indicate which of the six platforms they owned and the average number of hours they spent gaming each week.  It is easy to imagine how one might embed this example within the context of a realistic marketplace where one's first choice might not be available.  At first, all the alternatives are shown.  The most preferred alternative is selected.  It is removed from the list, and another choice is made from the remaining alternatives until only one is left standing.

Thus, it is as if each respondent had been presented with five different choice scenarios.  The first choice set contained all six alternatives.  The second choice set included only the remaining five alternatives.  Then there was a four-alternative choice set, followed by a three-alternative set and finally a two-alternative set.  This is rank-ordered logit modeling: the transformation of the rankings of k alternatives into k-1 choice sets that can be analyzed using standard choice modeling techniques.
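To make the "explosion" concrete, here is a minimal Python sketch of the idea (the post itself works in R with mlogit, and this is not mlogit's code). The function name explode_ranking is hypothetical, and so is the respondent: the ranking simply follows the aggregate order reported later in the post and does not come from the study's data.

```python
from typing import Dict, List, Tuple


def explode_ranking(ranking: Dict[str, int]) -> List[Tuple[List[str], str]]:
    """Turn one complete ranking into k-1 (choice_set, chosen) pairs.

    `ranking` maps each alternative to its rank (1 = most preferred).
    The first set holds all k alternatives; each later set drops the
    previous choice; we stop after the two-alternative set, because the
    last alternative left standing is not a choice.
    """
    remaining = sorted(ranking, key=ranking.get)  # most to least preferred
    choice_sets = []
    while len(remaining) > 1:
        choice_sets.append((list(remaining), remaining[0]))
        remaining = remaining[1:]
    return choice_sets


if __name__ == "__main__":
    # Hypothetical respondent, not taken from the study's data.
    respondent = {"Xbox": 1, "PC": 2, "PlayStation": 3,
                  "PSPortable": 4, "GameCube": 5, "GameBoy": 6}
    for choice_set, chosen in explode_ranking(respondent):
        print(len(choice_set), "alternatives -> chose", chosen)
```

Running it prints five lines, starting with the six-alternative set (Xbox chosen) and ending with the two-alternative set (GameCube chosen over GameBoy). That is exactly the k-1 choice sets described above.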

I suggest at this point that you open R, install mlogit, and use its help files to locate and open mlogit.R, the file with the commands from the mlogit vignette.  Code chunks 29 to 31 will run the rank-ordered logit analysis shown in the vignette.  Because the data have a hierarchical structure with multiple choice sets for each respondent, one can "shape" the data file into a wide (row = respondent) or a long (row = alternative) format.  In either case, the mlogit.data function is needed to transform the rankings into choice sets (sometimes called "exploding" the rankings).  Finally, the mlogit function runs the analysis using a formula that predicts choice for each of the exploded choice sets.  PC was selected as the base alternative.  One of the predictors, own, varies across the alternatives depending on whether or not the student owns that alternative.  The other two variables, hours and age, vary over respondents but not over alternatives.
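Once the rankings are exploded, each choice set contributes an ordinary multinomial logit term, and the probability of the whole ranking is the product of those terms. The Python sketch below writes that log-likelihood for a single respondent. It assumes the systematic utilities have already been computed from the predictors. It illustrates the standard rank-ordered logit likelihood and is not mlogit's internal implementation. The function name and inputs are hypothetical.

```python
import math
from typing import Dict, List


def rol_log_likelihood(utilities: Dict[str, float], ranking: List[str]) -> float:
    """Log-probability of one complete ranking under rank-ordered logit.

    utilities: systematic utility of each alternative (assumed given).
    ranking:   alternatives ordered from most to least preferred.
    Each exploded choice set adds a standard logit term:
        log P(chosen) = v_chosen - log(sum of exp(v) over the remaining set)
    """
    loglik = 0.0
    remaining = list(ranking)
    while len(remaining) > 1:  # k-1 choice sets
        chosen = remaining[0]
        denom = sum(math.exp(utilities[a]) for a in remaining)
        loglik += utilities[chosen] - math.log(denom)
        remaining = remaining[1:]  # drop the chosen alternative
    return loglik
```

As a sanity check, if all six utilities are equal, every ordering is equally likely. The sum of the five log terms then equals log(1/720), since 720 = 6! is the number of possible rankings. Summing this quantity over respondents gives the log-likelihood that an estimation routine would maximize.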

Now, how does one interpret the coefficients?  I have copied the printout from the mlogit vignette in order to make this discussion easier to follow.

Coefficients :
                         Estimate Std. Error  t-value  Pr(>|t|)
GameBoy:(intercept)      1.570379   1.600251   0.9813 0.3264288
GameCube:(intercept)     1.404095   1.603483   0.8757 0.3812185
PSPortable:(intercept)   2.583563   1.620778   1.5940 0.1109302
PlayStation:(intercept)  2.278506   1.606986   1.4179 0.1562270
Xbox:(intercept)         2.733774   1.536098   1.7797 0.0751272 .
own                      0.963367   0.190396   5.0598  4.20e-07 ***
GameBoy:hours           -0.235611   0.052130  -4.5197  6.19e-06 ***
GameCube:hours          -0.187070   0.051021  -3.6665 0.0002459 ***
PSPortable:hours        -0.233688   0.049412  -4.7294  2.25e-06 ***
PlayStation:hours       -0.129196   0.044682  -2.8915 0.0038345 **
Xbox:hours              -0.173006   0.045698  -3.7858 0.0001532 ***
GameBoy:age             -0.073587   0.078630  -0.9359 0.3493442
GameCube:age            -0.067574   0.077631  -0.8704 0.3840547
PSPortable:age          -0.088669   0.079421  -1.1164 0.2642304
PlayStation:age         -0.067006   0.079365  -0.8443 0.3985154
Xbox:age                -0.066659   0.075205  -0.8864 0.3754227


At first glance, the coefficients differ in sign, in size, and in significance.  The intercepts indicate the relative standing of the five gaming platforms compared to the base alternative (PC), controlling for the other variables in the model.  None of these intercepts is significant at the .05 level.  However, it makes a difference if you own a platform.  Platform owners are more likely to select their own platform from a choice set (remember that ownership varies over alternatives).  In addition, the number of hours one spends gaming has an impact.  The negative hours coefficients suggest that heavier usage is associated with preference for a PC over a gaming platform.

Unfortunately, these are not linear models, and interpreting coefficients can get a little difficult. An article by Allison and Christakis called "Logit Models for Sets of Ranked Items" provides the kind of overview that someone from the social sciences might find helpful.
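One way to make the coefficients easier to read is to turn them back into probabilities. The Python sketch below plugs the printed estimates into the logit formula for the first, six-alternative choice set. The student profile is hypothetical: age 22, 5 or 15 hours of gaming per week, owning only a PC. The sketch also treats the point estimates as exact and ignores their standard errors.

```python
import math
from typing import Dict, Set

# Estimates copied from the printout above. PC is the base alternative,
# so its intercept, hours, and age coefficients are fixed at zero.
INTERCEPT = {"PC": 0.0, "GameBoy": 1.570379, "GameCube": 1.404095,
             "PSPortable": 2.583563, "PlayStation": 2.278506, "Xbox": 2.733774}
HOURS = {"PC": 0.0, "GameBoy": -0.235611, "GameCube": -0.187070,
         "PSPortable": -0.233688, "PlayStation": -0.129196, "Xbox": -0.173006}
AGE = {"PC": 0.0, "GameBoy": -0.073587, "GameCube": -0.067574,
       "PSPortable": -0.088669, "PlayStation": -0.067006, "Xbox": -0.066659}
OWN = 0.963367  # generic coefficient, applied to whichever platforms are owned


def first_choice_probs(hours: float, age: float, owned: Set[str]) -> Dict[str, float]:
    """Logit probability that each platform is chosen from the full set."""
    utility = {
        alt: INTERCEPT[alt] + HOURS[alt] * hours + AGE[alt] * age
        + (OWN if alt in owned else 0.0)
        for alt in INTERCEPT
    }
    denom = sum(math.exp(v) for v in utility.values())
    return {alt: math.exp(v) / denom for alt, v in utility.items()}


if __name__ == "__main__":
    # Hypothetical student profiles, not respondents from the study.
    light = first_choice_probs(hours=5, age=22, owned={"PC"})
    heavy = first_choice_probs(hours=15, age=22, owned={"PC"})
    for alt in INTERCEPT:
        print(f"{alt:12s} {light[alt]:.3f} {heavy[alt]:.3f}")
```

Every hours coefficient is negative relative to the PC baseline. Raising hours from 5 to 15 can therefore only increase the PC's predicted share, which is the pattern described above. Adding Xbox ownership raises the Xbox utility by 0.963367 and therefore raises its predicted share.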

These coefficients are not what we would have found had we not included ownership and usage in the equation.  Xbox, PC, and PlayStation tend to be chosen more often than the other three game platforms.  Had we included only the intercepts, we would have seen significant negative coefficients for PSPortable, GameCube, and GameBoy.  This matches what we find when we look at the counts, which give the rank ordering Xbox > PC > PlayStation > PSPortable > GameCube > GameBoy, with a sizable gap between PlayStation and PSPortable.  Nevertheless, this is not the end of the story.  A good deal of this variation can be attributed to differences in ownership and usage.  Perhaps you can see where this is headed.  We are no longer simply measuring the utility of the alternatives.  We have extended the model to include predictors of choice.
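The counts mentioned above are easy to reproduce from raw rankings. This Python sketch tallies first-place choices and mean ranks. The function name is hypothetical, and the toy rankings in the usage lines are invented for illustration. They are not the Dutch student data.

```python
from collections import Counter
from statistics import mean
from typing import Dict, List, Tuple


def summarize_rankings(
    rankings: List[Dict[str, int]],
) -> Tuple[Counter, Dict[str, float], List[str]]:
    """Return first-place counts, mean ranks, and alternatives ordered by mean rank.

    Each dict maps alternative -> rank (1 = most preferred); all respondents
    are assumed to rank the same set of alternatives.
    """
    alternatives = list(rankings[0])
    first_place = Counter(min(r, key=r.get) for r in rankings)
    mean_rank = {a: mean(r[a] for r in rankings) for a in alternatives}
    order = sorted(alternatives, key=mean_rank.get)  # lowest mean rank first
    return first_place, mean_rank, order


if __name__ == "__main__":
    toy = [{"A": 1, "B": 2, "C": 3},
           {"A": 2, "B": 1, "C": 3},
           {"A": 1, "B": 3, "C": 2}]
    first, means, order = summarize_rankings(toy)
    print(first, means, order)
```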

Where are the individual estimates?  MaxDiff produces individual estimates using the Sawtooth CBC/HB software.  With rank-ordered logit, on the other hand, we have the complete rankings of all the alternatives from every respondent.  I don't need estimates because I have the data.  Specifically, once I know the rank order of these six alternatives for any individual, I can program the computer to select the best and worst from every possible subset of n alternatives drawn from the k alternatives.  We turn to partial ranking and estimation only when complete ranking is not possible.  In this case, complete ranking is easier than a series of best-worst choice sets.  Moreover, it feels faster because the task picks up pace as the number of remaining alternatives decreases.
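The claim that a complete ranking already contains every best-worst answer can be sketched in a few lines of Python. The sketch assumes the respondent would answer each best-worst set consistently with his or her ranking, with no response error. The function name is hypothetical.

```python
from itertools import combinations
from typing import Dict, Iterator, Tuple


def implied_best_worst(
    ranking: Dict[str, int], n: int
) -> Iterator[Tuple[Tuple[str, ...], str, str]]:
    """Yield (subset, best, worst) for every n-alternative subset of the k ranked ones.

    Assumes choices are perfectly consistent with the ranking (1 = most preferred).
    """
    for subset in combinations(sorted(ranking), n):
        best = min(subset, key=ranking.get)   # lowest rank number = most preferred
        worst = max(subset, key=ranking.get)  # highest rank number = least preferred
        yield subset, best, worst


if __name__ == "__main__":
    # Hypothetical respondent, not taken from the study's data.
    respondent = {"Xbox": 1, "PC": 2, "PlayStation": 3,
                  "PSPortable": 4, "GameCube": 5, "GameBoy": 6}
    for subset, best, worst in implied_best_worst(respondent, 4):
        print(subset, "best:", best, "worst:", worst)
```

With six alternatives shown four at a time, this produces all 15 possible best-worst sets from a single ranking.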

Of course, we are not restricted to fitting only rank-ordered logit models.  Rankings are data, and there are many other R packages and statistical procedures for working with rank orders, many of them covered in introductory statistics classes.  Because each respondent's ranks sum to the same constant (1 + 2 + ... + k), we are not allowed to treat them as if they were just another number.  But that does not stop us from running most analyses.  For example, although one needs to be careful when calculating distance matrices for ranking data, there is no reason why we cannot cluster respondents.  The same is true for correspondence analysis, biplots, and multidimensional scaling.
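As an example of a distance that respects the ordinal nature of rankings, the Python sketch below uses the Kendall tau distance, which counts the pairs of alternatives that two respondents order differently. This is one reasonable choice, swapped in here for illustration; the post does not name a specific distance. The resulting matrix could then be passed to a hierarchical clustering or multidimensional scaling routine. The function names are hypothetical.

```python
from itertools import combinations
from typing import Dict, List


def kendall_distance(r1: Dict[str, int], r2: Dict[str, int]) -> int:
    """Count the alternative pairs that two complete rankings disagree on.

    Both dicts map the same alternatives to ranks (1 = most preferred).
    0 means identical orderings; k*(k-1)/2 means one is the reverse of the other.
    """
    return sum(
        1
        for a, b in combinations(sorted(r1), 2)
        if (r1[a] - r1[b]) * (r2[a] - r2[b]) < 0  # opposite order for this pair
    )


def distance_matrix(rankings: List[Dict[str, int]]) -> List[List[int]]:
    """Symmetric respondent-by-respondent matrix of Kendall distances."""
    return [[kendall_distance(x, y) for y in rankings] for x in rankings]
```

For six alternatives the distance runs from 0 to 15. Every complete ranking of six items sums to 21 (1 + 2 + ... + 6), which is the constant-sum constraint mentioned above.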

Each researcher will need to decide the value of rank ordering alternatives as a data collection device.  Our gaming example may seem reasonable if we think of it as brand substitution or out-of-stock decision making.  It may even make sense in some situations to ask for the ordering of unfamiliar brands if we believe that respondents are using name or image to complete the rankings.  Many argue that feature importance can be ranked.  The ranking task can be situated, as in the following question: "Assume that you are driving home from a late-night party and want to stop at a restaurant for a snack.  Which one of the following features is the most important to you?"  Respondents successively remove features until a complete ranking is achieved.  It is self-stated importance, with all the problems and limitations associated with introspection.  However, the question is situated and possibly vivid enough to help respondents retrieve actual past occurrences of the behavior we are seeking to measure.  At least, those advocating and using this technique acknowledge its limitations and are actively testing assumptions and offering extensions when those assumptions are questioned.


3 comments:

  1. Joel, I enjoy your blog and am happy to say that I have learned a thing or two in reading it. Please keep up the interesting work.

    In seeming to describe maxdiff as nothing more than a glorified ranking technique, though, I wonder if you might be overlooking one of its primary claimed advantages, particularly in light of your earlier post about constructed vs. recalled preferences. Namely, maxdiff starts from the position that respondents are more likely to be able to identify which pair of alternatives is furthest apart on some latent scale of importance or desirability than they are to be able to consistently rank all possible alternatives, particularly as the number of alternatives grows. In the example you give above, is it reasonable to expect that respondents are able to meaningfully and consistently rank game consoles to which they may be largely indifferent? I would tend to accept the advantage of a maxdiff in this case, but I am curious to know your thoughts.

    As a disclaimer, I have not (yet) used maxdiff, and I do not use Sawtooth.

    1. Thank you for your comments. I too was concerned about the ability of respondents to rank order unfamiliar alternatives. However, respondents often complete the ranking exercise, especially when they are instructed to answer even if they are unsure. In this case, the preference ordering can be divided into three sections consisting of the consideration set, the unfamiliar set, and the rejection set.

      As I noted in my post, extensions to the basic rank-ordered logit attempt to incorporate partial rankings when such distinctions cannot be made. For example, a partial ranking of our six games would allow respondents to rank order only the subset of games with which they were familiar, so their six scores might be 1, NA, 3, 2, NA, and 4. MaxDiff, on the other hand, simply ignores this issue. Alternatives that are seldom selected as best or worst receive MaxDiff scores in the middle of the scale, which is what we see with the complete ranking task. My problem with MaxDiff is not that it produces a ranking. My problem is that MaxDiff is a cumbersome and expensive way to rank a set of alternatives.

  2. Hi Joel, very interesting perspective! In practical terms, how can the data be collected using Survey Monkey, for example? It seems the logic involved does require powerful and expensive survey software to collect the data.
