Wednesday, February 27, 2013

The MaxDiff Killer: Rank-Ordered Logit Models

Compared to MaxDiff (Sawtooth Software), ranked-order logit modeling:
  • simplifies data collection without needing additional software to generate experimental designs
  • reduces respondent burden, making the task easier and seemingly faster
  • collects more data from individuals (complete ranking data for all alternatives) 
  • eliminates the expense of buying proprietary software
  • enables us to be creative and mimic the purchase process
  • makes data analysis faster and less demanding
  • estimates the impact of both respondent characteristics and alternative-specific variables
  • opens the door to using other R packages for additional analysis and graphical displays
I will not repeat my critique of MaxDiff scaling published in a previous post with the pithy title "Warning: Sawtooth's MaxDiff Is Nothing More Than a Technique for Rank Ordering Features!"   Instead, I want to bring to your attention an alternative to MaxDiff with all the positive features I have just listed. 

An Example Using Brand Preference

We start with an example from the mlogit package by Yves Croissant.  We will be looking at Section 2.8 from the mlogit vignette.  My goal is rather limited because I only wish to introduce and demonstrate the technique.  As with all modeling, there are issues raised by the violation of assumptions that require more complex analyses.  In fact, rank-ordered logit yields important insights into marketing behavior when it is extended to handle the complications of the marketplace.  However, we will begin with trying to understand the basic model and showing how this basic model might be applied in marketing research.

The mlogit vignette illustrates rank-ordered logit using data from a gaming platform study with 91 Dutch students.  A preprint of the article describing this study can be obtained here.  If you read this preprint, you ought to note that its focus is latent class modeling of the underlying heterogeneity in ranking capabilities.  However, since this is an introduction, we will concentrate on the specific study and the sections talking about ROL (ranked-order logit).

Although we are not provided with all the details of the data collection procedure, it is clear that students were shown six different gaming platforms, including a regular PC.  They were told to assume that they did not have a game platform but were in the market considering a purchase.  Students were asked to rank order the six platforms and then indicate which of the six platforms they owned and the average number of hours they spent gaming each week.  It is easy to imagine how one might embed this example within the context of a realistic marketplace where one's first choice might not be available.  At first, all the alternatives are shown.  The most preferred alternative is selected.  It is removed from the list, and another choice is made from the remaining alternatives until only one is left standing.

Thus, it is as if each respondent had been presented with five different choice scenarios.  The first choice set contained all six alternatives.  The second choice set included only the remaining five alternatives.  Then there was a four alternative choice set, followed by a three alternative choice and finally a two alternative set.  This is rank-order logit modeling, the transformation of the rankings of k alternatives into k-1 choice sets that can be analyzed using standard choice modeling techniques.
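The transformation just described can be sketched in a few lines of Python.  This is only an illustration of the idea, not the mlogit implementation; the function name explode_ranking and the example ranking are my own assumptions.

```python
def explode_ranking(ranking):
    """Turn a complete ranking (most preferred first) into k-1 choice sets.

    Returns a list of (chosen, available) pairs, where `available` holds
    the alternatives still on offer when `chosen` was picked.
    """
    remaining = list(ranking)
    choice_sets = []
    while len(remaining) > 1:
        chosen = remaining[0]                    # top-ranked among what is left
        choice_sets.append((chosen, list(remaining)))
        remaining = remaining[1:]                # remove it and choose again
    return choice_sets


# Hypothetical ranking for a single respondent (not taken from the data)
ranking = ["Xbox", "PC", "PlayStation", "PSPortable", "GameCube", "GameBoy"]
sets = explode_ranking(ranking)
for chosen, available in sets:
    print(len(available), chosen, available)
```

Six ranked alternatives yield five choice sets of sizes six, five, four, three, and two.  The last remaining alternative is never "chosen" because nothing is left to compare it against.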

I suggest at this point that you open R, install mlogit, go to the help files, open the directory and the file mlogit.R with the commands from the mlogit vignette.  Code chunks 29 to 31 will run the ranked-order logit analysis shown in the vignette.  Because the data have a hierarchical structure with multiple choice sets for each respondent, one can "shape" the data file into a wide (row = respondent) or a long (row = alternative) format.  In either case, the mlogit.data function is needed to transform the rankings into choice sets (sometimes called "exploding" the rankings).  Finally, the mlogit function runs the analysis using a formula that predicts choice for each of the exploded choice sets.  PC was selected as the base alternative.  One of the predictors, own, varies across the alternatives depending on whether or not the student owns that alternative.  The other two variables, hours and age, vary over respondents but not alternatives.

Now, how does one interpret the coefficients?  I have copied the printout from the mlogit vignette in order to make this discussion easier to follow.

Coefficients :
                          Estimate   Std. Error  t-value  Pr(>|t|)
GameBoy:(intercept)       1.570379   1.600251     0.9813  0.3264288
GameCube:(intercept)      1.404095   1.603483     0.8757  0.3812185
PSPortable:(intercept)    2.583563   1.620778     1.594   0.1109302
PlayStation:(intercept)   2.278506   1.606986     1.4179  0.156227
Xbox:(intercept)          2.733774   1.536098     1.7797  0.0751272  .
own                       0.963367   0.190396     5.0598  4.20E-07   ***
GameBoy:hours            -0.235611   0.05213     -4.5197  6.19E-06   ***
GameCube:hours           -0.187070   0.051021    -3.6665  0.0002459  ***
PSPortable:hours         -0.233688   0.049412    -4.7294  2.25E-06   ***
PlayStation:hours        -0.129196   0.044682    -2.8915  0.0038345  **
Xbox:hours               -0.173006   0.045698    -3.7858  0.0001532  ***
GameBoy:age              -0.073587   0.078630    -0.9359  0.3493442
GameCube:age             -0.067574   0.077631    -0.8704  0.3840547
PSPortable:age           -0.088669   0.079421    -1.1164  0.2642304
PlayStation:age          -0.067006   0.079365    -0.8443  0.3985154
Xbox:age                 -0.066659   0.075205    -0.8864  0.3754227


At first glance, the coefficients differ in sign, in size, and in significance.  The intercepts indicate the relative standing of the five gaming platforms compared to the base alternative (PC) controlling for the other variables in the model.  None of these intercepts are significant.  However, it makes a difference if you own a platform.  Platform owners are more likely to select their own platform from a choice set (remember ownership varies over alternatives).  In addition, how many hours that one spends gaming has an impact.  The negative coefficients suggest that heavier usage is associated with preference for a PC over a gaming platform.

Unfortunately, these are not linear models, and interpreting coefficients can get a little difficult. An article by Allison and Christakis called "Logit Models for Sets of Ranked Items" provides the kind of overview that someone from the social sciences might find helpful.
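One common way to make logit coefficients easier to read is to exponentiate them and talk about odds.  The sketch below does that for two of the estimates in the printout; the variable names are mine, and the interpretation assumes the usual logit reading in which everything else in the model is held constant.

```python
import math

# Estimates copied from the printout above
b_own = 0.963367             # alternative-specific variable: respondent owns the platform
b_gameboy_hours = -0.235611  # effect of weekly gaming hours on GameBoy relative to PC

# exp(coefficient) is the multiplicative change in the odds of choice
odds_ratio_own = math.exp(b_own)
odds_ratio_hour = math.exp(b_gameboy_hours)

print(round(odds_ratio_own, 2))
print(round(odds_ratio_hour, 2))
```

Read this way, owning a platform multiplies the odds of choosing it over another alternative in the choice set by roughly 2.6, and each additional hour of weekly gaming multiplies the odds of choosing GameBoy rather than PC by roughly 0.79.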

These coefficients are not what we would have found had we not included ownership and usage in the equation.  Xbox, PC, and PlayStation tend to be chosen more often than the other three game platforms.  Had we included only the intercepts, we would have seen significant negative coefficients for PSPortable, GameCube, and GameBoy.  This is what we find when we look at the counts, the following rank ordering Xbox>PC>PlayStation>PSPortable>GameCube>GameBoy with a sizable gap between PlayStation and PSPortable.  Nevertheless, this is not the end of the story.  A good amount of this variation can be attributed to differences in ownership and usage.  Perhaps you can see where this is headed.  We are no longer simply measuring the utility of the alternatives.  We have extended the model to include predictors of choice. 

Where are the individual estimates?  MaxDiff produces individual estimates using the Sawtooth CBC/HB software.  Rank-ordered logit, on the other hand, has the complete rankings of all the alternatives from every respondent.  I don't need estimates because I have the data.  Specifically, once I know the rank order of these six alternatives for any individual, I can program the computer to select the best and worst from every possible combination of k alternatives presented in sets of n.  We only turn to partial ranking and estimation when complete ranking is not possible.  In this case complete ranking is easier than a series of best-worst choice sets.  Moreover, it feels faster because the task picks up pace over time as the number of alternatives decreases. 
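To show what "program the computer" might look like, here is a minimal Python sketch that reads the best and worst off a complete ranking for every subset of n alternatives.  The function name best_worst, the example ranking, and the choice of n = 4 are assumptions made for illustration.

```python
from itertools import combinations


def best_worst(ranking, n):
    """For every set of n alternatives, return (set, best, worst).

    `ranking` lists all k alternatives from most to least preferred.
    """
    position = {alt: i for i, alt in enumerate(ranking)}
    results = []
    for subset in combinations(ranking, n):
        best = min(subset, key=position.get)    # earliest in the ranking
        worst = max(subset, key=position.get)   # latest in the ranking
        results.append((subset, best, worst))
    return results


# Hypothetical ranking for one respondent
ranking = ["Xbox", "PC", "PlayStation", "PSPortable", "GameCube", "GameBoy"]
results = best_worst(ranking, 4)
print(len(results))
print(results[0])
```

With six alternatives shown four at a time there are 15 possible sets, and the complete ranking answers every one of them without asking the respondent another question.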

Of course, we are not restricted to fitting only rank-ordered logit models.  Rankings are data, and there are many other R packages and statistical procedures for working with rank orders.  Many of these techniques were covered in introductory statistics classes.  Rankings are constrained to sum to a constant, so we are not allowed to treat them as if they were just another number.  But that does not stop us from running most analyses.  For example, although one needs to be careful when calculating distance matrices for ranking data, there is no reason why we cannot cluster respondents.  The same is true for correspondence analysis, biplots, and multidimensional scaling.
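As one example of a distance suited to rankings, the Kendall tau distance counts the pairs of alternatives that two respondents put in opposite order.  The sketch below is a plain Python illustration with made-up respondents; the resulting matrix could then be handed to whatever clustering or scaling routine you prefer.

```python
from itertools import combinations


def kendall_distance(rank_a, rank_b):
    """Number of pairs of alternatives ordered differently by two respondents.

    Each argument maps alternative -> rank (1 = most preferred).
    """
    discordant = 0
    for x, y in combinations(list(rank_a), 2):
        if (rank_a[x] - rank_a[y]) * (rank_b[x] - rank_b[y]) < 0:
            discordant += 1
    return discordant


platforms = ["Xbox", "PC", "PlayStation", "PSPortable", "GameCube", "GameBoy"]

# Three hypothetical respondents (not taken from the data)
r1 = dict(zip(platforms, [1, 2, 3, 4, 5, 6]))
r2 = dict(zip(platforms, [2, 1, 3, 4, 5, 6]))   # swaps the top two
r3 = dict(zip(platforms, [6, 5, 4, 3, 2, 1]))   # complete reversal of r1

respondents = [r1, r2, r3]
dist = [[kendall_distance(a, b) for b in respondents] for a in respondents]
for row in dist:
    print(row)
```

Every respondent's ranks sum to the same constant (21 for six alternatives), which is why a pair-based distance like this one is often a safer starting point than treating ranks as ordinary scores.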

Each researcher will need to decide the value of rank ordering alternatives as a data collection device.  Our gaming example may seem reasonable if we think of it as brand substitution or out-of-stock decision making.  It may even make sense in some situations to ask for the ordering of unfamiliar brands if we believe that respondents are using name or image to complete the rankings.  Many argue that feature importance can be ranked.  It can be situated as in the following question.  "Assume that you are driving home from a late night party and want to stop at a restaurant for a snack.  Which one of the following features is the most important to you?"  Respondents successively remove features until a complete ranking is achieved.  It is self-stated importance with all the problems and limitations associated with introspection.  However, the question is situated and possibly vivid enough to retrieve actual past occurrences of the behavior we are seeking to measure.  At least, those advocating and using this technique acknowledge its limitations and are actively testing assumptions and offering extensions when those assumptions are questioned. 


Tuesday, February 19, 2013

When Discrete Choice Becomes a Rating Scale: Constant Sum Allocation

Why limit our discrete choice task to the next purchase when we can ask about the next ten purchases?  It does not seem appropriate to restrict choice modeling to one selection only when repeat purchases from the same choice set are made by the same individual buying different products at different times.  Similarly, a purchasing agent or a company buyer will make multiple purchases over time for different people.  Why not use choice modeling for such multiple purchases?

Everyone seems to be doing it, although they might use different names, calling it a constant sum, a chip allocation, or simply shares.  For example, the R package ChoiceModelR allows the dependent variable to be a proportion or share.  Statistical Innovations' Latent Gold Choice software permits constant sum data.  Sawtooth Software prefers to call it chip allocation in its CBC/HB system because one can "normalize" whenever numbers have been assigned to the alternatives before analyzing the data.

A specific example might be helpful.  Suppose that we were conducting a discrete choice study varying the size and price of six different coffee menu items.  We might use the following directions.
"Please assume that every weekday you buy your coffee from the same small vendor offering only six possible selections.   I will give you a menu listing six different items plus the option of getting your coffee somewhere else.  I would like you to tell me how many times you would select each of the different alternatives over the next two weeks.  It is as if you had 10 chips to allocate across the seven alternatives.  If you would buy the same coffee every day, you would place 10 on that one alternative.  If every day you would get your coffee somewhere else, you would place 10 on the 'Get Somewhere Else' alternative.  You are free to allocate the 10 chips across the seven alternatives in any way you wish as long as it shows what you would buy or not buy over the next 10 days."
On the surface, it makes sense to treat the choice exercise as yielding not one choice but ten separate choices.  It is as if the respondent made ten independent purchases, one each weekday over a two-week period.  That is, we could pretend that the respondent actually saw 10 different choice sets, all with the same attribute levels, and made 10 separate choices.  You do not need to analyze the data in this manner, but it is probably the most straightforward way of thinking about the task and the resulting choice data.  Thus, the data remain essentially the same regardless of whether you analyze the numbers as replicate weights (Latent Gold Choice) or introduce a "total task weight" (Sawtooth CBC/HB).
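The two ways of looking at the same allocation can be sketched in Python.  The menu item names and the chip counts below are invented for illustration, and neither function reproduces what Latent Gold Choice or CBC/HB does internally; they only show that replicated choices and shares carry the same information.

```python
def expand_allocation(allocation):
    """Treat each chip as one separate choice: return one entry per chip."""
    choices = []
    for alternative, chips in allocation.items():
        choices.extend([alternative] * chips)
    return choices


def to_shares(allocation):
    """Normalize the chips so that they sum to one across the alternatives."""
    total = sum(allocation.values())
    return {alt: chips / total for alt, chips in allocation.items()}


# Hypothetical allocation of 10 chips across six menu items plus the outside option
allocation = {
    "Small coffee": 5,
    "Large coffee": 3,
    "Small latte": 0,
    "Large latte": 0,
    "Small mocha": 0,
    "Large mocha": 0,
    "Get Somewhere Else": 2,
}

choices = expand_allocation(allocation)
shares = to_shares(allocation)
print(len(choices), choices)
print(shares)
```

Either form can be recovered from the other once the total number of chips is known, which is the sense in which the data stay the same whichever software convention is used.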

If you have read my last post on incorporating preference construction into the choice modeling process, you may have already guessed that people are probably not very good at predicting their future behavior.  Diversification bias is one of the problems respondents encounter.  When individuals are asked to decide what they intend to consume over the course of several time periods in the future, their selections are more varied than what they actually will select when observed over the same time periods.  Thus, going to a grocery store once a week and making all your purchases for an entire week of dinners will produce more variety than deciding what you are in the mood for each evening and making separate trips to the store.  Fortunately, we know a great deal about how we simulate future events and predict our preferences for the outcomes of those simulations.  As retrospection is remembering the past, prospection is experiencing the future.  Unsurprisingly, systematic errors limit what we can learn about actual future behavior from today's intentions.

This is another example of choice architecture, which was discussed in the previous post.  Choice is a task, and small changes in the task may have a major impact on the results.  We could stop at this point and reach the conclusion that asking about the next 10 purchases only makes sense in those situations where future choices are all made at one point in time (not a very common occurrence).  Clearly, it makes little sense to ask respondents to participate in a choice study whose findings cannot be generalized to the marketplace of ultimate interest.  However, we do not wish to overlook another important difference between assigning 10 points among the alternatives and asking respondents to perform 10 different choice tasks.  That is, diversification bias occurs when we ask each respondent to complete a Monday choice task, then a Tuesday choice task, and so on.  This was not our choice task in the constant sum allocation.

When respondents are debriefed, they do not report that they spent the time to think about each of the 10 days separately.  They do not imagine filling in their daily menu planner.  Instead, they talk about the relative preferences for the alternatives in the choice set.  If only one alternative is acceptable, it gets 10 points.  If two alternatives are equally desired, then each receives a score of five.  The researcher begins believing that this was a choice study, but respondents simplify the task by treating it as a typical constant sum and transforming it into relative preference ratings. 

One might argue that all this improves our research because we are gathering more information about the relative preference standing of the alternatives.  However, if our goal is making money by selling coffee, it does not help to add a menu item that is never purchased because it is always a close second-place finisher.  Moreover, the constant sum leads the respondent astray to make distinctions and consider attributes that would not have occurred spontaneously when actual purchases were made.

There is an element of problem solving in choice modeling.  The respondent is presented with the choice task.  They are given the instructions, the choice sets, and told how to provide their response.  I have deliberately avoided showing the choice sets in order not to introduce an additional level of complexity into this post (e.g., the dynamic effects of repeated presentations of what might be complex choice descriptions).  But even with this somewhat abridged description, we can recognize that the choice task defines the rules of the game.

Preferences are constructed.  The choice task elicits memories of past experiences in similar situations.  This alone may be sufficient to generate a response, or additional problem solving may be needed, sometimes a good amount of simplification and sometimes extensive restructuring of the information provided.  It depends on the choice task and the respondent.  As market researchers, we must make the effort to ensure that our experimental game matches the game played by consumers in the marketplace.

Friday, February 15, 2013

Incorporating Preference Construction into the Choice Modeling Process

Statistical modeling often begins with the response generation process because data analysis is a combination of mathematics and substantive theory.  It is a theory of how things work that determines how we ought to collect and analyze our data.

A good example of this type of statistical modeling was the accurate predictions made by several political scientists in the 2012 presidential election.  This is how Simon Jackman, author of the pscl R package, described his work for the Huffington Post.
"The 'quant triumph' here is more about the way we've approached the problem: a blend of political insight, statistical modeling, a ton of code and a fire hose of data from the pollsters. Since at least four or five of us will claim 51/51 (subject to FL) this cycle, it's not 'a Nate thing.' It's a data-meets-good-model, scientific-method thing."
We start with the science.  If we have a theory of the data generation process, we use that knowledge to guide our data collection and statistical modeling.  I recognize that this is not the only approach to analyzing data.  When the data mechanism is unknown, we must rely on exploratory techniques and algorithmic modeling as Breiman argues in his two cultures of statistical modeling paper.  However, choice modeling is well-grounded with extensive empirical findings from behavioral economics and a theoretical foundation from psychology.  We ought to use that knowledge to avoid traps and missteps.

How Do Human Judgment and Decision Making Work?

In the decision making literature we can identify two incommensurate worldviews of how humans form judgments and make choices.  On the one hand, we have those who seem to believe that attribute preferences are well-formed and established, waiting to be retrieved from memory.  To be clear, we are speaking about specific attributes that might be varied in a conjoint study like the 50 attributes with 167 levels in the Courtyard by Marriott study.  We will refer to this viewpoint as "revealed preference" because it holds that detailed and extensive preferences are present somewhere in memory and are uncovered by our questioning. 

On the other hand, many of us do not see preferences as well-formed or enduring.  Preferences are constructed "on the fly" or in the situation based on the conditions present at the time and past experiences in similar contexts.  That is, one does not retrieve a stored preference for each of the 167 feature levels in the above Courtyard conjoint.  Preference construction, like language production, is an adaptive act using whatever prior experience and knowledge is deemed relevant in the present context.  One does not expect stability across data collection procedures unless the same construction process is used each time, and even seemingly minor changes in the survey method can have a major impact on preference measurement.  We will refer to this viewpoint as "constructed preference" for obvious reasons.

These two worldviews lead to very different concerns about data collection and analysis.  The "revealed preference" follower is far less concerned about the reactive effects of the experimental arrangements in conjoint research.  It is not that they deny the possibility of measurement bias.  However, preferences are real and show themselves regardless of whether one uses a self-reported importance, a rating of purchase intent, or a choice from a set of alternatives.

On the other hand, the marketing researcher in the "constructed preference" camp expends a good deal of effort trying to mimic the marketplace as closely as possible.  They worry that the conjoint study has the potential to create experimental task-specific preferences rather than measure preferences that would be constructed in the purchase context.  They know that preferences are constructed in the marketplace, and they wish to replicate those naturally occurring processes so that their findings can be generalized.  Wanting to make statements about what is likely to happen in the real world, they need to be certain that there is a sufficient match between the experimental task and the actual purchase task as experienced by customers.

What Does the Product Manager Know that the Marketing Researcher Doesn't?

The arrangement of products or services impacts purchases by potential customers.  For example, placing the store-brand aspirin at a much lower price next to the comparable national brand on the same shelf elicits higher levels of price sensitivity and lures customers into thinking that national brands are probably not of higher quality after all.  At some point the price reduction gets large enough that it becomes easier for customers to accept the popular belief that both the national brand and the store brand were manufactured at the same place with different labels placed on the two bottles.

We refer to the above effect as framing.  Although I know the sofa never sold for $1000, it is so hard to resist that 50% discount.  It just looks like a better price than $500 without the discount.  Framing is a perceptual illusion, like the moon appearing larger at the horizon than at its zenith.  Why would a retailer place at least some of its more expensive wines on the middle shelf?  Because the middle shelf is where most shoppers look first, the higher prices set the frame and make the lower priced wine appear less expensive.  Price sensitivity is not retrieved from memory.  It is constructed at the moment from the information on hand at the time.

Although marketing has always designed the shopping experience to increase sales, choice architecture makes this topic an area of formal study.  Beginning with the recognition that there is no neutral way to present a choice, the question shifts to how to manipulate the choice presentation in order to "nudge" people toward the behavior you desire.  I wish to avoid the political controversy surrounding the book Nudge because it is irrelevant to my point that preference is constructed and at least part of that construction process includes the way choices are presented. 

What Worldview Guides Choice Modeling in Marketing?

I can only assume that a good number of marketing researchers hold the revealed preference worldview.  What other explanation can be given for adaptive choice-based conjoint where the respondent begins the choice process with a build-your-own product exercise?  Why else would someone use a menu-based choice when the actual purchase task was selecting from a set of predetermined bundles?

Both these examples come from Sawtooth Software and both deal with what they call choice-based conjoint.  Conjoint designs assume that products and services can be represented as attribute bundles and that the preference for the bundle is a function of the preferences for the individual attribute levels.  When the dependent variable is a categorical choice, we have choice-based conjoint.  When the dependent variable is a rating, we have rating-based conjoint.  Sawtooth offers adaptive conjoint for both choices and ratings.

Sawtooth's recommendations are contained in their research paper "Which Conjoint Method Should I Use?"  In their summary they tell us first that our method ought to reflect the marketplace, but then they assert that the important considerations are the number of attributes, the sample size, and the available interviewing time.  Similarly, Sawtooth claims that their menu-based products can be used equally well for buying pre-designed bundles or a la carte items.  Implicit is the belief that one would get the same results whether a buyer "built-your-own" or had to pick one of a set of available feature bundles.  It is as if the process of designing your own product would have no effect on what you wanted, as if stable preferences for hundreds of feature levels were revealed regardless of the method.

It is not as if Sawtooth does not acknowledge that their measurement procedures can impact preferences.  One can find several papers from Sawtooth itself or from its annual conference that demonstrate order and context effects.  For example, when discussing how many choice sets should be shown, Rich Johnson presents compelling evidence that price becomes more important over time as respondents repeatedly make selections from more and more choice sets.  But his conclusion is not that varying price simulates a "price war" and draws attention to the pricing attribute.  Instead, he argues that over time respondents become better shoppers and attend to variables other than brand.  That is, where others would see a measurement bias, Johnson discovers an opportunity to uncover and reveal "real" preferences.

We should not try to minimize the possible confounding effects of asking respondents to repeatedly make choices from sets of alternatives with varying features.  This is the problem with within-subject design that Kahneman discusses in his Nobel Prize acceptance speech (pp. 473-474), "They are liable to induce the effect that they are intended to test."  Kahneman views preferences as constructions.

Hopefully, one last example will clarify how the two worldviews can look at the same data and see two different things.  Here is a quote from the third page of the previously mentioned paper "Which Conjoint Method Should I Use?" by Bryan Orme,

"Despite the benefits of choice data, they contain less information than ratings per unit of respondent effort. After evaluating a number of product concepts, the respondent tells us which one is preferred. We do not learn whether it was strongly or just barely preferred to the others; nor do we learn the relative preference among the rejected alternatives."
Orme sees real preferences that exist independently of the task.  Moreover, these preferences are continuous.  Choice data does not reveal all that is there because it does not reveal strength of preference. 

The constructed-preference view holds that respondents, as cognitive misers, only make the distinctions they need to make in order to complete the task.  In a choice task, once I eliminate an alternative for any reason, even superficial features such as color or shape, I am done.  I do not need to form a relative preference for each alternative.  My goal was to simplify the choice set, and I do not spend time studying alternatives and forming preferences of relative strength for alternatives that have been rejected.  Unless, of course, you ask me to rate every alternative in the choice set.  However, now the measurement task no longer mimics the purchase task and different preferences get constructed.

Actually, you do not need to accept the constructed-preference view to see both menu-based and adaptive conjoint as intrusive measurement techniques.  That is, one can believe in revealed preference and still hold that some measurement procedures are disruptive.  However, believing that preferences are constructed forces one to take additional steps.  We need a model of real world purchases, a model of the measurement process, and a determination of whether the two are similar enough to justify generalization.

Let us compare the Sawtooth approach with that from John Colias at Decision Analysts.  They offer a free R package, called ChoiceModelR, which builds on the rhierMnlRwMixture function from Peter Rossi's R package bayesm.  Although they do not appear to take an explicit position on the constructed versus revealed preference debate, they do raise several cautions about the importance of recreating the real-world purchase.  They stress the need to customize each design to match the specifics of the brand offering and are concerned about the need to deal with critical idiosyncrasies that are unique to every application.  Realism is important enough that shopping visits are simulated using 3D animation.  Perhaps I should not count Decision Analysts as a "yes" in the constructed-preference column.  Nonetheless, they demonstrate that choice modeling can be conducted with some sensitivity to what happens in the marketplace. 

The Quantitative Triumph

Simon Jackman got it right, both the 2012 election prediction and how to model real world phenomena.  "It's a data-meets-good-model, scientific-method thing."  Although this post has focused on the role of preference construction in choice modeling, the same processes are at work whenever any respondent is asked any question.  Election polling is subject to similar measurement effects that must be addressed in the statistical model.  Fortunately, we know a good deal about how respondents interpret questions and how they form a response from research under the heading cognitive aspects of survey methodology.

Obviously, when I look to election prediction for guidance, I am speaking of the modeling process and not the statistical models actually used.  Election prediction serves as a standard because of its willingness to admit the limitation of its data and its ability to compensate with theoretical knowledge and advanced statistical modeling.  Making the political pundits look stupid was simply a little extra treat.