Thursday, May 15, 2014
The Mind Is Flat! So Stop Overfitting Choice Models
Conjoint analysis and choice modeling rely on repeated observations from the same individuals across many different scenarios in which the features have been systematically manipulated in order to estimate the impact of varying each feature. We believe that what we are measuring has substance and existence independent of the measurement process. Nick Chater, my source for this eerie figure depicting the nature of self-perception, lays to rest this "illusion of depth" in a short video called "The Mind is Flat." We do not possess the cognitive machinery demanded by utility theory. When we "make up our mind," we are literally making it up. Features do not possess value independent of the decision context. Features acquire value as we attempt to choose one option from many alternatives. Consequently, whatever consistency we observe results from recurring situations that constrain preference construction, not from some seemingly endless store of utilities buried deep in our memories.
Although it is convenient for the product manager to think of their offerings as bundles of features and services, the consumer finds such a representation to be overwhelming. As a result, the choice modeler is forced to limit how much each respondent is shown. The conflict in choice modeling is between the product manager who wants to add more and more features to the bundle and the analyst who needs to reduce task complexity so that respondents will participate in the research. At times, fractional factorial designs fail to remove enough choice sets, so we turn to optimal configurations with acceptable confounding (see design of experiments in R). Still, even our reduced number of choice scenarios may be too many for any one individual, so we show only a few scenarios to each respondent, make a few restrictive assumptions about homogeneity (e.g., introduce hyperparameters specifying the relationships between individual- and group-level parameters), and then proceed with hierarchical Bayes to compute separate estimates for every person in the study.
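To make the arithmetic of design reduction concrete, here is a minimal Python sketch, assuming five two-level features coded -1/+1. A 2^(k-1) half-fraction built from the defining relation I = ABCDE cuts the 32 profiles of the full factorial to 16, and each main effect is then confounded with a four-way interaction. The round-robin blocking that deals those 16 profiles out to respondents is a hypothetical, deliberately simple scheme; a real study would block on a chosen high-order interaction or use an optimal-design routine. The function names are illustrative only.

```python
from itertools import product

def half_fraction(n_factors):
    """Runs of a 2^(k-1) fractional factorial with levels coded -1/+1.

    The last factor is set to the product of the others (defining relation
    I = AB...K), so it is aliased with their highest-order interaction.
    """
    runs = []
    for base in product((-1, 1), repeat=n_factors - 1):
        last = 1
        for level in base:
            last *= level
        runs.append(base + (last,))
    return runs

def assign_blocks(runs, n_blocks):
    """Deal runs round-robin into blocks (hypothetical scheme: one block per respondent)."""
    blocks = [[] for _ in range(n_blocks)]
    for i, run in enumerate(runs):
        blocks[i % n_blocks].append(run)
    return blocks

if __name__ == "__main__":
    runs = half_fraction(5)
    print(len(runs))  # 16 profiles instead of 32
    for block in assign_blocks(runs, 4):
        print(block)  # each respondent sees only 4 profiles
```

Even after halving, 16 profiles can be too many for one person, which is why only a block of them reaches each respondent and hierarchical Bayes has to borrow strength across people to fill in the rest.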
We justify such data collection by arguing that it is an "as-if" measurement model. Of course, people cannot retain in memory the utilities associated with every possible feature or service level, and no one is capable of the mental arithmetic needed to combine them in their heads. Yet we rationalize these unrealistic assumptions by claiming that they allow us to learn what drives choice and decision making. By asking a consumer to decide among feature bundles using only the information provided by the experimenter, we can fit a model and estimate parameters that will predict behavior in this specific setting. But our findings will not generalize to the marketplace because we are overfitting: the estimated utilities work only for this one particular task. What we have learned from behavioral economics over the last 30 years is that what is valued depends on the details of the decision context.
For those of you wishing a more complete discussion of these issues, I will refer you to my previous posts on Context Matters When Modeling Human Judgment and Choice, Got Data from People?, and Incorporating Preference Construction into the Choice Modeling Process.
Ecological Data Collection and Latent Variable Modeling
I am not suggesting that we abandon choice modeling or hierarchical Bayes estimation. A well-designed choice study that carefully mimics the actual purchase context can reveal a good deal about the impact of varying a small number of features and services. However, if our concern is learning what will happen in the marketplace when the product is sold, we ought to be cautious. Order and context effects will introduce noise and limit generalizability. Multinomial logistic models, such as those in the bayesm R package, teach us that feature importance depends on the range of feature levels and the configuration of all the other features varied across the choice scenarios. We are no longer in the linear world of rating-based conjoint via multiple regression with its pie charts indicating the proportional contribution of each feature.
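The dependence of "importance" on the levels we happen to include can be shown in a few lines. This is a minimal Python sketch, not the bayesm implementation: it computes multinomial logit choice probabilities from additive part-worths and the conventional range-based importance (each feature's part-worth range divided by the sum of ranges). The part-worth values are hypothetical.

```python
import math

def mnl_probabilities(utilities):
    """Multinomial logit choice probabilities: exp(u_j) / sum_k exp(u_k)."""
    top = max(utilities)  # subtract the max before exponentiating for numerical stability
    exps = [math.exp(u - top) for u in utilities]
    total = sum(exps)
    return [e / total for e in exps]

def profile_utility(profile, partworths):
    """Additive utility of one profile: the sum of its feature-level part-worths."""
    return sum(partworths[feature][level] for feature, level in profile.items())

def range_importance(partworths):
    """Range-based importance: each feature's part-worth range / sum of all ranges."""
    ranges = {f: max(levels) - min(levels) for f, levels in partworths.items()}
    total = sum(ranges.values())
    return {f: r / total for f, r in ranges.items()}

# Hypothetical part-worths; the only difference is an extra, steeper price level.
narrow = {"price": [0.0, -1.0], "brand": [0.0, 1.0]}
wide = {"price": [0.0, -1.0, -3.0], "brand": [0.0, 1.0]}

if __name__ == "__main__":
    print(range_importance(narrow))  # {'price': 0.5, 'brand': 0.5}
    print(range_importance(wide))    # {'price': 0.75, 'brand': 0.25}
    a = {"price": 0, "brand": 1}     # low price, preferred brand
    b = {"price": 1, "brand": 0}     # higher price, other brand
    print(mnl_probabilities([profile_utility(p, narrow) for p in (a, b)]))
```

The brand part-worths never change, yet brand's share of "importance" drops from one half to one quarter simply because the design added a more extreme price level.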
A good rule of thumb might be to include no more features than the number that would be shown on the product package or in a print ad. Our client's desire to estimate every possible aspect will only introduce noise and result in overfitting. On the other hand, simply restricting the number of features will not eliminate order effects. Whenever we present more than one choice scenario, we need to question whether our experimental arrangements have induced selection strategies that would not be present in the marketplace. Does varying price focus attention on price? Does the inclusion of one preferred feature level create a contrast effect and lower the appeal of the other feature levels? These effects are what we mean when we say the preference is not retrieved from a stable store kept in long-term memory.
It is unnecessary for consumers to store utilities because they can generate them on the fly given the choice task. "What do you feel like eating?" becomes a much easier question when you have a menu in your hands. We use the choice structure to simplify our task. I read down the menu imagining how each item might taste and select the most appealing one. When I consider switching products or providers, I compare what I am using with the new offer, and the important features are the ones that differentiate the two alternatives. If the task is difficult or I am not sure, then I keep what I have and preserve the status quo. In both cases context comes to our rescue.
The flexibility that characterizes human judgment and decision making flows from our willingness to adapt to the situation. That willingness, however, is not a free choice. We are not capable of storing, retrieving and integrating feature-level utilities. You might remember the telephone game where one person writes down a message and whispers it to a second person, who whispers the message they heard to a third, and so on. Everyone laughs at the end of the telephone line when the last person repeats what they think they heard and it is compared to what was written. Such is the nature of human memory.
We can avoid overfitting by reducing error and simplifying our statistical models. These are the two goals of statistical learning theory. We keep the choice task realistic and avoid order effects. Occam's razor will trim our latent variables down to a single continuous dimension or a handful of latent classes. For example, the offerings within a product category are structured along a continuum from basic to premium. The consumer learns what is available and decides where they personally fall along this same continuum. Do they get everything they want and need from the lower end, or is it worth it to them to pay more for the extras? The consumer exploits the structure of the purchase context in order to simplify their purchase decision. If our choice modeling removes those supports, it no longer reflects the marketplace.
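One way to formalize a single basic-to-premium continuum is an ordered logit model, which I swap in here purely as an illustration (the post does not commit to a specific model). Each consumer has one position theta on the latent dimension, and shared cutpoints separate adjacent tiers, so that P(tier <= j) = logistic(cutpoint_j - theta). The three tiers and the cutpoint values below are hypothetical.

```python
import math

def logistic(x):
    """Standard logistic function 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + math.exp(-x))

def tier_probabilities(theta, cutpoints):
    """Ordered-logit probabilities of choosing each tier along a basic-to-premium continuum.

    theta: the consumer's position on the single latent dimension.
    cutpoints: increasing thresholds between adjacent tiers (len = number of tiers - 1).
    """
    cumulative = [logistic(c - theta) for c in cutpoints] + [1.0]
    probs, previous = [], 0.0
    for c in cumulative:
        probs.append(c - previous)  # P(tier == j) = P(tier <= j) - P(tier <= j - 1)
        previous = c
    return probs

if __name__ == "__main__":
    cuts = [-1.0, 1.0]  # hypothetical: basic | standard | premium
    for theta in (-2.0, 0.0, 2.0):
        print(theta, [round(p, 3) for p in tier_probabilities(theta, cuts)])
```

The appeal for avoiding overfitting is parsimony: one parameter per consumer plus a few shared cutpoints, rather than a separate part-worth for every feature level.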
Choice remains complex, but now the complexity lies in the recognition phase. That is, choice begins with problem recognition (e.g., I need to retrieve email away from my desktop or I want to watch movies on the go or both at the same time). Framing of the choice problem determines the ad hoc or goal-derived category, which in turn shapes the consideration set (e.g., smartphones only, tablets only, laptops only, or some combination of the three product categories) and determines the evaluative criteria to be used in this particular situation. This is why I called this section ecological data collection. It is the approach that Donald Norman promotes when designing products for people. For the choice modeler, it means a shift in our statistical modeling from estimating feature-level utilities to pattern recognition and unsupervised learning.
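As a small illustration of that shift, the sketch below clusters respondents by the problems they report rather than estimating utilities. It uses plain k-means (Lloyd's algorithm), swapped in here as the simplest unsupervised method; the post does not name an algorithm, and latent class analysis would be a more usual choice for categorical survey data. The two need ratings (0 to 10) and their values are hypothetical.

```python
def kmeans(points, centers, n_iter=20):
    """Plain k-means (Lloyd's algorithm) starting from the given initial centers."""
    centers = [list(c) for c in centers]
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: each respondent joins the nearest center (squared Euclidean distance).
        labels = [
            min(range(len(centers)),
                key=lambda k: sum((p - c) ** 2 for p, c in zip(point, centers[k])))
            for point in points
        ]
        # Update step: move each center to the mean of its members.
        for k in range(len(centers)):
            members = [pt for pt, lab in zip(points, labels) if lab == k]
            if members:
                centers[k] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels, centers

# Hypothetical ratings: (need email away from desk, want movies on the go).
respondents = [(9, 1), (8, 2), (9, 2), (1, 9), (2, 8), (1, 8), (8, 9), (9, 8)]

if __name__ == "__main__":
    labels, centers = kmeans(respondents, [(9, 1), (1, 9), (8, 9)])
    print(labels)   # email-only, movies-only, and "both" groups
    print(centers)
```

Each cluster stands in for a differently framed choice problem, and hence a different consideration set (for example, smartphones versus tablets versus both), which is where the modeling effort moves once we stop estimating feature-level utilities.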