All of this can be summarized in the information integration paradigm shown in the figure above. Features are the large S real-world stimuli that become corresponding small s perceptions with their associated utilities inside our heads. The utility function does the integrating and outputs an internal small r response, the total utility of the feature bundle. The capital R represents the answer on the questionnaire because there is some additional translation needed to provide a rating or to select an alternative from a choice set.
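To make the integration step concrete, here is a minimal sketch that assumes the simplest possible utility function, an additive one. The part-worth numbers, the feature names and the function `total_utility` are all made up for illustration; nothing here is estimated from data.

```python
# Hypothetical part-worth utilities (the small s values) for one respondent.
# All numbers are invented for illustration.
part_worths = {
    "flavor": {"strawberry": 1.2, "raspberry": 0.4},
    "sweetness": {"low": 0.6, "high": -0.3},
    "price": {"$3": 0.5, "$5": -0.8},
}


def total_utility(bundle, part_worths):
    """Sum the feature-level utilities into one internal response (small r).

    An additive rule is assumed here; other integration rules are possible.
    """
    return sum(part_worths[feature][level] for feature, level in bundle.items())


brand_x = {"flavor": "strawberry", "sweetness": "low", "price": "$3"}
brand_y = {"flavor": "raspberry", "sweetness": "high", "price": "$5"}

utility_x = total_utility(brand_x, part_worths)
utility_y = total_utility(brand_y, part_worths)
preferred = "Brand X" if utility_x > utility_y else "Brand Y"
```

The capital R on the questionnaire would then be some further translation of `utility_x` and `utility_y` into a rating or a choice, which this sketch leaves out.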
One feature (big S) elicits one preference (little s). If one likes the features, then one will like the product, which is nothing more than a bundle of features. So, if you like strawberry jam, but not too sweet and especially not too expensive, then you will prefer Brand X strawberry preserve if it possesses the optimal combination of your preferred features. And you know this from a blind taste test of all the strawberry preserves on your supermarket shelf? Or do you buy Brand X strawberry preserve because it was what you ate as a child, or was a gift from someone important to you, or was recommended by a trusted companion? A good deal of your product knowledge comes from observational learning, in which you watch and copy the purchase choices made by others. Are your choices determined by the features that you prefer, or are you seduced into purchase and infer feature preference from your choices?
Will Choice Blindness Change Your Mindset?
Let's head to the supermarket for a taste test of different flavors of jam and an experimental assessment of which comes first, preference or choice. Better yet, we have this YouTube video from a BBC program on decision making. The details of the study have been published, so all I need to do is describe the following figure.
The woman with the spoon is the respondent tasting two different flavors of jam, one in the blue jar and the other in the red jar. Here is the videotape associated with this picture. What you should note is that the woman controlling the jars is the experimenter. She opens the red jar first, the taster tastes, and the experimenter turns the red jar over. Unknown to the taster, the red and blue jars are identical. Both are double jars that can be opened by unscrewing either the top or the bottom, and both contain the same two jams. What you need to notice is that the experimenter turns the jar over after the first tasting. Consequently, the top of the red jar now has the same flavor jam as the blue jar. The respondent tastes the blue jar, which, given the look of disapproval on the taster's face, we will assume has been rejected. Now the respondent is asked to taste again the flavor that they preferred, which has been switched and replaced with the rejected flavor, and to tell us the reasons for their preference. This video and especially the BBC video illustrate that our tasters have no problem giving reasons why they like the flavor that they just rejected.
In two-thirds of the trials our tasters did not notice that the flavors were switched. They liked the blue on their left or the red on their right, and they were sticking with their choice. When asked to taste again and tell us why, they gave reasons why they preferred the flavor that they originally rejected. If you are wondering whether the two jams were similar enough to be mistaken for each other, you can be reassured that in a separate experiment different subjects had no problem discriminating between the two flavors. The first video from the BBC contains clips of respondents explaining their preferences for the flavor that they rejected but were fooled into believing that they preferred. It should be noted that the reasons seem entirely reasonable, as if the tasters fell for the trick and accepted the second tasting as the flavor they preferred. If this one study does not convince you, you can find many more available from the choice blindness lab.
What are the implications for statistical modeling?
Does the product as feature bundle seem to work in our research because we have removed so much of the actual purchase context and all that the respondents have left is the information we provided in our vignettes and scenarios? For marketing researchers this implies that choice modeling may be of limited usefulness because there are only a few purchase contexts where we can generalize from the laboratory to the marketplace. Otherwise, the preference construction process induced by our hypothetical experiment does not match the preference construction process unfolding in real purchase contexts.
In fact, this is why we have discrete choice modeling. As Daniel McFadden explains in his Nobel Prize acceptance speech, the decision to drive alone, carpool, take the bus or the metro required a new statistical model reflecting the specific preference construction process required to make such a choice. What is true for transportation choices is true for many occasions when consumers decide what to buy. It is not difficult to think of actual purchase situations where we have narrowed our choices down to two or three alternatives that we are actively considering and where we compare these offers by trading off features. Online purchases from sites that enable you to create side-by-side comparisons will satisfy the criteria underlying information integration. We can observe the same phenomenon in the store when a shopper takes two containers off the shelf to compare the ingredients.
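As a rough sketch of what such a model computes, the multinomial logit form associated with McFadden turns the utility of each alternative into a choice probability, P(i) = exp(V_i) / sum over j of exp(V_j). The utilities below are invented for illustration, and `mnl_probabilities` is a hypothetical helper name rather than a function from any package.

```python
import math


def mnl_probabilities(utilities):
    """Multinomial logit choice probabilities from systematic utilities."""
    # Subtracting the maximum keeps exp() numerically stable
    # without changing the resulting probabilities.
    max_v = max(utilities.values())
    exp_v = {alt: math.exp(v - max_v) for alt, v in utilities.items()}
    total = sum(exp_v.values())
    return {alt: e / total for alt, e in exp_v.items()}


# Hypothetical utilities for the four commuting modes (made-up numbers).
modes = {"drive alone": 0.8, "carpool": 0.1, "bus": -0.2, "metro": 0.3}
probs = mnl_probabilities(modes)
most_likely = max(probs, key=probs.get)
```

In a real application each utility would itself be a function of features such as travel time and cost, with coefficients estimated from observed or stated choices.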
The R package bayesm will handle such data and return individual-level estimates for all the utilities, although we still need to be concerned when presenting multiple choice sets to each respondent. Feature importance is determined by the range and frequency with which feature levels are systematically manipulated in an experimental design. Thus, what is not important when presented as a single choice task becomes important when varied over several choice sets. Moreover, there are good reasons to limit the number of features listed for each alternative. We must resist client demands to cram more information into the product description than they would be willing to include on the packaging or in advertising.
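The dependence of importance on the manipulated range can be illustrated with a small sketch. It assumes the common convention of measuring importance as each feature's utility range divided by the sum of all ranges, plus a linear price effect. The slope, the flavor range and the price levels are all hypothetical, and the sketch covers only range, not the frequency with which levels are varied.

```python
PRICE_SLOPE = -0.4   # hypothetical utility change per dollar
FLAVOR_RANGE = 1.0   # hypothetical best-minus-worst flavor utility


def importance_shares(ranges):
    """Each feature's utility range as a share of the summed ranges."""
    total = sum(ranges.values())
    return {feature: r / total for feature, r in ranges.items()}


def price_importance(price_levels):
    """Importance of price given the price levels used in the design."""
    price_range = abs(PRICE_SLOPE) * (max(price_levels) - min(price_levels))
    shares = importance_shares({"price": price_range, "flavor": FLAVOR_RANGE})
    return shares["price"]


# Same respondent, same price sensitivity, two different designs.
narrow = price_importance([3.0, 3.5, 4.0])  # prices span one dollar
wide = price_importance([2.0, 5.0, 8.0])    # prices span six dollars
```

The underlying price sensitivity is identical in both cases, yet price looks far more important in the wide design simply because the design varied it over a larger range.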
Now, what about all the other purchases that we make every day that do not involve feature comparisons? A possible answer is offered by John Hauser in his paper on the role of recognition-based heuristics in marketing science. Recognition is one of many heuristics from the Fast and Frugal Paradigm, which holds that simple processes can yield good results with little cognitive effort when the environment is structured appropriately. For example, larger brands with more customers tend to be more easily recognized by everyone. If the greater market share results from the brand's ability to satisfy more customers, then the market is structured so that my recognition of the brand is an indication that I am likely to be satisfied after buying the brand. This is an example of choice without feature preference.
Sell the Sizzle Not the Steak
What is the basis for competition in the product category? Is it feature comparison (steak type, grade, degree of marbling, maturity, fat content, origin), or is the focus on benefit (happy steak consumption experiences with loved ones)? Often, the market is engaged in a battle to frame the purchase in terms of benefits or features, with the market leader emphasizing benefits and competitors struggling to gain share by pushing feature or price advantages. Statistical modeling plays on this same battlefield when we collect and analyze choice data by varying feature levels. Conjoint analysis frames the purchase task in a way that amplifies the impact of features. Who is surprised to discover strong price effects when prices are varied repeatedly over a sizeable range of values?
Branding, on the other hand, draws our attention away from features and toward benefits. It invites different data collection procedures and alternative statistical models. Treating the brand as just another feature in an experimental design diminishes the role that brand plays in the market. In the actual purchase context, brand dominates as recognizable and familiar, as an intentional agent promising benefits, or, as I argued in an earlier post, as an affordance. Consequently, when we measure brand perceptions, we consistently find a highly correlated set of responses: a strong first principal component (linear) or a clear low-dimensional manifold (nonlinear). This is the brand schema, a pattern of strengths and weaknesses, that I have analyzed using item response theory.
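One way to see what a strong first principal component means is to compute the share of total variance it captures in a correlation matrix of brand ratings. The sketch below uses plain power iteration so that it needs no external libraries; the correlation matrix is invented for illustration, and `first_component_share` is a hypothetical helper, not a function from any package.

```python
import math


def first_component_share(corr, iterations=200):
    """Share of total variance captured by the first principal component.

    Uses power iteration on a correlation matrix. This assumes the starting
    vector is not orthogonal to the dominant eigenvector, which holds when
    the ratings are positively correlated.
    """
    n = len(corr)
    v = [1.0 / math.sqrt(n)] * n
    eigenvalue = 0.0
    for _ in range(iterations):
        w = [sum(corr[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
        eigenvalue = norm
    # The trace of a correlation matrix equals the number of items.
    return eigenvalue / n


# Hypothetical correlations among four brand perception items.
brand_corr = [
    [1.0, 0.8, 0.8, 0.8],
    [0.8, 1.0, 0.8, 0.8],
    [0.8, 0.8, 1.0, 0.8],
    [0.8, 0.8, 0.8, 1.0],
]
share = first_component_share(brand_corr)
```

When the items are uncorrelated the share falls to one over the number of items, so a value close to one signals that a single dimension accounts for most of what respondents tell us about the brand.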
Focusing on the brand takes us down an analytic path toward pattern recognition and matrix decomposition. It moves us out of the Econometrics task view on CRAN and into machine learning. We begin looking at linear and nonlinear dimension reduction as statistical models of how the brand holds it all together. Recommender systems acquire a special appeal. Moreover, experimental designs begin to seem too obtrusive, and we are drawn toward more naturalistic data collection. If consumers rely on brand information to assist them in their decision journey, then we must be careful not to remove those supports and create a self-fulfilling prophecy where feature preferences dominate because feature comparisons are all that is available.