|Why does lifting out a slice make the pizza appear more appealing?|
We can begin our discussion with the ultimate feature bundle - pizza toppings. Technically, a menu would only need to list all the toppings and allow the customers to build their own pizza. According to utility maximization, choice is a simple computation with the appeal of any pizza calculated as a function of the utilities associated with each ingredient. This is the conjoint assumption that the attraction to the final product can be reduced to the values of its features. Preference resides in the features, and the appeal of the product is derived from its particular configuration of feature levels.
Yet, pizza menu design is an active area of discussion with many websites offering advice. In practice, listing ingredients does not generate the sales that are produced by the above photo or even a series of suggestive names with associated toppings. Perhaps this is why we see the same classic combinations offered around the world, as shown below in this menu from Bali.
I am not denying that a vegetarian does not want ham on their pizza. Ingredients do matter, but what is the learning process? Does preference formation begin with taste tests of the separate toppings heated in a very hot oven? "Oh, I like the taste of roasted black olives. Let me try that on my pizza." No, we learn what we like by eating pizzas with topping combinations that appear on menus because others before have purchased such configurations in sufficient quantity and at a profitable price for the restaurant. Moreover, we should not forget that we learn what we like in the company of others who are more than happy to tell us what they like and that we should like it too.
The expression "what grows together goes together" suggests that tastes are acquired initially by pairing what is locally available in abundance. It is the entire package that attracts us, which explains why pizza seems more appealing in the above photo. If we were of an experimental mindset, we might systematically vary the toppings one pizza at a time and reach some definitive conclusions concerning our tastes. However, it is more likely that consumers invent or repeat a rationale for their behavior based on minimal samplings. That is, consumer inference may function more like superstition than science, less like Galileo and more like an establishment with other interests to be served. At least we should consider the potential impact of the preference formation process and keep our statistical models open to the possibility that consumer go beyond simple features when they represented products in thought and memory (see Connecting Cognition and Consumer Choice).
Simply stated, you get what you ask. Features become the focus when all you have are choice sets where the alternatives are the cells of an optimal experimental design. Clearly, the features are changing over repeated choice sets and the consumer responds to such manipulations. If we are careful and mimic the marketplace, our choice modeling will be productive. I have tried to illustrate in previous posts how R can generate choice designs and provide individual-level hierarchical Bayes estimates along with some warnings about overgeneralizing. For example, choice modeling works just fine when the purchase context is label comparisons among a few different products sitting on the retail shelf.
Instead, what if we showed actual pizza menus from a large number of different restaurants? Does this sound familiar? What is we replace pizzas with movies or songs? This is the realm of recommender systems where the purchase context is filled with lots of alternatives arrayed across a landscape of highly correlated observed features generated by a much smaller set of latent features. We have entered the world of representation learning. Choice modeling delivers when the purchase context requires that we compare products and trade off features. Recommendation systems, on the other hand, thrive when there is more available than any one person can experience (e.g., the fragmented music market).
To be clear, I am using the term "recommendation systems" because they are familiar and we all use them when we shop online or search the web. Actually, any representational system that estimates missing values by adding latent variables will do nicely (see David Blei for a complete review). However, since reading a description of the computation behind the winning of the Netflix Prize, I have relied on matrix factorization as a less demanding approach that still yields substantial insight with marketing research data. In particular, the NMF package in R offers such an easy to use interface to nonnegative matrix factorization that I have used it over and over again with repeated success. You will find a list of such applications at the end of my post on Brand and Product Category Representation.