Wednesday, April 22, 2015

Conjoint Analysis and the Strange World of All Possible Feature Combinations

The choice modeler looks over the adjacent display of cheeses and sees the joint marginal effects of the dimensions spanning the feature space: milk source, type, origin, moisture content, added mold or bacteria, aging, salting, packaging, price, and much more. Literally, if products are feature bundles, then one needs to specify all the sources of variation generating so many different cheeses. Here are the cheeses from goats, sheep and cows. Some are local, and some are imported from different countries. In addition, we will require columns separating the hard and soft cheeses. The feature list can become quite long. In the end, one accounts for all the different cheeses with a feature taxonomy consisting of a large multidimensional space of all possible feature combinations. Every cheese falls into a single cell in the joint distribution, and the empty cells represent new product possibilities (unless the feature configuration is impossible).
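To get a feel for how quickly that space grows, here is a minimal Python sketch that enumerates every cell for a toy version of the cheese cooler. The feature names, their levels, and the four-cheese assortment are all made up for illustration; a real category would have many more features and levels.

```python
from itertools import product

# Hypothetical feature levels, chosen only for illustration.
features = {
    "milk": ["cow", "goat", "sheep"],
    "texture": ["hard", "soft"],
    "origin": ["local", "imported"],
}

# Every cell of the full factorial: 3 * 2 * 2 = 12 feature combinations.
all_cells = set(product(*features.values()))

# A made-up assortment; each cheese occupies exactly one cell.
assortment = {
    "cheddar": ("cow", "hard", "local"),
    "brie": ("cow", "soft", "imported"),
    "chevre": ("goat", "soft", "local"),
    "manchego": ("sheep", "hard", "imported"),
}

occupied = set(assortment.values())

# Cells with no cheese: candidate "new products" in the conjoint view.
empty_cells = sorted(all_cells - occupied)

print(len(all_cells), "cells,", len(occupied), "occupied,", len(empty_cells), "empty")
```

Even with three features, most cells are empty, and adding one more feature with a handful of levels multiplies the count again.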

The retailer, on the other hand, was probably thinking more of supply and demand when filling this cooler with cheeses. It's an optimization problem that we can simplify as a tradeoff between losing customers because you do not have what they are looking for and losing money when the product spoils. Meanwhile, consumers have their own concerns: they are buying for a reason and may infer a means to a desired end from individual features or complex combinations of transformed features. Neither the retailer nor the consumer is a naturalist seeking a feature taxonomy. In fact, except for the connoisseur, most consumers have very limited knowledge of any product category. We are simply not familiar with all the offerings, nor could we name all the alternatives in the cheese cooler, the dog food aisle, or the shelves filled with condensed soups. Instead, we rely on the physical or online displays to remind ourselves what is available, but even then, we do not consider every alternative or try to differentiate among all the products.

Thus, the conjoint world of all possible feature combinations is strange to a consumer who sees the products from a purposefully restricted perspective. The consumer categorizes products using goal-derived categories, for instance, restocking after running out of slices for your ham and Swiss sandwich. As a result, attention, categorization and preference are situated within the purchase context defined by goals and the purchase process, including the physical product display (e.g., a deli counter with an attendant is not the same as self-service selection of prepackaged products). Bartels and Johnson summarize this emerging view in their recent article "Connecting Cognition and Consumer Choice" (see Section 3, Learning and Constructing Value in Context).

Speaking of cheese (in particular, Brillat-Savarin cheese), we are reminded of the Brillat-Savarin quote popularized by the original Japanese Iron Chef TV show: "Tell me what you eat, and I will tell you what you are." Can it be this simple? I can tell you what is being discussed if you give me a "bag of words" and the R package topicmodels. R-bloggers shows how to recover the major cuisines from a list of ingredients from different recipes. My claim is that I learn a great deal by asking if you buy individually wrapped slices of processed American cheese. As Charles de Gaulle quipped, "How can you govern a country which has 246 varieties of cheese?" One can start by identifying the latent goal structure that shapes awareness, familiarity and usage.

Much is revealed by learning what music you listen to, your familiarity with various providers in a product category, which brands of Scotch whisky you buy, or the food choices you make for breakfast. In each of those posts, the R package NMF was able to discover the underlying latent variables that could reproduce raw data with many columns and with most rows containing only a few responses (e.g., Netflix ratings, where the viewers in the rows have seen only a small proportion of all the movies in the columns). Nonnegative matrix factorization (NMF), however, is only one method for uncovering the hidden forces that structure consumption activities. You are free to select any latent variable model that can accommodate such high-dimensional sparse data (e.g., the already mentioned topic modeling, the R package HDclassif, the R package bclust, and more on the way). My preference for NMF stems from its ease of use and successful application across a diverse range of marketing research data as reported in prior posts.
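For readers who want to see the idea without installing anything, below is a rough, dependency-free Python sketch of NMF using the standard multiplicative updates for squared error. It is not the R package NMF used in the earlier posts, only a stand-in to show the mechanics, and the small shopper-by-cheese purchase matrix is invented for illustration.

```python
import random

def matmul(a, b):
    """Plain-Python matrix product of two lists of lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(m):
    return [list(col) for col in zip(*m)]

def frob_error(v, w, h):
    """Frobenius norm of V - WH."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2
               for i in range(len(v)) for j in range(len(v[0]))) ** 0.5

def nmf(v, rank, n_iter=300, seed=0, eps=1e-9):
    """Factor V (n x m) into nonnegative W (n x rank) and H (rank x m)."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.uniform(0.1, 1.0) for _ in range(rank)] for _ in range(n)]
    h = [[rng.uniform(0.1, 1.0) for _ in range(m)] for _ in range(rank)]
    errors = [frob_error(v, w, h)]
    for _ in range(n_iter):
        # Update H elementwise: H * (W'V) / (W'WH)
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[k][j] * num[k][j] / (den[k][j] + eps) for j in range(m)]
             for k in range(rank)]
        # Update W elementwise: W * (VH') / (WHH')
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][k] * num[i][k] / (den[i][k] + eps) for k in range(rank)]
             for i in range(n)]
        errors.append(frob_error(v, w, h))
    return w, h, errors

# Invented data: 6 shoppers (rows) by 6 cheeses (columns), mostly zeros,
# with two purpose-by-cheese blocks (columns 0-2 and columns 3-5).
purchases = [
    [1, 1, 0, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [0, 1, 1, 0, 0, 0],
    [0, 0, 0, 1, 1, 0],
    [0, 0, 0, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
]

w, h, errors = nmf(purchases, rank=2)

# For each shopper, report which latent factor has the larger weight.
dominant = [max(range(2), key=lambda k: row[k]) for row in w]
print("dominant factor per shopper:", dominant)
print("reconstruction error: %.3f -> %.3f" % (errors[0], errors[-1]))
```

The rows of `h` play the role of the latent variables (which cheeses go together), and the rows of `w` show how strongly each shopper loads on them. With real data one would also need to choose the rank and try several random starts, which this sketch ignores.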

Unfortunately, in the strange world of all possible feature combinations, consumers are unable to apply the strategies that work so well in the marketplace. Given nothing other than hypothetical products described by lists of orthogonal features, what else can a respondent do but rely on the information provided?


  1. Hi Joel, do you think NMF would work on an NPS survey where I'm trying to understand the key topics behind why a customer would or would not recommend me?

    1. I have suggested NMF in those cases where the data matrix is high-dimensional (lots of columns) and sparse (lots of rows with mostly zeros). I have been successful with such data matrices when there is clear separation so that we can jointly cluster the rows and columns to form blocks with higher density than other regions. Movies are good examples, with the blocks representing the intersection of film genre and viewer niche. I wrote this post because I believed that the large varieties of cheeses are made and sold to different consumers for different purposes, with a good deal of separation into purpose-by-cheese blocks. So, do we expect recommendation reasons to follow a similar pattern? Personally, I do not see it. Reasons for cancellation can fall into different subspaces, with different customers experiencing different issues resulting from their own personal situations. Of course, if you are asking about using topic modeling for open-ended comments on a survey, then there is an R package for that called stm (structural topic models).

  2. Thanks so much for the tip. I liked the stm package especially with the metadata feature!!