Wednesday, July 23, 2014

Uncovering the Preferences Shaping Consumer Data: Matrix Factorization

How do you limit your search when looking for a hotel? Those trying to save money begin with price. Members of hotel reward programs focus on their brand. For others, location comes first in narrowing the consideration set. What does hotel search reveal about hotel preference?

What do consumers really want in a hotel? I could simply provide a list of features and ask you to rate the importance of each. Or, I could force a trade-off by repeatedly giving you a small set of features and having you tell me which was the most and least important in each set. But self-report has its flaws: it requires that consumers know what they want and that they be willing and able to articulate those desires. Besides, hotels offer lots of features, often very specific features that can have a major impact on choice (e.g., hours when the pool or restaurant is open, parking availability and cost, check-out times, pet policy, and many more). Is there a less demanding route to learning consumer preferences?

Who won the World Series last year, or the Academy Award for best director, or the Nobel Prize for Economics? You would know the answer if you were a baseball fan, a movie buff, or an econometrician. What you know reflects your preferences. Moreover, familiarity with budget hotels is informative and suggests some degree of price sensitivity. One's behavior on a hotel search engine would also tell us a great deal about preference. With a little thought and ingenuity, we could identify many more sources of consumer data that would be preference-revealing had we the analytic means to uncover the preferences shaping such data matrices.

All these data matrices have a common format. Consumers are the rows, and the columns could be either features or brands. If we asked about hotel familiarity or knowledge, the columns would be a long list of possible hotels and the cells would contain the familiarity score with most of those values equal to zero indicating no awareness or familiarity at all. Substituting a different measure in the cells would not change the format or the analysis. For example, the cell entries could be some measure of depth of search for each hotel (e.g., number of inquiries or amount of time). Again, most of the entries for any one consumer would be zero.
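The data layout just described can be sketched with a tiny toy matrix. All hotel names and scores below are invented for illustration, not taken from any real search engine:

```python
import numpy as np

# Rows are consumers, columns are hotels; cells hold a familiarity
# score where 0 means no awareness at all. Hypothetical toy data.
hotels = ["BudgetInn", "MidtownSuites", "GrandLuxe", "AirportExpress"]
familiarity = np.array([
    [3, 1, 0, 2],   # a price-sensitive traveler
    [0, 2, 4, 0],   # a luxury seeker
    [2, 0, 0, 3],   # a budget-minded frequent flyer
])
# Most cells are zero: the matrix is sparse, and the pattern of
# zeros is itself preference-revealing.
sparsity = (familiarity == 0).mean()
print(round(sparsity, 2))
```

Swapping familiarity for a depth-of-search measure (number of inquiries, time spent) changes the cell entries but not the format or the analysis.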

In both cases, the measurements are outcomes of the purchase process and are not constructed in response to being asked a question. That is, the hotel search process is observed unobtrusively, and familiarity is a straightforward recall question requiring minimal inference from the consumer. Familiarity is measured as a sequence of achievements: one does not recognize the name of the hotel, one has some sense of familiarity but no knowledge, one has heard something about the hotel, or one has stayed there oneself. Preference has already shaped these measures. That which is preferred becomes familiar over time through awareness, consideration and usage.

Consumer Preference as Value Proposition and Not Separate Utilities

Can I simply tell you what I am trying to accomplish? I want to perform a matrix factorization that takes as input the type of data matrix that we have been discussing, with consumers as the rows and brands or features as the columns. My goal is to factor or decompose that data matrix into two parts. The first part will bring together the separate brands or features into a value proposition, and the second part will tell us the appeal of each value proposition for every respondent.
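In matrix terms, this decomposition approximates the data matrix V (consumers by columns) as the product of a consumer weight matrix W and a coefficient matrix H whose rows are the value propositions. A minimal sketch with made-up numbers:

```python
import numpy as np

# H: each row bundles the observed columns into one value proposition.
H = np.array([[1.0, 1.0, 0.0, 0.0],    # a "budget" proposition
              [0.0, 0.0, 1.0, 1.0]])   # a "luxury" proposition
# W: each row gives the appeal of every proposition to one respondent.
W = np.array([[2.0, 0.0],              # pure budget respondent
              [0.0, 3.0],              # pure luxury respondent
              [1.0, 1.0]])             # mixed respondent
V = W @ H                              # the reconstructed data matrix
print(V)
```

Here the reconstruction is exact; with real data, NMF searches for nonnegative W and H that approximate V as closely as the chosen rank allows.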

Purchase strategies are not scalable. Choice modeling might work for a few alternatives and a small number of features, but it will not help us find the hotel we want. What we want can be described by the customer value proposition and recovered by matrix factorization of any data matrix shaped by consumer preferences. If it helps, one can think of the value proposition as the ideal product or service and the purchase process as an attempt to get as close to that ideal as possible. Of course, choice is contextual, for the hotel that one seeks for a business meeting or conference is not the hotel that one would select for a romantic weekend getaway. We make a serious mistake when we ignore context, for the consumer books hotel rooms only when some purpose is served.

In a previous post I showed how nonnegative matrix factorization (NMF) can identify pathways in the consumer decision journey. Hotel search is merely another application, although this time the columns will be features and not information sources. NMF handles the sparse data matrix resulting from hotel search engines that provide so much information on so many different hotels and consumers who have the time and interest to view only a small fraction of all that is available. Moreover, the R package NMF brings the analysis and the interpretation within reach of any researcher comfortable with factor loadings and factor scores. You can find the details in the previous post from the above link, or you can go to another example in a second post.

Much of what you have learned running factor analyses can be applied to NMF. Instead of factor loadings, NMF uses a coefficient matrix to link the observed features or brands in the columns to the latent components. This coefficient matrix is interpreted in much the same way as one interprets factor loadings. However, the latent variables are not dimensions. I have called them latent components; others refer to them as latent features. We do not seem to possess the right terminology because we see products and services as feature bundles with preference residing in the feature levels and the overall utility as simply the sum of its feature-level utilities. Utility theory and conjoint analysis assume that we live in the high-dimensional parameter space defined by the degrees of freedom associated with feature levels (e.g., 167 dimensions in the Courtyard by Marriott conjoint analysis).
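The post's analysis runs in R with the NMF package; as a rough equivalent, here is a sketch using scikit-learn's NMF on simulated data (the matrix, the rank of 2, and all settings are assumptions for illustration only):

```python
import numpy as np
from sklearn.decomposition import NMF

# Simulate ten consumers: five attend to the first two (budget)
# columns, five to the last two (luxury) columns.
rng = np.random.default_rng(0)
V = np.vstack([
    rng.poisson([4.0, 4.0, 0.1, 0.1], size=(5, 4)),
    rng.poisson([0.1, 0.1, 4.0, 4.0], size=(5, 4)),
]).astype(float)

model = NMF(n_components=2, init="nndsvda", random_state=0, max_iter=500)
W = model.fit_transform(V)   # consumer weights on each latent component
H = model.components_        # coefficient matrix, read like loadings
print(W.shape, H.shape)
```

Large coefficients in a row of H mark the observed columns that define that latent component, which is exactly how one reads factor loadings.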

Matrix factorization takes a somewhat different approach. It begins with the benefits that consumers seek. These benefits define the dimensionality or rank of the data matrix, which is much smaller than the number of columns. The features acquire their value as indicators of the underlying benefit. Only in very restricted settings is the total equal to the sum of its parts. As mentioned earlier in this post, choice modeling is not scalable. With more than a few alternatives or a handful of features, humans turn to simplification strategies to handle the information overload. The appeal or beauty of a product design cannot be reduced to its elements. The persuasiveness of a message emerges from its form and not its separate claims. It's "getting a deal" that motivates the price sensitive and not the price itself, which is why behavioral economics is so successful at predicting biases. Finally, choice architecture works because the whole comes first and the parts are seen only within the context of the initial framing.

Our example of the hotel product category is organized by type and storyline within each type. As an illustration of what I mean by storyline, there are luxury hotels (hotel type) that do not provide luxury experiences (e.g., rude staff, unclean rooms, or uncomfortable beds). We would quickly understand any user comment describing such a hotel since we rely on such stories to organize our experiences and make sense out of massive amounts of information. Story is the appropriate metaphor because each value proposition is a tale of benefits to be delivered. The search for a hotel is the quest for the appealing story delivering your preferred value proposition. These are the latent components that NMF uncovers because there exists a group of consumers seeking just these features or hotels. That is, a consumer segment that visits only the links for budget hotels or filters its search by low price will generate a budget latent component with large coefficients for only these columns.

This intuitive understanding is essential for interpreting the results of an NMF. We are trying to reproduce the data matrix one section at a time. If you picture a Rubik's cube and think about sorting rows and columns until all the consumers whose main concern is money and all the budget hotels or money-saving features have been moved toward the first positions, you should end up with something that looks like this biclustering diagram:
Continuing with the other rows and columns, we would uncover only blocks in the main diagonal if everyone was seeking only one value proposition. But we tend to see both "pure" segments focusing on only one value proposition and "mixed" segments wanting a lot of this one plus some of that one too (e.g., low price with breakfast included).
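The sorting exercise can be mimicked in a few lines: reorder the rows and columns of a toy matrix by the block each one loads on, and the block-diagonal picture emerges (the data are invented for illustration):

```python
import numpy as np

# Two hidden value propositions: columns 0 and 2 belong to one,
# columns 1 and 3 to the other. Rows are consumers.
V = np.array([[0, 5, 0, 4],
              [3, 0, 4, 0],
              [4, 0, 3, 0],
              [0, 4, 0, 5]], dtype=float)
# Group each row by which pair of columns carries its weight.
row_group = V[:, [1, 3]].sum(axis=1) > V[:, [0, 2]].sum(axis=1)
col_group = np.array([0, 1, 0, 1])
sorted_V = V[np.argsort(row_group, kind="stable")]
sorted_V = sorted_V[:, np.argsort(col_group, kind="stable")]
print(sorted_V)   # the nonzero blocks now sit on the main diagonal
```
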

So far, we have reviewed the coefficient matrix containing the latent components or pure value propositions, which we interpreted based on their association with the observed columns. All we need now is a consumer matrix showing the appeal of each latent component. That is, a consumer who wants more than is offered by any one pure value proposition will have a row in the data matrix that cannot be reproduced by any one latent component. For example, a pure budget guest spends a lot of time comparing prices, while the budget-plus-value seeker spends half of their time on price and the other half on getting some extra perks in the package. If we had only two latent components, then the pure budget shopper would have weights of 1 and 0, while the budget-plus-value seeker would have weights closer to 0.5 and 0.5.
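Normalizing each consumer's row of weights turns them into mixture proportions, which makes the pure-versus-mixed contrast explicit (toy numbers again):

```python
import numpy as np

W = np.array([[2.0, 0.0],    # pure budget guest
              [0.9, 1.1]])   # budget-plus-value seeker
# Divide each row by its sum to get mixture proportions.
proportions = W / W.sum(axis=1, keepdims=True)
print(proportions)
```

The first row comes out as 1 and 0; the second splits its weight between the two value propositions, close to 0.5 and 0.5.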

The NMF R package provides the function basismap to generate heatmaps, such as the one below, showing mixture proportions for each row or consumer.
You can test your understanding of the heatmap by verifying that the bottom three rows, identified as #7, #2 and #4, load purely on the third latent component and that the next two rows (#17 and #13) require only the first latent component to reproduce their data. Mixtures can be found in the first few rows.

Mining Consumer Data Matrices for Preference Insights

We can learn a lot about consumer preferences by looking more carefully at what they do and what they know. The consumer is not a scientist studying what motivates or drives their purchase behavior. We can ask for the reasons why, and they will respond. However, that response may be little more than a fabrication constructed on the fly to answer your question. Tradeoffs among abstract words with no referents tell us little about how a person will react in a specific situation. Yet, how much can we learn from a single person in one particular purchase context?

Collaborative filtering exploits the underlying structure in a data matrix so that individual behavior is interpreted through the latent components extracted from others. Marketing is social and everything is shared. Consumers share common value propositions learned by telling and retelling happy and upsetting consumption stories in person and in text. Others join the conversation by publishing reviews or writing articles. Of course, the marketing department tries to control it all by spending lots of money. The result is a set of clear, shared constraints on our data matrix. There are a limited number of ways of relating to products and services. Individual consumers are but variations on those common themes.

NMF is one approach for decomposing the data matrix into meaningful components. R provides the interface to that powerful algorithm. The bottleneck is not the statistical model or the R code but our understanding of how preference guides consumer behavior. We mistakenly believe that individual features have value because the final choice is often between two alternatives that differ on only a few features. It is the same error that we make with the last-click attribution model. The real work has been done earlier in the decision process, and this is where we need to concentrate our data mining. Individual features derive their value from their ability to deliver benefits. These are our value propositions, uncovered by factoring our data matrices into preference-generating components.
