Friday, March 20, 2015

What Consumers Learn Before Deciding to Buy: Representation Learning

Features form the basis for much of our preference modeling. When asked to explain one's preferences, features are typically accepted as appropriate reasons: this job paid more, that candidate supports tax reform, or it was closer to home. We believe that features must be the drivers since they so easily serve as rationales for past behavior. Choice modeling formalizes this belief by assuming that products and services are feature bundles with the value of the bundle calculated directly from the utilities of its separate features. All that we need to know about a product or service can be represented as the intersection of its features, which is why it is called conjoint analysis.
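To make the additive assumption concrete, here is a minimal sketch of how a conjoint-style model scores a feature bundle. The product's value is simply the sum of the part-worths of its feature levels. The feature names, levels and part-worth numbers are hypothetical and chosen only for illustration. They are not estimates from any study.

```python
# Hypothetical part-worth utilities for each level of each feature.
# Illustrative numbers only; not estimates from any study.
PART_WORTHS = {
    "brand": {"A": 0.8, "B": 0.2, "C": 0.0},
    "price": {"$199": 0.5, "$299": 0.0, "$399": -0.6},
    "battery": {"1 day": 0.0, "3 days": 0.4},
}


def bundle_utility(product: dict[str, str]) -> float:
    """Additive model: a product's value is the sum of the part-worths of its feature levels."""
    return sum(PART_WORTHS[feature][level] for feature, level in product.items())


watch = {"brand": "A", "price": "$299", "battery": "3 days"}
budget_watch = {"brand": "C", "price": "$199", "battery": "1 day"}
print(round(bundle_utility(watch), 2), round(bundle_utility(budget_watch), 2))
```

Nothing in this model lets one feature change the meaning of another. That is the sense in which the bundle is "nothing but" its features.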

At first, this approach seems to work, but it does not scale well. We create hypothetical products and services defined by the cells in a factorial experimental design (see the book Stated Preference Methods Using R). The number of cells increases quickly with each additional feature so that we need to turn to optimal designs in R in order to limit the number of possible combinations. We have reduced the number of hypothetical descriptions, while the number of estimated parameters remains unchanged. Overall preference continues to be an additive function of the values attributed to each of the separate components.
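The scaling problem is easy to see with a little arithmetic. This sketch assumes every attribute has three levels and that the additive model uses one reference level per attribute. Under those assumptions, it compares the number of cells in a full factorial design with the number of part-worths the model must estimate. A fractional factorial or optimal design cuts the first number. It leaves the second untouched.

```python
from math import prod


def full_factorial_cells(levels: list[int]) -> int:
    """Number of distinct hypothetical products in a full factorial design."""
    return prod(levels)


def main_effects_parameters(levels: list[int]) -> int:
    """Part-worths in an additive model, assuming one reference level per attribute."""
    return sum(n - 1 for n in levels)


# Assumption for illustration: k attributes, each with 3 levels.
for k in range(2, 7):
    levels = [3] * k
    print(f"{k} attributes: {full_factorial_cells(levels)} cells, "
          f"{main_effects_parameters(levels)} parameters")
```

The number of cells grows as 3^k while the parameter count grows only as 2k. That gap is why designs can be thinned without changing what the additive model estimates.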

Representation learning, on the other hand, is associated with deep neural networks, such as those available through the h2o package discussed by John Chambers at the useR! 2014 conference. According to Yoshua Bengio (see his new chapter on Distributed Representations), "a good representation is one that makes further learning tasks easy." The process is described in his first chapter on Deep Learning. As shown in this figure from Wikipedia, the observed features are visible units, and the product representation is a transformation contained in the hidden units.
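The figure itself is not reproduced here. One common formulation of a visible-to-hidden layer is the restricted Boltzmann machine. In that model, each hidden unit's activation probability is a logistic function of a weighted sum of the visible units plus a bias. The sketch below assumes that formulation. The weights and biases are hypothetical placeholders; in practice they would be learned from data.

```python
import math


def sigmoid(z: float) -> float:
    """Logistic function."""
    return 1.0 / (1.0 + math.exp(-z))


def hidden_activations(visible, weights, biases):
    """RBM-style hidden layer: P(h_j = 1 | v) = sigmoid(b_j + sum_i v_i * W[i][j])."""
    return [
        sigmoid(biases[j] + sum(v * row[j] for v, row in zip(visible, weights)))
        for j in range(len(biases))
    ]


# Hypothetical example: 4 observed binary features, 2 hidden units.
visible = [1, 0, 1, 1]
weights = [[2.0, -1.0], [0.5, 1.5], [1.0, -0.5], [-0.5, 2.0]]  # weights[i][j]: visible i -> hidden j
biases = [-1.0, 0.0]
h = hidden_activations(visible, weights, biases)
print([round(p, 3) for p in h])
```

The hidden units are not new observations. They are a re-description of the observed features, which is what the post means by a representation.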


What do consumers learn before deciding to buy? They learn a representational structure that reduces the complexity of the purchase process. This learning comes relatively easily, with so many sources telling us what to look for and what to buy (e.g., marketing communications, professional reviews, social media and, of course, friends and family). Bengio speaks of evolving culture vs. local minima as the process for "brain to brain transfer of information." Others refer to it as a meeting of minds or shared conceptualizations.

Are you thinking about a SmartWatch? Representation learning would suggest that the first step is "getting a lay of the land," or untangling the sources of variation that account for differences among the offerings. I outlined such an approach in my last post on precursors to preference construction. It is possible to go online and request side-by-side feature comparisons that look similar to what one might find in choice modeling. However, that step often comes late in the process, after you have decided to purchase and have narrowed your consideration set. Before that, one looks at pictures, scans specifications, reads reviews and learns from others through user comments. One discovers what is available and what benefits are delivered. As you learn what is offered, you come to understand what you might want and how much you would be willing to spend.

The purchase task is somewhat easier than language translation or facial recognition because product categories are marketing creations with a deliberately simplified structure. Products and services are simple by design with benefits and features linked together and set to music with a logo and a tagline. Product and service features are observed (red in the above figure); benefits are latent or hidden features (the blue) and can be extracted with deep neural networks or nonnegative matrix factorization. That is, we can think of representation learning as the relatively slow unsupervised learning that occurs early in the decision process and makes later learning and decision making easier and faster. Utility theory lacks the expressive power to transform the input into new ways of seeing. Both deep neural networks and nonnegative matrix factorization free us to go beyond the information given.

Finally, what happens when the consumer is pulled out of the purchase context and presented with feature lists constructed according to a fractional factorial or optimal design? The norms of the marketplace are violated, yet respondents get through the task as best they can using the only information that you have provided them. Unfortunately, you do not learn much about bears in the wild when they are confined in cages.




Thursday, March 5, 2015

Brand and Product Category Representation: Precursors to Preference Construction

Evidently, preference is contextual, or so The Hershey Company claims in its advertising. I agree and will not repeat the argument made in a previous post on incorporating preference construction into the choice modeling process.

Within the framework of utility theory and conjoint analysis, R provides both an introduction (Stated Preference Methods Using R) and access to advanced algorithms (hierarchical Bayes choice modeling). However, generalization remains a problem. The experimental procedures that elicit stated preference are not the same as those in the marketplace, where purchases are made for differing occasions, purposes and participants. Preferences are not well formed and stable but are constructed on the fly within the choice context. Even price sensitivity depends on framing, which is why we see such robust and resistant order effects when costs are increasing versus decreasing (e.g., a 10% price increase seems less objectionable when it comes after a proposed 15% increment than when it comes after a 5% raise).

Such context dependence is the reason why so many of us who use choice modeling in marketing research seek to limit the number of attributes and their associated levels and demand that the experimental arrangements mimic the actual purchase process as closely as possible. But even with such restrictions, the repetition from presenting several choice sets leads consumers to focus on what is varied and induces sensitivities that would not be found in the marketplace, where these attribute levels would be constant or difficult to find. Moreover, our designs attempt to keep attributes independent so that we can estimate separate effects for every attribute. Yet customers enter with conceptual structures that link these attributes (e.g., larger quantities are discounted and premium brands cost more). Do we disrupt the purchase process when we ignore such shared conceptual spaces?
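The tension between design independence and marketplace structure can be shown with a toy comparison. In an effects-coded full factorial design, brand tier and price are orthogonal by construction, so their correlation is exactly zero. The "shelf" below is a hypothetical set of six products in which premium brands tend to carry the higher price. It illustrates the kind of linkage consumers carry into the task. It is not real market data.

```python
from itertools import product


def pearson(x, y):
    """Plain Pearson correlation between two equal-length numeric lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


# Full 2^3 factorial, effects-coded as -1/+1: (brand tier, pack size, price level).
design = list(product([-1, 1], repeat=3))
design_corr = pearson([row[0] for row in design], [row[2] for row in design])

# Hypothetical shelf: premium brands (+1) mostly carry the higher price (+1).
market_brand = [-1, -1, -1, 1, 1, 1]
market_price = [-1, -1, 1, 1, 1, 1]
market_corr = pearson(market_brand, market_price)

print(f"design: {design_corr:.3f}  shelf: {market_corr:.3f}")
```

The design asks respondents to evaluate combinations, such as a premium brand at a low price, that their learned representation may treat as implausible.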

Consumers learn quite a lot about a product category before they decide to purchase anything. The SmartWatch will serve as a good example because it is relatively new and still evolving. The name "SmartWatch" invites us to transfer what we know about SmartPhones and their relationship to cell phones. There have to be brands (where's Apple?) and alternative versions running from basic to premium (good-better-best). Considerers will be talking to others and reading reviews telling them which device is best for their individual usage and needs. This is the product representation that one learns in order to decide whether to enter a product category. You answer the question "Do I really need or want a SmartWatch?" by learning what is available and deciding what you are willing to spend to obtain it. When we enter the marketplace, we enter with this shared representation, and we trade off specific features or pricing offers within this understanding. Those of you with machine learning backgrounds might wish to think of this as a form of unsupervised feature learning.

R provides the interface for representation learning about brands and product categories. Although one has a number of alternatives, I will keep it simple and discuss only one approach, nonnegative matrix factorization (NMF). I am thinking of feature or representation learning as a form of data reduction or manifold learning, as outlined in Section 8 of the Yoshua Bengio et al. review paper. Consumers populate the rows of the data matrix, and the columns might span brand and feature familiarity, benefits and features sought, or expected usage. It is easy to generate a long list of columns just for features alone. Moreover, features are linked to benefits, and both features and benefits sought flow from usage. Obviously, the consumer requires a simpler representation, and NMF supplies the building blocks.
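To show what NMF does with a consumers-by-columns matrix, here is a minimal from-scratch sketch in Python. It uses Lee and Seung's multiplicative updates for squared error, which may differ from the algorithm the R NMF package uses by default. The six consumers, six SmartWatch feature columns and 0-5 ratings are hypothetical. They are built with an obvious fitness block and a convenience block so the result is easy to read.

```python
import numpy as np


def nmf(X, rank, n_iter=1000, seed=0, eps=1e-9):
    """Factor nonnegative X (n x m) into W (n x rank) and H (rank x m), both >= 0,
    using Lee-Seung multiplicative updates that reduce ||X - W H||_F^2."""
    rng = np.random.default_rng(seed)
    n, m = X.shape
    W = rng.random((n, rank)) + eps
    H = rng.random((rank, m)) + eps
    for _ in range(n_iter):
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H


# Hypothetical ratings of how much six consumers care about six SmartWatch features.
features = ["heart_rate", "gps", "step_count", "notifications", "payments", "slim_design"]
X = np.array([
    [5, 4, 5, 0, 1, 0],
    [4, 5, 4, 1, 0, 0],
    [5, 5, 4, 0, 0, 1],
    [0, 1, 0, 5, 4, 5],
    [1, 0, 0, 4, 5, 4],
    [0, 0, 1, 5, 5, 4],
], dtype=float)

W, H = nmf(X, rank=2)
rel_err = np.linalg.norm(X - W @ H) / np.linalg.norm(X)
print(f"relative reconstruction error: {rel_err:.3f}")
print("consumer loadings (W):\n", W.round(2))
print("feature loadings (H):\n", H.round(2))
```

Two nonnegative components stand in for six feature columns. Each consumer is described by how much of each component they want, and each component is defined by the features that load on it.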

Diving into the details, a potential customer wanting to use the SmartWatch in a fitness program would attend to and know about features related to that intended usage. Would they be likely to remember a bunch of specific features, or would they learn what features were standard on the basic version and what features were extras on the more premium models? Brand affordance organizes perceptions along a continuum with different features at the lower and higher ends of the scale. Simultaneously, consumers are differentiated along with the features; for example, some SmartWatch prospects will be interested only in convenience and discretion. The co-clustering produced by matrix factorization provides the underlying representation of both consumers, grouped by the benefits and features they seek, and those same benefits and features, clustered because they are sought by the same consumers.
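One simple convention for reading a co-clustering out of a factorization is to assign each consumer (row of W) and each feature (column of H) to the component with its largest loading. This hard assignment is an illustrative choice, not the only way to interpret NMF output. The W and H values below are hypothetical and are not fitted results.

```python
def dominant_component(loadings):
    """Index of the largest loading (ties go to the lower index)."""
    return max(range(len(loadings)), key=lambda k: loadings[k])


def co_cluster(W, H, consumers, features):
    """Group consumers by the largest entry in each row of W and features by the
    largest entry in each column of H, so both share the same component labels."""
    rank = len(H)
    consumer_groups = {k: [] for k in range(rank)}
    feature_groups = {k: [] for k in range(rank)}
    for name, row in zip(consumers, W):
        consumer_groups[dominant_component(row)].append(name)
    for j, name in enumerate(features):
        feature_groups[dominant_component([H[k][j] for k in range(rank)])].append(name)
    return consumer_groups, feature_groups


# Hypothetical factors: 4 consumers x 2 components, 2 components x 4 features.
W = [[0.9, 0.1], [0.8, 0.0], [0.1, 0.7], [0.0, 0.9]]
H = [[1.2, 1.0, 0.1, 0.0], [0.0, 0.2, 1.1, 0.9]]
consumers = ["c1", "c2", "c3", "c4"]
features = ["heart_rate", "gps", "notifications", "slim_design"]

consumer_groups, feature_groups = co_cluster(W, H, consumers, features)
print(consumer_groups)
print(feature_groups)
```

Because consumers and features share the same component labels, component 0 reads as "fitness seekers and the fitness features they want." Component 1 reads as "convenience seekers and their features."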

The R package NMF supplies the interface and several ways to display the results, as I have shown in previous posts: