Revisiting Stated versus Derived Importance
All is well as long as the respondent is able and willing to provide a response that can be used to improve the marketing of products or candidates. Unfortunately, "know thyself" is no easier for us than it was for the ancient Greeks. The introspection illusion has been well documented. Simply put, we feel that those reasons that are easiest to provide when asked why must be the motivations for our behavior. The response to the exit poll may be nothing more than a playback of something previously heard or read. Yet, it is so easy to repeat that it must be the true reason. The questions almost write themselves, and the responses are tabulated and tracked over time without much effort at all. You have seen the headlines: "Top 10 Reasons for This or That" or "More Doing This for That Reason." So, what is the alternative?
Marketing research faced a similar situation with the debate over stated versus derived importance. Stated importance, as you might have inferred from the name, is a self-report by the respondent concerning the contribution of a feature or benefit to a purchase decision. The wording is typically nonspecific, such as "how important is price" without any actual pricing information. Respondents supply their own contexts, presumably derived from variations in price that they commonly experience, so that in the end we have no idea of the price range they are considering. Regrettably, findings do not generalize well for what is not important in the abstract becomes very important in the marketplace. The devil is in the details, and actual buying and selling is filled with details.
Derived importance, on the other hand, is the result of a statistical analysis. The experimental version is conjoint analysis or choice modeling. By systematically varying the product description, one estimates the impact of manipulating each attribute or feature. With observational data, one must rely on natural variation and perform a regression analysis predicting purchase intent for a specific brand from other ratings of the same brand.
In both case we are looking for leverage, specifically, a regression coefficient derived from regressing purchase interest on feature levels or feature ratings. If the goal is predicting how consumers will respond to changing product features, then conjoint seems to be winner once you are satisfied that the entire process is not so intrusive that the results cannot be generalized to the market. Yet, varying attributes in an experiment focuses the consumer's attention on aspects that would not be noticed in the marketplace. In the end, the need for multiple ratings or choices from each respondent can create rather than measure demand.
On the other hand, causal inferences are not possible from observation alone. All we know from the regression analysis are comparisons of the perceptual rating patterns of consumers with different purchase intent. We do not know the directionality or if we have a feedback loop. Do we change the features to impact the perceptions in order to increase purchase intent? Or, do we encourage purchase by discounting price or adding incentives so that product trial will alter perceptions? Both of these approaches might be successful if the associative relationship between perception and intend results from a homeostatic process of mutual feedback and reinforcement.
Generalized Perceptions Contaminated with Overall Satisfaction
Many believe that "good value for the money" is a product perception and not another measure of purchase intent. Those that see value as a feature interpret its high correlation with likelihood to buy as an indication of its derived importance. Although it is possible to think of situations where one is forced to repurchase a product that is not a good value for the money, in general, both items are measuring the same underlying positive affect toward the product. Specifically, the memories that are retrieved to answer the purchase question are the same memories that are retrieved to respond to the values inquiry. Most of what we call "perceptions" are not concrete features or services asked about within a specific usage context that tap different memories. Consequently, we tend to see predictors in the regression equation with substantial multicollinearity from halo effects because we only ask our respondents to recall the "gist" of their interactions and not the details.
Our goal is to collect informative survey data that measures more than a single approach-avoidance evaluative dimension (semantic memory). The multicollinearity among our predictors that continually plagues our regression analyses stems from the lack of specificity in our rating items. Questions that probe episodic memories of features or services used or provided will reduce the halo effect. Unfortunately, specificity creates its own set of problems trying to analyze high-dimensional and sparse data. Different needs generate diverse usage experiences resulting in substantial consumer heterogeneity. Moreover, the infrequently occurring event or the seldom used feature can have a major impact and must be included in order for remedial action to be taken. Some type of regularization is one approach (e.g., the R package glmnet), but I prefer an alternative that attempts to reduce the large number of questions to a smaller set of interpretable latent features.
An Example to Make the Discussion Less Abstract
If we were hired by a cable provider to assess customer satisfaction, we might start with recognizing that not everyone subscribes to all the services offered (e.g., TV, internet, phone and security). Moreover, usage is also likely to make a difference in their satisfaction judgments, varying by the ages and interest of household members. This is what is meant by consumers residing in separate subspaces for parents who use their security system to monitor their children when they are at work have very different experiences from a retired couple without internet access. Do I need to mentions teens in the family? Now, I will ask you to list all the good and bad experiences that a customer might have using all possible services provided by the cable company. It is a long list, but probably not any longer than a comprehensive medical inventory. The space formed by all these items is high-dimensional and sparse.
This is a small section from our data matrix with every customer surveyed as a row and experiences that can be probed and reliably remembered as the columns. The numbers are measures of intensity, such as counts or ratings. The last two respondents did not have any interaction with the six features represented by these columns. The entire data matrix is just more of the same with large patches of zeros indicating that individuals with limited interactions will repeatedly response no, never, or none.
In practice, we tend to compromise since we are seeking only actionable experiences that are frequent or important enough to make a difference and that can be remediated. Yet, even given such restrictions, we are still tapping episodic or autobiographical memories that are relatively free of excessive halo effects because the respondent must "relive" the experience in order to provide a response.
Our data matrix is not random but reflects an underlying dynamics that creates blocks of related rows and columns. In order to simplify this discussion we can restrict ourself to feature usage. For example, sports fans must watch live events in high definition. One's fanaticism is measure by the breadth and frequency of events watched. It is easy to image a block in our data matrix with sport fans as the rows, sporting events as the columns and frequency as the cell entries. Kids in the household along with children's programming generate another block, and so on. To be clear, we co-cluster or bicluster the rows and columns simultaneously for it is their interaction that creates clusters.
The underlying dynamics responsible for the co-clustering of the rows and the columns can be called a latent feature. It is latent because it is not directly observed, and like factor analysis, we will name the latent construct using coefficients or loadings reflecting its relationships to the observed columns. "Feature" was chosen due to the sparsity of the coefficients with only a few sizeable values and the remaining close to zero. As a result, we tend to speak of co-clustering rows and columns so that "latent feature" seems more appropriate than latent variable.
You can find an example analysis of a feature usage inventory using the R package NMF in a previous post. In addition, all the R code needed to run such an analysis can be found in a separate post. In fact, much of my writing over the last several months has focused on NMF, so you may wish to browse. There are other alternatives for biclustering in R, but nonnegative matrix factorization is such an easy transition from principal component analysis and mixture modeling that most should have little trouble performing and interpreting the analysis.