"A wealth of information creates a poverty of attention."
We categorize our world so that we can ignore most of it. In order to see figure, everything else must become ground. Once learned, the process seems automatic, and we forget how long and hard we had to work to achieve that automaticity. Learning to ride a bicycle is not easy, but once learned it is never forgotten. The same can be said of becoming fluent in a foreign language or learning R or deciding what toothpaste to buy. The difficulty of the task varies, yet the process remains the same.
Attention is selective, as is our exposure to media and marketing communications. Neither is passive, although our awareness is limited because the process is automatic. We do not notice advertising for products we do not buy or use. We walk past aisles in the grocery store and never see anything on the shelves until a recipe or changing circumstances require that we look. We are lost in conversation at the cocktail party until we hear someone call our name. Source separation requires a considerable amount of learning and active processing of which we are unaware until it is brought to our attention.
Attention is Preference and the First Step in Brand Involvement
To attend is to prefer, though that preference may be negative as in avoidance rather than approach. Attention initiates the purchase process, so this is where we should begin our statistical modeling. We are not asking the consumer for inference, "Which of these contributes most to your purchase choice?" We are merely taking stock or checking inventory. If we wish to have more than a simple checklist, we can inquire about awareness, familiarity and usage, for all of these are stored in episodic memory. In a sense, we are measuring attentional intensity with a behaviorally anchored scale. Awareness, familiarity and usage are three hurdles that a brand must clear in order to achieve success. Attention becomes a continuum measured with milestones as brand involvement progresses from awareness to habit.
Still, the purpose of selective attention is simplification, so much of the market and its features will never pass the first hurdle. We recognize and attend to that which is already known and familiar, and in the process, all else becomes background. Take a moment the next time you are at the supermarket making your usual purchases. As you reach for your brand, look at the surrounding area for all the substitute products that you never noticed because you were focused on one object on the shelf. In order to focus on one product or brand or feature, we must inhibit our response to all the rest. As the number of alternatives grows, attention becomes scarcer.
The long tail illustrates the type of data that needs to be modeled. If you enter "long tail" into a search engine looking for images, you will discover that the phenomenon seems to be everywhere as a descriptive model of product purchase, feature usage, search results and more. We need to be careful and keep the model descriptive rather than treat it as a claim that the future is selling less of more. For some childlike reason, I personally prefer the following image describing search results with the long tail represented by the dinosaur rather than the more traditional product popularity of the new marketplace.
Unfortunately, this figure conceals the heterogeneity that produces the long tail. In the aggregate we appear to have homogeneity when the tail may be produced by many niche segments seeking distinct sets of products or features. Attention is selective and enables us to ignore most of the market, yet individual consumers attend to their own products and features. Though we may wish to see ourselves as unique individuals, there are always many others with similar needs and interests so that each of us belongs to a community whether we know it or not. Consequently, we start our study of preference by identifying consumer types who live in disparate worlds created by selective exposure and attention to different products and features.
Building a Foundation with Brand Involvement Segmentation
Even without intent, our attention is directed by prior experience and selective exposure through our social network and the means by which we learn about and buy products and services. Sparsity is not accidental but shaped by wants and needs within a particular context. For instance, knowing the brand and type of makeup that you buy tells me a great deal about your age and occupation and social status (and perhaps even your sex). Even if we restrict our sample to those who regularly buy makeup, the variety among users, products and brands is sufficient to generate segments who will never buy the same makeup through the same channel.
Why not just ask about benefits they are seeking or the features that interest them and cluster on the ratings or some type of forced choice (e.g., best-worst scaling)? Such questions do not access episodic memory and do not demand that the respondent relive past events. Instead, the responses are relatively complex constructions controlled by conversational rules that govern how and what we say about ourselves when asked by strangers.
As I have tried to outline in two previous posts, consumers do not possess direct knowledge of their purchase processes. Instead, they observe themselves and infer why they seem to like this but not that. Moreover, unless the question asks for recall of a specific occurrence, the answer will reflect the gist of the memory and measure overall affect (e.g., a halo effect). Thus, let us not contaminate our responses by requesting inference but restrict ourselves to concrete questions that can be answered by more direct retrieval. While all remembering is a constructive process, episodic memory requires less assembly.
Nonnegative Matrix Factorization (NMF)
Do we rely so much on rating scales because our statistical models cannot deal easily with highly skewed variables where the predominant response is never or not applicable? If so, R provides an interface to nonnegative matrix factorization (NMF), an algorithm that thrives on such sparse data matrices. During the past six weeks my posts have presented the R code needed to perform an NMF and have tried to communicate an intuitive sense of how and why such matrix factorization works in practice. You need only look in the titles for the keywords "matrix factorization" to find additional details in those previous posts.
I will draw an analogy with topic modeling in an attempt to explain this approach. Topic modeling starts with a bag of words used in a collection of documents. The assumptions are that the documents cover different topics and that the words used reflect the topics discussed by each document. In our makeup example, we might present a long checklist of brands and products replacing the bag of words in topic modeling. Then, instead of word counts as our intensity measure, we might ask about familiarity using an ordinal intensity scale (e.g., 0=never heard, 1=heard but not familiar, 2=somewhat familiar but never used, 3=used but not regularly, and 4=use regularly). Just as the word "401K" implies that the document deals with a financial topic, regular purchasing of Clinique Repairwear Foundation from Nordstrom helps me locate you within a particular segment of the cosmetics market. Nordstrom is an upscale department store, Clinique is not a mass market brand, and you can probably guess who Repairwear Foundation is for by the name alone.
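To make the checklist coding concrete, here is a minimal sketch of how such responses might be turned into the respondents-by-brands matrix that gets factored. It is written in plain Python rather than the R used in my earlier posts, and the brand names, the three respondents and the helper names build_matrix and sparsity are all invented for illustration. Only the 0 to 4 scale comes from the text above.

```python
# Familiarity scale from the text: 0 = never heard ... 4 = use regularly.
SCALE = {
    "never heard": 0,
    "heard but not familiar": 1,
    "somewhat familiar but never used": 2,
    "used but not regularly": 3,
    "use regularly": 4,
}

def build_matrix(responses, brands):
    """Return a respondents-by-brands matrix of scale codes.

    Assumption: a brand the respondent did not check is coded 0 (never heard).
    """
    return [[SCALE[person.get(brand, "never heard")] for brand in brands]
            for person in responses]

def sparsity(matrix):
    """Share of cells that are zero."""
    cells = [value for row in matrix for value in row]
    return sum(1 for value in cells if value == 0) / len(cells)

# Hypothetical checklist and respondents, for illustration only.
brands = ["Brand A", "Brand B", "Brand C", "Brand D"]
responses = [
    {"Brand A": "use regularly", "Brand B": "heard but not familiar"},
    {"Brand C": "used but not regularly"},
    {"Brand D": "use regularly", "Brand C": "somewhat familiar but never used"},
]

matrix = build_matrix(responses, brands)
for row in matrix:
    print(row)
print("share of zeros:", round(sparsity(matrix), 2))
```

With a realistic checklist of dozens or hundreds of brands, most cells will be zero, which is exactly the kind of matrix NMF is meant to handle.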
The output can be summarized with two heatmaps: one indicating the "loadings" of the brands on the latent features so that we can name those hidden constructs and the second clustering individuals based on those latent features.
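For readers who want to see where those two matrices come from, the following is a bare-bones sketch of one standard NMF algorithm, the multiplicative updates of Lee and Seung for squared error. It is not the R NMF package used in this series of posts, and it is not meant for real data. The toy ratings matrix, the function names and the settings (rank 2, 300 iterations, a fixed seed) are assumptions chosen only to show the mechanics: a nonnegative respondents-by-brands matrix V is approximated by W times H, where H holds the brand "loadings" on the latent features and W holds each respondent's weights on those same features.

```python
import random

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(a):
    return [list(col) for col in zip(*a)]

def nmf(v, rank, iterations=300, seed=0, eps=1e-9):
    """Approximate v (n x m, nonnegative) by w (n x rank) times h (rank x m)."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    # Strictly positive random start so that no entry is stuck at zero.
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    for _ in range(iterations):
        # Update the brand loadings: H <- H * (W'V) / (W'WH)
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)]
             for i in range(rank)]
        # Update the respondent weights: W <- W * (VH') / (WHH')
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(rank)]
             for i in range(n)]
    return w, h

def frobenius_error(v, w, h):
    """Square root of the summed squared differences between v and w times h."""
    approx = matmul(w, h)
    return sum((x - y) ** 2
               for row_v, row_a in zip(v, approx)
               for x, y in zip(row_v, row_a)) ** 0.5

# Toy data: six respondents by four brands, two groups with no overlap.
ratings = [
    [4, 3, 0, 0],
    [3, 4, 0, 0],
    [4, 4, 0, 0],
    [0, 0, 4, 3],
    [0, 0, 3, 4],
    [0, 0, 4, 4],
]

w, h = nmf(ratings, rank=2)
print("brand loadings (latent features in rows):")
for row in h:
    print([round(x, 2) for x in row])
print("respondent weights (respondents in rows):")
for row in w:
    print([round(x, 2) for x in row])
print("reconstruction error:", round(frobenius_error(ratings, w, h), 3))
```

The scale of W and H is arbitrary, since doubling one and halving the other gives the same product, so it is the pattern of large and small values in each row that matters, not the raw numbers.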
As in factor analysis, one can vary the number of latent variables until an acceptable solution is found. The NMF package offers a number of criteria, but interpretability must take precedence. In general, we want to see a lot of yellow indicating that we have achieved some degree of simple structure. It would be helpful if each latent feature were anchored, that is, had a few rows or columns with values near one. This is a restatement of the varimax criterion in factor rotation (see Varimax, page 3). The variance of factor loadings is maximized when the distribution is bimodal, and this type of separation is what we are seeking from our NMF.
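The point about bimodal loadings can be checked with a little arithmetic. The sketch below is only a varimax-style illustration and not the full rotation criterion or anything the NMF package reports: each latent feature is rescaled so that its largest loading equals one, and the variance of the squared loadings is averaged over the latent features. The two loading matrices and the name simple_structure_score are made up for the comparison.

```python
from statistics import pvariance

def scale_to_unit_max(row):
    """Rescale one latent feature so that its largest loading equals one."""
    top = max(row)
    return [x / top for x in row] if top > 0 else list(row)

def simple_structure_score(loadings):
    """Average, over latent features, of the variance of squared scaled loadings.

    Assumption: latent features are in rows and brands in columns.
    Higher values mean the loadings sit near zero or near one.
    """
    per_feature = [pvariance([x * x for x in scale_to_unit_max(row)])
                   for row in loadings]
    return sum(per_feature) / len(per_feature)

# Hypothetical loadings for two latent features across four brands.
anchored = [[1.0, 0.9, 0.0, 0.1],
            [0.0, 0.1, 1.0, 0.8]]
diffuse = [[0.6, 0.5, 0.5, 0.4],
           [0.5, 0.6, 0.4, 0.5]]

print("anchored:", round(simple_structure_score(anchored), 3))
print("diffuse: ", round(simple_structure_score(diffuse), 3))
```

The anchored pattern scores higher because its loadings pile up at the two ends of the scale, which is what a heatmap with clear blocks is showing us visually.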
The dendrogram at the top of the following heatmap displays the results of a hierarchical clustering of the brands based on their association with the latent features. It is a good place to start. I am not going into much detail, but let me name the latent features from Rows 1-6: 1. Direct Sales, 2. Generics, 3. Style, 4. Mass Market, 5. Upscale, and 6. Beauty Tools. The segments were given names that would be accessible to those with no knowledge of the cosmetics market. That is, differentiated retail markets in general tend to have a lower end with generic brands, a mass market in the middle with the largest share, and a group of more upscale brands at the high end. The distribution channel also has its impact with direct sales adding differentiation to the usual separation between supermarkets, drug stores, department and specialty stores.
Now, let us look at the same latent features in the second heatmap below using the dendrogram on the left as our guide. You should recall that the rows are consumers, so that the hierarchical clustering displayed by the dendrogram can be considered a consumer segmentation. As we work our way down from the top, we see the mass market in Column 4 (looking for both reddish blocks and gaps in the dendrogram), direct sales in Column 1 (again based on darker color but also glancing at the dendrogram), and beauty tools in Column 6. All three of these clusters are shown to be joined by the dendrogram later in the hierarchical process. The upscale in Column 5 form their own cluster according to the dendrogram, as do the generic in Column 2. Finally, Column 3 represents those consumers who are more familiar with artistic brands.
My claim is that segments live in disparate worlds or at least segregated neighborhoods defined in this case study by user imagery (e.g., age and social status) and place of purchase (e.g., direct selling, supermarkets and drug stores, and the more upscale department and specialty store). These segments may use similar vocabulary but probably mean something different. Everyone speaks of product quality and price; however, each segment is applying such terms relative to its own circumstances. The drugstore and the department store shoppers have a different price range in mind when they tell us that price is not an important consideration in their purchase.
Without knowing the segment or the context, we learn little from asking for importance ratings or forced tradeoffs such as MaxDiff, which is why the word "foundation" describes the brand involvement segmentation. We now have a basis for the interpretation of all perceptual and importance data collected with questions that have no concrete referent. The resulting segments ought to be analyzed separately, for they are different communities speaking their own languages or at least having their own definitions of terms such as cost, quality, innovative, prestige, easy, service and support.
Of course, I have oversimplified to some extent in order for you to see the pattern that can be recovered from the heatmaps. We need to examine the dendrogram more carefully since each individual buys more than one brand as makeup for different occasions (e.g., day and evening, work and social). In fact, NMF is able to get very concrete and analyze the many possible combinations of product, brand, and usage occasion. More importantly, NMF excels with sparse data matrices so do not be concerned if 90% of your data are zeros. The key to probing episodic memory is maintaining high imagery by asking for specifics with details about the occasion, the product and the brand so that the respondent may relive the experience. It may be a long list, but relevance and realism will encourage the respondent to complete a lengthy but otherwise easy task.
Lastly, one does not need to accept the default of hierarchical clustering provided in the heatmap function. Some argue that an all-or-none hard clustering based on the highest latent feature weight or mixing coefficient is sufficient, and it may be if the individuals are well separated. However, you have the weights for every respondent so that any clustering method is an alternative. K-means is often suggested as it is the workhorse of clustering for good reason. Of course, the choice of clustering method depends on your prior beliefs concerning the underlying cluster structure, which would require some time to discuss. I will only note that I have experimented with some interesting options, including affinity propagation, and have had some success.
Postscript: It is not necessary to measure brand involvement across its entire range from attention through acquaintance to familiarity and habit. I have been successful with an awareness checklist. Yes, preference can be accessed with a simple recognition task (e.g., presenting a picture from a retail store with all the toothpastes in their actual places on the shelves and asking which ones they have seen before). Preference is everywhere because affect guides everything we notice, search for, learn about, discuss with others, buy, use, make a habit of, or recommend. All we needed was a statistical model for uncovering the pattern hidden in the data matrix.
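As a small illustration of the all-or-none option, here is a sketch in plain Python that assigns each respondent to the latent feature with the highest weight and also reports how close the runner-up was. The weights, the function names and the idea of using the gap between the top two mixing proportions as a warning flag are my own assumptions for this example and are not output from the NMF package.

```python
def mixing_proportions(weights):
    """Rescale each respondent's weights so that they sum to one."""
    result = []
    for row in weights:
        total = sum(row)
        result.append([x / total for x in row] if total > 0 else list(row))
    return result

def hard_assign(weights):
    """Index of the largest weight in each row; ties go to the first feature."""
    return [max(range(len(row)), key=row.__getitem__) for row in weights]

def top_two_margin(weights):
    """Gap between the largest and second-largest mixing proportion per row."""
    margins = []
    for row in mixing_proportions(weights):
        ordered = sorted(row, reverse=True)
        margins.append(ordered[0] - ordered[1])
    return margins

# Hypothetical weights for three respondents on three latent features.
weights = [
    [1.8, 0.2, 0.0],  # clearly in the first segment
    [0.4, 1.4, 0.2],  # clearly in the second segment
    [0.9, 0.8, 0.3],  # nearly split between the first two
]

segments = hard_assign(weights)
margins = top_two_margin(weights)
for person, (segment, margin) in enumerate(zip(segments, margins)):
    print("respondent", person, "segment", segment, "margin", round(margin, 2))
```

A respondent like the third one, whose margin is small, is the reason hard clustering is only sufficient when individuals are well separated. When many margins are small, a method that uses the full set of weights is the safer choice.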