Obviously, preference precedes choice because choices are made to maximize preference. That is certainly the way we conduct our marketing research. We generate factorial designs and write descriptions full of information about products and services. Our subjects have no alternative but to use the lists of features that we provide them in our choice sets.
All of this can be summarized in the information integration paradigm shown in the figure above. Features are the capital-S real-world stimuli that become corresponding small-s perceptions with their associated utilities inside our heads. The utility function does the integrating and outputs an internal small-r response, the total utility of the feature bundle. The capital R represents the answer on the questionnaire, since some additional translation is needed to turn the internal response into a rating or the selection of an alternative from a choice set.
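To make the integration step concrete, here is a minimal sketch in R, assuming a simple additive rule with made-up perceptions and weights (information integration theory also allows averaging and multiplying rules):

# A hypothetical additive integration rule: weighted perceptions (small s)
# combine into an internal utility (small r), which is then translated
# into an observed rating (capital R). All values are illustrative.
s <- c(flavor = 0.8, sweetness = -0.2, price = -0.5)  # perceptions of one bundle
w <- c(flavor = 2.0, sweetness = 1.0, price = 1.5)    # importance weights
r <- sum(w * s)                                       # internal response: total utility
R <- round(5.5 + 2 * r)                               # translate r onto a 10-point rating
R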
One feature (capital S) elicits one preference (small s). If one likes the features, then one will like the product, which is nothing more than a bundle of features. So, if you like strawberry jam, but not too sweet and especially not too expensive, then you will prefer Brand X strawberry preserve if it possesses the optimal combination of your preferred features. And you know this from a blind taste test of all the strawberry preserves on the supermarket shelf? Or do you buy Brand X strawberry preserve because it was what you ate as a child, or was a gift from someone important to you, or was recommended by a trusted companion? A good deal of our product knowledge comes from observational learning: we watch and copy the purchase choices made by others. Are your choices determined by the features that you prefer, or are you seduced into purchase and then infer feature preference from your choices?
Will Choice Blindness Change Your Mindset?
Let's head to the supermarket for a taste test of different flavors of jam and an experimental assessment of which comes first, preference or choice. Better yet, we have this YouTube video from a BBC program on decision making. The details of the study have been published, so all I need to do is describe the following figure.
The woman with the spoon is the respondent tasting two different flavors of jam, one in the blue jar and the other in the red jar. Here is the videotape associated with this picture. Note that the woman controlling the jars is the experimenter. She opens the red jar first, the taster tastes, and the experimenter turns the red jar over. Unbeknownst to the taster, the red and blue jars are identical: both are double jars that can be opened by unscrewing either the top or the bottom, and both contain the same two jams. Because the experimenter turns the jar over after the first tasting, the top of the red jar now holds the same flavor as the blue jar. The respondent tastes the blue jar, which, given the look of disapproval on the taster's face, we will assume has been rejected. Now the respondent is asked to taste again the flavor that they preferred, which has been secretly replaced with the rejected flavor, and to tell us the reasons for their preference. This video, and especially the BBC video, illustrates that our tasters have no problem giving reasons why they like the flavor that they just rejected.
In two-thirds of the trials our tasters did not notice that the flavors had been switched. They liked the blue on their left or the red on their right, and they were sticking with their choice. When asked to taste again and tell us why, they gave reasons why they preferred the flavor that they had originally rejected. If you are questioning whether the two jams were similar enough to be mistaken for each other, you can be reassured that in a separate experiment different subjects had no problem discriminating between the two flavors. The first video from the BBC contains clips of respondents explaining their preferences for the flavor that they rejected but were fooled into believing that they preferred. The reasons seem entirely reasonable; the tasters fell for the trick and accepted the second tasting as their preferred flavor. If this one study does not convince you, you can find many more available from the choice blindness lab.
What are the implications for statistical modeling?
Does the product-as-feature-bundle representation seem to work in our research only because we have removed so much of the actual purchase context, leaving respondents nothing but the information we provided in our vignettes and scenarios? For marketing researchers this implies that choice modeling may be of limited usefulness because there are only a few purchase contexts where we can generalize from the laboratory to the marketplace. Otherwise, the preference construction process induced by our hypothetical experiment does not match the preference construction process unfolding in real purchase contexts.
In fact, this is why we have discrete choice modeling. As Daniel McFadden explains in his Nobel Prize acceptance speech, the decision to drive alone, carpool, take the bus or take the metro demanded a new statistical model reflecting the specific preference construction process underlying such a choice. What is true for transportation choices is true for many occasions when consumers decide what to buy. It is not difficult to think of actual purchase situations where we have narrowed our choices down to two or three alternatives that we are actively considering and where we compare these offers by trading off features. Online purchases from sites that enable you to create side-by-side comparisons will satisfy the criteria underlying information integration. We can observe the same phenomenon in the store when a shopper takes two containers off the shelf to compare the ingredients.
The R package bayesm will handle such data and return individual-level estimates for all the utilities, although we still need to be concerned when presenting multiple choice sets to each respondent. Feature importance is determined by the range and frequency with which feature levels are systematically manipulated in an experimental design. Thus, what is not important when presented as a single choice task becomes important when varied over several choice sets. Moreover, there are good reasons to limit the number of features listed for each alternative. We must resist client demands to cram more information into the product description than they would be willing to include on the packaging or in advertising.
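As a minimal sketch of such an analysis, assuming simulated choice data and bayesm's rhierMnlRwMixture function (the design, sample sizes, and utility values below are all hypothetical):

library(bayesm)
set.seed(42)
p <- 3        # alternatives per choice set
nsets <- 8    # choice sets shown to each respondent
nvar <- 4     # feature columns (e.g., dummy-coded levels and price)
nresp <- 100  # respondents

lgtdata <- vector("list", nresp)
for (i in 1:nresp) {
  beta_i <- c(1, -1, 0.5, -0.8) + rnorm(nvar, sd = 0.5)   # individual-level utilities
  X <- matrix(runif(nsets * p * nvar), nrow = nsets * p)  # alternatives stacked by choice set
  u <- matrix(X %*% beta_i, nrow = nsets, byrow = TRUE)   # utility of each alternative
  prob <- exp(u) / rowSums(exp(u))                        # logit choice probabilities
  y <- apply(prob, 1, function(pr) sample(p, 1, prob = pr))
  lgtdata[[i]] <- list(y = y, X = X)
}

out <- rhierMnlRwMixture(Data = list(p = p, lgtdata = lgtdata),
                         Prior = list(ncomp = 1),           # one normal component for the population
                         Mcmc = list(R = 2000, keep = 10))  # short run for illustration
beta_hat <- apply(out$betadraw, 1:2, mean)  # posterior means of individual-level utilities

The single mixture component in the prior is precisely the kind of homogeneity assumption that lets a handful of choice sets per respondent still yield individual-level estimates.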
Now, what about all the other purchases that we make every day that do not involve feature comparisons? A possible answer is offered by John Hauser in his paper on the role of recognition-based heuristics in marketing science. Recognition is one of many heuristics from the Fast and Frugal Paradigm, which holds that simple processes can yield good results with little cognitive effort when the environment is structured appropriately. For example, larger brands with more customers tend to be more easily recognized by everyone. If the greater market share results from the brand's ability to satisfy more customers, then the market is structured so that my recognition of the brand is an indication that I am likely to be satisfied after buying the brand. This is an example of choice without feature preference.
Sell the Sizzle, Not the Steak
What is the basis for competition in the product category? Is it feature comparison (steak type, grade, degree of marbling, maturity, fat content, origin), or is the focus on benefits (happy steak consumption experiences with loved ones)? Often, the market is engaged in a battle to frame the purchase in terms of benefits or features, with the market leader emphasizing benefits and competitors struggling to gain share by pushing feature or price advantages. Statistical modeling plays on this same battlefield when we collect and analyze choice data by varying feature levels. Conjoint analysis frames the purchase task around features and thereby magnifies their impact. Who is surprised to discover strong price effects when prices are varied repeatedly over a sizeable range of values?
Branding, on the other hand, draws our attention away from features and toward benefits. It invites different data collection procedures and alternative statistical models. Treating the brand as just another feature in an experimental design diminishes the role that brand plays in the market. In the actual purchase context the brand dominates as recognizable and familiar, as an intentional agent promising benefits, or, as I argued in an earlier post, as an affordance. Consequently, when we measure brand perceptions, we consistently find a highly correlated set of responses: a strong first principal component (linear) or a clear low-dimensional manifold (nonlinear). This is the brand schema, a pattern of strengths and weaknesses, that I have analyzed using item response theory.
Focusing on the brand takes us down an analytic path toward pattern recognition and matrix decomposition. It moves us out of the Econometrics task view on CRAN and into machine learning. We begin looking at linear and nonlinear dimension reduction as statistical models of how the brand holds it all together. Recommender systems acquire a special appeal. Moreover, experimental designs begin to seem too obtrusive, and we are drawn toward more naturalistic data collection. If consumers rely on brand information to assist them in their decision journey, then we must be careful not to remove those supports and create a self-fulfilling prophecy where feature preferences dominate because feature comparisons are all that is available.
Monday, May 19, 2014
The Purchase Funnel Survives the Consumer Decision Journey
The journey metaphor is almost irresistible. All one needs is a starting point and a finish line, plus some notion of progression. Thus, life is a journey, and so is love. Why not apply the metaphor to your next purchase? McKinsey & Company takes such a metaphoric leap in their very popular paper on the consumer decision journey. They start with a purchase trigger and move through four primary phases that they see as four potential battlegrounds: initial consideration, active evaluation, closure (purchase), and post-purchase. If you are thinking that this seems to be a variation of awareness-interest-desire-action (AIDA) or the purchase funnel, as it is also known, you would not be wrong. In fact, McKinsey & Company begin with the purchase funnel but reject its claims that consumers move sequentially through invariant stages with a narrowing of the number of brands considered until only one victor remains.
According to the consumer decision journey, the pathway is no longer linear or invariant, nor is the momentum consistently forward. The notion of progression remains, but progress can be piecemeal with one step forward followed by two steps back. The internet makes a lot more information available and transfers control of marketing activities from the brand to the consumer. Consumers are in command and can seek information as they wish from any source and in any order.
Yet, the journey metaphor maintains the goal of narrowing options and selecting what is best for each consumer. The prize for the brand remains loyalty as indicated by continued purchase and advocacy through recommendation and positive word of mouth. Brand awareness still matters, as do customer satisfaction and support services. Competitive pricing and new offerings have not lost their ability to steal away customers. The outcome that we seek to maximize has not changed whether you call it brand equity, attachment, involvement, engagement or loyalty. Moreover, that outcome retains its status as a latent variable that can be observed only through its effects on consumer responses to the brand, such as awareness, interest, desire and action. In fact, as Joakim Nilsson shows in the diagram below, this latent dimension has an impact after the purchase with social media enlarging the funnel and amplifying the reach of each satisfied or dissatisfied customer. By beginning and ending the process with impressions, one gets the sense that the consumer decision journey evolves over time as more customers join and usage diversifies.
So why is the purchase funnel dead? Are consumers considering products of which they have no awareness? Are they purchasing products without considering them first? If a sizable percentage of your customers reported that they have recommended your brand to others, can we not conclude that your brand has made it to the end of the decision-making process with a more or less stable customer base? On the other hand, would you not agree that your brand was stuck in the starting blocks if most consumers had a difficult time naming your brand when asked about the product category?
When one wants to intervene in the purchase process to improve a brand's standing, it is important to know the different paths that consumers are taking to learn about the brand. However, improving your brand's status means that you also need to know how well the brand is doing, which is what the purchase funnel provides. Brands that fail to achieve awareness among prospective customers will not succeed, so we measure brand achievement by asking customers about their brand awareness. Brand awareness is good, but consideration is better, and purchase is best. The percentage of prospects lost as we move from awareness to consideration to purchase would seem to be a good indicator of brand value. We assess the brand by measuring consumers. Therefore, the purchase funnel is not dead, at least not as an instrument for brand appraisal.
Thinking Like an Item Response Theorist
An item response theorist sees people lined up, single file, one after another with each person possessing more of some property than the person before but less of that property than the next person in line. "Alright, everyone in line with the shortest person in front and the tallest at the end." But what if the property were not visible, such as one's location along the consumer decision journey? How would the item response theorist know which consumer was farther along and which had just started the trip? Would they not write items for an achievement test?
We already have some idea of what to measure based on the battlegrounds identified in the McKinsey & Company article. Successful brands are those that come to mind when making a purchase and are able to maintain a loyal customer base. The item response theorist would place a "sensor" at each of these locations: one item measuring awareness and another item measuring recommendation. If a brand activated the awareness sensor but not the recommendation sensor, we would know where the brand fell along the path. In fact, we would want to put a "sensor" at each gate leading from one stage to the next (e.g., attention, positive image, consideration, purchase, and recommendation). These stages are achievement milestones along the consumer decision journey.
Brand equity is found in the value that consumers see in the brand. Thus, the brand appraisal process proceeds by asking for consumer ratings. The items assess the final achievement for the brand and not the consumer learning process. We measure if the brand is considered, but not whether that consideration results from an advertisement or a showroom visit or an online review or a YouTube video. The first step is brand appraisal. The next step is tracing the journey or at least assessing the impact of touchpoints on brand standing.
Item response theory (IRT) will guide us in the brand appraisal process. We begin looking for an underlying continuum, a latent variable that will account for our observations. Brands attract consumers. The strength of this attraction is what we wish to measure. Small levels of attraction appear initially in awareness by getting the brand noticed. Does the consumer follow up and get acquainted? Stronger attraction places the brand into the consideration set, and the strongest brands pull us even closer to them so that we continually purchase more and more frequently. At the highest levels of attraction, we become advocates and begin spreading the word. The black hole pushes the analogy too far, but the image ought to make the point memorable.
On the other hand, if you would like a more formal justification for this hierarchy, you can find it in the literature on consumer-based brand equity (the CBBE model). Simply replace the purchase funnel with the brand resonance pyramid since a funnel is nothing more than an inverted pyramid.
I have provided code in previous posts showing how the R package ltm will perform the analysis for checklists or rating scales. To learn about item response theory, it is best to start with binary items (e.g., yes/no or present/absent), because ratings follow the same logic as checklists: a rating can be treated as an ordered sequence of checklists. While a binary item divides brand attraction into two parts, a rating scale partitions the same continuum into the number of values on the scale. In both cases an item response model will give us some number of cutpoints for each observed indicator of the underlying latent variable: one cutpoint for a checklist, or the number of scale values minus one ordered cutpoints for a rating scale. The same data generating process is responsible regardless of the type of scale.
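As a minimal sketch, assuming simulated respondents and hypothetical binary milestone items (ltm's rasch() handles checklists; its grm() extends the same logic to rating scales):

library(ltm)
set.seed(7)
theta <- rnorm(500)  # latent brand attraction for 500 simulated respondents
# Hypothetical funnel milestones with increasing difficulty along the continuum
cutpoints <- c(aware = -1.5, consider = -0.5, purchase = 0.5, recommend = 1.5)
funnel <- as.data.frame(
  sapply(cutpoints, function(b) rbinom(500, 1, plogis(theta - b))))

fit <- rasch(funnel)  # one estimated cutpoint (difficulty) per binary item
coef(fit)             # item difficulties should recover the milestone ordering
factor.scores(fit)    # respondent positions along the latent continuum
# grm(ratings) would return (scale values - 1) ordered cutpoints per rating item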
The brand attracts consumers with milestones indicating the strength of that attraction. The item response model calculates the position of these transition points from inattention to interest to consideration to purchase to satisfaction to retention to advocacy. Then, it positions each respondent along that same continuum showing the strength of the brand's attraction for that respondent. The distribution of consumers for each brand is a measure of the brand's total attraction yielding not only a central tendency but also showing the nature and shape of the heterogeneity among respondents.
It's Alive!
The purchase funnel has survived as a reliable tool for measuring brand strength. Yet, I have offered no guidance for how to assess the impact of the consumer decision journey. The number of predictors explodes when the consumer is in charge. There are hundreds of possible touchpoints where prospective customers can learn about the brand. This figure is but a start.
Yet, each individual is likely to have had only limited contact with a subset of all possible touchpoints, yielding a high-dimensional predictor space that is quite sparse (not unlike what we see with recommendation systems like Netflix). Moreover, if we take the journey concept seriously, then touchpoint effects depend on where we are in the purchase process. It's a continuum in the sense that first comes awareness and then consideration followed by purchase, but what gets a prospect from awareness to consideration is not the same as what gets them from consideration to purchase. All this will need to wait for a later post.
Thursday, May 15, 2014
The Mind Is Flat! So Stop Overfitting Choice Models
Conjoint analysis and choice modeling rely on repeated observations from the same individuals across many different scenarios where the features have been systematically manipulated in order to estimate the impact of varying each feature. We believe that what we are measuring has substance and existence independent of the measurement process. Nick Chater, my source for this eerie figure depicting the nature of self-perception, lays to rest this "illusion of depth" in a short video called "The Mind is Flat." We do not possess the cognitive machinery demanded by utility theory. When we "make up our mind," we are literally making it up. Features do not possess value independent of the decision context. Features acquire value as we attempt to choose one option from many alternatives. Consequently, whatever consistency we observe results from recurring situations that constrain preference construction, not from some seemingly endless store of utilities buried deep in our memories.
Although it is convenient for the product manager to think of their offerings as bundles of features and services, the consumer finds such a representation to be overwhelming. As a result, the choice modeler is forced to limit how much each respondent is shown. The conflict in choice modeling is between the product manager who wants to add more and more features to the bundle and the analyst who needs to reduce task complexity so that respondents will participate in the research. At times, fractional factorial designs fail to remove enough choice sets, so we turn to optimal configurations with acceptable confounding (see design of experiments in R). Still, even our reduced number of choice scenarios may be too many for any one individual, so we show only a few scenarios to each respondent, make a few restrictive assumptions about homogeneity (e.g., introduce hyperparameters specifying the relationships between individual- and group-level parameters), and then proceed with hierarchical Bayes to compute separate estimates for every person in the study.
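As a minimal sketch of that design step, assuming three hypothetical features and the AlgDesign package (one of several R options for generating optimal fractions of a full factorial):

library(AlgDesign)
# Full factorial: 3 brands x 3 prices x 2 warranty levels = 18 profiles
full <- gen.factorial(levels = c(3, 3, 2), factors = "all",
                      varNames = c("brand", "price", "warranty"))
# D-optimal fraction with 9 runs that still lets us estimate all main effects
frac <- optFederov(~ brand + price + warranty, data = full, nTrials = 9)
frac$design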
We justify such data collection by arguing that it is an "as-if" measurement model. Of course, people cannot retain in memory the utilities associated with every possible feature or service level. Clearly, no one is capable of the mental arithmetic necessary to do the computation in their heads. Yet, we rationalize the introduction of such unrealistic assumptions by claiming that they allow us to learn what drives choice and decision making. Thus, by asking a consumer to decide among feature bundles using only the information provided by the experimenter, one can fit a model and estimate parameters that will predict behavior in this specific setting. But our findings will not generalize to the marketplace because we are overfitting. The estimated utilities work only for this one particular task. What we have learned from behavioral economics over the last 30 years is that what is valued depends on the details of the decision context.
For those of you wishing a more complete discussion of these issues, I will refer you to my previous posts on Context Matters When Modeling Human Judgment and Choice, Got Data from People?, and Incorporating Preference Construction into the Choice Modeling Process.
Ecological Data Collection and Latent Variable Modeling
I am not suggesting that we abandon choice modeling or hierarchical Bayes estimation. A well-designed choice study that carefully mimics the actual purchase context can reveal a good deal about the impact of varying a small number of features and services. However, if our concern is learning what will happen in the marketplace when the product is sold, we ought to be cautious. Order and context effects will introduce noise and limit generalizability. Multinomial logistic models, such as those in the bayesm R package, teach us that feature importance depends on the range of feature levels and the configuration of all the other features varied across the choice scenarios. We are no longer in the linear world of rating-based conjoint via multiple regression with its pie charts indicating the proportional contribution of each feature.
A good rule of thumb might be to include no more features than the number that would be shown on the product package or in a print ad. Our client's desire to estimate every possible aspect will only introduce noise and result in overfitting. On the other hand, simply restricting the number of features will not eliminate order effects. Whenever we present more than one choice scenario, we need to question whether our experimental arrangements have induced selection strategies that would not be present in the marketplace. Does varying price focus attention on price? Does the inclusion of one preferred feature level create a contrast effect and lower the appeal of the other feature levels? These effects are what we mean when we say the preference is not retrieved from a stable store kept in long-term memory.
It is unnecessary for consumers to store utilities because they can generate them on the fly given the choice task. "What do you feel like eating?" becomes a much easier question when you have a menu in your hands. We use the choice structure to simplify our task. I read down the menu imagining how each item might taste and select the most appealing one. I switch products or providers by comparing what I am using with the new offer. The important features are the ones that differentiate the two alternatives. If the task is difficult or I am not sure, then I keep what I have and preserve the status quo. In both cases context comes to our rescue.
The flexibility that characterizes human judgment and decision making flows from our willingness to adapt to the situation. That willingness, however, is not a free choice. We are not capable of storing, retrieving and integrating feature-level utilities. You might remember the telephone game where one person writes down a message and whispers it to a second person, who whispers the message they heard to a third, and so on. Everyone laughs at the end of the telephone line when the last person repeats what they think they heard and it is compared with what was written. Such is the nature of human memory.
We can avoid overfitting by reducing error and simplifying our statistical models. These are the two goals of statistical learning theory. We keep the choice task realistic and avoid order effects. Occam's razor will trim our latent variables down to a single continuous dimension or a handful of latent classes. For example, the offerings within a product category are structured along a continuum from basic to premium. The consumer learns what is available and decides where they personally fall along this same continuum. Do they get everything they want and need from the lower end, or is it worth it to them to pay more for the extras? The consumer exploits the structure of the purchase context in order to simplify their purchase decision. If our choice modeling removes those supports, it no longer reflects the marketplace.
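As a minimal sketch of that trimming, assuming simulated ratings driven by a single basic-to-premium continuum (mclust is one convenient way to ask whether a single dimension or a handful of latent classes is enough):

library(mclust)
set.seed(11)
position <- rnorm(300)  # each consumer's location on the basic-to-premium continuum
ratings <- sapply(1:6, function(j) position + rnorm(300, sd = 0.7))  # six noisy indicators
summary(prcomp(ratings, scale. = TRUE))  # expect one dominant principal component
mc <- Mclust(ratings, G = 1:5)           # BIC selects the number of latent classes
summary(mc)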
Choice remains complex, but now the complexity lies in the recognition phase. That is, choice begins with problem recognition (e.g., I need to retrieve email away from my desktop, or I want to watch movies on the go, or both at the same time). Framing of the choice problem determines the ad hoc or goal-derived category, which in turn shapes the consideration set (e.g., smartphones only, tablets only, laptops only, or some combination of the three product categories) and determines the evaluative criteria to be used in this particular situation. This is why I called this section ecological data collection. It is the approach that Donald Norman promotes when designing products for people. For the choice modeler, it means a shift in our statistical modeling from estimating feature-level utilities to pattern recognition and unsupervised learning.
Friday, May 9, 2014
Customer Satisfaction and Loyalty: Structural Equation Model or One-Dimensional Dissonance
Causal thinking is seductive. Product experience comes first, then feelings of satisfaction, and finally intentions to continue as a customer. Although customer satisfaction and loyalty data tend to be collected all at one time within the same questionnaire, who does not see the work of the invisible hand of causation? We call product and service ratings "drivers" of satisfaction because it is so easy to imagine experience impacting affect and intention. Thus, no one will be surprised to see the R package semPLS (Structural Equation Modeling using Partial Least Squares) using the customer satisfaction model as its example.
However, what if we were to ignore the causal model and look only at the data? The mobi dataset from the semPLS package is a data matrix with 250 rows containing ratings on a scale from 1 to 10 across 24 items measuring mobile phone customer satisfaction. All you need to know in order to run the SEM and interpret the results can be found in the above link to the well-written Journal of Statistical Software article. I, on the contrary, will pretend that I have no causal model and treat the data set as any other battery of ratings. Let us start by exploring the intercorrelations among these variables.
A principal component analysis for this 250 x 24 data matrix yields a first principal component accounting for 39.5% of the total variation and a second principal component that is only one-sixth the size of the first with 6.6%. The size of the first principal component tells us a great deal about the amount of redundancy in the data matrix. The biplot will provide the visualization.
A biplot is a graphic display showing the 24 variables as vectors and the 250 observations as points projected onto the two-dimensional principal component space. That is, we can create a two-dimensional map with the first principal component as the x-axis and the second principal component as the y-axis. We calculate the two principal component scores for every respondent and use points to represent each row. We know the correlation between each variable and the principal components (i.e., the factor loadings), and we can use that knowledge to project the variables as lines or vectors onto the principal component space. As you might recall, the higher the correlation between two variables, the smaller the angle between the lines representing these variables.
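For a quick static version of such a display, base R will draw the same projection from a prcomp object (the interactive features described below require the BiplotGUI package):

library(semPLS)
data(mobi)
pc <- prcomp(mobi, scale. = TRUE)  # principal components of the 250 x 24 ratings
biplot(pc, cex = 0.6)              # rows as points, variables as vectors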
This plot was created using the BiplotGUI R package. It shows the distribution of the 250 rows across the first two principal components as red boxes. The lines are the 24 ratings with tick marks indicating the scores from 1 to 10. Only abbreviations are provided, but you can find the actual questions in the semPLS documentation. With the exception of one variable, all the ratings point in the same direction toward the right. Given the small angles between all these lines, one would expect the correlation matrix to be filled with positive correlations. The "L" in the CUSL# indicates loyalty. The second loyalty measure (would you switch providers for price discount) seems to go its own way.
You may have noticed an arc of non-red circles. I picked one of the points toward the right side to show the arc of predicted values for one respondent with a high first principal component score. The same way that you would drop a perpendicular from a point to the x- and y-axes in order to determine the point's location on those dimensions, one can read out the predicted rating by the perpendicular projection of a point onto that variable's line. These are the dotted lines shown by BiplotGUI. One can clearly see that a high first principal component score results in uniformly high ratings across all 24 ratings. This predicted rating would have been the actual rating had the first two principal components accounted for 100% of the variation.
The "GUI" in BiplotGUI indicates that the function opens its own window that allows you to interact with the biplot. This interaction will enable you to experience the relationship between a point's position in the two-dimensional principal component space and its scores on the 24 variables. The "arc" of varying colored circle tells us that those respondents toward the positive end of the first dimensions tended to rate everything higher.
The above biplot is identical to the first biplot except for the location of the arc, which is moved toward the mean. It illustrates the effect of a decreasing first principal component score. The first two principal components account for less than half of the total variation (39.5% + 6.6%), so there will be some discrepancy between our predicted and the actual ratings. Although it is not shown here, the BiplotGUI window provides a frame where you can see the actual and predicted ratings as you move the cursor across the biplot.
Let me show one more arc from the other side of the first principal component. Now the arc is located toward the lower end of the x-axis and shows how these respondents give uniformly lower ratings.
I would encourage you to install BiplotGUI and semPLS, copy the few lines of R code at the end of this post, and interact with the biplot by moving your cursor across the space. As I note in the R code, you will need to activate "Predict points closest to cursor position" by right clicking on the biplot display. By selecting the Prediction tab in the adjacent frame, you will be able to see the actual and predicted ratings for each respondent. Seeing the consistency with which all the ratings move together as different respondents are selected might just change your mindset.
Do I really need all these separate latent constructs with their causal connections? Does Occam's Razor shred the structural equation model? True, the network display of a causal model, as shown below, does provide an organization for the 24 ratings. Conceptually, we can distinguish between performance, satisfaction, and loyalty. Obviously, over time, experience comes first, followed by feelings of satisfaction and then loyalty intentions. The directed graph makes causation a compelling inference.
But the ratings are all that I have, and those ratings were all collected on a single occasion. A longitudinal study may be able to separate performance, satisfaction, and loyalty. By the time I make my measurements, the directed arrows have become feedback loops. Thus, my evaluation of my brand's performance depends on whether or not a better alternative is available or how much effort is needed to switch providers. Sometimes it is convenient to believe that all companies are the same. On the other hand, once I decide to switch, all my ratings fall, that is, until I investigate further and discover problems with my "new" provider. I am not arguing that all the ratings will be identical. Every provider will have strengths and weaknesses. But mean-level item differences are not separate latent variables. As long as the ratings move together as a cohesive unit, we have one latent dimension.
To be clear, I am not claiming that unresolved problems will not impact satisfaction and retention. However, the key word is "unresolved," and the frequency of such problems tends to be low among current customers. The unhappy churn unless there are barriers to exit. Our data matrix is a mixture of respondents: a few "hostages" waiting to be freed, some "shoppers" looking for a better deal, a plurality of "inerts" who prefer not to think about it, and a brand-specific percentage of "advocates" making recommendations and active on social media. I have ordered these four components of our mixture model as they might appear along the first principal component. They are not well-separated, which is why a dimensional representation works so well (see this previous post for a more complete discussion of this issue).
In the end, perhaps it is cognitive dissonance, and not cause-and-effect, that binds the ratings together. Attitudes serve behavior. Do I switch or continue using or simply not think about it at all? Do I become an advocate for the brand? Each of these alternative courses of action results from a complex interplay of product experience with each customer's usage situation and personal needs. We cannot simply assume that whether or not one recommends the brand to others depends solely on the brand, and not on personal gains and losses associated with the recommendation process that have nothing to do with the brand. Complex systems of thought and action can be described, but not by causal models.
R code to run analysis in this post:
#load semPLS and datasets
library(semPLS)
data(mobi)
data(ECSImobi)

#run the PLS SEM
ecsi <- sempls(model=ECSImobi, data=mobi, E="C")
ecsi

#calculate percent variation
(prcomp(scale(mobi))$sdev^2)/24

#load and open BiplotGUI
library(BiplotGUI)
Biplots(mobi, PointLabels=NULL)
#right click on biplot
#select "Predict points closest to cursor positions"
#select Prediction tab in top-right frame