Wednesday, November 5, 2014

Net Promoter Mixture Modeling: Can a Single Likelihood Rating Reveal Customer Segments?

Net Promoter believes that customers come in one of three forms: promoters (happy yellows), passives (neutral grays), or detractors (angry reds). Cluster identification is relatively easy, for all you need to do is ask the "ultimate question" concerning likelihood to recommend. As the figure indicates, the top two boxes (9-10) are promoters, the bottom seven boxes (0-6) are detractors, and the net promoter score (NPS) is the percentage of promoters minus the percentage of detractors. Although there is considerable controversy surrounding this scoring system, I wish to focus on the density estimation question of whether I can look at the distribution of ratings across this 11-point scale and determine whether or not there are clusters (e.g., a bimodal distribution might suggest the mixing of two components).
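The scoring rule itself is simple arithmetic. Here is a minimal sketch in base R, assuming the standard cutpoints of 0-6, 7-8 and 9-10; the ten ratings are made up for illustration and the variable names are my own.

```r
# Hypothetical ratings on the 0-10 likelihood-to-recommend scale (made-up values)
ratings <- c(10, 9, 10, 8, 7, 5, 10, 0, 6, 9)

# Net Promoter cutpoints: 0-6 detractor, 7-8 passive, 9-10 promoter
segment <- cut(ratings,
               breaks = c(-Inf, 6, 8, 10),
               labels = c("detractor", "passive", "promoter"))

# Percentage of respondents in each segment
shares <- prop.table(table(segment)) * 100

# NPS = percentage of promoters minus percentage of detractors
nps <- as.numeric(shares["promoter"] - shares["detractor"])
nps
```

With five promoters and three detractors out of ten respondents, the score works out to 50 - 30 = 20. Note that the passives drop out of the calculation entirely.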

The mclust R package (Section 6 Density Estimation) illustrates this type of analysis using eruption data from the Old Faithful geyser. The histogram on the left reflects our situation using only the waiting time between eruptions. The distribution appears to be bimodal, suggesting that there might be two different sources generating the eruptions. The two-dimensional plot on the right, displaying both waiting times between eruptions and length of the eruptions, was included to show how easily we can generalize to higher dimensional spaces and how an additional dimension can increase separation. It appears that some eruptions occur more quickly and last for a shorter time than those eruptions with greater latencies and longer durations. You can find a more detailed introduction to mclust and all the R code needed to produce such plots in an earlier post.

Evidence for Clusters in the Distribution of Recommendation Scores?

The observed distribution for likelihood to recommend tends to follow a familiar pattern with the entire curve moved up or down the scale depending on the provider's overall performance. High recommendation ratings are not unusual with the best providers having as many as half of their customers giving the highest possible marks. At the other end, we often find a much smaller group who are quite unhappy. Another bump in the distribution appears in the middle with a larger than otherwise expected proportion using the midpoint. In general, the highest peak is toward the upper end with progressively smaller summits as one moves down the scale.

First, we will look at the density plot for a group of more boutique providers with higher overall recommendation scores. Although the data are from but one proprietary study, one consistently finds similar curves for smaller brands appealing to niche segments. Each of the 11 bars in the histogram represents a different rating from 0 to 10. Over 40% of the customers tell us that they are extremely likely to recommend. We also see a slight "bump" at 5 and another at the very bottom for a score of 0. Net Promoter claims that this distribution of recommendation ratings was generated by three different customer clusters as shown in the first figure and defined by the intervals 9-10, 7-8 and 0-6.

The mclust R package allows us to test that hypothesis. Left to select the model on its own, it returns the best BIC for six components with equal variances, spread out at almost two-point intervals. That is, when asked what mixture of normal distributions might have generated this distribution of recommendation likelihood, mclust identified six components spaced at approximately equal intervals rather than Net Promoter's three. We can instead fix the number of clusters by inserting G=3 into the densityMclust function. The three normal density curves representing these three clusters have been added to the histogram: 78% with mean 9.3, 15% in the middle concentrated about 5.8, and 7% averaging 1.4.
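Since the survey data are proprietary, here is a minimal sketch of the same comparison run on simulated ratings. The mixture weights, means and sample size used to simulate are invented for illustration, and the object names (recommend, fit_free, fit3) are my own; only densityMclust and the fields of the fitted object come from mclust. Do not expect it to reproduce the numbers reported above.

```r
library(mclust)

set.seed(2014)

# Simulated stand-in for the survey ratings (NOT the study data):
# draw from three normals, then round and clip to the 0-10 scale.
n <- 1000
component <- sample(1:3, n, replace = TRUE, prob = c(0.75, 0.15, 0.10))
raw <- rnorm(n, mean = c(9.3, 5.8, 1.4)[component], sd = 1)
recommend <- pmin(pmax(round(raw), 0), 10)

# Let mclust choose the number of components and the variance model by BIC.
fit_free <- densityMclust(recommend)
fit_free$G          # number of components chosen
fit_free$modelName  # "E" = equal variances, "V" = unequal variances

# Force three components, as Net Promoter would have it.
fit3 <- densityMclust(recommend, G = 3)
round(fit3$parameters$pro, 2)   # mixing proportions
round(fit3$parameters$mean, 1)  # component means

# mclust reports BIC so that larger values indicate the preferred model.
c(free = fit_free$bic, three = fit3$bic)
```

Depending on the mclust version, densityMclust may also draw the estimated density as a side effect of fitting.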

Next, we will repeat the previous analysis with a different set of providers and their customers. As shown below, customers from larger mass market providers followed a similar pattern but with more spread at the top and more use of the middle and bottom boxes. Once again mclust returns 6 components and, when forced to G=3, produces three normal density curves: 61% centered at 8.7, 25% close to the midpoint at 5.2, and 14% at the bottom with a mean of 0.9. The pattern seems to be the same but with lower ratings. At no time do we see any empirical support for the Net Promoter cutpoints.

Finally, we can compare the distributions for recommendation and overall satisfaction for our mass market providers. A correlation of 0.89 between the two suggests that we are measuring the same construct, except that satisfaction gets higher ratings because customers are more reluctant to recommend than they are to express satisfaction. Generally, recommendation is a more difficult achievement for the provider than satisfaction or, said differently, satisfaction is necessary but not sufficient for recommendation (e.g., must be better than competitors to recommend or must be good for everyone and not just you).

Measuring a Single Avoidance-Approach Evaluative Dimension

Customers probably do come from a mixture of clusters. The hostage is looking for an escape route, the advocate enjoys singing praises, and the habitual is oblivious. If we wished to identify such loyalty segments, we would need to ask about actual behaviors in real situations, for this is what differentiates the clusters (e.g., a battery of who, where and when did you recommend rather than some abstract propensity to recommend measured out of any context).

We capture none of the underlying segmentation with rating scales that measure a single evaluative dimension. Recommendation is not the ultimate question but just another indicator of one's orientation toward the brand, specifically an index of avoidance-approach or thumbs-up and thumbs-down. What separates satisfaction, retention and recommendation is their difficulty level, for all three measure the same latent variable, which is why they are so highly correlated.

The peaks and valleys in the satisfaction and recommendation distributions are not indicators of customer type but of scale usage. Customers make their point when they use the extremes of a rating scale. The dissatisfied let you know by uniformly giving the lowest possible score. The pleased also want to tell you to keep up the good work. Those who are not sure or don't care overuse the midpoint.

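R code from mclust to create these plots:
library(mclust)
# recommend is assumed to be a numeric vector of 0-10 likelihood-to-recommend ratings
promoters <- densityMclust(recommend, G=3)
summary(promoters, parameters = TRUE)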

1 comment:

  1. I think it is now more widely accepted that NPS does not really tell you much about actual recommendations. Also, connecting the NPS directly to KPIs without any other inputs (such as behavioral data from CRM databases, like churn scores) is now rare. What interests companies like mine and our clients is when to measure the NPS and how to model the relationship between the individual NPS we receive from a survey and the customer interactions prior to the measurement.

    Every customer interaction at a touchpoint is a part of a small mission the customer is on. If NPS is just another indicator of "brand health" or overall perception of the company, it should not be very sensitive to these individual interactions. My loyalty is not so easily undermined by a grumpy bank clerk or an occasional problem with billing. Some of our clients insist on measuring the NPS after every one of these interactions (a branch visit, a call center contact) and they claim this is how you evaluate it. The question sounds like "Based on your latest experience with our bank, would you recommend our bank to your friends ..." I argue NPS is not great at evaluating single experiences and the number you get tells you little about the interaction, but rather sums up the hundreds of past impressions and experiences.

    I would love to hear other people's thoughts on this. Do you think measuring NPS is wise at the interaction level, at the customer task/mission/journey level, or rather quarterly in your general brand relationship survey?