Whatever happened to those
evaluations that your airline asked you to complete after a flight? They
ask for a series of ratings covering the ticket purchase, attributes of the plane,
and the service you received, along with whether you were satisfied, would
recommend the airline, and would fly with them again.
The airline is certainly concerned
about tracking changes in these ratings over time. But they might also be
interested in increasing customer loyalty (i.e., satisfaction, recommendation,
and repeat purchase). For the latter, the airline might request a "key
driver analysis." The term "driver analysis" is used because the
airline is looking for a marketing strategy that will increase loyalty. The
word "key" is used because the airline wants to find the drivers with
the biggest impact. "What is the one thing that we could do to increase
customer loyalty?"
Here is one answer -- a network
visualization of the correlations among all the ratings. It can be produced
using the R statistical programming language in one line of code. I claim that clients find the
network engaging. That is, anyone can look at the figure below and quickly see
the interrelationships among the ratings and what is driving the different
manifestations of customer loyalty. It is an easy picture to understand and
helps clients to think strategically. Let's see if I can support that claim.
What is it? It is a mapping of the
correlations among 15 ratings, with colors added to mark clusters of ratings with the
highest mutual intercorrelations. The nodes are the ratings; the
lines are the correlations. Only correlations above a
specified cutoff value are drawn.
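To make the cutoff idea concrete, here is a minimal sketch of what thresholding a correlation matrix does before any edges are drawn. The data are simulated stand-ins, not the airline ratings; qgraph's minimum argument performs this same filtering internally.

```r
# Toy sketch: thresholding a correlation matrix, as qgraph's
# minimum argument does before drawing edges.
# The variables here are simulated for illustration only.
set.seed(123)
x <- data.frame(a = rnorm(100))
x$b <- x$a + rnorm(100, sd = 0.5)  # strongly related to a
x$c <- rnorm(100)                  # unrelated noise
cm <- cor(x)
cm[abs(cm) < 0.50] <- 0  # drop weak correlations (no edge drawn)
diag(cm) <- 0            # ignore self-correlations
round(cm, 2)             # only the a-b link survives the cutoff
```

After thresholding, only the a-b pair keeps a nonzero entry, which is why the resulting network shows a single thick line and leaves the weak relationships invisible.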
The greater the correlation between
two variables, the thicker the line will be. So, the aqua or light blue
nodes are interconnected by thicker lines because they are all highly
correlated with each other. You can think of this as a customer service
component. The green circles refer to the aircraft seating and cleanliness. The
ticketing process is represented by the red circle, although their lines are less
thick, suggesting a less cohesive component than customer service. Finally, the
outcomes associated with customer loyalty are shown in purple.
Pressure Points = Driver Analysis
Say the airline asks you how they
could increase customer satisfaction. You find the Satisfaction node on the
left-hand side, and you look for thick lines leading to it. There are several
pathways to increased customer satisfaction. For example, all four customer
service ratings have sizeable paths. If we were able to improve customer
perceptions of friendliness (apply pressure to the Friendliness node), the effect
would spread along its path to Satisfaction. The improvement in perceived
Friendliness would also spread to perceptions of Courtesy, Service, and
Helpfulness since these are also connected. In fact, if the airline were to
make changes that impacted all four service components at the same time (apply
pressure simultaneously to four nodes), they could possibly see even greater
improvement. Perhaps we should not be asking “what is the one thing that will
most increase customer loyalty?” but “what is the one area where we should
concentrate our efforts?”
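The "look for thick lines leading to a node" step can be mimicked numerically by sorting that node's correlations with the other ratings. This is a toy sketch with simulated stand-in variables (not the actual airline data); the variable names are illustrative only.

```r
# Toy sketch: reading the "thick lines" into a node is just sorting
# that node's correlations. The data are simulated for illustration.
set.seed(42)
n <- 500
friendliness <- rnorm(n)
courtesy     <- friendliness + rnorm(n, sd = 0.7)  # shares variance
ticketing    <- rnorm(n)
satisfaction <- 0.6 * friendliness + 0.2 * ticketing + rnorm(n)
d <- data.frame(friendliness, courtesy, ticketing, satisfaction)
r <- cor(d)[, "satisfaction"]
# Thickest lines into the satisfaction node, in order:
sort(r[names(r) != "satisfaction"], decreasing = TRUE)
</bash>
```

In this simulation friendliness shows the strongest link, and courtesy rides along because it shares variance with friendliness, which is the same spreading effect described above.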
Moreover, we can see that the
"drivers" of repeat purchase (Fly Again) are different from the
drivers of customer satisfaction. Otherwise, Fly Again would be positioned
closer to Satisfaction. In fact, the network visualization makes it obvious that the key drivers change as one moves from Satisfaction to Recommendation to Fly Again.
Comparison to More Traditional Key Driver Analyses
Multiple regression is the most
common form of key driver analysis. How did our network map perform relative to
regression analysis? Here are the standardized regression coefficients from
three separate regressions of Satisfaction, Recommend, and Fly Again on all 12
predictors.
Standardized Regression Weights

                     Sat    Recommend    Fly Again
(Intercept)         0.00         0.00         0.00
Easy Reservation    0.05         0.16         0.12
Preferred Seats     0.04         0.15         0.14
Flight Options      0.05         0.11         0.14
Ticket Prices       0.04         0.06         0.10
Seat Comfort        0.09         0.09         0.03
Seat Roominess      0.07         0.17         0.10
Overhead Storage    0.02         0.19         0.16
Clean Aircraft      0.10         0.15         0.09
Courtesy            0.06         0.00        -0.01
Friendliness        0.15         0.00        -0.01
Helpfulness         0.13        -0.02         0.10
Service             0.14        -0.09         0.05
R-Squared           0.59         0.61         0.63
These coefficients are consistent
with what we learned from the network. The largest weights for satisfaction
come from the customer service components, while Recommend and Fly Again are
influenced more by ticketing and cabin characteristics.
Warning, it's caveat time. The data are observational, so we do not know the causal connections
among these ratings. Does friendliness drive satisfaction? Or does
satisfaction make customers less likely to give low friendliness
ratings?
Appendix
I have used an R package called qgraph to produce the network visualization. You call the package using the command
library(qgraph) and create the network graph using the following code:
gr <- list(1:4, 5:8, 9:12, 13:15)
qgraph(cor(ratings), layout="spring", groups=gr,
       labels=names(ratings), label.scale=FALSE, minimum=0.50)
The 15 ratings are located in a data frame called
ratings. The map uses the correlation
matrix among the 15 ratings as the proximity matrix to create the network. We have asked for the “spring” layout, which has the
effect of placing more highly correlated variables near each other and away
from less or negatively correlated variables.
The author of qgraph is Sacha Epskamp, who works with the PsychoSystems Project. Either of the following two links will take you to web sites that explain qgraph and the network visualization approach in greater depth: http://sachaepskamp.com/ or http://www.psychosystems.org/.
Libraries like qgraph are one of the great strengths of the
R programming language. The
PsychoSystems Project is a group of programmers and researchers attempting to
introduce a new metaphor for understanding the basis of psychological
measurement. You can use either link
to learn about their work. For our purposes, however, what matters is that
Sacha Epskamp has put considerable time and effort into creating a single
line of code that generates exactly the type of graph one wants when showing
the interrelationships among a set of ratings.
Finally, the ratings data set was randomly generated using a
specific factor model. It was not
essential to our discussion for the reader to know this because the simulated
data set mimics the structure that underlies most of the satisfaction data sets
one finds in marketing research. I have seen this structure over and over again from customer satisfaction surveys across markets, including my research with the airlines. However,
in order to reproduce the analysis shown in this posting, you will need to run
the necessary R code. I have listed
everything you will need below.
R Code to Generate the Simulated Data and Run All Analyses
# The goal is to show all the R code that you would need
# to reproduce everything that has been reported.
# We will use the mvtnorm package in order to randomly
# generate a data set with a given correlation pattern.
# First, we create a matrix of factor loadings.
# This pattern is called bifactor because it has a
# general factor with loadings from all the items
# and specific factors for separate components.
# The outcome variables are also formed as
# combinations of these general and specific factors.
loadings <- matrix(c(
.33, .58, .00, .00, # Ease of Making Reservation
.35, .55, .00, .00, # Availability of Preferred Seats
.30, .52, .00, .00, # Variety of Flight Options
.40, .50, .00, .00, # Ticket Prices
.50, .00, .55, .00, # Seat Comfort
.41, .00, .51, .00, # Roominess of Seat Area
.45, .00, .57, .00, # Availability of Overhead Storage
.32, .00, .54, .00, # Cleanliness of Aircraft
.35, .00, .00, .50, # Courtesy
.38, .00, .00, .57, # Friendliness
.60, .00, .00, .50, # Helpfulness
.52, .00, .00, .58, # Service
.43, .10, .20, .30, # Overall Satisfaction
.35, .50, .40, .20, # Purchase Intention
.25, .50, .50, .00), # Willingness to Recommend
nrow=15,ncol=4, byrow=TRUE)
# Matrix multiplication produces the correlation matrix,
# except for the diagonal.
cor_matrix<-loadings %*% t(loadings)
# Diagonal set to ones.
diag(cor_matrix)<-1
library(mvtnorm)
N=1000
set.seed(7654321) #needed in order to reproduce the same data each time
std_ratings<-as.data.frame(rmvnorm(N, sigma=cor_matrix))
# Creates a mixture of two data sets:
# first 50 observations assigned uniformly lower scores.
ratings<-data.frame(matrix(rep(0,15000),nrow=1000))
ratings[1:50,]<-std_ratings[1:50,]*2
ratings[51:1000,]<-std_ratings[51:1000,]*2+7.0
# Ratings given different means
ratings[1]<-ratings[1]+2.2
ratings[2]<-ratings[2]+0.6
ratings[3]<-ratings[3]+0.3
ratings[4]<-ratings[4]+0.0
ratings[5]<-ratings[5]+1.5
ratings[6]<-ratings[6]+1.0
ratings[7]<-ratings[7]+0.5
ratings[8]<-ratings[8]+1.5
ratings[9]<-ratings[9]+2.4
ratings[10]<-ratings[10]+2.2
ratings[11]<-ratings[11]+2.1
ratings[12]<-ratings[12]+2.0
ratings[13]<-ratings[13]+1.5
ratings[14]<-ratings[14]+1.0
ratings[15]<-ratings[15]+0.5
# Truncates Scale to be between 1 and 9
ratings[ratings>9]<-9
ratings[ratings<1]<-1
# Rounds to the nearest whole number.
ratings<-round(ratings,0)
# Assigns names to the variables in the data frame called ratings
names(ratings)=c(
"Easy_Reservation",
"Preferred_Seats",
"Flight_Options",
"Ticket_Prices",
"Seat_Comfort",
"Seat_Roominess",
"Overhead_Storage",
"Clean_Aircraft",
"Courtesy",
"Friendliness",
"Helpfulness",
"Service",
"Satisfaction",
"Fly_Again",
"Recommend")
# Calls qgraph package to run Network Map
library(qgraph)
# creates grouping of variables to be assigned different colors.
gr<-list(1:4,5:8,9:12,13:15)
qgraph(cor(ratings), layout="spring", groups=gr,
       labels=names(ratings), label.scale=FALSE, minimum=0.50)
# Calculates z-scores so that regression analysis will yield
# standardized regression weights
scaled_ratings<-data.frame(scale(ratings))
ols.sat<-lm(Satisfaction~Easy_Reservation + Preferred_Seats +
Flight_Options + Ticket_Prices + Seat_Comfort + Seat_Roominess +
Overhead_Storage + Clean_Aircraft + Courtesy + Friendliness +
Helpfulness + Service, data=scaled_ratings)
summary(ols.sat)
ols.rec<-lm(Recommend ~ Easy_Reservation + Preferred_Seats +
Flight_Options + Ticket_Prices + Seat_Comfort + Seat_Roominess +
Overhead_Storage + Clean_Aircraft + Courtesy + Friendliness +
Helpfulness + Service, data=scaled_ratings)
summary(ols.rec)
ols.fly<-lm(Fly_Again ~ Easy_Reservation + Preferred_Seats +
Flight_Options + Ticket_Prices + Seat_Comfort + Seat_Roominess +
Overhead_Storage + Clean_Aircraft + Courtesy + Friendliness +
Helpfulness + Service, data=scaled_ratings)
summary(ols.fly)
Q: Is there any package in R for brand mapping and data visualization?
A: If by "brand mapping" you mean perceptual mapping using correspondence analysis, then you should search for "Gaston Sanchez correspondence analysis in R" for a complete how-to guide.
Q: Interesting. Have you looked at the pcalg package in R? And what if you have highly correlated data? The network visualization wouldn't show you much then?
A: You are correct that partial correlations or regression coefficients become less stable and less informative as all the variables become more correlated. I have argued in several posts that there comes a point when the first principal component becomes too large to believe that we have anything more than a single dimension. That is, we believe the data cloud resides in a high-dimensional space because we have many variables, but the high intercorrelations suggest that the data are confined to a single dimension. One cannot discover a causal structure using pcalg when all the variables are measuring the same underlying construct.
For those who wish to learn more, Shalizi provides a good introductory chapter, "Discovering Causal Structure from Observations": http://www.stat.cmu.edu/~cshalizi/uADA/12/lectures/ch25.pdf
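The instability described in that reply can be shown numerically: as all intercorrelations rise, the correlation matrix approaches singularity and its condition number explodes, so anything built on its inverse (partial correlations, regression weights) amplifies noise. A small self-contained illustration using an equicorrelated matrix:

```r
# Condition number of an equicorrelated correlation matrix.
# Large values mean inverting the matrix (as partial-correlation
# and regression methods must) becomes numerically unstable.
cond_number <- function(r, p = 4) {
  R <- matrix(r, p, p)
  diag(R) <- 1
  ev <- eigen(R)$values
  max(ev) / min(ev)
}
cond_number(0.30)  # modest intercorrelations: well-conditioned (about 2.7)
cond_number(0.99)  # nearly a single dimension: near-singular (397)
```

For an equicorrelated matrix the eigenvalues are 1 + (p-1)r and 1 - r, so the condition number grows without bound as r approaches 1, which is why causal-search and partial-correlation methods break down when every item measures the same construct.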