Sunday, August 26, 2012

Structural Equation Modeling: Separating the General from the Specific (Part II)

As promised in Halo Effects and Multicollinearity (my last post), I will show how to run a confirmatory factor analysis in R to test our bifactor model.  In addition, I will include a dependent variable and fit a structural equation model to illustrate how the general and specific components in a rating contribute to an outcome such as overall satisfaction.

The Bifactor Model:  Exploratory Factor Analysis

I introduced the bifactor model to explain how a rating, even a rating of something relatively concrete, such as the fairness of an airline's ticket prices, reflects both the respondent's specific knowledge of the airline's prices and the respondent's feelings about the airline in general.  I noted that this is "bias" if your intention was to use the customer as an informant to learn about an airline's actual ticket pricing, but it is "brand equity" if instead you wanted to learn about customers' perceptions.

Let's return to our airline satisfaction data and look again at the bifactor model diagram from the previous post (shown below).  When providing 12 ratings of their last flight, respondents seem to distinguish, to some extent, between the aircraft's physical features (F1*), the service they received from the staff (F2*), and the ticketing process (F3*).  But the greatest impact came from g, the generalized impression or brand affinity.

The factor loadings associated with each path are estimates from an exploratory factor analysis using the omega function in the psych package.  Although the factor analysis is exploratory, the bifactor model does impose some structure in its attempt to reproduce the observed 12x12 correlation matrix. Specifically, it is assumed that the ratings were produced by orthogonal latent variables. One latent variable, g, has 12 slots for factor loadings, one for each item. Moreover, we had to specify the number of factors in advance, and these factors were restricted to be uncorrelated (no arcs between circles in the above diagram). As we will see, confirmatory factor analysis frees us to impose more interesting constraints.  However, the model still needs to be identified.
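For readers who want to reproduce the exploratory solution, here is a minimal sketch, assuming the 12 ratings sit in the first 12 columns of a data frame named ratings (the same layout used in the lavaan code below):

# Exploratory bifactor solution: g plus three group factors
# (a sketch; 'ratings' is assumed to hold the 12 items in columns 1-12)
library(psych)
om <- omega(ratings[, 1:12], nfactors = 3)  # Schmid-Leiman bifactor solution
print(om)                                   # loadings on g and on the group factors

By default, omega() also draws a path diagram like the one shown above.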

Please allow me to make one last point about how factor loadings reproduce observed correlations before we leave the exploratory bifactor model. Looking at the bottom of the above diagram, we see that the correlation between Flight Options and Easy Reservation has two parts. One part is due to the impact of g: 0.73*0.62 = 0.4526. The second part comes from F3*: 0.38*0.34 = 0.1292. The factor loadings are estimated so as to reproduce the observed correlations as closely as possible. In this case, the observed correlation is 0.59, and the reproduced correlation from the bifactor model is 0.5818 (= 0.73*0.62 + 0.38*0.34).
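We can verify this arithmetic directly in R:

0.73 * 0.62 + 0.38 * 0.34  # reproduced correlation = 0.5818, versus 0.59 observed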

The Bifactor Model:  Confirmatory Factor Analysis

We have already seen the major elements of a confirmatory factor analysis.  We start with a factor model representing the data generation process, for example, the diagram above.  It shows the number of observed variables (boxes) and the number of latent variables (circles).  It specifies that the latent variables are uncorrelated (absence of arcs between circles).  It indicates that the observed variables are the effects and the latent variables are the causes (arrows pointing from circles to boxes).  However, to leave some white space, we have not included the unique variances for each observed variable (one arrow pointing to each box indicating how much variation is unique to that observed variable).

We will use lavaan for a number of reasons but primarily because its syntax mimics the way we just talked about specifying the model.

bifactor <- '
general.factor =~ Easy_Reservation + Preferred_Seats + Flight_Options + Ticket_Prices +
  Seat_Comfort + Seat_Roominess + Overhead_Storage + Clean_Aircraft +
  Courtesy + Friendliness + Helpfulness + Service
ticketing =~ Easy_Reservation + Preferred_Seats + Flight_Options + Ticket_Prices
aircraft  =~ Seat_Comfort + Seat_Roominess + Overhead_Storage + Clean_Aircraft
service   =~ Courtesy + Friendliness + Helpfulness + Service
'


The name of the model statement is "bifactor."  A single quote starts and ends the model statement.  We have four latent variables, each given a name and tied to its observed variables through the =~ operator.  Thus, general.factor lists all 12 observed variables, using the + sign to link them together.  This type of code should seem familiar to anyone who has used formulas in R functions like lm().

To run the bifactor model, we add the following two lines:

modelfit <- cfa(bifactor, data=ratings[,1:12], orthogonal=TRUE)
summary(modelfit, fit.measures = TRUE, standardize = TRUE)


"cfa" is the function in lavaan for confirmatory factor analysis, and we pass the model statement (bifactor), the observed data set with the first 12 ratings, and a option (orthogonal=TRUE) to keep the latent variables orthogonal.

I will present the factor loadings in matrix format in order to highlight the hierarchical structure of the bifactor model.

 
                    g      ticketing   aircraft   service
Easy_Reservation   0.753     0.311
Preferred_Seats    0.697     0.398
Flight_Options     0.618     0.386
Ticket_Prices      0.644     0.436
Seat_Comfort       0.793                 0.303
Seat_Roominess     0.725                 0.366
Overhead_Storage   0.702                 0.435
Clean_Aircraft     0.741                 0.337
Courtesy           0.776                            0.303
Friendliness       0.756                            0.366
Helpfulness        0.835                            0.435
Service            0.811                            0.337

We find the same pattern that we saw in the exploratory factor analysis.  There is a strong "halo" or generalized impression in all the ratings, but more so for the items with secondary loadings on service than the items with secondary loadings on ticketing.  This is all consistent with our original hypothesis that respondents would base their ratings on brand affinity, if they could, and would not try to retrieve more detailed information about the brand from memory unless they were forced to do so.

Because this is a confirmatory factor analysis, we ought to look and see how well our model fits.  Fortunately, David Kenny has recently updated his page on Measuring Model Fit.  There you will find a complete and succinct discussion of all the different indices (and a comprehensive tutorial on SEM as well).  The lavaan package prints several of these tests and goodness-of-fit indices when you ask for a summary.

I have little to add to Kenny's overview, except to suggest that you always look at the residuals before you fail to reject the model.  Remember that we are trying to reproduce the observed correlations after imposing the restrictions in our bifactor model.  If we were successful, the residual correlations ought to be close to zero.  The command, residuals(modelfit, type="cor"), will get us what we want.

 
                  Esy_Rs  Prfr_S  Flgh_O  Tckt_P  St_Cmf  St_Rmn  Ovrh_S  Cln_Ar  Cortsy  Frndln  Hlpfln  Servic
Easy_Reservation   0.00
Preferred_Seats    0.00    0.00
Flight_Options     0.01   -0.01    0.00
Ticket_Prices     -0.01    0.01    0.00    0.00
Seat_Comfort      -0.01    0.01    0.00    0.02    0.00
Seat_Roominess     0.00    0.00    0.00   -0.03    0.00    0.00
Overhead_Storage  -0.01    0.01    0.01    0.00    0.00    0.00    0.00
Clean_Aircraft     0.01    0.00    0.00   -0.02    0.00    0.01    0.00    0.00
Courtesy           0.01   -0.02    0.01    0.02    0.00    0.01    0.00    0.00    0.00
Friendliness       0.01   -0.03    0.01    0.01   -0.01    0.00    0.00    0.01    0.01    0.00
Helpfulness       -0.01    0.00   -0.01   -0.01    0.01    0.01    0.00   -0.01   -0.01    0.00    0.00
Service            0.01    0.01   -0.01    0.01   -0.02    0.00    0.00    0.00   -0.01    0.00    0.01    0.00

Although the typeface is small, so are the residual correlations.  Of course, you also need to look at the other measures of fit, including the chi-square test and the fit indices.  You ought to read Kenny's summary of the controversy surrounding these indices.  Because the chi-square test can be significant even when the model seems to fit, it has become common practice to rely on goodness-of-fit indices ranging from 0 (worst) to 1.0 (best).  With these data, we find indices in the high 0.90s, which most would judge as good or excellent.  Nothing in any of these results suggests that the bifactor model fails to fit.
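If you want just these statistics without the full summary() printout, lavaan's fitMeasures function extracts them by name:

fitMeasures(modelfit, c("chisq", "df", "pvalue", "cfi", "tli", "rmsea", "srmr"))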

The Bifactor Model:  Structural Equation Modeling

To summarize what we have done, we took 12 ratings that were highly correlated and "factored" them into four independent scores, one general and three specific latent variables.  In a previous post, we learned that those 12 ratings were so interconnected that we could not disentangle them and assess their relative contribution to the prediction of outcomes like overall satisfaction, retention, and recommendation.  But now we have four uncorrelated latent variables, so our ratings have been untangled.  Let's add an endogenous outcome variable and run a structural equation model.

Our first thought might be to group all three outcome variables together as manifestations of the same latent variable.  After all, satisfaction, retention, and recommendation are all measures of customer loyalty.  The three observed variables do have substantial correlations of 0.63, 0.56, and 0.69.  However, when we first looked at the driver analysis for these three outcomes, we ran separate regression equations and found different drivers for each variable (see Network Visualization of Key Driver Analysis).  Therefore, we will run three separate models, one for each outcome variable.

The lavaan code changes very little, which is why the package is so easy to use.  Here is the code for Overall Satisfaction.

bifactor <- '
general =~ Easy_Reservation + Preferred_Seats + Flight_Options + Ticket_Prices +
  Seat_Comfort + Seat_Roominess + Overhead_Storage + Clean_Aircraft +
  Courtesy + Friendliness + Helpfulness + Service
ticketing =~ Easy_Reservation + Preferred_Seats + Flight_Options + Ticket_Prices
aircraft  =~ Seat_Comfort + Seat_Roominess + Overhead_Storage + Clean_Aircraft
service   =~ Courtesy + Friendliness + Helpfulness + Service
Satisfaction ~ general + ticketing + aircraft + service
'
satfit <- sem(bifactor, data=ratings[,c(1:12,13)], orthogonal=TRUE)
summary(satfit, fit.measures = TRUE, standardize = TRUE)
inspect(satfit, "rsquare")


The model statement stays the same, except we have added a line for the regression equation predicting Satisfaction from the four latent variables.  In addition, we changed the function from cfa to sem and included rating #13 (Satisfaction) in the data list.
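The retention and recommendation models differ only in the outcome variable named in the regression line.  One way to fit them without retyping the model statement is sketched below, under the assumption that Fly_Again and Recommend occupy columns 14 and 15 of the ratings data frame:

# Swap the outcome name inside the model string and refit
flyfit <- sem(gsub("Satisfaction", "Fly_Again", bifactor),
              data=ratings[,c(1:12,14)], orthogonal=TRUE)
recfit <- sem(gsub("Satisfaction", "Recommend", bifactor),
              data=ratings[,c(1:12,15)], orthogonal=TRUE)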

We can see a clear pattern in the structural equation parameters shown below.  General impression dominates the equations but becomes less important as we move from satisfaction to retention to recommendation.  Although I have not included the standard errors, all the coefficients above 0.10 are significant.


                                       Satisfaction   Fly_Again   Recommend
Standardized Regression Coefficients
  g                                        0.756         0.735       0.670
  ticketing                                0.063         0.305       0.322
  aircraft                                 0.094         0.227       0.384
  service                                  0.186         0.046      -0.027
R-squared                                  0.619         0.687       0.701
Percent Contribution to R-squared
  g                                         92%           79%         64%
  ticketing                                  1%           14%         15%
  aircraft                                   1%            7%         21%
  service                                    6%            0%          0%

Because the latent variables are independent, the R-squared is equal to the sum of the squared beta weights.  This allows us to calculate the percent contribution that each predictor makes.  Service has minimal influence, and then only on satisfaction.  Ticketing becomes more important when thinking about flying again with the same airline.  The physical features of the aircraft are given more consideration when making a recommendation.  Still, there is substantial "halo" or "unattached affect" impacting all the outcome measures.  It is "as if" respondents started with brand affinity and then made different "adjustments" for each outcome measure (Kahneman's anchoring and adjustment heuristic).
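Since the computation is just arithmetic on the standardized coefficients, we can reproduce the percent contributions from the table in a few lines of R.  Using the Satisfaction column as an example:

beta <- c(g=0.756, ticketing=0.063, aircraft=0.094, service=0.186)
r2 <- sum(beta^2)           # 0.619, matching the R-squared reported above
round(100 * beta^2 / r2)    # percent contributions: 92, 1, 1, 6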

[If you want to see more R code for running structural equation models using the sem and lavaan packages, including bifactor models, check out Module 9 of Do it yourself Introduction to R.]

Now, do we know the one thing that should be done?

This was the question motivating our original inquiry.  Our client wanted to know "what is the one thing that they should do to improve customer satisfaction, to reduce churn, and to generate positive word of mouth."  We have come a long way in our understanding of the structure underlying these 15 ratings (12 features + 3 outcomes).  We learned how generalized affect or brand affinity has a major impact on every airline perception, including the outcome measures.  But we also learned that customers make outcome-specific adjustments:  overall satisfaction moves toward service perceptions, retention is more closely associated with ticketing, and recommendation depends more on the perceived physical characteristics of the aircraft.

So what is the one thing that we should do?  Having identified g as the dominant driver, we could look to the factor loadings of individual items on g and pick the items having the highest correlations with g.  These would be the service-related items, especially Helpfulness.  But do we believe that this is a "causal" connection?  Our bifactor model points the arrow in the opposite direction.  Generalized affect "causes" variation in self-report measures of Helpfulness.  The measurement model is reflective and not formative.

The data generation process is straightforward.  A respondent is asked to rate Helpfulness.  Because they can, they take the easiest path and respond using System 1 thinking (Kahneman's "thinking fast").  They do not reflect on what constitutes helpfulness.  They do not try to remember how many of the constituent elements that might fall under the "helpfulness" heading they experienced, or to what extent.  To be honest, it is not clear what the questionnaire writer meant by helpfulness; it is not clear to the respondent, and probably not to the writer either.

If our goal is to learn what interventions work to improve outcomes, we could replace the Helpfulness rating with a checklist of flight attendant behaviors.  Did the flight attendant ask if you needed anything?  Did they offer to assist you with your carry-on luggage?  True, such items are binary and would require an item response or item factor analysis to score.  But such items cannot be answered using generalized affect alone.  A respondent is forced to "think slow" and retrieve specific memories of what did and did not happen.  Moreover, the airline now has at least the hope that with an appropriate analysis they can know how each action impacts an outcome variable.

Not Just for Halo Effects

Bifactor modeling is an impressive technique.  It starts with a bunch of overlapping variables with considerable shared variation and yields a much smaller number of underlying orthogonal dimensions with relatively clear interpretations.  The bifactor model works particularly well with self-reports of human perception because our first response is to "think fast" (the general factor) and then, if needed, to "think slow" (the specific factors).

My proposal is that we stop asking questions that predominantly measure generalized impressions.  However, as I noted above, if we ask about occurrences using a checklist, we can no longer use structural equation models that assume continuous observed variables.  While the next version of lavaan will provide support for categorical observed responses, for now we would need the R package mirt (multidimensional item response theory).  It is called "item response" theory because the earliest work was done in educational measurement, trying to understand the relationship between an examinee's ability and the likelihood of correctly answering individual test items.  Some prefer to call this item factor analysis.  We might think of item factor analysis as extending factor analysis with continuous manifest variables to factor analysis with categorical manifest variables.  "Manifest" is added because we are talking about the observed variables.

The R package mirt contains a bfactor() function for dealing with binary variables (e.g., correct/incorrect, yes/no, did it/did not).  An example is an achievement test with several reading passages, each followed by a small battery of questions (called a "testlet").  This is a bifactor model because all the reading passages measure the same general factor, but there is additional covariation due to each specific reading passage.
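A minimal sketch of such a testlet analysis, assuming a hypothetical data frame checklist with 12 binary (0/1) items in which items 1-4, 5-8, and 9-12 follow three different passages:

library(mirt)
specific <- c(1,1,1,1, 2,2,2,2, 3,3,3,3)     # specific-factor assignment per item
testlet_fit <- bfactor(checklist, specific)  # the general factor is added automatically
summary(testlet_fit)                         # standardized loadings: general plus specific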

The mirt function bfactor() also handles ordered categories (e.g., never, sometimes, often, always) that ought not be treated as equal-interval continuous scales.  For example, a personality psychologist might want to measure a multifaceted construct, like extraversion, using a frequency scale with ordered categories.  Extraversion is multifaceted in that it has many possible manifestations, such as gregariousness, assertiveness, and warmth.  The bifactor model works well with such multifaceted constructs because it separates the general from the more specific components (see this link for an example and references).

Conclusions

The large loadings on our general factor suggest that every rating is predominantly a measure of brand affinity.  It is as if we kept asking the same question over and over again: "How much do you like the brand?"  The bifactor model offers us a means for assessing the extent of such "exaggerated emotional coherence."  But the bifactor model does not create the data that we have failed to collect.

Affinity is extremely important to a brand; nevertheless, one cannot design an intervention strategy based on it alone.  Marketing efforts to improve satisfaction, retention, and recommendation require us to learn about the likely impact of specific features and services.  Even with differentiated measurements, it is difficult to infer what will happen when we do something from what we have observed to be associated in survey data (see Judea Pearl).  Perhaps we can make a start in this direction by moving toward more rigorous measurement standards in marketing research practice.


2 comments:

  1. Hello,
    I am trying to simulate the bifactor model data. I read your previous post, "Network Visualization of Key Driver Analysis." You created the loadings, but I am confused about why the loadings in this post are different from the loadings you created.

    Reply:
    1. I tend to find a small group of dissatisfied customers in most datasets that I analyze. Something goes wrong for these individuals so that all their ratings are uniformly lower. Actually, I added 7 points to the last 950 observations so that one sees a bimodal latent satisfaction distribution, with 5% (the first 50 observations) experiencing some type of delivery failure that upset them. This adds between-group variation to the correlations. You can try rerunning the analysis without this code and see the difference in the factor loadings.
