Let's start with the correlation network map produced by the R package qgraph. As always, all the R code can be found at the end of this post.

First, we need to discover the underlying pattern, so we begin by looking for the nodes with the highest correlations, which are connected by the thickest lines. Red lines indicate negative correlations (e.g., those who claim that they are "indifferent to others" are unlikely to tell us that they "inquire about others" or "comfort others"). Positive correlations are shown in green (e.g., several nodes toward the bottom of the network suggest that those who report "mood swings" and "panic easily" also said that they are easily angered and irritated). The node "reflect on things" seems to be misplaced, but it is not. The thin red and green lines show that it has uniformly low correlations with all the other items, which explains why it is positioned at the periphery but closest to the four other items with which it is most correlated.
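Before reading the map, it can help to list the strongest pairwise correlations directly. The following is a minimal base R sketch, not part of the original analysis: `strongest_pairs` is a hypothetical helper name, and the toy data frame is made up so that the example runs on its own.

```r
# Hypothetical helper: list the n item pairs with the largest
# absolute correlations in a correlation matrix r.
strongest_pairs <- function(r, n = 5) {
  idx <- which(upper.tri(r), arr.ind = TRUE)  # each pair once, no diagonal
  pairs <- data.frame(
    item1 = rownames(r)[idx[, "row"]],
    item2 = colnames(r)[idx[, "col"]],
    r     = r[idx],
    stringsAsFactors = FALSE
  )
  pairs <- pairs[order(-abs(pairs$r)), ]      # strongest first, sign ignored
  head(pairs, n)
}

# Toy data (an assumption for illustration): a, b, and c share a common
# source, with c reversed; d is unrelated noise.
set.seed(1)
z <- rnorm(200)
toy <- data.frame(
  a = z + rnorm(200),
  b = z + rnorm(200),
  c = -z + rnorm(200),
  d = rnorm(200)
)

top <- strongest_pairs(cor(toy, use = "pairwise"), n = 3)
print(top)
```

With the data from this post, the same helper could be called as `strongest_pairs(cor(ratings, use = "pairwise"))`; positive values correspond to the green lines and negative values to the red ones.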

Using this approach, we can identify several regions that are placed near each other because of their interconnections. For instance, the personal problems mentioned previously and located toward the bottom of the graph are separated from but linked to the measures of introversion ("difficult approach others" and "don't talk"), which in turn have strong negative correlations with extroversion ("make friends"). As we continue up the graph on the left side, we find an active openness to others that shades into taking charge and conscientiousness. If we continue back down the right side, respondents note what might be called work-related problems. Now, we have our story, and we can see the two-dimensional structure defining the correlation network: internal vs. external and in-control vs. out-of-control.

Next, we can compare this network representation with the more traditional factor model. Why do we observe correlations among observed variables? In the factor model, correlations are the result of latent variables. We see this in the factor model diagram created using the same data. For example, individuals possess some degree of neuroticism (labeled RC2), and therefore the five personal problem items are intercorrelated. The path coefficient associated with each arrow indicates the correlation between the factor and the observed variable, and the product of the path coefficients for any two observed variables loading on the same factor is our estimate of the correlation between those two observed variables.
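The product rule can be checked with a little arithmetic. The loadings in this sketch are made-up values, not estimates from the bfi data, and the item names are shorthand; it assumes that each item loads on a single factor.

```r
# Hypothetical standardized loadings of five items on one factor
# (illustrative values only, not estimated from the bfi data).
loadings <- c(angry = 0.8, irritated = 0.8, mood = 0.7, blue = 0.6, panic = 0.6)

# Under a one-factor model, the implied correlation between two items
# is the product of their loadings.
implied <- outer(loadings, loadings)
diag(implied) <- 1  # each item correlates perfectly with itself

print(round(implied, 2))
implied["angry", "mood"]  # 0.8 * 0.7
```

Comparing a matrix like `implied` with the observed correlations shows how much of the pattern the single factor accounts for.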

One should recognize that the two diagrams seek to account for the same correlation matrix. The factor model does so by postulating the presence of unseen forces or latent variables. However, we never observe neuroticism, and we understand that all we have is a pattern of higher correlations among those five self-reports. Without compelling evidence for the independent existence of such a latent variable, we might try to avoid committing the reification fallacy and look for a different explanation.

The network model provides an alternative account. Perhaps the best overview of this approach can be found at the PsychoSystems Project. From a network perspective, correlations are observed because the nodes mutually interact. This is not a directed graph attempting to separate cause and effect. It is not a causal model. Perhaps in the beginning, there was a causal connection with one node occurring first and impacting the other nodes. But over time, these nodes have come to mutually support one another so that the unique effects of the self-report ratings can no longer be untangled.

Which of these two representations is better? If the observed variables are true reflections of an underlying trait that can be independently established, then the factor model offers a convenient hierarchical model. We think that we are observing five different things, but in fact, we are measuring five different manifestations of one underlying construct. On the other hand, a network of mutually supportive observations cannot be represented using a factor model. There are no factors, and asserting so ends the discussion prematurely. What are the relationships among the separate nodes? How can one intervene to break the cycle? Are there multiple leverage points? In previous posts, I showed how much can be gained using a network visualization of a key driver analysis and how much can be lost relying solely on an input-output regression model. Besides, why would you not generate the map when, as shown below, R makes it so easy to do?

**R code to create the two plots:**

    library(psych)
    data(bfi)  # in recent psych releases, bfi ships with the psychTools package
    ratings <- bfi[, 1:25]

    # short labels for the 25 personality items
    names(ratings) <- c(
      "Indifferent to others", "Inquire about others", "Comfort others",
      "Love children", "Make people at ease",
      "Exacting in my work", "Until perfect", "Do by plan",
      "Do halfway", "Waste time",
      "Don't talk", "Difficult approach others", "Know how to captivate people",
      "Make friends", "Take charge",
      "Angry easily", "Irritated easily", "Mood swings",
      "Feel blue", "Panic easily",
      "Full of ideas", "Avoid difficult reading", "Carry conversation higher",
      "Reflect on things", "Not probe deeply"
    )

    # factor model diagram: five rotated principal components
    fa.diagram(principal(ratings, nfactors = 5), main = "")

    # correlation network map
    library(qgraph)
    qgraph(cor(ratings, use = "pairwise"), layout = "spring",
           label.cex = 0.9, labels = names(ratings), label.scale = FALSE)


Very nice!

I would like to note that you can plot loadings as in figure 2 with qgraph as well (via qgraph.loadings) or using the semPlot package:

    library(semPlot)
    pdf("loadings.pdf", height = 10, width = 6)
    semPaths(principal(ratings, nfactors = 5), "std", "est", rotation = 4,
             sizeMan = 8, sizeMan2 = 3, shapeMan = "rectangle")
    dev.off()

Good point. I considered talking about your other packages, but decided to keep it short and simply provide a link to your website. It's the picture that makes structural equation modeling easier to understand, and semPlot offers the complete package.

Simply Great. Simplicity and Art. I would be much thankful if you show how to classify one or more individual to one of the five factors. Thanks

E. Shabana (University Paris Diderot - Biology)

I am assuming that you are asking about factor scores, since the factor model assumes continuous latent variables. Let's pretend that each individual possesses some quantity of each of the factors, which we call the factor score. We can use their observed scores to estimate what that factor score might have been. The principal function from the psych package will generate factor scores. The defaults are principal component extraction and varimax rotation. You will need to put the results into an object, for example, out<-principal(ratings, nfactors=5, scores=TRUE). The object "out" is a list, and one of its elements holds the five factor scores for each respondent with complete data. You can get those factor scores with out$scores. Hopefully, this is what you were asking about.
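Put together, those steps might look like the sketch below. The data-loading branch reflects the fact that bfi ships with psychTools in newer psych releases. The final step, labeling each respondent by the component with the largest absolute score, is only one crude convention that I am assuming for illustration; the factor model itself treats the scores as continuous and does not assign people to factors.

```r
library(psych)

# bfi ships with psych in older releases and with psychTools in newer ones
if (requireNamespace("psychTools", quietly = TRUE)) {
  data(bfi, package = "psychTools")
} else {
  data(bfi)
}
ratings <- bfi[, 1:25]

# Principal component extraction with varimax rotation (the defaults)
out <- principal(ratings, nfactors = 5, scores = TRUE)

# One row per respondent, one column per rotated component
print(head(round(out$scores, 2)))

# Assumed convention, not part of the factor model: label each respondent
# with complete scores by the component with the largest absolute score.
ok <- complete.cases(out$scores)
dominant <- colnames(out$scores)[
  max.col(abs(out$scores[ok, , drop = FALSE]), ties.method = "first")
]
print(table(dominant))
```

A respondent with a large negative score is labeled by that component too, so the sign of the score still needs to be read before interpreting the label.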
