Not every system of independent variables can be decomposed
into separate components, each with its own unique contribution. Sometimes our individual variables behave “as
a unit” and thus become so entangled that we cannot say where the effect of one
variable begins and the effect of another variable ends. In such cases, it might be best to set aside the
“need for decomposition” and treat multicollinearity not as a problem to be
solved, but as a friendly reminder that our observed variables are reflecting a
common underlying dimension.
Perhaps it would be better to think of multicollinearity as nothing more than a pattern of highly and positively correlated variables, or a “positive manifold,” as it has come to be called.
Over a hundred years ago, Charles Spearman noted that
performance scores from different cognitive tasks were highly correlated. Wikipedia provides a comprehensive review and
a number of good examples of such correlation matrices. When looking at the recurring pattern of
positive correlations among almost all cognitive tasks, Spearman saw the
presence of a single latent ability dimension, which he called "g." Spearman was not interested in running
regression analyses with cognitive tasks as separate predictors. He was not concerned with the individual contribution
of each cognitive task controlling for all the other cognitive tasks. He did not see multicollinearity as a problem
but as an indication that each predictor was a manifestation of the same
underlying latent trait. Spearman was
inventing factor analysis and cared more about the latent trait than the manifest
variables. Multicollinearity was a
friend because it allowed Spearman to “see” behind the observed variables.
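To make the positive manifold concrete, here is a minimal simulation sketch in Python. It assumes a one-factor model in which every task score is a loading times a single latent trait plus independent noise; the loadings, sample size, and function name are illustrative choices of mine, not values from Spearman's data.

```python
import numpy as np

def simulate_one_factor(n_people=2000, loadings=(0.8, 0.7, 0.6, 0.7, 0.5), seed=0):
    """Simulate task scores driven by one latent trait (hypothetical loadings)."""
    rng = np.random.default_rng(seed)
    lam = np.asarray(loadings, dtype=float)
    g = rng.standard_normal(n_people)                  # the latent trait
    noise = rng.standard_normal((n_people, lam.size))  # task-specific noise
    # score = loading * g + unique part, scaled so each task has unit variance
    return g[:, None] * lam + noise * np.sqrt(1.0 - lam**2)

scores = simulate_one_factor()
corr = np.corrcoef(scores, rowvar=False)

# Off-diagonal entries of the correlation matrix
off_diag = corr[~np.eye(corr.shape[0], dtype=bool)]

# Eigenvalues of the correlation matrix, largest first
eigvals = np.linalg.eigvalsh(corr)[::-1]
first_share = eigvals[0] / eigvals.sum()

print(np.round(corr, 2))
print("all off-diagonal correlations positive:", bool((off_diag > 0).all()))
print("share of variance on first component:", round(float(first_share), 2))
```

Every pair of simulated tasks correlates positively because all of them draw on the same latent trait, and a single component carries most of the shared variance. Using these tasks as separate predictors in a regression would show exactly the multicollinearity described above.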
Had Spearman been a mathematician, he might have stopped
with the positive manifold, as Ramsay did in his paper “A Geometrical Approach to Item Response Theory.” But as a
psychologist, Spearman was studying intelligence and sought an explanation for
why the positive manifold was such a robust observation. Borrowing metaphors from his time, Spearman described intelligence
as a mental energy; in his words, “one can talk about mind power in much the same manner as about horse power.” As you might have guessed, psychology has
moved on to theories about cognitive processing to explain the positive
manifold (see footnote at the end of this post). Still, it is important to note that when we
repeatedly discover positive manifolds among our variables, we might wish to
ask why and not simply try a statistical “workaround” like relative importance. This is what item response theory (IRT) attempts
to do.
Item response theory follows Spearman’s lead.
Test scores on cognitive tasks are replaced with individual items, but
the focus remains on the latent trait responsible for the item score. In fact, items that do not measure the same
latent trait in the same way across respondents (that is, items showing differential item functioning) are typically removed. In an earlier post, I attempted an intuitive introduction to item response theory. I plan to return to this topic in future posts. The positive manifold is a common structure
underlying rating data (e.g., halo effects). My goal is to examine in some depth the cognitive and affective processes that are used when answering rating items and to show how the positive manifold results from such processes.
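As a rough illustration of how a single latent trait produces a positive manifold among individual items, the sketch below simulates binary responses from a two-parameter logistic (2PL) model. The choice of the 2PL form, the discrimination and difficulty values, and the function name are all my assumptions for illustration; none of them comes from a fitted model.

```python
import numpy as np

def simulate_2pl(n_people=5000,
                 discrimination=(1.0, 1.5, 2.0, 1.2, 1.8),
                 difficulty=(-1.0, -0.5, 0.0, 0.5, 1.0),
                 seed=1):
    """Simulate 0/1 item responses from a 2PL model (hypothetical parameters)."""
    rng = np.random.default_rng(seed)
    a = np.asarray(discrimination, dtype=float)
    b = np.asarray(difficulty, dtype=float)
    theta = rng.standard_normal(n_people)  # latent trait for each respondent
    # Probability of a positive response: logistic in a * (theta - b)
    p = 1.0 / (1.0 + np.exp(-a * (theta[:, None] - b)))
    return (rng.random(p.shape) < p).astype(int)

responses = simulate_2pl()
item_corr = np.corrcoef(responses, rowvar=False)
off = item_corr[~np.eye(item_corr.shape[0], dtype=bool)]

print(np.round(item_corr, 2))
print("smallest inter-item correlation:", round(float(off.min()), 2))
```

Because every item depends on the same latent trait, all of the inter-item correlations come out positive, even for items that differ widely in difficulty.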
Footnote: Many
readers are likely to find most discussions of positive manifolds
just slightly out of reach.
However, Cosma Shalizi has published a post on his blog (called “g, a Statistical Myth”) that is both comprehensive and not unnecessarily complicated. If you have followed the link to Wikipedia, you
will know that there are three theories of g: mental energy, sampling theory,
and mutualism. Shalizi summarizes all
three with both pictures and lots of references to other work. Because I believe that
the Borsboom links are especially important, I will offer two of my own: one to Borsboom's papers and the other to the
PsychoSystems Project. All of these
readings move us from the statistical model where latent variables are “convenient
fictions” to the substantive world where latent variables can be theoretically
tied to real outcomes that can be seen and felt.