Tuesday, January 28, 2014

Context Matters When Modeling Human Judgment and Choice

Herbert Simon was succinct when he argued that judgment and choice "is shaped by a scissor whose two blades are the structure of the task environment and the computational capabilities of the actor" (Simon, 1990, p.7). As a marketing researcher, I take Simon seriously and will not write a survey item without specifying the respondent's task and what cognitive processes will be involved in the task resolution.

Thus, when a client asks for an estimate of the proportion of European car rentals made in the United States that will be requests for automatic transmissions, I do not ask "On a scale from 1=poor to 10=excellent, how would you rate your ability to drive a car with a manual transmission?" Estimating one's ability, which involves an implicit comparison with others, does not come close to mimicking the car rental task structure. Nor would I ask for the likelihood of ordering an automatic transmission because probability estimation is not choice. Likelihood will tend to be more sensitive to factors that will never be considered when the task is choice. In addition, I need a context and a price. It probably makes a difference whether the rental is for business or personal use, whether the driving will be in the mountains or in city traffic, how large the vehicle is, and much more. Lastly, the proportion of drivers capable of using a stick shift increases along with the additional cost for an automatic transmission. Given a large enough incremental price for automatic transmissions, many of us will discover our hidden abilities to shift manually.
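The price dependence described above can be sketched with a binary logit. This is only an illustration: the intercept and price coefficient are invented values, not estimates from any study, and share_automatic is a hypothetical helper name.

```r
# Hypothetical binary logit for choosing automatic over manual.
# Both coefficients are made up for illustration only.
share_automatic <- function(premium, intercept = 2, price_coef = 0.05) {
  # plogis() is the inverse logit: exp(u) / (1 + exp(u))
  plogis(intercept - price_coef * premium)
}

premium <- c(0, 20, 40, 80, 160)  # extra cost of an automatic, arbitrary units
shares  <- share_automatic(premium)
print(data.frame(premium, share = round(shares, 2)))
```

Whatever values are assumed, a positive price coefficient means the share requesting an automatic falls as the premium rises, which is the pattern a choice task with an explicit price is meant to capture.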

The task structure and the cognitive processing necessary to complete the task determine what data need to be collected. In marketing research, the task is often the making of a purchase, that is, the selection of a single option from many available alternatives. Response substitution is not allowed. A ranking or a rating alters the task structure so that we are now measuring something other than what type of transmission will be requested. Different features become relevant when we choose, when we rate, and when we rank the same product configurations. Moreover, the divergence between choice and rating is only increased by repeated measures. The respondent will select the same alternative when minor features are varied, but that respondent will feel compelled to make minor adjustments in their ratings under the same conditions. Task structure elicits different cognitive processing as the respondent solves different problems. Rating, ranking, and choice are three different tasks. Each measures preference as constructed in order to answer the specific question.

Context matters when the goal is generalization, and one cannot generalize from the survey to the marketplace unless the essential task structure has been maintained. For example, I might wish to determine not only what type of car you intend to rent in your next purchase but what you might do over your next ten rentals. Now, we have changed the task structure because car rentals take place over time. We do not reserve our next ten rentals on a single occasion, nor can we anticipate how circumstances will change over time. The "next ten purchases question" seems to be more a measure of intensity than anticipated marketplace behavior.

Nor can one present a subset of available alternatives and ask for the most and least preferred from this reduced list without modifying the task structure and the cognitive processing used to solve the tasks. The alternatives that are available frame the choice task. I prefer A to B until you show me C, and then I decide not to buy anything. Or, adding a more expensive alternative to an existing product configuration increases the purchase of medium priced options by making them seem less expensive. Context matters. When we ignore it, we lose the ability to generalize our research to the marketplace. Finally, self-reports of the importance or contribution of features are not context-free; they simply lack any explicit context so that respondents can supply whatever context comes to mind or they can just chit-chat.

The implications for statistical modeling in R are clear. We begin with a description of the marketplace task. This determines our data collection procedures and places some difficult demands on the statistical model. For example, purchase requires a categorical dependent variable and a considerable amount of data to yield individual estimates. Yet, we cannot simply increase the number of choice sets given to each respondent because repeated measures from the same individual alter that individual's preferences (e.g., price sensitivity tends to increase over repeated exposures to price variation). Bayesian modeling within R allows us to exploit the hierarchical structure within a data set so that we can use data from all the respondents to compensate for our inability to collect much information from any one person. However, borrowing data from others in hierarchical Bayes is not unlike borrowing clothes from others; the sharing works only when the others are exchangeable and come from the same segment with a common distribution of estimates.
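The idea of borrowing from other respondents can be sketched in base R with simulated data. This is not a full hierarchical Bayes choice model: it is a simple beta-binomial shrinkage toward the pooled share, and the number of respondents, the number of tasks, and prior_strength are all assumptions chosen for illustration.

```r
set.seed(1)
n_resp  <- 200  # respondents
n_tasks <- 4    # choice tasks per respondent (deliberately few)

# Simulated "true" individual probabilities of choosing the automatic,
# all drawn from one common distribution (the exchangeability assumption).
true_p     <- rbeta(n_resp, 6, 4)
chose_auto <- rbinom(n_resp, size = n_tasks, prob = true_p)

raw    <- chose_auto / n_tasks                  # uses only that person's data
pooled <- sum(chose_auto) / (n_resp * n_tasks)  # ignores individual differences

# Beta prior centered on the pooled share. prior_strength is an assumed
# pseudo-sample size; a full hierarchical model would estimate it.
prior_strength <- 10
shrunk <- (chose_auto + prior_strength * pooled) / (n_tasks + prior_strength)

# Mean squared error of each estimate against the simulated truth
print(c(raw = mean((raw - true_p)^2), shrunk = mean((shrunk - true_p)^2)))
```

Each shrunken estimate is a weighted average of the respondent's own share and the pooled share. The sketch works because every simulated respondent comes from the same distribution; if the sample mixed distinct segments, pulling everyone toward a single pooled value would be the ill-fitting borrowed clothes.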

None of this seems to be traditional preference elicitation, where we assume that preference is established and well-formed, requiring only some means for expression. Preference or value is the latent variable responsible for all observed indicators. Different elicitation methods may introduce some unique measurement effects, but they all tap the same latent construct. Simon, on the other hand, sees judgment and decision making as a form of problem solving. Preferences can still be measured, but preferences are constructed as solutions to specific problems within specific task structures. Although preference elicitation is clearly not dead, we can expect to see increasing movement toward context awareness in both computing and marketing.

Friday, January 17, 2014

Metaphors Matter: Factor Structure vs. Correlation Network Maps

The psych R package includes a data set called "bfi" with self-report ratings on 25 personality items along a 6-point agreement scale. All the details are provided in the documentation accompanying the package. My focus is how to represent the correlations among these ratings: factor analysis or network graphics?

Let's start with the correlation network map produced by the R package qgraph. As always, all the R code can be found at the end of this post.

First, we need to discover the underlying pattern, so we will begin by looking for nodes with the highest correlations and thus interconnected with the thickest lines. Red lines indicate negative correlations (e.g., those who claim that they are "indifferent to others" are unlikely to tell us that they "inquire about others" or "comfort others"). Positive correlations are shown in green (e.g., several nodes toward the bottom of the network suggest that those who report "mood swings" and "panic easily" also said that they are easy to anger and irritate). The node "reflect on things" seems to be misplaced, but it is not. The thin red and green lines suggest that it has uniformly low correlations with all the other items, which explains why it is positioned at the periphery but closest to the other four items with which it is the most correlated.

Using this approach, we can identify several regions that are placed near each other because of their interconnections. For instance, the personal problems mentioned previously and located toward the bottom of the graph are separated from but linked to the measures of introversion ("difficult approach others" and "don't talk"), which in turn have strong negative correlations with extroversion ("make friends"). As we continue up the graph on the left side, we find an active openness to others that shades into taking charge and conscientiousness. If we continue back down the right side, respondents note what might be called work-related problems. Now, we have our story, and we can see the two-dimensional structure defining the correlation network: internal vs. external and in-control vs. out-of-control.

Next, we can compare this network representation with the more traditional factor model. Why do we observe correlations among observed variables? Correlations are the results of latent variables. We see this in the factor model diagram created using the same data. For example, individuals possess some degree of neuroticism (labeled RC2); therefore, the five personal problem items are intercorrelated. The path coefficient associated with each arrow indicates the correlation between the factor and the observed variable, and the product of the path coefficients for any two observed variables loading on the same factor is our estimate of the correlation between those two observed variables.
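The product rule for path coefficients can be checked with a few lines of base R. The loadings below are invented for illustration and are not the values estimated from the bfi data; the sketch also assumes that each item loads on a single factor.

```r
# Hypothetical loadings of five items on one factor (values invented).
loadings <- c(N1 = 0.8, N2 = 0.8, N3 = 0.7, N4 = 0.6, N5 = 0.5)

# Implied correlation between two items = product of their loadings.
implied <- outer(loadings, loadings)
diag(implied) <- 1  # each item correlates perfectly with itself
print(round(implied, 2))
```

When items load on several uncorrelated factors, the implied correlation is the sum of these products across the factors, so the single-factor case shown here is the simplest instance.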

One should recognize that the two diagrams seek to account for the same correlation matrix. The factor model does so by postulating the presence of unseen forces or latent variables. However, we never observe neuroticism, and we understand that all we have is a pattern of higher correlations among those five self-reports. Without compelling evidence for the independent existence of such a latent variable, we might try to avoid making the reification fallacy and look for a different explanation.

The network model provides an alternative account. Perhaps the best overview of this approach can be found at the PsychoSystems Project. From a network perspective, correlations are observed because the nodes mutually interact. This is not a directed graph attempting to separate cause and effect. It is not a causal model. Perhaps in the beginning, there was a causal connection with one node occurring first and impacting the other nodes. But over time, these nodes have come to mutually support one another so that the unique effects of the self-report ratings can no longer be untangled.

Which of these two representations is better? If the observed variables are true reflections of an underlying trait that can be independently established, then the factor model offers a convenient hierarchical model. We think that we are observing five different things, but in fact, we are measuring five different manifestations of one underlying construct. On the other hand, a network of mutually supportive observations cannot be represented using a factor model. There are no factors, and asserting so ends the discussion prematurely. What are the relationships among the separate nodes? How can one intervene to break the cycle? Are there multiple leverage points? In previous posts, I showed how much can be gained using a network visualization of a key driver analysis and how much can be lost relying solely on an input-output regression model. Besides, why would you not generate the map when, as shown below, R makes it so easy to do?

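R code to create the two plots:

library(psych)   # principal(), fa.diagram(), and the bfi data
library(qgraph)  # correlation network map

# If data(bfi) is not found, the data set may have moved to the
# companion psychTools package in newer releases; load that instead.
data(bfi)
ratings <- bfi[, 1:25]  # the 25 personality items

# Replace the item codes with short descriptive labels
names(ratings) <- c(
  "Indifferent of others",
  "Inquire about others",
  "Comfort others",
  "Love children",
  "Make people at ease",
  "Exacting in my work",
  "Until perfect",
  "Do by plan",
  "Do halfway",
  "Waste time",
  "Don't talk",
  "Difficult approach others",
  "Know how to captivate people",
  "Make friends",
  "Take charge",
  "Angry easily",
  "Irritated easily",
  "Mood swings",
  "Feel blue",
  "Panic easily",
  "Full of ideas",
  "Avoid difficult reading",
  "Carry conversation higher",
  "Reflect on things",
  "Not probe deeply"
)

# Factor model diagram from a five-component solution
fa.diagram(principal(ratings, nfactors=5), main="")

# Correlation network map
qgraph(cor(ratings, use="pairwise"), layout="spring",
       label.cex=0.9, labels=names(ratings))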

Friday, January 10, 2014

Finding the R community a barrier to entry, Python looks elsewhere for lunch

Tal Yarkoni's post on "The homogenization of scientific computing, or why Python is steadily eating other languages' lunch" is an enjoyable account of his transition from R to Python. He makes a good case, and I have no argument with his reasoning or the importance of Python in his work. But my experience has not been the same. I am a methodologist working in marketing. I could have called myself a data analyst in the sense that John Tukey used that term back in his 1962 paper on The Future of Data Analysis. Bill Venables speaks of R in a similar manner and quotes Tukey in his keynote at UseR! 2012, "Statistics work is detective work!" I like that description.

So when I turn to R, I am looking for more than code. "The game is afoot!" I require all the usual tools and perhaps something new or from another field of research. As an example, marketing is concerned with heterogeneity because "one size does not fit all." But every field is concerned with heterogeneity. It's the second moment of a distribution. We refer to it as heterogeneity in marketing, but you might call it variability, variation, dispersion, spread, diversity, or individual differences. There are even more words for the attempt to summarize and explain the second moment: density estimation, finite mixtures, seriation, sorting, clustering, grouping, segmenting, graph cutting, partitioning, and tessellation. R has a package for every term, from many differing points of view, and with more on the way every day.

Detective work borrows whatever assists in the hunt. As a marketing scientist trying to understand customer heterogeneity, I find that R provides everything I need for clustering and finite mixture modeling. Moreover, R contributors provide more than a program, writing some of the best and most insightful papers in the field. However, why restrict myself to traditional approaches to understanding heterogeneity when R includes access to archetypal analysis, item response theory, and latent variable mixture models? These are three very different approaches that I can borrow only because they share a common R language. It is extremely difficult to learn from fields with a different vocabulary. Even if the underlying math is the same, everything is called by a different name. R imposes constraints on the presentation of the material so that comprehension is still difficult but no longer impossible.

Of course, Python also has a mixture package, and perhaps at some point in the future we will see a Python community that will compete with R. Until then, Python will have to skip lunch.

Monday, January 6, 2014

An Introduction to Statistical Learning with Applications in R

Statistical learning theory offers an opportunity for those of us trained as social science methodologists to look at everything we have learned from a different perspective. For example, missing value imputation can be seen as matrix completion, and recommender systems can be used to fill in empty questionnaire items that were never shown to more than a few respondents by design. It is not difficult to show how to run the R package softImpute that makes all this happen. But it can be overwhelming trying to learn about the underlying mechanism in enough detail that you have some confidence that you know what you are doing. One does not want to spend the time necessary to become a statistician, yet we need to be aware of when and how to use specific models, what can go wrong, and what to do when something goes wrong. At least with R, one can run analyses on data sets and work through concrete examples.
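To make the matrix completion idea concrete, here is a minimal base R sketch of the iteration behind soft-thresholded SVD imputation. It is a simplified illustration on simulated data, not the softImpute package's implementation; soft_impute_sketch, lambda, and n_iter are hypothetical names and settings chosen for this example.

```r
set.seed(42)
# Simulated rank-2 "ratings" matrix: 50 respondents by 8 items.
n <- 50; p <- 8
full <- matrix(rnorm(n * 2), n, 2) %*% matrix(rnorm(2 * p), 2, p)

# Hide 30% of the cells, as if those items were never shown.
x <- full
x[sample(n * p, size = round(0.3 * n * p))] <- NA

soft_impute_sketch <- function(x, lambda = 0.5, n_iter = 200) {
  missing <- is.na(x)
  z <- x
  z[missing] <- 0                     # start by filling the gaps with zero
  for (i in seq_len(n_iter)) {
    s <- svd(z)
    d <- pmax(s$d - lambda, 0)        # soft-threshold the singular values
    low_rank <- s$u %*% (d * t(s$v))  # U diag(d) V'
    z[missing] <- low_rank[missing]   # update only the missing cells
  }
  z
}

completed <- soft_impute_sketch(x)

# Error on the hidden cells: imputed values versus a plain zero fill
hidden <- is.na(x)
print(c(imputed   = mean((completed[hidden] - full[hidden])^2),
        zero_fill = mean(full[hidden]^2)))
```

Because the true values are known in a simulation, the last step shows whether the imputations are any good, which is exactly the kind of check that is unavailable with real questionnaire data and why working through concrete examples helps.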

The publication of An Introduction to Statistical Learning with Applications in R (download the book pdf) provides a gentle introduction with lots of R code. The book achieves a nice balance and is well worth looking at, both for the beginner and for the more experienced reader who needs to explain the material to others with less training. As a bonus, Stanford's OpenEdX has scheduled a MOOC by Hastie and Tibshirani beginning on January 21 using this textbook.