Engaging Market Research: August 2013

Monday, August 26, 2013

Latent Variable Mixture Modeling: When Heterogeneity Requires Both Categories and Dimensions

Dichotomies come easily to us, especially when they are caricatures as shown in this cartoon. These personality types do seem real, and without much difficulty, we can anticipate how they might react in different situations. For example, if we were to give our Type A and Type B vacationers a checklist to indicate what activities they would like to do on their next trip, we would expect to observe two different patterns. Type A would select the more adventurous and challenging activities, while Type B would pick the opposite. That is, if we were to array the activities from relaxing to active, our Type A would be marking the more active items with the relaxing portion of the scale being checked by our Type B respondent. Although our example is hypothetical, market segmentation in tourism is an ongoing research area as you will see if you follow the link to an article by Sara Dolnicar, whose name is associated with several R packages.

Yet, personality type does not explain all the heterogeneity we observe. We would expect a different pattern of check marks for Type A and B, but we would not be surprised if "type" were also a matter of degree with respondents more or less reflecting their type. The more "devout" Type A checks only the most active items and rejects the less active along with all the passive activities. Similarly, the more "pure" Type B is likely to want only the most relaxing activities. Thus, we might need both personality type (categorical) and intensity of respective type (continuous) in order to explain all the observed heterogeneity.

Should we think of this dimension as graded membership in a personality type so that we need to represent personality type by a probability rather than all or none types? I would argue that vacation preference can be described more productively by two latent variables: a categorical framing or orientation (Type A vs. Type B) and a continuous latent trait controlling the expression of the type (perhaps a combination of prior experience plus risk aversion and public visibility). It's a two-step process. One picks a theme and then decides how far to go.

Of course, some customers might be "compromisers" and be searching for the ideal vacation that would balance active and relaxing, that is, the "just right" vacation. In such a case we would need an ideal point item response model (e.g., 2013 US Senate ideal points and R code for the analysis). However, to keep the presentation simple, we will assume that our vacationers want only a short trip with a single theme: either a relaxing break or an active getaway. To clarify, voting for a U.S. Senator is a compromise because I select (vote for) a single individual who is closest to my positions on a large assortment of issues. Alternatively, a focused purchase such as a short vacation can seek to accomplish only one goal (e.g., most exciting, most relaxing, most educational, or most romantic).

In a previous post I showed how brand perceptions could be modeled using item response theory. Individuals do see brands differently, but those differences raise or lower all the attributes together rather than changing the rank ordering of the items. For instance, regardless of your like or dislike for BMWs, everyone would tend to see the attribute "well-engineered" as more associated with the car maker than "reasonably priced." Brand perceptions are constrained by an associative network connecting the attributes so that "a rising tide lifts all boats". As we have seen in our Type A-Type B example, this is not the case with preferences, which can often be ordered in reverse directions.

Where's the Latent Variable Mixture?

Heterogeneity among our respondents is explained by two latent variables. We cannot observe personality type, but we believe that it takes one of two possible values: Type A or Type B. If I were to select a respondent at random, they would be either Type A or Type B. In the case of our vacationers, being Type A or Type B would mean that they would see their vacation as an opportunity for something challenging or as a chance to relax. Given their personality type frame, our respondents need to decide next the intensity of their commitment. Because intensity is a continuous latent variable, we have a latent variable mixture.

Let's look at some R code and see if some concrete data will help clarify the discussion. We can start with a checklist containing 8 items ranging from relaxing to active, and we will need two groups of respondents for our Type A and Type B personalities. The sim.rasch() function from the psych package will work.

library(psych)
set.seed(16566)
TypeA<-sim.rasch(nvar=8, n=100, 
  d=c(+2.0, +1.5, +1.0, +0.5, -0.5, -1.0, -1.5, -2.0))
TypeB<-sim.rasch(nvar=8, n=100, 
  d=c(-2.0, -1.5, -1.0, -0.5, +0.5, +1.0, +1.5, +2.0))
 
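# stack the two groups; segment records the true (simulated) type
more<-rbind(TypeA$items,TypeB$items)
segment<-c(rep(1,100),rep(2,100))
# counts of 0s and 1s per item: pooled first, then by type
apply(more,2,table)
apply(TypeA$items, 2, table)
apply(TypeB$items, 2, table)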

Created by Pretty R at inside-R.org

The sim.rasch() function begins with a series of default values. By default, our n=100 Type A and Type B vacationers come from a latent trait that is normally distributed with mean 0 and standard deviation 1. This latent trait can be thought of as intensity, as we will soon see. So far the two types are the same, that is, two random samples from the same normal latent distribution. Their only difference is in d, which stands for difficulty. The term "difficulty" comes to us from educational testing where the latent trait is ability. A student has a certain amount of latent ability, and each item has a difficulty that "tests" the student's latent ability. Because latent ability and item difficulty are measured on the same scale, a student with average ability (mean=0) has a 50-50 chance of answering correctly an item of average difficulty (d=0). If d is a negative value, then the item is easier and our average student has a better than 50-50 chance of getting it right. On the other hand, items with positive d are more difficult and pose a greater challenge for our average student.
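The 50-50 statement follows directly from the Rasch response function. Here is a minimal sketch, assuming the standard logistic form with discrimination fixed at 1; the helper name p_check is mine and is not part of the psych package.

```r
# Rasch response function: probability of a correct answer (or a check)
# for a person at latent position theta on an item with difficulty d.
# Assumes the standard logistic form with discrimination fixed at 1.
p_check <- function(theta, d) plogis(theta - d)

p_check(0, 0)    # average person, item of average difficulty: 0.5
p_check(0, -2)   # easy item: better than 50-50
p_check(0, 2)    # hard item: worse than 50-50
```

Only the difference between theta and d matters, which is what it means for persons and items to share one scale.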

In our example, the eight activities flow from more relaxing to more active. Let's take a look at how an average Type A and Type B would respond to the checklist. Our average Type A has a latent intensity of 0, so the first item is a severe test with d=+2, and they are not likely to check it. The opposite is true for our average Type B respondent since d=-2 for their personality type. Checking relaxing items is easy for Type B and hard for Type A. And this pattern continues with difficulty moving in opposite directions for our two types until we reach item 8, which you will recall is the most active activity. It is an easy item for Type A (d=-2) because they like active. It is a difficult item for Type B (d=+2) because they dislike active. As a reminder, if our vacationers were seeking balance or if our items were too extreme (i.e., more challenging than Type As wanted or more relaxing than sought by our Type Bs), we would be fitting an ideal point model.

The sim.rasch() function stores its results in a list so that you need to access $items in order to retrieve the response patterns of zeros and ones for the two groups. If you run the apply functions to get your tables (see below), you will see that the frequency of checks (response=1) increases across the 8 items for Type A and decreases for Type B, as one might have expected. Clearly, with real data we know none of this and all we have is a mixture of unobserved types of unobserved intensity.

Clustering and Other Separation Mechanisms

Unaware that our sample is a mixture of two separate personality types, we would be misled looking at the aggregate descriptive statistics. The total column suggests that all the activities are almost equally appealing when clearly that is not the case.

Number Checking Each Item
Item	Total	Type A	Type B
V1	100	14	86
V2	103	22	81
V3	96	28	68
V4	104	43	61
V5	94	60	34
V6	114	79	35
V7	94	75	19
V8	105	87	18

n	200	100	100
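The flat Total column is what the simulation design implies. As a rough check, the sketch below computes the expected number of checks per item by averaging the Rasch response function over the normal latent distribution. It assumes a logistic response function with discrimination 1 and theta drawn from N(0, 1); marginal_p is a hypothetical helper, not a psych function.

```r
# Expected number of checks per item for 100 respondents of each type,
# assuming theta ~ N(0, 1) and P(check) = plogis(theta - d).
d_A <- c(2.0, 1.5, 1.0, 0.5, -0.5, -1.0, -1.5, -2.0)  # Type A difficulties
d_B <- -d_A                                            # Type B: reversed

# average the response function over the normal latent distribution
marginal_p <- function(d) {
  integrate(function(theta) plogis(theta - d) * dnorm(theta),
            lower = -Inf, upper = Inf)$value
}

expected <- data.frame(
  item  = paste0("V", 1:8),
  TypeA = round(100 * sapply(d_A, marginal_p)),
  TypeB = round(100 * sapply(d_B, marginal_p))
)
expected$Total <- expected$TypeA + expected$TypeB
expected
```

Because the two difficulty vectors mirror each other, the expected totals are the same for every item, so the aggregate hides the two opposite orderings.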

To get a better picture of the mixture of these two groups, we can look at the plot of all the respondents in the two-dimensional space formed by the first two principal components. This is fairly easy to do in R using prcomp() to calculate the principal component scores and plotting the unobserved personality type (which we only know because the data are simulated) along with arrows representing the projection of the 8 items onto this space.

pc<-prcomp(more)
plot(pc$x[,1:2],type="n")
text(pc$x[,1:2],col=segment,labels=segment)
arrows(0,0,pc$rotation[,1],pc$rotation[,2], length=.1)
text(pc$rotation[,1:2],labels=rownames(pc$rotation),cex=.75)

The resulting plot shows the separation between our two personality types (labeled 1 and 2 for A and B, respectively) and the distortion in the covariance structure that splits the 8 items into two factors (the first 4 items vs. the last 4 items).

Obviously, we need to "unmix" our data, and as you might have guessed from the well-defined separation in the above plot, any cluster analysis ought to be able to successfully recover the two segments (had we known that the number of clusters was two). K-means works fine, correctly classifying 94% of the respondents.
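The k-means step is not listed in the post, so here is one way it might look. To keep the sketch self-contained it regenerates similar data in base R rather than with sim.rasch(); sim_rasch_base is a hypothetical stand-in, and because the random draws differ, the accuracy will not match the 94% figure exactly.

```r
set.seed(16566)

# Stand-in for sim.rasch(): binary responses from a Rasch model with
# theta ~ N(0, 1). sim_rasch_base is a hypothetical helper, not psych code.
sim_rasch_base <- function(n, d) {
  theta <- rnorm(n)
  p <- plogis(outer(theta, d, "-"))          # n x length(d) probabilities
  matrix(rbinom(length(p), 1, p), nrow = n)  # 0/1 responses
}

d_A <- c(2.0, 1.5, 1.0, 0.5, -0.5, -1.0, -1.5, -2.0)
more <- rbind(sim_rasch_base(100, d_A), sim_rasch_base(100, -d_A))
segment <- rep(1:2, each = 100)

# k-means with two centers and several random starts
km <- kmeans(more, centers = 2, nstart = 25)

# cluster labels are arbitrary, so score the better of the two labelings
tab <- table(km$cluster, segment)
accuracy <- max(sum(diag(tab)), sum(tab) - sum(diag(tab))) / sum(tab)
tab
accuracy
```

The max() in the accuracy line handles label switching: cluster 1 may correspond to either Type A or Type B.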

Had we stopped with a single categorical latent variable, however, we would have lost the ordering of our items. This is the essence of item response theory. Saying "It's the ordering of the items, stupid" might be a bit strong but may be required to focus attention on the need for an item response model. In addition to a categorical type, our data require a dimension or continuous latent variable that uses the same scale to differentiate simultaneously among items and individuals. Categories alone are not sufficient to describe fully the heterogeneity in our data.

The R package psychomix

The R package psychomix provides an accessible introduction to the topic of latent variable mixture modeling without needing to understand all the details. However, Muthen provides a readable overview for those wanting a more comprehensive summary. Searching his chapter for "IRT" should help one see where psychomix fits into the larger framework of latent variable mixture modeling.

We will be using the raschmix() function to test for 1 to 3 mixtures of different difficulty parameters. Obviously, we never know the respondent's personality type with real data. In fact, we may not know if we have a mixture of different types at all. All we have is the response patterns of check marks across the 8 items. The function raschmix() must help us decide how many latent categories there are, if any, and estimate the item difficulty parameters within each category. Fortunately, it all becomes clear with an example, so here is the R code to run the analysis.

library(psychomix)
mixture<-raschmix(more, k=1:3)
 
## inspect results
mixture
plot(mixture)
 
## select best BIC model
BIC(mixture)
best <- getModel(mixture, which = "BIC")
summary(best)
 
group<-clusters(best)

person<-apply(more,1,sum)
table(group,segment[(person>0 & person<8)])

At a minimum, the function raschmix() needs a data matrix [not a data frame, so use as.matrix()] and the number of mixtures to be tested. We have set k=1:3, so that we can compare the BIC for 1, 2, and 3 mixtures. The results have been stored in a list called mixture, and one extracts information from the list using methods. For example, typing "mixture" (the name of the list object holding the results) will produce the summary fit statistics.

	iter	converged	k	k0	logLik	AIC	BIC	ICL
1	2	TRUE	1	1	-1096	2223	2272	2222
2	10	TRUE	2	2	-949	1956	2051	2011
3	76	TRUE	3	3	-938	1963	2103	2083

Although one should use indexes such as the BIC cautiously, these statistics suggest that there are two mixtures. Because raschmix() relies on an expectation-maximization (EM) algorithm, you ought not be surprised if you get somewhat different results when you run this code. In fact, the solution for the 3-mixture model may not converge under the default 100 iteration limit. We use the getModel() method to extract the two-mixture model, which has the lowest BIC, and print out the solution with summary().
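Both AIC and BIC are "smaller is better" criteria, so model selection amounts to finding the minimum. A small sketch using the fit statistics copied from the printed output above (the data frame fit_stats is only for illustration; getModel() does this selection for you):

```r
# fit statistics copied from the printed raschmix() output above
fit_stats <- data.frame(k      = 1:3,
                        logLik = c(-1096, -949, -938),
                        AIC    = c(2223, 1956, 1963),
                        BIC    = c(2272, 2051, 2103))

# smaller is better for both criteria
best_k_bic <- fit_stats$k[which.min(fit_stats$BIC)]
best_k_aic <- fit_stats$k[which.min(fit_stats$AIC)]
c(BIC = best_k_bic, AIC = best_k_aic)
```

The log-likelihood keeps improving as k goes from 2 to 3, but the gain is too small to offset the penalty for the extra parameters.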

	prior	size	post>0	ratio
Comp.1	0.503	97	178	0.545
Comp.2	0.497	97	169	0.574

Item Parameters:
	Comp.1	Comp.2
V1	2.35	-2.24
V2	1.64	-1.72
V3	1.21	-0.83
V4	0.38	-0.41
V5	-0.35	0.79
V6	-1.58	0.80
V7	-1.17	1.65
V8	-2.47	1.97

We started with 200 respondents, but six respondents were removed because three checked none of the items and three checked all of the items. That is why the sizes for the two mixture components do not sum to 200. The item parameters are estimates of the item difficulties that we specified with our d argument in sim.rasch() when we randomly generated the data. The first component looks like our Type A personality with the easiest to check activities toward the end of the list with the negative difficulty parameters. Type B is the opposite with the more passive activities at the beginning of the list being the easiest to check because they are the most preferred by this segment.
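To see how well the generating values were recovered, one can line up the estimates against the d vectors used in the simulation. The numbers below are copied from the summary output above; the object names are mine.

```r
# generating difficulties for Type A (Type B uses the reverse order)
d_A <- c(2.0, 1.5, 1.0, 0.5, -0.5, -1.0, -1.5, -2.0)

# estimates copied from the summary(best) output above
comp1 <- c(2.35, 1.64, 1.21, 0.38, -0.35, -1.58, -1.17, -2.47)
comp2 <- c(-2.24, -1.72, -0.83, -0.41, 0.79, 0.80, 1.65, 1.97)

recovery <- data.frame(item = paste0("V", 1:8),
                       true_A = d_A, est_1 = comp1,
                       true_B = -d_A, est_2 = comp2)
recovery
cor(d_A, comp1)    # agreement between Comp.1 and Type A
cor(-d_A, comp2)   # agreement between Comp.2 and Type B
```

With only about 100 respondents per component, the estimates are noisy for individual items, yet the reversed ordering of the two components is unmistakable.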

Finally, the last three lines of R code first identify the cluster membership for every respondent using the psychomix method clusters() and then verify its accuracy with table(). Because respondents who checked none or all of the items were dropped from the mixture model, the segment vector is restricted to total scores between 1 and 7 before tabulating. As we saw with k-means earlier, we are able to correctly identify almost all the personality types when the two segments are well-separated by the reverse ordering of their difficulty parameters.

Of course, real data can be considerably messier than our simulation with sim.rasch(), requiring us to think hard before we start the analysis. In particular, items must be carefully selected since we are attempting to separate respondents using different response generation processes based solely on their pattern of checked boxes. Fortunately, markets have an underlying structure that helps us understand how consumer heterogeneity is formed and maintained.

Friday, August 16, 2013

Using Heatmaps to Uncover the Individual-Level Structure of Brand Perceptions

Heatmaps, when the rows and columns are appropriately ordered, provide insight into the data structure at the individual level. In an earlier post I showed a cluster heatmap with dendrograms for both the rows and the columns. In addition, I provided an example of what a heatmap might look like if the underlying structure were a scalogram or a Guttman scale such as what we would expect to find in item response theory (IRT). Although it is not blood spatter analysis from crime scene investigation, heatmaps can assist in deciding whether the underlying heterogeneity is a continuous (IRT model) or discrete (finite mixture model) latent variable.

For example, in my last post I generated 200 observations on 8 binary items using a Rasch simulation from the R package psych. As a reminder, we were attempting to simulate the perceptions of hungry airline passengers as they passed by a Subway restaurant on their way to the terminal to board their airplane. Using a checklist, respondents were asked to indicate if the restaurant had good seating and menu selection, timely ordering and food preparation, plus tasty, filling, healthy, and fresh food.

In order to show the underlying pattern of scores, we will need to sort both the rows and the columns by their marginal values. That is, one would calculate the total score across the 8 items for each respondent and sort by these marginal total scores. In addition, one would compute the column means across respondents for each of the 8 items and sort by these marginal item means.

In the above heatmap for our Rasch simulated data, we can see the typical Guttman scale pattern. As one moves from left to right, the items get easier, that is, the columns become bluer. Similarly, as one travels down the heatmap from the top, we find respondents with increasingly higher scores. Both of these findings are expected given that the rows and columns have been sorted by their marginals. However, what is revealing in the heatmap is the pattern with which the data matrix changes from red to blue. We call this pattern "cumulative" because respondents appear to score higher by adding items to their checklists. Only a few did not check any of the items. Those who checked only one item tended to say that the food was fresh. Healthy, filling and tasty were added next. Only those giving Subway the highest scores marked the first four service items.

The R code is straightforward when you use the heatmap.2 function from the R package gplots. We start with the 200 x 8 data matrix (called ToyData) created in my last post, calculate row and column marginals, and sort the data matrix by the marginals. Then, we call the gplots package and run the heatmap.2 function. As you might imagine, there are a lot of options. Rowv and Colv are set to FALSE so that the current order of the rows and columns will be maintained. There is no dendrogram because we are not clustering the rows and columns. I am using red and blue for the colors. I am adding a color key, but leaving out the row labels.

item<-apply(ToyData,2,mean)
person<-apply(ToyData,1,sum)
ToyDataOrd<-ToyData[order(person),order(item)]
 
library(gplots)
heatmap.2(ToyDataOrd, Rowv=FALSE, Colv=FALSE, 
          dendrogram="none", col=redblue(16), 
          key=TRUE, keysize=1.5, density.info="none", 
          trace="none", labRow=NA)

Why Does One Observe the Guttman Scale Pattern?

We find the Guttman scale pattern whenever there is a strong sequential or cumulative structure to the data (e.g., achievement test scores, physical impairment, cultural evolution, and political ideology). In the case of brand perceptions, we would only expect to see cumulative effects in well-formed product categories where there was universal agreement concerning the strengths and weaknesses of brands in the category.

In order to use an item response model, there must be sufficient constraints so that there is a cumulative pattern underlying the items. If I wanted to buy a hammer, I would need to choose between good, better, and best. The "best" does all the stuff done by the "better" and then some. Product features are cumulative. First class provides all the benefits of second class plus some extras. And the same holds for services. We can talk about meeting or exceeding expectations only because we all understand the cumulative ordering. The consumer knows when they receive only basic service, and they can tell you when they receive more than the minimum required. Again, the effects are cumulative. A successful brand must always provide the basics. They exceed our expectations by doing more, and we can capture that "more" by including additional items in our questionnaire.

Tuesday, August 13, 2013

The Brand as Affordance: Item Response Modeling of Brand Perceptions

It is just too easy to think of a brand as a web of associations. What comes to mind when I say "Subway Sandwich"? Did you remember a commercial or the "eat fresh" tagline? Without much effort, one can generate a long list of associations with the Subway brand, and why not map all those associations with a network structure? In marketing we refer to such a representation as a brand concept map. Although you can draw it by hand, R has qgraph and igraph and a lot of other packages that will create network representations from association matrices. But does this have anything to do with the purchase decision?

Suppose that you are in an airport with only a short time before they start boarding for your flight, and you are hungry. What comes to mind when you come across this Subway restaurant on your way to the terminal? In the abstract the brand may be a concept with a complex pattern of associations. In the marketplace, however, the brand is a delivery system. What do you want in this particular purchase context? Is it quick ordering, fast preparation, good seating, and menu selection; or is it freshness, filling, healthy and tasty?

The Latent Variable Underlying Brand Perception is Affordance

In the purchase context, brand is not a concept but an affordance. It is "as if" your personal preferences and the purchase occasion have generated a checklist of features and services that you desire in this particular context. Actually, they have created an ordered checklist of features and services that you would like the brand to deliver. The ordering reflects the severity of your demands, for example, your willingness to purchase if the brand fails to deliver. Thus, you may not be willing to compromise on speed and a place to sit. You can live without your favorite cheese, but a long line is a deal breaker.

The brand must pass your "test" in order for you to become a customer. Your criteria are like items on an exam arranged in increasing difficulty for the brand to achieve. How well does Subway perform? Why not ask consumers to indicate which features and services a brand would be able to deliver in a specific purchase occasion? Our goal is to provide enough realism that consumers could see themselves buying in this context. They must be familiar enough with the brand that they can anticipate how well it would deliver each feature and service, and they must have sufficient experience in this or similar situations that they will know what they would want.

The result is a pattern of zeros and ones for each respondent, not unlike what we would find looking at the correct and incorrect scores for a series of items from an exam. In this case the exam is written by the consumer with item difficulty for the brand measured by the number of potential customers lost if the brand fails to perform.

An example might help, so why not use our eight attributes listed above: quick ordering, fast preparation, good seating, menu selection, freshness, filling, healthy, and tasty. We will use the airport description above, show the picture, and give each respondent the 8-item checklist. Let's pretend that we are such a respondent. We like the "subs" so we check "yes" for taste and filling. Although some of the sandwiches do not seem that healthy or fresh to us, there are alternatives that are, so we check "yes" for healthy and fresh. But we have found slow service, limited menu options, and subpar seating. We end up with four ones and four zeros. And what do we do with those who really like Subway? They do not feel that the menu is limited; they can get their favorite sandwich. Nor do they believe that service is all that bad. The seating, on the other hand, is not the best so their checklist might have seven ones and only one zero.

Doesn't this seem like a test that Subway takes repeatedly with every potential customer? Successful brands are not unaware of what consumers want. In fact, they promote themselves as providing these benefits. Consumers hear the advertising and read the signage. Whether you love or hate Subway, freshness will be the attribute you most associate with the brand. It is Subway's unique selling proposition, and it appeals to consumers who hold a similar customer value proposition.

What about Quiznos? Yes, there is a product category, and every brand in the product category shares the same value proposition with the same rank ordering of feature/service delivery. My claim is that the response pattern will be the same for Subway and Quiznos because they belong to the same product category with the same value proposition. If Quiznos' market share is smaller than Subway's, then the percentage checking "yes" may be lower but the pattern will be the same (see Byron Sharp for an entertaining overview at TED).

If we were to construct a perceptual map that plotted relative profiles (e.g., correspondence analysis), we would expect to see Subway and Quiznos near each other and some distance away from McDonalds, Burger King, and Wendy's. Moreover, the perceptual map would look the same regardless of how we segmented the respondents. I know this because my clients frequently ask to run the same correspondence map for different subgroups: users vs. nonusers, heavy users vs. light users, young vs. old, Brand A vs. Brand B, and so on. Of course, brands "bounce" around a bit due to random variation, but the map maintains the same overall structure because we all share a common consumption culture.

Item Response Theory

Continuing with the testing metaphor, why not use item response theory to analyze our checklist data? Let's put Subway to the test, and see what respondents think when given our eight item checklist. I have listed all the R code that you will need to run a toy example with a simulated sample of 200 randomly generated respondents. I believe that the annotation is sufficient to explain the code, but I will provide an overview.

The latent variable is Subway's perceived affordance. That is, respondents are believed to possess some level of confidence or trust that Subway is capable of delivering on its brand promise. Subway promises freshness and healthy food but also taste and selection. Moreover, it claims to be a fast food, so there are additional service promises, although these might be considered secondary. The checklist contains items that can be arranged in order so that the checking of an item at a higher level implies the checking of items at all the lower levels. High and low levels refer to the latent variable, which in this case is affordance, or how difficult it is for Subway to deliver on its promise. You will see this in the R code as difficulty or d. Difficulty is measured on a scale with mean=0 and standard deviation=1 so that a +2 is a very difficult item with few respondents likely to check and a -2 is so easy that most will mark the box.

Respondents are measured on the same scale as the items. Because item response theory began in educational assessment, we use the term "ability" to refer to our trust latent variable. If it helps, you can think of it as a respondent's ability to trust Subway to keep its promise. Regardless of its name, more and more items get checked as respondents score higher on the latent variable. Thus, consumers who love Subway have high ability to trust and check most of the items. Consumers who are not as fond have low ability to trust Subway and check fewer of the items. However, the checking is not random but follows item difficulty. We call this pattern of responding to items a Guttman scale.
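One way to make "follows item difficulty" concrete is to count Guttman errors, that is, item pairs in which a harder item is checked while an easier one is not. The function below is a hypothetical helper written for this illustration and is not taken from psych or ltm.

```r
# Count Guttman errors in one response pattern: the number of item pairs
# in which a harder item is checked while an easier item is not.
# Items in x must be ordered from easiest to hardest.
guttman_errors <- function(x) {
  errors <- 0
  for (i in seq_along(x)) {
    # easier item i is unchecked: count the harder items that are checked
    if (x[i] == 0) errors <- errors + sum(x[seq_along(x) > i] == 1)
  }
  errors
}

guttman_errors(c(1, 1, 1, 1, 0, 0, 0, 0))  # perfect scale pattern
guttman_errors(c(1, 1, 0, 1, 0, 0, 0, 0))  # one inversion
```

Data generated from a Rasch model are probabilistic, so some errors are expected, but they should be far fewer than under random checking.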

When you run the R code below, you will create the following figure summarizing the main finding that as "ability to trust" increases, the probability of checking each item increases with the least difficult items most likely to be checked at any location along the scale. The curves follow the ordering of item difficulty with the four items concerning food near each other at the lower end of the ability scale and the four items measuring service grouping toward the higher end. Most trust Subway to deliver on the food; fewer believe that Subway will succeed when it comes to service.
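The shape of these curves can also be sketched directly from the generating difficulties using only base R. This assumes the logistic Rasch form with discrimination 1, so it shows the curves implied by the simulation settings rather than the curves estimated by ltm.

```r
# difficulties and item names taken from the simulation code in this post
d <- c(seat = 2.0, menu = 1.5, order = 1.0, prepare = 0.5,
       taste = -0.5, filling = -1.0, healthy = -1.5, fresh = -2.0)

theta <- seq(-4, 4, by = 0.1)          # grid for "ability to trust"
icc <- plogis(outer(theta, d, "-"))    # rows: theta values, columns: items
colnames(icc) <- names(d)

# one curve per item; curves for easier items sit further to the left
matplot(theta, icc, type = "l", lty = 1, col = 1:8,
        xlab = "Ability to trust", ylab = "Probability of checking item")
legend("topleft", legend = names(d), col = 1:8, lty = 1, cex = 0.75)
```

Every curve has the same slope and differs only in location, which is the defining restriction of the Rasch model.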

To be clear, we are assuming the existence of a "fresh fast food" product category with a unique selling proposition consisting of a set of promises that can be ordered in terms of ease of delivery. Consumers, in general, understand what is promised and accept that some promises will be more difficult to keep than others. When the more difficult is achieved, we speak of exceeding expectations. Consumers see the brand as more or less committed to providing "fresh fast food" based on the brand's ability to deliver on each of the items. In fact, a brand's performance on an item is attributed to the brand's commitment. Consequently, all the items are interconnected in that they all depend on the brand as an intentional agent.

The item characteristic curves below show the entire response pattern. Each item that is checked is not an independent association with a brand. It is a reflection of the respondent's ability to trust that the brand will deliver on the "fresh fast food" value proposition. As a result, all the items are interdependent such that a change in any one item will impact all the remaining items. Moreover, marketing actions or events that directly impact the latent variable will move all the items higher or lower (e.g., corporate philanthropy or scandal).

Conclusion: If we ask about the brand outside of the purchase context, we are likely to discover a rich associative network as if the brand were a concept. However, when we embed our respondent in a realistic setting, we learn that only a small portion of that vast associative network is needed to make a purchase. If our goal is to predict consumer response in the marketplace, we will need to be more careful about how we ask the question and how we analyze the resulting data.

# use psych package to simulate Rasch model data
library(psych)
 
# need to set seed in order to obtain same result each time
set.seed(36321)
 
# 8 items with difficulty specified by vector d
# 200 respondents from normal distribution with mean 0 and variance 1
Toy<-sim.rasch(nvar=8, n=200, d=c(+2.0, +1.5, +1.0, +0.5, -0.5, -1.0, -1.5, -2.0))
 
# output is a list with binary 0 and 1 items in $items
ToyData<-Toy$items
colnames(ToyData)<-c("seat", "menu", "order", "prepare", "taste", "filling", "healthy","fresh")
 
# optional ordering of data matrix to see how pattern of 0s and 1s change with increasing total scores
item<-apply(ToyData,2,mean)
person<-apply(ToyData,1,sum)
ToyDataOrd<-ToyData[order(person),order(item)]
col_mean<-apply(ToyDataOrd,2,mean)*100
ToyDataOrd<-rbind(ToyDataOrd,col_mean)
row_sum<-apply(ToyDataOrd,1,sum)
ToyDataOrd<-cbind(ToyDataOrd,row_sum)
ToyDataOrd<-cbind(1:201,ToyDataOrd)
round(ToyDataOrd)
 
# use ltm package to run Rasch model
library(ltm)
descript(ToyData)
fit<-rasch(ToyData)
fit
plot(fit)