Wednesday, July 18, 2012

Gamification Quantification

Surveys become engaging when they become games, or at least, take on some of the characteristics of games.  This is the argument made by those advocating the gamification of marketing research [http://researchaccess.com/2011/12/market-research-trends-2012-part-one-gamification/].

Because it is a new and evolving approach to writing a survey, there is no one data collection technique that can be called gamification.  But one method that is getting some traction is the replacement of long series of ratings with a sequence of elimination tournaments.  As an example, we would stop asking respondents to rate the importance of 20 or 30 or 50 features using a 5- or 7- or 10-point scale.  Instead, respondents would see a sequence of tasks or competitions that have the effect of sorting features into a set of ordered categories.

Like all elimination tournaments, the early rounds would remove the least important features so that respondents could concentrate on making finer distinctions among the more important features.  All the features losing in the first round would receive the same score = 1.  Features removed in the second round would get the second lowest score = 2.  And this would continue until the Kth round with the remaining features receiving a score = K.
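The scoring rule can be sketched in a few lines of R. This is only an illustration: it assumes a hypothetical respondent-by-feature matrix, here called eliminated_in, that records the round in which each feature was removed, with NA marking features that survived every elimination. The object names and the toy values are not part of the original analysis.

```r
# Hypothetical input: one row per respondent, one column per feature.
# Each cell holds the round in which the feature was eliminated;
# NA means the feature survived all elimination rounds.
K <- 4
eliminated_in <- matrix(
  c( 1,  2, NA,  3,
    NA,  1,  1,  2),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("resp1", "resp2"), paste0("feature", 1:4))
)

# Features eliminated in round r keep the score r; survivors receive K.
score <- eliminated_in
score[is.na(score)] <- K
score
```

The result is the same kind of respondent-by-feature matrix of ordered categories that a grid of ratings would have produced.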

For example, in the first round all the features would be presented and the respondents would be asked which of the features pass an easy test.  "Which, if any, of the following features do you not care about at all?  That is, when you are considering what to buy, you would not notice or think about this feature."  Note that we are asking the respondent to imagine themselves in the marketplace making a product purchase.  We want to access the information that a customer would use when making an actual purchase.

The second round would provide a somewhat more severe marketplace test of the surviving features.  "Which, if any, of the following features would you consider but would not have much of an impact on your purchase?  That is, all things being equal, you might prefer a product with this feature but not enough that you would pay more for it or go out of your way to buy it."  A next obvious question would ask if they would be willing to pay more for a product with this feature.  Finally, we might end with the "deal breaker" question:  "I would not buy a product without this feature."

This sequence of increasingly difficult tests through which features must pass is not unlike the purchase funnel where potential customers must pass through the stages of awareness, interest, desire and action (AIDA) or awareness, consideration, preference and purchase.  Thus, the purchase funnel asks potential customers if they are aware, would they consider, what is their preference, and would they buy.  We could have done the same here, asking the complete set of four questions for each of the 20 to 50 features.  But we made it more of a game with less repetition over features and more competition among features.

There are no limits to the creativity of the tournament design.  We would expect that feature importance ratings would be adapted to fit the product category so that buying an airline ticket would not be handled the same as buying shampoo or subscribing to an online service.  In each case we are attempting to mimic the purchase context as closely as possible and ask about behavioral intent in the marketplace.  We do not ask the respondent to infer how important a feature is to them.  We ask, instead, what they might do or think when making a purchase and infer feature importance from their behavioral intent.

I have provided only one relatively straightforward example in order to illustrate the process.  However, you should note that we are not trying to rank order the features.  This is not a MaxDiff task where respondents must tell us which feature is most and least important.  That is, suppose that a respondent finds none of the features to have any impact on their purchase behavior.  We are not forcing respondents to tell us which of the features that they do not care about are more important to them.  We must be able to distinguish between casual users who find few features to be important and more committed users who find many features to be important.  We are only trying to sort the features into a sequence of ordered categories as if the respondents were rating each feature.  But now it's more game-like and thus more engaging.

How do we analyze the results of our tournaments?

After our sequence of tournaments, we have an ordinal scale along which features can be arrayed.  Features with the highest score are better than features with the second highest score, but we do not know how much better.  It might be the case that the last hurdle is much more difficult than any of the previous tournaments so that only the most desirable features make it over.  Consequently, getting from a 3 to a 4 may be a much more impressive feat than moving from a 2 to a 3 (i.e., the scale is not equal interval).  Nor can we know the properties of our ordinal scale until the data are collected.  That is, our sequence of increasingly more difficult tests that features need to pass will not be the same in every study.  The obstacles to buying are not the same as the obstacles to subscribing.

We will use the R statistical language to analyze our ordinal scale.  We do not want to assume that we have an equal-interval scale, because we do not.  We have a set of ordered categories into which features have been sorted by each respondent.  But neither do we wish to learn the complexities of odds ratios or the different cumulative link models needed to run ordinal regression or item response analysis.

A compromise is optimal quantification, a one-step procedure that Jan de Leeuw and Patrick Mair use in their R package called aspect.  Intuitively, quantification is a straightforward process.  This means that the underlying mathematics requires an advanced degree but the output can be interpreted and used easily by any market researcher who can read a crosstab.  Let's look at an example, which is far easier than trying to explain in the abstract.

The table below shows the results from our tournaments using 20 features and 200 respondents.


                          % Respondents           Scaled Category Scores
feature  mean    sd       1     2     3     4        1      2      3      4
      1  2.15  0.84     29%   28%   44%    0%    -1.48   0.06   0.92
      2  2.25  0.98     28%   32%   29%   12%    -1.40   0.00   0.64   1.73
      3  2.27  1.06     31%   27%   27%   16%    -1.35   0.04   0.74   1.36
      4  2.29  1.12     33%   25%   23%   20%    -1.28  -0.03   0.72   1.33
      5  2.31  1.12     30%   33%   16%   23%    -1.27  -0.17   0.76   1.38
      6  1.70  0.83     54%   22%   24%    0%    -0.85   0.37   1.56
      7  1.57  0.76     59%   25%   17%    0%    -0.79   0.75   1.73
      8  1.60  0.66     50%   41%   10%    0%    -0.97   0.77   1.71
      9  1.61  0.71     52%   35%   13%    0%    -0.89   0.63   1.86
     10  1.66  0.72     49%   37%   15%    0%    -0.88   0.41   1.96
     11  2.84  1.09     16%   21%   27%   37%    -1.54  -0.94   0.18   1.08
     12  2.96  1.14     16%   21%   17%   47%    -1.63  -0.82  -0.21   0.97
     13  2.58  1.13     24%   24%   25%   28%    -1.43  -0.39   0.27   1.29
     14  2.74  1.25     27%   15%   18%   42%    -1.30  -0.87   0.26   1.02
     15  3.42  0.91      6%   13%   16%   66%    -2.20  -1.66  -0.74   0.68
     16  1.70  0.66     42%   48%   11%    0%    -1.11   0.57   1.74
     17  2.08  1.00     35%   32%   23%   11%    -1.20   0.11   0.92   1.69
     18  1.45  0.50     55%   45%    0%    0%    -0.90   1.11
     19  1.60  0.67     51%   39%   11%    0%    -0.94   0.76   1.77
     20  1.96  1.08     49%   19%   21%   12%    -0.97   0.29   1.28   1.28
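
The summary columns of a table like this one (mean, sd, and the percentage of respondents in each category) can be computed with a few lines of base R. This is a minimal sketch on simulated toy data: ord_toy, summarize_feature, and the marginal probabilities are made up for illustration and are not the data behind the table.

```r
set.seed(123)

# Toy stand-in for the respondent-by-feature data: 200 respondents, 3 features.
ord_toy <- data.frame(
  f1 = sample(1:4, 200, replace = TRUE, prob = c(.30, .30, .30, .10)),
  f2 = sample(1:4, 200, replace = TRUE, prob = c(.15, .20, .25, .40)),
  f3 = sample(1:3, 200, replace = TRUE, prob = c(.60, .25, .15))  # never reaches 4
)

# Mean, sd, and percent of respondents in each of the ordered categories.
summarize_feature <- function(x, categories = 1:4) {
  pct <- prop.table(table(factor(x, levels = categories)))
  c(mean = mean(x), sd = sd(x),
    setNames(100 * as.numeric(pct), paste0("pct_", categories)))
}

feature_summary <- t(sapply(ord_toy, summarize_feature))
round(feature_summary, 2)
```

Fixing the factor levels at 1 through 4 keeps a 0% column for features that never reach the last round, which is how features such as #18 appear in the table.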

Looking at the means or the percentages, feature #15 is the winner.  It is eliminated by only 6% of the respondents in round 1 and almost two-thirds consider it a must-have feature.  Feature #18, on the other hand, does not do as well with more than half of the respondents telling us that they would not even notice or think about this feature and the remaining respondents saying it would not have much impact.  Feature #18 never makes it to round 3.

If all I was concerned about was identifying the best feature, then my work is done - Feature 15 is the winner.  And who likes Feature 15?  Who should be my target market?  All I have is the 66% top-box respondents.  Is this my target - the top two-thirds?

Perhaps we could narrow our target audience by using the other features.  If we looked at all the pairwise crosstabs (not shown here), we find that there is a strong relationship between Feature 12 and Feature 15 with respondents attracted to Feature 12 also attracted to Feature 15.  Could we not use Feature 12 with a smaller 47% top-box to help us narrow down our target market?  Better yet, could we not look for a factor structure underlying the 20 features and use all the features loading on the same factor as Feature 15? 

Since our feature scores are ordinal, we would not want to calculate a Pearson correlation coefficient that assumes equal intervals.  Instead, we would need to factor analyze a matrix of polychoric correlations.  Or, we could use the R package aspect to quantify the ordinal scales and give us interval-level data that we can use in any analysis.  This is what is shown in the last four columns of the above table.

So what is a scaled category score?  Well, let's see.  We would not be surprised to find that all the features were positively related either because different subgroups of features share common attributes within the subgroup (e.g., cost savings) or because respondents who are more involved with the product category tend to want more features and respondents who are less involved tend not to know or want as much from the category (e.g., heavy versus light users).  Scaled category scores use this interrelationship to place the ordinal categories along a continuum.

Let's look at Feature 15.  Everyone wants it.  Knowing that a particular respondent is one of the 66% who want Feature 15 is not very informative.  We see this in the scaled category scores for Feature 15.  A "4" is assigned a scale score of 0.68, and every respondent with a "4" on Feature 15 has their "4" replaced with 0.68.  Since these are z-scores, we know that a "4" on Feature 15 places any respondent who tells us that Feature 15 is a deal-breaker at 0.68 standard deviation units above the mean.  A "4" on Feature 12 puts the respondent almost one standard deviation above the mean.  In general, the scaled category scores for 4's are highest when there are fewer respondents giving 4's.  And a similar pattern can be seen for the 1's.
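
The z-score reading can be checked directly from the table. The sketch below copies the category percentages and scaled category scores for Features 15 and 12 and computes the respondent-weighted mean and standard deviation of the scaled scores; the helper name weighted_moments is my own. Because the table values are rounded, the results should only come out close to 0 and 1, not exactly equal.

```r
# Category shares and scaled category scores copied from the table above.
pct_15    <- c(6, 13, 16, 66) / 100
scaled_15 <- c(-2.20, -1.66, -0.74, 0.68)
pct_12    <- c(16, 21, 17, 47) / 100
scaled_12 <- c(-1.63, -0.82, -0.21, 0.97)

# Mean and standard deviation of the scaled scores, weighted by category share.
weighted_moments <- function(p, s) {
  p <- p / sum(p)               # rounded percentages need not sum to exactly 1
  m <- sum(p * s)
  c(mean = m, sd = sqrt(sum(p * (s - m)^2)))
}

m15 <- weighted_moments(pct_15, scaled_15)
m12 <- weighted_moments(pct_12, scaled_12)
round(rbind(feature15 = m15, feature12 = m12), 2)
```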

Let us assume that there is an underlying continuum called feature importance or demand.  Each feature and each respondent can be placed along this continuum.  Demanding customers who want lots of features are located toward the high end.  Casual users who are looking for only the most basic features can be found near the low end of this continuum.  Features wanted by everyone fall toward the low end because they are "easier" tests of demand.  Everyone wants these features and to know that you want them too tells me only that you exceed a low demand threshold.  Of course, if you are not interested in the most popular features, you must be at the very low end of the scale.  On the other hand, features desired by only a few are placed at the upper end of the demand continuum (that is, only the most demanding customers want these features).  Thus, quantification uses the network of relationships among the features and among the respondents to calculate both scaled category scores and scaled respondent scores.

Finally, we should remember that these types of scale transformations are not new to data analysis.  For example, we use a log transformation to normalize skewed variables such as advertising expenditures.  Nonlinear relationships are made linear using power transformations (e.g., squaring X).  In this case, we are seeking transformations of the ordinal categories (originally coded as 1, 2, 3, and 4) so that the features have the highest possible correlations.  We have optimized the sum of all the correlations by replacing the original score (the number of the round in which the feature was eliminated) with the new scaled category scores.
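
To get a feel for how rescoring categories can raise a correlation, here is a toy sketch for just two simulated ordinal variables. It alternates between the two variables, replacing each category code with the standardized mean of the other variable's current scores within that category. This is not the algorithm used by aspect, and it ignores the monotonicity restriction that an ordinal measurement level adds; the thresholds, the latent correlation of 0.6, and all object names are assumptions chosen for illustration.

```r
set.seed(1)
n  <- 200

# Two correlated latent variables, cut into four ordered categories each.
z1 <- rnorm(n)
z2 <- 0.6 * z1 + sqrt(1 - 0.6^2) * rnorm(n)
x  <- cut(z1, breaks = c(-Inf, -0.5, 0.3, 1.2, Inf), labels = FALSE)
y  <- cut(z2, breaks = c(-Inf,  0.0, 0.8, 1.5, Inf), labels = FALSE)

# Correlation when the categories keep their integer codes 1 to 4.
r_raw <- cor(x, y)

# Start from the standardized integer codes and alternate the rescoring.
qx <- as.numeric(scale(x))
qy <- as.numeric(scale(y))
for (i in 1:50) {
  qx <- as.numeric(scale(ave(qy, x)))  # score each x category by its mean qy
  qy <- as.numeric(scale(ave(qx, y)))  # score each y category by its mean qx
}
r_scaled <- cor(qx, qy)

c(raw = r_raw, rescaled = r_scaled)
sort(unique(round(qx, 2)))  # the new scores assigned to the categories of x
```

Each rescoring step replaces one variable with the function of its categories that correlates best with the other variable, so the correlation cannot go down from one step to the next.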

Conclusions

We are done.  We have transformed our ordinal categories obtained by entering the features into a sequence of increasingly more difficult tournaments.  We now have a new data set with individual respondent data, which can be analyzed using all our usual statistical techniques.  Feature importance or demand scores can be factor analyzed to determine if some features are more highly interrelated and form groups of features with common characteristics (e.g., save money).  Individual respondent scores can be clustered to determine if there are groups of customers seeking the same features.  Regression analyses can be run to predict any outcome measure of interest from the feature importance scores.  Moreover, we have engaged respondents to think about the features, not in the abstract without any referent, but within the actual purchase context.  We have kept the game realistic so that our findings can be generalized to the marketplace.
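
As a sketch of what these follow-up analyses might look like, the code below uses a simulated stand-in for the matrix of scaled scores (in practice this would be the scores object created in the appendix). The outcome variable, the choice of two clusters, and all object names are assumptions made for illustration only.

```r
set.seed(42)
n <- 200

# Simulated stand-in for the scaled scores: six features driven by one
# underlying demand dimension plus noise, each standardized like a z-score.
demand     <- rnorm(n)
scores_toy <- sapply(1:6, function(j) {
  as.numeric(scale(0.7 * demand + rnorm(n, sd = 0.7)))
})
colnames(scores_toy) <- paste0("f", 1:6)

# Hypothetical outcome measure, e.g. a purchase-intent rating.
outcome <- 2 + 0.5 * rowMeans(scores_toy) + rnorm(n, sd = 0.5)

# Cluster respondents on their scaled feature scores.
segments <- kmeans(scores_toy, centers = 2, nstart = 10)
table(segments$cluster)

# Predict the outcome from the scaled feature scores.
fit <- lm(outcome ~ ., data = data.frame(outcome, scores_toy))
summary(fit)$r.squared
```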

Appendix:  R code needed to reproduce the analysis

# generates random ordinal data with the following marginal probabilities
# and the correlations specified by the bifactor loadings

# note: orddata may need to be installed from R-Forge rather than CRAN
library(orddata)
prob <- list(
  c(30,30,40)/100,
  c(30,30,30,10)/100,
  c(30,25,30,15)/100,
  c(30,25,25,20)/100,
  c(30,30,20,20)/100,
  c(60,20,20)/100,
  c(60,25,15)/100,
  c(60,30,10)/100,
  c(55,35,10)/100,
  c(55,30,15)/100,
  c(15,20,25,40)/100,
  c(15,15,20,50)/100,
  c(25,20,25,30)/100,
  c(25,15,20,40)/100,
  c(5,10,15,70)/100,
  c(45,40,15)/100,
  c(35,35,20,10)/100,
  c(55,45)/100,
  c(55,35,10)/100,
  c(50,20,20,10)/100
  )
prob

loadings<-matrix(c(
.5,.5, 0, 0, 0,
.5,.5, 0, 0, 0,
.5,.5, 0, 0, 0,
.5,.5, 0, 0, 0,
.5,.5, 0, 0, 0,
.5,.0,.5, 0, 0,
.5,.0,.5, 0, 0,
.5,.0,.5, 0, 0,
.5,.0,.5, 0, 0,
.5,.0,.5, 0, 0,
.5,.0, 0,.5, 0,
.5,.0, 0,.5, 0,
.5,.0, 0,.5, 0,
.5,.0, 0,.5, 0,
.5,.0, 0,.5, 0,
.5,.0, 0, 0,.5,
.5,.0, 0, 0,.5,
.5,.0, 0, 0,.5,
.5,.0, 0, 0,.5,
.5,.0, 0, 0,.5),
  20, 5, byrow=TRUE)
loadings

cor_matrix<-loadings %*% t(loadings)
diag(cor_matrix)<-1
cor_matrix

ord<-rmvord(n = 200, probs = prob, Cor = cor_matrix)

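# loads the aspect package and runs the optimal quantification

library(aspect)
# treat every feature as ordinal; the default aspect maximizes the sum of the correlations
rescaled<-corAspect(ord, level="ordinal")
summary(rescaled)
# respondent-by-feature matrix with each category replaced by its scaled category score
scores<-rescaled$scoremat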
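
As a quick check on the simulation design, the bifactor loadings above imply a correlation of 0.50 between features in the same group and 0.25 between features in different groups. The sketch below rebuilds the same 20 x 5 loading pattern more compactly and confirms those two values; the object names loadings_check and implied are my own and are not used elsewhere in the appendix.

```r
# General factor: every feature loads 0.5.
general <- rep(0.5, 20)

# Four group factors: features 1-5, 6-10, 11-15, and 16-20 each load 0.5
# on their own group factor and 0 on the others.
group <- kronecker(diag(4), matrix(0.5, nrow = 5, ncol = 1))

loadings_check <- cbind(general, group)

# Correlations implied by the loadings, with the diagonal set to 1.
implied <- loadings_check %*% t(loadings_check)
diag(implied) <- 1

implied[1, 2]   # same group:      0.5 * 0.5 + 0.5 * 0.5 = 0.50
implied[1, 6]   # different group: 0.5 * 0.5             = 0.25
```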
