Monday, September 29, 2014

TURF Analysis: A Bad Answer to the Wrong Question

Now that R has a package performing Total Unduplicated Reach and Frequency (TURF) Analysis, it might be a good time to issue a warning to all R users. DON'T DO IT!

The technique itself is straight out of media buying from the 1950s. Given some number of n alternative advertising options (e.g., magazines), which set of size k will reach the most readers and be seen the most often? Unduplicated reach is the primary goal because we want everyone in the target audience to see the ad. In addition, it was believed that seeing the ad more than once would make the ad more effective (that is, until wearout), which is why frequency is a component. When TURF is used to create product lines (e.g., flavors of ice cream to carry given limited freezer space), frequency tends to be downplayed and the focus placed on reaching the largest percentage of potential customers. All this seems simple enough until one looks carefully at the details, and then one realizes that we are interpreting random variation.
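To make the mechanics concrete, here is a minimal base-R sketch of what a TURF procedure computes, using simulated 0/1 purchase indicators and no respondent weights (the turfR package handles weighting and other details this toy version omits):

```r
# Brute-force TURF: enumerate every k-item set and score it by
# unduplicated reach and frequency. Data are simulated for illustration.
set.seed(42)
n_resp <- 180; n_items <- 10
buy <- matrix(rbinom(n_resp * n_items, 1, 0.3), n_resp, n_items)

turf_brute <- function(buy, k) {
  combos <- combn(ncol(buy), k)          # all k-item sets
  scores <- apply(combos, 2, function(items) {
    hits <- rowSums(buy[, items, drop = FALSE])
    c(rchX = mean(hits > 0),             # share reached by at least one item
      frqX = mean(hits))                 # average number of items bought
  })
  ord <- order(scores["rchX", ], scores["frqX", ], decreasing = TRUE)
  list(combos = combos[, ord], scores = scores[, ord])
}

top <- turf_brute(buy, 3)
top$scores[, 1:5]  # the five best triplets by reach, then frequency
```

Even in this toy version, the scores of the leading combinations cluster tightly together, which is the pattern discussed below.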

The R package turfR includes an example showing how to use its turf() function by setting n to 10 and letting k range from 3 to 6.

library(turfR)
data(turf_ex_data)
ex1 <- turf(turf_ex_data, 10, 3:6)  # n = 10 items, set sizes k = 3 to 6
ex1

This code produces a considerable amount of output. I will show only the 10 best triplets from the 120 possible sets of three that can be formed from 10 alternatives. The rchX column gives the weighted proportion of the 180 individuals in the dataset who would buy at least one of the products in the set, with the set itself flagged by 1s in the columns labeled 1 through 10. Thus, according to the first row, 99.9% would buy something if Items 8, 9, and 10 were offered for sale.

     combo      rchX      frqX    1  2  3  4  5  6  7  8  9  10
 1     120  0.998673  2.448993   0  0  0  0  0  0  0  1  1   1
 2     119  0.998673  2.431064   0  0  0  0  0  0  1  0  1   1
 3      99  0.995773  1.984364   0  0  0  1  0  0  0  1  0   1
 4     110  0.992894  2.185398   0  0  0  0  1  0  0  0  1   1
 5      64  0.991567  1.898693   0  1  0  0  0  0  0  0  1   1
 6     109  0.990983  2.106944   0  0  0  0  1  0  0  1  0   1
 7      97  0.990850  1.966436   0  0  0  1  0  0  1  0  0   1
 8     116  0.989552  2.341179   0  0  0  0  0  1  0  0  1   1
 9      85  0.989552  2.042792   0  0  1  0  0  0  0  0  1   1
10      36  0.989552  1.800407   1  0  0  0  0  0  0  0  1   1

The sales pitch for TURF depends on showing only the "best" solution for each set size from 3 through 6. Once we look down the list, we find lots of equally good combinations built from different products (e.g., the combination in the 7th position yields 99.1% reach with Products 4, 7 and 10). With a sample size of 180, I do not need to run a bootstrap to know that the drop from 99.9% to 99.1% reflects random variation or error.
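For readers who want to see the point anyway, here is a quick bootstrap sketch in base R. The purchase data are simulated (the turfR example data are not reproduced here), and the per-item purchase probability of 0.85 is an assumption chosen to put reach near the values in the table above:

```r
# Bootstrap the reach of a single triplet for 180 respondents to see
# how much it bounces around across resamples.
set.seed(7)
n_resp <- 180
buy <- matrix(rbinom(n_resp * 3, 1, 0.85), n_resp, 3)  # one triplet

reach <- function(m) mean(rowSums(m) > 0)
boot_reach <- replicate(2000, reach(buy[sample(n_resp, replace = TRUE), ]))
quantile(boot_reach, c(0.025, 0.975))
```

Under these assumptions, the resulting 95% interval is easily wide enough to cover the entire spread of reach values among the top combinations in the turfR output.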

Of course, the data from turfR are simulated, but I have worked with many clients and many different datasets across a range of categories, and I have never found anything but random differences among the top solutions. I have seen solutions where the top several hundred combinations cannot be distinguished based on reach, which is reasonable given that the number of combinations increases rapidly with n and k (e.g., the R function choose(30, 5) indicates that there are 142,506 possible combinations of 30 things taken 5 at a time). You can find an example of what I see over and over again by visiting the TURF website for XLSTAT software.

Obviously, there is no single best item combination that dominates all others. It could have been otherwise. For example, it is possible that the market consists of distinct segments with each wanting one and only one item.

With no overlap in this Venn diagram, it is clear that vanilla is the best single item, followed by vanilla and chocolate as the best pair, and so on, had there been more flavors separated in this manner.

However, consumer segments are seldom defined by individual offerings in the market. You do not stop buying toothpaste because your brand has been discontinued. TURF asks the wrong question because consumer segmentation is not item-based.

As a quick example, consider credit card reward programs with categories covering airlines, cash back, gas rebates, hotels, points, shopping and travel. Each category could contain multiple reward offers. A TURF analysis would seek the best individual rewards while ignoring the categories. Yet comparison websites use categories to organize searches because consumer segments are structured around the benefits offered by each category.

The TURF Analysis procedure from XLSTAT allows you to download an Excel file with purchase intention ratings for 27 items from 185 respondents. A TURF analysis would require that we set a cutoff score to transform the 1 through 5 ratings into a 0/1 binary measure. I prefer to maintain the 5-point scale and treat purchase intent as an intensity score after subtracting one so that the scale now ranges from 0=not at all to 4=quite sure. A nonnegative matrix factorization (NMF) reveals that the 27 items in the columns fall into 8 separable row categories: red marks a high probability of membership, and yellow (values close to zero) shows the categories where a product does not belong.

The above heatmap displays the coefficients for each of the 27 products, as the original Excel file names them. Unfortunately, we have only the numbers and no description of the 27 products. Still, it is clear that interest has an underlying structure and that perhaps we ought to consider grouping the products based on shared features, benefits or usages. For example, what do Products 5, 6 and 17 clustered together at the end of this heatmap have in common? Understand, we are looking for stable effects that can be found in the data and in the market where purchases are actually made.

The right question asks about consumer heterogeneity and whether it supports product differentiation. Different product offerings are only needed when the market contains segments seeking different benefits. Those advocating TURF analysis often use ice cream flavors as their example, as I did in the above Venn diagram. What if the benefit driving sales of less common flavors was not the flavor itself but the variety associated with a new flavor or a special occasion when one wants to deviate from the norm? A segmentation, whether NMF or another clustering procedure, would uncover a group interested in less typical flavors (probably many such flavors). This is what I found from the purchase history of whiskey drinkers: a number of segments each buying one of the major brands, plus a special-occasion or variety-seeking segment buying many niche brands. All of this is missed by a TURF analysis, which gives us instead a bad answer to the wrong question.

Appendix with R Code needed to generate the heatmap:

First, download the Excel file, convert it to csv format, and set the working directory to the location of the data file.

# read the ratings exported from the XLSTAT demo file
test <- read.csv("demoTurf.csv")

library(NMF)

# drop the respondent ID column and shift ratings from 1-5 to 0-4
fit <- nmf(test[, -1] - 1, 8, method = "lee", nrun = 20)
coefmap(fit)
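For readers curious what method = "lee" does under the hood, here is a bare-bones sketch of the Lee-Seung multiplicative updates that minimize the squared error ||V - WH||^2. This is illustrative only, run on a small simulated nonnegative matrix; the NMF package adds initialization strategies, multiple runs, and diagnostics that this sketch omits:

```r
# Minimal Lee-Seung NMF: alternate multiplicative updates of W and H,
# which keep both factors nonnegative at every step.
nmf_lee <- function(V, r, iters = 500, eps = 1e-9) {
  W <- matrix(runif(nrow(V) * r), nrow(V), r)
  H <- matrix(runif(r * ncol(V)), r, ncol(V))
  for (i in seq_len(iters)) {
    H <- H * (t(W) %*% V) / (t(W) %*% W %*% H + eps)
    W <- W * (V %*% t(H)) / (W %*% H %*% t(H) + eps)
  }
  list(W = W, H = H)
}

set.seed(1)
V <- matrix(rpois(20 * 6, 2), 20, 6)  # stand-in for the shifted ratings
fit_lee <- nmf_lee(V, r = 2)
max(abs(V - fit_lee$W %*% fit_lee$H))  # largest absolute residual
```

The rows of H play the role of the coefficients shown in coefmap(): each column of V (a product) loads on the r latent categories.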


6 comments:

  1. Interesting post! Would this kind of analysis also work on Sales data ? As in would purchases or revenues indicate the amount of interest and help in bundling of products ?

    ReplyDelete
    Replies
    1. Thank you for taking the time to comment. Nonnegative matrix factorization requires a data matrix with only zeros and positive numbers, so counts or revenues can be analyzed. Moreover, it helps to have a sparse data matrix with lots of zeros. We see this when we have separation with different rows associated with different columns. In a previous post titled, Customer Segmentation Using Purchase History: Another Example of Matrix Factorization, I provided an example using a binary variable to indicate if a brand of Scotch whiskey had been purchased within a specific time interval. Had number of purchases been available, I would have preferred to use such an intensity measure expecting it to add differentiation among the respondents. It should be noted that the data are individual-level and not store data. There is no reason not to analyze store-level data as long as you have different stores selling different products, that is, separation and sparsity.

      Delete
  2. Thanks Joel for your post. I'm the author of turfR, so I thought I'd send a quick comment. I can't disagree with your take on TURF analysis. Especially when there are many items, we frequently end up interpreting noise. If we are just presenting a single combination of a given size to our end clients, those combinations can be fraught with random noise. However, we often present far more detail than that, and that's where TURF can still be useful.

    I also think that the NMF method is a good one for finding product x person segments, and that such an approach can be far superior to TURF as you've pointed out. However, I quibble with the idea that a 1-5 purchase intent scale can or should be treated as a linear construct. Such data are better treated as 0/1 binaries. I converted your data to 0/1 (top 2 box), added a "none" column since the NMF algorithm won't handle a matrix where some of the rows are uniformly zero, and ran the same NMF model you did, now with 9 segments. The most closely associated products were #7 and #8 (similar to your analysis), but the white space in the "blank" areas of the map tended to be quite a bit whiter. I think you end up modeling a fair amount more noise than you need to by assuming 1-5 (or 0-4) is linear.

    My code:

    test <- read.table("demoTURFbin.txt", header=TRUE) #after converting to binary, and saving as a tab-delimited text file

    none <- 1 - sign(apply(test[,-1], 1, sum))
    test <- cbind(test, none)

    fit <- nmf(test[,-1], 9, method="lee")
    coefmap(fit)

    ReplyDelete
    Replies
1. Interesting and thank you for the reanalysis. It is good to see that the underlying structure seems to be maintained even when the rating scale has been dichotomized. As you note, your transformation makes the data matrix sparser, which is why you needed to add the “none” variable. In fact, the percentage of the cells that are zero increases from 32% with a 0-4 rating to 79% with your dichotomy. You have more zeros in your data matrix, so you have more “white” in your coefficient heatmap.

      Delete
    2. Sparsity is part of it, yes. I would also argue that in a data set like this, the concept of "none of these" is an important one. Not only do people gravitate towards groups of products, but some people will choose not to purchase any products [or in this instance, it might just be scale-use bias, but that's another issue altogether]. Anyway, assuming that it's a real subset of people who are not interested in any of the products, we fail to capture them as a unique segment by keeping the scale intact and not adding a "none" variable. We do capture them by dichotomizing the data and adding the necessary "none" variable. Wouldn't it be valuable to be able to profile these folks so that we can perhaps understand why a certain group is not interested at all in any of the products?

      Jumping down off my soapbox now.

      Delete
3. I appreciate your input and your contribution with the turfR package. No one gets rich maintaining a free R package when others charge for the same analysis as proprietary software. More importantly, you have been careful to display all the output so that no one is misled. You have earned a right to that soapbox.

      Delete