Tuesday, July 15, 2014

Taking Inventory: Analyzing Data When Most Answer No, Never, or None

Consumer inventories, as the name implies, are tallies of things that consumers buy, use or do. Product inventories, for example, present consumers with rather long lists of all the offerings in a category and ask which or how many or how often they buy each one. Inventories, of course, are not limited to product listings. A tourist survey might inquire about all the different activities that one might have enjoyed on their last trip (see Dolnicar et al. for an example using the R package biclust). Customer satisfaction studies catalog all the possible problems that one could experience with their car, their airline, their bank, their kitchen appliances and a growing assortment of product categories. User experience research gathers frequency data for all product features and services. Music recommender systems seek to know what you have listened to and how often. Google Analytics keeps track of every click. Physicians inventory medical symptoms.

For most inventories the list is long, and the resulting data are sparse. The attempt to be comprehensive and exhaustive produces lists with many more items than any one consumer could possibly experience. Now, we must analyze a data matrix where no, never, or none is the dominant response. These data matrices can contain counts of the number of times in some time period (e.g., purchases), frequencies of occurrences (e.g., daily, weekly, monthly), or assessments of severity and intensity (e.g., a medical symptoms inventory). The entries are all nonnegative values. Presence and absence are coded zero and one, but counts, frequencies and intensities include other positive values to measure magnitude.

An actual case study would help, however, my example of a feature usage inventory relies on proprietary data that must remain confidential. This would be a severe limitation except that almost every customer inventory analysis will yield similar results under comparable conditions. Specifically, feature usage is not random or haphazard, but organized by wants and needs and structured by situation and task. There are latent components underlying all product and service usage. We use what we want and need, and our wants and needs flow from who we are and the limitations imposed by our circumstances.

In this study a sizable sample of customers were asked how often they used a list of 72 different features. Never was the most frequent response, although several features were used daily or several times a week. As you might expect, some features were used together to accomplish the same tasks, and tasks tended to be grouped into organized patterns for users with similar needs. That is, one would not be surprised to discover a smaller number of latent components controlling the observed frequencies of feature usage.

The R package NMF (nonnegative matrix factorization) searches for this underlying latent structure and displays it in a coefficient heatmap using the function coefmap(object), where object is the name of list return by the nmf function. If you are looking for detailed R code for running nmf, you can find it in two previous posts demonstrating how to identify pathways in the consumer purchase journey and how to uncover the structure underlying partial rankings of only the most important features (top of the heap).

The following plot contains 72 columns, one for each feature. The number of rows are supplied to the function by setting the rank. Here the rank was set to ten. In the same way as one decides on the best number of factors in factor analysis or the best number of clusters in cluster analysis, one can repeat the nmf with different ranks. Ten works as an illustration for our purposes. We start by naming those latent components in the rows. Rows 3 and 8 have many reddish rectangles side-by-side suggesting that several features are accessed together as a unit (e.g., all the features needed to take, view, and share pictures with your smartphone). Rows 1, 2, 4 and 5, on the other hand, have one defining feature with some possible support features (e.g., 4G cellular connectivity for your tablet).
The dendrogram at the top summarizes the clustering of features. The right hand side indicates the presence of two large clusters spanning most of the features. Both rows 3 and 8 pull together a sizable number of features. However, these blocks are not of uniform color hinting that some features may not be used as frequently as others of the same type. Rows 6, 7, 9 and 10 have a more uniform color, although the rectangles are smaller consisting of combinations of only 2, 3 or 4 features. The remaining rows seem to be defined by a single feature each. It is in the manner that one talks about NMF as a feature clustering technique.

You can see that NMF has been utilized as a rank-reduction technique. Those 4 blocks of features in rows 6, 7, 9 and 10 appear to function as units, that is, if one feature in the block is used, then all the features in the block are used, although to different degrees as shown by the varying colors of the adjacent rectangles. It is not uncommon to see a gate-keeping feature with a very high coefficient anchoring the component with support features that are used less frequently in the task. Moreover, features with mixture coefficients across different components imply that the same feature may serve different functions. For example, you can see in row 8 a grouping of features near the middle of the row with mixing coefficients in the 0.3 to 0.6 range for both rows 3 and 8. We can see the same pattern for a rectangle of features a little more left mixing rows 3 and 6. At least some of the features serve more than one purpose.

I would like to offer a little more detail so that you can begin to develop an intuitive understanding of what is meant by matrix factorization with nonnegativity constraints. There are no negative coefficients in H, so that nothing can be undone. Consequently, the components can be thought of as building blocks for each contain the minimal feature pattern that act together as a unit. Suppose that a segment only used their smartphones to make and receive calls so that their feature usage matrix had zeroes everywhere except for everyday use of the calling features. Would we not want a component to represent this usage pattern? And what if they also used their phone as a camera but only sometimes? Since there is probably not a camera-only segment, we would not expect to see camera-related features as a standalone component. We might find, instead, a single component with larger coefficients in H for calling features and smaller coefficients in the same row of H for the camera features.

Recalling What We Are Trying to Do

It always seems to help to recall that we are trying to factor our data matrix. We start with an inventory containing the usage frequency for some 72 features (columns) for all the individual users (rows). Can we still reproduce our data matrix using fewer columns? That is, can we find fewer than 72 component scores for individual respondents that will still reproduce approximately the scores for all 72 features? Knowing only the component scores for each individual in our matrix W, we will need a coefficient matrix H that takes the component scores and calculates feature scores. Then our data matrix V is approximated by W x H (see Wikipedia for a review).

We have seen H (feature coefficients), now let's look at W (latent component scores). Once again, NMF displays usage patterns for all the respondents with a heatmap. The columns are our components, which were defined earlier in terms of the features. Now, what about individual users? The components or columns constitute building blocks. Each user can decide to use only one of the components or some combination of several components. For example, one could choose to use only the calling features or seldom make calls and text almost everything or some mixture of these two components. This property is often referred to in the NMF literature as additivity (e.g., learning the parts of objects).

So, how should one interpret the above heatmap? Do we have 10 segments, one for each component? Such a segmentation could be achieved by simply classifying each respondent as belonging to the component with the highest score. We start with fuzzy membership and force it to be all or none. For example, the first block of users at the top of column 7 can be classified as Component #7 users, where Component #7 has been named based on the features in H with the largest coefficients. As an alternative, the clustered heatmap takes the additional step of running a hierarchical cluster analysis using distances based on all 10 components. By treating the 10 components as mixing coefficients, one could select any clustering procedure to form the segments. A food consumption study referenced in an earlier post reports on a k-means in the NMF-derived latent space.

Regardless of what you do next, the heatmap provides the overall picture and thus is a good place to start. Heatmaps can produce checkerboard patterns when different user groups are defined by their usage of completely different sets of features (e.g., a mall with distinct specialty stores attracting customers with diverse backgrounds). However, this is not what we see in this heatmap. Instead, Component #7 acts almost as continuous usage intensity factor: the more ways you use your smartphone, the more you use your smartphone (e.g., business and personal usage). The most frequent flyers fly for both business and pleasure. Cars with the most mileage both commute and go on vacation. Continuing with examples will only distract from the point that NMF has enabled us to uncover structure from a large and largely sparse data matrix. Whether heterogeneity takes a continuous or discrete form, we must be able to describe it before we can respond to it.

No comments:

Post a Comment