Preference begins with attention, a form of intention-guided perception. You enter the store thirsty on a hot summer day, and all you can see is the beverage cooler at the far end of the aisle with your focus drawn toward the cold beverages that you immediately recognize and desire. Focal attention is such a common experience that we seldom appreciate the important role that it plays in almost every activity. For instance, how are you able to read this post? Automatically and without awareness, you see words and phrases by blurring everything else in your perceptual field.
Similarly, when comparing products and deciding what to buy, you construct a simplified model of the options available and ignore all but the most important features. Selective attention simultaneously moves some aspects into the foreground and pushes everything else into the background, such is the nature of perception and cognition.
[Note: see "A Sparsity-Based Model of Bounded Rationality" for an economic perspective.]
Given that seeing and thinking are sparse by design, why not extend that sparsity to the statistical models used to describe human judgment and decision making? That cooler mentioned in the introductory paragraph is filled with beverages that fall into the goal-derived category of "things to drink on a hot summer day" and each has its own list of distinguishing features. The statistical modeling task begins with many options and even more distinguishing features so that the number of potential predictors is large. However, any particular individual selectively attends to only a small subset of products and features. This is what we mean by sparse predictive models - many variables in the equation but only a few with nonzero coefficients.
[Note: In order not to get lost in two different terminologies, one needs to be careful not to confuse sparse models with most parameters equal to zero and sparse matrices, which deals with storing and manipulating large data sets with lots of cells equal to zero.]
Statistical Learning with Sparsity
A direct approach might "bet on sparsity" and argue that only a few coefficients can be nonzero given the limitations of human attention and cognition. The R package glmnet will impose a budget on the total costs incurred from paying attention to many features when making a judgment or choice. Thus, with a limited span of attention, we would expect to be able to predict individual responses with only the most important features in the model. The modeler varies a tuning parameter controlling the limits of one's attention and watches predictors enter and leave the equation.
If everyone adopted the same purchase strategy, we could observe the purchase behavior of a group of customers and estimate a single set of parameters using glmnet. Instead of uniformity, however, we are more likely to find considerable heterogeneity with a mixture of different segments and substantial variation within each segment. All that is necessary to violate homogeneity is for the product category to have a high and low end, which is certainly the case with cold beverages. Now, the luxury consumer and the price sensitive will attend to different portions of the retail shelf and require that we be open to the possibility that our data are a mixture of different preference equations. Willingness to spend, of course, is but one of many possible ways of dividing up the product category. We could easily continue our differentiation of the cold beverage market by adding dimensions partitioning the cooler on the basis of calories, carbonation, coffees and teas, designer waters, alcohol content, and more.
Fragmented markets create problems for all statistical models assuming homogeneity and not just glmnet. Attention, the product of goal-directed intention, generates separated communities of consumers with awareness and knowledge of different brands and features within the same product category. The high-dimensional feature space resulting from the coevolving network of customer wants and product offerings forces us to identify a homogeneous consumer segment before fitting glmnet or any other predictive model. What matters in judgment and choice depends on where we focus our attention, which follows from our intentions, and intentions vary between individuals and across contexts (see Context Matters When Modeling Human Judgment and Choice).
Preference is constructed by the individual within a specific context as an intention to achieve some desired end state. Yet, the preference construction process tends to produce a somewhat limited result. A security camera placed by the rear beverage cooler would record a quick scan, followed by more activity as the search narrowed with a possible reset and another search begun or terminated without purchase. The beverage company has spent considerable money "teaching" you about their brand and the product category. You know what to look for before you enter the store because, as a knowledgeable consumer, you have learned a brand and product category representation and you have decided on an ideal positioning for yourself within this representation. For example, you know that beverages can be purchased in different size glass or plastic bottles, and you prefer the 12-ounce plastic bottle with a twist top.
Container preferences are but one of the building blocks acquired in order to complete the beverage purchase process. We can identify all the building blocks using nonnegative matrix factorization (NMF) and use that information to cluster consumers and features simultaneously. This is how we discover which consumers quickly find the regular colas and decide among the brands, types, favors, sizes, and containers available within this subcategory. Finally, we have our relatively homogenous dataset of regular cola drinkers for the glmnet R package to analyze. More accurately, we will have separate datasets for each joint customer-feature block and will need to derive different equations with different variables for different consumer segments.