R may be the lingua franca, yet many of the packages within the R library seem to be written in different languages. We can follow the R code because we know how to program but still feel that we have missed something in the translation.
R provides an open environment for code from different communities, each with their own set of exemplars, where the term "exemplar" has been borrowed from Thomas Kuhn's work on normal science. You need only to examine the datasets that each R package includes to illustrate its capabilities in order to understand the diversity of paradigms spanned. As an example, the datasets from the Clustering and Finite Mixture Task View demonstrate the dependence of the statistical models on the data to be analyzed. Those seeking to identifying communities in social networks might be using similar terms as those trying to recognize objects in visual images, yet the different referents (exemplars) change the meanings of those terms.
Thinking in Terms of Causes and Effects
Of course, there are exceptions, for instance, regression models can be easily understood across applications as the "pulling of levers" especially for those of us seeking to intervene and change behavior (e.g., marketing research). Increased spending on advertising yields greater awareness and generates more sales, that is, pulling the ad spending lever raises revenue (see the R package CausalImpact). The same reasoning underlies choice modeling with features as levers and purchase as the effect (see the R package bayesm).
The above picture captures this mechanistic "pulling the lever" that dominates much of our thinking about the marketing mix. The exemplar "explains" through analogy. You might prefer "adjusting the dials" as an updated version, but the paradigm remains cause-and-effect with each cause separable and under the control of the marketer. Is this not what we mean by the relative contribution of predictors? Each independent variable in a regression equation has its own unique effect on the outcome. We pull each lever a distance of one standard deviation (the beta weight), sum the changes on the outcome (sometimes theses betas are squared before adding), and then divide by the total.
The Challenge from Neural Networks
So, how do we make sense of neural networks and deep learning? Is the R package neuralnet simply another method for curve fitting or estimating the impact of features? Geoffrey Hinton might think differently. The Intro Video for Coursera's Neural Networks for Machine Learning offers a different exemplar - handwritten digit recognition. If he is curve fitting, the features are not given but extracted so that learning is possible (i.e., the features are not obvious but constructed from the input to solve the task at hand). The first chapter of Michael Nielsen's online book, Using Neural Nets to Recognize Handwritten Digits, provides the details. Isabelle Guyon's pattern recognition course adds an animated gif displaying visual perception as an active process.
On the other hand, a choice model begins with the researcher deciding what features should be varied. The product space is partitioned and presented as structured feature lists. What alternative does the consumer have, except to respond to variations in the feature levels? I attend to price because you keep changing the price. Wider ranges and greater variation only focus my attention. However, in real setting the shelves and the computer screens are filled with competing products waiting for consumers to define their own differentiating features. Smart Watches from Google Shopping provides a clear illustration of the divergence of purchase processes in the real world and in the laboratory.
To be clear, when the choice model and the neural network speak of input, they are referring to two very different things. The exemplars from choice modeling are deciding how best to commute and comparing a few offers for same product or service. This works when you are choosing between two cans of chicken soup by reading the ingredients on their labels. It does not describe how one selects a cheese from the huge assortment found in many stores.
Neural networks take a different view of the task. In less than five minutes Hinton's video provides the exemplar for representation learning. Input enters as it does in real settings. Features that successfully differentiate among the digits are learned over time. We see that learning in the video when the neural net generates its own handwritten digits for the numbers 2 and 8. It is not uncommon to write down a number that later we or others have difficulty reading. Legibility is valued so that we can say that an easier to read "2" is preferred over a "2" that is harder to identify. But what makes one "2" a better two than another "2" takes some training, as machine learning teaches us.
We are all accomplished at number recognition and forget how much time and effort it took to reach this level of understanding (unless we know young children in the middle of the learning process). What year is MCMXCIX? The letters are important, but so are their relative positions (e.g. X=10 and IX=9 in the year 1999). We are not pulling levers any more, at least not until the features have been constructed. What are those features in typical choice situations? What you want to eat for breakfast, lunch or dinner (unless you snack instead) often depends on your location, available time and money, future dining plans, motivation for eating, and who else is present (context-aware recommender systems).
Adopting a different perspective, our choice modeler sees the world as well-defined and decomposable into separate factors that can be varied systematically according to some experimental design. Under such constraints the consumer behaves as the model predicts (a self-fulling prophecy?). Meanwhile, in the real world, consumers struggle to learn a product representation that makes choice possible.
Thinking Outside the Choice Modeling Box
The features we learn may be relative to the competitive set, which is why adding a more expensive alternative makes what is now the mid-priced option appear less expensive. Situation plays an important role for the movie I view when alone is not the movie I watch with my kids. Framing has an impact, which is why advertising tries to convince you that an expensive purchase is a gift that you give to yourself. Moreover, we cannot forget intended usage for that Smartphone is a camera, a GPS, and I believe you get the point. We may have many more potential features than included in our choice design.
It may be the case that the final step before purchase can be described as a tradeoff among a small set of features varying over only a few alternatives in our consideration set. If we can mimic that terminal stage with a choice model, we might have a good chance to learn something about the marketplace. How did the consumer get to that last choice point? Why these features and those alternative products or services? In order to answer such questions, we will need to look outside the choice modeling box.