Opening paragraph from John Tukey "The Future of Data Analysis" (1962)
To begin, we must acknowledge that these labels are largely administrative based on who signs your paycheck. Still, I prefer the name "data analysis" with its active connotation. I understand the desire to rebrand data analysis as "data science" given the availability of so much digital information. As data has become big, it has become the star and the center of attention.
We can borrow from Breiman's two cultures of statistical modeling to clarify the changing focus. If our data collection is directed by a generative model, we are members of an established data modeling community and might call ourselves statisticians. On the other hand, the algorithmic modeler (although originally considered a deviant but now rich and sexy) took whatever data was available and made black box predictions. If you need a guide to applied predictive modeling in R, Max Kuhn might be a good place to start.
Nevertheless, causation keeps sneaking in through the back door in the form of causal networks. As an example, choice modeling can be justified as an "as if" predictive modeling but then it cannot be used for product design or pricing. As Judea Pearl notes, most data analysis is "not associational but causal in nature."
Does an inductive bias or schema predispose us to see the world as divided into causes and effects with features creating preference and preference impacting choice? Technically, the hierarchical Bayes choice model does not require the experimental manipulation of feature levels, for example, reporting the likelihood of bus ridership for individuals with differing demographics. Even here, it is difficult not be see causation at work with demographics becoming stereotypes. We want to be able to turn the dial, or at least selection different individuals, and watch choices change. Are such cognitive tendencies part of statistics?
Moreover, data visualization has always been an integral component in the R statistical programming language. Is data visualization statistics? And what of presentations like Hans Rosling's Let My Dataset Change Your Mindset? Does statistics include argumentation and persuasion?
Hadley Wickham and the Cognitive Interpretation of Data Analysis
You have seen all of his data manipulation packages in R, but you may have missed the theoretical foundations in the paper "A Cognitive Interpretation of Data Analysis" by Grolemund and Wickham. Sensemaking is offered as an organizing force with data analysis as an external tool to aid understanding. We can make sensemaking less vague with an illustration.
Perceptual maps are graphical displays of a data matrix such as the one below from an earlier post showing the association between 14 European car models and 27 attributes. Our familiarity with Euclidean spaces aid in the interpretation of the 14 x 27 association table. It summarizes the data using a picture and enables us to speak of repositioning car models. The joint plot can be seen as the competitive landscape and soon the language of marketing warfare brings this simple 14 x 27 table to life. Where is the high ground or an opening for a new entry? How can we guard against an attack from below? This is sensemaking, but is it statistics?
We can borrow from Breiman's two cultures of statistical modeling to clarify the changing focus. If our data collection is directed by a generative model, we are members of an established data modeling community and might call ourselves statisticians. On the other hand, the algorithmic modeler (although originally considered a deviant but now rich and sexy) took whatever data was available and made black box predictions. If you need a guide to applied predictive modeling in R, Max Kuhn might be a good place to start.
Nevertheless, causation keeps sneaking in through the back door in the form of causal networks. As an example, choice modeling can be justified as an "as if" predictive modeling but then it cannot be used for product design or pricing. As Judea Pearl notes, most data analysis is "not associational but causal in nature."
Does an inductive bias or schema predispose us to see the world as divided into causes and effects with features creating preference and preference impacting choice? Technically, the hierarchical Bayes choice model does not require the experimental manipulation of feature levels, for example, reporting the likelihood of bus ridership for individuals with differing demographics. Even here, it is difficult not be see causation at work with demographics becoming stereotypes. We want to be able to turn the dial, or at least selection different individuals, and watch choices change. Are such cognitive tendencies part of statistics?
Moreover, data visualization has always been an integral component in the R statistical programming language. Is data visualization statistics? And what of presentations like Hans Rosling's Let My Dataset Change Your Mindset? Does statistics include argumentation and persuasion?
Hadley Wickham and the Cognitive Interpretation of Data Analysis
You have seen all of his data manipulation packages in R, but you may have missed the theoretical foundations in the paper "A Cognitive Interpretation of Data Analysis" by Grolemund and Wickham. Sensemaking is offered as an organizing force with data analysis as an external tool to aid understanding. We can make sensemaking less vague with an illustration.
Perceptual maps are graphical displays of a data matrix such as the one below from an earlier post showing the association between 14 European car models and 27 attributes. Our familiarity with Euclidean spaces aid in the interpretation of the 14 x 27 association table. It summarizes the data using a picture and enables us to speak of repositioning car models. The joint plot can be seen as the competitive landscape and soon the language of marketing warfare brings this simple 14 x 27 table to life. Where is the high ground or an opening for a new entry? How can we guard against an attack from below? This is sensemaking, but is it statistics?
I consider myself to be a marketing researcher, though with a PhD, I get more work calling myself a marketing scientist. I am a data analyst and not a statistician, yet in casual conversation I might say that I am a statistician in the hope that the label provides some information. It seldom does.
I deal in sensemaking. First, I attempt to understand how consumers make sense of products and decide what to buy. Then, I try to represent what I have learned in a form that assists in strategic marketing. My audience has no training in research or mathematics. Statistics plays a role and R helps, but I never wanted to be a statistician. Not that there is anything wrong with that.
I deal in sensemaking. First, I attempt to understand how consumers make sense of products and decide what to buy. Then, I try to represent what I have learned in a form that assists in strategic marketing. My audience has no training in research or mathematics. Statistics plays a role and R helps, but I never wanted to be a statistician. Not that there is anything wrong with that.