Friday, January 10, 2014

Finding the R community a barrier to entry, Python looks elsewhere for lunch

Tal Yarkoni's post, "The homogenization of scientific computing, or why Python is steadily eating other languages' lunch," is an enjoyable account of his transition from R to Python. He makes a good case, and I have no argument with his reasoning or the importance of Python in his work. But my experience has not been the same. I am a methodologist working in marketing. I could have called myself a data analyst in the sense that John Tukey used that term back in his 1962 paper on The Future of Data Analysis. Bill Venables speaks of R in a similar manner and quotes Tukey in his keynote at UseR! 2012: "Statistics work is detective work!" I like that description.

So when I turn to R, I am looking for more than code. "The game is afoot!" I require all the usual tools and perhaps something new or from another field of research. As an example, marketing is concerned with heterogeneity because "one size does not fit all." But every field is concerned with heterogeneity. It's the second moment of a distribution, the spread around the mean. We refer to it as heterogeneity in marketing, but you might call it variability, variation, dispersion, spread, diversity, or individual differences. There are even more words for the attempt to summarize and explain the second moment: density estimation, finite mixtures, seriation, sorting, clustering, grouping, segmenting, graph cutting, partitioning, and tessellation. R has a package for every term, written from many differing points of view, with more on the way every day.

Detective work borrows whatever assists in the hunt. As a marketing scientist trying to understand customer heterogeneity, I find that R provides everything I need for clustering and finite mixture modeling. Moreover, R contributors provide more than programs; they write some of the best and most insightful papers in the field. However, why restrict myself to traditional approaches to understanding heterogeneity when R includes access to archetypal analysis, item response theory, and latent variable mixture models? These are three very different approaches that I can borrow only because they share a common language, R. It is extremely difficult to learn from fields with a different vocabulary. Even if the underlying math is the same, everything is called by a different name. R imposes constraints on the presentation of the material so that comprehension is still difficult but no longer impossible.

Of course, Python also has a mixture package, and perhaps at some point in the future we will see a Python community that will compete with R. Until then, Python will have to skip lunch.

