Friday, November 14, 2014

In Praise of Substantive Expertise in Data Science

Substantive expertise makes it into the Data Science Venn Diagram from DataCamp's infographic on how to become a data scientist. It's one of the three circles of equal size along with programming and statistics. Regrettably, substantive expertise is never mentioned in the definition of a data scientist as "someone who is better at statistics than any software engineer and better at software engineering than any statistician." And it gets no step. Statistics is the first step, and the remaining steps cover programming in all its varying forms. "Alas, poor Substance! I knew him, DataCamp."

All of this, of course, is to be taken playfully. I have no quarrel with any of DataCamp's 8-step program. I only ask that we recognize that there are three circles of equal value. Some of us come to data science with substantive expertise and seeking new models for old problems. Some even contribute libraries applying those models in their particular areas of substantive expertise. R provides a common language through which we can visit foreign disciplines and see the same statistical models from a different perspective.

John Chambers reminds us in his UseR! 2014 keynote address that R began as a "user-centric scientific software tool" providing "an interface to the very best numerical algorithms." Adding an open platform for user-submitted packages, R also becomes the interface to a diverse range of applications. This is R's unique selling proposition. It is where one goes for new ways of seeing.

No comments:

Post a Comment