November 15, 2012
Innovation in Statistical Computing

In “A Capitalist’s Dilemma, Whoever Wins on Tuesday,” Clayton Christensen lays out three kinds of innovations through which an industry cycles:

  • Empowering Innovations - those that offer products and services to a new customer base. The classic empowering (or disruptive) innovation is Ford Motor Company’s introduction of the low-cost Model T coupled with the ability of Ford’s own workers to afford such a car.
  • Sustaining Innovations - those that improve on the value of current products and services by replacing them with newer and better ones. Christensen offers the hybrid Toyota Prius as an example.
  • and Efficiency Innovations - those that reduce the cost of making and distributing current products and services, such as steel minimills and low-cost car insurers like Geico.

Today, I see this cycle coming full circle in the field of statistical computing, and specifically with R.

There is no question that John Chambers’s S system has been an empowering innovation. The S system was remarkable in that it pioneered the use of data visualization and interactive computing. Prior to S, statisticians wrote single programs to perform a single task, or they bundled these programs together into algorithmic collections or subprograms.

Without a doubt, the open source R project (not unlike S) can be viewed as a sustaining innovation. It improves on S in many ways, preserving and enhancing the interactive environment, the language, data visualization, etc. More importantly, it integrates the ability to easily download and use software located on CRAN, the Comprehensive R Archive Network.

Finally, there are many efficiency innovations that have occurred with R, mainly through new R packages. There are too many to list, but Paul Murrell’s grid package gave birth to lattice and ggplot2, improving data visualization, and Hadley Wickham’s devtools package made it easy to create and distribute packages.

But the biggest efficiency innovation to alter statistical computing in R has been the creation of RStudio, an open source IDE for R. No other IDE, commercial or open source, can touch the feature set or even the quality of RStudio’s products.

Two observations about RStudio have brought me to this conclusion:

  • their complete IDE can run in the browser, offering the possibility to harness supercomputing facilities and big data from a laptop, and easing systems administration for many R users, since only one R install has to be managed.
  • and it offers the ability to quickly create packages and share them with others. This video shows the bare minimum steps needed to bundle your code and share it with millions, in under two minutes!

Truth be told, RStudio leverages all the good work done by others. For instance, it’s Wickham’s devtools package under the hood driving RStudio’s packaging feature. It’s Yihui Xie’s knitr package along with Sweave that makes writing R documentation in RStudio such a pleasure. But it’s in the engineering, the stitching together of all these packages, that an innovative experience is created. And it’s too soon to tell, but we may look back on this period in history and say that RStudio was more than an efficiency innovation; it might just have been disruptive, too.

Posted by jeffreyhorner