December 3, 2012
It’s Time For A Change: A Shiny One

I presented rApache to the public for the first time at the Directions in Statistical Computing workshop in August 2005 (paper), more than seven years ago. It might have been novel, maybe even crazy at the time, but I think rApache showed people a new way to bring R to the web.

I presented brew, a templating framework for mixing HTML and R code, on a poster at useR! 2007. When used with rApache, it’s basically just like using PHP… but with R.

I wrote rApache to scratch an itch. I wrote brew on a whim, just to see if I could do it. They’re both open source, so anyone can use them or change them, but I’m kind of bored with them.

I like shiny. shiny is the web application framework I had hoped to write one day, but I had lost motivation and, what’s worse, lost touch with what was going on in the web programming space.

I’m going to write shiny applications like this one now.

Disclaimer: I helped implement the shiny server running on RStudio’s beta site. It’ll be open sourced soon.

10:55am  |   URL: http://tmblr.co/Zf5rDyYaPNYS
Filed under: R rstats 
November 19, 2012
RMySQL Looking For A New Maintainer

Please contact me if you’d like to take over maintainership of this popular R package.

9:43am  |   URL: http://tmblr.co/Zf5rDyXbpGZ1
Filed under: R rstats 
November 15, 2012
Innovation in Statistical Computing

In A Capitalist’s Dilemma, Whoever Wins on Tuesday, Clayton Christensen lays out three kinds of innovations through which an industry cycles:

  • Empowering Innovations - those that offer products and services to a new customer base. The classic empowering (or disruptive) innovation is Ford Motor Company’s introduction of the low-cost Model T coupled with the ability of Ford’s own workers to afford such a car.
  • Sustaining Innovations - those that improve on the value of current products and services by replacing them with newer and better ones. Christensen offers the hybrid Toyota Prius as an example.
  • and Efficiency Innovations - those that reduce the cost of making and distributing current products and services, such as steel minimills and low cost car insurance like Geico.

Today, I see this cycle coming full circle in the field of statistical computing, and specifically with R.

There is no question that John Chambers’s S system has been an empowering innovation. The S system was remarkable in that it pioneered the use of data visualization and interactive computing. Prior to S, statisticians wrote single programs to perform a single task, or they bundled these programs together into algorithmic collections or subprograms.

Without a doubt, the open source R project (not unlike S) can be viewed as a sustaining innovation. It improves on S in many ways, preserving and enhancing the interactive environment, the language, data visualization, etc. More importantly, it integrates the ability to easily download and use software located on CRAN (Comprehensive R Archive Network).

Finally, there are many efficiency innovations that have occurred with R, mainly through new R packages. There are too many to list, but Paul Murrell’s grid package gave birth to lattice and ggplot2, improving data visualization, and Hadley Wickham’s devtools package made it easy to create and distribute packages.

But the biggest efficiency innovation to alter statistical computing in R has been the creation of RStudio, an open source IDE for R. No other IDE, commercial or open source, can touch the feature set or even the quality of RStudio’s products.

Two observations about RStudio have brought me to this conclusion:

  • their complete IDE can run in the browser, offering the possibility to harness supercomputing facilities and big data from a laptop, and easing systems administration of many R users by managing only one R install.
  • and the ability to quickly create packages and share them with others. This video shows the bare minimum steps needed to bundle your code and share it with millions, in under two minutes!

Truth be told, RStudio leverages all the good work done by others. For instance, it’s Wickham’s devtools package underneath the hood driving RStudio’s packaging feature. It’s Yihui’s knitr package along with Sweave that makes writing R documentation in RStudio such a pleasure. But it’s the engineering, the stitching together of all these packages, that creates an innovative experience. And it’s too soon to tell, but we may look back on this period in history and say that RStudio was more than an efficiency innovation; it might just have been disruptive, too.

11:50am  |   URL: http://tmblr.co/Zf5rDyXKoTY0
Filed under: r rstats 
October 17, 2012
Deploy Rook Apps: Part II

In Part I, I described how you can deploy your Rook applications with rApache. This post describes how you can do it with R itself. But before we get into that, I’d like to explain the off-again on-again relationship Rook has had with CRAN, R’s package archive network.

Since inception (of Rook, not the movie), I wanted to give Rook the most flexibility possible, and that meant discovering how R’s internal web server worked. By inspecting the code from startDynamicHelp in the tools package, I discovered there were two basic calls to start and stop the server:

.Internal(startHTTPD("127.0.0.1", ports[i]))

and

.Internal(stopHTTPD())

but it turns out that inclusion of .Internal calls is a violation of CRAN’s Policy:

CRAN packages should use only the public API. Hence they should not use entry
points not declared as API in installed headers nor .Internal() nor .Call()
etc calls to base packages. Such usages can cause packages to break at any
time, even in patched versions of R.

Understood. R-Core does a herculean job of maintaining the package repository with very little human and physical capital, and ensuring that R packages behave nicely from one R release to another is a goal that all package authors should strive for. So, I yanked those calls out of Rook and now play nicely by calling startDynamicHelp.

Unfortunately, that hobbles Rook in just the slightest way; it can no longer listen on any IP address other than 127.0.0.1 … at least out of the box, but you as a Rook user are in full control of your R environment. That leads me to the following recipe for deploying a Rook app.

Yes, Using Only R, You Can Deploy A Rook App

So here’s a recipe I cooked up to circumvent R’s http environment. I don’t recommend doing this for production, but it’s nice to show a few friends and co-workers. This is an Rscript file which you can execute from the shell. It starts up Rook on port 8000 and will listen on the 0.0.0.0 IP address. That means it will listen on your loopback device as well as any other network device you have set up on your machine. If you want to be really savvy, you could even change the myPort variable to 80, like a real web server! Just know that’s a privileged port and will need root access.

The recipe adds the test application from the Rook package system files, and it’s easy to add more than one application if you like.

#!/usr/bin/env Rscript

library(Rook)

myPort <- 8000
myInterface <- "0.0.0.0"
status <- -1

# R 2.15.1 uses .Internal, but the next release of R will use a .Call.
# Either way it starts the web server.
if (as.integer(R.version[["svn rev"]]) > 59600) {
    status <- .Call(tools:::startHTTPD, myInterface, myPort)
} else {
    status <- .Internal(startHTTPD(myInterface, myPort))
}

if (status == 0) {
    unlockBinding("httpdPort", environment(tools:::startDynamicHelp))
    assign("httpdPort", myPort, environment(tools:::startDynamicHelp))

    s <- Rhttpd$new()
    s$listenAddr <- myInterface
    s$listenPort <- myPort

    # Change this line to your own application. You can add more than one
    # application if you like
    s$add(name = "test", app = system.file("exampleApps/RookTestApp.R", package = "Rook"))

    # Now make the console go to sleep. Of course the web server will still be
    # running.
    while (TRUE) Sys.sleep(24 * 60 * 60)
}

# If we get here then the web server didn't start up properly
warning("Oops! Couldn't start Rook app")

8:31pm  |   URL: http://tmblr.co/Zf5rDyVVW2Jg
Filed under: rstats 
July 27, 2012
rApache 1.2.0 Released

With this release comes a minor change in behavior: for requests that have been configured with RFileEval, RFileHandler, or using the r-script handler, rApache will set the working directory to the file’s directory.

For instance with a Rook deployment like this:

<Location /hmisc>
        SetHandler r-handler
        RFileEval "/home/hornerj/Hmisc/config.R:Rook::Server$call(app)"
</Location>

It makes sense to change the working directory to /home/hornerj/Hmisc. That way, the examples in the Rook package can work without change.

Also, for:

<Directory /home/hornerj/rapache/test/brew> 
  SetHandler r-script 
  RHandler brew::brew 
</Directory> 

and a request of /home/hornerj/rapache/test/brew/simple.html, it makes sense to set the working directory to:

/home/hornerj/rapache/test/brew

Or if the request was /home/hornerj/rapache/test/brew/subdir/foo.html, it makes sense to set it to:

/home/hornerj/rapache/test/brew/subdir
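
In R terms, the directory chosen in these examples is simply the directory part of the requested file’s path. Here is a tiny sketch of that idea; it only illustrates the rule and is not rApache’s actual code, and request_file is a made-up variable name:

```r
# Hypothetical request path, taken from the example above.
request_file <- "/home/hornerj/rapache/test/brew/subdir/foo.html"

# dirname() drops the file name and keeps the directory, which is the
# working directory described above for this request.
working_dir <- dirname(request_file)
print(working_dir)
```

The same rule applied to the RFileEval example gives /home/hornerj/Hmisc, the directory holding config.R.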

Yay for minor releases!

9:50am  |   URL: http://tmblr.co/Zf5rDyQCi2Au
Filed under: R rstats 
July 23, 2012
Deploy Rook Apps with rApache: Part I

Since rApache 1.1.15 you’ve been able to deploy your Rook applications like so:

# Run the Rook application named 'app'. On each request, the expression 
# 'Rook::Server$call(app)' is evaluated in an environment populated by
# rookapp.R. 'app' is expected to be found in that environment.
<Location /test/RookApp>
        SetHandler r-handler
        RFileEval /path/to/Rook/App/rookapp.R:Rook::Server$call(app)
</Location>

Let’s go through the above example step by step, starting with the Location directive from apache.

The Location Directive Works on URLs

In apache, the Location directive works on the URL space of the server. In this case, we are telling apache that URLs starting with /test/RookApp are hooked up to our Rook application.

SetHandler Tells Apache That R Is In Charge

Of course you know apache is modular, and one way that third party modules (like rApache) can tell apache what it can do is by registering handlers, basically text strings. When a web request comes in, apache runs through its config files and figures out what handler has been assigned to the request. Then it runs through all of the third party modules and asks each one of them if they handle the particular handler. In our example, rApache knows how to handle “r-handler” stuff. So by placing SetHandler r-handler within our Location directive above, rApache will take over handling the request.

RFileEval: An Absolute File Path And an Expression

Here comes a bit of magic. The RFileEval directive is not an apache directive. Rather, it is an rApache directive. The syntax is “file:expression”. When a request comes in, rApache will create an anonymous R environment and execute each expression located in file. The equivalent R command is something like:

sys.source(file,envir=new.env())

After that, the expression is run within the anonymous environment. In our example, the expression is Rook::Server$call(app). Rook::Server is an object from the Rook package, and app is a variable that must be found by lexical scope in the anonymous environment, so your file needs to define a Rook application named app. The name itself is arbitrary: if you call your application foo instead, change the expression to Rook::Server$call(foo).
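
To make that concrete, here is a minimal sketch of what a rookapp.R file could contain. It assumes the simplest form of a Rook application, a plain function that takes the request environment and returns a list with status, headers, and body; the response text is made up for illustration:

```r
# Sketch of a rookapp.R file. rApache sources this file into an anonymous
# environment and then evaluates Rook::Server$call(app) in it, so the
# only requirement here is that a variable named 'app' gets defined.
app <- function(env) {
    list(
        status = 200L,
        headers = list("Content-Type" = "text/plain"),
        body = "Hello from Rook"  # placeholder response text
    )
}
```

Nothing in the file needs to start a server; apache and rApache take care of that.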

Here’s the cool part: rApache keeps the anonymous environment around after the request. When a new request comes in, it checks the timestamp of the file. If it hasn’t changed, then there’s nothing left to do except run the expression Rook::Server$call(app). However, if the timestamp has changed (meaning that someone edited the file), then the file is re-evaluated in a new anonymous environment and THEN the Rook expression is run.
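
The caching behaviour can be sketched in plain R. This is only an approximation of the idea, not rApache’s implementation (which lives in C inside the apache module); eval_file_expr and file_cache are names I made up for the sketch:

```r
# Cache of evaluated files, keyed by file path. Each entry remembers the
# environment the file was sourced into and the file's modification time.
file_cache <- new.env()

eval_file_expr <- function(file, expr_text) {
    mtime <- file.info(file)$mtime
    entry <- file_cache[[file]]
    if (is.null(entry) || !identical(entry$mtime, mtime)) {
        # First request, or the file was edited since the last one:
        # source it into a fresh anonymous environment.
        env <- new.env()
        sys.source(file, envir = env)
        entry <- list(env = env, mtime = mtime)
        file_cache[[file]] <- entry
    }
    # Either way, run the expression inside the (possibly reused) environment.
    eval(parse(text = expr_text), envir = entry$env)
}
```

With Rook installed, a call such as eval_file_expr("/path/to/Rook/App/rookapp.R", "Rook::Server$call(app)") would mirror the configuration above, with the caveat that Rook::Server$call only makes sense inside a real rApache request.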

Was I right? Cool? Cool. Expect more deployment posts in the following… days… hopefully.

4:08pm  |   URL: http://tmblr.co/Zf5rDyPyi-bR
Filed under: R rstats 
June 29, 2012
Wrap-up on Blogging with R Markdown and tumblr

This is a wrap-up post to summarize a few of the issues I’ve found so far with blogging on tumblr with R Markdown.

tumblr Puts a 1Mb Cap On Its HTML Editor

Fair warning.

When I tried eating my own dogfood while writing the previous posts, I found that I had to manually upload all those pretty screenshots of the tumblr interface. For some reason, tumblr was truncating the HTML I was pasting into its editor. By trial and error, I found out that they place a cap of around 1Mb on the HTML. That’s essentially 96 R plots at 504x432 pixels. How do I know? Because I placed this bit of code:

for (i in 1:96) 
   plot(rnorm(i), main = paste(i, "Squares"), 
    col = rainbow(i, alpha = runif(i, 0, 1))[round(runif(i, 1, i))], 
    pch = ".", cex = round(runif(i, 1, 100)))

into an R Markdown file, rendered it to HTML with markdownToHTML(), and uploaded it to my test blog http://testerester.tumblr.com a number of times. Maxed out at 96. Regardless, that’s 96 images I didn’t have to upload manually!
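
If you’d rather check before pasting, a quick size test on the rendered file does the job. This is a rough sketch: I only know the cap is around 1Mb, so the exact byte count below is an assumption, and fits_tumblr_editor is a made-up helper name:

```r
# Assumed cap: the limit is only known to be "around 1Mb", so 1024^2
# bytes is a guess rather than a documented tumblr number.
tumblr_cap_bytes <- 1024^2

fits_tumblr_editor <- function(html_file, cap = tumblr_cap_bytes) {
    # file.info() reports the size on disk in bytes.
    size <- file.info(html_file)$size
    size <= cap
}
```

Calling fits_tumblr_editor("First-Post.html") before you copy the file tells you whether the paste is likely to be truncated.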

R Highlighting Is Now Fixed

I presumed that the hosted version of highlight.js contained a language definition for R. It actually does not, but it’s easy to include one. I’ve done so and am now hosting my own highlight.pack.js on rapache.net here:

http://rapache.net/stylesheets/highlight.pack.js

and have updated the tumblr R Markdown theme here:

https://github.com/jeffreyhorner/RFMExamples/blob/master/R-Markdown-tumblr-theme.html

You Can Drop the Save in the Edit/Save/Knit Iteration

JJ Allaire assures me that you don’t have to save your R Markdown document before you knit it in RStudio. It is saved automatically, and dropping the save action speeds up iterative development by a factor of 1.5!

3:17pm  |   URL: http://tmblr.co/Zf5rDyONYaXZ
Filed under: R rstats 
June 26, 2012
Blog with R Markdown and tumblr: Part II

In Part I of this series I described how to set up your tumblr blog so that you can create posts like those on the example site R Markdown Blog.

Now I’ll describe how you can actually create such posts. I’ll be using the RStudio IDE for the desktop in all the steps below, but know that you can use your own version of R and your own editor for steps 1, 2, and 4. I personally like the RStudio knitr integration. It provides a really easy and fast iterative process to quickly edit markdown and render to HTML.

Step 1: Install The Latest Version of the R markdown package

markdown version 0.5.2 is needed for this process, and since it’s currently not on CRAN (it’s on its way) you will need to get it from github. This is easily done with Hadley Wickham’s devtools package. Follow these steps to install devtools, markdown, and knitr, which you will need in later steps:

install.packages("devtools")
library(devtools)
install_github("markdown", "rstudio")
library(markdown)
install.packages("knitr")
library(knitr)
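
To confirm that the installed markdown is new enough, a small check like the one below can help. It is a sketch that assumes a reasonably current R where requireNamespace() is available; has_min_version is a made-up helper name:

```r
# Returns TRUE only if the package is installed and at least min_version.
has_min_version <- function(pkg, min_version) {
    requireNamespace(pkg, quietly = TRUE) &&
        utils::packageVersion(pkg) >= min_version
}

# The version this walkthrough needs.
print(has_min_version("markdown", "0.5.2"))
```

If this prints FALSE, repeat the install_github() step above before moving on.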

Step 2: Create a New R Markdown Document

In RStudio, click on File -> New -> R Markdown. This will create a new untitled file with some example markdown text. The first two lines of the file contain a proper title for the document in markdown syntax, but we won’t need that for our blog post. Go ahead and delete them.

Save the file and name it First-Post.Rmd.

Step 3: Click the “Knit HTML” Button

That button is just above the first line of the file. You should see a ball of yarn with a knitting needle sticking in it. After clicking the button you should see a couple of windows flicker by with info, and then ultimately this:

[Screenshot: rendered html]

If your window looks like this, then congratulations! You just created a valid R Markdown document and rendered it into an HTML page. This step automatically creates a new file called First-Post.html, but we’re not ready to blog just yet.

Side Note About Iterative Development

If your window doesn’t look like the above, then you’ve got some editing to do. You will now enter an iterative edit/save/knit loop, and this is where RStudio really shines. Here are the steps:

  1. Make your edits to First-Post.Rmd.
  2. Type Ctrl-s to save.
  3. Type Ctrl-Shift-h to re-knit the document. This is equivalent to clicking the “Knit HTML” button.
  4. If you get the output you want, you’re done. If not, go to 1.

Simple as that!

Suppose you’re not using RStudio. Then you can still get pretty close to the above. Using your favorite editor, your favorite browser, and another R IDE, follow these steps:

  1. Make your edits to First-Post.Rmd, then save.
  2. Execute the following commands in R:
library(knitr)
knit2html("First-Post.Rmd")   # knit and render to First-Post.html
browseURL("First-Post.html")  # open the result in your browser

Your browser should open with First-Post.html displayed. If you get the output you want, hurray! Otherwise go to step 1.

So goes iterative development ;) Now on to blogging…

Step 4: Execute knit() and markdownToHTML() Manually

Up to this step, we’ve been using knit commands to create complete HTML documents, those that contain beginning HTML tags like <html>, <head>, and <body>… tags that blogging platforms generally will not accept in their blog posts. These commands also inject javascript code to render math equations and highlight R code chunks, which is nice, but those are already in the tumblr theme we created in Part I (something I didn’t tell you before).

So now our job is to prepare the HTML without these tags and extra javascript. We need to call knit() and then markdownToHTML() with the fragment.only option set to TRUE. Run these steps manually in RStudio (or R):

knit("First-Post.Rmd")

# produces the intermediate file 'First-Post.md'

markdownToHTML("First-Post.md", "First-Post.html", fragment.only = TRUE)
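
If you want to double check that the result really is a fragment, a rough test is to look for the document-level tags. The sketch below uses only base R; is_html_fragment is a made-up helper name, and the pattern is a simple heuristic rather than a real HTML parser:

```r
# TRUE when the file contains none of the document-level opening tags
# (<html>, <head>, <body>) that a full HTML page would start with.
is_html_fragment <- function(html_file) {
    html <- paste(readLines(html_file, warn = FALSE), collapse = "\n")
    !grepl("<(html|head|body)[ >]", html, ignore.case = TRUE)
}
```

Running is_html_fragment("First-Post.html") after the commands above should return TRUE; after knit2html() it should return FALSE.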

Now open up First-Post.html in your editor and you should see the following:

[Screenshot: rstudio first post]

I’ve highlighted lines 1 and 22 with a red circle. Notice on line 1 that there’s no beginning HTML markup as I described above. That’s good, and your output should look similar if not the same.

Also notice on line 22 that the <img> tag looks a little unusual. That’s because by default markdownToHTML will automatically embed locally linked images using base64 encoding. You really don’t need to know how it’s encoded, but just know that your browser will show you the image that you were expecting. That’s the beauty of the markdown package. You have just one HTML document that contains all your codes and plots!
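
You can also count the embedded images without reading the HTML yourself. Here is a small sketch in base R; count_embedded_images is a made-up helper name, and the pattern assumes the src attribute is written with double quotes and starts with a data:image/…;base64 prefix:

```r
# Counts <img> tags whose src is an inline base64 data URI rather than
# a link to a separate image file.
count_embedded_images <- function(html_file) {
    html <- paste(readLines(html_file, warn = FALSE), collapse = "\n")
    hits <- gregexpr("<img[^>]*src=\"data:image/[a-z]+;base64,", html)[[1]]
    # gregexpr() returns -1 when there is no match at all.
    sum(hits > 0)
}
```

For the default R Markdown example, count_embedded_images("First-Post.html") should match the number of plots in the document.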

Okay, now we leave RStudio for a second and go to tumblr…

Step 5: Log In to tumblr and Click the “Gear” Icon

I’ve highlighted it in red:

[Screenshot: gear icon]

Step 6: Click the “plain text/HTML” Radio Button, Then Click “Save”

The blog posts we are creating will contain HTML, so we want to ensure that we’re using the correct editor, highlighted in red:

[Screenshot: plain editor]

Note that the “Save” button is at the bottom of the page, so you’ll probably have to use the scroll bar on your browser to get there.

Step 7: Click on the “Text” Icon to Start a New Blog Post

I still get tripped up on this as I haven’t blogged much in the past, but once you log in to your tumblr account and click on your blog name, you will want to click the icon circled in red below to start a new “Text” blog post:

[Screenshot: new text blog]

Be sure that you see “HTML enabled” highlighted below in red:

[Screenshot: text post editor]

Go ahead and fill out the Title with “Text” (or something else to your liking).

Step 8: Copy First-Post.html

Switch back to RStudio, and copy the entire text of First-Post.html using Ctrl-a then Ctrl-c (or your favorite incantation).

Step 9: Paste First-Post.html Into the tumblr Editor, Then Click “Create Post”

Now switch back to your browser, paste First-Post.html (Ctrl-v) into the tumblr editor, scroll down to the bottom of the page and click the “Create Post” button.

You will now be taken back to your Dashboard and you should see something like this:

[Screenshot: dashboard post]

What you are now looking at is your Dashboard’s interpretation of your blog post. That’s okay, but what you really want to see is your blog. Click on the button I’ve highlighted in red above to get to your blog, and you should see something like the example I created at http://jeffrey-horner.tumblr.com/.

Step 10: Done!

Congratulations! You just created your first blog post with R Markdown. Now go back to R and create some meaningful statistical content that we can all learn from! And don’t forget to blog about it!

Cheers!

1:31pm  |   URL: http://tmblr.co/Zf5rDyOAOLKZ
Filed under: R rstats 
June 24, 2012
Blog with R Markdown and tumblr: Part I

I finally got a chance this weekend to settle on a way to include R Markdown into my blogging process. I needed to do this as my subsequent postings will involve more code chunks regarding Rook deployment and examples, and R Markdown formats and highlights code chunks like a boss! If you want to incorporate R code, math equations, and R plots like this example, follow these steps to create a tumblr blog and get your theme ready to write your first post.

Step 1: Create a tumblr Blog

Easy enough, just go to http://www.tumblr.com, sign up, log in, and follow the steps to name your first blog. There will be other interstitial steps like asking how old you are, asking who you want to follow, etc. The important thing is to name your new blog and get to your Dashboard. It will look something like this:

[Screenshot: tumblr dashboard]

tumblr has done a great job of creating a simple visual interface, but sometimes even the interface can break down and leave the user confused about where they are and what they need to do next. For instance, notice that in the red circle and just under the word Dashboard is a small triangle as if it’s pointing to the word. The light-blue on dark-blue color scheme hides that a bit, and maybe you can change that? I don’t know, but that’s the indicator that you’re on your tumblr Dashboard screen.

Step 2: Click on the Name of Your Blog

Just to the right of the word Dashboard in the screen above, you should see the word Untitled. That’s the name of the blog in this example. Click on it and you should see:

[Screenshot: tumblr blog dashboard]

Now that little triangle thingy has moved underneath the word Untitled. That’s where we want to be.

Step 3: Click on “Customize theme”

Now, before you see the following screen, you may see a screen that tells you to verify your email address. Go do that and then get back to your blog page by following Step 2 above. Done? Okay, here’s what you’ll see:

[Screenshot: customize theme]

Step 4: Click on “Edit HTML”

You’re now presented with a screen with two columns: on the left is an editor with the html code of your theme, and on the right is a preview of what your blog looks like based on the contents of the editor.

[Screenshot: edit theme]

Step 5: Pay Attention!

Because the next steps are really important.

Step 6: Delete Everything in the Editor

I typically do this by making sure my mouse focus is in the editor, typing Ctrl-a (hold down the “Ctrl” key and press “a” once), and then pressing the Delete key. Make sure your editor window now looks like this:

[Screenshot: empty editor]

Step 7: Copy the R Markdown tumblr Theme

In a new browser tab or window, surf to

https://raw.github.com/jeffreyhorner/RFMExamples/master/R-Markdown-tumblr-theme.html

and copy the entire file by typing Ctrl-a and then Ctrl-c (or your computer may have a variant of this, for instance on Macs you would use Command-a and then Command-c, at least I think that’s right).

Step 8: Paste it into the Editor and Click “Update preview”

Now switch back to the browser tab or window that contains the tumblr editor and paste the contents into the editor, then click the green Update preview button located at the top of the screen. It should look something like this:

[Screenshot: updated preview]

Step 9: Click the “Save” Button, Then the “Appearance” Button, Then the “Close” Button

Hurray! You just updated your blog theme! Now you’re well on your way to creating your first blog post with R Markdown. I’ll cover that in Part II tomorrow.

3:09pm  |   URL: http://tmblr.co/Zf5rDyO24RBU
Filed under: R rstats 
June 13, 2012
Rook Tutorial at useR! 2012

I had such a blast presenting my tutorial on Rook yesterday. Thanks go out to all who
attended!

All the slides are online here and I’ll be updating my RookTutorial github project with
all the great suggestions I got from the attendees.

Also, check back soon as I’m planning more postings on Rook.

Cheers!

12:47pm  |   URL: http://tmblr.co/Zf5rDyNKDGWe
Filed under: R rstats