I presented rApache to the public for the first time at the Directions in Statistical Computing workshop in August 2005 (paper), almost seven years ago. It might have been novel, maybe even crazy at the time, but I think rApache showed people a new way to bring R to the web.
I presented brew, a templating framework for mixing HTML and R code, on a poster at useR! 2007. When used with rApache, it’s basically just like using PHP… but with R.
I wrote rApache to scratch an itch. I wrote brew on a whim, just to see if I could do it. They’re both open source, so anyone can use them or change them, but I’m kind of bored with them.
I like shiny. shiny is the web application framework I had hoped to write one day, but I had lost motivation and, what’s worse, lost touch with what was going on in the web programming space.
Empowering Innovations - those that offer products and services to a new customer base. The classic empowering (or disruptive) innovation is Ford Motor Company’s introduction of the low-cost Model T coupled with the ability of Ford’s own workers to afford such a car.
Sustaining Innovations - those that improve on the value of current products and services by replacing them with newer and better ones. Christensen offers the hybrid Toyota Prius as an example.
and Efficiency Innovations - those that reduce the cost of making and distributing current products and services, such as steel minimills and low cost car insurance like Geico.
Today, I see this cycle coming full circle in the field of statistical computing, and specifically with R.
There is no question that John Chambers’s S system has been an empowering innovation. The S system was remarkable in that it pioneered the use of data visualization and interactive computing. Prior to S, statisticians wrote single programs to perform a single task, or they bundled these programs together into algorithmic collections or subprograms.
Without a doubt, the open source R project (not unlike S) can be viewed as a sustaining innovation. It improves on S in many ways, preserving and enhancing the interactive environment, the language, data visualization, etc. More importantly, it integrates the ability to easily download and use software located on CRAN (Comprehensive R Archive Network).
Finally, there are many efficiency innovations that have occurred with R, mainly through new R packages. There are too many to list, but Paul Murrell’s grid package gave birth to lattice and ggplot2, improving data visualization, and Hadley Wickham’s devtools package made it easy to create and distribute packages.
But the biggest efficiency innovation to alter statistical computing in R has been the creation of RStudio, an open source IDE for R. No other IDE, commercial or open source, can touch the feature set or even quality of RStudio’s products.
Two observations about RStudio have brought me to this conclusion:
their complete IDE can run in the browser, offering the possibility to harness supercomputing facilities and big data from a laptop, and easing systems administration of many R users by managing only one R install.
Truth be told, RStudio leverages all the good work done by others. For instance, it’s Wickham’s devtools package underneath the hood driving RStudio’s packaging feature. It’s Yihui’s knitr package along with Sweave that makes writing R documentation in RStudio such a pleasure. But it’s the engineering, the stitching together of all these packages, that creates an innovative experience. And it’s too soon to tell, but we may look back on this period in history and say that RStudio was more than an efficiency innovation; it might just have been disruptive, too.
With this release comes a minor change in behavior: for requests that have been configured with RFileEval, RFileHandler, or using the r-script handler, rApache will set the working directory to the file’s directory.
Since rApache 1.1.15 you’ve been able to deploy your Rook applications like so:
# Run the Rook application named 'app'. On each request, the expression
# 'Rook::Server$call(app)' is evaluated in an environment populated by
# rookapp.R. 'app' is expected to be found in that environment.
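The directive block those comments describe would look roughly like the sketch below. The URL prefix, the handler name, and the expression are the ones discussed in the walkthrough that follows; the path /path/to/rookapp.R is only a placeholder, so substitute the absolute path to your own file.

```apache
<Location /test/RookApp>
    # Hand every request under this URL prefix to rApache.
    SetHandler r-handler
    # Syntax is "file:expression"; the file path here is a placeholder.
    RFileEval /path/to/rookapp.R:Rook::Server$call(app)
</Location>
```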
Let’s go through the above example step by step, starting with the Location directive from apache.
The Location Directive Works on URLs
In apache, the Location directive works on the URL space of the server. In this case, we are telling apache that URLs starting with /test/RookApp are hooked up to our Rook application.
SetHandler Tells Apache That R Is In Charge
Of course you know apache is modular, and one way that third party modules (like rApache) can tell apache what it can do is by registering handlers, basically text strings. When a web request comes in, apache runs through its config files and figures out what handler has been assigned to the request. Then it runs through all of the third party modules and asks each one of them if they handle the particular handler. In our example, rApache knows how to handle “r-handler” stuff. So by placing SetHandler r-handler within our Location directive above, rApache will take over handling the request.
RFileEval: An Absolute File Path And an Expression
Here comes a bit of magic. The RFileEval directive is not an apache directive. Rather, it is an rApache directive. The syntax is “file:expression”. When a request comes in, rApache will create an anonymous R environment and execute each expression located in file. The equivalent R command is something like:
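As a rough sketch of that step in plain R (an approximation, not rApache's actual internals), sourcing a file into a fresh environment looks like this. The temporary file stands in for rookapp.R, and a plain list stands in for a real Rook application; both are assumptions made for illustration.

```r
# Stand-in for rookapp.R: a file that defines 'app' (hypothetical contents).
rookapp <- tempfile(fileext = ".R")
writeLines("app <- list(name = 'demo')", rookapp)

# Create an anonymous environment and evaluate every expression in the file there.
anon_env <- new.env()
sys.source(rookapp, envir = anon_env)

# Check that 'app' is now defined inside the anonymous environment.
exists("app", envir = anon_env, inherits = FALSE)
```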
Then after that, the expression is run within the anonymous environment. In our example, the expression is Rook::Server$call(app). Rook::Server is an object from the Rook package. app is a variable that must be found by lexical scope in the anonymous environment. So you better name your Rook application app in your file. It doesn’t have to be called app. You could have easily named your app foo. Then you’ll need to change the expression to Rook::Server$call(foo).
Here’s the cool part: rApache keeps the anonymous environment around after the request. When a new request comes in, it checks the timestamp of the file. If it hasn’t changed, then there’s nothing left to do except run the expression Rook::Server$call(app). However, if the timestamp has changed (meaning that someone edited the file), then the file is re-evaluated in a new anonymous environment and THEN the Rook expression is run.
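The caching behavior can be imitated in a few lines of R. This is only a sketch of the idea: eval_file_expr() and cache are made-up names, and rApache's real implementation lives in C inside the Apache module.

```r
# Hypothetical cache mimicking the behavior described above; not rApache's real code.
cache <- new.env()

eval_file_expr <- function(file, expr) {
  mtime <- file.mtime(file)
  entry <- get0(file, envir = cache, inherits = FALSE)
  # Re-source only when the file is new to us or its timestamp has changed.
  if (is.null(entry) || !identical(entry$mtime, mtime)) {
    env <- new.env()
    sys.source(file, envir = env)
    entry <- list(env = env, mtime = mtime)
    assign(file, entry, envir = cache)
  }
  # Run the expression in the (possibly reused) anonymous environment.
  eval(expr, envir = entry$env)
}

# Usage with a throwaway file standing in for rookapp.R:
f <- tempfile(fileext = ".R")
writeLines("app <- 1", f)
eval_file_expr(f, quote(app))  # first request: sources the file
eval_file_expr(f, quote(app))  # timestamp unchanged: reuses the environment
```

Editing the file changes its timestamp, so the next call sources it again into a new environment before running the expression.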
Was I right? Cool? Cool. Expect more deployment posts in the following… days… hopefully.
This is a wrap-up post to summarize a few of the issues I’ve found so far with blogging on tumblr with R Markdown.
tumblr Puts a 1Mb Cap On Its HTML Editor
When I tried eating my own dogfood while writing the previous posts,
I found that I had to manually upload all those pretty screenshots of
the tumblr interface. For some reason, tumblr was truncating the
HTML I was pasting into its editor. By trial and error, I found out that
they place a cap of around 1Mb on the HTML. That’s essentially 96 R plots
at 504x432 pixels. How do I know? Because I placed this bit of code:
    for (i in 1:96)
        plot(rnorm(i), main = paste(i, "Squares"),
             col = rainbow(i, alpha = runif(i, 0, 1))[round(runif(i, 1, i))],
             pch = ".", cex = round(runif(i, 1, 100)))
into an R Markdown file, rendered it to HTML with markdownToHTML(), and uploaded it to my test blog http://testerester.tumblr.com a number of times. Maxed out at 96. Regardless, that’s 96 images I didn’t have to upload manually!
R Highlighting Is Now Fixed
I presumed that the hosted version of highlight.js contained a language definition for R. It actually does not, but it’s easy to include one. I’ve done so and am now hosting my own highlight.packed.js on rapache.net here:
You Can Drop the Save in the Edit/Save/Knit Iteration
JJ Allaire assures me that you don’t have to save your R Markdown document before you knit it in RStudio. It is saved automatically, and dropping the save action speeds up iterative development by a factor of 1.5!
In Part I of this series I described how to set up your tumblr blog so that you can create posts like those on the example site R Markdown Blog.
Now I’ll describe how you can actually create such posts. I’ll be using the RStudio IDE for the desktop in all the steps below, but know that you can use your own version of R and your own editor for steps 1, 2, and 4. I personally like the RStudio knitr integration. It provides a really easy and fast iterative process to quickly edit markdown and render to HTML.
Step 1: Install The Latest Version of the R markdown package
markdown version 0.5.2 is needed for this process, and since it’s currently not on CRAN (it’s on its way) you will need to get it from github. This is easily done with Hadley Wickham’s devtools package. Follow these steps to install devtools, markdown, and knitr which you will need in later steps:
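The exact commands are not reproduced here, so the following is a sketch of one way to do it. It assumes the markdown sources live in the rstudio/markdown repository on GitHub and uses the current "user/repo" form of install_github(), which may differ from what was typed at the time of writing.

```r
# Install devtools and knitr from CRAN.
install.packages(c("devtools", "knitr"))

# Install markdown from GitHub (assumed repository: rstudio/markdown).
devtools::install_github("rstudio/markdown")
```

Once markdown is available on CRAN, install.packages("markdown") should do the same job without devtools.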
In RStudio, click on File -> New -> R Markdown. This will create a new untitled file with some example markdown text. The first two lines of the file contain a proper title for the document in markdown syntax, but we won’t need that for our blog post. Go ahead and delete them.
Save the file and name it First-Post.Rmd.
Step 3: Click the “Knit HTML” Button
That button is just above the first line of the file. You should see a ball of yarn with a knitting needle sticking in it. After clicking the button you should see a couple of windows flicker by with info, and then ultimately this:
If your window looks like this, then congratulations! You just created a valid R Markdown document and rendered it into an HTML page. This step automatically creates a new file called First-Post.html, but we’re not ready to blog just yet.
Side Note About Iterative Development
If your window doesn’t look like the above, then you’ve got some editing to do. You will now enter an iterative edit/save/knit loop, and this is where RStudio really shines. Here are the steps:
Make your edits to First-Post.Rmd.
Type Ctrl-s to save.
Type Ctrl-Shift-h to re-knit the document. This is equivalent to clicking the “Knit HTML” button.
If you get the output you want, you’re done; if not, go to step 1.
Simple as that!
Suppose you’re not using RStudio. Then you can still get pretty close to the above. Using your favorite editor, your favorite browser, and another R IDE, follow these steps:
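The individual steps are not listed here, but the loop can be approximated from any R session with a small helper. This is a sketch: render_post() is a made-up name, and it assumes the knitr and markdown packages are installed.

```r
# Hypothetical helper: knit an .Rmd file, convert the result to HTML, and open it.
render_post <- function(rmd) {
  md   <- sub("\\.Rmd$", ".md", rmd)
  html <- sub("\\.Rmd$", ".html", rmd)
  knitr::knit(rmd, output = md)                # run the R chunks, write markdown
  markdown::markdownToHTML(md, output = html)  # convert markdown to HTML
  utils::browseURL(html)                       # display it in your browser
  invisible(html)
}

# After each edit and save in your editor:
# render_post("First-Post.Rmd")
```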
Your browser should open with First-Post.html displayed. If you get the output you want, hurray! Otherwise go to step 1.
So goes iterative development ;) Now on to blogging…
Step 4: Execute knit() and markdownToHTML() Manually
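A sketch of what this can look like at the R console, assuming First-Post.Rmd is in the working directory and that the fragment.only argument of markdownToHTML() is what leaves out the surrounding HTML boilerplate:

```r
library(knitr)
library(markdown)

# Run the R chunks in the .Rmd file and write plain markdown.
knit("First-Post.Rmd", output = "First-Post.md")

# Convert to HTML; fragment.only = TRUE omits the <html>/<head>/<body> wrapper
# so the result can be pasted straight into tumblr's editor.
markdownToHTML("First-Post.md", output = "First-Post.html", fragment.only = TRUE)
```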
Now open up First-Post.html in your editor and you should see the following:
I’ve highlighted lines 1 and 22 with a red circle. Notice on line 1 that there’s no beginning HTML markup as I described above. That’s good, and your output should look similar if not the same.
Also notice on line 22 that the <img> tag looks a little unusual. That’s because by default markdownToHTML() will automatically embed locally linked images using base64 encoding. You really don’t need to know how it’s encoded, but just know that your browser will show you the image that you were expecting. That’s the beauty of the markdown package. You have just one HTML document that contains all your code and plots!
Okay, now we leave RStudio for a second and go to tumblr…
Step 5: Log In to tumblr and Click the “Gear” Icon
I’ve highlighted it in red:
Step 6: Click the “plain text/HTML” Radio Button, Then Click “Save”
The blog posts we are creating will contain HTML, so we want to ensure that we’re using the correct editor, highlighted in red:
Note that the “Save” button is at the bottom of the page, so you’ll probably have to use the scroll bar on your browser to get there.
Step 7: Click on the “Text” Icon to Start a New Blog Post
I still get tripped up on this as I haven’t blogged much in the past, but once you log in to your tumblr account and click on your blog name, you will want to click the icon circled in red below to start a new “Text” blog post:
Be sure that you see “HTML enabled” highlighted below in red:
Go ahead and fill out the Title with “Text” (or something else to your liking).
Step 8: Copy First-Post.html
Switch back to RStudio, and copy the entire text of First-Post.html using Ctrl-a then Ctrl-c (or your favorite incantation).
Step 9: Paste First-Post.html Into the tumblr Editor, Then Click “Create Post”
Now switch back to your browser, paste First-Post.html (Ctrl-v) into the tumblr editor, scroll down to the bottom of the page, and click the “Create Post” button.
You will now be taken back to your Dashboard and you should see something like this:
What you are now looking at is your Dashboard’s interpretation of your blog post. That’s okay, but what you really want to see is your blog. Click on the button I’ve highlighted in red above to get to your blog, and you should see something like the example I created at http://jeffrey-horner.tumblr.com/.
Step 10: Done!
Congratulations! You just created your first blog post with R Markdown. Now go back to R and create some meaningful statistical content that we can all learn from! And don’t forget to blog about it!
I finally got a chance this weekend to settle on a way to include R Markdown into my blogging process. I needed to do this as my subsequent postings will involve more code chunks regarding Rook deployment and examples, and R Markdown formats and highlights code chunks like a boss! If you want to incorporate R code, math equations, and R plots like this example, follow these steps to create a tumblr blog and get your theme ready to write your first post.
Step 1: Create a tumblr Blog
Easy enough, just go to http://www.tumblr.com, sign up, log in, and follow the steps to name your first blog. There will be other interstitial steps like asking how old you are, asking who you want to follow, etc. The important thing is to name your new blog and get to your Dashboard. It will look something like this:
tumblr has done a great job of creating a simple visual interface, but sometimes even the interface can break down and leave the user confused about where they are and what they need to do next. For instance, notice that in the red circle and just under the word Dashboard is a small triangle as if it’s pointing to the word. The light-blue on dark-blue color scheme hides that a bit, and maybe you can change that? I don’t know, but that’s the indicator that you’re on your tumblr Dashboard screen.
Step 2: Click on the Name of Your Blog
Just to the right of the word Dashboard in the screen above, you should see the word Untitled. That’s the name of the blog in this example. Click on it and you should see:
Now that little triangle thingy has moved underneath the word Untitled. That’s where we want to be.
Step 3: Click on “Customize theme”
Now, before you see the following screen, you may see a screen that tells you to verify your email address. Go do that and then get back to your blog page by following Step 2 above. Done? Okay, here’s what you’ll see:
Step 4: Click on “Edit HTML”
You’re now presented with a screen with two columns: on the left is an editor with the html code of your theme, and on the right is a preview of what your blog looks like based on the contents of the editor.
Step 5: Pay Attention!
Because the next steps are really important.
Step 6: Delete Everything in the Editor
I typically do this by making sure my mouse focus is in the editor, typing Ctrl-a (hold down the “Ctrl” key and press “a” once), and then pressing the Delete key. Make sure your editor window now looks like this:
and copy the entire file by typing Ctrl-a and then Ctrl-c (or your computer may have a variant of this, for instance on Macs you would use Command-a and then Command-c, at least I think that’s right).
Step 8: Paste it into the Editor and Click “Update preview”
Now switch back to the browser tab or window that contains the tumblr editor and paste the contents into the editor, then click the green Update preview button located at the top of the screen. It should look something like this:
Step 9: Click the “Save” Button, Then the “Appearance” Button, Then the “Close” Button
Hurray! You just updated your blog theme! Now you’re well on your way to creating your first blog post with R Markdown. I’ll cover that in Part II tomorrow.