This first tutorial describes the software configuration we use for data analysis in the R statistical software environment. The installation described here provides the analysis and reporting tools needed for the statistical characterisation and validation analyses in test development, and for creating publication-quality reports. A free (but optional) script editor, RStudio, adds some convenience features. Whichever editor you use, a few R ‘packages’ must be installed from within R before you proceed to the tutorials. All the software installs easily on Windows, Mac and Linux.
Another advantage of R is that it is consistent with the recommendations for ‘modularity’ and transparency advocated in Dudley et al. (1985).
Begin with bare-bones R:
At http://cran.rstudio.com/ select the link for the latest version of R for your operating system (from the top frame, “Download and Install R”). For example, for Windows follow the Windows link then select “base” or “install R for the first time”. Follow the instructions.
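Once R is installed, a quick check in the console confirms which version is running. The following is only a minimal sketch; the `min_version` threshold is an illustrative assumption, not a requirement stated by these tutorials.

```r
# Print the version string of the running R session
print(R.version.string)

# getRversion() returns a version object that can be compared with a string.
# The "3.0.0" threshold is a hypothetical value chosen for illustration only.
min_version <- "3.0.0"
is_recent_enough <- getRversion() >= min_version
print(is_recent_enough)
```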
Consider using RStudio as your editor:
All the tutorial steps work equally well with the native R editor (aka IDE) and RStudio, but we highly recommend the RStudio interface because of its additional convenience features: syntax highlighting, package management, file browser, plot viewing window, help menus, menus to facilitate creating reports and presentations, interface with version control systems (i.e. git and svn). RStudio installs and integrates the report writing tools we used to produce this website (e.g. Pandoc). These reporting tools also include export to PDF, Microsoft Word and other formats, and are described in more detail later. Select the RStudio ‘Installer’ download link for your operating system from http://www.rstudio.com/products/rstudio/download/.
Most data analysis functions ship with the basic installation, but we will need a few more packages for the upcoming tutorials. User-contributed R packages bundle related functions together. For example, ‘nlme’ and ‘minpack.lm’ add nonlinear regression techniques, whereas ‘gridExtra’ facilitates plot formatting that looks especially good for online viewing. Other packages include data sets and data preparation functions.
In the ‘Console’ (at the > prompt) enter the lines of code below to install all packages required for this tutorial.
install.packages(c("dplyr", "drc", "ggplot2", "gridExtra", "ggthemes",
                   "investr", "knitr", "minpack.lm", "nCal", "nlme",
                   "nlstools", "plotrix", "plyr", "reshape2",
                   "RColorBrewer"),
                 repos = "http://cran.us.r-project.org", dependencies = TRUE)
You’re done!
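To confirm that an installation succeeded, it can help to compare the packages you expect against what R reports as installed. This is a rough sketch: `missing_packages` is a hypothetical helper written for this example, and the `wanted` vector lists only a few of the packages as an illustration.

```r
# Hypothetical helper: return the names in `pkgs` that are not yet installed
missing_packages <- function(pkgs) {
  installed <- rownames(installed.packages())
  setdiff(pkgs, installed)
}

# A short, illustrative subset of the packages used in the tutorials
wanted <- c("ggplot2", "knitr", "nlme")
not_installed <- missing_packages(wanted)

if (length(not_installed) == 0) {
  message("All listed packages are installed.")
} else {
  message("Still missing: ", paste(not_installed, collapse = ", "))
}
```

Any names reported as missing can be passed to `install.packages()` again.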
All the analysis and reporting methods presented in these tutorials use plain text files as input. Despite the possibly unfamiliar file extensions (e.g. .R, .Rmd, .csv, .md), they can be opened in any text editor, just like .txt files (you will get no strange characters or illegible preamble). A .R or .Rmd file is opened in the ‘Source’ window of RStudio. Commands are highlighted and sent to the ‘Console’ with the ‘Run’ button or Ctrl-Enter.
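The plain-text nature of these files is easy to see from R itself. The sketch below writes a small, made-up data set to a temporary .csv file, then reads it back both as raw text and as a data frame; the column names and values are invented for illustration.

```r
# Write a small, made-up data set to a temporary .csv file
csv_path <- tempfile(fileext = ".csv")
example_data <- data.frame(
  conc = c(1, 10, 100),
  response = c(0.12, 0.58, 1.43)
)
write.csv(example_data, csv_path, row.names = FALSE)

# The file is plain text: readLines() shows what a text editor would show
raw_lines <- readLines(csv_path)
print(raw_lines)

# read.csv() parses the same file back into a data frame
dat <- read.csv(csv_path)
str(dat)
```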
In an Rmarkdown script (extension .Rmd, also known as a ‘knitr’ script), report text is written as in any Markdown, plain text or LaTeX document file and R commands are written between code block tags:
```{r [chunk-name], [chunk options]}
[R code]
```
These ‘chunks’ are distinguished from prose and interpreted by the R packages ‘knitr’ and ‘rmarkdown’. Comments can also be written in the code blocks by starting each comment line with #, as in plain .R files. R-LaTeX scripts are differentiated from R-Markdown scripts by the extension .Rnw instead of .Rmd. Both choices, as well as the basic .R script, are available through RStudio’s new document dialogue, which will open a simple template. Sending report text (prose, not code) or an incomplete line of code to the console will produce error messages, but is of no real consequence. If R gets stuck on something, press the stop symbol at the top of the ‘Console’ window (in RStudio). The RStudio documentation is thorough.
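As a rough illustration of how prose and chunks sit together, the sketch below builds a tiny .Rmd file from R and, if ‘knitr’ is installed, extracts just the R code with `knitr::purl()`. The report text, the chunk name `example-chunk` and the file locations are all made up for this example.

```r
# Three backticks, built programmatically so this example stays readable
fence <- strrep("`", 3)

# Lines of a minimal, made-up .Rmd document: prose plus one code chunk
rmd_lines <- c(
  "# Example report",
  "",
  "Prose is written as ordinary Markdown text.",
  "",
  paste0(fence, "{r example-chunk, echo=TRUE}"),
  "# a comment inside the chunk",
  "x <- c(2, 4, 6)",
  "mean(x)",
  fence
)

rmd_path <- tempfile(fileext = ".Rmd")
writeLines(rmd_lines, rmd_path)

# If knitr is available, purl() writes the chunk code to a plain .R script
if (requireNamespace("knitr", quietly = TRUE)) {
  r_path <- knitr::purl(rmd_path, output = tempfile(fileext = ".R"), quiet = TRUE)
  print(readLines(r_path))
}
```

Opening the generated .Rmd file in RStudio and pressing the ‘Knit’ button would be the interactive equivalent.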
For reports, some simple ‘mark-up’ (character combinations) is used to add formatting instructions to the text for the parser or interpreter. .Rmd files use Markdown interpreters for formatting and document structure (e.g. headings). The Rmarkdown website describes how stand-alone documents, such as MS Word or PDF files, and websites such as this one are created using RStudio. If you prefer LaTeX as a mark-up language, a file format similar to .Rmd, ‘Sweave/noweb’ (extension .Rnw), follows the same code-text mixing principle with a few alterations in mark-up syntax (the R code is the same). Techniques for presentation slides and interactive data-analysis websites are also covered in RStudio’s RMarkdown tutorials.
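For comparison, a .Rnw file marks a chunk with `<<...>>=` and closes it with `@` on a line of its own. The sketch below writes a minimal, made-up .Rnw file and uses `Stangle()` from the ‘utils’ package that ships with R to pull out the code; the chunk name `summary` and the document text are assumptions for illustration.

```r
# Lines of a minimal, made-up .Rnw document: LaTeX prose plus one code chunk
rnw_lines <- c(
  "\\documentclass{article}",
  "\\begin{document}",
  "Prose is written as ordinary LaTeX text.",
  "<<summary, echo=TRUE>>=",
  "x <- c(2, 4, 6)",
  "mean(x)",
  "@",
  "\\end{document}"
)

rnw_path <- tempfile(fileext = ".Rnw")
writeLines(rnw_lines, rnw_path)

# Stangle() extracts the R code from the chunks into a plain .R script
r_from_rnw <- tempfile(fileext = ".R")
utils::Stangle(rnw_path, output = r_from_rnw, quiet = TRUE)
print(readLines(r_from_rnw))
```

The R code inside the chunk is identical to what would go in a .Rmd chunk; only the delimiters and the surrounding mark-up differ.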
Coming soon: the source files for this website
These tutorials will not cover all the analysis basics that an introductory text to the R language would. If you want to stray from the scripts (and we recommend you eventually do), you may need a more general understanding of data types and basic operations. There are myriad books and free online tutorials for R. A good general source is Quick-R. For a more advanced ‘data science’ point of view, Hadley Wickham’s “Advanced R” ebook is great and always developing. Or just “Google” for a tutorial with examples related to your field. The R user community is very helpful.
Dudley, R A, P Edwards, R P Ekins, D J Finney, I G McKenzie, G M Raab, D Rodbard, and R P Rodgers. 1985. “Guidelines for immunoassay data processing.” Clinical Chemistry 31 (8): 1264–71. http://www.ncbi.nlm.nih.gov/pubmed/3893796.