Using O’Connell’s ELISA data set introduced on the previous page, we prepare a data.frame for analysis.


Learning objectives for this section:


Set up the R environment

In each of the tutorials, we will set up the R environment (or workspace) needed to perform analyses and control output formatting. If tutorials are run one after the other, some commands may be redundant, but reloading packages and preferences will not create any problems.

# Load the needed packages (in case not already done)
# Recent versions of RStudio load knitr as needed
# require(knitr)

# Plot formatting
source("AMgraph.R")
## Loading required package: ggplot2
## Loading required package: ggthemes
## Loading required package: gridExtra
## Loading required package: grid
## Loading required package: plotrix
# Data set tools
require(reshape2)
## Loading required package: reshape2
require(plyr)
## Loading required package: plyr
require(dplyr) 
## Loading required package: dplyr
## 
## Attaching package: 'dplyr'
## 
## The following objects are masked from 'package:plyr':
## 
##     arrange, count, desc, failwith, id, mutate, rename, summarise,
##     summarize
## 
## The following object is masked from 'package:stats':
## 
##     filter
## 
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
require(grDevices)

# When working interactively (i.e. not calling this file from another
# function, e.g. the 'Knit' button or render() in an outside script), set the
# working directory to the location of this file. That way you work from a
# common perspective, and outside functions and paths work either way.
# getwd()
# setwd("./source/R-new")

# Set options: output width, when scientific notation is used (see options help),
#              do not include statistical significance star indicators
options(width=90, scipen = 4, show.signif.stars = FALSE)
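
Note that `require()` returns `FALSE` with a warning, rather than stopping, when a package is not installed, so a missing package may only show up later as a "could not find function" error. The following is a minimal sketch of an up-front check, assuming the three data-tool packages loaded above are the ones needed. The names `pkgs`, `installed` and `missing_pkgs` are illustrative and not part of the tutorial's own code.

```r
# Packages used for data handling in this tutorial
pkgs <- c("reshape2", "plyr", "dplyr")

# requireNamespace() returns TRUE/FALSE without attaching the package
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- pkgs[!installed]

if (length(missing_pkgs) > 0) {
  message("Install before continuing: ", paste(missing_pkgs, collapse = ", "))
} else {
  message("All data-tool packages are available")
}
```

Any package reported as missing can be installed with `install.packages()` before rerunning the setup code.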

Data preparation

Data may be entered manually or imported from many different file formats (native Rdata, comma-separated, tab-delimited, etc).
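
As a minimal sketch of file import, the following writes three rows of the O’Connell data to a temporary comma-separated file and reads them back with `read.csv()`. The temporary file and the object names `small` and `imported` are assumptions made for illustration; in practice the path would point to your own data file.

```r
# Three rows in long format (calibrator, concentration, replicate, OD)
small <- data.frame(calib = c("B1", "S01", "S02"),
                    conc  = c(0, 3, 8),
                    rep   = c(1, 1, 1),
                    od    = c(0.100, 0.120, 0.120))

# Write to a temporary .csv, then import it again
tmp <- tempfile(fileext = ".csv")
write.csv(small, file = tmp, row.names = FALSE, quote = FALSE)
imported <- read.csv(tmp, stringsAsFactors = FALSE)

# Check that column names and types survived the round trip
str(imported)

unlink(tmp)  # remove the temporary file
```

Tab-delimited files can be read in the same way with `read.delim()`, and native .RData files are restored with `load()`.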

Data sets can be stored as lists or as rectangular tables (data.frames in R). A data.frame holding data with some sort of repeating structure (groups, replicates) can be ‘long’ or ‘wide’. A long data.frame has all the response data in one long column, with the calibrator concentration, group and replicate labels in other columns. A wide data.frame has one response column per replicate, so each calibrator occupies a single row.

Manual data entry of O’Connell’s ELISA data

The following code shows how the O’Connell data may be entered manually in vector, or list, format then converted to a long data.frame. Later the long data.frame is converted to a wide one, which can be helpful for onscreen viewing or printing, or sharing data between R and other curve-fitting software such as GraphPad.

# O'Connell's ELISA data
# Enter the 12 calibrator concentrations
# Repeat sequence of known concentrations 3 times to correspond to 3 replicates
conc <- rep(c(0, 0, 3, 8, 23, 69, 206, 617, 1852, 5556, 16667, 50000), 3)
# Optical Density measurements
od <- c(
  # Replicate 1
  0.100, 0.120, 0.120, 0.120, 0.130, 0.153, 0.195, 0.280, 0.433, 0.593,
  0.823, 0.933,
  
  # Replicate 2
  0.115, 0.110, 0.123, 0.118, 0.133, 0.160, 0.208, 0.305, 0.455, 0.668,
  0.850, 1.078,
  
  # Replicate 3
  0.118, 0.115, 0.108, 0.118, 0.158, 0.150, 0.218, 0.323, 0.490, 0.760,
  0.973, 0.825)

# Assign calibrator and replicate id just for easy identification
rep <- rep(c(1, 2, 3), each = 12)
calib <- rep(c("B1", "B2", paste0("S0", 1:9), "S10"), 3)

# Combine vectors into a data.frame, 'ocon'.
# We will call this data.frame repeatedly in the steps below.
# stringsAsFactors = TRUE keeps 'calib' a factor in R >= 4.0.0, matching the
# str() output below
ocon <- data.frame(calib, conc, rep, od, stringsAsFactors = TRUE)

# Print 1st 6 lines to console to check structure
head(ocon)
##   calib conc rep    od
## 1    B1    0   1 0.100
## 2    B2    0   1 0.120
## 3   S01    3   1 0.120
## 4   S02    8   1 0.120
## 5   S03   23   1 0.130
## 6   S04   69   1 0.153
# Or a more formally described structure:
str(ocon)
## 'data.frame':    36 obs. of  4 variables:
##  $ calib: Factor w/ 12 levels "B1","B2","S01",..: 1 2 3 4 5 6 7 8 9 10 ...
##  $ conc : num  0 0 3 8 23 ...
##  $ rep  : num  1 1 1 1 1 1 1 1 1 1 ...
##  $ od   : num  0.1 0.12 0.12 0.12 0.13 0.153 0.195 0.28 0.433 0.593 ...
# Save data as a .csv
# File path: "../" = go up one directory
# write.csv(ocon, file = "../data/ocon.csv", row.names = FALSE, quote = FALSE)

Numerical summaries

Let’s summarise the data by concentration to get a ‘feel’ for our data set and check for errors:

# Mean response for concentration = 3
# Notice how the full data.frame$variable is needed each time a variable is named 
# The argument na.rm=TRUE tells R to drop any missing observations
mean(ocon$od[ocon$conc == 3], na.rm = TRUE)
## [1] 0.117
sd(ocon$od[ocon$conc == 3], na.rm = TRUE)
## [1] 0.007937254
# Mean by calibrator concentration in one shot
ddply(ocon, .(conc), summarise, 
      Mean = mean(od, na.rm = TRUE),
      SD   = sd(od, na.rm = TRUE))
##     conc      Mean          SD
## 1      0 0.1130000 0.007211103
## 2      3 0.1170000 0.007937254
## 3      8 0.1186667 0.001154701
## 4     23 0.1403333 0.015373137
## 5     69 0.1543333 0.005131601
## 6    206 0.2070000 0.011532563
## 7    617 0.3026667 0.021594752
## 8   1852 0.4593333 0.028746014
## 9   5556 0.6736667 0.083644087
## 10 16667 0.8820000 0.079956238
## 11 50000 0.9453333 0.126950121
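
The same per-concentration summary can be obtained without `plyr`. The following is a sketch using base R's `aggregate()`, restricted to the two lowest non-zero calibrators to keep it short. The object names `low`, `means`, `sds` and `summ` are illustrative only.

```r
# OD readings for conc = 3 and conc = 8, replicates 1 to 3 (from the data above)
low <- data.frame(conc = rep(c(3, 8), times = 3),
                  od   = c(0.120, 0.120,   # replicate 1
                           0.123, 0.118,   # replicate 2
                           0.108, 0.118))  # replicate 3

# One row per concentration, with the mean and SD of the replicates
means <- aggregate(od ~ conc, data = low, FUN = mean)
sds   <- aggregate(od ~ conc, data = low, FUN = sd)
summ  <- data.frame(conc = means$conc, Mean = means$od, SD = sds$od)
summ
```

The Mean and SD values should agree with rows 2 and 3 of the `ddply()` output above.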

Let’s examine minima and maxima for each concentration to check for extreme values:

oconMinMax <- ddply(ocon, .(conc), summarise,
              n = length(na.omit(od)), 
              Min = min(od, na.rm = TRUE), 
              Max = max(od, na.rm = TRUE))

oconMinMax
##     conc n   Min   Max
## 1      0 6 0.100 0.120
## 2      3 3 0.108 0.123
## 3      8 3 0.118 0.120
## 4     23 3 0.130 0.158
## 5     69 3 0.150 0.160
## 6    206 3 0.195 0.218
## 7    617 3 0.280 0.323
## 8   1852 3 0.433 0.490
## 9   5556 3 0.593 0.760
## 10 16667 3 0.823 0.973
## 11 50000 3 0.825 1.078

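No glaring errors: each concentration has the expected number of observations (6 for the blanks, 3 for every other calibrator). We’ll examine the structure in more detail in the next two tutorials.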

Transposing a data.frame

A wide, rather than long, format is nice for viewing and comparing calibrator replicates. Also, conversion between long and wide formats would be useful if one had to share data between R and other software such as GraphPad or a spreadsheet.

Prepare data set for GraphPad-type software:

# Reshape (aka transpose) long data set into wide format 
oconWide <- dcast(ocon, calib + conc ~ rep , value.var = "od")

# Fix the variable names
names(oconWide)[3:5] <- paste("rep", names(oconWide)[3:5], sep = "")
# In this format, several runs could fit on one page. 
oconWide
##    calib  conc  rep1  rep2  rep3
## 1     B1     0 0.100 0.115 0.118
## 2     B2     0 0.120 0.110 0.115
## 3    S01     3 0.120 0.123 0.108
## 4    S02     8 0.120 0.118 0.118
## 5    S03    23 0.130 0.133 0.158
## 6    S04    69 0.153 0.160 0.150
## 7    S05   206 0.195 0.208 0.218
## 8    S06   617 0.280 0.305 0.323
## 9    S07  1852 0.433 0.455 0.490
## 10   S08  5556 0.593 0.668 0.760
## 11   S09 16667 0.823 0.850 0.973
## 12   S10 50000 0.933 1.078 0.825
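# Limit the data to the non-zero concentrations (rows 3 to 12) and export it
# for later use in GraphPad.
# 'data.path' is not defined in this section; it is assumed to hold the data
# directory, e.g. data.path <- "../data"
# write.csv(oconWide[3:12, 2:5], file.path(data.path, "ocon_GraphPad.csv"), quote = FALSE, 
#          row.names = FALSE, na = " ")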
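
For readers without `reshape2`, a similar long-to-wide conversion can be sketched with base R's `reshape()`. This is an alternative to the `dcast()` call above, not the method used in this tutorial, and it is shown on two calibrators only. The object names `long` and `wide` are illustrative.

```r
# Two calibrators in long format (values from the data above)
long <- data.frame(calib = rep(c("S01", "S02"), times = 3),
                   conc  = rep(c(3, 8), times = 3),
                   rep   = rep(1:3, each = 2),
                   od    = c(0.120, 0.120, 0.123, 0.118, 0.108, 0.118))

# One row per calibrator, one 'od' column per replicate
wide <- reshape(long, idvar = c("calib", "conc"), timevar = "rep",
                v.names = "od", direction = "wide")
wide
```

By default `reshape()` names the new columns by joining the value name and the replicate number with a dot (`od.1`, `od.2`, `od.3`), so a renaming step like the one above may still be wanted.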

Convert a wide data.frame to long format:

oconLong <- melt(oconWide, id.vars = c("calib", "conc"))
head(oconLong, 15)
##    calib  conc variable value
## 1     B1     0     rep1 0.100
## 2     B2     0     rep1 0.120
## 3    S01     3     rep1 0.120
## 4    S02     8     rep1 0.120
## 5    S03    23     rep1 0.130
## 6    S04    69     rep1 0.153
## 7    S05   206     rep1 0.195
## 8    S06   617     rep1 0.280
## 9    S07  1852     rep1 0.433
## 10   S08  5556     rep1 0.593
## 11   S09 16667     rep1 0.823
## 12   S10 50000     rep1 0.933
## 13    B1     0     rep2 0.115
## 14    B2     0     rep2 0.110
## 15   S01     3     rep2 0.123
# Rename the 'variable' column to 'rep' and the 'value' column to 'od'
# (these are columns 3 and 4; columns 1 and 2 are 'calib' and 'conc')
names(oconLong)[c(3, 4)] <- c('rep', 'od') 
str(oconLong)
## 'data.frame':    36 obs. of  4 variables:
##  $ calib: Factor w/ 12 levels "B1","B2","S01",..: 1 2 3 4 5 6 7 8 9 10 ...
##  $ conc : num  0 0 3 8 23 ...
##  $ rep  : Factor w/ 3 levels "rep1","rep2",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ od   : num  0.1 0.12 0.12 0.12 0.13 0.153 0.195 0.28 0.433 0.593 ...

Quick-R is a great resource for more details about entering data and R data types.

We have what we need to proceed to characterising the variance of the O’Connell data. For more practice with data entry and wrangling, see data preparation for R’s ELISA data set.