Donnerstag, 18. August 2016

R: Plotting multiple categorical data

Frequently, values are plotted against a categorical argument. The categorical argument can be thought of something like a weekday label or a step in a specific technical procedure ("Step 1", "Step 2", ...). Of course, these arguments could be converted to a numeric value used for plotting and after plotting, one would overwrite the x-axis labels to get the categorical data on the axis. But if Excel can do it, then it should be possible to do it in R as well. Here's how.
The data in the example is from the book "Design and Analysis of Clinical Trials: Concepts and Methodologies" by Chow (full data set).

Here, the categorical data (x-axis) are five repeated measurements of diastolic blood pressure, denoted "DBP1", ..., "DBP5", where DBP1 is the baseline value. 40 subjects have been treated with either a hypertensive agent (A) or placebo (B). After averaging over treatment groups we end up with two groups we want to inspect.

> d <- aggregate(dat[, 3:7], by=list(TRT=dat$TRT), FUN=mean)
> d
  TRT   DBP1  DBP2   DBP3   DBP4   DBP5
1   A 116.55 113.5 110.70 106.25 101.35
2   B 116.75 115.2 114.05 112.45 111.95

How can now d be plotted to immediately make the difference between the treatments visible? Lattice can plot multiple data records (the A- and B-rows) against categorical data, but for that, the data has to be in long format. This can be generated using the melt function from the reshape2 package.

> m <- melt(d, id="TRT", measure.vars=colnames(d[, 2:ncol(d)]))
> m
   TRT variable  value
1    A     DBP1 116.55
2    B     DBP1 116.75
3    A     DBP2 113.50
4    B     DBP2 115.20
5    A     DBP3 110.70
6    B     DBP3 114.05
7    A     DBP4 106.25
8    B     DBP4 112.45
9    A     DBP5 101.35
10   B     DBP5 111.95

The melted data.frame m can now be plotted.

> xyplot(m$value~m$variable, type="o", group=m$TRT, auto.key=list(T))

The resulting figure looks like this: