Sonntag, 4. Dezember 2016

R: Sample Size Calculator

Given a beta, how many patients need to be enrolled in a two-armed study of conventional against experimental therapy? Significance is set to 0.05, recruitment period and follow-up period can be defined together with median disease free survival times for the two study groups (implemented in R/Shiny):

Github repository

Try it (also mobile)!
https://mzhku.shinyapps.io/size/

R Shiny: Some reactivity by example

Start R from the directory containing ui.r and server.r, import the shiny library and start the (development) server using runApp().

server.r


function(input, output) {
    rv <- reactiveValues()

    # Only modify and output reactive input
    output$numOut1 <- renderText({
        input$n+10
    })
    
    # Modify, reassign and output reactive input.
    output$numOut2 <- renderText({
        rv$a <- input$n+20
    })
    
    # Use modified reactive input, modify and output.
    output$num1 <- renderText({
        rv$s <- rv$a+runif(1)
    })
    
    # Assign reactive input locally, modify locally, print.
    output$num2 <- renderText({
        k <- input$n
        paste(as.character(runif(1)), "here", k-10)
    })
    
    # Use reactive value from another reactive function,
    # modify and print.
    output$let <- renderText({
        paste(as.character(rv$s), sample(letters, 1), sep="")
    })

}

ui.r


fluidPage(
    numericInput(inputId="n", "Shiny Reactivity Example", value=25),

    textOutput(outputId="numOut1"),
    
    textOutput(outputId="numOut2"),
    
    textOutput(outputId="num1"),
    textOutput(outputId="num2"),
    textOutput(outputId="let")
)   

Donnerstag, 18. August 2016

R: Plotting multiple categorical data

Frequently, values are plotted against a categorical argument. The categorical argument can be thought of something like a weekday label or a step in a specific technical procedure ("Step 1", "Step 2", ...). Of course, these arguments could be converted to a numeric value used for plotting and after plotting, one would overwrite the x-axis labels to get the categorical data on the axis. But if Excel can do it, then it should be possible to do it in R as well. Here's how.
The data in the example is from the book "Design and Analysis of Clinical Trials: Concepts and Methodologies" by Chow (full data set).

Here, the categorical data (x-axis) are five repeated measurements of diastolic blood pressure, denoted "DBP1", ..., "DBP5", where DBP1 is the baseline value. 40 subjects have been treated with either a hypertensive agent (A) or placebo (B). After averaging over treatment groups we end up with two groups we want to inspect.

> d <- aggregate(dat[, 3:7], by=list(TRT=dat$TRT), FUN=mean)
> d
  TRT   DBP1  DBP2   DBP3   DBP4   DBP5
1   A 116.55 113.5 110.70 106.25 101.35
2   B 116.75 115.2 114.05 112.45 111.95

How can now d be plotted to immediately make the difference between the treatments visible? Lattice can plot multiple data records (the A- and B-rows) against categorical data, but for that, the data has to be in long format. This can be generated using the melt function from the reshape2 package.

> m <- melt(d, id="TRT", measure.vars=colnames(d[, 2:ncol(d)]))
> m
   TRT variable  value
1    A     DBP1 116.55
2    B     DBP1 116.75
3    A     DBP2 113.50
4    B     DBP2 115.20
5    A     DBP3 110.70
6    B     DBP3 114.05
7    A     DBP4 106.25
8    B     DBP4 112.45
9    A     DBP5 101.35
10   B     DBP5 111.95

The melted data.frame m can now be plotted.

> xyplot(m$value~m$variable, type="o", group=m$TRT, auto.key=list(T))

The resulting figure looks like this:
 

Donnerstag, 7. Juli 2016

Python: Concate code to a single string

Using the code below, the code from a source file can be concatenated to a single string. This string however can still be printed and formates nicely on a JavaScript console. It returns a string variable `s' and requires two command line arguments.
# Usage:
# > python ~/path/to/script/multiline.py <name_of_ouput_variable>  
    filename = sys.argv[1]
    
    f = open(filename, "r")
    d = f.readlines()
    f.close()
    
    s = "var %s = " % sys.argv[2]
    s = s.strip()
    
    # s += '"' + "".join(d).replace("\n", "") + '";'

    cnt = 0
    n = len(d)

    # '\\n' is explicitly written to the 'source'-file such that
    # the JavaScript console can print the source code including line
    # breaks.
    for line in d:
        # First line.
        if cnt == 0:
            s += "\n" + '  "' + d[cnt].replace("\n", "") + '\\n"\n'
            cnt += 1
        # Inbetween lines.
        elif cnt > 0 and cnt < (n-1):
            s += '+ "' + d[cnt].replace("\n", "") + '\\n"\n'
            cnt += 1
        # Last line.
        else:
            s += '+ "' + d[cnt].replace("\n", "") + '"'

Sonntag, 1. Mai 2016

R: Replacement functions

If you use R, you've been using so called replacement functions such as '<-' to assign a value to a variable or 'colnames' to define names of columns in a data frame.
Remember, in R everything operation is a function call (therefore also the assignment operations).
In the following, the behaviour of the replacement functions are illustrated (and compared to a -failing- ordinary approach) and code is shown to list all replacement functions in the base R package.

Replacement functions act as if they modify their arguments in place such as in

colnames(d) <- c("Input", "Output")

They have the identifier '<-' at the end of their name and return a modified copy of the argument object (non-primitive replacement functions) or the same object (primitive replacement functions).

At the R prompt, the following will not work:

> `second` <- function(x, value) {
+   x[2] <- value
+   x
+ }
> x <- 1:10
> x
 [1]  1  2  3  4  5  6  7  8  9 10
> second(x) <- 9
Fehler in second(x) <- 9 : konnte Funktion "second<-" nicht finden


As one can see, behind the scenes, R is looking for a function called 'second<-'. So lets do the same thing but using this name:

> `second<-` <- function(x, value) {
+   x[2] <- value
+   x
+ }


Now, the assignment at the second position of the vector works:

> second(x) <- 9
> x
 [1]  1  9  3  4  5  6  7  8  9 10


The following code allows to list all replacement functions in are and check if they are primitive:



# Get all objects from the base package, then
# functions from among those objects.
objs <- mget(ls("package:base"), inherits=TRUE)
funs <- Filter(is.function, objs)
 
# Replacement functions have '<-' at the end of their name.
get_rep_fun <- function(fun_i) {
  if(substr(fun_i, nchar(fun_i) - 1, nchar(fun_i)) == "<-") {
    fun_i
  }
}
 
# The list returned by 'lapply' contains many 'NULL'
# entries. Only non-'NULL' function name strings remain
# after 'unlist'.
rep_funs <- lapply(names(funs), get_rep_fun)
rep_funs <- unlist(rep_funs)
 
# Identify primitive replacement functions.
get_prim_rep_fun <- function(rep_fun) {
  if(is.primitive(rep_fun)) {
    rep_fun
  }
}
 
# Get the actual function object by 'sapply'-ing 'get' to
# each string identifier from the 'rep_funs' vector.
prim_rep_funs <- lapply(sapply(rep_funs, get), is.primitive)
 
# Prepare data frame for convenience.
k <- data.frame(fun_name = rep_funs, prim=unlist(prim_rep_funs))
 
# Rows are named after functions -> reduce redundancy.
rownames(k) <- NULL



Sonntag, 3. April 2016

Clustering data with k-means and plotting for exploratory analysis

I clustered data from my paper A computational method for the systematic screening of reaction barriers in enzymes: searching for Bacillus circulans xylanase mutants with greater activity towards a synthetic substrate using k-means clustering and plotted the results using the `Lattice` library in R.
This analysis could be used to further automate selection of reaction barriers for including or excluding from analysis if the barrier profile is not physically meaningfull.

Raw data:
http://pastebin.com/41TrNNvH

A detailed report on the analyis is available here:
http://rpubs.com/mzh/167769

Github repository:
https://github.com/mzhKU/Enzyme-Barrier-Clustering

The critical part of the code is where the two plots (the k-means clusters, red lines, and the raw data of each cluster, blue lines) are overlayed using the lattice panel plot construct (in 003_execute.r):

# 'group' required to prevent drawing of a continuous line connecting the last
# data point of a mutant with the first data point of the next mutant barrier.
xyplot(Barrier_Cl ~ x|Cluster_fac, data=kmdf, iedf_i=iedf,
       ylim=c(-20, 50), ylab="Reaction Energy [kcal/mol]",
       xlab="Reaction Coordinate", xlim=c(1, 12), strip=T,
       # Note:
       # 'x' and 'y' are the data from the cluster data frame 'kmdf'.
       # 'iedf_i' is the 'iedf'-data frame provided to the 'i'-nner panel function.
       panel = function(x, y, iedf_i)
       {
            panel.xyplot(x=iedf_i[iedf_i$Cluster==panel.number(), ]$x_axis,
                         y=iedf_i[iedf_i$Cluster==panel.number(), ]$Barrier_i,
                         group=iedf_i$id, subscripts=TRUE, type="o", col="blue")
            panel.xyplot(x, y, type="l", col="red", lwd=2)
       }
)