<tool id="rgedgeR" name="edgeR" version="0.18">
  <description>digital gene expression (DGE) analysis between two groups of replicates</description>
  <command interpreter="python">
     rgToolFactory.py --script_path "$runme" --interpreter "Rscript" --tool_name "edgeR"
    --output_dir "$html_file.files_path" --output_html "$html_file" --output_tab "$outtab" --make_HTML "yes"
  </command>
  <inputs>
    <param name="input1"  type="data" format="tabular" label="Select an input matrix - rows are contigs, columns are counts for each sample"
       help="Use the HTSeq based count matrix preparation tool to create these count matrices from BAM files and a GTF file"/>
    <param name="title" type="text" value="DGE" size="80" label="Title for job outputs" help="Supply a meaningful name here to remind you what the outputs contain">
      <sanitizer invalid_char="">
        <valid initial="string.letters,string.digits"><add value="_" /> </valid>
      </sanitizer>
    </param>
    <param name="treatment_name" type="text" value="Treatment" size="50" label="Treatment Name"/>
    <param name="Treat_cols" label="Select columns containing treatment." type="data_column" data_ref="input1" numerical="True"
         multiple="true" use_header_names="true" size="120" display="checkboxes">
        <validator type="no_options" message="Please select at least one column."/>
    </param>
    <param name="control_name" type="text" value="Control" size="50" label="Control Name"/>
    <param name="Control_cols" label="Select columns containing control." type="data_column" data_ref="input1" numerical="True"
         multiple="true" use_header_names="true" size="120" display="checkboxes" optional="true">
    </param>
    <param name="fQ" type="float" value="0.3" size="5" label="Non-differential contig count quantile threshold - zero to analyze all non-zero read count contigs"
     help="May be a good or a bad idea depending on the biology and the question. EG 0.3 = sparsest 30% of contigs with at least one read are removed before analysis"/>
    <param name="useQuantile" type="boolean" truevalue="T" checked='false' falsevalue="" size="1" label="Non differential filter - remove contigs below a threshold (1 per million) for half or more samples"
     help="May be a good or a bad idea depending on the biology and the question. This was the old default. Quantile based is available as an alternative"/>
    <param name="priorn" type="integer" value="4" size="3" label="prior.df for tagwise dispersion - lower value = more emphasis on each tag's variance - note this used to be prior.n"
     help="Zero = auto-estimate. 1 to force high variance tags out. Use a small value to 'smooth' small samples. See edgeR docs and note below"/>
    <param name="fdrthresh" type="float" value="0.05" size="5" label="P value threshold for FDR filtering for family wise error rate control"
     help="Conventional default value of 0.05 recommended"/>
    <param name="fdrtype" type="select" label="FDR (Type I error) control method"
         help="fdr and BH both select the Benjamini Hochberg method, which is typically a reliable way to control for the number of tests">
            <option value="fdr" selected="true">fdr</option>
            <option value="BH">Benjamini Hochberg</option>
            <option value="BY">Benjamini Yekutieli</option>
            <option value="bonferroni">Bonferroni</option>
            <option value="hochberg">Hochberg</option>
            <option value="holm">Holm</option>
            <option value="hommel">Hommel</option>
            <option value="none">no control for multiple tests</option>
    </param>
  </inputs>
  <outputs>
    <data format="tabular" name="outtab" label="${title}.xls"/>
    <data format="html" name="html_file" label="${title}.html"/>
    <data format="gsearank" name="outgsea" label="${title}.gsearank">
        <filter> makeRank == 'Yes' </filter>
    </data>
  </outputs>
<configfiles>
<configfile name="runme">

# edgeR.Rscript
# updated nov 2011 for R 2.14.0 and edgeR 2.4.0 by ross
# Performs DGE on a count table containing n replicates of two conditions
#
# Parameters
#
# 1 - Output Dir

# Original edgeR code by: S.Lunke and A.Kaspi
sink(stdout(),append=T,type="message")
reallybig = log10(.Machine\$double.xmax)
reallysmall = log10(.Machine\$double.xmin)
require('stringr')
require('gplots')
library('ggplot2')
library('gridExtra')

hmap2 = function(cmat,nsamp=100,outpdfname='heatmap2.pdf', TName='Treatment',group=NA,myTitle='title goes here')
{
# Perform clustering for significant pvalues after controlling FWER
    samples = colnames(cmat)
    gu = unique(group)
    if (length(gu) == 2) {
        col.map = function(g) {if (g==gu[1]) "#FF0000" else "#0000FF"}
        pcols = unlist(lapply(group,col.map))
        } else {
        colours = rainbow(length(gu),start=0,end=4/6)
        pcols = colours[match(group,gu)]        }
    print(paste('pcols',pcols))
    gn = rownames(cmat)
    dm = cmat[(! is.na(gn)),]
    # remove unlabelled hm rows
    nprobes = nrow(dm)
    # sub = paste('Showing',nprobes,'contigs ranked for evidence of differential abundance')
    if (nprobes &gt; nsamp) {
      dm =dm[1:nsamp,]
      #sub = paste('Showing',nsamp,'contigs ranked for evidence for differential abundance out of',nprobes,'total')
    }
    newcolnames = substr(colnames(dm),1,20)
    colnames(dm) = newcolnames
    pdf(outpdfname)
    heatmap.2(dm,main=myTitle,ColSideColors=pcols,col=topo.colors(100),dendrogram="col",key=T,density.info='none',
         Rowv=F,scale='row',trace='none',margins=c(8,8),cexRow=0.4,cexCol=0.5)
    dev.off()
}

hmap = function(cmat,nmeans=4,outpdfname="heatMap.pdf",nsamp=250,TName='Treatment',group=NA,myTitle="Title goes here")
{
    # for 2 groups only was
    #col.map = function(g) {if (g==TName) "#FF0000" else "#0000FF"}
    #pcols = unlist(lapply(group,col.map))
    gu = unique(group)
    colours = rainbow(length(gu),start=0.3,end=0.6)
    pcols = colours[match(group,gu)]
    nrows = nrow(cmat)
    mtitle = paste(myTitle,'Heatmap: n contigs =',nrows)
    if (nrows &gt; nsamp)  {
               cmat = cmat[c(1:nsamp),]
               mtitle = paste('Heatmap: Top ',nsamp,' DE contigs (of ',nrows,')',sep='')
          }
    newcolnames = substr(colnames(cmat),1,20)
    colnames(cmat) = newcolnames
    pdf(outpdfname)
    heatmap(cmat,scale='row',main=mtitle,cexRow=0.3,cexCol=0.4,Rowv=NA,ColSideColors=pcols)
    dev.off()
}

qqPlot = function(descr='Title',pvector, ...)
# stolen from https://gist.github.com/703512
{
    o = -log10(sort(pvector,decreasing=F))
    e = -log10( 1:length(o)/length(o) )
    o[o==-Inf] = reallysmall
    o[o==Inf] = reallybig
    pdfname = paste(gsub(" ","", descr , fixed=TRUE),'pval_qq.pdf',sep='_')
    maint = paste(descr,'QQ Plot')
    pdf(pdfname)
    plot(e,o,pch=19,cex=1, main=maint, ...,
        xlab=expression(Expected~~-log[10](italic(p))),
        ylab=expression(Observed~~-log[10](italic(p))),
        xlim=c(0,max(e)), ylim=c(0,max(o)))
    lines(e,e,col="red")
    grid(col = "lightgray", lty = "dotted")
    dev.off()
}

smearPlot = function(DGEList,deTags, outSmear, outMain)
        {
        pdf(outSmear)
        plotSmear(DGEList,de.tags=deTags,main=outMain)
        grid(col="blue")
        dev.off()
        }


boxPlot = function(rawdat,tmdat,maint,myTitle)
  {
  # give up on boxplot - it's just too buggy
  rscolnames = substr(colnames(rawdat),1,25)
  colnames(rawdat) = rscolnames
  ccolnames = substr(colnames(tmdat),1,25)
  colnames(tmdat) = ccolnames
  print(paste('rawdat',paste(rscolnames,collapse=',')))
  print(paste('tmdat',paste(ccolnames,collapse=',')))
  pdfname = paste(gsub(" ","", myTitle , fixed=TRUE),"sampleBoxplot.pdf",sep='_')
  raw = data.frame(rawdat)
  cn = rscolnames
  rdat = reshape(raw, direction="long",varying=list(cn),v.names="counts",times=cn)
  rdat\$Sample = factor(rdat\$time,levels=cn)
  rdat\$Counts = log(rdat\$counts + 0.1)
  p1 = ggplot(rdat,aes(x=Sample,y=Counts)) + geom_boxplot(notch=T) + ylab("log Count")
  p1 = p1 + theme(axis.text.x  = element_text(angle=90, size=9)) + ggtitle('Raw Contig Counts')
  raw = data.frame(tmdat)
  cn = ccolnames
  rdat = reshape(raw, direction="long",varying=list(cn),v.names="counts",times=cn)
  rdat\$Sample = factor(rdat\$time,levels=cn)
  rdat\$Counts = log(rdat\$counts + 0.1)
  p2 = ggplot(rdat,aes(x=Sample,y=Counts)) + geom_boxplot(notch=T) + ylab("log Count")
  p2 = p2 + theme(axis.text.x  = element_text(angle=90, size=9)) + ggtitle('Normalised Contig Counts')
  pdf(pdfname)
  grid.arrange(p1,p2,nrow=1)
  dev.off()
}

cumPlot = function(rawrs,cleanrs,maint,myTitle)
{   # histograms of log10 row sums before and after filtering
        pdfname = paste(gsub(" ","", myTitle , fixed=TRUE),"RowsumCum.pdf",sep='_')
        defpar = par(no.readonly=T)
        pdf(pdfname)
        par(mfrow=c(2,1))
        lrs = log(rawrs,10)
        lim = max(lrs)
        hist(lrs,breaks=100,main=paste('Before:',maint),xlab="# Reads (log)",
             ylab="Count",col="maroon",sub=myTitle, xlim=c(0,lim),las=1)
        grid(col="blue")
        lrs = log(cleanrs,10)
        hist(lrs,breaks=100,main=paste('After:',maint),xlab="# Reads (log)",
             ylab="Count",col="maroon",sub=myTitle,xlim=c(0,lim),las=1)
        grid(col="blue")
        dev.off()
        par(defpar)
}

cumPlot1 = function(rawrs,cleanrs,maint,myTitle)
{   # updated to use ecdf
        pdfname = paste(gsub(" ","", myTitle , fixed=TRUE),"RowsumCum.pdf",sep='_')
        pdf(pdfname)
        par(mfrow=c(2,1))
        lastx = max(rawrs)
        rawe = knots(ecdf(rawrs))
        cleane = knots(ecdf(cleanrs))
        cy = 1:length(cleane)/length(cleane)
        ry = 1:length(rawe)/length(rawe)
        plot(rawe,ry,type='l',main=paste('Before',maint),xlab="Log Contig Total Reads",
             ylab="Cumulative proportion",col="maroon",log='x',xlim=c(1,lastx),sub=myTitle)
        grid(col="blue")
        plot(cleane,cy,type='l',main=paste('After',maint),xlab="Log Contig Total Reads",
             ylab="Cumulative proportion",col="maroon",log='x',xlim=c(1,lastx),sub=myTitle)
        grid(col="blue")
        dev.off()
}


edgeIt = function (Count_Matrix,group,outputfilename,fdrtype='fdr',priorn=5,fdrthresh=0.05,outputdir='.',
    myTitle='edgeR',libSize=c(),useQuantile="T",filterquantile=0.2,subjects=c()) {

        # Error handling
        if (length(unique(group))!=2){
                print("Number of conditions identified in experiment does not equal 2")
                q()
        }
        require(edgeR)
        mt = paste(unlist(strsplit(myTitle,'_')),collapse=" ")
        allN = nrow(Count_Matrix)
        nscut = round(ncol(Count_Matrix)/2)
        colTotmillionreads = colSums(Count_Matrix)/1e6
        rawrs = rowSums(Count_Matrix)
        nonzerod = Count_Matrix[(rawrs &gt; 0),] # remove all zero count genes
        nzN = nrow(nonzerod)
        nzrs = rowSums(nonzerod)
        zN = allN - nzN
        print('# Quantiles for non-zero row counts:',quote=F)
        print(quantile(nzrs,probs=seq(0,1,0.1)),quote=F)
        if (useQuantile == "T")
        {
        gt1rpin3 = rowSums(Count_Matrix/expandAsMatrix(colTotmillionreads,dim(Count_Matrix)) &gt;= 1) &gt;= nscut
        lo = colSums(Count_Matrix[!gt1rpin3,])
        workCM = Count_Matrix[gt1rpin3,]
        cleanrs = rowSums(workCM)
        cleanN = length(cleanrs)
        meth = paste("After removing",sum(!gt1rpin3),"contigs with fewer than",nscut,"sample read counts &gt;= 1 per million, there are")
        print(paste("Read",allN,"contigs. Removed",zN,"contigs with no reads.",meth,cleanN,"contigs"),quote=F)
        maint = paste('Filter &gt;=1/million reads in &gt;=',nscut,'samples')
        }
        else {
        useme = (nzrs &gt; quantile(nzrs,filterquantile))
        workCM = nonzerod[useme,]
        lo = colSums(nonzerod[!useme,])
        cleanrs = rowSums(workCM)
        cleanN = length(cleanrs)
        meth = paste("After filtering at count quantile =",filterquantile,"there are")
        print(paste('Read',allN,"contigs. Removed",zN,"with no reads.",meth,cleanN,"contigs"),quote=F)
        maint = paste('Filter below',filterquantile,'quantile')
        }
        cumPlot(rawrs=rawrs,cleanrs=cleanrs,maint=maint,myTitle=myTitle)

        print(paste("# Total low count contigs per sample = ",paste(lo,collapse=',')),quote=F)
        rsums = rowSums(workCM)
        # Setup DGEList object
        DGEList = DGEList(counts=workCM, group = group)
        #Extract T v C names
        TName=unique(group)[1]
        CName=unique(group)[2]
        if (length(subjects) == 0) { mydesign = model.matrix(~group)
               } else { sf = factor(subjects)
                       mydesign = model.matrix(~sf+group)
                       }
        print.noquote(paste('Using samples:',paste(colnames(workCM),collapse=',')))
        print.noquote('Using design matrix:')
        print.noquote(mydesign)
        print.noquote(paste("prior.df =",priorn))
        DGEList = calcNormFactors(DGEList)
        DGEList = estimateGLMCommonDisp(DGEList,mydesign)
        comdisp = DGEList\$common.dispersion
        DGEList = estimateGLMTrendedDisp(DGEList,mydesign)
        DGEList = estimateGLMTagwiseDisp(DGEList,mydesign,prior.df=priorn) # apply the user supplied prior.df
        DGLM = glmFit(DGEList,design=mydesign)
        co = length(colnames(mydesign))
        DE = glmLRT(DGLM,coef=co) # always last one - subject is first if needed
        goodness = gof(DGLM, pcutoff=fdrthresh)
        if (sum(goodness\$outlier) &gt; 0) {
            print.noquote('GLM outliers:')
            print.noquote(rownames(DE)[(goodness\$outlier != 0)])
            z = limma::zscoreGamma(goodness\$gof.statistic, shape=goodness\$df/2, scale=2)
            pdf(paste(outputdir,paste(mt,"GoodnessofFit.pdf",sep='_'),sep='/'))
            qq = qqnorm(z, panel.first=grid(), main="tagwise dispersion")
            abline(0,1,lwd=3)
            points(qq\$x[goodness\$outlier],qq\$y[goodness\$outlier], pch=16, col="dodgerblue")
            dev.off()
            } else { print.noquote('No GLM fit outlier genes found')}
        estpriorn = getPriorN(DGEList)
        print(paste("Common Dispersion =",comdisp,"CV = ",sqrt(comdisp),"getPriorN = ",estpriorn),quote=F)
        efflib = DGEList\$samples\$lib.size*DGEList\$samples\$norm.factors
        normData = (1e+06*DGEList\$counts/efflib)
        #normData = (1e+06 * DGEList\$counts/expandAsMatrix(DGEList\$samples\$lib.size, dim(DGEList)))
        colnames(normData) = paste( colnames(normData),'N',sep="_")
        print(paste('Raw sample read totals',paste(colSums(nonzerod,na.rm=T),collapse=',')))
        nzd = data.frame(nonzerod) # boxPlot applies the log transform itself
        boxPlot(rawdat=nzd,tmdat=normData,maint='TMM Normalisation',myTitle=myTitle)
        #Prepare our output file
        output = cbind(
                Name=as.character(rownames(DGEList\$counts)),
                DE\$table,
                adj.p.value=p.adjust(DE\$table\$PValue, method=fdrtype),
                Dispersion=DGEList\$tagwise.dispersion,totreads=rsums,normData,
                DGEList\$counts
        )
        soutput = output[order(output\$PValue),] # sorted into p value order - for quick toptable
        nreads = soutput\$totreads # ordered same way
        print('# writing output',quote=F)
        write.table(soutput,outputfilename, quote=FALSE, sep="\t",row.names=F)
        tt = topTags(DE,n=nrow(DE))
        rn = rownames(tt\$table)
        reg = "^chr([0-9]+):([0-9]+)-([0-9]+)"
        org="hg19"
        genecards="&lt;a href='http://www.genecards.org/index.php?path=/Search/keyword/"
        ucsc = paste("&lt;a href='http://genome.ucsc.edu/cgi-bin/hgTracks?db=",org,sep='')
        testreg = str_match(rn,reg)
        if (sum(!is.na(testreg[,1]))/length(testreg[,1]) &gt; 0.9) # is ucsc style string
        {
          urls = paste(ucsc,'&amp;position=chr',testreg[,2],':',testreg[,3],"-",testreg[,4],"'&gt;",rn,'&lt;/a&gt;',sep='')
        } else {
          urls = paste(genecards,rn,"'&gt;",rn,'&lt;/a&gt;',sep="")
        }
        cat("# Top tags\n")
        tt\$table = cbind(tt\$table,ntotreads=nreads,URL=urls) # add to end so table isn't laid out strangely
        print(tt[1:50,])
        pdf(paste(mt,"BCV_vs_abundance.pdf",sep='_'))
        plotBCV(DGEList, cex=0.3, main="Biological CV vs abundance")
        dev.off()
        # Plot MAplot
        fname = gsub(' ','_',myTitle,fixed=T)
        deTags = rownames(output[output\$adj.p.value &lt; fdrthresh,])
        nsig = length(deTags)
        print(paste('#',nsig,'tags significant at adj p=',fdrthresh),quote=F)
        print('# deTags',quote=F)
        print(head(deTags))
        dg = DGEList[order(DE\$table\$PValue),]
        #normData = (1e+06 * dg\$counts/expandAsMatrix(dg\$samples\$lib.size, dim(dg)))
        efflib = dg\$samples\$lib.size*dg\$samples\$norm.factors
        normData = (1e+06*dg\$counts/efflib)
        outpdfname=paste(mt,"heatmap.pdf",sep='_')
        hmap2(normData,nsamp=100,TName=TName,group=group,outpdfname=outpdfname,myTitle=myTitle)
        outSmear = paste(outputdir,paste(fname,"Smearplot.pdf",sep='_'),sep='/')
        outMain = paste("Smear Plot for ",TName,' Vs ',CName,' (FDR@',fdrthresh,' N = ',nsig,')',sep='')
        smearPlot(DGEList=DGEList,deTags=deTags, outSmear=outSmear, outMain = outMain)
        qqPlot(descr=myTitle,pvector=DE\$table\$PValue)
        # Plot MDS
        ug = unique(group)
        sample_colors =  match(DGEList\$samples\$group,ug) #ifelse (DGEList\$samples\$group==group[1], 1, 2)
        pdf(paste(outputdir,paste(fname,"MDSplot.pdf",sep='_'),sep='/'))
        plotMDS.DGEList(DGEList,main=paste("MDS Plot for",TName,'Vs',CName),cex=0.5,col=sample_colors,pch=sample_colors)
        legend(x="topleft", legend = as.character(ug),col=seq_along(ug), pch=seq_along(ug))
        grid(col="blue")
        dev.off()
        if (FALSE==TRUE) {
        # need a design matrix and glm to use this
        glmfit = glmFit(DGEList, design)
        goodness = gof(glmfit, pcutoff=fdrpval)
        sum(goodness\$outlier)
        rownames(d)[goodness\$outlier]
        z = limma::zscoreGamma(goodness\$gof.statistic, shape=goodness\$df/2, scale=2)
        pdf(paste(outputdir,paste(fname,"GoodnessofFit.pdf",sep='_'),sep='/'))
        qq = qqnorm(z, panel.first=grid(), main="tagwise dispersion")
        abline(0,1,lwd=3)
        points(qq\$x[goodness\$outlier],qq\$y[goodness\$outlier], pch=16, col="dodgerblue")
        dev.off()
        }
        #Return our main table
        output

}       #Done

options(width=512)
Out_Dir = "$html_file.files_path"
Input =  "$input1"
ORG = "$input1.dbkey"
TreatmentName = "$treatment_name"
TreatmentCols = "$Treat_cols"
ControlName = "$control_name"
ControlCols= "$Control_cols"
outputfilename = "$outtab"
fdrtype = "$fdrtype"
priorn = $priorn
fdrthresh = $fdrthresh
useQuantile = "$useQuantile"
fQ = $fQ # non-differential centile cutoff
myTitle = "$title"
makeRank = "$makeRank"
outgsea = ""
if (makeRank &gt; "") outgsea = "$outgsea"
#Set our columns
TCols           = as.numeric(strsplit(TreatmentCols,",")[[1]])-1
CCols           = as.numeric(strsplit(ControlCols,",")[[1]])-1
cat('# got TCols=')
cat(TCols)
cat('; CCols=')
cat(CCols)
cat('\n')


# Create output dir if non existent
  if (file.exists(Out_Dir) == F) dir.create(Out_Dir)

Count_Matrix = read.table(Input,header=T,row.names=1,sep='\t')                           #Load tab file assume header
Count_Matrix = Count_Matrix[,c(TCols,CCols)]
rn = rownames(Count_Matrix)
islib = rn %in% c('librarySize','NotInBedRegions')
LibSizes = Count_Matrix[subset(rn,islib),][1] # take first
Count_Matrix = Count_Matrix[subset(rn,! islib),]
group = c(rep(TreatmentName,length(TCols)), rep(ControlName,length(CCols)) )             #Build a group descriptor
group = factor(group, levels=c(ControlName,TreatmentName))
colnames(Count_Matrix) = paste(group,colnames(Count_Matrix),sep="_")                   #Relabel columns
if (priorn &lt;= 0) {priorn = ceiling(20/(length(group)-1))} # estimate prior.n if not provided
# see http://comments.gmane.org/gmane.comp.lang.r.sequencing/2009
results = edgeIt(Count_Matrix=Count_Matrix,group=group,outputfilename=outputfilename,fdrtype=fdrtype,priorn=priorn,fdrthresh=fdrthresh,
   outputdir=Out_Dir,myTitle=myTitle,libSize=c(),useQuantile=useQuantile,filterquantile=fQ) #Run the main function
# for the log


sessionInfo()


</configfile>
</configfiles>
<tests>
<test>
<param name='input1' value='DGEtest.xls' ftype='tabular' />
 <param name='treatment_name' value='case' />
 <param name='title' value='DGEtest' />
 <param name='fdrtype' value='fdr' />
 <param name='priorn' value="5" />
 <param name='fdrthresh' value="0.05" />
 <param name='control_name' value='control' />
 <param name='Treat_cols' value='c3,c6,c9' />
 <param name='Control_cols' value='c2,c5,c8' />
 <output name='outtab' file='DGEtest1out.xls' ftype='tabular' compare='diff' />
 <output name='html_file' file='DGEtest1out.html' ftype='html' compare='diff' lines_diff='20' />
</test>
</tests>
<help>
**What it does**

Performs digital gene expression analysis between a treatment and a control group on a count matrix.

**Documentation** Please see documentation_ for methods and parameter details

**Input**

A matrix consisting of non-negative integers. The matrix must have a unique header row identifying the samples, as well as a unique set of row names
as the first column.
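
The wrapper script loads the matrix with read.table, taking the first column as row names, so a sample
in Galaxy column 2 becomes data column 1. A minimal sketch in R of that step; the contig names,
sample names and counts are invented for illustration::

    # hypothetical tab separated count table: 2 contigs, 4 samples
    txt = "contig\tT1\tT2\tC1\tC2\ncontigA\t10\t12\t3\t4\ncontigB\t0\t1\t0\t2"
    counts = read.table(text=txt, header=T, row.names=1, sep='\t')
    # Galaxy columns 2,3 are treatment and 4,5 are control; subtract 1
    # because the row name column is no longer counted
    TCols = as.numeric(strsplit("2,3",",")[[1]]) - 1
    CCols = as.numeric(strsplit("4,5",",")[[1]]) - 1
    counts = counts[,c(TCols,CCols)]
    print(colnames(counts))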

**Output**

A matrix containing the original data and relative expression levels, plus some helpful plots
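
The adj.p.value column in the output table is produced by R's p.adjust using the selected FDR control
method. A small sketch with invented p values; in p.adjust, "fdr" is an alias for "BH", so those two
menu choices give identical results::

    # hypothetical raw p values for four contigs
    p = c(0.01, 0.02, 0.03, 0.5)
    print(p.adjust(p, method="fdr"))
    print(p.adjust(p, method="BH"))
    # bonferroni multiplies each p value by the number of tests, capped at 1
    print(p.adjust(p, method="bonferroni"))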

**Note on edgeR versions**

The edgeR authors made a small cosmetic change in the name of one important variable (from p.value to PValue)
between edgeR 2.4.4 and 2.4.6 (the version for R 2.14 at the time of writing),
breaking this and all other code that assumed the old name for this variable.
This means that all code using edgeR is sensitive to the version. I think this was a very unwise thing
to do because it wasted hours of my time to track down and will similarly cost other edgeR users dearly
when their old scripts break. This tool currently works with 2.4.6.

**Note on prior.n**

http://seqanswers.com/forums/showthread.php?t=5591 says:

*prior.n*

The value for prior.n determines the amount of smoothing of tagwise dispersions towards the common dispersion.
You can think of it as like a "weight" for the common value. (It is actually the weight for the common likelihood
in the weighted likelihood equation). The larger the value for prior.n, the more smoothing, i.e. the closer your
tagwise dispersion estimates will be to the common dispersion. If you use a prior.n of 1, then that gives the
common likelihood the weight of one observation.

In answer to your question, it is a good thing to squeeze the tagwise dispersions towards a common value,
or else you will be using very unreliable estimates of the dispersion. I would not recommend using the value that
you obtained from estimateSmoothing()---this is far too small and would result in virtually no moderation
(squeezing) of the tagwise dispersions. How many samples do you have in your experiment?
What is the experimental design? If you have few samples (less than 6) then I would suggest a prior.n of at least 10.
If you have more samples, then the tagwise dispersion estimates will be more reliable,
so you could consider using a smaller prior.n, although I would hesitate to use a prior.n less than 5.
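
When the prior.df field is set to zero, the wrapper script substitutes 20 divided by one less than the
number of samples, rounded up. A minimal sketch of that rule for a hypothetical design with three
replicates per group, which gives 4::

    # hypothetical design: 3 treatment and 3 control samples
    group = factor(c(rep("Treatment",3), rep("Control",3)))
    # same rule the wrapper script applies for a zero setting
    priorn = ceiling(20/(length(group)-1))
    print(priorn)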

**Attribution** Copyright Ross Lazarus (ross period lazarus at gmail period com) May 2012
Derived from the implementation by Antony Kaspi and Sebastian Lunke at the BakerIDI

All rights reserved.

Licensed under the LGPL_

.. _LGPL: http://www.gnu.org/copyleft/lesser.html
.. _documentation: http://bioconductor.org/packages/release/bioc/html/edgeR.html
</help>

</tool>
author	fubar
date	Wed, 12 Jun 2013 02:58:43 -0400