changeset 0:92dfcfb03add draft
planemo upload commit 2e441b4969ae7cf9aeb227a1d47c43ef7268a5e6-dirty
| author | proteore |
|---|---|
| date | Wed, 22 Aug 2018 10:39:30 -0400 |
| parents | |
| children | 5569a3f066cf |
| files | README.rst enrichment_v3.R test-data/Barplot_output_for_topGO_analysis_BP_category.png test-data/Dotplot_output_for_topGO_analysis_BP_category.png test-data/ID_Converter_FKW_Lacombe_et_al_2017_OK.txt test-data/Text_output_for_topGO_analysis_BP_category.tabular topGO.xml |
| diffstat | 7 files changed, 902 insertions(+), 0 deletions(-) |
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/README.rst Wed Aug 22 10:39:30 2018 -0400
@@ -0,0 +1,68 @@
+Wrapper for topGO Tool
+======================
+
+**Authors**
+
+Alexa A and Rahnenfuhrer J (2016). topGO: Enrichment Analysis for Gene Ontology. R package version 2.30.0.
+
+**Galaxy integration**
+
+Lisa Peru, T.P. Lien Nguyen, Florence Combes, Yves Vandenbrouck CEA, INSERM, CNRS, Grenoble-Alpes University, BIG Institute, FR
+
+Sandra Dérozier, Olivier Rué, Christophe Caron, Valentin Loux INRA, Paris-Saclay University, MAIAGE Unit, Migale Bioinformatics platform
+
+This work has been partially funded through the French National Agency for Research (ANR) IFB project.
+
+Contact support@proteore.org for any questions or concerns about the Galaxy implementation of this tool.
+
+----------------------
+
+**Galaxy component based on R package topGO.**
+
+**Input required**
+
+This component works with Ensembl gene ids (e.g. ENSG00000139618). You can
+copy/paste these identifiers or supply a tabular file (.csv, .tsv, .txt, .tab)
+in which they are contained.
+
+**Principle**
+
+This component evaluates how well GO terms are represented in a gene list for one ontology category (Biological Process "BP", Cellular Component "CC", Molecular Function "MF"). This representation is evaluated against the background list of all human genes associated with GO terms of the chosen category (BP, CC, MF). This background is given by the R package "org.Hs.eg.db", which is a genome-wide annotation package for **human**.
+
+**Output**
+
+Three kinds of output are available : a textual output, a barplot output and
+a dotplot output.
+
+*Textual output* :
+The text output lists all the GO terms that were found significant under the specified threshold.
+
+
+The different fields are as follows :
+
+- Annotated : number of genes in org.Hs.eg.db which are annotated with the GO term.
+
+- Significant : number of genes belonging to your input which are annotated with the GO term.
+
+- Expected : estimated number of genes a node of size Annotated would have if the significant genes were randomly selected from the gene universe.
+
+- pvalues : p-value obtained from the test
+
+- qvalues (optional) : additional column with adjusted p-values
+
+
+**Tests**
+
+topGO provides a classic Fisher test for evaluating whether some GO terms are over-represented in your gene list, but other options are also provided (elim, weight01, parentchild). For the merits of each option and their algorithmic descriptions, please refer to the topGO manual :
+https://bioconductor.org/packages/release/bioc/vignettes/topGO/inst/doc/topGO.pdf
+
+**Multiple testing corrections**
+
+Furthermore, the following corrections for multiple testing can also be applied :
+- holm
+- hochberg
+- hommel
+- bonferroni
+- BH
+- BY
+- fdr
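Editor's note on the README fields: the following is a minimal R sketch of how the Expected column and a classic Fisher p-value relate to the Annotated and Significant counts, and of how a multiple-testing correction is applied afterwards. The universe size and input size are hypothetical values chosen for illustration, not figures from this changeset, and the hypergeometric formulation is the textbook one rather than a transcription of topGO internals.

```r
# Hypothetical counts for one GO term; only `annotated` and `significant`
# mirror the column names described in the README.
universe_size <- 17000  # assumed number of genes in the background
input_size    <- 116    # assumed number of input genes found in the background
annotated     <- 125    # background genes annotated with the term
significant   <- 19     # input genes annotated with the term

# Expected: share of the term in the background, scaled to the input size
expected <- annotated * input_size / universe_size

# One-sided Fisher test written as the upper tail of the hypergeometric
# distribution: probability of seeing at least `significant` annotated genes
p_value <- phyper(significant - 1, annotated, universe_size - annotated,
                  input_size, lower.tail = FALSE)

# Adjust a vector of raw p-values with one of the corrections listed above
raw_p    <- c(p_value, 0.01, 0.04, 0.20)
q_values <- p.adjust(raw_p, method = "BH")
```

Any of the method names listed in the README (holm, hochberg, hommel, bonferroni, BH, BY, fdr) can be passed to `p.adjust` in the same way.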
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/enrichment_v3.R Wed Aug 22 10:39:30 2018 -0400
@@ -0,0 +1,381 @@
+# enrichment_v3.R
+# Usage : Rscript --vanilla enrichment_v3.R --inputtype tabfile (or
+# copypaste) --input file.txt --ontology "BP/CC/MF" --option option (e.g
+# : classic/elim...) --threshold threshold --correction correction --textoutput
+# text --barplotoutput barplot
+# --dotplotoutput dotplot --column column --geneuniverse geneuniverse --header header
+# e.g : Rscript --vanilla enrichment_v3.R --inputtype tabfile --input file.txt
+# --ontology BP --option classic --threshold 1e-15 --correction holm
+# --textoutput TRUE
+# --barplotoutput TRUE --dotplotoutput TRUE --column c1 --geneuniverse
+# org.Hs.eg.db --header TRUE
+# INPUT :
+# - type of input. Can be ids separated by a blank space (copypaste), or a text
+# file (tabfile)
+# - file with at least one column of ensembl ids
+# - gene ontology category : Biological Process (BP), Cellular Component (CC), Molecular Function (MF)
+# - test option (relative to topGO algorithms) : elim, weight01, parentchild, or no option (classic)
+# - threshold for enriched GO term pvalues (e.g : 1e-15)
+# - correction for multiple testing (see p.adjust options : holm, hochberg, hommel, bonferroni, BH, BY, fdr, none)
+# - outputs wanted in this order text, barplot, dotplot with boolean value (e.g
+# : TRUE TRUE TRUE ).
+# Declare the output not wanted as none
+# - column containing the ensembl ids if the input file is a tabfile
+# - gene universe reference for the user chosen species
+# - header : if the input is a text file, does this text file have a header
+# (TRUE/FALSE)
+#
+# OUTPUT :
+# - outputs requested by the user named respectively result.csv for the text
+# results file, barplot.png for the barplot image file and dotplot.png for the
+# dotplot image file
+
+
+# loading topGO library
+library(topGO)
+
+# Read file and return file content as data.frame
+readfile = function(filename, header) {
+  if (header == "true") {
+    # Read only first line of the file as header:
+    headers <- read.table(filename, nrows = 1, header = FALSE, sep = "\t", stringsAsFactors = FALSE, fill = TRUE, na.strings=c("", "NA"), blank.lines.skip = TRUE, quote = "")
+    #Read the data of the files (skipping the first row)
+    file <- read.table(filename, skip = 1, header = FALSE, sep = "\t", stringsAsFactors = FALSE, fill = TRUE, na.strings=c("", "NA"), blank.lines.skip = TRUE, quote = "")
+    # Remove empty rows
+    file <- file[!apply(is.na(file) | file == "", 1, all), , drop=FALSE]
+    #And assign the header to the data
+    names(file) <- headers
+  }
+  else {
+    file <- read.table(filename, header = FALSE, sep = "\t", stringsAsFactors = FALSE, fill = TRUE, na.strings=c("", "NA"), blank.lines.skip = TRUE, quote = "")
+    # Remove empty rows
+    file <- file[!apply(is.na(file) | file == "", 1, all), , drop=FALSE]
+  }
+  return(file)
+}
+
+check_ens_ids <- function(vector) {
+  ens_pattern = "^(ENS[A-Z]+[0-9]{11}|[A-Z]{3}[0-9]{3}[A-Za-z](-[A-Za-z])?|CG[0-9]+|[A-Z0-9]+\\.[0-9]+|YM[A-Z][0-9]{3}[a-z][0-9])$"
+  return(grepl(ens_pattern,vector))
+}
+
+'%!in%' <- function(x,y)!('%in%'(x,y))
+
+
+# Parse command line arguments
+
+args = commandArgs(trailingOnly = TRUE)
+
+# create a list of the arguments from the command line, separated by a blank space
+hh <- paste(unlist(args),collapse=' ')
+
+# delete the first element of the list which is always a blank space
+listoptions <- unlist(strsplit(hh,'--'))[-1]
+
+# for each input, split the arguments with blank space as separator, unlist,
+# and delete the first element which is the input name (e.g --inputtype)
+options.args <- sapply(listoptions,function(x){
+  unlist(strsplit(x, '[ \t\n]+'))[-1]
+  })
+# same as the step above, except that only the names are kept
+options.names <- sapply(listoptions,function(x){
+  option <- unlist(strsplit(x, '[ \t\n]+'))[1]
+})
+names(options.args) <- unlist(options.names)
+
+
+if (length(options.args) != 12) {
+  stop("Not enough/Too many arguments", call. = FALSE)
+}
+
+typeinput = options.args[1]
+listfile = options.args[2]
+onto = as.character(options.args[3])
+option = as.character(options.args[4])
+correction = as.character(options.args[6])
+threshold = as.numeric(options.args[5])
+text = as.character(options.args[7])
+barplot = as.character(options.args[8])
+dotplot = as.character(options.args[9])
+column = as.numeric(gsub("c","",options.args[10]))
+geneuniverse = as.character(options.args[11])
+header = as.character(options.args[12])
+
+if (typeinput=="copypaste"){
+  sample = as.data.frame(unlist(listfile))
+  sample = sample[,column]
+}
+if (typeinput=="tabfile"){
+
+  if (header=="TRUE"){
+    sample = readfile(listfile, "true")
+  }else{
+    sample = readfile(listfile, "false")
+  }
+  sample = sample[,column]
+}
+
+#check of ENS ids
+if (! any(check_ens_ids(sample))){
+  print("no ensembl gene ids found in your ids list, please check your IDs in input or the selected column of your input file")
+  stop()
+}
+
+# Launch enrichment analysis and return result data from the analysis or the null
+# object if the enrichment could not be done.
+goEnrichment = function(geneuniverse,sample,onto){
+
+  # get all the GO terms of the corresponding ontology (BP/CC/MF) and all their
+  # associated ensembl ids according to the org package
+  xx = annFUN.org(onto,mapping=geneuniverse,ID="ensembl")
+  allGenes = unique(unlist(xx))
+  # check if the genes given by the user can be found in the org package (gene
+  # universe), that is in
+  # allGenes
+  if (length(intersect(sample,allGenes))==0){
+
+    print("None of the input ids can be found in the org package data, enrichment analysis cannot be realized. \n The inputs ids probably have no associated GO terms.")
+    return(c(NULL,NULL))
+
+  }
+
+  geneList = factor(as.integer(allGenes %in% sample))
+  names(geneList) <- allGenes
+
+
+  #topGO enrichment
+
+
+  # Creation of a topGOdata object
+  # It will contain : the list of genes of interest, the GO annotations and the GO hierarchy
+  # Parameters :
+  # ontology : character string specifying the ontology of interest (BP, CC, MF)
+  # allGenes : named vector of type numeric or factor
+  # annot : tells topGO how to map genes to GO annotations.
+  # argument not used here : nodeSize : at which minimal number of GO annotations
+  # do we consider a gene
+
+  myGOdata = new("topGOdata", description="SEA with TopGO", ontology=onto, allGenes=geneList, annot = annFUN.org, mapping=geneuniverse,ID="ensembl")
+
+
+  # Performing enrichment tests
+  result <- runTest(myGOdata, algorithm=option, statistic="fisher")
+  return(c(result,myGOdata))
+}
+
+# Some libraries such as GOsummaries won't be able to treat the values such as
+# "< 1e-30" produced by topGO. As such it is important to delete the < char
+# with the deleteInfChar function. Nevertheless the user will have access to the original results in the text output.
+deleteInfChar = function(values){
+
+  lines = grep("<",values)
+  if (length(lines)!=0){
+    for (line in lines){
+      values[line]=gsub("<","",values[line])
+    }
+  }
+  return(values)
+}
+
+corrMultipleTesting = function(result, myGOdata,correction,threshold){
+
+  # adjust for multiple testing
+  if (correction!="none"){
+    # GenTable : transforms the result object into a list. Filters can be applied
+    # (e.g : with the topNodes argument, to get for instance only the n first
+    # GO terms with the lowest pvalues), but as we want to apply a correction we
+    # take all the GO terms, no matter their pvalues
+    allRes <- GenTable(myGOdata, test = result, orderBy = "result", ranksOf = "result",topNodes=length(attributes(result)$score))
+    # Some pvalues given by topGO are not numeric (e.g : "<1e-30"). As such, these
+    # values are converted to 1e-30 to be able to correct the pvalues
+    pvaluestmp = deleteInfChar(allRes$test)
+
+    # the correction is done from the modified pvalues
+    allRes$qvalues = p.adjust(pvaluestmp, method = as.character(correction), n = length(pvaluestmp))
+    allRes = as.data.frame(allRes)
+
+    # Rename the test column by pvalues, so that is more explicit
+    nb = which(names(allRes) %in% c("test"))
+
+    names(allRes)[nb] = "pvalues"
+
+    # strip the "<" first, otherwise values such as "< 1e-30" become NA and the rows are dropped
+    allRes = allRes[which(as.numeric(deleteInfChar(allRes$pvalues)) <= threshold),]
+    if (length(allRes$pvalues)==0){
+      print("Threshold was too stringent, no GO term found with pvalue equal or lesser than the threshold value")
+      return(NULL)
+    }
+    allRes = allRes[order(allRes$qvalues),]
+  }
+
+  if (correction=="none"){
+    # count the GO terms under the user threshold; sum() is used because
+    # summary() on a logical vector omits the TRUE count when it is zero
+    numsignif <- sum(attributes(result)$score <= threshold, na.rm = TRUE)
+    if (numsignif==0){
+
+      print("Threshold was too stringent, no GO term found with pvalue equal or lesser than the threshold value")
+      return(NULL)
+    }
+
+    # get all significant nodes
+    allRes <- GenTable(myGOdata, test = result, orderBy = "result", ranksOf = "result",topNodes=numsignif)
+
+    allRes = as.data.frame(allRes)
+    # Rename the test column by pvalues, so that is more explicit
+    nb = which(names(allRes) %in% c("test"))
+    names(allRes)[nb] = "pvalues"
+
+    allRes = allRes[order(as.numeric(deleteInfChar(allRes$pvalues))),]
+  }
+
+  return(allRes)
+}
+
+# roundValues will simplify the results by rounding down the values. For instance 1.1e-17 becomes 1e-17
+roundValues = function(values){
+  for (line in 1:length(values)){
+    values[line]=as.numeric(gsub(".*e","1e",as.character(values[line])))
+  }
+  return(values)
+}
+
+createDotPlot = function(data, onto){
+
+  values = deleteInfChar(data$pvalues)
+  values = roundValues(values)
+  values = as.numeric(values)
+
+  geneRatio = data$Significant/data$Annotated
+  goTerms = data$Term
+  count = data$Significant
+
+  labely = paste("GO terms",onto,sep=" ")
+  png(filename="dotplot.png",res=300, width = 3200, height = 3200, units = "px")
+  sp1 = ggplot(data,aes(x=geneRatio,y=goTerms, color=values,size=count)) +geom_point() + scale_colour_gradientn(colours=c("red","violet","blue")) + xlab("Gene Ratio") + ylab(labely) + labs(color="p-values\n")
+
+  plot(sp1)
+  dev.off()
+}
+
+createBarPlot = function(data, onto){
+
+
+  values = deleteInfChar(data$pvalues)
+  values = roundValues(values)
+
+  values = as.numeric(values)
+  goTerms = data$Term
+  count = data$Significant
+  png(filename="barplot.png",res=300, width = 3200, height = 3200, units = "px")
+
+  labely = paste("GO terms",onto,sep=" ")
+  p<-ggplot(data, aes(x=goTerms, y=count,fill=values)) + ylab("Gene count") + xlab(labely) +geom_bar(stat="identity") + scale_fill_gradientn(colours=c("red","violet","blue")) + coord_flip() + labs(fill="p-values\n")
+  plot(p)
+  dev.off()
+}
+
+
+# Produce the different outputs
+createOutputs = function(result, cut_result,text, barplot, dotplot, onto){
+
+
+  if (is.null(result)){
+
+    if (text=="TRUE"){
+
+      err_msg = "None of the input ids can be found in the org package data, enrichment analysis cannot be realized. \n The inputs ids probably either have no associated GO terms or are not ENSG identifiers (e.g : ENSG00000012048)."
+      write.table(err_msg, file='result.csv', quote=FALSE, sep='\t', col.names = T, row.names = F)
+
+    }
+
+    if (barplot=="TRUE"){
+
+      png(filename="barplot.png")
+      plot.new()
+      #text(0,0,err_msg)
+      dev.off()
+    }
+
+    if (dotplot=="TRUE"){
+
+      png(filename="dotplot.png")
+      plot.new()
+      #text(0,0,err_msg)
+      dev.off()
+
+    }
+    return(TRUE)
+  }
+
+
+  if (is.null(cut_result)){
+
+    err_msg = "Threshold was too stringent, no GO term found with pvalue equal or lesser than the threshold value."
+    if (text=="TRUE"){
+
+      # err_msg is defined above so that the plot branches can use it even without text output
+      write.table(err_msg, file='result.csv', quote=FALSE, sep='\t', col.names = T, row.names = F)
+
+    }
+
+    if (barplot=="TRUE"){
+
+      png(filename="barplot.png")
+      plot.new()
+      text(0,0,err_msg)
+      dev.off()
+    }
+
+    if (dotplot=="TRUE"){
+
+      png(filename="dotplot.png")
+      plot.new()
+      text(0,0,err_msg)
+      dev.off()
+
+    }
+    return(TRUE)
+
+
+
+  }
+
+  if (text=="TRUE"){
+    write.table(cut_result, file='result.csv', quote=FALSE, sep='\t', col.names = T, row.names = F)
+  }
+
+  if (barplot=="TRUE"){
+
+    createBarPlot(cut_result, onto)
+  }
+
+  if (dotplot=="TRUE"){
+
+    createDotPlot(cut_result, onto)
+  }
+  return(TRUE)
+}
+
+
+
+# Load R library ggplot2 to plot graphs
+library(ggplot2)
+
+# Launch enrichment analysis
+allresult = goEnrichment(geneuniverse,sample,onto)
+result = allresult[1][[1]]
+myGOdata = allresult[2][[1]]
+if (!is.null(result)){
+
+  # Adjust the result with a multiple testing correction or not and with the user
+  # p-value cutoff
+  cut_result = corrMultipleTesting(result,myGOdata, correction,threshold)
+}else{
+
+  cut_result=NULL
+
+}
+
+
+createOutputs(result, cut_result,text, barplot, dotplot, onto)
+
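Editor's note on p-value handling: the script comments mention that topGO can report bounds such as "< 1e-30" instead of a plain number. The sketch below, which uses illustrative values rather than real output, shows what happens when such a string is converted with and without removing the "<" first. It is a standalone illustration and not part of the committed script.

```r
# p-values as they may appear in a topGO results table (illustrative values)
raw <- c("< 1e-30", "1.3e-20", "0.00014")

# Direct conversion turns the "< 1e-30" entry into NA
direct <- suppressWarnings(as.numeric(raw))

# Removing the "<" first keeps the bound as a usable number
cleaned <- as.numeric(gsub("<", "", raw, fixed = TRUE))

# Rows kept by a threshold filter in each case
threshold    <- 1e-3
kept_direct  <- which(direct <= threshold)   # the NA row is silently dropped
kept_cleaned <- which(cleaned <= threshold)  # all three rows pass
```

Because `which()` ignores NA comparisons, a filter applied to the unconverted strings would drop the most significant row without any error.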
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/test-data/ID_Converter_FKW_Lacombe_et_al_2017_OK.txt Wed Aug 22 10:39:30 2018 -0400 @@ -0,0 +1,152 @@ +Protein accession number (UniProt) Protein name Number of peptides (razor + unique) neXtProt_ID UniProt.ID GeneID MIM Ensembl +P15924 Desmoplakin 69 NX_P15924 DESP_HUMAN 1832 125647; 605676; 607450; 607655; 609638; 612908; 615821 ENSG00000096696 +P02538 Keratin, type II cytoskeletal 6A 53 NX_P02538 K2C6A_HUMAN 3853 148041; 615726 ENSG00000205420 +P02768 Serum albumin 44 NX_P02768 ALBU_HUMAN 213 103600; 615999; 616000 ENSG00000163631 +P08779 Keratin, type I cytoskeletal 16 29 NX_P08779 K1C16_HUMAN 3868 148067; 167200; 613000 ENSG00000186832 +Q02413 Desmoglein-1 24 NX_Q02413 DSG1_HUMAN 1828 125670; 148700; 615508 ENSG00000134760 +P07355 Annexin A2;Putative annexin A2-like protein 22 NX_P07355 ANXA2_HUMAN 302 151740 ENSG00000182718 +P14923 Junction plakoglobin 22 NX_P14923 PLAK_HUMAN 3728 173325; 601214; 611528 ENSG00000173801 +P02788 Lactotransferrin 21 NX_P02788 TRFL_HUMAN 4057 150210 ENSG00000012223 +Q9HC84 Mucin-5B 21 NX_Q9HC84 MUC5B_HUMAN 727897 178500; 600770 ENSG00000117983 +P29508 Serpin B3 20 NX_P29508 SPB3_HUMAN 6317 600517 ENSG00000057149 +P63261 Actin, cytoplasmic 2 19 NX_P63261 ACTG_HUMAN 71 102560; 604717; 614583 ENSG00000184009 +Q8N1N4 Keratin, type II cytoskeletal 78 18 NX_Q8N1N4 K2C78_HUMAN 196374 611159 ENSG00000170423 +Q04695 Keratin, type I cytoskeletal 17 18 NX_Q04695 K1C17_HUMAN 3872 148069; 167210; 184500 ENSG00000128422 +P01876 Ig alpha-1 chain C region 16 NX_P01876 IGHA1_HUMAN NA 146900 ENSG00000211895; ENSG00000282633 +Q01469 Fatty acid-binding protein 5, epidermal 15 NX_Q01469 FABP5_HUMAN 2171 605168 ENSG00000164687 +P31944 Caspase-14 15 NX_P31944 CASPE_HUMAN 23581 605848; 617320 ENSG00000105141 +P01833 Polymeric immunoglobulin receptor 15 NX_P01833 PIGR_HUMAN 5284 173880 ENSG00000162896 +P06733 Alpha-enolase 15 NX_P06733 ENOA_HUMAN 2023 172430 ENSG00000074800 +P25311 
Zinc-alpha-2-glycoprotein 15 NX_P25311 ZA2G_HUMAN 563 194460 ENSG00000160862 +Q15149 Plectin 15 NX_Q15149 PLEC_HUMAN 5339 131950; 226670; 601282; 612138; 613723; 616487 ENSG00000178209 +P19013 Keratin, type II cytoskeletal 4 13 NX_P19013 K2C4_HUMAN NA 123940; 193900 ENSG00000170477 +Q6KB66 Keratin, type II cytoskeletal 80 13 NX_Q6KB66 K2C80_HUMAN 144501 611161 ENSG00000167767 +Q08188 Protein-glutamine gamma-glutamyltransferase E 12 NX_Q08188 TGM3_HUMAN 7053 600238; 617251 ENSG00000125780 +P13646 Keratin, type I cytoskeletal 13 11 NX_P13646 K1C13_HUMAN 3860 148065; 615785 ENSG00000171401 +Q86YZ3 Hornerin 11 NX_Q86YZ3 HORN_HUMAN 388697 616293 ENSG00000197915 +P04259 Keratin, type II cytoskeletal 6B 10 NX_P04259 K2C6B_HUMAN 3854 148042; 615728 ENSG00000185479 +P02545 Prelamin-A/C;Lamin-A/C 10 NX_P02545 LMNA_HUMAN 4000 115200; 150330; 151660; 159001; 176670; 181350; 212112; 248370; 275210; 605588; 610140; 613205; 616516 ENSG00000160789 +P04083 Annexin A1 10 NX_P04083 ANXA1_HUMAN 301 151690 ENSG00000135046 +P11021 78 kDa glucose-regulated protein 10 NX_P11021 GRP78_HUMAN 3309 138120 ENSG00000044574 +P02787 Serotransferrin 9 NX_P02787 TRFE_HUMAN 7018 190000; 209300 ENSG00000091513 +P04040 Catalase 9 NX_P04040 CATA_HUMAN 847 115500; 614097 ENSG00000121691 +P31151 Protein S100-A7 9 NX_P31151 S10A7_HUMAN 6278 600353 ENSG00000143556 +P31947 14-3-3 protein sigma 9 NX_P31947 1433S_HUMAN 2810 601290 ENSG00000175793 +Q96P63 Serpin B12 9 NX_Q96P63 SPB12_HUMAN 89777 615662 ENSG00000166634 +P14618 Pyruvate kinase PKM 9 NX_P14618 KPYM_HUMAN 5315 179050 ENSG00000067225 +P60174 Triosephosphate isomerase 9 NX_P60174 TPIS_HUMAN 7167 190450; 615512 ENSG00000111669 +Q06830 Peroxiredoxin-1 9 NX_Q06830 PRDX1_HUMAN 5052 176763 ENSG00000117450 +P01040 Cystatin-A 8 NX_P01040 CYTA_HUMAN 1475 184600; 607936 ENSG00000121552 +P05089 Arginase-1 8 NX_P05089 ARGI1_HUMAN 383 207800; 608313 ENSG00000118520 +P01834 Ig kappa chain C region 8 NX_P01834 IGKC_HUMAN NA 147200; 614102 NA +P04406 
Glyceraldehyde-3-phosphate dehydrogenase 8 NX_P04406 G3P_HUMAN 2597 138400 ENSG00000111640 +P0DMV9 Heat shock 70 kDa protein 1B 8 NX_P0DMV9 HS71B_HUMAN 3303; 3304 140550; 603012 ENSG00000204388; ENSG00000224501; ENSG00000212866; ENSG00000231555; ENSG00000232804 +P13639 Elongation factor 2 8 NX_P13639 EF2_HUMAN 1938 130610; 609306 ENSG00000167658 +P35579 Myosin-9 8 NX_P35579 MYH9_HUMAN 4627 153640; 153650; 155100; 160775; 600208; 603622; 605249 ENSG00000100345 +P68371 Tubulin beta-4B chain 8 NX_P68371 TBB4B_HUMAN 10383 602660 ENSG00000188229 +Q8WVV4 Protein POF1B 8 NX_Q8WVV4 POF1B_HUMAN 79983 300603; 300604 ENSG00000124429 +O75635 Serpin B7 7 NX_O75635 SPB7_HUMAN 8710 603357; 615598 ENSG00000166396 +P01857 Ig gamma-1 chain C region 7 NX_P01857 IGHG1_HUMAN NA 147100; 254500 ENSG00000211896; ENSG00000277633 +P61626 Lysozyme C 7 NX_P61626 LYSC_HUMAN 4069 105200; 153450 ENSG00000090382 +P68363 Tubulin alpha-1B chain 7 NX_P68363 TBA1B_HUMAN 10376 602530 ENSG00000123416 +P01009 Alpha-1-antitrypsin;Short peptide from AAT 6 NX_P01009 A1AT_HUMAN 5265 107400; 613490 ENSG00000197249 +P07900 Heat shock protein HSP 90-alpha 6 NX_P07900 HS90A_HUMAN 3320 140571 ENSG00000080824 +Q9NZH8 Interleukin-36 gamma 6 NX_Q9NZH8 IL36G_HUMAN 56300 605542 ENSG00000136688 +O43707 Alpha-actinin-4;Alpha-actinin-1 6 NX_O43707 ACTN4_HUMAN 81 603278; 604638 ENSG00000130402; ENSG00000282844 +O75223 Gamma-glutamylcyclotransferase 6 NX_O75223 GGCT_HUMAN 79017 137170 ENSG00000006625 +P00338 L-lactate dehydrogenase A chain 6 NX_P00338 LDHA_HUMAN 3939 150000; 612933 ENSG00000134333 +P07339 Cathepsin D 6 NX_P07339 CATD_HUMAN 1509 116840; 610127 ENSG00000117984 +P62987 Ubiquitin-60S ribosomal protein L40 6 NX_P62987 RL40_HUMAN 7311 191321 ENSG00000221983 +P10599 Thioredoxin 6 NX_P10599 THIO_HUMAN 7295 187700 ENSG00000136810 +Q9UGM3 Deleted in malignant brain tumors 1 protein 6 NX_Q9UGM3 DMBT1_HUMAN 1755 137800; 601969 ENSG00000187908 +Q9UI42 Carboxypeptidase A4 6 NX_Q9UI42 CBPA4_HUMAN 51200 607635 
ENSG00000128510 +P47929 Galectin-7 5 NX_P47929 LEG7_HUMAN 3963; 653499 600615; 617139 ENSG00000178934; ENSG00000205076; ENSG00000282902; ENSG00000283082 +Q13867 Bleomycin hydrolase 5 NX_Q13867 BLMH_HUMAN 642 602403 ENSG00000108578 +Q6P4A8 Phospholipase B-like 1 5 NX_Q6P4A8 PLBL1_HUMAN 79887 NA ENSG00000121316 +O75369 Filamin-B 5 NX_O75369 FLNB_HUMAN 2317 108720; 108721; 112310; 150250; 272460; 603381 ENSG00000136068 +P00441 Superoxide dismutase [Cu-Zn] 5 NX_P00441 SODC_HUMAN 6647 105400; 147450 ENSG00000142168 +P04792 Heat shock protein beta-1 5 NX_P04792 HSPB1_HUMAN 3315 602195; 606595; 608634 ENSG00000106211 +P11142 Heat shock cognate 71 kDa protein 5 NX_P11142 HSP7C_HUMAN 3312 600816 ENSG00000109971 +P58107 Epiplakin 5 NX_P58107 EPIPL_HUMAN 83481 607553 NA +P60842 Eukaryotic initiation factor 4A-I 5 NX_P60842 IF4A1_HUMAN 1973 602641 ENSG00000161960 +P62937 Peptidyl-prolyl cis-trans isomerase A 5 NX_P62937 PPIA_HUMAN 5478 123840 ENSG00000196262 +P63104 14-3-3 protein zeta/delta 5 NX_P63104 1433Z_HUMAN 7534 601288 ENSG00000164924 +Q92820 Gamma-glutamyl hydrolase 5 NX_Q92820 GGH_HUMAN 8836 601509 ENSG00000137563 +O75342 Arachidonate 12-lipoxygenase, 12R-type 4 NX_O75342 LX12B_HUMAN 242 242100; 603741 ENSG00000179477 +P09211 Glutathione S-transferase P 4 NX_P09211 GSTP1_HUMAN 2950 134660 ENSG00000084207 +P31025 Lipocalin-1 4 NX_P31025 LCN1_HUMAN 3933 151675 ENSG00000160349 +P48594 Serpin B4 4 NX_P48594 SPB4_HUMAN 6318 600518 ENSG00000206073 +Q14574 Desmocollin-3 4 NX_Q14574 DSC3_HUMAN 1825 600271; 613102 ENSG00000134762 +Q5T750 Skin-specific protein 32 4 NX_Q5T750 XP32_HUMAN 100129271 NA ENSG00000198854 +Q6UWP8 Suprabasin 4 NX_Q6UWP8 SBSN_HUMAN 374897 609969 ENSG00000189001 +O60911 Cathepsin L2 4 NX_O60911 CATL2_HUMAN 1515 603308 ENSG00000136943 +P00558 Phosphoglycerate kinase 1 4 NX_P00558 PGK1_HUMAN 5230 300653; 311800 ENSG00000102144 +P04075 Fructose-bisphosphate aldolase A 4 NX_P04075 ALDOA_HUMAN 226 103850; 611881 ENSG00000149925 +P07384 Calpain-1 catalytic 
subunit 4 NX_P07384 CAN1_HUMAN 823 114220; 616907 ENSG00000014216 +P0CG05 Ig lambda-2 chain C regions 4 NA NA NA NA NA +P18206 Vinculin 4 NX_P18206 VINC_HUMAN 7414 193065; 611407; 613255 ENSG00000035403 +P62258 14-3-3 protein epsilon 4 NX_P62258 1433E_HUMAN 7531 605066 ENSG00000108953; ENSG00000274474 +P68871 Hemoglobin subunit beta 4 NX_P68871 HBB_HUMAN 3043 140700; 141900; 603902; 603903; 611162; 613985 ENSG00000244734 +Q9C075 Keratin, type I cytoskeletal 23 4 NX_Q9C075 K1C23_HUMAN 25984 606194 ENSG00000108244; ENSG00000263309 +A8K2U0 Alpha-2-macroglobulin-like protein 1 3 NX_A8K2U0 A2ML1_HUMAN 144568 610627 ENSG00000166535 +P00738 Haptoglobin 3 NX_P00738 HPT_HUMAN 3240 140100; 614081 ENSG00000257017 +P01011 Alpha-1-antichymotrypsin 3 NX_P01011 AACT_HUMAN 12 107280 ENSG00000196136 +P02763 Alpha-1-acid glycoprotein 1 3 NX_P02763 A1AG1_HUMAN 5004 138600 ENSG00000229314 +P18510 Interleukin-1 receptor antagonist protein 3 NX_P18510 IL1RA_HUMAN 3557 147679; 612628; 612852 ENSG00000136689 +P22528 Cornifin-B 3 NX_P22528 SPR1B_HUMAN 6699 182266 ENSG00000169469 +P30740 Leukocyte elastase inhibitor 3 NX_P30740 ILEU_HUMAN 1992 130135 ENSG00000021355 +P80188 Neutrophil gelatinase-associated lipocalin 3 NX_P80188 NGAL_HUMAN 3934 600181 ENSG00000148346 +Q15828 Cystatin-M 3 NX_Q15828 CYTM_HUMAN 1474 601891 ENSG00000175315 +Q9HCY8 Protein S100-A14 3 NX_Q9HCY8 S10AE_HUMAN 57402 607986 ENSG00000189334 +P01623 Ig kappa chain V-III region 3 NA NA NA NA NA +P01877 Ig alpha-2 chain C region 3 NX_P01877 IGHA2_HUMAN NA 147000 ENSG00000211890 +P06396 Gelsolin 3 NX_P06396 GELS_HUMAN 2934 105120; 137350 ENSG00000148180 +P14735 Insulin-degrading enzyme 3 NX_P14735 IDE_HUMAN 3416 146680 ENSG00000119912 +P20933 N(4)-(beta-N-acetylglucosaminyl)-L-asparaginase 3 NX_P20933 ASPG_HUMAN 175 208400; 613228 ENSG00000038002 +P25788 Proteasome subunit alpha type-3 3 NX_P25788 PSA3_HUMAN 5684 176843; 176845 ENSG00000100567 +P26641 Elongation factor 1-gamma 3 NX_P26641 EF1G_HUMAN 1937 130593 
ENSG00000254772 +P36952 Serpin B5 3 NX_P36952 SPB5_HUMAN 5268 154790 ENSG00000206075 +P40926 Malate dehydrogenase, mitochondrial 3 NX_P40926 MDHM_HUMAN 4191 154100; 617339 ENSG00000146701 +Q9Y6R7 IgGFc-binding protein 3 NX_Q9Y6R7 FCGBP_HUMAN 8857 617553 ENSG00000281123 +O95274 Ly6/PLAUR domain-containing protein 3 2 NX_O95274 LYPD3_HUMAN 27076 609484 ENSG00000124466 +P00491 Purine nucleoside phosphorylase 2 NX_P00491 PNPH_HUMAN 4860 164050; 613179 ENSG00000198805 +P04080 Cystatin-B 2 NX_P04080 CYTB_HUMAN 1476 254800; 601145 ENSG00000160213 +P09972 Fructose-bisphosphate aldolase C 2 NX_P09972 ALDOC_HUMAN 230 103870 ENSG00000109107 +P19012 Keratin, type I cytoskeletal 15 2 NX_P19012 K1C15_HUMAN 3866 148030 ENSG00000171346 +P20930 Filaggrin 2 NX_P20930 FILA_HUMAN 2312 135940; 146700; 605803 ENSG00000143631 +Q96FX8 p53 apoptosis effector related to PMP-22 2 NX_Q96FX8 PERP_HUMAN 64065 609301 ENSG00000112378 +Q9UIV8 Serpin B13 2 NX_Q9UIV8 SPB13_HUMAN 5275 604445 ENSG00000197641 +P01625 Ig kappa chain V-IV region Len 2 NA NA NA NA NA +P01765 Ig heavy chain V-III region TIL 2 NA NA NA NA NA +P01766 Ig heavy chain V-III region BRO 2 NX_P01766 HV313_HUMAN NA NA ENSG00000211942; ENSG00000282286 +P01860 Ig gamma-3 chain C region 2 NX_P01860 IGHG3_HUMAN NA 147120 NA +P01871 Ig mu chain C region 2 NX_P01871 IGHM_HUMAN NA 147020; 601495 ENSG00000211899; ENSG00000282657 +P05090 Apolipoprotein D 2 NX_P05090 APOD_HUMAN 347 107740 ENSG00000189058 +P06870 Kallikrein-1 2 NX_P06870 KLK1_HUMAN 3816 147910; 615953 ENSG00000167748 +P07858 Cathepsin B 2 NX_P07858 CATB_HUMAN 1508 116810 ENSG00000164733 +P08865 40S ribosomal protein SA 2 NX_P08865 RSSA_HUMAN 3921 150370; 271400 ENSG00000168028 +P11279 Lysosome-associated membrane glycoprotein 1 2 NX_P11279 LAMP1_HUMAN 3916 153330 ENSG00000185896 +P13473 Lysosome-associated membrane glycoprotein 2 2 NX_P13473 LAMP2_HUMAN 3920 300257; 309060 ENSG00000005893 +P19971 Thymidine phosphorylase 2 NX_P19971 TYPH_HUMAN 1890 131222; 603041 
ENSG00000025708 +P23284 Peptidyl-prolyl cis-trans isomerase B 2 NX_P23284 PPIB_HUMAN 5479 123841; 259440 ENSG00000166794 +P23396 40S ribosomal protein S3 2 NX_P23396 RS3_HUMAN 6188 600454 ENSG00000149273 +P25705 ATP synthase subunit alpha, mitochondrial 2 NX_P25705 ATPA_HUMAN 498 164360; 615228; 616045 ENSG00000152234 +P27482 Calmodulin-like protein 3 2 NX_P27482 CALL3_HUMAN 810 114184 ENSG00000178363 +P31949 Protein S100-A11 2 NX_P31949 S10AB_HUMAN 6282 603114 ENSG00000163191 +P40121 Macrophage-capping protein 2 NX_P40121 CAPG_HUMAN 822 153615 ENSG00000042493 +P42357 Histidine ammonia-lyase 2 NX_P42357 HUTH_HUMAN 3034 235800; 609457 ENSG00000084110 +P47756 F-actin-capping protein subunit beta 2 NX_P47756 CAPZB_HUMAN 832 601572 ENSG00000077549 +P48637 Glutathione synthetase 2 NX_P48637 GSHB_HUMAN 2937 231900; 266130; 601002 ENSG00000100983 +P49720 Proteasome subunit beta type-3 2 NX_P49720 PSB3_HUMAN 5691 602176 ENSG00000277791; ENSG00000275903 +P50395 Rab GDP dissociation inhibitor beta 2 NX_P50395 GDIB_HUMAN 2665 600767 ENSG00000057608 +P59998 Actin-related protein 2/3 complex subunit 4 2 NX_P59998 ARPC4_HUMAN 10093 604226 ENSG00000241553 +P61160 Actin-related protein 2 2 NX_P61160 ARP2_HUMAN 10097 604221 ENSG00000138071 +P61916 Epididymal secretory protein E1 2 NX_P61916 NPC2_HUMAN 10577 601015; 607625 ENSG00000119655 +P04745 Alpha-amylase 1 23 NX_P04745 AMY1_HUMAN 276; 277; 278 104700; 104701; 104702 ENSG00000174876; ENSG00000187733; ENSG00000237763 +Q9NZT1 Calmodulin-like protein 5 8 NX_Q9NZT1 CALL5_HUMAN 51806 605183 ENSG00000178372 +P12273 Prolactin-inducible protein 6 NX_P12273 PIP_HUMAN 5304 176720 ENSG00000159763 +Q96DA0 Zymogen granule protein 16 homolog B 5 NX_Q96DA0 ZG16B_HUMAN 124220 NA ENSG00000162078; ENSG00000283056 +P01036 Cystatin-S 5 NX_P01036 CYTS_HUMAN 1472 123857 ENSG00000101441 +Q8TAX7 Mucin-7 2 NX_Q8TAX7 MUC7_HUMAN 4589 158375; 600807 ENSG00000171195 +P01037 Cystatin-SN 2 NX_P01037 CYTN_HUMAN 1469 123855 ENSG00000170373 +P09228 Cystatin-SA 
2 NX_P09228 CYTT_HUMAN 1470 123856 ENSG00000170369
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/test-data/Text_output_for_topGO_analysis_BP_category.tabular Wed Aug 22 10:39:30 2018 -0400
@@ -0,0 +1,36 @@
+GO.ID Term Annotated Significant Expected pvalues qvalues
+GO:0070268 cornification 125 19 0.85 1.3e-20 1.007175e-16
+GO:0001895 retina homeostasis 72 13 0.49 1.9e-15 9.8135e-12
+GO:0010951 negative regulation of endopeptidase act... 253 19 1.72 8.7e-15 3.3701625e-11
+GO:0061621 canonical glycolysis 28 7 0.19 6.1e-10 1.89039e-06
+GO:0018149 peptide cross-linking 60 7 0.41 1.7e-07 0.000420578571428571
+GO:0061436 establishment of skin barrier 20 5 0.14 1.9e-07 0.000420578571428571
+GO:0042542 response to hydrogen peroxide 127 11 0.87 4.1e-07 0.00079411875
+GO:0002576 platelet degranulation 142 9 0.97 5.6e-07 0.000964133333333333
+GO:0098869 cellular oxidant detoxification 112 8 0.76 9.8e-07 0.00151851
+GO:0006094 gluconeogenesis 82 7 0.56 1.4e-06 0.00197209090909091
+GO:0001580 detection of chemical stimulus involved ... 55 6 0.37 2.0e-06 0.0025825
+GO:0007568 aging 297 11 2.02 6.1e-06 0.00727073076923077
+GO:0042744 hydrogen peroxide catabolic process 23 4 0.16 1.6e-05 0.017561
+GO:0045104 intermediate filament cytoskeleton organ... 47 5 0.32 1.7e-05 0.017561
+GO:0002934 desmosome organization 10 3 0.07 3.6e-05 0.03486375
+GO:0042493 response to drug 434 12 2.96 4.2e-05 0.0375142105263158
+GO:0045471 response to ethanol 138 7 0.94 4.5e-05 0.0375142105263158
+GO:0061740 protein targeting to lysosome involved i... 2 2 0.01 4.6e-05 0.0375142105263158
+GO:0070527 platelet aggregation 61 5 0.42 6.0e-05 0.046485
+GO:0046686 response to cadmium ion 64 5 0.44 7.5e-05 0.0553392857142857
+GO:0046718 viral entry into host cell 154 7 1.05 9.0e-05 0.0633886363636364
+GO:0043163 cell envelope organization 3 2 0.02 0.00014 0.0943173913043478
+GO:0070301 cellular response to hydrogen peroxide 83 5 0.57 0.00026 0.166869230769231
+GO:1903923 positive regulation of protein processin... 4 2 0.03 0.00027 0.166869230769231
+GO:0046716 muscle cell cellular homeostasis 19 3 0.13 0.00028 0.166869230769231
+GO:0051016 barbed-end actin filament capping 21 3 0.14 0.00038 0.218077777777778
+GO:0033591 response to L-ascorbic acid 5 2 0.03 0.00045 0.249026785714286
+GO:0019730 antimicrobial humoral response 95 5 0.65 0.00048 0.256468965517241
+GO:0006953 acute-phase response 56 4 0.38 0.00057 0.294405
+GO:0086073 bundle of His cell-Purkinje myocyte adhe... 6 2 0.04 0.00068 0.32926875
+GO:0071638 negative regulation of monocyte chemotac... 6 2 0.04 0.00068 0.32926875
+GO:0031069 hair follicle morphogenesis 27 3 0.18 0.00080 0.375636363636364
+GO:0048102 autophagic cell death 7 2 0.05 0.00095 0.408895833333333
+GO:0009635 response to herbicide 7 2 0.05 0.00095 0.408895833333333
+GO:0044829 positive regulation by host of viral gen... 7 2 0.05 0.00095 0.408895833333333
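Editor's note on the qvalues column: the sketch below is a small arithmetic check on the first four rows of this test output. It assumes a Benjamini-Hochberg style adjustment, where q = p * m / rank, and a total of m = 15495 tested GO terms. Both are inferences from the numbers shown, not facts stated in the changeset.

```r
# pvalues and qvalues copied from the first four rows of the table above
p <- c(1.3e-20, 1.9e-15, 8.7e-15, 6.1e-10)
q <- c(1.007175e-16, 9.8135e-12, 3.3701625e-11, 1.89039e-06)

# Assumed number of GO terms that entered the correction (inferred, not documented)
m <- 15495

# Under q = p * m / rank, the rank implied by each row is p * m / q
implied_rank <- round(p * m / q)
implied_rank  # 2 3 4 5
```

Under these assumptions the listed rows would occupy ranks 2 to 5, which suggests that one term with a smaller p-value entered the correction but does not appear in the table.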
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/topGO.xml Wed Aug 22 10:39:30 2018 -0400
@@ -0,0 +1,265 @@
+<tool id="topGO" name="topGO" version="2018.08.22">
+    <description>
+    Enrichment analysis for Gene Ontology
+    </description>
+    <requirements>
+        <requirement type="package" version="3.4.1">R</requirement>
+        <requirement type="package" version="2.2.1">r-ggplot2</requirement>
+        <requirement type="package" version="3.5.0">bioconductor-org.hs.eg.db</requirement>
+        <requirement type="package" version="1.56.0">bioconductor-graph</requirement>
+        <requirement type="package" version="1.40.0">bioconductor-annotationdbi</requirement>
+        <requirement type="package" version="3.5.0">bioconductor-go.db</requirement>
+        <requirement type="package" version="2.30.0">bioconductor-topgo</requirement>
+    </requirements>
+    <stdio>
+        <exit_code range="1:" />
+    </stdio>
+    <command><![CDATA[
+
+    #if $inputtype.filetype == "file_all":
+    Rscript --vanilla $__tool_directory__/enrichment_v3.R
+        --inputtype tabfile
+        --input '$inputtype.genelist'
+        --ontology '$ontocat'
+        --option '$option'
+        --threshold '$threshold'
+        --correction '$correction'
+        --textoutput '$condtext.textoutput'
+        --barplotoutput '$condbar.barplotoutput'
+        --dotplotoutput '$conddot.dotplotoutput'
+        --column '$inputtype.column'
+        --geneuniverse '$geneuniverse'
+        --header '$inputtype.header'
+    #end if
+
+
+    #if $inputtype.filetype == "copy_paste":
+    Rscript --vanilla $__tool_directory__/enrichment_v3.R
+        --inputtype copypaste
+        --input '$inputtype.genelist'
+        --ontology '$ontocat'
+        --option '$option'
+        --threshold '$threshold'
+        --correction '$correction'
+        --textoutput '$condtext.textoutput'
+        --barplotoutput '$condbar.barplotoutput'
+        --dotplotoutput '$conddot.dotplotoutput'
+        --column c1
+        --geneuniverse '$geneuniverse'
+        --header None
+    #end if
+
+
+
+    ]]></command>
+
+    <inputs>
+        <conditional name="inputtype">
+            <param name="filetype" type="select" label="Select your type of input file" help="The identifiers must
be Ensembl gene IDs (e.g. ENSG00000139618). If this is not the case, please use the ID Mapping tool.">
+                <option value="file_all" selected="true">Input file containing your identifiers</option>
+                <option value="copy_paste">Copy/paste your list of IDs</option>
+            </param>
+            <when value="copy_paste">
+                <param name="genelist" type="text" label="Enter a list of identifiers">
+                    <sanitizer>
+                        <valid initial="string.printable">
+                            <remove value="'"/>
+                        </valid>
+                        <mapping initial="none">
+                            <add source="'" target="__sq__"/>
+                        </mapping>
+                    </sanitizer>
+                </param>
+            </when>
+            <when value="file_all">
+                <param name="genelist" type="data" format="txt,tabular" label="Choose an input file" help="This file must contain one column of IDs consistent with the database that will be used. Please use the MappingIDs component if this is not the case."/>
+                <param name="column" type="text" label="Please specify the column to which the comparison should be applied (e.g. c1)" value="c1"/>
+
+                <param name="header" type="select" label="Does your file have a header?"
multiple="false" optional="false">
+                    <option value="TRUE" selected="true">Yes</option>
+                    <option value="FALSE" selected="false">No</option>
+                </param>
+            </when>
+        </conditional>
+        <param name="geneuniverse" type="select" label="Select a species">
+            <option value="org.At.tair.db">Arabidopsis</option>
+            <option value="org.Ce.eg.db">C. elegans</option>
+            <option value="org.Dm.eg.db">Fly</option>
+            <option value="org.Hs.eg.db" selected="true">Human</option>
+            <option value="org.Mm.eg.db">Mouse</option>
+            <option value="org.Sc.sgd.db">Yeast</option>
+        </param>
+
+        <param name="ontocat" type="select" label="Ontology category">
+            <option value="BP">Biological Process</option>
+            <option value="CC">Cellular Component</option>
+            <option value="MF">Molecular Function</option>
+        </param>
+
+        <param name="option" type="select" label="Choose the topGO option for your analysis">
+            <option value="classic">Classic Fisher test</option>
+            <option value="elim" selected="true">Elim</option>
+            <option value="weight01">Weight01</option>
+            <option value="parentchild">ParentChild</option>
+        </param>
+        <param name="threshold" type="text" label="Enter the p-value threshold in the form 1e-level (e.g. 1e-3)" value="1e-3"/>
+        <param name="correction" label="Choose a correction for multiple testing" type="select">
+            <option value="none">None</option>
+            <option value="holm">Holm correction</option>
+            <option value="hochberg">Hochberg correction</option>
+            <option value="hommel">Hommel correction</option>
+            <option value="bonferroni">Bonferroni correction</option>
+            <option value="BH" selected="true">Benjamini and Hochberg</option>
+            <option value="BY">Benjamini and Yekutieli</option>
+            <option value="fdr">FDR</option>
+        </param>
+        <conditional name="condtext">
+            <param name="textoutput" type="select" label="Generate a text file for results">
+                <option value="TRUE">Yes</option>
+                <option value="FALSE">No</option>
+            </param>
+            <when value="TRUE"/>
+            <when
value="FALSE"/>
+        </conditional>
+        <conditional name="condbar">
+            <param name="barplotoutput" type="select" label="Generate a barplot of over-represented GO terms">
+                <option value="TRUE">Yes</option>
+                <option value="FALSE">No</option>
+            </param>
+            <when value="TRUE"/>
+            <when value="FALSE"/>
+        </conditional>
+        <conditional name="conddot">
+            <param name="dotplotoutput" type="select" label="Generate a dotplot of over-represented GO terms">
+                <option value="TRUE">Yes</option>
+                <option value="FALSE">No</option>
+            </param>
+            <when value="TRUE"/>
+            <when value="FALSE"/>
+        </conditional>
+    </inputs>
+    <outputs>
+
+        <data name="outputtext" format="tabular" label="Text output for topGO analysis $ontocat category" from_work_dir="result.csv">
+            <filter>condtext['textoutput']=="TRUE"</filter>
+        </data>
+
+        <data name="outputbarplot" format="png" label="Barplot output for topGO analysis $ontocat category" from_work_dir="barplot.png">
+            <filter>condbar['barplotoutput']=="TRUE"</filter>
+        </data>
+
+        <data name="outputdotplot" format="png" label="Dotplot output for topGO analysis $ontocat category" from_work_dir="dotplot.png">
+            <filter>conddot['dotplotoutput']=="TRUE"</filter>
+        </data>
+
+    </outputs>
+    <tests>
+        <test>
+            <conditional name="inputtype">
+                <param name="filetype" value="file_all"/>
+                <param name="genelist" value="ID_Converter_FKW_Lacombe_et_al_2017_OK.txt"/>
+                <param name="column" value="c8"/>
+                <param name="header" value="TRUE"/>
+            </conditional>
+            <param name="ontocat" value="BP"/>
+            <param name="option" value="elim"/>
+            <param name="threshold" value="1e-3"/>
+            <param name="correction" value="BH"/>
+            <conditional name="condtext">
+                <param name="textoutput" value="TRUE"/>
+            </conditional>
+            <conditional name="condbar">
+                <param name="barplotoutput" value="TRUE"/>
+            </conditional>
+            <conditional name="conddot">
+                <param name="dotplotoutput" value="TRUE"/>
+            </conditional>
+            <param name="geneuniverse" value="org.Hs.eg.db"/>
+            <output name="outputtext"
file="Text_output_for_topGO_analysis_BP_category.tabular"/>
+            <output name="outputbarplot" file="Barplot_output_for_topGO_analysis_BP_category.png"/>
+            <output name="outputdotplot" file="Dotplot_output_for_topGO_analysis_BP_category.png"/>
+        </test>
+    </tests>
+    <help><![CDATA[
+
+
+**Galaxy component based on R package topGO.**
+
+**Input required**
+
+This component works with Ensembl gene IDs (e.g. ENSG00000139618). You can
+copy/paste these identifiers or supply a tabular file (.csv, .tsv, .txt, .tab)
+that contains them.
+
+**Principle**
+
+This component evaluates how strongly GO terms are represented in a gene list within one ontology category (Biological Process "BP", Cellular Component "CC", Molecular Function "MF"). This representation is evaluated against the background list of all human genes associated with GO terms of the chosen category (BP, CC, MF). This background is given by the R package "org.Hs.eg.db", which is a genome-wide annotation package for **human**.
+
+**Output**
+
+Three kinds of output are available: a textual output, a barplot output and
+a dotplot output.
+
+*Textual output*:
+The text output lists all the GO terms that were found significant under the specified threshold.
+
+
+The different fields are as follows:
+
+- Annotated: number of genes in org.Hs.eg.db that are annotated with the GO term.
+
+- Significant: number of genes in your input that are annotated with the GO term.
+
+- Expected: estimate of the number of genes a node of size Annotated would contain if the significant genes were randomly selected from the gene universe.
+
+- pvalues: p-value obtained from the test
+
+- (qvalues: additional column with adjusted p-values)
+
+
+**Tests**
+
+topGO provides a classic Fisher test for evaluating whether some GO terms are over-represented in your gene list, but other options are also provided (elim, weight01, parentchild).
For the merits of each option and their algorithmic descriptions, please refer to the topGO manual:
+https://bioconductor.org/packages/release/bioc/vignettes/topGO/inst/doc/topGO.pdf
+
+**Multiple testing corrections**
+
+Furthermore, the following corrections for multiple testing can also be applied:
+
+- holm
+
+- hochberg
+
+- hommel
+
+- bonferroni
+
+- BH
+
+- BY
+
+- fdr
+
+-----
+
+.. class:: infomark
+
+**Authors**
+
+Alexa A and Rahnenfuhrer J (2016). topGO: Enrichment Analysis for Gene Ontology. R package version 2.30.0.
+
+**Galaxy integration**
+
+Lisa Peru, T.P. Lien Nguyen, Florence Combes, Yves Vandenbrouck CEA, INSERM, CNRS, Grenoble-Alpes University, BIG Institute, FR
+
+Sandra Dérozier, Olivier Rué, Christophe Caron, Valentin Loux INRA, Paris-Saclay University, MAIAGE Unit, Migale Bioinformatics platform
+
+This work has been partially funded through the French National Agency for Research (ANR) IFB project.
+
+Contact support@proteore.org for any questions or concerns about the Galaxy implementation of this tool.
+
+]]></help>
+    <citations>
+    </citations>
+
+</tool>
