comparison mageck_pathway.xml @ 0:8fda298bf9c7 draft
planemo upload for repository https://github.com/galaxyproject/tools-iuc/tree/master/tools/mageck commit 3a58259cd0f035510963d470ea8ddbe551aea058-dirty
| author | iuc |
|---|---|
| date | Wed, 14 Feb 2018 06:09:17 -0500 |
| parents | |
| children | 5ddd7e2ace85 |
| -1:000000000000 | 0:8fda298bf9c7 |
|---|---|
| 1 <?xml version="1.0"?> | |
| 2 <tool id="mageck_pathway" name="MAGeCK pathway" version="@VERSION@" > | |
| 3 <description>- given a ranked gene list, test whether one pathway is enriched</description> | |
| 4 <macros> | |
| 5 <import>mageck_macros.xml</import> | |
| 6 </macros> | |
| 7 <expand macro="requirements" /> | |
| 8 <expand macro="version" /> | |
| 9 <command detect_errors="exit_code"><![CDATA[ | |
| 10 | |
| 11 mageck pathway | |
| 12 | |
| 13 --gene-ranking '$gene_ranking' | |
| 14 --gmt-file '$gmt_file' | |
| 15 -n sample1 | |
| 16 | |
| 17 #if $adv.single_ranking: | |
| 18 --single-ranking | |
| 19 #end if | |
| 20 --method $adv.method | |
| 21 --sort-criteria $adv.sort_criteria | |
| 22 --ranking-column $adv.ranking_column | |
| 23 --ranking-column-2 $adv.ranking_column2 | |
| 24 --pathway-alpha $adv.pathway_alpha | |
| 25 --permutation $adv.permutation | |
| 26 | |
| 27 ]]></command> | |
| 28 <inputs> | |
| 29 <param name="gene_ranking" argument="--gene-ranking" type="data" format="tabular" label="Gene Ranking file" help="The gene ranking file generated by the gene test step. Only one enrichment comparison will be performed." /> | |
| 30 <param name="gmt_file" argument="--gmt-file" type="data" format="tabular" label="Pathway GMT file" help="The pathway file in GMT format. See Help below for more information" /> | |
| 31 | |
| 32 <section name="adv" title="Advanced Options"> | |
| 33 <param name="single_ranking" argument="--single-ranking" type="boolean" truevalue="--single-ranking" falsevalue="" checked="false" optional="true" | |
| 34 label="Single ranking file" | |
| 35 help="The provided file is a (single) gene ranking file, either positive or negative selection. Only one enrichment comparison will be performed. Default: No" /> | |
| 36 <param name="method" argument="--method" type="select" label="Method for testing pathway enrichment" > | |
| 37 <option value="gsea" selected="True">GSEA</option> | |
| 38 <option value="rra">RRA</option> | |
| 39 </param> | |
| 40 <expand macro="sort_criteria" /> | |
| 41 <param name="ranking_column" argument="--ranking-column" type="data_column" data_ref="gene_ranking" value="2" optional="true" | |
| 42 label="Gene Summary file column" help="Column number or label in the gene summary file used for gene ranking; either an integer column number or a column label string. Default: 2 (the 3rd column)" /> | |
| 43 <param name="ranking_column2" argument="--ranking-column-2" type="data_column" data_ref="gene_ranking" value="8" optional="true" | |
| 44 label="Gene Summary file column" help="Column number or label in the gene summary file used for gene ranking; either an integer column number or a column label string. This option determines the column for positive selection and is disabled if --single-ranking is specified. Default: 8 (the 9th column)" /> | |
| 45 <param name="pathway_alpha" argument="--pathway-alpha" type="float" min="0" value="0.25" optional="true" | |
| 46 label="Alpha value for RRA pathway enrichment" help="The default alpha value for RRA pathway enrichment. Default: 0.25" /> | |
| 47 <param argument="--permutation" type="integer" min="0" value="1000" optional="true" label="Permutation number for GSEA" help="Default: 1000" /> | |
| 48 <param name="out_log" type="boolean" truevalue="True" falsevalue="" checked="false" | |
| 49 label="Output logfile" help="This file includes the logging information during the execution. Default: No" /> | |
| 50 </section> | |
| 51 | |
| 52 </inputs> | |
| 53 | |
| 54 <outputs> | |
| 55 <data name="pathway_summary" format="tabular" from_work_dir="*.pathway_summary.txt" label="${tool.name} on ${on_string}: Pathway Summary" /> | |
| 56 <data name="log" format="tabular" from_work_dir="*.log" label="${tool.name} on ${on_string}: Log" > | |
| 57 <filter>adv['out_log'] is True</filter> | |
| 58 </data> | |
| 59 </outputs> | |
| 60 <tests> | |
| 61 <test><!-- Ensure MAGeCK's demo1 test works --> | |
| 62 <param name="gene_ranking" ftype="tabular" value="out.test.gene_summary.txt" /> | |
| 63 <param name="gmt_file" ftype="tabular" value="in.mageckQC.gmt" /> | |
| 64 <param name="ranking_column" value="2" /> | |
| 65 <param name="out_log" value="True"/> | |
| 66 <output name="pathway_summary" value="out.pathway.pathway_summary.txt" /> | |
| 67 <output name="log" value="out.pathway.log.txt" compare="sim_size" /> | |
| 68 </test> | |
| 69 </tests> | |
| 70 | |
| 71 <help><![CDATA[ | |
| 72 .. class:: infomark | |
| 73 | |
| 74 **What it does** | |
| 75 | |
| 76 MAGeCK pathway tests whether a pathway is enriched in a ranked gene list, using either GSEA or robust ranking aggregation (RRA); see **More Information** below. | |
| 77 | |
| 78 ----- | |
| 79 | |
| 80 **Inputs** | |
| 81 | |
| 82 **Gene Ranking files** | |
| 83 | |
| 84 A gene ranking file is required as input and can be produced using **mageck test**. An example of the gene ranking file (gene summary file) is as follows: | |
| 85 | |
| 86 ======= ======= ============= =============== =========== ============ ================= =========== ============= =============== =========== ============ ================= =========== | |
| 87 **id** **num** **neg|score** **neg|p-value** **neg|fdr** **neg|rank** **neg|goodsgrna** **neg|lfc** **pos|score** **pos|p-value** **pos|fdr** **pos|rank** **pos|goodsgrna** **pos|lfc** | |
| 88 ------- ------- ------------- --------------- ----------- ------------ ----------------- ----------- ------------- --------------- ----------- ------------ ----------------- ----------- | |
| 89 ESPL1 12 6.4327e-10 7.558e-06 7.9e-05 1 11 -2.35 0.99725 0.99981 0.999992 615 0 -0.07 | |
| 90 RPL18 12 6.4671e-10 7.558e-06 7.9e-05 2 11 -2.12 0.99799 0.99989 0.999992 620 0 -0.32 | |
| 91 CDK1 12 2.6439e-09 7.558e-06 7.9e-05 3 12 -1.93 1.0 0.99999 0.999992 655 0 -0.12 | |
| 92 ======= ======= ============= =============== =========== ============ ================= =========== ============= =============== =========== ============ ================= =========== | |
| 93 | |
| 94 | |
| 95 **Pathway file** | |
| 96 | |
| 97 MAGeCK pathway also requires a pathway file in GMT format. The GMT (Gene Matrix Transposed) file format is a tab delimited file format that describes gene sets and is consistent with the `GMT file in Gene Set Enrichment Analysis (GSEA)`_. In the GMT format, each row represents a gene set, with the first column containing the gene set name, and the second column containing a description for the gene set, followed by the names or ids of the genes in the gene set. You can download different GMT pathway files directly from the `GSEA MSigDB database`_. An example of the GMT format is as follows: | |
| 98 | |
| 99 ============= ============================================================= ======================= | |
| 100 Gene Set Name Description Genes | |
| 101 ------------- ------------------------------------------------------------- ----------------------- | |
| 102 KEGG_RIBOSOME http://www.broadinstitute.org/gsea/msigdb/cards/KEGG_RIBOSOME RPL35 RPL23 RPL3... | |
| 103 ============= ============================================================= ======================= | |
| 104 | |
| 105 ----- | |
| 106 | |
| 107 **Outputs** | |
| 108 | |
| 109 **Pathway summary file** | |
| 110 | |
| 111 An example of the pathway summary output file is as follows: | |
| 112 | |
| 113 ============= ======= ============= =========== =============== =========== ============ ================ ============= ============= =========== =============== =========== ============ ================ =========== | |
| 114 **id** **num** **neg|score** **neg|rra** **neg|p-value** **neg|fdr** **neg|rank** **neg|goodgene** **neg|lfc** **pos|score** **pos|rra** **pos|p-value** **pos|fdr** **pos|rank** **pos|goodgene** **pos|lfc** | |
| 115 ------------- ------- ------------- ----------- --------------- ----------- ------------ ---------------- ------------- ------------- ----------- --------------- ----------- ------------ ---------------- ----------- | |
| 116 KEGG_RIBOSOME 88 1 0 0 0 1 0 0 1 0 0 0 1 0 0 | |
| 117 ============= ======= ============= =========== =============== =========== ============ ================ ============= ============= =========== =============== =========== ============ ================ =========== | |
| 118 | |
| 119 The contents of each column are as follows: | |
| 120 | |
| 121 * **id** Gene ID | |
| 122 * **num** The number of targeting sgRNAs for each gene | |
| 123 * **neg|score** The RRA lo value of this gene in negative selection | |
| 124 * **neg|p-value** The raw p-value (using permutation) of this gene in negative selection | |
| 125 * **neg|fdr** The false discovery rate of this gene in negative selection | |
| 126 * **neg|rank** The ranking of this gene in negative selection | |
| 127 * **neg|goodsgrna** The number of "good" sgRNAs, i.e., sgRNAs whose ranking is below the alpha cutoff (determined by the --gene-test-fdr-threshold option), in negative selection. | |
| 128 * **neg|lfc** The log fold change of this gene in negative selection | |
| 129 * **pos|score** The number of targeting sgRNAs for each gene in positive selection (usually the same as num.neg) | |
| 130 * **pos|score** The RRA lo value of this gene in positive selection | |
| 131 * **pos|p-value** The raw p-value of this gene in positive selection | |
| 132 * **pos|fdr** The false discovery rate of this gene in positive selection | |
| 133 * **pos|rank** The ranking of this gene in positive selection | |
| 134 * **pos|goodsgrna** The number of "good" sgRNAs, i.e., sgRNAs whose ranking is below the alpha cutoff (determined by the --gene-test-fdr-threshold option), in positive selection. | |
| 135 * **pos|lfc** The log fold change of this gene in positive selection | |
| 136 | |
| 137 Genes are ranked by the p.neg field (by default). If you need a ranking by the p.pos, you can use the --sort-criteria option. | |
| 138 | |
| 139 ----- | |
| 140 | |
| 141 **More Information** | |
| 142 | |
| 143 **Overview of the MAGeCK algorithm** | |
| 144 | |
| 145 Briefly, read counts from different samples are first median-normalized to adjust for the effect of library sizes and read count distributions. Then the variance of read counts is estimated by sharing information across features, and a negative binomial (NB) model is used to test whether sgRNA abundance differs significantly between treatments and controls. This approach is similar to those used for differential RNA-Seq analysis. We rank sgRNAs based on P-values calculated from the NB model, and use a modified robust ranking aggregation (RRA) algorithm named α-RRA to identify positively or negatively selected genes. More specifically, α-RRA assumes that if a gene has no effect on selection, then sgRNAs targeting this gene should be uniformly distributed across the ranked list of all the sgRNAs. α-RRA ranks genes by comparing the skew in rankings to the uniform null model, and prioritizes genes whose sgRNA rankings are consistently higher than expected. α-RRA calculates the statistical significance of the skew by permutation, and a detailed description of the algorithm is presented in the Materials and methods section of the `MAGeCK paper`_. Finally, MAGeCK reports positively and negatively selected pathways by applying α-RRA to the rankings of genes in a pathway. | |
| 146 | |
| 147 For more information on using MAGeCK, see the `MAGeCK website here`_. | |
| 148 | |
| 149 .. _`GMT file in Gene Set Enrichment Analysis (GSEA)`: http://software.broadinstitute.org/cancer/software/gsea/wiki/index.php/Data_formats#GMT:_Gene_Matrix_Transposed_file_format_.28.2A.gmt.29 | |
| 150 .. _`GSEA MSigDB database`: http://software.broadinstitute.org/gsea/login.jsp | |
| 151 .. _`MAGeCK paper`: https://genomebiology.biomedcentral.com/articles/10.1186/s13059-014-0554-4 | |
| 152 .. _`MAGeCK website here`: https://sourceforge.net/p/mageck/wiki/QA/#using-mageck | |
| 153 | |
| 154 ]]></help> | |
| 155 <expand macro="citations" /> | |
| 156 </tool> |
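
The `<command>` template in the listing above can be read as a simple argument-assembly rule: fixed flags first, `--single-ranking` only when the boolean is set, then the remaining advanced options. The following Python sketch mirrors that logic for illustration only. The function name, the input file names, and the `sort_criteria` default of `neg` are assumptions (the real default comes from the `sort_criteria` macro, which is not shown here); the other defaults are taken from the `<param>` definitions.

```python
import shlex
from typing import List


def build_mageck_pathway_argv(
    gene_ranking: str,
    gmt_file: str,
    single_ranking: bool = False,
    method: str = "gsea",
    sort_criteria: str = "neg",  # assumed default; defined in a macro not shown here
    ranking_column: str = "2",
    ranking_column2: str = "8",
    pathway_alpha: float = 0.25,
    permutation: int = 1000,
) -> List[str]:
    """Hypothetical helper mirroring the flags emitted by the Cheetah template."""
    argv = [
        "mageck", "pathway",
        "--gene-ranking", gene_ranking,
        "--gmt-file", gmt_file,
        "-n", "sample1",  # output prefix hard-coded in the wrapper
    ]
    # The template emits this flag only when the boolean parameter is true.
    if single_ranking:
        argv.append("--single-ranking")
    # The template always emits the remaining options, including
    # --ranking-column-2 even when --single-ranking is set.
    argv += [
        "--method", method,
        "--sort-criteria", sort_criteria,
        "--ranking-column", str(ranking_column),
        "--ranking-column-2", str(ranking_column2),
        "--pathway-alpha", str(pathway_alpha),
        "--permutation", str(permutation),
    ]
    return argv


if __name__ == "__main__":
    # File names below are placeholders, not files from the repository.
    argv = build_mageck_pathway_argv("gene_summary.txt", "pathways.gmt", method="rra")
    print(shlex.join(argv))
```

Returning a list rather than a single string keeps quoting explicit; `shlex.join` (Python 3.8+) is used only to display the result.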

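The GMT layout described in the tool's help (one gene set per row, tab-delimited, with the set name first, a description second, and gene names or ids after that) can be read with a few lines of Python. This is a minimal sketch, not code from MAGeCK or Galaxy; the function name `parse_gmt` is hypothetical, and the example row uses only the two gene names that appear in full in the help text.

```python
from typing import Dict, List, Tuple


def parse_gmt(text: str) -> Dict[str, Tuple[str, List[str]]]:
    """Parse GMT-formatted text into {gene_set_name: (description, genes)}."""
    gene_sets: Dict[str, Tuple[str, List[str]]] = {}
    for line_number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # skip blank lines
        fields = line.split("\t")
        if len(fields) < 3:
            raise ValueError(
                f"line {line_number}: expected name, description and at least one gene"
            )
        name, description = fields[0], fields[1]
        # Drop empty fields left by trailing tabs.
        genes = [gene for gene in fields[2:] if gene]
        gene_sets[name] = (description, genes)
    return gene_sets


if __name__ == "__main__":
    # Shortened version of the KEGG_RIBOSOME row from the help text.
    example = (
        "KEGG_RIBOSOME\t"
        "http://www.broadinstitute.org/gsea/msigdb/cards/KEGG_RIBOSOME\t"
        "RPL35\tRPL23\n"
    )
    for name, (description, genes) in parse_gmt(example).items():
        print(name, description, genes)
```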