<tool name="FastQC: Comprehensive QC" id="fastqc" version="0.53">
  <description>reporting for short read sequence</description>
  <command interpreter="python">
    rgFastQC.py -i "$input_file" -d "$html_file.files_path" -o "$html_file" -n "$out_prefix" -f "$input_file.ext" -j "$input_file.name"
#if $contaminants.dataset and str($contaminants) > ''
-c "$contaminants"
#end if
-e fastqc
  </command>
  <requirements>
    <requirement type="package" version="0.10.1">fastqc_dist_0_10_1_dependency</requirement>
  </requirements>
  <inputs>
    <param format="fastqsanger,fastq,bam,sam" name="input_file" type="data" label="Short read data from your current history" />
    <param name="out_prefix" value="FastQC" type="text" label="Title for the output file - to remind you what the job was for" size="80"
      help="Letters and numbers only please - other characters will be removed">
    <sanitizer invalid_char="">
        <valid initial="string.letters,string.digits"/>
    </sanitizer>
    </param>
    <param name="contaminants" type="data" format="tabular" optional="true" label="Contaminant list"
           help="Tab-delimited file with 2 columns: name and sequence. For example: Illumina Small RNA RT Primer	CAAGCAGAAGACGGCATACGA"/>
  </inputs>
  <outputs>
    <data format="html" name="html_file" label="${out_prefix}_${input_file.name}.html" />
  </outputs>
  <tests>
    <test>
      <param name="input_file" value="1000gsample.fastq" />
      <param name="out_prefix" value="fastqc_out" />
      <param name="contaminants" value="fastqc_contaminants.txt" ftype="tabular" />
      <output name="html_file" file="fastqc_report.html" ftype="html" lines_diff="100"/>
    </test>
  </tests>
  <help>

.. class:: infomark

**Purpose**

FastQC aims to provide a simple way to do some quality control checks on raw
sequence data coming from high-throughput sequencing pipelines.
It provides a modular set of analyses which you can use to give a quick
impression of whether your data has any problems of
which you should be aware before doing any further analysis.

The main functions of FastQC are:

- Import of data from BAM, SAM or FastQ files (any variant)
- Providing a quick overview to tell you in which areas there may be problems
- Summary graphs and tables to quickly assess your data
- Export of results to an HTML-based permanent report
- Offline operation to allow automated generation of reports without running the interactive application

-----

.. class:: infomark

**FastQC**

This is a Galaxy wrapper. It merely exposes the external package FastQC_, which is documented on its project website.
If you use this tool, please acknowledge FastQC as well as this wrapper.
FastQC incorporates the Picard-tools_ libraries for SAM/BAM processing.

The contaminants file parameter was borrowed from the independently developed
fastqcwrapper contributed to the Galaxy Community Tool Shed by J. Johnson.

-----

.. class:: infomark

**Inputs and outputs**

The FastQC_ project website is the best place to look for documentation; it's very good.
A summary follows for those in a tearing hurry.

This wrapper accepts a Galaxy FASTQ, SAM or BAM dataset as the input read file to check.
It also accepts an optional file listing contaminants, in the form of
a tab-delimited file with 2 columns, name and sequence.
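
For example, a contaminants file with a single entry might contain the line below. The name and the sequence are separated by one tab character, and this entry is the example given in the parameter help::

    Illumina Small RNA RT Primer	CAAGCAGAAGACGGCATACGA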

The tool produces a single HTML output file that contains all of the results, including the following:

- Basic Statistics
- Per base sequence quality
- Per sequence quality scores
- Per base sequence content
- Per base GC content
- Per sequence GC content
- Per base N content
- Sequence Length Distribution
- Sequence Duplication Levels
- Overrepresented sequences
- Kmer Content

All except Basic Statistics and Overrepresented sequences are plots.

.. _FastQC: http://www.bioinformatics.bbsrc.ac.uk/projects/fastqc/
.. _Picard-tools: http://picard.sourceforge.net/index.shtml

</help>
</tool>