# HG changeset patch
# User devteam
# Date 1380134328 14400
# Node ID 82e8c467e2ec2d598b6471f9bfd9de549376cee8
Uploaded
diff -r 000000000000 -r 82e8c467e2ec fasta_clipping_histogram.xml
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/fasta_clipping_histogram.xml Wed Sep 25 14:38:48 2013 -0400
@@ -0,0 +1,112 @@
+
+ chart
+
+ fastx_toolkit
+
+ fasta_clipping_histogram.pl $input $outfile
+
+
+
+
+
+
+
+
+
+
+**What it does**
+
+This tool creates a histogram image of sequence lengths distribution in a given fasta dataset file.
+
+**TIP:** Use this tool after clipping your library (with **FASTX Clipper tool**), to visualize the clipping results.
+
+-----
+
+**Output Examples**
+
+In the following library, most sequences are 24-mers to 27-mers.
+This could indicate an abundance of endo-siRNAs (depending of course of what you've tried to sequence in the first place).
+
+.. image:: ${static_path}/fastx_icons/fasta_clipping_histogram_1.png
+
+
+In the following library, most sequences are 19,22 or 23-mers.
+This could indicate an abundance of miRNAs (depending of course of what you've tried to sequence in the first place).
+
+.. image:: ${static_path}/fastx_icons/fasta_clipping_histogram_2.png
+
+
+-----
+
+
+**Input Formats**
+
+This tool accepts short-reads FASTA files. The reads don't have to be short, but they do have to be on a single line, like so::
+
+ >sequence1
+ AGTAGTAGGTGATGTAGAGAGAGAGAGAGTAG
+ >sequence2
+ GTGTGTGTGGGAAGTTGACACAGTA
+ >sequence3
+ CCTTGAGATTAACGCTAATCAAGTAAAC
+
+
+If the sequences span over multiple lines::
+
+ >sequence1
+ CAGCATCTACATAATATGATCGCTATTAAACTTAAATCTCCTTGACGGAG
+ TCTTCGGTCATAACACAAACCCAGACCTACGTATATGACAAAGCTAATAG
+ aactggtctttacctTTAAGTTG
+
+Use the **FASTA Width Formatter** tool to re-format the FASTA into a single-lined sequences::
+
+ >sequence1
+ CAGCATCTACATAATATGATCGCTATTAAACTTAAATCTCCTTGACGGAGTCTTCGGTCATAACACAAACCCAGACCTACGTATATGACAAAGCTAATAGaactggtctttacctTTAAGTTG
+
+
+-----
+
+
+
+**Multiplicity counts (a.k.a reads-count)**
+
+If the sequence identifier (the text after the '>') contains a dash and a number, it is treated as a multiplicity count value (i.e. how many times that individual sequence repeated in the original FASTA file, before collapsing).
+
+Example 1 - The following FASTA file *does not* have multiplicity counts::
+
+ >seq1
+ GGATCC
+ >seq2
+ GGTCATGGGTTTAAA
+ >seq3
+ GGGATATATCCCCACACACACACAC
+
+Each sequence is counts as one, to produce the following chart:
+
+.. image:: ${static_path}/fastx_icons/fasta_clipping_histogram_3.png
+
+
+Example 2 - The following FASTA file have multiplicity counts::
+
+ >seq1-2
+ GGATCC
+ >seq2-10
+ GGTCATGGGTTTAAA
+ >seq3-3
+ GGGATATATCCCCACACACACACAC
+
+The first sequence counts as 2, the second as 10, the third as 3, to produce the following chart:
+
+.. image:: ${static_path}/fastx_icons/fasta_clipping_histogram_4.png
+
+Use the **FASTA Collapser** tool to create FASTA files with multiplicity counts.
+
+------
+
+This tool is based on `FASTX-toolkit`__ by Assaf Gordon.
+
+ .. __: http://hannonlab.cshl.edu/fastx_toolkit/
+
+
+
+
diff -r 000000000000 -r 82e8c467e2ec tool_dependencies.xml
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/tool_dependencies.xml Wed Sep 25 14:38:48 2013 -0400
@@ -0,0 +1,6 @@
+
+
+
+
+
+