| 0 | 1 <?xml version="1.0" encoding="utf-8"?> | 
|  | 2 <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" | 
|  | 3 "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"> | 
|  | 4 <html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en"> | 
|  | 5 <head> | 
|  | 6 <!-- 2019-12-09 Po 07:55 --> | 
|  | 7 <meta http-equiv="Content-Type" content="text/html;charset=utf-8" /> | 
|  | 8 <meta name="viewport" content="width=device-width, initial-scale=1" /> | 
|  | 9 <title>RepeatExplorer documentation</title> | 
|  | 10 <meta name="generator" content="Org mode" /> | 
|  | 11 <meta name="author" content="petr" /> | 
|  | 12 <style type="text/css"> | 
|  | 13  <!--/*--><![CDATA[/*><!--*/ | 
|  | 14   .title  { text-align: center; | 
|  | 15              margin-bottom: .2em; } | 
|  | 16   .subtitle { text-align: center; | 
|  | 17               font-size: medium; | 
|  | 18               font-weight: bold; | 
|  | 19               margin-top:0; } | 
|  | 20   .todo   { font-family: monospace; color: red; } | 
|  | 21   .done   { font-family: monospace; color: green; } | 
|  | 22   .priority { font-family: monospace; color: orange; } | 
|  | 23   .tag    { background-color: #eee; font-family: monospace; | 
|  | 24             padding: 2px; font-size: 80%; font-weight: normal; } | 
|  | 25   .timestamp { color: #bebebe; } | 
|  | 26   .timestamp-kwd { color: #5f9ea0; } | 
|  | 27   .org-right  { margin-left: auto; margin-right: 0px;  text-align: right; } | 
|  | 28   .org-left   { margin-left: 0px;  margin-right: auto; text-align: left; } | 
|  | 29   .org-center { margin-left: auto; margin-right: auto; text-align: center; } | 
|  | 30   .underline { text-decoration: underline; } | 
|  | 31   #postamble p, #preamble p { font-size: 90%; margin: .2em; } | 
|  | 32   p.verse { margin-left: 3%; } | 
|  | 33   pre { | 
|  | 34     border: 1px solid #ccc; | 
|  | 35     box-shadow: 3px 3px 3px #eee; | 
|  | 36     padding: 8pt; | 
|  | 37     font-family: monospace; | 
|  | 38     overflow: auto; | 
|  | 39     margin: 1.2em; | 
|  | 40   } | 
|  | 41   pre.src { | 
|  | 42     position: relative; | 
|  | 43     overflow: visible; | 
|  | 44     padding-top: 1.2em; | 
|  | 45   } | 
|  | 46   pre.src:before { | 
|  | 47     display: none; | 
|  | 48     position: absolute; | 
|  | 49     background-color: white; | 
|  | 50     top: -10px; | 
|  | 51     right: 10px; | 
|  | 52     padding: 3px; | 
|  | 53     border: 1px solid black; | 
|  | 54   } | 
|  | 55   pre.src:hover:before { display: inline;} | 
|  | 56   /* Languages per Org manual */ | 
|  | 57   pre.src-asymptote:before { content: 'Asymptote'; } | 
|  | 58   pre.src-awk:before { content: 'Awk'; } | 
|  | 59   pre.src-C:before { content: 'C'; } | 
|  | 60   /* pre.src-C++ doesn't work in CSS */ | 
|  | 61   pre.src-clojure:before { content: 'Clojure'; } | 
|  | 62   pre.src-css:before { content: 'CSS'; } | 
|  | 63   pre.src-D:before { content: 'D'; } | 
|  | 64   pre.src-ditaa:before { content: 'ditaa'; } | 
|  | 65   pre.src-dot:before { content: 'Graphviz'; } | 
|  | 66   pre.src-calc:before { content: 'Emacs Calc'; } | 
|  | 67   pre.src-emacs-lisp:before { content: 'Emacs Lisp'; } | 
|  | 68   pre.src-fortran:before { content: 'Fortran'; } | 
|  | 69   pre.src-gnuplot:before { content: 'gnuplot'; } | 
|  | 70   pre.src-haskell:before { content: 'Haskell'; } | 
|  | 71   pre.src-hledger:before { content: 'hledger'; } | 
|  | 72   pre.src-java:before { content: 'Java'; } | 
|  | 73   pre.src-js:before { content: 'Javascript'; } | 
|  | 74   pre.src-latex:before { content: 'LaTeX'; } | 
|  | 75   pre.src-ledger:before { content: 'Ledger'; } | 
|  | 76   pre.src-lisp:before { content: 'Lisp'; } | 
|  | 77   pre.src-lilypond:before { content: 'Lilypond'; } | 
|  | 78   pre.src-lua:before { content: 'Lua'; } | 
|  | 79   pre.src-matlab:before { content: 'MATLAB'; } | 
|  | 80   pre.src-mscgen:before { content: 'Mscgen'; } | 
|  | 81   pre.src-ocaml:before { content: 'Objective Caml'; } | 
|  | 82   pre.src-octave:before { content: 'Octave'; } | 
|  | 83   pre.src-org:before { content: 'Org mode'; } | 
|  | 84   pre.src-oz:before { content: 'OZ'; } | 
|  | 85   pre.src-plantuml:before { content: 'Plantuml'; } | 
|  | 86   pre.src-processing:before { content: 'Processing.js'; } | 
|  | 87   pre.src-python:before { content: 'Python'; } | 
|  | 88   pre.src-R:before { content: 'R'; } | 
|  | 89   pre.src-ruby:before { content: 'Ruby'; } | 
|  | 90   pre.src-sass:before { content: 'Sass'; } | 
|  | 91   pre.src-scheme:before { content: 'Scheme'; } | 
|  | 92   pre.src-screen:before { content: 'Gnu Screen'; } | 
|  | 93   pre.src-sed:before { content: 'Sed'; } | 
|  | 94   pre.src-sh:before { content: 'shell'; } | 
|  | 95   pre.src-sql:before { content: 'SQL'; } | 
|  | 96   pre.src-sqlite:before { content: 'SQLite'; } | 
|  | 97   /* additional languages in org.el's org-babel-load-languages alist */ | 
|  | 98   pre.src-forth:before { content: 'Forth'; } | 
|  | 99   pre.src-io:before { content: 'IO'; } | 
|  | 100   pre.src-J:before { content: 'J'; } | 
|  | 101   pre.src-makefile:before { content: 'Makefile'; } | 
|  | 102   pre.src-maxima:before { content: 'Maxima'; } | 
|  | 103   pre.src-perl:before { content: 'Perl'; } | 
|  | 104   pre.src-picolisp:before { content: 'Pico Lisp'; } | 
|  | 105   pre.src-scala:before { content: 'Scala'; } | 
|  | 106   pre.src-shell:before { content: 'Shell Script'; } | 
|  | 107   pre.src-ebnf2ps:before { content: 'ebfn2ps'; } | 
|  | 108   /* additional language identifiers per "defun org-babel-execute" | 
|  | 109        in ob-*.el */ | 
|  | 110   pre.src-cpp:before  { content: 'C++'; } | 
|  | 111   pre.src-abc:before  { content: 'ABC'; } | 
|  | 112   pre.src-coq:before  { content: 'Coq'; } | 
|  | 113   pre.src-groovy:before  { content: 'Groovy'; } | 
|  | 114   /* additional language identifiers from org-babel-shell-names in | 
|  | 115      ob-shell.el: ob-shell is the only babel language using a lambda to put | 
|  | 116      the execution function name together. */ | 
|  | 117   pre.src-bash:before  { content: 'bash'; } | 
|  | 118   pre.src-csh:before  { content: 'csh'; } | 
|  | 119   pre.src-ash:before  { content: 'ash'; } | 
|  | 120   pre.src-dash:before  { content: 'dash'; } | 
|  | 121   pre.src-ksh:before  { content: 'ksh'; } | 
|  | 122   pre.src-mksh:before  { content: 'mksh'; } | 
|  | 123   pre.src-posh:before  { content: 'posh'; } | 
|  | 124   /* Additional Emacs modes also supported by the LaTeX listings package */ | 
|  | 125   pre.src-ada:before { content: 'Ada'; } | 
|  | 126   pre.src-asm:before { content: 'Assembler'; } | 
|  | 127   pre.src-caml:before { content: 'Caml'; } | 
|  | 128   pre.src-delphi:before { content: 'Delphi'; } | 
|  | 129   pre.src-html:before { content: 'HTML'; } | 
|  | 130   pre.src-idl:before { content: 'IDL'; } | 
|  | 131   pre.src-mercury:before { content: 'Mercury'; } | 
|  | 132   pre.src-metapost:before { content: 'MetaPost'; } | 
|  | 133   pre.src-modula-2:before { content: 'Modula-2'; } | 
|  | 134   pre.src-pascal:before { content: 'Pascal'; } | 
|  | 135   pre.src-ps:before { content: 'PostScript'; } | 
|  | 136   pre.src-prolog:before { content: 'Prolog'; } | 
|  | 137   pre.src-simula:before { content: 'Simula'; } | 
|  | 138   pre.src-tcl:before { content: 'tcl'; } | 
|  | 139   pre.src-tex:before { content: 'TeX'; } | 
|  | 140   pre.src-plain-tex:before { content: 'Plain TeX'; } | 
|  | 141   pre.src-verilog:before { content: 'Verilog'; } | 
|  | 142   pre.src-vhdl:before { content: 'VHDL'; } | 
|  | 143   pre.src-xml:before { content: 'XML'; } | 
|  | 144   pre.src-nxml:before { content: 'XML'; } | 
|  | 145   /* add a generic configuration mode; LaTeX export needs an additional | 
|  | 146      (add-to-list 'org-latex-listings-langs '(conf " ")) in .emacs */ | 
|  | 147   pre.src-conf:before { content: 'Configuration File'; } | 
|  | 148 | 
|  | 149   table { border-collapse:collapse; } | 
|  | 150   caption.t-above { caption-side: top; } | 
|  | 151   caption.t-bottom { caption-side: bottom; } | 
|  | 152   td, th { vertical-align:top;  } | 
|  | 153   th.org-right  { text-align: center;  } | 
|  | 154   th.org-left   { text-align: center;   } | 
|  | 155   th.org-center { text-align: center; } | 
|  | 156   td.org-right  { text-align: right;  } | 
|  | 157   td.org-left   { text-align: left;   } | 
|  | 158   td.org-center { text-align: center; } | 
|  | 159   dt { font-weight: bold; } | 
|  | 160   .footpara { display: inline; } | 
|  | 161   .footdef  { margin-bottom: 1em; } | 
|  | 162   .figure { padding: 1em; } | 
|  | 163   .figure p { text-align: center; } | 
|  | 164   .equation-container { | 
|  | 165     display: table; | 
|  | 166     text-align: center; | 
|  | 167     width: 100%; | 
|  | 168   } | 
|  | 169   .equation { | 
|  | 170     vertical-align: middle; | 
|  | 171   } | 
|  | 172   .equation-label { | 
|  | 173     display: table-cell; | 
|  | 174     text-align: right; | 
|  | 175     vertical-align: middle; | 
|  | 176   } | 
|  | 177   .inlinetask { | 
|  | 178     padding: 10px; | 
|  | 179     border: 2px solid gray; | 
|  | 180     margin: 10px; | 
|  | 181     background: #ffffcc; | 
|  | 182   } | 
|  | 183   #org-div-home-and-up | 
|  | 184    { text-align: right; font-size: 70%; white-space: nowrap; } | 
|  | 185   textarea { overflow-x: auto; } | 
|  | 186   .linenr { font-size: smaller } | 
|  | 187   .code-highlighted { background-color: #ffff00; } | 
|  | 188   .org-info-js_info-navigation { border-style: none; } | 
|  | 189   #org-info-js_console-label | 
|  | 190     { font-size: 10px; font-weight: bold; white-space: nowrap; } | 
|  | 191   .org-info-js_search-highlight | 
|  | 192     { background-color: #ffff00; color: #000000; font-weight: bold; } | 
|  | 193   .org-svg { width: 90%; } | 
|  | 194   /*]]>*/--> | 
|  | 195 </style> | 
|  | 196 <link rel="stylesheet" type="text/css" href="style1.css" /> | 
|  | 197 <script type="text/javascript"> | 
|  | 198 /* | 
|  | 199 @licstart  The following is the entire license notice for the | 
|  | 200 JavaScript code in this tag. | 
|  | 201 | 
|  | 202 Copyright (C) 2012-2019 Free Software Foundation, Inc. | 
|  | 203 | 
|  | 204 The JavaScript code in this tag is free software: you can | 
|  | 205 redistribute it and/or modify it under the terms of the GNU | 
|  | 206 General Public License (GNU GPL) as published by the Free Software | 
|  | 207 Foundation, either version 3 of the License, or (at your option) | 
|  | 208 any later version.  The code is distributed WITHOUT ANY WARRANTY; | 
|  | 209 without even the implied warranty of MERCHANTABILITY or FITNESS | 
|  | 210 FOR A PARTICULAR PURPOSE.  See the GNU GPL for more details. | 
|  | 211 | 
|  | 212 As additional permission under GNU GPL version 3 section 7, you | 
|  | 213 may distribute non-source (e.g., minimized or compacted) forms of | 
|  | 214 that code without the copy of the GNU GPL normally required by | 
|  | 215 section 4, provided you include this license notice and a URL | 
|  | 216 through which recipients can access the Corresponding Source. | 
|  | 217 | 
|  | 218 | 
|  | 219 @licend  The above is the entire license notice | 
|  | 220 for the JavaScript code in this tag. | 
|  | 221 */ | 
|  | 222 <!--/*--><![CDATA[/*><!--*/ | 
|  | 223  function CodeHighlightOn(elem, id) | 
|  | 224  { | 
|  | 225    var target = document.getElementById(id); | 
|  | 226    if(null != target) { | 
|  | 227      elem.cacheClassElem = elem.className; | 
|  | 228      elem.cacheClassTarget = target.className; | 
|  | 229      target.className = "code-highlighted"; | 
|  | 230      elem.className   = "code-highlighted"; | 
|  | 231    } | 
|  | 232  } | 
|  | 233  function CodeHighlightOff(elem, id) | 
|  | 234  { | 
|  | 235    var target = document.getElementById(id); | 
|  | 236    if(elem.cacheClassElem) | 
|  | 237      elem.className = elem.cacheClassElem; | 
|  | 238    if(elem.cacheClassTarget) | 
|  | 239      target.className = elem.cacheClassTarget; | 
|  | 240  } | 
|  | 241 /*]]>*///--> | 
|  | 242 </script> | 
|  | 243 </head> | 
|  | 244 <body> | 
|  | 245 <div id="content"> | 
|  | 246 <h1 class="title">RepeatExplorer documentation</h1> | 
|  | 247 <h1 id="clust"> Cluster annotation table </h1> | 
|  | 248 | 
|  | 249 <dl class="org-dl"> | 
|  | 250 <dt>Cluster</dt><dd>Cluster index; contains a link to the individual cluster report.</dd> | 
|  | 251 <dt>Supercluster</dt><dd>Supercluster index; contains a link to the individual supercluster report.</dd> | 
|  | 252 <dt>Proportion<code>[%]</code></dt><dd>Proportion of reads in the cluster relative to the total number of analyzed sequences.</dd> | 
|  | 253 <dt>Proportions adjusted<code>[%]</code></dt><dd>The adjusted genome proportion can differ from the unadjusted value if the option "Perform automatic filtering of abundant satellite repeats" was on. With this option, sequences belonging to high-abundance satellites are partially removed from the all-to-all comparison and clustering, so the genome proportion of these satellites is underestimated. The adjusted genome proportion provides a corrected estimate of the ‘real’ genomic proportion of a particular satellite repeat.</dd> | 
|  | 254 <dt>Number of reads</dt><dd>Number of reads in the cluster.</dd> | 
|  | 255 <dt>Graph layout</dt><dd>Preview of the graph-based visualization of the cluster of sequence reads. A more detailed graph layout can be found in the individual cluster reports.</dd> | 
|  | 256 <dt>Similarity hits</dt><dd>Summary of the proportion of reads in the cluster with similarity to REXdb or DNA reference databases. Only hits with a proportion above 0.1% are shown.</dd> | 
|  | 257 <dt>LTR detection </dt><dd>Shows whether an LTR with a primer binding site was detected in the contig assembly and which type of tRNA is used for priming.</dd> | 
|  | 258 <dt>Satellite probability</dt><dd>Empirical probability that the cluster represents a satellite.</dd> | 
|  | 259 <dt>TAREAN classification </dt><dd>TAREAN divides clusters into five categories described in box 9.</dd> | 
|  | 260 <dt>Consensus length</dt><dd>For clusters analyzed by the TAREAN module, the best estimate of the monomer length is shown.</dd> | 
|  | 261 <dt>Consensus</dt><dd>The best consensus estimate reconstructed by the TAREAN module.</dd> | 
|  | 262 <dt>Kmer analysis</dt><dd>If the cluster was analyzed by TAREAN, this field contains a link to the detailed TAREAN k-mer analysis (box 10).</dd> | 
|  | 263 <dt>Connected component index C,  Pair completeness index P, Kmer coverage</dt><dd>Statistics reported by the TAREAN module.</dd> | 
|  | 264 <dt>|V| </dt><dd>Number of vertices of the graph</dd> | 
|  | 265 <dt>|E| </dt><dd>Number of edges of the graph</dd> | 
|  | 266 </dl> | 
|  | 267 | 
|  | 268 | 
|  | 269 <h1 id="superclust"> Supercluster annotation table </h1> | 
|  | 270 | 
|  | 271 <dl class="org-dl"> | 
|  | 272 <dt>Supercluster</dt><dd>Supercluster index.</dd> | 
|  | 273 <dt>Reads</dt><dd>Number of reads in the supercluster.</dd> | 
|  | 274 <dt>Automatic classification</dt><dd>Result of automatic supercluster classification</dd> | 
|  | 275 <dt>Similarity hits</dt><dd>The number of similarity hits against REXdb and the DNA database is shown in the classification tree structure, together with the number of reads assigned to putative satellite clusters and information about LTR/PBS detection. Parts of the tree without any evidence are pruned off.</dd> | 
|  | 276 <dt>TAREAN annotation</dt><dd>Clusters that are part of the supercluster and classified by TAREAN as putative satellites are listed here.</dd> | 
|  | 277 <dt>Clusters</dt><dd>Hyperlinked list of clusters that are part of the supercluster.</dd> | 
|  | 278 </dl> | 
|  | 279 | 
|  | 280 <h1 id="tra"> Tandem repeat analysis </h1> | 
|  | 281 | 
|  | 282 <p> | 
|  | 283 TAREAN divides clusters into five categories with corresponding files in the | 
|  | 284 archive: | 
|  | 285 </p> | 
|  | 286 | 
|  | 287 <ul class="org-ul"> | 
|  | 288 <li>High confidence satellites with consensus sequences in file <code>TR_consensus_rank_1_.fasta</code></li> | 
|  | 289 <li>Low confidence satellites with consensus sequences in file <code>TR_consensus_rank_2_.fasta</code></li> | 
|  | 290 <li>Putative LTR elements with consensus sequences in file <code>TR_consensus_rank_3_.fasta</code></li> | 
|  | 291 <li>rDNA with consensus sequences in file <code>TR_consensus_rank_4_.fasta</code></li> | 
|  | 292 <li>Other clusters: these clusters are not reconstructed by TAREAN because no potential tandem-like structure was found.</li> | 
|  | 293 </ul> | 
|  | 294 | 
|  | 295 <p> | 
|  | 296 Summary tables in the TAREAN HTML report include the following information: | 
|  | 297 </p> | 
|  | 298 | 
|  | 299 <dl class="org-dl"> | 
|  | 300 <dt>Cluster</dt><dd>cluster identifier</dd> | 
|  | 301 <dt>Proportion<code>[%]</code></dt><dd>(Number of sequences in the cluster / Number of sequences in the clustering) × 100%</dd> | 
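|  | 302 <dt>Proportion adjusted<code>[%]</code></dt><dd>Proportion corrected for the partial removal of high-abundance satellite sequences when automatic filtering of abundant satellite repeats was on.</dd> | 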
|  | 303 | 
|  | 304 <dt>Number of reads</dt><dd>Number of reads in the cluster</dd> | 
|  | 305 <dt>Satellite probability</dt><dd>Empirical estimate of the probability that the cluster sequences are derived from a satellite repeat. The estimate is based on an analysis of manually annotated and experimentally validated satellite repeats.</dd> | 
|  | 306 <dt>Consensus length</dt><dd>Best estimate of the monomer length.</dd> | 
|  | 307 | 
|  | 308 <dt>Consensus</dt><dd>The consensus sequence is the outcome of the k-mer-based analysis and represents the most probable satellite monomer sequence. Alternative consensus sequences are included in the individual cluster reports.</dd> | 
|  | 309 <dt>Graph layout</dt><dd>Graph-based visualization of similarities among sequence reads</dd> | 
|  | 310 <dt>Kmer analysis</dt><dd>Hyperlink to the TAREAN k-mer report of the individual cluster (fig X, box 10).</dd> | 
|  | 311 <dt>Connected component index C</dt><dd>Proportion of nodes of the graph that are part of the largest strongly connected component.</dd> | 
|  | 312 <dt>Pair completeness index P</dt><dd>Proportion of reads with an available mate pair within the same cluster.</dd> | 
|  | 313 <dt>Kmer coverage</dt><dd>Sum of relative frequencies of all kmers used for consensus sequence reconstruction</dd> | 
|  | 314 <dt>|V|</dt><dd>Number of vertices of the graph</dd> | 
|  | 315 <dt>|E|</dt><dd>Number of edges of the graph</dd> | 
|  | 316 <dt>PBS score</dt><dd>Primer binding site detection score</dd> | 
|  | 317 <dt>Similarity hits</dt><dd>Similarity hits from blastn/blastx searches against built-in databases of known sequences. By default, these are hits to the built-in database, which includes rDNA, plastid, and mitochondrial sequences. If TAREAN was run within the RepeatExplorer2 pipeline, this field also contains information about similarity hits against the REXdb database.</dd> | 
|  | 318 </dl> | 
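<p>
The Proportion and Pair completeness index P entries above are simple ratios. The following is a minimal sketch of both, not the TAREAN implementation: the function names are hypothetical, and the mate-naming convention (mates share a name and differ only in a trailing <code>1</code> or <code>2</code>) is an assumption made for illustration.
</p>

```python
def proportion_percent(reads_in_cluster, reads_in_clustering):
    """(Number of sequences in cluster / number in clustering) x 100."""
    return 100.0 * reads_in_cluster / reads_in_clustering


def pair_completeness(read_names):
    """Fraction of reads whose mate is present in the same cluster.

    Assumption (hypothetical naming scheme): mates share a name and
    differ only in the final character, '1' versus '2'.
    """
    names = set(read_names)
    if not names:
        return 0.0
    mate_suffix = {"1": "2", "2": "1"}
    with_mate = 0
    for name in names:
        partner = mate_suffix.get(name[-1:])
        # Reads without a recognised suffix are counted as unpaired.
        if partner is not None and name[:-1] + partner in names:
            with_mate += 1
    return with_mate / len(names)


if __name__ == "__main__":
    print(proportion_percent(250, 10000))
    print(pair_completeness(["a/1", "a/2", "b/1", "c/2"]))
```

<p>
In the usage example, only the two <code>a</code> reads have their mate in the cluster, so half of the four reads count as paired.
</p>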
|  | 319 | 
|  | 320 <p> | 
|  | 321 Individual cluster TAREAN reports contain other variants of the consensus | 
|  | 322 sequence, sorted by k-mer coverage score. For each consensus, the corresponding | 
|  | 323 de Bruijn graph representation and sequence logo are shown. | 
|  | 324 </p> | 
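<p>
The Connected component index C listed in the summary table is the share of graph vertices that fall into the largest strongly connected component. The sketch below illustrates that definition with Kosaraju's two-pass algorithm. It is not the TAREAN code: the function name and the input format (vertices numbered from 0, a list of directed edges) are assumptions for illustration.
</p>

```python
def largest_scc_fraction(num_vertices, edges):
    """Fraction of vertices in the largest strongly connected component."""
    if num_vertices == 0:
        return 0.0
    forward = [[] for _ in range(num_vertices)]
    backward = [[] for _ in range(num_vertices)]
    for src, dst in edges:
        forward[src].append(dst)
        backward[dst].append(src)

    # Pass 1: iterative DFS on the forward graph, recording finish order.
    visited = [False] * num_vertices
    finish_order = []
    for start in range(num_vertices):
        if visited[start]:
            continue
        visited[start] = True
        stack = [(start, iter(forward[start]))]
        while stack:
            vertex, neighbours = stack[-1]
            for nxt in neighbours:
                if not visited[nxt]:
                    visited[nxt] = True
                    stack.append((nxt, iter(forward[nxt])))
                    break
            else:
                # All neighbours explored: the vertex is finished.
                finish_order.append(vertex)
                stack.pop()

    # Pass 2: sweep the reversed graph in decreasing finish time;
    # each sweep collects exactly one strongly connected component.
    assigned = [False] * num_vertices
    largest = 0
    for start in reversed(finish_order):
        if assigned[start]:
            continue
        assigned[start] = True
        size = 0
        stack = [start]
        while stack:
            vertex = stack.pop()
            size += 1
            for nxt in backward[vertex]:
                if not assigned[nxt]:
                    assigned[nxt] = True
                    stack.append(nxt)
        largest = max(largest, size)
    return largest / num_vertices


if __name__ == "__main__":
    # Cycle 0-1-2 plus a dangling vertex 3.
    print(largest_scc_fraction(4, [(0, 1), (1, 2), (2, 0), (2, 3)]))
```

<p>
A graph in which every vertex lies on one cycle gives a value of 1, while a purely linear chain of vertices gives 1 divided by the number of vertices.
</p>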
|  | 325 | 
|  | 326 <h1 id="kmer"> TAREAN k-mer analysis report </h1> | 
|  | 327 | 
|  | 328 <p> | 
|  | 329 The TAREAN module generates a k-mer analysis report for each cluster assigned to the putative satellite, rDNA, or putative LTR category. Monomer sequences of putative tandem repeats are reconstructed with a k-mer-based method that uses the most frequent k-mers. Several k-mer lengths are evaluated, and the best estimates of the monomer consensus sequence are reported. The k-mer analysis summary contains the following information: | 
|  | 330 </p> | 
|  | 331 <dl class="org-dl"> | 
|  | 332 <dt>k-mer length</dt><dd>Length of the k-mer used for monomer reconstruction.</dd> | 
|  | 333 <dt>Variant index</dt><dd>Each k-mer length can yield multiple consensus variants; the variants are indexed.</dd> | 
|  | 334 <dt>k-mer coverage score </dt><dd>Sum of the proportions of all k-mers used to reconstruct a particular monomer. A value of 1 means that all k-mers from the corresponding cluster were used for the reconstruction, so the monomer shows no variability. The more variable the monomer, the lower the k-mer coverage score.</dd> | 
|  | 335 <dt>Consensus length</dt><dd>Length of the estimated monomer.</dd> | 
|  | 336 <dt>Consensus</dt><dd>Consensus sequence extracted from the position probability matrix.</dd> | 
|  | 337 <dt>k-mer bases graph</dt><dd>Visualization of the de Bruijn graph. Each vertex corresponds to a single k-mer, and the vertex size is proportional to the k-mer frequency. The path used to reconstruct the monomer sequence is greyed out.</dd> | 
|  | 338 <dt>Sequence logo </dt><dd>Visualization of the position probability matrix for the corresponding consensus variant.</dd> | 
|  | 339 </dl> | 
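<p>
The k-mer coverage score described above can be illustrated with a short calculation. This is a minimal sketch under stated assumptions, not the TAREAN implementation: the function names are hypothetical, k-mers are counted on the given strand only, and the set of k-mers used for a monomer is passed in by hand instead of being derived from the de Bruijn graph.
</p>

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every k-mer occurrence in a list of read sequences."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def kmer_coverage(counts, used_kmers):
    """Sum of the relative frequencies of the k-mers used for a monomer."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    # A Counter returns 0 for k-mers that never occur in the reads.
    return sum(counts[kmer] for kmer in set(used_kmers)) / total


if __name__ == "__main__":
    reads = ["ACGACGACG", "ACGACGTCG"]
    counts = count_kmers(reads, 3)
    # k-mers of the perfect ACG monomer; the second read carries a variant.
    print(kmer_coverage(counts, ["ACG", "CGA", "GAC"]))
```

<p>
Because the second read contains a variant base, three of its k-mers are not part of the chosen monomer, and the score drops below 1. A cluster without any variability would give exactly 1.
</p>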
|  | 340 </div> | 
|  | 341 </body> | 
|  | 342 </html> |