annotate README.org @ 0:d14182506989 draft default tip

"planemo upload commit d7966a292ed4209f4058e77ab8c0e49a67847b16-dirty"
author petrn
date Tue, 15 Feb 2022 16:44:31 +0000
#+TITLE: RepeatExplorer based Assembly Annotation Pipeline

* Tools in repository
** Extract Repeat Library from RepeatExplorer Archive
(=extract_re_contigs.xml=)

This tool extracts a library of repeats from a RepeatExplorer2 analysis. The library is provided as a FASTA file. The tool also filters out all contig parts whose read depth and length are below the thresholds. Parts of contigs with read depth below the threshold are hardmasked, and contigs that are hardmasked over their full length are removed completely.
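As a rough illustration of the filtering described above, the following Python sketch hardmasks low-depth positions and drops contigs that end up fully masked. The function name ~mask_contig~, the per-base depth list, and the parameters ~min_depth~ and ~min_length~ are assumptions made for this example and are not taken from the tool's source. In particular, applying the length threshold to unmasked stretches is only one possible reading of the description.

```python
import re


def mask_contig(seq, depth, min_depth, min_length):
    """Hardmask low-depth bases with 'N'; return None if nothing usable remains."""
    if len(seq) != len(depth):
        raise ValueError("sequence and depth must have the same length")

    # Replace every base whose read depth is below the threshold with 'N'.
    masked = "".join(b if d >= min_depth else "N" for b, d in zip(seq, depth))

    # Assumption: unmasked stretches shorter than min_length are masked as well.
    def mask_short(match):
        part = match.group()
        return part if len(part) >= min_length else "N" * len(part)

    masked = re.sub(r"[^N]+", mask_short, masked)

    # A contig that is hardmasked over its full length is dropped.
    if set(masked) <= {"N"}:
        return None
    return masked
```

For example, ~mask_contig("ACGTACGT", [5, 5, 5, 5, 1, 1, 5, 5], 3, 3)~ masks the two low-depth bases and the short two-base stretch that follows them, while a contig with zero depth everywhere returns ~None~.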
** Format repeat library
(=format_repeat_library.xml=)

This tool appends the classification of repeats to a library of repeats. The repeat type then becomes part of the sequence name, in the format:

~>sequence_id#classification_level1/classification_level2/...~

This makes it possible to specify a classification hierarchy. The classification of sequences in the library is provided by =CLUSTER_TABLE.csv= (part of the RE2 output).

The resulting file can then be used to annotate repeats in your assembly with the Repeat Annotation tool.
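The sequence name format above can be sketched in a few lines of Python. The function name ~format_header~, the example identifier, and the classification levels are hypothetical. How the levels are read from =CLUSTER_TABLE.csv= is not shown, because its layout is not described here.

```python
def format_header(sequence_id, classification):
    """Build a FASTA header of the form >sequence_id#level1/level2/..."""
    # Drop empty levels so the header never contains '//' or a trailing '/'.
    levels = [level.strip() for level in classification if level.strip()]
    if not levels:
        # Assumption: unclassified sequences keep the plain identifier.
        return f">{sequence_id}"
    return f">{sequence_id}#{'/'.join(levels)}"


# Hypothetical identifier and classification levels, for illustration only.
print(format_header("CL1Contig1", ["Class_I", "LTR", "Ty1_copia"]))
```

Keeping the hierarchy in the name as a single slash-separated path means that downstream steps can recover every level with a plain string split.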
** Repeat Annotation
(=repeat_annotate_custom.xml=)

Internally, annotation is performed using a RepeatMasker search. The RepeatMasker output is parsed to remove duplicated and overlapping annotations. Conflicts between annotations are resolved using the hierarchical classification of repeats provided in the custom database.
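The exact conflict-resolution rule is not spelled out here, so the following Python sketch shows only one plausible approach, stated as an assumption. If one classification is an ancestor of the other, the more specific one is kept. Otherwise the deepest shared level is used. The function name ~resolve_conflict~ and the example class names are hypothetical.

```python
def resolve_conflict(class_a, class_b):
    """Pick one classification for two overlapping hits (assumed rule)."""
    a, b = class_a.split("/"), class_b.split("/")

    # Collect the levels shared by both classification paths.
    common = []
    for level_a, level_b in zip(a, b):
        if level_a != level_b:
            break
        common.append(level_a)

    # One path is a prefix of the other: keep the more specific (longer) one.
    if len(common) == min(len(a), len(b)):
        return class_a if len(a) >= len(b) else class_b

    # Otherwise fall back to the deepest shared level (empty if none is shared).
    return "/".join(common)
```

Under this rule, ~Class_I/LTR~ versus ~Class_I/LTR/Ty1_copia~ resolves to the longer path, while two different LTR lineages resolve to their shared parent ~Class_I/LTR~.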
** TODO Summarize Annotation
This tool will create a summary table from the GFF annotation.
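Since this tool is still marked as TODO, the following is only a sketch of what such a summary could look like, not a description of the planned implementation. It sums the annotated length per classification from GFF3 lines. The function name ~summarize_gff~ and the choice of the ~Name~ attribute as the place where the classification is stored are assumptions.

```python
from collections import defaultdict


def summarize_gff(lines, attribute="Name"):
    """Sum annotated lengths (bp) per value of one GFF3 attribute."""
    totals = defaultdict(int)
    for line in lines:
        # Skip blank lines, directives, and comments.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 9:
            continue  # skip malformed records
        start, end = int(fields[3]), int(fields[4])
        # Column 9 holds semicolon-separated key=value pairs.
        attrs = dict(p.split("=", 1) for p in fields[8].split(";") if "=" in p)
        key = attrs.get(attribute, "unclassified")
        # GFF coordinates are 1-based and inclusive.
        totals[key] += end - start + 1
    return dict(totals)
```

Overlapping features would be counted twice by this sketch, so it assumes the annotation has already been made non-overlapping.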
* Test data

- ~test_assembly_1.fasta~ with ~test_db_1_satellites.fasta~ (includes CLASS followed by a double underscore: syntax 1)
- ~test_assembly_2.fasta~ with ~test_db_2_RE_repeats.fasta~ (includes full hierarchical classification)
#+begin_comment
create tarball for toolshed:
tar -czvf ../repeat_annotation_pipeline.tar.gz --exclude test_data \
--exclude .git --exclude tmp --exclude hg_repository --exclude .idea --exclude .gitignore .
#+end_comment