# README.org @ 0:d14182506989 (draft, default, tip)
# "planemo upload commit d7966a292ed4209f4058e77ab8c0e49a67847b16-dirty"
# author: petrn
# date: Tue, 15 Feb 2022 16:44:31 +0000
#+TITLE: RepeatExplorer based Assembly Annotation Pipeline

* Tools in repository
** Extract Repeat Library from RepeatExplorer Archive
(=extract_re_contigs.xml=)

This tool extracts a library of repeats from a RepeatExplorer2 analysis. The library is provided as a FASTA file. The tool also filters out all contig parts that fall below the read depth and length thresholds: parts of contigs with read depth below the threshold are hardmasked, and contigs that end up fully hardmasked are removed completely.
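The depth-based hardmasking described above can be illustrated with a short Python sketch. It is not the code of =extract_re_contigs.xml=: it assumes that per-base read depths for each contig are already available as a list of integers (how they are stored in the RepeatExplorer2 archive is not covered here), and masking unmasked stretches shorter than ~min_length~ is only one interpretation of the length threshold. The function name ~hardmask_contig~ is hypothetical.

#+begin_src python
def hardmask_contig(seq, depth, min_depth, min_length):
    """Hardmask low-depth and too-short parts of one contig (hypothetical helper).

    Returns the masked sequence, or None when the whole contig is masked.
    """
    if len(seq) != len(depth):
        raise ValueError("sequence and depth profile must have the same length")
    # Keep only positions whose read depth reaches the threshold.
    keep = [d >= min_depth for d in depth]
    # Assumption: kept stretches shorter than min_length are masked as well.
    i = 0
    while i < len(keep):
        if not keep[i]:
            i += 1
            continue
        j = i
        while j < len(keep) and keep[j]:
            j += 1
        if j - i < min_length:
            keep[i:j] = [False] * (j - i)
        i = j
    # Hardmask: every position that was not kept becomes 'N'.
    masked = "".join(base if ok else "N" for base, ok in zip(seq, keep))
    # Contigs that end up fully hardmasked are removed completely.
    return masked if any(keep) else None


print(hardmask_contig("ACGTACGTAC", [1, 1, 5, 5, 5, 5, 1, 5, 5, 5], 3, 3))  # NNGTACNTAC
print(hardmask_contig("ACGTACGTAC", [1] * 10, 3, 3))  # None
#+end_src

Returning ~None~ for a fully masked contig lets the caller drop it from the library with a simple filter.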
** Format repeat library
(=format_repeat_library.xml=)

This tool appends the repeat classification to the sequences in the repeat library. The repeat type then becomes part of each sequence name, in the format:

~>sequence_id#classification_level1/classification_level2/...~

This makes it possible to specify a classification hierarchy. The classification of the sequences in the library is taken from =CLUSTER_TABLE.csv= (part of the RepeatExplorer2 output).

The formatted library can then be used to annotate repeats in your assembly with the Repeat Annotation tool (=repeat_annotate_custom.xml=).
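The naming convention can be sketched in Python as follows. The helper names and the example classification levels are hypothetical, and the sketch takes a ready-made mapping from sequence ID to classification levels instead of reading =CLUSTER_TABLE.csv=, whose column layout is not described here. It also assumes that sequence IDs never contain =#=.

#+begin_src python
def format_header(sequence_id, levels):
    """Build '>sequence_id#level1/level2/...' (hypothetical helper)."""
    if "#" in sequence_id:
        # '#' separates the ID from the classification, so it cannot appear in the ID.
        raise ValueError("sequence ID must not contain '#'")
    return ">" + sequence_id + "#" + "/".join(levels)


def parse_header(header):
    """Split a formatted header back into (sequence_id, [levels])."""
    sequence_id, _, classification = header.lstrip(">").partition("#")
    return sequence_id, classification.split("/") if classification else []


def annotate_fasta(lines, classification_of):
    """Rewrite FASTA header lines; sequence lines pass through unchanged."""
    for line in lines:
        if line.startswith(">"):
            # The ID is the first whitespace-delimited word of the header.
            sequence_id = line[1:].split()[0]
            yield format_header(sequence_id, classification_of[sequence_id])
        else:
            yield line


records = [">contig_1 length=4", "ACGT", ">contig_2", "TTGA"]
classification_of = {  # hypothetical classifications
    "contig_1": ["repeat", "satellite"],
    "contig_2": ["repeat", "mobile_element", "LTR"],
}
for line in annotate_fasta(records, classification_of):
    print(line)
#+end_src

Keeping the levels in one =/=-separated field means that any consumer can recover the hierarchy with a single split, as ~parse_header~ does.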
** Repeat Annotation
(=repeat_annotate_custom.xml=)

Internally, the annotation is performed with a RepeatMasker search. The RepeatMasker output is parsed to remove duplicated and overlapping annotations, and conflicts between annotations are resolved using the hierarchical classification of repeats provided in the custom database.
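The exact conflict-resolution rule is not spelled out here, so the following Python sketch shows one plausible approach, not necessarily the one used by =repeat_annotate_custom.xml=: the sequence is split at every hit boundary, each piece is labelled with the deepest classification level shared by all hits covering it (their common ancestor in the hierarchy), and adjacent pieces with the same label are merged. Parsing of the RepeatMasker =.out= file is omitted; hits are given as hypothetical ~(start, end, classification)~ tuples with half-open coordinates.

#+begin_src python
def common_prefix(paths):
    """Return the deepest classification levels shared by all paths."""
    prefix = []
    for levels in zip(*paths):
        if any(level != levels[0] for level in levels):
            break
        prefix.append(levels[0])
    return tuple(prefix)


def resolve_overlaps(hits):
    """Turn possibly overlapping hits into non-overlapping labelled segments.

    hits: list of (start, end, classification) with half-open [start, end)
    coordinates and classification given as a tuple of levels (hypothetical format).
    """
    bounds = sorted({pos for start, end, _ in hits for pos in (start, end)})
    segments = []
    for left, right in zip(bounds, bounds[1:]):
        # Every hit overlapping [left, right) covers it entirely, because no
        # hit boundary lies strictly inside this elementary piece.
        covering = [cls for start, end, cls in hits if start <= left and end >= right]
        if not covering:
            continue  # gap between hits: nothing annotated here
        # Duplicates collapse to one label; conflicts move up the hierarchy.
        label = common_prefix(covering)
        if segments and segments[-1][1] == left and segments[-1][2] == label:
            segments[-1] = (segments[-1][0], right, label)  # extend previous segment
        else:
            segments.append((left, right, label))
    return segments


LTR = ("repeat", "mobile_element", "LTR")
hits = [
    (0, 100, LTR + ("Ty1_copia",)),
    (0, 100, LTR + ("Ty1_copia",)),  # exact duplicate
    (50, 150, LTR + ("Ty3_gypsy",)),  # overlapping hit with a conflicting lineage
]
for segment in resolve_overlaps(hits):
    print(segment)
#+end_src

The quadratic scan over all hits keeps the sketch short; a sorted sweep would be preferable for genome-scale RepeatMasker output.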
** TODO Summarize Annotation
This tool will create a summary table from the GFF annotation.
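Since this tool is still a TODO, the following Python sketch only illustrates what such a summary could look like. It assumes GFF3 input with the repeat classification stored in a =Name= attribute (a hypothetical choice; the attribute actually written by the annotation tool is not documented here) and reports the number of features and the total annotated length per classification. Because overlaps are removed during annotation, summing feature lengths is not expected to double-count bases, but that depends on the input.

#+begin_src python
from collections import defaultdict
from urllib.parse import unquote


def summarize_gff(lines, attribute="Name"):
    """Return {classification: (feature_count, total_bp)} from GFF3 lines.

    The attribute holding the classification is an assumption.
    """
    counts = defaultdict(int)
    lengths = defaultdict(int)
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, directives and comments
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 9:
            continue  # not a feature line
        start, end = int(fields[3]), int(fields[4])
        # GFF3 attributes are 'key=value' pairs separated by ';', values percent-encoded.
        attrs = {
            key: unquote(value)
            for key, value in (item.strip().split("=", 1) for item in fields[8].split(";") if "=" in item)
        }
        label = attrs.get(attribute, "unclassified")
        counts[label] += 1
        lengths[label] += end - start + 1  # GFF coordinates are 1-based and inclusive
    return {label: (counts[label], lengths[label]) for label in counts}


gff_lines = [
    "##gff-version 3",
    "\t".join(["ctg1", "RM", "repeat_region", "1", "100", ".", "+", ".", "Name=repeat/satellite"]),
    "\t".join(["ctg1", "RM", "repeat_region", "201", "250", ".", "-", ".", "Name=repeat/satellite"]),
    "\t".join(["ctg2", "RM", "repeat_region", "10", "19", ".", "+", ".", "Name=repeat/mobile_element/LTR"]),
]
for label, (count, bp) in sorted(summarize_gff(gff_lines).items()):
    print(label, count, bp, sep="\t")
#+end_src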
* Test data

- ~test_assembly_1.fasta~ with ~test_db_1_satellites.fasta~ (includes CLASS followed by a double underscore; syntax 1)
- ~test_assembly_2.fasta~ with ~test_db_2_RE_repeats.fasta~ (includes the full hierarchical classification)

#+begin_comment
Create a tarball for the Tool Shed:
tar -czvf ../repeat_annotation_pipeline.tar.gz --exclude test_data \
    --exclude .git --exclude tmp --exclude hg_repository --exclude .idea --exclude .gitignore .
#+end_comment