diff README.org @ 0:d14182506989 draft default tip

"planemo upload commit d7966a292ed4209f4058e77ab8c0e49a67847b16-dirty"
author petrn
date Tue, 15 Feb 2022 16:44:31 +0000
parents
children
line wrap: on
line diff
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/README.org	Tue Feb 15 16:44:31 2022 +0000
@@ -0,0 +1,35 @@
+#+TITLE: RepeatExplorer based Assembly Annotation Pipeline
+
+* Tools in repository
+**  Extract Repeat Library from RepeatExplorer Archive
+(=extract_re_contigs.xml=)
+
+This toll  will extract library of repeats  based on RepeatExplorer2 analysis. Library is available as fasta file. Tool also filter out all  the contig parts which has read depth and length below threshold. Parts of contigs with read depth below threshold are hardmasker. Contigs with full hardmasking are removed completelly
+
+** Format repeat library
+(=format_repeat_library.xml=)
+
+This tool append classification of repeats to library of repeats. Type of repeat is then part of sequence name in format:
+
+~>sequence_id#classification_level1/classification_level2/...~ this enable to specify classification hierarchy
+Classification of sequneces in library is provided using  =CLUSTER_TABLE.csv= (part of RE2 output)
+
+This file can then be used for annotation of repeat in your assembly:
+** Repeat Annotation
+(=repeat_annotate_custom.xml=)
+
+ Internally annotation is performed using RepeatMasker search. Output from RepeatMasker is parsed to remove duplicated and overlaping annotations, Conflicts in annotations are resolved using hierarchical classification of repeats provided in custom database. 
+** TODO Summarize Annotation
+This tool will create summary table from GFF annotation.
+* test data
+
+- ~test_assembly_1.fasta~ with ~test_db_1_satellites.fasta~ (include CLASS followed by double underscore - syntax 1)
+- ~test_assembly_2.fasta~ with ~test_db_2_RE_repeats.fasta~ (include full hierarchical classification)
+
+
+
+#+begin_comment
+create tarball for toolshed:
+tar -czvf ../repeat_annotation_pipeline.tar.gz --exclude test_data \
+--exclude .git  --exclude tmp  --exclude hg_repository --exclude .idea --exclude .gitignore .
+#+end_comment