--- a/recentrifuge.xml	Wed Apr 06 13:52:48 2022 +0000
+++ b/recentrifuge.xml	Wed Apr 06 14:54:52 2022 +0000
@@ -365,130 +365,145 @@
       <output name="logfile" file="kraken_test/test3_tsv.log" lines_diff="20"/>
     </test>
   </tests>
-  <help>
-    <![CDATA[
-    =-= /home/pierre/anaconda3/envs/rcf/bin/rcf =-= v1.8.1 - Mar 2022 =-= by Jose Manuel Martí =-=
-  usage: rcf [-h] [-V] [-n PATH] [--format GENERIC_FORMAT]
-             (-f FILE | -g FILE | -l FILE | -r FILE | -k FILE) [-o FILE]
-             [-e OUTPUT_TYPE] [-p] [--nohtml] [-a | -c CONTROLS_NUMBER]
-             [-s SCORING] [-y NUMBER] [-m INT] [-x TAXID] [-i TAXID] [-z NUMBER]
-             [-w INT] [-u SUMMARY_BEHAVIOR] [-t] [--nokollapse] [-d] [--strain]
-             [--sequential]
-  Robust comparative analysis and contamination removal for metagenomics
-  options:
-    -h, --help            show this help message and exit
-    -V, --version         show program's version number and exit
-  input:
-    Define Recentrifuge input files and formats
-    -n PATH, --nodespath PATH
-                          path for the nodes information files (nodes.dmp and
-                          names.dmp from NCBI)
-    --format GENERIC_FORMAT
-                          format of the output files from a generic classifier
-                          included with the option -g; It is a string like
-                          "TYP:csv,TID:1,LEN:3,SCO:6,UNC:0" where valid file
-                          TYPes are csv/tsv/ssv, and the rest of fields indicate
-                          the number of column used (starting in 1) for the
-                          TaxIDs assigned, the LENgth of the read, the SCOre
-                          given to the assignment, and the taxid code used for
-                          UNClassified reads
-    -f FILE, --file FILE  Centrifuge output files; if a single directory is
-                          entered, every .out file inside will be taken as a
-                          different sample; multiple -f is available to include
-                          several Centrifuge samples
-    -g FILE, --generic FILE
-                          output file from a generic classifier; it requires the
-                          flag --format (see such option for details); multiple
-                          -g is available to include several generic samples
-    -l FILE, --lmat FILE  LMAT output dir or file prefix; if just "." is
-                          entered, every subdirectory under the current
-                          directory will be taken as a sample and scanned
-                          looking for LMAT output files; multiple -l is
-                          available to include several samples
-    -r FILE, --clark FILE
-                          CLARK full-mode output files; if a single directory is
-                          entered, every .csv file inside will be taken as a
-                          different sample; multiple -r is available to include
-                          several CLARK, CLARK-l, and CLARK-S full-mode samples
-    -k FILE, --kraken FILE
-                          Kraken output files; if a single directory is entered,
-                          every .krk file inside will be taken as a different
-                          sample; multiple -k is available to include several
-                          Kraken (version 1 or 2) samples
-  output:
-    Related to the Recentrifuge output files
-    -o FILE, --outprefix FILE
-                          output prefix; if not given, it will be inferred from
-                          input files; an HTML filename is still accepted for
-                          backwards compatibility with legacy --outhtml option
-    -e OUTPUT_TYPE, --extra OUTPUT_TYPE
-                          type of extra output to be generated, and can be one
-                          of ['FULL', 'CSV', 'MULTICSV', 'TSV', 'DYNOMICS']
-    -p, --pickle          pickle (serialize) statistics and data results in
-                          pandas DataFrames (format affected by selection of
-                          --extra)
-    --nohtml              suppress saving the HTML output file
-  tuning:
-    Coarse tuning of algorithm parameters
-    -a, --avoidcross      avoid cross analysis
-    -c CONTROLS_NUMBER, --controls CONTROLS_NUMBER
-                          this number of first samples will be treated as
-                          negative controls; default is no controls
-    -s SCORING, --scoring SCORING
-                          type of scoring to be applied, and can be one of
-                          ['SHEL', 'LENGTH', 'LOGLENGTH', 'NORMA', 'LMAT',
-                          'CLARK_C', 'CLARK_G', 'KRAKEN', 'GENERIC']
-    -y NUMBER, --minscore NUMBER
-                          minimum score/confidence of the classification of a
-                          read to pass the quality filter; all pass by default
-    -m INT, --mintaxa INT
-                          minimum taxa to avoid collapsing one level into the
-                          parent (if not specified a value will be automatically
-                          assigned)
-    -x TAXID, --exclude TAXID
-                          NCBI taxid code to exclude a taxon and all underneath
-                          (multiple -x is available to exclude several taxid)
-    -i TAXID, --include TAXID
-                          NCBI taxid code to include a taxon and all underneath
-                          (multiple -i is available to include several taxid);
-                          by default, all the taxa are considered for inclusion
-  fine tuning:
-    Fine tuning of algorithm parameters
-    -z NUMBER, --ctrlminscore NUMBER
-                          minimum score/confidence of the classification of a
-                          read in control samples to pass the quality filter; it
-                          defaults to "minscore"
-    -w INT, --ctrlmintaxa INT
-                          minimum taxa to avoid collapsing one level into the
-                          parent (if not specified a value will be automatically
-                          assigned)
-    -u SUMMARY_BEHAVIOR, --summary SUMMARY_BEHAVIOR
-                          choice for summary behaviour, and can be one of
-                          ['ADD', 'ONLY', 'AVOID']
-    -t, --takeoutroot     remove counts directly assigned to the "root" level
-    --nokollapse          show the "cellular organisms" taxon
-  advanced:
-    Advanced modes of running
-    -d, --debug           increase output verbosity and perform additional
-                          checks
-    --strain              set strain level instead of species as the resolution
-                          limit for the robust contamination removal algorithm;
-                          use with caution, this is an experimental feature
-    --sequential          deactivate parallel processing
-  rcf - Release 1.8.1 - Mar 2022
-      Copyright (C) 2017–2022, Jose Manuel Martí Martínez
-      This program is free software: you can redistribute it and/or modify
-      it under the terms of the GNU Affero General Public License as
-      published by the Free Software Foundation, either version 3 of the
-      License, or (at your option) any later version.
-      This program is distributed in the hope that it will be useful,
-      but WITHOUT ANY WARRANTY; without even the implied warranty of
-      MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
-      GNU Affero General Public License for more details.
-      You should have received a copy of the GNU Affero General Public License
-      along with this program.  If not, see <https://www.gnu.org/licenses/>.
-
-    ]]>
-  </help>
+  <help><![CDATA[
+usage: rcf [-h] [-V] [-n PATH] [--format GENERIC_FORMAT]
+         (-f FILE | -g FILE | -l FILE | -r FILE | -k FILE) [-o FILE]
+         [-e OUTPUT_TYPE] [-p] [--nohtml] [-a | -c CONTROLS_NUMBER]
+         [-s SCORING] [-y NUMBER] [-m INT] [-x TAXID] [-i TAXID] [-z NUMBER]
+         [-w INT] [-u SUMMARY_BEHAVIOR] [-t] [--nokollapse] [-d] [--strain]
+         [--sequential]
+
+Robust comparative analysis and contamination removal for metagenomics
+
+options:
+-h, --help            show this help message and exit
+-V, --version         show program's version number and exit
+
+input:
+Define Recentrifuge input files and formats
+
+-n PATH, --nodespath PATH
+                      path for the nodes information files (nodes.dmp and
+                      names.dmp from NCBI)
+--format GENERIC_FORMAT
+                      format of the output files from a generic classifier
+                      included with the option -g; it is a string like
+                      "TYP:csv,TID:1,LEN:3,SCO:6,UNC:0" where valid file
+                      TYPes are csv/tsv/ssv, and the rest of the fields indicate
+                      the column numbers (starting at 1) used for the
+                      TaxIDs assigned, the LENgth of the read, the SCOre
+                      given to the assignment, and the taxid code used for
+                      UNClassified reads
+-f FILE, --file FILE  Centrifuge output files; if a single directory is
+                      entered, every .out file inside will be taken as a
+                      different sample; multiple -f is available to include
+                      several Centrifuge samples
+-g FILE, --generic FILE
+                      output file from a generic classifier; it requires the
+                      flag --format (see such option for details); multiple
+                      -g is available to include several generic samples
+-l FILE, --lmat FILE  LMAT output dir or file prefix; if just "." is
+                      entered, every subdirectory under the current
+                      directory will be taken as a sample and scanned
+                      looking for LMAT output files; multiple -l is
+                      available to include several samples
+-r FILE, --clark FILE
+                      CLARK full-mode output files; if a single directory is
+                      entered, every .csv file inside will be taken as a
+                      different sample; multiple -r is available to include
+                      several CLARK, CLARK-l, and CLARK-S full-mode samples
+-k FILE, --kraken FILE
+                      Kraken output files; if a single directory is entered,
+                      every .krk file inside will be taken as a different
+                      sample; multiple -k is available to include several
+                      Kraken (version 1 or 2) samples
+
+output:
+Related to the Recentrifuge output files
+
+-o FILE, --outprefix FILE
+                      output prefix; if not given, it will be inferred from
+                      input files; an HTML filename is still accepted for
+                      backwards compatibility with legacy --outhtml option
+-e OUTPUT_TYPE, --extra OUTPUT_TYPE
+                      type of extra output to be generated, and can be one
+                      of ['FULL', 'CSV', 'MULTICSV', 'TSV', 'DYNOMICS']
+-p, --pickle          pickle (serialize) statistics and data results in
+                      pandas DataFrames (format affected by selection of
+                      --extra)
+--nohtml              suppress saving the HTML output file
+
+tuning:
+Coarse tuning of algorithm parameters
+
+-a, --avoidcross      avoid cross analysis
+-c CONTROLS_NUMBER, --controls CONTROLS_NUMBER
+                      this number of first samples will be treated as
+                      negative controls; default is no controls
+-s SCORING, --scoring SCORING
+                      type of scoring to be applied, and can be one of
+                      ['SHEL', 'LENGTH', 'LOGLENGTH', 'NORMA', 'LMAT',
+                      'CLARK_C', 'CLARK_G', 'KRAKEN', 'GENERIC']
+-y NUMBER, --minscore NUMBER
+                      minimum score/confidence of the classification of a
+                      read to pass the quality filter; all pass by default
+-m INT, --mintaxa INT
+                      minimum taxa to avoid collapsing one level into the
+                      parent (if not specified a value will be automatically
+                      assigned)
+-x TAXID, --exclude TAXID
+                      NCBI taxid code to exclude a taxon and all underneath
+                      (multiple -x is available to exclude several taxids)
+-i TAXID, --include TAXID
+                      NCBI taxid code to include a taxon and all underneath
+                      (multiple -i is available to include several taxids);
+                      by default, all the taxa are considered for inclusion
+
+fine tuning:
+Fine tuning of algorithm parameters
+
+-z NUMBER, --ctrlminscore NUMBER
+                      minimum score/confidence of the classification of a
+                      read in control samples to pass the quality filter; it
+                      defaults to "minscore"
+-w INT, --ctrlmintaxa INT
+                      minimum taxa to avoid collapsing one level into the
+                      parent (if not specified a value will be automatically
+                      assigned)
+-u SUMMARY_BEHAVIOR, --summary SUMMARY_BEHAVIOR
+                      choice for summary behaviour, and can be one of
+                      ['ADD', 'ONLY', 'AVOID']
+-t, --takeoutroot     remove counts directly assigned to the "root" level
+--nokollapse          show the "cellular organisms" taxon
+
+advanced:
+Advanced modes of running
+
+-d, --debug           increase output verbosity and perform additional
+                      checks
+--strain              set strain level instead of species as the resolution
+                      limit for the robust contamination removal algorithm;
+                      use with caution, this is an experimental feature
+--sequential          deactivate parallel processing
+
+rcf - Release 1.8.1 - Mar 2022
+
+  Copyright (C) 2017–2022, Jose Manuel Martí Martínez
+
+  This program is free software: you can redistribute it and/or modify
+  it under the terms of the GNU Affero General Public License as
+  published by the Free Software Foundation, either version 3 of the
+  License, or (at your option) any later version.
+
+  This program is distributed in the hope that it will be useful,
+  but WITHOUT ANY WARRANTY; without even the implied warranty of
+  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+  GNU Affero General Public License for more details.
+
+  You should have received a copy of the GNU Affero General Public License
+  along with this program.  If not, see <https://www.gnu.org/licenses/>.
+
+
+  ]]></help>
   <expand macro="citations"/>
 </tool>
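The `--format GENERIC_FORMAT` option in the help text above packs the column layout of a generic classifier's output into comma-separated `KEY:value` pairs (`TYP`, `TID`, `LEN`, `SCO`, `UNC`). The sketch below shows how such a spec could be parsed and applied to a single output line. It is not Recentrifuge's own implementation. The names `GenericFormat`, `parse_generic_format` and `read_row` are hypothetical, and treating `ssv` as whitespace-separated is an assumption.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Delimiters for the file TYPes named in the --format help text.
# Assumption: "ssv" means whitespace-separated, so None is used to make
# str.split() treat any run of whitespace as a single separator.
_DELIMITERS = {"csv": ",", "tsv": "\t", "ssv": None}


@dataclass(frozen=True)
class GenericFormat:
    """Hypothetical container for a parsed --format specification."""
    delimiter: Optional[str]
    tid_col: int       # 1-based column with the assigned TaxID
    len_col: int       # 1-based column with the read length
    sco_col: int       # 1-based column with the assignment score
    unclassified: int  # TaxID code used for unclassified reads


def parse_generic_format(spec: str) -> GenericFormat:
    """Parse a spec such as "TYP:csv,TID:1,LEN:3,SCO:6,UNC:0"."""
    fields = dict(item.strip().split(":", 1) for item in spec.split(","))
    typ = fields["TYP"].lower()
    if typ not in _DELIMITERS:
        raise ValueError(f"unsupported file TYPe: {typ!r}")
    return GenericFormat(
        delimiter=_DELIMITERS[typ],
        tid_col=int(fields["TID"]),
        len_col=int(fields["LEN"]),
        sco_col=int(fields["SCO"]),
        unclassified=int(fields["UNC"]),
    )


def read_row(line: str, fmt: GenericFormat) -> Optional[Tuple[int, int, float]]:
    """Return (taxid, length, score) for one output line, or None if the
    read carries the unclassified TaxID code."""
    cols = line.rstrip("\n").split(fmt.delimiter)
    taxid = int(cols[fmt.tid_col - 1])  # help text: columns start at 1
    if taxid == fmt.unclassified:
        return None
    return taxid, int(cols[fmt.len_col - 1]), float(cols[fmt.sco_col - 1])


if __name__ == "__main__":
    fmt = parse_generic_format("TYP:csv,TID:1,LEN:3,SCO:6,UNC:0")
    print(read_row("562,read1,150,x,y,0.95", fmt))  # (562, 150, 0.95)
    print(read_row("0,read2,90,x,y,0.0", fmt))      # None
```

The spec keeps the 1-based column numbers exactly as a user would type them after `TID`, `LEN` and `SCO`. The conversion to Python's 0-based indexes happens only at lookup time.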