This repository contains the **Naive Variant Caller** tool.

------

**What it does**

This tool is a naive variant caller that processes aligned sequencing reads in BAM format and produces a VCF file containing per-position variant calls. Multiple BAM files can be provided as input, and read group information is used to make calls for individual samples.

User-configurable options allow filtering on mapping quality, base quality, and minimum per-base read depth; users can also specify the ploidy and whether to consider each strand separately.

In addition to calling alternate alleles based upon simple ratios of nucleotides at a position, per-base nucleotide counts are also provided. A custom tag, NC, is used within the Genotype fields. The NC field is a comma-separated listing of nucleotide counts in the form <nucleotide>=<count>, where a plus or minus character is prepended to indicate strand if the strandedness option was specified.
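
The NC format described above can be read back with a few lines of Python. This is a minimal sketch for downstream use, not part of the tool: ``parse_nc`` is a hypothetical helper name, and the input strings are copied from the example output lines in this README.

```python
def parse_nc(nc_field):
    """Parse an NC value such as 'A=9,C=5,T=9629,G=15,' into a dict.

    Keys are (strand, base) tuples; strand is '+', '-', or None when
    counts are not reported by strand.
    """
    counts = {}
    for item in nc_field.split(","):
        if not item:  # the example output ends with a trailing comma
            continue
        key, _, value = item.partition("=")
        if key[0] in "+-":
            strand, base = key[0], key[1:]
        else:
            strand, base = None, key
        counts[(strand, base)] = int(value)
    return counts


unstranded = parse_nc("A=9,C=5,T=9629,G=15,")
stranded = parse_nc("+T=3972,-A=9,-C=5,-T=5657,-G=15,")
print(unstranded[(None, "T")])                      # 9629
print(stranded[("+", "T")] + stranded[("-", "T")])  # 9629
```

Keeping the strand as a separate key component makes it easy to sum both strands and compare against the unstranded counts.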


------

**Inputs**

Accepts one or more BAM input files and a reference genome from the built-in list or from a FASTA file in your history.

**Outputs**

The output is in VCF format.

Example VCF output line, without reporting by strand:
    ``chrM	16029	.	T	G,A,C	.	.	AC=15,9,5;AF=0.00155311658729,0.000931869952371,0.000517705529095	GT:AC:AF:NC	0/0:15,9,5:0.00155311658729,0.000931869952371,0.000517705529095:A=9,C=5,T=9629,G=15,``

Example VCF output line, when reporting by strand:
    ``chrM	16029	.	T	G,A,C	.	.	AC=15,9,5;AF=0.00155311658729,0.000931869952371,0.000517705529095	GT:AC:AF:NC	0/0:15,9,5:0.00155311658729,0.000931869952371,0.000517705529095:+T=3972,-A=9,-C=5,-T=5657,-G=15,``
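
In the example lines, each AC value matches the NC count of the corresponding ALT allele, and each AF value appears to equal that count divided by the total depth at the position. The sketch below checks this arithmetic against the example values. It is an observation about the example, not a description of the tool's implementation, and ``allele_frequencies`` is a hypothetical helper name.

```python
def allele_frequencies(counts, alt_alleles):
    """Return count / total depth for each ALT allele, in the given order."""
    total = sum(counts.values())
    return [counts[allele] / total for allele in alt_alleles]


# Values copied from the unstranded example line.
counts = {"A": 9, "C": 5, "T": 9629, "G": 15}
reported_af = [0.00155311658729, 0.000931869952371, 0.000517705529095]

computed_af = allele_frequencies(counts, ["G", "A", "C"])
for computed, reported in zip(computed_af, reported_af):
    # The reported values are rounded, so compare with a small tolerance.
    assert abs(computed - reported) < 1e-12
print(sum(counts.values()))  # 9658
```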

**Options**

Reference Genome:

    Ensure that you have selected the correct reference genome, either from the list of built-in genomes or by selecting the corresponding FASTA file from your history.

Restrict to regions:

    You can specify any number of regions for which you would like to receive results. Each region can be given as just a chromosome name, a chromosome name and start position, or a chromosome name with start and end positions.
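
The three ways of specifying a region can be modelled as one record with optional bounds. This is an illustrative sketch only: the ``Region`` class and its ``contains`` method are hypothetical, and treating both ends as inclusive is an assumption, since the coordinate convention is not stated here.

```python
from typing import NamedTuple, Optional


class Region(NamedTuple):
    """A region to restrict calling to; start and end are optional."""
    chrom: str
    start: Optional[int] = None
    end: Optional[int] = None

    def contains(self, chrom, pos):
        # Assumption: positions are compared inclusively at both ends.
        if chrom != self.chrom:
            return False
        if self.start is not None and pos < self.start:
            return False
        if self.end is not None and pos > self.end:
            return False
        return True


regions = [Region("chrM"), Region("chr1", 1000), Region("chr2", 500, 600)]
print(any(r.contains("chr1", 16029) for r in regions))  # True
print(any(r.contains("chr2", 700) for r in regions))    # False
```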

Minimum number of reads needed to consider a REF/ALT:

    The minimum number of reads that must contain a particular base at a position for that allele to be listed and used in genotyping calls. Default is 0.
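
As a rough illustration of this option, the sketch below drops bases whose read count falls below the threshold. The function name is hypothetical, and treating the threshold as inclusive (a count equal to the minimum passes) is an assumption; the counts are taken from the example output line.

```python
def alleles_passing_min_reads(counts, min_reads=0):
    """Keep only bases whose read count is at least min_reads."""
    return {base: n for base, n in counts.items() if n >= min_reads}


counts = {"A": 9, "C": 5, "T": 9629, "G": 15}
print(alleles_passing_min_reads(counts))                # all four bases kept
print(alleles_passing_min_reads(counts, min_reads=10))  # {'T': 9629, 'G': 15}
```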

Minimum base quality:

    The minimum base quality score needed for the position in a read to be used for nucleotide counts and genotyping. Default is no filter.

Minimum mapping quality:

    The minimum mapping quality score needed to consider a read for nucleotide counts and genotyping. Default is no filter.

Ploidy:

    The number of genotype calls to make at each reported position.

Only write out positions with possible alternate alleles:

    When set, only positions that have at least one non-reference nucleotide passing the declared filters will be present in the output.

Report counts by strand:

    When set, nucleotide counts (NC) will be reported in reference to the aligned read's source strand. Reported as: <strand><BASE>=<COUNT>.

Choose the dtype to use for storing coverage information:

    This controls the maximum depth value for each nucleotide/position/strand (when specified). Smaller dtypes require less memory but have lower maximum values.

        +--------+----------------------------+
        | name   | maximum coverage value     |
        +========+============================+
        | uint8  | 255                        |
        +--------+----------------------------+
        | uint16 | 65,535                     |
        +--------+----------------------------+
        | uint32 | 4,294,967,295              |
        +--------+----------------------------+
        | uint64 | 18,446,744,073,709,551,615 |
        +--------+----------------------------+
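
The maximum values in the table follow from the bit width of each unsigned integer type (2 to the power of the width, minus 1). The sketch below reproduces them and picks the narrowest listed dtype for an expected depth. Both function names are hypothetical helpers for illustration and are not options of the tool.

```python
# Bit widths of the unsigned integer dtypes listed in the table.
DTYPE_BITS = {"uint8": 8, "uint16": 16, "uint32": 32, "uint64": 64}


def max_coverage(dtype_name):
    """Largest count an unsigned integer of this width can store."""
    return 2 ** DTYPE_BITS[dtype_name] - 1


def smallest_dtype_for(expected_depth):
    """Pick the narrowest listed dtype that can hold expected_depth."""
    for name in ("uint8", "uint16", "uint32", "uint64"):
        if expected_depth <= max_coverage(name):
            return name
    raise ValueError("depth exceeds the uint64 maximum")


print(max_coverage("uint16"))    # 65535
print(smallest_dtype_for(9629))  # uint16
```

For instance, the depth of 9,629 reads seen for T in the example output would not fit in uint8 but fits in uint16.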


------

**Citation**

If you use this tool, please cite Blankenberg D, et al. *In preparation.*