This repository contains the **Naive Variant Caller** tool.

------

**What it does**

This tool is a naive variant caller that processes aligned sequencing reads in BAM format and produces a VCF file containing per-position variant calls. Multiple BAM files can be provided as input, and read group information is used to make calls for individual samples.

User-configurable options allow filtering out reads that do not pass mapping or base quality thresholds, as well as setting a minimum per-base read depth; users can also specify the ploidy and whether to consider each strand separately.

In addition to calling alternate alleles based upon simple ratios of nucleotides at a position, per-base nucleotide counts are also provided. A custom tag, NC, is used within the Genotype fields. The NC field is a comma-separated listing of nucleotide counts in the form <nucleotide>=<count>, where a plus or minus character is prepended to indicate strand if the strandedness option was specified.
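
As an illustration of the NC format, the following minimal Python sketch splits an NC value into per-nucleotide counts. The function name ``parse_nc`` is hypothetical and not part of the tool; the sample strings are the NC values from the example output lines in this document. It assumes a trailing comma may be present, as in those examples.

```python
def parse_nc(nc_field):
    """Parse an NC value such as 'A=9,C=5,T=9629,G=15,' into a dict of counts.

    Keys keep any leading '+' or '-' strand prefix, e.g. '+T' or '-A'.
    (Hypothetical helper for illustration; not part of the tool.)
    """
    counts = {}
    for item in nc_field.split(","):
        if not item:
            # Skip the empty entry left by a trailing comma.
            continue
        base, _, count = item.partition("=")
        counts[base] = int(count)
    return counts


unstranded = parse_nc("A=9,C=5,T=9629,G=15,")
stranded = parse_nc("+T=3972,-A=9,-C=5,-T=5657,-G=15,")
```

With strand reporting enabled, the keys carry the strand prefix, so the two T entries (``+T`` and ``-T``) add up to the single unstranded T count.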


------

**Inputs**

Accepts one or more BAM input files and a reference genome from the built-in list or from a FASTA file in your history.


**Outputs**

The output is in VCF format.

Example VCF output line, without reporting by strand:
    ``chrM	16029	.	T	G,A,C	.	.	AC=15,9,5;AF=0.00155311658729,0.000931869952371,0.000517705529095	GT:AC:AF:NC	0/0:15,9,5:0.00155311658729,0.000931869952371,0.000517705529095:A=9,C=5,T=9629,G=15,``

Example VCF output line, when reporting by strand:
    ``chrM	16029	.	T	G,A,C	.	.	AC=15,9,5;AF=0.00155311658729,0.000931869952371,0.000517705529095	GT:AC:AF:NC	0/0:15,9,5:0.00155311658729,0.000931869952371,0.000517705529095:+T=3972,-A=9,-C=5,-T=5657,-G=15,``
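
In the example lines above, each AF value matches the corresponding alternate allele count divided by the sum of all counts in NC. The Python sketch below reproduces that arithmetic; it is an observation drawn from this one example, not a statement of how the tool computes AF in general.

```python
# Nucleotide counts and alternate alleles copied from the example VCF line.
nc = {"A": 9, "C": 5, "T": 9629, "G": 15}
alts = ["G", "A", "C"]

# Total depth at the position is the sum of all nucleotide counts.
depth = sum(nc.values())

# Allele count (AC) and allele frequency (AF) for each alternate allele.
ac = [nc[base] for base in alts]
af = [count / depth for count in ac]

for base, count, freq in zip(alts, ac, af):
    print(base, count, freq)
```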

**Options**

Reference Genome:

    Ensure that you have selected the correct reference genome, either from the list of built-in genomes or by selecting the corresponding FASTA file from your history.

Restrict to regions:

    You can specify any number of regions for which you would like to receive results. Each region can be given as just a chromosome name, a chromosome name and start position, or a chromosome name with start and end positions.

Minimum number of reads needed to consider a REF/ALT:

    The minimum number of reads that must contain a particular base at a position for that allele to be listed and used in genotyping calls. Default is 0.
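
The effect of this option can be pictured with the short Python sketch below. It is only an illustration: the function name ``filter_alleles`` is hypothetical, and treating the threshold as inclusive (count >= minimum) is an assumption rather than documented behavior.

```python
def filter_alleles(counts, min_reads=0):
    """Keep only bases whose read count is at least min_reads.

    Hypothetical helper; the inclusive comparison is an assumption.
    """
    return {base: n for base, n in counts.items() if n >= min_reads}


# Counts taken from the example NC field in the Outputs section.
counts = {"A": 9, "C": 5, "T": 9629, "G": 15}

kept_default = filter_alleles(counts)           # default of 0 keeps every base
kept = filter_alleles(counts, min_reads=10)     # drops A (9 reads) and C (5 reads)
```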

Minimum base quality:

    The minimum base quality score needed for the position in a read to be used for nucleotide counts and genotyping. Default is no filter.

Minimum mapping quality:

    The minimum mapping quality score needed to consider a read for nucleotide counts and genotyping. Default is no filter.

Ploidy:

    The number of genotype calls to make at each reported position.

Only write out positions with possible alternate alleles:

    When set, only positions that have at least one non-reference nucleotide passing the declared filters will be present in the output.

Report counts by strand:

    When set, nucleotide counts (NC) will be reported in reference to the aligned read's source strand. Reported as: <strand><BASE>=<COUNT>.

Choose the dtype to use for storing coverage information:

    This controls the maximum depth value that can be stored for each nucleotide/position/strand (when strand reporting is specified). Smaller dtypes require less memory but have lower maximum values.

        +--------+----------------------------+
        | name   | maximum coverage value     |
        +========+============================+
        | uint8  | 255                        |
        +--------+----------------------------+
        | uint16 | 65,535                     |
        +--------+----------------------------+
        | uint32 | 4,294,967,295              |
        +--------+----------------------------+
        | uint64 | 18,446,744,073,709,551,615 |
        +--------+----------------------------+
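
The maximum values in the table follow from the bit width of each unsigned integer type (2 to the power of the bit count, minus 1). The Python sketch below derives them and picks the smallest dtype that can hold an expected maximum depth. The helper names ``max_coverage`` and ``smallest_dtype_for`` are hypothetical and are not options of the tool.

```python
# Bit widths of the unsigned integer dtypes listed in the table.
DTYPE_BITS = {"uint8": 8, "uint16": 16, "uint32": 32, "uint64": 64}


def max_coverage(dtype_name):
    """Largest value an unsigned integer of the given width can store."""
    return 2 ** DTYPE_BITS[dtype_name] - 1


def smallest_dtype_for(expected_max_depth):
    """Return the narrowest listed dtype that can hold expected_max_depth."""
    for name in sorted(DTYPE_BITS, key=DTYPE_BITS.get):
        if expected_max_depth <= max_coverage(name):
            return name
    raise ValueError("depth exceeds the range of uint64")


# A count of 9629, as in the example output, does not fit in uint8.
choice = smallest_dtype_for(9629)
```

In practice it is safer to leave headroom above the deepest coverage you expect, since the maximum applies to every individual nucleotide count.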


------

**Citation**

If you use this tool, please cite Blankenberg D, et al. *In preparation.*