# HG changeset patch
# User blankenberg
# Date 1606269474 0
# Node ID 46f45544839fd822f3736959f4cec7ea47265754
Uploaded
diff -r 000000000000 -r 46f45544839f plink2.xml
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/plink2.xml Wed Nov 25 01:57:54 2020 +0000
@@ -0,0 +1,8535 @@
+ plink
+ plink2 --version
+  * <angle brackets> denote a required parameter, where the text between the
+    angle brackets describes its nature.
+  * ['square brackets + single-quotes'] denotes an optional modifier. Use the
+    EXACT text in the quotes.
+  * [{bar|separated|braced|bracketed|values}] denotes a collection of mutually
+    exclusive optional modifiers (again, the exact text must be used). When
+    there are no outer square brackets, one of the choices must be selected.
+  * ['quoted_text='] denotes an optional modifier that
+    must begin with the quoted text, and be followed by a value with no
+    whitespace in between. '|' may also be used here to indicate mutually
+    exclusive options.
+  * [square brackets without quotes or braces] denote an optional parameter,
+    where the text between the brackets describes its nature.
+  * An ellipsis (...) indicates that you may enter multiple parameters of the
+    specified type.
+  * A "column set descriptor" is either
+    1. a comma-separated sequence of column set names; this is interpreted as
+       the full list of column sets to include.
+    2. a comma-separated sequence of column set names, all preceded by '+' or
+       '-'; this is interpreted as a list of changes to the default.
+
+  plink2 [command flag(s)...] [other flag(s)...]
+  plink2 --help [flag name(s)...]
+
+Most PLINK runs require exactly one main input fileset. The following flags
+are available for defining its form and location:
+
+  --pfile ['vzs'] : Specify .pgen + .pvar[.zst] + .psam prefix.
+  --pgen : Specify full name of .pgen/.bed file.
+  --pvar : Specify full name of .pvar/.bim file.
+  --psam : Specify full name of .psam/.fam file.
+
+  --bfile ['vzs'] : Specify .bed + .bim[.zst] + .fam prefix.
+  --bpfile ['vzs'] : Specify .pgen + .bim[.zst] + .fam prefix.
+
+  --keep-autoconv : When importing non-PLINK-binary data, don't delete
+                    autogenerated binary fileset at end of run.
+
+  --no-fid : .fam file does not contain column 1 (family ID).
+  --no-parents : .fam file does not contain columns 3-4 (parents).
+  --no-sex : .fam file does not contain column 5 (sex).
+
+  --vcf ['dosage=']
+  --bcf ['dosage='] (not implemented yet) :
+    Specify full name of .vcf{|.gz|.zst} or BCF2 file to import.
+ * These can be used with --psam/--fam. + * By default, dosage information is not imported. To import the GP field + (must be VCFv4.3-style 0..1, one probability per possible genotype), add + 'dosage=GP' (or 'dosage=GP-force', see below). To import Minimac3-style + DS+HDS phased dosage, add 'dosage=HDS'. 'dosage=DS' (or anything else + for now) causes the named field to be interpreted as a Minimac3-style + dosage. + Note that, in the dosage=GP case, PLINK 2 collapses the probabilities + down to dosages; you cannot use PLINK 2 to losslessly convert VCF + FORMAT:GP data to e.g. BGEN format. To make this more obvious, PLINK 2 + now errors out when dosage=GP is used on a file with a FORMAT:DS header + line and --import-dosage-certainty wasn't specified, since dosage=DS + extracts the same information more quickly in this situation. You can + suppress this error with 'dosage=GP-force'. + In all of these cases, hardcalls are regenerated from scratch from the + dosages. As a consequence, variants with no GT field can now be + imported; they will be assumed to contain only diploid calls when HDS is + also absent. + + --data [REF/ALT mode] ['gzs'] + --bgen [REF/ALT mode] ['snpid-chr'] + --gen [REF/ALT mode] + --sample : + Specify an Oxford-format dataset to import. --data specifies a .gen[.zst] + + .sample pair, while --bgen specifies a BGEN v1.1+ file. + * If a BGEN v1.2+ file contains sample IDs, it may be imported without a + companion .sample file. + * With 'snpid-chr', chromosome codes are read from the 'SNP ID' field + instead of the usual chromosome field. + * The following REF/ALT modes are supported: + 'ref-first': The first allele for each variant is REF. + 'ref-last': The last allele for each variant is REF. + 'ref-unknown' (default): The last allele for each variant is treated as + provisional-REF. + This parameter will be required instead of optional in alpha 3. + + --haps [{ref-first | ref-last}] + --legend : + Specify .haps [+ .legend] file(s) to import. 
+ * When --legend is specified, it's assumed that the --haps file doesn't + contain header columns. + * On chrX, the second male column may contain dummy '-' entries. (However, + PLINK 2 currently cannot handle omitted male columns.) + * If not used with --sample, new sample IDs are of the form 'per#/per#'. + + --map : Specify full name of .map file. + --import-dosage ['noheader'] ['id-delim='] + ['skip0='] ['skip1='] ['skip2='] ['dose1'] + ['format='] [{ref-first | ref-last}] + ['single-chr='] ['chr-col-num='<#>] + ['pos-col-num='<#>] : + Specify PLINK 1.x-style dosage file to import. + * You must also specify a companion .psam/.fam file. + * By default, PLINK assumes that the file contains a header line, which has + 'SNP' in (1-based) column i+1, 'A1' in column i+j+2, 'A2' in column + i+j+3, and sample FID/IIDs starting from column i+j+k+4. (i/j/k are + normally zero, but can be changed with 'skip0', 'skip1', and 'skip2' + respectively. FID/IID are normally assumed to be separate tokens, but if + they're merged into a single token you can specify the delimiter with + 'id-delim='.) If such a header line is not present, use the 'noheader' + modifier; samples will then be assumed to appear in the same order as + they do in the .psam/.fam file. + * You may specify a companion .map file. If you do not, + * 'single-chr=' can be used to specify that all variants are on the named + chromosome. Otherwise, you can use 'chr-col-num=' to read chromosome + codes from the given (1-based) column number. + * 'pos-col-num=' causes bp coordinates to be read from the given column + number. + * The 'format=' modifier lets you specify the number of values used to + represent each dosage. 'format=1' normally indicates a single 0..2 A1 + expected count; 'dose1' modifies this to a 0..1 frequency. 'format=2' + indicates a 0..1 homozygous A1 likelihood followed by a 0..1 het + likelihood. 'format=3' indicates 0..1 hom A1, 0..1 het, 0..1 hom A2. 
+ 'format=infer' (the default) infers the format from the number of columns + in the first nonheader line. + + --dummy [missing dosage freq] [missing pheno freq] + [{acgt | 1234 | 12}] ['pheno-ct='] ['scalar-pheno'] + ['dosage-freq='] + This generates a fake input dataset with the specified number of samples + and SNPs. + * By default, the missing dosage and phenotype frequencies are zero. + These can be changed by providing 3rd and 4th numeric parameters. + * By default, allele codes are As and Bs; this can be changed with the + 'acgt', '1234', or '12' modifier. + * By default, one binary phenotype is generated. 'pheno-ct=' can be used + to change the number of phenotypes, and 'scalar-pheno' causes these + phenotypes to be normally distributed scalars. + * By default, all (nonmissing) dosages are in {0,1,2}. To make some of + them take on decimal values, use 'dosage-freq='. (These dosages are + affected by --hard-call-threshold and --dosage-erase-threshold.) + + --fa : Specify full name of reference FASTA file. + + Output files have names of the form 'plink2.' by default. You can + change the 'plink2' prefix with + + --out : Specify prefix for output files. + + Most runs also require at least one of the following commands: + + --rm-dup [mode] ['list'] + Remove all but one instance of each duplicate-ID variant (ignoring the + missing ID), and (with the 'list' modifier) write a list of duplicated IDs + to .rmdup.list. + The following modes of operation are supported: + * 'error' (default) causes this to error out when there's a genotype data + or other mismatch between the records. A list of affected IDs is written + to .rmdup.mismatch. + * 'retain-mismatch' causes all instances of a duplicate-ID variant to be + retained when there's a genotype data or variant info mismatch; otherwise + one instance is kept. The .rmdup.mismatch file is also written. + * 'exclude-mismatch' removes all instances of duplicate-ID mismatched + variants instead. 
+ * 'exclude-all' causes all instances of duplicate-ID variants to be + removed, even when the actual records are identical. + * 'force-first' causes only the first instance of duplicate-ID variants to + be kept, under all circumstances. + + --make-pgen ['vzs'] ['format='] ['trim-alts'] ['erase-phase'] + ['erase-dosage'] ['pvar-cols='] + ['psam-cols='] + --make-bpgen ['vzs'] ['format='] ['trim-alts'] ['erase-phase'] + ['erase-dosage'] + --make-bed ['vzs'] ['trim-alts'] + Create a new PLINK binary fileset (--make-pgen = .pgen + .pvar[.zst] + + .psam, --make-bpgen = .pgen + .bim[.zst] + .fam). + * Unlike the automatic text-to-binary converters (which only heed + chromosome filters), this supports all of PLINK's filtering flags. + * The 'vzs' modifier causes the variant file (.pvar/.bim) to be + Zstd-compressed. + * The 'format' modifier requests an uncompressed fixed-variant-width .pgen + file. (These do not directly support multiallelic variants.) The + following format code is currently supported: + 2: just like .bed, except with an extended (12-byte instead of 3-byte) + header containing variant/sample counts, and rotated genotype codes + (00 = hom ref, 01 = het, 10 = hom alt, 11 = missing). + * The 'erase-phase' and 'erase-dosage' modifiers prevent phase and dosage + information from being written to the new .pgen. + * The first five columns of a .pvar file are always #CHROM/POS/ID/REF/ALT. + Supported optional .pvar column sets are: + xheader: All ## header lines (yeah, this is technically not a column), + except for possibly FILTER/INFO definitions when those + column(s) have been removed. Without this, only the #CHROM + header line is kept. + maybequal: QUAL. Omitted if all remaining values are missing. + qual: Force QUAL column to be written even when empty. + maybefilter: FILTER. Omitted if all remaining values are missing. + filter: Force FILTER column to be written even when empty. + maybeinfo: INFO. 
Omitted if all remaining values are missing, or if + INFO:PR is the only subfield. + info: Force INFO column to be written. + maybecm: Centimorgan coordinate. Omitted if all remaining values = 0. + cm: Force CM column to be written even when empty. + The default is xheader,maybequal,maybefilter,maybeinfo,maybecm. + * Supported column sets for the .psam file are: + maybefid: Family ID, '0' = missing. Omitted if all values missing. + fid: Force FID column to be written even when empty. + maybesid: Source ID, '0' = missing. Omitted if all values missing. + sid: Force SID column to be written even when empty. + maybeparents: Father and mother IIDs. Omitted if all values missing. + parents: Force PAT and MAT columns to be written even when empty. + sex: '1' = male, '2' = female, 'NA' = missing. + pheno1: First active phenotype. If none, all column entries are set to + the --output-missing-phenotype string. + phenos: All active phenotypes, if any. (Can be combined with pheno1 to + force at least one phenotype column to be written.) + The default is maybefid,maybesid,maybeparents,sex,phenos. + + --make-just-pvar ['zs'] ['cols='] + --make-just-psam ['cols='] + --make-just-bim ['zs'] + --make-just-fam + Variants of --make-pgen/--make-bed which only write a new variant or sample + file. These don't always require an input genotype file. + USE THESE CAUTIOUSLY. It is very easy to desynchronize your binary + genotype data and your sample/variant indexes if you use these commands + improperly. If you have any doubt, stick with --make-[b]pgen/--make-bed. + + --export [{01 | 12}] ['bgz'] ['id-delim='] + ['id-paste='] ['include-alt'] + ['omit-nonmale-y'] ['spaces'] ['vcf-dosage='] ['ref-first'] + ['bits='<#>] + Create a new fileset with all filters applied. The following output + formats are supported: + (actually, only A, AD, A-transpose, bgen-1.x, ind-major-bed, haps, + hapslegend, oxford, and vcf are implemented for now) + * '23': 23andMe 4-column format. 
This can only be used on a single + sample's data (--keep may be handy), and does not support + multicharacter allele codes. + * 'A': Sample-major additive (0/1/2) coding, suitable for loading from R. + If you need uncounted alleles to be named in the header line, add + the 'include-alt' modifier. + * 'AD': Sample-major additive (0/1/2) + dominant (het=1/hom=0) coding. + Also supports 'include-alt'. + * 'A-transpose': Variant-major 0/1/2. + * 'beagle': Unphased per-autosome .dat and .map files, readable by early + BEAGLE versions. + * 'beagle-nomap': Single .beagle.dat file. + * 'bgen-1.x': Oxford-format .bgen + .sample. For v1.2/v1.3, sample + identifiers are stored in the .bgen (with id-delim and + id-paste settings applied), and default precision is 16-bit + (use the 'bits' modifier to reduce this). + * 'bimbam': Regular BIMBAM format. + * 'bimbam-1chr': BIMBAM format, with a two-column .pos.txt file. Does not + support multiple chromosomes. + * 'fastphase': Per-chromosome fastPHASE files, with + .chr-.phase.inp filename extensions. + * 'fastphase-1chr': Single .phase.inp file. Does not support + multiple chromosomes. + * 'haps', 'hapslegend': Oxford-format .haps + .sample[ + .legend]. All + data must be biallelic and phased. When the 'bgz' + modifier is present, the .haps file is + block-gzipped. + * 'HV': Per-chromosome Haploview files, with .chr-{.ped,.info} + filename extensions. + * 'HV-1chr': Single Haploview .ped + .info file pair. Does not support + multiple chromosomes. + * 'ind-major-bed': PLINK 1 sample-major .bed (+ .bim + .fam). + * 'lgen': PLINK 1 long-format (.lgen + .fam + .map), loadable with --lfile. + * 'lgen-ref': .lgen + .fam + .map + .ref, loadable with --lfile + + --reference. + * 'list': Single genotype-based list, up to 4 lines per variant. To omit + nonmale genotypes on the Y chromosome, add the 'omit-nonmale-y' + modifier. 
+ * 'rlist': .rlist + .fam + .map fileset, where the .rlist file is a + genotype-based list which omits the most common genotype for + each variant. Also supports 'omit-nonmale-y'. + * 'oxford': Oxford-format .gen + .sample. When the 'bgz' modifier is + present, the .gen file is block-gzipped. + * 'ped': PLINK 1 sample-major (.ped + .map), loadable with --file. + * 'compound-genotypes': Same as 'ped', except that the space between each + pair of same-variant allele codes is removed. + * 'structure': Structure-format. + * 'transpose': PLINK 1 variant-major (.tped + .tfam), loadable with + --tfile. + * 'vcf', 'vcf-4.2': VCF (default version 4.3). If PAR1 and PAR2 are + present, they are automatically merged with chrX, with + proper handling of chromosome codes and male ploidy. + When the 'bgz' modifier is present, the VCF file is + block-gzipped. + The 'id-paste' modifier controls which .psam columns + are used to construct sample IDs (choices are maybefid, + fid, iid, maybesid, and sid; default is + maybefid,iid,maybesid), while the 'id-delim' modifier + sets the character between the ID pieces (default '_'). + Dosages are not exported unless the 'vcf-dosage=' + modifier is present. The following five dosage export + modes are supported: + 'GP': genotype posterior probabilities (v4.3 only). + 'DS': Minimac3-style dosages, omitted for hardcalls. + 'DS-force': Minimac3-style dosages, never omit. + 'HDS': Minimac3-style phased dosages, omitted for + hardcalls and unphased calls. Also includes + 'DS' output. + 'HDS-force': Always report DS and HDS. + In addition, + * The '12' modifier causes alt1 alleles to be coded as '1' and ref alleles + to be coded as '2', while '01' maps alt1 -> 0 and ref -> 1. + * The 'spaces' modifier makes the output space-delimited instead of + tab-delimited, whenever both are permitted. 
+ * For biallelic formats where it's unspecified whether the reference/major + allele should appear first or second, --export defaults to second for + compatibility with PLINK 1.9. Use 'ref-first' to change this. + (Note that this doesn't apply to the 'A', 'AD', and 'A-transpose' + formats; use --export-allele to control which alleles are counted there.) + + --freq ['zs'] ['counts'] ['cols='] ['bins-only'] + ['refbins=' | 'refbins-file='] + ['alt1bins=' | 'alt1bins-file='] + Empirical allele frequency report. By default, only founders are + considered. Dosages are taken into account (e.g. heterozygous haploid + calls count as 0.5). + Supported column sets are: + chrom: Chromosome ID. + pos: Base-pair coordinate. + (ID is always present, and positioned here.) + ref: Reference allele. + alt1: Alternate allele 1. + alt: All alternate alleles, comma-separated. + reffreq: Reference allele frequency/dosage. + alt1freq: Alt1 frequency/dosage. + altfreq: Comma-separated frequencies/dosages for all alternate alleles. + freq: Similar to altfreq, except ref is also included at the start. + eq: Comma-separated = for all present alleles. (If no + alleles are present, the column contains a single '.'.) + eqz: Same as eq, except zero-counts are included. + alteq/alteqz: Same as eq/eqz, except reference allele is omitted. + numeq: 0=,1=, etc. Zero-counts are omitted. + altnumeq: Same as numeq, except reference allele is omitted. + machr2: Unphased MaCH imputation quality metric. + minimac3r2: Phased Minimac3 imputation quality. + nobs: Number of allele observations. + The default is chrom,ref,alt,altfreq,nobs. + Additional .afreq.{ref,alt1}.bins (or .acount.{ref,alt1}.bins with + 'counts') file(s) are generated when 'refbins='/'refbins-file=' or + 'alt1bins='/'alt1bins-file=' is present; these report the total number of + frequencies or counts in each left-closed, right-open interval. (If you + only want these histogram(s), and not the main report, add 'bins-only'.) 
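The --freq text above says the .bins files report how many frequencies or counts fall in each left-closed, right-open interval. The Python sketch below illustrates that binning rule only; it is not PLINK's implementation. The function name `bin_counts`, the sample values, and the assumption that boundaries b1 < ... < bk produce the intervals [0, b1), [b1, b2), ..., [bk, inf) are hypothetical and are not taken from the help text.

```python
from bisect import bisect_right


def bin_counts(values, boundaries):
    """Count values per left-closed, right-open interval.

    Assumption (not confirmed by the help text): boundaries b1 < ... < bk
    define the intervals [0, b1), [b1, b2), ..., [bk, inf).
    Returns a list of (interval_start, count) pairs.
    """
    bounds = sorted(boundaries)
    counts = [0] * (len(bounds) + 1)
    for v in values:
        # bisect_right sends a value equal to a boundary into the interval
        # that STARTS at that boundary, which is the left-closed behaviour.
        counts[bisect_right(bounds, v)] += 1
    starts = [0.0] + bounds
    return list(zip(starts, counts))


# Made-up allele frequencies, for illustration only.
freqs = [0.0, 0.05, 0.1, 0.25, 0.5, 0.5, 0.9]
result = bin_counts(freqs, [0.1, 0.5])
for start, n in result:
    print(start, n)
```

A value that sits exactly on a boundary (0.1 or 0.5 here) is counted in the interval that begins at that boundary, not the one that ends there.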
+ + --geno-counts ['zs'] ['cols='] + Variant-based hardcall genotype count report (considering both alleles + simultaneously in the diploid case). Nonfounders are now included; use + --keep-founders if this is a problem. Heterozygous haploid calls are + treated as missing. + Supported column sets are: + chrom: Chromosome ID. + pos: Base-pair coordinate. + (ID is always present, and positioned here.) + ref: Reference allele. + alt1: Alternate allele 1. + alt: All alternate alleles, comma-separated. + homref: Homozygous-ref count. + refalt1: Heterozygous ref-alt1 count. + refalt: Comma-separated het ref-altx counts. + homalt1: Homozygous-alt1 count. + altxy: Comma-separated altx-alty counts, in (1/1)-(1/2)-(2/2)-(1/3)-... + order. + xy: Similar to altxy, except the reference allele is treated as alt0, + and the sequence starts (0/0)-(0/1)-(1/1)-(0/2)-... + hapref: Haploid-ref count. + hapalt1: Haploid-alt1 count. + hapalt: Comma-separated haploid-altx counts. + hap: Similar to hapalts, except ref is also included at the start. + numeq: 0/0=,0/1=,1/1=,...,0= + etc. Zero-counts are omitted. (If all genotypes are missing, the + column contains a single '.'.) + missing: Number of missing genotypes. + nobs: Number of (nonmissing) genotype observations. + The default is chrom,ref,alt,homref,refalt,altxy,hapref,hapalt,missing. + + --sample-counts ['zs'] ['cols='] + Sample-based hardcall genotype count report. + * Unknown-sex samples are treated as female. + * Heterozygous haploid calls (MT included) are treated as missing. + * As with other PLINK 2 commands, SNPs that have not been left-normalized + are counted as non-SNP non-symbolic. (Use e.g. --normalize when that's a + problem.) + * Supported column sets are: + maybefid: FID, if that column was present in the input. + fid: Force FID column to be written even when absent in the input. + (IID is always present, and positioned here.) + maybesid: SID, if that column was present in the input. 
+ sid: Force SID column to be written even when absent in the input. + sex: '1' = male, '2' = female, 'NA' = missing. + hom: Homozygous genotype count. + homref: Homozygous-ref genotype count. + homalt: Homozygous-alt genotype count. + homaltsnp: Homozygous-alt SNP count. + het: Heterozygous genotype count. + refalt: Heterozygous ref-altx genotype count. + het2alt: Heterozygous altx-alty genotype count. + hetsnp: Heterozygous SNP count. + dipts: Diploid SNP transition count. + ts: SNP transition count (excluding chrY for females). + diptv: Diploid SNP transversion count. + tv: SNP transversion count. + dipnonsnpsymb: Diploid non-SNP, non-symbolic count. + nonsnpsymb: Non-SNP, non-symbolic count. + symbolic: Symbolic variant count. + nonsnp: Non-SNP count. + dipsingle: Number of singletons relative to this dataset, across just + diploid calls. (Note that if the ALT allele in a chrX + biallelic variant appears in exactly one female and one + male, that counts as a singleton for just the female.) + single: Number of singletons relative to this dataset. + haprefwfemaley: Haploid-ref count, counting chrY for everyone. + hapref: Haploid-ref count, excluding chrY for females. + hapaltwfemaley: Haploid-alt count, counting chrY for everyone. + hapalt: Haploid-alt count, excluding chrY for females. + missingwfemaley: Missing call count, counting chrY for everyone. + missing: Missing call count, excluding chrY for females. + The default is maybefid,maybesid,homref,homaltsnp,hetsnp,dipts,diptv, + dipnonsnpsymb,dipsingle,haprefwfemaley,hapaltwfemaley,missingwfemaley. + * The 'hetsnp', 'dipts'/'ts'/'diptv'/'tv', 'dipnonsnpsymb'/'nonsnpsymb', + 'symbolic', and 'nonsnp' columns count each ALT allele in a heterozygous + altx-alty call separately, since they can be of different subtypes. + (I.e. if they are of the same subtype, the corresponding count is + incremented by 2.) As a consequence, these columns are unaffected by + variant split/join. 
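The last --sample-counts note says each ALT allele in a heterozygous altx-alty call is counted separately, so two ALT alleles of the same subtype raise that count by 2. The Python sketch below shows that rule for the transition/transversion columns, covering single-base alleles and heterozygous calls only. The function names are hypothetical and this is not PLINK's code; it uses the standard definition that A<->G and C<->T are transitions and every other base change is a transversion.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}
BASES = PURINES | PYRIMIDINES


def snp_subtype(ref, alt):
    """Return 'ts' or 'tv' for a single-base REF -> ALT change, else None."""
    ref, alt = ref.upper(), alt.upper()
    if ref not in BASES or alt not in BASES or ref == alt:
        return None  # not a simple SNP allele; outside this sketch
    pair = {ref, alt}
    same_class = pair <= PURINES or pair <= PYRIMIDINES
    return "ts" if same_class else "tv"


def het_call_counts(ref, alts, genotype):
    """Tally ts/tv for one heterozygous call.

    genotype is a pair of allele indexes (0 = REF, 1 = first ALT, ...).
    Each ALT allele in the call is classified against REF on its own.
    """
    a, b = genotype
    if a == b:
        raise ValueError("this sketch covers heterozygous calls only")
    alleles = [ref] + list(alts)
    counts = {"ts": 0, "tv": 0}
    for idx in (a, b):
        if idx == 0:
            continue  # the REF copy is not an ALT observation
        subtype = snp_subtype(ref, alleles[idx])
        if subtype is not None:
            counts[subtype] += 1
    return counts


mixed = het_call_counts("A", ["G", "C"], (1, 2))      # A>G is ts, A>C is tv
same = het_call_counts("C", ["A", "G"], (1, 2))       # C>A and C>G are both tv
ref_alt = het_call_counts("A", ["G"], (0, 1))         # ordinary ref/alt het
print(mixed, same, ref_alt)
```

Because every ALT allele is classified against REF independently, the tallies come out the same whether the multiallelic variant is stored as one record or split into biallelic records. That matches the help text's remark that these columns are unaffected by variant split/join.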
+ + --missing ['zs'] [{sample-only | variant-only}] + ['scols='] ['vcols='] + Generate sample- and variant-based missing data reports (or just one report + if 'sample-only'/'variant-only' is specified). + As of alpha 2, mixed MT hardcalls appear in the heterozygous haploid stats. + Supported column sets in the sample-based report are: + maybefid: FID, if that column was present in the input. + fid: Force FID column to be written even when absent in the input. + (IID is always present, and positioned here.) + maybesid: SID, if that column was present in the input. + sid: Force SID column to be written even when absent in the input. + misspheno1: First active phenotype missing (Y/N)? Always 'Y' if no + phenotypes are loaded. + missphenos: A Y/N column for each loaded phenotype. (Can be combined + with misspheno1 to force at least one such column.) + nmissdosage: Number of missing dosages. + nmiss: Number of missing hardcalls, not counting het haploids. + nmisshh: Number of missing hardcalls, counting het haploids. + hethap: Number of heterozygous haploid hardcalls. + nobs: Denominator (male count on chrY, otherwise total sample count). + fmissdosage: Missing dosage rate. + fmiss: Missing hardcall rate, not counting het haploids. + fmisshh: Missing hardcall rate, counting het haploids. + The default is maybefid,maybesid,missphenos,nmiss,nobs,fmiss. + Supported column sets in the variant-based report are: + chrom: Chromosome ID. + pos: Base-pair coordinate. + (ID is always present, and positioned here.) + ref: Reference allele. + alt1: Alternate allele 1. + alt: All alternate alleles, comma-separated. + nmissdosage: Number of missing dosages. + nmiss: Number of missing hardcalls, not counting het haploids. + nmisshh: Number of missing hardcalls, counting het haploids. + hethap: Number of heterozygous haploid calls. + nobs: Number of potentially valid calls. + fmissdosage: Missing dosage rate. + fmiss: Missing hardcall rate, not counting het haploids. 
+ fmisshh: Missing hardcall rate, counting het haploids. + fhethap: Heterozygous haploid rate. + The default is chrom,nmiss,nobs,fmiss. + + --hardy ['zs'] ['midp'] ['redundant'] ['cols='] + Hardy-Weinberg exact test p-value report(s). + * By default, only founders are considered; change this with --nonfounders. + * chrX is now omitted from the main .hardy report. Instead, + (if present) it gets its own .hardy.x report based on the + method described in Graffelman J, Weir BS (2016) Hardy-Weinberg + equilibrium and the X chromosome. + * For variants with k alleles where k>2, k separate 'biallelic' tests are + performed, each reported on its own line. However, biallelic variants + are normally reported on a single line, since the counts/frequencies + would be mirror-images and the p-values would be the same. You can add + the 'redundant' modifier to force biallelic variant results to be + reported on two lines for parsing convenience. + * There is currently no special handling of case/control phenotypes. + Supported column sets are: + chrom: Chromosome ID. + pos: Base-pair coordinate. + (ID is always present, and positioned here.) + ref: Reference allele. + alt1: Alternate allele 1. + alt: All alternate alleles, comma-separated. + (A1 is always present, and positioned here.) + ax: Non-A1 allele(s), comma-separated. + gcounts: Hom-A1 count, total number of het-A1 calls, and total number of + nonmissing calls with no copies of A1. On chrX, these are + followed by male A1 and male non-A1 counts. + gcount1col: gcounts values in a single comma-separated column. + hetfreq: Observed and expected het-A1 frequencies. + sexaf: Female and male A1 observed allele frequencies (chrX only). + femalep: Female-only p/midp-value (chrX only). + p: Hardy-Weinberg equilibrium exact test p/midp-value. + The default is chrom,ax,gcounts,hetfreq,sexaf,p. + + --indep-pairwise ['kb'] [step size (variant ct)] + + Generate a list of variants in approximate linkage equilibrium. 
+    * For multiallelic variants, major allele counts are used in the r^2
+      computation.
+    * With the 'kb' modifier, the window size is in kilobase instead of variant
+      count units. (Pre-'kb' space is optional, i.e.
+      "--indep-pairwise 500 kb 0.5" and "--indep-pairwise 500kb 0.5" have the
+      same effect.)
+    * The step size now defaults to 1 if it's unspecified, and *must* be 1 if
+      the window is in kilobase units.
+    * Note that you need to rerun PLINK using --extract or --exclude on the
+      .prune.in/.prune.out file to apply the list to another computation... and
+      as with other applications of --extract/--exclude, duplicate variant IDs
+      are a problem. --indep-pairwise still runs to completion for now when
+      duplicate variant IDs are present, but that will become an error in alpha
+      3.
+
+  --ld ['dosage'] ['hwe-midp']
+    This displays diplotype frequencies, r^2, and D' for a single pair of
+    variants.
+    * For multiallelic variants, major allele counts/dosages are used.
+    * Phase information is used when both variants are on the same chromosome.
+    * When there is at least one sample with unphased het calls for both
+      variants, diplotype frequencies are estimated using the Hill equation.
+      If there are multiple biologically possible local maxima, all are
+      displayed, along with HWE exact test statistics.
+    * By default, only hardcalls are considered. Add the 'dosage' modifier if
+      you want dosages to be taken into account. (In the diploid case, an
+      unphased dosage of x is interpreted as P(0/0) = 1 - x, P(0/1) = x when x
+      is in 0..1.)
+
+  --sample-diff ['id-delim='] ['dosage' | 'dosage=']
+                ['include-missing'] [{pairwise | counts-only}]
+                ['fname-id-delim='] ['zs'] ['cols=']
+                ['counts-cols=']
+                {base= | ids=} [other sample ID(s)...]
+  --sample-diff ['id-delim='] ['dosage' | 'dosage=']
+                ['include-missing'] [{pairwise | counts-only}]
+                ['fname-id-delim='] ['zs'] ['cols=']
+                ['counts-cols='] file=
+    (alias: --sdiff)
+    Report discordances and discordance-counts between pairs of samples. If
+    chrX or chrY is present, sex must be defined and consistent.
+    * There are three ways to specify which sample pairs to compare. To
+      compare a single baseline sample against some others, start the
+      (space-delimited) sample ID list with 'base='. To perform an all-vs.-all
+      comparison, start it with 'ids=' instead. To compare sample pairs listed
+      in a file, use 'file='.
+      Note that 'base='/'ids='/'file=' must be positioned after all modifiers.
+    * Sample IDs are interpreted as if they were in a VCF header line, with
+      'id-delim=' having the usual effect.
+    * By default, comparisons are based on hardcalls. Use 'dosage' to compare
+      dosages instead; you can combine this with a tolerance in [0, 0.5).
+    * By default, if one genotype is missing and the other isn't, that doesn't
+      count as a difference; this can be changed with 'include-missing'.
+    * By default, a single main report is written to
+      [.].sdiff. To write separate pairwise
+      ...sdiff reports for each compared ID pair, add
+      the 'pairwise' modifier. To omit the main report, add the 'counts-only'
+      modifier. (Note that, if you're only interested in nonmissing autosomal
+      biallelic hardcalls, --make-king-table provides a more efficient way to
+      compute just counts.)
+    * By default, if an output filename has a multipart sample ID, the parts
+      will be delimited by '_'; use 'fname-id-delim=' to change this.
+    Supported main-report column sets are:
+      chrom: Chromosome ID.
+      pos: Base-pair coordinate.
+      (Variant ID is always present, and positioned here.)
+      ref: Reference allele.
+      alt: All alternate alleles, comma-separated.
+      maybefid: FID1/FID2, if that column was in the input. Requires 'id'.
+      fid: Force FID1/FID2 even when FID was absent in the input.
+      id: IID1/IID2.
+      maybesid: SID1/SID2, if that column was in the input. Requires 'id'.
+      sid: Force SID1/SID2 even when SID was absent in the input.
+      geno: Unphased GT or DS for the two samples.
+    The default is usually chrom,pos,ref,alt,maybefid,id,maybesid,geno; the
+    sample IDs are removed from the default in 'pairwise' mode.
+    Supported discordance-count-summary column sets are:
+      maybefid: FID1/FID2, if that column was in the input.
+      fid: Force FID1/FID2 even when FID was absent in the input.
+      (IID1/IID2 are always present.)
+      maybesid: SID1/SID2, if that column was in the input.
+      sid: Force SID1/SID2 even when SID was absent in the input.
+      nobs: Number of variants considered. This includes variants where one or
+            both variants are missing iff 'include-missing' was specified.
+      nobsibs: ibs0+ibs1+ibs2.
+      ibs0: Number of diploid variants with no common hardcall alleles.
+      ibs1: Number of diploid variants with exactly 1 common hardcall allele.
+      ibs2: Number of diploid variants with both hardcall alleles matching.
+      halfmiss: Number of variants with exactly 1 missing genotype/dosage.
+                Ignored without 'include-missing'.
+      diff: Total number of differences.
+    The default is maybefid,maybesid,nobs,halfmiss,diff.
+
+  --make-king [{square | square0 | triangle}] [{zs | bin | bin4}]
+    KING-robust kinship estimator, described by Manichaikul A, Mychaleckyj JC,
+    Rich SS, Daly K, Sale M, Chen WM (2010) Robust relationship inference in
+    genome-wide association studies. By default, this writes a
+    lower-triangular tab-delimited table of kinship coefficients to
+    .king, and a list of the corresponding sample IDs to
+    .king.id. The first row of the .king file contains a single
+    kinship coefficient, the second row has the
+    and kinship values in that order,
+    etc.
+    * Only autosomes are currently considered.
+    * Pedigree information is currently ignored; the between-family estimator
+      is used for all pairs.
+    * For multiallelic variants, REF allele counts are used.
+    * If the 'square' or 'square0' modifier is present, a square matrix is
+      written instead; 'square0' fills the upper right triangle with zeroes.
+    * If the 'zs' modifier is present, the .king file is Zstd-compressed.
+    * If the 'bin' modifier is present, a binary (square) matrix of
+      double-precision floating point values, suitable for loading from R, is
+      instead written to .king.bin. ('bin4' specifies
+      single-precision numbers instead.) This can be combined with 'square0'
+      if you still want the upper right zeroed out, or 'triangle' if you don't
+      want to pad the upper right at all.
+    * The computation can be subdivided with --parallel.
+  --make-king-table ['zs'] ['counts'] ['rel-check'] ['cols=']
+    Similar to --make-king, except results are reported in KING's original
+    .kin0 text table format (with minor changes, e.g. row order is more
+    friendly to incremental addition of samples), --king-table-filter can be
+    used to restrict the report to high kinship values, and the 'rel-check'
+    modifier can be used to restrict to same-FID pairs.
+    Supported column sets are:
+      maybefid: FID1/FID2, if that column was in the input. Requires 'id'.
+      fid: Force FID1/FID2 even when FID was absent in the input.
+      id: IID1/IID2 (column headers are actually 'ID1'/'ID2' to match KING).
+      maybesid: SID1/SID2, if that column was in the input. Requires 'id'.
+      sid: Force SID1/SID2 even when SID was absent in the input.
+      nsnp: Number of variants considered (autosomal, neither call missing).
+      hethet: Proportion/count of considered call pairs which are het-het.
+      ibs0: Proportion/count of considered call pairs which are opposite homs.
+      ibs1: HET1_HOM2 and HET2_HOM1 proportions/counts.
+      kinship: KING-robust between-family kinship estimator.
+    The default is maybefid,id,maybesid,nsnp,hethet,ibs0,kinship.
+    hethet/ibs0/ibs1 values are proportions unless the 'counts' modifier is
+    present. If id is omitted, a .kin0.id file is also written.
+
+  --make-rel ['cov'] ['meanimpute'] [{square | square0 | triangle}]
+             [{zs | bin | bin4}]
+    Write a lower-triangular variance-standardized relationship matrix to
+    .rel, and corresponding IDs to .rel.id.
+    * This computation assumes that variants do not have very low MAF, or
+      deviate greatly from Hardy-Weinberg equilibrium.
+    * Also, it's usually best to perform this calculation on a variant set in
+      approximate linkage equilibrium.
+    * The 'cov' modifier replaces the variance-standardization step with basic
+      mean-centering, causing a covariance matrix to be calculated instead.
+    * The computation can be subdivided with --parallel.
+  --make-grm-list ['cov'] ['meanimpute'] ['zs'] [{id-header | iid-only}]
+  --make-grm-bin ['cov'] ['meanimpute'] [{id-header | iid-only}]
+    --make-grm-list causes the relationships to be written to GCTA's original
+    list format, which describes one pair per line, while --make-grm-bin writes
+    them in GCTA 1.1+'s single-precision triangular binary format. Note that
+    these formats explicitly report the number of valid observations (where
+    neither sample has a missing call) for each pair, which is useful input for
+    some scripts.
+
+  --pca [count] [{approx | meanimpute}] ['scols=']
+  --pca [{biallelic-var-wts | var-wts}] [count] [{approx | meanimpute}] ['vzs']
+        ['scols='] ['vcols=']
+    Extracts top principal components from the variance-standardized
+    relationship matrix.
+    * It is usually best to perform this calculation on a variant set in
+      approximate linkage equilibrium, with no very-low-MAF variants.
+    * By default, 10 PCs are extracted; you can adjust this by passing a
+      numeric parameter. (Note that 10 is lower than the PLINK 1.9 default of
+      20; this is due to the randomized algorithm's memory footprint growing
+      quadratically w.r.t. the PC count.)
+    * The 'approx' modifier causes the standard deterministic computation to be
+      replaced with the randomized algorithm originally implemented for
+      Galinsky KJ, Bhatia G, Loh PR, Georgiev S, Mukherjee S, Patterson NJ,
+      Price AL (2016) Fast Principal-Component Analysis Reveals Convergent
+      Evolution of ADH1B in Europe and East Asia. This can be a good idea when
+      you have >5k samples.
+    * The randomized algorithm always uses mean imputation for missing genotype
+      calls. For comparison purposes, you can use the 'meanimpute' modifier to
+      request this behavior for the standard computation.
+    * 'scols=' can be used to customize how sample IDs appear in the .eigenvec
+      file. (maybefid, fid, maybesid, and sid supported; default is
+      maybefid,maybesid.)
+    * The 'biallelic-var-wts' modifier requests an additional
+      one-line-per-variant .eigenvec.var file with PCs expressed as variant
+      weights instead of sample weights, with the condition that all variants
+      must be biallelic. When it's present, 'vzs' causes the .eigenvec.var
+      file to be Zstd-compressed.
+      'vcols=' can be used to customize the report columns; supported column
+      sets are:
+        chrom: Chromosome ID.
+        pos: Base-pair coordinate.
+        (ID is always present, and positioned here.)
+        ref: Reference allele.
+        alt1: Alternate allele 1.
+        alt: All alternate alleles, comma-separated.
+        maj: Major allele.
+        nonmaj: All nonmajor alleles, comma-separated.
+        (PCs are always present, and positioned here. Signs are w.r.t. the
+        major, not necessarily reference, allele.)
+      Default is chrom,maj,nonmaj.
+    * In this build, 'var-wts' generates the same report as biallelic-var-wts,
+      except with the "all variants must be biallelic" restriction lifted.
+      This is temporary. It will no longer be supported as of alpha 3;
+      instead, there will be an 'allele-wts' mode which seamlessly handles
+      multiallelic variants, at the cost of generating more verbose
+      one-line-per-allele output.
+
+  --king-cutoff [.king.bin + .king.id fileset prefix]
+    Exclude one member of each pair of samples with KING-robust kinship greater
+    than the given threshold. Remaining/excluded sample IDs are written to
+    .king.cutoff.in.id + .king.cutoff.out.id.
+    If present, the .king.bin file must be triangular (either precision is ok).
+
+  --write-covar ['cols=']
+    If covariates are defined, an updated version (with all filters applied) is
+    automatically written to .cov whenever --make-pgen,
+    --make-just-psam, --export, or a similar command is present. However, if
+    you do not wish to simultaneously generate a new sample file, you can use
+    --write-covar to just produce a pruned covariate file.
+    Supported column sets are:
+      maybefid: FID, if that column was in the input.
+      fid: Force FID column to be written even when absent in the input.
+      maybesid: SID, if that column was in the input.
+      sid: Force SID column to be written even when absent in the input.
+      maybeparents: Father/mother IIDs ('0' = missing), if columns in input.
+      parents: Force PAT/MAT columns to be written even when absent in input.
+      sex: '1' = male, '2' = female, 'NA' = missing.
+      pheno1: First active phenotype. If none, all column entries are set to
+              the --output-missing-phenotype string.
+      phenos: All active phenotypes, if any. (Can be combined with pheno1 to
+              force at least one phenotype column to be written.)
+      (Covariates are always present, and positioned here.)
+    The default is maybefid,maybesid.
+
+  --write-samples
+    Report IDs of all samples which pass your filters/inclusion thresholds.
+
+  --write-snplist ['zs']
+    List all variants which pass your filters/inclusion thresholds.
+
+  --glm ['zs'] ['omit-ref'] [{sex | no-x-sex}] ['log10'] ['pheno-ids']
+        [{genotypic | hethom | dominant | recessive}] ['interaction']
+        ['hide-covar'] ['intercept'] [{no-firth | firth-fallback | firth}]
+        ['cols='] ['local-covar='] ['local-pvar=']
+        ['local-psam='] ['local-omit-last' | 'local-cats=']
+    Basic association analysis on quantitative and/or case/control phenotypes.
+    For each variant, a linear (for quantitative traits) or logistic (for
+    case/control) regression is run with the phenotype as the dependent
+    variable, and nonmajor allele dosage(s) and a constant-1 column as
+    predictors.
+    * There is usually an additive effect line for every nonmajor allele, and
+      no such line for the major allele. To omit REF alleles instead of major
+      alleles, add the 'omit-ref' modifier. (When performing interaction
+      testing, this tends to cause the multicollinearity check to fail for
+      low-ref-frequency variants.)
+    * By default, sex (male = 1, female = 2; note that this is a change from
+      PLINK 1.x) is automatically added as a predictor for X chromosome
+      variants, and no others. The 'sex' modifier causes it to be added
+      everywhere (except chrY), while 'no-x-sex' excludes it entirely.
+    * The 'log10' modifier causes p-values to be reported in -log10(p) form.
+    * 'pheno-ids' causes the samples used in each set of regressions to be
+      written to an .id file. (When the samples differ on chrX or chrY, .x.id
+      and/or .y.id files are also written.)
+    * The 'genotypic' modifier adds an additive effect/dominance deviation 2df
+      joint test (0-2 and 0..1..0 coding), while 'hethom' uses 0..0..1 and
+      0..1..0 coding instead.
+    * 'dominant' and 'recessive' specify a model assuming full dominance or
+      recessiveness, respectively, for the ref allele. I.e. the genotype
+      column is recoded as 0..1..1 or 0..0..1, respectively.
+    * 'interaction' adds genotype x covariate interactions to the model.
+      Note that this tends to produce 'NA' results (due to the multicollinearity
+      check) when the reference allele is 'wrong'; --maj-ref can be used to
+      enable analysis of those variants.
+    * Additional predictors can be added with --covar. By default, association
+      statistics are reported for all nonconstant predictors; 'hide-covar'
+      suppresses covariate-only results, while 'intercept' causes intercepts
+      to be reported.
+    * For logistic regression, when the phenotype [quasi-]separates the
+      genotype, an NA result is currently reported by default. To fall back on
+      Firth logistic regression instead when the basic logistic regression
+      fails to converge, add the 'firth-fallback' modifier (highly recommended,
+      will become the default when beta testing begins). To eliminate the
+      special case and use Firth logistic regression everywhere, add 'firth'.
+      'no-firth' can be used to prevent Firth regression from being attempted
+      in a way that'll still work after alpha testing completes.
+    * To add covariates which are not constant across all variants, add the
+      'local-covar=', 'local-pvar=', and 'local-psam=' modifiers, and use full
+      filenames for each.
+      Normally, the local-covar file should have c * n real-valued columns,
+      where the first c columns correspond to the first sample in the
+      local-psam file, columns (c+1) to 2c correspond to the second sample,
+      etc.; and the mth line corresponds to the mth nonheader line of the
+      local-pvar file. (Variants outside of the local-pvar file are excluded
+      from the regression.) The local covariates are assigned the names
+      LOCAL1, LOCAL2, etc.; to exclude the last local covariate from the
+      regression (necessary if they are e.g. local ancestry coefficients which
+      sum to 1), add 'local-omit-last'.
+      Alternatively, with 'local-cats=', the local-covar file is expected to
+      have n columns with integer-valued entries in [1, k]. These category
+      assignments are expanded into (k-1) local covariates in the usual manner.
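The c * n column layout of the local-covar file described above can be illustrated with a minimal Python sketch. This is not PLINK 2 code: the function name `split_local_covars` and the toy row are hypothetical, and the sketch only assumes what the text states, namely that each line holds c consecutive values per sample, in local-psam order, and that 'local-omit-last' drops the final local covariate of each sample.

```python
def split_local_covars(row, n_samples, omit_last=False):
    """Split one local-covar line (c * n values) into per-sample lists.

    Columns 1..c belong to the first sample, (c+1)..2c to the second, etc.
    With omit_last=True the last covariate of every sample is dropped,
    mirroring what 'local-omit-last' is described as doing.
    """
    if n_samples <= 0 or len(row) % n_samples != 0:
        raise ValueError("column count must be a positive multiple of n_samples")
    c = len(row) // n_samples          # covariates per sample
    keep = c - 1 if omit_last else c   # how many to retain per sample
    return [row[s * c : s * c + keep] for s in range(n_samples)]


# Hypothetical line for one variant: 3 samples, 2 ancestry coefficients each.
row = [0.1, 0.9, 0.5, 0.5, 1.0, 0.0]
per_sample = split_local_covars(row, n_samples=3)
per_sample_omit = split_local_covars(row, n_samples=3, omit_last=True)
```

In the toy row each sample's two coefficients sum to 1, which is the situation where the text says the last covariate must be excluded to avoid collinearity with the constant-1 column.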
+    The main report supports the following column sets:
+      chrom: Chromosome ID.
+      pos: Base-pair coordinate.
+      (ID is always present, and positioned here.)
+      ref: Reference allele.
+      alt1: Alternate allele 1.
+      alt: All alternate alleles, comma-separated.
+      (A1 is always present, and positioned here. For multiallelic variants,
+      this column may contain multiple comma-separated alleles when the result
+      doesn't depend on which allele is A1.)
+      ax: Non-A1 alleles, comma-separated.
+      a1count: A1 allele count (can be decimal with dosage data).
+      totallele: Allele observation count (can be higher than --freq value, due
+                 to inclusion of het haploids and chrX model).
+      a1countcc: A1 count in cases, then controls (case/control only).
+      totallelecc: Case and control allele observation counts.
+      gcountcc: Genotype hardcall counts (neither-A1, het-A1, A1-A1) in cases,
+                then controls (case/control only).
+      a1freq: A1 allele frequency.
+      a1freqcc: A1 frequency in cases, then controls (case/control only).
+      machr2: Unphased MaCH imputation quality (frequently labeled 'INFO').
+      firth: Reports whether Firth regression was used (firth-fallback only).
+      test: Test identifier. (Required unless only one test is run.)
+      nobs: Number of samples in the regression.
+      beta: Regression coefficient (for A1 if additive test).
+      orbeta: Odds ratio for case/control, beta for quantitative traits.
+              (Ignored if 'beta' column set included.)
+      se: Standard error of beta.
+      ci: Bounds of symmetric approximate confidence interval (requires --ci).
+      tz: T-statistic for linear regression, Wald Z-score for logistic/Firth.
+      p: Asymptotic p-value (or -log10(p)) for T/Z-statistic.
+      err: Error code for NA results.
+    The default is chrom,pos,ref,alt,firth,test,nobs,orbeta,se,ci,tz,p.
+
+  --score [i] [j] [k] [{header | header-read}]
+          [{center | variance-standardize | dominant | recessive}]
+          ['no-mean-imputation'] ['se'] ['zs'] ['ignore-dup-ids']
+          [{list-variants | list-variants-zs}] ['cols=']
+    Apply linear scoring system(s) to each sample.
+    The input file should have one line per scored variant. Variant IDs are
+    read from column #i and allele codes are read from column #j, where i
+    defaults to 1 and j defaults to i+1. For now, only one allele per
+    multiallelic variant may be assigned an explicit score; contact us if you
+    need this changed.
+    * By default, a single column of input coefficients is read from column #k,
+      where k defaults to j+1. (--score-col-nums can be used to specify
+      multiple columns.)
+    * 'header-read' causes the first line of the input file to be treated as a
+      header line containing score names. Otherwise, score(s) are assigned the
+      names 'SCORE1', 'SCORE2', etc.; and 'header' just causes the first line
+      to be entirely ignored.
+    * By default, copies of unnamed alleles contribute zero to score, while
+      missing genotypes contribute an amount proportional to the loaded (via
+      --read-freq) or imputed allele frequency. To throw out missing
+      observations instead (decreasing the denominator in the final average
+      when this happens), use the 'no-mean-imputation' modifier.
+    * You can use the 'center' modifier to shift all genotypes to mean zero, or
+      'variance-standardize' to linearly transform the genotypes to mean-0,
+      variance-1.
+    * The 'dominant' modifier causes dosages greater than 1 to be treated as 1,
+      while 'recessive' uses max(dosage - 1, 0) on diploid chromosomes.
+      ('dominant', 'recessive', and 'variance-standardize' cannot be used with
+      chrX.)
+    * The 'se' modifier causes the input coefficients to be treated as
+      independent standard errors; in this case, standard errors for the score
+      average/sum are reported.
+      (Note that this will systematically underestimate standard errors when
+      scored variants are in LD.)
+    * By default, --score errors out if a variant ID in the input file appears
+      multiple times in the main dataset. Use the 'ignore-dup-ids' modifier to
+      skip them instead (a warning is still printed if such variants are
+      present).
+    * The 'list-variants[-zs]' modifier causes variant IDs used for scoring to
+      be written to .sscore.vars[.zst].
+    The main report supports the following column sets:
+      maybefid: FID, if that column was in the input.
+      fid: Force FID column to be written even when absent in the input.
+      (IID is always present, and positioned here.)
+      maybesid: SID, if that column was in the input.
+      sid: Force SID column to be written even when absent in the input.
+      pheno1: First active phenotype.
+      phenos: All active phenotypes, if any.
+      nmissallele: Number of nonmissing alleles.
+      denom: Denominator of score average (equal to nmissallele value when
+             'no-mean-imputation' specified).
+      dosagesum: Sum of named allele dosages.
+      scoreavgs: Score averages.
+      scoresums: Score sums.
+    The default is maybefid,maybesid,phenos,nmissallele,dosagesum,scoreavgs.
+    For more sophisticated polygenic risk scoring, we recommend the PRSice-2
+    software package (https://www.prsice.info/ ).
+
+  --variant-score ['zs'] ['bin' | 'cols=']
+    (alias: --vscore)
+    Apply linear scoring system(s) to each variant. Each reported variant
+    score is the dot product of a sample-weight vector with the
+    total-ALT-dosage vector, with MAF-based mean imputation applied to missing
+    dosages.
+    Input file format: one line per sample, each starting with an ID and
+    followed by scoring weight(s); it can also have a header line with the
+    sample ID representation and the score name(s).
+    The usual .vscore text report supports the following column sets:
+      chrom: Chromosome ID.
+      pos: Base-pair coordinate.
+      (ID is always present, and positioned here.)
+      ref: Reference allele.
+      alt1: Alternate allele 1.
+      alt: All alternate alleles, comma-separated.
+      altfreq: ALT allele frequency used for mean-imputation.
+      nmiss: Number of missing (and thus mean-imputed) dosages.
+      nobs: Number of (nonmissing) sample observations.
+      (Variant scores are always present, and positioned here.)
+    Default is chrom,pos,ref,alt.
+    If binary output is requested instead, the main .vscore.bin matrix contains
+    double-precision floating-point values, column (score) ID(s) are saved to a
+    .vscore.cols, and variant IDs are saved to
+    .vscore.vars[.zst].
+
+  --adjust-file ['zs'] ['gc'] ['cols=']
+                ['log10'] ['input-log10'] ['test=']
+    Given a file with unfiltered association test results, report some basic
+    multiple-testing corrections, sorted in increasing-p-value order.
+    * 'gc' causes genomic-controlled p-values to be used in the formulas.
+      (This tends to be overly conservative. We note that LD Score regression
+      usually does a better job of calibrating lambda; see Lee JJ, McGue M,
+      Iacono WG, Chow CC (2018) The accuracy of LD Score regression as an
+      estimator of confounding and genetic correlations in genome-wide
+      association studies.)
+    * 'log10' causes negative base 10 logs of p-values to be reported, instead
+      of raw p-values. 'input-log10' specifies that the input file contains
+      -log10(p) values.
+    * If the input file contains multiple tests per variant which are
+      distinguished by a 'TEST' column (true for --linear/--logistic/--glm),
+      you must use 'test=' to select the test to process.
+    The following column sets are supported:
+      chrom: Chromosome ID.
+      pos: Base-pair coordinate.
+      (ID is always present, and positioned here.)
+      ref: Reference allele.
+      alt1: Alternate allele 1.
+      alt: All alternate alleles, comma-separated.
+      a1: Tested allele. (Omitted if missing from input file.)
+      unadj: Unadjusted p-value.
+      gc: Devlin & Roeder (1999) genomic control corrected p-value (additive
+          models only).
+      qq: P-value quantile.
+      bonf: Bonferroni correction.
+      holm: Holm-Bonferroni (1979) adjusted p-value.
+      sidakss: Sidak single-step adjusted p-value.
+      sidaksd: Sidak step-down adjusted p-value.
+      fdrbh: Benjamini & Hochberg (1995) step-up false discovery control.
+      fdrby: Benjamini & Yekutieli (2001) step-up false discovery control.
+    Default set is chrom,a1,unadj,gc,bonf,holm,sidakss,sidaksd,fdrbh,fdrby.
+  --genotyping-rate ['dosage']
+    Report genotyping rate in log (this was automatic in PLINK 1.x).
+
+  --pgen-info
+    Reports basic information about a .pgen file.
+
+  --validate
+    Validates all variant records in a .pgen file.
+
+  --zst-decompress <.zst file> [output filename]
+    (alias: --zd)
+    Decompress a Zstd-compressed file. If no output filename is specified, the
+    file is decompressed to standard output.
+    This cannot be used with any other flags, and does not cause a log file to
+    be generated.
+
+The following other flags are supported.
+  --script : Include command-line options from file.
+  --rerun [log] : Rerun commands in log (default 'plink2.log').
+  --version : Display only version number before exiting.
+  --silent : Suppress regular output to console. (Error-output is
+             not suppressed.)
+  --double-id : Set both FIDs and IIDs to the VCF/.bgen sample ID.
+  --const-fid [ID] : Set all FIDs to the given constant. If '0' (the
+                     default), no FID column is created.
+  --id-delim [d] : Normally parses single-delimiter sample IDs as
+                   , and double-delimiter IDs as
+                   ; default delimiter is '_'.
+                   --id-delim can no longer be used with
+                   --double-id/--const-fid; it will error out if any ID
+                   lacks the delimiter.
+  --idspace-to : Convert spaces in VCF/.bgen sample IDs to the given
+                 character.
+  --iid-sid : Make --id-delim and --sample-diff interpret two-token
+              sample IDs as IID-SID instead of FID-IID.
+  --vcf-require-gt : Skip variants with no GT field.
+  --vcf-min-gq : No-call genotypes when GQ is present and below the
+                 threshold.
+  --vcf-max-dp : No-call genotypes when DP is present and above/below
+  --vcf-min-dp   the threshold.
+  --vcf-half-call : Specify how '0/.' and similar VCF GT values should be
+                    handled. The following four modes are supported:
+                    * 'error'/'e' (default) errors out and reports line #.
+                    * 'haploid'/'h' treats them as haploid calls.
+                    * 'missing'/'m' treats them as missing.
+                    * 'reference'/'r' treats the missing value as 0.
+  --oxford-single-chr : Specify single-chromosome .gen/.bgen file
+                        with no useful chromosome info inside.
+  --missing-code [string list] : Comma-delimited list of missing phenotype
+    (alias: --missing_code)      values for Oxford-format import (default
+                                 'NA').
+  --hard-call-threshold : When importing dosage data, a hardcall is
+                          normally saved when the distance from the
+                          nearest hardcall, defined as
+                            0.5 * sum_i |x_i - round(x_i)|
+                          (where the x_i's are 0..2 allele dosages),
+                          is not greater than 0.1. You can adjust
+                          this threshold by providing a numeric
+                          parameter to --hard-call-threshold.
+                          You can also use this with --make-[b]pgen
+                          to alter the saved hardcalls while leaving
+                          the dosages untouched, or --make-bed to
+                          tweak hardcall export.
+  --dosage-erase-threshold : --hard-call-threshold normally preserves
+                             the original dosages, and several PLINK 2
+                             commands use them when they're available.
+                             Use --dosage-erase-threshold to make PLINK
+                             2 erase dosages and keep only hardcalls
+                             when distance-from-hardcall <= the given
+                             level.
+  --import-dosage-certainty : The PLINK 2 file format currently supports
+                              a single dosage for each allele. Some
+                              other dosage file formats include a
+                              separate probability for every possible
+                              genotype, e.g. {P(0/0)=0.2, P(0/1)=0.52,
+                              P(1/1)=0.28}, a highly uncertain call that
+                              is nevertheless treated as a hardcall under
+                              '--hard-call-threshold 0.1'. To make PLINK
+                              2 treat a dosage as missing whenever the
+                              largest probability is less than a
+                              threshold, use --import-dosage-certainty.
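The distance-from-hardcall rule and the {P(0/0)=0.2, P(0/1)=0.52, P(1/1)=0.28} example above can be checked numerically. The following is a minimal Python sketch rather than PLINK 2's implementation: all function names are hypothetical, and it assumes a biallelic diploid genotype, so the REF and ALT allele dosages are derived from the three genotype probabilities and sum to 2.

```python
def allele_dosages(p_hom_ref, p_het, p_hom_alt):
    """Return (REF dosage, ALT dosage), each in 0..2, from genotype probabilities.

    Assumption for this sketch: biallelic, diploid genotype.
    """
    ref = 2.0 * p_hom_ref + p_het
    alt = p_het + 2.0 * p_hom_alt
    return (ref, alt)


def hardcall_distance(dosages):
    """0.5 * sum_i |x_i - round(x_i)| over the allele dosages x_i."""
    return 0.5 * sum(abs(x - round(x)) for x in dosages)


def is_saved_as_hardcall(dosages, threshold=0.1):
    """A hardcall is saved when the distance is not greater than the threshold."""
    return hardcall_distance(dosages) <= threshold


def passes_certainty(genotype_probs, certainty):
    """Treated as missing when the largest probability is below `certainty`."""
    return max(genotype_probs) >= certainty


probs = (0.2, 0.52, 0.28)
dosages = allele_dosages(*probs)        # approximately (0.92, 1.08)
distance = hardcall_distance(dosages)   # approximately 0.08
```

With these numbers the distance is about 0.08, which is within the default 0.1 threshold, so the call would be kept as a heterozygous hardcall even though its largest genotype probability is only 0.52. A certainty threshold of, say, 0.9 (an arbitrary illustrative value) would reject it.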
+  --input-missing-genotype : '.' is always interpreted as a missing
+                             genotype code in input files. By default, '0'
+                             also is; you can change this second missing
+                             code with --input-missing-genotype.
+  --allow-extra-chr : Permit unrecognized chromosome codes (alias --aec).
+  --chr-set ['no-x'] ['no-y'] ['no-xy'] ['no-mt'] :
+    Specify a nonhuman chromosome set. The first parameter sets the number of
+    diploid autosome pairs if positive, or haploid chromosomes if negative.
+    Given diploid autosomes, the remaining modifiers indicate the absence of
+    the named non-autosomal chromosomes.
+  --cow/--dog/--horse/--mouse/--rice/--sheep : Shortcuts for those species.
+  --autosome-num : Alias for '--chr-set no-y no-xy no-mt'.
+  --human : Explicitly specify human chromosome set, and make
+            output .pvar/VCF files include a ##chrSet header
+            line. (.pvar/VCF output files automatically
+            include ##chrSet when a nonhuman set is specified.)
+  --chr-override ['file'] : By default, if --chr-set/--autosome-num/--cow/etc.
+                            conflicts with an input file ##chrSet header line,
+                            PLINK 2 will error out. --chr-override with no
+                            parameter causes the command line to take
+                            precedence; '--chr-override file' defers to the
+                            file.
+  --var-min-qual : Skip variants with low/missing QUAL.
+  --var-filter [exception(s)...] : Skip variants which have FILTER failures.
+  --extract-if-info : Exclude variants which don't/do satisfy
+  --exclude-if-info   a comparison predicate on an INFO key,
+    (aliases: --extract-if,   e.g.
+              --exclude-if)     --extract-if-info "VT == SNP"
+                      Unless the operator is !=, the predicate
+                      always evaluates to false when the key
+                      is missing.
+  --require-info : Exclude variants based on nonexistence
+  --require-no-info   or existence of an INFO key. "=."
+                      is treated as nonexistence.
+  --extract-col-cond [valcol] [IDcol] [skip] :
+  --extract-col-cond-match <(sub)string(s)...>
+  --extract-col-cond-mismatch <(sub)string(s)...>
+  --extract-col-cond-substr
+  --extract-col-cond-min
+  --extract-col-cond-max :
+    Exclude all variants without a value-column entry satisfying a condition.
+    * By default, values are read from column 2 of the file, and variant IDs
+      are read from column 1.
+    * Three types of conditions are supported:
+      * When --extract-col-cond-match is specified without
+        --extract-col-cond-substr, the value is checked for equality with the
+        given strings, and kept iff one of them matches. Similarly,
+        --extract-col-cond-mismatch without --extract-col-cond-substr causes
+        the variant to be kept iff the value matches none of the given strings.
+      * When --extract-col-cond-match and/or -mismatch are specified with
+        --extract-col-cond-substr, the variant is kept iff none of the
+        --extract-col-cond-mismatch substrings are contained in the value, and
+        either --extract-col-cond-match was unspecified or at least one of its
+        substrings is contained.
+      * Otherwise, the value is interpreted as a number, and the variant is
+        kept if the number is in [, ] (default min=0, max=DBL_MAX).
+  --pheno ['iid-only'] : Specify additional phenotype/covariate file.
+                         Comma-delimited files with a header line are now
+                         permitted.
+  --pheno-name : Only load the designated phenotype(s) from the
+                 --pheno (if one was specified) or .psam (if no
+                 --pheno) file. Separate multiple names with
+                 spaces or commas, and use dashes to designate
+                 ranges.
+  --pheno-col-nums <#...> : Only load the phenotype(s) in the designated
+                            column number(s) from the --pheno file.
+  --no-psam-pheno : Ignore phenotype(s) in .psam/.fam file.
+  --strict-sid0 : By default, if there is no SID column in the .psam/.fam
+                  (or --update-ids) file, but there is one in another
+                  input file (for e.g.
--keep/--remove), the latter SID + column is ignored; sample IDs are considered matching as + long as FID and IID are equal (with missing FID treated + as '0'). If you also want to require SID = '0' for a + sample ID match in this situation, add --strict-sid0. + --input-missing-phenotype : Set nonzero number to treat as a missing + pheno/covar in input files (default -9). + --no-input-missing-phenotype : Don't treat any nonzero number as a missing + pheno/covar. ('NA'/'nan' are still treated + as missing.) + --1 : Expect case/control phenotypes in input files + to be coded as 0 = control, 1 = case, instead + of the usual 0 = missing, 1 = ctrl, 2 = case. + (Unlike PLINK 1.x, this does not force all + phenotypes to be interpreted as case/ctrl.) + --missing-catname : Set missing-categorical-phenotype string + (case-sensitive, default 'NONE'). + --covar ['iid-only'] : Specify additional covariate file. + Comma-delimited files with a header line are now + permitted. + --covar-name : Only load the designated covariate(s) from the + --covar (if one was specified), --pheno (if no + --covar), or .psam (if no --covar or --pheno) + file. + --covar-col-nums <#...> : Only load the covariate(s) in the designated + column number(s) from the --covar (if one was + specified) or --pheno (if no --covar) file. + --within [new pheno name] : Import a PLINK 1.x categorical phenotype. + (Phenotype name defaults to 'CATPHENO'.) + * If any numeric values are present, ALL + values must be numeric. In that case, 'C' + is added in front of all category names. + * 'NA' is treated as a missing value. + --mwithin : Load --within categories from column n+2. + --family [new pheno name] : Create a categorical phenotype from FID. + Restrictions on and handling of numeric + values are the same as for --within. + --family-missing-catname : Make --family treat the specified FID as + missing. + --keep : Exclude all samples not named in a file. + --remove : Exclude all samples named in a file. 
+ --keep-fam : Exclude all families not named in a file. + --remove-fam : Exclude all families named in a file. + --extract [{bed0 | bed1}] : Usually excludes all variants (not) named + --exclude [{bed0 | bed1}] in the given file(s). When multiple files + are named, they are concatenated. + With the 'bed0' or 'bed1' modifier, + variants outside/inside the positional + ranges in the interval-BED file(s) are + excluded instead. 'bed0' tells PLINK 2 to + assume the interval bounds follow the UCSC + 0-based half-open convention, while 'bed1' + (equivalent to PLINK 1.9 'range') + specifies 1-based fully-closed. + --extract-intersect [{bed0 | bed1}] : Just like --extract, except that + a variant must be in the + intersection, rather than just + the union, of the files to + remain. + --keep-cats : These can be used individually or in combination + --keep-cat-names to define a list of categories to keep; all + samples not in one of the named categories are + excluded. Use spaces to separate category names + for --keep-cat-names. Use the --missing-catname + value (default 'NONE') to refer to the group of + uncategorized samples. + --keep-cat-pheno : If more than one categorical phenotype is loaded, + or you wish to filter on a categorical covariate, + --keep-cat-pheno must be used to specify which + phenotype/covariate --keep-cats and + --keep-cat-names apply to. + --remove-cats : Exclude all categories named in the file. + --remove-cat-names <...> : Exclude named categories. + --remove-cat-pheno : Specify pheno for --remove-cats/remove-cat-names. + --split-cat-pheno [{omit-most | omit-last}] ['covar-01'] + [cat. pheno/covar name(s)...] : + Split n-category phenotype(s) into n (or n-1, with 'omit-most'/'omit-last') + binary phenotypes, with names of the form =. + (As a consequence, affected phenotypes and categories are not permitted to + contain the '=' character.) + * This happens after all sample filters. 
+ * If no phenotype or covariate names are provided, all categorical + phenotypes (but not covariates) are processed. + * By default, generated covariates are coded as 1=false, 2=true. To code + them as 0=false, 1=true instead, add the 'covar-01' modifier. + --loop-cats : Run variant filters and subsequent operations + on just the samples in the first category; then + just the samples in the second category; and so + on, for all categories in the named categorical + phenotype. + --no-id-header ['iid-only'] : Don't include a header line in .id output + files. This normally forces two-column FID/IID + output; add 'iid-only' to force just + single-column IID. + --variance-standardize [pheno/covar name(s)...] + --covar-variance-standardize [covar name(s)...] : + Linearly transform named covariates (and quantitative phenotypes, if + --variance-standardize) to mean-zero, variance 1. If no parameters are + provided, all possible phenotypes/covariates are affected. + This is frequently necessary to prevent multicollinearity when dealing with + covariates where abs(mean) is much larger than abs(standard deviation), + such as year of birth. + --quantile-normalize [...] : Force named covariates and quantitative + --pheno-quantile-normalize [...] phenotypes to a N(0,1) distribution, + --covar-quantile-normalize [...] preserving only the original rank orders. + --chr : Exclude all variants not on the given chromosome(s). + Valid choices for humans are 0 (unplaced), 1-22, X, Y, + XY, MT, PAR1, and PAR2. Separate multiple chromosomes + with spaces and/or commas, and use a dash (no adjacent + spaces permitted) to denote a range, e.g. + '--chr 1-4, 22, par1, x, par2'. + --not-chr <...> : Reverse of --chr (exclude variants on listed + chromosomes). + --autosome : Exclude all non-autosomal variants. + --autosome-par : Exclude all non-autosomal variants, except those in a + pseudo-autosomal region. + --snps-only ['just-acgt'] : Exclude non-SNP variants. 
By default, SNP = all + allele codes are single-character (so + multiallelic variants with a mix of SNPs and + non-SNPs are excluded; split your variants first + if that's a problem). + The 'just-acgt' modifier restricts SNP codes to + {A,C,G,T,a,c,g,t,<missing>}. + --from : Use ID(s) to specify a variant range to load. When used + --to together, both variants must be on the same chromosome. + (--snps can be used to specify intervals which cross + chromosome boundaries.) + --snp : Specify a single variant to load. + --exclude-snp : Specify a single variant to exclude. + --window : With --snp/--exclude-snp, loads/excludes all variants + within half the specified kb distance of the named one. + --from-bp : Use base-pair coordinates to define a variant range to + --to-bp load. + --from-kb * You must use these with --chr, specifying a single + --to-kb chromosome. + --from-mb * Decimals and negative numbers are permitted. + --to-mb * The --to-bp(/-kb/-mb) position is no longer permitted + to be smaller than the --from-bp position. + --snps : Use IDs to specify variant range(s) to load or + --exclude-snps <...> exclude. E.g. '--snps rs1111-rs2222, rs3333, rs4444'. + --force-intersect : PLINK 2 normally errors out when multiple variant + inclusion filters (--extract, --extract-col-cond, + --extract-intersect, --from/--to, --from-bp/--to-bp, + --snp, --snps) are specified. --force-intersect + allows the run to proceed; the set intersection will + be taken. + --thin <p> : Randomly remove variants, retaining each with prob. p. + --thin-count : Randomly remove variants until n of them remain. + --bp-space : Remove variants so that each pair is no closer than + the given bp distance. + --thin-indiv <p>
: Randomly remove samples, retaining with prob. p. + --thin-indiv-count : Randomly remove samples until n of them remain. + --keep-col-match : Exclude all samples without a 3rd column + entry in the given file exactly matching + one of the given strings. (Separate + multiple strings with spaces.) + --keep-col-match-name : Check column with given name instead. + --keep-col-match-num : Check nth column instead. + --geno [val] [{dosage | hh-missing}] + --mind [val] [{dosage | hh-missing}] : + Exclude variants (--geno) and/or samples (--mind) with missing call + frequencies greater than a threshold (default 0.1). (Note that the default + threshold is only applied if --geno/--mind is invoked without a parameter; + when --geno/--mind is not invoked, no missing call frequency ceiling is + enforced at all. Other inclusion/exclusion default thresholds work the + same way.) + By default, when a dosage is present but a hardcall is not, the genotype is + treated as missing; add the 'dosage' modifier to treat this case as + nonmissing. Alternatively, you can use 'hh-missing' to also treat + heterozygous haploid calls as missing. + --require-pheno [name(s)...] : Remove samples missing any of the named + --require-covar [name(s)...] phenotype(s)/covariate(s). If no parameters + are provided, all phenotype(s)/covariate(s) + must be present. + --maf [freq] [mode] : Exclude variants with allele frequency lower than a + (alias: --min-af) threshold (default 0.01). By default, the nonmajor + allele frequency is used; the other supported modes + are 'nref' (non-reference), 'alt1', and 'minor' + (least frequent). bcftools freq:mode notation is + permitted. + --max-maf [mode] : Exclude variants with MAF greater than the + (alias: --max-af) threshold. + --mac [mode] : Exclude variants with allele dosage lower than the + (alias: --min-ac) given threshold. + --max-mac [mode] : Exclude variants with allele dosage greater than + (alias: --max-ac) the given threshold. 
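The --geno/--mind missing-call filter described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: hardcalls are represented as a list in which None marks a missing call, the function names are hypothetical, and the exact floating-point comparison PLINK 2 performs at the threshold is not specified here.

```python
def missing_call_rate(hardcalls):
    """Fraction of entries with no hardcall (None marks a missing call)."""
    return sum(call is None for call in hardcalls) / len(hardcalls)


def passes_missingness_filter(hardcalls, threshold=0.1):
    """Keep the variant (--geno) or sample (--mind) unless its missing call
    frequency is greater than the threshold.

    Hypothetical helper: the 0.1 default mirrors the value used when the
    flag is given without a parameter.
    """
    return missing_call_rate(hardcalls) <= threshold
```

The same function applies to either flag: pass one variant's calls across samples for --geno, or one sample's calls across variants for --mind.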
+ --maf-succ : Rule of succession allele frequency estimation (used in + EIGENSOFT). Given j observations of one allele and k + observations of the other for a biallelic variant, infer + allele frequencies of (j+1) / (j+k+2) and + (k+1) / (j+k+2), rather than the default j / (j+k) and + k / (j+k). + Note that this does not affect --freq's output. + --min-alleles : Exclude variants with fewer than the given # of alleles. + (When a variant has exactly one ALT allele, and it's + a missing-code, it's excluded by "--min-alleles 2".) + --max-alleles : Exclude variants with more than the given # of alleles. + --read-freq : Load allele frequency estimates from the given --freq or + --geno-counts (or PLINK 1.9 --freqx) report, instead of + imputing them from the immediate dataset. + --hwe <p>
['midp'] ['keep-fewhet'] : + Exclude variants with Hardy-Weinberg equilibrium exact test p-values below + a threshold. + * By default, only founders are considered. + * chrX p-values are now computed using Graffelman and Weir's method. + * For variants with k alleles with k>2, k separate 'biallelic' tests are + performed, and the variant is filtered out if any of them fail. + * With 'keep-fewhet', variants which fail the test in the too-few-hets + direction are not excluded. On chrX, this uses the ratio between the + Graffelman/Weir p-value and the female-only p-value. + * There is currently no special handling of case/control phenotypes. + --mach-r2-filter [min] [max] : Exclude variants with MaCH imputation quality + metric less than min or greater than max + (defaults 0.1 and 2.0). (Monomorphic + variants, with r2 = nan, are not excluded.) + * This is NOT identical to the R2 metric + reported by Minimac3 0.1.13+; see below. + * If a single parameter is provided, it is + treated as the minimum. + * The metric is not computed on chrX and MT. + --minimac3-r2-filter [max] : Compute Minimac3 R2 values from scratch, + and exclude variants with R2 less than min + or (if max is provided) greater than max. + * Note that this requires phased-dosage + data for all samples and variants; + otherwise this will systematically + underestimate imputation quality, since + unphased hardcalls/dosages are treated + as if they were maximally uncertain. + (Use --extract-if-info/--exclude-if-info + to filter on precomputed Minimac3 R2 in + a VCF/.pvar INFO column.) + --keep-females : Exclude male and unknown-sex samples. + --keep-males : Exclude female and unknown-sex samples. + --keep-nosex : Exclude all known-sex samples. + --remove-females : Exclude female samples. + --remove-males : Exclude male samples. + --remove-nosex : Exclude unknown-sex samples. + --keep-founders : Exclude nonfounder samples. + --keep-nonfounders : Exclude founder samples. 
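The note under --mach-r2-filter that monomorphic variants (r2 = nan) are not excluded follows naturally if the filter is written as a pair of exclusion tests. The Python sketch below shows the idea; the function name is hypothetical and this is not PLINK 2's code, only the stated defaults (0.1 and 2.0) are taken from the text above.

```python
def passes_mach_r2(r2, r2_min=0.1, r2_max=2.0):
    """Return True unless r2 is below r2_min or above r2_max.

    Comparisons involving NaN are always False, so a NaN r2 is never
    excluded by either test.
    """
    return not (r2 < r2_min or r2 > r2_max)
```

Writing the check as "exclude if out of range" rather than "keep if in range" is what lets NaN values through.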
+ --keep-if : Exclude samples which don't/do satisfy a + --remove-if comparison predicate, e.g. + --keep-if "PHENO1 == case" + Unless the operator is !=, the predicate + always evaluates to false when the + phenotype/covariate is missing. + --nonfounders : Include nonfounders in allele freq/HWE calculations. + --bad-freqs : When PLINK 2 needs decent allele frequencies, it + normally errors out if they aren't provided by + --read-freq and less than 50 founders are available to + impute them from. Use --bad-freqs to force PLINK 2 to + proceed in this case. + --export-allele : With --export A/A-transpose/AD, count alleles named + in the file, instead of REF alleles. + --output-chr : Set chromosome coding scheme in output files by + providing the desired human mitochondrial code. + Options are '26', 'M', 'MT', '0M', 'chr26', 'chrM', + and 'chrMT'; default is now 'MT' (note that this is + a change from PLINK 1.x, which defaulted to '26'). + --output-missing-genotype : Set the code used to represent missing + genotypes in output files (default '.'). + --output-missing-phenotype : Set the string used to represent missing + phenotypes in output files (default 'NA'). + --sort-vars [mode] : Sort variants by chromosome, then position, then + ID. The following string orders are supported: + * 'natural'/'n': Natural sort (default). + * 'ascii'/'a': ASCII. + This must be used with --make-[b]pgen/--make-bed. + --set-hh-missing ['keep-dosage'] : Make --make-[b]pgen/--make-bed set non-MT + heterozygous haploid hardcalls, and all + female chrY calls, to missing. (Unlike + PLINK 1.x, this treats unknown-sex chrY + genotypes like males, not females.) + By default, all associated dosages are + also erased; use 'keep-dosage' to keep + them all. + --set-mixed-mt-missing ['keep-dosage'] : Make --make-[b]pgen/--make-bed set + mixed MT hardcalls to missing. 
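The difference between the 'natural' and 'ascii' string orders named under --sort-vars above can be shown with a short Python sketch. The natural_key function is an assumption made for illustration (digit runs compared numerically, other text compared case-insensitively, digits ordered before text); PLINK 2's actual tie-breaking rules may differ.

```python
import re


def natural_key(text):
    """Sort key for a simple natural sort (hypothetical, not PLINK 2 code)."""
    chunks = re.findall(r"[0-9]+|[^0-9]+", text)
    return [
        (0, int(chunk), "") if chunk[0] in "0123456789" else (1, 0, chunk.lower())
        for chunk in chunks
    ]


def natural_sorted(ids):
    """Order IDs so that embedded numbers compare by value."""
    return sorted(ids, key=natural_key)


def ascii_sorted(ids):
    """Order IDs by raw character code, as plain string comparison does."""
    return sorted(ids)
```

Under these assumptions, natural_sorted(["id10", "ID3", "id2"]) gives id2, ID3, id10, while ascii_sorted gives ID3, id10, id2, because uppercase letters precede lowercase ones and '1' precedes '2' character by character.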
+ --split-par : Changes chromosome code of all X chromosome + --split-par variants with bp position <= bp1 to PAR1, and those + with position >= bp2 to PAR2. The following build + codes are supported as shorthand: + * 'b36'/'hg18' = NCBI 36, 2709521/154584237 + * 'b37'/'hg19' = GRCh37, 2699520/154931044 + * 'b38'/'hg38' = GRCh38, 2781479/155701383 + --merge-par : Merge PAR1/PAR2 back with X. Requires PAR1 to be + positioned immediately before X, and PAR2 to be + immediately after X. (Should *not* be used with + "--export vcf", since it causes male + homozygous/missing calls in PAR1/PAR2 to be + reported as haploid.) + --merge-x : Merge XY back with X. This usually has to be + combined with --sort-vars. + --set-missing-var-ids : Given a template string with a '@' where the + --set-all-var-ids chromosome code should go and '#' where the bp + coordinate belongs, --set-missing-var-ids + assigns chromosome-and-bp-based IDs to unnamed + variants, while --set-all-var-ids resets all + IDs. + You may also use '$r'/'$a' to refer to the + ref and alt1 alleles, or '$1'/'$2' to refer to + them in alphabetical order. + --var-id-multi : Specify alternative templates for multiallelic + --var-id-multi-nonsnp variants. ('$a' and '$1'/'$2' should be avoided + here, though they're technically still allowed.) + --new-id-max-allele-len [{error | missing | truncate}] : + Specify maximum number of leading characters from allele codes to include + in new variant IDs, and behavior on longer codes (defaults 23, error). + --missing-var-code : Change unnamed variant code for --rm-dup, + --set-{missing|all}-var-ids, and + --recover-var-ids (default '.'). + --update-map [bpcol] [IDcol] [skip] : Update variant bp positions. + --update-name [newcol] [oldcol] [skip] : Update variant IDs. + --recover-var-ids ['strict-bim-order'] [{rigid | force}] ['partial'] : + Undo --set-all-var-ids, given the original .pvar/VCF/.bim file. Original + IDs are looked up by position and allele codes. 
+ * By default, if the original-ID file is a .bim, allele order is ignored. + Use 'strict-bim-order' to force A1=ALT, A2=REF. + * If any variant has multiple matching records in the original-ID file, and + the IDs conflict, --recover-var-ids writes the affected (current) ID(s) + to .recoverid.dup, and normally errors out. If the + original-ID file has the same number of variants in the same order, you + can still recover the old IDs with the 'rigid' modifier in this case. + Alternatively, to proceed and assign the missing-ID code to these + variants, add the 'force' modifier. (The .recoverid.dup file is still + written when 'rigid' or 'force' is specified.) + * --recover-var-ids normally expects to replace all variant IDs, and errors + out if any are left untouched. Add the 'partial' modifier when you + actually want to update just a proper subset. + --update-alleles : Update variant allele codes. + --update-ids : Update sample IDs. + --update-parents : Update parental IDs. + --update-sex ['col-num='] ['male0'] : + Update sex information. + * By default, if there is a header line starting with '#FID'/'#IID', sex is + loaded from the first column titled 'SEX' (any capitalization); + otherwise, column 3 is assumed. Use 'col-num=' to force a column number. + * Only the first character in the sex column is processed. By default, + '1'/'M'/'m' is interpreted as male, '2'/'F'/'f' is interpreted as female, + and '0'/'N' is interpreted as unknown-sex. To change this to '0'/'M'/'m' + = male, '1'/'F'/'f' = female, anything else other than '2' = unknown-sex, + add 'male0'. + --real-ref-alleles : Treat A2 alleles in a PLINK 1.x fileset as actual REF + alleles; otherwise they're marked as provisional. + --maj-ref ['force'] : Set major alleles to reference, like PLINK 1.x + automatically did. (Note that this is now opt-in + rather than opt-out; --keep-allele-order is no longer + necessary to prevent allele-swapping.) 
+ * This can only be used in runs with + --make-bed/--make-[b]pgen/--export and no other + commands. + * By default, this only affects variants marked as + having 'provisional' reference alleles. Add 'force' + to apply this to all variants. + * All new reference alleles are marked as provisional. + --ref-allele ['force'] [refcol] [IDcol] [skip] + --alt1-allele ['force'] [alt1col] [IDcol] [skip] : + These set the alleles specified in the file to ref (--ref-allele) or alt1 + (--alt1-allele). They can be combined in the same run. + * These can only be used in runs with --make-bed/--make-[b]pgen/--export + and no other commands. + * "--ref-allele 4 3 '#'", which scrapes reference allele + assignments from a VCF file, is especially useful. + * By default, these error out when asked to change a 'known' reference + allele. Add 'force' to permit that (when e.g. switching to a new + reference genome). + * When --alt1-allele changes the previous ref allele to alt1, the previous + alt1 allele is set to reference and marked as provisional. + --ref-from-fa ['force'] : This sets reference alleles from the --fa file when + it can be done unambiguously (note that it's never + possible for deletions or some insertions). + By default, it errors out when asked to change a + 'known' reference allele; add the 'force' modifier + to permit that. + --normalize ['list'] : Left-normalize all variants, using the --fa file. + (alias: --norm) (Assumes no differences in capitalization.) The + 'list' modifier causes a list of affected variant + IDs to be written to .normalized. + --indiv-sort [f] : Specify sample ID sort order for merge and + --make-[b]pgen/--make-bed. The following four + modes are supported: + * 'none'/'0' keeps samples in the order they were + loaded. Default for non-merge. + * 'natural'/'n' invokes "natural sort", e.g. + 'id2' < 'ID3' < 'id10'. Default when merging. + * 'ascii'/'a' sorts in ASCII order, e.g. + 'ID3' < 'id10' < 'id2'. 
+ * 'file'/'f' uses the order in the given file + (named in the last parameter). + --king-table-filter : Specify minimum kinship coefficient for + inclusion in --make-king-table report. + --king-table-subset [kmin] : Restrict current --make-king-table run to + sample pairs listed in the given .kin0 file. + If a second parameter is provided, only + sample pairs with kinship >= that threshold + (in the input .kin0) are processed. + --condition [{dominant | recessive}] ['multiallelic'] + --condition-list [{dominant | recessive}] ['multiallelic'] : + Add the given variant, or all variants in the given file, as --glm + covariates. + By default, this errors out if any of the variants are multiallelic; add + the 'multiallelic' ('m' for short) modifier to allow them. They'll + effectively be split against the major allele (unless --glm's 'omit-ref' + modifier was specified), and all induced covariate names--even for + biallelic variants--will have an underscore followed by the allele code at + the end. + --parameters <...> : Include only the given covariates/interactions in the + --glm model, identified by a list of 1-based indices + and/or ranges of them. + --tests <...> : Perform a (joint) test on the specified term(s) in the + --tests all --glm model, identified by 1-based indices and/or ranges + of them. + * Note that, when --parameters is also present, the + indices refer to the terms remaining AFTER pruning by + --parameters. + * You can use '--tests all' to include all terms. + --vif : Set VIF threshold for --glm multicollinearity check + (default 50). (This is no longer skipped for + case/control phenotypes.) + --max-corr : Skip --glm regression when the absolute value of the + correlation between two predictors exceeds this value + (default 0.999). + --xchr-model : Set the chrX --glm/--condition[-list]/--[v]score model. + * '0' = skip chrX. + * '1' = add sex as a covar on chrX, code males 0..1. + * '2' (default) = chrX sex covar, code males 0..2. 
+ (Use the --glm 'interaction' modifier to test for + interaction between genotype and sex.) + --adjust ['zs'] ['gc'] ['log10'] ['cols='] : + For each association test in this run, report some basic multiple-testing + corrections, sorted in increasing-p-value order. Modifiers work the same + way as they do on --adjust-file. + --lambda : Set genomic control lambda for --adjust[-file]. + --adjust-chr-field : Set --adjust-file input field names. When + --adjust-pos-field multiple parameters are given to these flags, + --adjust-id-field earlier names take precedence over later ones. + --adjust-ref-field + --adjust-alt-field + --adjust-a1-field + --adjust-test-field + --adjust-p-field + --ci : Report confidence intervals for odds ratios/betas. + --pfilter : Filter out assoc. test results with p-values above the + given threshold. + --score-col-nums <...> : Process all the specified coefficient columns in the + --score file, identified by 1-based indexes and/or + ranges of them. + --q-score-range [i] [j] ['header'] ['min'] : + Apply --score to subset(s) of variants in the primary score list(s) based + on e.g. p-value ranges. + * The first file should have range labels in the first column, p-value + lower bounds in the second column, and upper bounds in the third column. + Lines with too few entries, or nonnumeric values in the second or third + column, are ignored. + * The second file should contain a variant ID and a p-value on each line + (except possibly the first). Variant IDs are read from column #i and + p-values are read from column #j, where i defaults to 1 and j defaults to + i+1. The 'header' modifier causes the first nonempty line of this file + to be skipped. + * By default, --q-score-range errors out when a variant ID appears multiple + times in the data file (and is also present in the main dataset). To use + the minimum p-value in this case instead, add the 'min' modifier. 
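The two --q-score-range input files described above can be read as in the following Python sketch. It is an illustration, not PLINK 2's parser: all function names are hypothetical, fields are assumed to be whitespace-separated, range bounds are assumed to be inclusive, and the duplicate check ignores the "also present in the main dataset" condition because the sketch has no main dataset.

```python
def parse_ranges(lines):
    """Return (label, lower, upper) tuples, ignoring malformed lines."""
    ranges = []
    for line in lines:
        fields = line.split()
        if len(fields) < 3:
            continue  # too few entries: ignored
        try:
            lower, upper = float(fields[1]), float(fields[2])
        except ValueError:
            continue  # nonnumeric bound: ignored
        ranges.append((fields[0], lower, upper))
    return ranges


def parse_pvalues(lines, i=1, j=None, header=False, use_min=False):
    """Map variant ID -> p-value; i and j are 1-based column numbers."""
    if j is None:
        j = i + 1  # j defaults to i+1
    rows = [line for line in lines if line.strip()]
    if header:
        rows = rows[1:]  # skip the first nonempty line
    pvals = {}
    for row in rows:
        fields = row.split()
        var_id, p = fields[i - 1], float(fields[j - 1])
        if var_id in pvals:
            if not use_min:
                raise ValueError(f"duplicate variant ID: {var_id}")
            p = min(p, pvals[var_id])  # 'min' modifier: keep smallest p-value
        pvals[var_id] = p
    return pvals


def variants_in_range(pvals, lower, upper):
    """IDs whose p-value lies in [lower, upper] (inclusive by assumption)."""
    return sorted(v for v, p in pvals.items() if lower <= p <= upper)
```

Each parsed range would then select its own subset of variants for scoring, so one run can produce a score per label.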
+ --vscore-col-nums <...> : Process all the specified coefficient columns in + the --variant-score file, identified by 1-based + indexes and/or ranges of them. + --parallel : Divide the output matrix into n pieces, and only compute + the kth piece. The primary output file will have the + piece number included in its name, e.g. plink2.king.13 + or plink2.king.13.zst if k is 13. Concatenating these + files in order will yield the full matrix of interest. + (Yes, this can be done before decompression.) + N.B. This generally cannot be used to directly write a + symmetric square matrix. Choose square0 or triangle + shape instead, and postprocess as necessary. + --memory ['require'] : Set size, in MiB, of initial workspace malloc + attempt. To error out instead of reducing the + request size when the initial attempt fails, add + the 'require' modifier. + --threads : Set maximum number of compute threads. + --d : Change variant/covariate range delimiter (normally '-'). + --seed : Set random number seed(s). Each value must be an + integer between 0 and 4294967295 inclusive. + Note that --threads and "--memory require" may also be + needed to reproduce some randomized runs. + --output-min-p <p>
: Specify minimum p-value to write to reports. (2.23e-308 + is useful for preventing underflow in some programs.) + --debug : Use slower, more crash-resistant logging method. + --randmem : Randomize initial workspace memory (helps catch + uninitialized-memory bugs). + --warning-errcode : Return a nonzero error code to the OS when a run + completes with warning(s). + --zst-level : Set the Zstd compression level (1-22, default 3). + + Primary methods paper: + Chang CC, Chow CC, Tellier LCAM, Vattikuti S, Purcell SM, Lee JJ (2015) + Second-generation PLINK: rising to the challenge of larger and richer datasets. + GigaScience, 4. + + + + ]]> + + 10.1186/s13742-015-0047-8 + @ARTICLE{Blankenberg20-plink, + author = {Daniel Blankenberg Lab, et al}, + title = {In preparation..}, + } + +