# HG changeset patch
# User blankenberg
# Date 1574199342 0
# Node ID 4c3690a9d729b90c9e07e90244c968f42fee86bf
# Parent ed946e88849459c337fd4e2d1b0fc89a3de80bf7
Fix help text formatting: wrap the PLINK help output in a reStructuredText literal block ('::')
diff -r ed946e888494 -r 4c3690a9d729 plink.xml
--- a/plink.xml Wed Oct 09 10:01:45 2019 -0400
+++ b/plink.xml Tue Nov 19 21:35:42 2019 +0000
@@ -10411,1442 +10411,1446 @@
denote an optional modifier (or if '|' is present, a set
- of mutually exclusive optional modifiers). Use the EXACT text in the
- definition, e.g. '--dummy acgt'.
- * There's one exception to the angle brackets/exact text rule: when an angle
- bracket term ends with '=[value]', '[value]' designates a variable
- parameter.
- * {curly braces} denote an optional parameter, where the text between the
- braces describes its nature.
- * An ellipsis (...) indicates that you may enter multiple parameters of the
- specified type.
-
- plink [input flag(s)...] {command flag(s)...} {other flag(s)...}
- plink --help {flag name(s)...}
-
-Most PLINK runs require exactly one main input fileset. The following flags
-are available for defining its form and location:
-
- --bfile {prefix} : Specify .bed + .bim + .fam prefix (default 'plink').
- --bed [filename] : Specify full name of .bed file.
- --bim [filename] : Specify full name of .bim file.
- --fam [filename] : Specify full name of .fam file.
-
- --keep-autoconv : With --file/--tfile/--lfile/--vcf/--bcf/--data/--23file,
- don't delete autogenerated binary fileset at end of run.
-
- --file {prefix} : Specify .ped + .map filename prefix (default 'plink').
- --ped [filename] : Specify full name of .ped file.
- --map [filename] : Specify full name of .map file.
-
- --no-fid : .fam/.ped file does not contain column 1 (family ID).
- --no-parents : .fam/.ped file does not contain columns 3-4 (parents).
- --no-sex : .fam/.ped file does not contain column 5 (sex).
- --no-pheno : .fam/.ped file does not contain column 6 (phenotype).
-
- --tfile {prefix} : Specify .tped + .tfam filename prefix (default 'plink').
- --tped [fname] : Specify full name of .tped file.
- --tfam [fname] : Specify full name of .tfam file.
-
- --lfile {prefix} : Specify .lgen + .map + .fam (long-format fileset) prefix.
- --lgen [fname] : Specify full name of .lgen file.
- --reference [fn] : Specify default allele file accompanying .lgen input.
- --allele-count : When used with --lfile/--lgen + --reference, specifies
- that the .lgen file contains reference allele counts.
-
- --vcf [filename] : Specify full name of .vcf or .vcf.gz file.
- --bcf [filename] : Specify full name of BCF2 file.
-
- --data {prefix} : Specify Oxford .gen + .sample prefix (default 'plink').
- --gen [filename] : Specify full name of .gen or .gen.gz file.
- --bgen [f] : Specify full name of .bgen file.
- --sample [fname] : Specify full name of .sample file.
-
- --23file [fname] {FID} {IID} {sex} {pheno} {pat. ID} {mat. ID} :
- Specify 23andMe input file.
-
- --grm-gz {prfx} : Specify .grm.gz + .grm.id (GCTA rel. matrix) prefix.
- --grm-bin {prfx} : Specify .grm.bin + .grm.N.bin + .grm.id (GCTA triangular
- binary relationship matrix) filename prefix.
-
- --dummy [sample ct] [SNP ct] {missing geno freq} {missing pheno freq}
-
- This generates a fake input dataset with the specified number of samples
- and SNPs. By default, the missing genotype and phenotype frequencies are
- zero, and genotypes are As and Bs (change the latter with
- 'acgt'/'1234'/'12'). The 'scalar-pheno' modifier causes a normally
- distributed scalar phenotype to be generated instead of a binary one.
-
- --simulate [simulation parameter file]
- --simulate-qt [simulation parameter file]
- --simulate generates a fake input dataset with disease-associated SNPs,
- while --simulate-qt generates a dataset with quantitative trait loci.
-
-Output files have names of the form 'plink.{extension}' by default. You can
-change the 'plink' prefix with
-
- --out [prefix] : Specify prefix for output files.
-
-Most runs also require at least one of the following commands:
-
- --make-bed
- Create a new binary fileset. Unlike the automatic text-to-binary
- converters (which only heed chromosome filters), this supports all of
- PLINK's filtering flags.
- --make-just-bim
- --make-just-fam
- Variants of --make-bed which only write a new .bim or .fam file. Can be
- used with only .bim/.fam input.
- USE THESE CAUTIOUSLY. It is very easy to desynchronize your binary
- genotype data and your .bim/.fam indexes if you use these commands
- improperly. If you have any doubt, stick with --make-bed.
-
- --recode [output format] <01 | 12>
-
- Create a new text fileset with all filters applied. The following output
- formats are supported:
- * '23': 23andMe 4-column format. This can only be used on a single
- sample's data (--keep may be handy), and does not support multicharacter
- allele codes.
- * 'A': Sample-major additive (0/1/2) coding, suitable for loading from R.
- If you need uncounted alleles to be named in the header line, add the
- 'include-alt' modifier.
- * 'AD': Sample-major additive (0/1/2) + dominant (het=1/hom=0) coding.
- Also supports 'include-alt'.
- * 'A-transpose': Variant-major 0/1/2.
- * 'beagle': Unphased per-autosome .dat and .map files, readable by early
- BEAGLE versions.
- * 'beagle-nomap': Single .beagle.dat file.
- * 'bimbam': Regular BIMBAM format.
- * 'bimbam-1chr': BIMBAM format, with a two-column .pos.txt file. Does not
- support multiple chromosomes.
- * 'fastphase': Per-chromosome fastPHASE files, with
- .chr-[chr #].recode.phase.inp filename extensions.
- * 'fastphase-1chr': Single .recode.phase.inp file. Does not support
- multiple chromosomes.
- * 'HV': Per-chromosome Haploview files, with .chr-[chr #][.ped + .info]
- filename extensions.
- * 'HV-1chr': Single Haploview .ped + .info file pair. Does not support
- multiple chromosomes.
- * 'lgen': PLINK 1 long-format (.lgen + .fam + .map), loadable with --lfile.
- * 'lgen-ref': .lgen + .fam + .map + .ref, loadable with --lfile +
- --reference.
- * 'list': Single genotype-based list, up to 4 lines per variant. To omit
- nonmale genotypes on the Y chromosome, add the 'omit-nonmale-y' modifier.
- * 'rlist': .rlist + .fam + .map fileset, where the .rlist file is a
- genotype-based list which omits the most common genotype for each
- variant. Also supports 'omit-nonmale-y'.
- * 'oxford': Oxford-format .gen + .sample. With the 'gen-gz' modifier, the
- .gen file is gzipped.
- * 'ped': PLINK 1 sample-major (.ped + .map), loadable with --file.
- * 'compound-genotypes': Same as 'ped', except that the space between each
- pair of same-variant allele codes is removed.
- * 'structure': Structure-format.
- * 'transpose': PLINK 1 variant-major (.tped + .tfam), loadable with
- --tfile.
- * 'vcf', 'vcf-fid', 'vcf-iid': VCFv4.2. 'vcf-fid' and 'vcf-iid' cause
- family IDs or within-family IDs respectively to be used for the sample
- IDs in the last header row, while 'vcf' merges both IDs and puts an
- underscore between them. If the 'bgz' modifier is added, the VCF file is
- block-gzipped.
- The A2 allele is saved as the reference and normally flagged as not based
- on a real reference genome (INFO:PR). When it is important for reference
- alleles to be correct, you'll also want to include --a2-allele and
- --real-ref-alleles in your command.
- In addition,
- * The '12' modifier causes A1 (usually minor) alleles to be coded as '1'
- and A2 alleles to be coded as '2', while '01' maps A1 -> 0 and A2 -> 1.
- * The 'tab' modifier makes the output mostly tab-delimited instead of
- mostly space-delimited. 'tabx' and 'spacex' force all tabs and all
- spaces, respectively.
-
- --flip-scan
- (alias: --flipscan)
- LD-based scan for case/control strand inconsistency.
-
- --write-covar
- If a --covar file is loaded, --make-bed/--make-just-fam and --recode
- automatically generate an updated version (with all filters applied).
- However, if you do not wish to simultaneously generate a new genotype file,
- you can use --write-covar to just produce a pruned covariate file.
-
- --write-cluster
- If clusters are specified with --within/--family, this generates a new
- cluster file (with all filters applied). The 'omit-unassigned' modifier
- causes unclustered samples to be omitted from the file; otherwise their
- cluster is 'NA'.
-
- --write-set
- --set-table
- If sets have been defined, --write-set dumps 'END'-terminated set
- membership lists to {output prefix}.set, while --set-table writes a
- variant-by-set membership table to {output prefix}.set.table.
-
- --merge [.ped filename] [.map filename]
- --merge [text fileset prefix]
- --bmerge [.bed filename] [.bim filename] [.fam filename]
- --bmerge [binary fileset prefix]
- Merge the given fileset with the initially loaded fileset, writing the
- result to {output prefix}.bed + .bim + .fam. (It is no longer necessary to
- simultaneously specify --make-bed.)
- --merge-list [filename]
- Merge all filesets named in the text file with the reference fileset, if
- one was specified. (However, this can also be used *without* a reference;
- in that case, the newly created fileset is then treated as the reference by
- most other PLINK operations.) The text file is interpreted as follows:
- * If a line contains only one name, it is assumed to be the prefix for a
- binary fileset.
- * If a line contains exactly two names, they are assumed to be the full
- filenames for a text fileset (.ped first, then .map).
- * If a line contains exactly three names, they are assumed to be the full
- filenames for a binary fileset (.bed, then .bim, then .fam).
-
- --write-snplist
- --list-23-indels
- --write-snplist writes a .snplist file listing the names of all variants
- which pass the filters and inclusion thresholds you've specified, while
- --list-23-indels writes the subset with 23andMe-style indel calls (D/I
- allele codes).
-
- --list-duplicate-vars
- --list-duplicate-vars writes a .dupvar file describing all groups of
- variants with matching positions and allele codes.
- * By default, A1/A2 allele assignments are ignored; use 'require-same-ref'
- to override this.
- * Normally, the report contains position and allele codes. To remove them
- (and produce a file directly usable with e.g. --extract/--exclude), use
- 'ids-only'. Note that this command will fail in 'ids-only' mode if any
- of the reported IDs are not unique.
- * 'suppress-first' causes the first variant ID in each group to be omitted
- from the report.
-
- --freq
- --freqx
- --freq generates a basic allele frequency (or count, if the 'counts'
- modifier is present) report. This can be combined with --within/--family
- to produce a cluster-stratified allele frequency/count report instead, or
- the 'case-control' modifier to report case and control allele frequencies
- separately.
- --freqx generates a more detailed genotype count report, designed for use
- with --read-freq.
-
- --missing
- Generate sample- and variant-based missing data reports. If clusters are
- defined, the variant-based report is cluster-stratified. 'gz' causes the
- output files to be gzipped.
-
- --test-mishap
- Check for association between missing calls and flanking haplotypes.
-
- --hardy
- Generate a Hardy-Weinberg exact test p-value report. (This does NOT
- simultaneously filter on the p-value any more; use --hwe for that.) With
- the 'midp' modifier, the test applies the mid-p adjustment described in
- Graffelman J, Moreno V (2013) The mid p-value in exact tests for
- Hardy-Weinberg Equilibrium.
-
- --mendel
- Generate a Mendel error report. The 'summaries-only' modifier causes the
- .mendel file (listing every single error) to be skipped.
-
- --het
- --ibc
- Estimate inbreeding coefficients. --het reports method-of-moments
- estimates, while --ibc calculates all three values described in Yang J, Lee
- SH, Goddard ME and Visscher PM (2011) GCTA: A Tool for Genome-wide Complex
- Trait Analysis. (That paper also describes the relationship matrix
- computation we reimplement.)
- * These functions require decent MAF estimates. If there are very few
- samples in your immediate fileset, --read-freq is practically mandatory
- since imputed MAFs are wildly inaccurate in that case.
- * They also assume the marker set is in approximate linkage equilibrium.
- * By default, --het omits the n/(n-1) multiplier in Nei's expected
- homozygosity formula. The 'small-sample' modifier causes it to be
- included, while forcing --het to use MAFs imputed from founders in the
- immediate dataset.
-
- --check-sex {female max F} {male min F}
- --check-sex ycount {female max F} {male min F} {female max Y obs}
- {male min Y obs}
- --check-sex y-only {female max Y obs} {male min Y obs}
- --impute-sex {female max F} {male min F}
- --impute-sex ycount {female max F} {male min F} {female max Y obs}
- {male min Y obs}
- --impute-sex y-only {female max Y obs} {male min Y obs}
- --check-sex normally compares sex assignments in the input dataset with
- those imputed from X chromosome inbreeding coefficients.
- * Make sure that the X chromosome pseudo-autosomal region has been split
- off (with e.g. --split-x) before using this.
- * You also need decent MAF estimates (so, with very few samples in your
- immediate fileset, use --read-freq), and your marker set should be in
- approximate linkage equilibrium.
- * By default, F estimates smaller than 0.2 yield female calls, and values
- larger than 0.8 yield male calls. If you pass numeric parameter(s) to
- --check-sex, the first two control these thresholds.
- There are now two modes which consider Y chromosome data.
- * In 'ycount' mode, gender is still imputed from the X chromosome, but
- female calls are downgraded to ambiguous whenever more than 0 nonmissing
- Y genotypes are present, and male calls are downgraded when fewer than 0
- are present. (Note that these are counts, not rates.) These thresholds
- are controllable with --check-sex ycount's optional 3rd and 4th numeric
- parameters.
- * In 'y-only' mode, gender is imputed from nonmissing Y genotype counts.
- The male minimum threshold defaults to 1 instead of zero in this case.
- --impute-sex changes sex assignments to the imputed values, and is
- otherwise identical to --check-sex. It must be used with
- --make-bed/--recode/--write-covar.
-
- --fst
- (alias: --Fst)
- Estimate Wright's Fst for each autosomal diploid variant using the method
- introduced in Weir BS, Cockerham CC (1984) Estimating F-statistics for the
- analysis of population structure, given a set of subpopulations defined via
- --within. Raw and weighted global means are also reported.
- * If you're interested in the global means, it is usually best to perform
- this calculation on a marker set in approximate linkage equilibrium.
- * If you have only two subpopulations, you can represent them with
- case/control status and use the 'case-control' modifier.
-
- --indep [window size] [step size (variant ct)] [VIF threshold]
- --indep-pairwise [window size] [step size (variant ct)] [r^2 threshold]
- --indep-pairphase [window size] [step size (variant ct)] [r^2 threshold]
- Generate a list of markers in approximate linkage equilibrium. With the
- 'kb' modifier, the window size is in kilobase instead of variant count
- units. (Pre-'kb' space is optional, i.e. '--indep-pairwise 500 kb 5 0.5'
- and '--indep-pairwise 500kb 5 0.5' have the same effect.)
- Note that you need to rerun PLINK using --extract or --exclude on the
- .prune.in/.prune.out file to apply the list to another computation.
-
- --r
-
- --r2
-
- LD statistic reports. --r yields raw inter-variant correlations, while
- --r2 reports their squares. You can request results for all pairs in
- matrix format (if you specify 'bin' or one of the shape modifiers), all
- pairs in table format ('inter-chr'), or a limited window in table format
- (default).
- * The 'gz' modifier causes the output text file to be gzipped.
- * 'bin' causes the output matrix to be written in double-precision binary
- format, while 'bin4' specifics single-precision binary. The matrix is
- square if no shape is explicitly specified.
- * By default, text matrices are tab-delimited; 'spaces' switches this.
- * 'in-phase' adds a column with in-phase allele pairs to table-formatted
- reports. (This cannot be used with very long allele codes.)
- * 'dprime' adds the absolute value of Lewontin's D-prime statistic to
- table-formatted reports, and forces both r/r^2 and D-prime to be based on
- the maximum likelihood solution to the cubic equation discussed in Gaunt
- T, Rodriguez S, Day I (2007) Cubic exact solutions for the estimation of
- pairwise haplotype frequencies.
- 'dprime-signed' keeps the sign, while 'd' skips division by D_{max}.
- * 'with-freqs' adds MAF columns to table-formatted reports.
- * Since the resulting file can easily be huge, you're required to add the
- 'yes-really' modifier when requesting an unfiltered, non-distributed all
- pairs computation on more than 400k variants.
- * These computations can be subdivided with --parallel (even when the
- 'square' modifier is active).
- --ld [variant ID] [variant ID]
- This displays haplotype frequencies, r^2, and D' for a single pair of
- variants. When there are multiple biologically possible solutions to the
- haplotype frequency cubic equation, all are displayed (instead of just the
- maximum likelihood solution identified by --r/--r2), along with HWE exact
- test statistics.
-
- --show-tags [filename]
- --show-tags all
- * If a file is specified, list all variants which tag at least one variant
- named in the file. (This will normally be a superset of the original
- list, since a variant is considered to tag itself here.)
- * If 'all' mode is specified, for each variant, each *other* variant which
- tags it is reported.
-
- --blocks
- Estimate haplotype blocks, via Haploview's interpretation of the block
- definition suggested by Gabriel S et al. (2002) The Structure of Haplotype
- Blocks in the Human Genome.
- * Normally, samples with missing phenotypes are not considered by this
- computation; the 'no-pheno-req' modifier lifts this restriction.
- * Normally, size-2 blocks may not span more than 20kb, and size-3 blocks
- are limited to 30kb. The 'no-small-max-span' modifier removes these
- limits.
- The .blocks file is valid input for PLINK 1.07's --hap command. However,
- the --hap... family of flags has not been reimplemented in PLINK 1.9 due to
- poor phasing accuracy relative to other software; for now, we recommend
- using BEAGLE instead of PLINK for case/control haplotype association
- analysis. (You can use '--recode beagle' to export data to BEAGLE 3.3.)
- We apologize for the inconvenience, and plan to develop variants of the
- --hap... flags which handle pre-phased data effectively.
-
- --distance <1-ibs>
-
- Write a lower-triangular tab-delimited table of (weighted) genomic
- distances in allele count units to {output prefix}.dist, and a list of the
- corresponding sample IDs to {output prefix}.dist.id. The first row of the
- .dist file contains a single {genome 1-genome 2} distance, the second row
- has the {genome 1-genome 3} and {genome 2-genome 3} distances in that
- order, etc.
- * It is usually best to perform this calculation on a marker set in
- approximate linkage equilibrium.
- * If the 'square' or 'square0' modifier is present, a square matrix is
- written instead; 'square0' fills the upper right triangle with zeroes.
- * If the 'gz' modifier is present, a compressed .dist.gz file is written
- instead of a plain text file.
- * If the 'bin' modifier is present, a binary (square) matrix of
- double-precision floating point values, suitable for loading from R, is
- instead written to {output prefix}.dist.bin. ('bin4' specifies
- single-precision numbers instead.) This can be combined with 'square0'
- if you still want the upper right zeroed out, or 'triangle' if you don't
- want to pad the upper right at all.
- * If the 'ibs' modifier is present, an identity-by-state matrix is written
- to {output prefix}.mibs. '1-ibs' causes distances expressed as genomic
- proportions (i.e. 1 - IBS) to be written to {output prefix}.mdist.
- Combine with 'allele-ct' if you want to generate the usual .dist file as
- well.
- * By default, distance rescaling in the presence of missing genotype calls
- is sensitive to allele count distributions: if variant A contributes, on
- average, twice as much to other pairwise distances as variant B, a
- missing call at variant A will result in twice as large of a missingness
- correction. To turn this off (because e.g. your missing calls are highly
- nonrandom), use the 'flat-missing' modifier.
- * The computation can be subdivided with --parallel.
- --distance-matrix
- --ibs-matrix
- These deprecated commands are equivalent to '--distance 1-ibs flat-missing
- square' and '--distance ibs flat-missing square', respectively, except that
- they generate space- instead of tab-delimited text matrices.
-
- --make-rel
-
- Write a lower-triangular variance-standardized realized relationship matrix
- to {output prefix}.rel, and corresponding IDs to {output prefix}.rel.id.
- * It is usually best to perform this calculation on a marker set in
- approximate linkage equilibrium.
- * 'square', 'square0', 'triangle', 'gz', 'bin', and 'bin4' act as they do
- on --distance.
- * The 'cov' modifier removes the variance standardization step, causing a
- covariance matrix to be calculated instead.
- * By default, the diagonal elements in the relationship matrix are based on
- --ibc's Fhat1; use the 'ibc2' or 'ibc3' modifiers to base them on Fhat2
- or Fhat3 instead.
- * The computation can be subdivided with --parallel.
- --make-grm-gz
- --make-grm-bin
- --make-grm-gz writes the relationships in GCTA's original gzipped list
- format, which describes one pair per line, while --make-grm-bin writes them
- in GCTA 1.1+'s single-precision triangular binary format. Note that these
- formats explicitly report the number of valid observations (where neither
- sample has a missing call) for each pair, which is useful input for some
- scripts.
- These computations can be subdivided with --parallel.
-
- --rel-cutoff {val}
- (alias: --grm-cutoff)
- Exclude one member of each pair of samples with relatedness greater than
- the given cutoff value (default 0.025). If no later operation will cause
- the list of remaining samples to be written to disk, this will save it to
- {output prefix}.rel.id.
- Note that maximizing the remaining sample size is equivalent to the NP-hard
- maximum independent set problem, so we use a greedy algorithm instead of
- guaranteeing optimality. (Use the --make-rel and --keep/--remove flags if
- you want to try to do better.)
-
- --ibs-test {permutation count}
- --groupdist {iters} {d}
- Given case/control phenotype data, these commands consider three subsets of
- the distance matrix: pairs of affected samples, affected-unaffected pairs,
- and pairs of unaffected samples. Each of these subsets has a distribution
- of pairwise genomic distances; --ibs-test uses permutation to estimate
- p-values re: which types of pairs are most similar, while --groupdist
- focuses on the differences between the centers of these distributions and
- estimates standard errors via delete-d jackknife.
-
- --regress-distance {iters} {d}
- Linear regression of pairwise genomic distances on pairwise average
- phenotypes and vice versa, using delete-d jackknife for standard errors. A
- scalar phenotype is required.
- * With less than two parameters, d is set to {number of people}^0.6 rounded
- down. With no parameters, 100k iterations are run.
- --regress-rel {iters} {d}
- Linear regression of pairwise genomic relationships on pairwise average
- phenotypes, and vice versa. Defaults for iters and d are the same as for
- --regress-distance.
-
- --genome
- Generate an identity-by-descent report.
- * It is usually best to perform this calculation on a marker set in
- approximate linkage equilibrium.
- * The 'rel-check' modifier excludes pairs of samples with different FIDs
- from the final report.
- * 'full' adds raw pairwise comparison data to the report.
- * The P(IBD=0/1/2) estimator employed by this command sometimes yields
- numbers outside the range [0,1]; by default, these are clipped. The
- 'unbounded' modifier turns off this clipping.
- * Then, when PI_HAT^2 < P(IBD=2), 'nudge' adjusts the final P(IBD=0/1/2)
- estimates to a theoretically possible configuration.
- * The computation can be subdivided with --parallel.
-
- --homozyg
-
- --homozyg-snp [min var count]
- --homozyg-kb [min length]
- --homozyg-density [max inverse density (kb/var)]
- --homozyg-gap [max internal gap kb length]
- --homozyg-het [max hets]
- --homozyg-window-snp [scanning window size]
- --homozyg-window-het [max hets in scanning window hit]
- --homozyg-window-missing [max missing calls in scanning window hit]
- --homozyg-window-threshold [min scanning window hit rate]
- These commands request a set of run-of-homozygosity reports, and allow you
- to customize how they are generated.
- * If you're satisfied with all the default settings described below, just
- use --homozyg with no modifiers. Otherwise, --homozyg lets you change a
- few binary settings:
- * 'group{-verbose}' adds a report on pools of overlapping runs of
- homozygosity. (Automatically set when --homozyg-match is present.)
- * With 'group{-verbose}', 'consensus-match' causes pairwise segmental
- matches to be called based on the variants in the pool's consensus
- segment, rather than the variants in the pairwise intersection.
- * Due to how the scanning window algorithm works, it is possible for a
- reported ROH to be adjacent to a few homozygous variants. The 'extend'
- modifier causes them to be included in the reported ROH if that
- wouldn't cause a violation of the --homozyg-density bound.
- * By default, segment bp lengths are calculated as [end bp position] -
- [start bp position] + 1. Therefore, reports normally differ slightly
- from PLINK 1.07, which does not add 1 at the end. For testing
- purposes, you can use the 'subtract-1-from-lengths' modifier to apply
- the old formula.
- * By default, only runs of homozygosity containing at least 100 variants,
- and of total length >= 1000 kilobases, are noted. You can change these
- minimums with --homozyg-snp and --homozyg-kb, respectively.
- * By default, a ROH must have at least one variant per 50 kb on average;
- change this bound with --homozyg-density.
- * By default, if two consecutive variants are more than 1000 kb apart, they
- cannot be in the same ROH; change this bound with --homozyg-gap.
- * By default, a ROH can contain an unlimited number of heterozygous calls;
- you can impose a limit with --homozyg-het.
- * By default, the scanning window contains 50 variants; change this with
- --homozyg-window-snp.
- * By default, a scanning window hit can contain at most 1 heterozygous
- call and 5 missing calls; change these limits with --homozyg-window-het
- and --homozyg-window-missing, respectively.
- * By default, for a variant to be eligible for inclusion in a ROH, the hit
- rate of all scanning windows containing the variant must be at least
- 0.05; change this threshold with --homozyg-window-threshold.
-
- --cluster
- Cluster samples using a pairwise similarity statistic (normally IBS).
- * The 'cc' modifier forces every cluster to have at least one case and one
- control.
- * The 'group-avg' modifier causes clusters to be joined based on average
- instead of minimum pairwise similarity.
- * The 'missing' modifier causes clustering to be based on
- identity-by-missingness instead of identity-by-state, and writes a
- space-delimited identity-by-missingness matrix to disk.
- * The 'only2' modifier causes only a .cluster2 file (which is valid input
- for --within) to be written; otherwise 2 other files will be produced.
- * By default, IBS ties are not broken in the same manner as PLINK 1.07, so
- final cluster solutions tend to differ. This is generally harmless.
- However, to simplify testing, you can use the 'old-tiebreaks' modifier to
- force emulation of the old algorithm.
-
- --pca {count}
- Calculates a variance-standardized relationship matrix (use
- --make-rel/--make-grm-gz/--make-grm-bin to dump it), and extracts the top
- 20 principal components.
- * It is usually best to perform this calculation on a marker set in
- approximate linkage equilibrium.
- * You can change the number of PCs by passing a numeric parameter.
- * The 'header' modifier adds a header line to the .eigenvec output file.
- (For compatibility with the GCTA flag of the same name, the default is no
- header line.)
- * The 'tabs' modifier causes the .eigenvec file(s) to be tab-delimited.
- * The 'var-wts' modifier requests an additional .eigenvec.var file with PCs
- expressed as variant weights instead of sample weights.
-
- --neighbour [n1] [n2]
- (alias: --neighbor)
- Report IBS distances from each sample to their n1th- to n2th-nearest
- neighbors, associated Z-scores, and the identities of those neighbors.
- Useful for outlier detection.
-
- --assoc
-
- --assoc
- --model
-
-
- Basic association analysis report.
- Given a case/control phenotype, --assoc performs a 1df chi-square allelic
- test, while --model performs 4 other tests as well (1df dominant gene
- action, 1df recessive gene action, 2df genotypic, Cochran-Armitage trend).
- * With 'fisher'/'fisher-midp', Fisher's exact test is used to generate
- p-values. 'fisher-midp' also applies Lancaster's mid-p adjustment.
- * 'perm' causes an adaptive permutation test to be performed.
- * 'mperm=[value]' causes a max(T) permutation test with the specified
- number of replications to be performed.
- * 'perm-count' causes the permutation test report to include counts instead
- of frequencies.
- * 'counts' causes --assoc to report allele counts instead of frequencies.
- * 'set-test' tests the significance of variant sets. Requires permutation;
- can be customized with --set-p/--set-r2/--set-max.
- * 'dom', 'rec', 'gen', and 'trend' force the corresponding test to be used
- as the basis for --model permutation. (By default, the most significant
- result among the allelic, dominant, and recessive tests is used.)
- * 'trend-only' causes only the trend test to be performed.
- Given a quantitative phenotype, --assoc normally performs a Wald test.
- * In this case, the 'qt-means' modifier causes trait means and standard
- deviations stratified by genotype to be reported as well.
- * 'lin' causes the Lin statistic to be computed, and makes it the basis for
- multiple-testing corrections and permutation tests.
- Several other flags (most notably, --aperm) can be used to customize the
- permutation test.
-
- --mh
- (alias: --cmh)
- --bd
- --mh2
- --homog
- Given a case/control phenotype and a set of clusters, --mh computes 2x2xK
- Cochran-Mantel-Haenszel statistics for each variant, while --bd also
- performs the Breslow-Day test for odds ratio homogeneity. Permutation and
- variant set testing based on the CMH (default) or Breslow-Day (when
- 'perm-bd' is present) statistic are supported.
- The following similar analyses are also available:
- * --mh2 swaps the roles of case/control status and cluster membership,
- performing a phenotype-stratified IxJxK Cochran-Mantel-Haenszel test on
- association between cluster assignments and genotypes.
- * --homog executes an alternative to the Breslow-Day test, based on
- partitioning of the chi-square statistic.
-
- --gxe {covariate index}
- Given both a quantitative phenotype and a case/control covariate loaded
- with --covar defining two groups, --gxe compares the regression coefficient
- derived from considering only members of one group to the regression
- coefficient derived from considering only members of the other. By
- default, the first covariate in the --covar file defines the groups; use
- e.g. '--gxe 3' to base them on the third covariate instead.
-
- --linear
-
-
- --logistic
+::
+
+
+ PLINK v1.90b4 64-bit (20 Mar 2017) www.cog-genomics.org/plink/1.9/
+ (C) 2005-2017 Shaun Purcell, Christopher Chang GNU General Public License v3
+
+ In the command line flag definitions that follow,
+ * [square brackets] denote a required parameter, where the text between the
+ brackets describes its nature.
+ * <angle brackets> denote an optional modifier (or if '|' is present, a set
+ of mutually exclusive optional modifiers). Use the EXACT text in the
+ definition, e.g. '--dummy acgt'.
+ * There's one exception to the angle brackets/exact text rule: when an angle
+ bracket term ends with '=[value]', '[value]' designates a variable
+ parameter.
+ * {curly braces} denote an optional parameter, where the text between the
+ braces describes its nature.
+ * An ellipsis (...) indicates that you may enter multiple parameters of the
+ specified type.
+
+ plink [input flag(s)...] {command flag(s)...} {other flag(s)...}
+ plink --help {flag name(s)...}
+
+ Most PLINK runs require exactly one main input fileset. The following flags
+ are available for defining its form and location:
+
+ --bfile {prefix} : Specify .bed + .bim + .fam prefix (default 'plink').
+ --bed [filename] : Specify full name of .bed file.
+ --bim [filename] : Specify full name of .bim file.
+ --fam [filename] : Specify full name of .fam file.
+
+ --keep-autoconv : With --file/--tfile/--lfile/--vcf/--bcf/--data/--23file,
+ don't delete autogenerated binary fileset at end of run.
+
+ --file {prefix} : Specify .ped + .map filename prefix (default 'plink').
+ --ped [filename] : Specify full name of .ped file.
+ --map [filename] : Specify full name of .map file.
+
+ --no-fid : .fam/.ped file does not contain column 1 (family ID).
+ --no-parents : .fam/.ped file does not contain columns 3-4 (parents).
+ --no-sex : .fam/.ped file does not contain column 5 (sex).
+ --no-pheno : .fam/.ped file does not contain column 6 (phenotype).
+
+ --tfile {prefix} : Specify .tped + .tfam filename prefix (default 'plink').
+ --tped [fname] : Specify full name of .tped file.
+ --tfam [fname] : Specify full name of .tfam file.
+
+ --lfile {prefix} : Specify .lgen + .map + .fam (long-format fileset) prefix.
+ --lgen [fname] : Specify full name of .lgen file.
+ --reference [fn] : Specify default allele file accompanying .lgen input.
+ --allele-count : When used with --lfile/--lgen + --reference, specifies
+ that the .lgen file contains reference allele counts.
+
+ --vcf [filename] : Specify full name of .vcf or .vcf.gz file.
+ --bcf [filename] : Specify full name of BCF2 file.
+
+ --data {prefix} : Specify Oxford .gen + .sample prefix (default 'plink').
+ --gen [filename] : Specify full name of .gen or .gen.gz file.
+ --bgen [f] : Specify full name of .bgen file.
+ --sample [fname] : Specify full name of .sample file.
+
+ --23file [fname] {FID} {IID} {sex} {pheno} {pat. ID} {mat. ID} :
+ Specify 23andMe input file.
+
+ --grm-gz {prfx} : Specify .grm.gz + .grm.id (GCTA rel. matrix) prefix.
+ --grm-bin {prfx} : Specify .grm.bin + .grm.N.bin + .grm.id (GCTA triangular
+ binary relationship matrix) filename prefix.
+
+ --dummy [sample ct] [SNP ct] {missing geno freq} {missing pheno freq}
+ <acgt | 1234 | 12> <scalar-pheno>
+ This generates a fake input dataset with the specified number of samples
+ and SNPs. By default, the missing genotype and phenotype frequencies are
+ zero, and genotypes are As and Bs (change the latter with
+ 'acgt'/'1234'/'12'). The 'scalar-pheno' modifier causes a normally
+ distributed scalar phenotype to be generated instead of a binary one.
+
+ --simulate [simulation parameter file]
+ --simulate-qt [simulation parameter file]
+ --simulate generates a fake input dataset with disease-associated SNPs,
+ while --simulate-qt generates a dataset with quantitative trait loci.
+
+ Output files have names of the form 'plink.{extension}' by default. You can
+ change the 'plink' prefix with
+
+ --out [prefix] : Specify prefix for output files.
+
+ Most runs also require at least one of the following commands:
+
+ --make-bed
+ Create a new binary fileset. Unlike the automatic text-to-binary
+ converters (which only heed chromosome filters), this supports all of
+ PLINK's filtering flags.
+ --make-just-bim
+ --make-just-fam
+ Variants of --make-bed which only write a new .bim or .fam file. Can be
+ used with only .bim/.fam input.
+ USE THESE CAUTIOUSLY. It is very easy to desynchronize your binary
+ genotype data and your .bim/.fam indexes if you use these commands
+ improperly. If you have any doubt, stick with --make-bed.
+
+ --recode [output format] <01 | 12>
+
+ Create a new text fileset with all filters applied. The following output
+ formats are supported:
+ * '23': 23andMe 4-column format. This can only be used on a single
+ sample's data (--keep may be handy), and does not support multicharacter
+ allele codes.
+ * 'A': Sample-major additive (0/1/2) coding, suitable for loading from R.
+ If you need uncounted alleles to be named in the header line, add the
+ 'include-alt' modifier.
+ * 'AD': Sample-major additive (0/1/2) + dominant (het=1/hom=0) coding.
+ Also supports 'include-alt'.
+ * 'A-transpose': Variant-major 0/1/2.
+ * 'beagle': Unphased per-autosome .dat and .map files, readable by early
+ BEAGLE versions.
+ * 'beagle-nomap': Single .beagle.dat file.
+ * 'bimbam': Regular BIMBAM format.
+ * 'bimbam-1chr': BIMBAM format, with a two-column .pos.txt file. Does not
+ support multiple chromosomes.
+ * 'fastphase': Per-chromosome fastPHASE files, with
+ .chr-[chr #].recode.phase.inp filename extensions.
+ * 'fastphase-1chr': Single .recode.phase.inp file. Does not support
+ multiple chromosomes.
+ * 'HV': Per-chromosome Haploview files, with .chr-[chr #][.ped + .info]
+ filename extensions.
+ * 'HV-1chr': Single Haploview .ped + .info file pair. Does not support
+ multiple chromosomes.
+ * 'lgen': PLINK 1 long-format (.lgen + .fam + .map), loadable with --lfile.
+ * 'lgen-ref': .lgen + .fam + .map + .ref, loadable with --lfile +
+ --reference.
+ * 'list': Single genotype-based list, up to 4 lines per variant. To omit
+ nonmale genotypes on the Y chromosome, add the 'omit-nonmale-y' modifier.
+ * 'rlist': .rlist + .fam + .map fileset, where the .rlist file is a
+ genotype-based list which omits the most common genotype for each
+ variant. Also supports 'omit-nonmale-y'.
+ * 'oxford': Oxford-format .gen + .sample. With the 'gen-gz' modifier, the
+ .gen file is gzipped.
+ * 'ped': PLINK 1 sample-major (.ped + .map), loadable with --file.
+ * 'compound-genotypes': Same as 'ped', except that the space between each
+ pair of same-variant allele codes is removed.
+ * 'structure': Structure-format.
+ * 'transpose': PLINK 1 variant-major (.tped + .tfam), loadable with
+ --tfile.
+ * 'vcf', 'vcf-fid', 'vcf-iid': VCFv4.2. 'vcf-fid' and 'vcf-iid' cause
+ family IDs or within-family IDs respectively to be used for the sample
+ IDs in the last header row, while 'vcf' merges both IDs and puts an
+ underscore between them. If the 'bgz' modifier is added, the VCF file is
+ block-gzipped.
+ The A2 allele is saved as the reference and normally flagged as not based
+ on a real reference genome (INFO:PR). When it is important for reference
+ alleles to be correct, you'll also want to include --a2-allele and
+ --real-ref-alleles in your command.
+ In addition,
+ * The '12' modifier causes A1 (usually minor) alleles to be coded as '1'
+ and A2 alleles to be coded as '2', while '01' maps A1 -> 0 and A2 -> 1.
+ * The 'tab' modifier makes the output mostly tab-delimited instead of
+ mostly space-delimited. 'tabx' and 'spacex' force all tabs and all
+ spaces, respectively.
+
+ --flip-scan
+ (alias: --flipscan)
+ LD-based scan for case/control strand inconsistency.
+
+ --write-covar
+ If a --covar file is loaded, --make-bed/--make-just-fam and --recode
+ automatically generate an updated version (with all filters applied).
+ However, if you do not wish to simultaneously generate a new genotype file,
+ you can use --write-covar to just produce a pruned covariate file.
+
+ --write-cluster
+ If clusters are specified with --within/--family, this generates a new
+ cluster file (with all filters applied). The 'omit-unassigned' modifier
+ causes unclustered samples to be omitted from the file; otherwise their
+ cluster is 'NA'.
+
+ --write-set
+ --set-table
+ If sets have been defined, --write-set dumps 'END'-terminated set
+ membership lists to {output prefix}.set, while --set-table writes a
+ variant-by-set membership table to {output prefix}.set.table.
+
+ --merge [.ped filename] [.map filename]
+ --merge [text fileset prefix]
+ --bmerge [.bed filename] [.bim filename] [.fam filename]
+ --bmerge [binary fileset prefix]
+ Merge the given fileset with the initially loaded fileset, writing the
+ result to {output prefix}.bed + .bim + .fam. (It is no longer necessary to
+ simultaneously specify --make-bed.)
+ --merge-list [filename]
+ Merge all filesets named in the text file with the reference fileset, if
+ one was specified. (However, this can also be used *without* a reference;
+ in that case, the newly created fileset is then treated as the reference by
+ most other PLINK operations.) The text file is interpreted as follows:
+ * If a line contains only one name, it is assumed to be the prefix for a
+ binary fileset.
+ * If a line contains exactly two names, they are assumed to be the full
+ filenames for a text fileset (.ped first, then .map).
+ * If a line contains exactly three names, they are assumed to be the full
+ filenames for a binary fileset (.bed, then .bim, then .fam).
+
+ --write-snplist
+ --list-23-indels
+ --write-snplist writes a .snplist file listing the names of all variants
+ which pass the filters and inclusion thresholds you've specified, while
+ --list-23-indels writes the subset with 23andMe-style indel calls (D/I
+ allele codes).
+
+ --list-duplicate-vars
+ --list-duplicate-vars writes a .dupvar file describing all groups of
+ variants with matching positions and allele codes.
+ * By default, A1/A2 allele assignments are ignored; use 'require-same-ref'
+ to override this.
+ * Normally, the report contains position and allele codes. To remove them
+ (and produce a file directly usable with e.g. --extract/--exclude), use
+ 'ids-only'. Note that this command will fail in 'ids-only' mode if any
+ of the reported IDs are not unique.
+ * 'suppress-first' causes the first variant ID in each group to be omitted
+ from the report.
+
+ --freq
+ --freqx
+ --freq generates a basic allele frequency (or count, if the 'counts'
+ modifier is present) report. This can be combined with --within/--family
+ to produce a cluster-stratified allele frequency/count report instead, or
+ the 'case-control' modifier to report case and control allele frequencies
+ separately.
+ --freqx generates a more detailed genotype count report, designed for use
+ with --read-freq.
+
+ --missing
+ Generate sample- and variant-based missing data reports. If clusters are
+ defined, the variant-based report is cluster-stratified. 'gz' causes the
+ output files to be gzipped.
+
+ --test-mishap
+ Check for association between missing calls and flanking haplotypes.
+
+ --hardy
+ Generate a Hardy-Weinberg exact test p-value report. (This does NOT
+ simultaneously filter on the p-value any more; use --hwe for that.) With
+ the 'midp' modifier, the test applies the mid-p adjustment described in
+ Graffelman J, Moreno V (2013) The mid p-value in exact tests for
+ Hardy-Weinberg Equilibrium.
+
+ --mendel
+ Generate a Mendel error report. The 'summaries-only' modifier causes the
+ .mendel file (listing every single error) to be skipped.
+
+ --het
+ --ibc
+ Estimate inbreeding coefficients. --het reports method-of-moments
+ estimates, while --ibc calculates all three values described in Yang J, Lee
+ SH, Goddard ME and Visscher PM (2011) GCTA: A Tool for Genome-wide Complex
+ Trait Analysis. (That paper also describes the relationship matrix
+ computation we reimplement.)
+ * These functions require decent MAF estimates. If there are very few
+ samples in your immediate fileset, --read-freq is practically mandatory
+ since imputed MAFs are wildly inaccurate in that case.
+ * They also assume the marker set is in approximate linkage equilibrium.
+ * By default, --het omits the n/(n-1) multiplier in Nei's expected
+ homozygosity formula. The 'small-sample' modifier causes it to be
+ included, while forcing --het to use MAFs imputed from founders in the
+ immediate dataset.
+
+ --check-sex {female max F} {male min F}
+ --check-sex ycount {female max F} {male min F} {female max Y obs}
+ {male min Y obs}
+ --check-sex y-only {female max Y obs} {male min Y obs}
+ --impute-sex {female max F} {male min F}
+ --impute-sex ycount {female max F} {male min F} {female max Y obs}
+ {male min Y obs}
+ --impute-sex y-only {female max Y obs} {male min Y obs}
+ --check-sex normally compares sex assignments in the input dataset with
+ those imputed from X chromosome inbreeding coefficients.
+ * Make sure that the X chromosome pseudo-autosomal region has been split
+ off (with e.g. --split-x) before using this.
+ * You also need decent MAF estimates (so, with very few samples in your
+ immediate fileset, use --read-freq), and your marker set should be in
+ approximate linkage equilibrium.
+ * By default, F estimates smaller than 0.2 yield female calls, and values
+ larger than 0.8 yield male calls. If you pass numeric parameter(s) to
+ --check-sex, the first two control these thresholds.
+ There are now two modes which consider Y chromosome data.
+ * In 'ycount' mode, gender is still imputed from the X chromosome, but
+ female calls are downgraded to ambiguous whenever more than 0 nonmissing
+ Y genotypes are present, and male calls are downgraded when fewer than 0
+ are present. (Note that these are counts, not rates.) These thresholds
+ are controllable with --check-sex ycount's optional 3rd and 4th numeric
+ parameters.
+ * In 'y-only' mode, gender is imputed from nonmissing Y genotype counts.
+ The male minimum threshold defaults to 1 instead of zero in this case.
+ --impute-sex changes sex assignments to the imputed values, and is
+ otherwise identical to --check-sex. It must be used with
+ --make-bed/--recode/--write-covar.
+
+ --fst
+ (alias: --Fst)
+ Estimate Wright's Fst for each autosomal diploid variant using the method
+ introduced in Weir BS, Cockerham CC (1984) Estimating F-statistics for the
+ analysis of population structure, given a set of subpopulations defined via
+ --within. Raw and weighted global means are also reported.
+ * If you're interested in the global means, it is usually best to perform
+ this calculation on a marker set in approximate linkage equilibrium.
+ * If you have only two subpopulations, you can represent them with
+ case/control status and use the 'case-control' modifier.
+
+ --indep [window size] [step size (variant ct)] [VIF threshold]
+ --indep-pairwise [window size] [step size (variant ct)] [r^2 threshold]
+ --indep-pairphase [window size] [step size (variant ct)] [r^2 threshold]
+ Generate a list of markers in approximate linkage equilibrium. With the
+ 'kb' modifier, the window size is in kilobase instead of variant count
+ units. (Pre-'kb' space is optional, i.e. '--indep-pairwise 500 kb 5 0.5'
+ and '--indep-pairwise 500kb 5 0.5' have the same effect.)
+ Note that you need to rerun PLINK using --extract or --exclude on the
+ .prune.in/.prune.out file to apply the list to another computation.
+
+ --r
+
+ --r2
+
+ LD statistic reports. --r yields raw inter-variant correlations, while
+ --r2 reports their squares. You can request results for all pairs in
+ matrix format (if you specify 'bin' or one of the shape modifiers), all
+ pairs in table format ('inter-chr'), or a limited window in table format
+ (default).
+ * The 'gz' modifier causes the output text file to be gzipped.
+ * 'bin' causes the output matrix to be written in double-precision binary
+ format, while 'bin4' specifies single-precision binary. The matrix is
+ square if no shape is explicitly specified.
+ * By default, text matrices are tab-delimited; 'spaces' switches this.
+ * 'in-phase' adds a column with in-phase allele pairs to table-formatted
+ reports. (This cannot be used with very long allele codes.)
+ * 'dprime' adds the absolute value of Lewontin's D-prime statistic to
+ table-formatted reports, and forces both r/r^2 and D-prime to be based on
+ the maximum likelihood solution to the cubic equation discussed in Gaunt
+ T, Rodriguez S, Day I (2007) Cubic exact solutions for the estimation of
+ pairwise haplotype frequencies.
+ 'dprime-signed' keeps the sign, while 'd' skips division by D_{max}.
+ * 'with-freqs' adds MAF columns to table-formatted reports.
+ * Since the resulting file can easily be huge, you're required to add the
+ 'yes-really' modifier when requesting an unfiltered, non-distributed all
+ pairs computation on more than 400k variants.
+ * These computations can be subdivided with --parallel (even when the
+ 'square' modifier is active).
+ --ld [variant ID] [variant ID]
+ This displays haplotype frequencies, r^2, and D' for a single pair of
+ variants. When there are multiple biologically possible solutions to the
+ haplotype frequency cubic equation, all are displayed (instead of just the
+ maximum likelihood solution identified by --r/--r2), along with HWE exact
+ test statistics.
+
+ --show-tags [filename]
+ --show-tags all
+ * If a file is specified, list all variants which tag at least one variant
+ named in the file. (This will normally be a superset of the original
+ list, since a variant is considered to tag itself here.)
+ * If 'all' mode is specified, for each variant, each *other* variant which
+ tags it is reported.
+
+ --blocks
+ Estimate haplotype blocks, via Haploview's interpretation of the block
+ definition suggested by Gabriel S et al. (2002) The Structure of Haplotype
+ Blocks in the Human Genome.
+ * Normally, samples with missing phenotypes are not considered by this
+ computation; the 'no-pheno-req' modifier lifts this restriction.
+ * Normally, size-2 blocks may not span more than 20kb, and size-3 blocks
+ are limited to 30kb. The 'no-small-max-span' modifier removes these
+ limits.
+ The .blocks file is valid input for PLINK 1.07's --hap command. However,
+ the --hap... family of flags has not been reimplemented in PLINK 1.9 due to
+ poor phasing accuracy relative to other software; for now, we recommend
+ using BEAGLE instead of PLINK for case/control haplotype association
+ analysis. (You can use '--recode beagle' to export data to BEAGLE 3.3.)
+ We apologize for the inconvenience, and plan to develop variants of the
+ --hap... flags which handle pre-phased data effectively.
+
+ --distance <1-ibs>
+
+ Write a lower-triangular tab-delimited table of (weighted) genomic
+ distances in allele count units to {output prefix}.dist, and a list of the
+ corresponding sample IDs to {output prefix}.dist.id. The first row of the
+ .dist file contains a single {genome 1-genome 2} distance, the second row
+ has the {genome 1-genome 3} and {genome 2-genome 3} distances in that
+ order, etc.
+ * It is usually best to perform this calculation on a marker set in
+ approximate linkage equilibrium.
+ * If the 'square' or 'square0' modifier is present, a square matrix is
+ written instead; 'square0' fills the upper right triangle with zeroes.
+ * If the 'gz' modifier is present, a compressed .dist.gz file is written
+ instead of a plain text file.
+ * If the 'bin' modifier is present, a binary (square) matrix of
+ double-precision floating point values, suitable for loading from R, is
+ instead written to {output prefix}.dist.bin. ('bin4' specifies
+ single-precision numbers instead.) This can be combined with 'square0'
+ if you still want the upper right zeroed out, or 'triangle' if you don't
+ want to pad the upper right at all.
+ * If the 'ibs' modifier is present, an identity-by-state matrix is written
+ to {output prefix}.mibs. '1-ibs' causes distances expressed as genomic
+ proportions (i.e. 1 - IBS) to be written to {output prefix}.mdist.
+ Combine with 'allele-ct' if you want to generate the usual .dist file as
+ well.
+ * By default, distance rescaling in the presence of missing genotype calls
+ is sensitive to allele count distributions: if variant A contributes, on
+ average, twice as much to other pairwise distances as variant B, a
+ missing call at variant A will result in twice as large a missingness
+ correction. To turn this off (because e.g. your missing calls are highly
+ nonrandom), use the 'flat-missing' modifier.
+ * The computation can be subdivided with --parallel.
+ --distance-matrix
+ --ibs-matrix
+ These deprecated commands are equivalent to '--distance 1-ibs flat-missing
+ square' and '--distance ibs flat-missing square', respectively, except that
+ they generate space- instead of tab-delimited text matrices.
+
+ --make-rel
+
+ Write a lower-triangular variance-standardized realized relationship matrix
+ to {output prefix}.rel, and corresponding IDs to {output prefix}.rel.id.
+ * It is usually best to perform this calculation on a marker set in
+ approximate linkage equilibrium.
+ * 'square', 'square0', 'triangle', 'gz', 'bin', and 'bin4' act as they do
+ on --distance.
+ * The 'cov' modifier removes the variance standardization step, causing a
+ covariance matrix to be calculated instead.
+ * By default, the diagonal elements in the relationship matrix are based on
+ --ibc's Fhat1; use the 'ibc2' or 'ibc3' modifiers to base them on Fhat2
+ or Fhat3 instead.
+ * The computation can be subdivided with --parallel.
+ --make-grm-gz
+ --make-grm-bin
+ --make-grm-gz writes the relationships in GCTA's original gzipped list
+ format, which describes one pair per line, while --make-grm-bin writes them
+ in GCTA 1.1+'s single-precision triangular binary format. Note that these
+ formats explicitly report the number of valid observations (where neither
+ sample has a missing call) for each pair, which is useful input for some
+ scripts.
+ These computations can be subdivided with --parallel.
+
+ --rel-cutoff {val}
+ (alias: --grm-cutoff)
+ Exclude one member of each pair of samples with relatedness greater than
+ the given cutoff value (default 0.025). If no later operation will cause
+ the list of remaining samples to be written to disk, this will save it to
+ {output prefix}.rel.id.
+ Note that maximizing the remaining sample size is equivalent to the NP-hard
+ maximum independent set problem, so we use a greedy algorithm instead of
+ guaranteeing optimality. (Use the --make-rel and --keep/--remove flags if
+ you want to try to do better.)
+
+ --ibs-test {permutation count}
+ --groupdist {iters} {d}
+ Given case/control phenotype data, these commands consider three subsets of
+ the distance matrix: pairs of affected samples, affected-unaffected pairs,
+ and pairs of unaffected samples. Each of these subsets has a distribution
+ of pairwise genomic distances; --ibs-test uses permutation to estimate
+ p-values re: which types of pairs are most similar, while --groupdist
+ focuses on the differences between the centers of these distributions and
+ estimates standard errors via delete-d jackknife.
+
+ --regress-distance {iters} {d}
+ Linear regression of pairwise genomic distances on pairwise average
+ phenotypes and vice versa, using delete-d jackknife for standard errors. A
+ scalar phenotype is required.
+ * With fewer than two parameters, d is set to {number of people}^0.6 rounded
+ down. With no parameters, 100k iterations are run.
+ --regress-rel {iters} {d}
+ Linear regression of pairwise genomic relationships on pairwise average
+ phenotypes, and vice versa. Defaults for iters and d are the same as for
+ --regress-distance.
+
+ --genome
+ Generate an identity-by-descent report.
+ * It is usually best to perform this calculation on a marker set in
+ approximate linkage equilibrium.
+ * The 'rel-check' modifier excludes pairs of samples with different FIDs
+ from the final report.
+ * 'full' adds raw pairwise comparison data to the report.
+ * The P(IBD=0/1/2) estimator employed by this command sometimes yields
+ numbers outside the range [0,1]; by default, these are clipped. The
+ 'unbounded' modifier turns off this clipping.
+ * Then, when PI_HAT^2 < P(IBD=2), 'nudge' adjusts the final P(IBD=0/1/2)
+ estimates to a theoretically possible configuration.
+ * The computation can be subdivided with --parallel.
+
+ --homozyg
+
+ --homozyg-snp [min var count]
+ --homozyg-kb [min length]
+ --homozyg-density [max inverse density (kb/var)]
+ --homozyg-gap [max internal gap kb length]
+ --homozyg-het [max hets]
+ --homozyg-window-snp [scanning window size]
+ --homozyg-window-het [max hets in scanning window hit]
+ --homozyg-window-missing [max missing calls in scanning window hit]
+ --homozyg-window-threshold [min scanning window hit rate]
+ These commands request a set of run-of-homozygosity reports, and allow you
+ to customize how they are generated.
+ * If you're satisfied with all the default settings described below, just
+ use --homozyg with no modifiers. Otherwise, --homozyg lets you change a
+ few binary settings:
+ * 'group{-verbose}' adds a report on pools of overlapping runs of
+ homozygosity. (Automatically set when --homozyg-match is present.)
+ * With 'group{-verbose}', 'consensus-match' causes pairwise segmental
+ matches to be called based on the variants in the pool's consensus
+ segment, rather than the variants in the pairwise intersection.
+ * Due to how the scanning window algorithm works, it is possible for a
+ reported ROH to be adjacent to a few homozygous variants. The 'extend'
+ modifier causes them to be included in the reported ROH if that
+ wouldn't cause a violation of the --homozyg-density bound.
+ * By default, segment bp lengths are calculated as [end bp position] -
+ [start bp position] + 1. Therefore, reports normally differ slightly
+ from PLINK 1.07, which does not add 1 at the end. For testing
+ purposes, you can use the 'subtract-1-from-lengths' modifier to apply
+ the old formula.
+ * By default, only runs of homozygosity containing at least 100 variants,
+ and of total length >= 1000 kilobases, are noted. You can change these
+ minimums with --homozyg-snp and --homozyg-kb, respectively.
+ * By default, a ROH must have at least one variant per 50 kb on average;
+ change this bound with --homozyg-density.
+ * By default, if two consecutive variants are more than 1000 kb apart, they
+ cannot be in the same ROH; change this bound with --homozyg-gap.
+ * By default, a ROH can contain an unlimited number of heterozygous calls;
+ you can impose a limit with --homozyg-het.
+ * By default, the scanning window contains 50 variants; change this with
+ --homozyg-window-snp.
+ * By default, a scanning window hit can contain at most 1 heterozygous
+ call and 5 missing calls; change these limits with --homozyg-window-het
+ and --homozyg-window-missing, respectively.
+ * By default, for a variant to be eligible for inclusion in a ROH, the hit
+ rate of all scanning windows containing the variant must be at least
+ 0.05; change this threshold with --homozyg-window-threshold.
+
+ --cluster
+ Cluster samples using a pairwise similarity statistic (normally IBS).
+ * The 'cc' modifier forces every cluster to have at least one case and one
+ control.
+ * The 'group-avg' modifier causes clusters to be joined based on average
+ instead of minimum pairwise similarity.
+ * The 'missing' modifier causes clustering to be based on
+ identity-by-missingness instead of identity-by-state, and writes a
+ space-delimited identity-by-missingness matrix to disk.
+ * The 'only2' modifier causes only a .cluster2 file (which is valid input
+ for --within) to be written; otherwise 2 other files will be produced.
+ * By default, IBS ties are not broken in the same manner as PLINK 1.07, so
+ final cluster solutions tend to differ. This is generally harmless.
+ However, to simplify testing, you can use the 'old-tiebreaks' modifier to
+ force emulation of the old algorithm.
+
+ --pca {count}
+ Calculates a variance-standardized relationship matrix (use
+ --make-rel/--make-grm-gz/--make-grm-bin to dump it), and extracts the top
+ 20 principal components.
+ * It is usually best to perform this calculation on a marker set in
+ approximate linkage equilibrium.
+ * You can change the number of PCs by passing a numeric parameter.
+ * The 'header' modifier adds a header line to the .eigenvec output file.
+ (For compatibility with the GCTA flag of the same name, the default is no
+ header line.)
+ * The 'tabs' modifier causes the .eigenvec file(s) to be tab-delimited.
+ * The 'var-wts' modifier requests an additional .eigenvec.var file with PCs
+ expressed as variant weights instead of sample weights.
+
+ --neighbour [n1] [n2]
+ (alias: --neighbor)
+ Report IBS distances from each sample to their n1th- to n2th-nearest
+ neighbors, associated Z-scores, and the identities of those neighbors.
+ Useful for outlier detection.
+
+ --assoc
+
+ --assoc
+ --model
+
+
+ Basic association analysis report.
+ Given a case/control phenotype, --assoc performs a 1df chi-square allelic
+ test, while --model performs 4 other tests as well (1df dominant gene
+ action, 1df recessive gene action, 2df genotypic, Cochran-Armitage trend).
+ * With 'fisher'/'fisher-midp', Fisher's exact test is used to generate
+ p-values. 'fisher-midp' also applies Lancaster's mid-p adjustment.
+ * 'perm' causes an adaptive permutation test to be performed.
+ * 'mperm=[value]' causes a max(T) permutation test with the specified
+ number of replications to be performed.
+ * 'perm-count' causes the permutation test report to include counts instead
+ of frequencies.
+ * 'counts' causes --assoc to report allele counts instead of frequencies.
+ * 'set-test' tests the significance of variant sets. Requires permutation;
+ can be customized with --set-p/--set-r2/--set-max.
+ * 'dom', 'rec', 'gen', and 'trend' force the corresponding test to be used
+ as the basis for --model permutation. (By default, the most significant
+ result among the allelic, dominant, and recessive tests is used.)
+ * 'trend-only' causes only the trend test to be performed.
+ Given a quantitative phenotype, --assoc normally performs a Wald test.
+ * In this case, the 'qt-means' modifier causes trait means and standard
+ deviations stratified by genotype to be reported as well.
+ * 'lin' causes the Lin statistic to be computed, and makes it the basis for
+ multiple-testing corrections and permutation tests.
+ Several other flags (most notably, --aperm) can be used to customize the
+ permutation test.
+
+ --mh
+ (alias: --cmh)
+ --bd
+ --mh2
+ --homog
+ Given a case/control phenotype and a set of clusters, --mh computes 2x2xK
+ Cochran-Mantel-Haenszel statistics for each variant, while --bd also
+ performs the Breslow-Day test for odds ratio homogeneity. Permutation and
+ variant set testing based on the CMH (default) or Breslow-Day (when
+ 'perm-bd' is present) statistic are supported.
+ The following similar analyses are also available:
+ * --mh2 swaps the roles of case/control status and cluster membership,
+ performing a phenotype-stratified IxJxK Cochran-Mantel-Haenszel test on
+ association between cluster assignments and genotypes.
+ * --homog executes an alternative to the Breslow-Day test, based on
+ partitioning of the chi-square statistic.
+
+ --gxe {covariate index}
+ Given both a quantitative phenotype and a case/control covariate loaded
+ with --covar defining two groups, --gxe compares the regression coefficient
+ derived from considering only members of one group to the regression
+ coefficient derived from considering only members of the other. By
+ default, the first covariate in the --covar file defines the groups; use
+ e.g. '--gxe 3' to base them on the third covariate instead.
+
+ --linear
-
- Multi-covariate association analysis on a quantitative (--linear) or
- case/control (--logistic) phenotype. Normally used with --covar.
- * 'perm' normally causes an adaptive permutation test to be performed on
- the main effect, while 'mperm=[value]' starts a max(T) permutation test.
- * 'perm-count' causes the permutation test report to include counts instead
- of frequencies.
- * 'set-test' tests the significance of variant sets. Requires permutation;
- can be customized with --set-p/--set-r2/--set-max.
- * The 'genotypic' modifier adds an additive effect/dominance deviation 2df
- joint test (0/1/2 and 0/1/0 coding), while 'hethom' uses 0/0/1 and 0/1/0
- coding instead. If permutation is also requested, these modifiers cause
- permutation to be based on the joint test.
- * 'dominant' and 'recessive' specify a model assuming full dominance or
- recessiveness, respectively, for the A1 allele.
- * 'no-snp' causes regression to be performed only on the phenotype and the
- covariates, without reference to genomic data. If permutation is also
- requested, results are reported for all covariates.
- * 'hide-covar' removes covariate-specific lines from the report.
- * By default, sex (male = 1, female = 0) is automatically added as a
- covariate on X chromosome variants, and nowhere else. The 'sex' modifier
- causes it to be added everywhere, while 'no-x-sex' excludes it.
- * 'interaction' adds genotype x covariate interactions to the model. This
- cannot be used with the usual permutation tests; use --tests to define
- the permutation test statistic instead.
- * 'intercept' causes intercepts to be included in the main report.
- * For logistic regressions, the 'beta' modifier causes regression
- coefficients instead of odds ratios to be reported.
- * With --linear, the 'standard-beta' modifier standardizes the phenotype
- and all predictors to zero mean and unit variance before regression.
-
- --dosage [allele dosage file]
-
-
- --dosage [list file] list
-
-
- --write-dosage
- Process (possibly gzipped) text files with variant-major allelic dosage
- data. This cannot be used with a regular input fileset; instead, you must
- *only* specify a .fam and possibly a .map file, and you can't specify any
- other commands.
- * PLINK 2.0 will have first-class support for genotype probabilities. An
- equivalent data import flag will be provided then, and --dosage will be
- retired.
- * By default, --dosage assumes that only one allelic dosage file should be
- loaded. To specify multiple files,
- 1. Create a master list with one entry per line. There are normally two
- supported formats for this list: just a filename per line, or variant
- batch numbers in the first column and filenames in the second.
- 2. Provide the name of that list as the first --dosage parameter.
- 3. Add the 'list' modifier.
- * By default, --dosage assumes the allelic dosage file(s) contain a header
- line, which has 'SNP' in column i+1, 'A1' in column i+j+2, 'A2' in column
- i+j+3, and sample FID/IIDs starting from column i+j+k+4. (i/j/k are
- normally zero, but can be changed with 'skip0', 'skip1', and 'skip2'
- respectively.) If such a header line is not present,
- * when all samples appear in the same order as they do in the .fam file,
- you can use the 'noheader' modifier.
- * Otherwise, use the 'sepheader' modifier, and append sample ID filenames
- to your 'list' file entries.
- * The 'format' modifier lets you specify the number of values used to
- represent each dosage. 'format=1' normally indicates a single 0..2 A1
- expected count; 'dose1' modifies this to a 0..1 frequency. 'format=2'
- (the default) indicates a 0..1 homozygous A1 likelihood followed by a
- 0..1 het likelihood, while 'format=3' indicates 0..1 hom A1, 0..1 het,
- 0..1 hom A2.
- * 'Zout' causes the output file to be gzipped.
- * Normally, an association analysis is performed. 'standard-beta' and
- 'sex' behave as they do with --linear/--logistic.
- 'case-control-freqs' causes case and control allele frequencies to be
- reported separately.
- * There are three alternate modes which cause the association analysis to
- be skipped.
- * 'occur' requests a simple variant occurrence report.
- * --write-dosage causes a simple merged file matching the 'format'
- specification (not including 'dose1') to be generated.
- * --score applies a linear scoring system to the dosages.
-
- --lasso [h2 estimate] {min lambda}
- Estimate variant effect sizes via LASSO regression. You must provide an
- additive heritability estimate to calibrate the regression.
- Note that this method may require a very large sample size (e.g. hundreds
- of thousands) to be effective on complex polygenic traits.
-
- --test-missing
- Check for association between missingness and case/control status, using
- Fisher's exact test. The 'midp' modifier causes Lancaster's mid-p
- adjustment to be applied.
-
- --make-perm-pheno [ct]
- Generate phenotype permutations and write them to disk, without invoking an
- association test.
-
- --tdt
-
- Report transmission disequilibrium test statistics, given case/control
- phenotypes and pedigree information.
- * A Mendel error check is performed before the main tests; offending
- genotypes are treated as missing by this analysis.
- * By default, the basic TDT p-value is based on a chi-square test unless
- you request the exact binomial test with 'exact' or 'exact-midp'.
- * 'perm'/'mperm=[value]' requests a family-based adaptive or max(T)
- permutation test. By default, the permutation test statistic is the
- basic TDT p-value; 'parentdt1'/'parentdt2' cause parenTDT or combined
- test p-values, respectively, to be considered instead.
- * 'set-test' tests the significance of variant sets. This cannot be used
- with exact tests for now.
- The 'poo' modifier causes a parent-of-origin analysis to be performed
- instead, with transmissions from heterozygous fathers and heterozygous
- mothers considered separately.
- * The parent-of-origin analysis does not currently support exact tests.
- * By default, the permutation test statistic is the absolute
- parent-of-origin test Z score; 'pat'/'mat' cause paternal or maternal TDT
- chi-square statistics, respectively, to be considered instead.
-
- --qfam
- --qfam-parents
- --qfam-between
- --qfam-total
- QFAM family-based association test for quantitative traits.
- * A Mendel error check is performed before the main tests; offending
- genotypes are treated as missing by this analysis.
- * This procedure requires permutation. 'perm' and 'perm-count' have the
- usual meanings. However, 'mperm=[value]' just specifies a fixed number
- of permutations; the method does not support a proper max(T) test.
- * The 'emp-se' modifier adds BETA and EMP_SE (empirical standard error for
- beta) fields to the .perm output file.
-
- --annotate [PLINK report]
-