Releases: samtools/bcftools

1.20

15 Apr 15:22
1.20

Download the source code here: bcftools-1.20.tar.bz2. (The "Source code" downloads are generated by GitHub and are incomplete as they don't bundle HTSlib and are missing some generated files.)

Changes affecting the whole of bcftools, or multiple commands:

  • Add short option -W for --write-index. The option now accepts an optional parameter which allows choosing between the TBI and CSI index formats.

Changes affecting specific commands:

  • bcftools consensus

    • Add new --regions-overlap option which allows taking into account overlapping deletions that start outside the fasta file target region.
  • bcftools isec

    • Add new option -l, --file-list to read the list of file names from a file
  • bcftools merge

    • Add new option --force-single to support single-file edge case (#2100)
  • bcftools mpileup

    • Add new option --indels-cns for an alternative indel calling model, which should increase the speed on long read data (thanks to using edlib) and the precision (thanks to a number of heuristics).
  • bcftools norm

    • Change the order of atomization and multiallelic splitting (when both -a,-m are given) from "atomize first, then split" to "split first, then atomize". This usually results in a simpler VCF representation. The previous behaviour can be achieved by explicitly streaming the output of the --atomize command into the --multiallelics splitting command.

    • Fix Type=String multiallelic splitting for Number=A,R,G tags with incorrect number of values.

    • Merging into multiallelic sites with bcftools norm -m +indels did not work. This is now fixed and the merging is now more strict about variant types, for example complex events, such as AC>TGA, are not considered as indels anymore (#2084)

  • bcftools reheader

    • Allow reading the input file from a stream with --fai (#2088)
  • bcftools +setGT

    • Support for custom genotypes based on the allele with higher depth, such as --new-gt c:0/X custom genotypes (#2065)
  • bcftools +split-vep

    • When only one of the tags is present, automatically choose INFO/BCSQ (the default tag name produced by bcftools csq) or INFO/CSQ (produced by VEP). When both tags are present, use the default INFO/CSQ.

    • Transcript selection by MANE, PICK, and user-defined transcripts, for example:

      --select CANONICAL=YES
      --select MANE_SELECT!=""
      --select PolyPhen~probably_damaging

    • Select all matching transcripts via --select, not just one

    • Change automatic type parsing of VEP fields cDNA_position, CDS_position, and Protein_position from Integer to String, as they can be of the form "8586-8599/9231". The type Integer can still be enforced with
      -c cDNA_position:int,CDS_position:int,Protein_position:int.

    • Recognize -c field:str, not just -c field:string, as advertised in the usage page

    • Fix a bug which made filtering expression containing missing values crash (#2098)

  • bcftools stats

    • When GT is missing but AD is present, the program determines the alternate allele from AD. However, if the AD tag has incorrect number of values, the program would exit with an error printing "Requested allele outside valid range". This is now fixed by taking into account the actual number of ALT alleles.
  • bcftools +tag2tag

    • Support for conversion from tags using localized alleles (e.g. LPL, LAD) to the family of standard tags (PL, AD)
  • bcftools +trio-dnm2

    • Extend --strictly-novel to exclude cases where the non-Mendelian allele is the reference allele. The change is motivated by the observation that this class of variants is enriched for errors (especially for indels), and better corresponds with the option name.
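
The tag auto-selection described in the +split-vep entry above can be sketched as follows. This is only an illustration of the stated rule, not the plugin's code; the function name choose_annotation_tag and the use of a Python set for the INFO tags are assumptions made for the example.

```python
def choose_annotation_tag(info_tags):
    """Pick the consequence tag following the rule stated in the release note.

    info_tags: set of INFO tag names defined in the VCF header (hypothetical input).
    """
    has_csq = "CSQ" in info_tags    # produced by VEP
    has_bcsq = "BCSQ" in info_tags  # produced by bcftools csq
    if has_csq:
        # Only CSQ present, or both present: the default INFO/CSQ is used
        return "CSQ"
    if has_bcsq:
        # Only BCSQ present
        return "BCSQ"
    # Neither tag is present; the release note does not cover this case
    return None


print(choose_annotation_tag({"DP", "BCSQ"}))
print(choose_annotation_tag({"CSQ", "BCSQ"}))
```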

1.19

12 Dec 16:18
1.19

Download the source code here: bcftools-1.19.tar.bz2. (The "Source code" downloads are generated by GitHub and are incomplete as they don't bundle HTSlib and are missing some generated files.)

Changes affecting the whole of bcftools, or multiple commands:

  • Filtering expressions can be given a file with a list of strings to match. This was previously possible only for the ID column. For example:
   ID=@file            .. selects lines with ID present in the file
   INFO/TAG=@file.txt  .. selects lines where TAG has a string value listed in the file
   INFO/TAG!=@file.txt .. TAG must not have a string value listed in the file
  • Allow querying the REF and ALT columns directly, for example -e 'REF="N"'

Changes affecting specific commands:

  • bcftools annotate

  • bcftools call

    • Output MIN_DP rather than MinDP in gVCF mode

    • New -*, --keep-unseen-allele option to output the unobserved allele <*>, intended for gVCF.

  • bcftools head

    • New -s, --samples option to include the #CHROM header line with samples.
  • bcftools gtcheck

    • Add output options -o, --output and -O, --output-type

    • Add filtering options -i, --include and -e, --exclude

    • Rename the short option -e, --error-probability from lower case to upper case -E, --error-probability

    • Changes to the output format, replace the DC section with DCv2:

      • adds a new column for the number of matching genotypes

      • The --error-probability is newly interpreted as the probability of erroneous allele rather than genotype. In other words, the calculation of the discordance score now considers the probability of genotyping error to be different for HOM and HET genotypes, i.e. P(0/1|dsg=0) > P(1/1|dsg=0).

      • fixes in HWE score calculation plus output average HWE score rather than absolute HWE score

      • better description of fields

  • bcftools merge

    • Add -m modifiers to suppress the output of the unseen allele <*> or <NON_REF> at variant sites (e.g. -m both,*) or all sites (e.g. -m both,**)
  • bcftools mpileup

    • Output MIN_DP rather than MinDP in gVCF mode
  • bcftools norm

    • Add the number of joined lines to the summary output, for example
      Lines total/split/joined/realigned/skipped: 6/0/3/0/0

    • Allow combining -m and -a with --old-rec-tag (#2020)

    • Symbolic <DEL> alleles caused norm to expand REF to the full length of the deletion. This was not intended and is problematic for long deletions; the REF allele should list one base only (#2029)

  • bcftools query

    • Add new -N, --disable-automatic-newline option to restore the pre-1.18 query formatting behavior, in which a missing newline was not added

    • Make the automatic addition of the newline character more predictable: when missing, it is always put at the end of the expression. In version 1.18 it could be added at the end of the expression (for per-site expressions) or inside the square brackets (for per-sample expressions). The new behavior is:

      • if the formatting expression contains a newline character, do nothing
      • if there is no newline character and -N, --disable-automatic-newline is given, do nothing
      • if there is no newline character and -N is not given, insert newline at the end of the expression
        See #1969 for details
    • Add new -F, --print-filtered option to output a default string for samples that would otherwise be filtered by -i/-e expressions.

    • Include sample name in the output header with -H whenever it makes sense (#1992)

  • bcftools +split-vep

    • Fix on the fly filtering involving numeric subfields, e.g. -i 'MAX_AF<0.001' (#2039)

    • Interpret default column type names (--columns-types) as entire strings, rather than substrings to avoid unexpected spurious matches (i.e. internally add ^ and $ to all field names)

  • bcftools +trio-dnm2

    • Do not flag paternal genotyping errors as de novo mutations. Specifically, when father's chrX genotype is 0/1 and mother's 0/0, 0/1 in the child will not be marked as DNM.
  • bcftools view

    • Add new -A, --trim-unseen-allele option to remove the unseen allele <*> or <NON_REF> at variant sites (-A) or all sites (-AA)
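
The three newline rules listed under bcftools query in this release can be sketched as a small function. This is a simplified model, not the bcftools implementation; the name finalize_format is hypothetical, and the newline is assumed to be written as the two-character escape \n, as it is typed on the command line.

```python
NEWLINE = "\\n"  # backslash + n, as typed in a formatting expression (assumption)


def finalize_format(expr, disable_automatic_newline=False):
    """Return the formatting expression after applying the automatic-newline rules."""
    # Rule 1: the expression already contains a newline, do nothing
    if NEWLINE in expr:
        return expr
    # Rule 2: no newline, but -N / --disable-automatic-newline was given, do nothing
    if disable_automatic_newline:
        return expr
    # Rule 3: no newline and -N not given, append it at the end of the expression
    return expr + NEWLINE


# Per-sample expression: the newline goes after the closing bracket, not inside it
print(finalize_format("[%SAMPLE=%GT]"))
```

Under this model a per-sample expression such as [%SAMPLE=%GT] gets the newline after the closing bracket, which matches the description that it is always put at the end of the expression.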

1.18

25 Jul 13:08
1.18

Download the source code here: bcftools-1.18.tar.bz2. (The "Source code" downloads are generated by GitHub and are incomplete as they don't bundle HTSlib and are missing some generated files.)

Changes affecting the whole of bcftools, or multiple commands:

  • Support auto indexing during writing BCF and VCF.gz via new --write-index option

Changes affecting specific commands:

  • bcftools annotate

    • The -m, --mark-sites option can be now used to mark all sites without the need to provide the -a file (#1861)

    • Fix a bug where the -m function did not respect the --min-overlap option (#1869)

    • Fix a bug when update of INFO/END results in assertion error (#1957)

  • bcftools concat

    • New option --drop-genotypes
  • bcftools consensus

    • Support higher-ploidy genotypes with -H, --haplotype (#1892)

    • Allow --mark-ins and --mark-snv with a character, similarly to --mark-del

  • bcftools convert

    • Support for conversion from tab-delimited files (CHROM,POS,REF,ALT) to sites-only VCFs
  • bcftools csq

    • New --unify-chr-names option to automatically unify different chromosome naming conventions in the input GFF, fasta and VCF files (e.g. "chrX" vs "X")

    • More versatility in parsing various flavors of GFF

    • A new --dump-gff option to help with debugging and investigating the internals of GFF parsing

    • When printing consequences in nonsense mediated decay transcripts, include 'NMD_transcript' in the consequence part of the annotation. This is to make filtering easier and analogous to VEP annotations. For example the consequence annotation 3_prime_utr|PCGF3|ENST00000430644|NMD is newly printed as 3_prime_utr&NMD_transcript|PCGF3|ENST00000430644|NMD

  • bcftools gtcheck

    • Add stats for the number of sites matched in the GT-vs-GT, GT-vs-PL, etc modes. This information is important for interpretation of the discordance score, as only the GT-vs-GT matching can be interpreted as the number of mismatching genotypes.
  • bcftools +mendelian2

    • Fix in command line argument parsing, the -p and -P options were not functioning (#1906)
  • bcftools merge

    • New -M, --missing-rules option to control the behavior of merging of vector tags to prevent mixtures of known and missing values in tags when desired

    • Use values pertaining to the unknown allele (<*> or <NON_REF>) when available to prevent mixtures of known and missing values (#1888)

    • Revamped line matching code to fix problems in gVCF merging where split gVCF blocks would not update genotypes (#1891, #1164).

  • bcftools mpileup

    • Fix a bug in --indels-2.0 which caused an endless loop when CIGAR operator H or P was encountered
  • bcftools norm

    • The -m, --multiallelics + mode now preserves phasing (#1893)

    • Symbolic <DEL.*> alleles are now normalized too (#1919)

    • New -g, --gff-annot option to right-align indels in forward transcripts to follow the HGVS 3' rule (#1929)

  • bcftools query

    • Force newline character in formatting expression when not given explicitly

    • Fix -H header output in formatting expressions containing newlines

  • bcftools reheader

    • Make -f, --fai aware of long contigs not representable by 32-bit integer (#1959)
  • bcftools +split-vep

    • Prevent a segfault when -i/-e use a VEP subfield not included in -f or -c (#1877)

    • New -X, --keep-sites option complementing the existing -x, --drop-sites options

    • Force newline character in formatting expression when not given explicitly

    • Fix a subtle ambiguity: identical rows must be returned when -s is applied regardless of -f containing the -a VEP tag itself or not.

  • bcftools stats

    • Collect new VAF (variant allele frequency) statistics from FORMAT/AD field

    • When counting transitions/transversions, consider also alternate het genotypes

  • plot-vcfstats

    • Add three new VAF plots
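
The change to NMD annotations described under bcftools csq in this release can be illustrated with the example string from the note. The sketch below assumes that the biotype sits in the fourth |-separated field, as it does in that example; the function name add_nmd_flag and the second input string are hypothetical.

```python
def add_nmd_flag(annotation):
    """Append 'NMD_transcript' to the consequence part when the biotype is NMD."""
    fields = annotation.split("|")
    # Assumed layout, taken from the example: consequence|gene|transcript|biotype
    if len(fields) > 3 and fields[3] == "NMD":
        consequences = fields[0].split("&")
        if "NMD_transcript" not in consequences:
            consequences.append("NMD_transcript")
        fields[0] = "&".join(consequences)
    return "|".join(fields)


old = "3_prime_utr|PCGF3|ENST00000430644|NMD"
print(add_nmd_flag(old))
```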

1.17

21 Feb 14:31
1.17

Download the source code here: bcftools-1.17.tar.bz2. (The "Source code" downloads are generated by GitHub and are incomplete as they don't bundle HTSlib and are missing some generated files.)

Changes affecting the whole of bcftools, or multiple commands:

  • The -i/-e filtering expressions

    • Error checks were added to prevent incorrect use of vector arithmetics. For example, when evaluating the sum of two vectors A and B, the resulting vector could contain nonsense values when the input vectors were not of the same length. The fix introduces the following logic:

      • evaluate to C_i = A_i + B_i when length(A)==length(B) and set length(C)=length(A)
      • evaluate to C_i = A_i + B_0 when length(B)=1 and set length(C)=length(A)
      • evaluate to C_i = A_0 + B_i when length(A)=1 and set length(C)=length(B)
      • throw an error when length(A)!=length(B) AND length(A)!=1 AND length(B)!=1
    • Arrays in Number=R tags can be now subscripted by alleles found in FORMAT/GT. For example,
      FORMAT/AD[GT] > 10 .. require support of more than 10 reads for each allele
      FORMAT/AD[0:GT] > 10 .. same as above, but in the first sample
      sSUM(FORMAT/AD[GT]) > 20 .. require total sample depth bigger than 20

  • The commands consensus -H and +split-vep -H

    • Drop unnecessary leading space in the first header column and newly print #[1]columnName instead of the previous # [1]columnName (#1856)
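
The vector arithmetic rules listed above for the -i/-e filtering expressions amount to a simple broadcasting scheme. A minimal sketch, using plain Python lists for the vectors A and B (the function name vector_add is hypothetical, and only addition is modelled):

```python
def vector_add(a, b):
    """Add two vectors following the length rules from the release note."""
    if len(a) == len(b):
        # C_i = A_i + B_i, length(C) = length(A)
        return [x + y for x, y in zip(a, b)]
    if len(b) == 1:
        # C_i = A_i + B_0, length(C) = length(A)
        return [x + b[0] for x in a]
    if len(a) == 1:
        # C_i = A_0 + B_i, length(C) = length(B)
        return [a[0] + y for y in b]
    # Lengths differ and neither vector has a single element
    raise ValueError(
        "cannot add vectors of lengths %d and %d" % (len(a), len(b))
    )


print(vector_add([1, 2, 3], [10, 20, 30]))
print(vector_add([1, 2, 3], [10]))
```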

Changes affecting specific commands:

  • bcftools +allele-length

    • Fix overflow for indels longer than 512bp and aggregate alleles equal or larger than that in the same bin (#1837)
  • bcftools annotate

    • Support sample reordering of annotation file (#1785)

    • Restore lost functionality of the --pair-logic option (#1808)

  • bcftools call

    • Fix a bug where too many alleles passed to -C alleles via -T caused memory corruption (#1790)

    • Fix a bug where indels constrained with -C alleles -T would sometimes be missed (#1706)

  • bcftools consensus

    • BREAKING CHANGE: the option -I, --iupac-codes newly outputs IUPAC codes based on FORMAT/GT of all samples. The -s, --samples and -S, --samples-file options can be used to subset samples. In order to ignore samples and consider only the REF and ALT columns (the original behavior prior to 1.17), run with -s - (#1828)
  • bcftools convert

    • Make variantkey conversion work for sites without an ALT allele (#1806)
  • bcftools csq

    • Fix a bug where a MNV with multiple consequences (e.g. missense + stop_gained) would report only the less severe one (#1810)

    • GFF file parsing was made slightly more flexible, newly ids can be just XXX rather than, for example, gene:XXX

    • New gff2gff perl script to fix GFF formatting differences

  • bcftools +fill-tags

    • More of the available annotations are now added by the -t all option
  • bcftools +fixref

    • New INFO/FIXREF annotation

    • New -m swap mode

  • bcftools +mendelian

    • The +mendelian plugin has been deprecated and replaced with +mendelian2. The function of the plugin is the same but the command line options and the output format have changed, and for this reason it was introduced as a new plugin.
  • bcftools mpileup

    • Most of the annotations generated by mpileup are now optional via the -a, --annotate option, and several new (mostly experimental) annotations were added.

    • New option --indels-2.0 for an EXPERIMENTAL indel calling model. This model aims to address some known deficiencies of the current indel calling algorithm, specifically, it uses diploid reference consensus sequence. Note that in the current version it has the potential to increase sensitivity but at the cost of decreased specificity.

    • Make the FS annotation (Fisher exact test strand bias) functional and remove it from the default annotations

  • bcftools norm

    • New --multi-overlaps option allows setting overlapping alleles either to the ref allele (the current default) or to a missing allele (#1764 and #1802)

    • Fixed a bug in -m - which did not split missing FORMAT values correctly and could lead to empty FORMAT fields such as :: instead of the correct :.: (#1818)

    • The --atomize option previously would not split complex indels such as C>GGG. Newly these will be split into two records C>G and C>CGG (#1832)

  • bcftools query

    • Fix a rare bug where the printing of SAMPLE field with query was incorrectly suppressed when the -e option contained a sample expression while the formatting query did not. See #1783 for details.
  • bcftools +setGT

    • Add new --new-gt X option (#1800)

    • Add new --target-gt r:FLOAT option to randomly select a proportion of genotypes (#1850)

    • Fix a bug where -t ./x mode was advertised as selecting both phased and unphased half-missing genotypes, but was in fact selecting only unphased genotypes (#1844)

  • bcftools +split-vep

    • New options -g, --gene-list and --gene-list-fields which allow prioritizing consequences from a list of genes, or restricting output to the listed genes

    • New -H, --print-header option to print the header with -f

    • Work around a bug in the LOFTEE VEP plugin used to annotate gnomAD VCFs. There the LoF_info subfield contains commas which, in general, makes it impossible to parse the VEP subfields. The +split-vep plugin can now work with such files, replacing the offending commas with slash (/) characters. See also Ensembl/ensembl-vep#1351

    • Newly the -c, --columns option can be omitted when a subfield is used in -i/-e filtering expression. Note that -c may still have to be given when it is not possible to infer the type of the subfield. Note that this is an experimental feature.

  • bcftools stats

    • Fix a bug where the per-sample stats (PSC) would not be computed when -i/-e filtering options and the -s - option were given but the expression did not include sample columns (#1835)
  • bcftools +tag2tag

    • Revamp of the plugin to allow wider range of tag conversions, specifically all combinations from FORMAT/GL,PL,GP to FORMAT/GL,PL,GP,GT
  • bcftools +trio-dnm2

    • New -n, --strictly-novel option to downplay alleles which violate Mendelian inheritance but are not novel

    • Allow setting the --pn and --pns options separately for SNVs and indels and make the indel settings more strict by default

    • Output missing FORMAT/VAF values in non-trio samples, rather than random nonsense values

  • bcftools +variant-distance

    • New option -d, --direction to choose the directionality: forward, reverse, nearest (the default) or both (#1829)

1.16

18 Aug 14:11
1.16

Download the source code here: bcftools-1.16.tar.bz2. (The "Source code" downloads are generated by GitHub and are incomplete as they don't bundle HTSlib and are missing some generated files.)

  • New plugin bcftools +variant-distance to annotate records with distance to the nearest variant (#1690)

Changes affecting the whole of bcftools, or multiple commands:

  • The -i/-e filtering expressions

    • Added support for querying of multiple filters, for example -i 'FILTER="A;B"' can be used to select sites with two filters "A" and "B" set. See the documentation for more examples.

    • Added modulo arithmetic operator

Changes affecting specific commands:

  • bcftools annotate

    • A bug introduced in 1.14 caused records with the INFO/END annotation to incorrectly trigger the -c ~INFO/END mode of comparison even when not explicitly requested, which resulted in the annotation not being transferred from a tab-delimited file (#1733)
  • bcftools merge

    • New -m snp-ins-del switch to merge SNVs, insertions and deletions separately (#1704)
  • bcftools mpileup

    • New NMBZ annotation for Mann-Whitney U-z test on number of mismatches within supporting reads

    • Suppress the output of MQSBZ and FS annotations in absence of alternate allele

  • bcftools +scatter

    • Fix erroneous addition of duplicate PG lines
  • bcftools +setGT

    • Custom genotypes (e.g. -n c:1/1) now correctly override ploidy

1.15.1

07 Apr 16:45
1.15.1

Download the source code here: bcftools-1.15.1.tar.bz2. (The "Source code" downloads are generated by GitHub and are incomplete as they don't bundle HTSlib and are missing some generated files.)

  • bcftools annotate

    • New -H, --header-line convenience option to pass a header line on command line, this complements the existing -h, --header-lines option which requires a file with header lines
  • bcftools csq

    • A list of consequence types supported by bcftools csq has been added to the manual page. (#1671)
  • bcftools +fill-tags

    • Extend generalized functions so that FORMAT tags can be filled as well, for example:

      bcftools +fill-tags in.bcf -o out.bcf -- -t 'FORMAT/DP:1=int(smpl_sum(FORMAT/AD))'

    • Allow multiple custom functions in a single run. Previously the program would silently go with the last one, assigning the same values to all (#1684)

  • bcftools norm

    • Fix an assertion failure triggered when a faulty VCF file with a '-' character in the REF allele was used with bcftools norm --atomize. This option now checks that the REF allele only includes the allowed characters A, C, G, T and N. (#1668)

    • Fix the loss of phasing in half-missing genotypes in variant atomization (#1689)

  • bcftools roh

    • Fix a bug that could result in an endless loop or incorrect AF estimate when missing genotypes are present and the --estimate-AF - option was used (#1687)
  • bcftools +split-vep

    • VEP fields with characters disallowed in VCF tag names by the specification (such as - in M-CAP) couldn't be queried. This has been fixed, the program now sanitizes the field names, replacing invalid characters with underscore (#1686)
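
The field name sanitization described in the +split-vep entry above can be sketched with a regular expression. The set of characters treated as valid here (letters, digits, underscore and dot) is an assumption made for the illustration; the plugin may use a different rule. The function name sanitize_field_name is hypothetical.

```python
import re

# Characters assumed to be allowed in a VCF tag name (assumption for this sketch)
INVALID_CHARS = re.compile(r"[^A-Za-z0-9_.]")


def sanitize_field_name(name):
    """Replace every character outside the assumed valid set with an underscore."""
    return INVALID_CHARS.sub("_", name)


print(sanitize_field_name("M-CAP"))
```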

1.15

21 Feb 15:05
1.15

Download the source code here: bcftools-1.15.tar.bz2. (The "Source code" downloads are generated by GitHub and are incomplete as they don't bundle HTSlib and are missing some generated files.)

  • New bcftools head subcommand for conveniently displaying the headers of a VCF or BCF file. Without any options, this is equivalent to bcftools view --header-only --no-version but more succinct and memorable.

  • The -T, --targets-file option had the following bug originating in HTSlib code: when an uncompressed file with multiple columns CHR,POS,REF was provided, the REF would be interpreted as 0 gigabases (#1598)

Changes affecting specific commands:

  • bcftools annotate

    • In addition to --rename-annots, which requires a file with name mappings, it is now possible to do the same on the command line -c NEW_TAG:=OLD_TAG

    • Add new option --min-overlap which allows specifying the minimum required overlap of intersecting regions

    • Allow transferring ALT from VCF with or without replacement using:
      bcftools annotate -a annots.vcf.gz -c ALT file.vcf.gz
      bcftools annotate -a annots.vcf.gz -c +ALT file.vcf.gz

  • bcftools convert

    • Revamp of --gensample, --hapsample and --haplegendsample family of options which includes the following changes:

    • New --3N6 option to output/input the new version of the .gen file format, see https://www.cog-genomics.org/plink/2.0/formats#gen

    • Deprecate the --chrom option in favor of --3N6. A simple cut command can be used to convert from the new 3*M+6 column format to the format printed with --chrom (cut -d' ' -f1,3-).

    • The CHROM:POS_REF_ALT IDs which are used to detect strand swaps are required and must appear either in the "SNP ID" column or the "rsID" column. The column is autodetected for --gensample2vcf, can be the first or the second for --hapsample2vcf (depending on whether the --vcf-ids option is given), must be the first for --haplegendsample2vcf.

  • bcftools csq

    • Allow GFF files with phase column unset
  • bcftools filter

    • New --mask, --mask-file and --mask-overlap options to soft filter variants in regions (#1635)
  • bcftools +fixref

    • The -m id option now works also for non-dbSNP ids, i.e. not just rsINT

    • New -m flip-all mode for flipping all sites, including ambiguous A/T and C/G sites

  • bcftools isec

    • Prevent segfault on sites filtered with -i/-e in all files (#1632)
  • bcftools mpileup

    • More flexible read filtering using the options:
      --ls, --skip-all-set .. skip reads with all of the FLAG bits set
      --ns, --skip-any-set .. skip reads with any of the FLAG bits set
      --lu, --skip-all-unset .. skip reads with all of the FLAG bits unset
      --nu, --skip-any-unset .. skip reads with any of the FLAG bits unset

      The existing synonymous options will continue to function but their use is discouraged:

      --rf, --incl-flags STR|INT Required flags: skip reads with mask bits unset
      --ff, --excl-flags STR|INT Filter flags: skip reads with mask bits set

  • bcftools query

    • Make the --samples and --samples-file options work also in the --list-samples mode. Add a new --force-samples option which allows to proceed even when some of the requested samples are not present in the VCF (#1631)
  • bcftools +setGT

    • Fix a bug in -t q -e EXPR logic applied on FORMAT fields, sites with all samples failing the expression EXPR were incorrectly skipped. This problem affected only the use of -e logic, not the -i expressions (#1607)
  • bcftools sort

    • make use of the TMPDIR environment variable when defined
  • bcftools +trio-dnm2

    • The --use-NAIVE mode now also adds the de novo allele in FORMAT/VA
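
The four read-filtering modes listed under bcftools mpileup in this release differ only in how the read's FLAG is compared against the given bit mask. A minimal sketch of the four predicates follows; the function names are hypothetical, the FLAG constants are the standard SAM values, and treating an empty mask as "filter disabled" is an assumption of this example.

```python
# SAM FLAG bits (values from the SAM specification)
UNMAP = 0x4
SECONDARY = 0x100
DUP = 0x400


def skip_all_set(flag, mask):
    """--ls: skip the read when all of the mask bits are set."""
    return mask != 0 and (flag & mask) == mask


def skip_any_set(flag, mask):
    """--ns: skip the read when any of the mask bits is set."""
    return mask != 0 and (flag & mask) != 0


def skip_all_unset(flag, mask):
    """--lu: skip the read when all of the mask bits are unset."""
    return mask != 0 and (flag & mask) == 0


def skip_any_unset(flag, mask):
    """--nu: skip the read when any of the mask bits is unset."""
    return mask != 0 and (flag & mask) != mask


# A read flagged as unmapped and duplicate, tested against UNMAP|SECONDARY
flag = UNMAP | DUP
mask = UNMAP | SECONDARY
print(skip_all_set(flag, mask), skip_any_set(flag, mask))
print(skip_all_unset(flag, mask), skip_any_unset(flag, mask))
```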

1.14

22 Oct 14:37
1.14

Download the source code here: bcftools-1.14.tar.bz2. (The "Source code" downloads are generated by GitHub and are incomplete as they don't bundle HTSlib and are missing some generated files.)

Changes affecting the whole of bcftools, or multiple commands:

  • New --regions-overlap and --targets-overlap options which address a long-standing design problem with subsetting VCF files by region. BCFtools recognizes two sets of options, one for streaming (-t/-T) and one for index-jumping (-r/-R). They behave differently: the first includes only records with POS coordinate within the regions, the other also includes records that overlap the regions. The two new options allow modifying the default behaviour, see the man page for more details.

  • The --output-type option can be used to override the default compression level
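
The difference between the two default behaviours described above for region subsetting can be sketched with two predicates. Coordinates are assumed to be 1-based and inclusive, and the extent of a record is approximated by the length of its REF allele; the function names are hypothetical.

```python
def pos_within_region(pos, ref_len, beg, end):
    """Streaming-style test: only the POS coordinate has to lie in the region."""
    return beg <= pos <= end


def record_overlaps_region(pos, ref_len, beg, end):
    """Index-style test: any base spanned by the record may lie in the region."""
    rec_end = pos + ref_len - 1  # last reference base covered by the record
    return pos <= end and rec_end >= beg


# A 10 bp deletion starting at position 95, tested against the region 100-200
print(pos_within_region(95, 10, 100, 200))
print(record_overlaps_region(95, 10, 100, 200))
```

A deletion that starts before the region but extends into it is reported by the second test only, which is the kind of discrepancy the new options are meant to control.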

Changes affecting specific commands:

  • bcftools annotate

    • when --set-id and --remove are combined, --set-id cannot use tags deleted by --remove. This is now detected and the program exits with an informative error message instead of segfaulting (#1540)

    • while non-symbolic variants are uniquely identified by POS,REF,ALT, symbolic alleles starting at the same position were indistinguishable. This prevented correct matching of records with the same positions and variant type but different length given by INFO/END (samtools/htslib@60977f2). When annotating from a VCF/BCF, the matching is done automatically. When annotating from a tab-delimited text file, this feature can be invoked by using -c INFO/END.

    • add a new . modifier to control whether missing values should be carried over from a tab-delimited file or not. For example:

      -c TAG .. adds TAG if the source value is not missing. If TAG exists in the target file, it will be overwritten.
      -c .TAG .. adds TAG even if the source value is missing. This can overwrite non-missing values with a missing value and can create empty VCF fields (TAG=.)

  • bcftools +check-ploidy

    • by default missing genotypes are not used when determining ploidy. With the new option -m, --use-missing it is possible to use the information carried in the missing and half-missing genotypes (e.g. ., ./. or ./1)
  • bcftools concat:

    • new --ligate-force and --ligate-warn options for finer control of -l, --ligate behavior in imperfect overlaps. The new default is to throw an error when sites present in one chunk but absent in the other are encountered. To drop such sites and proceed, use the new --ligate-warn option (previously this was the default). To keep such sites, use the new --ligate-force option (#1567).
  • bcftools consensus:

    • Apply mask even when the VCF has no notion about the chromosome. It was possible to encounter this problem when contig lines were not present in the VCF header and no variants were called on that chromosome (#1592)
  • bcftools +contrast:

    • support for chunking within map/reduce framework allowing to collect NASSOC counts even for empty case/control sample sets (#1566)
  • bcftools csq:

    • bug fix, compound indels were not recognised in some cases (#1536)

    • compound variants were incorrectly marked as 'inframe' even when stop codon would occur before the frame was restored (#1551)

    • bug fix, FORMAT/BCSQ bitmasks could have been assigned incorrectly to some samples at multiallelic sites, a superset of the correct consequences would have been set (#1539)

    • bug fix, the upstream stop could be falsely assigned to all samples in a multi-sample VCF even if the stop was relevant for a single sample only (#1578)

    • further improve the detection of mismatching chromosome naming (e.g. "chrX" vs "X") in the GFF, VCF and fasta files

  • bcftools merge:

    • keep (sum) INFO/AN,AC values when merging VCFs with no samples (#1394)
  • bcftools mpileup:

    • new --indel-size option which allows increasing the maximum considered indel size; large deletions in long read data are otherwise lost.
  • bcftools norm:

    • atomization now supports Number=A,R string annotations (#1503)

    • assign as many alternate alleles as possible to genotypes at multiallelic sites in the -m + mode, disregarding the phase. Previously the program assumed it was executed as an inverse operation of -m -, but when that was not the case, reference alleles would have been filled instead of multiple alternate alleles (#1542)

  • bcftools sort:

    • increase accuracy of the --max-mem option limit, previously the limit could be exceeded by more than 20% (#1576)
  • bcftools +trio-dnm:

    • new --with-pAD option to allow processing of VCFs without FORMAT/QS. The existing --ppl option was changed to the analogous --with-pPL
  • bcftools view:

    • the functionality of the option --compression-level lost in 1.12 has been restored

1.13

09 Jul 11:15
1.13

Download the source code here: bcftools-1.13.tar.bz2. (The "Source code" downloads are generated by GitHub and are incomplete as they don't bundle HTSlib and are missing some generated files.)

This release brings new options and significant changes in BAQ parametrization in bcftools mpileup. The previous behaviour can be triggered by providing the --config 1.12 option. Please see #1474 for details.

Changes affecting the whole of bcftools, or multiple commands:

  • Improved build system

Changes affecting specific commands:

  • bcftools annotate:

    • Fix a rare bug that occurs when INFO/END is present, all INFO fields are removed with bcftools annotate -x INFO, and BCF output is produced. In that case the removed INFO/END continued to inform the end coordinate and caused incorrect retrieval of records with the -r option (#1483)

    • Support for matching annotation line by ID, in addition to CHROM,POS,REF, and ALT (#1461)
      bcftools annotate -a annots.tab.gz -c CHROM,POS,~ID,REF,ALT,INFO/END input.vcf

  • bcftools csq:

    • When GFF and VCF/fasta use a different chromosome naming convention (e.g. chrX vs X), no consequences would be added. Newly the program attempts to detect these differences and remove/add the "chr" prefix to chromosome name to match the GFF and VCF/fasta (#1507)

    • Parametrize brief-predictions parameter to allow explicit number of amino acids to be printed. Note that the -b, --brief-predictions option is being replaced with -B, --trim-protein-seq INT

  • bcftools +fill-tags:

    • Generalization and better support for custom functions that allow adding new INFO tags based on arbitrary -i, --include type of expressions. For example, to calculate a missing INFO/DP annotation from FORMAT/AD, it is possible to use:
      -t 'DP:1=int(sum(FORMAT/AD))'
      Here the optional ":1" part specifies that a single value will be added (by default Number=. is used) and the optional int(...) adds an integer value (by default Type=Float is used).

    • When FORMAT/GT is not present, the INFO/AF tag is now calculated from INFO/AC and INFO/AN.
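
The arithmetic behind the two +fill-tags items above can be sketched in a few lines. The function names are hypothetical, and the sketch assumes that sum(FORMAT/AD) adds up every AD value across all samples and alleles and that AF is AC divided by AN for each ALT allele; the plugin's handling of missing values may differ.

```python
def info_dp_from_ad(format_ad):
    """Sum FORMAT/AD over all samples and alleles, skipping missing values (None).

    Mirrors the idea of -t 'DP:1=int(sum(FORMAT/AD))': one integer per site.
    """
    return int(sum(v for sample in format_ad for v in sample if v is not None))


def info_af_from_ac_an(ac, an):
    """One AF value per ALT allele, computed as AC / AN (None when AN is 0)."""
    return [c / an if an else None for c in ac]


if __name__ == "__main__":
    # Three samples, REF and one ALT allele; the third sample has missing AD.
    ad = [[10, 2], [7, 0], [None, None]]
    print(info_dp_from_ad(ad))
    # Two ALT alleles observed 3 and 1 times among 8 called alleles.
    print(info_af_from_ac_an([3, 1], 8))
```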

  • bcftools gtcheck:

    • Switch between FORMAT/GT or FORMAT/PL when one is (implicitly) requested but only the other is available

    • Improve diagnostics, printing warnings when a line cannot be matched and the number of lines skipped for various reasons (#1444)

    • Minor bug fix, with PLs being the default, the --distinctive-sites option started to require explicit --error-probability 0

  • bcftools index:

    • The program now accepts both data file name and the index file name. This adds to user convenience when running index statistics (-n, -s)
  • bcftools isec:

    • Always generate sites.txt with isec -p (#1462)
  • bcftools +mendelian:

    • Consider only complete trios, do not crash on sample name typos (#1520)
  • bcftools mpileup:

    • New --seed option for reproducibility of subsampling code in HTSlib

    • The SCR annotation which shows the number of soft-clipped reads now correctly pools reads together regardless of the variant type. Previously only reads with indels were included at indel sites.

    • Major revamp of BAQ. Please see #1474 for details. The previous behaviour can be triggered by providing the --config 1.12 option.

    • Thanks to improvements in HTSlib, the removal of overlapping reads (which can be disabled with the -x, --ignore-overlaps options) is not systematically biased any more (samtools/htslib#1273)

    • Modified scale of Mann-Whitney U tests. INFO/*Z annotations are now printed, for example MQBZ replaces MQB.

  • bcftools norm:

    • Fix Type=Flag output in norm --atomize (#1472)

    • Atomization must not discard ALT=. records

    • Atomization of AD and QS tags now correctly updates occurrences of duplicate alleles within different haplotypes

    • Fix a bug in atomization of Number=A,R tags

  • bcftools reheader:

    • Add -T, --temp-prefix option
  • bcftools +setGT:

    • A wider range of genotypes can be set by the plugin by allowing specifying custom genotypes. For example, to force a heterozygous genotype it is now possible to use expressions like: c:'m|M' c:0/1 c:0
  • bcftools +split-vep:

    • New -u, --allow-undef-tags option

    • Better handling of ambiguous keys such as INFO/AF and CSQ/AD. The -p, --annot-prefix option is now applied before doing anything else which allows its use with -f, --format and -c, --columns options.

    • Some consequence field names may not constitute a valid tag name, such as "pos(1-based)". Field names are now trimmed to exclude brackets.

  • bcftools +tag2tag:

    • New --QR-QA-to-QS option to convert annotations generated by Freebayes to QS used by BCFtools
  • bcftools +trio-dnm:

    • Add support for sites with more than four alleles. Note that only the four most frequent alleles are considered, the model remains unchanged. Previously such sites were skipped.

    • New --use-NAIVE option for a naive DNM calling based solely on FORMAT/GT and expected Mendelian inheritance. This option is suitable for pre-filtering.

    • Fix behaviour to match the documentation: the --dnm-tag DNG option now correctly outputs log scaled values by default, not phred scaled.

    • Fix bug in VAF calculation: homozygous de novo variants were incorrectly reported as having VAF=50%

    • Fix arithmetic underflow which could lead to imprecise scores and improve sensitivity in high coverage regions

    • Allow combining --pn and --pns to set the noise thresholds independently

1.12

17 Mar 16:21
1.12

Download the source code here: bcftools-1.12.tar.bz2.
(The "Source code" downloads are generated by GitHub and are incomplete as they don't bundle HTSlib and are missing some generated files.)

Changes affecting the whole of bcftools, or multiple commands:

  • The output file type is determined from the output file name suffix, where available, so the -O/--output-type option is often no longer necessary.

  • Make F_MISSING in filtering expressions work for sites with multiple ALT alleles (#1343)

  • Fix N_PASS and F_PASS to behave according to expectation when reverse logic is used (#1397). This fix has the side effect of query (or programs like +trio-stats) behaving differently with these expressions, operating now in site-oriented rather than sample-oriented mode. For example, the new behavior could be:

    bcftools query -f'[%POS %SAMPLE %GT\n]' -i'N_PASS(GT="alt")==1'
    11	A	0/0
    11	B	0/0
    11	C	1/1
    

    while previously the same expression would return:

    11	C	1/1
    

    The original mode can be mimicked by splitting the filtering into two steps:

    bcftools view -i'N_PASS(GT="alt")==1' | bcftools query -f'[%POS %SAMPLE %GT\n]' -i'GT="alt"'
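
The difference between the site-oriented and sample-oriented modes shown above can be reproduced with a toy model of one site. This is only a sketch: the function names are hypothetical, and GT="alt" is assumed to match any genotype carrying at least one non-reference allele.

```python
def is_alt(gt):
    """True if the genotype carries at least one non-reference allele."""
    alleles = gt.replace("|", "/").split("/")
    return any(a not in ("0", ".") for a in alleles)


def site_oriented(site, n):
    """New behaviour: if exactly n samples pass, print every sample at the site."""
    n_pass = sum(is_alt(gt) for _, gt in site["samples"])
    if n_pass != n:
        return []
    return [(site["pos"], name, gt) for name, gt in site["samples"]]


def sample_oriented(site, n):
    """Previous behaviour: print only the samples that themselves pass."""
    return [row for row in site_oriented(site, n) if is_alt(row[2])]


if __name__ == "__main__":
    site = {"pos": 11, "samples": [("A", "0/0"), ("B", "0/0"), ("C", "1/1")]}
    print(site_oriented(site, 1))    # all three samples
    print(sample_oriented(site, 1))  # only sample C
```

The two-step command above corresponds to sample_oriented: the site filter is applied first and the per-sample filter second.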
    

Changes affecting specific commands:

  • bcftools annotate:

    • New --rename-annots option to help fix broken VCFs (#1335)

    • New -C option allows reading a long list of options from a file to prevent very long command lines.

    • New append-missing logic allows annotations to be added for each ALT allele in the same order as they appear in the VCF. Note that this is not bulletproof. In order for this to work:

      • the annotation file must have one line per ALT allele

      • fields must contain a single value as multiple values are appended as they are and would break the correspondence between the alleles and values

  • bcftools concat:

    • Do not phase genotypes by mistake if they are not already phased with -l (#1346)
  • bcftools consensus:

    • New --mask-with, --mark-del, --mark-ins, --mark-snv options (#1382, #1381, #1170)

    • Symbolic <DEL> should have only one REF base. If there are multiple, take POS+1 as the first deleted base.

    • Make consensus work when the first base of the reference genome is deleted. In this situation the VCF record has POS=1 and the first REF base cannot precede the event. (#1330)

  • bcftools +contrast:

    • The NOVELGT annotation was previously not added when requested.
  • bcftools convert:

    • Make the --hapsample and --hapsample2vcf options consistent with each other and with the documentation.
  • bcftools call:

    • Revamp of call -G, previously sample grouping by population was not truly independent and could still be influenced by the presence of other sample groups.

    • Optional addition of INFO/PV4 annotation with call -a INFO/PV4

    • Remove generation of useless HOB and ICB annotation; use +fill-tags -- -t HWE,ExcHet instead

    • The call -f option was renamed to -a to (1) make it consistent with mpileup and (2) to indicate that it includes both INFO and FORMAT annotations, not just FORMAT as previously

    • Any sensible Number=R,Type=Integer annotation can be used with -G, such as AD or QS

    • Don't trim QUAL; although usefulness of this change is questionable for true probabilistic interpretation (such high precision is unrealistic), using QUAL as a score rather than probability is helpful and permits more fine-grained filtering

    • Fix a suspected bug in call -F; at the least, the change improves readability

    • call -C trio is temporarily disabled

  • bcftools csq:

    • Fix a bug which caused incorrect FORMAT/BCSQ formatting at sites with too many per-sample consequences

    • Fix a bug which incorrectly handled the --ncsq parameter and could clash with reserved BCF values, consequently producing truncated or even incorrect output of the %TBCSQ formatting expression in bcftools query. To account for the reserved values, the new default value is --ncsq 15 (#1428)

  • bcftools +fill-tags:

    • MAF definition revised for multiallelic sites, the second most common allele is considered to be the minor allele (#1313)

    • New FORMAT/VAF, VAF1 annotations to set the fraction of alternate reads provided FORMAT/AD is present
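
A rough sketch of the quantities named in the two +fill-tags items above. The function names are hypothetical, and two definitions are assumptions rather than statements from these notes: VAF is taken as one value per ALT allele (AD[i] / sum(AD)), and VAF1 as the fraction of reads supporting any ALT allele.

```python
def maf(allele_counts):
    """Frequency of the second most common allele (REF included in the counts)."""
    total = sum(allele_counts)
    if total == 0 or len(allele_counts) < 2:
        return None
    return sorted(allele_counts, reverse=True)[1] / total


def vaf(ad):
    """Per-ALT fraction of reads: AD[i] / sum(AD) for each ALT allele (assumed)."""
    depth = sum(ad)
    return [a / depth for a in ad[1:]] if depth else None


def vaf1(ad):
    """Fraction of reads supporting any ALT allele (assumed definition)."""
    depth = sum(ad)
    return sum(ad[1:]) / depth if depth else None


if __name__ == "__main__":
    # Allele counts REF=5, ALT1=2, ALT2=1: the minor allele is ALT1.
    print(maf([5, 2, 1]))
    # Read depths REF=10, ALT1=5, ALT2=5.
    print(vaf([10, 5, 5]), vaf1([10, 5, 5]))
```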

  • bcftools gtcheck:

    • support matching of a single sample against all other samples in the file with -s qry:sample -s gt:-. This was previously not possible, either full cross-check mode had to be run or a list of pairs/samples had to be created explicitly
  • bcftools merge:

    • Make merge -R behavior consistent with other commands and pull in overlapping records with POS outside of the regions (#1374)

    • Bug fix (#1353)

  • bcftools mpileup:

    • Add new optional tag mpileup -a FORMAT/QS
  • bcftools norm:

    • New -a, --atomize functionality to decompose complex variants, for example MNVs into consecutive SNVs

    • New option --old-rec-tag to indicate the original variant
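
The MNV case mentioned for -a, --atomize can be illustrated with a minimal sketch. The function name is hypothetical and the sketch covers only equal-length substitutions; the real command also handles indels, multiallelic records, and the accompanying INFO/FORMAT tags.

```python
def atomize_mnv(pos, ref, alt):
    """Split an equal-length REF/ALT substitution into per-base SNV records.

    Returns (POS, REF, ALT) tuples; positions where both alleles agree are skipped.
    """
    if len(ref) != len(alt):
        raise ValueError("this sketch handles only equal-length substitutions")
    return [(pos + i, r, a) for i, (r, a) in enumerate(zip(ref, alt)) if r != a]


if __name__ == "__main__":
    # An MNV at position 100 where the middle base is unchanged.
    for record in atomize_mnv(100, "ACG", "TCA"):
        print(record)
```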

  • bcftools query:

    • Incorrect fields were printed in the per-sample output when a subset of samples was requested via -s/-S and the order of samples in the header was different from the requested -s/-S order (#1435)
  • bcftools +prune:

    • New options --random-seed and --nsites-per-win-mode (#1050)
  • bcftools +split-vep:

    • Transcript selection now works also on the raw CSQ/BCSQ annotation.

    • Bug fix, samples were dropped on VCF input and VCF/BCF output (#1349)

  • bcftools stats:

    • Changes to QUAL and ts/tv plotting stats: avoid capping QUAL to predefined bins, use an open-range logarithmic binning instead

    • plot dual ts/tv stats: per quality bin and cumulative as if threshold applied on the whole dataset

  • bcftools +trio-dnm2:

    • Major revamp of +trio-dnm plugin, which is now deprecated and replaced by +trio-dnm2.
      The original trio-dnm calling model used genotype likelihoods (PLs) as the input for calling. However, that is flawed because PLs make assumptions which are unsuitable for de novo calling: PL(RR) can become bigger than PL(RA) even when the ALT allele is present in the parents. Note that this is true also for other programs such as DeNovoGear which rely on the same samtools calculation.
      The new recommended workflow is:
      bcftools mpileup -a AD,QS -f ref.fa -Ou proband.bam father.bam mother.bam | \
      bcftools call -mv -Ou | \
      bcftools +trio-dnm -p proband,father,mother -Oz -o output.vcf.gz
      
      This new version also implements the DeNovoGear model. The original behavior of trio-dnm is no longer supported.
      For more details see http://samtools.github.io/bcftools/trio-dnm.pdf