1.22

@daviesrob daviesrob released this 30 May 10:16

Download the source code here: htslib-1.22.tar.bz2. (The "Source code" downloads are generated by GitHub and are incomplete, as they are missing some generated files.)

Note that this release changes the default output CRAM version from 3.0 to 3.1. HTSlib and SAMtools have been able to read CRAM 3.1 since version 1.12; however, other tools may not yet be able to cope. We know Noodles reads CRAM 3.1, and htsjdk has a draft implementation that has not yet been released.
HTSlib has options for modifying the output formats, which are exposed in SAMtools. When specifying an output format, you can explicitly set the version, e.g. samtools view -O cram,version=3.0 ....
Further documentation on this change can be found at https://www.htslib.org/benchmarks/CRAM.html

HTSlib no longer fetches CRAM reference data from EBI's server by default. Your organisation may wish to set up local infrastructure to supply reference sequences, e.g., using the new ref-cache tool included in this HTSlib release. See the REF_CACHE and REF_PATH environment variables documented in https://www.htslib.org/doc/reference_seqs.html and the SAMtools manpage for details.
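A program that links against HTSlib could also set these variables for its own process before opening any CRAM file. The following is a minimal sketch, assuming a POSIX system with setenv(). The helper name configure_reference_lookup, the cache directory and the server URL are all hypothetical, and the URL layout served by ref-cache is an assumption here; check the reference_seqs documentation for the exact forms.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L  /* for setenv() */
#endif
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not part of HTSlib: export REF_CACHE and REF_PATH
 * for the current process. Call it before opening any CRAM file.
 * Returns 0 on success, -1 on failure. */
static int configure_reference_lookup(const char *cache_dir,
                                      const char *server_url)
{
    char cache[1024], path[1024];
    int n;

    assert(cache_dir != NULL && server_url != NULL);
    if (strlen(cache_dir) == 0 || strlen(server_url) == 0)
        return -1;

    /* "%2s/%2s/%s" spreads cached files over subdirectories named after
     * the leading characters of each sequence's MD5 checksum. */
    n = snprintf(cache, sizeof cache, "%s/%%2s/%%2s/%%s", cache_dir);
    if (n < 0 || (size_t)n >= sizeof cache)
        return -1;

    /* "%s" in REF_PATH is replaced by the MD5 checksum being looked up.
     * Assumption: the local server answers at <server_url>/<md5>. */
    n = snprintf(path, sizeof path, "%s/%%s", server_url);
    if (n < 0 || (size_t)n >= sizeof path)
        return -1;

    if (setenv("REF_CACHE", cache, 1) != 0)
        return -1;
    if (setenv("REF_PATH", path, 1) != 0)
        return -1;
    return 0;
}
```

Setting the variables in the shell or in a site-wide profile has the same effect and is usually simpler; the in-process form is only useful when a tool has to pick the location at run time.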

Updates

  • NEW. Add ref-cache, a caching proxy for reference sequences. This is a local server of reference sequences, for use when encoding or decoding CRAM files that use reference-based compression. (PR #1911, PR #1921, PR #1922)

  • Add support for matching VCF lines by ID. (PR #1844, addresses issue samtools/bcftools#1739 reported by Han Cao)

  • Make it possible to test for VCF_REF as declared in the documentation. (PR #1879)

  • Updated VCF code to work with VCF 4.4 prefixed phasing info. (PR #1861, fixes #1847. Reported by John Marshall)

  • Use the highest VCF version when merging headers. (PR #1912, see samtools/bcftools#2395 and samtools/bcftools#2404)

  • Update RLEN calculation for VCF 4.4 and 4.5. (PR #1897, fixes #1820. Reported by Dave Lawrence)

  • Convert U to T instead of U to N when parsing SAM. Although the SAM format itself can contain U, the BAM format cannot. (PR #1854, fixes samtools/samtools#2131 reported by James Ferguson)

  • Add an hts_crc32 function to use zlib or libdeflate. The libdeflate crc32 function is faster than native zlib and should be used when available. (PR #1850)

  • Increase the input block size for bgzip. This deals with a slow down introduced in PR #1493 when reading from a pipe. (PR #1768, fixes #1767. Reported by Konstantin Riege)

  • Allow BYTE_ARRAY_STOP to work on non-zero STOP code with TOK3. Although the htscodecs name tokeniser uses a NUL between names there is no reason why another value could not be used. This change lets CRAM recognise other separator values. (PR #1871)

  • Remove cram seek ability to do range queries via SEEK_CUR. A probable misfeature from the original implementation. (PR #1878, fixes #1877. Reported by Rick Wertenbroek)

  • Add hts_tpool_worker_id() API. This may be used to associate data with a thread rather than to a job. (PR #1875)

  • Update bcf_synced_reader to use htsFile. (PR #1868, implements #1862. Requested by Brent Pedersen)

  • Exit with return value 1 on tabix parse error. This previously returned 0. (PR #1887, fixes #1885. Reported by Fan-iX)

  • Automatically recognise BED vs TSV files and add the option -C, --coords to set index positions (1 or 0 based coordinates) in annot-tsv. (PR #1894)

  • Reading SQ lines with multiple differing LN will now fail. Such lines are invalid (by the spec) and previous handling was inconsistent. (PR #1882, fixes #1866)

  • Return errors instead of EOF after all I/O errors etc in hts_itr_multi_next/sam_itr_next/sam_read1/vcf_parse/bcf_read. (PR #1899. Thanks to John Marshall)

  • Remove UR:file:// and UR:ftp:// from the reference search path, and remove the EBI server from the default REF_PATH. Removing EBI as the default fallback when REF_PATH is not set prevents an unintended DDoS on EBI's servers. (PR #1881; PR #1915, fixes oss-fuzz issue 418125747)
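To illustrate the U to T change listed above: BAM packs bases into 4-bit codes that have no value for U, so a U read from SAM has to be mapped to something else. The sketch below shows the kind of substitution involved. It is illustrative only and is not HTSlib's parser; the function name convert_u_to_t is hypothetical, and handling lower-case u is an assumption of this example.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative only, not HTSlib's code: rewrite uracil as thymine in
 * place so an RNA-style sequence can be stored with DNA base codes.
 * Returns the number of bases that were changed. */
static size_t convert_u_to_t(char *seq, size_t len)
{
    size_t i, changed = 0;

    assert(seq != NULL || len == 0);

    for (i = 0; i < len; i++) {
        if (seq[i] == 'U') {
            seq[i] = 'T';
            changed++;
        } else if (seq[i] == 'u') {
            seq[i] = 't';
            changed++;
        }
    }
    return changed;
}
```

Mapping U to T keeps the base identity, whereas mapping to N would discard it.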

Build Changes

  • Detect the presence of getauxval() and elf_aux_info() for *BSD variants. (PR #1835, thanks to Brad Smith)

  • Make HAVE_ATTRIBUTE_TARGET check also check that SSSE3 intrinsics work. Mainly for use with old compilers. (PR #1886, fixes #1838 and pysam-developers/pysam#1327. Thanks to John Marshall)

  • Fix broken tests due to MSYS2 changes. Due to changes in how MSYS2 perl reported the identity of the OS it was built for, our tests were failing to adapt to the Windows style file locations. (PR #1892)

  • Updated htscodecs submodule to version 1.6.3 (PR #1917)

  • Fix the script used to build the symbol version file. (PR #1918)

Bug fixes

  • Fix possible 1 byte underflow in find_file_extension(). Fixes an issue reported by OSS-Fuzz. (PR #1840, fixes oss-fuzz id 71740)

  • Replace home-brew string end searching with memchr() to speed up looking at long aux tags. (PR #1842)

  • Prevent segfault on empty tbi index. This could happen when a VCF file has a header but no data lines. (PR #1845, fixes samtools/bcftools#2286. Reported by Devon Ryan)

  • Fix CRAM embed_ref=2 with seqs overlapping ref end. (PR #1848 and PR #1849 which fixed oss-fuzz issue 372547397)

  • Fix sam_hdr_remove_line_pos() not dealing with the 0 index position properly. (PR #1853. Thanks to Julian Regalado Perez)

  • Fix threaded sam_read1() after EOF. Prevents sam_read1() getting stuck when trying to read after EOF and waiting forever for data that is never going to arrive. (PR #1856, fixes #1855. Reported by Yan Gao)

  • Fix a bug in breakend detection. It was incorrectly assuming that the ALT allele is of equal length to the REF allele, but the VCF specification allows breakend insertions. (PR #1858, fixes samtools/bcftools#2317. Reported by Nicolai von Kügelgen)

  • Fix cram_encode fuzzer issue caused by negative reference lengths. Reported by OSS-Fuzz. (PR #1863, fixes oss-fuzz issue 382922241)

  • Fixed a typo in vcf.h. (PR #1870, thanks to Yu Wang)

  • Reset variant types after updating alleles with bcf_update_alleles() or bcf_update_alleles_str(). Prevents an out-of-bounds access by bcftools consensus. (PR #1883)

  • Recognize T > A[chr15:12345[ breakend type in VCF. (PR #1903, fixes samtools/bcftools#2389. Reported by Dennis Hendriksen)

  • Fix possible buffer overruns in expand_path(). (PR #1907)
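The memchr() change listed above replaces a hand-written loop that looked for the end of a string. A minimal sketch of the general technique follows; it is not HTSlib's code, and the name bounded_string_length is hypothetical. It finds the NUL terminator of a string stored inside a bounded buffer without reading past the buffer end.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Illustrative only, not HTSlib's code: length of a NUL-terminated
 * string stored in the buffer [start, end).
 * Returns the length, or -1 if no terminator is found before `end`. */
static ptrdiff_t bounded_string_length(const uint8_t *start,
                                       const uint8_t *end)
{
    const uint8_t *nul;

    assert(start != NULL && end != NULL && start <= end);

    /* memchr() examines at most (end - start) bytes, so it cannot run
     * off the end of the buffer when the terminator is missing. */
    nul = memchr(start, '\0', (size_t)(end - start));
    return nul ? nul - start : -1;
}
```

C libraries typically implement memchr() with word-sized or vectorised scans, which is where the speed-up on long values comes from compared with a byte-at-a-time loop.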

Documentation updates

  • Add instructions to INSTALL for FreeBSD, NetBSD and OpenBSD. (PR #1843)

  • Clarify bam_set1() parameter documentation to note that quality values do not have the ASCII 33 offset. (PR #1891. Thanks to Chris Wright)

  • Fixed incorrectly named table in bam1_t structure documentation. (PR #1923. Thanks to Julian Hess)
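As a sketch of what the bam_set1() clarification above means in practice: quality text taken from a FASTQ or SAM record is Phred+33 encoded, so it has to be converted to raw Phred scores before it is passed as the quality array. The helper below is hypothetical (the name phred33_to_raw is not an HTSlib function) and only shows the offset arithmetic.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of HTSlib: convert Phred+33 quality
 * text into raw Phred scores (no ASCII offset).
 * Returns 0 on success, -1 if a character is outside '!'..'~'. */
static int phred33_to_raw(const char *text, size_t len, uint8_t *qual)
{
    size_t i;

    assert((text != NULL && qual != NULL) || len == 0);

    for (i = 0; i < len; i++) {
        unsigned char c = (unsigned char)text[i];
        if (c < 33 || c > 126)
            return -1;            /* not a printable quality character */
        qual[i] = (uint8_t)(c - 33);
    }
    return 0;
}
```

For example, the quality character '5' (ASCII 53) becomes the raw score 20.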

