This repository was archived by the owner on Aug 11, 2021. It is now read-only.

What is the actual sorting required for tabix? #29

@Shians

The documentation at http://www.htslib.org/doc/tabix.html indicates that the file should be position sorted:

The input data file must be position sorted and compressed by bgzip which has a gzip(1) like interface.

However, in many usages I see that the files are in fact sorted first by seqname and THEN by position. The tabix paper (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3042176/) also seems to indicate this:

Before being indexed, the data file needs to be sorted first by sequence name
and then by leftmost coordinate

So does the documentation need to be updated, or has tabix since been updated to allow the seqnames to be out of order?
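
To make the question concrete, here is a minimal Python sketch of the different orderings the two quotes could mean. It does not call tabix or htslib and does not settle which ordering tabix actually requires. The function names and the `(seqname, position)` record layout are my own assumptions for illustration.

```python
from typing import List, Tuple

# Hypothetical record layout: (sequence name, leftmost coordinate).
Record = Tuple[str, int]


def sorted_by_position_only(records: List[Record]) -> bool:
    """Literal reading of 'position sorted': coordinates never decrease,
    regardless of sequence name."""
    positions = [pos for _, pos in records]
    return all(a <= b for a, b in zip(positions, positions[1:]))


def sorted_by_seqname_then_position(records: List[Record]) -> bool:
    """Reading from the paper: sorted by sequence name, then coordinate.
    Names are compared as plain strings here, so 'chr10' < 'chr2'."""
    return all(a <= b for a, b in zip(records, records[1:]))


def grouped_by_seqname_sorted_within(records: List[Record]) -> bool:
    """Weaker reading: each sequence name forms one contiguous block and
    coordinates are sorted inside each block; block order is free."""
    seen = set()
    prev_name = None
    prev_pos = None
    for name, pos in records:
        if name != prev_name:
            if name in seen:  # the name reappears after another block
                return False
            seen.add(name)
            prev_name = name
        elif pos < prev_pos:  # coordinate decreased inside a block
            return False
        prev_pos = pos
    return True


# chr2 comes before chr1, but each block is internally sorted.
out_of_order = [("chr2", 100), ("chr2", 500), ("chr1", 50), ("chr1", 300)]
in_order = [("chr1", 50), ("chr1", 300), ("chr2", 100)]
```

With these inputs, `out_of_order` passes only the grouped check, while `in_order` passes both the seqname-then-position check and the grouped check. Neither passes the position-only check, which is the case my question is about.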
