Skip to content

ArcInstitute/binseq-bindings

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

20 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

BINSEQ Bindings

This repository provides language bindings for the BINSEQ library, a high-performance Rust library for working with fixed-length DNA sequences in the BINSEQ (.bq) file format.

Currently supported bindings:

  • C bindings via FFI
  • C++ bindings via CXX

Installation

Prerequisites:

  • cargo - Rust's package manager
  • C/C++ compiler (gcc, clang, or MSVC)
  • Make (for building examples)

Building from Source

  1. Clone the repository:
git clone github.com:arcinstitute/binseq-bindings.git
cd binseq-bindings
  1. Compile the Rust libraries:
cargo build --release

The compiled libraries will be available at:

  • C bindings: ./target/release/libbinseq.so (Linux) or ./target/release/libbinseq.dylib (macOS)
  • C++ bindings: ./target/release/libbinseq_cxx.a (static library)

Headers will be auto-generated at:

  • C header: ./binseq-c/binseq.h
  • C++ header: ./target/cxxbridge/binseq-cxx/src/lib.rs.h

Using the C Bindings

The C bindings provide a simple, C-compatible API for working with BINSEQ files.

Building and Running the C Example

cd binseq-c/examples/
make
./simple_example <path_to_file>.bq

Example C Usage

// Create a reader
struct BinseqReader *reader = binseq_reader_open("sequence.bq");

// Get file info
size_t num_records = binseq_reader_num_records(reader);
uint32_t seq_len = binseq_reader_slen(reader);

// Create decoding context and record container
struct BinseqContext *ctx = binseq_context_new();
struct BinseqRecord *record = binseq_record_new();

// Process records
for (size_t i = 0; i < num_records; i++) {
    binseq_reader_get_record(reader, i, record);

    // Decode sequence
    size_t len = binseq_record_decode_primary(record, ctx);

    // Access sequence data
    const char *seq = binseq_context_primary_ptr(ctx);
    // Process sequence...
}

// Cleanup
binseq_record_free(record);
binseq_context_free(ctx);
binseq_reader_close(reader);

Using the C++ Bindings

The C++ bindings provide a more idiomatic C++ API with better type safety.

Building and Running the C++ Example

cd binseq-cxx/examples/
make
./simple_example <path_to_file>.bq

Example C++ Usage

#include "rust/cxx.h"
#include "binseq-cxx/src/lib.rs.h"

// Open a reader
auto reader = open_mmap_reader("sequence.bq");

// Get file info
size_t num_records = reader->num_records();
uint32_t slen = reader->get_slen();

// Create buffer for sequence data
rust::Vec<uint8_t> sbuf;

// Process records
for (size_t i = 0; i < num_records; i++) {
    auto record = reader->get_record(i);

    // Decode sequence
    sbuf.clear();
    record->decode_s(sbuf);

    // Convert to string if needed
    std::string seq_str(sbuf.begin(), sbuf.end());

    // Process sequence...
}

Generating BINSEQ Files

You can use bqtools to encode FASTA or FASTQ files into BINSEQ.

# Encode into BINSEQ
bqtools encode <fastq> -o <output>.bq

# Decode from BINSEQ into FASTQ
bqtools decode <binseq> -fq -o <fastq>

BINSEQ is built for paired records and encodes them both into a single file:

# Encode record pairs into BINSEQ
bqtools encode <R1> <R2> -o <output>.bq

BINSEQ Format

BINSEQ (.bq) is a binary file format designed for efficient storage of fixed-length DNA sequences. It uses 2-bit encoding for nucleotides (A=00, C=01, G=10, T=11) and focuses exclusively on sequence data, optimizing for modern high-throughput sequencing applications.

For more details on the BINSEQ format, see the original BINSEQ documentation.

Performance

The BINSEQ format and these bindings are designed for high-performance processing of sequence data. Performance benchmarks can be run with the included examples.

About

C and C++ Bindings to the BINSEQ library

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published