
rob-p/metagraph

 
 


Metagenome Graph Project

Install

See installation instructions or docs online.

Typical workflow

  1. Build a de Bruijn graph from FASTA files, FASTQ files, or KMC k-mer counters:
    ./metagraph build
  2. Annotate the graph using the column-compressed annotation:
    ./metagraph annotate
  3. Transform the built annotation to a different annotation scheme:
    ./metagraph transform_anno
  4. Query the annotated graph:
    ./metagraph query

Example

DATA="../tests/data/transcripts_1000.fa"

./metagraph build -k 12 -o transcripts_1000 $DATA

./metagraph annotate -i transcripts_1000.dbg --anno-filename -o transcripts_1000 $DATA

./metagraph query -i transcripts_1000.dbg -a transcripts_1000.column.annodbg $DATA

./metagraph stats -a transcripts_1000.column.annodbg transcripts_1000.dbg

Print usage

./metagraph

Build graph

  • Simple build

./metagraph build -v --parallel 30 -k 20 --mem-cap-gb 10 \
                        -o <GRAPH_DIR>/graph <DATA_DIR>/*.fasta.gz \
2>&1 | tee <LOG_DIR>/log.txt
  • Build with disk swap (limits RAM usage)

./metagraph build -v --parallel 30 -k 20 --mem-cap-gb 10 --disk-swap <GRAPH_DIR> \
                        -o <GRAPH_DIR>/graph <DATA_DIR>/*.fasta.gz \
2>&1 | tee <LOG_DIR>/log.txt

Build from k-mers filtered with KMC

K=20
./KMC/kmc -ci5 -t4 -k$K -m5 -fm <FILE>.fasta.gz <FILE>.cutoff_5 ./KMC
./metagraph build -v -p 4 -k $K --mem-cap-gb 10 -o graph <FILE>.cutoff_5.kmc_pre

Annotate graph

./metagraph annotate -v --anno-type row --fasta-anno \
                           -i primates.dbg \
                           -o primates \
                           ~/fasta_zurich/refs_chimpanzee_primates.fa

Convert annotation to Multi-BRWT

  1. Cluster columns
./metagraph transform_anno -v --linkage --greedy \
                           -o linkage.txt \
                           --subsample R \
                           -p NCORES \
                           primates.column.annodbg

Requires N*R/8 + 6*N^2 bytes of RAM, where N is the number of columns and R is the number of rows subsampled.
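
As a rough worked example of the formula above, the sketch below evaluates N*R/8 + 6*N^2 with shell integer arithmetic. The function name `linkage_ram_bytes` and the sample values (1,000 columns, 1,000,000 subsampled rows) are illustrative assumptions and not part of metagraph.

```shell
# Hypothetical helper (not part of metagraph): estimate the RAM needed for
# column clustering from the formula N*R/8 + 6*N^2.
#   $1 = N, number of annotation columns
#   $2 = R, number of rows subsampled (the value passed to --subsample)
linkage_ram_bytes() {
    echo $(( $1 * $2 / 8 + 6 * $1 * $1 ))
}

# Assumed sample values: N = 1000 columns, R = 1000000 rows.
linkage_ram_bytes 1000 1000000   # prints 131000000 (about 131 MB)
```

The division is integer division, so the result is rounded down by at most one byte. With these sample values the N*R/8 term dominates; the 6*N^2 term only starts to matter once the number of columns becomes large.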

  2. Construct Multi-BRWT
./metagraph transform_anno -v -p NCORES --anno-type brwt \
                           --linkage-file linkage.txt \
                           -o primates \
                           --parallel-nodes V \
                           primates.column.annodbg

Requires M*V/8 + Size(BRWT) bytes of RAM, where M is the number of rows in the annotation and V is the number of nodes merged concurrently.
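
The same kind of back-of-the-envelope check can be sketched for the formula above. The function name `brwt_ram_bytes` is hypothetical, and the final size of the BRWT is not known in advance, so the third argument is only a guess supplied by the caller. All sample values are assumptions.

```shell
# Hypothetical helper (not part of metagraph): estimate the RAM needed for
# Multi-BRWT construction from the formula M*V/8 + Size(BRWT).
#   $1 = M, number of rows in the annotation
#   $2 = V, number of nodes merged concurrently (the value passed to --parallel-nodes)
#   $3 = expected size of the resulting BRWT in bytes (a caller-supplied guess)
brwt_ram_bytes() {
    echo $(( $1 * $2 / 8 + $3 ))
}

# Assumed sample values: M = 1000000000 rows, V = 16 nodes, 500000000-byte BRWT.
brwt_ram_bytes 1000000000 16 500000000   # prints 2500000000 (about 2.5 GB)
```

Because the first term grows linearly with V, lowering `--parallel-nodes` is the direct way to trade construction speed for a smaller memory footprint.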

Query graph

./metagraph query -v -i <GRAPH_DIR>/graph.dbg \
                        -a <GRAPH_DIR>/annotation.column.annodbg \
                        --discovery-fraction 0.8 --labels-delimiter ", " \
                        query_seq.fa

Align to graph

./metagraph align -v -i <GRAPH_DIR>/graph.dbg query_seq.fa

Assemble sequences

./metagraph assemble -v <GRAPH_DIR>/graph.dbg \
                        -o assembled.fa \
                        --unitigs

Assemble differential sequences

./metagraph assemble -v <GRAPH_DIR>/graph.dbg \
                        --unitigs \
                        -a <GRAPH_DIR>/annotation.column.annodbg \
                        --label-mask-in LABEL_1 \
                        --label-mask-in LABEL_2 \
                        --label-mask-out LABEL_3 \
                        -o diff_assembled.fa
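
When the label lists are long, the repeated `--label-mask-in` and `--label-mask-out` flags above can be generated rather than typed out. The sketch below is one possible way to do that. The helper `mask_flags` is hypothetical and not part of metagraph, it assumes labels contain no whitespace, and the final command is only printed as a dry run.

```shell
# Hypothetical helper (not part of metagraph): repeat one flag for each label.
#   $1 = flag to repeat, remaining arguments = labels
# Assumes labels contain no whitespace, because the result is a single
# space-separated string that the shell later splits into words.
mask_flags() {
    flag=$1
    shift
    out=""
    for label in "$@"; do
        out="$out${out:+ }$flag $label"
    done
    printf '%s\n' "$out"
}

IN_FLAGS=$(mask_flags --label-mask-in LABEL_1 LABEL_2)
OUT_FLAGS=$(mask_flags --label-mask-out LABEL_3)

# Dry run: print the command instead of executing it.
# IN_FLAGS and OUT_FLAGS are left unquoted on purpose so they split into words.
echo ./metagraph assemble -v "<GRAPH_DIR>/graph.dbg" --unitigs \
     -a "<GRAPH_DIR>/annotation.column.annodbg" \
     $IN_FLAGS $OUT_FLAGS -o diff_assembled.fa
```

Dropping the leading `echo` would run the command for real, after replacing `<GRAPH_DIR>` with an actual path.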

Get stats

Stats for graph

./metagraph stats graph.dbg

Stats for annotation

./metagraph stats -a annotation.column.annodbg

Stats for both

./metagraph stats -a annotation.column.annodbg graph.dbg

Developer Notes

Makefile

The Makefile in the top level source directory can be used to build and test metagraph more conveniently. The following arguments are supported:

  • env: environment in which to compile and run (empty string: on the host; docker: in a Docker container)
  • alphabet: compile metagraph for a certain alphabet (e.g. DNA or Protein, default DNA)
  • additional_cmake_args: additional arguments to pass to cmake.

Examples:

# compiles metagraph in a docker container for the `DNA` alphabet
make build-metagraph env=docker alphabet=DNA

License

Metagraph is distributed under the GPLv3 License (see LICENSE). Please find further information in the AUTHORS and COPYRIGHTS files.

About

Scalable annotated de Bruijn graphs for DNA indexing, alignment, and assembly
