Scripts and pipeline to inspect genetic variation in a series of bacterial genomes
The pipeline and scripts come with limited documentation. Please do get in touch with the author (Marco Galardini, [email protected]) if you need any guidance.
A reference genome in FASTA and GenBank format is needed
(default filenames are genome.fasta and genome.gbk).
All the genomes to be analysed should be assemblies: place
nucleotide FASTA files in the genomes directory
(genomes/*.fasta), protein FASTA files in the proteomes directory
(proteomes/*.faa) and GFF files in the gff directory (gff/*.gff).
We recommend using prokka to generate the .faa and .gff files.
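Before launching the pipeline it can help to confirm that the three input directories are consistent. The following is a minimal sketch, not part of the pipeline itself: the helper name find_incomplete_strains is hypothetical, and it assumes that the files belonging to one strain share the same basename across genomes/, proteomes/ and gff/ (for example genomes/strainA.fasta, proteomes/strainA.faa and gff/strainA.gff). Check the makefile if your naming differs.

```python
from pathlib import Path

# Directory name -> expected file extension, as described above.
LAYOUT = {"genomes": ".fasta", "proteomes": ".faa", "gff": ".gff"}


def find_incomplete_strains(base_dir="."):
    """Return {strain: [missing files]} for strains lacking an input file.

    Assumption (not stated by the pipeline): all files for one strain
    share the same basename in the three directories.
    """
    base = Path(base_dir)
    stems = {}
    for dirname, ext in LAYOUT.items():
        directory = base / dirname
        if directory.is_dir():
            stems[dirname] = {p.stem for p in directory.glob("*" + ext)}
        else:
            stems[dirname] = set()

    # Every strain name seen in at least one directory.
    all_stems = set().union(*stems.values())

    missing = {}
    for stem in sorted(all_stems):
        absent = [
            "{}/{}{}".format(dirname, stem, ext)
            for dirname, ext in LAYOUT.items()
            if stem not in stems[dirname]
        ]
        if absent:
            missing[stem] = absent
    return missing


if __name__ == "__main__":
    for strain, files in find_incomplete_strains(".").items():
        print("{}: missing {}".format(strain, ", ".join(files)))
```

Run from the pipeline directory, the script prints nothing when every strain has all three files.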
The makefile contains the various bits of the pipeline:
- make tree: core genome alignment phylogenetic tree and mash whole genome kmer distance
- make roary: pangenome
- make oma: pairwise pangenome for each strain against the reference
- make nonsyn stop: pairwise alignment of each strain against the reference to derive SNPs
You might want to type make -n TARGET first to check which commands will be launched.
- prokka
- parsnp and harvest
- mash
- snpeff
- roary
- oma
- python (2.7+ AND 3.3+), plus the following libraries:
- biopython
- bcbio-gff
- numpy
- pandas
- pyvcf
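A quick way to see which of the prerequisites above are already installed is sketched below. This is an illustrative helper, not something shipped with the pipeline, and it targets Python 3 only. The import names (Bio for biopython, BCBio for bcbio-gff, vcf for pyvcf) are the usual ones for these packages; the executable names passed to missing_tools are assumptions and may differ between installations, so adjust them to match your system.

```python
import importlib.util
import shutil

# Package name as listed above -> module name used at import time.
PYTHON_LIBRARIES = {
    "biopython": "Bio",
    "bcbio-gff": "BCBio",
    "numpy": "numpy",
    "pandas": "pandas",
    "pyvcf": "vcf",
}

# Assumed executable names; other tools (snpeff, oma, harvest) are left
# out because their command names vary with the installation method.
COMMAND_LINE_TOOLS = ["prokka", "parsnp", "mash", "roary"]


def missing_modules(libraries):
    """Return the package names whose module cannot be found."""
    return [
        package
        for package, module in libraries.items()
        if importlib.util.find_spec(module) is None
    ]


def missing_tools(executables):
    """Return the executables that are not on the PATH."""
    return [exe for exe in executables if shutil.which(exe) is None]


if __name__ == "__main__":
    for package in missing_modules(PYTHON_LIBRARIES):
        print("missing Python library: " + package)
    for exe in missing_tools(COMMAND_LINE_TOOLS):
        print("missing executable: " + exe)
```

The script only reports what it cannot find; it does not check versions.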
Copyright (C) <2015> EMBL-European Bioinformatics Institute
This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
Neither the institution name nor the name pangenome_variation can be used to endorse or promote products derived from this software without prior written permission. For written permission, please contact [email protected].
Products derived from this software may not be called pangenome_variation nor may pangenome_variation appear in their names without prior written permission of the developers.
You should have received a copy of the GNU General Public License along with this program. If not, see http://www.gnu.org/licenses/.