454sim Code

Simulate 454-data using configurable statistical models at high speed

Brought to you by: flysholm

File	Date	Author	Commit
examples	2011-10-10	Fredrik Lysholm	[7b11a4] Initial commit
gen	2012-08-17	Fredrik Lysholm	[e2b6c3] several bugfixes including fix of handling nflo...
src	2012-08-21	Fredrik Lysholm	[c59dc2] Added stable output
AUTHOR	2011-10-19	Fredrik Lysholm	[aa4f1e] added
LICENSE	2011-10-10	Fredrik Lysholm	[7b11a4] Initial commit
README	2012-01-12	Fredrik Lysholm	[654950] Newline added
fragsim	2012-08-24	Fredrik Lysholm	[417539] Fixed U/C notation
makefile	2012-08-21	Fredrik Lysholm	[c59dc2] Added stable output
test454sim	2012-08-21	Fredrik Lysholm	[c59dc2] Added stable output

Read Me

#
# INFORMATION about 454sim
#

454sim processes standard FASTA, given either as a command-line argument or
on standard input, and generates one read for each FASTA entry in the file.
Each read starts at the first base and continues until either the sequence
ends or the simulated read ends (because quality deteriorates or all flows
have been spent).
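The "all flows have been spent" stopping rule can be illustrated with a
noise-free sketch. This is not 454sim's model (454sim applies configurable
statistical models on top); it only shows how a fixed number of flows limits
the read length. The function name and the cyclic flow order "TACG" are
assumptions made for this illustration.

```python
def ideal_flowgram(sequence, n_flows=800, flow_order="TACG"):
    """Return (flow values, bases read) for a noise-free run.

    Each flow offers one nucleotide; the flow value is the length of the
    homopolymer run incorporated at the current position (0 if none).
    The flow order "TACG" is an assumption of this sketch.
    """
    seq = sequence.upper()
    pos = 0
    flows = []
    for i in range(n_flows):
        if pos >= len(seq):
            break  # the sequence ended before the flows were spent
        base = flow_order[i % len(flow_order)]
        run = 0
        while pos < len(seq) and seq[pos] == base:
            run += 1
            pos += 1
        flows.append(run)
    # If the loop ran out of flows, seq[:pos] is shorter than the sequence.
    return flows, seq[:pos]


if __name__ == "__main__":
    print(ideal_flowgram("TTACGG"))
    print(ideal_flowgram("TTACGG", n_flows=2))
```

With only two flows the read stops after the "A", which is the same effect a
small -n value has on long fragments.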

As a consequence, it is very useful to have a script that generates fragments
from a larger genome, as is done in the preparation phase of 454 sequencing.
All zip files include a Perl script called "fragsim", which produces fragments
from a given genome that can then be fed through 454sim.
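The idea of fragmenting a genome before simulation can be sketched as follows.
This is not how fragsim is implemented; it is a minimal illustration that
assumes fixed-length fragments drawn at uniformly random start positions.
The function names and the FASTA header format are hypothetical.

```python
import random


def sample_fragments(genome, count, length, seed=None):
    """Yield (name, sequence) pairs for randomly placed fragments.

    Assumptions of this sketch: uniform start positions and a fixed
    fragment length, clamped to the genome length.
    """
    if not genome:
        raise ValueError("genome is empty")
    rng = random.Random(seed)
    length = min(length, len(genome))
    for i in range(count):
        start = rng.randrange(0, len(genome) - length + 1)
        yield f"fragment_{i}", genome[start:start + length]


def to_fasta(records):
    """Format (name, sequence) pairs as FASTA text, one entry per record."""
    return "".join(f">{name}\n{seq}\n" for name, seq in records)


if __name__ == "__main__":
    genome = "ACGTTGCA" * 20
    print(to_fasta(sample_fragments(genome, count=3, length=25, seed=1)), end="")
```

Each FASTA entry produced this way would yield one simulated read, since
454sim generates one read per entry.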

454sim takes a number of parameters visible via the --help flag:
-a   Number of processors to use (default=8)
-n   Number of flows to simulate per sample (default=800)
-g   generation to simulate (available=GS,FLX,Ti, default=Ti)
-d   directory with generations (default=gen)
-o   output file (default=none specified)
-i   simulation info file (default=none written)
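
When 454sim is driven from a script, the parameters above can be assembled
into an argument list. The sketch below uses only the flags listed here; the
helper name is hypothetical, and it assumes that values may be given as
separate arguments (as in "-o genome.sff") and that the FASTA file follows
the options.

```python
def build_454sim_command(fasta=None, output=None, info=None, generation="Ti",
                         flows=800, processors=8, gen_dir="gen",
                         binary="./454sim"):
    """Build an argument list for 454sim from the documented parameters.

    fasta=None means the FASTA data is expected on standard input.
    """
    if generation not in ("GS", "FLX", "Ti"):
        raise ValueError(f"unknown generation: {generation}")
    cmd = [binary,
           "-a", str(processors),
           "-n", str(flows),
           "-g", generation,
           "-d", gen_dir]
    if output is not None:
        cmd += ["-o", output]
    if info is not None:
        cmd += ["-i", info]
    if fasta is not None:
        cmd.append(fasta)
    return cmd


if __name__ == "__main__":
    print(" ".join(build_454sim_command("genome.fragments.fasta",
                                        output="genome.sff")))
```

The resulting list can be passed to subprocess.run(cmd, check=True), which
avoids shell quoting problems with unusual file names.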


Typical run-line in a Linux environment:

./fragsim -c 1000000 -l 1000 genome.fasta | ./454sim -o genome.sff

or

./fragsim -c 1000000 -l 1000 genome.fasta > genome.fragments.fasta
./454sim -o genome.sff genome.fragments.fasta

or (compressing intermediate data)

./fragsim -c 1000000 -l 1000 genome.fasta | gzip > genome.fragments.fasta.gz
zcat genome.fragments.fasta.gz | ./454sim -o genome.sff

The above examples will generate 1 million reads from genome.fasta and store
the output in genome.sff. The first example discards the intermediate fragment
output, while the second and third store it in genome.fragments.fasta and
genome.fragments.fasta.gz, respectively.



#
# Generation files
#

The generation files are by default found in the gen folder. 454sim is shipped
with a couple of generation files, for example ti.gen, which describes the
Roche Titanium 454 chemistry/instrument. A more detailed explanation of the
parameters and their values can be found in the included generation files,
such as ti.gen.


#
# Compiling
#

# gcc
make

# intel
make CC=intel

# gcc 32-bit (on a 64-bit system; 32-bit is the default on a 32-bit system)
make CC=gcc32



#
# Testing with test454sim
#

A test script is provided in order to perform simple testing of 454sim with 
default or custom parameters.

test454sim takes the following parameters:

-b <binary>          default = ./454sim
-d <output-details>  default = 2 [0 = simple, 1 = normal, 2 = detailed]
-f <fasta>           default = examples/example.fragments.fasta
-p <454sim-param>    default = ''  # these parameters are passed on to 454sim (use quotes)

-s <sff-out>         default = /tmp/454sim.test.sff
-n <info-out>        default = /tmp/454sim.test.sff.txt  # this file is analysed


An example run just testing 454sim (on Linux with Perl in /usr/bin/perl):
./test454sim -d0

Detailed info on the GS chemistry with a custom FASTA file:
./test454sim -d2 -f[my-fasta-file] -p'-gGS'

An example run testing 454sim (Windows version) using wine:
./test454sim -b'wine 454sim.exe'
