454sim Code
Simulate 454-data using configurable statistical models at high speed
Brought to you by:
flysholm
File | Date | Author | Commit |
---|---|---|---|
examples | 2011-10-10 | | [7b11a4] Initial commit |
gen | 2012-08-17 | | [e2b6c3] several bugfixes including fix of handling nflo... |
src | 2012-08-21 | | [c59dc2] Added stable output |
AUTHOR | 2011-10-19 | | [aa4f1e] added |
LICENSE | 2011-10-10 | | [7b11a4] Initial commit |
README | 2012-01-12 | | [654950] Newline added |
fragsim | 2012-08-24 | | [417539] Fixed U/C notation |
makefile | 2012-08-21 | | [c59dc2] Added stable output |
test454sim | 2012-08-21 | | [c59dc2] Added stable output |
#
# INFORMATION about 454sim
#

454sim processes standard FASTA, given either as a command-line argument or on standard input, and generates one read for each FASTA entry in the file. Each read starts at the first base and continues until either the sequence ends or the simulated read ends (because quality deteriorates or all flows have been spent).

As a consequence, it is very useful to have a script that generates fragments from a larger genome, as is done in the preparation phase of 454 sequencing. All zip files include a Perl script called "fragsim", which produces fragments from a genome; these can then be fed through 454sim.

454sim takes a number of parameters, visible via the --help flag:

  -a  number of processors to use (default=8)
  -n  number of flows to simulate per sample (default=800)
  -g  generation to simulate (available=GS,FLX,Ti, default=Ti)
  -d  directory with generations (default=gen)
  -o  output file (default=none specified)
  -i  simulation info file (default=none written)

Typical run-line in a Linux environment:

  ./fragsim -c 1000000 -l 1000 genome.fasta | ./454sim -o genome.sff

or

  ./fragsim -c 1000000 -l 1000 genome.fasta > genome.fragments.fasta
  ./454sim -o genome.sff genome.fragments.fasta

or (compressing intermediate data)

  ./fragsim -c 1000000 -l 1000 genome.fasta | gzip > genome.fragments.fasta.gz
  zcat genome.fragments.fasta.gz | ./454sim -o genome.sff

The above examples generate 1 million reads from genome.fasta and store the output in genome.sff. The first example discards the intermediate fragment output, while the second and third store it in genome.fragments.fasta and genome.fragments.fasta.gz, respectively.

#
# Generation files
#

The generation files are by default found in the gen folder. 454sim is shipped with a couple of generation files, for example ti.gen, which describes the Roche Titanium 454 chemistry/instrument. A more detailed explanation of the parameters and their values can be found in the included generation files, such as ti.gen.
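To make the fragment step described above more concrete, here is a minimal Python sketch of cutting fixed-length fragments from a genome and writing them as FASTA, one entry per fragment, which is the kind of input 454sim expects. It is an illustration only and not fragsim's actual algorithm: uniform random start positions, the function names `make_fragments` and `to_fasta`, the fragment naming scheme, and the toy genome are all assumptions made for this example.

```python
import random


def make_fragments(genome, count, length, seed=None):
    """Yield (name, sequence) pairs cut from uniformly random start positions.

    Assumption: uniform sampling on the forward strand only; the real
    fragsim script may sample differently.
    """
    if length > len(genome):
        raise ValueError("fragment length exceeds genome length")
    rng = random.Random(seed)
    for i in range(count):
        start = rng.randrange(len(genome) - length + 1)
        yield f"frag{i}_pos{start}", genome[start:start + length]


def to_fasta(records, width=60):
    """Format (name, sequence) pairs as FASTA text, wrapping at `width`."""
    lines = []
    for name, seq in records:
        lines.append(">" + name)
        lines.extend(seq[j:j + width] for j in range(0, len(seq), width))
    return "\n".join(lines) + "\n"


# Toy 200-base genome (hypothetical data); three 20-base fragments.
genome = "ACGT" * 50
fasta_text = to_fasta(make_fragments(genome, count=3, length=20, seed=1))
print(fasta_text, end="")
```

Each FASTA entry produced this way would yield one simulated read, starting from the fragment's first base.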
#
# Compiling
#

# gcc
make
# intel
make CC=intel
# gcc 32-bit (on a 64-bit system; 32-bit is used by default on a 32-bit system)
make CC=gcc32

#
# Testing with test454sim
#

A test script is provided for simple testing of 454sim with default or custom parameters. test454sim takes the following parameters:

  -b <binary>          default = ./454sim
  -d <output-details>  default = 2 [0 = simple, 1 = normal, 2 = detailed]
  -f <fasta>           default = examples/example.fragments.fasta
  -p <454sim-param>    default = '' # these parameters are passed on to 454sim (use quotes)
  -s <sff-out>         default = /tmp/454sim.test.sff
  -n <info-out>        default = /tmp/454sim.test.sff.txt # this file is analysed

An example run just testing 454sim (on Linux with Perl in /usr/bin/perl):

  ./test454sim -d0

Detailed info of the GS chemistry with a custom FASTA file:

  ./test454sim -d2 -f[my-fasta-file] -p'-gGS'

An example run testing 454sim (Windows version) using wine:

  ./test454sim -b'wine 454sim.exe'
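When test454sim is driven from another script, the parameters listed above can be assembled programmatically. The following Python sketch builds such a command line. It is a convenience illustration, not part of the 454sim distribution: `build_test_command` is a hypothetical helper, `my.fasta` is a placeholder file name, and attaching each value directly to its flag (as in `-d2`) is assumed from the README examples.

```python
import shlex


def build_test_command(binary=None, details=None, fasta=None,
                       sim_params=None, sff_out=None, info_out=None):
    """Return an argument list for ./test454sim; unset options are omitted.

    Assumption: values are attached directly to their flags, as in the
    README examples (-d0, -p'-gGS').
    """
    if details is not None and details not in (0, 1, 2):
        raise ValueError("details must be 0 (simple), 1 (normal) or 2 (detailed)")
    options = [("-b", binary), ("-d", details), ("-f", fasta),
               ("-p", sim_params), ("-s", sff_out), ("-n", info_out)]
    cmd = ["./test454sim"]
    for flag, value in options:
        if value is not None:
            cmd.append(f"{flag}{value}")
    return cmd


# Detailed info of the GS chemistry with a custom FASTA file.
cmd = build_test_command(details=2, fasta="my.fasta", sim_params="-gGS")
print(shlex.join(cmd))  # ./test454sim -d2 -fmy.fasta -p-gGS
```

The returned list can be passed to `subprocess.run` without a shell, which avoids the quoting that the shell examples need for values containing spaces, such as `wine 454sim.exe`.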