Hi all, I am analyzing amplicon deep-seq data, looking for rare variants. I've always run LoFreqFilter with the default strand-bias filtering (multiple testing correction, FDR), but now I would like to filter on a specific SB threshold. Can you advise on a reasonable value? Should I consider only variants with SB=0 as true variants? For example, how would you interpret this: DP=12219;AF=0.009657;SB=5;DP4=822,11273,11,107 Thanks, Luca
Hi Catherine, most variant callers produce rather arbitrary variant scores. LoFreq's variant quality scores are "proper" error probabilities converted into a Phred score. The error probabilities are computed using a Poisson-binomial distribution, which takes multiple quality scores (mapping quality, alignment quality, base quality) into account. If you look up the definition of Phred scores you will see that Q20 corresponds to an error probability of 0.01, Q30 to 0.001 etc. 49314 is simply the...
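To make the Q20/Q30 correspondence concrete, here is a minimal Python sketch of the standard Phred conversion, Q = -10 * log10(p). The function names are illustrative and not part of LoFreq. It also shows why a QUAL such as 49314 can only be handled in log space: it corresponds to p = 10^-4931.4, far below the smallest positive double.

```python
import math


def phred_to_prob(q: float) -> float:
    """Error probability for a Phred-scaled quality: p = 10^(-Q/10)."""
    return 10.0 ** (-q / 10.0)


def prob_to_phred(p: float) -> float:
    """Phred-scaled quality for an error probability: Q = -10 * log10(p)."""
    return -10.0 * math.log10(p)


def phred_to_log10_prob(q: float) -> float:
    """log10 of the error probability; safe even when p itself would underflow."""
    return -q / 10.0


if __name__ == "__main__":
    print(phred_to_prob(20))           # 0.01
    print(phred_to_prob(30))           # 0.001
    print(phred_to_prob(49314))        # 0.0, the probability underflows a double
    print(phred_to_log10_prob(49314))  # -4931.4
```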
Hi, I am using LoFreq in combination with another SNV caller. The other SNV caller lists quality scores as Q20, Q30, etc. My LoFreq output instead contains plain numbers in the QUAL column, the highest being 49314. How is this score calculated and how does it compare to a Phred score like Q20?
Dear Eugenia, thanks for your patience while waiting for a reply. Source quality was a rather experimental attempt to add one more error source to LoFreq's core: it tries to account for contamination/mismappings etc. by looking at the number of mismatches in a read (think of it as a variation of mapping quality). An accumulation of mismatches in a particular read leads to a penalty. However, you will want to ignore known variants during the mismatch counting, and for this you can, for example, use...
Hi, I would like to know what 'Source quality' means, and how the -s and -S options affect its computation. I'm trying to call human variants, including indels. As suggested in the online documentation, I would like to use dbSNP, however NCBI holds several databases and I'm not sure which one to use. I hope that understanding what 'source quality' means, will help me decide what is the most suitable database for my current problem. Thank you, Eugenia
That's only indirectly possible. You can run it with all filters off on the region of interest: lofreq call -r sq:start-end --no-default-filter -a 1 Andreas On 23 May 2018 at 11:53, Camilo cvillaman@users.sourceforge.net wrote: Thanks for the answers, they are very helpful. I have a final question, though. Is there a way to check why a possible variant is not being called by LoFreq?
Thanks for the answers, they are very helpful. I have a final question, though. Is there a way to check why a possible variant is not being called by LoFreq?
Oh I see. In general, using source quality will give you more conservative calls. There is a chance that it will undercall in mutational hotspots. Variants in the "ignore vcf" file are just used to tune the source quality computation. Normally reads with lots of variants get a low source quality, however, variants listed in the aforementioned file are ignored for this. These variants are not used to mask final calls! Hope this answers the question, Andreas On 22 May 2018 at 23:14, Camilo cvillaman@users.sourceforge.net...
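The role of the "ignore vcf" file in the mismatch counting can be illustrated with a deliberately simplified Python sketch. This is not LoFreq's source-quality code: the function name and the (chrom, position) ignore set are hypothetical, reads are assumed to be ungapped, and the step that turns the count into a quality penalty is left out.

```python
from typing import Set, Tuple


def count_mismatches(read_seq: str, ref_seq: str, chrom: str, start: int,
                     ignore: Set[Tuple[str, int]]) -> int:
    """Count mismatches between an ungapped read and the reference it aligns to.

    `start` is the 0-based reference position of the first read base. Positions
    listed in `ignore` (e.g. known variants taken from a VCF) are skipped, so a
    read carrying a known variant is not penalised for it.
    """
    mismatches = 0
    for offset, (read_base, ref_base) in enumerate(zip(read_seq, ref_seq)):
        if (chrom, start + offset) in ignore:
            continue  # known variant: excluded from the mismatch count
        if read_base.upper() != ref_base.upper():
            mismatches += 1
    return mismatches


if __name__ == "__main__":
    ref = "ACGTACGT"
    read = "ACCTACGA"  # mismatches at offsets 2 and 7
    print(count_mismatches(read, ref, "chr1", 100, set()))            # 2
    print(count_mismatches(read, ref, "chr1", 100, {("chr1", 102)}))  # 1
```

Note that the ignore set only lowers the mismatch count of reads; nothing in this step removes a variant from the final calls, which matches the point made above.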
I'm running lofreq call to call the variants, not lofreq somatic, and since I'm using human samples, according to the online documentation I should be using -s (source quality) in combination with -S.
Hi Camilo, -S won't mask variants. It just affects the somatic variant quality score. In fact, adding dbSNP here should have increased the quality of this call. What happens if you run it without the extra -S? Also, there is no (lowercase) '-s' option. Was that a typo? Best, Andreas On 18 May 2018 at 23:13, Camilo cvillaman@users.sourceforge.net wrote: Hello, I'm using LoFreq to call variants on some human tumor samples. We had analyzed those samples beforehand, so I had an idea about which variants...
Hello, I'm using LoFreq to call variants on some human tumor samples. We had analyzed those samples beforehand, so I had an idea about which variants should be called. The samples had some variants reported in dbSNP, and following the recommendations on the home page I decided to enable the -s flag and use -S with a dbSNP VCF file; however, those variants weren't being called. So I've been wanting to ask: does the -S option mask/remove variants at the positions listed in that file?
Hi Steve, for duplicate marking (if needed) you can use any tool of your choice, e.g. sambamba. For realignment you can use LoFreq's own realigner, lofreq viterbi (requires re-sorting afterwards). For base quality recalibration you can still use GATK or alternatively Lacer https://www.biorxiv.org/content/early/2017/04/25/130732. You should get decent results even without recalibration. Best, Andreas On 16 May 2018 at 22:04, Steve stevekm@users.sourceforge.net wrote: In the documentation for LoFreq,...
In the documentation for LoFreq, it is suggested: For Illumina data, we suggest that you preprocess your BAM files by following GATK’s best practice protocol, i.e. that you mark duplicates (not for very high coverage data though), realign indels and recalibrate base qualities with GATK (BQSR). The latter will also add indel qualities, which is needed for indel calling (alternatively use lofreq indelqual). However, GATK has upgraded to version 4, and has dropped many of these tools since they've been...
Hi Steve, yes, these variants are not filtered, even though they should be if you just look at the p-value/quality. The reason is that strand bias is a messy beast and we use some hacks: no one really knows why it happens (AFAIK). In viral amplicon data (for which LoFreq was originally designed) we often saw cases where, simply due to the ultra-high coverage, you'd get highly significant strand-bias p-values even though nothing seemed wrong with these variants if you were to evaluate them by eye (plenty of coverage for...
Thanks Andreas. Regarding "a significance threshold of 0.01": I was looking in the source code and saw this here: https://github.com/CSB5/lofreq/blob/master/src/lofreq/lofreq_filter.c#L1093

    if (! no_defaults) {
        if (cfg.sb_filter.mtc_type==MTC_NONE && ! cfg.sb_filter.thresh) {
            LOG_VERBOSE("%s\n", "Setting default SB filtering method to FDR");
            cfg.sb_filter.mtc_type = MTC_FDR;
            cfg.sb_filter.alpha = 0.001;
        }

Does this mean that the default strand-bias filter is at a p-value of 0.001 (cfg.sb_filter.alpha = 0.001)? As...
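With an FDR method, alpha is the tolerated false discovery rate across all tested variants rather than a fixed per-variant p-value cutoff. The snippet does not say which FDR procedure MTC_FDR implements, so the Python sketch below assumes the standard Benjamini-Hochberg step-up procedure; the function name and the example p-values are hypothetical.

```python
from typing import List


def benjamini_hochberg(pvalues: List[float], alpha: float) -> List[bool]:
    """Benjamini-Hochberg step-up procedure.

    Returns one flag per p-value; True means the null hypothesis (here: no strand
    bias) is rejected at false discovery rate `alpha`.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    # Largest 1-based rank k whose p-value satisfies p_(k) <= (k / m) * alpha.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank / m * alpha:
            k_max = rank
    # Reject every hypothesis ranked at or below k_max.
    rejected = [False] * m
    for idx in order[:k_max]:
        rejected[idx] = True
    return rejected


if __name__ == "__main__":
    sb_pvalues = [1e-6, 0.2, 4e-4, 0.9]  # hypothetical strand-bias p-values
    print(benjamini_hochberg(sb_pvalues, alpha=0.001))  # [True, False, True, False]
```

Under this assumption, a variant flagged True shows significant strand bias and would be the one filtered out. The effective per-variant cutoff therefore depends on how many variants are tested together, which is one reason a fixed SB threshold behaves differently from the default.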
Hi Steve, sure. The basics are explained in the NAR paper (Wilm et al., 2012): we compute a Poisson-binomial distribution taking the error probabilities at each pileup site into consideration and derive a p-value from that. Error probabilities were originally just converted base qualities (because that's what they are). In later LoFreq versions we merged base alignment, mapping and base quality into one error probability per base. The logic goes like this: either the read is misaligned (mapping quality) or...
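The core idea can be sketched in a few lines of Python. Two parts are assumptions rather than LoFreq's code: the merge treats the three error sources as independent ("the read is mismapped, or otherwise misaligned, or otherwise the base call is wrong"), and the p-value is computed as the upper tail P(X >= n_alt) of the Poisson-binomial distribution of error counts. All function names are hypothetical.

```python
from typing import List


def merged_error_prob(p_map: float, p_align: float, p_base: float) -> float:
    """Hypothetical merge of three per-base error sources, assumed independent:
    the base is wrong if the read is mismapped, or otherwise misaligned, or
    otherwise the base call itself is wrong."""
    return 1.0 - (1.0 - p_map) * (1.0 - p_align) * (1.0 - p_base)


def poisson_binomial_pmf(error_probs: List[float]) -> List[float]:
    """pmf[k] = probability that exactly k of the bases are errors,
    where base i is an error with probability error_probs[i]."""
    pmf = [1.0]
    for p in error_probs:
        nxt = [0.0] * (len(pmf) + 1)
        for k, prob in enumerate(pmf):
            nxt[k] += prob * (1.0 - p)  # base i is correct
            nxt[k + 1] += prob * p      # base i is an error
        pmf = nxt
    return pmf


def variant_pvalue(error_probs: List[float], n_alt: int) -> float:
    """P(X >= n_alt): probability that errors alone produce at least n_alt
    non-reference bases at this site."""
    return sum(poisson_binomial_pmf(error_probs)[n_alt:])
```

For example, `variant_pvalue([merged_error_prob(1e-4, 1e-3, 1e-3)] * 1000, 10)` gives the p-value for 10 non-reference bases among 1000, and -10 * log10 of it gives a Phred-style quality. This direct version is quadratic in depth and underflows for very small p-values; a practical implementation for deep amplicon data would need to work in log space and avoid computing the full distribution.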
Hi Steve, the strand-bias p-value is turned into a Phred quality, whose upper bound depends on the precision of the float. In practice it can get much higher than 1900. The fact that you see Phred values <60 in other programs is simply because they are mostly capped there arbitrarily. Andreas On 4 May 2018 at 03:50, Steve stevekm@users.sourceforge.net wrote: I have another question about the SB score values from the .vcf output. It is my understanding that these values are Phred quality scores, which...
Hi Steve, not sure why the actual quality filtering is not mentioned there. Let me look into this. Anyway, the main filtering step works on the variant qualities (which are converted p-values), and by default it's based on Bonferroni correction and a significance threshold of 0.01. Best, Andreas On 4 May 2018 at 07:12, Steve stevekm@users.sourceforge.net wrote: The FAQ page for LoFreq says Do I need to filter LoFreq predictions? You usually don't. Predicted variants are already filtered using...
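A Bonferroni filter on Phred-scaled qualities can be sketched in Python as below. How LoFreq counts the number of tests is not stated in this thread, so `n_tests` is a hypothetical parameter (defaulting to the number of candidates), and the function name is illustrative. The comparison is done on log10(p) so that very large QUAL values do not underflow.

```python
import math
from typing import List, Optional


def bonferroni_keep(quals: List[float], alpha: float = 0.01,
                    n_tests: Optional[int] = None) -> List[bool]:
    """Keep a call if its p-value p = 10^(-QUAL/10) satisfies p <= alpha / n_tests.

    The comparison is done on log10(p) = -QUAL/10, so very large QUAL values
    (tiny p-values) do not underflow.
    """
    if not quals:
        return []
    if n_tests is None:
        n_tests = len(quals)  # hypothetical default: one test per candidate
    log10_threshold = math.log10(alpha / n_tests)
    return [(-q / 10.0) <= log10_threshold for q in quals]


if __name__ == "__main__":
    # With 1000 tests the per-call cutoff is p <= 1e-5, i.e. QUAL >= 50.
    print(bonferroni_keep([49314, 35, 60], alpha=0.01, n_tests=1000))  # [True, False, True]
```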
The FAQ page for LoFreq says Do I need to filter LoFreq predictions? You usually don't. Predicted variants are already filtered using default parameters (which include coverage, strand-bias, snv-quality etc). However, I do not see any details about what these default filtering parameters are. Is there a description anywhere? When I try to run lofreq filter --verbose, the only output I get is: Setting default SB filtering method to FDR Setting default minimum coverage to 10 What other criteria are...
I have another question about the SB score values from the .vcf output. It is my understanding that these values are Phred quality scores, which usually are in the range of 0 - 50. However, I am getting many with values of 500 - 1900. Is this expected? And if SB=0 means no strand bias, does this mean that these regions are extremely strand biased? Also, in this thread you state: 2147483647: This corresponds to a p-value close to zero, i.e. a highly significant SNV. What is the meaning of 2147483647...
As sources of errors, it takes base-qualities, mapping qualities etc into account. Thanks for this. However I was wondering if there was a more thorough explanation of each of the values that are used in calculation of the 'QUAL' score values that are output in the VCF? I did not see it covered in the publication (maybe I missed it?) and wasn't able to figure out what was going on in the source code.
Hi Francisco, LoFreq doesn't have an AF filter. The default filter is based on variant quality only. Furthermore, it doesn't actually report genotypes. Taken together, this makes it likely that your collaborator post-processed the VCF file somehow. Hope this helps, Andreas On 24 March 2018 at 13:45, Francisco De La Vega ribozyme@users.sourceforge.net wrote: I have received a VCF from LoFreq from a collaborator who used it for calling SNVs from a cfDNA targeted sequencing assay at a high depth of coverage....
I have received a VCF from LoFreq from a collaborator who used it for calling SNVs from a cfDNA targeted sequencing assay at a high depth of coverage. They developed scripts that use UMIs in the adapters to error-correct the aligned reads and then produce a BAM file to feed to LoFreq. The aim is to detect somatic variants in the range of 0.5-2% VAF. However, it appears that LoFreq is not adding the PASS filter tag to variants under ~2% VAF. Further, since these variants are not passed, the genotypes are reported...
Hi Nils, in short: the BAM file was created with a different reference. The checkref subcommand checks whether the reference fasta given on the command line matches the one given in the BAM header. In your case the BAM header contains a sequence named "1", which is not part of the fasta file. Hope this helps, Andreas On 13 November 2017 at 06:05, Nils Engel nils321@users.sf.net wrote: Hi, I have a problem using lofreq with human sequencing data and hg19 or GRCh38 reference sequences ( downloaded...
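The name comparison behind this error can be illustrated with a small Python sketch. It is not LoFreq's checkref implementation; it only checks that every sequence name in the BAM header (the SN: tags of @SQ lines, e.g. as printed by `samtools view -H`) also appears as a FASTA record name. The helper names are hypothetical, and the example mirrors this thread: a BAM header naming chromosome "1" versus an NCBI FASTA whose records use RefSeq accessions such as NC_000001.11.

```python
from typing import Iterable, List, Set


def bam_header_contigs(sam_header: str) -> List[str]:
    """Sequence names (SN: tags) from the @SQ lines of a SAM/BAM header text."""
    names = []
    for line in sam_header.splitlines():
        if not line.startswith("@SQ"):
            continue
        for field in line.split("\t")[1:]:
            if field.startswith("SN:"):
                names.append(field[3:])
    return names


def fasta_contigs(fasta_lines: Iterable[str]) -> Set[str]:
    """Sequence names from FASTA header lines: the text after '>' up to the first whitespace."""
    names = set()
    for line in fasta_lines:
        if line.startswith(">"):
            words = line[1:].split()
            if words:
                names.add(words[0])
    return names


def missing_contigs(sam_header: str, fasta_lines: Iterable[str]) -> List[str]:
    """Contigs named in the BAM header that are absent from the FASTA file."""
    available = fasta_contigs(fasta_lines)
    return [name for name in bam_header_contigs(sam_header) if name not in available]


if __name__ == "__main__":
    header = "@HD\tVN:1.6\n@SQ\tSN:1\tLN:248956422\n@SQ\tSN:2\tLN:242193529\n"
    fasta = [">NC_000001.11 Homo sapiens chromosome 1", "NNNNACGT"]
    print(missing_contigs(header, fasta))  # ['1', '2']
```

Any non-empty result means the BAM was aligned against a reference with different sequence names, so the fix is to use the same FASTA that was used for alignment rather than to rebuild the index.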
Hi, I have a problem using lofreq with human sequencing data and hg19 or GRCh38 reference sequences (downloaded from NCBI, with the file extension manually changed from .fna to .fa). I guess it might be a problem with an improper file format or index. I get the following output: nils321$ lofreq checkref GRCh38_latest_genomic.fa 1214474-H8.bam [fai_load] build FASTA index. [fai_fetch_seq] The sequence "1" not found FATAL(samutils.c|checkref:653): Failed to fetch sequence 1 from fasta file Failed An fasta index...
Hello, DP4 only lists the reference and variant base counts. There are usually other bases present as well, which are taken into account for computing AF. Hoping this explains the discrepancy, Andreas On 26 October 2017 at 04:58, siva siva80@users.sf.net wrote: Hi I have several variants (especially those with almost hom-alt allele) that have different allele fraction estimates from DP4 and the AF= tag. for example DP=4088;AF=0.872798;SB=171;DP4=9,33,3329,685 Here from DP4, the AF can be estimated...
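The two estimates one might compute by hand from this example can be reproduced with the Python sketch below (helper names are hypothetical). DP4 follows the samtools convention: ref-forward, ref-reverse, alt-forward, alt-reverse. Dividing the alternate count by the DP4 total gives 4014/4056 ≈ 0.9896, and dividing by DP gives 4014/4088 ≈ 0.9819, the figure quoted in the question. Neither reproduces AF=0.872798, which, per the answer above, is computed with the other bases at the site taken into account, and DP4 does not list those.

```python
from typing import Dict


def parse_info(info: str) -> Dict[str, str]:
    """Split a VCF INFO string such as 'DP=4088;AF=0.87' into a dict; flags map to ''."""
    fields = {}
    for item in info.split(";"):
        key, _, value = item.partition("=")
        fields[key] = value
    return fields


def dp4_allele_fractions(info: str) -> Dict[str, float]:
    """Two naive AF estimates from DP4 (ref-fwd, ref-rev, alt-fwd, alt-rev), plus the reported AF."""
    fields = parse_info(info)
    ref_fw, ref_rv, alt_fw, alt_rv = (int(x) for x in fields["DP4"].split(","))
    alt = alt_fw + alt_rv
    return {
        "alt / (ref + alt)": alt / (ref_fw + ref_rv + alt),
        "alt / DP": alt / int(fields["DP"]),
        "reported AF": float(fields["AF"]),
    }


if __name__ == "__main__":
    print(dp4_allele_fractions("DP=4088;AF=0.872798;SB=171;DP4=9,33,3329,685"))
```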
Hi, I have several variants (especially those with an almost hom-alt allele) where the allele fraction estimated from DP4 differs from the AF= tag. For example: DP=4088;AF=0.872798;SB=171;DP4=9,33,3329,685 Here, from DP4, the AF can be estimated to be about 0.98189, which is very different from what is reported in the AF= tag. Could you please explain?
Hello, strand bias is defined as in samtools: reference and alternate base counts on the forward and reverse strand are used as input for Fisher's exact test. This tries to quantify how far the reference and alternate counts on the forward and reverse strand differ, i.e. you'll get low p-values (high SB values) if you have lots of reference bases on one strand and lots of alternate bases on the other. It does not, however, test whether both reference and alternate bases are mainly on the same strand. I hope this explanation...
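The test described above can be sketched in Python as a Fisher's exact test on the DP4 table [[ref_fw, ref_rv], [alt_fw, alt_rv]]. This assumes a standard two-sided test; the thread does not say whether LoFreq uses a one- or two-sided variant or how it rounds SB to an integer, and the function names are hypothetical. The Phred helper clamps at the smallest normal double, which illustrates why the SB upper bound is set by floating-point precision rather than by a cap at 60.

```python
import math
import sys


def _log_comb(n: int, k: int) -> float:
    """Natural log of the binomial coefficient C(n, k)."""
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)


def _log_table_prob(a: int, b: int, c: int, d: int) -> float:
    """Log hypergeometric probability of the 2x2 table [[a, b], [c, d]] given its margins."""
    return _log_comb(a + b, a) + _log_comb(c + d, c) - _log_comb(a + b + c + d, a + c)


def strand_bias_pvalue(ref_fw: int, ref_rv: int, alt_fw: int, alt_rv: int) -> float:
    """Two-sided Fisher's exact test on the DP4 table [[ref_fw, ref_rv], [alt_fw, alt_rv]]."""
    n_ref = ref_fw + ref_rv
    n_alt = alt_fw + alt_rv
    n_fw = ref_fw + alt_fw
    observed = _log_table_prob(ref_fw, ref_rv, alt_fw, alt_rv)
    p = 0.0
    # Enumerate every table with the same margins; x is its ref-forward count.
    for x in range(max(0, n_fw - n_alt), min(n_ref, n_fw) + 1):
        lp = _log_table_prob(x, n_ref - x, n_fw - x, n_alt - n_fw + x)
        if lp <= observed + 1e-7:  # small tolerance for floating-point ties
            p += math.exp(lp)
    return min(p, 1.0)


def phred(p: float) -> float:
    """Phred-scale a p-value, clamped at the smallest normal double (result at most ~3077)."""
    return -10.0 * math.log10(max(p, sys.float_info.min))
```

For the example from the first post, `phred(strand_bias_pvalue(822, 11273, 11, 107))` compares the forward-strand fraction of reference bases (822 of 12095) with that of alternate bases (11 of 118). Because only the difference between those two fractions is tested, a site where reference and alternate bases are both mostly on the reverse strand yields a high p-value and a low SB, as the answer above points out.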
Hello, we have analyzed some viral genomes where the strand bias has been estimated as zero. In these results, we have noticed that when the alternate-base count on the forward or the reverse strand is zero, SB=0. Is it the case that, whenever one of the alternate strand counts is zero, SB will be zero (implying no bias) even though there clearly is bias when looking at the DP4 data? And should such results perhaps not be considered at all? And then is the last example, where...
Hi Erik, hard to tell from this output. Might be because of strand bias. Could you...
Can someone tell me why a call was not made at this location? lofreq call -f /var/www/hg19.fa...
Hi Jessica, these are SNVs that show significant strand bias (sb) and are therefore...
Hello, I'm sorry but I can't seem to find this information in the manual. Can you...
Hi Erik, when you switch off default filtering in the call subcommand[s] LoFreq will...
I have a dataset that I am processing with the --bed flag set to a list of mutations...
Sorry, I know what's happening: the filters will only affect the actual SNV calling...
Hi gmy, that is indeed a bit strange. Which exact LoFreq version are you using? Would...
Hi Andreas, sorry for the late reply. The corresponding output of lofreq is: gi|57116681|ref|NC_000962.2|...
Hi gmy, I would strongly encourage you to stick to default parameters in LoFreq,...
Hi, I want to filter out bases with quality below 20, and I use a command like this "lofreq...
Yes actually I went through the results back and forth and it seems I do not have...
Hi Chris, LoFreq results are already filtered relatively stringently (1% p-value threshold...
Hi Andreas, I have used your LoFreq on my normal/tumor pair to retrieve indels....
Hey Chris, if you get a final vcf file, then there is no need to rerun LoFreq. Whether...
Hi Andreas, Thank you very much for the reply. I would like to let you know that...
Hey Chris, yes, the file somatic_final_minus-dbsnp.snvs.vcf.gz is not there, because...
Hi Andreas, Thank you for your reply. I would like to say that I had a successful...
Hi Chris, When you call somatic SNVs then you only need to look at the file that...
Also, I noticed some log messages like these: WARNING [2015-01-15...
I am looking at 3 different outputs, 2 of which are SNV files (one relaxed and the other stringent)...
Yes the tool is working now, I modified the bed file of the company to a general...
Oh ok. That looks like an extension of the bed format. LoFreq (and samtools) expect...
The format of the bed file looks like this head -5 S03723314_Covered.bed browser...
I have used the S03723314_Covered.bed file, which you can download from the Agilent...
I am not sure how I can share the bed file; can I host it anywhere? It's almost...
Hi Chris, this looks like an unhandled error triggered in the bed reading function....
Hi Andreas, I am trying to use lofreq for somatic indel calling using the somatic...
Home
Moved website and blog to github: http://csb5.github.io/lofreq/
Release LoFreq 2.1
OK great, thanks!
Hi Jessica, the strand-bias test checks whether the proportion of bases on forward...
Hi, I am running LoFreq on data that has been run through the program SeqPrep, which...
LoFreq as Docker container
Alpha testers for release 2.1 needed
LoFreq-Star-Best-Practices
Performance issues when using bed-file with many regions
Thanks, Andreas. I will try again with the suggested argument. Joon
Hi Joon There are at least two things going on here. One of the errors seems to come...
Hi, While running LoFreq, I've got the following error. /cm/local/apps/sge/var/spool/usnee1-lph001-n062/job_scripts/6960547:...
Hi Brian, thanks for pointing this out! Parsing of the "--cons-as-ref" option was...
It actually seems to have to do with the --cons-as-ref option. Without that, it works....
Hi Brian, this is very likely caused by an error in the argument list, i.e. wrong...
Thanks for posting the release. Version 2.0.0 (Linux) does get past the previous...
LoFreq-Star-Usage
LoFreq-Star-Best-Practices
LoFreq-Star-Installation
Release of final 2.0.0