PINTS is available on PyPI and bioconda, so you can install it with:
pip install pyPINTS
or
conda install bioconda::pypints
Alternatively, you can clone this repo to a local directory, then run the following command in that directory:
python setup.py install
PINTS can call peaks from either bigWig or BAM files. If you have signals for the forward and reverse strands in two separate bigWig files (path_to_pl.bw and path_to_mn.bw), you can use a command like the following to get the peaks:
pints_caller --save-to output_dir \
--file-prefix output_prefix \
--bw-pl path_to_pl.bw \
--bw-mn path_to_mn.bw \
--thread 16
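If you launch PINTS from a script, the same command can be assembled programmatically. The following is a minimal sketch, not part of PINTS itself: build_bigwig_command is a hypothetical helper, it only uses flags shown in this README, and the check that forward and reverse files come in equal numbers is an assumption about how strand files are paired.

```python
import shlex


def build_bigwig_command(bw_pl, bw_mn, save_to, file_prefix, threads=16):
    """Assemble a pints_caller argument list for bigWig input.

    Hypothetical helper: only flags shown in this README are used.
    """
    # Assumption: every forward-strand file has one reverse-strand partner.
    if len(bw_pl) != len(bw_mn):
        raise ValueError("forward and reverse bigWig lists differ in length")
    return [
        "pints_caller",
        "--save-to", save_to,
        "--file-prefix", file_prefix,
        "--bw-pl", *bw_pl,
        "--bw-mn", *bw_mn,
        "--thread", str(threads),
    ]


cmd = build_bigwig_command(
    ["path_to_pl.bw"], ["path_to_mn.bw"], "output_dir", "output_prefix"
)
print(shlex.join(cmd))
# To actually run it (requires PINTS to be installed):
# import subprocess; subprocess.run(cmd, check=True)
```

Passing the command as a list avoids shell quoting problems when paths contain spaces.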
To call peaks from BAM files, you'll need to give PINTS the path to the BAM file and tell it what kind of experiment the data came from.
If it's from a standard protocol, like PROcap, then you can set --exp-type PROcap.
Other supported experiments include GROcap, CoPRO, csRNAseq, NETCAGE, CAGE, RAMPAGE, and STRIPEseq. For a comprehensive list of directly supported assays, please run:
pints_caller --help
If the data was generated by other methods, you need to tell PINTS where it can find the ends of the RNAs you are interested in.
For example, --exp-type R_5 tells the tool that:
- this alignment is from a single-end library;
- the tool should look at the 5' ends of reads.
Other supported values are R_3, R1_5, R1_3, R2_5, and R2_3.
If reads represent the reverse complement of the original RNAs, as in PROseq, then you need to use --reverse-complement (not necessary for standard protocols).
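The naming scheme of these custom values can be decoded mechanically. The sketch below is only an illustration of how to read the codes (describe_exp_type is a hypothetical helper, not a PINTS function): the optional digit after R selects the mate of a paired-end library, and the digit after the underscore selects which end of the read marks the RNA end.

```python
import re

# R, an optional mate number (1 or 2), an underscore, then the read end (5 or 3).
_CUSTOM_EXP_TYPE = re.compile(r"^R([12]?)_([53])$")


def describe_exp_type(exp_type):
    """Explain a custom --exp-type code such as R_5 or R2_3.

    Returns None for anything else, e.g. an assay name like PROcap.
    """
    match = _CUSTOM_EXP_TYPE.match(exp_type)
    if match is None:
        return None
    mate, end = match.groups()
    return {
        "layout": "paired-end" if mate else "single-end",
        "read": f"read{mate}" if mate else "read",
        "end": f"{end}'",
    }


for code in ("R_5", "R1_3", "R2_5", "PROcap"):
    print(code, describe_exp_type(code))
```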
One example for calling peaks from a BAM file:
pints_caller --bam-file input.bam \
--save-to output_dir \
--file-prefix output_prefix \
--thread 16 \
--exp-type PROcap
We have prepared several case studies demonstrating steps from processing the raw fastq files to calling peaks/TREs for your reference.
- prefix+_{SID}_divergent_peaks.bed: Divergent TREs;
- prefix+_{SID}_bidirectional_peaks.bed: Bidirectional TREs (divergent + convergent);
- prefix+_{SID}_unidirectional_peaks.bed: Unidirectional TREs, maybe lncRNAs transcribed from enhancers (e-lncRNAs) as suggested here.
{SID} will be replaced with the number (index) of the sample that the peaks are called from. If you provide PINTS with only one sample, {SID} will be replaced with 1. If you use PINTS with three replicates (--bam-file A.bam B.bam C.bam), then {SID} for peaks identified from A.bam will be replaced with 1.
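As a quick illustration of this naming scheme, the sketch below lists the file names to expect for a given prefix and sample number. expected_outputs is a hypothetical helper written for this README, and it assumes the three file-name patterns listed above are the complete set of interest.

```python
def expected_outputs(prefix, sid=1):
    """Return the three peak file names for one prefix and sample number."""
    kinds = ("divergent", "bidirectional", "unidirectional")
    return [f"{prefix}_{sid}_{kind}_peaks.bed" for kind in kinds]


for name in expected_outputs("output_prefix"):
    print(name)
```

For example, a run with --file-prefix DMSO and a single sample gives DMSO_1_divergent_peaks.bed, DMSO_1_bidirectional_peaks.bed, and DMSO_1_unidirectional_peaks.bed.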
For divergent or bidirectional TREs, there will be 6 columns in the outputs:
- Chromosome
- Start site: 0-based
- End site: 0-based
- Confidence about the peak pair. Can be:
  - Stringent(qval), which means the two peaks on both forward and reverse strands are significant based on their q-values;
  - Stringent(pval), which means one peak is significant according to q-value while the other one is significant according to p-value;
  - Relaxed, which means only one peak is significant in the pair.
  - A combination of the three types above, because of overlap for nearby elements.
  - If epigenomic annotation is enabled by --epig-annotation <biosample>, then peaks that are less significant (--relaxed-fdr-target, default is 2*fdr_target), but overlap with epigenomic annotations from the PINTS web server, will be listed with the confidence level Marginal.
- Major TSSs on the forward strand; if there are multiple major TSSs, they will be separated by commas (,)
- Major TSSs on the reverse strand; if there are multiple major TSSs, they will be separated by commas (,)
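To work with these files in a script, the six columns can be read into a small record. This is a minimal sketch under stated assumptions: the files are tab-separated (the usual BED convention), the TSS fields hold plain integers, and any columns beyond the sixth are kept untouched. PairedTRE and parse_paired_tre are hypothetical names, and the example line is made up for illustration, not real PINTS output.

```python
from typing import List, NamedTuple


class PairedTRE(NamedTuple):
    chrom: str
    start: int
    end: int
    confidence: str
    forward_tss: List[int]
    reverse_tss: List[int]
    extra: List[str]  # any further columns, kept as raw strings


def _tss_list(field: str) -> List[int]:
    # Major TSSs are comma-separated; skip entries that are not plain integers.
    return [int(x) for x in field.split(",") if x.strip().isdigit()]


def parse_paired_tre(line: str) -> PairedTRE:
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 6:
        raise ValueError(f"expected at least 6 columns, got {len(fields)}")
    return PairedTRE(
        fields[0], int(fields[1]), int(fields[2]), fields[3],
        _tss_list(fields[4]), _tss_list(fields[5]), fields[6:],
    )


# Made-up example line (assumption: tab-separated fields).
example = "chr1\t778554\t778929\tStringent(qval)\t778700\t778650,778660"
tre = parse_paired_tre(example)
print(tre.chrom, tre.end - tre.start, tre.forward_tss, tre.reverse_tss)
```

The confidence field is kept as a raw string because the separator used for combined labels is not specified here.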
For unidirectional TREs, there will be 9 columns in the output:
- Chromosome
- Start
- End
- Peak ID
- Q-value
- Strand
- Read counts
- Position of the summit TSS
- Height of the summit
For all three types of TREs, if a valid biosample name for --epig-annotation is provided, then an additional column with the epigenomic annotation for each TRE will show up in the final output.
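The nine-column unidirectional output can be read in a similar way. The sketch below assumes tab-separated fields and numeric q-value, read count, summit position, and summit height columns; a trailing annotation column, if present, is kept as a raw string. UnidirectionalTRE and parse_unidirectional_tre are hypothetical names, and the example line is made up for illustration.

```python
from typing import List, NamedTuple


class UnidirectionalTRE(NamedTuple):
    chrom: str
    start: int
    end: int
    peak_id: str
    q_value: float
    strand: str
    read_counts: float
    summit_position: int
    summit_height: float
    extra: List[str]  # e.g. the optional epigenomic annotation column


def parse_unidirectional_tre(line: str) -> UnidirectionalTRE:
    f = line.rstrip("\n").split("\t")
    if len(f) < 9:
        raise ValueError(f"expected at least 9 columns, got {len(f)}")
    return UnidirectionalTRE(
        f[0], int(f[1]), int(f[2]), f[3], float(f[4]), f[5],
        float(f[6]), int(f[7]), float(f[8]), f[9:],
    )


# Made-up example line (assumption: tab-separated fields).
example_line = "chr1\t10609\t10620\tpeak_1\t0.01\t+\t17\t10612\t9"
peak = parse_unidirectional_tre(example_line)
print(peak.peak_id, peak.strand, peak.q_value)
```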
- If you want to use BAM files as inputs:
  - --bam-file: input BAM file(s);
  - --exp-type: Type of experiment. If the experiment is not listed as a choice, or you know the position of RNA ends on the reads and you want to override the defaults, you can specify:
    - R_5 (5' of the read for single-end lib),
    - R_3 (3' of the read for single-end lib),
    - R1_5 (5' of the read1 for paired-end lib),
    - R1_3 (3' of the read1 for paired-end lib),
    - R2_5 (5' of the read2 for paired-end lib),
    - or R2_3 (3' of the read2 for paired-end lib)
  - --reverse-complement: Set this switch if 1) exp-type is Rx_x and 2) reads in this library represent the reverse complement of RNAs, like PROseq;
  - --ct-bam: BAM file for input/control (optional);
- If you want to use bigwig files as inputs:
  - --bw-pl: Bigwig for signals on the forward strand;
  - --bw-mn: Bigwig for signals on the reverse strand;
  - --ct-bw-pl: Bigwig for input/control signals on the forward strand (optional);
  - --ct-bw-mn: Bigwig for input/control signals on the reverse strand (optional);
- --save-to: save peaks to this path (a folder); by default, the current folder
- --file-prefix: prefix to all outputs
- --dont-merge-reps: Starting with PINTS 1.2.x, the software automatically merges multiple replicates for a joint peak calling process. To call peaks individually for each sample, as in previous versions, use this option.
- --epig-annotation <biosample>: Use this option together with the name of the biosample that the library was derived from, for example K562; then epigenomic annotations will be downloaded from the PINTS web server and used for annotating and augmenting TREs identified by PINTS (for hg38 only);
- --relaxed-fdr-target <relaxed fdr>: In the presence of --epig-annotation, peaks that do not pass the original FDR cutoff but pass this relaxed cutoff and have support from DNase-seq and H3K27ac ChIP-seq will also be included in final outputs. By default, 2*fdr;
- --mapq-threshold <min mapq>: Minimum mapping quality, by default: 30 or None;
- --close-threshold <close distance>: Distance threshold for two peaks (on opposite strands) to be merged, by default: 300;
- --fdr-target <fdr>: FDR target for multiple testing, by default: 0.1;
- --chromosome-start-with <chromosome prefix>: Only keep reads mapped to chromosomes with this prefix. By default, all reads will be analyzed;
- --thread <n thread>: Max number of threads the tool can create;
- --borrow-info-reps: Borrow information from reps to refine calling of divergent elements;
- --sensitive: Call peaks in a more sensitive mode (LRT+FC).
More parameters can be seen by running pints_caller -h.
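The interplay between --fdr-target, --relaxed-fdr-target, and --epig-annotation can be summarized as a small decision rule. The sketch below is a simplified illustration for a single peak, not PINTS's actual implementation: classify_peak is a hypothetical function, the use of strict less-than comparisons is an assumption, and the labels other than Marginal are placeholders chosen for this example.

```python
def classify_peak(q_value, has_epig_support, fdr_target=0.1,
                  relaxed_fdr_target=None):
    """Illustrate how the relaxed FDR cutoff rescues annotated peaks."""
    # Per the parameter description, the relaxed cutoff defaults to 2*fdr.
    if relaxed_fdr_target is None:
        relaxed_fdr_target = 2 * fdr_target
    if q_value < fdr_target:
        return "significant"  # placeholder label
    # Assumption: "support" means overlap with the epigenomic annotations.
    if has_epig_support and q_value < relaxed_fdr_target:
        return "Marginal"
    return "not reported"  # placeholder label


for q, support in [(0.05, False), (0.15, True), (0.15, False), (0.25, True)]:
    print(q, support, classify_peak(q, support))
```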
In this section, we try to identify differentially expressed TREs (promoters and enhancers) from two conditions.
First, call peaks for each condition with pints_caller:
# control samples
pints_caller --bw-pl DMSO_r1_pl.bw DMSO_r2_pl.bw \
--bw-mn DMSO_r1_mn.bw DMSO_r2_mn.bw \
--thread 16 --file-prefix DMSO
# and treatment samples
pints_caller --bw-pl E2_r1_pl.bw E2_r2_pl.bw \
--bw-mn E2_r1_mn.bw E2_r2_mn.bw \
--thread 16 --file-prefix E2
Second, build the counts table with pints_counter:
pints_counter -b DMSO_1_bidirectional_peaks.bed E2_1_bidirectional_peaks.bed \
-u DMSO_1_unidirectional_peaks.bed E2_1_unidirectional_peaks.bed \
-p DMSO_r1_pl.bw DMSO_r2_pl.bw E2_r1_pl.bw E2_r2_pl.bw \
-m DMSO_r1_mn.bw DMSO_r2_mn.bw E2_r1_mn.bw E2_r2_mn.bw \
-c DMSO DMSO E2 E2 \
-r 1 2 1 2 \
-s counts.csv
The counts table looks like the following:
,DMSO_1,DMSO_2,E2_1,E2_2
chr1:10609-10620,17,22,44,43
chr1:629905-629938,169,13,224,82
chr1:633956-634096,218,12,271,102
chr1:778554-778929,1180,195,1327,495
chr1:779719-779721,0,0,12,2
chr1:779846-780119,6,0,30,6
chr1:827199-827316,48,22,101,46
chr1:827326-827736,634,88,752,318
chr1:827742-827773,19,0,32,8
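Before handing the table to a differential expression tool, it can be useful to sanity-check it. The sketch below uses only the Python standard library and the first three rows shown above; read_counts is a hypothetical helper, and it assumes row names follow the chrom:start-end pattern seen in this table.

```python
import csv
import io

# First three rows of the counts table shown above.
counts_csv = """,DMSO_1,DMSO_2,E2_1,E2_2
chr1:10609-10620,17,22,44,43
chr1:629905-629938,169,13,224,82
chr1:633956-634096,218,12,271,102
"""


def read_counts(handle):
    """Return (sample names, {(chrom, start, end): {sample: count}})."""
    reader = csv.reader(handle)
    samples = next(reader)[1:]  # the first header cell is empty
    table = {}
    for row in reader:
        if not row:
            continue
        chrom, span = row[0].rsplit(":", 1)
        start, end = (int(x) for x in span.split("-"))
        table[(chrom, start, end)] = dict(zip(samples, map(int, row[1:])))
    return samples, table


samples, table = read_counts(io.StringIO(counts_csv))
# Per-sample totals give a rough view of library size differences.
totals = {s: sum(row[s] for row in table.values()) for s in samples}
print(totals)
```

For a real run, replace io.StringIO(counts_csv) with an open handle to counts.csv.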
Third, feed the counts table to DESeq2/edgeR for differential expression analysis.
- pints_visualizer: Generate bigwig files for the inputs.
- pints_counter: Generate count matrix for downstream usages (e.g. differential expression analysis).
- pints_normalizer: Normalize inputs.
- pints_boundary_extender: Extend peaks from summits.
You can use tool_name --help to see parameters for each tool.
- Citation: If you use PINTS in your work, please cite: https://www.nature.com/articles/s41587-022-01211-7.
- Support: Please submit an issue with any questions or if you experience any issues/bugs.