Commit 9e9b969

Added Dockerfile, updated README.md with instructions to use docker
1 parent: 00337aa

2 files changed (+80, -11 lines)

Dockerfile

Lines changed: 23 additions & 0 deletions
```diff
@@ -0,0 +1,23 @@
+FROM ubuntu:18.04
+
+SHELL ["/bin/bash", "-c"]
+
+RUN mkdir /home/CRISPRcasIdentifier
+WORKDIR /home/CRISPRcasIdentifier
+COPY *.py ./
+COPY crispr-env.yml ./
+COPY README.md ./
+COPY HMM_sets.tar.gz ./
+COPY trained_models_2015.tar.gz ./
+ADD examples ./examples
+ADD software ./software
+
+RUN apt-get update
+RUN apt-get install -y wget && rm -rf /var/lib/apt/lists/*
+RUN wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
+RUN bash Miniconda3-latest-Linux-x86_64.sh -b
+RUN rm Miniconda3-latest-Linux-x86_64.sh
+ENV PATH /root/miniconda3/bin:$PATH
+
+RUN conda env create -f crispr-env.yml -n crispr-env
+RUN echo "source ~/miniconda3/etc/profile.d/conda.sh && conda activate crispr-env" >> ~/.bashrc
```
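The `COPY` and `ADD` instructions above take `HMM_sets.tar.gz`, `trained_models_2015.tar.gz`, the Python sources, `crispr-env.yml`, `README.md`, `examples` and `software` from the build context, so `docker build` stops with an error if, for instance, the Google Drive archives were not saved inside the CRISPRcasIdentifier folder. A minimal pre-build check could look like the sketch below; `check_build_context` is a hypothetical helper, not part of the repository, and it assumes it is run against the cloned folder.

```shell
#!/usr/bin/env bash
# Hypothetical pre-build check (not part of the CRISPRcasIdentifier repository).
# It verifies that everything the Dockerfile copies from the build context is
# present, so a missing archive is reported before `docker build` runs.

check_build_context() {
  local dir="${1:-.}"
  local missing=0
  local f d p
  local py_found=0

  # Files copied by the COPY instructions.
  for f in crispr-env.yml README.md HMM_sets.tar.gz trained_models_2015.tar.gz; do
    if [ ! -f "$dir/$f" ]; then
      echo "missing file: $f" >&2
      missing=1
    fi
  done

  # Directories added by the ADD instructions.
  for d in examples software; do
    if [ ! -d "$dir/$d" ]; then
      echo "missing directory: $d" >&2
      missing=1
    fi
  done

  # `COPY *.py ./` expects at least one Python file in the context.
  for p in "$dir"/*.py; do
    if [ -e "$p" ]; then
      py_found=1
      break
    fi
  done
  if [ "$py_found" -eq 0 ]; then
    echo "missing: *.py files" >&2
    missing=1
  fi

  # Exit status 0 means the context looks complete, 1 means something is missing.
  return "$missing"
}
```

For example, from inside the cloned folder: `check_build_context . && docker build -t crispr-cas-identifier .` only starts the build when all inputs are in place.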

README.md

Lines changed: 57 additions & 11 deletions
````diff
@@ -1,14 +1,64 @@
-## CRISPRcasIdentifier
+# CRISPRcasIdentifier
 
-CRISPRcasIdentifier is an effective machine learning approach for the identification and classification of CRISPR-Cas proteins. It consists of a holistic strategy which allows us to: (i) combine regression and classification approaches for improving the quality of the input protein cassettes and predicting their subtypes with high accuracy; (ii) to detect signature genes for the different subtypes; (iii) to extract several types of information for each protein, such as potential rules that reveal the identity of neighboring genes; and (iv) define a complete and extensible framework able to integrate newly discovered Cas proteins and CRISPR subtypes. We achieve balanced accuracy scores above 0.95 in the classification experiment of CRISPR subtypes, mean absolute error values below 0.05 for the prediction of the normalized bit-score of different Cas proteins and a balanced accuracy of 0.88 in our benchmarking against other available tools.
+CRISPRcasIdentifier is an effective machine learning approach for the identification and classification of CRISPR-Cas proteins. It consists of a holistic strategy which allows us to: (i) combine regression and classification approaches for improving the quality of the input protein cassettes and predicting their subtypes with high accuracy; (ii) to detect signature genes for the different subtypes; (iii) to extract several types of information for each protein, such as potential rules that reveal the identity of neighboring genes; and (iv) define a complete and extensible framework able to integrate newly discovered Cas proteins and CRISPR subtypes. We achieve balanced accuracy scores above 0.95 in the classification experiment of CRISPR subtypes, mean absolute error values below 0.05 for the prediction of the normalized bit-score of different Cas proteins and a balanced accuracy of 0.89 in our benchmarking against other available tools.
 
-### Requirements
+## Requirements
 
-CRISPRcasIdentifier has been tested with Python 3.7.6. To run it, we recommend installing the same library versions we used. Since we exported our classifiers using [joblib.dump](https://scikit-learn.org/stable/modules/model_persistence.html), it is not guaranteed that they will work properly if loaded using other Python and/or libraries versions. For such, we recommend the use of conda virtual environments, which make it easy to install the correct Python and library dependencies without affecting the whole operating system (see below).
+CRISPRcasIdentifier has been tested with Python 3.7.6. To run it, we recommend installing the same library versions we used. Since we exported our classifiers using [joblib.dump](https://scikit-learn.org/stable/modules/model_persistence.html), it is not guaranteed that they will work properly if loaded using other Python and/or libraries versions. For such, we recommend the use of our docker image or conda virtual environments, which make it easy to install the correct Python and library dependencies without affecting the whole operating system (see below).
 
-### Setting up a virtual environment
+### First step: clone this repository
 
-The easiest way to install the correct python version and its dependencies to run CRISPRcasIdentifier is by using [miniconda](https://docs.conda.io/en/latest/miniconda.html).
+```
+git clone https://github.com/BackofenLab/CRISPRcasIdentifier.git
+```
+
+### Second step: download the Hidden Markov (HMM) and Machine Learning (ML) models
+
+Due to GitHub's file size constraints, we made our HMM and ML models available in Google Drive. You can download them [here](https://drive.google.com/file/d/166bh1sAjoB9kW5pn8YrEuEWrsM2QDV78/view?usp=sharing) and [here](https://drive.google.com/file/d/1ZOR1e-wIb_rxtCiU3OaBVdrHrup1svq3/view?usp=sharing). Save both tar.gz files inside CRISPRcasIdentifier's folder. It is not necessary to extract them, since the tool will do that the first time it is run.
+
+Next, you can choose which third step to follow: either using a docker container or using conda.
+
+### Third step (docker)
+
+The easiest way to run CRISPRcasIdentifier is by using docker (please refer to its [installation guideline](https://docs.docker.com/get-docker/) for details).
+
+After installing docker, build an image from the Dockerfile
+
+```
+cd CRISPRcasIdentifier
+docker build -t crispr-cas-identifier .
+```
+
+Run the docker image in a new container
+
+```
+docker run -it crispr-cas-identifier:latest /bin/bash
+```
+
+To avoid creating multiple containers everytime you want to use CRISPRcasIdentifier, you can reuse the created container by using the following commands
+
+```
+docker restart CONTAINER_ID
+docker exec -it CONTAINER_ID /bin/bash
+```
+
+You can obtain the CONTAINER_ID by using
+
+```
+docker ps --all
+```
+
+You can also copy a local fasta input file to CRISPRcasIdentifier's container by using
+
+```
+docker cp file.fa CONTAINER_ID:/home/CRISPRcasIdentifier
+```
+
+After this, everything should be set up and you can skip to the "How to use" section.
+
+### Third step (conda)
+
+Another way to install the correct python version and its dependencies to run CRISPRcasIdentifier is by using [miniconda](https://docs.conda.io/en/latest/miniconda.html).
 
 Install Miniconda
 
@@ -25,11 +75,7 @@ conda env create -f crispr-env.yml -n crispr-env
 conda activate crispr-env
 ```
 
-### Downloading the Hidden Markov (HMM) and Machine Learning (ML) models
-
-Due to GitHub's file size constraints, we made our HMM and ML models available in Google Drive. You can download them [here](https://drive.google.com/file/d/166bh1sAjoB9kW5pn8YrEuEWrsM2QDV78/view?usp=sharing) and [here](https://drive.google.com/file/d/1ZOR1e-wIb_rxtCiU3OaBVdrHrup1svq3/view?usp=sharing). Save both tar.gz files inside CRISPRcasIdentifier's folder. It is not necessary to extract them, since CRISPRcasIdentifier will do that the first time it is run.
-
-### How to use
+## How to use
 
 To list the available command line arguments type
 
````
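The container-reuse steps in the README (look up CONTAINER_ID with `docker ps --all`, then `docker restart` and `docker exec`) can be wrapped in a small script. The sketch below is hypothetical and not part of the repository: it assumes the image was tagged `crispr-cas-identifier:latest` as in the README, that the newest container created from that image is the one you want to reuse, and it relies on the standard `docker ps` options `--filter ancestor=...` and `--format '{{.ID}}'`.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the CRISPRcasIdentifier repository):
# reuse the newest container created from the image instead of starting a
# new one with `docker run` every time.

IMAGE="crispr-cas-identifier:latest"

# Print the ID of the newest container (running or stopped) created from $1.
# `docker ps` lists containers newest first, so the first line is the latest.
latest_container_id() {
  docker ps --all --filter "ancestor=$1" --format '{{.ID}}' | head -n 1
}

# Restart that container and open an interactive shell in it,
# mirroring the README's `docker restart` + `docker exec` commands.
reuse_container() {
  local id
  id=$(latest_container_id "$IMAGE")
  if [ -z "$id" ]; then
    echo "no container found for $IMAGE; create one with: docker run -it $IMAGE /bin/bash" >&2
    return 1
  fi
  docker restart "$id" >/dev/null
  docker exec -it "$id" /bin/bash
}
```

After sourcing the script, `reuse_container` replaces the two manual commands, and `docker cp file.fa "$(latest_container_id crispr-cas-identifier:latest)":/home/CRISPRcasIdentifier` fills in CONTAINER_ID for the copy step.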
