README.md
Lines changed: 57 additions & 11 deletions
@@ -1,14 +1,64 @@
1
-
##CRISPRcasIdentifier
1
+
# CRISPRcasIdentifier
2
2
3
-
CRISPRcasIdentifier is an effective machine learning approach for the identification and classification of CRISPR-Cas proteins. It consists of a holistic strategy which allows us to: (i) combine regression and classification approaches for improving the quality of the input protein cassettes and predicting their subtypes with high accuracy; (ii) to detect signature genes for the different subtypes; (iii) to extract several types of information for each protein, such as potential rules that reveal the identity of neighboring genes; and (iv) define a complete and extensible framework able to integrate newly discovered Cas proteins and CRISPR subtypes. We achieve balanced accuracy scores above 0.95 in the classification experiment of CRISPR subtypes, mean absolute error values below 0.05 for the prediction of the normalized bit-score of different Cas proteins and a balanced accuracy of 0.88 in our benchmarking against other available tools.
3
+
CRISPRcasIdentifier is an effective machine learning approach for the identification and classification of CRISPR-Cas proteins. It consists of a holistic strategy which allows us to: (i) combine regression and classification approaches for improving the quality of the input protein cassettes and predicting their subtypes with high accuracy; (ii) detect signature genes for the different subtypes; (iii) extract several types of information for each protein, such as potential rules that reveal the identity of neighboring genes; and (iv) define a complete and extensible framework able to integrate newly discovered Cas proteins and CRISPR subtypes. We achieve balanced accuracy scores above 0.95 in the classification experiment of CRISPR subtypes, mean absolute error values below 0.05 for the prediction of the normalized bit-score of different Cas proteins, and a balanced accuracy of 0.89 in our benchmarking against other available tools.
4
4
5
-
###Requirements
5
+
## Requirements
6
6
7
-
CRISPRcasIdentifier has been tested with Python 3.7.6. To run it, we recommend installing the same library versions we used. Since we exported our classifiers using [joblib.dump](https://scikit-learn.org/stable/modules/model_persistence.html), it is not guaranteed that they will work properly if loaded using other Python and/or libraries versions. For such, we recommend the use of conda virtual environments, which make it easy to install the correct Python and library dependencies without affecting the whole operating system (see below).
7
+
CRISPRcasIdentifier has been tested with Python 3.7.6. To run it, we recommend installing the same library versions we used. Since we exported our classifiers using [joblib.dump](https://scikit-learn.org/stable/modules/model_persistence.html), it is not guaranteed that they will work properly if loaded using other Python and/or library versions. For this reason, we recommend using our Docker image or conda virtual environments, which make it easy to install the correct Python and library dependencies without affecting the whole operating system (see below).
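As a minimal sketch of the version caveat above, the following Python snippet compares the running interpreter against the tested version (3.7.6) and reports a mismatch. The function names `python_matches_tested` and `version_message` are hypothetical and are not part of CRISPRcasIdentifier. Whether the tool performs any such check itself is not stated here.

```python
import sys
import warnings

# Python version the README reports as tested.
TESTED_PYTHON = (3, 7, 6)


def python_matches_tested(current=None, tested=TESTED_PYTHON):
    """Return True when the (major, minor, micro) of `current` equals `tested`."""
    if current is None:
        current = sys.version_info
    return tuple(current[:3]) == tuple(tested)


def version_message(current=None, tested=TESTED_PYTHON):
    """Build a human-readable note about the interpreter version."""
    if current is None:
        current = sys.version_info
    found = ".".join(str(part) for part in tuple(current[:3]))
    wanted = ".".join(str(part) for part in tested)
    if python_matches_tested(current, tested):
        return "Python {} matches the tested version.".format(found)
    return ("Python {} differs from the tested version {}; "
            "models saved with joblib.dump may not load correctly."
            ).format(found, wanted)


if __name__ == "__main__":
    # Warn instead of failing: a different version may still work.
    if python_matches_tested():
        print(version_message())
    else:
        warnings.warn(version_message())
```

Running this inside the Docker container or conda environment described below is one way to confirm that the interpreter is the expected one before loading the models.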
8
8
9
-
### Setting up a virtual environment
9
+
### First step: clone this repository
10
10
11
-
The easiest way to install the correct python version and its dependencies to run CRISPRcasIdentifier is by using [miniconda](https://docs.conda.io/en/latest/miniconda.html).
### Second step: download the Hidden Markov (HMM) and Machine Learning (ML) models
16
+
17
+
Due to GitHub's file size constraints, we made our HMM and ML models available in Google Drive. You can download them [here](https://drive.google.com/file/d/166bh1sAjoB9kW5pn8YrEuEWrsM2QDV78/view?usp=sharing) and [here](https://drive.google.com/file/d/1ZOR1e-wIb_rxtCiU3OaBVdrHrup1svq3/view?usp=sharing). Save both tar.gz files inside CRISPRcasIdentifier's folder. It is not necessary to extract them, since the tool will do that the first time it is run.
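Before the first run, it may help to confirm that both downloads arrived intact. The sketch below is an assumption-laden illustration, not part of the tool: it lists the `.tar.gz` files in a folder and checks that each one opens as a gzip-compressed tar archive. The helper names `list_model_archives` and `is_readable_archive` are hypothetical, and the archive file names are deliberately not hard-coded because they are not given here.

```python
import tarfile
import zlib
from pathlib import Path


def list_model_archives(folder):
    """Return the .tar.gz files found directly inside `folder`, sorted by name."""
    return sorted(Path(folder).glob("*.tar.gz"))


def is_readable_archive(path):
    """Return True when `path` opens as a gzip-compressed tar archive."""
    try:
        with tarfile.open(str(path), "r:gz") as archive:
            # Reading the member list forces a pass over the whole archive.
            archive.getmembers()
        return True
    except (tarfile.TarError, OSError, EOFError, zlib.error):
        return False


if __name__ == "__main__":
    # Assumes the script is run from inside CRISPRcasIdentifier's folder.
    for archive_path in list_model_archives("."):
        status = "ok" if is_readable_archive(archive_path) else "unreadable"
        print("{}: {}".format(archive_path.name, status))
```

An archive reported as unreadable most likely indicates an interrupted download, in which case downloading the file again is the simplest fix.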
18
+
19
+
Next, you can choose which third step to follow: either using a docker container or using conda.
20
+
21
+
### Third step (docker)
22
+
23
+
The easiest way to run CRISPRcasIdentifier is by using Docker (please refer to its [installation guide](https://docs.docker.com/get-docker/) for details).
24
+
25
+
After installing Docker, build an image from the Dockerfile:
26
+
27
+
```
28
+
cd CRISPRcasIdentifier
29
+
docker build -t crispr-cas-identifier .
30
+
```
31
+
32
+
Run the Docker image in a new container:
33
+
34
+
```
35
+
docker run -it crispr-cas-identifier:latest /bin/bash
36
+
```
37
+
38
+
To avoid creating a new container every time you want to use CRISPRcasIdentifier, you can reuse the existing container with the following commands:
39
+
40
+
```
41
+
docker restart CONTAINER_ID
42
+
docker exec -it CONTAINER_ID /bin/bash
43
+
```
44
+
45
+
You can obtain the CONTAINER_ID by running:
46
+
47
+
```
48
+
docker ps --all
49
+
```
50
+
51
+
You can also copy a local FASTA input file to CRISPRcasIdentifier's container by using
After this, everything should be set up and you can skip to the "How to use" section.
58
+
59
+
### Third step (conda)
60
+
61
+
Another way to install the correct Python version and its dependencies to run CRISPRcasIdentifier is by using [miniconda](https://docs.conda.io/en/latest/miniconda.html).
### Downloading the Hidden Markov (HMM) and Machine Learning (ML) models
29
-
30
-
Due to GitHub's file size constraints, we made our HMM and ML models available in Google Drive. You can download them [here](https://drive.google.com/file/d/166bh1sAjoB9kW5pn8YrEuEWrsM2QDV78/view?usp=sharing) and [here](https://drive.google.com/file/d/1ZOR1e-wIb_rxtCiU3OaBVdrHrup1svq3/view?usp=sharing). Save both tar.gz files inside CRISPRcasIdentifier's folder. It is not necessary to extract them, since CRISPRcasIdentifier will do that the first time it is run.