Protein-Protein docking using a Memetic Algorithm: EvoDOCK

This repository contains the code used in the article:

A memetic algorithm enables global all-atom protein-protein docking with sidechain flexibility

Dependencies

PyRosetta==4
numpy>=1.21.0
pandas>=1.3.4
scipy>=1.7.1
seaborn>=0.11.2
setuptools>=44.0.0
imageio>=2.10.1
matplotlib>=3.4.3

Installation

This package requires Python 3.4 or later. To install it, follow the steps below:

Install the dependencies listed above.
Download and install PyRosetta following the instructions found at http://www.pyrosetta.org/dow
Install the package itself:

git clone https://github.com/Andre-lab/evodock.git
cd evodock
pip install -r requirements.txt

git clone https://github.com/Andre-lab/evodock.git
cd evodock
python setup.py install

pip install git+https://github.com/Andre-lab/evodock.git

The setup.py and environment.yml files are provided for alternative installation with pip or conda.

Basic Usage

Preprocess the complex PDB file with prepacking

python ./scripts/prepacking.py <input_pdb>

Create a configuration file following the example found at sample_dock.ini

[Docking]
# selects docking protocol [Global, Local]
type=Global

[Inputs]
# complex pdb
pose_input=/inputs/input_pdb/1ACB/1ACB_c_u_0001.pdb
native_input=/inputs/native_pdb/1ACB/1ACB_c_b.pdb

[Outputs]
# output file log
output_path=sample_dock/
output_pdb=True

[DE]
# evolution algorithm parent strategy [RANDOM, BEST] 
scheme=BEST
# population size
popsize=10
# mutation rate (weight factor F) 
mutate=0.9
# crossover probability (CR) 
recombination=0.3
# maximum number of generations/iterations (stopping criteria)
maxiter=10
# hybrid local search strategy [None, only_slide, mcm_rosetta]
local_search=mcm_rosetta

Information about the DE parameters can be found at https://en.wikipedia.org/wiki/Differential_evolution

Run the algorithm with the desired configuration

python evodock.py configs/sample_dock_global.ini

python -m evodock configs/sample_dock_global.ini

Configuration Details

The files configs/sample_dock_global.ini, configs/sample_dock_flexbb.ini and configs/sample_dock_refinement.ini contain configuration examples for Global Docking, Flexible Backbone Docking and Global Docking with an initial population.

Section [Inputs]

In pose_input, provide the path to a complex with two chains that has already been preprocessed with a prepack protocol in order to fix possible sidechain collisions. A script for this is provided in the scripts folder.

Section [Outputs]

output_path sets the output folder for the results, which are written in .csv format. output_pdb is a boolean that controls whether PDB files are dumped during the evolution and for the final evolved protein.

Section [Docking]

The "type" option selects between global docking (Global), local docking (Local), flexible backbone docking (Flexbb) and docking from a starting population, such as models from ClusPro (Refinement).

Section [DE]

For a production run, the Differential Evolution ([DE]) parameters that you must change are popsize (from 10 to 100) and maxiter (from 10 to 100), which gives an evolution of 100 individuals over 100 iterations/generations.

The evolutionary parameters (mutation F and crossover CR) can be fine-tuned for specific purposes, although the set used in the sample configuration (F = 0.9 and CR = 0.3) showed a good balance between exploration and exploitation in our benchmark runs and led to good results. "scheme" corresponds to the selection strategy for the base vector in the mutation operation (see https://en.wikipedia.org/wiki/Differential_evolution for more details).

The "local_search" parameter can be set to None (only DE is performed), only_slide (the local search operation is equivalent to applying slide_into_contact) or mcm_rosetta (which applies slide_into_contact + MC energy minimization and sidechain optimization; this is the recommended option and the one used in our benchmarks).
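As an illustration of the production settings described above, the following sketch derives a production configuration from the sample values using Python's standard configparser module. The helper name make_production_config is hypothetical and not part of EvoDOCK; editing the .ini file by hand is equivalent.

```python
import configparser
import io

# Abbreviated copy of the sample configuration shown in Basic Usage.
SAMPLE_INI = """
[Docking]
type=Global

[Inputs]
pose_input=/inputs/input_pdb/1ACB/1ACB_c_u_0001.pdb
native_input=/inputs/native_pdb/1ACB/1ACB_c_b.pdb

[Outputs]
output_path=sample_dock/
output_pdb=True

[DE]
scheme=BEST
popsize=10
mutate=0.9
recombination=0.3
maxiter=10
local_search=mcm_rosetta
"""


def make_production_config(ini_text, popsize=100, maxiter=100):
    """Hypothetical helper: raise popsize and maxiter, keep everything else."""
    config = configparser.ConfigParser()
    config.read_string(ini_text)
    config["DE"]["popsize"] = str(popsize)
    config["DE"]["maxiter"] = str(maxiter)
    return config


production = make_production_config(SAMPLE_INI)

# Serialise in the same key=value style as the sample file.
buffer = io.StringIO()
production.write(buffer, space_around_delimiters=False)
print(buffer.getvalue())
```

The printed text can be saved as a new .ini file and passed to evodock.py in place of the sample configuration.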

Section [Flexbb] (optional for Docking type "Flexbb")

Uses path_ligands and path_receptors to indicate the paths to the *.pdb files of the different backbone ensembles.

Section [Refine] (optional for Docking type "Refine")

Uses init_pdbs to indicate the path to the *.pdb files used as the initial population, e.g. models from ClusPro.

Interpret output:

A run produces the following log files:

evolution*csv is a summary of the evolutionary process. It reports the generation number, the average energy of the population, the lowest energy of the population and the RMSD of the individual with the lowest energy.
popul*csv is the status of each generation during the evolution. Each line corresponds to the population information of one generation.
interface*csv is similar to popul*csv, but it reports the interface energy value and the iRMSD of each individual at each generation.
trials*csv is the equivalent of popul*csv, but it reports the trials (candidates) generated during each generation. This is particularly useful if you want to check whether DE+MC is creating proper candidates that can contribute to the evolution.
time*csv is the computational time (in seconds) for each generation.
best*csv contains, on each line, the rotation (first 3 values) and translation (next 3 values) of the individual with the lowest energy value.

Getting images

Get scatter plot

python ./scripts/make_scatter_plot.py "<path_to_popul*.csv>"

It creates the global energy vs RMSD plot if the input is a popul*csv file, or the interface energy vs iRMSD plot if the input is an interface*csv file. Each point corresponds to an individual in the last generation. Several *csv files can be specified in order to collect the results from different independent runs, where each color corresponds to a run.

Get evolution performance

For each evolution*csv

python ./scripts/make_evolution_plot.py <path to evolution*.csv>

Creates a line plot where the y-axis corresponds to the global energy function (used as the fitness function during the evolution) and the x-axis corresponds to the generation.

The green line corresponds to the average energy value of the population, while the red line corresponds to the lowest energy value of the population. A proper evolution should maintain a close distance between both lines, and the average line should follow the trend of the lowest energy line. That indicates that the population is evolving towards the best energy individual. If there is a large difference between both lines, the F and CR parameters should be tuned, for example by decreasing the value of F to reduce the exploration of the algorithm.
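The same check can be done numerically. The sketch below computes the gap between the average and the lowest energy for each generation. The column names (gen, avg, best) and the values are hypothetical and only illustrate the idea; adapt them to the actual layout of your evolution*csv file.

```python
import csv
import io

# Hypothetical evolution log: column names and values are illustrative only.
EXAMPLE_LOG = """gen,avg,best
0,120.0,35.0
1,80.0,20.0
2,30.0,12.0
3,14.0,10.0
"""


def energy_gaps(log_text):
    """Return (average energy - lowest energy) for each generation."""
    rows = csv.DictReader(io.StringIO(log_text))
    return [float(row["avg"]) - float(row["best"]) for row in rows]


def gap_is_closing(gaps):
    """True if the last generation has a smaller gap than the first one."""
    return len(gaps) >= 2 and gaps[-1] < gaps[0]


gaps = energy_gaps(EXAMPLE_LOG)
print(gaps)                  # [85.0, 60.0, 18.0, 4.0]
print(gap_is_closing(gaps))  # True
```

A gap that stays large over many generations is the numerical counterpart of the two lines staying far apart in the plot.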

Differential Evolution Algorithm

Differential Evolution [Price97] is a population-based search method. DE creates new candidate solutions by combining existing ones according to a simple formula of vector crossover and mutation, and then keeping whichever candidate solution has the best score or fitness on the optimization problem at hand.
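The description above can be sketched in a few lines of plain Python. This is a minimal illustration on a toy fitness function, not the EvoDOCK implementation: the function names are hypothetical, the argument names simply mirror the [DE] options (popsize, mutate, recombination, maxiter, scheme), and the local_search hook is an assumed stand-in for the memetic step.

```python
import random


def sphere(x):
    """Toy fitness (lower is better); a stand-in for the docking energy."""
    return sum(v * v for v in x)


def differential_evolution(fitness, dim, popsize=10, mutate=0.9,
                           recombination=0.3, maxiter=10, scheme="BEST",
                           local_search=None, seed=0):
    """Minimise `fitness` with a basic DE loop and an optional local search."""
    if popsize < 4:
        raise ValueError("popsize must be at least 4")
    rng = random.Random(seed)
    pop = [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(popsize)]
    scores = [fitness(ind) for ind in pop]

    for _ in range(maxiter):
        best = pop[min(range(popsize), key=scores.__getitem__)]
        next_pop, next_scores = list(pop), list(scores)
        for i in range(popsize):
            others = [j for j in range(popsize) if j != i]
            r0, r1, r2 = rng.sample(others, 3)
            # Base vector: best individual (BEST) or a random one (RANDOM).
            base = best if scheme == "BEST" else pop[r0]
            # Mutation: add the scaled difference of two other individuals.
            mutant = [base[k] + mutate * (pop[r1][k] - pop[r2][k])
                      for k in range(dim)]
            # Binomial crossover; one forced index always comes from the mutant.
            forced = rng.randrange(dim)
            trial = [mutant[k]
                     if (k == forced or rng.random() < recombination)
                     else pop[i][k]
                     for k in range(dim)]
            # Memetic step (assumed hook): refine the trial before scoring it.
            if local_search is not None:
                trial = local_search(trial)
            trial_score = fitness(trial)
            # Greedy selection: the trial replaces its parent only if not worse.
            if trial_score <= scores[i]:
                next_pop[i], next_scores[i] = trial, trial_score
        pop, scores = next_pop, next_scores

    winner = min(range(popsize), key=scores.__getitem__)
    return pop[winner], scores[winner]


# Six values per individual, mirroring 3 rotation + 3 translation values.
_, start_score = differential_evolution(sphere, dim=6, maxiter=0)
_, end_score = differential_evolution(sphere, dim=6, maxiter=20)
print(start_score, end_score)
```

Because selection is greedy, the lowest score in the population can never get worse from one generation to the next, which is why the lowest-energy line in the evolution plot is monotonic.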

Bibliography

Storn, R., Price, K. Differential Evolution – A Simple and Efficient Heuristic for global Optimization over Continuous Spaces. Journal of Global Optimization 11, 341–359 (1997). https://doi.org/10.1023/A:1008202821328
