veghen/DivPro


DivPro: Diverse Protein Sequence Design with Direct Structure Recovery Guidance

This repository contains the code for the paper DivPro: Diverse Protein Sequence Design with Direct Structure Recovery Guidance.

File structure

We provide the model checkpoint in the folder model_ckpt and the TS50 and TS500 datasets in the folder data. The CATH 4.2 dataset can be downloaded from http://people.csail.mit.edu/ingraham/graph-protein-design/data/. Place the downloaded chain_set.jsonl and chain_set_splits.json in the data folder before running inference.
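Before running inference on CATH, it can help to confirm that the two downloaded files are where the script expects them. The following is a minimal sketch, not part of the repository: the helper names missing_cath_files and split_sizes are hypothetical, and split_sizes assumes that chain_set_splits.json maps each split name to a list of chain IDs.

```python
import json
from pathlib import Path

# File names are taken from the README; everything else here is an assumption.
REQUIRED_FILES = ("chain_set.jsonl", "chain_set_splits.json")


def missing_cath_files(data_dir="data"):
    """Return the required CATH 4.2 files that are not present in data_dir."""
    root = Path(data_dir)
    return [name for name in REQUIRED_FILES if not (root / name).is_file()]


def split_sizes(data_dir="data"):
    """Count entries per split.

    Assumes the splits file is a JSON object whose values are lists of
    chain IDs; non-list values are skipped.
    """
    with open(Path(data_dir) / "chain_set_splits.json") as handle:
        splits = json.load(handle)
    return {name: len(ids) for name, ids in splits.items() if isinstance(ids, list)}


if __name__ == "__main__":
    missing = missing_cath_files()
    if missing:
        print("Missing from data/:", ", ".join(missing))
    else:
        print("Entries per split:", split_sizes())
```

Running this from the repository root prints either the files that still need to be downloaded or the number of entries in each split.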

Environment setup

pip install torch torchvision torchaudio
pip install tqdm

We recommend running on Linux systems.
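After installation, a quick check along the following lines can confirm that the packages are importable. This is only a sketch: missing_packages is a hypothetical helper, and the CUDA report assumes you want to know whether a GPU is visible, which the README does not state as a requirement.

```python
import importlib.util
import platform


def missing_packages(names=("torch", "tqdm")):
    """Return the top-level package names that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    print("Platform:", platform.system())
    missing = missing_packages()
    if missing:
        print("Not installed:", ", ".join(missing))
    else:
        # Import torch only once we know it is installed.
        import torch

        print("torch", torch.__version__)
        print("CUDA available:", torch.cuda.is_available())
```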

Run protein sequence design

Run the following command to start sequence design.

python infer.py <dataset>

The dataset argument can be CATH (the CATH 4.2 test set), ts50, or ts500. The script samples 5 sequences for each structure in the dataset and prints the results for the first structure as a demonstration. An example output from running python infer.py ts50:

3a4rA
Native sequence:
GPLGSQELRLRVQGKEKHQMLEISLSPDSPLKVLMSHYEEAMGLSGHKLSFFFDGTKLSGKELPADLGLESGDLIEVWG
Generated sequences:
GPLGSTPIKITVKGNKPDDVLTLDLPPTAPLETVIKEVQKALGLEGAELTFYYNGKKLTGTEYPADLGLKSGDTITIEG
GSLGSKPIKVTVKGDKPDDVLELELEPTAKLKELKEAFLEALGLKGKDLKFYYNGKELTGDEYPEDLGLKDGDTITVKG
GPLGDEPIRVTVRGDKPDDVVTVELRPDEPLAALMAEFQAALGKEGADLTFYYKGKRLSGEELPADLGLKDGDTVTVEG
GSLGSKPIKVTVRGEKKDDVVEVDLAPSAPLKHLIDKFQEALGKKGKDLKFYYNGKELTGSELPSDLGLKSGDVIEVKG
GPLGSTPITLTVVGEDASDVLTITLSPTAPLATVIDAFQEALGLKGADLTFYYNGKKLSGSELPADLGLKSGDTITVTG
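To get a rough feel for how close the generated sequences are to the native one, and how much they differ from each other, the printed block can be parsed and compared position by position. The sketch below is not part of the repository and its function names are hypothetical. It assumes the printed format shown above and equal-length sequences, and it defines diversity as one minus the mean pairwise identity, which is an illustrative choice and not necessarily the metric used in the paper.

```python
def parse_design_block(text):
    """Parse one printed result block into (name, native, generated)."""
    lines = [line.strip() for line in text.strip().splitlines() if line.strip()]
    name = lines[0]
    native_idx = lines.index("Native sequence:")
    generated_idx = lines.index("Generated sequences:")
    native = lines[native_idx + 1]
    generated = lines[generated_idx + 1:]
    return name, native, generated


def identity(seq_a, seq_b):
    """Fraction of positions at which two equal-length sequences agree."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(a == b for a, b in zip(seq_a, seq_b)) / len(seq_a)


def summarize(native, generated):
    """Per-sequence identity to the native, plus a simple diversity score."""
    recovery = [identity(native, seq) for seq in generated]
    pairwise = [
        identity(a, b)
        for i, a in enumerate(generated)
        for b in generated[i + 1:]
    ]
    # Diversity here is 1 - mean pairwise identity (an assumed definition).
    diversity = 1 - sum(pairwise) / len(pairwise) if pairwise else None
    return {"recovery": recovery, "diversity": diversity}


if __name__ == "__main__":
    # Toy input in the same layout as the script's printed output.
    toy = """toy
    Native sequence:
    AAAA
    Generated sequences:
    AAAT
    AATT
    """
    name, native, generated = parse_design_block(toy)
    print(name, summarize(native, generated))
```

Pasting the example output above into parse_design_block in place of the toy string gives the same summary for 3a4rA. Note that identity to the native sequence measures sequence similarity only and says nothing about whether a design folds into the target structure.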
