Development of molecular dynamics-based features from all-atom simulations of αGAL mutants to assess pharmacoperone responsiveness
| Preprint and code: | |
| Data: |
The repo contains:
- report, this preprint: I. Cazzaniga, T. Giorgino, Development of molecular dynamics-based features from all-atom simulations of αGAL mutants to assess pharmacoperone responsiveness, 2025. preprint
- DGJ, contains mol2 files used to build the correct protein structure and corresponding forcefield parameters (charmm36).
- functions, contains notebooks and scripts used to obtain the data described in the report.
- glycosylation, contains the reglycosylated pdb structure* and informations about the glycans.
- prepared_systems, contains the six systems described in the report (apo and holo, wt, N215S, R301Q) before equilibrating.
- results, contains csv tables and plots obtained from the analysis of the trajectories.
*The glycans in this pdb have been processed as described at step 0.
Trajectories obtained from the molecules present in prepared_systems folder are available at doi:10.5281/zenodo.17552241: 1 ns per frame, 3 replicas, water stripped out.
This package relies on HTMD, Moleculekit and ACEMD software, which can be installed as:
conda create -n ace_software #create a new conda environment (recommended but not necessary)
conda activate ace_software #activate the new conda environment if you decided to create one
conda install htmd acemd cuda-version=12 python=3.10 -c acellera -c conda-forgeThe protein structure we used here is based on the PDB 3GXT model, which underwent a reglycosylation step via glycoshape.org
to have the same glycan structure in each glycosylated sites, in particular:
The NOJ (ligand) will be addressed in step 1 of the tutorial.
It can be easily done by running the mol_prep_fabry.ipynb which includes:
- mutation and glycosylation handling
- NOJ replacement with migalastat (DGJ)
- system segmentation and preparation
- equilibration folder creation
The notebook generates a folder for each mutant apo/holo structure, each folder contains a build folder with the intermediate steps of the system preparation and an equilibration folder.
If multiple replicas of the system are to be run, please make copies of the folders at this point and store them in a parent folder following this pattern:
github_directory/
└── parent_folder/ (i.e. 3GXT_reglyco)
└── structure_replica/ (i.e. apo_1)
├── build/
└── equilibration/
The equilibration must be runned on HPC.
From within the equilibration folder:
conda activate htmd
sbatch sbatch_acemd #single strucutre_replica
for dir in *; do [ -d "$dir/equilibration" ] || { echo "Skipping $dir: no equilibration folder"; continue; }; echo "Entering $dir/equilibration"; (cd "$dir/equilibration" && sbatch ../../../functions/sbatch_acemd); done #multiple submissionsNOTE the single equilibration submission must be launched from within the specific equilibration folder, the multiple submission must be launched from the parernt folder.
If multiple systems are run in parallel, it is possible to check if the equilibrations are ended by running check_end.py, called from the parent folder:
cd 3GXT_reglyco # change with the name of your parent folder
python ../functions/check_end.py equilibrationNOTE this file looks for the slurm file and checks if it ended correctly, more precise evaluation must be carried.
Once the equilibration is completed it is possible to generate the required files for the production run.
From within the specific structure_replica folder do:
conda activate htmd
python ../../functions/production_prep.pyIf running multiple systems in parallel, from the parent folder, do:
conda activate htmd
for dir in */; do [ -d "$dir" ] || continue; echo "Entering $dir"; (cd "$dir" && python ../../functions/production_prep.py); doneAt this point, in both cases, it is possible to run the production. In some cases the production could require multiple job submission if maximum runtime 24 hours, to avoid doing so manually we use sbatch_many.
conda activate htmd
sbatch sbatch_many n #single structure_replica
for dir in *; do [ -d "$dir/production" ] || ( echo "Skipping $dir: no production folder"; continue; ); echo "Entering $dir/production"; (cd "$dir/production" && sbatch ../../../functions/sbatch_many n); done #multiple submissions in parallelWith n being the number of jobs to be generated for each structure_replica (for example, in our case a 1 μs long simulation required about 7 jobs).
If multiple systems are run in parallel, it is possible to check if the equilibrations are ended by running check_end.py from within the parent folder generated by the notebook:
cd 3GXT_reglyco #change with the name of your parent folder
python ../functions/check_end.py productionNOTE this file looks for the slurm file with the highest number (the last run) and checks if it ended correctly, more precise evaluation must be carried.
We included a set of simple analysis based on RMSD and RMSF on protein, ligand and glycans. This involves:
- A non mandatory filter_prod.py filtering step to remove water (TIP3, otherwise needs to be fixed) and reduce the size of both the topology (psf) and trajectory (xtc). The new trajectory is generated by skipping every 10 frames. This can be run from within the parent folder (i. e. 3GXT_reglyco) as:
python residence_time.py-
A notebook for computing the RMSD and RMSF analysis, evaluation.ipynb, with some examples we included in our report. Both rmsd and rmsf functions included in the notebook generate csv files and plots stored in the results/ folder.
-
Python script to compute the residence time of each DGJ in the corresponding monomer, called residence_time.py. This can be run from within the
functionsfolder as:
python ../functions/filter_prod.py #run on GPUs in case of many structuresThe report was conduced as part of the PROPHECY-GlycoRare project, funded by Partenariato Esteso “Health Extended ALliance for Innovative Therapies, Advanced Lab-research, and Integrated Approaches of Precision Medicine -- HEAL ITALIA -- PE 00000019”, a valere sulle risorse del Piano Nazionale di Ripresa e Resilienza (PNRR) Missione 4 “Istruzione e Ricerca” – Componente 2 “Dalla Ricerca all'Impresa” -- Investimento 1.3, finanziato dall’Unione europea – NextGenerationEU -- a valere sull’Avviso pubblico del Ministero dell'Università e della Ricerca (MUR) n. 341 del 15.03.2022 (CUP Spoke leader: Università degli Studi di Milano-Bicocca – CUP H43C22000830006 – Spoke 5 “Next-Gen Therapeutics”).