This repository provides:
- ICEPIC Toolkit – A reusable Python toolkit for analyzing protein mutation effects using sequence embeddings.
- Manuscript Figures Notebook – A reproducibility companion that regenerates all figures for the ICEPIC manuscript.
| File | Description |
|---|---|
| ICEPIC Toolkit Pub.ipynb | A reusable, modular toolkit for ice binding protein analysis with ICEPIC. Designed for integration in user workflows. |
| ICEPIC Manuscript Figures.ipynb | A reproducible pipeline that generates all figures and tables presented in the ICEPIC manuscript. |
- Clone the repository:

      git clone https://github.com/your-username/ICEPIC-Toolkit.git
      cd ICEPIC-Toolkit

- Install dependencies:

      pip install -r requirements.txt

  Alternatively, use conda:

      conda create -n icepic python=3.9
      conda activate icepic
      pip install -r requirements.txt

- Please download supporting models and data from: Supporting Data

  This contains the zipped file with all supporting data and a 'Models' folder with the model weights. Before running either notebook, enter the directories where you saved these files into the toolkit and figures notebooks.

Installing blastp and cd-hit (needed only to reproduce manuscript data/figures; the toolkit does not use them)
This project uses the following tools for sequence alignment and clustering:
- blastp: for pairwise protein sequence alignment.
- cd-hit: for clustering protein sequences by similarity.

You must install both command-line tools to run the full pipeline.
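
Before installing anything, it can help to check whether the two tools are already on your PATH. The snippet below is a minimal sketch using only the Python standard library; the function name `find_missing_tools` is hypothetical and is not part of the ICEPIC notebooks.

```python
import shutil


def find_missing_tools(tools=("blastp", "cd-hit")):
    """Return the names of command-line tools that are not found on PATH."""
    return [name for name in tools if shutil.which(name) is None]


if __name__ == "__main__":
    missing = find_missing_tools()
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("blastp and cd-hit are both available.")
```

If a tool is reported as missing after installation, the most likely cause is that its directory has not been added to PATH in the shell that launched Python or Jupyter.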
To install blastp with conda:

    conda install -c bioconda blast

This installs blastp and related tools (makeblastdb, etc.).

Alternatively, install it manually:
- Go to: https://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/LATEST/
- Download the latest release for your system:
  - macOS: ncbi-blast-*-x64-macosx.tar.gz
  - Linux: ncbi-blast-*-x64-linux.tar.gz
- Extract the archive:

      tar -xzf ncbi-blast-*-x64-*.tar.gz

- Add the bin/ directory to your PATH:

      export PATH=$PATH:/path/to/ncbi-blast-*/bin

To make this change permanent, add the line above to your ~/.bashrc, ~/.zshrc, or equivalent shell config file.

To install cd-hit with conda:

    conda install -c bioconda cd-hit

Alternatively, build it from source:
- Clone the repository:

      git clone https://github.com/weizhongli/cdhit.git
      cd cdhit

- Compile with 'make':

      make

- Add the compiled cd-hit binary to your PATH:

      export PATH=$PATH:$(pwd)

Before getting started, make sure you have:
- Python 3.8 or higher
- Installed dependencies from requirements.txt
- Access to a FASTA file or raw amino acid sequences
- GPU (optional but recommended for embedding models)
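
The first and last items in this list can be checked quickly from Python. This is a rough sketch, not part of the toolkit: the function names are hypothetical, and looking for `nvidia-smi` on PATH is only a heuristic that suggests an NVIDIA driver is installed. It does not confirm that the embedding models will actually run on the GPU.

```python
import shutil
import sys

MIN_PYTHON = (3, 8)  # minimum version stated in the prerequisites list


def python_is_supported(version_info=sys.version_info, minimum=MIN_PYTHON):
    """Return True if the (major, minor) version meets the minimum."""
    return tuple(version_info[:2]) >= minimum


def nvidia_gpu_tooling_present():
    """Heuristic: True if the nvidia-smi utility is found on PATH."""
    return shutil.which("nvidia-smi") is not None


if __name__ == "__main__":
    print("Python version OK:", python_is_supported())
    print("nvidia-smi found:", nvidia_gpu_tooling_present())
```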
- Prepare a folder for your protein sequence(s):

  You can use a .fasta file with a single sequence. To run multiple sequences, store each sequence in a separate .fasta file within the same folder.

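
If your sequences currently sit in one multi-sequence FASTA file, a small script can split them into one file per sequence. The sketch below uses only the standard library. The function `split_fasta` and its file-naming rule (first token of the header line, with unusual characters replaced by underscores) are illustrative assumptions and are not part of the toolkit.

```python
import re
from pathlib import Path


def split_fasta(multi_fasta_path, output_dir):
    """Write each record of a multi-sequence FASTA file to its own .fasta file.

    Returns the list of paths written, in input order. Records whose headers
    share the same first token would overwrite each other.
    """
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)

    records = []  # list of (header_line, [sequence_lines])
    with open(multi_fasta_path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                records.append((line, []))
            elif records:
                records[-1][1].append(line)

    written = []
    for index, (header, seq_lines) in enumerate(records, start=1):
        tokens = header[1:].split()
        raw_name = tokens[0] if tokens else f"sequence_{index}"
        # Keep file names portable: replace anything outside a safe set.
        safe_name = re.sub(r"[^A-Za-z0-9._-]", "_", raw_name)
        out_path = output_dir / f"{safe_name}.fasta"
        out_path.write_text(header + "\n" + "\n".join(seq_lines) + "\n")
        written.append(out_path)
    return written
```

For example, `split_fasta("all_proteins.fasta", "fasta_inputs")` (both paths hypothetical) would fill `fasta_inputs` with one file per record. Because the toolkit reports the .fasta file name as the protein's accession ID, naming each file after its header is a convenient choice.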
- Run all cells in the notebook. The notebook will prompt for:
  - The folder where the .fasta file(s) are stored (string)
  - Whether the .fasta files have headers ('Y' for yes or 'N' for no; string)
  - The file path for the output CSV containing predictions from all models in the toolkit (string)
  - The concentration (in uM) of the protein(s) for activity predictions, given either as one concentration for all input proteins or as an individual concentration for each input protein (int)

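
It can save a failed run to sanity-check these four values before starting the notebook. The function below is a hypothetical helper, not the notebook's own validation. It assumes that input files use the `.fasta` extension, that concentrations must be positive integers, and that individual concentrations are supplied as one value per file. How the notebook pairs individual concentrations with proteins is not specified here, so check the notebook prompts for that.

```python
from pathlib import Path


def validate_inputs(fasta_dir, has_headers, output_csv, concentrations_um):
    """Check prompt-style inputs and return them in normalized form.

    Returns (fasta_files, headers_flag, output_path, concentrations).
    Raises ValueError with a descriptive message if a value looks wrong.
    """
    fasta_dir = Path(fasta_dir)
    if not fasta_dir.is_dir():
        raise ValueError(f"Not a directory: {fasta_dir}")
    fasta_files = sorted(fasta_dir.glob("*.fasta"))
    if not fasta_files:
        raise ValueError(f"No .fasta files found in {fasta_dir}")

    flag = has_headers.strip().upper()
    if flag not in {"Y", "N"}:
        raise ValueError("Header flag must be 'Y' or 'N'")

    output_path = Path(output_csv)
    if output_path.suffix.lower() != ".csv":
        raise ValueError("Output path should end in .csv")

    # A single int applies to every protein; otherwise expect one per file.
    if isinstance(concentrations_um, int):
        concentrations = [concentrations_um] * len(fasta_files)
    else:
        concentrations = list(concentrations_um)
    if len(concentrations) != len(fasta_files):
        raise ValueError("Provide one concentration, or one per .fasta file")
    for value in concentrations:
        if isinstance(value, bool) or not isinstance(value, int) or value <= 0:
            raise ValueError(f"Concentration must be a positive int: {value!r}")

    return fasta_files, flag == "Y", output_path, concentrations
```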
- The notebook will output a table of predictions with the following columns:
  - Protein Accession ID: Name of the input protein, taken from the name of its .fasta file
  - Has Ice Binding Potential?: Binary prediction from the ice binding model stating whether the protein is considered 'ice binding'
  - Ice Binding Potential Probability: Probabilistic score (from 0 to 1) of the likelihood that the protein is 'ice binding'. A value closer to 1 indicates a higher likelihood of ice binding potential
  - Ice Class Prediction: The predicted class of the protein according to GenBank annotations (antifreeze, ice binding, ice structuring, ice nucleation, non-ice). Note that this prediction is independent of the ice binding model's prediction given in the previous columns
  - Expression: The predicted value for expression of the protein, as measured by the HiBiT assay in a Pichia host
  - Activity (TH): The predicted value (in degrees C) for ice binding activity, as measured by the thermal hysteresis assay
  - Concentration for Predicted Activity (uM): Concentration used when computing the predicted activity (as input by the user above)

- Running the last cell of the notebook will automatically save the table of predictions as a CSV file at the file path specified by the user.

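
Once the CSV has been saved, it can be loaded for downstream filtering. The sketch below uses the standard-library `csv` module and assumes the CSV header strings match the column names listed above exactly; verify this against your own output file. The function names and the 0.5 threshold are illustrative choices, not values used by the toolkit.

```python
import csv

# Assumed to match the CSV header row exactly; check your output file.
ID_COLUMN = "Protein Accession ID"
PROB_COLUMN = "Ice Binding Potential Probability"


def load_predictions(csv_path):
    """Read the predictions CSV into a list of dicts keyed by column name."""
    with open(csv_path, newline="") as handle:
        return list(csv.DictReader(handle))


def rank_by_probability(rows, threshold=0.5):
    """Keep rows at or above the threshold, sorted from highest probability."""
    kept = [row for row in rows if float(row[PROB_COLUMN]) >= threshold]
    return sorted(kept, key=lambda row: float(row[PROB_COLUMN]), reverse=True)
```

For example, `rank_by_probability(load_predictions("predictions.csv"))` (file name hypothetical) returns the candidate proteins ordered by their ice binding probability.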
To regenerate all manuscript figures:
- Launch Jupyter:

      jupyter notebook

- Open and run ICEPIC Manuscript Figures.ipynb. Notebook sections are labeled so you can navigate to specific data and/or figures.

If you use this codebase or reference the associated manuscript, please cite: https://www.biorxiv.org/content/10.1101/2025.08.08.669420v1
For questions or contributions, contact us at https://www.netrias.com/contact/, citing ICE-PIC as the Area of Interest.