ICEPIC Toolkit and Reproducible Figures

This repository provides:

ICEPIC Toolkit – A reusable Python toolkit for analyzing protein mutation effects using sequence embeddings.
Manuscript Figures Notebook – A reproducibility companion that regenerates all figures for the ICEPIC manuscript.

Repository Contents

File	Description
`ICEPIC Toolkit Pub.ipynb`	A reusable, modular toolkit for analyzing ice-binding proteins with ICEPIC, designed for integration into user workflows.
`ICEPIC Manuscript Figures.ipynb`	A reproducible pipeline that generates all figures and tables presented in the ICEPIC manuscript.

Installation

Clone the repository:

git clone https://github.com/netrias/ICEPIC.git
cd ICEPIC

Install dependencies:

pip install -r requirements.txt

Alternatively, use conda:

conda create -n icepic python=3.9
conda activate icepic
pip install -r requirements.txt

Please download the supporting models and data from: Supporting Data. The download contains a zipped file with all supporting data and a 'Models' folder with the model weights. Before using the toolkit or the figures notebook, enter the paths of the saved directories into each notebook.
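Each notebook needs the locations of the downloaded data and model weights. The sketch below shows one way to define those paths and sanity-check them before running. It is a minimal sketch: the variable names `SUPPORTING_DATA_DIR` and `MODELS_DIR` and the helper `check_supporting_dirs` are hypothetical, and the notebooks may use different names for these settings.

```python
from pathlib import Path
from typing import List

# Hypothetical locations: point these at wherever you saved the downloads.
SUPPORTING_DATA_DIR = Path("/path/to/unzipped/supporting_data")
MODELS_DIR = Path("/path/to/Models")


def check_supporting_dirs(data_dir: Path, models_dir: Path) -> List[str]:
    """Return human-readable problems; an empty list means both paths look usable."""
    problems = []
    if not data_dir.is_dir():
        problems.append(f"Supporting data directory not found: {data_dir}")
    if not models_dir.is_dir():
        problems.append(f"Models directory not found: {models_dir}")
    elif not any(models_dir.iterdir()):
        # The folder exists but is empty, e.g. the weights were never copied in.
        problems.append(f"Models directory is empty: {models_dir}")
    return problems


if __name__ == "__main__":
    issues = check_supporting_dirs(SUPPORTING_DATA_DIR, MODELS_DIR)
    print("\n".join(issues) if issues else "Supporting data and model paths look OK.")
```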

Installing `blastp` and `cd-hit` (needed only to reproduce the manuscript data/figures; the toolkit does not use these tools)

This project uses the following tools for sequence alignment and clustering:

blastp: for pairwise protein sequence alignment.
cd-hit: for clustering protein sequences by similarity.

You must install both command-line tools to run the full pipeline.
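To check whether both executables can already be found, or to confirm an installation done with the steps below, you can look them up on `PATH`. This is a minimal sketch based on Python's standard `shutil.which`. The helper name `find_tools` is hypothetical and does not appear in either notebook.

```python
import shutil
from typing import Dict, Optional

# Executable names as used in this README.
REQUIRED_TOOLS = ("blastp", "cd-hit")


def find_tools(tools=REQUIRED_TOOLS) -> Dict[str, Optional[str]]:
    """Map each tool name to its resolved path on PATH, or None if it is missing."""
    return {tool: shutil.which(tool) for tool in tools}


if __name__ == "__main__":
    for tool, path in find_tools().items():
        print(f"{tool}: {path if path else 'NOT FOUND'}")
```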

1. Install `blastp` (part of NCBI BLAST+)

Option A: Using `conda` (recommended)

conda install -c bioconda blast

This installs blastp and related tools (makeblastdb, etc.).

Option B: Manual Install (Linux/macOS)

Go to: https://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/LATEST/
Download the latest release for your system:
1. macOS: ncbi-blast-*-x64-macosx.tar.gz
2. Linux: ncbi-blast-*-x64-linux.tar.gz
Extract the archive:

tar -xzf ncbi-blast-*-x64-*.tar.gz

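Add the bin/ directory to your PATH, replacing /path/to/ncbi-blast-* with the actual path of the extracted folder (including its version number):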

export PATH=$PATH:/path/to/ncbi-blast-*/bin

To make this change permanent, add the line above to your ~/.bashrc, ~/.zshrc, or equivalent shell config file.
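Once `blastp` is on your PATH, you can align two FASTA files pairwise without building a database, because BLAST+ accepts `-subject` in place of `-db`. The sketch below runs `blastp` in that mode and parses its default tabular output (`-outfmt 6`). It is illustrative only: the helper names are hypothetical, and the manuscript notebook may call `blastp` with different options.

```python
import subprocess
from dataclasses import dataclass
from typing import List

# Default column order of BLAST+ tabular output (-outfmt 6).
OUTFMT6_FIELDS = (
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
)


@dataclass
class BlastHit:
    qseqid: str
    sseqid: str
    pident: float  # percent identity of the aligned region
    length: int    # alignment length
    evalue: float
    bitscore: float


def parse_outfmt6(text: str) -> List[BlastHit]:
    """Parse default 12-column -outfmt 6 text into a list of hits."""
    hits = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        row = dict(zip(OUTFMT6_FIELDS, line.split("\t")))
        hits.append(BlastHit(
            qseqid=row["qseqid"],
            sseqid=row["sseqid"],
            pident=float(row["pident"]),
            length=int(row["length"]),
            evalue=float(row["evalue"]),
            bitscore=float(row["bitscore"]),
        ))
    return hits


def blastp_pairwise(query_fasta: str, subject_fasta: str) -> List[BlastHit]:
    """Align two FASTA files directly (no database) and return the parsed hits."""
    result = subprocess.run(
        ["blastp", "-query", query_fasta, "-subject", subject_fasta, "-outfmt", "6"],
        capture_output=True, text=True, check=True,
    )
    return parse_outfmt6(result.stdout)
```

Because the parser is separate from the `subprocess` call, you can test it on saved output without BLAST+ installed.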

2. Install cd-hit

Option A: Using `conda` (recommended)

conda install -c bioconda cd-hit

Option B: Manual Install (Linux/macOS)

Clone the repository:

git clone https://github.com/weizhongli/cdhit.git
cd cdhit

Compile with 'make':

make

From inside the cdhit directory, add it to your PATH so the compiled cd-hit binary can be found:

export PATH=$PATH:$(pwd)
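The sketch below builds a `cd-hit` command that clusters protein sequences at a chosen identity threshold (`-c`). The word size (`-n`) follows the CD-HIT user guide's recommendations for protein clustering: `-n 5` for thresholds of 0.7 to 1.0, down to `-n 2` for 0.4 to 0.5. The default cutoff of 0.9 matches cd-hit's own default. The helper names are hypothetical, and these values are not necessarily the ones used for the manuscript.

```python
from typing import List


def cdhit_word_size(identity: float) -> int:
    """Word size (-n) recommended in the cd-hit user guide for a protein identity cutoff."""
    if identity >= 0.7:
        return 5
    if identity >= 0.6:
        return 4
    if identity >= 0.5:
        return 3
    if identity >= 0.4:
        return 2
    raise ValueError("cd-hit protein clustering supports identity cutoffs of 0.4 or higher")


def cdhit_command(input_fasta: str, output_prefix: str, identity: float = 0.9) -> List[str]:
    """Build a cd-hit command; cd-hit writes representatives to output_prefix and clusters to output_prefix.clstr."""
    if not 0.4 <= identity <= 1.0:
        raise ValueError("identity must be between 0.4 and 1.0")
    return [
        "cd-hit",
        "-i", input_fasta,
        "-o", output_prefix,
        "-c", str(identity),
        "-n", str(cdhit_word_size(identity)),
    ]


if __name__ == "__main__":
    # Print the command; pass the list to subprocess.run(cmd, check=True) to execute it.
    print(" ".join(cdhit_command("sequences.fasta", "clustered", identity=0.7)))
```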

Toolkit Overview and Instructions

Prerequisites

Before getting started, make sure you have:

Python 3.8 or higher
Installed dependencies from requirements.txt
Access to a FASTA file or raw amino acid sequences
GPU (optional but recommended for embedding models)
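You can check the prerequisites above from inside Python before running the toolkit. This is a minimal sketch. It assumes that the embedding models run on PyTorch, which this README does not state (check requirements.txt), and reports `None` for the GPU check if PyTorch is not installed. The function names are hypothetical.

```python
import sys
from typing import Optional


def python_version_ok(minimum=(3, 8)) -> bool:
    """True if the running interpreter meets the minimum Python version."""
    return sys.version_info[:2] >= tuple(minimum)


def gpu_available() -> Optional[bool]:
    """Return True/False for CUDA availability via PyTorch, or None if PyTorch is absent.

    Assumption: the embedding models use PyTorch; confirm against requirements.txt.
    """
    try:
        import torch
    except ImportError:
        return None
    return torch.cuda.is_available()


if __name__ == "__main__":
    print("Python version OK:", python_version_ok())
    gpu = gpu_available()
    print("GPU:", "PyTorch not installed" if gpu is None else gpu)
```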

Instructions

Prepare a folder for your protein sequence(s):

Each .fasta file should contain a single sequence. To run multiple sequences, store each sequence in its own .fasta file within the same folder.
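The toolkit expects one sequence per file and takes the Protein Accession ID from the file name, so it can help to generate the input folder programmatically. The helpers below are a hypothetical sketch and are not part of the toolkit. They write files with a `>name` header, so answer 'Y' to the header prompt when using them.

```python
from pathlib import Path
from typing import Dict, List


def write_single_sequence_fastas(sequences: Dict[str, str], folder: Path) -> List[Path]:
    """Write each sequence to its own <name>.fasta file with a '>name' header line.

    The file name is what the toolkit reports as the Protein Accession ID.
    """
    folder.mkdir(parents=True, exist_ok=True)
    written = []
    for name, seq in sequences.items():
        path = folder / f"{name}.fasta"
        # One header line followed by the uppercase, whitespace-stripped sequence.
        path.write_text(f">{name}\n{seq.strip().upper()}\n")
        written.append(path)
    return written


def count_records(path: Path) -> int:
    """Count FASTA records by header lines (assumes the file uses '>' headers)."""
    return sum(1 for line in path.read_text().splitlines() if line.startswith(">"))


def find_multi_record_files(folder: Path) -> List[Path]:
    """List .fasta files in `folder` that contain more than one record and should be split."""
    return sorted(p for p in folder.glob("*.fasta") if count_records(p) > 1)
```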
Run all cells in the notebook. The notebook will prompt for:
1. the path of the folder where the .fasta file(s) are stored (as a string)
2. whether the .fasta files have headers ('Y' for yes, 'N' for no; as a string)
3. the file path for the output CSV containing predictions from all models in the toolkit (as a string)
4. the protein concentration (in uM) for activity predictions: either one concentration applied to all input proteins or an individual concentration for each input protein (as an integer)
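The header and concentration rules above can be expressed as a small validation step: a 'Y'/'N' answer, and either one concentration shared by all proteins or exactly one per protein. The function names below are hypothetical. This sketch does not reproduce how the notebook actually reads or validates its prompts.

```python
from typing import List, Sequence


def parse_header_flag(answer: str) -> bool:
    """Interpret the 'Y'/'N' header prompt (case-insensitive, surrounding spaces ignored)."""
    normalized = answer.strip().upper()
    if normalized not in ("Y", "N"):
        raise ValueError("Answer 'Y' or 'N'")
    return normalized == "Y"


def expand_concentrations(concentrations: Sequence[int], n_proteins: int) -> List[int]:
    """Apply one concentration (uM) to every protein, or require exactly one per protein."""
    if len(concentrations) == 1:
        return list(concentrations) * n_proteins
    if len(concentrations) != n_proteins:
        raise ValueError(
            f"Expected 1 or {n_proteins} concentrations, got {len(concentrations)}"
        )
    return list(concentrations)
```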
The notebook outputs a table of predictions with the following columns:
1. Protein Accession ID: Name of the input protein, taken from the name of its .fasta file
2. Has Ice Binding Potential?: Binary prediction from the ice binding model indicating whether the protein is considered 'ice binding'
3. Ice Binding Potential Probability: Probability score (from 0 to 1) that the protein is 'ice binding'. A value closer to 1 indicates a higher likelihood of ice binding potential
4. Ice Class Prediction: The predicted class of the protein according to GenBank annotations (antifreeze, ice binding, ice structuring, ice nucleation, non-ice). Note that this prediction is independent of the ice binding model's prediction in the previous columns.
5. Expression: The predicted expression level of the protein, as measured by the HiBiT assay in a Pichia host
6. Activity (TH): The predicted ice binding activity (in degrees C), as measured by the thermal hysteresis assay
7. Concentration for Predicted Activity (uM): The concentration used to compute the predicted activity (as entered by the user above)
Running the last cell of the notebook saves the table of predictions as a CSV file at the path specified by the user.
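Because the output is a plain CSV, you can post-process it outside the notebook. As a sketch, the standard-library `csv` module can rank proteins by ice binding probability. This assumes that the CSV headers match the column names listed above exactly. The 0.5 cutoff is an arbitrary illustrative choice, not the threshold the ice binding model uses.

```python
import csv
from typing import List, Tuple

# Column names as listed in this README; adjust if your CSV headers differ.
ID_COL = "Protein Accession ID"
PROB_COL = "Ice Binding Potential Probability"


def likely_ice_binders(csv_path: str, threshold: float = 0.5) -> List[Tuple[str, float]]:
    """Return (accession ID, probability) pairs at or above `threshold`, highest first."""
    with open(csv_path, newline="") as handle:
        rows = [(row[ID_COL], float(row[PROB_COL])) for row in csv.DictReader(handle)]
    hits = [(pid, prob) for pid, prob in rows if prob >= threshold]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)
```

For example, `likely_ice_binders("predictions.csv", threshold=0.8)` would list only the proteins scored at 0.8 or above.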

Reproducing Figures

To regenerate all manuscript figures:

Launch Jupyter:
```
jupyter notebook
```
Open and run `ICEPIC Manuscript Figures.ipynb`. Notebook sections are labeled so you can navigate to specific data and/or figures.
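If you prefer to regenerate the figures without opening the Jupyter interface, `jupyter nbconvert --to notebook --execute` runs a notebook top to bottom and saves the executed copy. The helper below only builds that command list. Its name, the output file name, and the one-hour per-cell timeout are illustrative assumptions. Set the supporting-data directories in the notebook first. The toolkit notebook waits for interactive input, so it is better run interactively.

```python
from typing import List


def nbconvert_execute_command(notebook: str, output: str, timeout_s: int = 3600) -> List[str]:
    """Build a `jupyter nbconvert` call that executes a notebook and saves the executed copy."""
    return [
        "jupyter", "nbconvert",
        "--to", "notebook",
        "--execute", notebook,
        "--output", output,
        # Per-cell timeout in seconds; long-running figure cells may need more.
        f"--ExecutePreprocessor.timeout={timeout_s}",
    ]
```

To execute the notebook, pass the result to `subprocess.run`. For example, pass `nbconvert_execute_command("ICEPIC Manuscript Figures.ipynb", "figures_executed.ipynb")` together with `check=True`.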

Citation

If you use this codebase or reference the associated manuscript, please cite: https://www.biorxiv.org/content/10.1101/2025.08.08.669420v1

Contact

For questions or contributions, contact Netrias via https://www.netrias.com/contact/, specifying ICE-PIC as the Area of Interest.
