netrias/ICEPIC

ICEPIC Toolkit and Reproducible Figures

This repository provides:

  1. ICEPIC Toolkit – A reusable Python toolkit for analyzing protein mutation effects using sequence embeddings.
  2. Manuscript Figures Notebook – A reproducibility companion that regenerates all figures for the ICEPIC manuscript.

Repository Contents

| File | Description |
| --- | --- |
| ICEPIC Toolkit Pub.ipynb | A reusable, modular toolkit for ice binding protein analysis with ICEPIC. Designed for integration in user workflows. |
| ICEPIC Manuscript Figures.ipynb | A reproducible pipeline that generates all figures and tables presented in the ICEPIC manuscript. |

Installation

  1. Clone the repository:

    git clone https://github.com/netrias/ICEPIC.git
    cd ICEPIC
  2. Install dependencies:

    pip install -r requirements.txt

    Alternatively, use conda:

    conda create -n icepic python=3.9
    conda activate icepic
    pip install -r requirements.txt
  3. Download the supporting models and data from the Supporting Data link. The download contains a zipped file with all supporting data and a 'Models' folder with the model weights. Before running either notebook, enter the directories where you saved these files into the toolkit notebook and the figures notebook.
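
Before opening either notebook, it can help to confirm that the downloaded directories are where you expect them. The following is a minimal sketch, not part of the toolkit; the two paths and the `missing_dirs` helper are hypothetical placeholders that you would replace with your own locations.

```python
from pathlib import Path


def missing_dirs(paths):
    """Return the entries in `paths` that are not existing directories."""
    return [str(p) for p in map(Path, paths) if not p.is_dir()]


# Hypothetical locations: replace with wherever you unzipped the download.
SUPPORTING_DATA_DIR = "path/to/supporting_data"
MODELS_DIR = "path/to/Models"

if __name__ == "__main__":
    missing = missing_dirs([SUPPORTING_DATA_DIR, MODELS_DIR])
    if missing:
        print("Not found:", ", ".join(missing))
    else:
        print("Both directories found.")
```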


Installing blastp and cd-hit (needed only to reproduce manuscript data/figures; the toolkit does not use them)

This project uses the following tools for sequence alignment and clustering:

  • blastp — for pairwise protein sequence alignment.
  • cd-hit — for clustering protein sequences by similarity.

You must install both command-line tools to run the full pipeline.

1. Install blastp (part of NCBI BLAST+)

Option A: Using conda (recommended)

conda install -c bioconda blast

This installs blastp and related tools (makeblastdb, etc.).

Option B: Manual Install (Linux/macOS)

  1. Go to: https://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/LATEST/
  2. Download the latest release for your system:
    1. macOS: ncbi-blast-*-x64-macosx.tar.gz
    2. Linux: ncbi-blast-*-x64-linux.tar.gz
  3. Extract the archive:

    tar -xzf ncbi-blast-*-x64-*.tar.gz
  4. Add the bin/ directory to your PATH. Replace /path/to/ncbi-blast-* with the full path of the extracted directory, because the shell does not expand the * wildcard inside a variable assignment:

    export PATH=$PATH:/path/to/ncbi-blast-*/bin

To make this change permanent, add the line above to your ~/.bashrc, ~/.zshrc, or equivalent shell config file.

2. Install cd-hit

Option A: Using conda (recommended)

conda install -c bioconda cd-hit

Option B: Manual Install (Linux/macOS)

  1. Clone the repository:

    git clone https://github.com/weizhongli/cdhit.git
    cd cdhit
  2. Compile with make:

    make
  3. Add the directory containing the compiled cd-hit binary to your PATH:

    export PATH=$PATH:$(pwd)
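
After installing by either route, you may want to confirm that the executables are visible on your PATH. This is a small sketch using only the Python standard library; the `find_tools` helper is a hypothetical name, and the list of executables reflects the tools named above.

```python
import shutil


def find_tools(names=("blastp", "makeblastdb", "cd-hit")):
    """Map each executable name to its resolved path, or None if not on PATH."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    for name, path in find_tools().items():
        print(f"{name}: {path or 'NOT FOUND on PATH'}")
```

Any tool reported as not found usually means the PATH change was made in a different shell session or has not been added to your shell config file yet.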

Toolkit Overview and Instructions

Prerequisites

Before getting started, make sure you have:

  1. Python 3.8 or higher
  2. Installed dependencies from requirements.txt
  3. Access to a FASTA file or raw amino acid sequences
  4. GPU (optional but recommended for embedding models)
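
If you are starting from raw amino acid sequences rather than FASTA files, a sketch like the one below writes each sequence to its own .fasta file in one folder. The `write_single_fastas` helper and its arguments are hypothetical and not part of the toolkit. It assumes the file name should match the protein name and writes each sequence on a single line.

```python
from pathlib import Path


def write_single_fastas(sequences, out_dir, with_header=True):
    """Write each {name: sequence} entry to out_dir/<name>.fasta.

    Returns the list of written paths. Whitespace inside a sequence is
    removed and residues are upper-cased; no other validation is done.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for name, seq in sequences.items():
        seq = "".join(seq.split()).upper()
        lines = [f">{name}"] if with_header else []
        lines.append(seq)  # one line per sequence (an assumption, see above)
        path = out / f"{name}.fasta"
        path.write_text("\n".join(lines) + "\n")
        written.append(path)
    return written
```

Keep `with_header` consistent across all files in the folder so that a single answer to the notebook's header prompt applies to every file.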

Instructions

  1. Prepare a folder for your protein sequence(s):

    Each .fasta file should contain a single sequence. To run multiple sequences, store each sequence in its own .fasta file within the same folder.

  2. Run all cells in the notebook. The notebook will prompt for:

    1. The folder where the .fasta file(s) are stored (as a string)
    2. Whether the .fasta files have headers (enter 'Y' for yes or 'N' for no, as a string)
    3. The file path for the output CSV containing predictions from all models in the toolkit (as a string)
    4. The protein concentration (in uM) to use for activity predictions, as an integer. Specify either one concentration for all input proteins or an individual concentration for each input protein.
  3. The notebook will output a table of predictions with the following columns:

    1. Protein Accession ID: Name of the input protein, taken from the name of its .fasta file
    2. Has Ice Binding Potential?: Binary prediction from the ice binding model stating whether the protein is considered 'ice binding'
    3. Ice Binding Potential Probability: Score from 0 to 1 estimating the likelihood that the protein is 'ice binding'. Values closer to 1 indicate a higher likelihood of ice binding potential
    4. Ice Class Prediction: The predicted class of the protein according to GenBank annotations (antifreeze, ice binding, ice structuring, ice nucleation, non-ice). This prediction is independent of the ice binding model's prediction in the previous two columns.
    5. Expression: The predicted expression level of the protein, as measured by the HiBiT assay in a Pichia host
    6. Activity (TH): The predicted ice binding activity (in degrees C), as measured by the thermal hysteresis assay
    7. Concentration for Predicted Activity (uM): The concentration used to compute the predicted activity (as entered by the user above)
  4. Running the last cell of the notebook saves the table of predictions as a CSV file at the path specified by the user.
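
Once the CSV has been saved, it can be post-processed outside the notebook. The sketch below ranks proteins by their ice binding probability using only the standard library. It assumes the CSV header uses the column names exactly as listed above. The `top_candidates` helper and the 0.5 threshold are hypothetical choices for illustration, not the cutoff used by the toolkit's binary prediction.

```python
import csv

ID_COL = "Protein Accession ID"
PROB_COL = "Ice Binding Potential Probability"


def top_candidates(csv_path, min_probability=0.5):
    """Return (id, probability) pairs at or above the threshold, highest first."""
    with open(csv_path, newline="") as handle:
        rows = list(csv.DictReader(handle))
    scored = [(row[ID_COL], float(row[PROB_COL])) for row in rows]
    kept = [pair for pair in scored if pair[1] >= min_probability]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)
```

For example, `top_candidates("predictions.csv", 0.8)` would list only the proteins the ice binding model scores at 0.8 or higher, where `predictions.csv` stands in for the output path you gave the notebook.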


Reproducing Figures

To regenerate all manuscript figures:

  1. Launch Jupyter:

    jupyter notebook
  2. Open and run ICEPIC Manuscript Figures.ipynb. Notebook sections are labeled so you can navigate directly to specific data and/or figures.
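
As an alternative to running the cells interactively, a notebook can be executed from a script with `jupyter nbconvert`. This is a sketch under the assumption that the figures notebook needs no interactive input once the data and model directories have been set inside it. The helper names and the output file name are hypothetical.

```python
import subprocess


def nbconvert_command(notebook, output_name):
    """Build a `jupyter nbconvert` command that executes a notebook into a new file."""
    return [
        "jupyter", "nbconvert",
        "--to", "notebook",
        "--execute",
        "--output", output_name,
        notebook,
    ]


def run_notebook(notebook, output_name):
    """Execute the notebook; raises CalledProcessError if any cell fails."""
    subprocess.run(nbconvert_command(notebook, output_name), check=True)
```

Calling `run_notebook("ICEPIC Manuscript Figures.ipynb", "figures_executed.ipynb")` from the repository folder would write an executed copy and leave the original notebook unchanged.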


Citation

If you use this codebase or reference the associated manuscript, please cite: https://www.biorxiv.org/content/10.1101/2025.08.08.669420v1


Contact

For questions or contributions, contact us at https://www.netrias.com/contact/ and list ICE-PIC as the Area of Interest.

