
PALP: A Scalable Pretraining Framework for Link Prediction with Efficient Adaptation

This repository contains the official implementation of the paper "A Scalable Pretraining Framework for Link Prediction with Efficient Adaptation", accepted at KDD 2025.

Overview

The code reproduces the results presented in Table 2 of our paper and currently supports experiments on two downstream datasets:

  • cora
  • citeseer

Environment Setup

Prerequisites

  • CUDA 11.8
  • Python 3.8 or higher
  • pip or conda

Installation

# Clone the repository
git clone https://github.com/SongYYYY/PALP.git
cd PALP

# Install dependencies
pip install -r requirements.txt
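
After installation, a snippet like the one below can confirm that the GPU stack is visible. It assumes PyTorch is among the installed dependencies, which this README does not state explicitly but the CUDA 11.8 prerequisite suggests; check_env.py is a hypothetical helper, not part of the repository.

# check_env.py -- minimal environment sanity check (hypothetical helper)
import torch

print(f"PyTorch version: {torch.__version__}")
print(f"CUDA available:  {torch.cuda.is_available()}")
if torch.cuda.is_available():
    # torch.version.cuda reports the CUDA toolkit PyTorch was built against
    print(f"CUDA version:    {torch.version.cuda}")
    print(f"Device:          {torch.cuda.get_device_name(0)}")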

Pretraining and Checkpoints

PALP was pretrained on ogbn-papers100M. Because of the scale of this graph (roughly 200 GB to store the node features and graph structure), we are unable to release the processed data in this repository. To reproduce the results from our paper, we instead provide the pretrained checkpoints directly in the ckpt-1 and ckpt-2 directories.

Data Structure

The repository contains the following key directories:

Data Directories

  • node_data/: Contains graph structures and node features. These files were originally processed using TSGFM.
  • link_data/: Contains training data for link prediction (see the loading sketch after this list), including:
    • Positive/negative edges for the train, validation, and test splits
    • BUDDY features for edges (processed using subgraph-sketching)
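
The exact file layout inside link_data/ is not documented here. If the edge splits are stored as PyTorch tensors (a common convention; every file name below is hypothetical), loading them would look roughly like this:

import torch

# Hypothetical file names -- inspect link_data/ for the actual layout.
splits = {}
for split in ("train", "valid", "test"):
    splits[split] = {
        "pos": torch.load(f"link_data/cora/{split}_pos.pt"),  # positive edges, typically shape [2, E] or [E, 2]
        "neg": torch.load(f"link_data/cora/{split}_neg.pt"),  # sampled negative edges
    }
print({k: v["pos"].shape for k, v in splits.items()})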

Model Checkpoints

  • ckpt-1/: Contains pretrained model checkpoints for the node module
  • ckpt-2/: Contains pretrained model checkpoints for the edge module

Both models were pretrained on ogbn-papers100M. Their configuration files are listed below, followed by a loading sketch:

  • model_1_config.yaml
  • model_2_config.yaml
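
The config file names are documented above, but the serialization format of the checkpoints is not, so the following is only a sketch. It assumes the configs are readable with PyYAML and the checkpoints were written with torch.save:

import glob
import yaml
import torch

# Read the documented config files for the two pretrained modules.
with open("model_1_config.yaml") as f:
    node_cfg = yaml.safe_load(f)
with open("model_2_config.yaml") as f:
    edge_cfg = yaml.safe_load(f)
print(node_cfg, edge_cfg)

# Checkpoint file names inside ckpt-1/ and ckpt-2/ are not documented;
# glob for whatever is there and assume torch.save format (assumption).
for ckpt_dir in ("ckpt-1", "ckpt-2"):
    for path in glob.glob(f"{ckpt_dir}/*"):
        state = torch.load(path, map_location="cpu")
        print(path, type(state))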

Usage

To test the pretrained checkpoints with our proposed adaptation strategy, use the following command:

python test_link_merge_all.py --data_name 'cora' --train_ratio 0.4

Command Line Arguments

  • --data_name: Name of the dataset to use ('cora' or 'citeseer')
  • --train_ratio: Training data ratio (default: 0.4)
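
To run both supported datasets back to back, a small driver can shell out to the documented entry point with the documented arguments; nothing here goes beyond the two flags listed above.

import subprocess
import sys

# Invoke the documented test command once per supported downstream dataset.
for data_name in ("cora", "citeseer"):
    subprocess.run(
        [sys.executable, "test_link_merge_all.py",
         "--data_name", data_name,
         "--train_ratio", "0.4"],
        check=True,  # stop on the first failure
    )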

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgments

  • We thank the authors of TSGFM for their data processing pipeline
  • We thank the authors of subgraph-sketching for their BUDDY feature implementation
  • We thank the authors of NAGphormer for releasing their implementation
