The code runs within a Miniconda virtual environment built from the environment.yml file.
Additional dataset details are included in the article appendix.
bash gen_synth.sh
bash gcn.sh
# Necessary Packages for Minimal Result Replication
# All packages installed via Conda unless specified
Python 3.9
CUDA 11.6
PyTorch 1.13.1
PyTorch Geometric 2.5.2
Torch Scatter 2.1.0
Torch Sparse 0.6.15
Torch Cluster 1.6.0
OGB 1.3.6 # pip
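To confirm that an environment matches the versions listed above, a small check along the following lines may help. This is a minimal sketch, not part of the repository: the helper name `check_versions` is hypothetical, and the distribution names (`torch`, `torch-geometric`, `torch-scatter`, `torch-sparse`, `torch-cluster`, `ogb`) are assumed to be the names under which the packages register their metadata. Python and CUDA versions are not checked here.

```python
from importlib import metadata

# Versions copied from the package list; the distribution names are assumptions.
EXPECTED = {
    "torch": "1.13.1",
    "torch-geometric": "2.5.2",
    "torch-scatter": "2.1.0",
    "torch-sparse": "0.6.15",
    "torch-cluster": "1.6.0",
    "ogb": "1.3.6",
}


def check_versions(expected=EXPECTED):
    """Return {name: (expected, installed_or_None)} for every mismatch."""
    mismatches = {}
    for name, wanted in expected.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package metadata not found in this environment
        # Local build tags such as "+cu116" are ignored in the comparison.
        if installed is None or installed.split("+")[0] != wanted:
            mismatches[name] = (wanted, installed)
    return mismatches


if __name__ == "__main__":
    for name, (wanted, installed) in check_versions().items():
        print(f"{name}: expected {wanted}, found {installed}")
```

An empty result means every listed package was found at the expected version.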
- After gen_synth.py finishes running, the 'forward' splits of the shifted dataset are available in the dataset/ folder under the name:
{data_name}_{split_type}_0_{valid_rat}_{test_rat}_seed1Dataset
- 'backward' splits swap the positions of the 'test_rat' and 'valid_rat' parameters and move the 0 to the end:
{data_name}_{split_type}_{test_rat}_{valid_rat}_0_seed1Dataset
- LPShift datasets follow the OGB format for positive samples and the HeaRT format for negative validation and test samples.
- We advise using different batch sizes for training, validation, and testing to ensure efficient run time.
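The folder naming convention described above can be sketched as a small helper. This is an illustration only: `split_folder_name` and its `direction` argument are hypothetical and not part of the repository, and the seed is fixed at 1 because the documented patterns only show `seed1`.

```python
def split_folder_name(data_name, split_type, valid_rat, test_rat, direction="forward"):
    """Build the dataset folder name for a 'forward' or 'backward' split.

    Hypothetical helper that mirrors the two documented name patterns.
    """
    if direction == "forward":
        # {data_name}_{split_type}_0_{valid_rat}_{test_rat}_seed1Dataset
        parts = [data_name, split_type, 0, valid_rat, test_rat]
    elif direction == "backward":
        # {data_name}_{split_type}_{test_rat}_{valid_rat}_0_seed1Dataset
        parts = [data_name, split_type, test_rat, valid_rat, 0]
    else:
        raise ValueError("direction must be 'forward' or 'backward'")
    return "_".join(str(p) for p in parts) + "_seed1Dataset"


if __name__ == "__main__":
    print(split_folder_name("ogbl-collab", "CN", valid_rat=1, test_rat=2))
    print(split_folder_name("ogbl-collab", "CN", valid_rat=1, test_rat=2,
                            direction="backward"))
```

With `valid_rat=1` and `test_rat=2`, the backward call returns `ogbl-collab_CN_2_1_0_seed1Dataset`, that is, the `ogbl-collab_CN_2_1_0_seed1` name with the `Dataset` suffix appended.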
import numpy as np
import torch

from synth_dataset import SynthDataset

dataset_name = "ogbl-collab_CN_2_1_0_seed1"
dataset = SynthDataset(dataset_name=dataset_name)
data = dataset.get()  # PyG graph object for training adjacency matrix
split_edge = dataset.get_edge_split()

# Positive edges follow the OGB split format
pos_train_edge = split_edge['train']['edge']
pos_valid_edge = split_edge['valid']['edge']
pos_test_edge = split_edge['test']['edge']

# Negative validation and test edges are the stored HeaRT samples
with open(f'dataset/{dataset_name}Dataset/heart_valid_samples.npy', "rb") as f:
    neg_valid_edge = torch.from_numpy(np.load(f))
with open(f'dataset/{dataset_name}Dataset/heart_test_samples.npy', "rb") as f:
    neg_test_edge = torch.from_numpy(np.load(f))
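The advice about using different batch sizes per split can be illustrated with a plain slicing loop. This is a minimal sketch under stated assumptions: `iter_batches` and the `BATCH_SIZES` values are hypothetical and are not taken from the repository's scripts. Slicing works the same way on lists, NumPy arrays, and tensors such as `pos_train_edge`, so a short list of stand-in edges is used here to keep the example dependency-free.

```python
def iter_batches(items, batch_size):
    """Yield consecutive slices of `items` with at most `batch_size` rows."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


# Hypothetical batch sizes. Evaluation batches can often be larger than
# training batches because no gradients need to be stored.
BATCH_SIZES = {"train": 4, "valid": 8, "test": 8}

if __name__ == "__main__":
    # Stand-in for an edge tensor; real splits would come from split_edge.
    edges = [(i, i + 1) for i in range(10)]
    for split, size in BATCH_SIZES.items():
        num_batches = sum(1 for _ in iter_batches(edges, size))
        print(f"{split}: batch_size={size}, batches={num_batches}")
```

Suitable values depend on the dataset size and available GPU memory, so the numbers above should be treated as placeholders.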
@article{revolinsky2024understanding,
title={Understanding the Generalizability of Link Predictors Under Distribution Shifts on Graphs},
author={Revolinsky, Jay and Shomer, Harry and Tang, Jiliang},
journal={arXiv preprint arXiv:2406.08788},
year={2024}
}