This repository provides the official implementation for the paper:
Complete-Tree Space Favors Data-Efficient Link Prediction
The codebase is organized as follows:
./
├── dataset/
├── logs/
├── special_req/
│ └── embedding_link_prediction_dw.py
├── main.py
├── utils.py
├── split_datasets.py
├── synthetic_datasets.py
├── dataset_data_efficiency.sh
├── dataset_practicality.sh
├── dataset_scalability.sh
├── exp_data_efficiency.sh
├── exp_practicality.sh
└── exp_scalability.sh
- dataset/: Where the datasets (like Cora) should be placed.
- logs/: Output directory for training and testing logs.
- special_req/: A modified file for CogDL's embedding_link_prediction_dw.py.
- main.py: Entry point for main experiments.
- utils.py: Utility functions.
- split_datasets.py: Dataset splitter in our setting.
- bash scripts: Shell scripts to run different experiment scenarios.
- Python Environment
  Make sure you have the following packages installed:
  pytorch, scikit-learn, numpy, pandas, networkx, torch_geometric
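A quick way to confirm the environment is to check that each package can be located by Python. The sketch below is not part of the repository. It assumes the usual import names (torch for pytorch, sklearn for scikit-learn), and the helper name missing_packages is hypothetical.

```python
import importlib.util

# Import names assumed for the packages listed above
# (pytorch is imported as torch, scikit-learn as sklearn).
REQUIRED = ["torch", "sklearn", "numpy", "pandas", "networkx", "torch_geometric"]


def missing_packages(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(REQUIRED)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages can be found.")
```

This only checks that the modules are present; it does not verify versions or that the packages were built against compatible CUDA/PyTorch releases.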
- Datasets
  We use CogDL for automatic dataset downloading. Please install CogDL in your environment:
  pip install cogdl
  or refer to CogDL's GitHub repo for detailed installation instructions.
- Replace the CogDL Link Prediction Wrapper
  In your local CogDL installation, find the file:
  cogdl/wrappers/data_wrapper/link_prediction/embedding_link_prediction_dw.py
  Replace its content with the file provided in our repository:
  ./special_req/embedding_link_prediction_dw.py
  This step ensures compatibility with our experimental setup.
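The replacement can also be scripted. The following is a minimal sketch, not a script shipped with the repository: the function names find_cogdl_root and replace_wrapper are hypothetical, and the sketch assumes the installed CogDL package uses the relative path quoted above. It keeps a .bak copy of the original wrapper so the change can be undone.

```python
import importlib.util
import shutil
from pathlib import Path

# Location of the wrapper inside the cogdl package, as quoted above.
WRAPPER_REL = Path("wrappers/data_wrapper/link_prediction/embedding_link_prediction_dw.py")


def find_cogdl_root():
    """Return the directory of the installed cogdl package, or None if it is not installed."""
    spec = importlib.util.find_spec("cogdl")
    if spec is None or spec.origin is None:
        return None
    return Path(spec.origin).parent


def replace_wrapper(cogdl_root, patched_file):
    """Back up the installed wrapper once, then overwrite it with the patched file."""
    target = Path(cogdl_root) / WRAPPER_REL
    backup = target.with_name(target.name + ".bak")
    if target.exists() and not backup.exists():
        shutil.copy2(target, backup)  # keep the original for easy rollback
    shutil.copy2(patched_file, target)
    return target


# Example usage, run from the repository root:
#   root = find_cogdl_root()
#   if root is not None:
#       replace_wrapper(root, "./special_req/embedding_link_prediction_dw.py")
```

Reinstalling or upgrading CogDL would restore the stock wrapper, so the replacement has to be repeated after such a change.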
We provide several pre-configured bash scripts to reproduce the different experiment settings described in the paper. All results are logged in ./logs/ for further analysis. By default, all metrics (roc_auc, pr_auc, mrr, f1, hits20, hits50, hits100) are reported every epoch for performance comparison. The loss of each epoch is also reported so that convergence can be judged.
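For readers unfamiliar with the ranking metrics in that list, here is a small illustrative sketch of Hits@K and MRR. It is not the repository's evaluation code, whose exact definitions may differ. It assumes one common convention in which every positive edge is ranked against a shared pool of negative edges, and the function names and toy scores are made up for illustration.

```python
def hits_at_k(pos_scores, neg_scores, k):
    """Fraction of positive edges scored strictly above the k-th best negative edge."""
    if len(neg_scores) < k:
        return 1.0  # fewer than k negatives: every positive is within the top k
    kth_neg = sorted(neg_scores, reverse=True)[k - 1]
    return sum(s > kth_neg for s in pos_scores) / len(pos_scores)


def mean_reciprocal_rank(pos_scores, neg_scores):
    """Average of 1/rank, where each positive is ranked against all negatives."""
    total = 0.0
    for s in pos_scores:
        rank = 1 + sum(n > s for n in neg_scores)  # negatives that beat this positive
        total += 1.0 / rank
    return total / len(pos_scores)


# Toy scores (hypothetical): higher means "more likely to be a real edge".
pos = [0.9, 0.6, 0.2]
neg = [0.8, 0.5, 0.4, 0.1]
print("hits@2:", hits_at_k(pos, neg, 2))
print("mrr:", mean_reciprocal_rank(pos, neg))
```

Under this convention hits20, hits50, and hits100 correspond to k = 20, 50, and 100.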
When splitting data, CogDL or torch_geometric may fail to download a dataset. In that case, please download the datasets manually and place them in ./dataset/.
To reproduce the data-efficiency experiments (on cora, citeseer, pubmed, icews18, and ogbl-collab):
bash ./dataset_data_efficiency.sh
bash ./exp_data_efficiency.sh
To assess the model's performance on ogbl-collab and ogbl-ppa (the practicality experiments):
bash ./dataset_practicality.sh
bash ./exp_practicality.sh
To evaluate scalability on synthetic graphs and the real graph (ogbl-collab):
bash ./dataset_scalability.sh
bash ./exp_scalability.sh
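Each scenario runs a dataset-preparation script followed by an experiment script, so the second step should only start if the first one succeeded. A minimal sketch of such a runner is given below. It is not part of the repository: the PIPELINES table and run_pipeline are hypothetical names, and the script names are taken from the repository layout listed at the top.

```python
import subprocess

# Script pairs from the repository layout: dataset preparation first, then the experiment.
PIPELINES = {
    "data_efficiency": ["./dataset_data_efficiency.sh", "./exp_data_efficiency.sh"],
    "practicality": ["./dataset_practicality.sh", "./exp_practicality.sh"],
    "scalability": ["./dataset_scalability.sh", "./exp_scalability.sh"],
}


def run_pipeline(name, runner=subprocess.run):
    """Run the scripts of one scenario in order, stopping at the first failure."""
    for script in PIPELINES[name]:
        # check=True raises CalledProcessError if the script exits with a non-zero status.
        runner(["bash", script], check=True)


# Example usage, run from the repository root:
#   run_pipeline("data_efficiency")
```

The runner argument exists only so the function can be exercised without launching the real scripts; in normal use the default subprocess.run is sufficient.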