This repository contains the implementation for the paper:
Correcting Exposure Bias for Link Recommendation
International Conference on Machine Learning (ICML) 2021 [PDF]
Shantanu Gupta, Hao Wang, Zachary Lipton, Yuyang Wang
Link recommender systems can exhibit exposure bias when users are systematically underexposed to certain relevant items. If such a recommender system is trained naively on the observed data, it can inherit this bias and underestimate the relevance of low-exposure items. In our paper, we propose three estimators that use learned exposure probabilities to correct for this bias, and we use these estimators to construct loss functions for training the link recommender system. The key idea is to weight positive and negative links differently during training; each of the three loss functions uses a different weighting scheme.
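As a rough illustration of the weighting idea (not one of the exact estimators from the paper), an inverse-propensity-style loss divides each observed positive link's contribution by its learned exposure probability, so that under-exposed but relevant links count more during training. A minimal NumPy sketch with hypothetical names:

```python
import numpy as np

def weighted_link_loss(scores, labels, exposure_probs):
    """Toy inverse-propensity-weighted logistic loss for links.

    scores         : predicted relevance scores for candidate links
    labels         : 1 for observed (positive) links, 0 for negative links
    exposure_probs : learned probability that each link was exposed to the user

    This is only a generic illustration of weighting positives and negatives
    differently; the R_w, R_PU, and R_AP estimators in the paper use their own
    weighting schemes (see the notebook for the exact definitions).
    """
    probs = 1.0 / (1.0 + np.exp(-scores))                       # sigmoid link probabilities
    pos_w = labels / np.clip(exposure_probs, 1e-6, 1.0)          # up-weight low-exposure positives
    neg_w = 1.0 - labels                                         # negatives keep unit weight
    loss = -(pos_w * np.log(probs + 1e-12) + neg_w * np.log(1.0 - probs + 1e-12))
    return loss.mean()
```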
We run experiments on an academic citation network constructed
from the Microsoft Academic Graph (MAG)
dataset. The notebook Train_on_semi_synthetic_data.ipynb contains the implementation of the three loss
functions (i.e., R_w, R_PU, and R_AP) as well as the MLE and No_Prop losses (the latter does not use exposure probabilities).
In the notebook, we demonstrate the training pipeline on a small subset of the semi-synthetic dataset
used in the paper.
Our implementation can easily be extended to incorporate other loss functions based on
different weighting schemes.
Each loss function is defined as a separate Keras layer which can
be plugged into the training pipeline.
The notebook contains comments at the relevant locations showing how this can be done.
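For reference, a new weighting scheme could be added as a loss layer along the following lines. This is only a sketch with made-up names (the provided layers and their exact wiring are in the notebook); it registers the weighted loss via add_loss so that Keras optimizes it during training:

```python
import tensorflow as tf

class CustomWeightedLoss(tf.keras.layers.Layer):
    """Hypothetical loss layer: computes a weighted logistic loss from
    link scores, labels, and learned exposure probabilities."""

    def call(self, inputs):
        scores, labels, exposure_probs = inputs
        probs = tf.sigmoid(scores)
        # Example weighting: inverse-propensity weights on positive links.
        pos_w = labels / tf.clip_by_value(exposure_probs, 1e-6, 1.0)
        neg_w = 1.0 - labels
        loss = -(pos_w * tf.math.log(probs + 1e-12)
                 + neg_w * tf.math.log(1.0 - probs + 1e-12))
        self.add_loss(tf.reduce_mean(loss))
        return scores  # pass scores through so the layer can sit in the model graph
```

Replacing one of the provided loss layers (e.g., R_w) with such a layer at the locations indicated by the comments in the notebook should then be enough to train with the new scheme.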
In the paper, we tested five different loss functions: No_Prop, MLE, R_w, R_PU, and R_AP.
These loss functions can potentially be used as baselines in
future work for evaluating other weighting schemes
for exposure bias correction.
The propensity score model is also implemented as its own Keras layer.
We currently use a simple model in which the propensity score depends only on the fields of study of the
academic papers. Other propensity models that use different features (e.g., paper text) can be incorporated into
our training pipeline by swapping that layer out for a different one.
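As an example, a propensity model based on paper text could be dropped in as a layer with the same interface. This is a hypothetical sketch (layer name, architecture, and inputs are placeholders, not the model used in the paper):

```python
import tensorflow as tf

class TextPropensityModel(tf.keras.layers.Layer):
    """Hypothetical propensity layer that maps a paper-text embedding
    (e.g., the 768-dimensional SciBERT vectors described below) to an
    exposure probability in (0, 1)."""

    def __init__(self, hidden_units=64, **kwargs):
        super().__init__(**kwargs)
        self.hidden = tf.keras.layers.Dense(hidden_units, activation="relu")
        self.out = tf.keras.layers.Dense(1, activation="sigmoid")

    def call(self, text_embedding):
        return self.out(self.hidden(text_embedding))
```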
The notebook Nodes_for_the_real_data_experiments.ipynb shows how to load the list of
nodes (from the MAG) that we used for our experiments on the two real datasets.
For each academic paper, we generate a 768-dimensional embedding from the paper text (title and abstract)
using SciBERT and bert-as-service.
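For context, the encoding step looks roughly as follows, assuming a bert-as-service server has been started with a SciBERT checkpoint (the paths and text formatting here are placeholders, not our exact configuration):

```python
from bert_serving.client import BertClient

# Assumes a server is already running, started separately with something like:
#   bert-serving-start -model_dir /path/to/scibert_scivocab_uncased -num_worker=1
bc = BertClient()

# One string per paper: title and abstract concatenated (illustrative only).
texts = ["<paper title>. <paper abstract>"]
embeddings = bc.encode(texts)  # shape: (num_papers, 768)
```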
The preprocessed data that contains the embeddings
for each of the papers used in our experiments can be found here.
In the notebook, we also show how to load this data.
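A minimal loading sketch, assuming a hypothetical CSV layout with one row per paper (the notebook shows the actual file paths and format):

```python
import numpy as np
import pandas as pd

# Placeholder file name and layout: a paper id column followed by the
# 768 embedding dimensions as separate columns.
papers = pd.read_csv("paper_embeddings.csv")
node_ids = papers["paper_id"].to_numpy()
embeddings = papers.drop(columns=["paper_id"]).to_numpy(dtype=np.float32)
assert embeddings.shape[1] == 768
```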
If you find this code useful, please consider citing our work:
```bibtex
@inproceedings{gupta2021correcting,
  title={Correcting Exposure Bias for Link Recommendation},
  author={Gupta, Shantanu and Wang, Hao and Lipton, Zachary C and Wang, Yuyang},
  booktitle={ICML},
  year={2021}
}
```