Skip to content
/ fair-cf Public

Code to reproduce the paper "When Collaborative Filtering is not Collaborative: Unfairness of PCA for Recommendations". Published at FAccT'25.

License

Notifications You must be signed in to change notification settings

dliu18/fair-cf

Repository files navigation

Reproducing "When Collaborative Filtering is not Collaborative: Unfairness of PCA for Recommendations"

Authors: David Liu, Jackie Baek, Tina Eliassi-Rad

Published at FAccT'25

ArXiv

DOI

General Notes

  • A conda yml file specifying the computing environment can be found in environment.yml.
  • The directories figs and pickles are created as empty directories that will be populated by the scripts.

Data

The datasets can be accessed from the GroupLens website. For LastFM, download the following zip file and unzip the files into data/lastfm. For MovieLens, download the following zip file and place the unzipped folder (ml-1m) into data/.

To create the train/test datasets for LightGCN, execute:

python write_to_lgn_format.py lastfm
python write_to_lgn_format.py movielens

Execution

Training Item-Weighted PCA

All of the models are implemented in models.py. It is best to generate the Item-Weighted PCA projection matrices in advance and save them to intermediary files. For the paper, we generate three batches of results, differing in how they define $r$ and $\gamma$. The three batches are:

  • sparse: fixes $\gamma=-1$ and runs Item-Weighted PCA for $r \in {2, 4, 8, ... , 1024}$.
  • dense: also fixes $\gamma=-1$ but does a more fine grained sweep of $r$.
  • sweep_gamma: fixes $r=32$ and sweeps $\gamma \in {-2, -1.9, ... , 0}$.

All of the batches can be executed with sbatch sbatch.sh. Before each call, update line 4 with the correct number of jobs, as documented in the same file. Also, in models.py, uncomment the corresponding code block in the main function. Ensure the file_suffix parameter (second parameter to models.py is set to "_sparse", "_dense", or "_sweep_gamma").

Training LightGCN

Execute sweep_dimensions.sh in LightGCN/code via:

./sweep_dimensions.sh movielens

The script will run the LightGCN model for each value of $r \in {2, 4, 8, ... , 1024}$. The output embeddings and predictions for all models will be stored in a single pickle file. Recall that we only run LightGCN for the movielens dataset as LightGCN requires binary interaction data.

The implementation of LightGCN was forked from gusye1234/LightGCN-PyTorch

Instruction to reproduce results in paper

The results figures from the paper can all be reproduced via the main.py script. Below, we specify the modifications to main.py needed for each figure:

  • Figure 2: Uncomment the call to _specialization in the main function (see "Uncomment depending on the figure"). Also uncomment rs = utils.get_ds(R). Execute:
python main.py lastfm-explicit -1 dense
python main.py movielens -1 dense
  • Figure 3: Uncomment the two calls to _performance_by_popularity (one for precision and another for AUC-ROC). Note that setting out_sample=False ensures that the performance metrics are in-sample. Also uncomment rs = utils.get_ds(R). Execute:
python main.py lastfm-explicit -1 dense
python main.py movielens -1 dense
  • Figure 4: Uncomment _aggregate_performance and ensure rs = [2**i for i in range(2, 11)]. Execute:
python main.py lastfm-explicit -1 sparse
python main.py movielens -1 sparse
  • Figure 6: Already reproduced following the steps for Figure 4.

  • Figure 7: In _aggregate_performance, uncomment the addition of "Weighted MF" and "LightGCN" to the model_list. Ensure that all four metric names are included in metrics. Comment out the PCA baseline. In the main function, ensure rs = [2**i for i in range(2, 11)]. Execute:

python main.py lastfm-explicit -1 sparse
python main.py movielens -1 sparse
  • Figure 8: In the main function, uncomment _performance_by_gamma. Execute:
python main.py lastfm-explicit -1 sweep_gamma
python main.py movielens -1 sweep_gamma

For any of the above figures, also comment the final two lines of main.py to save the figure values to a csv file.

Miscellaneous

Utilize the script python check_pickle.py file_suffix to view the contents of an Item-Weighted PCA pickle file. Specify the pickle file with a file_suffix (the "file_suffix" passed to models.py).

The teaser figure (Figure 1), can be reproduced via Teaser.ipynb.

The empirical validation in Figure 5 can be reproduced via Bernoulli matrix eigenvalue scaling.ipynb.

About

Code to reproduce the paper "When Collaborative Filtering is not Collaborative: Unfairness of PCA for Recommendations". Published at FAccT'25.

Topics

Resources

License

Stars

Watchers

Forks