DSC180B Capstone Project on Graph Data Analysis
Project Website: https://nhtsai.github.io/graph-rec/
Amazon Product Recommendation using a graph neural network approach.
Requirements:
- dask
- pandas
- torch
- torchtext
- dgl
We use the Amazon Product Dataset from Professor Julian McAuley:
- Product Reviews (5-core)
- Product Metadata
- Product Image Features
The graph is a heterogeneous, bipartite user-product graph, connected by reviews.
- Product Nodes (ASIN)
  - Features: title, price, image representation
- User Nodes (reviewerID)
- Edges (user, reviewed, product) and (product, reviewed-by, user)
  - Features: helpful, overall
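The graph layout described above can be sketched in plain Python. This is a minimal illustration, not the project's loading code: the `Review` record, the `build_bipartite_edges` helper, and the sample rows are hypothetical, and `helpful` is assumed here to be a single vote count. The returned dictionary is keyed by (source type, edge type, destination type) triples, which is the shape `dgl.heterograph` accepts for heterogeneous graphs.

```python
from collections import namedtuple

# Hypothetical review record; field names are assumptions for this sketch.
Review = namedtuple("Review", ["reviewerID", "asin", "overall", "helpful"])


def build_bipartite_edges(reviews):
    """Map raw IDs to contiguous integer node IDs and build both edge directions."""
    user_ids, product_ids = {}, {}
    src, dst, overall, helpful = [], [], [], []
    for r in reviews:
        # setdefault assigns the next free integer the first time an ID is seen.
        u = user_ids.setdefault(r.reviewerID, len(user_ids))
        p = product_ids.setdefault(r.asin, len(product_ids))
        src.append(u)
        dst.append(p)
        overall.append(r.overall)
        helpful.append(r.helpful)
    edges = {
        ("user", "reviewed", "product"): (src, dst),
        ("product", "reviewed-by", "user"): (dst, src),
    }
    # Each review contributes one edge per direction, sharing the same features.
    edge_features = {"overall": overall, "helpful": helpful}
    return edges, edge_features, user_ids, product_ids


# Made-up sample rows, for illustration only.
reviews = [
    Review("A1", "B001", 5.0, 2),
    Review("A2", "B001", 3.0, 0),
    Review("A1", "B002", 4.0, 1),
]
edges, edge_features, user_ids, product_ids = build_bipartite_edges(reviews)
```

Storing each review as a pair of opposite edges keeps the graph bipartite while still allowing walks from product to user and back to product.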
We use an unsupervised PinSage model (adapted from DGL).
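PinSage chooses each product's neighbors by running short random walks over the user-product graph and keeping the most frequently visited products. The sketch below is a simplified, pure-Python illustration of that idea and not the DGL sampler itself: the function name, the adjacency dictionaries, and the choice to exclude the start node from its own neighbor list are all assumptions made for the example.

```python
import random
from collections import Counter


def sample_neighbors(start, product_to_users, user_to_products,
                     walk_length=2, restart_prob=0.5,
                     num_walks=10, num_neighbors=3, rng=None):
    """Return up to num_neighbors products most visited by random walks from start."""
    rng = rng or random.Random(0)  # fixed seed so the sketch is reproducible
    visits = Counter()
    for _ in range(num_walks):
        current = start
        for _ in range(walk_length):
            users = product_to_users.get(current)
            if not users:
                break  # isolated product, nothing to traverse
            # One traversal is a product -> user -> product hop.
            user = rng.choice(users)
            current = rng.choice(user_to_products[user])
            visits[current] += 1
            # After each traversal, terminate the walk with probability restart_prob.
            if rng.random() < restart_prob:
                break
    visits.pop(start, None)  # assumption: a product is not its own neighbor
    return [node for node, _ in visits.most_common(num_neighbors)]


# Tiny made-up graph, for illustration only.
product_to_users = {"p1": ["u1", "u2"], "p2": ["u1"], "p3": ["u2"], "p4": ["u2"]}
user_to_products = {"u1": ["p1", "p2"], "u2": ["p1", "p3", "p4"]}
neighbors = sample_neighbors("p1", product_to_users, user_to_products)
```

Products that share many reviewers with the start product are reached more often, so visit counts act as a rough importance score when selecting neighbors.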
Model configuration parameters:
- name: model configuration name
- random-walk-length: maximum number of traversals for a single random walk, default: 2
- random-walk-restart-prob: termination probability after each random walk traversal, default: 0.5
- num-random-walks: number of random walks to try for each given node, default: 10
- num-neighbors: number of neighbors to select for each given node, default: 3
- num-layers: number of sampling layers, default: 2
- hidden-dims: dimension of product embedding, default: 64 or 128
- batch-size: batch size, default: 64
- num-epochs: number of training epochs, default: 500
- batches-per-epoch: number of batches per training epoch, default: 512
- num-workers: number of workers, default: 3 or (#cores - 1)
- lr: learning rate, default: 3e-4
- k: number of recommendations, default: 500
- model-dir: directory of existing model to continue training
- existing-model: filename of existing model to continue training, default: null
- id-as-features: use id as features, makes model transductive
- eval-freq: evaluates model on validation set when epoch % eval-freq == 0, also evaluates model after last training epoch
- save-freq: saves model when epoch % save-freq == 0, also saves model after last training epoch
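As a rough illustration of how these parameters fit together, the sketch below collects the stated defaults in a Python dictionary and shows the eval-freq / save-freq rule. Several details are assumptions: the project's actual configuration file format is not shown here, the model name and the eval-freq and save-freq values are made up (no defaults are listed for them), hidden-dims and num-workers pick the first of the two listed defaults, and epochs are assumed to be counted from 0.

```python
# Defaults copied from the parameter list; values marked "hypothetical" are not from it.
config = {
    "name": "example-config",        # hypothetical name
    "random-walk-length": 2,
    "random-walk-restart-prob": 0.5,
    "num-random-walks": 10,
    "num-neighbors": 3,
    "num-layers": 2,
    "hidden-dims": 64,               # listed default is 64 or 128
    "batch-size": 64,
    "num-epochs": 500,
    "batches-per-epoch": 512,
    "num-workers": 3,                # listed default is 3 or (#cores - 1)
    "lr": 3e-4,
    "k": 500,
    "existing-model": None,          # null: train from scratch
    "eval-freq": 4,                  # hypothetical value
    "save-freq": 4,                  # hypothetical value
}


def runs_this_epoch(epoch, freq, num_epochs):
    """True when epoch % freq == 0 or when epoch is the last training epoch."""
    is_last = epoch == num_epochs - 1  # assumes 0-based epoch numbering
    return epoch % freq == 0 or is_last


# Shortened run of 10 epochs to keep the example readable.
num_epochs = 10
eval_epochs = [e for e in range(num_epochs)
               if runs_this_epoch(e, config["eval-freq"], num_epochs)]
```

With these values the model would be evaluated at epochs 0, 4, and 8, plus once more at epoch 9 because it is the final epoch.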