
NLMF: NonLinear Matrix Factorization Methods for Top-N Recommender Systems
Santosh Kabbur and George Karypis
Department of Computer Science, University of Minnesota
Twin Cities, USA
{skabbur,karypis}@cs.umn.edu

Abstract—Many existing state-of-the-art top-N recommendation methods model users and items in the same latent space, and the recommendation scores are computed via the dot product between those vectors. These methods assume that the user preference is consistent across all the items that he/she has rated. This assumption is not necessarily true, since many users have multiple personas/interests and their preferences can vary with each such interest. To address this, a recently proposed method modeled users with multiple interests. In this paper, we build on this approach and model users using a much richer representation. We propose a method which models the user preference as a combination of a global preference and interest-specific preferences. The proposed method uses a nonlinear model for predicting the recommendation score, which is used to perform the top-N recommendation task. The recommendation score is computed as the sum of the scores from the components representing the global preference and the interest-specific preferences. A comprehensive set of experiments on multiple datasets shows that the proposed model outperforms other state-of-the-art methods for the top-N recommendation task.

Keywords—Database Applications, Data Mining, Personalization, Mining Methods and Algorithms

I. INTRODUCTION

Recommender systems are prevalent and widely used in many applications. In particular, top-N recommender systems are widely used in e-commerce applications to recommend a ranked list of items to users, in order to identify the items that best fit their personal tastes as inferred from their feedback. Over the years many algorithms and methods have been developed to address the top-N recommendation problem (1; 2). These algorithms make use of user feedback available in the form of purchases, ratings or reviews. The existing methods can be broadly classified into two groups: collaborative filtering (CF) based methods and content based methods. Collaborative filtering methods build models from user/item co-rating information; typically, they represent the user ratings on items in a user-item rating matrix and operate on it. One class of state-of-the-art methods for the top-N recommendation problem is based on learning latent factors for users and items. In these methods, users and items are represented as vectors in a common latent space, and the recommendation score for a given user and item pair is computed as the dot product of the corresponding user and item latent vectors. The most notable methods rely on matrix factorization (MF) (2) or singular value decomposition (SVD) (3) to learn the user and item latent factors. Several extensions and variations of SVD have also been proposed (e.g., SVD++ (4)). In content based methods, user/item features are used to build models (5; 6). In this work, we limit our focus to CF based methods.

One of the recently developed methods, MaxMF (7), extends the traditional matrix factorization (MF) based approaches by representing the user with multiple latent vectors, each corresponding to a different "taste" associated with the user. These different tastes are termed interests. The assumption behind this approach is that letting users have multiple interests helps capture user preferences better, especially when the itemset or the user's interests are diverse. The authors then propose a max-function based nonlinear model, which takes the maximum-scoring interest as the final recommendation score for a given user-item pair. It was shown that MaxMF achieves better recommendation performance than other state-of-the-art methods. However, one of its limitations is that it models the users with only an interest-specific component. This can potentially dilute the learnt latent factors for users who have not provided enough preferences or who do not have enough diversity in their itemsets, due to the lack of support (in terms of number of rated items) for each of the interests.

In this paper, we propose a new method called NLMF (NonLinear Matrix Factorization), which models the user as a combination of global and interest-specific latent factors. This representation allows NLMF to effectively capture both the global preference and multiple interest-specific preferences. This approach implicitly allows the model to strike a balance between the global and interest-specific components. Our experimental evaluation on multiple datasets shows that NLMF performs better than MaxMF and other state-of-the-art methods.

The key contributions of the work presented in this paper are the following:

(i) it proposes a new nonlinear method, which models a user's multiple interests as a combination of global and interest-specific preferences;
(ii) it proposes two different approaches, based on shared and independent item factors between the global preference and the interest-specific preferences; and
(iii) it compares the performance of the proposed method with other state-of-the-art methods for the top-N recommendation task, and investigates the impact of various parameters as they relate to the number of latent factors and the number of interests.

The rest of the paper is organized as follows. Section II introduces the notation used in this paper. In Section III, we review the relevant existing methods. Section IV motivates the need for a better model and contrasts the proposed method with the existing ones. In Section V, we present the details of the NLMF methods. Section VI presents the evaluation methodology, the datasets used along with their characteristics, and the details of the baseline algorithms that we compare the proposed approach with. In Section VII, we present the experimental evaluation along with a discussion. Finally, Section VIII provides the concluding remarks.

II. NOTATIONS

In this paper, all vectors are represented by bold lower case letters and are row vectors (e.g., p, q). All matrices are represented by bold upper case letters (e.g., R, W). The i-th row of a matrix A is represented by a_i. We use calligraphic letters to denote sets (e.g., C, D). A predicted or estimated value is denoted by a hat over it (e.g., r̂).

C and D denote the sets of users and items, respectively, whose cardinalities are n and m (i.e., |C| = n and |D| = m). Matrix R represents the user-item implicit feedback (purchase/review) matrix of size n × m, i.e., R ∈ R^{n×m}. Symbols u and i denote individual users and items, respectively. An entry (u, i) in R, denoted by r_ui, represents the rating on item i by user u. R is a binary matrix: if the user has provided feedback for a particular item, then the corresponding entry in R is 1, otherwise it is 0. We will refer to the entries for which the user has provided feedback as rated items, and to those for which the user has not provided feedback as unrated items. For quick reference, all the important symbols used in this paper, along with their definitions, are summarized in Table I.

TABLE I: Symbols used and definitions.

  Symbol   Definition
  C        Set of users.
  D        Set of items.
  u        Individual user u.
  i        Individual item i.
  n        Number of users.
  m        Number of items.
  k        Number of latent factors.
  l        Number of latent factors for the interest-specific component in NLMFi.
  T        Number of user interests.
  R        Binary rating matrix, R ∈ R^{n×m}.
  r_ui     Rating by user u on item i.
  r̂_ui     Predicted rating for user u on item i.
  P        User latent factor matrix, P ∈ R^{n×k}.
  Q        Item latent factor matrix, Q ∈ R^{m×k}.
  W        User latent factor tensor, W ∈ R^{n×k×T}.
  Y        Item latent factor matrix for the interest-specific component in NLMFi, Y ∈ R^{m×k}.
  λ        ℓ2 regularization weight.
  ρ        Sampling factor for the learning algorithm.
  η        Learning rate for the learning algorithm.

III. REVIEW OF RELEVANT RESEARCH

UserKNN (8; 9) is a classical user-based CF method, which computes the k-nearest neighbors of each user based on their rating profiles. These nearest neighbors are then used to predict the rating of a user on an unrated item as the weighted average of the ratings of the user's nearest neighbors. This method is nonlinear in terms of the preferences of the user, which are implicitly captured via the nearest neighbors. However, this method relies on the co-rating information between users to compute the similarity. Thus, it suffers from the data sparsity issue and fails to capture relations between users who do not have enough co-rated items.

In the recent user-item factorization methods based on MF (3), the rating matrix R is approximated as the product of two low-rank matrices P and Q, where P ∈ R^{n×k} is the user latent vector matrix, Q ∈ R^{m×k} is the item latent vector matrix, k is the number of latent factors, and k << n, m. The recommendation score of a user u for item i is predicted as

    r̂_ui = p_u q_i^T,                                   (1)

where p_u is the latent vector associated with user u and q_i is the latent vector associated with item i.

A recent method for the top-N recommendation task proposed by Weston et al., called MaxMF (7), defines T interest latent vectors per user. The user factors are thus represented by a tensor P, where P ∈ R^{n×k×T}. The item factors Q remain the same as in MF based approaches. Thus, each user u is represented by p_u, where p_u ∈ R^{k×T}. For a given user u and item i pair, the predicted recommendation score is calculated by computing T dot products between each of the T user vectors and the corresponding item vector. The highest-scoring dot product is taken as the estimated/predicted rating. That is,

    r̂_ui = max_{t=1,...,T} p_ut q_i^T,                  (2)

where the max function computes the maximum of the set of dot products between each of the p_ut and q_i.

The main intuition behind this approach is that the user is represented with T different interests, and the interest which matches best with the given item is captured using the max function. In other words, the set of items is partitioned into T partitions for each user, and this partitioning is personalized for the user. For each such item partition, a different scoring function is used to estimate the rating.
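
To make the two scoring rules above concrete, the following Python/NumPy sketch evaluates Equation 1 (MF) and Equation 2 (MaxMF) for a single user-item pair. The dimensions, the random initialization, and the array layout (W holding one k-dimensional vector per interest) are illustrative assumptions and are not taken from the paper.

    import numpy as np

    n, m, k, T = 100, 500, 16, 3                 # users, items, latent factors, interests

    rng = np.random.default_rng(0)
    P = rng.normal(scale=0.01, size=(n, k))      # user factors (MF)
    Q = rng.normal(scale=0.01, size=(m, k))      # item factors
    W = rng.normal(scale=0.01, size=(n, T, k))   # per-interest user factors (MaxMF)

    def mf_score(u, i):
        """Equation 1: r_hat = p_u . q_i"""
        return P[u] @ Q[i]

    def maxmf_score(u, i):
        """Equation 2: r_hat = max over interests t of w_ut . q_i"""
        return np.max(W[u] @ Q[i])

    print(mf_score(3, 42), maxmf_score(3, 42))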

IV. MOTIVATION

Users typically provide ratings for only a handful of items out of the possibly thousands or millions of items. Due to the limited preferences given out by users, the user-item rating matrix becomes sparse. Methods like MaxMF learn only interest-specific user preferences, by implicitly partitioning the items rated by the user into multiple subsets and learning a separate user latent preference vector for each partition. In the case of users who have not provided sufficient ratings, learning only interest-specific preferences results in less support (in terms of number of items) for each interest. This can potentially affect the learning process and result in learning less meaningful (latent) factors for all the item partitions corresponding to that user.

To overcome this problem, our proposed approach NLMF learns the user preferences as a combination of global preference and interest-specific preference components. The global preference is learned using all the ratings provided by the user. Thus, it helps to better estimate the user's preferences when the available data is limited. With regularization, this method allows the model to be flexible, i.e., it implicitly allows the learning process to strike a balance between the global preference and interest-specific preference components. Hence this model is expected to perform better than MaxMF.

V. NLMF - NONLINEAR METHODS FOR CF

In NLMF, given a user u and an item i, the estimated rating r̂_ui is given by the sum of the estimates from the global preference and interest-specific preference components. That is,

    r̂_ui = p_u q_i^T + max_{t=1,...,T} f(u, i, t),      (3)

where p_u is the latent vector associated with user u and q_i is the latent vector associated with item i. Thus, p_u q_i^T gives the prediction score from the global preference component of the model, and f(u, i, t) is the prediction score from the interest-specific preference component. The final prediction score is the sum of the predictions from the global preference and interest-specific preference components. Figure 1 illustrates the overview of the NLMF method.

[Fig. 1: NLMF Method Overview. The learned user preferences combine a global component, built from all of the user's preferences, and an interest-specific component, built from a personalized clustering of the items into interests (Interest-1, Interest-2, Interest-3).]

The selection of the best interest t* is done by choosing the interest which yields the maximum score from the multiple-interest model. The max function is used to compute the maximum recommendation score for the item amongst all the interests of the user. The intuition behind this idea is that, for an item to be ranked high in the top-N list of the user, at least one of the interests of the user must provide a high score for that item.

We use the squared error loss function to compute and minimize the loss. That is,

    L(·) = Σ_{i∈D} Σ_{u∈C} (r_ui − r̂_ui)^2,             (4)

where r_ui is the ground truth value and r̂_ui is the estimated value.

We propose two different methods to represent the interest-specific preference component f(u, i, t). The first has item factors in f(u, i, t) that are independent of those of the global preference component, whereas the second shares the item factors of f(u, i, t) with the global preference component. These two methods are described in the next two sections.

A. NLMFi - Independent Item Factors

In NLMFi, the interest-specific preference component f(u, i, t) is given by

    f(u, i, t) = w_ut y_i^T,                            (5)

where w_ut is the user latent vector for u in the interest-specific preference component corresponding to interest t, and y_i is the item latent vector in the interest-specific preference component. We can see that, for a given item i, NLMFi has two independent item factors (q_i and y_i), one corresponding to the global preference component and one to the interest-specific preference component.

The recommendation score r̂_ui for a given user u and item i is computed as

    r̂_ui = p_u q_i^T + max_{t=1,...,T} w_ut y_i^T,      (6)

where p_u and q_i are the user and item latent vectors in the global preference component, respectively. Thus, NLMFi is an additive model which independently learns two non-overlapping models corresponding to the global preference and interest-specific components and computes their sum as the final prediction score.

Note that the number of latent factors for the global preference component (i.e., p_u q_i^T) and the interest-specific component (i.e., w_ut y_i^T) need not be the same. Thus, this model has the flexibility of having a different number of latent factors for the two components. We use k to represent the number of latent factors for the global preference component (i.e., p_u, q_i ∈ R^{1×k}) and l to represent the number of latent factors for the interest-specific component (i.e., w_ut, y_i ∈ R^{1×l}).

In NLMFi, the matrices P, Q, Y and the tensor W are learned by minimizing the following regularized optimization problem:

    minimize_{P,Q,W,Y}  (1/2) Σ_{(u,i)∈R} (r_ui − r̂_ui)^2 + (λ/2) (||P||_F^2 + ||Q||_F^2 + ||W||_F^2 + ||Y||_F^2),   (7)

where λ is the ℓ2-regularization constant for the latent factor matrices. ℓ2 regularization is used to prevent overfitting.

The optimization problem in Equation 7 is solved using a Stochastic Gradient Descent (SGD) algorithm (10). Algorithm 1 provides the detailed procedure and the gradient update rules for the learning algorithm. Initially, the matrices P, Q and Y and the tensor W are initialized with small random values as the initial estimate. Then, in each iteration, the parameter values are updated based on the gradients computed with respect to the parameter being updated. This process is repeated until the error on a validation set does not decrease further or the number of iterations reaches a predefined threshold.

Note that the gradient updates for the model parameters are computed for both rated and non-rated entries of R. This is in accordance with common practice for the top-N recommendation problem (3; 11; 12), and in contrast with the rating prediction problem, where only the rated items are typically used for computing gradient updates. In order to reduce the computational complexity of the learning process, the zero entries corresponding to non-rated items are sampled and used along with all the non-zero entries (corresponding to rated items) of R. Given a sampling constant ρ and nnz(R), the number of non-zeros in R, ρ · nnz(R) zeros are sampled and used for optimization in each iteration of the learning algorithm. Our experimental results indicate that a small value of ρ (in the range 3−5) is sufficient to produce the best model. This sampling strategy makes the NLMF methods computationally efficient and scalable.
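
As an illustration of the sampling strategy just described, the sketch below (our own, not from the paper) draws ρ · nnz(R) unrated entries uniformly at random and combines them with all rated entries to form the training pool for one iteration. The SciPy sparse representation and the rejection-style sampling loop are assumptions.

    import numpy as np
    from scipy.sparse import csr_matrix

    def sample_training_entries(R, rho, rng):
        """Return (user, item, rating) triples: all non-zeros of R plus
        rho * nnz(R) randomly sampled zero (unrated) entries."""
        R = csr_matrix(R)
        users, items = R.nonzero()
        rated = set(zip(users.tolist(), items.tolist()))
        triples = [(u, i, 1.0) for u, i in rated]

        n, m = R.shape
        n_zeros = int(rho * R.nnz)
        while n_zeros > 0:
            u = int(rng.integers(n))
            i = int(rng.integers(m))
            if (u, i) not in rated:          # keep only unrated entries
                triples.append((u, i, 0.0))
                n_zeros -= 1
        order = rng.permutation(len(triples))   # shuffle before the SGD pass
        return [triples[j] for j in order]

A call such as sample_training_entries(R, rho=3, rng=np.random.default_rng(0)) then yields the shuffled (u, i, r_ui) triples consumed by the inner loop of Algorithms 1 and 2 below.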

Algorithm 1 NLMFi:Learn.

  procedure NLMFiLearn
    η ← learning rate
    λ ← regularization weight
    ρ ← sample factor
    iter ← 0
    Initialize P, Q, W and Y with random values in (−0.001, 0.001)
    while iter < maxIter and the error on the validation set decreases do
      R' ← R ∪ SampleZeros(R, ρ)
      R' ← RandomShuffle(R')
      for all r_ui ∈ R' do
        r̂_ui ← p_u q_i^T + max_{t=1,...,T} w_ut y_i^T
        t* ← interest corresponding to the maximum score
        e_ui ← r_ui − r̂_ui
        p_u ← p_u + η · (e_ui · q_i − λ · p_u)
        q_i ← q_i + η · (e_ui · p_u − λ · q_i)
        w_ut* ← w_ut* + η · (e_ui · y_i − λ · w_ut*)
        y_i ← y_i + η · (e_ui · w_ut* − λ · y_i)
      end for
      iter ← iter + 1
    end while
    return P, Q, W, Y
  end procedure

B. NLMFs - Shared Item Factors

In NLMFs, the interest-specific component f(u, i, t) is given by

    f(u, i, t) = w_ut q_i^T,                            (8)

where w_ut is the user latent vector for u in the interest-specific component corresponding to interest t, and q_i is the item latent vector shared between the global preference and interest-specific components. By using shared item latent vectors, this model has the ability to transfer the learning between the global preference and interest-specific components. Contrast this with the NLMFi model, which has independent item factors (q_i and y_i) for the global preference and interest-specific components.

The recommendation score r̂_ui for a given user u and item i is computed as

    r̂_ui = p_u q_i^T + max_{t=1,...,T} w_ut q_i^T,      (9)

where p_u is the user latent vector for u in the global preference component.

In NLMFs, the matrices P, Q and W are learned by minimizing the following regularized optimization problem:

    minimize_{P,Q,W}  (1/2) Σ_{(u,i)∈R} (r_ui − r̂_ui)^2 + (λ/2) (||P||_F^2 + ||Q||_F^2 + ||W||_F^2),   (10)

where the common terms mean the same as in Equation 7. Similar to NLMFi, the optimization problem in Equation 10 is solved using an SGD based algorithm. The detailed procedure is presented in Algorithm 2. The learning algorithm and its details are similar to Algorithm 1, except for the gradient update rules.

Algorithm 2 NLMFs:Learn.

  procedure NLMFsLearn
    η ← learning rate
    λ ← regularization weight
    ρ ← sample factor
    iter ← 0
    Initialize P, Q and W with random values in (−0.001, 0.001)
    while iter < maxIter and the error on the validation set decreases do
      R' ← R ∪ SampleZeros(R, ρ)
      R' ← RandomShuffle(R')
      for all r_ui ∈ R' do
        r̂_ui ← p_u q_i^T + max_{t=1,...,T} w_ut q_i^T
        t* ← interest corresponding to the maximum score
        e_ui ← r_ui − r̂_ui
        p_u ← p_u + η · (e_ui · q_i − λ · p_u)
        q_i ← q_i + η · (e_ui · (p_u + w_ut*) − λ · q_i)
        w_ut* ← w_ut* + η · (e_ui · q_i − λ · w_ut*)
      end for
      iter ← iter + 1
    end while
    return P, Q, W
  end procedure
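
To connect the two listings to working code, the following sketch shows how one SGD epoch of Algorithm 1 (NLMFi) might be written in Python/NumPy, with a comment noting where the Algorithm 2 (NLMFs) updates differ. The function name, the copy-before-update convention, and the default hyperparameter values are our assumptions, and the triples are expected in the form produced by the earlier sampling sketch.

    import numpy as np

    def nlmfi_epoch(triples, P, Q, W, Y, eta=0.001, lam=0.01):
        """Sketch of one pass of Algorithm 1 (NLMFi), not the authors' code.
        P: n x k, Q: m x k (global component); W: n x T x l, Y: m x l (interest-specific)."""
        for u, i, r in triples:
            interest_scores = W[u] @ Y[i]              # T scores, one per interest
            t_star = int(np.argmax(interest_scores))   # best-matching interest t*
            r_hat = P[u] @ Q[i] + interest_scores[t_star]   # Equation 6
            e = r - r_hat

            # Copy current values so every update uses the pre-update factors.
            p_u, q_i = P[u].copy(), Q[i].copy()
            w_ut, y_i = W[u, t_star].copy(), Y[i].copy()
            P[u]         += eta * (e * q_i  - lam * p_u)
            Q[i]         += eta * (e * p_u  - lam * q_i)
            W[u, t_star] += eta * (e * y_i  - lam * w_ut)
            Y[i]         += eta * (e * w_ut - lam * y_i)
            # NLMFs (Algorithm 2): there is no Y; the interest scores use W[u] @ Q[i],
            # the Q update becomes Q[i] += eta * (e * (p_u + w_ut) - lam * q_i),
            # and the W update uses q_i in place of y_i.
        return P, Q, W, Y

As in Algorithm 1, only the factors of the winning interest t* are updated for each sampled entry, which is what the argmax above reflects.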

C. Scalability

The optimization algorithm used in the training phase of NLMFs and NLMFi is based on SGD. The gradient computations and updates for SGD can be parallelized. Hence, these algorithms can be efficiently applied to larger datasets. In (13), a distributed SGD is proposed; a similar algorithm, with modifications, can be used to scale the NLMF methods to larger datasets. Software packages like Spark (http://spark.apache.org/) can be used to execute SGD based algorithms on a large cluster of processing nodes.

VI. EXPERIMENTAL EVALUATION

A. Data Sets

We evaluated the performance of the NLMF methods on two different real datasets, namely Netflix and Flixster. Netflix is a subset of the data from the Netflix Prize dataset (http://www.netflixprize.com/) and Flixster is a subset of the publicly available dataset collected from Flixster (http://www.flixster.com/). For both datasets, we removed the top 5% most frequently rated items. All the ratings were binarized, i.e., converted to 1. The characteristics of the datasets are summarized in Table II.

TABLE II: Datasets.

  Dataset    #Users   #Items   #Ratings    Rsize    Csize    Density
  Netflix     5,403    2,933   2,197,096   406.64   749.09   13.86%
  Flixster    4,627    3,295   1,184,817   256.06   359.58    7.77%

The "#Users", "#Items" and "#Ratings" columns are the number of users, items and ratings, respectively, in each dataset. The "Rsize" and "Csize" columns are the average number of ratings per user and per item (i.e., the row and column density of the user-item matrix), respectively. The "Density" column is the density of each dataset (i.e., density = #Ratings/(#Users × #Items)).

B. Evaluation Methodology

To evaluate the performance of the proposed model, we employ a 5-fold Leave-One-Out Cross-Validation (LOOCV) method similar to the one employed in (11; 12). The training and test sets are created by randomly selecting one item per user from the dataset and placing it in the test set; the rest of the data is used as the training set. This process is repeated to create five different folds. The training set is used to build the model, and the trained model is used to generate a ranked list of size-N items for each user. The model is then evaluated by comparing the ranked list of recommended items with the item in the test set. N is equal to 10 for all the results presented in this paper.

The recommendation quality is measured using Hit Rate (HR) and Average Reciprocal Hit Rank (ARHR) (14). HR is defined as

    HR = #hits / #users,

where #hits is the number of users for which the model successfully recalled the test item in the size-N recommendation list and #users is the total number of test users. ARHR is defined as

    ARHR = (1 / #users) Σ_{i=1}^{#hits} 1 / pos_i,

where pos_i is the position of the test item in the ranked recommendation list for the i-th hit. ARHR is a weighted version of HR, as it rewards each hit by the inverse of the position of the recommended item in the ranked list.

We chose HR and ARHR as evaluation metrics since they directly measure the performance of the model on the ground truth data, i.e., on what users have already provided feedback for.
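
As a minimal illustration of the two metrics defined above, assuming each user's recommendations are an ordered Python list of item ids and test_item maps each test user to the single held-out LOOCV item:

    def hit_rate_and_arhr(ranked_lists, test_item, N=10):
        """HR = #hits / #users; ARHR additionally weights each hit by 1/position."""
        hits, rr_sum = 0, 0.0
        for user, ranked in ranked_lists.items():
            top_n = ranked[:N]
            if test_item[user] in top_n:
                hits += 1
                rr_sum += 1.0 / (top_n.index(test_item[user]) + 1)  # 1-based position
        n_users = len(ranked_lists)
        return hits / n_users, rr_sum / n_users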

C. Comparison Algorithms

We compare the performance of NLMF against that achieved by UserKNN (14), PureSVD (3), BPRMF (15), SLIM (11) and MaxMF (7). This set of methods constitutes the current state-of-the-art for the top-N recommendation task. Hence they form a good set of methods to compare and evaluate our proposed approach against.

VII. RESULTS

The experimental evaluation consists of three parts. First, we assess the effect of the various model parameters of NLMF on the recommendation performance, including how the number of latent factors and the number of interests affect the top-N performance. Second, we present the top-N performance comparison with the MaxMF method, which is also a nonlinear method based on modeling the user with multiple interests. Due to lack of space, we present these studies only for the Netflix dataset; however, the same trends in the results and conclusions carry over to the Flixster dataset as well. In the third part of the results, we present the comparison with the other competing state-of-the-art methods (Section VI-C).

A. Effect of Number of Latent Factors

Figure 2 shows the effect of varying the number of latent factors (k) on the performance of the NLMFs model. For this experiment, the number of interests was set to 2 (i.e., T = 2). We can see that the hit rate gradually increases with the increasing number of latent factors, reaches its peak when k = 192, and then starts to decline. The possible reason for this is that the model starts to overfit the training data due to the large number of latent factors.

[Fig. 2: NLMFs - Effect of Number of Latent Factors. HR plotted against the number of latent factors k.]

For NLMFi, Figure 3 shows the effect of varying the number of latent factors. Since the NLMFi model can have a different number of latent factors for the global preference (k) and interest-specific preference (l) components, the figure shows multiple line graphs, each corresponding to a different value of k (32, 64, 96 and 128). For a given value of k, l is varied and the performance is presented in the figure. Similar to NLMFs, the performance of NLMFi initially increases with l and then either plateaus or starts declining due to overfitting. For this experiment, the number of interests was set to 2 (i.e., T = 2).

[Fig. 3: NLMFi - Effect of Number of Latent Factors. HR plotted against the number of interest-specific latent factors l, for k = 32, 64, 96 and 128.]

B. Effect of Number of Interests

In this study, we compare the effect of the number of interests (T) on the recommendation performance. Figure 4 shows the results for both the NLMFs and NLMFi models for two different values of k (64 and 128). For the NLMFi model, we set l to be the same as k. We can see that in both models the performance initially increases with increasing T and reaches a peak value when T = 3 or T = 4. This indicates that modeling the users with three or four distinct interests provides the best recommendation performance. As T is increased further, the performance starts to decrease. This is possibly due to the fact that the support for each interest, in terms of the number of items, decreases, thus leading to learning less meaningful user preferences for the different interests.

[Fig. 4: Effect of Number of Interests. HR plotted against the number of interests for NLMFs and NLMFi with k = 64 and k = 128.]

C. Comparison with MaxMF

As discussed in Section III, MaxMF is an existing method which also employs a nonlinear, max-function based approach to learn multiple interest preferences for users. In this study, we compare the performance of the NLMF methods with MaxMF for different numbers of latent factors (k) and interests (T). The results of this study are presented in Figure 5. We can see that both NLMF methods outperform MaxMF for the different values of k, with NLMFi performing better than NLMFs. It is interesting to note that, for some values of k, the one-interest model of MaxMF performs better than the
two-interest one. For the NLMF methods, in contrast, the two-interest model performs better than the one-interest model for all values of k. This is possibly because MaxMF learns only the interest-specific user preference, which can potentially lead to a decrease in the support for each interest in terms of the number of items. The NLMF methods, on the other hand, balance the interest-specific preference with the global preference by learning a combined model with both the global preference and interest-specific components.

[Fig. 5: Comparison with MaxMF. HR plotted against the number of factors for MaxMF, NLMFs and NLMFi with T = 1 and T = 2.]

D. Comparison with Other Approaches

Table III shows the overall recommendation performance of the NLMF methods in terms of HR and ARHR in comparison to the other state-of-the-art methods (Section VI-C). For all the results presented, the number of top-N items chosen is 10 (i.e., N = 10). The following parameter space was explored for each of the methods, and the best performing model in that parameter space in terms of HR is reported. For UserKNN, PureSVD and BPRMF, the parameter k was selected from the range 2 to 800. The learning rate for BPRMF was selected from the range 10^-5 to 1.0, with a multiplicative increment of 10. For SLIM, the regularization constants were selected from the range 10^-5 to 20. For the MaxMF and NLMF methods, the regularization constants were selected from the range 10^-5 to 5 and the learning rate was selected from the range 10^-5 to 1.0.

The results in Table III show that the NLMF methods perform better than the rest of the competing methods on all the datasets. The performance gains of the NLMF methods compared to the next best performing baseline method are of the order of 6% and 10% for Netflix and Flixster, respectively. Note that, contrary to the results presented in (7), the MaxMF model does not outperform PureSVD on the datasets considered in this study. In terms of the two proposed NLMF methods, the independent item factors model (NLMFi) achieved better performance than the shared item factors model (NLMFs). The reason for this could be that NLMFi has the ability to learn the global preference and interest-specific components independently, as the item factors are not overlapping, thereby resulting in learning a better representation of users and items. This allows the model to strike a better balance between the two components compared to NLMFs, which shares the item factors during the learning process.

TABLE III: Comparison of the performance of top-N recommendation algorithms with NLMF.

                          Netflix                            Flixster
  Method     Params                HR      ARHR     Params                HR      ARHR
  UserKNN    100 - - -             0.1412  0.0515   100 - - -             0.1013  0.0295
  PureSVD    50 - - -              0.1821  0.0807   100 - - -             0.1273  0.0494
  BPRMF      400 0.01 - -          0.1890  0.0813   200 0.01 - -          0.1165  0.0437
  SLIM       0.001 0.1 - -         0.1888  0.0872   0.01 1.0 - -          0.1303  0.0502
  MaxMF      192 2 0.0005 0.0005   0.1743  0.0704   160 2 0.0001 0.0005   0.1345  0.0493
  NLMFs      192 2 0.01 0.0005     0.1975  0.0870   256 2 0.01 0.005      0.1401  0.0532
  NLMFi      256/160 2 0.008 0.001 0.1999  0.0835   288/192 2 0.01 0.001  0.1441  0.0546

The "Params" columns indicate the model parameters of the corresponding method. For UserKNN, the parameter is the number of neighbors. For PureSVD, the parameter is the number of latent factors. For BPRMF, the parameters are the number of latent factors and the learning rate. For SLIM, the parameters are the ℓ2 and ℓ1 regularization constants. For MaxMF and NLMFs, the parameters are the number of latent factors, the number of interests, the regularization constant and the learning rate. For NLMFi, the parameters are the number of latent factors for the global preference and interest-specific preference components, the number of interests, the regularization constant and the learning rate. The HR and ARHR columns report the hit rate and average reciprocal hit rank metrics; NLMFi achieves the best HR on both datasets.

VIII. CONCLUSION

In this paper we presented a nonlinear matrix factorization based method (NLMF) for the top-N recommendation task. NLMF models the user's preference using a richer representation, employing a nonlinear model for predicting the recommendation score. The recommendation score is computed as the sum of the scores from the components representing the global preference and the interest-specific user preference. For modeling the interest-specific component, we presented two different approaches: the first learns the item factors independently in the global preference and interest-specific components, whereas the second shares the item factors between the two components. The results showed that the proposed method outperforms the rest of the state-of-the-art methods in terms of top-N recommendation performance. As future work, we plan to evaluate this method on multiple datasets at different sparsity levels, to measure how the NLMF methods perform relative to other methods as the training data gets sparser. We also plan to extend this work to the rating prediction task.

ACKNOWLEDGEMENTS

This work was supported in part by NSF (IIS-0905220, OCI-1048018, CNS-1162405, IIS-1247632, IIP-1414153, IIS-1447788), Army Research Office (W911NF-14-1-0316), Intel Software and Services Group, and the Digital Technology Center at the University of Minnesota. Access to research and computing facilities was provided by the Digital Technology Center and the Minnesota Supercomputing Institute.
REFERENCES

[1] G. Adomavicius and A. Tuzhilin, "Toward the next generation of recommender systems: A survey of the state-of-the-art and possible extensions," IEEE Transactions on Knowledge and Data Engineering, vol. 17, no. 6, pp. 734-749, 2005.
[2] F. Ricci, L. Rokach, B. Shapira, and P. Kantor, Recommender Systems Handbook. Springer Science+Business Media, 2011.
[3] P. Cremonesi, Y. Koren, and R. Turrin, "Performance of recommender algorithms on top-n recommendation tasks," in Proceedings of the fourth ACM conference on Recommender systems, 2010, pp. 39-46.
[4] Y. Koren, "Factorization meets the neighborhood: a multifaceted collaborative filtering model," in Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, 2008, pp. 426-434.
[5] R. J. Mooney and L. Roy, "Content-based book recommending using learning for text categorization," in Proceedings of the fifth ACM conference on Digital libraries. ACM, 2000, pp. 195-204.

[6] M. Pazzani and D. Billsus, "Content-based recommendation systems," The Adaptive Web, pp. 325-341, 2007.
[7] J. Weston, R. J. Weiss, and H. Yee, "Nonlinear latent factorization by embedding multiple user interests," in Proceedings of the 7th ACM conference on Recommender systems. ACM, 2013, pp. 65-68.
[8] J. A. Konstan, B. N. Miller, D. Maltz, J. L. Herlocker, L. R. Gordon, and J. Riedl, "GroupLens: applying collaborative filtering to Usenet news," Communications of the ACM, vol. 40, no. 3, pp. 77-87, 1997.
[9] U. Shardanand and P. Maes, "Social information filtering: algorithms for automating word of mouth," in Proceedings of the SIGCHI conference on Human factors in computing systems. ACM Press/Addison-Wesley Publishing Co., 1995, pp. 210-217.
[10] L. Bottou, "Online algorithms and stochastic approximations," in Online Learning and Neural Networks, D. Saad, Ed. Cambridge, UK: Cambridge University Press, 1998, revised Oct. 2012. [Online]. Available: http://leon.bottou.org/papers/bottou-98x
[11] X. Ning and G. Karypis, "SLIM: Sparse linear methods for top-n recommender systems," in Data Mining (ICDM), 2011 IEEE 11th International Conference on. IEEE, 2011, pp. 497-506.
[12] S. Kabbur, X. Ning, and G. Karypis, "FISM: factored item similarity models for top-n recommender systems," in Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, 2013, pp. 659-667.
[13] R. Gemulla, E. Nijkamp, P. J. Haas, and Y. Sismanis, "Large-scale matrix factorization with distributed stochastic gradient descent," in Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, 2011, pp. 69-77.
[14] M. Deshpande and G. Karypis, "Item-based top-n recommendation algorithms," ACM Transactions on Information Systems (TOIS), vol. 22, no. 1, pp. 143-177, 2004.
[15] S. Rendle, C. Freudenthaler, Z. Gantner, and L. Schmidt-Thieme, "BPR: Bayesian personalized ranking from implicit feedback," in Proceedings of the 25th Conference on Uncertainty in Artificial Intelligence (UAI), 2009.
