10 Recommendation Engine Problem Statement
Instructions:
Please fill in your answers in-line in the Word document. Submit code
separately wherever applicable.
3. Data Pre-processing
3.1 Data Cleaning and Data Mining.
4. Exploratory Data Analysis (EDA):
4.1. Summary.
4.2. Univariate analysis.
4.3. Bivariate analysis.
5. Model Building
5.1 Build the Recommender Engine model on the given data sets.
6. Write about the benefits/impact of the solution - in what way does the
business (client) benefit from the solution provided?
Problem Statement: -
© 2013 - 2021 360DigiTMG. All Rights Reserved.
Q) Build a recommender system with the given data using UBCF (user-based collaborative filtering).
This dataset is related to the video gaming industry. A survey was conducted to build a
recommendation engine so that the store can improve the sales of its gaming DVDs. A snapshot of
the dataset is given below. Build a recommendation engine and suggest top-selling DVDs to the
store's customers.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
sns.set_style('white')
%matplotlib inline
# Load the game ratings data.
a=pd.read_csv(r"D:\game.csv")
a.head()
# Top games by mean rating and by number of ratings.
z=a.groupby('game')['rating'].mean().sort_values(ascending=False).head()
z1=a.groupby('game')['rating'].count().sort_values(ascending=False).head()
# Mean rating per game, plus the number of ratings each game received.
ratings= pd.DataFrame(a.groupby('game')['rating'].mean())
ratings['num of ratings']=a.groupby('game')['rating'].count()
# Distribution of rating counts, then mean rating against rating count.
plt.figure(figsize=(10,5))
ratings['num of ratings'].hist(bins=5)
sns.jointplot(x='rating',y='num of ratings',data=ratings,alpha=0.5)
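# Assumption: the data has a 'userId' column; build a user x game rating matrix.
gamemat=a.pivot_table(index='userId',columns='game',values='rating')
# Assumption: TOP_GAME is the rating column of the most-rated game.
TOP_GAME=gamemat[ratings['num of ratings'].idxmax()]
Similar_to_TOP_GAME=gamemat.corrwith(TOP_GAME)
corr_TOP_GAME=pd.DataFrame(Similar_to_TOP_GAME,columns=['Correlation'])
corr_TOP_GAME.dropna(inplace=True)
corr_TOP_GAME.head(10)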
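The question asks for UBCF (user-based collaborative filtering), whereas corrwith above compares games with one another. A minimal user-based sketch is given below. It assumes a toy ratings table with made-up users, games, and values. The userId column name and the helpers cosine_similarity_matrix and recommend_for are illustrative assumptions, and cosine similarity is only one possible similarity measure.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; the column names are assumed, not taken from game.csv.
toy = pd.DataFrame({
    'userId': [1, 1, 2, 2, 2, 3, 3],
    'game':   ['A', 'B', 'A', 'B', 'C', 'B', 'D'],
    'rating': [5.0, 4.0, 5.0, 4.0, 3.0, 1.0, 5.0],
})

# User x game matrix; unrated games are filled with 0 for the similarity step.
user_game = toy.pivot_table(index='userId', columns='game', values='rating').fillna(0)

def cosine_similarity_matrix(m):
    """Pairwise cosine similarity between the rows (users) of a DataFrame."""
    values = m.to_numpy(dtype=float)
    norms = np.linalg.norm(values, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # avoid division by zero for users with no ratings
    unit = values / norms
    return pd.DataFrame(unit @ unit.T, index=m.index, columns=m.index)

def recommend_for(user, matrix, top_n=2):
    """Score the user's unrated games by similarity-weighted ratings of other users."""
    sim = cosine_similarity_matrix(matrix)[user].drop(user)
    scores = matrix.drop(index=user).mul(sim, axis=0).sum(axis=0)
    unrated = matrix.loc[user] == 0
    return scores[unrated].sort_values(ascending=False).head(top_n)

recs = recommend_for(1, user_game)
print(recs)
```

Here user 1 has rated only games A and B, so the candidates are C and D. User 2 rates A and B the same way as user 1, so that user's rating of C carries the most weight.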
The Entertainment Company, an online movie-watching platform, wants to improve its
collection of movies, showcase those that are highly rated, and recommend those movies to
its customers based on their movie-watching footprint. For this, the company has collected the data and
shared it with you to provide some analytical insights and also to come up with a
Ans:
Note: The dataset needs one correction. Some movie ratings are
recorded as 99, but the question clearly states that ratings lie
between -9 and 9. The question is solved by treating 99 as 9.
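The correction described in the note can be checked with a short sketch like the one below. It runs on a made-up frame, not on the actual dataset. Only the column names Titles and Reviews come from the code in this answer, and the titles and values are hypothetical.

```python
import pandas as pd

# Hypothetical sample rows; the real data comes from Entertainment.csv.
sample = pd.DataFrame({
    'Titles': ['Movie A', 'Movie B', 'Movie C'],
    'Reviews': [99.0, -4.5, 8.2],
})

# List the rows whose rating falls outside the stated range of -9 to 9.
out_of_range = sample[~sample['Reviews'].between(-9, 9)]
print(out_of_range)

# Treat the out-of-range value 99 as 9, as stated in the note.
sample['Reviews'] = sample['Reviews'].replace(99, 9)

# Confirm that every rating now lies between -9 and 9.
in_range = sample['Reviews'].between(-9, 9).all()
print(in_range)
```

Listing the out-of-range rows first makes it easy to confirm that 99 is the only invalid value before it is replaced.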
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
sns.set_style('white')
%matplotlib inline
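# Load the movie reviews data.
a=pd.read_csv(r"D:\Entertainment.csv")
a.head()
# Treat the out-of-range rating 99 as 9, as described in the note above.
a['Reviews']=a['Reviews'].replace(99,9)
# Top titles by mean review score and by number of reviews.
z=a.groupby('Titles')['Reviews'].mean().sort_values(ascending=False).head()
z1=a.groupby('Titles')['Reviews'].count().sort_values(ascending=False).head()
# Mean review score per title.
ratings= pd.DataFrame(a.groupby('Titles')['Reviews'].mean())
# Add the number of ratings per title as a column.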
ratings['num of ratings']=a.groupby('Titles')['Reviews'].count()
# Distribution of rating counts and of mean review scores.
plt.figure(figsize=(10,5))
ratings['num of ratings'].hist(bins=5)
plt.figure(figsize=(10,5))
ratings['Reviews'].hist(bins=5)
# Mean review score against number of ratings.
sns.jointplot(x='Reviews',y='num of ratings',data=ratings,alpha=0.5)
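# Assumption: an 'Id' column identifies the reviewer; adjust to the actual column name.
moviemat=a.pivot_table(index='Id',columns='Titles',values='Reviews')
# Assumption: Top_Movie is the review column of the most-rated title.
Top_Movie=moviemat[ratings['num of ratings'].idxmax()]
Similar_to_Top_Movie=moviemat.corrwith(Top_Movie)
corr_Top_Movie=pd.DataFrame(Similar_to_Top_Movie,columns=['Correlation'])
corr_Top_Movie.dropna(inplace=True)
corr_Top_Movie.head(10)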
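Correlations based on only a few shared ratings can be misleading, so a common follow-up is to join the rating counts and keep only titles with enough ratings. The sketch below uses hypothetical titles, values, and a hypothetical threshold min_ratings. The column name num of ratings follows the code above, and the other names are illustrative.

```python
import pandas as pd

# Hypothetical correlations and rating counts, indexed by title.
titles = ['Movie A', 'Movie B', 'Movie C', 'Movie D']
corr = pd.DataFrame({'Correlation': [1.0, 0.95, 0.40, -0.20]}, index=titles)
counts = pd.DataFrame({'num of ratings': [120, 3, 85, 60]}, index=titles)

# Attach the number of ratings to each correlation.
joined = corr.join(counts['num of ratings'])

# Keep titles with enough ratings, then rank by correlation.
min_ratings = 50  # assumed threshold; tune it to the dataset
filtered = joined[joined['num of ratings'] > min_ratings]
ranked = filtered.sort_values('Correlation', ascending=False)
print(ranked)
```

In this toy example Movie B is dropped despite its high correlation, because that correlation rests on only three ratings.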