
Commit 2bc35ba

add recommender; add intro
1 parent d9763fe commit 2bc35ba

1 file changed

+35
-11
lines changed
README.md

Lines changed: 35 additions & 11 deletions
@@ -1,5 +1,16 @@
11
# Data Science Question Answer
22

3+
The purpose of this repo is twofold:
4+
5+
* To help you (data science practitioners) prepare for data science related interviews
6+
* To introduce some basic data science concepts to people who don't know them but want to learn
7+
8+
The focus is on breadth of knowledge, so this is more of a quick reference than in-depth study material. If you want to learn a specific topic in detail, please refer to other content or reach out and I'd love to point you to materials I found useful.
9+
10+
I might add some topics from time to time but hey, this should also be a community effort, right? Any pull request is welcome!
11+
12+
Here are the categories:
13+
314
* [SQL](#sql)
415
* [Statistics and ML In General](#statistics-and-ml-in-general)
516
* [Supervised Learning](#supervised-learning)
@@ -10,13 +21,6 @@
1021

1122
## SQL
1223

13-
First off some good SQL resources:
14-
15-
* [W3schools SQL](https://www.w3schools.com/sql/)
16-
* [SQLZOO](http://sqlzoo.net/)
17-
18-
Questions:
19-
2024
* [Difference between joins](#difference-between-joins)
2125

2226

@@ -45,8 +49,8 @@ Questions:
4549
* [Bagging](#bagging)
4650
* [Stacking](#stacking)
4751
* [Generative vs discriminative](#generative-vs-discriminative)
48-
* [Paramteric vs Nonparametric](#paramteric-vs-nonparametric)
49-
52+
* [Parametric vs Nonparametric](#parametric-vs-nonparametric)
53+
* [Recommender System](#recommender-system)
5054

5155
### Project Workflow
5256

@@ -196,14 +200,31 @@ generated.
196200
[back to top](#data-science-question-answer)
197201

198202

199-
### Paramteric vs Nonparametric
203+
### Parametric vs Nonparametric
200204

201205
* A learning model that summarizes data with a set of parameters of fixed size (independent of the number of training examples) is called a parametric model.
202206
* A model where the number of parameters is not determined prior to training is called a nonparametric model. Nonparametric does not mean that such models have no parameters. On the contrary, nonparametric models (can) become more and more complex with an increasing amount of data.
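
To make the distinction concrete, here is a minimal sketch using NumPy. The helper names (`fit_linear`, `fit_knn`, `predict_knn`) and the synthetic data are assumptions made for illustration; they are not part of this repo. A straight-line fit always yields two numbers, while a nearest-neighbour "model" has to keep every training point.

```python
import numpy as np

def fit_linear(x, y):
    """Parametric: always returns exactly 2 numbers (slope, intercept)."""
    slope, intercept = np.polyfit(x, y, deg=1)
    return np.array([slope, intercept])

def fit_knn(x, y):
    """Nonparametric: 'fitting' just stores the training data."""
    return np.column_stack([x, y])

def predict_knn(stored, x_new, k=3):
    """Average the targets of the k stored points closest to x_new."""
    dists = np.abs(stored[:, 0] - x_new)
    nearest = np.argsort(dists)[:k]
    return stored[nearest, 1].mean()

rng = np.random.default_rng(0)
sizes = {}
for n in (10, 100):
    # Hypothetical data: a noisy line y = 2x
    x = rng.uniform(0, 1, n)
    y = 2 * x + rng.normal(0, 0.1, n)
    # (number of linear parameters, number of rows the k-NN model keeps)
    sizes[n] = (fit_linear(x, y).size, fit_knn(x, y).shape[0])

print(sizes)
```

The linear model's size stays at 2 regardless of `n`, whereas what the k-NN model stores grows with the training set.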
203207

204208
[back to top](#data-science-question-answer)
205209

206210

211+
### Recommender System
212+
213+
* I put recommender systems here since technically they fall under neither supervised nor unsupervised learning
214+
* A recommender system seeks to predict the 'rating' or 'preference' a user would give to items and then recommend items accordingly
215+
* Content-based recommender systems recommend items similar to those a given user has liked in the past, based on either explicit (ratings, like/dislike button) or implicit (viewed/finished an article) feedback. Content-based recommenders work solely with the past interactions of a given user and do not take other users into consideration.
216+
* Collaborative filtering is based on past interactions of the whole user base. There are two collaborative filtering approaches: **item-based** and **user-based**
217+
- user-based: for user u, a score for an unrated item is produced by combining the ratings given to that item by users similar to u.
218+
- item-based: a rating (u, i) is produced by looking at the set of items similar to i (interaction similarity), then combining u's ratings of those similar items into a predicted rating
219+
* Traditionally, recommender systems use matrix factorization methods, although recently there are also deep learning based methods
220+
* Cold start and sparse interaction matrices can be issues for recommender systems
221+
* Widely used in movies, news, research articles, products, social tags, music, etc.
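
The two collaborative filtering approaches can be sketched in a few lines of NumPy. This is only an illustration under assumptions: the tiny `ratings` matrix is made up, 0 is taken to mean "unrated", and cosine similarity with a similarity-weighted average is one common choice among several. The function names are hypothetical.

```python
import numpy as np

# Hypothetical ratings: rows are users, columns are items, 0 means unrated.
ratings = np.array([
    [5, 4, 0, 1],
    [4, 5, 1, 0],
    [1, 0, 5, 4],
    [0, 1, 4, 5],
], dtype=float)

def cosine_similarity(matrix):
    """Pairwise cosine similarity between the rows of `matrix`."""
    norms = np.linalg.norm(matrix, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # avoid division by zero for empty rows
    unit = matrix / norms
    return unit @ unit.T

def predict_user_based(ratings, user, item):
    """Combine ratings of `item` from users similar to `user`."""
    sim = cosine_similarity(ratings)[user].copy()
    sim[user] = 0.0                     # ignore the user themselves
    weights = sim * (ratings[:, item] > 0)  # only users who rated the item
    if weights.sum() == 0:
        return 0.0                      # nothing to go on (cold start)
    return float(weights @ ratings[:, item] / weights.sum())

def predict_item_based(ratings, user, item):
    """Combine `user`'s own ratings of items similar to `item`."""
    sim = cosine_similarity(ratings.T)[item].copy()
    sim[item] = 0.0                     # ignore the item itself
    weights = sim * (ratings[user] > 0)     # only items the user rated
    if weights.sum() == 0:
        return 0.0
    return float(weights @ ratings[user] / weights.sum())

user_pred = predict_user_based(ratings, user=0, item=2)
item_pred = predict_item_based(ratings, user=0, item=2)
print(user_pred, item_pred)
```

User 0 likes the first two items and dislikes the last one, so both predictions for item 2 come out on the low side. Treating unrated entries as 0 inside the similarity computation is a simplification; real systems usually handle missing values more carefully.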
222+
223+
![cf](assets/collaborative_filtering.gif)
224+
225+
[back to top](#data-science-question-answer)
226+
227+
207228
## Supervised Learning
208229

209230
* [Linear regression](#linear-regression)
@@ -236,7 +257,7 @@ generated.
236257

237258
### Logistic regression
238259

239-
* Generalized linear model (GLM) for classification problems
260+
* Generalized linear model (GLM) for binary classification problems
240261
* Apply the sigmoid function to the output of linear models, squeezing the target
241262
to range [0, 1]
242263
* Threshold to make prediction: usually if the output > .5, prediction 1; otherwise prediction 0
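
As a rough sketch of the steps above (linear output, sigmoid, then a 0.5 threshold), assuming NumPy and hand-picked `weights` and `bias` rather than fitted ones:

```python
import numpy as np

def sigmoid(z):
    """Map any real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def predict_proba(X, weights, bias):
    """Apply the sigmoid to the output of a linear model."""
    return sigmoid(X @ weights + bias)

def predict(X, weights, bias, threshold=0.5):
    """Predict 1 if the probability exceeds the threshold, else 0."""
    return (predict_proba(X, weights, bias) > threshold).astype(int)

# Hypothetical one-feature inputs and parameters, chosen for illustration only.
X = np.array([[-2.0], [0.0], [3.0]])
weights = np.array([1.5])
bias = 0.0

probs = predict_proba(X, weights, bias)
labels = predict(X, weights, bias)
print(probs, labels)
```

In practice the weights and bias would come from training (for example by maximizing the likelihood), and the threshold can be moved away from 0.5 when the costs of false positives and false negatives differ.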
@@ -510,3 +531,6 @@ Using **Ubuntu** as an example.
510531
* Install package: `sudo apt-get install <package>`
511532

512533
[back to top](#data-science-question-answer)
534+
535+
536+
Confession: some images are taken from the internet without proper credit. If you are the author and this is an issue for you, please let me know.
