Amazon Food Review Notes
We import pandas under an alias and use a SQL SELECT query to
fetch data through the database connection, which we refer to
as 'con'. We fetch all the data except the rows with a value
of 3.
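
A minimal sketch of this step, under stated assumptions: the notes do not
name the table, the columns, or what "3" refers to, so the table Reviews,
the column Score, and the reading "exclude reviews with a score of 3" are
all hypothetical here. An in-memory SQLite database stands in for the real
one.

```python
import sqlite3
import pandas as pd

# Hypothetical in-memory database standing in for the real review data.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Reviews (Id INTEGER, ProductId TEXT, Score INTEGER)")
con.executemany(
    "INSERT INTO Reviews VALUES (?, ?, ?)",
    [(1, "B001", 5), (2, "B002", 3), (3, "B003", 1), (4, "B001", 4)],
)

# Assumption: "except 3" means skipping rows whose Score equals 3.
filtered_data = pd.read_sql_query("SELECT * FROM Reviews WHERE Score != 3", con)
print(filtered_data)
```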
To look at all the reviews given by a single user, we run a
SQL command on the review table and sort the result by
product id.
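
The notes do not include the command itself, so the following is only a
sketch of what such a query could look like. The table name Reviews, the
columns UserId, ProductId, Time and Text, and the user id "U1" are all
made up for illustration.

```python
import sqlite3
import pandas as pd

# Hypothetical table and rows; the real schema is not given in the notes.
con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE Reviews (ProductId TEXT, UserId TEXT, Time INTEGER, Text TEXT)"
)
con.executemany(
    "INSERT INTO Reviews VALUES (?, ?, ?, ?)",
    [
        ("B002", "U1", 1199577600, "Great wafers"),
        ("B001", "U1", 1199577600, "Great wafers"),
        ("B009", "U2", 1300000000, "Too sweet"),
    ],
)

# All reviews by one user, ordered by product id.
user_reviews = pd.read_sql_query(
    "SELECT * FROM Reviews WHERE UserId = ? ORDER BY ProductId",
    con,
    params=("U1",),
)
print(user_reviews)
```

Sorting by product id puts the rows for related products next to each
other, which makes repeated time stamps and texts easy to spot by eye.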
Finding out the error:
Now we notice that the same user has entered two reviews at
the same time stamp, which is impossible. From this we learn
that when a product comes in different varieties, the same
review is collected for all the varieties of that product.
Now we need to dedupe the data.
First we sort the data by the product id column using quick
sort and store it in a data frame called sorted_data.
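
A small sketch of the sort, followed by one possible way to dedupe. The
column names and sample rows are hypothetical, and the notes do not say
how the duplicates are dropped, so treating rows with the same user, time
stamp and text as duplicates is an assumption.

```python
import pandas as pd

# Hypothetical sample: one user's review copied across three varieties.
reviews = pd.DataFrame({
    "ProductId": ["B002", "B001", "B003", "B001"],
    "UserId": ["U1", "U1", "U1", "U2"],
    "Time": [1199577600, 1199577600, 1199577600, 1300000000],
    "Text": ["Great wafers", "Great wafers", "Great wafers", "Too sweet"],
})

# Sort by product id using quick sort, as described in the notes.
sorted_data = reviews.sort_values("ProductId", kind="quicksort")

# Assumption: same user, time stamp and text means the same review.
# Only the first occurrence in sorted order is kept.
deduped = sorted_data.drop_duplicates(
    subset=["UserId", "Time", "Text"], keep="first"
)
print(deduped)
```

Sorting first makes the choice of which copy survives predictable: with
keep="first", the copy attached to the lowest product id is retained.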
1. Bag of words:
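In this technique we build a dictionary (or set) of the
unique words across all the texts and count how often each
word occurs in each text, so that similar texts can be
identified by their similar word counts.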
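
As an illustration, here is a minimal bag of words written in plain
Python. The three sample texts are invented, and splitting on whitespace
is a simplification chosen for the sketch, not something the notes
specify.

```python
from collections import Counter

# Invented sample texts.
docs = ["the pasta is tasty", "the pasta is not tasty", "great tea"]

# Split each text into lowercase words.
tokenized = [doc.lower().split() for doc in docs]

# The dictionary: every unique word, in a fixed (sorted) order.
vocabulary = sorted({word for tokens in tokenized for word in tokens})

def to_vector(tokens):
    """Count how often each vocabulary word occurs in one text."""
    counts = Counter(tokens)
    return [counts[word] for word in vocabulary]

vectors = [to_vector(tokens) for tokens in tokenized]

print(vocabulary)
for vector in vectors:
    print(vector)
```

The first two texts differ in a single position of their vectors (the
word "not"), while the third shares no words with either of them.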
Stop words are words that carry little meaning on their own.
In some cases, however, they can be highly meaningful. Once
the stop words are removed, the vector becomes smaller, more
efficient and more meaningful.
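
The sketch below shows both sides of this: removing stop words shortens
the text, but a word like "not" can carry the whole meaning. The stop
word list here is a tiny made-up one, not the list of any particular
library.

```python
# Hypothetical, deliberately tiny stop word list.
STOP_WORDS = {"the", "is", "not", "a", "this"}

def remove_stop_words(text, stop_words=STOP_WORDS):
    """Return the lowercase words of text that are not stop words."""
    return [word for word in text.lower().split() if word not in stop_words]

positive = remove_stop_words("the pasta is tasty")
negative = remove_stop_words("the pasta is not tasty")

# Keeping "not" preserves the difference between the two reviews.
negative_kept = remove_stop_words(
    "the pasta is not tasty", STOP_WORDS - {"not"}
)

print(positive)
print(negative)
print(negative_kept)
```

With the full list, the positive and the negative review reduce to the
same words, which is the case where a stop word turns out to be highly
meaningful.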