GitHub - ECS171-Project/Final-project

ECS171 2019Fall Project

Comparing the Robustness of Machine Learning Approaches on Spam Filtering Problems

Project objective:

Apply several methods to spam message detection, comparing KNN, the Naive Bayes classifier, SVM, and neural network classifiers, and find the one that gives the best precision.
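For reference, precision for the spam class is the fraction of messages flagged as spam that really are spam, TP / (TP + FP). The following is a minimal sketch in plain Python. The function name `precision` and the toy labels are illustrative assumptions, not taken from the project notebooks.

```python
def precision(y_true, y_pred, positive="spam"):
    """Return TP / (TP + FP) for the given positive class."""
    pairs = list(zip(y_true, y_pred))
    # True positives: predicted spam and actually spam.
    tp = sum(1 for t, p in pairs if p == positive and t == positive)
    # False positives: predicted spam but actually ham.
    fp = sum(1 for t, p in pairs if p == positive and t != positive)
    if tp + fp == 0:
        return 0.0  # no positive predictions; a convention chosen for this sketch
    return tp / (tp + fp)


# Toy labels for illustration only (not project data).
y_true = ["ham", "spam", "spam", "ham", "spam"]
y_pred = ["ham", "spam", "ham", "spam", "spam"]
print(precision(y_true, y_pred))
```

A high precision means few legitimate messages are wrongly flagged, which is usually the costlier error in spam filtering.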

Data source:

The dataset for this project comes from Kaggle (link to Kaggle).

The SMS Spam Collection is a set of tagged SMS messages collected for SMS spam research. It contains one set of 5,574 SMS messages in English, each tagged as either ham (legitimate) or spam.
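A quick check of the class balance is often a useful first step with a dataset like this. The sketch below counts labels in a CSV, assuming the label is in the first column and the message text in the second. That layout, the header row, and the helper name `count_labels` are assumptions for illustration, so check them against the actual file before use.

```python
import csv
import io
from collections import Counter


def count_labels(csv_file):
    """Count how often each label appears in the first CSV column."""
    reader = csv.reader(csv_file)
    next(reader)  # skip the header row (assumed to be present)
    return Counter(row[0] for row in reader if row)


# Tiny in-memory sample for illustration only (not project data).
sample = io.StringIO(
    "label,text\n"
    "ham,See you at lunch?\n"
    'spam,"WINNER!! Claim your prize now, reply YES"\n'
    "ham,Running late\n"
)
counts = count_labels(sample)
print(counts)
```

To run it on a real file, pass an open file handle instead of the in-memory sample. The encoding needed to open the file may differ from UTF-8.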

Data preprocessing

Project code:

KNN

Support Vector Machine

Random Forest

Naive Bayes

LSTM

Gated Recurrent Units: Note that you must download the pre-trained word vectors "glove.6B.zip" from https://nlp.stanford.edu/projects/glove/ in order to run the gated recurrent unit code with the pretrained word embedding layer. Unzip the archive once it has downloaded, then put "glove.6B.300d.txt" in your working directory.
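GloVe text files store one word per line followed by its space-separated vector components. A minimal sketch of reading such a file into a dictionary is shown below. The helper name `load_glove` and the tiny demo file are illustrative assumptions, and the notebook may load the vectors differently.

```python
import os
import tempfile


def load_glove(path):
    """Read a GloVe-style text file into a {word: list of floats} dict."""
    embeddings = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip().split(" ")
            word, values = parts[0], parts[1:]
            embeddings[word] = [float(v) for v in values]
    return embeddings


# Demo on a tiny stand-in file with 3-dimensional vectors (made-up values).
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo_vectors.txt")
    with open(demo_path, "w", encoding="utf-8") as f:
        f.write("free 0.1 -0.2 0.3\nhello 0.0 0.5 -1.0\n")
    vectors = load_glove(demo_path)

print(vectors["free"])
```

With the real download in place, the same function could be called as `load_glove("glove.6B.300d.txt")`, giving 300 values per word.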

Model comparisons

Final Report

Report

Folders and files (114 commits):

RF_model
bayes_model
figs
knn_model
rf_fig
rnn_model
svc_model
.gitignore
Data Preprocessing.ipynb
Gated Recurrent Neural Network (Embedding layer trained from scratch and pretrained layer).ipynb
LSTM.ipynb
Machine_Learning_Approaches_to_Spam_Filtering_Problems.pdf
Models comparisons across different methods.ipynb
NaiiveBayes.ipynb
READMD.md
README.md
RNN.py
SVC and KNN.ipynb
combine_enron_dataset.ipynb
pca_KNN.png
pca_SVM_c1.png
pca_SVM_c10.png
random_forest.ipynb
rt-polaritydata.tar.gz
spam.csv
test_data.pkl
test_data2.pkl
train_data.pkl
train_data2.pkl
