Spam_Detection_Project

Spam emails are a daily nuisance. This project detects potential spam email with two machine learning algorithms: decision tree and k-nearest neighbors.

The data is already included in the first part.

Parsed a large set of training emails into header, body, and attachment components using the stringr package and regular expressions.
Feature engineering: created nearly 30 features (e.g. number of recipients, number of attachments, sent hour) to separate spam from ham. The features were derived from the email content and other message information.
Built two models on the full feature set, one with k-nearest neighbors and one with a decision tree.
Evaluated the models on the training data, examined the results, and improved performance by trying variable transformations, applying cross-validation, and selecting the best k, distance metric, and voting mechanism.
Applied the optimized model to the blind test data, where accuracy reached nearly 90%.
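
The parsing and feature steps above can be sketched roughly as follows. The project itself was written in R with stringr; this Python version is only an illustration, not the repository's code. The function name `extract_features`, the sample message, and the `subject_exclamations` feature are assumptions made for the example; only recipient count, attachment count, and sent hour are features named above.

```python
import re
from email import message_from_string
from email.utils import getaddresses, parsedate_to_datetime


def extract_features(raw_email: str) -> dict:
    """Return a few illustrative features from one raw email message."""
    msg = message_from_string(raw_email)

    # Number of recipients across the To and Cc headers.
    recipients = getaddresses(msg.get_all("To", []) + msg.get_all("Cc", []))

    # Hour the message was sent; None when the Date header is missing or invalid.
    sent_hour = None
    date_header = msg["Date"]
    if date_header is not None:
        try:
            sent_hour = parsedate_to_datetime(date_header).hour
        except (TypeError, ValueError):
            sent_hour = None

    # A MIME part is counted as an attachment when it carries a filename.
    n_attachments = sum(1 for part in msg.walk() if part.get_filename())

    # Hypothetical extra feature: exclamation marks in the subject line.
    subject = msg["Subject"] or ""
    subject_exclamations = len(re.findall(r"!", subject))

    return {
        "n_recipients": len(recipients),
        "sent_hour": sent_hour,
        "n_attachments": n_attachments,
        "subject_exclamations": subject_exclamations,
    }


# Made-up sample message, used only to exercise the function.
SAMPLE = (
    "From: a@example.com\n"
    "To: b@example.com, c@example.com\n"
    "Cc: d@example.com\n"
    "Subject: WIN NOW!!!\n"
    "Date: Mon, 02 Jan 2006 15:04:05 -0700\n"
    "\n"
    "Click here.\n"
)

features = extract_features(SAMPLE)
```

Each message becomes one row of numeric features, which is the form both the k-nearest neighbors and the decision tree models need as input.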

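Selecting k by cross-validation, as described in the steps above, might look like the minimal sketch below. It is written in plain Python for illustration and is not the project's R code. The helper names (`knn_predict`, `cv_accuracy`, `best_k`), the Euclidean distance, the simple majority vote, the five folds, and the toy data are all assumptions.

```python
import math
import random
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train_X, train_y, x, k, distance=euclidean):
    """Majority vote among the k training points nearest to x."""
    nearest = sorted(zip(train_X, train_y), key=lambda p: distance(p[0], x))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def cv_accuracy(X, y, k, folds=5, seed=0):
    """Fraction of points predicted correctly when each point is
    classified by a model trained on the other folds."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    correct = 0
    for f in range(folds):
        test_idx = idx[f::folds]  # every folds-th shuffled index forms one fold
        held_out = set(test_idx)
        train_X = [X[i] for i in idx if i not in held_out]
        train_y = [y[i] for i in idx if i not in held_out]
        correct += sum(
            knn_predict(train_X, train_y, X[i], k) == y[i] for i in test_idx
        )
    return correct / len(X)


def best_k(X, y, candidates=(1, 3, 5, 7)):
    """Candidate k with the highest cross-validated accuracy."""
    return max(candidates, key=lambda k: cv_accuracy(X, y, k))


# Toy data: two well-separated clusters standing in for ham and spam features.
X = [(i, j) for i in range(3) for j in range(3)]
X += [(i + 10, j + 10) for i in range(3) for j in range(3)]
y = ["ham"] * 9 + ["spam"] * 9

chosen_k = best_k(X, y)
```

The same loop could compare distance metrics by passing a different `distance` function to `knn_predict`, or compare voting schemes by replacing the majority vote.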