A simple Naive Bayes classifier that uses n-gram language models to predict whether a given sentence carries positive or negative sentiment.
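The core idea can be sketched in a few lines: each class has an n-gram model, a sentence is scored by summing smoothed log probabilities of its n-grams under each model, and the higher-scoring class wins. This is a minimal illustration with toy bigram counts, not the repository's actual implementation; the function names and the add-one smoothing are assumptions.

```python
import math
from collections import Counter

def bigrams(tokens):
    """Return the list of bigrams (as tuples) in a token sequence."""
    return list(zip(tokens, tokens[1:]))

def score(tokens, counts, total, vocab_size):
    """Sum of add-one-smoothed log probabilities of the sentence's bigrams."""
    return sum(
        math.log((counts[ng] + 1) / (total + vocab_size))
        for ng in bigrams(tokens)
    )

def classify(sentence, pos_counts, neg_counts):
    """Pick the label whose bigram model assigns the higher log score."""
    tokens = sentence.lower().split()
    vocab = len(set(pos_counts) | set(neg_counts))
    pos = score(tokens, pos_counts, sum(pos_counts.values()), vocab)
    neg = score(tokens, neg_counts, sum(neg_counts.values()), vocab)
    return 'positive' if pos >= neg else 'negative'

# Toy counts standing in for trained models
pos_counts = Counter({('great', 'movie'): 3, ('really', 'great'): 2})
neg_counts = Counter({('terrible', 'movie'): 3, ('really', 'terrible'): 2})
print(classify('a really great movie', pos_counts, neg_counts))  # → positive
```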
Since some of the corpora are rather large, Git LFS must be installed before cloning this repository.
Written in Python 3.6.7; dependencies are listed in `requirements.txt`. Using virtualenv, they can be installed as follows:

```
user@user:~$ virtualenv env
```

The environment can be activated and the dependencies installed using:

```
user@user:~$ source env/bin/activate
user@user:~$ pip install -r requirements.txt
```

The environment can be deactivated using:

```
user@user:~$ deactivate
```

Contains different corpora whose format has been standardized as follows:
- Two .csv files for training data: one for positive and one for negative examples
- One .csv file for development data, used to evaluate the classifier
- Optionally, if the corpus is large enough: one .csv file for testing data
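Given that standardized format, reading a corpus file reduces to collecting one sentence per row. A minimal sketch, assuming the sentence sits in the first CSV column (the file names shown in the comment are placeholders, not the repository's actual paths):

```python
import csv

def load_sentences(path):
    """Read one sentence per row from a standardized corpus .csv file.
    Assumes the sentence is in the first column of each row."""
    with open(path, newline='', encoding='utf-8') as f:
        return [row[0] for row in csv.reader(f) if row]

# Usage (hypothetical file names; substitute the actual corpus files):
# train_pos = load_sentences('positive.csv')
# train_neg = load_sentences('negative.csv')
# dev = load_sentences('dev.csv')
```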
Contains the raw corpus data
Contains n-gram models that were created with specific settings and saved; loading one of these saved models when classifying avoids the cost of recreating it.
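The `.p` extension on the saved model files suggests they are Python pickles. A minimal sketch of the load-or-rebuild caching idea, under that assumption; the repository's own loading code may differ:

```python
import os
import pickle

def load_or_build(path, build):
    """Load a cached object from `path` if it exists; otherwise call
    `build()`, cache the result at `path`, and return it."""
    if os.path.exists(path):
        with open(path, 'rb') as f:
            return pickle.load(f)
    model = build()
    with open(path, 'wb') as f:
        pickle.dump(model, f)
    return model

# Usage (hypothetical builder; a real one would train a LanguageModel):
# model = load_or_build('models/positive_n2_stemmed_rottentomatoes.p',
#                       lambda: {('great', 'movie'): 3})
```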
Contains the class files that are used in the scripts
Contains preliminary results for each different training corpus
To classify a given sentence, the following bash command can be used:

```
user@user:~$ python classify_sentence.py "..."
```

where `...` denotes the sentence to be classified.
To classify a sentence in a Python script:

```python
import resources.LanguageModel as ngram
import resources.NaiveBayesClassifier as NBclassifier

# Modify the model_file argument to select another model from models/
LM_pos = ngram.LanguageModel(model_file='models/positive_n2_stemmed_rottentomatoes.p')
LM_neg = ngram.LanguageModel(model_file='models/negative_n2_stemmed_rottentomatoes.p')

# Construct the classifier from the two labeled models
classifier = NBclassifier.NaiveBayesClassifier(('positive', LM_pos), ('negative', LM_neg))

sentence = '...'
classifier.classify(sentence)
```