Using Naive Bayes to Predict Votes of Congressmen from Bill Texts. By David Gibson, Thomas Chang, Mustafa Bal.
All code is written for Python 3.6 and is assumed to be executed in the top-level directory of the git repository. Additionally, nltk and matplotlib are dependencies for this project. Generated plots will vary slightly from the displayed plots because the training and validation sets are randomized.
- Clone the git repo
- Unzip complete.zip and keep complete.json in the same directory as model.py
- Execute
python model.py
(this creates three files: training_set.json, validation_set.json, and model.json)
- Execute
python validate.py 1 100 1
to sweep the parameter c over the values 1-100. Usage for this file is
python validate.py start numIter step
where start is the first value of c tested, numIter is the number of different values of c tested, and step is the amount c is incremented every iteration.
- Execute
python plotChyperparam.py
- Open plot.png to examine the plot
In order to test multiple values for k, the value was adjusted by hand in the source code. Additionally, other values of c can be tested by following the usage of validate.py.
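As a rough illustration of how the three arguments of validate.py determine which values of c get tested, the sketch below enumerates them in Python. The helper name swept_c_values is hypothetical and is not part of the repository. It also assumes that c takes integer values and that start itself is the first value tested; validate.py may handle these details differently.

```python
def swept_c_values(start, num_iter, step):
    """Hypothetical helper: list the c values implied by `start numIter step`.

    Assumes the sweep begins at `start` and adds `step` on each of
    `num_iter` iterations, as the usage text describes.
    """
    return [start + i * step for i in range(num_iter)]


# "python validate.py 1 100 1" would then cover c = 1, 2, ..., 100.
print(swept_c_values(1, 100, 1)[:5])   # [1, 2, 3, 4, 5]

# A coarser sweep: five values of c, starting at 10 and increasing by 20.
print(swept_c_values(10, 5, 20))       # [10, 30, 50, 70, 90]
```

Under these assumptions, a coarse sweep with a large step followed by a finer sweep around the best value keeps the number of validation runs small.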
To generate baseline information, execute
python baseline.py
and all three baselines will generate their respective result text files. Like validate.py, baseline.py outputs text files of the form results##.txt, where the pound signs stand for the c value that was validated. Each file contains the probability of a correct prediction for every congressman, followed by the proportion of correctly predicted bills and the average prediction success rate across congressmen (these last two values are the last two lines of the file).
Histograms are generated by executing python hist_gen.py file, where file is any text file generated by validate.py or baseline.py.