Scraping_Torah.ipynb NLP.ipynb visualization.py
These files are intended to be run in order.
When Scraping_Torah.ipynb is run, it retrieves verses from the Jewish Virtual Library and packages them into Torah_Verses.csv, Torah_Chapters.csv, and Chapter_Indices.csv. It also generates a labeling scheme based on https://en.wikipedia.org/wiki/Composition_of_the_Torah, which is stored in Verse_Labels.csv.
When NLP.ipynb is run, it uses Torah_Verses.csv and Verse_Labels.csv to train a vectorizer, a topic modeler, and a classification algorithm, which are stored together in model.p.
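As a rough illustration of the NLP.ipynb step, the sketch below trains the three components and pickles them as one tuple. It is not the notebook's actual code: the choice of scikit-learn's CountVectorizer, LatentDirichletAllocation, and LogisticRegression, the function names `train` and `save_model`, and the `n_topics` parameter are all assumptions made for this example.

```python
import pickle

from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression


def train(verses, labels, n_topics=5):
    """Fit a vectorizer, topic model, and classifier on labeled verses.

    The three estimators are stand-ins (assumptions); the notebook may
    use different ones.
    """
    # Turn each verse into a bag-of-words count vector.
    vectorizer = CountVectorizer(stop_words="english")
    counts = vectorizer.fit_transform(verses)

    # Reduce the counts to a per-verse topic distribution.
    topic_model = LatentDirichletAllocation(n_components=n_topics, random_state=0)
    topics = topic_model.fit_transform(counts)

    # Predict the source label from the topic distribution.
    classifier = LogisticRegression(max_iter=1000)
    classifier.fit(topics, labels)

    return vectorizer, topic_model, classifier


def save_model(model, path="model.p"):
    """Pickle the (vectorizer, topic_model, classifier) tuple."""
    with open(path, "wb") as f:
        pickle.dump(model, f)
```

In this sketch, `verses` would be the text rows of Torah_Verses.csv and `labels` the matching entries of Verse_Labels.csv; how those files are read depends on their column layout, which is not specified here.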
When visualization.py is run using
streamlit run visualization.py
it runs a streamlit application that allows users to enter verses and determine what
Torah_Chapters.csv: Contains all Torah verses, grouped by chapter. Generated by Scraping_Torah.ipynb
Chapter_Indices.csv: Labels each chapter with its chapter number and the book it is from. Intended for visualization purposes. Generated by Scraping_Torah.ipynb
Torah_Verses.csv: Contains all Torah verses, as individual rows. Generated by Scraping_Torah.ipynb
Verse_Labels.csv: Simple array, containing either a 'p', 'y', or 'y', corresponding to the source for the appropriate verse. Generated by Scraping_Torah.ipynb
model.p: A pickled tuple containing a vectorizer, topic modeler, and classification algorithm. All have been trained on the Torah verses. Generated by NLP.ipynb.
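A minimal sketch of how model.p might be loaded and applied to a single verse follows. The helper names `load_model` and `classify_verse` are hypothetical, the tuple order (vectorizer, topic modeler, classifier) is assumed from the description above, and the objects are assumed to expose scikit-learn style `transform` and `predict` methods.

```python
import pickle


def load_model(path="model.p"):
    """Unpickle the tuple; the element order is an assumption."""
    with open(path, "rb") as f:
        vectorizer, topic_model, classifier = pickle.load(f)
    return vectorizer, topic_model, classifier


def classify_verse(verse, vectorizer, topic_model, classifier):
    """Return the predicted label for one verse.

    Assumes scikit-learn style transform/predict interfaces.
    """
    counts = vectorizer.transform([verse])   # text -> feature vector
    topics = topic_model.transform(counts)   # features -> topic weights
    return classifier.predict(topics)[0]     # topic weights -> label
```

Because unpickling can execute arbitrary code, model.p should only be loaded from a source you trust, such as a file you generated yourself with NLP.ipynb.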