Ready-to-use text classification corpus generation from TechTC-300 Test Collection . The final dataset uses chi-squared feature selection and tf-idf feature weighting. Classification is done using a Decision-Jungle classifier in AzureML .
- Clone the repo:
git clone https://github.com/jaindeepali/Adler - Create config file from sample:
cp Adler/config/sample.config.json Adler/config/config.json - Open config.json and edit the path to the data directory
- Create python virtual environment:
virtualenv .venv - Activate virtual environment:
source .venv/bin/activate - Install Adler package:
python setup.py install - Run script to generate dataset:
/scripts/generate.py