This repository contains code used in a study to predict the price of Bitcoin from blockchain features and news sentiment. The experiment focuses on determining the accuracy of Bitcoin price prediction with and without sentiment. The corpus for the sentiment analysis is pulled from several news sources. The project includes several deep learning model architectures that perform sentiment analysis and predict Bitcoin prices using time steps of different sizes. There are two types of Bitcoin prediction models: regression models, which attempt to predict the price, and classification models, which attempt to predict whether the price will increase or decrease.
All methods can be executed via the command line using main.py. Rubicon is used for experiment tracking and Optuna is used for hyperparameter optimization.
├── artifacts                             <- trained models, tokenizers, scalers, and optuna studies
│   └── rubicon-root                      <- rubicon experiment results
├── base                                  <- base classes
├── data                                  <- bitcoin chart data and news corpus
│   ├── processed                         <- cleansed, transformed, scaled data
│   └── raw                               <- raw chart and news data from apis
├── data_loaders                          <- classes that load preprocessed data and split it
├── data_producers                        <- classes that load data from apis and preprocess it
├── models                                <- price and sentiment models
├── optimizers                            <- optuna optimizers for price and sentiment models
├── predictors                            <- model prediction classes
├── trainers                              <- model training classes
├── bitcoin-price-prediction-eda.ipynb    <- Jupyter notebook for project eda
├── main.py                               <- command line methods
├── price-prediction-experiments.sh       <- script for price prediction experiments
├── requirements.txt                      <- conda environment requirements
The news articles used to produce the corpus were obtained from cryptonews-api.com. An API key must be obtained from them and placed in a config.yaml file in the root of the project.
The blockchain metrics are obtained from the blockchain.com API. Details on the available charts can be found at https://www.blockchain.com/charts. An API key is not required.
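As a rough illustration of pulling one chart, the sketch below builds a request URL for the public blockchain.com Charts API and flattens a response into (date, value) rows. The endpoint form `https://api.blockchain.info/charts/<chart-name>`, the `timespan` and `format` query parameters, and the `values` list of `x`/`y` points reflect the public Charts API as commonly documented. The function names and the `market-price` chart are illustrative assumptions and are not necessarily what the classes in data_producers use.

```python
import json
from datetime import datetime, timezone
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint form of the public Charts API (no API key needed).
CHARTS_URL = "https://api.blockchain.info/charts/{name}"


def build_chart_url(name, timespan="1year"):
    """Build the request URL for one chart, asking for JSON output."""
    query = urlencode({"timespan": timespan, "format": "json"})
    return CHARTS_URL.format(name=name) + "?" + query


def parse_chart(payload):
    """Convert {"values": [{"x": unix_seconds, "y": value}, ...]} to (date, value) rows."""
    return [
        (datetime.fromtimestamp(point["x"], tz=timezone.utc).date().isoformat(), point["y"])
        for point in payload["values"]
    ]


def fetch_chart(name, timespan="1year"):
    """Download and parse one chart (hypothetical helper; requires network access)."""
    with urlopen(build_chart_url(name, timespan), timeout=30) as resp:
        return parse_chart(json.load(resp))
```

Calling `fetch_chart("market-price", "30days")` would return one row per data point when the API is reachable.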
The command line methods below correspond to the steps of the model pipeline: retrieving the news corpus and blockchain chart data, training the sentiment model with hyperparameter optimization, predicting sentiment on the hold-out set used for Bitcoin price prediction, and training the regression and classification Bitcoin price prediction models.
Usage: main.py sentiment-data-retrieve [OPTIONS]
Options:
--pull-data BOOLEAN load saved data or pull new data. Default is False
--help Show this message and exit.
Usage: main.py price-data-retrieve [OPTIONS]
Options:
--pull-data BOOLEAN load saved data or pull new data. Default is False
--help Show this message and exit.
Usage: main.py sentiment-optimize [OPTIONS]
Options:
--model-type TEXT LSTM, CNN. Default uses all
--pull-data BOOLEAN load saved data or pull new data. Default is False
--n-trials INTEGER number of study trials. Default is 2
--version INTEGER version of study. Default is 1
--help Show this message and exit.
Usage: main.py sentiment-predict [OPTIONS]
Options:
--help Show this message and exit.
Usage: main.py price-regression-optimize [OPTIONS]
Options:
--model-type TEXT LSTM, CNN. Default uses all
--pull-data BOOLEAN load saved data or pull new data. Default is False
--n-trials INTEGER number of study trials. Default is 2
--with-sent BOOLEAN With or without sentiment feature. Default is True
--timestep INTEGER Number of timesteps, if not set the optimizer will select the timesteps. Default is 0
--prediction-length INTEGER The number of predictions in the test set. The size of the test set will be prediction length + timesteps + 1. Be careful not to go above the total size of data set. Default is 5
--version INTEGER version of study. Default is 1
--help Show this message and exit.
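The test set size rule in the `--prediction-length` help text (prediction length + timesteps + 1) can be checked with a few lines of arithmetic. The sketch below is only an illustration: `holdout_size` and `split_train_test` are hypothetical helpers, and taking the test set from the end of the series is an assumption about how the data loaders split the data.

```python
def holdout_size(prediction_length, timesteps):
    """Test set size as stated in the CLI help: prediction length + timesteps + 1."""
    return prediction_length + timesteps + 1


def split_train_test(rows, prediction_length=5, timesteps=10):
    """Chronological split that keeps the last `holdout_size` rows for testing.

    Assumes `timesteps` is set explicitly (the CLI default of 0 lets the
    optimizer choose it, which this sketch does not model).
    """
    size = holdout_size(prediction_length, timesteps)
    if size >= len(rows):
        raise ValueError(
            f"test set of {size} rows does not fit in a data set of {len(rows)} rows"
        )
    return rows[:-size], rows[-size:]
```

For example, with `--prediction-length 5` and `--timestep 10` the test set holds 5 + 10 + 1 = 16 rows, so a data set of 100 rows leaves 84 rows for training.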
Usage: main.py price-regression-train [OPTIONS]
Options:
--model-type TEXT LSTM, CNN. Default uses all
--epochs INTEGER Epochs
--batch-size INTEGER Batch size
--pull-data BOOLEAN load saved data or pull new data. Default is False
--with-sent BOOLEAN With or without sentiment feature. Default is True
--help Show this message and exit.
Usage: main.py price-classification-optimize [OPTIONS]
Options:
--model-type TEXT LSTM, CNN. Default uses all
--pull-data BOOLEAN load saved data or pull new data. Default is False
--n-trials INTEGER number of study trials. Default is 2
--with-sent BOOLEAN With or without sentiment feature. Default is True
--timestep INTEGER Number of timesteps, if not set the optimizer will select the timesteps. Default is 0
--prediction-length INTEGER The number of predictions in the test set. The size of the test set will be prediction length + timesteps + 1. Be careful not to go above the total size of data set. Default is 5
--version INTEGER version of study. Default is 1
--help Show this message and exit.
Usage: main.py price-classification-train [OPTIONS]
Options:
--model-type TEXT LSTM, CNN. Default uses all
--epochs INTEGER Epochs
--batch-size INTEGER Batch size
--pull-data BOOLEAN load saved data or pull new data. Default is False
--with-sent BOOLEAN With or without sentiment feature. Default is True
--help Show this message and exit.
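The commands above can be chained in pipeline order. The following is a minimal sketch of such a driver, assuming it is run from the project root. The command names and options are the ones listed above, while the option values, the choice to pull fresh data in the first two steps, and the helper names `build_commands` and `run_pipeline` are illustrative assumptions.

```python
import shlex
import subprocess
import sys

# Pipeline steps in the order described above; option values are examples only.
PIPELINE = [
    ["sentiment-data-retrieve", "--pull-data", "True"],
    ["price-data-retrieve", "--pull-data", "True"],
    ["sentiment-optimize", "--model-type", "LSTM", "--n-trials", "2"],
    ["sentiment-predict"],
    ["price-regression-optimize", "--model-type", "LSTM", "--with-sent", "True"],
    ["price-classification-optimize", "--model-type", "LSTM", "--with-sent", "True"],
]


def build_commands(steps=PIPELINE, python=sys.executable):
    """Return the full argv list for every pipeline step."""
    return [[python, "main.py", *step] for step in steps]


def run_pipeline(dry_run=True):
    """Print each command and, unless dry_run is set, execute it in order."""
    for cmd in build_commands():
        print(shlex.join(cmd))
        if not dry_run:
            # check=True stops the pipeline at the first failing step.
            subprocess.run(cmd, check=True)
```

Calling `run_pipeline()` only prints the commands; `run_pipeline(dry_run=False)` would execute them one after another.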
The following scripts can be run to replicate the price regression and classification experiments:
./price-prediction-regression-experiment.sh
./price-prediction-classification-experiment.sh
The Optuna studies and Rubicon experiment results can be viewed with the following commands:
optuna-dashboard sqlite:///artifacts/db.sqlite3
rubicon_ml ui --root-dir ./artifacts/rubicon-root