Author: Ning Chen
- Overview
- Business Understanding
- Data Collection
- Exploratory Data Analysis
- Classification
- Time Series
- Sentiment Analysis
- Frontend
- Next Steps
- For More Information
- Repository Structure
Accurately predicting stock market assets is a significant and challenging task because financial markets are so complex. As powerful computational engines become more available and affordable, deep learning prediction methods have proved effective in finance.
This project builds time series regression models, using neural networks (NN) and other advanced techniques, to predict the stock market. Stock market prediction aims to determine the future movement of the value of stocks traded on a financial exchange. This project can help stock investors and investment banks better understand the market when developing economic strategies and making financial decisions.
Data was collected from three different web sources via API calls or web scraping.
- Quarterly reports for classification, via web scraping.
- Yahoo Finance and IEX APIs for time series, via API calls.
- Twitter for sentiment data, scored with VADER.
- Quarterly report data was cleaned and analyzed. A simple trading strategy was defined: buy at local minima of the price, sell at local maxima, and hold at all other times.
- Time series data was joined with the quarterly report data. Missing values for weekends and holidays were filled by interpolation. Missing values of exogenous features were filled by propagating the last valid observation forward (and the next valid observation backward) until the next valid observation.
- TF-IDF was used for feature extraction in sentiment analysis.
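The buy/sell/hold labeling rule from the first bullet can be sketched as follows. This is a minimal illustration, not the project's code: `label_trades` is a hypothetical name, and it assumes a one-day neighborhood for detecting local extrema, whereas the original analysis may have used a wider window (for example, `scipy.signal.argrelextrema` with a larger `order`).

```python
from typing import List


def label_trades(prices: List[float]) -> List[str]:
    """Label each day as 'buy', 'sell', or 'hold' from local price extrema.

    Assumption: a day is a local minimum (maximum) when its price is strictly
    lower (higher) than the prices on the day before and the day after.
    The first and last days, and flat stretches, are labeled 'hold'.
    """
    labels = ["hold"] * len(prices)
    for i in range(1, len(prices) - 1):
        prev_p, cur_p, next_p = prices[i - 1], prices[i], prices[i + 1]
        if cur_p < prev_p and cur_p < next_p:
            labels[i] = "buy"   # local minimum
        elif cur_p > prev_p and cur_p > next_p:
            labels[i] = "sell"  # local maximum
    return labels


if __name__ == "__main__":
    closes = [10.0, 9.5, 9.8, 10.4, 10.1, 10.3]  # hypothetical closing prices
    print(label_trades(closes))
    # ['hold', 'buy', 'hold', 'sell', 'buy', 'hold']
```

Because the labels look one day into the future, they are suitable as classification targets but not as features available at prediction time.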
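The two gap-filling rules from the second bullet can be sketched in plain Python. This assumes the weekend and holiday dates have already been inserted into the series, with `None` marking a missing value; the function names are hypothetical. In pandas, `Series.interpolate()` and `Series.ffill()` / `Series.bfill()` provide comparable operations on a reindexed `DatetimeIndex`.

```python
from typing import List, Optional

Series = List[Optional[float]]  # None marks a missing value


def interpolate_linear(values: Series) -> Series:
    """Fill interior gaps linearly between the nearest known neighbors.

    Leading and trailing gaps have only one known neighbor and stay None.
    """
    out = list(values)
    known = [i for i, v in enumerate(values) if v is not None]
    for left, right in zip(known, known[1:]):
        step = (values[right] - values[left]) / (right - left)
        for i in range(left + 1, right):
            out[i] = values[left] + step * (i - left)
    return out


def fill_forward_backward(values: Series) -> Series:
    """Forward-fill from the last valid value, then back-fill any leading gap."""
    out = list(values)
    last = None
    for i, v in enumerate(out):
        if v is None:
            out[i] = last  # carry the last valid observation forward
        else:
            last = v
    nxt = None
    for i in range(len(out) - 1, -1, -1):
        if out[i] is None:
            out[i] = nxt  # only gaps before the first valid value remain
        else:
            nxt = out[i]
    return out


if __name__ == "__main__":
    # Hypothetical prices for Friday, Saturday, Sunday, Monday
    print(interpolate_linear([100.0, None, None, 103.0]))
    # [100.0, 101.0, 102.0, 103.0]
    print(fill_forward_backward([None, 2.0, None, None, 5.0, None]))
    # [2.0, 2.0, 2.0, 2.0, 5.0, 5.0]
```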
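A minimal TF-IDF computation over tokenized tweets, assuming the plain formulation tf × ln(N / df), where N is the number of documents and df the number containing the term. The function name is hypothetical, and libraries differ in the details: scikit-learn's `TfidfVectorizer`, for instance, uses a smoothed idf and L2-normalizes each row by default, so its weights will not match these numbers exactly.

```python
import math
from collections import Counter
from typing import Dict, List


def tf_idf(docs: List[List[str]]) -> List[Dict[str, float]]:
    """Return one {term: weight} mapping per tokenized document.

    tf  = occurrences of the term in the document / tokens in the document
    idf = ln(N / df), with N documents and df documents containing the term
    """
    n_docs = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        weights.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


if __name__ == "__main__":
    tweets = [["stock", "up", "up"], ["stock", "down"]]  # hypothetical tokens
    for row in tf_idf(tweets):
        print(row)
    # A term that appears in every tweet ("stock") gets weight 0.0.
```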
Quarterly report data was used to train several classification models. Because the dataframe contained more than 100 features, Principal Component Analysis (PCA) was applied to reduce its dimensionality.
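The dimensionality-reduction step can be sketched with NumPy's SVD. This is an illustrative implementation, not the project's code: it standardizes each feature first (an assumption, since quarterly-report fields live on very different scales), and `components_for_variance` with its 95% default is a hypothetical way to choose the number of components. In practice, `sklearn.decomposition.PCA` combined with `StandardScaler` is the usual library route.

```python
import numpy as np


def pca(X: np.ndarray, n_components: int, standardize: bool = True):
    """Project the rows of X onto its first `n_components` principal components.

    Returns (scores, explained_variance_ratio); scores has shape
    (n_samples, n_components).
    """
    X = np.asarray(X, dtype=float)
    Z = X - X.mean(axis=0)
    if standardize:
        std = Z.std(axis=0)
        std[std == 0] = 1.0  # constant columns stay at zero instead of dividing by 0
        Z = Z / std
    # Rows of Vt are the principal axes, ordered by decreasing singular value.
    _, S, Vt = np.linalg.svd(Z, full_matrices=False)
    scores = Z @ Vt[:n_components].T
    explained = S**2 / np.sum(S**2)
    return scores, explained[:n_components]


def components_for_variance(X: np.ndarray, target: float = 0.95) -> int:
    """Smallest number of components whose cumulative explained variance reaches `target`."""
    _, ratio = pca(X, n_components=min(X.shape))
    k = int(np.searchsorted(np.cumsum(ratio), target) + 1)
    return min(k, len(ratio))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    features = rng.normal(size=(60, 120))  # hypothetical 120-feature table
    scores, ratio = pca(features, n_components=10)
    print(scores.shape, ratio.sum())
```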
Time series data was used to fit and train the models below (two statistical time series models and two recurrent neural networks). All models were evaluated by RMSE and MAPE.
- SARIMAX model with exogenous features
- Facebook Prophet
- LSTM
- GRU
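For reference, the two evaluation metrics are RMSE = sqrt(mean((y - y_hat)^2)), expressed in price units, and MAPE = 100 * mean(|(y - y_hat) / y|), expressed in percent. A small plain-Python sketch of both (function names are illustrative):

```python
import math
from typing import Sequence


def _check(y_true: Sequence[float], y_pred: Sequence[float]) -> int:
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and the same length")
    return len(y_true)


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean squared error, in the same units as the prices."""
    n = _check(y_true, y_pred)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def mape(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute percentage error, in percent; undefined if any actual value is 0."""
    n = _check(y_true, y_pred)
    if any(t == 0 for t in y_true):
        raise ValueError("MAPE is undefined when an actual value is 0")
    return 100.0 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / n


if __name__ == "__main__":
    actual = [100.0, 200.0, 400.0]     # hypothetical prices
    forecast = [110.0, 190.0, 400.0]
    print(rmse(actual, forecast))  # about 8.165
    print(mape(actual, forecast))  # about 5.0
```

RMSE penalizes large misses more heavily, while MAPE is scale-free and so allows comparison across stocks with different price levels.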
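Recurrent models such as LSTM and GRU are typically trained on fixed-length windows of past values. The helpers below are a hypothetical sketch of that preprocessing; the lookback length, the one-step horizon, and the 80/20 chronological split are assumptions, not settings taken from the project. For Keras `LSTM`/`GRU` layers, X would then be reshaped to `(samples, lookback, features)`.

```python
from typing import List, Sequence, Tuple


def make_windows(series: Sequence[float], lookback: int,
                 horizon: int = 1) -> Tuple[List[List[float]], List[float]]:
    """Turn a 1-D series into supervised (X, y) pairs.

    X[i] holds `lookback` consecutive values and y[i] is the value
    `horizon` steps after the end of that window.
    """
    if lookback < 1 or horizon < 1:
        raise ValueError("lookback and horizon must be positive")
    X, y = [], []
    for start in range(len(series) - lookback - horizon + 1):
        end = start + lookback
        X.append(list(series[start:end]))
        y.append(series[end + horizon - 1])
    return X, y


def chronological_split(X, y, train_fraction: float = 0.8):
    """Split without shuffling, so every test window starts later than every training window."""
    cut = int(len(X) * train_fraction)
    return (X[:cut], y[:cut]), (X[cut:], y[cut:])


if __name__ == "__main__":
    X, y = make_windows([1.0, 2.0, 3.0, 4.0, 5.0], lookback=3)
    print(X)  # [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0]]
    print(y)  # [4.0, 5.0]
```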
NLP and deep learning were used to predict stock prices.
- The count of tweets about each stock is calculated.
- The sentiment toward each stock is analyzed.
- A word cloud is generated.
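In the `vaderSentiment` package, `SentimentIntensityAnalyzer().polarity_scores(text)` returns a dict whose `"compound"` entry is a normalized score between -1 and 1. The sketch below assumes those compound scores have already been computed per tweet and shows one hypothetical way to derive the daily tweet count and average sentiment; the +/-0.05 labeling thresholds follow the convention suggested in the VADER README.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, Tuple


def label_compound(score: float) -> str:
    """Map a VADER compound score to a label using the +/-0.05 thresholds."""
    if score >= 0.05:
        return "positive"
    if score <= -0.05:
        return "negative"
    return "neutral"


def daily_summary(tweets: Iterable[Tuple[str, float]]) -> Dict[str, dict]:
    """Aggregate (date, compound_score) pairs into a per-day tweet count and mean sentiment."""
    by_day = defaultdict(list)
    for day, score in tweets:
        by_day[day].append(score)
    summary = {}
    for day in sorted(by_day):
        avg = mean(by_day[day])
        summary[day] = {
            "tweet_count": len(by_day[day]),
            "mean_compound": avg,
            "label": label_compound(avg),
        }
    return summary


if __name__ == "__main__":
    # Hypothetical compound scores for one ticker
    scored = [("2021-03-01", 0.6), ("2021-03-01", -0.2), ("2021-03-02", -0.5)]
    for day, row in daily_summary(scored).items():
        print(day, row)
```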
Streamlit was used to build a frontend for each form of analysis and its respective machine learning model.
- Access updated quarterly reports in a timely manner and extract more informative features.
- Tune the hyperparameters of the time series models and expand their exogenous variables; technical indicators such as MACD, the stochastic oscillator, and RSI could be used.
- Gather more relevant sentiment data from web sources beyond Twitter.
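Should those indicators be added as exogenous features, they can be derived from the price history alone. A minimal sketch using common textbook definitions: 12/26/9 EMAs for MACD and a 14-bar window for the stochastic %K. These parameter values are conventional defaults, not choices made in this project, and the function names are hypothetical. The EMA is seeded with the first price, which matches pandas `Series.ewm(span=..., adjust=False).mean()`.

```python
from typing import List, Optional, Sequence


def ema(values: Sequence[float], span: int) -> List[float]:
    """Recursive EMA of a non-empty series, alpha = 2 / (span + 1), seeded with the first value."""
    alpha = 2.0 / (span + 1)
    out = [float(values[0])]
    for v in values[1:]:
        out.append(alpha * v + (1 - alpha) * out[-1])
    return out


def macd(close: Sequence[float], fast: int = 12, slow: int = 26, signal: int = 9):
    """Return (macd_line, signal_line, histogram)."""
    fast_ema, slow_ema = ema(close, fast), ema(close, slow)
    macd_line = [f - s for f, s in zip(fast_ema, slow_ema)]
    signal_line = ema(macd_line, signal)
    histogram = [m - s for m, s in zip(macd_line, signal_line)]
    return macd_line, signal_line, histogram


def stochastic_k(high: Sequence[float], low: Sequence[float],
                 close: Sequence[float], period: int = 14) -> List[Optional[float]]:
    """%K = 100 * (close - lowest low) / (highest high - lowest low) over `period` bars."""
    k: List[Optional[float]] = []
    for i in range(len(close)):
        if i + 1 < period:
            k.append(None)  # not enough history yet
            continue
        hh = max(high[i - period + 1:i + 1])
        ll = min(low[i - period + 1:i + 1])
        # %K is undefined when the window has no range
        k.append(None if hh == ll else 100.0 * (close[i] - ll) / (hh - ll))
    return k


if __name__ == "__main__":
    prices = [float(p) for p in range(1, 41)]  # hypothetical steadily rising closes
    macd_line, signal_line, hist = macd(prices)
    print(macd_line[-1] > 0)  # a rising series gives a positive MACD
```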
Please review the full analysis in the Jupyter Notebook or the presentation.
For any additional questions, please contact Ning Chen at [email protected].
Description of the structure of the repository and its contents:
├── README.md <- The top-level README for reviewers of this project
├── stock_market <- Narrative documentation of analysis in Jupyter notebook
├── Presentation.pdf <- PDF version of project presentation
├── data <- Both sourced externally and generated from code
└── images <- Both sourced externally and generated from code